SQL Intermediate

Jayanti prasad Ph.D
7 min readSep 1, 2024

--

This tutorial is continuation of my earlier tutorial on SQL. Here all the examples are given with MySQL. In order to carry out the exercises we will use the employees data set given here . You can download the folder and in that you will find multiple scripts for creating a database with a set of tables.

The main script ‘employees.sql’ can be used to create a database named ‘employees’ with a set of tables. Once this is done you can check that the database has the tables. Once it is confirmed that the database and tables have been created we can change to the database and run other ‘load’ scripts which will populate out tables with the data. You may need to figure out how to run a sql script. Here is the step-by-step procedure :

Step 1: Change to the directory in which you have your sql scripts:

Figure 1: SQL scripts

Step 2: Login to Mysql

Figure 2: Login to SQL

Step 3: Now run the ‘employees.sql’ script

Figure 3: Creating database and tables

Let us look at the first few quarries in the employees.sql

Figure 4: Creating database and tables

You can see that we are dropping the database if exists and then creating it and the same is done for tables also. The script creates many tables and let us check those.

The database has 8 tables and let us check ‘employees’ and ‘salaries’ tables.

Now let us do some exercise on these two tables.

  1. Count the total number of employees;

There are 300,024 number of employees / rows in the table.

2. Find the top ten employees who have highest salaries in decreasing order.

The query is this

select * from employees inner join salaries on employees.emp_no = salaries.emp_no order by salaries.salary desc limit 10;

We got the list of top salaried employees in decreasing order but there are some issue — some columns emp_no is repeated so we can specify which of the columns we actually want to retrieve by the following query +.

The Query is as following :

select employees.emp_no, employees.first_name, employees.last_name, employees.gender, salaries.salary from employees inner join salaries on employees.emp_no = salaries.emp_no order by salaries.salary limit 10;

This looks good. Now let us add the department information also in the result and for that we will be using department table.

The full query is here :

select Y.emp_no, Y.first_name, Y.last_name, Y.gender, Y.salary, departments.dept_no, departments.dept_name from departments inner join (select dept_no, X.emp_no, X.first_name, X.last_name, X.salary, X.gender from dept_emp inner join (select employees.emp_no, employees.first_name, employees.last_name, employees.gender, salaries.salary from employees inner join salaries on employees.emp_no = s
alaries.emp_no order by salaries.salary limit 10) as X on X.emp_no = dept_emp.emp_no order by X.salary desc ) as Y on Y.dept_no = depa
rtments.dept_no order by salary desc;

There are still some issues, such as duplicate rows. Let us drop duplicate rows.

The full query is here :

select P.emp_no, P.first_name, P.last_name, P.gender, P.dept_name, P.salary from (select R.emp_no, R.first_name, R.last_name, R.gender, R.dept_name, R.salary, R.rn from (select Y.emp_no, Y.first_name, Y.last_name, Y.gender, Y.salary, departments.dept_no, departments.dept_name, (row_number() over (partition by Y.emp_no)) as rn from departments inner join (select dept_no, X.emp_no, X.first_name, X.last_name, X.salary, X.gender from dept_emp inner join (select employees.emp_no, employees.first_name, employees.last_name, employees.gender, salaries.salary from employees inner join salaries on employees.emp_no = salaries.emp_no order by salaries.salary limit 10) as X on X.emp_no = dept_emp.emp_no order by X.salary desc ) as Y on Y.dept_no = departments.dept_no order by salary desc) as R having R.rn=1 order by salary desc) as P;

Note that this is a complex query and there is a lot of scope for improving. I asked ChatGPT to simplyfy the query and here is the response.

simplify this sql query for mysql without using ‘LIMIT & IN/ALL/ANY/SOME subquery : select P.emp_no, P.first_name, P.last_name, P.gender, P.dept_name, P.salary from (select R.emp_no, R.first_name, R.last_name, R.gender, R.dept_name, R.salary, R.rn from (select Y.emp_no, Y.first_name, Y.last_name, Y.gender, Y.salary, departments.dept_no, departments.dept_name, (row_number() over (partition by Y.emp_no)) as rn from departments inner join (select dept_no, X.emp_no, X.first_name, X.last_name, X.salary, X.gender from dept_emp inner join (select employees.emp_no, employees.first_name, employees.last_name, employees.gender, salaries.salary from employees inner join salaries on employees.emp_no = salaries.emp_no order by salaries.salary limit 10) as X on X.emp_no = dept_emp.emp_no order by X.salary desc ) as Y on Y.dept_no = departments.dept_no order by salary desc) as R having R.rn=1 order by salary desc) as P;

For the above prompt we get the following code:

 — Get the top 10 highest salaries
WITH TopSalaries AS (
SELECT DISTINCT s.salary
FROM salaries s
ORDER BY s.salary DESC
LIMIT 10
),
 — Get employee details with their department
EmployeeDetails AS (
SELECT e.emp_no, e.first_name, e.last_name, e.gender, s.salary, d.dept_name
FROM employees e
INNER JOIN salaries s ON e.emp_no = s.emp_no
INNER JOIN dept_emp de ON e.emp_no = de.emp_no
INNER JOIN departments d ON de.dept_no = d.dept_no
WHERE s.salary IN (SELECT salary FROM TopSalaries)
)
 — Retrieve the top salary per employee
SELECT ed.emp_no, ed.first_name, ed.last_name, ed.gender, ed.dept_name, ed.salary
FROM EmployeeDetails ed
JOIN (
SELECT emp_no, MAX(salary) AS max_salary
FROM EmployeeDetails
GROUP BY emp_no
) AS max_salaries
ON ed.emp_no = max_salaries.emp_no
AND ed.salary = max_salaries.max_salary
ORDER BY ed.salary DESC;

With the following explanation :

This was a complex exercise and it uses many concepts given below.

  1. Order by
  2. Limit
  3. Inner join
  4. Alias
  5. Function over
  6. Partition by
  7. Subqueries

Let us understand these one by one

  1. Order by

This is used to show the results in a sorted manner. See the examples below

Note that it works for both — numeric and alphanumeric and we can also specify if we want ‘ascending’ or ‘descening order’

2. Limit

This is just to specify how many results we want.

3. Inner join

There are different type of joins and all of them have one common property — the tables we wish to join must have a common column.

4. Alias

When a query become big and complex we can use an alias for it.

The query :

select * from employees inner join (select emp_no, salary from salaries order by salary desc limit 10) as X on employees.emp_no= X.emp_no;

Here the entire query inside the bracket is represented by ‘X’ alias. This becomes quite handy when we have complex queries.

5. Function over

we can use this to count the duplicate entries (can count the occurange of entries in any column)

Using alias we can print the unique entries ;

The queries is a follows:

select X.emp_id, X.emp_name, X.emp_city from (select emp_id, emp_name, emp_city, row_number () over (partition by emp_city) as row_count from test_table) as X where X.row_count = 1;

7. Subqueries

In this example we can easily sport multiple ‘SELECT’ in a query so our main query is made of multiple subqueries.

SQL is still one of the most important skills data scientists / data analysists need although very soon remembering complex queries may be out of fashion due to tools like ChatGPT.

Note that all the examples given here are for learning purpose and in actual practice we use much more efficient queries.

If you have found this tutorial useful please like /clap, share & comment.

All the examples here are done with ‘Server version: 8.0.39 MySQL Community Server — GPL’.

--

--

Jayanti prasad Ph.D
Jayanti prasad Ph.D

Written by Jayanti prasad Ph.D

Physicist, Data Scientist and Blogger.

No responses yet