Mastering SQL

Enrolled: 0 students

Working hours

Monday	9:30 am - 6.00 pm
Tuesday	9:30 am - 6.00 pm
Wednesday	9:30 am - 6.00 pm
Thursday	9:30 am - 6.00 pm
Friday	9:30 am - 5.00 pm
Saturday	Closed
Sunday	Closed

Course Overview:

This comprehensive 6-month course is designed to take you from a beginner to an advanced level in SQL (Structured Query Language). SQL is a fundamental tool for managing and manipulating relational databases, and mastering it can open up numerous career opportunities in data analysis, database administration, and software development. Throughout this course, you will learn SQL syntax, database design principles, advanced querying techniques, and practical applications through hands-on projects.

Course Structure:

Month 1: Foundations of SQL

Week 1: Introduction to Databases and SQL

– Overview of databases and their importance

– Introduction to SQL and its applications

Week 2: Basic SQL Syntax

– Understanding SQL statements: SELECT, INSERT, UPDATE, DELETE

– Retrieving data from a single table

Week 3: Filtering and Sorting Data

– Using WHERE and ORDER BY clauses

– Filtering data with comparison and logical operators

Week 4: Working with Multiple Tables

– Understanding table joins: INNER JOIN, LEFT JOIN, RIGHT JOIN

– Combining data from multiple tables

Month 2: Advanced SQL Queries

Week 5: Aggregating Data

– Using aggregate functions: COUNT, SUM, AVG, MAX, MIN

– Grouping data with GROUP BY clause

Week 6: Subqueries and Nested Queries

– Understanding subqueries and their applications

– Writing correlated and uncorrelated subqueries

Week 7: Modifying Data

– Updating and deleting data with SQL

– Using transactions to maintain data integrity

Week 8: Advanced Query Techniques

– Using CASE statements for conditional logic

– Working with NULL values

Month 3: Database Design

Week 9: Introduction to Database Design

– Understanding the importance of good database design

– Normalization and denormalization concepts

Week 10: Entity-Relationship Modeling

– Entity-relationship diagrams (ERDs)

– Identifying entities, attributes, and relationships

Week 11: Database Constraints

– Primary keys, foreign keys, and unique constraints

– Enforcing data integrity with constraints

Week 12: Indexing and Performance Optimization

– Understanding indexes and their impact on performance

– Optimizing SQL queries for efficiency

Month 4: Advanced Database Concepts

Week 13: Stored Procedures and Functions

– Creating and executing stored procedures

– Writing user-defined functions (UDFs)

Week 14: Triggers and Transactions

– Implementing triggers for automated actions

– Managing transactions and locks

Week 15: Views and Materialized Views

– Creating and managing database views

– Understanding materialized views for performance optimization

Week 16: Security and Permissions

– Granting and revoking permissions

– Implementing security best practices

Month 5: Data Manipulation and Reporting

Week 17: Importing and Exporting Data

– Loading data into a database from external sources

– Exporting data to different formats

Week 18: Working with Dates and Times

– Handling date and time data types

– Performing date calculations and manipulations

Week 19: Advanced Reporting Techniques

– Writing complex queries for reporting purposes

– Generating reports using SQL queries

Week 20: Introduction to Business Intelligence Tools

– Overview of BI tools like Tableau, Power BI, etc.

– Integrating SQL with BI for data analysis and visualization

Month 6: Capstone Project and Final Review

Week 21-24: Capstone Project

– Apply all the skills learned throughout the course to a real-world project

– Design and implement a database from scratch

– Develop complex queries, stored procedures, and reports

Week 25-26: Final Review and Certification

– Review key concepts and topics covered in the course

– Prepare for certification exams (if applicable)

– Receive course completion certificate

Additional Resources:

– Textbooks and online resources for further reading

– Practice exercises and quizzes for self-assessment

– Discussion forums for interacting with instructors and peers

– Access to a virtual lab environment for hands-on practice

By the end of this course, you will have the knowledge and skills to confidently work with SQL databases, design efficient database structures, write complex queries, and manipulate data for various business needs.

Course Overview:

Course Structure:

Month 1: Foundations of SQL

Week 1: Introduction to Databases and SQL

– Overview of databases and their importance

– Introduction to SQL and its applications

Week 2: Basic SQL Syntax

– Understanding SQL statements: SELECT, INSERT, UPDATE, DELETE

– Retrieving data from a single table

Week 3: Filtering and Sorting Data

– Using WHERE and ORDER BY clauses

– Filtering data with comparison and logical operators

Week 4: Working with Multiple Tables

– Understanding table joins: INNER JOIN, LEFT JOIN, RIGHT JOIN

– Combining data from multiple tables

Month 2: Advanced SQL Queries

Week 5: Aggregating Data

– Using aggregate functions: COUNT, SUM, AVG, MAX, MIN

– Grouping data with GROUP BY clause

Week 6: Subqueries and Nested Queries

– Understanding subqueries and their applications

– Writing correlated and uncorrelated subqueries

Week 7: Modifying Data

– Updating and deleting data with SQL

– Using transactions to maintain data integrity

Week 8: Advanced Query Techniques

– Using CASE statements for conditional logic

– Working with NULL values

Month 3: Database Design

Week 9: Introduction to Database Design

– Understanding the importance of good database design

– Normalization and denormalization concepts

Week 10: Entity-Relationship Modeling

– Entity-relationship diagrams (ERDs)

– Identifying entities, attributes, and relationships

Week 11: Database Constraints

– Primary keys, foreign keys, and unique constraints

– Enforcing data integrity with constraints

Week 12: Indexing and Performance Optimization

– Understanding indexes and their impact on performance

– Optimizing SQL queries for efficiency

Month 4: Advanced Database Concepts

Week 13: Stored Procedures and Functions

– Creating and executing stored procedures

– Writing user-defined functions (UDFs)

Week 14: Triggers and Transactions

– Implementing triggers for automated actions

– Managing transactions and locks

Week 15: Views and Materialized Views

– Creating and managing database views

– Understanding materialized views for performance optimization

Week 16: Security and Permissions

– Granting and revoking permissions

– Implementing security best practices

Month 5: Data Manipulation and Reporting

Week 17: Importing and Exporting Data

– Loading data into a database from external sources

– Exporting data to different formats

Week 18: Working with Dates and Times

– Handling date and time data types

– Performing date calculations and manipulations

Week 19: Advanced Reporting Techniques

– Writing complex queries for reporting purposes

– Generating reports using SQL queries

Week 20: Introduction to Business Intelligence Tools

– Overview of BI tools like Tableau, Power BI, etc.

– Integrating SQL with BI for data analysis and visualization

Month 6: Capstone Project and Final Review

Week 21-24: Capstone Project

– Apply all the skills learned throughout the course to a real-world project

– Design and implement a database from scratch

– Develop complex queries, stored procedures, and reports

Week 25-26: Final Review and Certification

– Review key concepts and topics covered in the course

– Prepare for certification exams (if applicable)

– Receive course completion certificate

Additional Resources:

– Textbooks and online resources for further reading

– Practice exercises and quizzes for self-assessment

– Discussion forums for interacting with instructors and peers

– Access to a virtual lab environment for hands-on practice

Week 1: Introduction to SQL

Introduction to databases

A database is a structured set of data held in a computer. It provides ways to store, retrieve and organize information.

Why do we need them?

In the past few weeks, you stored and retrieved data using files. This is fine for simple data but it can quickly become an issue as your application becomes more complex and needs to store and manipulate more complicated data. For example, imagine you are asked to develop a booking system for a medium-sized hotel. You will need to store the list of rooms available for booking, and as you add more features, you will need to save users information, the reviews they post, the bookings each user makes and payment information. You can see that the data you need to handle can become very complicated, especially when you need to consider that data are not static, as they can be updated or deleted. To work more effectively with data, we can then use a database, which presents the following benefits:

A database defines a structure for your data and the relationships between entities
A database provides convenient and performant ways to safely store and retrieve data
A database provides a mechanism to check the validity of your data

Types of Databases

In general, there are two common types of databases:

Non-Relational
Relational

Non-Relational Database

In a non-relational database, data is stored in key-value pairs. For example:

Here, customers’ data are stored in key-value pairs.

Commonly used non-relational database management systems (Non-RDBMS) are MongoDB, Amazon DynamoDB, Redis, etc.

Relational Database

In a relational database, data is stored in tabular format. For example,

Here, customers is a table inside the database.

The first row is the attributes of the table. Each row after that contains the data of a customer.

In a relational database, two or more tables may be related to each other. Hence the term Relational. For example,

Here, orders and customers are related through customer_id.

Commonly used relational database management systems (RDBMS) are MySQL, PostgreSQL, MSSQL, Oracle etc.

Note: To access data from these relational databases, SQL (Structured Query Language) is used.

There are many different kinds of database and different implementations. Sometimes, a database type is a better fit to certain use case or certain problems. The most well-known database types include relational database, key/value database, graph database and document database (also known as NoSQL). For this class, we will focus specifically on relational database as they are the most widely used and supported. You can consult DB-Engines to see a ranking of the most used database, as you can see, there are a lot of them!

Introduction to PostgreSQL

“PostgreSQL is a powerful, open source object-relational database system that uses and extends the SQL language combined with many features that safely store and scale the most complicated data workloads. The origins of PostgreSQL date back to 1986 as part of the POSTGRES project at the University of California at Berkeley and has more than 30 years of active development on the core platform.” (source: postgresql.org)

What is SQL?

Pronounced S-Q-L or sequel
Stands for Structured Query Language
SQL is the standard language used to communicate with relational database
SQL statements are used to query, create, update, delete records in a database as well as many other tasks
SQL statements are executed by a RDBMS.

What is an RDBMS?

Stands for Relational Database Management System
It is a program that processes SQL statements to manage a relational database
PostgreSQL is an RDBMS.

What characterizes a relational database?

As mentioned previously, a relational database is a specific type of database. Data is stored in tables (relations) of rows (tuples) and columns (attributes) as per the example below:

Communicating with the database using SQL

We use SQL to perform actions on the database and initially we can use a terminal-like utility to do this. The utility is named psql and is run using the command:

psql <dbname> <username>

The command prompt from psql is the database name currently connected:

cyf_hotel=>

In psql, you can use the command help to show the help menu. Within the command prompt, you can enter SQL statements and run them against PostgreSQL. To quit psql, enter the command \q.

Download the following file to a directory on your computer. This file creates the sample data you can use for the following sections. To do this, click the file to open it in a github formatted page, then right click the Raw button in the bar just above the code and select Save As (or Save Link As or similar) to save it:

build-hotel.sql

Once you have the file downloaded to a known directory, execute the file build-hotel.sql from psql as shown below (replace /your/sql/path/ with the path to the download directory used above):

\include /your/sql/path/build-hotel.sql

Check that you have built all the required tables:

\dt

You should see a listing of your tables as follows (with your own username as owner):

Here is the table diagram of the hotel database:

Types of SQL jobs

Learning SQL will open up opportunities in a range of different careers. Let’s take a look at some of the options available.

Data scientist

A data scientist is an analytical data expert – they extract, analyse and interpret big data from a range of sources in order to solve problems. SQL is crucial to data scientists as databases are at the core of their work due to the data analytics they have to carry out.

Check out this in-depth data science course if you are interested in learning more about this career option.

SEO analyst

An SEO analyst analyses data and optimises site content in order to increase organic search traffic. SQL is beneficial to this role as they work with a lot of big data, and databases are far stronger than excel documents which are often used.

If the idea of an SEO role interests you, why not try our SEO and WordPress training course to get started?

Software engineer

A software engineer develops and builds computer systems software and application software. Being a software engineer requires knowledge of programming languages in order to build the software, and most programmers are required to have some knowledge of SQL.

If you’re interested in becoming a software engineer, our ExpertTrack on software development fundamentals will guide you through some of the essentials.

Business analyst

A business analyst analyses data and documents market environments to advise business decisions. This role massively ties into SQL as it is very data heavy, and you’ll almost definitely be working with relational databases.

To learn more about data for business, check out this incredible course on business analytics.

As well as SQL being useful to get you into one of these roles, it can also be beneficial to those with their own business or plans to start one up. Relational databases can help you store, sort and modify huge amounts of data.

How to learn SQL

Taking that first step to learn a programming language may seem daunting at first, but with patience and dedication, you can reach your full potential. Start off with the basics and gradually move on to more advanced commands when you’re ready.

Thankfully, there are many online courses and resources available to help you learn this useful language. Trying out a course, such as our introduction to databases and SQL, is a great place to start.

As well as online courses, another great way to learn is to watch video tutorials and become comfortable with the methods and software used.

One of the best ways to retain any information you learn and remember the SQL commands is to practise. You can download a free SQL database management system such as MySQL and put your knowledge to the test, playing with the functions and exploring the database.

There are even websites that simulate an SQL management system such as SQLfiddle. On sites like this, you can play around with writing statements without the need to install any software.

Once you are comfortable with your SQL knowledge, you can start working towards an SQL certification. This will look great on your CV, and will really boost your chances of getting one of those SQL jobs mentioned above.

Week 2: Basic SQL Syntax

The SQL data manipulation language (DML) is used to query and modify database data. In this chapter, we will describe how to use the SELECT, INSERT, UPDATE, and DELETE SQL DML command statements, defined below.

SELECT – to query data in the database

INSERT – to insert data into a table

UPDATE – to update data in a table

DELETE – to delete data from a table

In the SQL DML statement:

Each clause in a statement should begin on a new line.
The beginning of each clause should line up with the beginning of other clauses.
If a clause has several parts, they should appear on separate lines and be indented under the start of the clause to show the relationship.
Upper case letters are used to represent reserved words.
Lower case letters are used to represent user-defined words.

SELECT Statement

The SELECT statement, or command, allows the user to extract data from tables, based on specific criteria. It is processed according to the following sequence:

SELECT DISTINCT item(s)

FROM table(s)

WHERE predicate

GROUP BY field(s)

ORDER BY fields

We can use the SELECT statement to generate an employee phone list from the Employees table as follows:

This action will display the employee’s last name, first name, and phone number from the Employees table, seen in Table 16.1.

In this next example, we will use a Publishers table (Table 16.2). (You will notice that Canada is misspelled in the Publisher Country field for Example Publishing and ABC Publishing. To correct misspelling, use the UPDATE statement to standardize the country field to Canada – see UPDATE statement later in this chapter.)

This action will display the publisher’s name and city from the Publishers table.

If you just want the publisher’s name under the display name city, you would use the SELECT statement with no comma separating pub_name and city:

Performing this action will display only the pub_name from the Publishers table with a “city” heading. If you do not include the comma, SQL Server assumes you want a new column name for pub_name.

SELECT statement with WHERE criteria

Sometimes you might want to focus on a portion of the Publishers table, such as only publishers that are in Vancouver. In this situation, you would use the SELECT statement with the WHERE criterion, i.e., WHERE city = ‘Vancouver’.

These first two examples illustrate how to limit record selection with the WHERE criterion using BETWEEN. Each of these examples give the same results for store items with between 20 and 50 items in stock.

Example #1 uses the quantity, qty BETWEEN 20 and 50.

Example #2, on the other hand, uses qty >=20 and qty <=50 .

Example #3 illustrates how to limit record selection with the WHERE criterion using NOT BETWEEN.

The next two examples show two different ways to limit record selection with the WHERE criterion using IN, with each yielding the same results.

Example #4 shows how to select records using province= as part of the WHERE statement.

Example #5 select records using province IN as part of the WHERE statement.

The final two examples illustrate how NULL and NOT NULL can be used to select records. For these examples, a Books table (not shown) would be used that contains fields called Title, Quantity, and Price (of book). Each publisher has a Books table that lists all of its books.

Example #6 uses NULL.

Example #7 uses NOT NULL.

Using wildcards in the LIKE clause

The LIKE keyword selects rows containing fields that match specified portions of character strings. LIKE is used with char, varchar, text, datetime and smalldatetime data. A wildcard allows the user to match fields that contain certain letters. For example, the wildcard province = ‘N%’ would give all provinces that start with the letter ‘N’. Table 16.3 shows four ways to specify wildcards in the SELECT statement in regular express format.

Table 16.3. How to specify wildcards in the SELECT statement.

In example #1, LIKE ‘Mc%’ searches for all last names that begin with the letters “Mc” (e.g., McBadden).

For example #2: LIKE ‘%inger’ searches for all last names that end with the letters “inger” (e.g., Ringer, Stringer).

In, example #3: LIKE ‘%en%’ searches for all last names that have the letters “en” (e.g., Bennett, Green, McBadden).

SELECT statement with ORDER BY clause

You use the ORDER BY clause to sort the records in the resulting list. Use ASC to sort the results in ascending order and DESC to sort the results in descending order.

For example, with ASC:

And with DESC:

SELECT statement with GROUP BY clause

The GROUP BY clause is used to create one output row per each group and produces summary values for the selected columns, as shown below.

Here is an example using the above statement.

If the SELECT statement includes a WHERE criterion where price is not null,

then a statement with the GROUP BY clause would look like this:

Using COUNT with GROUP BY

We can use COUNT to tally how many items are in a container. However, if we want to count different items into separate groups, such as marbles of varying colours, then we would use the COUNT function with the GROUP BY command.

The below SELECT statement illustrates how to count groups of data using the COUNT function with the GROUP BY clause.

Using AVG and SUM with GROUP BY

We can use the AVG function to give us the average of any group, and SUM to give the total.

Exaple #1 uses the AVG FUNCTION with the GROUP BY type.

Example #2 uses the SUM function with the GROUP BY type.

Example #3 uses both the AVG and SUM functions with the GROUP BY type in the SELECT statement.

Restricting rows with HAVING

The HAVING clause can be used to restrict rows. It is similar to the WHERE condition except HAVING can include the aggregate function; the WHERE cannot do this.

The HAVING clause behaves like the WHERE clause, but is applicable to groups. In this example, we use the HAVING clause to exclude the groups with the province ‘BC’.

INSERT statement

The INSERT statement adds rows to a table. In addition,

INSERT specifies the table or view that data will be inserted into.
Column_list lists columns that will be affected by the INSERT.

If a column is omitted, each value must be provided.
If you are including columns, they can be listed in any order.
Columns with the IDENTITY property should not be explicitly listed in the column_list or values_clause.

The syntax for the “insert” statement is:

When inserting rows with the INSERT statement, these rules apply:

Inserting an empty string (‘ ‘) into a varchar or text column inserts a
single space.
All char columns are right-padded to the defined length.
All trailing spaces are removed from data inserted into varchar columns, except in strings that contain only spaces. These strings are truncated to a single space.
If an “insert” statement violates a constraint, default or rule, or if it is the wrong data type, the statement fails and SQL Server displays an error message.

A default value is entered if the column has a DEFAULT constraint, if a default is bound to the column, or if a default is bound to the underlying user-defined data type.
NULL is entered if the column allows NULLs and no default value exists for the column.
An error message is displayed and the row is rejected if the column is defined as NOT NULL and no default exists.

This example uses INSERT to add a record to the publisher’s Authors table.

This following example illustrates how to insert a partial row into the Publishers table with a column list. The country column had a default value of Canada so it does not require that you include it in your values.

To insert rows into a table with an IDENTITY column, follow the below example. Do not supply the value for the IDENTITY nor the name of the column in the column list.

Inserting specific values into an IDENTITY column

By default, data cannot be inserted directly into an IDENTITY column; however, if a row is accidentally deleted, or there are gaps in the IDENTITY column values, you can insert a row and specify the IDENTITY column value.

To allow an insert with a specific identity value, the IDENTITY_INSERT option can be used as follows.

Inserting rows with a SELECT statement

We can sometimes create a small temporary table from a large table. For this, we can insert rows with a SELECT statement. When using this command, there is no validation for uniqueness. Consequently, there may be many rows with the same pub_id in the example below.

This example creates a smaller temporary Publishers table using the CREATE TABLE statement. Then the INSERT with a SELECT statement is used to add records to this temporary Publishers table from the publis table.

In this example, we’re copying a subset of data.

In this example, the publishers’ data are copied to the tmpPublishers table and the country column is set to Canada

UPDATE statement

The UPDATE statement changes data in existing rows either by adding new data or modifying existing data.

This example uses the UPDATE statement to standardize the country field to be Canada for all records in the Publishers table.

This example increases the royalty amount by 10% for those royalty amounts between 10 and 20.

Including subqueries in an UPDATE statement

The employees from the Employees table who were hired by the publisher in 2010 are given a promotion to the highest job level for their job type. This is what the UPDATE statement would look like.

DELETE statement

The DELETE statement removes rows from a record set. DELETE names the table or view that holds the rows that will be deleted and only one table or row may be listed at a time. WHERE is a standard WHERE clause that limits the deletion to select records.

The DELETE syntax looks like this.

The rules for the DELETE statement are:

If you omit a WHERE clause, all rows in the table are removed (except for indexes, the table, constraints).
DELETE cannot be used with a view that has a FROM clause naming more than one table. (Delete can affect only one base table at a time.)

What follows are three different DELETE statements that can be used.

Deleting all rows from a table.

Deleting selected rows:

Deleting rows based on a value in a subquery:

Built-in Functions

There are many built-in functions in SQL Server such as:

Aggregate: returns summary values
Conversion: transforms one data type to another
Date: displays information about dates and times
Mathematical: performs operations on numeric data
String: performs operations on character strings, binary data or expressions
System: returns a special piece of information from the database
Text and image: performs operations on text and image data

Below you will find detailed descriptions and examples for the first four functions.

Aggregate functions

Aggregate functions perform a calculation on a set of values and return a single, or summary, value. Table 16.4 lists these functions.

Table 16.4 A list of aggregate functions and descriptions.

Below are examples of each of the aggregate functions listed in Table 16.4.

Example #1: AVG

Example #2: COUNT

Example #3: COUNT (*)

Example #4: MAX

Example #5: MIN

Example #6: SUM

Conversion function

he conversion function transforms one data type to another.

In the example below, a price that contains two 9s is converted into five characters. The syntax for this statement is SELECT ‘The date is ‘ + CONVERT(varchar(12), getdate()).

In this second example, the conversion function changes data to a data type with a different size.

Date function

The date function produces a date by adding an interval to a specified date. The result is a datetime value equal to the date plus the number of date parts. If the date parameter is a smalldatetime value, the result is also a smalldatetime value.

The DATEADD function is used to add and increment date values. The syntax for this function is DATEADD(datepart, number, date).

In this example, the function DATEDIFF(datepart, date1, date2) is used.

This command returns the number of datepart “boundaries” crossed between two specified dates. The method of counting crossed boundaries makes the result given by DATEDIFF consistent across all data types such as minutes, seconds, and milliseconds.

For any particular date, we can examine any part of that date from the year to the millisecond.

The date parts (DATEPART) and abbreviations recognized by SQL Server, and the acceptable values are listed in Table 16.5.

Table 16.5. Date part abbreviations and values.

Mathematical functions

Mathematical functions perform operations on numeric data. The following example lists the current price for each book sold by the publisher and what they would be if all prices increased by 10%.

Joining Tables

Joining two or more tables is the process of comparing the data in specified columns and using the comparison results to form a new table from the rows that qualify. A join statement:

Specifies a column from each table
Compares the values in those columns row by row
Combines rows with qualifying values into a new row

Although the comparison is usually for equality – values that match exactly – other types of joins can also be specified. All the different joins such as inner, left (outer), right (outer), and cross join will be described below.

Inner join

An inner join connects two tables on a column with the same data type. Only the rows where the column values match are returned; unmatched rows are discarded.

Example #1

Example #2

Left outer join

A left outer join specifies that all left outer rows be returned. All rows from the left table that did not meet the condition specified are included in the results set, and output columns from the other table are set to NULL.

This first example uses the new syntax for a left outer join.

This is an example of a left outer join using the old syntax.

Right outer join

A right outer join includes, in its result set, all rows from the right table that did not meet the condition specified. Output columns that correspond to the other table are set to NULL.

Below is an example using the new syntax for a right outer join.

This second example show the old syntax used for a right outer join.

Full outer join

A full outer join specifies that if a row from either table does not match the selection criteria, the row is included in the result set, and its output columns that correspond to the other table are set to NULL.

Here is an example of a full outer join.

Cross join

A cross join is a product combining two tables. This join returns the same rows as if no WHERE clause were specified. For example:

Key Terms

aggregate function: returns summary valuesASC: ascending order
conversion function: transforms one data type to another
cross join: a product combining two tables
date function: displays information about dates and times
DELETE statement: removes rows from a record set
DESC: descending order
full outer join: specifies that if a row from either table does not match the selection criteria
GROUP BY: used to create one output row per each group and produces summary values for the selected columns
inner join: connects two tables on a column with the same data type
INSERT statement: adds rows to a table
left outer join: specifies that all left outer rows be returned
mathematical function: performs operations on numeric data
right outer join: includes all rows from the right table that did not meet the condition specified
SELECT statement: used to query data in the database
string function: performs operations on character strings, binary data or expressions
system function: returns a special piece of information from the database
text and image functions: performs operations on text and image data
UPDATE statement: changes data in existing rows either by adding new data or modifying existing data
wildcard: allows the user to match fields that contain certain letters.

Week 3: Filtering and Sorting Data

Terms You Need to Understand

```
The WHERE clause
```
```
The ORDER BY clause
```
```
Filtering
```
```
Sorting
```
```
Comparison condition
```
```
Logical condition
```
```
Top-N query
```
```
Ascending sort
```
```
Descending sort
```

Concepts You Need to Master

```
Filtered queries
```
```
Sorted queries
```
```
Precedence of logical conditions
```

What the available comparison conditions are

What the available logical conditions are

Comparison conditions compare expressions

Logical conditions allow for multiple comparisons

What equi, anti, and range comparison conditions are

The use of LIKE, IN, EXISTS, BETWEEN, ANY, SOME, and ALL comparison conditions

Logical condition precedence: (), NOT, AND, OR

```
NULL values and sorting
```
```
Sorting methods
```

This chapter covers filtering of rows using the WHERE clause and sorting of retrieved data using the ORDER BY clause. The WHERE clause applies to both queries and most DML commands; the ORDER BY clause applies to queries only.

Filtering with the WHERE Clause

The WHERE clause extends the syntax of the SELECT statement, allowing filtering of rows returned from a query.

NOTE

A WHERE clause is applied to a query during the initial reading process, regardless of whether reading of rows involves I/O, reading from database buffer cash, or both. Other clauses, such as the ORDER BY clause, are applied after all rows are retrieved. Thus, applying WHERE clause filtering first limits the number of rows sorted by an ORDER BY clause, for instance. Proper use of WHERE clause filtering is good coding practice, leading to better-performing queries in general.

WHERE Clause Syntax

The preceding chapter examined the basics of the SELECT statement with respect to retrieving data from the database. The basic SELECT statement is made up of a SELECT clause, with a list of items to be retrieved, plus a FROM clause. In its simplest form the SELECT clause specifies columns in a table, and the FROM clause specifies the table from which column values are to be selected.

The WHERE clause allows inclusion of wanted rows and filtering out of unwanted rows. The syntax for the WHERE clause is shown in Figure

The WHERE clause is an optional addition to a SELECT command. The WHERE clause can also be used in both UPDATE and DELETE DML commands.
The WHERE clause is used to filter out unwanted rows from the resulting row set or retain required rows.
The WHERE clause in its simplest form is a simple comparison between two expressions. An expression can be a simple column, or include schema and table or view names, aliases, and even another expression.

Figure 3.1 shows a number of important points.

A WHERE clause is structured as follows:

The following example finds all movies ranking at less than 1000:

The WHERE clause is shown in the preceding example and in the following example such that the two expressions RANK and 1000 are compared using the comparison condition <. RANK is a column in the MOVIE table and 1000 is an expression:

It follows that both sides of the comparison condition can be table columns:

Additionally, both sides of the comparison condition can be expressions:

There are a multitude of conditions, and it is necessary to examine all possible comparison conditions available for use in the WHERE clause.

Comparison Conditions

Comparison conditions allow for two expressions to be compared with each other in various ways. These different methods of comparison depend on the comparison condition used, as listed here:

Equi (=), Anti (!=, <>), and Range (<, >, <=, >=). Equi implies equality (=) between two expressions that are being compared. Anti implies that two expressions being compared are not equal to each other (!= or <>). Range implies that one expression is greater than (>), less than (<), less than or equal to (=>), or greater than or equal to (>=).

Syntax:

For example:

[ NOT ] LIKE. LIKE uses special wild card characters performing pattern matching between expressions. The % (percentage) character attempts to match zero or more characters in a string, and the _ (underscore) character matches exactly one character in a string.

NOTE
The underscore character _ is also known as the underbar character.
Syntax:

For example, this query finds all movies with the vowel e anywhere in the movie title:

The next query finds all movies beginning with a capital letter A:

This query finds only movies with the vowel e in the second character position of their title:

NOTE
Queries using strings in expressions are case-sensitive, and thus uppercase A is different from lowercase a just as uppercase E is different from lowercase e.
[ NOT ] IN. IN is used to test for membership of an expression within a set or list of expressions.

NOTE
Oracle calls IN a membership condition.
Syntax:

For example:

Using strings:

Using the negative form of IN:

NOTE
IN is best used for short lists of literal values.
[ NOT ] EXISTS. Like IN, the EXISTS comparison condition is used to test for membership of an expression within a set or list of expressions.

Syntax:

CAUTION

Don’t forget that EXISTS has no expression on the left, only on the right.
For example, this query will find all rows because the subquery always exists:

A better use for EXISTS is typically to validate the calling query against the results of a subquery. The following query finds all movies with no recognition:

An equivalent query using IN would be as follows:

NOTE
Subqueries are covered in detail in Chapter 8, “Subqueries and Other Specialized Queries.”
[ NOT ] BETWEEN. BETWEEN verifies expressions between a range of two values.

Syntax:

The first of the following examples finds all values between and including a range of 900 to 1000 and is the equivalent of the second example using simple range conditions and a conjunction:

The second of the preceding two queries uses the AND logical condition. Logical conditions (conjunctions) are discussed in the next section.
This next example will produce no rows at all because there is no such range beginning at 1000, counting upwards to 900:

CAUTION
This is a potential trick question: BETWEEN 1000 AND 900 is invalid but does not cause an error in SQL*Plus.
This example finds rows between a range of string values:

NOTE
Oracle calls BETWEEN a range condition.
ANY, SOME, and ALL. These comparisons all check an expression for membership in a set or list of elements. ANY and SOME are the same and allow a response when any element in the set matches. ALL produces a response only when all elements in a set match an expression.

Syntax:

Both of the following two examples will produce the same result, returning all movies made in the years 1998, 1999, and 2000:

This example would produce no result because there are no movies made in all three of the years 1998, 1999, and 2000:

IS [ NOT ] NULL. NULL values are tested for using the IS NULL comparison. In testing for NULL, IS NULL implies equal to NULL and IS NOT NULL implies not equal to NULL. In other words, = NULL and != NULL are both syntactically invalid and will cause errors.

Syntax:

The following two queries will include only NULL values and exclude only NULL valued TYPECAST actors, respectively:

NOTE

The function NVL(<expression>, <replace>) is used to replace NULL expressions with the replacement value (see Chapter 5, “Single Row Functions”). The SQL*Plus environmental setting SET NULL has the same effect in SQL*Plus (see Chapter 9, “SQL*Plus Formatting”).

Caution – Make sure you understand all comparison conditions. The preceding list covers WHERE clause comparison conditions. You can also join multiple comparisons together using conjunctions, otherwise known as logical conditions.

Logical Conditions

Following is a synopsis of the syntax diagram as shown in Figure 3.2:

Different pairs of one or more expressions can be linked together in the same WHERE clause using the logical operators NOT, AND and OR.

NOT has highest precedence (is processed first), followed by AND and then OR.

Logical conditions can be used to form conjunctions or concatenations between multiple comparisons in a WHERE clause. There can be any number of comparison conjunctions. As shown in Figure 3.2 there are two logical conditions: AND and OR. Both AND and OR can also have the optional NOT clause applied, resulting in the opposite.

NOTE

NOT by itself, as well and AND and OR, is sometimes classified as a logical condition, even though it only reverses AND and OR. AND requires that both of two comparison expressions must be true for a true result. OR requires that only one of two comparison expressions must be true for a true result.

The following lines are syntax examples of AND and OR:

AND requires that both <expression1> and <expression2> are true:

OR requires that either <expression1> or <expression2> is true:

CAUTION

Make sure you understand the use of AND, OR, and NOT logical conditions.

The precedence of logical conditions by default is first left to right, followed by NOT, AND, and finally OR.

CAUTION

Remember the precedence sequence of logical conditions: (), NOT, AND, OR.

Precedence is explained from a mathematical perspective in Chapter 4, “Operators, Conditions, Pseudocolumns, and Expressions.” Additionally, the rules of precedence apply where parentheses (round brackets) can be used to change the order of resolution of an expression, or increase the precedence of a bracketed part of an expression. Thus, the use of parentheses can change the order of evaluation of NOT, AND, and OR. The term precedence means that one part of an expression is forced to be executed before other parts.

The following syntax demonstrates precedence further. <expression1> is evaluated first, followed by <expression2> and finally <expression3>:

In the next example, <expression1> is still evaluated first, but it is compared using OR with the result of <expression2> and <expression3>, not simply <expression2>, followed by a spurious AND conjunction with <expression3>:

<expression1> OR (<expression2> AND <expression3>) Questions on the precedence of logical conditions using parentheses are very likely. This simple example returns movies with regular rankings of greater than 1000 that have a review ranking of greater than 4, two different types of rankings:

SELECT TITLE, RANK, REVIEW_RANK, YEAR FROM MOVIE WHERE RANK > 1000

OR REVIEW_RANK > 4;

Figure 3.3 shows the result of the following two examples:

SELECT TITLE, RANK, REVIEW_RANK, YEAR FROM MOVIE

WHERE YEAR = 2000 AND RANK > 1000 OR REVIEW_RANK > 4;

SELECT TITLE, RANK, REVIEW_RANK, YEAR FROM MOVIE

WHERE YEAR = 2000 AND (RANK > 1000 OR REVIEW_RANK > 4);

Note in Figure 3.3 how the two different queries retrieve different numbers of rows. This is a direct result of the use of parentheses, changing the precedence (sequence of evaluation) of the logical conditions AND and OR.

CAUTION

Don’t get confused! Comparison and logical conditions are sometimes known as comparison and logical operators. Oracle documentation uses the terms comparison and logical conditions.

The Importance of Precedence

The syntax

evaluates differently from this:

<expression1> OR (<expression2> AND <expression3>)

The importance of precedence generally determines that a clause without proper precedence such as p OR q AND r will produce a spurious result. On the other hand, p OR (q AND r) forces q AND r to be evaluated before comparison with p. This implies that p OR the result of q AND r produces a true response. For example, if p=round, q=large, and r=four-sided, then testing for a large rectangle will succeed, and correctly so. On the other hand, p OR q AND r will fail because a large four-sided rectangle cannot possibly be both round and four-sided. Mathematically, p OR (q AND r) implies that either p is true or the combination of q AND r is true; p does not have to be true. On the contrary, p OR q AND r effectively implies (p OR q) AND r, a completely different expression, stating that q can be false if r is true and the expression will still yield a true result, which is false. Fascinating, huh?

select * from

(select title from movie order by title)

where rownum < 4;

There are two important points to remember about Top-N queries:

where rownum > n produces a NULL result. The following example will return a result of “no rows selected” (null).

There are two important points to remember about Top-N queries:

Where Rownum > n produces a null result. The following example will return a result of “no rows selected” (null).

Select * from (select title, rank, Review_ Rank, year from movie order by rank desc)

Where Rownum > 4;

Caution

where Rownum > n will produce no rows. This is a likely trick question!

Top-N queries can be confused by application of an order by clause. The Rownum pseudocolumn filters out all rows but those specified as being less than a specified value. If a sort order is applied to the calling query containing the Rownum filter as opposed to the subquery, a spurious result could occur. This is aptly demonstrated in Figure 3.4.

CAUTION

Remember to place an ORDER BY clause in the inline view subquery section of a Top-N query. You need to sort results before the ROWNUM comparison.

Figure 3.4 shows a notable difference between placing an ORDER BY clause in an inline view and placing it outside of an inline view. The reason is that the ORDER BY clause in a query will always be executed on the filtered result. In other words, the WHERE clause is always executed before the ORDER BY clause. Obviously, placing an ORDER BY clause inside the inline view resolves this issue.

Sorting with the ORDER BY Clause

So far in this chapter, we have expanded the SELECT statement (including the FROM clause) with the WHERE clause for filtering. Now we will expand the SELECT statement further with the ORDER BY clause. The ORDER BY clause is used to sort rows returned by a query.

CAUTION

The ORDER BY clause is always executed on filtered query results, namely after the WHERE clause. Obviously, if there is no WHERE clause, this does not apply.

ORDER BY Clause Syntax

The syntax for the SELECT statement, including details of the optional ORDER BY clause, is shown in Figure 3.5.

Filtering (WHERE) and Sorting Data (ORDER BY)

Sorting with the ORDER BY Clause

CAUTION

The ORDER BY clause is always executed on filtered query results, namely after the WHERE clause. Obviously, if there is no WHERE clause, this does not apply.

ORDER BY Clause Syntax

The syntax for the SELECT statement, including details of the optional ORDER BY clause

Following is a synopsis of the syntax diagram The ORDER BY clause is optional and is used for sorting filtered row sets. Thus, a WHERE clause will always appear before an ORDER BY clause.

Elements in the ORDER BY clause can refer to columns, aliased columns, expressions (those expressions can be in the SELECT list or not), or positions of elements in the SELECT list.

Each element in an ORDER BY clause is sorted in ascending order by default. Each element can be forcibly sorted individually in ascending order (ASC) or descending order (DESC).

By default, in ascending order, NULL values will always be sorted last and thus appear last. In descending order NULL values will appear first. Sorted order of NULL values can be overridden using the NULLS FIRST clause and the NULLS LAST clause (the default). NULLS FIRST returns NULL values at the start of a query, and NULLS LAST returns NULL values at the end of a query.

Sorting Using the ORDER BY Clause
Rows in a query are sorted according to the sequence of elements listed in the ORDER BY clause, from left to right. Therefore, the leftmost ORDER BY clause element is the most significant sort parameter, and the rightmost element is the least important sort parameter.

we can see that all females (F) are sorted before all males (M). The TYPECAST column is sorted within GENDER. As a result, Teri Garr as a Comedian appears before George Clooney as an Action Drama actor.

Sorting Methods

The following example sorts on a single column:

The next example sorts on multiple columns, not in the order in which the columns are selected:

This example sorts columns both in the selected columns list and not in that list but in the accessed MOVIE table:

The next query will produce exactly the same result as the preceding query using positions of select list elements. YEAR is in fourth position, TITLE is in first position, and RANK is in second position:

If you change the sequence of select list elements, you have to change position numbers as well. This query produces the same rows as before but with the columns in a different order:

You can sort using aliased columns, even including those not in the select list:

And following from the previous query, we can sort using expressions in the select list:

You can even sort using an expression not in the query select list:

NOTE

Sorting using expressions implies that PL/SQL user-defined procedures can be used to sort with as well. PL/SQL is out of the scope of this book.

It follows that we can combine different sorting methods in the same ORDER BY clause. The following example is sorted by a column select list position, a column name in the select list, a column name in the tables but not in the select list, and finally an expression:

Sorting Modifiers

We can modify the way in which each column within an ORDER BY clause is sorted as shown in Figure 3.5, the ORDER BY clause syntax diagram, by using the ASC | DESC and NULLS { FIRST | LAST } optional modifiers.

If you look closely at Figure 3.6 again, you will see that the last two females, Madelaine Kahn and Diane Lane, have no TYPECAST setting. Their TYPECAST column values are NULL because they have not been set to anything, not even a space character. The default setting is NULLS LAST, meaning that NULL values are sorted last, as in Figure 3.6. If we wanted to specify NULLS LAST explicitly, we could use the following query modification:

You could return the NULL TYPECAST rows at the beginning of the females by using the following modification:

Now we can reorder the same query. Specify both of the individual columns as being ordered in descending order, placing males before females. Also, specify all TYPECAST values in reverse alphabetical order and NULLS FIRST:

NOTE

A NULL value is logically undefined and thus cannot be sorted, and can be placed only at the start of a sort (NULLS FIRST) or at the end of a sort (NULLS LAST).

This chapter has expanded the SELECT statement with the WHERE clause and the ORDER BY clause to cater for filtering and sorting of returned rows, respectively. The next chapter is the first of two chapters examining some Oracle SQL reference material, essential at this stage, covering operators, conditions, pseudocolumns, and expressions.

Week 4: Working with Multiple Tables

SQL Joins – LEFT Join, RIGHT Join, and INNER Join Explained

SQL is a programming language we use to interact with relational databases. SQL databases contain tables, which contain rows of data. These tables usually contain similar or related data.

In an office management web application database, you would have tables for employees, their departments, their managers, the projects they work on, and so on depending on the structure of your application.

In the employees table, you would find data like the employee ID, name, salary, department ID (used to link the employee to the department), and other fields that match your needs. The other tables would also contain data for their specific entities.

What Are Joins?

If you ever need to bring multiple tables in your database together to access the data, you use a JOIN.

Joins let you fetch data that is scattered across tables. For example, using the database tables that we’ll create in a moment, we’ll be able to get all the details of an employee, along with their manager name, and department they’re working in by using a join.

A join lets you use a single query to achieve this. You use a join because you can only get this data by bringing data from the employees table, departments table, and projects table together. In simple terms, you would be JOIN-ing these tables together.

To perform a join, you use the JOIN keyword. And we’ll see how it works in this tutorial.

Prerequisites:

To continue with this tutorial, you should know the basics of insertion and retrieval operations with SQL.

Also, you can setup a demo database that we’ll use for this article. The database should have tables like this:

CREATE TABLE employees (

id int,

emp_name varchar(100),

salary int,

dept_id int,

manager_id int

);

INSERT INTO employees

VALUES (1, ‘Idris’, 1000, 1, 1), (2, ‘Aweda’, 2000, 2, 2), (3, ‘Zubair’, 3000, 3, 2), (4, ‘Young’, 4000, 3, 3), (5, ‘Babu’, 5000, 1, 3), (6, ‘John’, 1000, 8, 1);

CREATE TABLE departments (

id int,

dept_name varchar(100)

);

INSERT INTO departments

VALUES (1, ‘Engineering’), (2, ‘Product’), (3, ‘Marketing’), (4, ‘Support’);

CREATE TABLE managers (

id int,

manager_name varchar(100),

dept_id int

);

INSERT INTO managers

VALUES (1, ‘Doe’, 1), (2, ‘Jane’, 2), (3, ‘May’, 4);

CREATE TABLE projects (

id int,

project_name varchar(100),

emp_id int

);

INSERT INTO projects

VALUES (1, ‘Fintech App’, 1), (1, ‘Fintech App’, 5), (1, ‘Fintech App’, 6), (2, ‘Cooking Website’, 1), (2, ‘Cooking Website’, 2);

SQL to quickly setup the tables used in the article

How to Use an Inner Join in SQL

There are many types of joins in SQL, and each one has a different purpose.

The inner join is the most basic type of join. It is so basic that sometimes, you can omit the join keyword and still perform an inner join.

For example, say you want to fetch the name of all employees in the organization, along with the name of their departments.

A simple join like this would do:

A very simple join query

So how does this actually work?

In this context, data retrieval involves querying multiple tables, each of which is assigned an alias. Utilizing aliases proves advantageous, particularly when dealing with identically named fields across tables, such as the ‘id’ field in this instance. This practice facilitates streamlined access to distinct fields by employing succinct aliases.

Furthermore, within the SELECT clause of the query, we explicitly specify the desired columns, employing the aliases to denote the source table for each value

And finally, to ensure that only correct values are matched to each other, and clause of the query specifies the conditions that must be met to be joined

So for the first employee, the dept_id is 1, so we fetch the department with id = 1, and it’s name is returned. This happens for as many rows as there are in the employees table.

The result of the query looks like this:

Here, notice that the number of employees returned is smaller than the number of employees that actually exist. This is because when you use an inner join, you only get records that exist in both tables.

That is, the employee with id = 6 that was not returned has a dept_it = 8. Now, this department isn’t in the departments table, so it wasn’t returned.

Another way to achieve this same result would be to actually spell out the join like this:

Or use the INNER JOIN like this:

These queries return exactly the same result as the first one. But they are more readable as they’re explicit.

The ‘on’ keyword in the query is used to specify the conditions for the join. It’s the same condition as in the first query.

INNER JOIN Use Case

In real applications, you use an inner join when only records that exist in both tables matter.

For example, in an inventory management application, you could have a table for sales, and another for products. Now, the sales table will contain product _ id (a reference to the sold product), along with other details like sold _ at (when the product was sold) and maybe customer details.

The products table, on the other hand, will have the name, price, and maybe the quantity of every product.

Now say it’s the end of the week and you need to do a sales report. You would need to fetch all sales records, along with the product name and price to display on a dashboard or export as a csv of some sort.

To do this you would use an inner join of the products table on the sales table because you do not care about products that were not sold – you only want to see every sale that was made, and the name and price of the product that was sold.

INNER JOIN Use Case

How to Use a Left Join in SQL

In another scenario, you might want to fetch all the employee names and their department names, but this time without leaving any employee or department name out. Here, you’d use a left join.

To get all employee names along with their department names, you can change your query to use a left join like this:

The result of this query looks like this now:

Now, employee with id = 6 and dept_id = 8 is returned, with the department name being set as NULL because there is no department with id = 8.

LEFT JOIN Use Case

In real applications, you use a left join when there’s a primary, always existing entity that can be related to another entity that doesn’t always exist.

An easy use case would be in a multi-vendor ecommerce application, after a user signs up they can set up a store and add products to the store.

A user, on signing up, doesn’t automatically have a store until they create it. So if you try to view all users, with their store details, you would use a left join of the stores table on the users table. This is because every record in the users table is important, store or no store.

When the user has a store set up, the store details are returned, and if otherwise, null is returned. But, you wouldn’t be losing any existing data.

How to Use a Right Join in SQL

The right join works like the opposite of the left join.

Now, your result looks like this:

Now, every department in the departments table was returned. And employees in those departments were returned too. For the last row, there is no employee with dept _ id = 4, which is why the NULL value gets returned.

RIGHT JOIN Use Case

In practice, the right join functions similarly to the left join. The distinction between them lies in the prioritization of the tables being joined.

How to Combine JOINS in SQL

So far, we’ve only joined one table to another. But, you can actually join as many tables as you like by using any or all of these joins together as you like.

For example, say you want to fetch the names of all employees, with their department names, manager names, and projects names. You would have to join the employees table to the departments table, the managers table, and the projects table. You can achieve this using this query:

In this query start in the employees table as a base table, then left join the departments table and next left join the managers table and the projects table also.

The result of this query will look like this:

The reason for using a left join here is because you have to fetch all employees. You could use an inner join in place of the left join in the managers table because all employees have a manager id. But to be safe, you can just use the lefty join.

How to Use a Cross Join in SQL

This operation is commonly referred to as a Cartesian join.

For example, if you do a cross join between tables employees and departments, your result will look like this:

CROSS JOIN employees and departments table.

CROSS JOIN of employees and departments tables.

Here you have 24 rows, which is a product of the number of rows in the employees table, 6, and the number of rows in the departments table, 4. The records were returned so that for every record in the employees table, it is mapped to a record in the departments table.

CROSS JOIN Use Case

A common use case of cross join would be in an ecommerce application it is possible to have size or color variations of all products. If you ever need to fetch a list of all products in different sizes, like this:

Cross joining a sizes table that contains an id for each size, a string size that can be either ‘Small’, ‘Medium’, or ‘Large’ and another field called ratio to affect how this size affects the product price. So, for every product, it is mapped to a size, and the price is calculated.

How to Use a Self Join in SQL

As the name implies, is when you try to join a table to itself. There is no self JOIN keyword.

Take this new categories table, for example. This table contains both main categories and sub-categories. If you ever have to fetch the categories and their sub-categories, you can use a SELF JOIN.

Here, see how the table was referenced twice. Be careful with the alias as it’s important in differentiating both instances. The result of this query looks like this:

SELF JOIN Use Case

In many applications, you find hierarchical data stored in a single table. Like the category and sub-category as shown in the previous example. Or as an employee and manager, because they’re both employees of the company.

In the latter, the table will have fields such as id, manager id (this is basically the id of another employee). To do this fetch, do a SELF JOIN of the employees table on the employees table like this:

This would correctly return the managers and the number of employees working under them.

Summary

I hope you now understand SQL JOINs, the different types, and when to use them so you can write better queries.

All the JOINs here work with MySQL. There are other JOINs like FULL OUTER JOIN and NATURAL JOIN that we didn’t discuss, but you can look into them yourself if you like.

If you have any questions or relevant advice, please get in touch with me to share them.

To read more of my articles or follow my work, you can connect with me on LinkedIn, Twitter, and Github. It’s quick, it’s easy, and it’s free!

For Illustration

Join multiple tables

If you want to get something meaningful out of data, you’ll almost always need to join multiple tables. In this article, we’ll show how to do that using different types of joins. To achieve that, we’ll combine INNER JOINs and LEFT JOINs. So, let’s start.

The Model

In the picture below you can see out existing model. It consists of 6 tables and we’ve already, more or less, described it in the previous articles.

The database is modeled and presented in a good manner (choosing names wisely, using naming convention, following the same rules throughout the whole model, lines/relations in schema do not overlap more than needed). This is crucial because before you join multiple tables, you need to identify these tables first.

We’ll talk about naming conventions and the advice on how to think when you’re writing SQL queries, later in this series. So far, let’s live with the fact that this model is pretty simple and we can do it fairly easily.

What do we know so far?

In this series, we’ve covered:

Basics related to SQL SELECT statement, and Compared INNER JOIN and LEFT JOIN

Combine both articles to write more complex SELECT statements that will join multiple tables.

Join multiple tables using INNER JOIN

Each example will begin with a precise problem definition, followed by the corresponding query solution. Let’s commence with the first scenario.

#1 We need to list all calls with their start time and end time. For each call, we want to display what was the outcome as well as the first and the last name of the employee who made that call. We’ll sort our calls by start time ascending.

Before we write the query, we’ll identify the tables we need to use. To do that, we need to determine which tables contain the data we need and include them. Also, we should include all tables along the way between these tables – tables that don’t contain data needed but serve as a relation between tables that do (that is not the case here).

The query that does the job is given below:

The query result is given below:

There are a few things I would like to point out here:

The tables we’ve joined are here because the data we need is located in these 3 tables
I adhere to the format (table _name. attribute _name) (e.g., employee. First _name). While this practice may seem unnecessary, it serves as a best practice to mitigate potential errors that could arise when multiple tables within the same query possess attributes with identical names.
We’ve used INNER JOIN 2 times in order to join 3 tables. This will result in returning only rows having pairs in another table
When you’re using only INNER JOINs to join multiple tables, the order of these tables in joins is not important. The only important thing is that you use appropriate join conditions after the “ON” (join using foreign keys)

Since all calls had related employee and call outcome, we would get the same result if we’ve used LEFT JOIN instead of the INNER JOIN

Join multiple tables using LEFT JOIN

Writing queries that use LEFT JOINs doesn’t differ a lot when compared to writing queries using INNER JOINs. The result would, of course, be different (at least in cases when some records don’t have a pair in other tables).

This is the problem we want to solve.

#2 List all counties and customers related to these countries. For each country display its name in English, the name of the city customer is located in as well as the name of that customer. Return even countries without related cities and customers.

The tables containing data we need are in the picture below:

First, let’s quickly check what the contents of these 3 tables are.

We can notice two important things:

While each city has a related country, not all countries have related cities (Spain & Russia don’t have them)
Same stands for the customers. Each customer has the city_ id value defined, but only 3 cities are being used (Berlin, Zagreb & New York)

Let’s first write down the query using INNER JOIN:

The query result is shown in the picture below:

We have 7 counties and 6 cities in our database, but our query returns only 4 rows. That is the result of the fact we have only 4 customers in our database. Each of these 4 is related to its city and the city is related to the country. So, INNER JOIN eliminated all these countries and cities without customers. But how to include these in the result too?

To do that, we’ll use LEFT JOIN. We’ll simply replace all “INNER” with “LEFT” so our query is as follows:

The result is shown in the picture below:

You can easily notice that now we have all the countries, even those without any related city (Russia & Spain), as well all cities, even those without customers (Warsaw, Belgrade & Los Angeles). The remaining 4 rows are the same as in the query using INNER JOIN.

LEFT JOIN – Tables order matters

While the order of JOINs in INNER JOIN isn’t important, the same doesn’t stand for the LEFT JOIN. When we use LEFT JOIN in order to join multiple tables, it’s important to remember that this join will include all rows from the table on the LEFT side of the JOIN. Let’s rearrange the previous query:

At first, you could easily say, that this query and the previous one are the same (this is true when using INNER JOIN). We’ve used the same tables, LEFT JOINs, and the same join conditions. Let’s take a look at the output first:

So, what happened here? Why do we have 4 rows (same 4 we had when we’ve used INNER JOIN)?

The answer is simple and it’s related to how LEFT JOIN works. It takes the first table (customer) and joins all its rows (4 of them) to the next table (city). The result of this is 4 rows because the customer could belong to only 1 city. Then we join these 4 rows to the next table (country), and again we have 4 rows because the city could belong to only 1 country.

The reason why we wouldn’t join these 3 tables in this way is given by the text of the example #2. The query is written in such a manner that returning 4 rows would be the answer to the following: Return names of all customers as well as cities and countries they are located in. Return even customers without related cities and countries.

Note: When you’re using LEFT JOIN, the order of tables in that statement is important and the query will return a different result if you change this order. The order actually depends on what you want to return as a result.

Join multiple tables using both – INNER JOIN & LEFT JOIN

This is also possible. Let’s again go with an example.

#3 Return the list of all countries and cities that have pairs (exclude countries which are not referenced by any city). For such pairs return all customers. Return even pairs not having a single customer.

The query that does the job is:

The result of the query is given in the picture below:

You can easily notice that we don’t have countries without any related city (these were Spain & Russia). The INNER JOIN eliminated these rows. Still, we do have cites without any customers (Belgrade, Los Angeles & Warsaw). This is the result of the fact we used LEFT JOIN between tables city and customer.

Conclusion

When you need to join multiple tables, you have INNER & LEFT JOIN on your disposal (RIGHT JOIN is rarely used and can be easily replaced by LEFT JOIN). Which join you’ll use depends directly on the task you need to solve and you’ll get the feeling along the way. In upcoming articles, we’ll discuss how to think and organize yourself when you need to write more complex queries.

Month 2: Advanced SQL Queries

Week 5: Aggregating Data

SQL Group By Tutorial: Count, Sum, Average, and Having Clauses Explained

The GROUP BY clause is a powerful but sometimes tricky statement to think about.

Even eight years later, every time I use a GROUP BY I have to stop and think about what it’s actually doing.

In this article we’ll look at how to construct a GROUP BY clause, what it does to your query, and how you can use it to perform aggregations and collect insights about your data.

Here’s what we’ll cover:

Setting up your database
Setting up example data (creating sales)
How does a GROUP BY work?
Writing GROUP BY clauses
Aggregations ( COUNT, SUM, AVG)
Working with multiple groups
Using functions in the GROUP BY
Filtering groups with HAVING
Aggregates with implicit grouping

Setting up your database
Before we can write our queries we need to setup our database.

For these examples we’ll be using PostgreSQL, but the queries and concepts shown here will easily translate to any other modern database system (like MySQL, SQL Server, and so on).

To work with our PostgreSQL database, we can use psql—the interactive PostgreSQL command line program. If you have another database client that you enjoy working with that’s fine too.

To begin, let’s create our database. With PostgreSQL already installed, we can run the command createdb <database-name> at our terminal to create a new database. I called mine fcc:

Next let’s start the interactive console by using the command psql, and connect to the database we just made using \c <database-name>:

Note: I’ve cleaned up the psql output in these examples to make it easier to read, so don’t worry if the output shown here isn’t exactly what you’ve seen in your terminal.

I encourage you to follow along with these examples and run these queries for yourself. You will learn and remember far more by working through these examples rather than just reading them.

Setting up the data (creating sales)

For our examples we’ll use a table that stores the sales records of various products across different store locations.

We’ll call this table sales, and it will be a simple representation of store sales: the location name, product name, price, and the time it was sold.

If we were building this table in a real application we’d set up foreign keys to other tables (like locations or products). But for illustrating the GROUP BY concepts we’ll just use simple TEXT columns.

Let’s create the table and insert some sales data:

We have three locations: HQ, Downtown, and 1st Street.

We have two products, Coffee and Bagel, and we insert these sales with different sold_at values to represent the items being sold at different days and times.

There are some sales today, some yesterday, and some from the day before yesterday.

How does a GROUP BY work?

To illustrate how the GROUP BY clause works, let’s first talk through an example.

Imagine we had a room full of people who were born in different countries.

If we wanted to find the average height of the people in the room per country, we would first ask these people to separate into groups based on their birth country.

Once they were separated into their groups we could then calculate the average height within that group.

This is how the GROUP BY clause works. First we define how we want to group the rows together—then we can perform calculations or aggregations on the groups.

Multiple groups

We can group the data into as many groups or sub-groups as we want.

For example, after asking people to separate into groups based on their birth countries, we could tell each of those groups of countries to separate further into groups based on their eye color.

By doing this, we have groups of people based on the combination of their birth country and their eye color.

Now we could find the average height within each of these smaller groups, and we’d have a more specific result: average height per country per eye color.

GROUP BY clauses are often used for situations where you can use the phrase per something or for each something:

Average height per birth country
Total number of people for each eye and hair color combination
Total sales per product

Writing GROUP BY clauses

A GROUP BY clause is very easy to write—we just use the keywords GROUP BY and then specify the field(s) we want to group by:

This simple query groups our sales data by the location column.

We’ve done the grouping—but what do we put in our SELECT?

The obvious thing to select is our location—we’re grouping by it so we at least want to see the name of the groups we made:

The result is our three locations:

If we look at our raw table data (SELECT * FROM sales;), we’ll see that we have four rows with a location of HQ, two rows with a location of Downtown, and two rows with a location of 1st Street:

By grouping on the location column, our database takes these inputs rows and identifies the unique locations among them—these unique locations serve as our “groups.”

But what about the other columns in our table?

If we try to select a column like product that we didn’t group by…

…we run into this error:

The problem here is we’ve taken eight rows and squished or distilled them down to three.

We can’t just return the rest of the columns like normal—we had eight rows, and now we have three.

What do we do with the remaining five rows of data? Which of the eight rows’ data should be displayed on these three distinct location rows?

There’s not a clear and definitive answer here.

To use the rest of our table data, we also have to distill the data from these remaining columns down into our three location groups.

This means that we have to aggregate or perform a calculation to produce some kind of summary information about our remaining data.

Aggregations (COUNT, SUM, AVG)

Once we’ve decided how to group our data, we can then perform aggregations on the remaining columns.

These are things like counting the number of rows per group, summing a particular value across the group, or averaging information within the group.

To start, let’s find the number of sales per location.

Since each record in our sales table is one sale, the number of sales per location would be the number of rows within each location group.

To do this we’ll use the aggregate function COUNT() to count the number of rows within each group:

We use COUNT(*) which counts all of the input rows for a group.

(COUNT() also works with expressions, but it has slightly different behavior.)

Here’s how the database executes this query:

FROM sales — First, retrieve all of the records from the sales table
GROUP BY location — Next, determine the unique location groups
SELECT … — Finally, select the location name and the count of the number of rows in that group

We also give this count of rows an alias using AS number_of_sales to make the output more readable. It looks like this:

The 1st Street location has two sales, HQ has four, and Downtown has two.

Here we can see how we’ve taken the remaining column data from our eight independent rows and distilled them into useful summary information for each location: the number of sales.

SUM

In a similar way, instead of counting the number of rows in a group, we could sum information within the group—like the total amount of money earned from those locations.

To do this we’ll use the SUM() function:

Instead of counting the number of rows in each group we sum the dollar amount of each sale, and this shows us the total revenue per location:

Average (AVG)

Finding the average sale price per location just means swapping out the SUM() function for the AVG() function:

Working with multiple groups

So far we’ve been working with just one group: location.

What if we wanted to sub-divide that group even further?

Similar to the “birth countries and eye color” scenario we started with, what if we wanted to find the number of sales per product per location?

To do this all we need to do is add the second grouping condition to our GROUP BY statement:

By adding a second column in our GROUP BY we further sub-divide our location groups into location groups per product.

Because we’re now also grouping by the product column, we can now return it in our SELECT!

(I’m going to throw some ORDER BY clauses on these queries to make the output easier to read.)

Looking at the result of our new grouping, we can see our unique location/product combinations:

Now that we have our groups, what do we want to do with the rest of our column data?

Well, we can find the number of sales per product per location using the same aggregate functions as before:

As an Exercise For The Reader™: find the total revenue (sum) of each product per location.

Using functions in the GROUP BY

Next, let’s try to find the total number of sales per day.

If we follow a similar pattern as we did with our locations and group by our sold_at column…

…we might expect to have each group be each unique day—but instead we see this:

It looks like our data isn’t grouped at all—we get each row back individually.

But, our data is actually grouped! The problem is each row’s sold_at is a unique value—so every row gets its own group!

The GROUP BY is working correctly, but this is not the output we want.

The culprit is the unique hour/minute/second information of the timestamp.

Each of these timestamps differ by hours, minutes, or seconds—so they are each placed in their own group.

We need to convert each of these date and time values into just a date:

2020-09-01 08:42:33.085995 => 2020-09-01
2020-09-01 09:42:33.085995 => 2020-09-01

Converted to a date, all of the timestamps on the same day will return the same date value—and will therefore be placed into the same group.

To do this, we’ll cast the sold_at timestamp value to a date:

In our GROUP BY clause we use ::DATE to truncate the timestamp portion down to the “day.” This effectively chops off the hours/minutes/seconds of the timestamp and just returns the day.

In our SELECT, we also return this same expression and give it an alias to pretty up the output.

For the same reason we couldn’t return product without grouping by it or performing some kind of aggregation on it, the database won’t let us return just sold_at—everything in the SELECT must either be in the GROUP BY or some kind of aggregate on the resulting groups.

The result is the sales per day that we originally wanted to see:

Filtering groups with HAVING

Next let’s look at how to filter our grouped rows.

To do this, let’s try to find days where we had more than one sale.

Without grouping, we would normally filter our rows by using a WHERE clause. For example:

With our groups, we may want to do something like this to filter our groups based on the count of rows…

Unfortunately, this doesn’t work and we receive this error:

ERROR: aggregate functions are not allowed in WHERE

Aggregate functions are not allowed in the WHERE clause because the WHERE clause is evaluated before the GROUP BY clause—there aren’t any groups yet to perform calculations on.

But, there is a type of clause that allows us to filter, perform aggregations, and it is evaluated after the GROUP BY clause: the HAVING clause.

The HAVING clause is like a WHERE clause for your groups.

To find days where we had more than one sale, we can add a HAVING clause that checks the count of rows in the group:

This HAVING clause filters out any rows where the count of rows in that group is not greater than one, and we see that in our result set:

Just for the sake of completeness, here’s the order of execution for all parts of a SQL statement:

FROM — Retrieve all of the rows from the FROM table
JOIN — Perform any joins
WHERE — Filter rows
GROUP BY – Form groups
HAVING – Filter groups
SELECT – Select the data to return
ORDER BY – Order the output rows
LIMIT – Return a certain number of rows

Aggregates with implicit grouping

The last topic we’ll look at is aggregations that can be performed without a GROUP BY—or maybe better said they have an implicit grouping.

These aggregations are useful in scenarios where you want to find one particular aggregate from a table—like the total amount of revenue or the greatest or least value of a column.

For example, we could find the total revenue across all locations by just selecting the sum from the entire table:

So far we’ve done $19 of sales across all locations (hooray!).

Another useful thing we could query is the first or last of something.

For example, what is the date of our first sale?

To find this we just use the MIN() function:

(To find the date of the last sale just substitute MAX()for MIN().)

Using MIN / MAX

While these simple queries can be useful as a standalone query, they’re often parts of filters for larger queries.

For example, let’s try to find the total sales for the last day that we had sales.

One way we could write that query would be like this:

This query works, but we’ve obviously hardcoded the date of 2020-09-01.

09/01/2020 may be the last date we had a sale, but it’s not always going to be that date. We need a dynamic solution.

This can be achieved by combining this query with the MAX() function in a subquery:

In our WHERE clause we find the largest date in our table using a subquery: SELECT MAX(sold_at::DATE) FROM sales.

Then, we use this max date as the value we filter the table on, and sum the price of each sale.

Implicit grouping

I say that these are implicit groupings because if we try to select an aggregate value with a non-aggregated column like this…

…we get our familiar error:

GROUP BY is a tool

As with many other topics in software development, GROUP BY is a tool.

There are many ways to write and re-write these queries using combinations of GROUP BY, aggregate functions, or other tools like DISTINCT, ORDER BY, and LIMIT.

Understanding and working with GROUP BY’s will take a little bit of practice, but once you have it down you’ll find an entirely new batch of problems are now solvable to you!

If you liked this post, you can follow me on twitter where I talk about database things and how to succeed in a career as a developer.

FOR MORE DETAILS

How to Aggregate Data Using Group By in SQL

SQL allows the user to store more than 30 types of data in as many columns as required, so sometimes, it becomes difficult to find similar data in these columns. Group By in SQL helps us club together identical rows present in the columns of a table. This is an essential statement in SQL as it provides us with a neat dataset by letting us summarize important data like sales, cost, and salary.

What Is Group By in SQL?

The Group By statement is used to group together any rows of a column with the same value stored in them, based on a function specified in the statement. Generally, these functions are one of the aggregate functions such as MAX() and SUM(). This statement is used with the SELECT command in SQL.

The SQL Group By statement uses the split-apply-combine strategy.

Split: The different groups are split with their values.
Apply: The aggregate function is applied to the values of these groups.
Combine: The values are combined in a single row.

The SUM() aggregate function, which results in the arithmetic sum of the rows’ values, has been applied to the groups in the above illustration.

To better understand how SQL Order By is used, let’s get an insight into this statement’s syntax.

The Syntax for Using Group By in SQL

The columns to be retrieved are specified in the SELECT statement and separated by commas. Any of the aggregate functions can be used on one or more than one of the columns being retrieved.
The table being used is specified in the table_name parameter of the FROM statement.
There is an optional WHERE clause, which can be used to specify any condition according to which the rows are to be selected.

NOTE: The WHERE clause always precedes the ORDER BY statement.

The GROUP BY statement lets the database system know that we wish to group the same value rows of the columns specified in this statement’s column_names parameter.
The optional ORDER BY statement is used to sort the resulting table in ascending order based on the column specified in this statement’s column_name parameter.

NOTE: The ORDER BY statement always appears after the GROUP BY statement and is applied to the groups of rows formed.

Aggregate functions are one of the most vital parts of the SQL Group By statement, so let’s see what these are.

Aggregate Functions

The five aggregate functions that we can use with the SQL Order By statement are:

AVG(): Calculates the average of the set of values.
COUNT(): Returns the count of rows.
SUM(): Calculates the arithmetic sum of the set of numeric values.
MAX(): From a group of values, returns the maximum value.
MIN(): From a group of values, returns the minimum value.

Let’s apply the concepts stated above to the columns of a table to get an even clearer picture of the Group By statement.

Using Group By on a Single Column

Let’s take a sample “Employee_dept” table:

In this table, to find the average salary for each department’s employees, we’ll use the following query:

We have used the AVG() aggregate function for this purpose.

This will result in the following table:

There was no ORDER BY clause, so the table has no definite order of rows.

Let’s see what happens when we use the ORDER BY clause.

To find the number of employees from each city, this is what is required:

And it will result in the following table:

As we can see, the groups have been ordered in ascending order, according to the number of employees from each city.

The WHERE clause can also be used in the query, and it precedes the GROUP BY statement. Let’s see an example of this.

To find the maximum salary for each department with “Dept_ID” greater than 1003:

The WHERE clause will be applied to the rows before they are grouped.

The above query will result in the following:

Let’s see what happens when we use an aggregate function without the GROUP BY statement.

To find the total salary of all the employees:

As we can see, the count function on “Dept_ID” returns the total number of records in the table, and the sum function on “Salary” returns the arithmetic sum of all the employees’ salaries.

The SQL Group By statement can be applied to multiple columns of a table in a single query. Let’s see how this is done.

Using Group By on Multiple Columns

In the “Employee_dept” table, to find the number of employees from each city in all the departments, we’ll use the following query:

As we can see, the GROUP BY is applied to two columns.

This will result in the following table:

As we can see, each city and department combination is represented in the table, and the number of employees for each of these combinations has been calculated.

Using the Having Clause With Group By

The WHERE clause cannot be used to filter the grouped rows based on a condition. We have the HAVING clause for this purpose. The syntax for using this clause is:

The HAVING clause always precedes the ORDER BY statement in the query.

In the “Employee_dept” table, we’ll use the following query to find the cities with multiple employees.

As we can see, a condition has been applied to the group of cities formed by the GROUP BY statement. This will result in the following:

Only the cities with multiple employees are returned.

With this, we come to the end of this article about the popular Group By statement.

Next Steps

Data aggregation or summarization is one of the most important tasks and skills for anyone who manages data. From profit and loss statements to creating a perfect visualization of data, SQL Group By is the tool that allows us to do these kinds of tasks efficiently.

Now that you know how to aggregate and summarize data, it is time for you to start querying, manipulating, and visualizing all kinds of data to move forward in your journey to become an expert in SQL. If you liked this article and want to get certified, check out our Business Analyst Master’s Program as it will help you learn the A-Z of SQL as well.

Do you have any questions for us? Please mention them in the comment section of our “How to Aggregate Data Using Group By in SQL” article, and we’ll have our experts in the field answer them for you right away!

Our Software Development Courses Duration And Fees

Software Development Course typically range from a few weeks to several months, with fees varying based on program and institution.

Week6: Subqueries and Nested Queries

Understanding Correlated and Uncorrelated Subqueries in SQL

Sub-queries are queries within another query. The result of the inner sub-query is fed to the outer query, which uses that to produce its outcome. If that outer query is itself the inner query to a further query then the query will continue until the final outer query completes.

There are two types of subqueries in SQL however, correlated sub-queries and uncorrelated subqueries. Let’s take a look at these.

Uncorrelated Subquery

An uncorrelated subquery is a type of subquery where the inner query doesn’t depend upon the outer query for its execution. It can complete its execution as a standalone query. Let us explain uncorrelated subqueries with the help of an example.

Suppose, you have database “schooldb” which has two tables: student and department. A department will have many students. This means that the student table has a column “dep_id” which contains the id of the department to which that student belongs. Now, suppose we want to retrieve records of all students from the “Computer” department.

The sub-query used in this case will be uncorrelated sub-query since the inner query will retrieve the id of the computer department from the department table; the result of this inner query will be directly fed into the outer query which retrieves records of students from the student table where “dep_id” column’s value is equal to value retrieved by inner query.

The inner query which retrieves the id of the department using name can be executed as standalone query as well.

Correlated Sub-query

A correlated sub-query is a type of query, where inner query depends upon the outcome of the outer query in order to perform its execution.

Suppose we have a student and department table in “schooldb” as discussed above. We want to retrieve the name, age and gender of all the students whose age is greater than the average age of students within their department.

In this case, the outer query will retrieve records of all the students iteratively and each record is passed to the inner query. For each record, the inner query will retrieve average age of the department for the student record passed by the outer query. If the age of the student is greater than average age, the record of the student will be included in the result, and if not not. Let’s see this in action.

Preparing the Data

Let’s create a database named “schooldb”. Run the following SQL in your query window:

1 CREATE DATABASE schooldb;

The above command will create a database named “schooldb” on your database server.

Next, we need to create a “department” table within the “schooldb” database. The department table shall have three columns: id, name and capacity. To create department table, execute following query:

Next lets add some dummy data to the table so that we can execute our sub-queries. Execute the following to create 5 departments: English, Computer, Civil, Maths and History.

Next we need to create a “student” table within our database. The student table will have five columns: id, name, age, gender, and dep_id.

The dep_id column will act as the foreign key column and will have values from the id column of the department table. This will create a one to many relationship between the department and student tables. Execute following query to create student table.

And then the below to add 10 students to the table.

Notice that values in “dep_id” column of the student table exists in the id column of the department table.

Now, let us see examples of both correlated and uncorrelated sub-queries.

Uncorrelated Sub-query Example

Let us execute a uncorrelated sub-query which retrieves records of all the students who belong to “Computer” department.

The output of the above SQL will be:

You can see that there are two queries. The inner query retrieves id of the “Computer” department while the outer query retrieves student records with that id value in the dep_id column.

We know that in the case of uncorrelated subqueries the inner query can be executed as standalone query and it will still work. Let’s check if this is true in this case. Execute the following query on the server.

1 SELECT id from department WHERE name = ‘Computer’;

The above query will execute successfully and will return 2 i.e. the of the “Computer” department. This is a uncorrelated subquery.

Correlated Subquery Example

We know that in case of correlated sub-queries, the inner query depends upon the outer query and cannot be executed as a standalone query.

Lets execute a correlated sub-query that retrieves results of all the students with age greater than average age within their department as discussed above.

The output of the above query will be:

We know that in the case of a correlated subquery, the inner query cannot be executed as standalone query. You can verify this by executing the following inner query on it’s own:

The above query will throw an error.

Other small differences between correlated and uncorrelated subqueries are:

1 The outer query executes before the inner query in the case of a correlated subquery. On the other hand in case of an uncorrelated subquery the inner query executes before the outer query.

2 Correlated sub-queries are slower. They take M x N steps to execute a query where M is the records retrieved by outer query and N is the number of iteration of inner query. Uncorrelated subqueries complete execution in M + N steps.

5 SQL Subquery Examples

SQL subqueries are basic tools if you want to communicate effectively with relational databases. In this article, I provide five subquery examples demonstrating how to use scalar, multirow, and correlated subqueries in the WHERE, FROM/JOIN, and SELECT clauses.

A subquery, or nested query, is a query placed within another SQL query. When requesting information from a database, you may find it necessary to include a subquery into the SELECT, FROM , JOIN, or WHERE clause. However, you can also use subqueries when updating the database (i.e. in INSERT, UPDATE, and DELETE statements).

There are several types of SQL subqueries:

Scalar subqueries return a single value, or exactly one row and exactly one column.
Multirow subqueries return either: One column with multiple rows (i.e. a list of values), or Multiple columns with multiple rows (i.e. tables).

Correlated subqueries, where the inner query relies on information obtained from the outer query.

You can read more about the different types of SQL subqueries elsewhere; here, I want to focus on examples. As we all know, it’s always easier to grasp new concepts with real-world use cases. So let’s get started.

5 Subquery Examples in SQL

Let’s say we run an art gallery. We have a database with four tables: paintings, artists, collectors, and sales. You can see the data stored in each table below.

Now let’s explore this data using SQL queries with different types of subqueries.

Example 1 – Scalar Subquery

We’ll start with a simple example: We want to list paintings that are priced higher than the average. Basically, we want to get painting names along with the listed prices, but only for the ones that cost more than average. That means that we first need to find this average price; here’s where the scalar subquery comes into play:

Our subquery is in the WHERE clause, where it filters the result set based on the listed price. This subquery returns a single value: the average price per painting for our gallery. Each listed price is compared to this value, and only the paintings that are priced above average make it to the final output:

If this seems a bit complicated, you may want to check out our interactive SQL Basics course and brush up on your essential SQL skills.

Examples 2 – Multi Row Subquery

Now let’s look into subqueries that return one column with multiple rows. These subqueries are often included in the WHERE clause to filter the results of the main query.

Suppose we want to list all collectors who purchased paintings from our gallery. We can get the necessary output using a multi row subquery. Specifically, we can use an inner query to list all collectors’ IDs present in the sales table – these would be IDs corresponding to collectors who made at least one purchase with our gallery. Then, in the outer query, we request the first name and last name of all collectors whose ID is in the output of the inner query. Here’s the code:

And here’s the output:

Interestingly, we could get the same result without a subquery by using an INNER JOIN (or just JOIN). This join type returns only records that can be found in both tables. So, if we join the collectors and the sales tables, we’ll get a list of collectors with corresponding records in the sales table. Note: I have also used the DISTINCT keyword here to remove duplicates from the output.

Here’s the query:

You can read more about choosing subquery vs. JOIN elsewhere in our blog.

Example 3 – Multi Row Subquery with Multiple Columns

When a subquery returns a table with multiple rows and multiple columns, that subquery is usually found in the FROM or JOIN clause. This allows you to get a table with data that was not readily available in the database (e.g. grouped data) and then join this table with another one from your database, if necessary.

Let’s say that we want to see the total amount of sales for each artist who has sold at least one painting in our gallery. We may start with a subquery that draws on the sales table and calculates the total amount of sales for each artist ID. Then, in the outer query, we combine this information with the artists’ first names and last names to get the required output:

We assign a meaningful alias to the output of our subquery (artist_sales). This way, we can easily refer to it in the outer query, when selecting the column from this table, and when defining the join condition in the ON clause. Note: Databases will throw an error if you don’t provide an alias for your subquery output.

Here’s the result of the query:

So, within one short SQL query, we were able to calculate the total sales for each artist based on the raw data from one table (sales), and then join this output with the data from another table (artists).

Subqueries can be quite powerful when we need to combine information from multiple tables. Let’s see what else we can do with subqueries.

Example 4 – Correlated Subquery

The following example will demonstrate how subqueries:

Can be used in the SELECT clause, and
Can be correlated (i.e. the main or outer query relies on information obtained from the inner query).

For each collector, we want to calculate the number of paintings purchased through our gallery. To answer this question, we can use a subquery that counts the number of paintings purchased by each collector. Here’s the entire query:

Notice how the inner query in this example actually runs for each row of the collectors table:

The subquery is placed in the SELECT clause because we want to have an additional column with the number of paintings purchased by the corresponding collector.
For each record of the collectors table, the inner subquery calculates the total number of paintings purchased by a collector with the corresponding ID.

Here’s the output:

As you see, the output of the subquery (i.e. the number of paintings) is different for each record and depends on the output of the outer query (i.e. the corresponding collector). Thus, we are dealing with a correlated subquery here.

Check out this guide if you want to learn how to write correlated subqueries in SQL. For now, let’s have one more correlated subquery example.

Example 5 – Correlated Subquery

This time, we want to show the first names and the last names of the artists who had zero sales with our gallery. Let’s try to accomplish this task using a correlated subquery in the WHERE clause:

Here is what’s going on in this query:

The outer query lists basic information on the artists, first checking if there are corresponding records in the sales
The inner query looks for records that correspond to the artist ID that is currently being checked by the outer query.
If there are no corresponding records, the first name and the last name of the corresponding artist are added to the output:

In our example, we have only one artist without any sales yet. Hopefully, he’ll land one soon.

Week 7: Modifying Data

Modifying Data in SQL

In the realm of computer science, mastering the art of modifying data in SQL is an essential skill for managing databases effectively. This article will provide you with a comprehensive understanding of the fundamental concepts, techniques, and best practices required for modifying data in SQL. By exploring SQL data types and operations, you will learn how to insert, update, and delete data from tables. Further insights will be provided on altering data types in SQL tables and addressing common challenges, such as joining tables and handling locked records. Lastly, the article will delve into real-world applications, showcasing practical uses and case studies that demonstrate efficient data modification in SQL. So, let’s embark on this journey to enhance your database management skills and optimise your use of SQL.

Modifying Data in SQL: An Overview

When working with databases, you often need to add, update or remove data. In SQL (Structured Query Language), modifying data is a common and essential task, ensuring that the information remains accurate and up-to-date. In this article, you will learn about the fundamental concepts for modifying data in SQL, including data types, operations, and the most used SQL statements for data manipulation.

Fundamental Concepts for Modifying Data in SQL

To effectively modify data in SQL, you need to understand some basic concepts such as tables, columns, and rows. A database table consists of columns, which represent attributes, and rows, representing individual data records. Here, we’ll discuss the core SQL statements used for modifying data and offer guidance on their applications.

Understanding SQL Data Types and Operations

Different data types and operations are available in SQL that affect how you work with and modify data. Recognising these data types and associated operations is crucial to efficient data manipulation.

Let’s examine the most common SQL data types:

CHAR(n): A fixed-length character string with a maximum length of n characters
VARCHAR(n): A variable-length character string with a maximum length of n characters
INT: An integer (whole number) value
DECIMAL(p, s): A signed decimal number with a total of p digits and s digits after the decimal point
DATE: A date value represented in the format ‘YYYY-MM-DD’
TIMESTAMP: A combination of date and time values

Understanding how operations work with these data types is essential for modifying data in SQL:

Now let’s discuss an example of modifying data by changing the price of a product in a table named ‘Products’, which has three columns: ‘ID’, ‘ProductName’ and ‘Price’. First, you’ll need to understand the data type of the ‘Price’ column. To modify the data, you can use an UPDATE statement and the appropriate arithmetic operation, for instance, increasing the product’s price by 10%:

Learning how SQL data types and operations work together allows you to efficiently modify data and ensures you make the most of your database activities.

How to Modify Data in SQL: Step-by-Step Guide

Modifying data in SQL involves using specific SQL commands to insert, update, and delete data from tables. This step-by-step guide covers the essentials of modifying data in SQL, ensuring you can effectively manage your database records.

SQL Command to Modify Data in a Table

There are three primary SQL commands to modify data in a table – INSERT, UPDATE, and DELETE. Each command has its specific syntax and usage, allowing you to insert new rows, modify existing rows, and remove rows from the table. These commands belong to the Data Manipulation Language (DML) subset of SQL, enabling the alteration of data within database tables.

The insert command allows you to add new rows of data to a table. There are two primary syntaxes for the insert command (insert into) with column names and values, and (insert into) with Select.

The first method specifies the column names and corresponding values within the (insert into) statement. Its general form is as follows:

(insert into) table_ name (column1, column2, column3, … )

Values (value1, value2, value3, … );

For example, to insert a new product into the ‘Products’ table with columns ‘ID’, ‘ProductName’, and ‘Price’, you can use the following statement:

The second method, (insert into) with SELECT, inserts data from another table or query result into the target table. This method is useful for copying data between tables or inserting data based on specific criteria. Its general form is:

(insert into) target_ table (column1, column2, column3, … )

SELECT column1, column2, column3, …

FROM source_ table

WHERE condition;

For example, to insert a list of products with a price lower than £10 from a ‘Sale Products’ table, use the following statement:

The UPDATE command allows you to modify existing rows in a table based on specified conditions. Its usage consists of specifying the columns to be updated, the new values for each column, and the conditions the record must meet to be updated.

The general syntax for the UPDATE command is:

UPDATE table_ name

SET column1 = new_value1, column2 = new_value2, …

WHERE condition;

To update the price of a product in the ‘Products’ table, you would use the following statement:

Deleting Data from a Table

The DELETE command allows you to remove rows from a table based on specified conditions. The syntax is relatively straightforward, requiring only a reference to the table and the conditions for deletion.

The general syntax of the DELETE command is:

DELETE FROM table_ name

WHERE condition;

For instance, if you need to delete a product from the table ‘Products’, you can use the following statement:

By understanding and applying these SQL commands, you can efficiently manipulate data in your database tables, ensuring records stay accurate and up-to-date.

Modifying Data Types in SQL: Techniques and Best Practices

Altering data types in SQL tables is sometimes necessary as your database evolves to meet changing requirements. A number of techniques and best practices exist to ensure that you modify data types safely and efficiently, while maintaining data integrity and database performance.

Altering Data Types in SQL Tables

Modifying data types mainly involves changing the structure of a table without losing its existing data. The SQL command used for this purpose is the ALTER TABLE statement. It allows you to change the data type of a column, add or drop columns, and rename columns, among other operations.

The general syntax for modifying data types is as follows

Alter table table_ name

Alter column column_ name

[new_ data_ type] [constraints]

There are a few important considerations when altering data types in sql tables:

Ensure compatibility between the old and new data types to prevent data loss.
Test your changes in a development or staging environment before applying them to the production database.
Perform a backup of your database before making significant changes, so you can easily revert if needed.
Consider any potential performance implications of altering data types, particularly for large tables.

Changing Data Types Safely and Efficiently

To change data types without losing data and with minimal impact on performance, follow these recommendations:

Perform a compatibility check between the existing and new data types. If the conversion is not straightforward, you may need to create a temporary column with the desired data type. Once the data is successfully converted, you can drop the original column and rename the new column to the original name.
Create a copy of the table with the new column data types and constraints. Then, use the (insert into select) then copy data. This method can be more efficient for large tables, as the table can be accessed while data is being copied.
Consider database performance and indexing strategies when altering data types, as the change could affect query performance and storage requirements. Modify your indexes and query optimization techniques as needed to maintain optimal performance.
Perform proper testing and review before applying changes to the production database. Always create backups and test your modifications in a development or staging environment before applying them in production.

For example, to change the data type of the ‘Price’ column you can use the following sql statement:

Following these best practices will ensure smooth transitions when modifying data types, allowing your database to adapt to evolving business needs while maintaining performance and integrity.

Modifying Data in SQL Explained: Common Scenarios and Solutions

As you work with databases, you’ll come across various scenarios requiring data modification. In these situations, understanding best practices, solutions, and potential challenges can help you manage your SQL databases effectively and efficiently.

Dealing with Data Modification Challenges

There are numerous challenges that can arise when modifying data in SQL, such as dealing with locked records, managing data integrity, and safely performing modifications on related tables. Addressing these challenges requires understanding the underlying principles, techniques, and possible solutions.

Joining Tables for Data Modification

Multiple sources is essential for tasks like record updates. Techniques such as inner join, outer join, and cross join facilitate this, each chosen based on operational needs.

Consider the following tips when joining tables for data modification:

Select the appropriate join type based on the desired result and relationship between the tables. For example, use INNER JOIN when updating records that have matching entries in both tables, or LEFT JOIN when updating records in one table even if there are no matching entries in the other table.
Use aliases when working with multiple tables with similar column names to improve readability.
Ensure proper indexing on join conditions to improve query performance.
Whenever possible, use subqueries and CTE (Common Table Expressions) to break complex queries into smaller, more manageable parts.

For instance, suppose you want to update the price of all ‘Products’ based on a specific discount applied in a ‘discounts’ table. The following query demonstrates an inner join for this purpose:

Handling Locked Records during Data Modification

Locked records occur when two or more concurrent transactions try to modify the same data. These locks are designed to maintain data consistency and integrity. However, they can occasionally lead to challenges when performing updates or deletions during data modification. To handle locked records effectively, it is essential to understand the following concepts:

Transaction Isolation Levels dictate the extent of concurrency control enforced during transactions. Higher levels offer greater consistency but may lead to more locking, while lower levels prioritize concurrency, risking potential data inconsistency.
Deadlocks occur when two or more transactions are mutually waiting for each other’s completion, resulting in indefinite blocking. SQL Server autonomously identifies and resolves deadlocks by designating one transaction as the victim and aborting it, thereby enabling the remaining transaction(s) to progress.
Locking Hints offer precise control over locking behavior for specific operations. Hints like ROWLOCK, PAGELOCK, TABLOCK, and NOWAIT allow manipulation of locks at the row, page, or table level. Additionally, they enable immediate operation failure if a lock cannot be obtained.

Here are some techniques for handling locked records during data modification:

Avoid long-running transactions, as they can increase the likelihood of record locking. Try to modify data as quickly as possible and commit the transaction.
Break large modification operations into smaller, manageable chunks. This technique reduces the likelihood of locked records and eases database contention.
Use lock timeouts or apply locking hints to limit the impact of record locking, reducing the chances of blocking other transactions.
Optimize your queries and indexes to minimize the time that records remain locked and prevent deadlocks.

Implementing these techniques can help you prevent and handle locked records during data modification, ensuring smooth, efficient performance in your SQL databases.

Modifying Data in SQL Example: Real-World Applications

Data modification in SQL is extensively utilized across a spectrum of industries, spanning e-commerce, finance, healthcare, and logistics. Proficiency in data manipulation techniques is indispensable for the upkeep, enhancement, and analysis of information within SQL databases. In this section, we will explore practical applications of SQL data modification and examine case studies illustrating effective data modification in real-world scenarios.

Practical Uses of SQL Data Modification

SQL data modification plays a crucial role in numerous sectors, ensuring that information stored in databases is accurate, up-to-date, and reliable.

Understanding how to modify data effectively in SQL is essential for addressing these diverse industry requirements and maintaining a reliable, efficient data management system.

Case Studies on Efficiently Modifying Data in SQL

In this section, we examine case studies that highlight the importance of efficiently modifying data in SQL and discuss the techniques and best practices used in each scenario.

Case Study 1: An e-commerce platform wants to apply a time-limited promotion across all products in their database:

end date, and discount percentage.
Use the INSERT statement to add new promotions to the ‘Promotions’ table.
Implement an UPDATE statement with an INNER JOIN, linking the ‘Products’ and ‘Promotions’ tables, to apply the promotion pricing to the relevant products within the valid date range.
After the promotion ends, use another UPDATE statement with an INNER JOIN to revert product prices back to their original values.

Case Study 2: A healthcare provider needs to update patient records after a series of laboratory tests:

Create a new table ‘Lab Results’ with columns for patient ID, test date, test type, and result.
Perform an insert intro statement to incorporate the new laboratory test results into the ‘Lab Results’ table.
Use an update statement with an inner join between the ‘Patients’ and ‘Lab Results’ tables to update patient records with the latest laboratory test results.
Employ appropriate data validation and integrity checks to ensure accuracy when modifying patient records.

These case studies emphasize the significance of effective data modification in SQL, as well as the importance of understanding and implementing best practices for complex real-world applications.

Week 8: Advanced Query Techniques

Mastering Advanced SQL Techniques for Data Analysis

In today’s data-driven world, organizations are faced with vast amounts of data that require sophisticated analysis techniques to extract meaningful insights. SQL (Structured Query Language) has long been a staple for data analysis, enabling professionals to query databases and retrieve relevant information. However, SQL is not limited to basic querying; it offers a plethora of advanced techniques that can elevate your data analysis capabilities to new heights. In this blog post, we delve into the world of advanced SQL techniques for data analysis. We will explore essential concepts and techniques that go beyond the fundamentals, equipping you with the knowledge and skills to tackle complex analysis tasks, optimize queries, and uncover hidden patterns in your datasets.

Here are advanced SQL techniques for data analysis

These advanced sql techniques empower you to handle complex analysis tasks, extract valuable insights, and make informed decisions. Invest in learning and applying these techniques to elevate your data analysis skills and unlock the hidden potential of your datasets. Now, put your knowledge into practice and harness SQL’s full power for data analysis. Happy learning!

FAQs: Frequently asked questions

What are Common Table Expressions (CTEs) in SQL, and how can they improve query readability and maintainability?

Common Table Expressions (CTEs) in SQL allow you to define temporary result sets within a query, making complex queries more manageable. CTEs improve readability by breaking down queries into smaller, named, and derived tables that can be referenced multiple times within the same query.

25 Advanced SQL Query Examples

To master advanced sql, studying example queries is key. In this article, we’ll present 25 examples of medium to high complexity sql queries. These examples can help you refresh your knowledge or prepare for a sql interview.

Most examples in this article will use the following employee table as a basis. Only a few will utilize other tables, which will be explained alongside the respective example.

The advanced SQL courses will teach you concepts like window functions, recursive queries, and complex SQL reports. Create a free LearnSQL.com account and try our interactive courses without having to spend any money. Then, if you like what you’re learning, you can buy full access.

Ok, let’s dig into our advanced SQL queries!

Example #1 – Ranking Rows Based on a Specific Ordering Criteria

Sometimes we need to create a SQL query to show a ranking of rows based on a specific order criteria. In this example query, we will show a list of all employees ordered by salary (highest salary first). The report will include the position of each employee in the ranking.

Here’s the code:

In the above query, we use the function RANK(). It is a window function that returns each row’s position in the result set, based on the order defined in the OVER clause (1 for the highest salary, 2 for the second-highest, and so on). We need to use an ORDER BY ranking clause at the end of the query to indicate the order on which the result set will be shown.

If you want to know more about ranking functions in SQL, I recommend our article What Is the RANK() Function in SQL, and How Do You Use It?

Example #2 – List The First 5 Rows of a Result Set

The next SQL query creates a report with the employee data for the top 5 salaries in the company. This kind of report must be ordered based on a given criteria; in our example, the order criteria will again be salary DESC:

The previous query uses a common table expression (CTE) named employee _ranking to create a virtual table consumed in the main query. Within the CTE, a subquery employs the rank() function to determine the position of each row in the ranking, ordered by salary descending. Lastly, an order by clause arranges the result set by rank in ascending order.

Example #3 – List the Last 5 Rows of a Result Set

This query is similar to the top 5 query, but we want the last 5 rows. We only need to introduce a change in the type of order, i.e. using ASC instead of DESC. In the CTE, we will create a ranking column based on an ascendent order of salary (lowest salary first):

In the main query, we filter rows with ranking values less than or equal to 5 and then order them by ranking value.

Example #4 – List The Second Highest Row of a Result Set

Let’s suppose we’d like to obtain the data of the employee with the second highest salary in the company. We can apply a similar approach to our previous query:

Be the first to add a review.

Please, login to leave a review

Mastering SQL

Popular Courses

Archive

Working hours

Course Overview:

Course Structure:

Month 1: Foundations of SQL

Week 1: Introduction to Databases and SQL

Week 2: Basic SQL Syntax

Week 3: Filtering and Sorting Data

Week 4: Working with Multiple Tables

Month 2: Advanced SQL Queries

Week 5: Aggregating Data

Week 6: Subqueries and Nested Queries

Week 7: Modifying Data

Week 8: Advanced Query Techniques

Month 3: Database Design

Week 9: Introduction to Database Design

Week 10: Entity-Relationship Modeling

Week 11: Database Constraints

Week 12: Indexing and Performance Optimization

Month 4: Advanced Database Concepts

Week 13: Stored Procedures and Functions

Week 14: Triggers and Transactions

Week 15: Views and Materialized Views

Week 16: Security and Permissions

Month 5: Data Manipulation and Reporting

Week 17: Importing and Exporting Data

Week 18: Working with Dates and Times

Week 19: Advanced Reporting Techniques

Week 20: Introduction to Business Intelligence Tools

Month 6: Capstone Project and Final Review

Week 21-24: Capstone Project

Week 25-26: Final Review and Certification

Additional Resources:

Course Overview:

Course Structure:

Month 1: Foundations of SQL

Week 1: Introduction to Databases and SQL

Week 2: Basic SQL Syntax

Week 3: Filtering and Sorting Data

Week 4: Working with Multiple Tables

Month 2: Advanced SQL Queries

Week 5: Aggregating Data

Week 6: Subqueries and Nested Queries

Week 7: Modifying Data

Week 8: Advanced Query Techniques

Month 3: Database Design

Week 9: Introduction to Database Design

Week 10: Entity-Relationship Modeling

Week 11: Database Constraints

Week 12: Indexing and Performance Optimization

Month 4: Advanced Database Concepts

Week 13: Stored Procedures and Functions

Week 14: Triggers and Transactions

Week 15: Views and Materialized Views

Week 16: Security and Permissions

Month 5: Data Manipulation and Reporting

Week 17: Importing and Exporting Data

Week 18: Working with Dates and Times

Week 19: Advanced Reporting Techniques

Week 20: Introduction to Business Intelligence Tools

Month 6: Capstone Project and Final Review

Week 21-24: Capstone Project

Week 25-26: Final Review and Certification

Additional Resources:

Week 1: Introduction to SQL

Introduction to databases​

Types of Databases

Non-Relational Database

Introduction to PostgreSQL​

What is SQL?​

What is an RDBMS?​

What characterizes a relational database?​

Week 2: Basic SQL Syntax

SELECT Statement

SELECT statement with WHERE criteria

Using wildcards in the LIKE clause

SELECT statement with ORDER BY clause

SELECT statement with GROUP BY clause

Introduction to databases

Introduction to PostgreSQL

What is SQL?

What is an RDBMS?

What characterizes a relational database?