Skip to main content
Reading Time: 7 minutes

What is SQL and why does it matter?

Data is nowadays often considered to be the ‘new gold’ (Von Ditfurth et al., 2018). Companies wish to gather as much data as possible to increase their business performance and/or to understand more of what their customers want. However, only gathering this data is not enough to profit from it. Just like raw materials, like gold, data needs to be processed for it to become information and therefore actually valuable to the company. Knowing how data is gathered, stored, accessed, processed, and displayed is key to helping companies work with data. 

SQL plays a big part in this general process. Not necessarily with gathering or transforming data into information, but with accessing and displaying information. It is a widely used programming language for querying relational databases. In this article, we focus on the use of databases and how SQL is used with databases. By mastering these advanced read queries, database professionals can unlock the full potential of SQL Server for sophisticated data analysis and reporting.

By understanding how to use (T-)SQL, you become familiar with how to access and filter data, which is a valuable skill to have when you wish to work with data in your career.

Do you wish to read more about the importance of (big) data and how it affects decision quality, then make sure to read Rick’s article on Big Data’s Impact On Decision Quality In Organizational Settings.

Introduction to T-SQL

T-SQL is an extension of SQL (Structured Query Language) used by Microsoft SQL Server. T-SQL provides a comprehensive set of features for querying and manipulating data stored in SQL Server databases. Advanced read queries in T-SQL enable users to extract valuable insights from complex datasets by leveraging advanced querying techniques and analytical functions.

How do we query data from databases?

A data warehouse consists of numerous (relational) databases that allow for data to be grouped into tables and set relationships between tables. The way these tables are related is called cardinality of which there are three types: one-to-one, one-to-many, and many-to-many. These tables are designed, maintained, and utilized by users with software to update and query data. Such software is referred to as a DBMS (Database Management System) and is often hosted on a server that serves a client (user) and accesses the requested data from the database. Each table is given a name and is divided into fields (columns) and records (rows). To organize, store, update, delete, and retrieve data from a database, a query language (like T-SQL) is used. 

What are PK’s and FK’s? And why do they matter?

As was mentioned previously, tables consist of fields (columns). Examples of a field could be cust_id, order_id, phone_number, zip_code, or date_of_birth. In the case of a relational database, we want to establish a relationship between tables. But to accomplish this, it is important to decide on what fields the relationships will be based. Additionally, we want those fields to be unique because otherwise, we will run into issues like duplicate records, data integrity problems, performance issues, indexing problems, etc. This is where Primary Keys (PK’s) come in. A Primary Key is one or multiple fields in a table, that uniquely identifies each record in the table. Without PK’s creating relational databases would either be impossible or very inefficient. A Foreign Key (FK) is one or multiple fields in a table that establishes a link between data in two tables. A Foreign Key refers to a Primary Key from another table. The table of the Primary Key is in this case called the parent table and the table with the foreign key is called the child table. 

Subqueries, what are they and what types are there?

Subqueries are nested queries that are embedded within another query, allowing for more complex and precise data retrieval. They are enclosed within parentheses and can be used in various parts of a SQL statement, such as the SELECT, FROM, WHERE, and HAVING clauses. Subqueries work by executing the inner query first and using its result set as input for the outer query. This enables filtering, aggregation, and correlation of data based on conditions specified in the subquery. There are several types of subqueries, including scalar subqueries, which return a single value, and row subqueries, which return one or more rows. Additionally, correlated subqueries are those that reference columns from the outer query, allowing for dynamic filtering based on the outer query’s results. Subqueries provide a powerful mechanism for performing complex data manipulations and are essential for writing efficient and expressive SQL queries.

What are JOINs?

JOINs are used to combine rows from two or more tables based on related columns between them. The most common types of joins include INNER JOIN, OUTER JOIN, LEFT JOIN, and RIGHT JOIN. INNER JOIN returns only the rows where there is a match in both tables based on the join condition. OUTER JOIN includes rows from both tables even if there is no match. LEFT JOIN returns all rows from the left table and the matched rows from the right table. Similarly, RIGHT JOIN returns all rows from the right table and the matched rows from the left table. These different types of joins allow for versatile data retrieval and analysis, facilitating complex queries to meet various business requirements.

Performance considerations

As is with business, time is money and efficiency is key. The same can be said for writing and executing SQL queries. Performance considerations when using T-SQL in practice are crucial for optimizing query execution. SQL Server processes queries, including joins, at runtime, which can consume significant system resources. Therefore, it’s essential to minimize unnecessary joins, as each additional join adds complexity and can degrade performance. Furthermore, creating indexes for foreign key columns can dramatically improve query performance by facilitating efficient data retrieval. Indexes allow SQL Server to quickly locate and access the relevant data, reducing the need for extensive table scanning and improving overall query execution speed. By carefully managing joins and leveraging indexes effectively, you can enhance T-SQL performance and ensure efficient data processing in SQL Server environments.

The following tips can also help increase the performance of your queries:

  • Use WHERE clause efficiently: Utilize the WHERE clause to filter rows early in the query execution process, reducing the amount of data processed by subsequent operations.
  • Avoid using SELECT *: Instead of selecting all columns with SELECT *, specify only the columns needed for the query. This reduces the amount of data transferred and processed, improving performance.
  • Limit the use of DISTINCT: Minimize the use of DISTINCT as it requires sorting and removing duplicates, which can be resource-intensive. Consider alternative approaches such as grouping or using more specific filtering conditions.
  • Use table variables or temp tables: Table variables and temporary tables can impact performance, especially for large datasets. Evaluate the performance implications of using table variables versus temporary tables and choose the appropriate option based on your specific requirements.
  • Avoid nested and correlated subqueries: Minimize the use of nested and correlated subqueries, as they can be inefficient and difficult to optimize. Consider using JOINs or derived tables instead for better performance.

Cheat sheet

Function Explanation Example
SELECT Retrieves data from one or more tables. SELECT column1, column2 FROM table_name;
DISTINCT Returns unique values from a column. SELECT DISTINCT column1 FROM table_name;
COUNT() Counts the number of rows or non-null values in a column. SELECT COUNT(*) FROM table_name;
SUM() Calculates the sum of values in a column. SELECT SUM(column1) FROM table_name;
AVG() Calculates the average of values in a column. SELECT AVG(column1) FROM table_name;
MIN() Finds the minimum value in a column. SELECT MIN(column1) FROM table_name;
MAX() Finds the maximum value in a column. SELECT MAX(column1) FROM table_name;
GROUP BY Groups rows that have the same values into summary rows. SELECT column1, COUNT(*) FROM table_name GROUP BY column1;
HAVING Adds conditions to the GROUP BY clause. SELECT column1, COUNT(*) FROM table_name GROUP BY column1 HAVING COUNT(*) > 1;
ORDER BY Sorts the result set in ascending or descending order. SELECT column1 FROM table_name ORDER BY column1 ASC;
LIMIT Limits the number of rows returned in a result set. SELECT column1 FROM table_name LIMIT 10;
TOP Limits the number of rows returned in a result set. SELECT TOP 10 column1 FROM table_name;
ROW_NUMBER() Assigns a unique sequential integer to each row. SELECT column1, ROW_NUMBER() OVER (ORDER BY column1) AS row_num FROM table_name;
RANK() Assigns a unique integer to each row, with duplicate values assigned the same rank. SELECT column1, RANK() OVER (ORDER BY column1) AS rank_num FROM table_name;
WHERE Here you can write a certain condition that must be met. SELECT * FROM CustomersWHERE Country = ‘USA’;
INNER JOIN Combines rows from two tables based on a specified condition, returning only the rows where there is a match in both tables. SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
OUTER JOIN  Combines rows from two tables based on a specified condition, including all rows from both tables even if there is no match, with NULL values filling in the unmatched columns. SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
LEFT OUTER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
LEFT JOIN  Returns all rows from the left table and the matched rows from the right table, with NULL values for unmatched rows in the right table. SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
LEFT JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
RIGHT JOIN  Returns all rows from the right table and the matched rows from the left table, with NULL values for unmatched rows in the left table. SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
RIGHT JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
UNION Combines the results of two or more SELECT statements into a single result set, removing duplicate rows by default. SELECT City FROM Customers
UNION
SELECT City FROM Suppliers;
INTERSECT Returns the intersection of the results of two SELECT statements, which are the rows that appear in both result sets.  SELECT City FROM Customers
INTERSECT
SELECT City FROM Suppliers;
CREATE Used to create database objects like tables, views, indexes, etc. CREATE TABLE Employees (
EmployeeID int,
LastName varchar(255),
FirstName varchar(255),
);
INSERT INTO  Adds one or more records to a table. INSERT INTO Customers (CustomerName, ContactName, Country)
VALUES (‘Alfreds Futterkiste’, ‘Maria Anders’, ‘Germany’);
UPDATE Modifies existing records in a table based on a specified condition. UPDATE Customers
SET ContactName = ‘Alfred Schmidt’, City = ‘Hamburg’
WHERE CustomerID = 1;
DELETE Removes one or more records from a table based on a specified condition. DELETE FROM Customers
WHERE CustomerID = 1;

If you wish to have a more extensive explanation and examples of some SQL queries, make sure to visit w3schools.com. Keep in mind that not every function operates similarly in T-SQL and vice versa. 

 

Literature

Von Ditfurth, J., Aholt, H., & Deloitte. (2018). Data is the new gold: The future of real estate service providers. Data Is the New Gold.

Leave a Reply