SQL Server LAG Function
SQL Server is one of the most popular and powerful relational databases of the modern age. It powers from small scale applications to massive enterprise-level applications.
Like many relational databases, SQL Server comes packed with lots of functions and tools that allow us to interact and manipulate the database. This can range from simple functions that fetches the maximum value or more complex queries that utilizes multiple database features.
One of the most powerful functions provided by SQL Server is the lag()
function. In this tutorial, we are going to introduce you to this function, what it does, how it works, and cover practical examples on how to use it.
SQL Server LAG Function
In SQL Server, the lag()
function is part of the SQL Server analytical functions. It allows us to access data from the previous rows in a current result set.
This removes the need for annoying and complex self-joins and subqueries.
A common use of the lag()
function is when you need to compare the current value with the value of the previous row. Or when calculating the difference between consecutive rows in a given table.
Let us dive deep into the world of SQL Server and learn more about this lag()
function.
Function Syntax
Let us start with the most basic features and explore the syntax of this function, the supported parameters and more.
The following shows the syntax of the lag()
function:
LAG (expression, offset, default) OVER (
[PARTITION BY partition_expression, ... ]
ORDER BY sort_expression [ASC | DESC], ...
)
In the syntax above, we start with the keyword LAG
followed by the parameters as shown:
expression
- this denotes the column or expression for which we wish to retrieve the previous value.offset
- this defines the number of rows back from the current row that you wish to retrieve. By default, the current value is set to1
.default
- this is an optional argument that allows us to specify a default value to be returned if there is no previous row available.partition by
- the partition by clause allows us to divide the rsult set into partitions and calculate the lag function within each partition. This is an optional clause and the function will function with or without it.order by
- specifies the order in which the rows are processed within each partition. You can also specify whether you want the order to be ascending or descending usingASC
andDESC
respectively.
Return Value
The LAG()
function returns the value from the previous row in the result set based on the specified offset and order.
NOTE: IF no previous row is available, for example for the first row in the result set/partition, the function returns the specified default
value or null
.
How it Works
How It Works
The following is a high-level overview of how the function operates in SQL Server.
- The function starts by dividing the data into paritions as specified by the
PARTITION BY
clause. If the clause is not specified, the function will treat the data as one single partition. - The function then processes each partition independently and the previous row reference resets at the begining of each partition.
- As you can guess, the data is sorted in each parition as specified by the
ORDER BY
clause. - For each row in a partition, the function looks back by the specified
offset
value and retrieves the value from the corresponding rows (order is preserved). - Once the function locates a previous row, it returns the value from that row. However, if no previous is found, the function returns the default value or null.
Example Usage
To solidify our understanding of this function, let us look at a real-world example.
Suppose we have a table called Sales
with the data as shown in the table below:
Suppose we wish to retrieve the previous sales quantity for each product, we can use the lag function as shown in the example below:
SELECT
SalesID,
SalesDate,
ProductID,
Quantity,
LAG(Quantity) OVER (PARTITION BY ProductID ORDER BY SalesDate) AS PreviousQuantity
FROM
Sales
ORDER BY
ProductID, SalesDate;
The query above should return the table as shown:
Handling Missing Data
It is important to handle cases where there’s no previous row to compare with. We can use the default
value instead of the function returning null
.
For example, we can use the value of 0
if there is no previous value as demonstrated in the query below:
SELECT
SalesID,
SalesDate,
ProductID,
Quantity,
LAG(Quantity, 1, 0) OVER (PARTITION BY ProductID ORDER BY SalesDate) AS PreviousQuantity
FROM
Sales
ORDER BY
ProductID, SalesDate;
You will notice that the function adds the value of 0
to the missing value instead of returning null.
Ranking within a Partition
To partition the data into various blocks and then assing rankings to the rows with the parition based on the specific order, we can use the parition by and order by clauses as demonstrated in the example below:
SELECT
SalesID,
SalesDate,
ProductID,
Quantity,
LAG(Quantity, 1, 0) OVER (PARTITION BY ProductID ORDER BY SalesDate) AS PreviousQuantity
FROM
Sales
ORDER BY
ProductID, SalesDate;
This should return similar results as the previous queries.
Tracking Changes in Values
We can also use the LAG function to track changes in values over time. Suppose you have a StockData
table with historical stock prices:
StockID | StockSymbol | Price | Date |
---|---|---|---|
1 | AAPL | 150.00 | 2023-01-01 |
2 | AAPL | 155.00 | 2023-01-02 |
3 | AAPL | 153.50 | 2023-01-03 |
4 | AAPL | 157.25 | 2023-01-04 |
5 | MSFT | 290.00 | 2023-01-01 |
6 | MSFT | 295.50 | 2023-01-02 |
7 | MSFT | 292.75 | 2023-01-03 |
8 | MSFT | 297.50 | 2023-01-04 |
To track daily price changes for each stock symbol, we can use the LAG function to calculate the price change:
SELECT
StockID,
StockSymbol,
Price,
Date,
LAG(Price, 1, 0) OVER (PARTITION BY StockSymbol ORDER BY Date) AS PreviousPrice,
Price - LAG(Price, 1, 0) OVER (PARTITION BY StockSymbol ORDER BY Date) AS PriceChange
FROM
StockData;
The query above will return the stock price details along with the previous day’s proce and the daily price change for each stock.
Boom!