Wednesday, September 3, 2014

Ranking Functions in SQL Server

In this article, I am giving a quick overview about Ranking Functions in SQL Server. After completing this article you will understand:

What is Ranking Function?
What are different Ranking Functions in SQL Server?

Please give your valuable suggestions and feedback to improve this article.

What is Ranking Function?

In SQL Server Ranking functions are used to give a rank to each record among all records or in a group.

Let’s understand what does this means. I will take a simple real world example to make you understand this. Let’s go to in your school days. Our class teacher maintains a register in which all student names are written. Each student is given a roll number based on their names for example if there are 50 students then each student will get one roll number between 1 to 50,  this is nothing but teacher has given a serial number to each student. When your result comes out, based on student marks now student will be given rank, which will be different from roll number (serial number) i.e roll number 10 came first, roll number 21 cane second and so on. This is nothing but teacher has given a rank based on the student marks. In some case suppose two students get the same marks, then same rank will be given to both the students.

All Ranking functions are non deterministic function, for more information see Deterministic and Non Deterministic functions in SQLServer.

What are the different Ranking Functions in SQL Server?

In SQL Server there are four types of Ranking Functions

1.       RANK
2.       DENSE_RANK
3.       ROW_NUMBER
4.       NTILE

Let’s understand each ranking function one by one.

Before we start, let's populate some data into our TEACHMESQLSERVER Database. If you don’t have database created then use the below query to create.

CREATE DATABASE TEACHMESQLSERVER

Let’s  create one table name as EMP in our database and populate data. To create and populate data you can run below set of SQL statements.

USE TEACHMESQLSERVER
GO
IF EXISTS(SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='EMP')
DROP TABLE EMP
GO
CREATE TABLE EMP(EMPNAME VARCHAR(MAX),DEPARTMENT VARCHAR(MAX), SALARY MONEY)
GO
INSERT INTO EMP VALUES ('GHANESH','IT',18000)
INSERT INTO EMP VALUES ('PRASAD','HR',2000)
INSERT INTO EMP VALUES ('GAUTAM','SALES',5000)
INSERT INTO EMP VALUES ('KUMAR','HR',85000)
INSERT INTO EMP VALUES ('SUMIT','IT',18000)
INSERT INTO EMP VALUES ('PRIYANKA','HR',25000)
INSERT INTO EMP VALUES ('KAPIL','SALES',5000)
INSERT INTO EMP VALUES ('GOLDY','HR',12000)
INSERT INTO EMP VALUES ('SACHIN','IT',21500)
INSERT INTO EMP VALUES ('OJAL','SALES',19500)
INSERT INTO EMP VALUES ('ISHU','HR',28000)
INSERT INTO EMP VALUES ('GAURAV','IT',15500)
INSERT INTO EMP VALUES ('SAURABH','SALES',20500)
INSERT INTO EMP VALUES ('MADHU','IT',18000)
INSERT INTO EMP VALUES('ATUL','SALES',35000)

GO
SELECT * FROM EMP
As you can see from the above query result set, data has been successfully populated. I will be using this data set in coming examples to explain all ranking functions.

RANK

Rank function returns the rank of each record among all records or in a partition. If two or more rows tie for a rank, each tied rows receives the same rank but RANK function does not always return consecutive integers. Don’t worry if second point is not clear to you I will explain this with example.

Syntax: RANK () OVER (PARTITION BY [column_list] ORDER BY [column_list])

PARTITION BY [Column_list] – This is optional if you want to divide the result set into partitions to which the function is applied then must give the column name, if you don’t use this clause then the function treats all records of the query result set as a single group.

ORDER BY [Column_list] – This is not optional it is must clause; this determines the order of the data before the function is applied.

Let’s understand the RANK function using example.

Problem 1:

You want to get each employee rank among all other employees based on their highest salary.

Solution:

You can use RANK function without PARTITION By Clause and ORDER BY Clause on SALARY in Descending order.

SQL Code:

SELECT EMPNAME, DEPARTMENT, SALARY , RANK() OVER(ORDER BY SALARY DESC) AS OVER_ALL_RANK_BY_SALARY FROM EMP

Output:


Explanation:

As you can see from the result set each employees has given a rank based on their salary. Employee KUMAR has the highest Salary rank is 1 and PRASAD has the lowest salary rank is 15. If you notice Employee MADHU, SUMIT and GHANESH have given rank 8 because all three has the same salary. Employee GAURAV has given 11 because 8 rank can be given to a least one person but two more employees shared the same rank so next two ranks 9 and 10 will not be given to any employee. This is the problem with the RANK Function it does not always return consecutive integers.

Problem 2:

You want to get each employee rank in their department based on their highest salary.

Solution:

You can use RANK function with PARTITION BY Clause on DEPARTMENT
 and ORDER BY Clause on SALARY in Descending order.

SQL Code:

SELECT EMPNAME, DEPARTMENT, SALARY , RANK() OVER(PARTITION BY DEPARTMENT ORDER BY SALARY DESC) AS RANK_IN_DEP_BY_SALARY FROM EMP

Output:

Explanation:

As you can see from the result set each employees has given a rank based on their salary in their department. In HR Department every employee has distinct salary so each has given different rank but in IT Department three employees has same salary so they has given rank 2 and next employee gets rank 5.This is the problem with the RANK Function it does not always return consecutive integers.

DENSE_RANK

RANK function does not always return consecutive integers to overcome this problem we have another Ranking function in SQL Server which is known as DENSE_RANK. DENSE_RANK function Returns the rank of rows within the partition of a result set or within all records, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come from the previous row. 

Syntax: DENSE_RANK () OVER (PARTITION BY [column_list] ORDER BY [column_list])

PARTITION BY [Column_list] – This is optional if you want to divide the result set into partitions to which the function is applied then you must give the column name, if you don’t use this clause then the function treats all records of the query result set as a single group.

ORDER BY [Column_list] – This is not optional it is a must clause; this determines the order of the data before the function is applied.

Let’s understand the DENSE_RANK function using the old examples.

Problem 1:

You want to get each employee rank without gap among all other employees based on their highest salary.

Solution:

You can use DENSE_RANK function without PARTITION By Clause and ORDER BY Clause on SALAY in Descending order.

SQL Code:

SELECT EMPNAME, DEPARTMENT, SALARY , DENSE_RANK() OVER(ORDER BY SALARY DESC) AS OVER_ALL_RANK_BY_SALARY FROM EMP

Output:

Explanation:

As you can see from the result set each employees has given a rank based on their salary. Whoever is having same salary has been given same rank i.e 8 and 11 but there is no gap in the ranking like RANK function.

Problem 2:

You want to get each employee rank without any gap in their department based on their highest salary.

Solution:

You can use DENSE_RANK function with PARTITION BY Clause on DEPARTMENT and ORDER BY Clause on SALAY in Descending order.

SQL Code:

SELECT EMPNAME, DEPARTMENT, SALARY , DENSE_RANK() OVER(PARTITION BY DEPARTMENT ORDER BY SALARY DESC) AS RANK_IN_DEP_BY_SALARY FROM EMP

Output:

Explanation:

As you can see from the result set each employees has given a rank based on their salary in their department. In HR Department every employee has distinct salary so each has given different rank but in IT Department three employees has same salary so they has given rank 2 and next Employee gets rank as 3. 

ROW_NUMBER

ROW_NUMBER function Returns the serial number of each record within a partition of a result set or within all records. It starts serial number as 1 for the first record and for next records it adds one. We get unique serial number for each record if using ROW_NUMBER function. If we are using ROW_NUMBER function within a partition of a result set then it starts with 1 in each partition.

Syntax: ROW_NUMBER () OVER (PARTITION BY [column_list] ORDER BY [column_list])

PARTITION BY [Column_list] – This is optional if you want to divide the result set into partitions to which the function is applied then you must give the column name, if you don’t use this clause then the function treats all records of the query result set as a single group.

ORDER BY [Column_list] – This is not optional it is a must clause; this determines the order of the data before the function is applied.

Let’s understand the ROW_NUMBER function with simple examples.

Problem 1:

You want to get serial number for each employee among all employees based on their highest salary.

Solution:

You can use ROW_NUMBER function without PARTITION By Clause and ORDER BY Clause on SALAY in Descending order.

SQL Code:

SELECT EMPNAME, DEPARTMENT, SALARY, ROW_NUMBER() OVER(ORDER BY SALARY DESC) AS ROW_NUMBER_BY_SALARY FROM EMP

Output: 

Explanation:

As you can see from the result set each employees has given a serial number based on their salary among all employees. Serial number starts with 1 and goes till 15 (because we have total 15 records in EMP table).Whoever is having same salary has not given same serial number like DENSE_RANK function.

Problem 2:

You want to get serial number for each employee in their department based on their highest salary.

Solution:

You can use ROW_NUMBER function with PARTITION BY Clause on DEPARTMENT and ORDER BY Clause on SALARY in Descending order.

SQL Code:

SELECT EMPNAME, DEPARTMENT, SALARY, ROW_NUMBER() OVER(PARTITION BY DEPARTMENT ORDER BY SALARY DESC) AS ROW_NUMBER_IN_DEP_BY_SALARY FROM EMP

Output: 

Explanation:

As you can see from the result set each employee has given a serial number based on their salary in their department and it’s unique, it starts with 1 and goes till 5 because each department has max 5 employees.

NTILE

NTILE function distributes all your records in an ordered groups with a specified group number. The groups are numbered, starting at 1 for the first group and adds 1 for the next group. For each record, NTILE returns the number of the group to which the record belongs.

Syntax: NTILE (Interger_Expression) OVER (PARTITION BY [column_list] ORDER BY [column_list])

Integer_Expression – It is a positive integer constant expression that specifies the number of groups into which each partition must be divided

PARTITION BY [Column_list] – This is optional if you want to divide the result set into partitions to which the function is applied then you must give the column name, if you don’t use this clause then the function treats all records of the query result set as a single group.

ORDER BY [Column_list] – This is not optional it is a must clause; this determines the order of the data before the function is applied.

Let’s understand the NTILE function with simple examples.

Problem 1:

You want to divide all employees into 3 groups based on their highest salary.

Solution:

You can use NTILE function without PARTITION By Clause and with Inter_expression equals to 3 and ORDER BY Clause on SALAY in Descending order.

SQL Code:

SELECT EMPNAME, DEPARTMENT, SALARY , NTILE(3) OVER(ORDER BY SALARY DESC) AS GROUP_NUMBER_BY_SALARY FROM EMP

Output: 

Explanation:

As you can see from the result set there are total 15 records in EMP table. All records have been divided into three groups. Each group has it group number with 5 records.

Problem 2:

You want divide each employee in their department based on their highest salary into 2 groups.

Solution:

You can use NTILE function with INTEGER EXPRESSION value equal to 2, PARTITION BY Clause on DEPARTMENT and ORDER BY Clause on SALAY in Descending order.

SQL Code:

SELECT EMPNAME, DEPARTMENT, SALARY , NTILE(3) OVER(PARTITION BY DEPARTMENT ORDER BY SALARY DESC) AS GROUP_NUMBER_IN_DEP_BY_SALARY FROM EMP

Output:  

Explanation:

As you can see from the result set there are total 3 Departments, each department has 5 records in it. All records in each department have been divided into 3 groups. Each group has it group number with its records.

If you liked the article hit the Facebook like button.

Keep Learning! 

1 comment:

Anonymous said...

This is the best article on Ranking functions I have read. Thanks a lot. Keep posting.