In this post, we will talk about how to group rows using the GROUP BY clause, filtering groups, and the difference between the WHERE and HAVING clauses.
What is group by clause?
Now, let's delve into the definition of the GROUP BY clause. The GROUP BY clause is utilized to group a selected set of rows into a set of summary rows based on the values of one or more columns or expressions. It is always employed in conjunction with one or more aggregate functions.
There are various aggregate functions available in SQL Server, such as SUM(), AVERAGE(), COUNT(), MIN(), MAX(), etc.
Let's understand them through an example. We have the TblOrganizationEmployee table, which includes columns like EmployeeName, EmployeeSalary, Country, Address, and DepartmentId. Additionally, there's the Department table containing columns for Id and DepartmentName.
The relationship between TblOrganizationEmployee and Department is established using the foreign key DepartmentId. If you require the SQL script for these tables, it can be found at the end of the post.
If you examine the TblOrganizationEmployee table, you'll notice the EmployeeSalary column.
Now, suppose I want to calculate the total salary being paid within my organization. Naturally, if you were to do this manually, you would take each salary value and add them together. However, in SQL Server, we have an aggregate function called SUM(), which we can utilize. Here's an example query:
select SUM(EmployeeSalary) from [dbo].[TblOrganizationEmployee]
that gives us the total salary.
Similarly, if you want to say let’s say, for example, I want to find out total employees in my organization.
select count(*) from [dbo].[TblOrganizationEmployee]
I want the total salary that I’m paying to my employees by country, how do we achieve this? Using group by.
If you were to do it manually, you would group the records by country, then sum the salaries within each group. Essentially, you'd take each set of records for Australia, for example, and add up their salaries. The same process would apply for records from India, and so on.
This is precisely why we use the GROUP BY clause. Let's delve into how to use it.
Select Country, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee
Group by Country
Query Explantion
If you examine the query we've written, we're utilizing the SUM() aggregate function to calculate the total salary paid to our employees.
In our SELECT list, we require both the country and the total salary. Since there isn't a predefined column name for the total salary, we assign one using 'AS', naming it 'totalsalary'.
Now, let's inspect the query: SELECT country, the sum of salary FROM TblOrganizationEmployee.
What we're essentially specifying is that we want the country and the total salary, but we also need to indicate that we want to group the results by country. This is accomplished with the 'GROUP BY' clause. So, we GROUP BY country and then calculate the total salaries for each country. When we execute the query, we should observe the same output.
Let’s see what’s going to happen if I remove this group by clause and then execute this query.
Select Country, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee
This query will produce an error because you're selecting the country column and the sum of salaries without specifying how to group them by country and then total the salaries. If I exclude the country column, it would make sense because I'd be asking for the total salaries across the entire table. However, including the country column requires specifying that the results should be grouped by country. Therefore, if I execute this query as it is, I will receive an error.
“Msg 8120, Level 16, State 1, Line 4
Column ‘TblOrganizationEmployee.Country’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.”
Now let’s remove the sum aggregate function in the select list and execute the query.
Select Country, EmployeeSalary as TotalSalary
from TblOrganizationEmployee
Group by Country
“Msg 8120, Level 16, State 1, Line 8you will get below error
Column ‘TblOrganizationEmployee.EmployeeSalary’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.”
The columns you include in the select list must adhere to one of two conditions: either they should have an aggregate function applied to them, or they should be included in the group by clause. If neither of these conditions is met, you will encounter the error mentioned above.
Group by based on multiple columns
We want to further group the data by DepartmentId within each country. For instance, in Australia, we aim to ascertain the total salary for employees in each department: Marketing, Research and Development, Accounting, and Finance. Hence, we are employing multiple column groupings, initially by country and subsequently by DepartmentId.
It’s straightforward. In the select list, we should include three columns: country, TotalSalary, and departmentId. Additionally, the departmentId column should be specified in the group by clause. This instructs the SQL server engine to first aggregate the salaries by country and then further group them by departmentId, providing the desired breakdown in that sequence.
Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee
group by Country, DepartmentId
When we execute this query, you will see the output. However, if you observe our output, it's not sorted. Therefore, we used the ORDER BY clause to arrange the result.
Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee
group by Country, DepartmentId
order by TotalSalary
Now we have seen we can group based on multiple columns.
How to use multiple aggregate functions?
If it's possible to use multiple aggregate functions, let's understand with an example. We are showing the total salary by country and DepartmentId. Let's say I want the total number of employees as well.
Now, if you remember, if you want to find out the total number of employees we can use COUNT().
If you look at the TblOrganizationEmployee table, we have 11 rows there.
And if I want to find the total number of employees within my organization, I can say COUNT(*) from TblOrganizationEmployee and we get the total count, which is 11.
For performance reasons, instead of using *, you can specify a specific column name, such as EmployeeName, to get the same result.
select count(EmployeeName) from [dbo].[TblOrganizationEmployee]
So what do we want? We not only want total salaries by country and DepartmentId, but we also want the total number of employees. So, if you want the total number of employees, we use the COUNT() aggregate function.
Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
group by Country, DepartmentId
order by TotalSalary
All right, so we have seen how to use multiple aggregate functions as well.
How to filter the groups?
But let’s say I want to see the Marketing (i.e., DepartmentId=1) department salaries. Now, usually to filter the rows, we can use the WHERE clause, where you can say DepartmentId is equal to 1.
Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
where DepartmentId=1
group by Country, DepartmentId
order by TotalSalary
The other way to get the same result is instead of using the where clause, we can use the having clause, but the having clause should come after the group by.
Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
group by Country, DepartmentId
having DepartmentId=1
order by TotalSalary
Difference between having and where clause
Now let’s see what the difference between these two queries is. When you use the WHERE clause, only DepartmentId=1 records are retrieved, and then they are grouped.
But when you use the HAVING clause, all the rows from the table TblOrganizationEmployee are retrieved, they are grouped by Country, DepartmentId, and then only DepartmentId=1 groups are shown.
The WHERE clause filters rows before aggregations, before groupings are performed, whereas the HAVING clause filters groups after the aggregations are performed.
Another difference is that you can use the HAVING clause only with the SELECT statement, whereas the WHERE clause can be used with other statements like INSERT and UPDATE as well.
Aggregate functions cannot be used in the WHERE clause. What do we mean by aggregate functions cannot be used in the WHERE clause? Let’s understand with an example.
select * from [dbo].[TblOrganizationEmployee] where count(id)>5
Excute above query you will get the error
“Msg 147, Level 15, State 1, Line 11
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.”
You cannot use aggregate functions with the WHERE clause, whereas aggregate functions can be used in the HAVING clause.
Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
group by Country, DepartmentId
having COUNT(EmployeeName)=1
order by TotalSalary
Table Script
/****** Object: Table [dbo].[Department] Script Date: 09/22/2020 6:06:10 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Department](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Department] [nvarchar](max) NULL,
CONSTRAINT [PK_Department] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
)
GO
/****** Object: Table [dbo].[TblOrganizationInfo] Script Date: 09/22/2020 6:06:11 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[TblOrganizationInfo](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Location] [nvarchar](50) NULL,
[TravelAgent] [nvarchar](50) NULL,
[BookingAmount] [float] NULL,
CONSTRAINT [PK_TblOrganizationInfo] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
)
GO
SET IDENTITY_INSERT [dbo].[Department] ON
GO
INSERT [dbo].[Department] ([Id], [Department]) VALUES (1, 'Marketing')
GO
INSERT [dbo].[Department] ([Id], [Department]) VALUES (2, 'Research and Development')
GO
INSERT [dbo].[Department] ([Id], [Department]) VALUES (3, 'Accounting and Finance')
GO
SET IDENTITY_INSERT [dbo].[Department] OFF
GO
SET IDENTITY_INSERT [dbo].[TblOrganizationInfo] ON
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (1, 'France', 'Michael', 2000)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (2, 'USA', 'Jackson', 2200)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (3, 'UK', 'Jacob', 3000)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (4, 'India', 'Luke', 2500)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (5, 'USA', 'Michael', 1500)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (6, 'France', 'Jacob', 1000)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (7, 'India', 'Luke', 1200)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (8, 'USA', 'Jacob', 1700)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (9, 'UK', 'Michael', 1900)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (10, 'France', 'Jacob', 2100)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (11, 'USA', 'Luke', 2500)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (12, 'India', 'Michael', 3000)
GO
SET IDENTITY_INSERT [dbo].[TblOrganizationInfo] OFF
GO
Read Similar Articles
- Solve ValueKind = Object : {} When Using A dynamic Object In C#
- How we can reset second dropdown after changin first dropdown value in Flutter?
- Solved - Uncaught Runtime error: Cannot read properties of null (reading 'useRef')
- Free Online Sample Rest API URL For Testing -[No key Needed]
- Using VBA to get data from API with bearer token to Excel
- Difference between View and Indexed View or Materialized
- Solved Error : Could not load file or assembly 'System.Runtime.CompilerServices.Unsafe'