LINQ is actually a shortened form of Language Integrate Query. LINQ defines features that can be used to retrieve data from any type of data source. This is why LINQ is most important because data itself is the basic foundation of any program and using LINQ, data can be easily retrieve from different types of Data Sources.

How to Use Group by in Linq?

In this post, I will explain the GROUP BY clause in LINQ to SQL. Grouping is a powerful feature that organizes a collection into groups, where each group is associated with a key. If you're a beginner, don't worry, this post is for you.

I have created two tables: Employee and Department.

Employee Table

Id Name Country Address Salary Department
1 Du monde entier France 67, rue des Cinquante Otages 5000 1
2 Chop-suey Chinese France Hauptstr. 29 2500 2
3 France restauration USA 54, rue Royale 2000 3
4 Howard Snyder UK 2732 Baker Blvd. 2400 2
5 Carlos Hernández USA Carrera 22 con Ave. Carlos Soublette #8-35 3500 1
6 France restauration UK 54, rue Royale 3200 1
7 John Steel USA 2732 Baker Blvd. 2800 2
8 Renate Messner UK Carrera 22 con Ave. Carlos Soublette #8-35 2700 3
9 Horst Kloss UK Rua da Panificadora, 12 2400 3
10 Guillermo Fernández France Carrera 22 con Ave. Carlos Soublette #8-35 3500 1
11 Georg Pipps France 54, rue Royale 3200 3
12 Isabel de Castro USA Av. del Libertador 900 2800 3
13 Bernardo Batista UK Grenzacherweg 237 2700 1

Department

Id Department
1 Marketing
2 Research and Development
3 Accounting and Finance

I want to calculate the total salary that I'm paying to employees by department. How can we achieve this using the GROUP BY clause? 

If we were to do it manually, we would take each record for the Marketing Department and then add up the total salary. Similarly, we would do the same for each record in the Research and Development department. Essentially, we first group the records by department and then sum the salary column within each group. 

That's why we are going to use the GROUP BY clause.

Sql Query:

Select Department, SUM(Salary) as TotalSalary
from Employee
Group by Department

Linq Query:

var results = from r in Employees
                         group r by r.Department into gp
                         select new
                         {
                             DepartmentId = gp.Key,
                             TotalSalary = gp.Sum(a => a.Salary)
                         };

    foreach (var r in results)
            {
                Debug.Print(($@"""{r.DepartmentId}"" ""{r.TotalSalary}"""));
            }

Output:

Department TotalSalary
1 17900
2 7700
3 13100
3 13100

1. `from r in Employees`: This part of the query specifies the source of the data, which is the Employees collection.

2. `group r by r.Department into gp`: This groups the employee records (`r`) by their department (`r.Department`). The `into gp` part creates a new group variable named `gp` to store the grouped results.

3. `select new { DepartmentId = gp.Key, TotalSalary = gp.Sum(a => a.Salary) }`: This part of the query selects the result for each group. It creates a new anonymous object with two properties:
   - `DepartmentId`: This is assigned the key of the group, which represents the department.
   - `TotalSalary`: This calculates the sum of the salaries (`a.Salary`) within each group using the `Sum` method.

This LINQ query groups the employee records by department and computes the total salary for each department, returning the results as a collection of anonymous objects with the department ID and total salary.

C# Linq Group By on Multiple Columns

Let's extend our analysis to not only consider the department but also include the country. For instance, in France, we aim to calculate the total salary distributed across various departments such as Marketing, Research and Development, Accounting, and Finance. This requires grouping by multiple columns, specifically by Country and then by Department, to aggregate the total salary first by country and subsequently by department.

SQL Query:

Select Country, Department, SUM(Salary) as TotalSalary
from Employee
group by Country, Department

Linq Query:

var results = from r in Employees
                         group r by new { r.Department, r.Country } into gp
                         select new
                         {
                             DepartmentId = gp.Key.Department,
                             Country = gp.Key.Country,
                             TotalSalary = gp.Sum(a => a.Salary)
                         };

foreach (var r in results)
            {
                Debug.Print(($@"""{r.DepartmentId}"" ""{r.Country}"" ""{r.TotalSalary}"""));
            }

Output:

Country Department TotalSalary
France 1 8500
UK 1 5900
USA 1 3500
France 2 2500
UK 2 2400
USA 2 2800
France 3 3200
UK 3 5100
USA 3 4800
This C# LINQ query groups the employee records by both department and country, then calculates the total salary for each combination of department and country. 

1. `from r in Employees`: This specifies the source of the data, which is the Employees collection.

2. `group r by new { r.Department, r.Country } into gp`: This groups the employee records (`r`) based on a composite key consisting of both department and country. The `into gp` part creates a new group variable named `gp` to store the grouped results.

3. `select new { DepartmentId = gp.Key.Department, Country = gp.Key.Country, TotalSalary = gp.Sum(a => a.Salary) }`: This part of the query selects the result for each group. It creates a new anonymous object with three properties:
  • `DepartmentId`: This is assigned the department from the key of the group.
  • `Country`: This is assigned the country from the key of the group.
  • `TotalSalary`: This calculates the sum of the salaries (`a.Salary`) within each group using the `Sum` method.
Above LINQ query groups the employee records by both department and country, then computes the total salary for each combination of department and country, returning the results as a collection of anonymous objects with department ID, country, and total salary.

Use Multiple aggregate functions with order by

Let's illustrate this with an example. We're displaying the total salary by country and department, and suppose we also want to determine the total number of employees.

To find the total number of employees, we can utilize the `Count()` function. For instance, if we examine the Employee table and find 13 rows, we can retrieve the total count by using `Employee.Count()`, resulting in a count of 13. Additionally, we will utilize the `ORDER BY` clause in conjunction with the `GROUP BY` clause.

Sql Query:

Select Country, Department, SUM(Salary) as TotalSalary,COUNT(Name) as TotalEmployees
from Employee
group by Country, Department
order by TotalSalary

Linq Query:

var results = (from r in employees
                           group r by new { r.Department, r.Country } into gp
                         select new
                         {
                             DepartmentId = gp.Key.Department,
                             Country = gp.Key.Country,
                             TotalSalary = gp.Sum(item => item.Salary),
                             TotalEmployees = gp.Count()
                         }).OrderBy(a=>a.TotalEmployees);

foreach (var r in results)
            {
                Debug.Print(($@"""{r.Country}"" ""{r.DepartmentId}""  ""{r.TotalSalary}"" ""{r.TotalEmployees}"""));
            }

Output

Country Department TotalSalary TotalEmployees
UK 2 2400 1
France 2 2500 1
USA 2 2800 1
France 3 3200 1
USA 1 3500 1
USA 3 4800 2
UK 3 5100 2
UK 1 5900 2
France 1 8500 2
This C# LINQ query groups the employee records by both department and country, calculates the total salary and total number of employees for each combination of department and country, and then orders the results by the total number of employees. 

  • from r in employees: This specifies the source of the data, which is the employees collection.
  • group r by new { r.Department, r.Country } into gp: This groups the employee records (r) based on a composite key consisting of both department and country. The into gp part creates a new group variable named gp to store the grouped results.
  • select new { DepartmentId = gp.Key.Department, Country = gp.Key.Country, TotalSalary = gp.Sum(item => item.Salary), TotalEmployees = gp.Count() }: This part of the query selects the result for each group. It creates a new anonymous object with four properties:
  • DepartmentId: This is assigned the department from the key of the group.
  • Country: This is assigned the country from the key of the group.
  • TotalSalary: This calculates the sum of the salaries (item.Salary) within each group using the Sum method.
  • TotalEmployees: This calculates the total number of employees within each group using the Count method.
  • .OrderBy(a=>a.TotalEmployees): This part of the query orders the results by the total number of employees in ascending order.

How to apply the where clause with Group?

In the above scenario, we aim to filter the results obtained from the `GROUP BY` operation. Suppose we only want to consider the salaries of employees in the Research and Development department. To accomplish this filtering, we can utilize the `WHERE` clause, specifying that the department is equal to 2.

Sql Query:

Select Country, Department, SUM(Salary) as TotalSalary,COUNT(Name) as TotalEmployees
from Employee
where Department=2
group by Country, Department
order by TotalSalary

Linq Query:

var results = (from r in employees
                           where r.Department==2
                           group r by new { r.Department, r.Country } into gp
                           select new
                           {
                               DepartmentId = gp.Key.Department,
                               Country = gp.Key.Country,
                               TotalSalary = gp.Sum(item => item.Salary),
                               TotalEmployees = gp.Count()
                           }).OrderBy(a => a.TotalEmployees);

foreach (var r in results)
            {
                Debug.Print(($@"""{r.Country}"" ""{r.DepartmentId}""  ""{r.TotalSalary}"" ""{r.TotalEmployees}"""));
            }

Output:

Country Department TotalSalary TotalEmployees
UK 2 2400 1
France 2 2500 1
USA 2 2800 1

Use Min and Max aggregate functions with Group By

Lets’ say, I want the min salary that I’m paying to employees by Department.

Sql Query:

Select Department, MIN(Salary) as TotalSalary
from Employee
Group by Department

 

Linq Query:

var results = (from r in employees
                           group r by  r.Department into gp
                           select new
                           {
                               DepartmentId = gp.Key,
                               MinDepartmentSalary = gp.Min(item => item.Salary),
                           });

Output

Department MinDepartmentSalary
1 2700
2 2400
3 2000
The provided code snippet is a LINQ query in C# that operates on a collection of employees. This query is designed to group employees by their department and then, for each department, calculate the minimum salary. Here's a detailed breakdown of what each part of the query does:

  • from r in employees: This part starts the query by iterating over each element in the employees collection. Each element is referred to as r within the context of this query.
Grouping Operation:

  • group r by r.Department into gp: Here, the query groups the elements of the employees collection by the Department property. The result is a collection of groups, where each group (gp) represents a unique department and contains all employees belonging to that department. The gp.Key represents the unique department identifier for each group.

Lets’ say, I want the max salary that I’m paying to employees by Department.

Sql Query:


Select Department, Max(Salary) as MinDepartmentSalary
from Employee
Group by Department

Linq Query:

 var results = (from r in employees
                           group r by  r.Department into gp
                           select new
                           {
                               DepartmentId = gp.Key,
                               MaxDepartmentSalary = gp.Max(item => item.Salary),
                           });

foreach (var r in results)
            {
                Debug.Print(($@"""{r.DepartmentId}""  ""{r.MaxDepartmentSalary}"""));
            }

Output

Department MaxDepartmentSalary
1 5000
2 2800
3 3200
  • from r in employees: This specifies the source of the data, which is the employees collection.
  • group r by r.Department into gp: This groups the employee records (r) based on the department they belong to. The into gp part creates a new group variable named gp to store the grouped results.
  • select new { DepartmentId = gp.Key, MaxDepartmentSalary = gp.Max(item => item.Salary) }: This part of the query selects the result for each group. It creates a new anonymous object with two properties:
  • DepartmentId: This is assigned the department (represented by the key of the group).
  • MaxDepartmentSalary: This calculates the maximum salary within each department using the Max method.
This LINQ query groups the employee records by department and calculates the maximum salary for each department, returning the results as a collection of anonymous objects with department ID and maximum department salary.