Django Group By – Shishir Kant Singh

Introduction to the Django Group By

The SQL GROUP BY clause groups the rows returned by a query into groups. Typically, you use aggregate functions like count, min, max, avg, and sum with the GROUP BY clause to return an aggregated value for each group.

Here’s the basic usage of the GROUP BY clause in a SELECT statement:

SELECT column_1, AGGREGATE(column_2)
FROM table_name
GROUP BY column1;Code language: SQL (Structured Query Language) (sql)

In Django, you can you the annotate() method with the values() to apply the aggregation on groups like this:

(Entity.objects
    .values('column_2')
    .annotate(value=AGGREGATE('column_1'))
)Code language: Python (python)

In this syntax;

values('column_2') – pass the column that you want to group to the values() method.
annotate(value=AGGREGATE('column_1')) – specify what to aggregate in the annotate() method.

Notice that the order of calling values() and annotates() matter. If you do not call the values() method first and annotate() second, the expression won’t produce aggregate results.

Django Group By examples

We’ll use the Employee and Department models from the HR application for the demonstration. The Emloyee and Department models map to the hr_employee and hr_department tables in the database:

1) Django Group By with Count example

The following example uses the values() and annotate() method to get the number of employees by department:

>>> (Employee.objects
...     .values('department')
...     .annotate(head_count=Count('department'))
...     .order_by('department')
...  )
SELECT "hr_employee"."department_id",
       COUNT("hr_employee"."department_id") AS "head_count"
  FROM "hr_employee"
 GROUP BY "hr_employee"."department_id"
 ORDER BY "hr_employee"."department_id" ASC
 LIMIT 21
Execution time: 0.001492s [Database: default]
<QuerySet [{'department': 1, 'head_count': 30}, {'department': 2, 'head_count': 40}, {'department': 3, 'head_count': 28}, {'department': 4, 'head_count': 29}, {'department': 5, 'head_count': 29}, {'department': 6, 'head_count': 30}, {'department': 7, 'head_count': 34}]>Code language: Python (python)

How it works.

First, group the employees by department using the values() method:

values('department')Code language: Python (python)

Second, apply the Count() to each group:

annotate(head_count=Count('department'))Code language: Python (python)

Third, sort the objects in the QuerySet by department:

order_by('department')Code language: Python (python)

Behind the scenes, Django executes the SELECT statement with the GROUP BY clause:

SELECT "hr_employee"."department_id",
       COUNT("hr_employee"."department_id") AS "head_count"
  FROM "hr_employee"
 GROUP BY "hr_employee"."department_id"
 ORDER BY "hr_employee"."department_id" ASC
 LIMIT 21
Code language: SQL (Structured Query Language) (sql)

2) Django Group By with Sum example

Similarly, you can use the Sum() aggregate to calculate the total salary of employees in each department:

>>> (Employee.objects
...     .values('department')
...     .annotate(total_salary=Sum('salary'))
...     .order_by('department')
...  )
SELECT "hr_employee"."department_id",
       SUM("hr_employee"."salary") AS "total_salary"
  FROM "hr_employee"
 GROUP BY "hr_employee"."department_id"
 ORDER BY "hr_employee"."department_id" ASC
 LIMIT 21
Execution time: 0.000927s [Database: default]
<QuerySet [{'department': 1, 'total_salary': Decimal('3615341.00')}, {'department': 2, 'total_salary': Decimal('5141611.00')}, {'department': 3, 'total_salary': Decimal('3728988.00')}, {'department': 4, 'total_salary': Decimal('3955669.00')}, {'department': 5, 'total_salary': Decimal('4385784.00')}, {'department': 6, 'total_salary': Decimal('4735927.00')}, {'department': 7, 'total_salary': Decimal('4598788.00')}]>
Code language: Python (python)

3) Django Group By with Min, Max, and Avg example

The following example applies multiple aggregate functions to groups to get the lowest, average, and highest salary of employees in each department:

>>> (Employee.objects
...     .values('department')
...     .annotate(
...         min_salary=Min('salary'),
...         max_salary=Max('salary'),
...         avg_salary=Avg('salary')
...     )
...     .order_by('department')
...  )
SELECT "hr_employee"."department_id",
       MIN("hr_employee"."salary") AS "min_salary",
       MAX("hr_employee"."salary") AS "max_salary",
       AVG("hr_employee"."salary") AS "avg_salary"
  FROM "hr_employee"
 GROUP BY "hr_employee"."department_id"
 ORDER BY "hr_employee"."department_id" ASC
 LIMIT 21
Execution time: 0.001670s [Database: default]
<QuerySet [{'department': 1, 'min_salary': Decimal('45427.00'), 'max_salary': Decimal('149830.00'), 'avg_salary': Decimal('120511.366666666667')}, {'department': 
2, 'min_salary': Decimal('46637.00'), 'max_salary': Decimal('243462.00'), 'avg_salary': Decimal('128540.275000000000')}, {'department': 3, 'min_salary': Decimal('40762.00'), 'max_salary': Decimal('248265.00'), 'avg_salary': Decimal('133178.142857142857')}, {'department': 4, 'min_salary': Decimal('43000.00'), 'max_salary': 
Decimal('238016.00'), 'avg_salary': Decimal('136402.379310344828')}, {'department': 5, 'min_salary': Decimal('42080.00'), 'max_salary': Decimal('246403.00'), 'avg_salary': Decimal('151233.931034482759')}, {'department': 6, 'min_salary': Decimal('58356.00'), 'max_salary': Decimal('248312.00'), 'avg_salary': Decimal('157864.233333333333')}, {'department': 7, 'min_salary': Decimal('40543.00'), 'max_salary': Decimal('238892.00'), 'avg_salary': Decimal('135258.470588235294')}]>Code language: Python (python)

4) Django group by with join example

The following example uses the values() and annotate() methods to get the number of employees per department:

>>> (Department.objects
...     .values('name')
...     .annotate(
...         head_count=Count('employee')
...     )
...  )
SELECT "hr_department"."name",
       COUNT("hr_employee"."id") AS "head_count"
  FROM "hr_department"
  LEFT OUTER JOIN "hr_employee"
    ON ("hr_department"."id" = "hr_employee"."department_id")
 GROUP BY "hr_department"."name"
 LIMIT 21
Execution time: 0.001953s [Database: default]
<QuerySet [{'name': 'Marketing', 'head_count': 28}, {'name': 'Finance', 'head_count': 29}, {'name': 'SCM', 'head_count': 29}, {'name': 'GA', 'head_count': 30}, {'name': 'Sales', 'head_count': 40}, {'name': 'IT', 'head_count': 30}, {'name': 'HR', 'head_count': 34}]>Code language: Python (python)

How it works.

values('name') – groups department by name.
annotate(headcount=Count('employee')) – counts employees in each department.

Behind the scenes, Django uses a LEFT JOIN to join the hr_department table with hr_employee table and apply the COUNT() function to each group.

Django group by with having

To apply a condition to the groups, you use the filter() method. For example, the following uses the filter() method to get the department with head counts are more than 30:

>>> (Department.objects
...     .values('name')
...     .annotate(
...         head_count=Count('employee')
...     )
...     .filter(head_count__gt=30)
...  )
SELECT "hr_department"."name",
       COUNT("hr_employee"."id") AS "head_count"
  FROM "hr_department"
  LEFT OUTER JOIN "hr_employee"
    ON ("hr_department"."id" = "hr_employee"."department_id")
 GROUP BY "hr_department"."name"
HAVING COUNT("hr_employee"."id") > 30
 LIMIT 21
Execution time: 0.002893s [Database: default]
<QuerySet [{'name': 'Sales', 'head_count': 40}, {'name': 'HR', 'head_count': 34}]>
Code language: Python (python)

Behind the scenes, Django uses the HAVING clause to filter the group based on the condition that we pass to the filter() method:

SELECT "hr_department"."name",
       COUNT("hr_employee"."id") AS "head_count"
  FROM "hr_department"
  LEFT OUTER JOIN "hr_employee"
    ON ("hr_department"."id" = "hr_employee"."department_id")
 GROUP BY "hr_department"."name"
HAVING COUNT("hr_employee"."id") > 30
Code language: Python (python)

Summary

Use values() and annotate() method to group rows into groups.
Use filter() to add conditions to filter groups.