Saturday, February 20, 2016

Django prefetch_related VS select_related in a single database query

Imagine the following model:
class Parent(Model):
    ...

class Child(Model)
    parent = ForeignKey(Parent)
    ...

I want to list all parents along with first related child.

If I use prefetch_related method:
Parent.objects.prefetch_related('child_set')
It gives me all children instead of just one. Moreover it hits database two times, which is not efficient:
SELECT "parent"."id" FROM "parent";
SELECT "child"."id", "child"."parent_id" FROM "child" WHERE "child"."parent_id" IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27);

Finally I ended up using another approach, which allows to fetch all data in a single query. Seems there is no support in django 1.9.2 for a simple left outer join (select all parents and join children). Instead I will do the opposite: select all first children and join parents.

Single query approach using select_related and distinct:
Child.objects.order_by('parent', 'pk').select_related('parent').distinct('parent')
Here I order children by 'pk' and then distinct gives me only first child in a particular parent group. But note that order_by must also have 'parent' as a first parameter, this is because we do 'distinct' by 'parent'.
SELECT DISTINCT ON ("child"."parent_id") "child"."id", "child"."parent_id", "parent"."id" FROM "child" INNER JOIN "parent" ON ("child"."parent_id" = "parent"."id") ORDER BY "child"."parent_id" ASC, "child"."id" ASC;
This approach works as long as every parent has at least one child.

No comments:

Post a Comment