Left Outer Joins in LINQ with Entity Framework by ThinqLinq

Left Outer Joins in LINQ with Entity Framework

As I spend more time reviewing code with clients and on public forums, I’m constantly seeing cases where people have issues with Outer joins in LINQ and the various flavors of LINQ. In this post, I’m going to look at a couple options from a syntax perspective that you can use to make working with outer joins easier with LINQ. Naturally, we have a much deeper discussion of outer joins in our book that you’re welcome to dig into as well.

Typically, if you do a join in LINQ, it will perform an inner join where it only returns the records that contain results in both sides of the evaluation (in both tables). If a child table doesn’t have a record for a parent row, that result will be excluded. Consider the case in Northwind where an Order record exists, but there are no order detail lines. The following LINQ query won’t return that record as part of its result set:

from o in Orders
join od in OrderDetails on o.OrderID equals od.OrderID
select new {o, od}

This translates into the following SQL statement (Note to DBA’s that are concerned by the * below: EF does list the individual columns in the actual SQL. I’m reducing them to * for the purposes of this post to focus on the join clauses):

SELECT 
    [Extent1].*, [Extent2.*
FROM  [dbo].[Orders] AS [Extent1]
    INNER JOIN [dbo].[OrderDetails] AS [Extent2] ON [Extent1].[OrderID] = [Extent2].[OrderID]

If you want to include the orders regardless of whether it has any detail lines, you would need to turn this inner join into an outer join using the DefaultIfEmpty extension method. LINQ only supports left outer joins. If you want a right outer join, you need to flip the logic of your query to turn it into a left outer join. In order to use the DefaultIfEmpty, you typically need to push the joined set into a temporary value first and then select from it using the DefaultIfEmpty method:

from o in Orders
join innerOD in OrderDetails on o.OrderID equals innerOD.OrderID into Inners
from od in Inners.DefaultIfEmpty()
select new {o, od}

This generates the expected LEFT outer join as shown below:

SELECT 
    [Extent1].*, [Extent2].*
FROM  [dbo].[Orders] AS [Extent1]
    LEFT OUTER JOIN [dbo].[OrderDetails] AS [Extent2] ON [Extent1].[OrderID] = [Extent2].[OrderID]
The problem that I have with this is that the syntax seems overly verbose to accomplish this change. As is often the case, Microsoft often gives multiple ways to accomplish the same goal. One method that I’ve started to find helpful is to revert to more of an ANSI 82 style syntax where the joins were accomplished in a where clause instead of a join clause. By mixing and matching the LINQ query comprehensions and lambda syntax, we can restate the above query as follows:
from o in Orders
from od in OrderDetails
    .Where(details => o.OrderID == details.OrderID)
    .DefaultIfEmpty()
select new {o, od}

If we check the SQL, we can see that this generates the same SQL as the more verbose version above. Often when I’m doing this in application code, I’ll put the where and DefaultIfEmpty on the same line to let me focus on what I’m fetching from, not how I’m joining them unless I need to focus on that.

There is an issue with using this syntax when joining nullable values or strings. Since the where statement doesn’t know about the cardinality relationship (1-Many, 0-1 – Many, etc), Entity Framework adds an additional check where the nullable value is not null to allow for join cases where both sides have a null value. Changing our query to use Northwind’s Customers and Orders which joins on the string CustomerID values, we can write the following LINQ query which is nearly identical to the one we created before for Orders and OrderDetails:

from c in Customers
from o in Orders
    .Where (o => o.CustomerID == c.CustomerID)
    .DefaultIfEmpty()
select new {c, o};

This results in the following SQL statement.

SELECT 
    1 AS [C1], 
    [Extent1].*, [Extent2}.*
    FROM  [dbo].[Customers] AS [Extent1]
    LEFT OUTER JOIN [dbo].[Orders] AS [Extent2] ON ([Extent2].[CustomerID] = [Extent1].[CustomerID])
       AND ([Extent2].[CustomerID] IS NOT NULL)

Note that in this case, we have an additional clause checking for Extent2.CustomerID IS NOT NULL. This may seem innocuous, but I have found in at least one case for the query execution to be significantly slower due to the use of an index scan rather than an index seek caused by this clause. Here you have to be careful when crafting your queries and monitor performance even more carefully to avoid performance bottlenecks unnecessarily.

While this version works better, I still prefer to use associations rather than joins to think about the problem from more of an object graph perspective rather than set based operations. As long as you have a natural association between the entities I much prefer using the associations to navigate through than to have to worry about building out the joins manually each time. We can restate our join even more simply as follows using associations:

from o in Orders
from od in o.Order_Details.DefaultIfEmpty()
select new {o, od}

Note that if you omit the DefaultIfEmpty clause, you would get an Inner join rather than left outer join.

If you have other ways of creating outer joins in LINQ that you prefer, let me know what you thinq.

Posted on - Comment
Categories: LINQ - C# - Entity Framework -
comments powered by Disqus