Testing to see if a record Exists in LINQ to SQL
There are a number of options you can consider when testing to see if a record exists using LINQ to SQL. Which one should you use? It depends… In general, check the generated SQL for various options in SQL Management Studio to see the how the various execution plans compare. For example, each of the following can tell you if a record exists.
Dim q1 = Customers.FirstOrDefault(Function(c) c.City="London")
Dim q2 = Customers.Count(Function(c) c.City="London")
Dim q3 = Customers.Any(Function(c) c.City="London")
If we take a look at the generated SQL, we'll see that these produce the following SQL Statements:
-- Region Parameters
DECLARE @p0 NVarChar(6) = 'London'
-- EndRegion
SELECT TOP (1) [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName],
[t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region],
[t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [Customers] AS [t0]
WHERE [t0].[City] = @p0
GO
-- Region Parameters
DECLARE @p0 NVarChar(6) = 'London'
-- EndRegion
SELECT COUNT(*) AS [value]
FROM [Customers] AS [t0]
WHERE [t0].[City] = @p0
GO
-- Region Parameters
DECLARE @p0 NVarChar(6) = 'London'
-- EndRegion
SELECT
(CASE
WHEN EXISTS(
SELECT NULL AS [EMPTY]
FROM [Customers] AS [t0]
WHERE [t0].[City] = @p0
) THEN 1
ELSE 0
END) AS [value]
Analyzing this in SQL Management Studio shows that the second and third options take roughly 30% of the total time where the first takes 40%. As a result, it would appear that the first last two options were faster than the first. Realizing that the first has to hydrate a full object (and throw it away), we can recognize that there is additional data being transferred across the wire which can slow things down as well.
That doesn't tell the whole story. In general, I would expect that the last option would perform best because SQL Server could stop as soon as it finds the first matching record, where-as, it has to process the entire list (or index) to do a count. Usually with large volumes of data, Exists will out perform Count.
If we check the overall performance of the second two options, we can see that typically the last option performs slower than the second. I suspect this is due to the time it takes to process the expression tree and generate the SQL for these two methods. Also, the underlying data (Northwind) doesn't have that many records so the processing time may be unrealistic as compared to results from a larger database.
Also consider whether you need to work with the results or just want to know if they exist. If you need to work with them once you've determined that they exist, then using the .Any/Exists option would cause you to need a separate query to fetch the actual objects. In that case, FirstOrDefault would be better as you only need a single query to the database.
In general, there is no silver bullet. You need to test your options and determine which is best for your current situation.



If you run this against the MSFT northwind sample db then the first query will most likely resort to a table scan due to the small number of records in the customers table. (<100 customers in northwind). This is because on small tables, SQL Server will go for table scans rather than index seek + a subsequent key lookup (double work).
With a bit more data, all three queries should do an index seek on the index on the city column, but the first query will also have to do a key lookup to get the actual record while the latter two only need to touch the index on the city column (since no records are returned by the select). This will give query #1 a slightly higher I/O cost, so as always it is a good idea to not query for more data than what is needed.
All in all, the cost of the last two queries should not differ db-side although as you say there may be a difference client-side in L2S.
...and it is always good to do this kind of tests with a not-too-small sample db - northwind etc can give deceptive results due to the small amount of data... :)
Kristofer (Posted on 9/13/2009 9:48:00 PM)