What is Entity Framework and How Does it Support Bulk Updates?

Share on FacebookTweet about this on TwitterShare on LinkedIn

Share on FacebookTweet about this on TwitterShare on LinkedIn

One frequently asked question about Entity Framework and other Object-Relational Mapping (ORM) libraries is how to perform bulk updates of records in an efficient manner, without unnecessary overhead or excessive database calls. This article discusses how to insert or update data in the database using the built-in capabilities of Microsoft’s Entity Framework library for .NET, as well as using a third party library which I will introduce later. For purposes of this article, I will assume you have a basic working knowledge of LINQ, LINQ to SQL, or LINQ to Entities. I will be using the Northwind database schema.

what-is-entity-framework-and-how-does-it-support-bulk-updates

What is Entity Framework?

According to the Microsoft Developer Network (MSDN), “Entity Framework (EF) is an object-relational mapper that enables .NET developers to work with relational data using domain-specific objects.”  Essentially, it is a layer between your application code and your database which maps your C# classes to database tables. Previous to Entity Framework, you would have used ADO.Net to open a database connection, create a command object, execute a SQL statement, iterate through the results and create your C# class instances (objects) manually. Entity Framework builds on top of ADO.Net and takes care of all of that for you.

How do I update individual records using Entity Framework?

Updating individual records in Entity Framework is straightforward and relatively simple: create a context, load the record to be updated, update the properties, and save the changes. The following method marks a single product as discontinued, based on the product ID:

             public void DiscontinueProduct(int productID)
        {
            using (NorthwindContext context = new NorthwindContext())
            {
                Product product = context.Products.Single(x => x.ProductID == productID);
                product.Discontinued = true;
                context.SaveChanges();
            }
        }

As you can see, I’m using a simple LINQ query to get a single product with a ProductID matching the productID parameter of the method, setting the Discontinued flag, and saving the changes. The overhead of this is a little bit more than if I were using ADO.Net, though. In this case, I have to execute one database call to retrieve the product, and one database call to update the data. If I were using ADO.Net I would just write a single Update statement.

How do I update multiple records using Entity Framework?

Let’s say I wanted to discontinue all products which are out of stock and have no units on order, instead of just one. Using only the built-in Entity Framework functionality, I would have to do something like this:

        public void DiscontinueProducts()
        {
            using (NorthwindContext context = new NorthwindContext())
            {
                IQueryable toDiscontinue = context.Products
                    .Where(x => x.UnitsInStock == 0 && x.UnitsOnOrder == 0);
                foreach (Product product in toDiscontinue)
                {
                    product.Discontinued = true;
                }
                context.SaveChanges();
            }
        }

This LOOKS pretty simple – I’m iterating through the context. Products set, setting Discontinued for each product, and saving changes once. This should be pretty optimal, right? Unfortunately, it’s not.  Behind the scenes, Entity Framework is executing one SQL ‘select’ call to retrieve all the products, but when you call SaveChanges, it is executing one SQL ‘update’ call per product. If you were to run SQL Profiler while executing this code, you would see there are a total of 5 SQL statements executed – one to retrieve the data, and four to update each of the four records in the Products table which match the search criteria. Using ADO.Net, I would execute a single update statement: “update products set discontinued=1 where UnitsInStock=0 and UnitsOnOrder=0”. As the number of out-of-stock products in the Products table grows, the number of SQL statements executed grows.

This demonstrates a very common problem with Entity Framework and other object-relational mapping libraries.They’re great for retrieving data, but when it comes to making updates, they often lose efficiency. A few years ago, convinced that there must be a better way, I spent some time searching online for anyone who had solved this problem. After hours of searching, the only thing I had found was that everyone agreed this was a problem, but nobody had a solution. Finally, six months later, I found someone who had created an initial solution.

Updating multiple records using EntityFramework.Extended

Paul Welter of LoreSoft.com created a library called EntityFramework.Extended which allows you to (among other features not covered by this article) perform batch Update and Delete statements using a single SQL call. The same DiscontinueProducts method as above could be rewritten as:

        public void BulkDiscontinueProducts()
        {
            using (NorthwindContext context = new NorthwindContext())
            {
                context.Products
                    .Where(x => x.UnitsInStock == 0 && x.UnitsOnOrder == 0)
                    .Update(x => new Product() { Discontinued = true });
            }
        }

Notice there is now only a single method called against the context, and no call to context.SaveChanges(). With SQL Profiler running, you would see only a single call to the database:

exec sp_executesql N'UPDATE [dbo].[Products] SET
[Discontinued] = @p__update__0
FROM [dbo].[Products] AS j0 INNER JOIN (
SELECT
[Extent1].[ProductID] AS [ProductID]
FROM [dbo].[Products] AS [Extent1]
WHERE (0 =  CAST( [Extent1].[UnitsInStock] AS int)) AND (0 =  CAST( [Extent1].[UnitsOnOrder] AS int))
) AS j1 ON (j0.[ProductID] = j1.[ProductID])',N'@p__update__0 bit',@p__update__0=1

While this isn’t exactly what we would write if we were doing this by hand, it is far better than what Entity Framework would do on its own. We’re only making a single call to SQL Server, which is good, and no matter how many records are being updated, the number of calls to SQL Server never increases, which is fantastic!

Behind the scenes, the EntityFramework.Extended library is using reflection to dig into the private members of the Entity Framework classes to retrieve the ORM mapping information. It’s then using this to construct a nested SQL statement, one part to collect the set of records to be updated, and the other part to actually perform the update. Because EntityFramework.Extended is creating this nested SQL statement, the LINQ statement can be as complex as you want it to be – as long as the end result (before calling .Update()) is an IQueryable. The the call to .Update can take various forms, but the method I’m using here takes an expression containing a Product initializer with the fields to be updated.

EntityFramework.Extended also has a bulk Delete, which is very similar to the bulk Update but instead of calling .Update(…..) you just call .Delete() with no arguments.

Inserting Multiple Records using EntityFramework.Extended

Unfortunately EntityFramework.Extended does not yet have a bulk Insert method; however, I have written one based on Paul’s excellent foundation code. I wrote two methods – BulkInsert and InsertFrom.  BulkInsert takes a List of objects of type T and executes one or more SQL bulk insert statements.  Due to restrictions of ADO.Net, if you are inserting a very large number of records, the BulkInsert may have to break it up into smaller chunks. The number of chunks depends on the size (number of fields and data size) of the records you are inserting. InsertFrom constructs an “insert into X select … from Y” statement, to insert records into one table based on a Select from another set of tables. The InsertFrom method does not have the same restrictions as BulkInsert, though it does require that the origin data already be in tables in your database. Because InsertFrom executes entirely on the database server, it can be extremely efficient.

Entity Framework Future Roadmap

I’ve looked at the roadmap for Entity Framework, and I don’t see anything on the immediate horizon to provide these capabilities as built-in features, but hopefully the folks on the Entity Framework team will take the work Paul and I have done and incorporate it, or something like it, into some future version.


Share on FacebookTweet about this on TwitterShare on LinkedIn

About the Author

Mark Shapiro is Segue's Senior Software Architect and has been with the company since 1998. He has a Master's Degree in Software Engineering and over 20 years of work experience in a wide array of languages including Perl, PHP, Java, Delphi (Object Pascal), and C#, and once helped build the network for a major telephone company's dialup ISP. In his spare time he enjoys scuba diving and playing with his Alaskan Malamute. Read more from Mark Shapiro