Performing Bulk Updates in Entity Framework 6.1+ Part 1

Written by Mark Shapiro on January 26, 2015

In a previous article I discussed performing bulk updates with Entity Framework. At the time, I was using Entity Framework 5, and referenced a third party library called “EntityFramework.Extended”. That library still works up to and including Entity Framework 6.0. However, the latest releases use a different internal structure for mapping, and so EntityFramework.Extended no longer works on their versions.

In this three part series, I will show you how to insert or update data in the database using the built-in capabilities of Microsoft’s Entity Framework 6.1 library for .NET using some additional code which will be included in this article. For purposes of this article I will assume you have a basic working knowledge of LINQ, LINQ to SQL, or LINQ to Entities.

What is Entity Framework 6.1?

According to MSDN, “Entity Framework (EF) is an object-relational mapper that enables .NET developers to work with relational data using domain-specific objects.” Essentially, it is a layer between your application code and your database which maps your C# classes to database tables. Previous to Entity Framework, you would have used ADO.Net to open a database connection, create a command object, execute a SQL statement, iterate through the results and create your C# class instances (objects) manually. Entity Framework builds on top of ADO.Net and takes care of all of that for you. Entity Framework 6.1, in particular, introduced a Mapping API which exposes the internal object-to-entity mapping schema.

How Do I Update Individual Records Using Entity Framework?

Updating individual records in Entity Framework is straightforward and relatively simple: create a context, load the record to be updated, update the properties, and save the changes. The following method marks a single product as discontinued, based on the product ID:As you can see, I’m using a simple LINQ query to get a single product with a ProductID matching the productID parameter of the method, setting the Discontinued flag, and saving the changes. The overhead of this is a little bit more than if I were using ADO.Net, though. In this case I have to execute one database call to retrieve the product, and one database call to update the data. If I were using ADO.Net I would just write a single Update statement.

How Do I Update Multiple Records Using Entity Framework?

Let’s say I wanted to discontinue all products which are out of stock and have no units on order, instead of just one. Using only the built-in Entity Framework functionality, I would have to do something like this:This LOOKS pretty simple – I’m iterating through the context.Products set, setting Discontinued for each product, and saving changes once. This should be pretty optimal, right? Unfortunately it’s not. Behind the scenes, Entity Framework is executing one SQL ‘select’ call to retrieve all the products, but when you call SaveChanges, it is executing one SQL ‘update’ call per product. If you were to run SQL Profiler while executing this code against the Northwind database, you would see there are a total of five SQL statements executed – one to retrieve the data, and four to update each of the four records in the Products table, which match the search criteria. Using ADO.Net I would execute a single update statement, “update products set discontinued=1 where UnitsInStock=0 and UnitsOnOrder=0”. As the number of out of stock products in the Products table grows, the number of SQL statements executed grows.

This demonstrates a very common problem with Entity Framework and other object-relational mapping libraries. They’re great for retrieving data, but when it comes to making updates they often lose efficiency. A few years ago, convinced that there must be a better way, I spent some time searching online for anyone who had solved this problem. After hours of searching, the only thing I had found was that everyone agreed this was a problem, but nobody had a solution. Finally, six months later, I found someone who agreed that this was a problem and created his own solution.

Updating Multiple Records Using EntityFramework.Extended

Paul Welter of LoreSoft.com created a library called EntityFramework.Extended which allows you to (among other features not covered by this article) perform batch Update and Delete statements using a single SQL call. The same DiscontinueProducts method as above could be rewritten as:Notice there is now only a single method called against the context, and no call to context.SaveChanges(). With SQL Profiler running, you would see only a single call to the database:While this isn’t exactly what we would write if we were doing this by hand, it is far better than what Entity Framework would do on its own. We’re only making a single call to SQL Server, which is good, and no matter how many records are being updated, the number of calls to SQL Server never increases, which is fantastic.

EntityFramework.Extended also has a bulk Delete, which is very similar to the bulk Update but instead of calling .Update(…..) you just call .Delete() with no arguments.

Why Did Entity Framework 6.1 Break EntityFramework.Extended?

The EntityFramework.Extended library used reflection to access internal (private and protected) properties and methods of the various EntityFramework classes. While this approach is technically sound for an individual version of the Framework, there is no guarantee that future versions will have the same internals. Indeed, there were subtle changes between EntityFramework 5.x and 6.0, which required a minor change to EntityFramework.Extended. The changes between 6.0 and 6.1 were more significant, however, and with the introduction of the Mapping API it seemed logical to switch from using Reflection to using the new API.

Using the Mapping API

Using the Mapping API, it is possible to write extension methods similar to those provided by EntityFramework.Extended, so that you can call IQueryable.Update(…) or IQueryable.Delete(…) and have EntityFramework generate a single SQL statement which affects multiple records. In the ideal situation you would not have to rely on reflection at all, however there are certain parts of this which I was unable to find a way to accomplish without still relying on reflection – in particular, getting at the underlying SQL statement of the query, including parameters and values. In this article, I will focus on Delete(…), as that is the simpler of the two. The next article will focus on Update, and the third article in the series will cover two different types of bulk inserts. All of the public methods presented here are extension methods on either DbContext or an IQueryable generated from a DbContext.

Delete Signatures

My implementation of Delete has two signatures:

1. Call Delete from the DbContext, passing in the same sort of lambda expression you would pass into IQueryable.Where(expression)2. Call Delete directly on an IQueryable, with the caveat that the IQueryable must have originated from a DbContext or ObjectContext

You’ll notice that the internals of both methods are very similar. In the first case, we call context.Set().Where(expression) to get an IQueryable and then it looks an awful lot like the second case. In both cases, we convert the IQueryable into an ObjectQuery and we call a private implementation of Delete which does the actual work.

Converting IQueryable to ObjectQuery

Later in the blog, we will be getting the underlying SQL statement for the query. On the face of this, it should be pretty simple. IQueryable.ToString() returns the SQL statement. Unfortunately, it does not return the parameters. Those are easy to get, however, by converting the IQueryable into an ObjectQuery and calling ObjectQuery.ToTraceString().

Unfortunately I haven’t been able to find a way to convert an IQueryable to an ObjectQuery without resorting to Reflection. Assuming the IQueryable is really a DbQuery , it has a private InternalQuery property which is an internal type used by LINQ. That internal type has an ObjectQuery property, which is (not surprisingly) of type ObjectQuery.

Delete Internals

The internal implementation of Delete calls the Entity Framework Mapping API to determine the table name and primary key, and uses a little bit of reflection to get the SQL statement and parameters for the From/Join part of the query.

The end result is a DELETE statement which deletes the records selected by either the IQueryable or the Expression<Func<TEntity, bool>> you used for the .Delete(…) call.

Getting Mapping Information

GetMapping(…) wraps the Entity Framework Mapping API into something a little more manageable so I don’t have to traverse through the API every time I want to get some information. It also has some caching built in so that once I’ve retrieved the mapping information for a given TEntity, I don’t have to access the API again for that TEntity.

The getEntitySet(…) method is the one that does the actual heavy lifting of going through the API to find the EntitySet to which the TEntity is mapped. I cobbled getEntitySet(…) together from several other articles, StackOverflow questions, and other online sources, each of which contributed a piece (but not the entirety) to the puzzle.

This is all somewhat complicated, and I’m going to have to ask you to take it as a matter of faith that it’s somewhat correct. I’ve tested it in a couple of different scenarios and it works, though I will freely admit that there may be scenarios in which it doesn’t.

The GetSelectSQL Method

The final piece of the puzzle is the GetSelectSQL method, which takes the ObjectQuery and turns it into a parameterized SQL statement. The code that follows uses Microsoft’s LINQ Dynamic Query library described in Scott Guthrie’s blog.

Putting it all together, we start with either an Expression or an IQueryable, step down to the underlying ObjectQuery, use the Entity Framework Mapping API to retrieve table and primary key information for TEntity, and generate and execute a DELETE statement.

Similar code can be written to handle Bulk Update and Bulk Insert, though at the time this article was written, I had not yet written them. When I do, I’ll update this article to include those methods.

Caveat Lector

While this all sounds great, there are a few catches:

1. The Mapping API has a number of issues which make it difficult to use both now and in the future. The API isn’t well documented, nor is it very well self-documented and it can be very difficult to figure out what you’re looking at, what methods you need to call, what IEnumerables you need to search, and exactly which property of which object has the specific information you are looking for. Furthermore, the Entity Framework team is making significant changes in the next major version (Entity Framework 7) which may make significant changes to the Mapping API. Normally an API is a contract and developers can expect the API to remain consistent (or expand) from version to version. In this case, that may not be true.

2. If you use inheritance hierarchies in your entities, this may not work for all implementations. It should work fine for Table per Hierarchy (TPH) and Table per Concrete Class (TPC) as both of these implementations have all the data for a given TEntity stored in a single table (multiple TEntity sharing a table for TPH, each TEntity with its own table for TPC.)

If you’re using Table Per Type (TPH) you may run into some limitations. I have not tested any of this with a TPH implementation, but I imagine that if you call the Delete on a child class, it will not work. If you call Delete on the root class and have cascading deletes on your foreign keys in the database, it should work fine

3. The DbContext has an in-memory cache of entities it has loaded from the database, along with state information about each entity. This is what allows you to load records, make changes to the POCOs, and call DbContext.SaveChanges() to commit those changes to the database. Neither EntityFramework.Extended nor the Mapping API approach described above do anything to the DbContext state. If you update or delete records in the database, the DbContext will not know about those changes and certain queries thereafter may return invalid data from the cache.

My typical pattern is to make sure that I always call SaveChanges() if I’ve made any entity changes, BEFORE calling Update(…) or Delete(…). I also tend to keep a DbContext around only as long as I need it, so that each new business transaction (user interaction, service request, etc.) uses a new, clean context.

Entity Framework Future Roadmap

I’ve looked at the roadmap for Entity Framework and I don’t see anything on the immediate horizon to provide these built-in bulk Update, Delete, and Insert features, but hopefully the folks on the Entity Framework team will get to that eventually.

Need Help? Contact us