Should you return IQueryable<T> from Your Repositories?

The internet has been abuzz for a long time with fierce debates about whether or not you should return IQueryable<T> types from your repositories. I do have a fairly strong opinion on this issue so I figured I would weigh in on the discussion. First let's take a look at what they are talking about. We'll say they are using Entity Framework for the sake of my examples but they could just as easily be using LINQ to SQL.

Why would someone want to return IQueryable<T>?

Well it is true that IQueryable<T> is extremely powerful because any LINQ written against an IQueryable collection is supposed to be translated into an expression tree which is then translated into the data store's native language, such as SQL. This means that you only get back the data you query for rather than having all the data returned and then filtered down in memory on the server. Take a look at this quick example of a repository that returns IQueryable.

public interface IBookRepository
{
    IQueryable<Book> Books { get; }
    void AddBook(Book book);
    void DeleteBook(Book book);
    void SaveChanges();
}

public class BookRepository : IBookRepository
{
    private Entities _objectContext;
    public IQueryable<Book> Books { get { return _objectContext.Books; } }

    public BookRepository()
    {
        _objectContext = new Entities();
    }

    public void AddBook(Book book)
    {
        _objectContext.Books.AddObject(book);
    }

    public void DeleteBook(Book book)
    {
        _objectContext.Books.DeleteObject(book);
    }

    public void SaveChanges()
    {
        _objectContext.SaveChanges();
    }
}

The thing people love about this interface is that you can get books however you want.

IBookRepository bookRepo = new BookRepository();
var booksByRowling = bookRepo.Books.Where(x => x.Author == "J.K. Rowling");

You are in complete control over how the data is retrieved which at first seems really awesome, but I'll show you in a minute why I don't think it's a good practice.

Why I don't think you should use IQueryable<T> outside of your repository.

I don't believe that you should return IQueryable for several reasons. First, it breaks unit testing capabilities. Half the point of compartmentalizing your data persistence logic into repositories is so that you can isolate and test that logic. If you're passing the buck up to the domain logic level then you're no longer able to test data persistence logic and domain logic by itself. In my opinion it completely breaks separation of concerns. You can't unit test a property of type IQueryable because you couldn't possibly account for all the logic possibilities that could be handed off from your domain logic. The person writing the domain logic is completely responsible for writing sound queries against the database! That's just absurd in my humble opinion.

Straight away the inability to properly test a component of your application should be reason enough to steer clear. However, there is another major issue that I haven't really seen discussed elsewhere on the web in relation to this debate. Returning IQueryable from your repository automatically forces the domain logic to make several assumptions. As I sort of mentioned before, the first assumption is that the domain is responsible for writing query logic. If you divide up the pieces of your application among different developers then those developers working on domain logic are forced to write code against that IQueryable interface in order to retrieve the data they want. The second assumption it makes is that the data store will ALWAYS have to know how to properly implement IQueryable. If you ever try to re-factor your repository layer to use a different data storage technique then you will likely also have to re-factor your domain logic and/or write a new repository layer that exposes entity collections as IQueryable. If your new data store doesn't understand expression trees out of the box then you're in for a world of hurt.

Aside from test-ability the other major reason for implementing the repository pattern is to create an application seam. The repository layer is supposed to hide the data persistence code from the rest of the application. The domain logic is ideally supposed to talk to the repository layer through interfaces and remain ignorant of what is actually going on behind those interfaces. This enables you to re-write your interfaces at any time to use a completely different data persistence technology. By returning IQueryable you automatically tie your repository layer to any data persistence technology that properly implements IQueryable. Yes technically you could implement IQueryable yourself and attempt to translate the LINQ written against it into something usable for whatever data persistence tech you're using, but that's a huge commitment and far from fool-proof. If you still decide to return IQueryable then you may as well resolve yourself to the fact that you will be stuck using either LINQ to SQL or Entity Framework because as of this writing I am not aware of any other tech that properly implements IQueryable.

The proper way to do it.

I definitely understand the desire to return such a useful and powerful type. But for the sake of test-ability and maintainability I highly advise against it. Instead you should be exposing necessary functions through your repository interfaces. Make your methods more verbose so that writing domain logic that uses it is simple and easy. For example, instead of writing a method called GetBooks write a method called GetBooksByAuthor(string author) and another called GetBooksByDateRange(DateTime startDate, DateTime endDate), etc. Yes the method names get larger and yes you will be writing more methods, but when you think about it from the perspective of maintainability it becomes pretty clear that it's the right way to go.

Many people have argued that IQueryable<T> makes it so easy to do paging because you can just do var pageOfBooks = bookRepo.Books.skip(itemsPerPage * (page-1)).Take(itemsPerPage); in your domain logic. It's not that much harder to write a method on your repository and implement that paging logic there.

public IList<Book> GetBooksByPage(int page, int itemsPerPage)
{
    return _objectContext.Books.Skip(itemsPerPage * (page-1)).Take(itemsPerPage).ToList();
}

You still get the benefit of having that paging executed in the data store so you're not returning superfluous rows that you're not even going to use, but it's compartmentalized in a repository method so that it's hidden from the rest of your application.

The counter-argument to this is a scenario where you need a page of results but you also need the count for how many items there are total (usually for calculating the maximum number of pages). To do this people often think they need two methods. One method to get the paged result set and another method to get the item count. If you did have to use two methods for that it does seem a little silly, but you don't. See my post about how to properly return a paged result set.

A related discussion is whether or not you should return IEnumerable<T> or IList<T>. Personally I think it really depends. IEnumerable is usually used for lazy loading, which means that the collection isn't populated until you iterate over it (such as in a foreach loop or by using certain LINQ methods like ToList()). This means that if you loop over data in an MVC view then the data is being retrieved at the time it's being rendered to the view. There are definitely edge-cases where this is useful, such as when accessing a navigation property on an entity object, but it's not really useful in the verbose repository pattern that I'm promoting. This is because I like my repository methods to be very specific which almost always means that whatever data I'm getting back is going to be used because I'm calling specific methods to get specifically what I want, instead of calling generic methods and then filtering through the results. So I always return IList and call the ToList() LINQ method after my query results in my repository method so that the query is executed and the data is loaded immediately. By the time my data hits my domain it is already in memory waiting to be consumed.

The bottom line is that returning IEnumerable is questionable at best, but returning IQueryable completely ruins your test-ability and destroys separation of concerns.