Improve final DB query "size" #145
I was aware of this problem, but since it didn't affect my data model I put it in the "too hard for now" basket. I have been planning on cracking open the EF Core codebase to see if I can create a new query API that can be programmatically constructed based on strings rather than expressions. The current EF Core API is purely based on expressions, which makes sense for top-level code that has access to all the types, but it is horrible for anyone writing extensions. This has hit me a few times on different projects, and I was hoping to come up with a reusable solution that doesn't involve the complexity of programmatically creating expressions.
As far as creating the expressions for simple properties, that part is trivial: it's just a MemberInit expression with some Binds. Example: https://gist.github.com/ApocDev/5e6001d18083e78d0f852885ee99e797 The real work comes when you need to "subquery". I have run into an issue where I'm not sure how to get the "parent" from the selected field and continue building the sub-select expressions to build up navigation fields or collections. Lastly, custom resolvers are... difficult. Possibly https://github.com/StefH/System.Linq.Dynamic.Core would help to dynamically build the queries more easily?
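For the simple-property case mentioned above, a minimal sketch of the MemberInit-with-Binds approach might look like the following (the `Person`/`PersonDto` types and the `ProjectionBuilder` helper are illustrative assumptions, not taken from the gist):

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical entity and DTO types for illustration only.
class Person { public int Id { get; set; } public string Name { get; set; } }
class PersonDto { public int Id { get; set; } public string Name { get; set; } }

static class ProjectionBuilder
{
    // Builds: x => new PersonDto { Id = x.Id, Name = x.Name }
    // from a list of property names, using MemberInit + Bind.
    public static Expression<Func<Person, PersonDto>> Build(params string[] properties)
    {
        var param = Expression.Parameter(typeof(Person), "x");
        var bindings = new List<MemberBinding>();
        foreach (var name in properties)
        {
            var source = Expression.Property(param, name);     // x.Name
            var target = typeof(PersonDto).GetProperty(name);  // PersonDto.Name
            bindings.Add(Expression.Bind(target, source));     // Name = x.Name
        }
        var body = Expression.MemberInit(Expression.New(typeof(PersonDto)), bindings);
        return Expression.Lambda<Func<Person, PersonDto>>(body, param);
    }
}
```

A lambda built this way can be handed straight to `Queryable.Select`, so EF translates it into a column list rather than a `SELECT *`.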
I'd suggest using reflection to create an expression that is based on all the selected fields, using the resolve expression from the Field() method to populate the anonymous type (or object[] array, etc.). Then it can be passed to EF as a single Select(). The issue is that some resolve functions MUST run client side (vs. in SQL Server). EF Core currently allows for automatic detection of what needs to run client side, and EF Core 3.0 will also allow code in the Select expression to run client side. So this should be feasible.

This would be way cool, but it does seem like a lot of work. It would be particularly useful when adding specialized indexes for tables which have included columns. (Note: creating an index with included columns natively is a new feature of EF Core 2.2, although it can be done in prior versions manually.)

Here's another idea: what if you added an includedFields property to Field (call it AddField), and then upon execution, generated a Select function with only those fields populated? Then the resolve functions could still run client side. Sort of a middle-ground approach that would likely be easier to implement and more reliable than passing all the resolve functions to EF. Still have to figure out the subqueries...
@SimonCropp are you thinking about a GraphQL-to-SQL parser?
@fisapras nope. more like a "graphql to (EF compat) expression parser"
On this topic, I think trying the same approach AutoMapper uses to create the projection expression (designed for OData) could give some results. If I have some time, I'll test it.
anyone want to have a go at a PR for this?
I can't, but I will say that it is possible to dynamically create select arguments which create dictionaries for the selected members, although it would require a rewrite of the entire project except the argument-parsing code. Also, performance with EF 3.0 is poor with certain queries due to Cartesian results being returned. This is easier to patch with the existing codebase than with dynamically generated queries, due to EF auto-stitching tracked objects. No patching is necessary with EF 2.x because it splits up queries as necessary, albeit sometimes quite inefficiently.
ok closing this for now. always happy to consider PRs that improve the query size
I tried using |
@thesobercoder happy to consider a PR.
It can be written without |
@ApocDev did you get your gist working? I'm looking into this for my project and that approach looks pretty sound.
Any movement on this from the folks in the room closely familiar with EF Core's workings? This hit me hard in testing; pulling a single column from a few million rows on a wide table definitely made things slow down... Can the queries that this generates be set up for projection in some manner? Or, failing that, customized for certain tables to manually avoid this sort of issue?
If you want to work on this feature, I can help if you have questions. I guarantee you that it is possible to add this feature in the manner I described above. (I am unfamiliar with the AutoMapper technique, or whether it even works.)
Honestly, between not being deeply familiar with EF Core's innards and this project's code being somewhat obfuscated, I wouldn't want to work on it. I could dig into EF Core, but struggling through this codebase is my hard stop; I've already been referencing it to sort out some of its ambiguity and it's been... frustrating. (Not to rag on you Simon! Just calling out why it is undesirable; a clean and commented codebase is MUCH easier to contribute to.) Aside from that, projection isn't just an AutoMapper thing, it's a native EF thing!

```csharp
var authors = context.Authors
    .Select(author => new
    {
        Id = author.AuthorId,
        FullName = author.FirstName + " " + author.LastName,
    }).ToList();
```

This will only generate the query for the selected columns. Edit: This can also project to, say, a DTO rather than an anonymous type.
It's not so much about knowing EF Core as it is about working with expression trees. Once you complete the code it would work just as well with linq2db as it would with EF, as it's just iterating through the requested graph and building a
@Shane32 See my updated comment. EF already does this for you; that may influence the proposed solution?
The trick is getting EF to know which fields you want and don't want. If your graphs only use a small subset of the available database fields, you could just write a DbContext that only maps the fields you actually use. This would make it so that while it still returns all the fields for any table, "all" is only the small subset you're interested in.
Another thing: if you skip the ability to use "Includes", a main feature of this library, and instead always rely on dataloaders, then with a lot less work you can probably write some code that examines which fields are required and pulls only those fields. I haven't thought it all the way through, but it should be a lot less complex than a complete overhaul that builds expression trees.
That's a workable workaround. As for getting EF to know which fields you want, could we piggyback off of the projection behavior that already exists? I have never delved into this: can anonymous types be defined at runtime and then selected? Good callout that piggybacking off that pretty much eliminates the ability to perform
This isn't EF specific, mind you -- EF simply looks at the
Anonymous types cannot be defined at runtime. (Well, it's possible, but only by loading a dynamically built assembly at runtime. Not for the faint of heart.) For my implementation, I have it return a dictionary with keys as GraphQL field names. So when querying:
It produces the following select expression:
And then your field resolver pulls from the dictionary like this:
Of course this all gets built automatically, but that's what it boils down to.
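The code blocks this comment refers to did not survive in this copy of the thread. As a hedged reconstruction of the pattern being described (the GraphQL field names become dictionary keys; the `Person` type and field names here are illustrative only), the generated select expression is roughly equivalent to this hand-written one:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Illustrative entity type; not from the library.
class Person { public int Id { get; set; } public string Name { get; set; } }

class Sketch
{
    static void Main()
    {
        // For a query selecting { id name }, the select expression
        // maps each graph field name to the matching entity member:
        Expression<Func<Person, Dictionary<string, object>>> select =
            x => new Dictionary<string, object>
            {
                { "id", x.Id },
                { "name", x.Name },
            };

        // A field resolver then reads its value back out of the dictionary,
        // roughly: resolve: context => context.Source["name"]
        var row = select.Compile()(new Person { Id = 1, Name = "Ada" });
        Console.WriteLine(row["name"]); // prints "Ada"
    }
}
```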
If you requested |
> It sounds like a lot of extra CPU cycles happening.

That's true, but it can save a lot of time if you are using a lot of "wide" tables from your data source. A bit of a balancing game here. The good news is that the strings are constants, so they are never encoded more than once; it's just copying references to them all over the place.
I would like to use arrays instead of dictionaries, but then the field resolvers would need to know which index in the array is necessary to return the correct data...
I didn't consider just projecting to a dictionary; that's a pretty obvious stepping stone! As for extra cycles, it wouldn't be a problem if it's an optional feature: use it when you need it, otherwise it operates like normal. Where this loses me is how to actually define the entries for the dictionary at runtime. I've dabbled in expression trees for dynamically building queries before, but it was years ago, and I distinctly remember it being a nightmare.
And that's why it took me two months of intense effort to write our implementation...
Hmm... re-reading the original issue, it seems I was just having a problem with navigation entries. Well, test it and see, I guess!
A new codebase often seems confusing when you first look at it. The problem with this project is that it deals primarily with reflection and expression construction; that is not code that lends itself to simplicity and ease of readability. When I find a new codebase daunting, I generally assume it is due to my lack of context, not a fault of the code. But that is just me. If you find ways to make it more readable, I am happy to consider pull requests. Also, would your comment have been any less effective by omitting your critical opinions of the code quality? As for
Good point @SimonCropp -- you can easily spend as much time commenting as you do writing the code. The more you know the codebase, the fewer comments are necessary. So it really becomes quite difficult to know how many comments to put in. I find that having XML comments on public members is key to helping others understand the code -- sometimes even for private members. I've been working on the graphql-dotnet repo to help add XML comments there, but it has a long way to go. After that, so long as classes, methods, and variables are named logically, as @SimonCropp says, limited comments are usually adequate, understanding of course that there will always be a learning curve for someone new to the codebase.
I've separated my reply into another thread to keep this topic on track. @SimonCropp, to make it clear, I'm showing care and criticality here because I want to contribute to this codebase, not because I get off on being critical.
Preface: I know very little about GraphQL.Net. I decided to try to switch from REST endpoint hell to GraphQL this week. Please excuse me if some things are obvious on that side and I'm just missing them. Alright, so, I messed around a bit with this and built a working lambda that can be tossed into the

I ran into a couple of upstream issues, some resolved (noting them here for clarification). Shane may have words for some of this?
```csharp
class EfSource<T> : Dictionary<string, object>
{
    //add default constructors
    public EfSource(...) : base(...) { }
}

class PersonType : ObjectGraphType<EfSource<Person>>
{
    public PersonType()
    {
        Field("id", resolve: context => context.Source["id"], graphType: typeof(NonNullGraphType<IdGraphType>));
        Field("name", resolve: context => context.Source["name"], graphType: typeof(StringGraphType));
    }
}
```

You could of course write code that uses reflection to examine each model type and create matching graph type fields. I chose to be explicit about the graph type members.
Keep in mind that the dictionary becomes the source for the graph type, not the graph type itself. So your main query will look something like this:

```csharp
class Query : ObjectGraphType
{
    public Query()
    {
        Field("people", resolve: async context => {
            // examine the IResolveFieldContext to determine which fields were queried
            // build the select query (using EfSource<Person> as the return type, not Dictionary<string, object>)
            // execute the query and return a list of dictionaries
            IEnumerable<EfSource<Person>> ret = await db.People.Select(selectQuery).ToListAsync();
            return ret;
        }, graphType: typeof(ListGraphType<PersonType>));
    }
}
```

You can of course just use:

```csharp
class EfObjectType<T> : ObjectGraphType<Dictionary<string, object>> //or ObjectGraphType<EfSource<T>>
{
    public EfObjectType() {
        //build fields dynamically by using reflection on typeof(T)
    }
}

class PersonType : EfObjectType<Person>
{
}
```

Edit: or with the auto-generating sample, you could use the generic type on the field type, so in the above example:

```csharp
Field(..., graphType: typeof(EfObjectType<Person>));
```
Damn Shane, it's late! Ohh, the picture is starting to come together nicely now. The

Taking that a step further with the

Of course, I see some serious performance concerns with using reflection so much, especially when building the lambda. However, I'm 90% sure that I've solved this with caching before, and that I could do that here; just gotta find my old code. Either way, not a concern right now. I'll take what you've explained, work with it tomorrow evening, and see if I can get something that works end to end. Idk if I can properly integrate it into this lib once it works, but once I get it to a working and clean state I can present it at the very least.
The good news is that the graph types are singletons, so the reflection building the fields only happens once. Your performance/reflection considerations are going to come into play while building the select lambda. Caching will help, of course. You can even cache an entire lambda if you can create a cache key based on the selected fields for the specific model.
I should add lambda caching to my code... lol
Yeah, the potential variations in fields may hose that up, I think. Each difference in selected fields will result in a different cache

That, or the warm-up period is negligible in the real world, as common queries would be warmed up fast anyway, and infrequent ones would only be slow once.

Edit: This is exciting, I just wanna hop on this now, but sleep is important. I'll check back in when I've made more progress towards a working solution! Thanks for all the help thus far.
In practice, I've found the speed of building the lambda to be insignificant compared to the overall execution time of the query. However, EF also needs to compile the select queries to SQL statements, and I believe it uses caching to speed this process up. Potentially, if the lambda is cached, EF may be able to identify a cache match (or do so quicker) than otherwise. So there may be an EF Core speed benefit to caching the lambda.
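A minimal sketch of the lambda caching discussed above, assuming a hypothetical `SelectCache` helper and a `buildSelect` callback that does the actual expression construction; the cache key is just the model type plus the sorted field names:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

static class SelectCache
{
    static readonly ConcurrentDictionary<string, LambdaExpression> Cache = new();

    // Returns a cached select lambda for the given model type and field set,
    // building (and caching) it on first use. Field order does not matter,
    // since the key is built from the sorted names.
    public static LambdaExpression GetOrBuild(
        Type modelType,
        IEnumerable<string> fields,
        Func<IReadOnlyList<string>, LambdaExpression> buildSelect)
    {
        var sorted = fields.OrderBy(f => f, StringComparer.Ordinal).ToList();
        var key = modelType.FullName + "|" + string.Join("|", sorted);
        return Cache.GetOrAdd(key, _ => buildSelect(sorted));
    }
}
```

Reusing the exact same lambda instance for identical field sets also gives EF Core's own query cache a better chance of a hit, per the comment above.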
Hm. The issue I'm running into is that only

So I believe I need to get an

Edit: That, or figure out how to get a working generated

Seems related to https://stackoverflow.com/questions/47744663/how-to-use-func-in-a-linq-query-that-provide-an-iqueryable-output
You for sure need to generate a |
That's what I thought. However, any

That's what I thought as well, or rather that's what I expected. However, when I get the
It will execute as soon as you start to enumerate the query. For instance:

```csharp
var query = db.People.Where(x => x.Name == "john doe");
// query is an IQueryable<Person> and has not been executed yet
var queryEnumerable = query.AsEnumerable();
// queryEnumerable is an IEnumerable<Person> and has not been executed yet
var selectedEnumerable = queryEnumerable.Select(x => x.Id);
// selectedEnumerable is an IEnumerable<int> which still has not been executed yet

// anything that enumerates the query will trigger the db call, like these statements:
foreach (var id in selectedEnumerable) { ... }
var list = selectedEnumerable.ToList();
var hashset = selectedEnumerable.ToHashSet();
// in the above sample, the database actually executes 3 different times

// also, the query is processed at the last point that it was an IQueryable;
// so, the full Person row is returned for each record

// note that Where can also run on an IEnumerable, so this sample here
// actually pulls EVERY record from the database:
var result = db.People.AsEnumerable().Where(x => x.Id == 345).Single();
```
It's telling you that EF Core cannot process a BlockExpression, which you copied from the sample you linked me to on Stack Overflow. Use ListInitExpression instead, as I outlined here: #145 (comment)

It will largely go together the same way, but basically, instead of writing an expression that represents this:

```csharp
var dic = new Dictionary<string, object>();
dic.Add("id", x.Id);
dic.Add("name", x.Name);
return dic;
```

you instead want to write an expression that represents this:

```csharp
return new Dictionary<string, object> { { "id", x.Id }, { "name", x.Name } };
```

As you can see, the bodies of the individual expressions you're adding to the dictionary have not changed; it's just how you create and initialize the dictionary.
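A sketch of building that ListInitExpression programmatically, under the assumption of a fixed `Person` entity; `DictionarySelectBuilder` and its field-to-property map are hypothetical names, not part of this library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Illustrative entity type; not from the library.
class Person { public int Id { get; set; } public string Name { get; set; } }

static class DictionarySelectBuilder
{
    static readonly MethodInfo Add =
        typeof(Dictionary<string, object>).GetMethod("Add");

    // Builds: x => new Dictionary<string, object> { { "id", (object)x.Id }, ... }
    // using ListInitExpression, which EF Core can translate (unlike BlockExpression).
    public static Expression<Func<Person, Dictionary<string, object>>> Build(
        IReadOnlyDictionary<string, string> fieldToProperty)
    {
        var param = Expression.Parameter(typeof(Person), "x");
        var inits = new List<ElementInit>();
        foreach (var pair in fieldToProperty)
        {
            var value = Expression.Convert(
                Expression.Property(param, pair.Value), typeof(object)); // (object)x.Prop
            inits.Add(Expression.ElementInit(Add,
                Expression.Constant(pair.Key), value)); // { "field", (object)x.Prop }
        }
        var body = Expression.ListInit(
            Expression.New(typeof(Dictionary<string, object>)), inits);
        return Expression.Lambda<Func<Person, Dictionary<string, object>>>(body, param);
    }
}
```

The resulting lambda can then be passed directly to a `Select(...)` on the queryable.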
I would suggest you try the following:

```csharp
Expression<Func<Person, Dictionary<string, object>>> lambda =
    x => new Dictionary<string, object> { { "id", x.Id }, { "name", x.Name } };
var test = db.People.Select(lambda).First();
```

Then verify that

You may find this free tool handy for visualizing expressions:
FYI, none of these solutions require the use of |
I was hoping I would be able to copy/paste/modify/fix my way through the Expressions portion of this. Looks like I'll have to properly sit down, relearn the fundamentals, and do this right. Unfortunately I cannot commit to that right now; I've got a project looming. I will come back around to this when I find a nice slow or free week, or if it turns into a more urgent problem for this project. Thanks for the advice and knowledge thus far. I'll be back around sometime in the near future!
@douglasg14b any luck so far? I'm very interested in how this issue turns out.
I hope this issue can be improved. (dotnet/core#5852) |
happy to consider a PR |
To follow up on this being closed: is this no longer an issue?
The current implementation of the "Query" fields in this library is potentially very heavy.
Each query that eventually gets built is the equivalent of a

```sql
SELECT * FROM mytable;
```

instead of leveraging IQueryable and expression building to generate a "thin" select based on the requested fields. This is bad news when tables can contain very large fields (think news articles, or large file data during a "summary" query).
Solution?
Build the select expressions dynamically via expression building so that only the required fields are selected from the database.
This means that the navigation query system will basically need to be rewritten to account for the total change from using `.Include("Navigation")` to something more akin to `new { Navigation = new Navigation { ... } }`. Possibly this can be optional on a per-query basis?
We should be able to cache the select queries in some way based on the incoming query structure, so we only take a one-time build performance hit. (Though larger query sets could cause potential memory consumption issues?)
I have already written code locally to handle the base "select only this table" builder for when navigation properties are not used. This vastly speeds up DB access and dramatically lowers bandwidth requirements at the same time.
The only question is: where/how should we approach the navigation property issue?