Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GroupBy over complex key generates redundant subquery #30113

Open
Tracked by #30173
alexb5dh opened this issue Jan 22, 2023 · 3 comments
Open
Tracked by #30173

GroupBy over complex key generates redundant subquery #30113

alexb5dh opened this issue Jan 22, 2023 · 3 comments

Comments

@alexb5dh
Copy link

alexb5dh commented Jan 22, 2023

Summary

In case anything different from property or navigation access is performed in GroupBy method, the generated query will add an additional subquery. Probably to avoid repeating SQL expression used in group by.

Example

Having the following context:

public class Entity
{
    public int Id { get; private set; }
}

public class AppContext: DbContext
{
    public DbSet<Entity> Entities { get; private set; }
    
    //...
}

and perfoming query:

await context.Entities
    .GroupBy(e => e.Id.ToString())
    .Select(g => g.Key)
    .ToListAsync()

version 7.0.0-preview.5.22302.2 (and later, up to 7.0.2 as of the time of writing) generates this SQL:

SELECT [t].[Key]
FROM (
    SELECT CONVERT(varchar(11), [e].[Id]) AS [Key]
FROM [Entities] AS [e]
    ) AS [t]
GROUP BY [t].[Key]

while version 7.0.0-preview.4.22229.2 (and before) does querying via:

SELECT CONVERT(varchar(11), [e].[Id])
FROM [Entities] AS [e]
GROUP BY CONVERT(varchar(11), [e].[Id])

Changing the GroupBy expression to e => e.Id (or navigation property access ) gets rid of subquery in all cases, yet using something as simple as e => 1 still generates one.

MRE is attached: GroupBySubquery.zip. Just replace the connection string with the one you use.

Сonclusion

The first option seems to generally have a worse execution plan, especially as the main query grows more complex.

I'm not sure if this was an intentional change or an introduced bug, but in any case, user needs to have the option to revert to an old behaviour, as this can affect query performance quite noticeably.

In similar looking issues I found a suggested workaround of adding

AppContext.SetSwitch("Microsoft.EntityFrameworkCore.Issue27102", true);

but this doesn't seem to affect this case.

Environment

EF Core version: 7.0.2, 7.0.0-preview.5.22302.2
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET 7.0
Operating system: Windows

@ajcvickers
Copy link
Member

/cc @maumar @roji

@georg-jung
Copy link

georg-jung commented Jan 31, 2023

I guess this is related to https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-7.0/whatsnew#groupby-entity-type. This is a major performance decrease in our app too. What took 1 or 2 seconds on EF Core 6 now times out after >1min in production. If there is no fix for this it is probably required for us to downgrade our production app back to ef core 6.

We do something similar to (this is pseudo code, I simplified a more complicated query):

var qAgg = from x in db.B
           group x by new { x.Process.Element.Customer.Kind }
           into g
           select new
           {
               Kind = g.Key.Kind,
               Region1Rule1 = g.Count(x => x.Region1 && x.Process.Rule == Rule.Rule1),
               Region2Rule1 = g.Count(x => x.Region2 && x.Process.Rule == Rule.Rule1),
               Region3Rule1 = g.Count(x => x.Region3 && x.Process.Rule == Rule.Rule1),
               Region4Rule1 = g.Count(x => x.Region4 && x.Process.Rule == Rule.Rule1),
               Region1Rule2 = g.Count(x => x.Region1 && x.Process.Rule == Rule.Rule2),
               Region2Rule2 = g.Count(x => x.Region2 && x.Process.Rule == Rule.Rule2),
               Region3Rule2 = g.Count(x => x.Region3 && x.Process.Rule == Rule.Rule2),
               Region4Rule2 = g.Count(x => x.Region4 && x.Process.Rule == Rule.Rule2),               
           };

On EF Core 6 this used to be translated to something like (we're using postgres)

SELECT 
    e.kind AS "Kind",
    COUNT(CASE WHEN b.region1 AND (z1.rule = 1) THEN 1 END)::INT AS "Region1Rule1", 
    COUNT(CASE WHEN b.region2 AND (z1.rule = 1) THEN 1 END)::INT AS "Region2Rule1", 
    COUNT(CASE WHEN b.region3 AND (z1.rule = 1) THEN 1 END)::INT AS "Region3Rule1",
    COUNT(CASE WHEN b.region4 AND (z1.rule = 1) THEN 1 END)::INT AS "Region4Rule1",
    COUNT(CASE WHEN b.region1 AND (z1.rule = 2) THEN 1 END)::INT AS "Region1Rule2", 
    COUNT(CASE WHEN b.region2 AND (z1.rule = 2) THEN 1 END)::INT AS "Region2Rule2", 
    COUNT(CASE WHEN b.region3 AND (z1.rule = 2) THEN 1 END)::INT AS "Region3Rule2",
    COUNT(CASE WHEN b.region4 AND (z1.rule = 2) THEN 1 END)::INT AS "Region4Rule2"
FROM b
INNER JOIN z /* ...*/
INNER JOIN e /* ...*/
INNER JOIN z1 /* ...*/
GROUP BY e.kind

On EF Core 7 every single COUNT is translated to a subquery, resulting in a huge and repetitive query that takes forever to complete.

If there is any more info required I'm happy to work on this further. Providing my production example is not feasible because the queries are huge and customer-specific. I hope that my simplified examples together with @alexb5dh's explanation and example code provide enough info to further investigate.

@alexb5dh
Copy link
Author

alexb5dh commented Mar 24, 2023

After some additional search through EF issues, this one seems to be tightly related: #29593.
Workaround from there also seems to work for now - the main trick is to add Distinct to "cheat" new EF behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants