Why would GE run faster on a dataset with a lot of rows compared to a dataset with fewer rows? #5577

cosgroveblue · 2022-07-23T21:25:08Z

Do certain expectations tend to take longer than others?

austiezr · 2022-08-02T15:53:54Z

Hey @cosgroveblue ! Thanks for reaching out. There are a few possibilities.

Whenever possible, GE pushes compute back onto your backend (Spark instance, database, etc.). When that isn't possible, for example when working with local data using the Pandas Execution Engine, compute may be slower than a similar operation that takes advantage of the greater compute available to your backend.
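To make the "push compute to the backend" idea concrete, here is a minimal sketch using only Python's built-in `sqlite3` module. It is not how GE implements anything; the `events` table, the `value` column, and both function names are made up for illustration. It only contrasts asking the database for a single number with pulling every row into the local process and counting there.

```python
import sqlite3

# Hypothetical in-memory table; names and data are for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    # Every tenth row gets a NULL value.
    [(i, None if i % 10 == 0 else float(i)) for i in range(1000)],
)


def null_count_pushed_down(conn):
    """Let the database compute the metric; only one number comes back."""
    row = conn.execute(
        "SELECT COUNT(*) FROM events WHERE value IS NULL"
    ).fetchone()
    return row[0]


def null_count_local(conn):
    """Fetch every row and compute the metric in the Python process."""
    rows = conn.execute("SELECT value FROM events").fetchall()
    return sum(1 for (value,) in rows if value is None)


pushed = null_count_pushed_down(conn)
local = null_count_local(conn)
print(pushed, local)
```

Both functions return the same count, but the second one moves every row across the connection first. With a remote database or a Spark cluster, that transfer and the local processing are where the extra time would likely go.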

Additionally, a given expectation may:

- require multiple round trips to your datasource
- rely on multiple metrics being calculated
- include a particularly compute-heavy step

Any of these may cause an expectation to take longer than one that doesn't have those requirements.
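One way to see which checks are slow on your own data is to time them individually. The sketch below does not use the GE API: `check_not_null`, `check_within_three_stdevs`, and `time_checks` are hypothetical plain-Python stand-ins for expectations, chosen so that one needs a single pass over the data and the other depends on two metrics (mean and standard deviation) before its row-level pass.

```python
import statistics
import time


def check_not_null(values):
    # Stand-in for a simple expectation: one pass, no intermediate metrics.
    return all(v is not None for v in values)


def check_within_three_stdevs(values):
    # Stand-in for an expectation that needs two metrics first
    # (mean and population standard deviation), then a row-level pass.
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return all(abs(v - mean) <= 3 * stdev for v in values)


def time_checks(checks, values):
    """Run each check once and return elapsed wall-clock seconds per check."""
    timings = {}
    for name, check in checks.items():
        start = time.perf_counter()
        check(values)
        timings[name] = time.perf_counter() - start
    return timings


values = [float(i % 100) for i in range(10_000)]
timings = time_checks(
    {
        "not_null": check_not_null,
        "within_three_stdevs": check_within_three_stdevs,
    },
    values,
)

# Print the slowest check first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds:.6f}s")
```

Absolute numbers will vary by machine and run, so treat a single measurement as a rough indication and repeat it a few times before drawing conclusions.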
