Perf testing: Cover Site Editor loading time #23842

Merged
merged 15 commits into from Jul 16, 2020
2 changes: 1 addition & 1 deletion .github/workflows/performance.yml
@@ -36,4 +36,4 @@ jobs:
npm ci

- name: Run the performance tests
run: ./bin/plugin/cli.js perf --ci $GITHUB_SHA master
run: ./bin/plugin/cli.js perf --ci $GITHUB_SHA master --tests-branch $GITHUB_SHA
4 changes: 4 additions & 0 deletions bin/plugin/cli.js
@@ -58,6 +58,10 @@ program
.command( 'performance-tests [branches...]' )
.alias( 'perf' )
.option( '-c, --ci', 'Run in CI (non interactive)' )
.option(
'--tests-branch <branch>',
"Use this branch's performance test files"
)
.description(
'Runs performance tests on two separate branches and outputs the result'
)
49 changes: 38 additions & 11 deletions bin/plugin/commands/performance.js
@@ -19,7 +19,8 @@ const config = require( '../config' );
/**
* @typedef WPPerformanceCommandOptions
*
* @property {boolean=} ci Run on CI.
* @property {boolean=} ci Run on CI.
* @property {string=} testsBranch The branch whose performance test files will be used for testing.
*/

/**
@@ -119,13 +120,15 @@ function curateResults( results ) {
*
* @param {string} performanceTestDirectory Path to the performance tests' clone.
* @param {string} environmentDirectory Path to the plugin environment's clone.
* @param {string} testSuite Name of the tests set.
* @param {string} branch Branch name.
*
* @return {Promise<WPFormattedPerformanceResults>} Performance results for the branch.
*/
async function getPerformanceResultsForBranch(
performanceTestDirectory,
environmentDirectory,
testSuite,
branch
) {
// Restore clean working directory (e.g. if `package-lock.json` has local
@@ -147,13 +150,13 @@ async function getPerformanceResultsForBranch(
const results = [];
for ( let i = 0; i < 3; i++ ) {
await runShellScript(
'npm run test-performance',
`npm run test-performance -- packages/e2e-tests/specs/performance/${ testSuite }.test.js`,
Contributor

So this seems to be the change that's causing issues in the GitHub action. We're changing the API of npm run test-performance. Can't we just leave it as is, i.e. run all performance tests when you do npm run test-performance? This is already documented in several places, and I'm not sure we should be breaking that assumption.

Alternatively, we could have:

npm run test-performance // runs post editor tests
npm run test-performance --suite="posteditor" // runs post editor tests
npm run test-performance --suite="site-editor" // runs site editor tests

This will ensure we don't break the API.
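
For illustration, the --suite idea would roughly amount to mapping a suite name onto the spec files this PR introduces. A minimal sketch (the flag and the helper below are hypothetical, not part of this PR):

// Hypothetical sketch only: map a --suite value onto a spec file path.
// Assumes the suite names match the spec filenames added in this PR.
function suiteToSpecPath( suite = 'post-editor' ) {
	const knownSuites = [ 'post-editor', 'site-editor' ];
	if ( ! knownSuites.includes( suite ) ) {
		throw new Error( `Unknown performance test suite: ${ suite }` );
	}
	return `packages/e2e-tests/specs/performance/${ suite }.test.js`;
}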

Contributor Author

I think that's only one side of the coin, though: master produces a single results.json file for the post editor, whereas this branch renames that file to post-editor.results.json and writes the site editor results to site-editor.results.json.

In fact, an earlier version of this PR had npm run test-performance (with no added args), and that caused the GH action to fail when looking for post-editor.results.json, since master had only produced results.json.

I'm not sure we can retain API stability this way and still add test coverage for a different test suite. We'd be extremely limited in our options, and things might end up rather hacky (something like adding site-editor-specific fields to results.json while retaining the post-editor ones without even renaming them -- that sounds like it could quickly become messy and hard to understand/maintain/extend).

Overall, I'm not convinced that's the right approach; requiring this kind of API stability basically means we can never significantly change the perf tests, and that's a really tough constraint, since it's hard to anticipate what needs might arise in the future.

Wouldn't it make sense for a branch that changes the perf tests to use those tests on both that branch and master?
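
For concreteness, the renaming in this branch is driven by the spec filename itself; the updated specs (see the hunks further down) derive the results file like this, nothing beyond what the diff shows:

// From the updated perf specs: derive the results filename from the spec file,
// e.g. site-editor.test.js -> site-editor.test.results.json.
const resultsFilename = basename( __filename, '.js' ) + '.results.json';
writeFileSync(
	join( __dirname, resultsFilename ),
	JSON.stringify( results, null, 2 )
);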

Contributor

Mmm, you're right, I can see that. I guess an alternative here could be to make ./bin/plugin/cli.js take an optional "branch" argument: if specified, we run the perf tests from that branch instead of "master", and we could use that argument in the GitHub action. Any thoughts on that?

Regardless, though, if we change the signature of npm run test-performance we should make sure to update the docs; having a --suite option or something like that is a bit nicer than having to pass the file.

Contributor Author

Sorry, still a few more questions 😅

Mmm, you're right, I can see that. I guess an alternative here could be to make ./bin/plugin/cli.js take an optional "branch" argument: if specified, we run the perf tests from that branch instead of "master", and we could use that argument in the GitHub action. Any thoughts on that?

This is actually an interface where I'm wondering if we need more options at all: do we need the extra argument? Doesn't it make sense to always use the branch's tests? For most branches, those will be identical to master's anyway, so it doesn't really make a difference there. But if a branch does modify them, we surely want to use them, no?

Regardless, though, if we change the signature of npm run test-performance we should make sure to update the docs; having a --suite option or something like that is a bit nicer than having to pass the file.

I don't mind adding that. Personally, I actually like the 'pass-through' approach (i.e. the npm ... -- ... syntax), mostly for its discoverability -- I didn't really look at any code but just tried it, and it did the right thing. I find that quite intuitive (and it's always the first thing I try when needing to pass a filename to an npm script). I prefer this uniform syntax over the 'nice' one, but I'm happy to go along with the 'nice' one.
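
For reference, the pass-through invocation described here is the same shape the perf command now uses internally, e.g.:

npm run test-performance -- packages/e2e-tests/specs/performance/site-editor.test.js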


Finally: To make the actual change, we'll have to insert the branch's test files into the wp-env container. As I said, this might be kinda painful (or even near-impossible currently) to test on my Linux box, where there's always some permissions issue when attempting to add or change files in those Docker images (and I haven't been able to fully track down that issue for a while -- wp-env works for most purposes for me, just not for stuff that requires write access, e.g. uploading media files). My alternative suggestion would be to merge without those changes, verify that the new perf test behavior works fine (and revert if it doesn't), and tackle that perf test file injection in a separate PR. Would you be okay with that? 😬

Contributor

Do we need the extra argument?

We need to run the same tests across branches: ./bin/plugin/cli.js branchA branchB runs the master tests in the environments built from branchA and branchB and compares both.

I was suggesting something like ./bin/plugin/cli.js branchA branchB --testsBranch branchC to run the branchC tests in the environments built from branchA and branchB and compare both.

In the GitHub action, branchC will be equal to branchA, but that's fine.

--suite config

I don't feel strongly here; the important thing is that the docs stay correct whichever choice you make.

Finally: To make the actual change, we'll have to insert the branch's test files into the wp-env container.

I think it's just a matter of calling git checkout branchC at the beginning of the perf command (when setting up the perf environment directory).
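
Concretely, that's what this PR ends up doing when setting up the tests clone; a condensed sketch of the change in runPerformanceTests (see the performance.js hunk below), with the GitHub action passing --tests-branch $GITHUB_SHA:

// Check out the tests branch in the perf tests clone before running any suite,
// so both environments are measured with the same spec files.
if ( options.testsBranch ) {
	await git.checkoutRemoteBranch(
		performanceTestDirectory,
		options.testsBranch
	);
}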

Contributor

The rare situations where we had to make something work differently across branches have so far come down to selector changes for opening the inserter, and those don't have an impact on any metric.

The metrics are simple: type a character, click on a block, load the page... These are the things that need to be measured the same way across branches, and they don't differ depending on the branch the tests are run against. So, personally, I think the testsBranch argument has been doing its job properly so far.

Member

I will take your word for it; I don't understand what "doing its job properly" means here, other than that there was a PR where filenames changed and something crashed because of the lack of a tests branch. Thanks for helping clarify this! It's not a big problem I'm seeing; I was mostly just trying to remove things I don't think we need.

Member

For instance, if we change code in a way that breaks the performance tests, we need to update the tests in the same PR so they run with the new code, but we also need to ensure the tests continue to run properly when applied to trunk (the previous code).

@youknowriad I'm coming back to this and trying to better understand what you wrote here; I'm wondering if you are saying that you suspect that test changes will work in a PR but then fail when run on trunk.

One thing I discovered while working on our test suites is that we don't run tests against the PR's branch, but against the "PR's merge branch." The merge branch tracks the PR, but every time we make a commit it creates a hypothetical merge of that branch's tip into the target branch (usually trunk).

What this means is that if we make changes to a test suite in a PR, then when those changes run in CI and we're relying on $GITHUB_SHA (as we do), the tests already incorporate the issues of merging back into trunk. If the new test files would break in trunk, they would also break on $GITHUB_SHA.

I know that at that point in that PR we're comparing two different test suites, but by providing --tests-branch we're changing the equation from "the results in the PR comparing trunk to the changes are invalid" into "the results in trunk comparing one commit to the previous commit are invalid."

Since we only report performance changes on mainline trunk, I'm still struggling to see how --tests-branch helps us achieve the goal we say it does.

cc: @ockham

But also, say we change the way we compute some metric, like "typing": if we decide to run different tests on different branches (say, each branch's own tests), the results will be completely different between the branches, so comparing them wouldn't make sense anymore.

This is another thing I think I've been uncovering as a hidden assumption: that our results are consistent within test runs. Even running each branch 3x, I'm seeing routine variation between the branches on the perf tests, frequently hitting more than a 2x difference when no difference exists. I'm going to start gathering some statistics on the values from individual PR runs so we can discuss the facts of the measurements we're making, but I already know for a fact that the way we talk about these measurements in the PRs doesn't match the reality I've seen.

For example, I make a change to README.md, and the performance tests (after 3x runs and taking the median) show that my branch loads the editor in half the time it does in trunk, and yet trunk is 2x faster when typing. The test code is the same, the code under test is the same, but our test runtime variance is so much wider than the time it takes to run the code we're testing that our results are probably mostly random variation.
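
To make the aggregation concrete, here is a minimal sketch of the median-of-three step being discussed (three runs per branch, the middle value reported and compared); it is an illustration, not the exact curateResults implementation:

// Median of a small sample; with only three runs per branch, a single noisy
// run can swing the reported comparison between branches.
function median( values ) {
	const sorted = [ ...values ].sort( ( a, b ) => a - b );
	const middle = Math.floor( sorted.length / 2 );
	return sorted.length % 2 !== 0
		? sorted[ middle ]
		: ( sorted[ middle - 1 ] + sorted[ middle ] ) / 2;
}

median( [ 110, 250, 120 ] ); // 120
median( [ 115, 118, 240 ] ); // 118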

Contributor

For example, I make a change to README.md, and the performance tests (after 3x runs and taking the median) show that my branch loads the editor in half the time it does in trunk, and yet trunk is 2x faster when typing.

If this is happening for any of the metrics we track on https://codehealth.vercel.app then it's a problem, yes. But in my experience I've only seen it in the site editor metrics. Those are more recent and we haven't worked on their stability yet, so I don't think we can trust them just yet; they're just a vague indication for now. For the post editor metrics tracked on the site, though, I've not seen any major stability issues like that. I do see small variances (you can look at the graphs), but any important change has correlated well with an impactful commit.

Member (@dmsnell, Nov 14, 2022)

Just for some context, @youknowriad, I'm trying to start gathering data on this in #45747, and in at least the first batch of 30 test runs I found that 22 of them showed results at least as different on the type event (in the post editor) as the reported change from 14.4 to 14.5 was: 10 of them were more than that many ms faster, and 12 were more than that many ms slower.

How are you measuring variance and correlation? I have to be missing something, because it looks like the data is explained primarily by random variation. That is, if there is an impact, I'd like to know how you were able to pull it out of the noise caused by random variation, which seems to have a bigger impact than any reported change.

In the meantime I'll spin up a PR that specifically tests that release and see if I can reproduce the measurements you found.
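
A hypothetical sketch of the kind of statistics being gathered here: summarize repeated runs of one metric so the run-to-run noise can be compared against the size of a reported regression (not part of the repository, just an illustration):

// Mean and sample standard deviation over repeated runs of a single metric.
function summarize( samples ) {
	const mean =
		samples.reduce( ( sum, value ) => sum + value, 0 ) / samples.length;
	const variance =
		samples.reduce( ( sum, value ) => sum + ( value - mean ) ** 2, 0 ) /
		( samples.length - 1 );
	return { mean, standardDeviation: Math.sqrt( variance ) };
}

// If the standard deviation is on the order of the reported change, the
// "regression" may just be noise.
summarize( [ 120, 134, 128, 119, 141 ] );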

performanceTestDirectory
);
const rawResults = await readJSONFile(
path.join(
performanceTestDirectory,
'packages/e2e-tests/specs/performance/results.json'
`packages/e2e-tests/specs/performance/${ testSuite }.test.results.json`
)
);
results.push( curateResults( rawResults ) );
@@ -198,6 +201,19 @@ async function runPerformanceTests( branches, options ) {

log( '>> Cloning the repository' );
const performanceTestDirectory = await git.clone( config.gitRepositoryURL );

if ( !! options.testsBranch ) {
log(
'>> Fetching the ' +
formats.success( options.testsBranch ) +
' branch'
);
await git.checkoutRemoteBranch(
performanceTestDirectory,
options.testsBranch
);
}

const environmentDirectory = getRandomTemporaryPath();
log(
'>> Perf Tests Directory : ' +
@@ -220,21 +236,32 @@ async function runPerformanceTests( branches, options ) {
log( '>> Starting the WordPress environment' );
await runShellScript( 'npm run wp-env start', environmentDirectory );

/** @type {Record<string, WPFormattedPerformanceResults>} */
const testSuites = [ 'post-editor', 'site-editor' ];
Contributor

Should this be an argument, with a default of post-editor, instead of always running all tests? Or do you think we should be running all tests every time?

Contributor Author

This is in the function body of runPerformanceTests, which is the top-level export of this file (invoked by bin/plugin/cli.js). If we want to make this an argument, we'd basically have to make it one that's passed by the CLI.

I'm leaning towards always running both perf suites. Their main use case is CI (the GH Action), as this is the most reliable guard against perf regressions, so it should IMO be easy to monitor that. Furthermore, the Site Editor test only covers loading time, which is currently pretty short. We can revisit if we add more coverage and testing times grow too much.


/** @type {Record<string,Record<string, WPFormattedPerformanceResults>>} */
const results = {};
for ( const branch of branches ) {
results[ branch ] = await getPerformanceResultsForBranch(
performanceTestDirectory,
environmentDirectory,
branch
);
for ( const testSuite of testSuites ) {
results[ testSuite ] = {};
for ( const branch of branches ) {
results[ testSuite ][
branch
] = await getPerformanceResultsForBranch(
performanceTestDirectory,
environmentDirectory,
testSuite,
branch
);
}
}

log( '>> Stopping the WordPress environment' );
await runShellScript( 'npm run wp-env stop', environmentDirectory );

log( '\n>> 🎉 Results.\n' );
console.table( results );
for ( const testSuite of testSuites ) {
log( `\n>> ${ testSuite }\n` );
console.table( results[ testSuite ] );
}
}

module.exports = {
11 changes: 7 additions & 4 deletions packages/e2e-tests/config/performance-reporter.js
@@ -2,6 +2,7 @@
* External dependencies
*/
const { readFileSync, existsSync } = require( 'fs' );
const path = require( 'path' );
const chalk = require( 'chalk' );

function average( array ) {
@@ -17,14 +18,16 @@ const title = chalk.bold;
const success = chalk.bold.green;

class PerformanceReporter {
onRunComplete() {
const path = __dirname + '/../specs/performance/results.json';
onTestResult( test ) {
const dirname = path.dirname( test.path );
const basename = path.basename( test.path, '.js' );
const filepath = path.join( dirname, basename + '.results.json' );

if ( ! existsSync( path ) ) {
if ( ! existsSync( filepath ) ) {
return;
}

const results = readFileSync( path, 'utf8' );
const results = readFileSync( filepath, 'utf8' );
const { load, domcontentloaded, type, focus } = JSON.parse( results );

if ( load && load.length ) {
packages/e2e-tests/specs/performance/post-editor.test.js
@@ -1,7 +1,7 @@
/**
* External dependencies
*/
import { join } from 'path';
import { basename, join } from 'path';
import { existsSync, readFileSync, unlinkSync, writeFileSync } from 'fs';

/**
Expand Down Expand Up @@ -71,7 +71,7 @@ function getSelectionEventDurations( trace ) {

jest.setTimeout( 1000000 );

describe( 'Performance', () => {
describe( 'Post Editor Performance', () => {
it( 'Loading, typing and selecting blocks', async () => {
const results = {
load: [],
Expand Down Expand Up @@ -182,8 +182,10 @@ describe( 'Performance', () => {
const [ focusEvents ] = getSelectionEventDurations( traceResults );
results.focus = focusEvents;

const resultsFilename = basename( __filename, '.js' ) + '.results.json';

writeFileSync(
__dirname + '/results.json',
join( __dirname, resultsFilename ),
JSON.stringify( results, null, 2 )
);

80 changes: 80 additions & 0 deletions packages/e2e-tests/specs/performance/site-editor.test.js
@@ -0,0 +1,80 @@
/**
* External dependencies
*/
import { basename, join } from 'path';
import { writeFileSync } from 'fs';

/**
* Internal dependencies
*/
import { useExperimentalFeatures } from '../../experimental-features';

/**
* WordPress dependencies
*/
import { trashAllPosts, visitAdminPage } from '@wordpress/e2e-test-utils';
import { addQueryArgs } from '@wordpress/url';

jest.setTimeout( 1000000 );

describe( 'Site Editor Performance', () => {
useExperimentalFeatures( [
'#gutenberg-full-site-editing',
'#gutenberg-full-site-editing-demo',
] );

beforeAll( async () => {
await trashAllPosts( 'wp_template' );
await trashAllPosts( 'wp_template_part' );
} );
afterAll( async () => {
await trashAllPosts( 'wp_template' );
await trashAllPosts( 'wp_template_part' );
} );

it( 'Loading', async () => {
const results = {
load: [],
domcontentloaded: [],
type: [],
focus: [],
};

visitAdminPage(
'admin.php',
addQueryArgs( '', {
page: 'gutenberg-edit-site',
} ).slice( 1 )
);

let i = 1;

// Measuring loading time
while ( i-- ) {
await page.reload( { waitUntil: [ 'domcontentloaded', 'load' ] } );
const timings = JSON.parse(
await page.evaluate( () =>
JSON.stringify( window.performance.timing )
)
);
const {
navigationStart,
domContentLoadedEventEnd,
loadEventEnd,
} = timings;
results.load.push( loadEventEnd - navigationStart );
results.domcontentloaded.push(
domContentLoadedEventEnd - navigationStart
);
}

const resultsFilename = basename( __filename, '.js' ) + '.results.json';

writeFileSync(
join( __dirname, resultsFilename ),
JSON.stringify( results, null, 2 )
);

expect( true ).toBe( true );
} );
} );