While that GitHub Action works extremely well, the zero-setup approach has two drawbacks:
1. It is not possible to configure the test environment, for example by adding demo content or changing plugin configuration
2. It is not possible to test more complex scenarios, like any user interactions (e.g. for INP)
For (2) the best alternative right now is to go with the manual approach. For (1), I have now found a solution in WordPress Playground. Playground is a platform that lets you run WordPress instantly on any device. It can be seen as a replacement for the Docker-based @wordpress/env tool.
Using Blueprints for automated testing
One particular strength of WordPress Playground is the idea of Blueprints. Blueprints are JSON files for setting up your WordPress Playground instance. In other words, they are a declarative way for configuring WordPress—like a recipe. A blueprint for installing a specific theme and plugin could look like this:
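```json
{
	"$schema": "https://playground.wordpress.net/blueprint-schema.json",
	"steps": [
		{
			"step": "installTheme",
			"themeData": {
				"resource": "wordpress.org/themes",
				"slug": "twentytwentyfour"
			}
		},
		{
			"step": "installPlugin",
			"pluginData": {
				"resource": "wordpress.org/plugins",
				"slug": "performance-lab"
			}
		}
	]
}
```

This sketch follows the public Blueprint JSON schema; the exact step options (e.g. `pluginData` vs. the older `pluginZipFile`) depend on the Playground version, so check the Blueprints documentation for the currently supported form.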
The newly released version 2 of the performance testing GitHub Action now uses Blueprints under the hood to set up the testing environment and do things like importing demo content and installing mandatory plugins and themes. In addition to that, you can now use Blueprints for your own dedicated setup!
This way you can install additional plugins, change the site language, define some options, or even run arbitrary WP-CLI commands. There are tons of possible steps and also a Blueprints Gallery with real-world code examples.
To get started, add a new swissspidy/wp-performance-action@v2 step to your workflow (e.g. .github/workflows/build-test.yml):
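```yaml
- name: Run performance tests
  uses: swissspidy/wp-performance-action@v2
  with:
    urls: |
      /
      /sample-page/
    plugins: |
      # Path to your own plugin; illustrative.
      ./my-custom-plugin
    # Your blueprint, e.g. one installing performance-lab and akismet.
    blueprint: ./my-custom-blueprint.json
```

The plugin path and blueprint filename here are placeholders; replace them with the paths used in your own repository.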
The GitHub Action will now use your custom blueprint to install and activate your own custom plugin as well as the performance-lab and akismet plugins from the plugin directory.
Alongside this new feature I also included several bug fixes for things I originally planned to add but never really finished. For instance, it is now actually possible to run the performance tests twice and then compare the difference between the results.
This way, when you submit a pull request you can run tests first for the main branch and then for your PR branch to quickly see at a glance how the PR affects performance. Here is an example:
```yaml
jobs:
  comparison:
    runs-on: ubuntu-latest
    steps:
      # Check out the target branch and build the plugin
      # ...

      - name: Run performance tests (before)
        id: before
        uses: ./
        with:
          urls: |
            /
            /sample-page/
          plugins: |
            ./tests/dummy-plugin
          blueprint: ./my-custom-blueprint.json
          print-results: false
          upload-artifacts: false

      # Check out the current branch and build the plugin
      # ...

      - name: Run performance tests (after)
        uses: ./
        with:
          urls: |
            /
            /sample-page/
          plugins: |
            ./tests/dummy-plugin
          blueprint: ./my-custom-blueprint.json
          previous-results: ${{ steps.before.outputs.results }}
          print-results: true
          upload-artifacts: false
```
The result will look a bit like this:
Playground is the future
Being able to use Playground for automated testing is really exciting. It simplifies a lot of the setup and speeds up the bootstrapping, even though the sites themselves aren’t as fast (yet) as when using a Docker-based setup. However, there is a lot of momentum behind WordPress Playground and it is getting better every day. Applications like this one further help push its boundaries.
Learn how to set up Playwright-based end-to-end performance testing for your own WordPress project.
Introduction
End-to-end (E2E) tests are a type of software testing that verifies the behavior of a software application from, well, end to end. They simulate an actual user interacting with the application to verify that it behaves as expected. E2E tests are important because they can help to identify and fix bugs that may not be caught by unit tests or other types of testing. Additionally, they can help to ensure that the application is performing as expected under real-world conditions, with real user flows that are typical for the application. This means starting an actual web server, installing WordPress, and interacting with the website through a browser. For example, the majority of the block editor is covered extensively by end-to-end tests.
Performance testing
Browser-based performance testing is a subset of this kind of testing. Such tests measure the speed and reactivity of the website in order to find performance regressions. This includes common metrics such as Web Vitals or page load time, but also dedicated metrics that are more tailored to your project. For instance, Gutenberg tracks things like typing speed and the time it takes to open the block inserter.
Both WordPress core and Gutenberg use Playwright for end-to-end and performance tests. It supports multiple browsers and operating systems, and provides a great developer experience thanks to a resilient API and powerful tooling. If you know Puppeteer, Playwright is a forked and enhanced version of it. The WordPress project is actually still undergoing a migration from Puppeteer to Playwright.
This article shows how to set up Playwright-based end-to-end tests for your own project, with a focus on performance testing. To familiarize yourself with how Playwright works, explore their Getting Started guide. Would you like to jump straight to the code? Check out this example project on GitHub! It provides a ready-to-use boilerplate for Playwright-based performance tests that you can add to your existing project.
Before diving right into the details of writing performance tests and fiddling with reporting, there is also a shortcut to get your feet wet.
Most of what I cover in this article is also available in a single, ready-to-use GitHub Action. You can easily add it to almost any project with little to no configuration. Here’s an example of the minimum setup needed:
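```yaml
- name: Run performance tests
  uses: swissspidy/wp-performance-action@v1
  with:
    urls: |
      /
      /sample-page/
    plugins: |
      ./my-plugin
```

The version tag and paths in this sketch are illustrative; the `urls` input lists the pages to test, and `plugins` points at the plugin(s) to install.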
Using this action will spin up a new WordPress installation, install your desired plugins and themes, run Playwright tests against the provided pages on that site, and print easy-to-understand results to the workflow summary.
This one-stop solution allows you to quickly get started with performance testing in a WordPress context and helps to familiarize yourself with the topic. It might even cover all of your needs already, which would be even better! Another big advantage of such a GitHub Action is that you will automatically benefit from new changes made to it. And if you ever need more, continue reading below to learn how you can do it yourself.
Update (September 2024): check out my follow-up post about v2 of this GitHub Action using WordPress Playground.
Setting up Playwright tests for a WordPress plugin/theme
Reminder: if you want a head start on setting up Playwright tests, check out the example project on GitHub. It provides a ready-to-use boilerplate with everything that’s covered below.
This article assumes that you are developing WordPress blocks, plugins, themes, or even a whole WordPress site, and are familiar with the common @wordpress/scripts and @wordpress/env toolstack. The env package allows you to quickly spin up a local WordPress site using Docker, whereas the scripts package offers a range of programs to lint, format, build, and test your code. This conveniently includes Playwright tests! In addition to that, the @wordpress/e2e-test-utils-playwright package offers a set of useful helpers for writing Playwright tests for a WordPress project.
All you need to get started is installing these packages using npm:
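```shell
npm install --save-dev @wordpress/env @wordpress/scripts @wordpress/e2e-test-utils-playwright
```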
Check out the @wordpress/env documentation on how to further configure or customize your local environment, for example to automatically install and activate your plugin/theme in this new WordPress site.
Note: if you already have @wordpress/env installed or use another local development environment, skip this step and use your existing setup.
To run the Playwright tests with @wordpress/scripts, use the command npx wp-scripts test-playwright. If you have a custom Playwright configuration file in your project root directory, it will be automatically picked up. Otherwise, provide the path like so:
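```shell
npx wp-scripts test-playwright --config tests/performance/playwright.config.ts
```

The config path here is illustrative; point the `--config` flag at wherever your own configuration file lives.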
In a custom config file like this one you can override some of the details from the default configuration provided by @wordpress/scripts. Refer to the documentation for a list of possible options. Most commonly, you would need this to customize the default test timeout, the directory where test artifacts are stored, or how often each test should be repeated.
Writing your first end-to-end browser test
The aforementioned utilities package hides most of the complexity of writing end-to-end tests and provides functionality for the most common interactions with WordPress. Your first test could be as simple as this:
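```js
import { test, expect } from '@wordpress/e2e-test-utils-playwright';

test.describe( 'Dashboard', () => {
	test.beforeAll( async ( { requestUtils } ) => {
		await requestUtils.activateTheme( 'twentytwentyone' );
	} );

	test( 'should display the welcome meta box', async ( { page } ) => {
		await page.goto( '/wp-admin/' );

		await expect(
			page.getByRole( 'heading', { name: 'Welcome to WordPress' } )
		).toBeVisible();
	} );
} );
```

The `getByRole` locator is just one way to target the heading; Playwright's test generator (covered below) can pick suitable locators for you.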
When running npx wp-scripts test-playwright, this test visits /wp-admin/ and waits for the “Welcome to WordPress” meta box heading to be visible. And before all tests run (in this case there is only one), it ensures the Twenty Twenty-One theme is activated. That’s it! No need to wait for the page to load or anything; Playwright handles everything for you. And thanks to the locator API, the test is self-explanatory as well.
Locators are very similar to what’s offered by Testing Library in case you have used that one before. But if you are new to this kind of testing API, it’s worth strolling through the documentation a bit more. That said, the easiest way to write a new Playwright test is by recording one using the test generator. That’s right, Playwright comes with the ability to generate tests for you as you perform actions in the browser and will automatically pick the right locators for you. This can be done using a VS Code extension or by simply running npx playwright codegen. Very handy!
Setting up performance tests
The jump from a simple end-to-end test to a performance test is not so big. The key difference is performing some additional tasks after visiting a page, further processing the collected metrics, and then repeating all that multiple times to get more accurate results. This is where things get interesting!
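The "repeat and aggregate" half of that jump can be sketched in plain JavaScript. The `median` helper below is illustrative (it is not part of any WordPress package), but it shows why repetition matters: a single slow outlier barely moves the reported value.

```javascript
// Individual runs are noisy, so performance tests repeat a measurement
// many times and report an aggregate such as the median.
function median( numbers ) {
	const sorted = [ ...numbers ].sort( ( a, b ) => a - b );
	const middle = Math.floor( sorted.length / 2 );
	return sorted.length % 2 !== 0
		? sorted[ middle ]
		: ( sorted[ middle - 1 ] + sorted[ middle ] ) / 2;
}

// Example: time to first byte samples (in milliseconds) from five iterations.
console.log( median( [ 120, 135, 118, 400, 122 ] ) ); // 122
```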
First, you need to determine what you want to measure. When running performance tests with an actual browser, it’s of course interesting to simply measure how fast your pages load. From there you can expand to measuring more client-side metrics such as specific user interactions or Web Vitals like Largest Contentful Paint. But you can also focus more on server-side metrics, such as how long it takes for your plugin to perform a specific task during page load. It all depends on your specific project requirements.
Writing your first performance test
Putting all of these pieces together, we can turn a simple end-to-end test into a performance test. Let’s track the time to first byte (TTFB) as a start.
```js
import { test } from '@wordpress/e2e-test-utils-playwright';

test.describe( 'Front End', () => {
	test.use( {
		storageState: {}, // User will be logged out.
	} );

	test.beforeAll( async ( { requestUtils } ) => {
		await requestUtils.activateTheme( 'twentytwentyone' );
	} );

	const iterations = 20;
	for ( let i = 1; i <= iterations; i++ ) {
		test( `Measure TTFB (${ i } of ${ iterations })`, async ( {
			page,
			metrics,
		} ) => {
			await page.goto( '/' );
			const ttfb = await metrics.getTimeToFirstByte();
			console.log( `TTFB: ${ ttfb }` );
		} );
	}
} );
```
What stands out is that Playwright’s storageState is reset for the tests, ensuring that tests are performed as a logged-out user. This is because being logged in could skew results. Of course for some other scenarios this is not necessarily desired. It all depends on what you are testing.
Second, a for loop around the test() block allows running the test multiple times. It’s worth noting that the loop should be outside the test and not inside. This way, Playwright can ensure proper test isolation, so that a new page is created with every iteration. It will be completely isolated from the other pages, like in incognito mode.
The metrics object used in the test is a so-called test fixture provided by, you guessed it, the e2e utils package we’ve previously installed. How convenient! From now on, most of the time we will be using this fixture.
Measuring all the things
Server-Timing
The Server-Timing HTTP response header is a way for the server to send information about server-side metrics to the client. This is useful to get answers for things like:
- Was there a cache hit?
- How long did it take to load translations?
- How long did it take to load X from the database?
- How many database queries were performed?
- How much memory was used?
The last two are admittedly a bit of a stretch. Server-Timing is meant for duration values, not counts. But it’s the most convenient way to send such metrics because they can be processed in JavaScript even after a page navigation. For Playwright-based performance testing this is perfect.
In WordPress, the easiest way to add Server-Timing headers is by using the Performance Lab plugin. By default it supports exposing the following metrics:
- wp-before-template: Time it takes for WordPress to initialize, i.e. from the start of WordPress’s execution until it begins sending the template output to the client.
- wp-template: Time it takes to compute and render the template, which begins right after the above metric has been measured.
- wp-total: Time it takes for WordPress to respond entirely, i.e. this is simply the sum of wp-before-template + wp-template.
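Together, those default metrics arrive as a single response header; the durations below are illustrative values in milliseconds:

```
Server-Timing: wp-before-template;dur=45.2, wp-template;dur=112.7, wp-total;dur=157.9
```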
Additional metrics can be added via the perflab_server_timing_register_metric() function. For example, this adds the number of database queries to the header:
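```php
<?php
add_action(
	'plugins_loaded',
	static function () {
		// Bail if the Performance Lab plugin is not active.
		if ( ! function_exists( 'perflab_server_timing_register_metric' ) ) {
			return;
		}

		perflab_server_timing_register_metric(
			'db-queries',
			array(
				'measure_callback' => static function ( $metric ) {
					// Set the value as late as possible, right before the
					// header is sent, so all queries are counted.
					add_action(
						'perflab_server_timing_send_header',
						static function () use ( $metric ) {
							$metric->set_value( get_num_queries() );
						}
					);
				},
				'access_cap'       => 'exist',
			)
		);
	}
);
```

This sketch follows the pattern from the Performance Lab documentation; double-check the plugin's current docs for the exact `measure_callback` signature before relying on it.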
Besides getServerTiming() and getTimeToFirstByte(), the metrics fixture provides a handful of other helpers to measure certain load time metrics to make your life easier:
- getLargestContentfulPaint(): Returns the Largest Contentful Paint (LCP) value using the dedicated API.
- getCumulativeLayoutShift(): Returns the Cumulative Layout Shift (CLS) value using the dedicated API.
- getLoadingDurations(): Returns the loading durations using the Navigation Timing API. All the durations exclude the server response time. The returned object contains serverResponse, firstPaint, domContentLoaded, loaded, firstContentfulPaint, and timeSinceResponseEnd.
Some of these methods are mostly there because it’s trivial to retrieve the metrics, but not all of these might make sense for your use case.
Tracing
The metrics fixture provides an easy way to access Chromium’s trace event profiling tool. It allows you to get more insights into what Chrome is doing “under the hood” when interacting with a page. To give you an example of what this means, in Gutenberg this is used to measure things like typing speed.
```js
// Start tracing.
await metrics.startTracing();

// Type the testing sequence into the empty paragraph.
await paragraph.type( 'x'.repeat( iterations ) );

// Stop tracing.
await metrics.stopTracing();

// Get the durations.
const [ keyDownEvents, keyPressEvents, keyUpEvents ] =
	metrics.getTypingEventDurations();
```
In addition to getTypingEventDurations() there are also getSelectionEventDurations(), getClickEventDurations(), and getHoverEventDurations().
Lighthouse reports
The @wordpress/e2e-test-utils-playwright package has basic support for running Lighthouse reports for a given page. Support is basic because it only performs a handful of audits and does not yet allow any configuration. Also, due to the way Lighthouse works, it’s much slower than taking similar measurements by hand using simple JavaScript snippets. That’s because it does a lot of things under the hood like applying CPU and network throttling to emulate mobile connection speeds. Still, it can be useful to compare numbers and provide feedback to the folks working on this package to further improve it. A basic example:
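```js
const report = await lighthouse.getReport();
```

Here, `lighthouse` is another test fixture from the same e2e utils package; treat the exact shape of the returned report as subject to change while support is still basic.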
This one line is enough to run a Lighthouse report, which involves opening a new isolated browser instance on a dedicated port for running tests in.
Interactivity metrics
The metrics fixture already provides ways to get some Web Vitals values such as First Contentful Paint (FCP), Largest Contentful Paint (LCP), and Cumulative Layout Shift (CLS). They cover loading performance and layout stability. Interaction to Next Paint (INP), a pending Core Web Vital metric that will replace First Input Delay (FID) in March 2024, is notably absent from that list. That’s because it’s not so trivial to retrieve, as it requires user interaction.
INP is a metric that assesses a page’s overall responsiveness to user interactions. It does so by observing the latency of all click, tap, and keyboard interactions that occur throughout the lifespan of a user’s visit to a page. The final INP value is the longest interaction observed, ignoring outliers. So how can you measure that reliably in an automated test? Enter the web-vitals library.
This library is the easiest way to measure all the Web Vitals metrics in a way that accurately matches how they’re measured by Chrome and reported to tools like PageSpeed Insights.
As of very recently (i.e. it’s not even released yet!), the metrics fixture has preliminary support for web-vitals.js and allows measuring web vitals using one simple method:
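```js
console.log( await metrics.getWebVitals() );
```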
Under the hood, this will refresh the page while simultaneously loading the library and collecting numbers. This will measure CLS, FCP, FID, INP, LCP, and TTFB metrics for the given page and return all the ones that exist.
Again, metrics like INP require user interaction. To accommodate for that, separate the loading and collection part like so:
```js
await metrics.initWebVitals( /* reload */ false );
await page.goto( '/some-other-page/' ); // web-vitals.js will be loaded now.

// Interact with page here...

console.log( await metrics.getWebVitals() );
```
You may find that retrieving web vitals using this single method is easier than calling separate getLargestContentfulPaint() and getCumulativeLayoutShift() methods, though the reported numbers will be identical. In the future these methods may be consolidated into one.
Making sense of performance metrics
With the foundation for running performance tests set and all these functions to retrieve various metrics available, the next step is to actually collect all this data in a uniform way. What’s needed is a way to store results and ideally compare them with earlier results. This is so you are actually able to make sense of all these metrics and to identify performance regressions.
For this purpose, I’ve built a custom test reporter that takes metrics collected in tests and combines them all in one single file. Then, a second command-line script formats the data and optionally does a comparison as well. The reporter and the CLI script are both available on the demo GitHub repository, together with all the other example code from this article. Here’s an example output by this script:
Note: there is work underway to further refine these scripts and make them more easily available through a dedicated npm package. Imagine a single package like @wordpress/performance-tests that provides the whole suite of tools ready to go! This is currently being discussed and I will update this post accordingly when something like this happens.
In a GitHub Action, you would run this combination of scripts in this order:
1. Start the web server (optional, as Playwright will otherwise start it for you)
2. Run the tests
3. Optionally run the tests again for the previous commit or target branch. The raw results are also available as a build artifact and a step output, so you don't have to unnecessarily run tests twice and can reuse previous results.
4. Run the CLI script to format the results and optionally compare them with the ones from step 3
Eventually you will get to a point where you want to see the bigger picture and track your project’s performance over time. For example using a dedicated dashboard such as the one WordPress core currently uses.
When doing so, you will inevitably need to deal with data storage and visualization, and with things like variance between individual runs. These are not yet solved problems, for WordPress projects or in general. In a future blog post I plan to go more in depth on this side of things and show you how to set it up for your project, allowing you to make more sense of performance metrics over time.
Conclusion
With the foundation from this blog post you should be able to start writing and running your first performance tests for your WordPress project. However, there is still a lot that can be covered and optimized, as performance testing can be quite a complex matter.
And of course please let me know your thoughts in the comments so the team and I can further refine the techniques shared in this post. Thanks for reading!