Tag: Playwright

  • Automated testing using WordPress Playground and Blueprints

    Learn how to leverage WordPress Playground and Blueprints for automated end-to-end browser and performance testing.

    Late last year I published a detailed tutorial for getting started with end‑to‑end performance testing in WordPress. It was accompanied by a template GitHub repository and a dedicated GitHub Action for effortlessly running performance tests with zero setup.

    Introductory blog post to browser-based performance testing in WordPress projects

    While that GitHub Action works extremely well, the zero-setup approach has two drawbacks:

    1. It is not possible to configure the test environment, for example by adding demo content or changing plugin configuration.
    2. It is not possible to test more complex scenarios, such as ones involving user interactions (e.g. for INP).

    For (2), the best alternative right now is to go with the manual approach. For (1), I have now found a solution in WordPress Playground. Playground is a platform that lets you run WordPress instantly on any device, and it can be seen as a replacement for the Docker-based @wordpress/env tool.

    A real playground but just as fun! Photo by Hudson Roseboom on Unsplash

    Using Blueprints for automated testing

    One particular strength of WordPress Playground is the idea of Blueprints. Blueprints are JSON files for setting up your WordPress Playground instance. In other words, they are a declarative way of configuring WordPress—like a recipe. A blueprint for installing a specific theme and plugin could look like this:

    {
    	"steps": [
    		{
    			"step": "installPlugin",
    			"pluginZipFile": {
    				"resource": "wordpress.org/plugins",
    				"slug": "performance-lab"
    			}
    		},
    		{
    			"step": "installTheme",
    			"themeZipFile": {
    				"resource": "wordpress.org/themes",
    				"slug": "twentytwentyone"
    			}
    		}
    	]
    }

    Performance testing 2.0

    The newly released version 2 of the performance testing GitHub Action now uses Blueprints under the hood to set up the testing environment and do things like importing demo content and installing mandatory plugins and themes. In addition to that, you can now use Blueprints for your own dedicated setup!

    This way you can install additional plugins, change the site language, define some options, or even run arbitrary WP-CLI commands. There are tons of possible steps and also a Blueprints Gallery with real-world code examples.

    To get started, add a new swissspidy/wp-performance-action@v2 step to your workflow (e.g. .github/workflows/build-test.yml):

    steps:
      - name: Checkout
        uses: actions/checkout@v4
      
      - name: Run performance tests
        uses: swissspidy/wp-performance-action@v2
        with:
          urls: |
            /
            /sample-page/
          plugins: |
            ./my-awesome-plugin
          blueprint: ./my-custom-blueprint.json
          iterations: 5
          repetitions: 1

    Then, add the blueprint (my-custom-blueprint.json):

    {
      "$schema": "https://playground.wordpress.net/blueprint-schema.json",
      "plugins": [
        "performant-translations",
        "akismet"
      ],
      "steps": [
        {
          "step": "defineWpConfigConsts",
          "consts": {
            "WP_DEBUG": true
          }
        },
        {
          "step": "activatePlugin",
          "pluginName": "My Awesome Plugin",
          "pluginPath": "/wordpress/wp-content/plugins/my-awesome-plugin"
        }
      ]
    }

    And that’s it!

    The GitHub Action will now use your custom blueprint to install and activate your own custom plugin as well as the performant-translations and akismet plugins from the plugin directory.

    Alongside this new feature I also included several bug fixes for things I originally planned to add but never really finished. For instance, it is now actually possible to run the performance tests twice and then compare the difference between the results.

    This way, when you submit a pull request you can run tests first for the main branch and then for your PR branch to see at a glance how the PR affects performance. Here is an example:

    jobs:
      comparison:
        runs-on: ubuntu-latest
    
        steps:
    
        # Check out the target branch and build the plugin
        # ...
    
        - name: Run performance tests (before)
          id: before
          uses: ./
          with:
            urls: |
              /
              /sample-page/
            plugins: |
              ./tests/dummy-plugin
            blueprint: ./my-custom-blueprint.json
            print-results: false
            upload-artifacts: false
    
        # Check out the current branch and build the plugin
        # ...
    
        - name: Run performance tests (after)
          uses: ./
          with:
            urls: |
              /
              /sample-page/
            plugins: |
              ./tests/dummy-plugin
            blueprint: ./my-custom-blueprint.json
            previous-results: ${{ steps.before.outputs.results }}
            print-results: true
            upload-artifacts: false

    The result will look a bit like this:

    Screenshot of the performance test results printed in a GitHub Actions workflow summary, comparing metrics such as LCP or memory usage before and after a change.
    Example workflow summary when comparing two sets of performance testing results.

    Playground is the future

    Being able to use Playground for automated testing is really exciting. It simplifies a lot of the setup and speeds up the bootstrapping, even though the sites themselves aren’t as fast (yet) as when using a Docker-based setup. However, there is a lot of momentum behind WordPress Playground and it is getting better every day. Applications like this one further help push its boundaries.

    I have had similar success so far when testing Playground with our WordPress performance comparison script, and I think it could work well for the Plugin Check GitHub Action.

    WordPress Playground clearly is the future.

  • Getting started with end‑to‑end performance testing in WordPress

    Learn how to set up Playwright-based end-to-end performance testing for your own WordPress project.

    Introduction

    End-to-end (E2E) tests are a type of software testing that verifies the behavior of a software application from, well, end to end. They simulate an actual user interacting with the application to verify that it behaves as expected. E2E tests are important because they can help to identify and fix bugs that may not be caught by unit tests or other types of testing. Additionally, they can help to ensure that the application is performing as expected under real-world conditions, with real user flows that are typical for the application. This means starting an actual web server, installing WordPress, and interacting with the website through a browser. For example, the majority of the block editor is covered extensively by end-to-end tests.

    Performance testing

    Browser-based performance testing is a subset of this kind of testing. Such tests measure the speed and reactivity of the website in order to find performance regressions. This includes common metrics such as Web Vitals or page load time, but also dedicated metrics that are more tailored to your project. For instance, Gutenberg tracks things like typing speed and the time it takes to open the block inserter.

    Both WordPress core and Gutenberg use Playwright for end-to-end and performance tests. It supports multiple browsers and operating systems, and provides a great developer experience thanks to a resilient API and powerful tooling. If you know Puppeteer, Playwright is a forked and enhanced version of it. The WordPress project is actually still undergoing a migration from Puppeteer to Playwright.

    This article shows how to set up Playwright-based end-to-end tests for your own project, with a focus on performance testing. To familiarize yourself with how Playwright works, explore their Getting Started guide. Would you like to jump straight to the code? Check out this example project on GitHub! It provides a ready-to-use boilerplate for Playwright-based performance tests that you can add to your existing project.

    Using a one-stop solution for performance testing

    Before diving right into the details of writing performance tests and fiddling with reporting, there is also a shortcut to get your feet wet.

    Most of what I cover in this article is also available in a single, ready-to-use GitHub Action. You can easily add it to almost any project with little to no configuration. Here’s an example of the minimum setup needed:

    name: Performance Tests
    on:
      push:
        branches: [ main ]
      pull_request:
        branches: [ main ]
    jobs:
      performance-tests:
        timeout-minutes: 60
        runs-on: ubuntu-latest
    
        steps:
          - name: Checkout
            uses: actions/checkout@v4
    
          - name: Run performance tests
            uses: swissspidy/wp-performance-action@v2
            with:
              plugins: |
                ./path-to-my-awesome-plugin
              urls: |
                /
                /sample-page/

    Using this action will spin up a new WordPress installation, install your desired plugins and themes, run Playwright tests against the provided pages on that site, and print easy-to-understand results to the workflow summary.

    Performance test results summary on GitHub, showing various collected metrics for a given commit, including things like the number of database queries and total load time.

    This one-stop solution allows you to quickly get started with performance testing in a WordPress context and helps you familiarize yourself with the topic. It might even cover all of your needs already, which would be even better! Another big advantage of such a GitHub Action is that you will automatically benefit from new changes made to it. And if you ever need more, continue reading below to learn how you can do it yourself.

    Update (September 2024): check out my follow-up post about v2 of this GitHub Action using WordPress Playground.

    Setting up Playwright tests for a WordPress plugin/theme

    Reminder: if you want a head start on setting up Playwright tests, check out the example project on GitHub. It provides a ready-to-use boilerplate with everything that’s covered below.

    This article assumes that you are developing WordPress blocks, plugins, themes, or even a whole WordPress site, and are familiar with the common @wordpress/scripts and @wordpress/env toolstack. The env package allows you to quickly spin up a local WordPress site using Docker, whereas the scripts package offers a range of programs to lint, format, build, and test your code. This conveniently includes Playwright tests! In addition to that, the @wordpress/e2e-test-utils-playwright package offers a set of useful helpers for writing Playwright tests for a WordPress project.

    All you need to get started is installing these packages using npm:

    npm install --save-dev @wordpress/scripts @wordpress/env @wordpress/e2e-test-utils-playwright

    To start your server right away, you can run the following command:

    npx --package=@wordpress/env wp-env start

    Check out the @wordpress/env documentation on how to further configure or customize your local environment, for example to automatically install and activate your plugin/theme in this new WordPress site.

    Note: if you already have @wordpress/env installed or use another local development environment, skip this step and use your existing setup.

    To run the Playwright tests with @wordpress/scripts, use the command npx wp-scripts test-playwright. If you have a custom Playwright configuration file in your project root directory, it will be automatically picked up. Otherwise, provide the path like so:

    npx wp-scripts test-playwright --config tests/performance/playwright.config.ts

    In a custom config file like this one you can override some of the details from the default configuration provided by @wordpress/scripts. Refer to the documentation for a list of possible options. Most commonly, you would need this to customize the default test timeout, the directory where test artifacts are stored, or how often each test should be repeated.
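    For illustration, a custom config that extends the defaults from @wordpress/scripts might look roughly like this (a sketch; the import path for the base config and the exact overrides are assumptions, so double-check them against the @wordpress/scripts documentation):

    // tests/performance/playwright.config.ts
    import { defineConfig } from '@playwright/test';
    import baseConfig from '@wordpress/scripts/config/playwright.config.js';

    export default defineConfig( {
      ...baseConfig,
      // Give slower performance scenarios more time than the default.
      timeout: 120_000,
      // Where traces, screenshots, and other test artifacts are stored.
      outputDir: 'artifacts',
      // Repeat each test, as an alternative to a loop inside the test file.
      repeatEach: 1,
    } );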

    Writing your first end-to-end browser test

    The aforementioned utilities package hides most of the complexity of writing end-to-end tests and provides functionality for the most common interactions with WordPress. Your first test could be as simple as this:

    import { test, expect } from '@wordpress/e2e-test-utils-playwright';
    
    test.describe( 'Dashboard', () => {
      test.beforeAll( async ( { requestUtils } ) => {
        await requestUtils.activateTheme( 'twentytwentyone' );
      } );
    
      test( 'Should load properly', async ( { admin, page } ) => {
        await admin.visitAdminPage( '/' );
        await expect(
            page.getByRole('heading', { name: 'Welcome to WordPress', level: 2 })
        ).toBeVisible();
      } );
    } );

    When running npx wp-scripts test-playwright, this test visits /wp-admin/ and waits for the “Welcome to WordPress” meta box heading to be visible. And before all tests run (in this case there is only one), it ensures the Twenty Twenty-One theme is activated. That’s it! No need to wait for the page to load or anything; Playwright handles everything for you. And thanks to the locator API, the test is self-explanatory as well.

    Locators are very similar to what’s offered by Testing Library in case you have used that one before. But if you are new to this kind of testing API, it’s worth strolling through the documentation a bit more.
    That said, the easiest way to write a new Playwright test is by recording one using the test generator. That’s right, Playwright comes with the ability to generate tests for you as you perform actions in the browser and will automatically pick the right locators for you. This can be done using a VS Code extension or by simply running npx playwright codegen. Very handy!

    Setting up performance tests

    The jump from a simple end-to-end test to a performance test is not so big. The key difference is performing some additional tasks after visiting a page, further processing the collected metrics, and then repeating all that multiple times to get more accurate results. This is where things get interesting!

    First, you need to determine what you want to measure. When running performance tests with an actual browser, it’s of course interesting to simply measure how fast your pages load. From there you can expand to measuring more client-side metrics such as specific user interactions or Web Vitals like Largest Contentful Paint. But you can also focus more on server-side metrics, such as how long it takes for your plugin to perform a specific task during page load. It all depends on your specific project requirements.

    Writing your first performance test

    Putting all of these pieces together, we can turn a simple end-to-end test into a performance test. Let’s track the time to first byte (TTFB) as a start.

    import { test } from '@wordpress/e2e-test-utils-playwright';
    
    test.describe( 'Front End', () => {
      test.use( {
        storageState: {}, // User will be logged out.
      } );
    
      test.beforeAll( async ( { requestUtils } ) => {
        await requestUtils.activateTheme( 'twentytwentyone' );
      } );
    
      const iterations = 20;
      for ( let i = 1; i <= iterations; i++ ) {
        test( `Measure TTFB (${ i } of ${ iterations })`, async ( {
          page,
          metrics,
        } ) => {
          await page.goto( '/' );
    
          const ttfb = await metrics.getTimeToFirstByte();
    
          console.log( `TTFB: ${ttfb}`);
        } );
      }
    } );

    What stands out is that Playwright’s storageState is reset for the tests, ensuring that they run as a logged-out user. This is because being logged in could skew results. Of course, for some other scenarios this is not necessarily desired. It all depends on what you are testing.

    Second, a for loop around the test() block allows running the test multiple times. It’s worth noting that the loop should be outside the test and not inside. This way, Playwright can ensure proper test isolation, so that a new page is created with every iteration. It will be completely isolated from the other pages, like in incognito mode.

    The metrics object used in the test is a so-called test fixture provided by, you guessed it, the e2e utils package we’ve previously installed. How convenient! From now on, most of the time we will be using this fixture.

    Measuring all the things

    Server-Timing

    The Server-Timing HTTP response header is a way for the server to send information about server-side metrics to the client. This is useful to get answers to questions like:

    • Was there a cache hit?
    • How long did it take to load translations?
    • How long did it take to load X from the database?
    • How many database queries were performed?
    • How much memory was used?

    The last two are admittedly a bit of a stretch, as Server-Timing is meant for duration values, not counts. But it’s the most convenient way to send such metrics because they can be processed in JavaScript even after a page navigation. For Playwright-based performance testing this is perfect.
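    The reason this works is that browsers expose the Server-Timing header on the navigation (and resource) timing entries, so the values remain readable from client-side JavaScript. A minimal sketch of reading them by hand in the browser, without any helpers:

    // Read Server-Timing values from the navigation entry.
    const [ navigation ] = performance.getEntriesByType( 'navigation' );

    for ( const { name, duration, description } of navigation.serverTiming ) {
      console.log( name, duration, description );
    }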

    In WordPress, the easiest way to add Server-Timing headers is by using the Performance Lab plugin. By default it supports exposing the following metrics:

    • wp-before-template: Time it takes for WordPress to initialize, i.e. from the start of WordPress’s execution until it begins sending the template output to the client.
    • wp-template: Time it takes to compute and render the template, which begins right after the above metric has been measured.
    • wp-total: Time it takes for WordPress to respond entirely, i.e. this is simply the sum of wp-before-template + wp-template.

    Additional metrics can be added via the perflab_server_timing_register_metric() function. For example, this adds the number of database queries to the header:

    add_action(
    	'plugins_loaded',
    	static function() {
    		if ( ! function_exists( 'perflab_server_timing_register_metric' ) ) {
    			return;
    		}
    	
    		perflab_server_timing_register_metric(
    			'db-queries',
    			array(
    				'measure_callback' => static function( Perflab_Server_Timing_Metric $metric ) {
    					add_action(
    						'perflab_server_timing_send_header',
    						static function() use ( $metric ) {
    							global $wpdb;
    							$metric->set_value( $wpdb->num_queries );
    						}
    					);
    				},
    				'access_cap'       => 'exist',
    			)
    		);
    	}
    );

    Once you have everything set up, WordPress will send an HTTP header like this with every response:

    Server-Timing: wp-before-template;dur=110.73, wp-template;dur=143, wp-total;dur=253.73, wp-db-queries;dur=27

    Again, the number of database queries is not a duration, but if it works, it works!

    In Playwright tests, this is how you can retrieve the values:

    test.describe( 'Homepage', () => {
      test( 'Server-Timing', async ( { page, metrics } ) => {
        await page.goto( '/' );
        const serverTiming = await metrics.getServerTiming();
    
        // {
        //   'wp-before-template': 110.73,
        //   'wp-template': 143,
        //   'wp-total': 253.73,
        //   'wp-db-queries': 27,
        // }
      } );
    } );

    Load time metrics

    Besides getServerTiming() and getTimeToFirstByte(), the metrics fixture provides a handful of other helpers for measuring certain load time metrics, to make your life easier:

    • getLargestContentfulPaint: Returns the Largest Contentful Paint (LCP) value using the dedicated API.
    • getCumulativeLayoutShift: Returns the Cumulative Layout Shift (CLS) value using the dedicated API.
    • getLoadingDurations: Returns the loading durations using the Navigation Timing API. All the durations exclude the server response time. The returned object contains serverResponse, firstPaint, domContentLoaded, loaded, firstContentfulPaint, timeSinceResponseEnd.

    Some of these methods are there mostly because the metrics are trivial to retrieve; not all of them will necessarily make sense for your use case.
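    To give a rough idea of how these helpers fit into a test, here is a small sketch using the methods listed above (the page URL is just an example):

    import { test } from '@wordpress/e2e-test-utils-playwright';

    test( 'Load time metrics', async ( { page, metrics } ) => {
      await page.goto( '/' );

      const lcp = await metrics.getLargestContentfulPaint();
      const cls = await metrics.getCumulativeLayoutShift();
      const { serverResponse, firstContentfulPaint, loaded } =
        await metrics.getLoadingDurations();

      console.log( { lcp, cls, serverResponse, firstContentfulPaint, loaded } );
    } );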

    Tracing

    The metrics fixture provides an easy way to access Chromium’s trace event profiling tool. It allows you to get more insights into what Chrome is doing “under the hood” when interacting with a page. To give you an example of what this means, in Gutenberg this is used to measure things like typing speed.

    // Start tracing.
    await metrics.startTracing();
    
    // Type the testing sequence into the empty paragraph.
    await paragraph.type( 'x'.repeat( iterations ) );
    
    // Stop tracing.
    await metrics.stopTracing();
    
    // Get the durations.
    const [ keyDownEvents, keyPressEvents, keyUpEvents ] =
        metrics.getTypingEventDurations();

    In addition to getTypingEventDurations() there are also getSelectionEventDurations(), getClickEventDurations(), and getHoverEventDurations().
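    By analogy with the typing example above, measuring the latency of a click might look like this (a sketch: the button locator is hypothetical, and I’m assuming the click helper returns its durations in the same array shape as the typing helper):

    // Start tracing before the interaction we want to measure.
    await metrics.startTracing();

    // The button locator here is purely hypothetical.
    await page.getByRole( 'button', { name: 'Save draft' } ).click();

    // Stop tracing and pull out the click event durations.
    await metrics.stopTracing();
    const [ clickEvents ] = metrics.getClickEventDurations();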

    Lighthouse reports

    The @wordpress/e2e-test-utils-playwright package has basic support for running Lighthouse reports for a given page. Support is basic because it only performs a handful of audits and does not yet allow any configuration. Also, due to the way Lighthouse works, it’s much slower than taking similar measurements by hand using simple JavaScript snippets. That’s because it does a lot of things under the hood like applying CPU and network throttling to emulate mobile connection speeds. Still, it can be useful to compare numbers and provide feedback to the folks working on this package to further improve it. A basic example:

    test.describe( 'Homepage', () => {
      test( 'Lighthouse', async ( { page, lighthouse } ) => {
        await page.goto( '/' );
        const report = await lighthouse.getReport();
    
        // {
        //   'LCP': 123,
        //   'TBT': 456,
        //   'TTI': 789,
        //   'CLS': 0.01,
        //   'INP': 321,
        // }
      } );
    } );

    This one line is enough to run a Lighthouse report, which involves opening a new, isolated browser instance on a dedicated port to run the audits in.

    Interactivity metrics

    The metrics fixture already provides ways to get some Web Vitals values such as First Contentful Paint (FCP), Largest Contentful Paint (LCP), and Cumulative Layout Shift (CLS). They cover loading performance and layout stability. Interaction to Next Paint (INP), a pending Core Web Vital metric that will replace First Input Delay (FID) in March 2024, is notably absent from that list. That’s because it’s not so trivial to retrieve, as it requires user interaction.

    INP is a metric that assesses a page’s overall responsiveness to user interactions. It does so by observing the latency of all click, tap, and keyboard interactions that occur throughout the lifespan of a user’s visit to a page. The final INP value is the longest interaction observed, ignoring outliers. So how can you measure that reliably in an automated test? Enter the web-vitals library.

    This library is the easiest way to measure all the Web Vitals metrics in a way that accurately matches how they’re measured by Chrome and reported to tools like PageSpeed Insights.
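    Used directly in the browser, the library looks roughly like this (a sketch based on its documented callback API):

    import { onCLS, onINP, onLCP } from 'web-vitals';

    // Each callback fires when the corresponding metric is (re)reported.
    onCLS( ( metric ) => console.log( 'CLS', metric.value ) );
    onINP( ( metric ) => console.log( 'INP', metric.value ) );
    onLCP( ( metric ) => console.log( 'LCP', metric.value ) );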

    As of very recently (i.e. it’s not even released yet!), the metrics fixture has preliminary support for web-vitals.js and allows measuring web vitals using one simple method:

    await page.goto( '/' );
    
    console.log( await metrics.getWebVitals() );

    Under the hood, this will refresh the page while simultaneously loading the library and collecting numbers. This will measure CLS, FCP, FID, INP, LCP, and TTFB metrics for the given page and return all the ones that exist.

    Again, metrics like INP require user interaction. To accommodate that, separate the loading and collection parts like so:

    await metrics.initWebVitals( /* reload */ false );
    await page.goto( '/some-other-page/' ); // web-vitals.js will be loaded now.
    // Interact with page here...
    console.log( await metrics.getWebVitals() );
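    To make that concrete, a sketch of measuring INP around a single interaction could look like this (the URL and the button name are hypothetical):

    await metrics.initWebVitals( /* reload */ false );
    await page.goto( '/shop/' ); // web-vitals.js is loaded during this navigation.

    // Interact with the page so that INP has something to observe.
    await page.getByRole( 'button', { name: 'Add to cart' } ).click();

    console.log( await metrics.getWebVitals() ); // Now includes an INP value.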

    You may find that retrieving web vitals using this single method is easier than calling separate getLargestContentfulPaint() and getCumulativeLayoutShift() methods, though the reported numbers will be identical. In the future these methods may be consolidated into one.

    Making sense of performance metrics

    With the foundation for running performance tests set and all these functions to retrieve various metrics available, the next step is to actually collect all this data in a uniform way. What’s needed is a way to store results and ideally compare them with earlier results. This is so you are actually able to make sense of all these metrics and to identify performance regressions.

    For this purpose, I’ve built a custom test reporter that takes metrics collected in tests and combines them all in one single file. Then, a second command-line script formats the data and optionally does a comparison as well. The reporter and the CLI script are both available on the demo GitHub repository, together with all the other example code from this article, where you can also find example output from the comparison script.
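    To illustrate the general idea (a simplified sketch, not the actual reporter from the demo repository): tests can attach their collected metrics to the test result, for example via testInfo.attach( 'results', { body: JSON.stringify( data ), contentType: 'application/json' } ), and a custom Playwright reporter then merges those attachments into a single file.

    // performance-reporter.js: a minimal sketch of such a reporter.
    import { writeFileSync } from 'node:fs';

    class PerformanceReporter {
      constructor() {
        this.results = {};
      }

      onTestEnd( test, result ) {
        // Look for a JSON attachment named "results" added by the test.
        const attachment = result.attachments.find( ( { name } ) => name === 'results' );

        if ( attachment?.body ) {
          this.results[ test.title ] = JSON.parse( attachment.body.toString( 'utf8' ) );
        }
      }

      onEnd() {
        // Combine everything into one file for later formatting and comparison.
        writeFileSync( 'performance-results.json', JSON.stringify( this.results, null, 2 ) );
      }
    }

    export default PerformanceReporter;

    Such a reporter would then be registered via the reporter option in the custom Playwright config.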

    Note: there is work underway to further refine these scripts and make them more easily available through a dedicated npm package. Imagine a single package like @wordpress/performance-tests that provides the whole suite of tools ready to go! This is currently being discussed and I will update this post accordingly when something like this happens.

    In a GitHub Action, you would run this combination of scripts in this order:

    1. Start web server (optional, as Playwright will start it for you otherwise)
    2. Run tests
    3. Optionally run tests for the previous commit or target branch
      1. The raw results are also available as a build artifact and a step output. This way you don’t have to unnecessarily run tests twice but can reuse previous results
    4. Run the CLI script to format results and optionally compare them with the ones from step 3

    You can find an example of what that could look like in the demo repository.

    Tracking performance metrics over time

    Eventually you will get to a point where you want to see the bigger picture and track your project’s performance over time, for example using a dedicated dashboard such as the one WordPress core currently uses.

    Screenshot of the codevitals.run dashboard for WordPress core, tracking all the different metrics

    When doing so, you will inevitably need to deal with data storage and visualization, and with things like variance between individual runs. These are not yet solved problems, for WordPress projects or in general. In a future blog post I plan to go more in depth on this side of things and show you how to set this up for your project, allowing you to make more sense of performance metrics over time.
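    One piece you can already handle yourself is aggregating the per-iteration numbers: a robust statistic like the median, together with some measure of spread, is usually more meaningful than a single run or a plain average. A tiny helper for that could look like this (purely illustrative, not taken from the demo repository):

    // Aggregate a list of per-iteration measurements (e.g. TTFB values in milliseconds).
    function aggregate( values ) {
      const sorted = [ ...values ].sort( ( a, b ) => a - b );
      const mid = Math.floor( sorted.length / 2 );
      const median =
        sorted.length % 2 === 0
          ? ( sorted[ mid - 1 ] + sorted[ mid ] ) / 2
          : sorted[ mid ];

      const mean = values.reduce( ( sum, v ) => sum + v, 0 ) / values.length;
      const stdDev = Math.sqrt(
        values.reduce( ( sum, v ) => sum + ( v - mean ) ** 2, 0 ) / values.length
      );

      return { median, mean, stdDev };
    }

    console.log( aggregate( [ 251, 248, 310, 255, 249 ] ) );
    // → { median: 251, mean: 262.6, stdDev: ~23.8 }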

    Conclusion

    With the foundation from this blog post you should be able to start writing and running your first performance tests for your WordPress project. However, there is still a lot that can be covered and optimized, as performance testing can be quite a complex matter.

    For now, I do recommend checking out the WordPress performance tests GitHub action as well as the demo repository I set up with the complete testing setup. Oh, and bookmark the Playwright documentation just in case.

    This topic is obviously top of mind for me and the WordPress core performance team, so expect some more updates in this regard in the future. I do recommend following along on X/Twitter and the make/performance blog for updates on all things performance testing.

    And of course please let me know your thoughts in the comments so the team and I can further refine the techniques shared in this post. Thanks for reading!