Tag: FFmpeg

  • Client-side media processing in WordPress

    At WordCamp US 2024 I gave a presentation about client-side media processing, which is all about bringing WordPress’ media uploading and editing capabilities from the server to the browser. Watch the recording or check out the slides. This blog post is a written adaptation of this talk.

    This was a long overdue step after my announcement tweet in December 2023 went viral. After that, I talked about it on the WP Tavern Jukebox podcast and in an interview with WPShout. You might also remember an old post I wrote about client-side video optimization. This is a 10x evolution of that.

    A lot has changed since then. Not only have I built new features, but I also completely refactored the Media Experiments plugin that it was all part of. For WordCamp US, I chose to put more focus on the technical aspects of media handling in WordPress and the benefits of a new browser-based approach.

    If you haven’t seen it before, here’s a quick glimpse of what browser-based media processing allows us to do:

    Contributors wanted

    You can find everything covered in this article in the Media Experiments GitHub repository. It contains a working WordPress plugin that you can easily install on your site or even just test with one click using WordPress Playground.

    The goal is to eventually bring parts of this into WordPress core itself, which is something I am currently working on.

    To make this project a reality, I need your help! Please try the plugin and provide feedback for anything that catches your eye. Or even better, check out the source code and help tackle some of the open issues.

    Let WordPress help you

    The WordPress project has a clear philosophy, with pillars such as “Design for the majority” and “Decisions, not options”. There is one section in that philosophy which particularly stands out to me:

    The average WordPress user simply wants to be able to write without problems or interruption. These are the users that we design the software for

    WordPress philosophy

    In my experience, when it comes to uploading images and other types of media, WordPress sometimes falls short of that. There are still many problems and interruptions. This is even more problematic nowadays as the web is more media-rich than ever before.

    Minimizing frustration

    Who hasn’t experienced issues when uploading media to WordPress?

    Perhaps an image was too large and uploading took forever, maybe even resulting in a timeout. And if it worked, the image was so large that it degraded your site’s performance.

    Maybe you were trying to upload a photo from your phone, but WordPress doesn’t support it and tells you to please use another file. Or even worse, you upload a video, it succeeds, but then you realize none of the browsers actually support the video format.

    In short: uploading media is a frustrating experience.

    To work around these issues, you start googling how to manually convert and compress images before uploading them. Maybe even reducing the dimensions to save some bandwidth.

    If you use videos, maybe you upload them to YouTube because you don’t want to bother with video formats too.

    Maybe you switch your hosting provider because your server still takes too long to generate all those thumbnails or because it doesn’t support the newest image formats.

    This is tedious and time-consuming. WordPress should be taking work off your shoulders, not making your life harder. So I set out to make this better. I wanted to turn this around and let WordPress help you.

    At State of the Word 2023, WordPress co-founder Matt Mullenweg said the following about WordPress’ mission to Democratize Publishing:

    We take things that used to require advanced technical knowledge and make it accessible to everyone.

    Matt Mullenweg

    And I think media uploads are a perfect opportunity for us to apply this. The solution lies in the browser.

    WebAssembly

    Your server might not be capable of generating all those thumbnails or converting a specific image format to something usable. But thanks to your own device’s computing power and technologies such as WebAssembly, we can fix this for you.

    With WebAssembly you can compile code written in a language like Rust or C++ and run it in browsers with near-native performance. In the browser you can load WebAssembly modules via JavaScript and seamlessly send data back and forth.
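    To make the "load a module via JavaScript and send data back and forth" part concrete, here is a minimal sketch. The bytes are a tiny hand-assembled module that exports a single `add` function; it is purely illustrative and not part of the plugin. A real project would fetch a compiled `.wasm` file instead.

```typescript
// A tiny hand-assembled WebAssembly module exporting add(a, b) -> i32.
// Illustrative only; real modules are compiled from Rust, C++ and similar.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic number + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function using that type
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export it as "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: local.get 0, local.get 1, i32.add
]);

// Compile and instantiate the module, then hand back the exported function.
async function loadAdd(): Promise<(a: number, b: number) => number> {
  const { instance } = await WebAssembly.instantiate(wasmBytes);
  return instance.exports.add as (a: number, b: number) => number;
}
```

    Calling `loadAdd()` resolves to a plain JavaScript function, so numbers passed in from JavaScript are computed inside the WebAssembly module and returned like any other function result.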

    At the core of what I am showing you here is one such WebAssembly solution called wasm-vips. It is a port of the powerful libvips image processing library. That means any image operation that you can do with vips, you can now do in the browser.

    Vips vs. ImageMagick

    Vips is similar to ImageMagick, which WordPress typically uses, but has some serious advantages. For example, when WordPress loads vips in the browser it can always use the latest version, whereas on the server we have to use whatever version is available.

    Sometimes, those are really old versions that have certain bugs or don’t support more modern image formats like AVIF. For hosts it can be challenging to upgrade, as they don’t want to break any sites. And even if ImageMagick already supports a format like AVIF, it could be very slow. Vips on the other hand is more performant, has more features, and even for older formats like JPEG it uses newer encoders with better results.

    Client-side vs. server-side media processing

    Sequence diagram for server-side media processing

    Traditionally, when you drop an image into the editor or the media library, it is sent to WordPress straight away. There, ImageMagick creates thumbnails for every registered image size one by one. That means a lot of waiting until WordPress can proceed. Here is where timeouts usually happen.

    Eventually, once all the thumbnails are generated, WordPress creates a new attachment and sends it back to the editor. There, the editor can swap out the file you originally dropped with the final one returned by the server.

    Compare this to the client-side approach using the vips image library:

    Sequence diagram for client-side media processing

    Once you drop an image into the editor, a web worker creates thumbnails of it. A web worker runs in a separate thread from the editor, so none of the image processing affects your workflow. Plus, the cropping happens in parallel, which makes it a super fast process. Every thumbnail is then uploaded separately to the server. The server only has to do little work, just storing the file and returning the attachment data.

    You immediately see all the updates in the editor after every step, so you have a much faster feedback loop. With this approach, the chances for errors, timeouts or memory issues are basically zero.
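    The flow described above can be sketched roughly as follows. This is not the plugin's actual code: `resize` and `upload` are hypothetical callbacks standing in for the web worker (running vips) and the upload request, and `onUploaded` stands in for whatever updates the editor after each step.

```typescript
interface ImageSize {
  name: string;
  width: number;
  height: number;
}

// Hypothetical stand-ins: in a real setup, resize would message a web worker
// and upload would send the file to the server.
type Resize = (file: Uint8Array, size: ImageSize) => Promise<Uint8Array>;
type Upload = (sizeName: string, data: Uint8Array) => Promise<void>;

async function createAndUploadThumbnails(
  file: Uint8Array,
  sizes: ImageSize[],
  resize: Resize,
  upload: Upload,
  onUploaded: (sizeName: string) => void = () => {},
): Promise<void> {
  // Start every size at once; each thumbnail is uploaded as soon as it is
  // ready, without waiting for the others.
  await Promise.all(
    sizes.map(async (size) => {
      const thumbnail = await resize(file, size);
      await upload(size.name, thumbnail);
      onUploaded(size.name); // lets the UI reflect progress step by step
    }),
  );
}
```

    Because each size is handled independently, one slow or failing thumbnail does not block the others from being generated and uploaded, although in this sketch a single rejection still rejects the overall promise.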

    New use cases

    The Media Experiments plugin contains tons of media-related features and enhancements. In this section I want to highlight some of them to better demonstrate what this new technology unlocks in WordPress.

    Image compression

    As shown in the demo at the beginning of the article, a key feature is the ability to compress or convert images directly in the browser. This works for existing images as well as new ones. All the thumbnails are generated in the browser as well.

    Bonus: Did you see it? The plugin automatically adds AI-generated image captions and alt text for the image. This simply wouldn’t be possible on a typical WordPress server, but thanks to WebAssembly we can easily use AI models for such a task in the browser.

    You can also compress all existing images in a blog post at once. The images can come from all sorts of sources too, for example from the image block, gallery block, or the post’s featured image.

    In theory you could even do this for the whole media library. The tricky part of course is that your browser needs to be running. So that idea isn’t fully fleshed out yet.

    Smart thumbnails

    By default, when WordPress creates those 150×150 thumbnails it does a hard crop in the center of the image. For some photos that will lead to poor results where for example it cuts off the most relevant part of the picture, like a person’s head.

    Vips supports saliency-aware image cropping out of the box, which looks for things like color saturation to determine a better crop.

    Comparison of default vs. smart image cropping

    At first you might think it is just a minor detail, but it’s actually really impactful. It just works, and it works for everybody! You will never have to worry about accidentally cropping off someone’s face again.
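    To see why the default can cut off the interesting part, here is a small sketch of how a centered hard crop is computed. The function name and return shape are my own for illustration; WordPress core does this in PHP. A saliency-aware crop keeps the same width and height but picks different `left` and `top` offsets based on the image content.

```typescript
interface CropRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Largest region with the target aspect ratio, centered in the source image.
// The region is then scaled down to the target size (e.g. 150x150).
function centerCrop(
  srcWidth: number,
  srcHeight: number,
  targetWidth: number,
  targetHeight: number,
): CropRect {
  const targetRatio = targetWidth / targetHeight;
  let width = srcWidth;
  let height = srcHeight;
  if (srcWidth / srcHeight > targetRatio) {
    width = Math.round(srcHeight * targetRatio); // source too wide: trim the sides
  } else {
    height = Math.round(srcWidth / targetRatio); // source too tall: trim top and bottom
  }
  return {
    left: Math.floor((srcWidth - width) / 2),
    top: Math.floor((srcHeight - height) / 2),
    width,
    height,
  };
}
```

    For a 1200×800 landscape photo and a square thumbnail, this keeps the middle 800×800 region and discards 200 pixels on each side, regardless of where the subject is.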

    HEIC Images

    If you use an iPhone you might have seen HEIC/HEIF images before, as it uses that format by default. It is a format with strong compression, but only Safari fully supports it.

    Thanks to WebAssembly, WordPress can automatically convert such images to something usable. In this demo you will first notice a broken preview, as the (Chrome) browser doesn’t support the file format. But then it swiftly converts it to a JPEG, fixing the preview, and then uploads it to the server.

    Bonus: this also works for JPEG XL, which is another format that only Safari supports.

    Upload from your phone

    In the above video I used an HEIC image which I previously took on my iPhone and then transferred to my computer. And from my computer I then uploaded it to WordPress. But what if you cut out the middleman?

    In the editor, you can generate a QR code that you scan with your camera, or a URL that you can share with a colleague. Here, I am opening this URL in another browser, but let’s pretend it’s my phone. On your phone you then choose the image you want to upload. After that, it magically appears in the editor on your computer.

    Hat tip to John Blackbourn for the idea!

    Video compression

    Media compression and conversion also works great for videos. When I record screencasts for this post, they will be in the MOV format, which doesn’t work in all browsers.

    Thanks to ffmpeg.wasm, a WebAssembly port of the powerful FFmpeg framework, WordPress can convert them to a more universal format like MP4 or WebM. The same works for audio files as well.

    This solution also generates poster images out of the box, which is important for user experience and performance.

    Bonus: just like for image captions, AI can automatically generate subtitles for any video.

    Animated GIFs

    Sometimes you’re not dealing with videos though, but with GIFs. Who doesn’t like GIFs?

    Well, the thing is, GIFs are actually really bad for user experience and performance. Not only are they usually very bad quality, they can also be huge in file size. Most likely you should not be using animated GIFs.

    Every time you use the GIF format, a kitten dies
    As Paul Bakaus once said: Gifs must die

    The good news is that animated GIFs are nothing but videos with a few key characteristics:

    • They play automatically.
    • They loop continuously.
    • They’re silent.

    By converting large GIFs to videos, you can save big on users’ bandwidth. And that’s exactly what WordPress can and should do for you.
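    Those three characteristics map directly onto standard `<video>` attributes. The helper below is a hypothetical illustration, not the block editor's markup; `playsinline` is my addition, since mobile browsers commonly need it to autoplay a video inline.

```typescript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Build markup for a video that behaves like an animated GIF:
// autoplay (plays automatically), loop (loops continuously), muted (silent).
function gifLikeVideoHtml(src: string, poster?: string): string {
  const posterAttr = poster ? ` poster="${escapeAttribute(poster)}"` : "";
  return `<video src="${escapeAttribute(src)}"${posterAttr} autoplay loop muted playsinline></video>`;
}
```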

    In the following demo, I am dragging and dropping a GIF file from my computer to the editor. Since it is an image file, WordPress first creates an image block and starts the upload process.

    Then, it detects that it is an animated GIF, and uses FFmpeg to convert it to an MP4 video. This happens in the blink of an eye. As it’s now a video, WordPress replaces the image block with a special GIF variation of the video block that’s looping and autoplaying. And of course the video is multiple times smaller than the original image file. As a user, you can’t tell the difference. It just works.
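    How might the detection step work? One possible approach, sketched here under my own assumptions rather than taken from the plugin, is to walk the GIF's block structure and check whether it contains more than one image frame. No pixel data needs to be decoded for that.

```typescript
// Returns true if the GIF data contains at least two image frames.
function isAnimatedGif(bytes: Uint8Array): boolean {
  // Header is "GIF87a" or "GIF89a", followed by a 7-byte screen descriptor.
  if (bytes.length < 13 || String.fromCharCode(bytes[0], bytes[1], bytes[2]) !== "GIF") {
    return false;
  }
  let pos = 6;
  const screenFlags = bytes[pos + 4];
  pos += 7;
  // Skip the global color table if present (3 bytes per entry).
  if (screenFlags & 0x80) pos += 3 * (1 << ((screenFlags & 0x07) + 1));

  // Data sub-blocks: a length byte followed by that many bytes, ending at 0.
  const skipSubBlocks = (): void => {
    while (pos < bytes.length) {
      const size = bytes[pos++];
      if (size === 0) return;
      pos += size;
    }
  };

  let frames = 0;
  while (pos < bytes.length) {
    const marker = bytes[pos++];
    if (marker === 0x21) {
      pos += 1; // extension label
      skipSubBlocks();
    } else if (marker === 0x2c) {
      frames += 1;
      if (frames > 1) return true;
      const imageFlags = bytes[pos + 8];
      pos += 9; // image position, size and flags
      if (imageFlags & 0x80) pos += 3 * (1 << ((imageFlags & 0x07) + 1)); // local color table
      pos += 1; // LZW minimum code size
      skipSubBlocks();
    } else {
      break; // trailer (0x3b) or unexpected data
    }
  }
  return false;
}
```

    Only when this returns true does it make sense to hand the file to FFmpeg; a static GIF can stay an image.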

    Media recording

    Compressing videos and converting GIFs is cool, but one of my personal favorites is the ability to record videos or take still pictures directly in the editor, and then upload them straight to WordPress.

    So if you’re writing some sort of tutorial and want to accompany it with a video, or if you are building the next TikTok competitor, you could do that with WordPress.

    Bonus: You probably don’t see it well in the demo, but thanks to AI you can even blur your background for a little more privacy. Super cool!

    Challenges

    Client-side media processing adds a pretty powerful new dimension to WordPress, but it isn’t always as easy as it looks!

    Cross-origin isolation

    On the implementation side, cross-origin isolation is a tricky topic.

    So, WebAssembly libraries like vips or ffmpeg use multiple threads to speed up the processing, which means they require shared memory. Shared memory means you need SharedArrayBuffer.

    For security reasons, enabling SharedArrayBuffer requires a special configuration called cross-origin isolation. That puts a web page into a special state that enforces some restrictions when loading resources from other origins.

    In the WordPress editor, I tried to implement this as smoothly as possible. Normally, you will not even realize that cross-origin isolation is in effect. However, some things in the editor might not work as expected anymore.

    The most common issue I encountered is with embed previews in the editor.

    In Chrome, all your embed previews in the editor continue to work, while in Firefox or Safari they don’t, because those browsers do not support the credentialless attribute for iframes, which is what keeps embeds loading while isolation is in effect.

    I hope that Firefox and Safari remedy this in the future. Chrome is also working on an alternative proposal called Document-Isolation-Policy which would help resolve this as well. But that might still be years in the future.

    Open source licenses (GPL compatibility)

    Another unfortunate thing is that open source licenses aren’t always compatible with each other. This is the case with the HEIC conversion for those iPhone photos.

    Being able to convert those iPhone photos directly in the browser before sending them to the server just makes so much sense. Unfortunately, it’s a very proprietary file format. The only open source implementation (libheif) is licensed under the LGPL 3.0, which is only compatible with GPL v3. However, WordPress’ license is GPLv2 or later.

    That means we can’t actually use it 🙁

    The good news is that we found another way, and it’s even already part of the next WordPress release!

    However, this happens on the server again instead of the browser.

    This is possible because on the server the conversion happens in ImageMagick (when compiled with libheif), and not in core itself, so there’s no license concern for WordPress.

    The downside of this approach is that it will only work for very few WordPress sites, as it again depends on your PHP and ImageMagick versions. So while this is a nice step in the right direction, only with the client-side approach can we truly support this for everyone.

    The next steps

    All of these challenges simply mean there is still some work to do before it can be put into the hands of millions of WordPress users.

    While this project started as a separate plugin, I am currently in the process of contributing these features step by step to Gutenberg, where we can further test them behind an experimental flag.

    We start with the fundamental rewrite of the upload logic, adding support for image compression and thumbnail conversion. After that, we can look into format conversion, making it easier to use more modern image formats and choosing the format that is most suitable for any given image. From there, we can expand this to videos and audio files.

    Finally and ideally, we expand beyond the post editor and make this available to the rest of WordPress, like the media gallery or anywhere else where one would upload files.

    I am also really excited about the possibility of making this available to everyone building a block editor outside of WordPress, like Tumblr for example.

    Democratizing publishing

    With client-side media processing we make a giant leap forward when it comes to democratizing publishing.

    As mentioned at the beginning, the average WordPress user simply wants to be able to write without problems or interruption. By eliminating all these problems related to media, users will be able to create media-rich content much more easily and quickly.

    Thanks to client-side media processing, we can greatly improve the user experience around uploads. You benefit from faster uploads, fewer headaches, smaller images, and less overloaded servers. Also, you no longer need to worry about server support or switch hosting providers. Smaller images and more modern image formats help make your site load faster too, which is a nice little bonus.

    Convinced? Check out the GitHub repository and the proposed roadmap for the Gutenberg integration.

  • Client-Side Video Optimization

    With Web Stories for WordPress, we want to make it easy and fun to create beautiful, immersive stories on the web. Videos contribute a large part to the immersive experience of the story format. Thus, we wanted to streamline the process of adding videos to stories as much as possible. For instance, the Web Stories editor automatically creates poster images for all videos to improve the user experience and accessibility for viewers. One key feature we recently introduced in this area is client-side video optimization — video transcoding and compression directly in the browser.

    New to Web Stories? You can learn more about this exciting format in my recent lightning talk.

    The Problem With Self-Hosting Videos

    The Web Stories format does not currently support embedding videos from platforms like YouTube, which means one has to host videos themselves if they want to use any. And here’s where things get cumbersome, because you have to ensure the videos are in the correct file format, have the right dimensions and low file size to reduce bandwidth costs* and improve download speed.

    A typical use case for a story creator is to record a video on their iPhone and upload it straight to WordPress for use in their next story. There’s just one problem: your iPhone records videos in the .mov format, which is not supported by most browsers. Once you realize that, you might find some online service to convert the .mov file into an .mp4 file. But that doesn’t address the video dimensions and file size concerns. So you try to find another online service or tutorial to help with that. Ugh.

    We wanted to prevent you from having to go down the rabbit hole of figuring this all out.

    * Aside: To reduce bandwidth costs, we are actually working on a solution to serve story videos directly from the Google CDN, which is pretty cool and will help a lot to reduce costs for creators!

    Alternatives

    Of course, there are some alternatives to this, for example services like Transcoder or Jetpack video hosting. These solutions will transcode videos on-the-fly during upload on their powerful servers. So you upload your .mov file, but you receive an optimized .mp4 video. However, that requires you to install yet another plugin. Plus, these services won’t optimize the video to the dimensions optimal for stories. So there’s still room for improvement.

    We wanted a solution without having to rely on third-party plugins or services. Something that’s built into the Web Stories plugin and ready to go, requiring zero setup. And since hosting providers don’t typically offer any tools for server-side video optimization, we had to resort to the client.

    Making Video Optimization Seamless

    In our research, we quickly stumbled upon ffmpeg.wasm, a WebAssembly port of the powerful FFmpeg program, which enables video transcoding and compression right in the browser. Jonathan Harris and I did some extensive testing and prototyping with it until we were comfortable with the results.

    The initial prototype was followed by multiple rounds of UX reviews and massive changes to media uploads in the Web Stories editor. In fact, I basically rewrote most of the upload logic so we could better cater for all possible edge cases and ensure consistent user experience regardless of what kind of files users try to upload.

    The result is super smooth: just drop a video file into the editor and it will instantly get transcoded, compressed and ultimately uploaded to WordPress. Here’s a quick demo:

    Client-side video optimization in the Web Stories editor in action

    Technical Challenges

    FFmpeg Configuration

    A lot of our time fine-tuning the initial prototype was spent improving the FFmpeg configuration options. As you might know, there’s a ton of them and you can easily shoot yourself in the foot if you’re not familiar with them (which I personally wasn’t). We tried to find the sweet spot with the best tradeoff between video quality, encoding speed, and CPU consumption.

    The FFmpeg options we currently use:

    | Option | Description |
    | --- | --- |
    | -vcodec libx264 | Use H.264 video codec. |
    | -vf scale='min(720,iw)':'min(1080,ih)':'force_original_aspect_ratio=decrease',pad='width=ceil(iw/2)*2:height=ceil(ih/2)*2' | Scale down (never up) dimensions to enforce maximum video dimensions of 720×1080 (width × height) as per the Web Stories recommendations, while avoiding stretching. Adds 1px pad to width/height if they’re not divisible by 2, to prevent FFmpeg from crashing due to odd numbers. |
    | -pix_fmt yuv420p | Simpler color profile with best cross-player support. |
    | -preset fast | Use the fast encoding preset (i.e. a collection of options). In our testing, veryfast didn’t work with ffmpeg.wasm in the browser; there were constant crashes. |

    FFmpeg configuration used in the Web Stories WordPress plugin
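    As a rough illustration, the options above could be assembled into an argument list like this. The function names and file names are placeholders of mine, and the `-i` input flag is added for completeness. The second function only approximates what the scale and pad filters do to the frame size; FFmpeg's internal rounding may differ slightly.

```typescript
// Options taken from the configuration above; input/output names are placeholders.
function buildOptimizeArgs(input: string, output: string): string[] {
  const videoFilter =
    "scale='min(720,iw)':'min(1080,ih)':'force_original_aspect_ratio=decrease'," +
    "pad='width=ceil(iw/2)*2:height=ceil(ih/2)*2'";
  return [
    "-i", input,
    "-vcodec", "libx264",
    "-vf", videoFilter,
    "-pix_fmt", "yuv420p",
    "-preset", "fast",
    output,
  ];
}

// Estimate of the resulting frame size for an input of iw x ih pixels.
function estimateOutputSize(iw: number, ih: number): { width: number; height: number } {
  // Bounding box: at most 720 wide and 1080 tall, never larger than the input.
  const boxW = Math.min(720, iw);
  const boxH = Math.min(1080, ih);
  // Fit inside the box while keeping the aspect ratio.
  const width = Math.min(Math.round((boxH * iw) / ih), boxW);
  const height = Math.min(Math.round((boxW * ih) / iw), boxH);
  // Pad odd dimensions up to the next even number.
  return { width: Math.ceil(width / 2) * 2, height: Math.ceil(height / 2) * 2 };
}
```

    For example, a 1920×1080 landscape recording is scaled to 720×405 and then padded to 720×406, while a 640×480 clip is left at its original size.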

    Cross-Origin Isolation

    ffmpeg.wasm uses WebAssembly threads and thus requires SharedArrayBuffer support. For security reasons (remember Spectre?), Chrome and Firefox require so-called cross-origin isolation for SharedArrayBuffer to be available.

    To opt in to a cross-origin isolated state, one needs to send the following HTTP headers on the main document:

    Cross-Origin-Embedder-Policy: require-corp
    Cross-Origin-Opener-Policy: same-origin

    These headers instruct the browser to block loading of resources which haven’t opted into being loaded by cross-origin documents, and prevent cross-origin windows from directly interacting with your document. This also means those resources being loaded cross-origin require opt-ins.

    You can determine whether a web page is in a cross-origin isolated state by examining self.crossOriginIsolated.
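    A minimal sketch of such a check, assuming a hypothetical helper that decides whether the multi-threaded code path can be used. The helper name and the `IsolationScope` type are mine; `crossOriginIsolated` and `SharedArrayBuffer` are the standard globals mentioned above.

```typescript
// The subset of the global scope this check cares about.
interface IsolationScope {
  crossOriginIsolated?: boolean;
  SharedArrayBuffer?: unknown;
}

// True only if the page is cross-origin isolated and SharedArrayBuffer exists.
function canUseWasmThreads(scope: IsolationScope): boolean {
  return scope.crossOriginIsolated === true && typeof scope.SharedArrayBuffer === "function";
}
```

    In a page or worker you would pass `self` as the scope and fall back to uploading the original file (or a single-threaded build, if one is available) when the check fails.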

    In addition to setting these headers, one also has to ensure that all external resources on the page are loaded with Cross-Origin Resource Policy (CORP) or Cross-Origin Resource Sharing (CORS) HTTP headers. This usually means having to use the crossorigin HTML attribute (e.g. <img src="***" crossorigin>) and ensuring the resource sends Access-Control-Allow-Origin: * headers.

    Now, if you have full control over your website, setting up cross-origin isolation is relatively easy. But the Web Stories editor runs on someone else’s WordPress site, with all sorts of plugins and server configurations at play, where we only control a small piece of it. Given these unknowns, it was not clear whether we could actually use cross-origin isolation in practice.

    Luckily, Jonny was able to implement cross-origin isolation in WordPress admin by output buffering the whole page and adding crossorigin attributes to all images, styles, scripts, and iframes if they were served from a different host.

    This won’t catch resources that are loaded later on using JavaScript, but that’s quite rare in our experience so far. And since we only do this on the editor screen and only when video optimization is enabled, conflicts with other plugins are less likely.
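    To give an idea of the rewriting step, here is a simplified sketch. The plugin does this on the server while buffering the page output; the version below is a TypeScript illustration of the same idea with a function name of my own, it relies on a regular expression rather than a real HTML parser, and it only covers images, scripts, and stylesheet links (iframes are left out).

```typescript
// Adds crossorigin="anonymous" to img/script/link tags whose URL points to a
// different host than the site itself. Simplified: absolute http(s) URLs only.
function addCrossoriginAttributes(html: string, siteHost: string): string {
  const tagPattern =
    /<(img|script|link)\b([^>]*?)\s(src|href)=(["'])(https?:\/\/([^\/"'\s]+)[^"']*)\4([^>]*)>/gi;
  return html.replace(
    tagPattern,
    (
      match: string,
      tag: string,
      before: string,
      attr: string,
      quote: string,
      url: string,
      host: string,
      after: string,
    ) => {
      if (/\scrossorigin\b/i.test(match)) return match; // already opted in
      if (host.toLowerCase() === siteHost.toLowerCase()) return match; // same host
      return `<${tag}${before} ${attr}=${quote}${url}${quote} crossorigin="anonymous"${after}>`;
    },
  );
}
```

    Relative URLs and resources served from the site's own host are left untouched, which keeps the change as small as possible.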

    Other Use Cases

    Over time, we have expanded our usage of FFmpeg in the Web Stories editor beyond mere video optimization during upload. For example, users can now optimize existing videos as well and we also use it to quickly generate a poster image if the browser is unable to do so. But there are two other clever use cases that I’d like to highlight:

    Converting Animated GIFs to Videos

    Did you know that the GIF image format is really bad? Animated GIFs can be massive in file size. Replacing them with actual videos is better in every way possible. So we tasked ourselves to do exactly this: convert animated GIFs to videos.

    Today, in Web Stories for WordPress, if you upload a GIF, we detect whether it’s animated and silently convert it to a much smaller MP4 video. To the creator and the users viewing the story, this is completely invisible. It still behaves like a GIF, but it’s actually a video under the hood. Instead of dozens of MB in size, the video is only a few KB, helping a lot with performance.

    This feature was actually inspired by this issue my colleague Paul Bakaus filed for Gutenberg. It would be super cool to have this same feature in the block editor as well.

    Muting Videos

    Oftentimes, creators upload videos to their stories that they want to use as a non-obtrusive background. For such cases, they’d like the video to be muted. But just adding the muted attribute on a <video> still sends the audio track over the wire, which is wasteful.

    For this reason, when muting a video in the story editor, we actually remove any audio tracks behind the scenes. It’s one of the fastest applications of FFmpeg in our code base because the video is otherwise left untouched. So it usually takes only a few seconds.
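    In FFmpeg terms, this kind of operation can be expressed with two standard options: `-an` drops all audio streams and `-vcodec copy` copies the video stream without re-encoding it. The sketch below shows one way to build such an argument list; the function name and file names are placeholders, and the exact arguments the plugin uses may differ.

```typescript
// Arguments for stripping audio while leaving the video stream untouched.
function buildMuteArgs(input: string, output: string): string[] {
  return [
    "-i", input,
    "-vcodec", "copy", // copy the video stream as-is, no re-encoding
    "-an", // drop all audio streams
    output,
  ];
}
```

    Skipping the re-encode is what makes this so much faster than a full optimization pass.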

    What’s Next?

    I am really glad we were able to solve cumbersome real-world issues for our users in such a smooth way. Even though it’s quite robust already, we’re still working on refining it and expanding it to other parts of the plugin. For example, we want to give users an option to trim their videos directly in the browser.

    We can then use our learnings to bring this solution to other areas too. For example, it would be amazing to land this in Gutenberg so millions of WordPress users could take advantage of client-side video optimization. However, implementing it at this scale would be inherently more complex.