How to Trust Your Web Performance Metrics
Have you ever wondered why you get different numbers for the same metrics using different tools? Here are some of the most common reasons. Just remember: the most important thing to track is consistency and changes within a single testing tool and settings.
One of the most common questions we hear is "Why are my numbers for the same metrics different in different tools?" It's understandable to want to validate that whatever monitoring tool you use is gathering correct measurements. Here are some of the most common comparisons we encounter. Remember: The most important thing to track is consistency and changes within a single testing tool and settings.
Synthetic vs real user monitoring
You're not going to get a like-for-like comparison between synthetic and RUM, because they work differently. It's quite typical for RUM metrics to be much faster than synthetic. There are a number of reasons for this, the two most important being:
- Browser caching – Synthetic tests load your pages as first-time views, with a "cold" browser cache. Real users benefit from some amount of browser caching, which can make pages considerably faster.
- Snapshots vs real-world experiences – Synthetic tests are a snapshot of performance based on a very particular set of variables. RUM shows you the full breadth of real user experiences, which involve users who have different contexts (e.g. browsers, devices, connection types, geolocations) than what you've set up in your synthetic tests.
Your RUM metrics are your source of truth for how your site performs in the real world. Your synthetic metrics serve as a baseline, so you can see whether code and design changes make your site faster or slower. Use RUM to learn how long your pages are taking to load for real users, and then use synthetic to get the optimization recommendations you need to diagnose and fix performance issues.
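To make that difference concrete, here's a highly simplified sketch of the kind of measurement a RUM library performs inside the user's own browser. It's illustrative only (not SpeedCurve's actual snippet), and the /rum-collect endpoint is made up:

```typescript
// Illustrative only: observe Largest Contentful Paint in the user's browser
// and beacon the value to a made-up '/rum-collect' endpoint. Because this runs
// in the real browser, it reflects the user's actual cache, device, and network.
const lcpObserver = new PerformanceObserver((list) => {
  const entries = list.getEntries();
  const latest = entries[entries.length - 1]; // the most recent LCP candidate

  // A real library would typically wait and send a single beacon (e.g. on page
  // hide); here we beacon each candidate to keep the sketch short.
  navigator.sendBeacon(
    '/rum-collect',
    JSON.stringify({ metric: 'lcp', value: latest.startTime })
  );
});

// `buffered: true` also delivers LCP candidates that fired before this code ran.
lcpObserver.observe({ type: 'largest-contentful-paint', buffered: true });
```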
SpeedCurve Synthetic vs other synthetic tools
Running tests on your local machine (for example, using Chrome DevTools) will generally produce different results from what you see in SpeedCurve. In SpeedCurve, you're trying to emulate performance under defined network and browser conditions. For example, you might want to test how a page performs for a person using Chrome in Germany over a DSL connection.
SpeedCurve Synthetic always loads websites:
- From a completely empty cache
- On consistent hardware
- Within Amazon EC2
- At a throttled average connection speed
Chances are, that's a very different environment from your local computer, and therefore you will see very different numbers. Remember: the goal is to establish a consistent performance baseline over time, so that you can pinpoint when changes to the page affect performance.
We encourage you to look at how much improvement you're seeing in SpeedCurve as you improve your code base, rather than focusing on absolute numbers. For example: "Our start render is 38% faster than site X, and 28% faster than it was last week."
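If you want to frame your results this way, the relative change is a simple calculation. The metric values below are invented purely for illustration:

```typescript
// Relative improvement between two synthetic measurements of the same metric.
function percentFaster(previousMs: number, currentMs: number): number {
  return ((previousMs - currentMs) / previousMs) * 100;
}

// e.g. start render was 2600 ms last week and 1870 ms this week:
const improvement = percentFaster(2600, 1870);
console.log(`Start render is ${improvement.toFixed(0)}% faster than last week`);
// -> "Start render is 28% faster than last week"
```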
SpeedCurve Lighthouse vs other Lighthouse
There are a number of reasons why your Lighthouse scores in SpeedCurve might be different from the scores you see in other tools. Here are a few common reasons, along with our recommendation not to focus too much on getting your scores to "match":
- Different Lighthouse versions – We typically run the latest release, while the tool you're comparing against may be on a different version, and scoring can change between versions.
- TBT and different test environments – The performance score is strongly influenced by Total Blocking Time (TBT), which can be quite different depending on the test environment and runtime settings. The Lighthouse team have written some background on what can cause variability in your scores.
- Runtime settings – At the bottom of the Lighthouse report, you will see the runtime settings used for the Lighthouse test. SpeedCurve's Lighthouse test runs use the official throttling settings for mobile and desktop. We do not override settings based on the test environment, and your settings may be different when running outside of SpeedCurve. (See the sketch after this list for one way to pin down these settings when running Lighthouse yourself.)
- Test hardware and location – In addition to runtime settings potentially being different, the test servers themselves may be very different. SpeedCurve runs Lighthouse tests from specified locations based on your Site Settings.
- SpeedCurve runs Lighthouse separately from the main test – The Lighthouse report doesn't reuse the same page load as the main SpeedCurve test. It's a separate page load done at the end of the SpeedCurve test, so the metrics will differ. Lighthouse is run only once; it doesn't use the "Set how many times each URL should be loaded" setting.
- Lighthouse does not support synthetic test scripts – Multi-step scripts, authentication scripts, or scripts that set cookies may produce very different results due to broken navigation.
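If you want to compare like-for-like outside of SpeedCurve, the closest you can get is running Lighthouse yourself with explicit settings, then comparing the runtime settings echoed back in each report. Here's a minimal sketch using the lighthouse and chrome-launcher npm packages; option names follow recent Lighthouse versions and may differ in older ones:

```typescript
// Minimal sketch: run Lighthouse programmatically with explicit runtime settings.
import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

async function run(url: string) {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });

  const result = await lighthouse(url, {
    port: chrome.port,                // talk to the Chrome instance we just launched
    onlyCategories: ['performance'],  // skip the non-performance audits
    formFactor: 'mobile',             // Lighthouse's default mobile profile
    throttlingMethod: 'simulate',     // simulated throttling (the Lighthouse default)
  });

  if (result) {
    // The settings actually used for the run are echoed back in the report;
    // compare these against the runtime settings shown at the bottom of a
    // SpeedCurve Lighthouse report before comparing scores.
    console.log(result.lhr.configSettings.throttling);
    console.log('Performance score:', result.lhr.categories.performance.score);
  }

  await chrome.kill();
}

run('https://www.example.com/');
```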
SpeedCurve RUM vs CrUX
The Chrome User Experience Report (CrUX) is a public dataset from Google that provides real user measurement (RUM) data for millions of sites. This data is surfaced through many of Google's other products, such as BigQuery and Google Search Console. While SpeedCurve RUM and CrUX collect metrics using the same underlying browser APIs, there are a few reasons why they may not match up exactly with one another:
- CrUX only collects data from Chrome browsers – This is the most obvious source of discrepancies. SpeedCurve RUM collects data from any capable browser on pages where the site owner has embedded our JavaScript snippet; CrUX only includes data from Chrome.
- Different sample sizes – There are a number of reasons why your sample sizes may not match between SpeedCurve RUM and CrUX. This doesn't necessarily point to issues with the data, but it's important to know that for Chrome to collect data from end users, those users must have:
- opted in to syncing their browsing history,
- not set up a sync passphrase, and
- usage statistic reporting enabled.
- Different dimensions – Both CrUX and SpeedCurve RUM let you slice the data by dimensions such as device type and connection. If those filters don't line up, you may be comparing substantially different cohorts.
- Different time periods – CrUX data available through BigQuery is rolled up monthly (to the previous month). SpeedCurve RUM time periods are configurable and can span anywhere from a single day up to the last 90 days.
- Data aggregation – Depending on where you're comparing the data from, you may be looking at different aggregation methods. Google standardizes on the 75th percentile for Core Web Vitals. SpeedCurve charts default to the median (50th percentile), and we encourage users to look at the 75th percentile when setting performance budgets.
- CrUX does not segment by URL/path/page label – When querying the public BigQuery dataset, CrUX segments a site by origin (e.g. https://www.speedcurve.com). SpeedCurve RUM allows for further segmentation by page label, which provides a great deal of granularity, but may vary dramatically when compared to the entire set of URLs from the origin.
- Different approaches to measuring Cumulative Layout Shift – CLS is a cumulative measurement that changes during the page lifecycle. For SpeedCurve RUM, the calculation of CLS stops when the beacon is fired. However, for CrUX the measurement continues until the visibility state changes to "hidden". This can lead to very different reported CLS values, especially if your application continues to see layout shifts after the load event (see the sketch below).
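Here's a simplified sketch of that difference, showing the same CLS accumulation reported at two different points in time. It ignores details such as session windowing, and the /rum-collect endpoint is again made up:

```typescript
// Minimal shape of a layout-shift entry (not yet in all TypeScript DOM typings).
interface LayoutShiftEntry extends PerformanceEntry {
  value: number;
  hadRecentInput: boolean;
}

let cls = 0;

// Accumulate layout shifts as they happen.
const clsObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as LayoutShiftEntry[]) {
    // Shifts immediately following user input don't count towards CLS.
    if (!entry.hadRecentInput) {
      cls += entry.value;
    }
  }
});
clsObserver.observe({ type: 'layout-shift', buffered: true });

// A RUM beacon sent around the load event reports the total at that moment...
window.addEventListener('load', () => {
  navigator.sendBeacon('/rum-collect', JSON.stringify({ metric: 'cls_at_beacon', value: cls }));
});

// ...whereas CrUX keeps accumulating until the page is hidden, so shifts from
// lazy-loaded content, ads, or later interactions are still counted.
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') {
    navigator.sendBeacon('/rum-collect', JSON.stringify({ metric: 'cls_at_hidden', value: cls }));
  }
});
```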