For many years, many people have considered Google’s PageSpeed Insights the gold standard for benchmarking the performance of websites. Within the last couple of years, however, the tool has changed significantly, and PageSpeed Insights (PSI) now frequently hands out low grades to sites that previously sat near perfect scores. This naturally raises the question: What is going on here? Should PSI still be my go-to?
I was naive. I tried simple searches, hoping there might be an easily accessible answer to what I thought were common questions – but things were about to get complicated. Really complicated. After coming up somewhat empty-handed, I took it upon myself to dive into the dark, gritty underworld of website performance optimization, in search of answers.
Cutting to the chase
First of all, if you are looking for a specific tool or service that you should use to test your website(s), this article unfortunately doesn’t give you any ‘correct’ answers. As we will soon come to learn, auditing a website is not so simple, and most tools grade per their own parameters, sometimes using undisclosed hardware, throttled network connections and obscure geolocation configurations. The actionable insight in this article is going to be: stick with one tool. But I am getting ahead of myself.
The Google suite
If you go to any search engine looking for one of these benchmarking tools, you will quickly see that there are quite a few choices. Google themselves seemingly have four different tools developed for this purpose: PageSpeed Insights, Lighthouse, Web.dev and thinkwithgoogle.
To prevent this article from drifting into full-fledged thesis territory, I will stick to talking about PSI and Lighthouse, since these are the most widely discussed applications – and the Web.dev and thinkwithgoogle tools are oddly similar to them anyway.
Alright, okay… So… What are the differences between these?
This is apparently a question that many developers (including, it seems, some who work at Google) have a hard time answering. “Wait… what?” Yep, you read that right. Since 2018, there has been an open debate on the Google Lighthouse GitHub repository (I went deep) regarding this conundrum.
Lighthouse, the auditing tool integrated into Google Chrome, seems to grade sites differently from PSI – despite the fact that they both use the same Lighthouse engine. As you can probably imagine, this is frustrating for many developers. User “fchristant” comments:
“The answers given so far aren’t good enough. If you position yourself as the performance gatekeeper of the web and directly associate it with very real business outcomes (ranking), none of these technical explanations are good enough […] If PSI is way off and too variable, I expect it to be seriously investigated and improved, no matter the technical challenge. ‘Deal with it’ isn’t good enough.“
Another user, “ensemblebd” adds:
“This ticket will never close and will never be resolved because those who are in charge will never admit it’s fundamentally flawed. And that’s because the market value and $$ generated by SEO companies using this tool is supermassive. […] Two years we’ve waited. Where is the official google response? Where is it?“
The fundamental flaws
They both have a point. From what I can gather, accurately and objectively capturing the performance of a website is actually impossible. Why is this the case? Because of the many variables associated with your users. Where are they located? What hardware are they using to access your website? And what do their connection speeds look like?
Now, you might be thinking: “Wait, hardware?” (I know I did) – but it makes a lot of sense. If we open the developer console in Google Chrome on a newer desktop computer and audit a website using the in-browser version of Lighthouse, we will get a better grade than the same audit run on a 2015 MacBook. The issue is that the program running the audit (Google Chrome) has to use the resources of the computer it is running on to carry out the task. Better hardware will simply run an audit faster, which makes it seem like the audited site is also loading faster – and, well, technically it is. At least on your computer.
You could argue that, because of this, auditing locally has to be fundamentally flawed – especially if we can’t throttle our hardware to simulate some agreed-upon median user. I’d agree with this sentiment. Lighthouse is a good tool for simulating slower network connections (like 3G) and analyzing bundle sizes, but it delivers very, very subjective overall performance scores.
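For what it’s worth, the Lighthouse CLI (the npm package, as opposed to the in-browser panel) does expose a CPU slowdown multiplier – though it slows things down relative to the host machine, so it only partially compensates for hardware differences. As a sketch, assuming the `lighthouse` CLI is installed via npm (the helper function name is my own), an audit with simulated throttling could be launched like this:

```python
import shlex

def lighthouse_cmd(url, cpu_slowdown=4):
    """Build a Lighthouse CLI invocation with simulated throttling.

    The cpuSlowdownMultiplier makes Lighthouse model a machine N times
    slower than the host -- still relative to *your* hardware, which is
    exactly the caveat discussed above.
    """
    return [
        "lighthouse", url,
        "--only-categories=performance",
        "--throttling-method=simulate",
        f"--throttling.cpuSlowdownMultiplier={cpu_slowdown}",
        "--output=json",
        "--output-path=./report.json",
        "--chrome-flags=--headless",
    ]

# Print the command so it can be copied into a terminal:
print(" ".join(shlex.quote(part) for part in lighthouse_cmd("https://example.com")))
```

Run the same command on two different machines and you will still get different scores; the multiplier narrows the gap, but it doesn’t close it.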
So, what can be done to remedy this hardware issue? Well, we can use a service like PSI. What sets PSI and Lighthouse apart is that PSI audits run on hardware sitting in a Google datacenter (the one closest to you). On paper, this seems like a better way to generalize these metrics: “If we all use the same hardware and connection speeds to test, we’ll have less fluctuation in grading, right?“
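PSI is also scriptable, by the way: the scores come from the public PageSpeed Insights v5 API. As a minimal sketch (the endpoint and parameters are those of the public v5 API; the helper name is my own), a request URL can be built like this:

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def psi_request_url(page, strategy="mobile", api_key=None):
    """Build a PageSpeed Insights v5 API request URL.

    strategy is "mobile" or "desktop"; an API key is optional for
    casual use but needed for automated, high-volume querying.
    """
    params = {"url": page, "strategy": strategy}
    if api_key:
        params["key"] = api_key
    return f"{PSI_ENDPOINT}?{urlencode(params)}"

print(psi_request_url("https://example.com"))
```

Fetching that URL returns JSON in which the performance score sits under `lighthouseResult.categories.performance.score` (a 0–1 value that PSI multiplies by 100 for display) – handy if you want to log repeated runs and see the fluctuation for yourself.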
I wasn’t joking around when I said at the start of this article that things got complicated – because this model of testing unfortunately has its own set of issues as well.
First of all, as you may have caught on to in what I just explained, there is more than one datacenter. This introduces some uncertainty as to whether your audits are always running on the same machines. In fact, it becomes almost a certainty that they aren’t when a site is audited from different countries, since different physical locations will likely be tasked with the audit in cases like these. In general, it seems that auditing a site in this way, where the entire process is abstracted away from you, produces some… well, rather inconsistent results. At least with PSI.
You don’t have to be a rocket scientist to notice that PSI, auditing the same site twice in succession, often produces different results. I’ve seen 20-point variances in performance scores within minutes of running tests in rapid succession. The Time-To-First-Byte (TTFB) metric in particular tends to fluctuate a lot. If you are trying to audit a site that sits on a server in another country – well, then you are also out of luck. The performance score will not be accurate for you in cases like these, especially if the website is intended for an audience in said country, who’d be closer to the physical server. As GitHub user dobbobbs puts it:
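If you want to see this fluctuation for yourself, it only takes a small script: run the same audit a handful of times and summarize the spread. The scores below are hypothetical stand-ins for what repeated PSI runs of one page might return:

```python
from statistics import mean, median

def score_spread(scores):
    """Summarize repeated performance scores (0-100) for one page."""
    return {
        "median": median(scores),
        "mean": round(mean(scores), 1),
        "spread": max(scores) - min(scores),  # best run minus worst run
    }

# Hypothetical results from five back-to-back audits of the same page:
runs = [72, 58, 66, 75, 61]
print(score_spread(runs))  # a 17-point spread across just five runs
```

The median is usually a saner headline number than any single run – which is roughly what tools like Webpagetest do when you ask them to perform multiple runs of a test.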
“Considering my server is in the US (LA) and I am in Europe, but the website is intended for US users primarily, it makes no sense that the test should be QUICKER when I run it from within Chrome here in Europe (with all that trans-Atlantic latency), or that PSI should run the test as if I were a user close to my real location when the site isn’t even intended for this geographic region.“
And I’d be inclined to agree with him/her – it remains rather unintuitive.
Perhaps you see where this is going? I have only just begun to scratch the surface of the sheer amount of nuances that are involved in performance optimization – but, it might not be so important for the time being. We are more interested in having a high-level overview of the issue, after all. So what’s next?
I used the tools created by Google as examples in this article, since they are the most heavily discussed ones in this relatively esoteric community. I thought this was a really interesting subject to dive into, since people make millions of dollars optimizing websites every day. Yet we are all doing this more or less without a single source of truth – or a set of standardized metrics.
I should say that there are plenty of other ‘non-Google’ tools worthy of mention: GTmetrix, Webpagetest and Pingdom are all mature and well regarded as well – some of them arguably more so than PSI and Lighthouse. Webpagetest.org, for example, gives you the opportunity to test from 50+ locations around the globe, where Google just sends you to their nearest location. Having the ability to manually influence this makes a huge difference, in my opinion.
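As a sketch of what that looks like in practice: the classic WebPageTest API takes a location parameter on its runtest.php endpoint. The location ID and API key below are placeholders, and the helper function is my own – consult the instance’s documentation for the real list of location IDs:

```python
from urllib.parse import urlencode

WPT_ENDPOINT = "https://www.webpagetest.org/runtest.php"

def wpt_request_url(page, location="Dulles:Chrome", api_key="YOUR_API_KEY"):
    """Build a WebPageTest run request with an explicit test location.

    `location` selects the test agent (here, a placeholder ID for a
    Dulles, VA machine running Chrome); `k` is the API key and
    `f=json` requests a JSON response.
    """
    params = {"url": page, "location": location, "k": api_key, "f": "json"}
    return f"{WPT_ENDPOINT}?{urlencode(params)}"

print(wpt_request_url("https://example.com"))
```

Being able to swap that one parameter – testing the same page from both coasts, or from the continent your audience actually lives in – is exactly the control that PSI abstracts away from you.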
Another thing worth knowing about is the ‘YSlow score’, something PSI does not directly give you. Essentially, a good YSlow score means that your page is optimized for the browser to render as fast as possible. It is a very useful set of optimization standards to be made aware of directly in your performance tests (to be fair, this can be somewhat gauged through Lighthouse). Both GTmetrix and Webpagetest give you this, for example.
Personally, I tend to use one of these two services. The fluctuations in PSI make me less confident in the stability of its scoring. Being able to manually select a test location is a great first step towards more coherence across the performance-testing spectrum, in my opinion.
A single source of truth
To round this article off, I will leave you with my personal opinion. There seems to be a lot of uncertainty in the web development community around these kinds of tests, and hopefully I’ve made it clear why that is. It is very confusing.
It seems somewhat clear to me that what we really need is a set of standardized metrics, combined with the ability to manually select a geolocation – so that everyone running these tests lands within a more coherent spectrum of results. Local testing seems inherently flawed to me. We need to enforce a set of general parameters (network connections, hardware specs) if we want tests that aspire to objectivity.
But after all, this has become an industry. Money is involved. It might not be in everyone’s interest to agree on common metrics. For the time being, I’d recommend: whatever tool you pick, stick with it.
You can’t please everyone!