I did a scan of the Alexa Top 1 Million back in August 2015 and published the results for everyone to see. Having just completed another scan of the current Alexa Top 1 Million with additional metrics being tracked, I thought I'd compare the results and see how much progress we've made in a relatively short amount of time.
I published all of my findings in my previous blog, How widely used are security based HTTP response headers?, and they were pretty dismal reading. I was expecting low usage but after much verification the findings were even worse than I thought they'd be! Here they are:
I wasn't even seeing single digit % utilisation in some of the really important headers like CSP and PKP, and STS only just made it over 1% of sites whilst 6.7% were actively redirecting to HTTPS. It's only been around 6 months since I last ran the scan but I thought it was long enough to get a feel for the direction things were heading. I grabbed a fresh copy of the Alexa Top 1 Million (download link) and dusted off the bots sat in my DigitalOcean account. If you use my referral link to sign up with them you get $10 free credit which is enough to run the exact same crawl I did 44 times!
I added several new metrics to the crawl to capture some more information that I thought would be useful to have.
I added the ACAO header to the crawl to see just how widely utilised it is. There was no analysis of the value, just a check to see if it was present.
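A presence-only check like this is simple to sketch. The snippet below is an illustration in Python, not the crawler's actual code; the `has_header` helper and the sample header dictionary are hypothetical. It matches the name case-insensitively, since HTTP header names are not case sensitive, and it never looks at the value.

```python
def has_header(headers, name):
    """Return True if `name` is among the response headers.

    Header names are compared case-insensitively; the value is ignored,
    mirroring a presence-only check.
    """
    wanted = name.lower()
    return any(key.lower() == wanted for key in headers)


# Hypothetical response headers, purely for illustration.
sample_headers = {
    "Content-Type": "text/html",
    "access-control-allow-origin": "*",
}

acao_present = has_header(sample_headers, "Access-Control-Allow-Origin")
csp_present = has_header(sample_headers, "Content-Security-Policy")
print(acao_present, csp_present)
```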
After creating the HPKP Toolset on report-uri.io I had quite a bit of code handy to deal with X.509 certs in PHP. I decided to put that to good use and track the number of sites in the Alexa Top 1 Million that were using certificates issued by Let's Encrypt.
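My own code for this was PHP, but the idea can be sketched in Python using only the standard library. This is a minimal illustration under assumptions, not the crawler itself: the helper names are hypothetical, and it assumes that a Let's Encrypt certificate can be recognised by an issuer organisation of "Let's Encrypt". The certificate is the dictionary returned by `ssl.SSLSocket.getpeercert()`.

```python
import socket
import ssl


def fetch_cert(host, port=443, timeout=5):
    """Fetch the validated leaf certificate for `host` as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def issuer_organisation(cert):
    """Return the issuer's organizationName, or None if it is absent."""
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None


def is_lets_encrypt(cert):
    """Assumption: the issuer organisation identifies Let's Encrypt."""
    return issuer_organisation(cert) == "Let's Encrypt"


# Hand-written issuer in getpeercert() format, for illustration only.
example_cert = {
    "issuer": (
        (("countryName", "US"),),
        (("organizationName", "Let's Encrypt"),),
        (("commonName", "Let's Encrypt Authority X1"),),
    )
}
print(is_lets_encrypt(example_cert))
```

Against a live site you would call `is_lets_encrypt(fetch_cert("example.com"))`, bearing in mind that `fetch_cert` raises an exception if the connection or certificate validation fails.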
The live site for securityheaders.io contains a few improvements covered in my blog, securityheaders.io update, that include strict checking of header values and an API that you can hit to fetch scores for a site. The crawl seemed like a good opportunity to exercise that API so I pulled the score for every site in the Alexa Top 1 Million and included it in the data.
The results are in!
The latest results of the scan came in and I collected and analysed all of the data. Whilst things didn't look too great in the first scan, the rate of improvement being shown is incredible! The easiest way to show that is with the raw numbers, so here they are:
These numbers are still a long way from where I'd like to see them but all of the metrics are showing considerable progress. We've doubled the number of sites that have a CSP deployed and have a healthy boost in the number of sites with a PKP policy deployed too. What's even better is the staggering rise in the number of sites deploying the report-only version of these policies, CSPRO and PKPRO. A 2,227% increase in the number of sites testing out a CSPRO policy is great news. We've nearly doubled the number of sites issuing an STS policy and increased the number of sites redirecting to HTTPS by over 40%.
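For anyone wanting to check percentages like these against the raw counts, the arithmetic is just the change relative to the earlier value. Here is a small Python sketch; the counts in the example are made up to show the calculation and are not the figures from either scan.

```python
def percent_increase(old, new):
    """Percentage increase from `old` to `new`, relative to `old`."""
    if old <= 0:
        raise ValueError("old count must be positive")
    # Multiply before dividing to keep whole-number inputs exact.
    return (new - old) * 100 / old


# Hypothetical counts, chosen only to illustrate the formula.
print(percent_increase(100, 2327))  # 2227.0
print(percent_increase(50, 100))    # 100.0, i.e. a doubling
```

Note that a doubling is a 100% increase, so a 2,227% increase means the later count is a little over 23 times the earlier one.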
Some things don't change
What's interesting about the results is that the exact same trends are visible in the data. You can still see the spike in adoption at the top end of the ranking followed by a sharp decline and then a steady tail off as you move down the rankings. Here is the CSP graph for the Feb 2016 scan including the CSP results for the Aug 2015 scan.
The trend is almost identical across the results with the only difference being the scale on the Y axis has increased. The exact same comparison can be drawn for all of the other headers and even for the deployment of HTTPS.
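If you want to reproduce this kind of graph from the raw data, one approach is to group the ranks of the sites that set a given header into fixed-size buckets and count each bucket. The sketch below is in Python, with a hypothetical bucket size and sample ranks; it is not how my graphs were actually generated.

```python
from collections import Counter


def adoption_by_bucket(ranks, bucket_size=4000):
    """Count sites per rank bucket.

    `ranks` holds the 1-based Alexa ranks of sites that set a header.
    Returns (first rank in bucket, count) pairs in rank order; buckets
    with no sites are omitted.
    """
    counts = Counter((rank - 1) // bucket_size for rank in ranks)
    return [(bucket * bucket_size + 1, counts[bucket]) for bucket in sorted(counts)]


# Hypothetical ranks of sites with a CSP header.
sample_ranks = [1, 2, 4000, 4001, 9000]
print(adoption_by_bucket(sample_ranks))
# [(1, 3), (4001, 1), (8001, 1)]
```

Plotting the counts for both scans against the same buckets gives a comparison like the one described above.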
The securityheaders.io grades
I was quite impressed with how the API held up to being hit by 250 crawlers but a little less impressed with how the grades themselves actually turned out!
As you can see, the vast majority of sites out there score an F grade on the securityheaders.io scan. This means they either don't issue any security based HTTP headers, or they do and they aren't configured properly. The graph also highlights just how wide the gap is.
Unfortunately the sheer number of sites getting an F pretty much drowns out the rest but you can still see the trends in there. The further down the ranking you go, the more likely you are to get an F grade with the spikes in the better grades just visible at the bottom. To clear that up I cut down the results to only show the A+ to E grades and then just the A+ and A grades.
Interestingly, the grade D score breaks all the trends and actually increases as you go down the rankings. Going back to my previous research published in Aug 2015 I recalled that there were a small number of headers that broke the trend and the same still applies in the Feb 2016 results. The graphs are a little too busy to overlay, so here are the Aug 2015 and Feb 2016 graphs for XCTO, XFO and XXSSP.
As you can see, the graph has scaled up on the Y axis to accommodate the higher number of sites deploying but the trend remains almost exactly the same. Going back to my blog on Scoring transparency for securityheaders.io you can see that the combination of XCTO and XXSSP would indeed get you a D on the securityheaders.io scan, which accounts for the anomaly in the results.
Let's Encrypt Usage
I wanted to check out the usage of Let's Encrypt because I'm super pleased about the availability of free certificates for everyone and even wrote a blog on Getting started with Let's Encrypt!. Let's see how they stack up in the Alexa Top 1 Million most visited sites on the web.
As I expected, the usage of Let's Encrypt certificates increases as you go down the rankings. I imagine all of the huge corporate entities up at the top probably have some deals with commercial authorities and actually want to pay for their certificates ;-). There's also the factor that Let's Encrypt don't issue EV certificates so probably can't cater for a few of the sites up there either. Still though, of the 1 million most visited sites on the web, Let's Encrypt is providing certificates to 1,234 of them, which I also thought was a pretty cool number to turn out.
Just like the previous scan, all of the formatted data is available on my Google Doc Spreadsheet located here and contains the results for both scans now.
I also wanted to make the raw data available, so you can now grab a zip file of everything that my crawler dumps out. That includes all the counts for each metric across the scan and also a list of domains for each metric. You can get those here:
Things are looking up!
All in all the metrics here are very promising. I think we're going to continue to see similar increases in the use of CSP and CSPRO throughout 2016 as organisations become aware of it and just how useful it can be. The increase in sites redirecting to HTTPS is phenomenal and again, I'm sure we can expect to see this trend continue through 2016. The securityheaders.io scores were a little poor, but there's plenty of opportunity for easy improvements to be made with the simpler X-based headers. I will probably run the scan again in another few months and in the meantime I'm hoping to do a technical blog on the crawler and parser itself!