It's time for the 5th instalment of my Alexa Top 1 Million scan and this time around there's another new metric in the data.
I've done 4 previous crawls, in Aug 2015, Feb 2016, Aug 2016 and Feb 2017. I'm also publishing my daily crawl data, which is available here for further analysis by the community. Let's dig into the latest data!
To start off with the good news, things are continuing to get better!
One of the biggest changes since the last scan has to be the enormous jump in the number of sites deploying HPKP. This is rather interesting for many reasons, not least because just last week I announced that I'm giving up on HPKP...
"The number of sites in the Alexa Top 1 Million deploying HPKP recently jumped from 187 to 6,616! A 3,438% increase!" - Scott Helme (@Scott_Helme), May 29, 2017
The increase in HPKP is almost entirely caused by Tumblr deploying HPKP across their entire catalogue of sites. Whilst the number of their sites in the top 1 million has changed since I first noticed this, there's still a huge jump of over 3,000 sites.
Another big win in the scan this time around is the continued growth of the deployment of HTTPS. We're really seeing a continuation of the awesome progress being made here and this was confirmed by Adrienne Porter Felt and April King recently in their talk 'Measuring HTTPS Adoption on the Web' at USENIX (slides).
Each time I conduct these scans we can clearly see huge progress in how fast HTTPS is being deployed across the web. What's really important is that adoption isn't just continuing, it's accelerating. I've noticed this increase in the rate of adoption in previous scans and I'm really excited to see it again.
One of the original purposes of my scans was to determine the adoption of various HTTP security headers and I'm still tracking good progress in that area too. We've seen increases in usage across the board and some of them are quite significant. I'd like to think that securityheaders.io is at least helping to drive adoption and education about these headers.
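As a rough illustration of what tracking header adoption involves, here is a minimal sketch that reports which security headers appear in a single response. This is not the code my crawler runs: the function name `present_security_headers` and the example response are hypothetical, and the header list is just a subset of the names that appear in the summary statistics in this article.

```python
# A subset of the header names that appear in the summary statistics.
SECURITY_HEADERS = (
    "strict-transport-security",
    "content-security-policy",
    "public-key-pins",
    "x-content-type-options",
    "x-frame-options",
    "x-xss-protection",
    "referrer-policy",
)


def present_security_headers(response_headers):
    """Return the tracked security headers found in a response, in tracked order.

    `response_headers` is any mapping (or iterable) of header names.
    Hypothetical helper for illustration only.
    """
    # HTTP header names are case-insensitive, so normalise before comparing.
    seen = {name.strip().lower() for name in response_headers}
    return [header for header in SECURITY_HEADERS if header in seen]


# A made-up response for demonstration purposes.
example = {
    "Server": "nginx",
    "Strict-Transport-Security": "max-age=31536000",
    "X-Frame-Options": "SAMEORIGIN",
}
print(present_security_headers(example))
```

Counting how often each name shows up across every site in the list gives per-header totals like the ones in the summary statistics.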
What's really odd is that the trend for XXSSP (x-xss-protection) and XCTO (x-content-type-options) is still there! The presence of all other headers decreases as you go down the ranking, except these two, and to this day there isn't a solid explanation for this. As I mentioned above, the raw data from my daily scans is available, so please do dig into the data and see if you can identify why this trend exists.
I've now been tracking the adoption of Let's Encrypt over 18 months and they too have seen some great progress in their adoption in the Alexa Top 1 Million.
The low usage in the very top ranked sites is still present but across the rest of the ranking they've seen significant growth. My guess is that sites right near the top probably have established commercial agreements with a CA but we may see them shifting over time, albeit more slowly.
After a recent debate on Twitter between various parties about the use of EV certificates, I decided to add tracking to my crawler for the type of certificates used by sites in the top 1 million. It's interesting that the use of EV certs follows the same trend line as most of the other metrics that I track.
As you can see the usage of EV certs is much higher in the higher ranked sites and tails off much in the same way that most other metrics do. I'm sure there will be various arguments for why this is the case but my guess is that sites near the top have a higher budget so the cost of EV is less significant to them and worth a shot for any potential benefits.
You should check over the raw data that I make available if you want to dig into specifics but this is a nice overview of a few of the stats that my crawlers now collect.
Total Rows: 890204

Security Headers Grades:
A+ 106
A 1101
B 4562
C 36279
D 48231
E 78285
F 721571
R 69

Sites using strict-transport-security: 65244
Sites using content-security-policy: 17437
Sites using content-security-policy-report-only: 1297
Sites using x-webkit-csp: 439
Sites using x-content-security-policy: 1154
Sites using public-key-pins: 3508
Sites using public-key-pins-report-only: 99
Sites using x-content-type-options: 104099
Sites using x-frame-options: 110391
Sites using x-xss-protection: 82551
Sites using x-download-options: 9696
Sites using x-permitted-cross-domain-policies: 9390
Sites using access-control-allow-origin: 29601
Sites using referrer-policy: 1615
Sites redirecting to HTTPS: 273837
Sites using Let's Encrypt certificate: 63843

Top 10 Server headers:
Apache 189985
nginx 145853
cloudflare-nginx 93246
Microsoft-IIS/8.5 31985
Microsoft-IIS/7.5 29442
LiteSpeed 19560
nginx/1.12.1 16369
GSE 15565
Apache/2.4.7 (Ubuntu) 11094
Apache/2.2.15 (CentOS) 10985

Top 10 TLDs:
.com 439781
.ru 45256
.net 44956
.org 41717
.de 22581
.jp 18791
.br 14158
.uk 14123
.ir 12322
.in 11788

Top 10 Certificate Issuers:
Let's Encrypt Authority X3 63842
COMODO RSA Domain Validation Secure Server CA 37827
COMODO ECC Domain Validation Secure Server CA 2 30170
Go Daddy Secure Certificate Authority - G2 22479
RapidSSL SHA256 CA 12438
Amazon 7087
DigiCert SHA2 High Assurance Server CA 6191
GeoTrust SSL CA - G3 5812
AlphaSSL CA - SHA256 - G2 5550
Symantec Class 3 Secure Server CA - G4 4849

Top 10 Protocols:
TLSv1.2 253949
TLSv1 8266
TLSv1.1 177
NULL 0

Top 10 Cipher Suites:
ECDHE-RSA-AES256-GCM-SHA384 113309
ECDHE-RSA-AES128-GCM-SHA256 79256
ECDHE-ECDSA-AES128-GCM-SHA256 31843
ECDHE-RSA-AES256-SHA384 13991
DHE-RSA-AES256-GCM-SHA384 4344
ECDHE-RSA-AES256-SHA 3048
DHE-RSA-AES256-SHA 2919
AES128-SHA 2072
AES256-SHA 1977
AES256-SHA256 1941

Top Key Sizes:
RSA 2048 bit 212830
ECDSA 256 bit 32070
RSA 4096 bit 16942
RSA 1024 bit 293
RSA 3072 bit 142
ECDSA 384 bit 81
RSA 8192 bit 6
RSA 4056 bit 3
RSA 3248 bit 3
RSA 2058 bit 2
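The raw counts are easier to compare once they're expressed as a share of the sites that were successfully crawled. The sketch below does that for a handful of the figures above, using Total Rows as the denominator. Treating Total Rows as the right denominator is my assumption for this illustration, and the `share` helper is hypothetical.

```python
# Figures copied from the summary statistics above.
TOTAL_ROWS = 890204

counts = {
    "strict-transport-security": 65244,
    "content-security-policy": 17437,
    "public-key-pins": 3508,
    "x-content-type-options": 104099,
    "x-frame-options": 110391,
    "x-xss-protection": 82551,
    "redirecting to HTTPS": 273837,
    "Let's Encrypt certificate": 63843,
}


def share(count, total=TOTAL_ROWS):
    """Return `count` as a percentage of `total` (assumed denominator)."""
    return 100.0 * count / total


# Print the metrics from most to least common.
for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32s} {count:7d} {share(count):5.1f}%")
```

On these figures just under 31% of crawled sites redirect to HTTPS, whilst a little over 7% send an HSTS header.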
There are a few other nice things that I've noticed whilst looking over the data here that I think are worth pointing out.
As I mentioned above, Let's Encrypt have seen tremendous growth in the top 1 million sites, but they're actually really close to becoming the biggest issuing CA! In the Feb 2017 scan Comodo had 46,466 certificates issued and Let's Encrypt had 31,030. Now in the Aug 2017 scan Comodo has 67,997 and Let's Encrypt has 63,842. Given the rate at which Let's Encrypt are closing that gap, they will very soon become the largest issuing CA in the Alexa Top 1 Million!
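To put numbers on how quickly that gap is closing, here is a small sketch. The Feb 2017 figures are the ones quoted in the paragraph above, and the Aug 2017 Comodo figure is taken as the sum of the two COMODO issuers in the top 10 listing. The `gap` and `growth_pct` helpers are hypothetical names used only for this illustration.

```python
# Certificate counts per scan.
feb_2017 = {"comodo": 46466, "lets_encrypt": 31030}
aug_2017 = {
    # Sum of the two COMODO issuers in the top 10 issuers listing.
    "comodo": 37827 + 30170,
    "lets_encrypt": 63842,
}


def gap(scan):
    """How many more certificates Comodo has than Let's Encrypt in a scan."""
    return scan["comodo"] - scan["lets_encrypt"]


def growth_pct(before, after):
    """Percentage growth from `before` to `after`."""
    return 100.0 * (after - before) / before


print("Gap in Feb 2017:", gap(feb_2017))
print("Gap in Aug 2017:", gap(aug_2017))
for ca in ("comodo", "lets_encrypt"):
    print(f"{ca}: {growth_pct(feb_2017[ca], aug_2017[ca]):.1f}% growth")
```

On these figures Let's Encrypt roughly doubled in six months whilst Comodo grew by a little under half, which is why the gap shrank so sharply.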
Another awesome development is the increase in the use of ECDSA keys in certificates instead of RSA. The Feb 2017 scan saw 146,817 RSA 2048 bit keys and 20,046 ECDSA 256 bit keys. Looking at the data from Aug 2017 we can see 212,830 RSA 2048 bit keys and 32,070 ECDSA 256 bit keys. To put that another way, in Feb 2017 there were 13.7 ECDSA 256 bit keys for every 100 RSA 2048 bit keys, and by Aug 2017 that had increased to 15.1.
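The 13.7 and 15.1 figures above can be reproduced with a short sketch. They correspond to the number of ECDSA 256 bit keys relative to RSA 2048 bit keys. For comparison the sketch also works out ECDSA's share of the two key types combined, which comes out somewhat lower. The function names are hypothetical and only these two key types are considered, which is a simplifying assumption.

```python
# Key counts quoted in the paragraph above.
scans = {
    "Feb 2017": {"rsa_2048": 146817, "ecdsa_256": 20046},
    "Aug 2017": {"rsa_2048": 212830, "ecdsa_256": 32070},
}


def ecdsa_to_rsa_pct(scan):
    """ECDSA 256 bit keys per 100 RSA 2048 bit keys."""
    return 100.0 * scan["ecdsa_256"] / scan["rsa_2048"]


def ecdsa_share_pct(scan):
    """ECDSA 256 bit keys as a share of both key types combined."""
    total = scan["ecdsa_256"] + scan["rsa_2048"]
    return 100.0 * scan["ecdsa_256"] / total


for label, scan in scans.items():
    print(
        f"{label}: ratio {ecdsa_to_rsa_pct(scan):.1f}, "
        f"share {ecdsa_share_pct(scan):.1f}%"
    )
```

Either way you measure it, the direction is the same: ECDSA is growing faster than RSA.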
The protocol support also surprised me a little with some of the changes there. As expected we've seen a huge jump in the number of sites using TLSv1.2 from 171,723 to 253,949. Again as expected we've seen a decrease in the use of TLSv1.1 from 208 sites to 177 sites. What did surprise me was that we've seen an increase in the number of sites using TLSv1.0 from 7,945 to 8,266. Remember, these are sites that can't negotiate a higher protocol version with me and there's really no reason that we shouldn't be seeing 100% TLSv1.2 support in the top 1 million.
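It's worth looking at these protocol counts as shares as well as absolute numbers. The sketch below uses only the figures quoted above and takes the sum of the three TLS versions as the denominator for each scan, which is an assumption on my part. The `shares` helper is a hypothetical name.

```python
# Negotiated protocol counts quoted in the paragraph above.
feb_2017 = {"TLSv1.2": 171723, "TLSv1.1": 208, "TLSv1.0": 7945}
aug_2017 = {"TLSv1.2": 253949, "TLSv1.1": 177, "TLSv1.0": 8266}


def shares(scan):
    """Return each protocol's percentage of all negotiated connections."""
    total = sum(scan.values())
    return {proto: 100.0 * count / total for proto, count in scan.items()}


feb_shares = shares(feb_2017)
aug_shares = shares(aug_2017)

for proto in feb_2017:
    change = aug_2017[proto] - feb_2017[proto]
    print(
        f"{proto}: {feb_shares[proto]:.2f}% -> {aug_shares[proto]:.2f}% "
        f"(count change {change:+d})"
    )
```

Measured this way, the number of sites stuck on TLSv1.0 went up by 321, but its share of the HTTPS sites in these figures still fell because the TLSv1.2 count grew so much faster.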
The last few things to quickly note are that Nginx is closing the gap on Apache as the most popular server choice, Cloudflare have seen a significant increase in their presence and this is the first report where no sites in the top 1 million negotiated SSLv3 with my crawler!
As always the raw data from my scans is available here.
By making the raw data available I'm hoping that others will be able to conduct further analysis or use it to further their own research. I dump the data from the crawlers every day so there's a lot to go at!
You can also view the Google Sheet with all of the data and graphs I've used throughout this article and all of my previous articles here.
I will be sending a few tweets with other bits of information that I've found and embedding them below.