It's that time of year again! I'm really excited to publish the 6th installment of my Alexa Top 1 Million analysis so we can take a look over our progress on securing the web over the last 6 months.
Previous Crawls
It's hard to believe there are now 5 previous crawls available for comparison purposes!
August 2015
February 2016
August 2016
February 2017
August 2017
As I publish more of these reports we start to get a much clearer picture of the progress we're making. If you're interested in doing your own analysis not only do I have the links above but I also publish the data from my crawlers on a daily basis. If you want to get hands on with a large set of data I'd love to see what further analysis you can do.
February 2018
The first report of 2018 and it's looking like a good one. As is tradition, let's start with a quick summary and get a look at what kind of things we have in store.
Similar to Aug 2017, when we saw a huge jump in the number of sites using HPKP, we've seen a continued rise in the use of HPKP and a huge jump in the number of sites using HPKP-RO too. I used to be a big supporter of HPKP, I even have guidance on how to set it up, but I recently gave up on HPKP and Chrome announced they may deprecate it. This does make it interesting to see continued and strong growth in its usage, and it also makes a trend pretty clear: the larger sites are less likely to use HPKP. This is the reverse of the trend for every other metric.
One of the things I'm always eager to see in these reports is the adoption of HTTPS and whether we're still continuing to encrypt the web at an impressive rate. I'm really glad to say that we are continuing to make outstanding progress on that front!
The line does look a little less smooth in this scan, and checking the daily scans this does seem to have been a trend developing over the last few weeks, but either way, we have seen a 32.2% increase in the number of sites redirecting to and enforcing HTTPS in the Alexa Top 1 Million!
One thing I am sad to say though is that something I predicted back in 2017 and have talked about a few times on Twitter has come to pass. The rate at which we were migrating to HTTPS was not only being maintained but was actually increasing in previous reports, as you can see in the graph. This, of course, could not be maintained forever. Whilst we are still seeing tremendous growth, and I'm massively excited about that and proud to be a part of it, the graph is starting to show signs of a plateau. From Aug 2017 to Feb 2018 the rate of progress has slowed. We're still going in the right direction, and no doubt will continue to do so, but the Aug 2018 and Feb 2019 reports may show much smaller steps forward.
Security Headers
We can't forget the original reason this whole report started: the use of Security Headers. Powered by the scanning and analysis engine on securityheaders.io, here is the usage of headers and the Security Headers grading in the Alexa Top 1 Million sites.
We're still seeing the same interesting trends that have been present in all previous scans and another one has emerged. Right down towards the bottom of the ranking there is a clear group of sites with a noticeably higher grading. Perhaps an opportunity for someone to grab the data and take a look at why. It could be a large hosting provider or platform doing something new by default, or maybe just an anomaly. Let me know if you figure it out!
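For anyone who wants to dig into the data, counting header usage is the easy part. Here's a minimal sketch in Python of checking which security headers a response sets. The header names come from the statistics in this report, the function name and sample response are made up for illustration, and it does not attempt to reproduce the securityheaders.io grading logic.

```python
# Headers checked in this sketch; names taken from the statistics in this report.
TRACKED_HEADERS = (
    "strict-transport-security",
    "content-security-policy",
    "x-content-type-options",
    "x-frame-options",
    "x-xss-protection",
    "referrer-policy",
)


def present_security_headers(response_headers: dict) -> list:
    """Return the tracked headers found in a response.

    HTTP header names are case-insensitive, so compare in lower case.
    """
    seen = {name.lower() for name in response_headers}
    return [name for name in TRACKED_HEADERS if name in seen]


# Hypothetical response headers, for illustration only.
example = {
    "Server": "nginx",
    "Strict-Transport-Security": "max-age=31536000",
    "X-Frame-Options": "SAMEORIGIN",
}
print(present_security_headers(example))
```

Running something like this over each row of the raw data and grouping by rank would be one way to start looking at that cluster near the bottom of the ranking.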
Let's Encrypt
It's now 2 years since I started tracking the use of Let's Encrypt certificates in these reports and I'm pretty sure that no one here needs me to tell them what's coming.
Let's Encrypt have continued to see strong growth in their presence in the top 1 million sites on the web. Removing cost and technical barriers really does help increase adoption and this is the proof. Back in Aug 2017 Let's Encrypt were close to becoming the largest issuing CA in the top 1 million sites and they did it by Oct 2017, just 2 months later.
Very soon @letsencrypt will be the largest issuing CA for the top 1 million sites on the web 😎 https://t.co/i9byJj2mR5 pic.twitter.com/mXXHWXjUe6
— Scott Helme (@Scott_Helme) August 30, 2017
They did it!!! 🎉🎉🎉 On the 20th Oct 2017 by 80,352 to 80,062 certificates, Let's Encrypt became the largest issuing CA in the Alexa Top 1 Million! 🍾 pic.twitter.com/Wa4IKsgtDc
— Scott Helme (@Scott_Helme) December 3, 2017
EV Certificates
In the Aug 2017 scan I introduced a check for EV certificate usage in the Alexa Top 1 Million and I've left the logic in place to continue to monitor the usage of EV certs. I guess one important thing to point out here is that there has been only one change in the methodology, one that allows me to identify more EV certificates than I did previously. Anyone that's tried to do something like this will tell you that identifying EV certs isn't exactly easy!
We're still seeing the same considerably higher adoption at the top end of the ranking but the really interesting thing here is that overall there's very little growth in the use of EV certificates. In Aug 2017 I detected 17,877 sites using an EV certificate but I ran the new logic against my old data (I keep all scan data for historic scans) and identified a new total of 18,552 sites using EV certificates. In the new Feb 2018 scan that number has only increased to 19,803 sites. Whilst HTTPS has seen an increase in adoption of 32.30% compared to the last scan, the number of sites using EV certificates only grew by 6.74% over the same period.
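The 6.74% figure can be reproduced from the two EV counts above. A quick sketch in Python; the helper name is just for illustration:

```python
def percent_change(old: int, new: int) -> float:
    """Return the relative change from old to new as a percentage."""
    return (new - old) / old * 100


# EV counts quoted above.
aug_2017_rescanned = 18_552  # Aug 2017 data re-run with the updated detection logic
feb_2018 = 19_803

ev_growth = percent_change(aug_2017_rescanned, feb_2018)
print(f"EV growth: {ev_growth:.2f}%")
```

Note that the baseline is the re-scanned Aug 2017 total, not the 17,877 originally reported, so both numbers come from the same detection logic.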
Certificate Authority Authorisation
CAA is a brand new DNS record that sites can set to control which CAs they authorise to issue certificates for their domain. I have a great introduction blog on CAA if you want more information, but the good news is that it's now one extra metric that I'm tracking in the daily crawl! I did a brief intro post about CAA usage back in December when I first added the metric and this is the first time it will be included in a report.
As is common in these results now we're seeing comparatively huge adoption in the sites higher up the ranking with a quick decline followed by a much steadier decrease. I found a total of 4,064 sites with a valid CAA policy set compared to 3,404 in the first scan in Dec 2017, an increase of 19.39% in roughly 2 months. Let's hope that by the Aug 2018 scan we will continue to see a healthy increase in adoption.
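To give a feel for what a CAA policy looks like, each record is written as a flags value, a tag and a quoted value, for example 0 issue "letsencrypt.org". Below is a rough Python sketch that parses records in that text form and lists the CAs authorised by the issue tag. The function names and sample records are illustrative assumptions, it works on strings and doesn't do any DNS lookups, and it isn't how my crawler decides whether a policy is valid.

```python
import shlex
from typing import NamedTuple


class CAARecord(NamedTuple):
    flags: int
    tag: str
    value: str


def parse_caa(rdata: str) -> CAARecord:
    """Parse a CAA record in its text form: <flags> <tag> "<value>"."""
    flags, tag, value = shlex.split(rdata)
    return CAARecord(int(flags), tag.lower(), value)


def authorised_cas(records: list) -> set:
    """Return the CA domains named in 'issue' records.

    An issue value of ";" names no CA at all, so it contributes nothing.
    Any parameters after the first ";" are ignored in this sketch.
    """
    cas = set()
    for record in map(parse_caa, records):
        if record.tag == "issue":
            domain = record.value.split(";")[0].strip()
            if domain:
                cas.add(domain)
    return cas


# Hypothetical policy for illustration.
sample = ['0 issue "letsencrypt.org"', '0 iodef "mailto:security@example.com"']
print(authorised_cas(sample))
```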
General Stats
The raw crawler data is available but I also like to publish a selection of statistics from the data:
Total Rows: 946719
Security Headers Grades:
A+ 763
A 15258
B 18954
C 26957
D 146633
E 29691
F 708385
R 78
Sites using strict-transport-security: 94116
Sites using content-security-policy: 24044
Sites using content-security-policy-report-only: 4595
Sites using x-webkit-csp: 455
Sites using x-content-security-policy: 1235
Sites using public-key-pins: 6889
Sites using public-key-pins-report-only: 2709
Sites using x-content-type-options: 132085
Sites using x-frame-options: 124835
Sites using x-xss-protection: 105956
Sites using x-download-options: 12021
Sites using x-permitted-cross-domain-policies: 11593
Sites using access-control-allow-origin: 32294
Sites using referrer-policy: 3990
Sites redirecting to HTTPS: 372125
Sites using Let's Encrypt certificate: 108146
Top 10 Server headers:
Apache 221564
nginx 160874
cloudflare 92251
Microsoft-IIS/8.5 35599
nginx/1.12.2 29258
Microsoft-IIS/7.5 24947
LiteSpeed 23226
GSE 23041
openresty 14749
Apache/2 12885
Top 10 TLDs:
.com 443948
.org 45933
.ru 40995
.net 38964
.de 38756
.br 27815
.uk 22215
.pl 17704
.it 14246
.ir 13841
Top 10 Certificate Issuers:
C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3 108146
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Domain Validation Secure Server CA 46220
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO ECC Domain Validation Secure Server CA 2 38537
C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.", OU = http://certs.godaddy.com/repository/, CN = Go Daddy Secure Certificate Authority - G2 29436
C = US, O = GeoTrust Inc., CN = RapidSSL SHA256 CA 10741
C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA 10662
C = US, O = Amazon, OU = Server CA 1B, CN = Amazon 9380
C = US, ST = TX, L = Houston, O = "cPanel, Inc.", CN = "cPanel, Inc. Certification Authority" 8489
C = BE, O = GlobalSign nv-sa, CN = AlphaSSL CA - SHA256 - G2 6580
C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA 6441
Top 10 Protocols:
TLSv1.2 350451
TLSv1 7309
TLSv1.1 165
Top 10 Cipher Suites:
ECDHE-RSA-AES256-GCM-SHA384 147985
ECDHE-RSA-AES128-GCM-SHA256 127964
ECDHE-ECDSA-AES128-GCM-SHA256 41043
ECDHE-RSA-AES256-SHA384 15400
DHE-RSA-AES256-GCM-SHA384 4326
ECDHE-RSA-AES256-SHA 3231
DHE-RSA-AES256-SHA 2484
0000 2194
AES256-SHA 2113
AES128-SHA 1855
Top 10 PFS Key Exchange Params:
ECDH, P-256, 256 bits 325059
ECDH, P-384, 384 bits 6822
ECDH, P-521, 521 bits 6267
DH, 1024 bits 6208
DH, 2048 bits 1275
ECDH, B-571, 570 bits 103
ECDH, brainpoolP512r1, 512 bits 18
DH, 4096 bits 5
DH, 3072 bits 2
DH, 768 bits 1
Top Key Sizes:
2048 bit 289141
256 bit 41402
4096 bit 24527
1024 bit 315
3072 bit 231
384 bit 87
8192 bit 7
2432 bit 4
2049 bit 3
512 bit 2
Sites using CAA: 4186
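If you want to turn the raw counts above into something easier to compare between reports, here's a small Python sketch that converts the Security Headers grade counts into percentages of all rows. The numbers are copied from the list above and the variable names are only for illustration. It also checks that the grade counts add up to the Total Rows figure, which they do.

```python
# Grade counts and total copied from the statistics above.
grades = {
    "A+": 763,
    "A": 15258,
    "B": 18954,
    "C": 26957,
    "D": 146633,
    "E": 29691,
    "F": 708385,
    "R": 78,
}
total_rows = 946719

# Sanity check: every row in the crawl received exactly one grade.
assert sum(grades.values()) == total_rows

shares = {grade: count / total_rows * 100 for grade, count in grades.items()}
for grade, share in shares.items():
    print(f"{grade:>2}: {share:6.2f}%")
```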
Other Observations
Looking over the data myself there are some other interesting observations that can be made.
Public Keys
We've seen a huge jump in the number of 2,048 bit RSA keys as you'd expect from a jump in the adoption of HTTPS, but we're also seeing the use of 256 bit ECDSA keys increasing too, up from 32,070 in Aug 2017 to 41,402 in Feb 2018. The majority of the increase in HTTPS was taken up by RSA though.
Not only that but the use of 3,072 bit and 4,096 bit RSA keys has also risen quite sharply. 3,072 bit went from 142 to 231 and 4,096 bit went from 16,942 to 24,527. Those are some pretty sizeable keys and there are a lot of sites using them, which does come as a little bit of a surprise.
Cipher Suites
Given the constant drive towards performance on the web, the public key usage above was fairly interesting and so too is the use of cipher suites. The top cipher suite remains ECDHE-RSA-AES256-GCM-SHA384, rising from 113,309 sites in Aug 2017 to 147,985 sites in Feb 2018. I would have expected that ECDHE-RSA-AES128-GCM-SHA256 would be the most popular suite but that ranked second in both scans with 79,256 sites in Aug 2017 and 127,964 in Feb 2018.
From the graph I guess we can say that the very top sites in the ranking have the highest amount of support for ECDHE-RSA-AES128-GCM-SHA256 which is the faster of the two RSA suites.
Protocol Support
With the pending removal of TLSv1.0 support in PCI DSS coming in June, protocol support will be another interesting thing to keep an eye on. GitHub also did an experiment recently where they disabled TLSv1.0 and TLSv1.1 support on github.com and other services to see what would break. The good news is that protocol support does look pretty good.
To put that another way.
Protocol support looks pretty good in the top 1 million. We have the vast majority on TLSv1.2, a tiny slice on TLSv1.0 and an even tinier slice on TLSv1.1 after that. Once sites do remove TLSv1.0 they may as well remove TLSv1.1 at the same time and just have TLSv1.2 unless TLSv1.3 is here by then.
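As an illustration of dropping both old protocols in one go, here's a minimal sketch using Python's standard ssl module (the minimum_version setting needs Python 3.7 or newer). The function name is made up for the example, and how you'd do this on your own stack depends entirely on the server software you run.

```python
import ssl


def make_tls12_server_context() -> ssl.SSLContext:
    """Build a server-side context that refuses TLSv1.0 and TLSv1.1."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Anything older than TLSv1.2 is rejected during the handshake.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls12_server_context()
print(context.minimum_version.name)
```

A real server would also need to call load_cert_chain on the context with its own certificate and key before accepting connections.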
Servers
The top 4 servers in use haven't changed and in order are still Apache, nginx, cloudflare and Microsoft-IIS/8.5. Cloudflare have changed their header from cloudflare-nginx to cloudflare and also saw a small loss in the number of sites returning their header but remain 3rd in the ranking. As the 3rd most popular server on the planet I'd imagine removing those 6 bytes from the Server header has actually added up to a fairly significant amount of data over the last few weeks/months!
Report URI
Another cool thing that I wanted to look at was how many sites are using Report URI in the Alexa Top 1 Million.
As of right now that graph is showing 413 sites which is somewhat short of the real total for two main reasons. One, some of the larger sites that report with us downsample their reports by only injecting the report-uri directive into a subset of responses and two, not all sites configure reporting via the HTTP response header. It is also possible to enable reporting using Report URI JS and my crawler doesn't analyse the body of the page so it'd miss those too. As with all of the other trends we have a much larger presence in the higher ranked sites and a steady trend once you get out of the top few thousand.
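To show why header-only detection undercounts, here's a rough Python sketch of pulling the report-uri directive out of the CSP response headers and checking where it points. The function names, the sample header and the list of domains treated as Report URI endpoints are assumptions made for the example, not the exact check my crawler performs.

```python
from urllib.parse import urlparse

CSP_HEADERS = ("content-security-policy", "content-security-policy-report-only")
# Assumption: endpoints hosted on these domains belong to Report URI.
REPORT_URI_DOMAINS = ("report-uri.com", "report-uri.io")


def report_uri_endpoints(response_headers: dict) -> list:
    """Return every endpoint listed in a report-uri directive of a CSP header."""
    lowered = {name.lower(): value for name, value in response_headers.items()}
    endpoints = []
    for header in CSP_HEADERS:
        # Directives are separated by ";", the directive name comes first.
        for directive in lowered.get(header, "").split(";"):
            parts = directive.split()
            if parts and parts[0].lower() == "report-uri":
                endpoints.extend(parts[1:])
    return endpoints


def uses_report_uri(response_headers: dict) -> bool:
    """True if any reporting endpoint is on one of the assumed domains."""
    for endpoint in report_uri_endpoints(response_headers):
        host = urlparse(endpoint).hostname or ""
        if any(host == d or host.endswith("." + d) for d in REPORT_URI_DOMAINS):
            return True
    return False


# Hypothetical response headers, for illustration only.
sample = {
    "Content-Security-Policy": "default-src 'self'; "
    "report-uri https://example.report-uri.com/r/d/csp/enforce"
}
print(uses_report_uri(sample))
```

A response that happens to be served without the directive, or a page that sets up reporting from JavaScript, gives this kind of check nothing to find.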
Raw Data
As always, details on how to get hold of the raw data can be found here and I'd love to see any further analysis that other members of the community could contribute!
Links
Details on raw data here.
Raw data download links here.
Google sheet with tables and graphs here.