Top 1 Million Analysis - March 2020

It's time for another 6 month update on the state of security online that's a little late! This is the second report using the new data source that was announced in the last report so we have some good comparisons to make when we take a look at the data.



The Crawl

As always, the data for this scan is taken from Crawler.Ninja and it's all available, in raw form too, over on that site. This is now the 10th report I've done on the Top 1 Million sites over 5 years!!!



I fund the infrastructure for the crawler and the time to do the analysis and report all out of my own pocket so if you do have a couple of dollars / pounds / rupees / yen / whatever to kick in then that'd be awesome! Please head to the donate section on the site or look in my Support section here for ways to help keep the project going.


March 2020

This is the second crawler report that I've done using the new Tranco Top 1 Million list that I announced in the previous crawler report in Sep 2019. What that means is that our comparisons against the last report, covering everything that's changed over the last 6 months, should be a lot more reliable.



Looking at the changes since the last report we can see that everything is positive and we continue to make great progress in improving security online. Notably, the use of Content Security Policy (CSP) has seen some good growth with both the CSP and CSPRO headers increasing by a good margin. It is odd to see growth in HTTP Public Key Pinning (HPKP) given that it's no longer supported and can be quite dangerous (1, 2, 3) but the change here could be related to sites shifting in and out of the Top 1 Million ranking. Other than that, as I said, everything continues to see growth where needed so let's take a closer look at just what's going on.


HTTPS

We saw a bit of a dip in HTTPS in the last report because of the change in data source but I'm happy to say that now, with the second report using the Tranco list, we have seen growth in the use of HTTPS!



Looking at the absolute numbers we're up to 528,498 sites out of the Top 1 Million using HTTPS! The number is probably slightly higher than that too as there's always a small number of failed scans so it's great to see such awesome numbers. Looking at the % and things are still looking really healthy.



You can see that same dip back in Sep 2019 caused by the switch in data source but there is nice growth over the last 6 months. Even though there was a slight dip it's good to see that switching data source didn't have a particularly huge impact, giving me more confidence in the accuracy of these numbers. We're now up to 60.9338% of the Top 1 Million sites actively redirecting users to HTTPS.



HTTP Strict Transport Security

If you're not familiar with HSTS then you should check out my blog post HSTS - The missing link in Transport Layer Security and my HSTS Cheat Sheet. If you really want to up your game then take a look at HSTS Preloading too! HSTS is essential for sites that expect their visitors to use HTTPS all the time so it's good to see continued increase in the use of HSTS.
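To make the header itself a little more concrete, here's a minimal Python sketch of reading a Strict-Transport-Security value and checking the header-level basics that HSTS preloading asks for (a max-age of at least one year plus the includeSubDomains and preload directives). The helper names `parse_hsts` and `meets_preload_basics` are hypothetical and are not part of the crawler; the preload list has further requirements, such as serving a valid certificate and redirecting HTTP to HTTPS, that this sketch does not check.

```python
ONE_YEAR = 31_536_000  # seconds


def parse_hsts(header: str) -> dict:
    """Parse a Strict-Transport-Security value into its directives."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()
        value = value.strip().strip('"')
        if name == "max-age" and value.isdigit():
            policy["max_age"] = int(value)
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


def meets_preload_basics(policy: dict) -> bool:
    """Check only the header-level preload requirements (hypothetical helper)."""
    return (
        policy["max_age"] is not None
        and policy["max_age"] >= ONE_YEAR
        and policy["include_subdomains"]
        and policy["preload"]
    )


policy = parse_hsts("max-age=31536000; includeSubDomains; preload")
print(policy, meets_preload_basics(policy))
```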



The use of HSTS has been tracked across the entire history of these reports so I have data going back 5 years and you can really see the increase in adoption over that time. In the first report in Aug 2015 there were only 11,308 sites using HSTS and in this latest report we have 132,466 sites using it! That's phenomenal growth and we've seen a 12.49% increase in the last 6 months alone.


Certificates

I started tracking more metrics about certificates over the years as they become more important in our increasingly HTTPS world and there are still some interesting trends emerging over time.

I started tracking the presence of Let's Encrypt in the Top 1 Million back in 2016 and they've seen some truly amazing growth in that time. Like other metrics they took a hit when I changed data source but also like other metrics they've seen nice growth in the last 6 months again.



Let's Encrypt are now covering 181,896 sites in the Top 1 Million, a share of 20.97%!

Another continuing trend over the course of these scans is the decline in the presence of EV certificates in the Top 1 Million. Despite there being more sites than I've ever recorded using HTTPS there is also the lowest number of sites using EV certificates that I've ever recorded.



That graph doesn't really do it justice so I've represented the data slightly differently here.




That's a really sharp and noticeable decline in the last 6 months alone and there are currently 15,604 sites using EV certificates, the lowest absolute number I've ever recorded, and that represents 1.80% of sites using EV certificates, the lowest market share I've ever recorded. Given the tremendous growth in the use of certificates over the last few years, it's interesting, but unsurprising, that EV is not only failing to capture any of the new sites using HTTPS but also losing existing ground as sites switch to DV certificates. If you've missed the back story on what's happening with EV then check out my posts Gone forEVer!, Sites that used to have EV and Are EV certificates worth the paper they're written on?.

Just to wrap up on the certificate section I also track who is issuing certificates to the Top 1 Million sites so here's the data on that.


Certificate Issuers:
C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3 181,895
C = US, ST = CA, L = San Francisco, O = "CloudFlare, Inc.", CN = CloudFlare Inc ECC CA-2 88,085
C = GB, ST = Greater Manchester, L = Salford, O = Sectigo Limited, CN = Sectigo RSA Domain Validation Secure Server CA 35,568
C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.", OU = http://certs.godaddy.com/repository/, CN = Go Daddy Secure Certificate Authority - G2 35,500
C = US, O = Amazon, OU = Server CA 1B, CN = Amazon 23,000
C = US, ST = TX, L = Houston, O = "cPanel, Inc.", CN = "cPanel, Inc. Certification Authority" 16,828
C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA 16,191
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Domain Validation Secure Server CA 12,338
C = US, O = DigiCert Inc, OU = www.digicert.com, CN = RapidSSL RSA CA 2018 11,036
C = US, O = DigiCert Inc, OU = www.digicert.com, CN = GeoTrust RSA CA 2018 9,189


Alongside that meteoric growth of Let's Encrypt there is another interesting thing to note and it's the continuing decline in the presence of traditional CAs and the continuing rise in platforms providing certificates. The traditional CAs like DigiCert, Sectigo/Comodo and GoDaddy have all lost ground while Amazon and CloudFlare have gained ground. It's probably safe to roll at least a little of Let's Encrypt into that same category too as whilst they are a traditional CA in that they hand out certificates to site operators, they are also used at scale for platforms like GitHub Pages and WordPress Blogs. There are certainly interesting times ahead and I'd be curious to see how this trend continues over time. If you do want to look at the live data each day you can find that here.


Certificate Authority Authorisation

Whilst we're still on the topic of certificates it's of course important to talk about Certificate Authority Authorisation (CAA). The ability to control which CAs can issue certificates for your site and when they can issue them is a great feature to leverage and more sites need to use it. On that note, it's great to say that more sites are using it!



You can see that huge focus on using CAA in the most highly ranked sites to the left side of the graph and the 10 highest ranked sites using CAA show us what kind of organisations are using it.


Sites using CAA:
1 google.com
2 facebook.com
3 youtube.com
9 netflix.com
13 wikipedia.org
15 yahoo.com
16 doubleclick.net
20 wikipedia.com
24 googletagmanager.com
25 youtu.be


You can see the daily list of sites using CAA published here but as with most of these security mechanisms, it's the larger sites that are focusing on using them and usage quickly tails off as we move down the ranking. Another thing you can look at that's updated daily is the list of configurations that sites are using right here. Here's a sample of the 10 most common configurations.


Values for CAA:
CAA 0 issue "letsencrypt.org" 1,855
CAA 0 issue "pki.goog" 520
CAA 0 issue "comodoca.com" 452
CAA 0 issue "digicert.com" 395
CAA 128 issue "letsencrypt.org" 338
CAA 0 issue "\;" 178
CAA 0 issue "globalsign.com" 148
CAA 0 issue "godaddy.com" 122
CAA 0 issue "sectigo.com" 113
CAA 0 issuewild "godaddy.com" 108
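Each of those configurations follows the CAA record layout of flags, tag and value. As a minimal sketch, assuming the leading "CAA" label and the trailing count have already been removed, the values could be pulled apart in Python like this. The `parse_caa` helper and the `CaaRecord` type are hypothetical names for illustration and are not how the crawler does it. A flags value of 128 sets the issuer critical bit, and a value of ";" means that no CA is authorised to issue.

```python
from typing import NamedTuple


class CaaRecord(NamedTuple):
    flags: int
    tag: str
    value: str
    critical: bool


def parse_caa(rdata: str) -> CaaRecord:
    """Split a CAA value such as '0 issue "letsencrypt.org"' into its parts."""
    flags_text, tag, raw_value = rdata.split(None, 2)
    flags = int(flags_text)
    # Drop the surrounding quotes and the zone-file escape in front of ';'.
    value = raw_value.strip().strip('"').replace("\\;", ";")
    # The most significant flag bit (128) marks the property as critical.
    return CaaRecord(flags, tag.lower(), value, bool(flags & 0x80))


for text in ('0 issue "letsencrypt.org"',
             '128 issue "letsencrypt.org"',
             '0 issue "\\;"'):
    print(parse_caa(text))
```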


General Statistics

The general stats section is a nice overview of each crawl and it's updated daily so if you want to browse through the latest one then click right here.


Total Rows: 869874 

Security Headers Grades:
A 23,597
A+ 3,538
B 21,226
C 31,577
D 118,783
E 12,029
F 659,013
R 111

Sites using strict-transport-security:
133,054

Sites using content-security-policy:
52,174

Sites using content-security-policy-report-only:
2,399

Sites using x-webkit-csp:
632

Sites using x-content-security-policy:
1,898

Sites using public-key-pins:
703

Sites using public-key-pins-report-only:
38

Sites using x-content-type-options:
151,403

Sites using x-frame-options:
158,265

Sites using x-xss-protection:
120,717

Sites using x-download-options:
18,780

Sites using x-permitted-cross-domain-policies:
17,207

Sites using access-control-allow-origin:
37,005

Sites using referrer-policy:
36,325

Sites using feature-policy:
4,416

Sites using report-to:
12,339

Sites using nel:
12,131

Sites using security.txt:
1,766

Sites redirecting to HTTPS:
528,895

Sites using Let's Encrypt certificate:
182,033

Sites using EV Certificates:
15,631

Top 10 Server headers:
Apache 181,642
cloudflare 147,798
nginx 143,441
Microsoft-IIS/7.5 37,619
Microsoft-IIS/8.5 27,026
Microsoft-IIS/10.0 17,240
LiteSpeed 16,115
openresty 11,781
nginx/1.16.1 9,220
Apache/2 8,323

Top 10 TLDs:
.com 490,999
.org 65,000
.net 39,796
.ru 26,661
.cn 16,643
.de 16,347
.uk 14,310
.jp 8,578
.br 8,372
.in 7,147

Top 10 Certificate Issuers:
C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3 182,032
C = US, ST = CA, L = San Francisco, O = "CloudFlare, Inc.", CN = CloudFlare Inc ECC CA-2 89,596
C = GB, ST = Greater Manchester, L = Salford, O = Sectigo Limited, CN = Sectigo RSA Domain Validation Secure Server CA 35,620
C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.", OU = http://certs.godaddy.com/repository/, CN = Go Daddy Secure Certificate Authority - G2 35,425
C = US, O = Amazon, OU = Server CA 1B, CN = Amazon 23,113
C = US, ST = TX, L = Houston, O = "cPanel, Inc.", CN = "cPanel, Inc. Certification Authority" 16,901
C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA 16,078
C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Domain Validation Secure Server CA 12,115
C = US, O = DigiCert Inc, OU = www.digicert.com, CN = RapidSSL RSA CA 2018 11,035
C = US, O = DigiCert Inc, OU = www.digicert.com, CN = GeoTrust RSA CA 2018 9,302

Top 10 Protocols:
TLSv1.2 327,483
TLSv1 1,929
TLSv1.1 17

Top 10 Cipher Suites:
ECDHE-RSA-AES256-GCM-SHA384 132,896
ECDHE-RSA-AES128-GCM-SHA256 112,794
ECDHE-ECDSA-AES128-GCM-SHA256 57,992
ECDHE-RSA-AES256-SHA384 11,172
DHE-RSA-AES256-GCM-SHA384 2,385
0 2,251
ECDHE-ECDSA-AES256-GCM-SHA384 1,766
ECDHE-RSA-AES256-SHA 1,306
AES256-SHA 1,236
ECDHE-RSA-AES128-SHA256 878

Top 10 PFS Key Exchange Params:
ECDH, P-256, 256 bits 303,032
ECDH, P-384, 384 bits 11,063
ECDH, P-521, 521 bits 5,058
DH, 1024 bits 2,368
DH, 2048 bits 892
DH, 4096 bits 108
ECDH, B-571, 570 bits 36
ECDH, brainpoolP512r1, 512 bits 13
DH, 3072 bits 9
ECDH, secp256k1, 256 bits 3

Top Key Sizes:
2048 bit 245,090
256 bit 59,070
4096 bit 21,317
3072 bit 826
384 bit 695
1024 bit 147
8192 bit 19
4056 bit 3
4048 bit 3
2255 bit 2

Sites using CAA:
17,245


Just looking through the general statistics for each crawl and there are already a few things that jump out to me and are worth talking about.


Legacy TLS

I recently wrote about Legacy TLS is on the way out: Start deprecating TLSv1.0 and TLSv1.1 now and it's great to see such low numbers of reliance on Legacy TLS in the crawl.



You can see the absolute numbers in the daily crawl data here but I was surprised that there are still a few thousand sites that don't support higher than TLSv1.0 or TLSv1.1 which are both quite old.


Top 10 Protocols:
TLSv1.2 327,483
TLSv1 1,929
TLSv1.1 17


Cipher Suites

Another surprising thing came in the cipher suites: the single most common suite uses AES256 rather than AES128. You can see the live daily data here and here are the top 5 most common cipher suites from today.


Top 10 Cipher Suites:
ECDHE-RSA-AES256-GCM-SHA384 132,896
ECDHE-RSA-AES128-GCM-SHA256 112,794
ECDHE-ECDSA-AES128-GCM-SHA256 57,992
ECDHE-RSA-AES256-SHA384 11,172
DHE-RSA-AES256-GCM-SHA384 2,385


Given that AES128 is sufficient and offers better performance than AES256 I'd have expected an AES128 suite to be the most prevalent, but indeed not!


RSA vs ECDSA

Focusing on the performance side of things again, using an ECDSA key for your certificate is far better for performance and offers slightly better security too. The data for key types might not indicate that though.



Unfortunately the use of ECDSA is still quite low despite the fact that ECDSA keys are better for performance and security, but there's probably a good reason why: Windows XP. Well it's not just Windows XP but it is legacy clients that can't support ECDSA and only support RSA, so you have to use RSA if you have legacy client concerns. There's also an element of RSA just being 'the default' so there are probably people who could upgrade to ECDSA and just haven't. I have written about ECDSA certificates on my blog before and if you want to get really fancy it is possible to support both RSA and ECDSA together, but for now, we do need to drive those ECDSA numbers up quite a bit.


Key Size

Talking about performance again and another surprising thing jumped out at me. Looking at the most common key sizes used for authentication we have the following data, available daily here.


Top Key Sizes:
2048 bit 245,090
256 bit 59,070
4096 bit 21,317
3072 bit 826
384 bit 695


So the top 2 key sizes are where I'd expect. The 2,048bit key is an RSA key and the 256bit is an ECDSA key. As I said in the previous section we can see that RSA is more common than ECDSA but there are different key sizes available for each. The most surprising thing here is the absolutely crazy number of 4,096bit RSA keys! This is insane! The performance hit of such an unnecessarily massive RSA key won't be small and there are a heap of sites using them.
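The RSA and ECDSA split can be read straight off the key sizes. Here's a rough Python sketch that tallies the five sizes listed above, under the assumption (a heuristic of this example, not something the crawl data states) that any key of 521 bits or fewer is an elliptic curve key and anything larger is RSA. The names `key_type` and `totals_by_type` are hypothetical.

```python
# Counts copied from the Top Key Sizes list above.
KEY_SIZES = {2048: 245_090, 256: 59_070, 4096: 21_317, 3072: 826, 384: 695}


def key_type(bits: int) -> str:
    """Guess the key algorithm from its size (heuristic assumption)."""
    return "ECDSA" if bits <= 521 else "RSA"


def totals_by_type(sizes: dict[int, int]) -> dict[str, int]:
    """Sum the site counts for each guessed key type."""
    totals: dict[str, int] = {}
    for bits, count in sizes.items():
        kind = key_type(bits)
        totals[kind] = totals.get(kind, 0) + count
    return totals


totals = totals_by_type(KEY_SIZES)
listed = sum(KEY_SIZES.values())
for name, count in sorted(totals.items()):
    print(f"{name}: {count:,} ({count / listed:.1%} of the listed keys)")
```

With the five sizes listed, that comes to 267,233 RSA keys against 59,765 ECDSA keys, so ECDSA accounts for roughly 18% of them.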



Security Headers

In this section we'll look at the utilisation rate of different headers and the grade that sites score on my Security Headers analyser service which is free to use so head over there if you've not checked it out before.



Looking at utilisation first and we can see that these headers are more popular amongst the higher ranked sites which has been a consistent theme throughout the history of these scans.



Running all of these sites against the Security Headers API to fetch their scores yields the same disappointing results that the homepage of Security Headers tells us: lower grades are very common.



The most common grades are F and D as you can see and interestingly the F grades are lowest at the high end of the ranking and the D grades are the highest.
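To put numbers on how common those grades are, here's a small Python sketch that turns the Security Headers Grades counts from the General Statistics section into percentages. The `grade_shares` helper is a hypothetical name for illustration.

```python
# Counts copied from the Security Headers Grades list in General Statistics.
GRADES = {
    "A+": 3_538, "A": 23_597, "B": 21_226, "C": 31_577,
    "D": 118_783, "E": 12_029, "F": 659_013, "R": 111,
}


def grade_shares(grades: dict[str, int]) -> dict[str, float]:
    """Return each grade's share of all graded sites as a percentage."""
    total = sum(grades.values())
    return {grade: 100 * count / total for grade, count in grades.items()}


shares = grade_shares(GRADES)
for grade, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{grade:>2} {share:6.2f}%")
```

The grade counts add up to 869,874, the same as Total Rows, and F and D together account for roughly 89% of sites.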


Get the data

If you want to see the data that these scans are based on then there are several things to check out. All of the tables/graphs/data that this report was based on are available on the Google Sheet here. The crawler fleet itself and the daily data are available over on Crawler.Ninja so head over there for those. There's also a full mysqldump of the crawler database with the raw crawl data for every single scan I've ever done, over 2.5TB of data, available on Scans.io which means if you want to do some additional analysis the data is there for you to use. On top of all of that, if you've enjoyed this post, the data and the analysis then please consider supporting the project in some way on the donations section!


Other quick observations

I don't want to make a long post much longer but there were a couple of other quick things that grabbed my attention so I will post them up here for a brief look.

Because I'm analysing certificates for all sites that get scanned I can also see when they expire so I have a list for sites serving expired certificates, and for sites whose certificate expires in one, three and seven days.
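As a minimal sketch of how a certificate's expiry date could be sorted onto those lists, here's a hypothetical Python helper. It assumes each certificate lands on only the tightest list that applies, which may not match how the published lists are built, and `expiry_bucket` is an illustrative name rather than anything from the crawler.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

WINDOWS_IN_DAYS = (1, 3, 7)


def expiry_bucket(not_after: datetime, now: datetime) -> Optional[str]:
    """Return which expiry list a certificate belongs on, or None."""
    if not_after <= now:
        return "expired"
    # Check the smallest window first so each certificate gets one bucket.
    for days in WINDOWS_IN_DAYS:
        if not_after - now <= timedelta(days=days):
            return f"expires within {days} day(s)"
    return None


now = datetime(2020, 3, 1, tzinfo=timezone.utc)
print(expiry_bucket(datetime(2020, 3, 3, tzinfo=timezone.utc), now))
```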

There are a surprising number of sites still using HPKP as they are issuing PKP/PKPRO headers. The even more surprising thing when looking at the actual header values is that the vast majority of them are completely wrong/invalid! Here are the 2 most common headers:

pin-sha256="X3pGTSOuJeEVw989IJ/cEtXUEmy52zs1TZQrU06KUKg=" max-age=15552000; includeSubDomains x 200
pin-sha256="base64+primary=="; pin-sha256="base64+backup=="; max-age=5184000; includeSubDomains x 55
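To show why those two values don't hold up, here's a rough Python sketch of a checker based on a loose reading of RFC 7469: directives are separated by semicolons, each pin should be a quoted base64 string that decodes to a 32-byte SHA-256 digest, there should be at least two pins so that one can act as a backup, and max-age is required. The `hpkp_problems` helper is hypothetical and is not a full validator.

```python
import base64
import binascii


def hpkp_problems(header: str) -> list[str]:
    """Return a list of problems found in a Public-Key-Pins header value."""
    pins, max_age_seen, problems = [], False, []
    for directive in (part.strip() for part in header.split(";")):
        if not directive:
            continue
        name, _, value = directive.partition("=")
        name, value = name.strip().lower(), value.strip()
        if name == "pin-sha256":
            pins.append(value)
        elif name == "max-age":
            max_age_seen = True
            if not value.strip('"').isdigit():
                problems.append(f"max-age is not a number: {value!r}")
    for pin in pins:
        # A pin must be exactly one quoted string with nothing after it.
        if not (len(pin) >= 2 and pin[0] == pin[-1] == '"'):
            problems.append(f"pin is not a single quoted string: {pin!r}")
            continue
        try:
            digest = base64.b64decode(pin[1:-1], validate=True)
        except (binascii.Error, ValueError):
            problems.append(f"pin is not valid base64: {pin!r}")
            continue
        if len(digest) != 32:
            problems.append(f"pin is not a 32-byte SHA-256 digest: {pin!r}")
    if len(pins) < 2:
        problems.append("fewer than two pins, so there is no backup pin")
    if not max_age_seen:
        problems.append("max-age directive is missing")
    return problems
```

Run against the first header above, the missing semicolon means max-age is swallowed into the pin value, so the sketch reports a malformed pin, no backup pin and no max-age. The second header has the right shape but its pins are placeholder text rather than real digests.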

Records of Server headers have changed loads over time and there are still some super old values in the full list. Here's the top 5:

Server headers:
Apache 181,642
cloudflare 147,798
nginx 143,441
Microsoft-IIS/7.5 37,619
Microsoft-IIS/8.5 27,026
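Because the same product shows up under several version strings, the raw list understates some servers. As a rough sketch, the Top 10 Server headers from the General Statistics section can be grouped by product name in Python, assuming that everything before the first "/" identifies the product. The `by_product` helper is a hypothetical name for illustration.

```python
from collections import Counter

# Counts copied from the Top 10 Server headers list in General Statistics.
TOP_SERVER_HEADERS = {
    "Apache": 181_642, "cloudflare": 147_798, "nginx": 143_441,
    "Microsoft-IIS/7.5": 37_619, "Microsoft-IIS/8.5": 27_026,
    "Microsoft-IIS/10.0": 17_240, "LiteSpeed": 16_115, "openresty": 11_781,
    "nginx/1.16.1": 9_220, "Apache/2": 8_323,
}


def by_product(headers: dict[str, int]) -> Counter:
    """Sum counts per product, ignoring anything after the first '/'."""
    totals: Counter = Counter()
    for value, count in headers.items():
        product = value.split("/", 1)[0].strip().lower()
        totals[product] += count
    return totals


for product, count in by_product(TOP_SERVER_HEADERS).most_common():
    print(f"{product:<14} {count:>8,}")
```

Across those ten values, Apache comes to 189,965, nginx to 152,661 and the three IIS versions to 81,885 combined.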

How about X-ASPNet-Version anyone? Full list.

Values for X-Aspnet-Version:
4.0.30319 31,725
2.0.50727 3,734
411
0 265
1.1.4322 53

The X-Page-Speed header, Full List. It's interesting how many people have set the value of the header to on...

Values for X-Page-Speed:
1.13.35.2-0 1,135
Powered By ngx_pagespeed 126
on 109
1.12.34.3-0 93
1.12.34.2-0 75

The X-Powered-By header is actually quite grim reading, stuff is so old in there! Here's the Full List and the top 5.

Values for X-Powered-By:
ASP.NET 85,373
PleskLin 19,600
WP Engine 19,459
PHP/5.6.40 18,977
PHP/5.4.45 11,193


Wrapping up

Okay, I could keep going over this data for hours! Take a look at all of the daily files and maybe there are other interesting observations to make, and if you want to automate something they're all available as JSON files too. Have fun!