Regular readers will know that I'm very active in the CA / PKI space and even deliver a 2-day advanced training course on the topic. Over the last year or so I've been watching as a potentially big problem has been rolling in over the horizon and just the other day I saw the first signs of the storm hitting the shore.
Terminology
Just before we dig in I want to clear up some terminology so we're all on the same page. A CA is a Certificate Authority, the organisations that issue certificates so you can have HTTPS on your website. I speak a lot about Let's Encrypt who are one of the biggest CAs out there but you may also recognise names like Comodo, Sectigo, DigiCert et al. The PKI, or Public Key Infrastructure, is used to authenticate users and devices online. Today I'm going to be talking about a subset of that which we call the Internet PKI, which refers to the collection of public CAs used to issue certificates to websites so we can authenticate them in the browser.
How CAs Work
So that the browser can authenticate a website, it must be presented with a valid certificate chain by the server it's connecting to. A typical chain would look something like this, but note that there can be more than 1 intermediate certificate in a chain. The minimum number of certificates you can expect to see in a valid chain is 3.
The Root CA Certificate is the heart of a CA and is quite literally embedded in your OS or your browser of choice; it's physically present on your device. The Root CA issues the Intermediate CA, which in turn issues the End-Entity Certificate (also known as the leaf certificate or server certificate) to your website. The Leaf and Intermediate certificates are delivered to the client from the server, and the client already has the Root certificate, so with this collection of certificates the chain can be built and the identity of the website authenticated. That is an incredibly brief overview of how this works and for more details you should seriously check out our Training Course on the subject, but for now that should be enough to get us going.
So what's the problem?
The problem is something that we all deal with on a regular basis without too much issue: Certificates have an expiry date and need replacing. I recently wrote about the new lifetime limit of 1 year that will be imposed in September 2020 which means we will all have to replace our Server Certificates every 12 months at least. That limit only applies to Server Certificates though, the certificates that we obtain to install on our website, it does not apply to CA Certificates.
CA Certificates are governed by a different set of rules to our certificates and as such they have different lifetime restrictions. It's very common to see Intermediate Certificates with a lifetime of 5 years and Root Certificates with a lifetime of 25 years! This means that Intermediate Certificates expire on a somewhat regular schedule but this generally isn't a problem. Because the intermediate is delivered by the website it's fairly dynamic and as the website is renewing their certificate on a regular schedule, changing out the intermediate isn't really much of an extra burden. It can be changed quite easily alongside the Server Certificate, unlike the Root CA Certificate.
As I said a moment ago, the Root CA is embedded into the client device itself, usually in the OS but also possibly in the browser or other software. Changing the Root CA isn't something the website can control, it's something that requires an update to be installed on the client, either an OS update or software update. I wonder what our track record of keeping OS/software updated is in the wider world?...
Legacy CAs
Some CAs have now been around for a very long time, we're talking 20-25 years! That also just so happens to mean that some of the original Root CAs out there are also coming towards the end of their natural life, their time is almost up. For most of us this won't be a problem at all because CAs have created new root certificates and those have been distributed across the World in OS and browser updates for years. For some of us though, those who haven't installed OS or browser updates for years, well there's kind of a problem...
This problem was perfectly demonstrated recently, at May 30 10:48:38 2020 GMT to be exact. That exact time was when the AddTrust External CA Root expired and brought with it the first signs of trouble that I've been expecting for some time.
Roku, who make a pretty popular streaming device, had an incident.
Stripe, the payment processor, also had issues.
As of 10:48 UTC, webhook delivery is failing for some users. We’re investigating and will post updates here.
— Stripe Status (@stripestatus) May 30, 2020
Spreedly, another big payment processor, also had an incident.
There is a whole load of stuff that broke because of this Root CA expiring and Andy Ayer has a good list tracking quite a few more here. The point is, the affected clients only have the old (now expired) AddTrust Root CA Certificate installed and because they've not been updated they haven't received the new version that replaces it. Without that new version, things simply don't work and Server Certificates that should be valid, that are valid, will rightly be seen as invalid and rejected by the client.
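To make the failure mode concrete, here is a minimal Python sketch of the validity-period check that tripped up the affected clients. It uses the expiry timestamp quoted above; the function name `is_expired` is my own invention for illustration, and a real client also verifies signatures, names and much more.

```python
import ssl
from datetime import datetime, timezone

# notAfter of the AddTrust External CA Root, as quoted in this post.
ADDTRUST_NOT_AFTER = "May 30 10:48:38 2020 GMT"


def is_expired(not_after: str, now: datetime) -> bool:
    """Return True once `now` is later than the certificate's notAfter time."""
    # ssl.cert_time_to_seconds parses the OpenSSL-style date string as UTC.
    expiry = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return now > expiry
```

Calling `is_expired(ADDTRUST_NOT_AFTER, datetime.now(timezone.utc))` on a client that enforces this check for its trust anchors shows why every chain ending at that root stopped validating, no matter how fresh the Server Certificate was.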
A Long Time Coming
This particular issue does not come as a surprise to many that operate in this specific area, I've been talking about this impending problem for probably 2 years in our TLS training course, but for many, this will come as a surprise, proven by all of the incidents I linked to above.
Another good example is the upcoming Root CA transition that Let's Encrypt will be performing. I wrote about this back in April 2019 when Let's Encrypt were planning to transition from their IdenTrust cross-signed chain to their own ISRG Root chain in July 2019, but it didn't happen...
Update, May 20 2019
Due to concerns about insufficient ISRG root propagation on Android devices we have decided to move the date on which we will start serving a chain to our own root from July 8, 2019, to July 8, 2020.
Let's Encrypt had to push back the transition because of an issue we call root propagation, or more specifically, a lack of root propagation, where a Root CA is not widely distributed onto all clients out there. Let's Encrypt are currently using a cross-signed intermediate and chain down to the IdenTrust DST Root CA X3 certificate. That root certificate expires on 30th Sep 2021 and was issued way back in Sep 2000, so it's widely distributed, or propagated, as most devices have done an update in the last 20 years and as a result they have the IdenTrust Root Certificate installed. That said, Let's Encrypt need to move away from it before it expires and the plan was to migrate to their Root CA, the ISRG Root X1.
The ISRG Root was issued on 4th Jun 2015 and began the approval process to become a CA, a process it completed on 6th Aug 2018. From that point the Root CA was available to all clients via an OS or software update, all they need to do is install the update. "All they need to do"...
Here is the core of the problem. The new Let's Encrypt Root CA was created in 2015 and fully approved for distribution in 2018. Now, ~2 years later, if a device has not been updated since Aug 2018, how does it know about this new Root CA? The answer: it doesn't. This is why Let's Encrypt delayed their switch to their own ISRG Root CA and are still serving the intermediate that chains down to the IdenTrust root, but that solution will only last until the expiry of the IdenTrust Root. They bought themselves some time, but not much. To test if your current client has the ISRG Root X1 installed, try and load this test site: https://valid-isrgrootx1.letsencrypt.org/
If you can connect without warnings then it probably means you're OK, but just because it does connect it doesn't mean you're definitely OK. I'm not going to go into the complexities of chain building and the client doing what it wants, or the possibility you're behind a device terminating TLS on a corporate network, because there are a heap of other things to consider here that require hundreds more words. All I can say is that you're probably OK and if you want to spend hours talking about this, well, I have a training course for that!
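If you'd rather look in a trust store directly, here is a rough Python sketch that searches a list of CA certificates for a given common name. It assumes the dictionaries are shaped like the output of Python's `ssl.SSLContext.get_ca_certs()`; the helper names are mine. It only tells you about the store Python's OpenSSL uses, not the one in your browser, and certificates loaded from a directory may not be listed until they have been used, so a False here is not conclusive.

```python
import ssl
from typing import Iterable, Mapping, Set


def subject_common_names(ca_certs: Iterable[Mapping]) -> Set[str]:
    """Collect commonName values from dicts shaped like get_ca_certs() output."""
    names = set()
    for cert in ca_certs:
        # "subject" is a tuple of RDNs, each a tuple of (key, value) pairs.
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName":
                    names.add(value)
    return names


def has_root(ca_certs: Iterable[Mapping], common_name: str) -> bool:
    """Return True if a certificate with this commonName is in the list."""
    return common_name in subject_common_names(ca_certs)


# Example usage against the local default store:
#   context = ssl.create_default_context()
#   print(has_root(context.get_ca_certs(), "ISRG Root X1"))
```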
This is not Let's Encrypt specific
Not by a long shot, and that's the problem. We're coming to a point in time now where there are lots of CA Root Certificates expiring in the next few years simply because it's been 20+ years since the encrypted Web really started up and that's the lifetime of a Root CA certificate. This will catch some organisations off guard in a big way, as we've already seen, but there are also some organisations that have seen the problem on the horizon and are taking whatever steps they can to resolve it.
With the TLS Training I deliver and various bits of consultancy work I do, I've worked with some organisations that have already actually hit this problem and worked through a solution where they can. There's only one organisation I'm fortunate enough to have been given permission to talk about and that's the BBC, our national broadcaster here in the UK. They had a really interesting problem because they do a lot of online streaming to a lot of different devices. Now, mobile apps and browsers aren't generally too much of a problem, but Smart TVs, well, they're a whole different game.
I'm sure many of you here know that Smart TVs aren't often as 'smart' as we'd like. Generally speaking the only time my TV gets an update now it's to remove a feature, not add one... But this lack of updates does present another rather interesting problem. A Smart TV is basically a cut-down Linux computer, a computer that does TLS comms, that has a Root CA store and that has the exact same problems I've just talked through. The clock is ticking on the Root Certificates installed on the TV and with no updates, they never get replaced...
A very awesome friend of mine, Neil Craig, is Lead Technical Architect at the BBC and he got me some specific details of an incident over there and allowed me to share it with you. On a recent server certificate update they got a new certificate issued by the GlobalSign R5 Root, a root that is valid from 13th Nov 2012 to 19th Jan 2038. The problem was, some TVs are so out of date that they don't have that R5 Root CA installed on them that was issued in 2012! This means that those TVs will reject certificates that chain to that Root CA and as a result, the streaming app stops working on the TV! Here we are in 2019/2020 with the problem that an 8 year old Root CA still hasn't managed to make its way onto a significant portion of 'Smart' TVs. The BBC were smart though and there was a workaround they could deploy which meant serving additional intermediate certificates that chain down to a different GlobalSign Root CA. The GlobalSign R3 Root is valid 18th Mar 2009 to 18th Mar 2029 and the R1 Root is valid 1st Sep 1998 to 28th Jan 2028. With the R1 Root going back so far, and there being an alternate trust path available to build to it, the BBC was able to fix the problem and those outdated TVs were fixed because they had the old R1 Root installed. Here's a trimmed output from openssl s_client -connect www.bbc.co.uk:443 -showcerts and what that looks like in Chrome on Windows at present. They use the same workaround on their www site, so it's easier for us to inspect the chain there than it is on the iPlayer API endpoints.
Certificate chain
 0 s:C = GB, ST = London, L = London, O = British Broadcasting Corporation, CN = www.bbc.co.uk
   i:C = BE, O = GlobalSign nv-sa, CN = GlobalSign ECC OV SSL CA 2018
 1 s:C = BE, O = GlobalSign nv-sa, CN = GlobalSign ECC OV SSL CA 2018
   i:OU = GlobalSign ECC Root CA - R5, O = GlobalSign, CN = GlobalSign
 2 s:OU = GlobalSign ECC Root CA - R5, O = GlobalSign, CN = GlobalSign
   i:OU = GlobalSign Root CA - R3, O = GlobalSign, CN = GlobalSign
 3 s:OU = GlobalSign Root CA - R3, O = GlobalSign, CN = GlobalSign
   i:C = BE, O = GlobalSign nv-sa, OU = Root CA, CN = GlobalSign Root CA
What we can see here is the www.bbc.co.uk certificate issued by the GlobalSign ECC OV SSL CA 2018 intermediate. That Intermediate has an Authority Key ID of 3de629489bea07ca21444a26de6eded283d09f59 which means it can chain to any one of the following three certificates:
GlobalSign ECC Root CA - R5 (Root) Validity 13 Nov 2012 to 19 Jan 2038
GlobalSign ECC Root CA - R5 (Intermediate) Validity 19 Jun 2019 to 28 Jan 2028
GlobalSign ECC Root CA - R5 (Intermediate) Validity 21 Nov 2018 to 18 Mar 2029
The problem here is the first one in the list, the R5 Root. This is the newer one issued in 2012 that the older Smart TVs did not have installed, so the BBC couldn't just return the www.bbc.co.uk and GlobalSign ECC OV SSL CA 2018 certificates, they had to provide more intermediates so the client can build an alternate chain. By serving the 3rd certificate in the above list, the R5 Intermediate, the BBC lets the client build a chain 'around' the R5 Root that is missing, so the chain continues. That R5 Intermediate certificate has an Authority Key ID of 8ff04b7fa82e4524ae4d50fa639a8bdee2dd1bbc meaning it could chain to one of the following three certificates:
GlobalSign Root CA - R3 (Root) Validity 18 Mar 2009 to 18 Mar 2029
GlobalSign Root CA - R3 (Intermediate) Validity 18 Mar 2009 to 28 Jan 2028
GlobalSign Root CA - R3 (Intermediate) Validity 19 Sep 2018 to 28 Jan 2028
At this point the BBC could end the chain and serve the www.bbc.co.uk, GlobalSign ECC OV SSL CA 2018 and GlobalSign ECC Root CA - R5 (Intermediate) certificates to the client for it to anchor on the GlobalSign Root CA - R3 (Root) but again, with an issue date of 2009 and a few years for approval and distribution, that might still not solve the problem as the R3 Root might not be present. So, in goes another intermediate to work around that! Again they served the 3rd certificate in the list above, the GlobalSign Root CA - R3 (Intermediate), which has an Authority Key ID of 607b661a450d97ca89502f7d04cd34a8fffcfd4b. That means it could chain to one of the following two certificates:
GlobalSign Root CA (Root) Validity 01 Sep 1998 to 28 Jan 2028
GlobalSign Root CA (Intermediate) Validity 20 Feb 2014 to 15 Dec 2021
Now we're finally getting somewhere. That first certificate has a Friendly Name of GlobalSign Root CA - R1 and is a Root CA that is old enough to be installed on ancient devices that haven't been updated in years and the 'Smart' TVs can successfully build a chain to anchor on. This means instead of serving a 'normal' chain of:
www.bbc.co.uk (Leaf)
GlobalSign ECC OV SSL CA 2018 (Intermediate)
The BBC instead have to serve a chain of:
www.bbc.co.uk (Leaf)
GlobalSign ECC OV SSL CA 2018 (Intermediate)
GlobalSign ECC Root CA - R5 (Intermediate)
GlobalSign Root CA - R3 (Intermediate)
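The chain walk above can be sketched in a few lines of Python. This is a simplified model, not how any particular TLS library works: it matches certificates purely by name (real clients also use key identifiers, signatures and validity dates), and the names `BBC_CHAIN`, `NORMAL_CHAIN` and `build_path` are my own. The certificate names come from the s_client output above.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# Subject name -> issuer names of the certificates the server sends for
# that subject.
BBC_CHAIN: Dict[str, List[str]] = {
    "www.bbc.co.uk": ["GlobalSign ECC OV SSL CA 2018"],
    "GlobalSign ECC OV SSL CA 2018": ["GlobalSign ECC Root CA - R5"],
    "GlobalSign ECC Root CA - R5": ["GlobalSign Root CA - R3"],  # cross-signed
    "GlobalSign Root CA - R3": ["GlobalSign Root CA"],  # cross-signed by R1
}
# The 'normal' chain: just the leaf and its issuing intermediate.
NORMAL_CHAIN: Dict[str, List[str]] = {
    name: BBC_CHAIN[name]
    for name in ("www.bbc.co.uk", "GlobalSign ECC OV SSL CA 2018")
}


def build_path(
    subject: str,
    served: Dict[str, List[str]],
    trusted_roots: Set[str],
    seen: FrozenSet[str] = frozenset(),
) -> Optional[List[str]]:
    """Depth-first search from `subject` to the first locally trusted root."""
    if subject in trusted_roots:
        return [subject]
    if subject in seen:  # guard against loops between cross-signed certs
        return None
    for issuer in served.get(subject, []):
        rest = build_path(issuer, served, trusted_roots, seen | {subject})
        if rest is not None:
            return [subject] + rest
    return None
```

With a trust store containing only "GlobalSign Root CA" (the old R1 root), `build_path("www.bbc.co.uk", NORMAL_CHAIN, ...)` finds nothing, while the longer `BBC_CHAIN` lets the same client walk all the way down to R1. A client that already trusts the R5 root stops there and never needs the extra intermediates.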
Note: when I started this blog I had no idea how deep down the rabbit hole we were going to end up, but here we are. Anyway, onward!
At best, all the BBC have done here is delay the problem until 2028 when the R1 Root expires. At that point, they could shorten the chain and try to anchor on the R3 Root instead, which expires in 2029, and hope that Smart TVs have updated enough to have that Root CA installed by the time we get there...
The BBC could also look at switching to another CA with a Root CA that has a slightly longer expiry, maybe 2030 or 2031, but it's the same problem over and over again. The solution here, the real solution, is that the client needs to be updated. Smart TV manufacturers might release updates for a couple of years, but we're talking a decade or more if you want to resolve this particular problem. I've quite comfortably had a TV for 10 years and I'm sure as hell not contributing a heap of e-waste just to update the Root CAs installed on my television!!
This affects all devices
If you have a device that's connected to the Internet or has the word 'Smart' somewhere in the marketing material then this Root CA expiry problem is probably a consideration, there's no way to avoid it. If the device is not updated then the Root CA store will become stale over time and eventually, the problem will surface. How soon it will be a problem, and how big of a problem it will be, will depend on when the Root CA store was last updated, but just because a device was built in 2018, it doesn't mean the software wasn't already 6+ years out of date either.
With all the problems that the BBC have had they now require these things to be taken into consideration if Smart TV manufacturers want to get the BBC seal of approval for the iPlayer on the box. Microsoft have also taken steps to remedy this problem in Windows and your OS can now get Root CA Store updates when it needs them. Going forwards, it does look like this problem is starting to be solved for the future, but it isn't being solved for the past and the present.
Going back to earlier in this post, I mentioned that the Let's Encrypt root transition was delayed. Their reason was a lack of root propagation and they specifically called out Android devices. I did some digging and found data on what the Android ecosystem looks like in terms of installed OS versions.
This shows there is a significant portion of devices that are either lagging seriously behind on updates or simply aren't being updated either by the vendor or by the user (hint: it's the vendor). If we take a look at similar data for iOS, it's a very different story.
I wouldn't be too concerned about this problem if I was an iOS user (I am) but it looks like Android users might have some concerns in the not too distant future!
No modern CAs for you
Changing the intermediates you serve to chain back to an old Root CA to keep devices alive is one thing, and even switching CA to choose a CA with the longest lived 'legacy' root is another, but this also means you can't use a modern CA if you want to. Let me quote something that Neil said to me:
we literally can't use a CA like LE [Let's Encrypt] with TVs because it's in very few root stores
An enormous media and streaming platform can't use a wicked-awesome, fully automated and free CA because the devices that connect to their service aren't modern enough. Yep, that's right, your cutting edge 50" 4K Smart TV isn't modern enough. Sounds crazy, right? But that's the problem!
If you operate a service, or want to build one, that will have legacy client considerations, you can't just go out and hit up the coolest, newest and free CA, you need to be careful about your choice. You need to know which platform your clients are on, which version of their trust store they're using and when they were last updated, all in the hope you can figure out which root certificates are in there and which CA to use to issue your certificates. It sounds easy, but it can be a real pain to figure out. Some vendors like Apple provide data on the contents of their current Root Store, like iOS 13 here, and you can go back some time but it's not straightforward or easy. Cloudflare have cfssl_trust which can get you back to ~2017 and covers various platforms but that could easily not be far enough back for your needs either. In truth, if this is a concern for you then you have a little bit of work to do to figure this out, there's no easy way. Given the prominence of legacy 'stuff' on the Internet, I think we'd better get to fixing this sooner rather than later.
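Once you have gathered root store contents for the platforms you care about, the comparison itself is just set arithmetic. Here is a small Python sketch; the function names are mine, and any platform names and store contents you feed it are your own data, the ones in the usage note below are invented placeholders and not real root store listings.

```python
from typing import Dict, List, Set


def roots_trusted_everywhere(stores: Dict[str, Set[str]]) -> Set[str]:
    """Return the roots that appear in every platform's trust store."""
    if not stores:
        return set()
    return set.intersection(*(set(roots) for roots in stores.values()))


def platforms_missing(stores: Dict[str, Set[str]], root: str) -> List[str]:
    """Return the platforms (sorted by name) that do not trust `root`."""
    return sorted(name for name, roots in stores.items() if root not in roots)
```

For example, with a hypothetical `{"old-tv": {"GlobalSign Root CA"}, "desktop": {"GlobalSign Root CA", "ISRG Root X1"}}`, the only root trusted everywhere is "GlobalSign Root CA", and `platforms_missing(stores, "ISRG Root X1")` returns `["old-tv"]`, which is exactly the kind of answer you need before picking a CA.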
Detecting the issue
Knowing if you have an issue like this could be quite useful indeed and there was a surprising story about how to detect the problem. Of course you can have an advanced and intimate knowledge of all of your clients and their Root Stores, which is difficult, or you can use NEL. I wrote an introductory blog post about Network Error Logging and followed that up with a Network Error Logging: Deep Dive too. The TL;DR for this blog though is that you can have a client send you feedback when they have connectivity issues to your site, including issues caused by certificates. Here is a very small subset of the errors a client can report.
tls.version_or_cipher_mismatch: The TLS connection was aborted due to version or cipher mismatch
tls.bad_client_auth_cert: The TLS connection was aborted due to invalid client certificate
tls.cert.name_invalid: The TLS connection was aborted due to invalid name
tls.cert.date_invalid: The TLS connection was aborted due to invalid certificate date
tls.cert.authority_invalid: The TLS connection was aborted due to invalid issuing authority
tls.cert.invalid: The TLS connection was aborted due to invalid certificate
tls.cert.revoked: The TLS connection was aborted due to revoked server certificate
tls.cert.pinned_key_not_in_cert_chain: The TLS connection was aborted due to a key pinning error
tls.protocol.error: The TLS connection was aborted due to a TLS protocol error
tls.failed: The TLS connection failed due to reasons not covered by previous errors
The error of interest there of course is the tls.cert.authority_invalid error and when the client starts to report those, you can quickly take steps to investigate. It may seem surprising that we're talking about NEL, which is a very modern feature, and talking about legacy clients that don't have updates installed at the same time. The scenario here though turned out to be old Android devices running a modern version of the Chrome browser, which got us to the scenario of an outdated OS and Root Store but a modern client that supported NEL reports! If you don't currently use NEL you should check out my blogs linked above and our support for NEL Reports over at Report URI.
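If you collect the reports yourself, picking out the interesting ones is straightforward. Below is a minimal Python sketch that counts tls.cert.authority_invalid reports per hostname. It assumes the reports arrive as a JSON array of objects with a top-level "type" of "network-error", a "url", and the error name in "body.type", which is the shape described in the NEL and Reporting API specifications; the function name is my own.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def authority_invalid_by_host(payload: str) -> Counter:
    """Count tls.cert.authority_invalid NEL reports per reported hostname."""
    counts: Counter = Counter()
    for report in json.loads(payload):
        # Other report types can share the same reporting endpoint.
        if report.get("type") != "network-error":
            continue
        if report.get("body", {}).get("type") == "tls.cert.authority_invalid":
            host = urlsplit(report.get("url", "")).hostname or "unknown"
            counts[host] += 1
    return counts
```

A sudden jump in these counts right after a certificate renewal is a strong hint that some clients can't build a chain to a root they trust.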
The Solution
Updates. One way or another there needs to be an update somewhere. If you're building devices or software that depend on the Internet PKI for secure comms then you're going to have to consider the impact that not updating a Root Store will have on your product or service. If you run a service with legacy clients you need to consider how your choice of CA can affect them.
The cynic in me tells me that TV manufacturers might not care that streaming services stop working because the solution is to buy a new TV, but planned obsolescence isn't a new idea and there's probably no hidden agenda here according to Hanlon's Razor (or Occam's Razor if you prefer a more gentle message).
If you're bundling a library or building on an OS, you need to consider how you're going to update the Root Store in the years to come. You don't need to release a software update with new features, simply replacing the Root Store with the latest version might give a device years more useful life or prevent your service being negatively impacted when the next Root CA expiry comes around. The recent AddTrust Root CA expiry showed us that some big organisations did not see this coming and weren't prepared, but this is the first such incident of its kind, certainly not the last.
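As a rough planning aid, you can compare the expiry dates of the roots you ship against how long you expect the device to stay in service. This Python sketch uses the expiry dates quoted earlier in this post; the `ROOT_EXPIRY` table and the function name are mine, and a real audit would read the dates from the bundled store rather than hard-coding them.

```python
from datetime import date
from typing import Dict, List

# Expiry dates as quoted earlier in this post.
ROOT_EXPIRY: Dict[str, date] = {
    "DST Root CA X3": date(2021, 9, 30),
    "GlobalSign Root CA (R1)": date(2028, 1, 28),
    "GlobalSign Root CA - R3": date(2029, 3, 18),
    "GlobalSign ECC Root CA - R5": date(2038, 1, 19),
}


def roots_expiring_before(store: Dict[str, date], end_of_service: date) -> List[str]:
    """List roots that expire before the device's expected end of service,
    soonest expiry first."""
    expiring = [name for name, not_after in store.items() if not_after < end_of_service]
    return sorted(expiring, key=lambda name: store[name])
```

For a device expected to still be in use on 1st Jan 2030, three of these four roots expire first, so a store that never gets updated would be left anchoring on R5 alone.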