Revocation is broken
We have a little problem on the web right now and I can only see this becoming a larger concern as time goes by. More and more sites are obtaining certificates, vitally important documents that we need to deploy HTTPS, but we have no way of protecting ourselves when things go wrong.
We're currently seeing a bit of a gold rush for certificates on the web as more and more sites deploy HTTPS. Beyond the obvious security and privacy benefits of HTTPS, there are quite a few reasons you might want to consider moving to a secure connection that I outline in my article Still think you don't need HTTPS?. Commonly referred to as 'SSL certificates' or 'HTTPS certificates', the wider internet is obtaining them at a rate we've never seen before in the history of the web. Every day I crawl the top 1 million sites on the web and analyse various aspects of their security and every 6 months I publish a report. You can see the reports here but the main result to focus on right now is the adoption of HTTPS.
Not only are we continuing to deploy HTTPS, the rate at which we're doing so is increasing too. This is what real progress looks like. The process of obtaining a certificate has become more and more simple over time and now, thanks to the amazing Let's Encrypt, it's also free to get them. Put simply, we send a Certificate Signing Request (CSR) to the Certificate Authority (CA) and the CA will challenge us to prove our ownership of the domain. This is usually done by setting a DNS TXT record or hosting a challenge code somewhere on a random path on our domain. Once this challenge has been satisfied the CA will issue the certificate and we can then use it to present to the browser and get our green padlock and HTTPS in the address bar.
I have a few tutorials to help you out with this process including how to get started, smart renewal and dual certificates. But this is all great, right? What's the problem here? The problem is when things don't go according to plan and you have a bad day.
We've been hacked
Nobody ever wants to hear those words but the sad reality is that we do, more often than any of us would like. Hackers can go after any number of things when they gain access to our servers and often one of the things they can access is our private key. The certificates we use for HTTPS are public documents, we send them to anyone that connects to our site, but the thing that stops other people using our certificate is that they don't have our private key. When a browser establishes a secure connection to a site, it checks that the server has the private key for the certificate it's trying to use, this is why no one but us can use our certificate. If an attacker gets our private key though, that changes things.
Now that an attacker has managed to obtain our private key, they can use our certificate to prove that they are us. Let's say that again. There is now somebody on the internet, that can prove they are you, when they are not you. This is a real problem and before you think 'this will never happen to me', do you remember Heartbleed? This tiny bug in the OpenSSL library allowed an attacker to steal your private key and you didn't even have to do anything wrong for it to happen. On top of this there are countless ways that private keys are exposed by accident or negligence. The simple truth is that we can lose our private key and when this happens, we need a way to stop an attacker from using our certificate. We need to revoke the certificate.
In a compromise scenario we revoke our certificate so that an attacker can't abuse it. Once a certificate is marked as revoked the browser will know not to trust it, even though it's valid. The owner has requested revocation and no client should accept it.
Once we know we've had a compromise we contact the CA and ask that they revoke our certificate. We need to prove ownership of the certificate in question and once we do that, the CA will mark the certificate as revoked. Now the certificate is revoked, we need a way of communicating this to any client that might require the information. Right now the browser doesn't know and of course, that's a problem. There are two mechanisms that we can use to make this information available and they are a Certificate Revocation List (CRL) or the Online Certificate Status Protocol (OCSP).
Certificate Revocation Lists
A CRL is a really simple concept and is quite literally just a list of all certificates that a CA has marked as revoked. A client can contact the CRL Server and download a copy of the list. Armed with a copy of the list the browser can check to see if the certificate it has been provided is on that list. If the certificate is on the list, the browser now knows the certificate is bad and it shouldn't be trusted, it will throw an error and abandon the connection. If the certificate isn't on the list then everything is fine and the browser can continue the connection.
The problem with a CRL is that they contain a lot of revoked certificates from the particular CA. Without getting into too much detail they are broken down per intermediate certificate a CA has and the CA can fragment the lists into smaller chunks, but, the point I want to make remains the same. The CRL is typically not an insignificant size. The other problem is that if the client doesn't have a fresh copy of the CRL, it has to fetch one during the initial connection to the site which can make things look much slower than they actually are. This doesn't sound particularly great so how about we take a look at OCSP?
Online Certificate Status Protocol
The OCSP provides a much nicer solution to the problem and has a significant advantage over the CRL approach. With OCSP we ask the CA for the status of a single, particular certificate. This means all the CA has to do is respond with a good/revoked answer which is considerably smaller than a CRL. Great stuff!
It is true that OCSP offered a significant performance advantage over fetching a CRL, but, that performance advantage did come with a cost (don't you hate it when that happens?). The cost was a pretty significant one too, it was your privacy... When we think about what an OCSP request is, the request for the status of a very particular, single certificate, you may start to realise that you're leaking some information. When you send an OCSP request, you're basically asking the CA this:
Is the certificate for pornhub.com valid?
So, not exactly an ideal scenario. You're now advertising your browsing history to some third party that you didn't even know about, all in the name of HTTPS which set out to give us more security and privacy. Kind of ironic, huh? But wait, there's something else.
I talked about the CRL and OCSP responses above, the two mechanisms a browser can use to check if a certificate is revoked, and they look like this.
Upon receiving the certificate, the browser will reach out to one of these services and perform the necessary query to ultimately ascertain the status of the certificate. What if your CA is having a bad day and the infrastructure is offline? What if it looks like this?
The browser has only two choices here. It can refuse to accept the certificate because it can't check the revocation status or it can take a risk and accept the certificate without knowing the revocation status. Both of these options come with their advantages and disadvantages. If the browser refuses to accept the certificate then every time your CA has a bad day and their infrastructure goes offline, your sites goes offline too. If the browser continues and accepts the certificate then it risks using a certificate that could have been stolen and exposes the user to the associated risks. It's a tough call, but right now, today, neither of these actually happen...
What actually happens today is that a browser will do what we call a soft fail revocation check. That is, the browser will try to do a revocation check but if the response doesn't come back, or doesn't come back in a short period of time, the browser will simply forget about it. Even is worse is that Chrome doesn't even do revocation checks, at all. Yes, you did read that right, Chrome doesn't even try to check the revocation status of certificates that it encounters. What you might find even more odd is that I completely agree with their approach and I'm quite happy to report that right now Firefox looks like they will be joining the party very soon too. Let me explain. The problem we had with hard fail was obvious, the CA has a bad day and so do we, that's how we arrived at soft fail. The browser will now try to do a revocation check but will ultimately abandon the check if it takes too long or it appears the CA is offline. Wait, what was that last part? The revocation check is abandoned if "it appears the CA is offline". I wonder if an attacker could simulate that?
If you have an attacker performing a MiTM attack all they need to do is simply block the revocation request and make it look like the CA is offline. The browser will then soft fail the check and continue on to happily use the revoked certificate. Every single time you browse and encounter this certificate whilst not under attack you will pay the cost of performing the revocation check to find out the certificate is not revoked. The one time you're under attack, the one time you really need revocation to work, the attacker will simply block the connection and the browser will soft fail through the failure. Adam Langley at Google came up with the best description for what revocation is, it's a seatbelt that snaps in a car crash, and he's right. You get in your car every day and you put your seatbelt on and it makes you feel all warm and fuzzy that you're safe. Then, one day, things don't quite go to plan, you're involved in a crash and out of the windscreen you go. The one time you needed it, it let you down.
Fixing the problem
Right now at this very moment in time the truth is that there is no reliable way to fix this problem, revocation is broken. There are a couple of things worth bringing up though and we may be able to look to a future where we have a reliable revocation checking mechanism.
If a site is compromised and an attacker gets hold of the private key they can impersonate that site and cause a fair amount of harm. That's not great but it could be worse. What if a CA was compromised and an attacker got access to the private key for an intermediate certificate? That would be a disaster because the attacker could then impersonate pretty much any site they like by signing their own certificates. Rather than doing online checks for revocation of intermediate certificates, Chrome and Firefox both have their own mechanisms that work in the same way.
Chrome calls theirs CRLsets and Firefox call theirs OneCRL and they curate lists of revoked certificates by combining available CRLs and selecting certificates from them to be included. So, we have high value certificates like intermediates covered, but what about you and I?
To explain what OCSP Must-Staple is, we first need a quick background on OCSP Stapling. I'm not going to go into too much info, you can get that in my blog on OCSP Stapling, but here is the TL;DR. OCSP Stapling saves the browser having to perform an OCSP request by providing the OCSP response along with the certificate. It's called OCSP Stapling because the idea is that the server would 'staple' the OCSP Response to the certificate and provide both together.
At first glance this seems a little odd because the server is almost 'self certifying' its own certificate as not being revoked, but it all checks out. The OCSP response is only valid for a short period and is signed by the CA in the same way that the certificate is signed. So, in the same way the browser can verify the certificate definitely came from the CA, it can also verify that the OCSP response came from the CA too. This solves the massive privacy concern with OCSP and also removes a burden on the client from having to perform this external request. Winner! But not so much actually, sorry. OCSP Stapling is great and we should all support it on our sites, but, do we honestly think an attacker is going to enable OCSP Stapling? No, I didn't think so, of course they aren't going to. What we need is a way to force the server to OCSP Staple and this is what OCSP Must-Staple is for. When requesting our certificate from the CA we ask them to set the OCSP Must-Staple flag in the certificate. This flag instructs the browser that the certificate must be served with an OCSP Staple or it has to be rejected. Setting the flag is easy.
Now that we have a certificate with this flag set, we as the host must ensure that we OCSP Staple or the browser will not accept our certificate. In the event of a compromise and an attacker obtaining our key, they must also supply an OCSP Staple when they use our certificate too. If they don't include an OCSP Staple, the browser will reject the certificate, and if they do include an OCSP Staple then the OCSP response will say that the certificate is revoked and the browser will reject. Tada!
Whilst Must-Staple sounds like a great solution to the problem of revocation, it isn't quite there just yet. One of the biggest problems that I see is that as a site operator I don't actually know how reliably I OCSP staple and if the client is happy with the stapled response. Without OCSP Must-Staple enabled this isn't really a problem but if we do enable OCSP Must-Staple and then we don't OCSP Staple properly or reliably, our site will start to break. To try and get some feedback about how we're doing in terms of OCSP Stapling we can enable a feature called OCSP Expect-Staple. I've written about this before and you can get all of the details in the blog OCSP Expect-Staple but I will give the TL;DR here. You request an addition to the HSTS preload list that asks the browser to send you a report if it isn't happy with the OCSP Staple. You can collect the reports your self or use my service, report-uri.io, to do it for you and you can learn exactly how often you would hit problems if you turned on OCSP Must-Staple. Because getting an addition to the HSTS preload list isn't as straightforward as I'd like I also wrote a spec to define a new security header called Expect-Staple to deliver the same functionality but with less effort involved. The idea being that you can now set this header and enable reporting to get the feedback we desperately need before enabling Must-Staple. Enabling the header would be simple, just like all of the other security headers:
Expect-Staple: max-age=31536000; report-uri="https://scotthelme.report-uri.io/r/d/staple"; includeSubDomains; preload
One of the other things that we have to consider whilst we're on the topic of revocation is rogue certificates. If somebody manages to compromise a CA or otherwise obtains a certificate that they aren't supposed to have, how are we supposed to know? If I were to breach a CA right now and obtain a certificate for your site without telling you, you wouldn't ever learn about it unless it was widely reported. You could even have an insider threat and someone in your organisation could obtain certificates without going through the proper internal channels and do with them as they please. We need a way to have 100% transparency and we will very soon, Certificate Transparency.
CT is a new requirement that will be mandatory from early next year and will require that all certificates are logged in a public log if the browser is to trust them. You can read the article for more details on CT but what will generally happen is that a CA will log all certificates it issues in a CT log.
These logs are totally public and anyone can search them so the idea is that if a certificate is issued for your site, you will know about it. For example, here you can see all certificates issued for my domain and you can search for your own, you can also use CertSpotter from sslmate to do the same and I use the Facebook Certificate Transparency Monitoring tool which will send you an email each time a new certificate is issued for your domain/s. CT is a fantastic idea and I can't wait for it to become mandatory but there is one thing to note and it's that CT is only the first step. Knowing about these certificates is great but we still have all of the above mentioned problems with revoking them. That said, we can only tackle one problem at once and having the best revocation mechanisms in the world is no good if we don't know about the certificates we need to revoke. CT gives us that much at least.
Certificate Authority Authorisation
Stopping a certificate being issued is much easier than trying to revoke it and this is exactly what Certificate Authority Authorisation allows us to start doing. Again, there are further details in the linked article but the short version is that we can now authorise only specific CAs to issue certificates for us instead of the current situation where we can't indicate any preference at all. It's as simple as creating a DNS record:
scotthelme.co.uk. IN CAA 0 issue "letsencrypt.org"
Whilst CAA isn't a particularly strong mechanism and it won't help in all mis-issuance scenarios, there are some where it can help us and we should assert our preference by creating a CAA record.
As it currently stands there is a real problem, we can't revoke certificates if someone obtains our private key. Just imagine how that will play out the next time Heartbleed comes along! One thing that you can do to try and limit the impact of a compromise is to reduce the validity period of certificates you obtain. Instead of three years go for one year or even less. Let's Encrypt only issue certificates that are valid for ninety days! With a reduced lifetime on your certificate you have less of a problem if you're compromised because an attacker has less time to abuse the certificate before it expires. Beyond this, there's very little we can do.
To demonstrate the issue and just how real this is, try and visit the new subdomain I setup on my site, revoked.scotthelme.co.uk. As you can probably guess this subdomain is using a revoked certificate and there's a good chance that it will load just fine. If it doesn't, if you do get a warning about the certificate having expired, then your browser is still doing an OCSP check and you just told the CA you were going to my site. To prove that this soft fail check is pointless you can add a hosts entry for
ocsp.int-x3.letsencrypt.org to resolve to
127.0.0.1 or block it via some other means and then try the same thing again. This time the page will load just fine because the revocation check will fail and the browser will continue loading the page. Some use that is...
The final point I want to close on is a question, and it's this. Should we fix revocation? That's a story for another day though!