Last week we announced support for a new type of report on Report URI: Network Error Logging, or NEL, reports. These reports are going to unlock a seriously huge amount of really helpful data so it's well worth me doing a deep dive on exactly how you can use NEL and what it can tell you.
Network Error Logging
You can read the NEL spec online but I'm going to cover basically all of the functionality here so you can save yourself the trouble of reading a spec document! Much like other features built into the browser such as CSP that allow the browser to send reports, the browser can now send NEL reports too. The difference is that, unlike CSP, NEL doesn't require you to build a policy; it's simply an on/off switch that you need to flick. This should make adopting and deploying NEL considerably easier than CSP, which does require a little configuration. That's really awesome, so let's start off by turning it on and then looking at what it does.
Report-To: {"group":"default","max_age":31536000,"endpoints":[{"url":"https://{subdomain}.report-uri.com/a/d/g"}],"include_subdomains":true}
NEL: {"report_to":"default","max_age":31536000,"include_subdomains":true}
What you need to do is set 2 different HTTP response headers, Report-To and NEL. The Report-To header defines where reports should be sent by the browser and is part of the new Reporting API. Going forwards this will be used as a common feature for enabling the browser to send various pieces of information. Let's break apart the JSON that was sent in the Report-To header.
{
    "group" : "default",
    "max_age" : 31536000,
    "endpoints" : [
        {"url" : "https://scotthelme.report-uri.com/a/d/g"}
    ],
    "include_subdomains" : true
}
Here we've defined a reporting group called default, told the browser to remember these settings for a year with the max_age directive, provided the endpoints URL where reports should be sent and with include_subdomains we can ask for reports about all of our subdomains to be sent here too. Looking at the content of the NEL header we can see it's really simple.
{
    "report_to" : "default",
    "max_age" : 31536000,
    "include_subdomains" : true
}
Delivering the header turns the feature on. The report_to directive names the group in the Report-To header where reports should be sent, and max_age tells the browser how long it should keep sending NEL reports after receiving this header. The include_subdomains flag does what it says on the tin.
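If you'd rather generate the two header values than hand-write the JSON, here's a minimal sketch in Python. The function name build_nel_headers and the "example" subdomain are my own placeholders rather than anything Report URI provides, so swap in your own reporting address.

```python
import json


def build_nel_headers(endpoint_url, group="default", max_age=31536000,
                      include_subdomains=True):
    """Return the Report-To and NEL header values as compact JSON strings."""
    report_to = {
        "group": group,
        "max_age": max_age,
        "endpoints": [{"url": endpoint_url}],
        "include_subdomains": include_subdomains,
    }
    nel = {
        # Must match the group name declared in Report-To.
        "report_to": group,
        "max_age": max_age,
        "include_subdomains": include_subdomains,
    }
    compact = {"separators": (",", ":")}
    return {
        "Report-To": json.dumps(report_to, **compact),
        "NEL": json.dumps(nel, **compact),
    }


# Hypothetical endpoint; replace "example" with your own subdomain.
headers = build_nel_headers("https://example.report-uri.com/a/d/g")
for name, value in headers.items():
    print(f"{name}: {value}")
```

Building both values from the same group variable means the report_to name in NEL can't drift away from the group name in Report-To.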
The Reports
This is the really exciting part of NEL, what reports will we receive?! Well, there are a lot of things covered by NEL and they're all pretty much things you should really want to know about. Have you ever seen a full page warning in Chrome like this?
I have, you have, I'm pretty sure everyone has and they can happen for a whole heap of reasons. The DNS resolution failed, the certificate has expired, redirect loops, TLS protocol or cipher issues, HTTP failures and countless other problems. Well, you can now get a report when a whole variety of them happen! Just take a look at this list of error codes to get started.
dns.unreachable: DNS server is unreachable
dns.name_not_resolved: DNS server responded but is unable to resolve the address
dns.failed: Request to the DNS server failed due to reasons not covered by previous errors
dns.address_changed: Indicates that the resolved IP address for a request's origin has changed since the corresponding NEL policy was received
tcp.timed_out: TCP connection to the server timed out
tcp.closed: The TCP connection was closed by the server
tcp.reset: The TCP connection was reset
tcp.refused: The TCP connection was refused by the server
tcp.aborted: The TCP connection was aborted
tcp.address_invalid: The IP address is invalid
tcp.address_unreachable: The IP address is unreachable
tcp.failed: The TCP connection failed due to reasons not covered by previous errors
tls.version_or_cipher_mismatch: The TLS connection was aborted due to version or cipher mismatch
tls.bad_client_auth_cert: The TLS connection was aborted due to invalid client certificate
tls.cert.name_invalid: The TLS connection was aborted due to invalid name
tls.cert.date_invalid: The TLS connection was aborted due to invalid certificate date
tls.cert.authority_invalid: The TLS connection was aborted due to invalid issuing authority
tls.cert.invalid: The TLS connection was aborted due to invalid certificate
tls.cert.revoked: The TLS connection was aborted due to revoked server certificate
tls.cert.pinned_key_not_in_cert_chain: The TLS connection was aborted due to a key pinning error
tls.protocol.error: The TLS connection was aborted due to a TLS protocol error
tls.failed: The TLS connection failed due to reasons not covered by previous errors
http.error: The user agent successfully received a response, but it had a 4xx or 5xx status code
http.protocol.error: The connection was aborted due to an HTTP protocol error
http.response.invalid: Response is empty, has a content-length mismatch, has improper encoding, and/or other conditions that prevent user agent from processing the response
http.response.redirect_loop: The request was aborted due to a detected redirect loop
http.failed: The connection failed due to errors in HTTP protocol not covered by previous errors
abandoned: User aborted the resource fetch before it is complete
unknown: Error type is unknown
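The error types above are dotted names, and the first segment (dns, tcp, tls, http) tells you roughly which layer failed. As a rough sketch, assuming you already have a list of type strings pulled out of your reports, you could tally them by that first segment in Python. The helper names here are hypothetical.

```python
from collections import Counter


def error_category(error_type):
    """Return the first dotted segment, e.g. 'tls' for 'tls.cert.date_invalid'."""
    return error_type.split(".", 1)[0]


def count_by_category(error_types):
    """Tally a list of NEL error type strings by their top-level category."""
    return Counter(error_category(t) for t in error_types)


# Illustrative input only, not real report data.
seen = [
    "dns.name_not_resolved",
    "tls.cert.date_invalid",
    "tls.version_or_cipher_mismatch",
    "http.response.redirect_loop",
    "abandoned",
]
print(count_by_category(seen))
```

Types without a dot, such as abandoned and unknown, simply end up as their own category.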
Some of these are things that are going to be really helpful for site operators to monitor. Take tls.cert.date_invalid for example: I've tweeted countless times recently about sites that are serving an expired cert and that's often the first they hear of it. Imagine if visitors to your site were sending reports about that, in real-time, the second they visited your page? Yes of course we can say certs shouldn't expire in production, but they clearly do. The question is how quickly do you want to know about it? The dns.name_not_resolved error could alert you to DNS resolution problems for your visitors, a variety of the tcp errors would be great for knowing about configuration or availability issues on your site and the same goes for tls.version_or_cipher_mismatch, which could be a great tip-off about configuration issues on your site. Then of course we step into the application layer with http.response.redirect_loop and generic 4xx or 5xx tracking which can quickly alert you to issues. With NEL configured these are some examples of the JSON payloads the browser would send.
{
    "age": 0,
    "type": "network-error",
    "url": "https://new-subdomain.scotthelme.co.uk/",
    "body": {
        "sampling_fraction": 1.0,
        "server_ip": "",
        "protocol": "http/1.1",
        "method": "GET",
        "status_code": 0,
        "elapsed_time": 48,
        "type": "dns.name_not_resolved",
        "phase": "dns"
    }
}
{
    "age": 0,
    "type": "network-error",
    "url": "https://scotthelme.co.uk/some-redirect-thing/",
    "body": {
        "sampling_fraction": 0.5,
        "server_ip": "123.122.121.120",
        "protocol": "h2",
        "method": "GET",
        "status_code": 301,
        "elapsed_time": 823,
        "type": "http.response.redirect_loop",
        "phase": "application"
    }
}
{
    "age": 0,
    "type": "network-error",
    "url": "https://scotthelme.co.uk/",
    "body": {
        "sampling_fraction": 1.0,
        "referrer": "",
        "server_ip": "",
        "protocol": "",
        "method": "GET",
        "status_code": 0,
        "elapsed_time": 92,
        "type": "tls.cert.date_invalid"
    }
}
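If you want to poke at payloads like these yourself, here's a small Python sketch that pulls out the fields you'd most likely care about. It assumes the request body may be either a single report object or a JSON array of them, and the function names are my own, so treat it as a starting point rather than how Report URI processes reports.

```python
import json


def summarise_report(report):
    """Flatten one report into the handful of fields useful for triage."""
    body = report.get("body", {})
    return {
        "url": report.get("url", ""),
        "type": body.get("type", "unknown"),
        # Not every payload carries a phase, so default to an empty string.
        "phase": body.get("phase", ""),
        "status_code": body.get("status_code", 0),
        "elapsed_time": body.get("elapsed_time", 0),
    }


def parse_payload(raw):
    """Parse a raw JSON payload and summarise every network-error report."""
    data = json.loads(raw)
    reports = data if isinstance(data, list) else [data]
    return [summarise_report(r) for r in reports
            if r.get("type") == "network-error"]


raw = json.dumps([{
    "age": 0,
    "type": "network-error",
    "url": "https://scotthelme.co.uk/",
    "body": {
        "sampling_fraction": 1.0,
        "method": "GET",
        "status_code": 0,
        "elapsed_time": 92,
        "type": "tls.cert.date_invalid",
    },
}])
for summary in parse_payload(raw):
    print(summary)
```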
Looking at your reports
With support for these reports now in Report URI, you can search through them just like you would have done for other report types if you're an existing user. If not, simply create a free account at https://report-uri.com and deliver both of the headers to enable it. Remember to update the report address with your own customised address:
Report-To: {"group":"default","max_age":31536000,"endpoints":[{"url":"https://{your subdomain here}.report-uri.com/a/d/g"}],"include_subdomains":true}
NEL: {"report_to":"default","max_age":31536000,"include_subdomains":true}
Once that's done, the reports will start showing up in your account and you can browse through them.
As well as all of the normal ways you can search for reports, based on Date, URL/Path, Browser and now Platform, you can search on the Type and Phase of the NEL report itself.
This will be really useful if you want to see all of a specific type of report like cert expiry or DNS problems, heck you can even track down all HTTP 500 errors that users saw.
Reporting successful requests
Yes, you did read that right. With NEL you can report on successful requests to your site! Now, this seems like it would generate insane amounts of reports, in theory 1 report per page load, but there is a way to easily control this. The success_fraction can be set in the NEL header and it can be set to a value between 0.0 and 1.0.
NEL: {"report_to": "default", "max_age": 31536000, "include_subdomains": true, "success_fraction": 0.5}
You can read the section in the spec about this but basically if this value is present then it controls what fraction of successful network requests to your origin should have reports sent about them. If the value is not present then the default is 0.0 and no reports are sent about successful requests. The example above would result in 50% of successful network requests to your origin sending a NEL report and they could look something like this.
{
    "age": 0,
    "type": "network-error",
    "url": "https://scotthelme.co.uk/",
    "body": {
        "sampling_fraction": 0.5,
        "referrer": "https://scotthelme.co.uk/",
        "server_ip": "123.123.123.123",
        "protocol": "h2",
        "method": "GET",
        "status_code": 200,
        "elapsed_time": 823,
        "type": "ok"
    }
}
One thing that we're looking at making great use of here is the elapsed_time value which is defined as "The elapsed number of milliseconds between the start of the resource fetch and when it was completed or aborted by the user agent". This will be a great metric for network performance measuring and I'm planning to do a little A/B testing with this and using it for things like CDN testing in various locations. Imagine being able to test the average network latency for various regions or types of browser by varying where the NEL header is currently enabled!
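To give a feel for the kind of comparison I have in mind, here's a rough Python sketch that takes success reports (the ones with a type of ok, as in the example above) and works out the median elapsed_time per group. Grouping by protocol is just an assumption for illustration; any field in the report body would work, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_elapsed_by(reports, key="protocol"):
    """Median elapsed_time in milliseconds of successful requests per group."""
    groups = defaultdict(list)
    for report in reports:
        body = report.get("body", {})
        # Only successful requests with a recorded timing are counted.
        if body.get("type") == "ok" and "elapsed_time" in body:
            groups[body.get(key, "")].append(body["elapsed_time"])
    return {name: median(times) for name, times in groups.items()}


# Illustrative reports, not real measurements.
reports = [
    {"body": {"type": "ok", "protocol": "h2", "elapsed_time": 823}},
    {"body": {"type": "ok", "protocol": "h2", "elapsed_time": 400}},
    {"body": {"type": "ok", "protocol": "http/1.1", "elapsed_time": 900}},
    {"body": {"type": "tcp.timed_out", "protocol": "", "elapsed_time": 30000}},
]
print(median_elapsed_by(reports))
```

I've used the median rather than the mean here so that a handful of very slow requests doesn't skew the picture.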
Downsampling report volume
Another great thing about the new NEL header is that the reporting mechanism has the ability to downsample reports built right into it! This has previously been something that was quite difficult to achieve manually but with native support this will really help sites deploy NEL whilst generating controllable volumes of reports.
NEL: {"report_to": "default", "max_age": 31536000, "include_subdomains": true, "failure_fraction": 0.5}
By defining the failure_fraction in the NEL header you can specify what fraction of failure reports should be sent, between 0.0 and 1.0 inclusive. If the failure_fraction is not defined then it will default to 1.0 and all failure reports are sent to the reporting endpoint, whereas the example just above will send 50% of them. This should help you cut down on a considerable volume of reports but ensure that you don't miss out on important and less frequent events.
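Because each report carries the sampling_fraction that was in effect, you can still estimate the real number of events from a downsampled stream. Here's a minimal sketch in Python, assuming a simple approach where each report is weighted by 1 / sampling_fraction; the function name is my own and the result is an estimate, not an exact count.

```python
from collections import Counter


def estimate_totals(reports):
    """Estimate the true event count per type from sampled reports."""
    totals = Counter()
    for report in reports:
        body = report.get("body", {})
        fraction = body.get("sampling_fraction", 1.0)
        if fraction > 0:
            # A report sampled at 0.5 stands in for roughly 2 real events.
            totals[body.get("type", "unknown")] += 1.0 / fraction
    return dict(totals)


# Illustrative reports, not real data.
reports = [
    {"body": {"type": "tls.cert.date_invalid", "sampling_fraction": 0.5}},
    {"body": {"type": "tls.cert.date_invalid", "sampling_fraction": 0.5}},
    {"body": {"type": "dns.name_not_resolved", "sampling_fraction": 1.0}},
]
print(estimate_totals(reports))
```

The lower the fraction, the noisier the estimate, so for rare events it's worth keeping failure_fraction high.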
Observing and debugging browser behaviour
The NEL feature, and the Reporting API that it uses to dispatch reports, are both brand new in Chrome and you may be curious about how they work or want to debug your current setup. To do this, Chrome has implemented an interface for you to see sites registered for NEL, information about the policies received and any reports that you have cached locally in the queue waiting to be dispatched.
You can access this page at chrome://net-internals/#reporting and see all of the information there.
Try it out
All that's left to do is enable NEL on your site and let me know about your experience. I'm genuinely really interested to see if I start receiving information about problems that I didn't know existed or weird behaviour that previously went undetected. The worst-case scenario is that you have no problems at all and you receive no reports!