When I started building Report URI almost 6 years ago, it was a small project operated by just me and handled very little data. With 6 years behind us now, things are very different and I wanted to update you on some important things we're doing behind the scenes!
To show just how much things have changed in ~6 years, all I need to do is embed the following tweet.
It's absolutely wild that I can now say we've processed over HALF A TRILLION REPORTS for our customers!
The current total as of this tweet stands at:
500,205,618,910 reports!!! 😲 https://t.co/jWQNQYX2dP
Scott Helme (@Scott_Helme), March 14, 2021
From our humble beginnings of processing only a few tens of thousands of reports per month, Report URI can now comfortably hit that in a single second, with peak volumes of reports hitting ~80,000/s!! That's such a massive number when you consider that inbound reports aren't simple GET requests served from cache, but POST requests with a payload that requires processing, normalising and, most likely, storing in our database. Having now processed over half a trillion of these reports for our customers, I felt it was a good time to talk about what we do with all of that data and how we do it.
In many ways, all of the new things I'm going to talk about in this blog post aren't actually new and aren't even 'changes'. From the very inception of Report URI I've been at the helm and I've had my extensive experience and awareness around application security and online privacy at the forefront of my mind. The service was built to protect data both from those with hostile intent that might steal it and also from us, to make sure that data is only ever used in the right way. With other people joining along the way, like Troy Hunt and Michal Špaček, we gained more people with significant experience in not only software, but security and privacy too. This heavy presence of security experience has helped ensure we stay on the right path and today we're becoming more transparent about that.
Just recently, we published a full and unredacted copy of our latest Penetration Test Report, something that I have to say I don't think I've seen any other company do yet. This proves that we do engage with external testing companies, alongside depending on our own extensive security knowledge, and that indeed we too are human, just like everyone else, and make mistakes. We had issues identified in the test that were quickly resolved and resulted in no impact for our customers or their data due to other mitigating controls we have in place, namely CSP (imagine that!). The main motivator for us to do this was transparency. We have nothing to hide. If you operate any company long enough you're going to have an issue along the way and as I've said publicly, many times before, I think it's more important that companies acknowledge issues and resolve them quickly and transparently than it is to pretend we can be perfect and never have an issue in the first place.
Whilst our focus on securing our application and data is a key point in making sure nothing bad happens to it, we also have to make sure we're doing everything we can to minimise the data we collect and that we're also processing it in a transparent way, and that's where this blog post comes in. Stick with me, because whilst some of this stuff might not seem exciting, it is important.
New legal docs
According to the General Data Protection Regulation (GDPR), organisations must provide people with information about how personal data is processed that is:
- In a concise, transparent, intelligible, and easily accessible form
- Written in clear and plain language, particularly for any information addressed specifically to a child
- Delivered in a timely manner
- Provided free of charge
Data Protection Information
We put this new document together to help prospective customers better understand the common areas of concern around reporting. For many of our customers, given how young reporting technology is, this is often the first time they've considered deploying reporting. As a result, we get a great many questions from customers that all follow a similar pattern. Like any good technology person would do, I found myself conducting the same task a few times, so I automated it! Providing this document allows me to quickly and easily answer these questions and it is also available for anyone to read on our site.
Our new Data Protection Information document is designed to help potential customers get up to speed more quickly on what data is involved in the sending and receiving of reports. This will help them to arrive at the right decision around whether or not Report URI will be a Data Processor and whether they need to put a Data Processing Agreement (DPA) in place with us. You could always do this before, but now it will be an automated process available in the Settings page of your account that requires no human interaction on our part.
The document also outlines many of our efforts in Data Minimisation, Privacy by Design and Data Protection by Default, which are some of the core principles of the GDPR. One of the things that I was really happy to see during this whole process was that none of this required us to change what data we process or how we process it. This is merely us being clearer and more transparent about what we already do. I'm a big believer in the "if you don't have it, you can't lose it" approach, and things like the IP address of the device sending a report have never been logged or stored by our systems because we simply don't need them. If there isn't a good reason that we need a piece of data, we don't store or process it. It's that simple. If there's a chance a piece of data might be useful to our customers, then by default we won't collect it and we provide an option to turn on collection instead. Query string parameters in reports are a great example of this: collection is turned off by default so that we store as little data as possible, but a customer can turn it on if they determine they need it.
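To make the "off by default" idea a bit more concrete, here is a minimal sketch in Python of what stripping query strings from a report might look like. This is purely illustrative and not how Report URI is actually implemented: the function names, the `collect_query_strings` flag and the choice of which fields to treat as URLs are all assumptions made for the example. The field names themselves are borrowed from the standard CSP violation report body.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these are the report fields treated as URLs in this sketch.
# The names come from the CSP violation report format.
URL_FIELDS = ("document-uri", "blocked-uri", "referrer", "source-file")


def strip_query(url: str) -> str:
    """Return the URL with its query string removed."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=""))


def minimise_report(report: dict, collect_query_strings: bool = False) -> dict:
    """Return a copy of the report with only the data we intend to store.

    Query strings are dropped unless the (hypothetical) per-customer
    option collect_query_strings has been switched on. Note that the
    sender's IP address is never passed in here at all, so it cannot
    end up in storage.
    """
    stored = {}
    for key, value in report.items():
        if (
            key in URL_FIELDS
            and isinstance(value, str)
            and not collect_query_strings
        ):
            value = strip_query(value)
        stored[key] = value
    return stored


if __name__ == "__main__":
    example = {
        "document-uri": "https://example.com/page?user=alice",
        "blocked-uri": "inline",
        "violated-directive": "script-src",
    }
    print(minimise_report(example))
    print(minimise_report(example, collect_query_strings=True))
```

The important design choice is that the privacy-preserving behaviour is the default argument, so a caller has to opt in explicitly to store more data rather than opt out to store less.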
Records of Processing - Data Protection Analysis
There are a few other requirements set out in the GDPR that we are required to cater for. A non-exhaustive list of some of the main ones would be:
- Article 30 of the GDPR requires that we keep a record of our data processing activities.
- Article 35 requires that we determine if a Data Protection Impact Assessment (DPIA) is required.
- Article 37 requires that we designate a Data Protection Officer (DPO) if required.
To cater for these requirements, and others, we have created our Data Protection Analysis document which details both the data we process and how we process it.
As a result of the analysis, it is our determination that Report URI does not need a DPIA as the data we are processing is not "likely to result in a high risk to the rights and freedoms of natural persons". This is pretty easy to determine if you look at the total amount of data we are the Controller for, which is, most of the time, just your username and password!
Our activities also do not meet the criteria to require us to appoint a DPO. We do not do "systematic monitoring of data subjects on a large scale", we do not do "processing on a large scale of special categories of data" and we are not "a public body or authority".
The remainder of the document forms our record of processing activities along with additional information as required and details all of the data we process and how we process it. I've been told that it's not typical for an organisation to publish such documents as this but I couldn't see the harm in doing so and have decided to make it available for anyone to see.
To ensure absolute transparency here, you may notice that the document has a version of "1v4R", with the R being for Redacted. The only difference between "1V4" and "1V4R" was the removal of commercially sensitive information such as the number and type of users and revenue information. This minor redaction, made in the last paragraph in Section 5, has no material impact on the purpose of sharing the document which is transparency around how we collect and process data. That information appears in the unredacted version because the volume of personal data processed by a Controller is a factor that’s considered in the risk analysis and impact assessment.
PCI DSS SAQ A
Again, I've not come across any organisations publishing such documents in the past, but this is now available on our site for further inspection.
I think and hope that it's fair to say that we keep the service as privacy-respecting as possible whilst delivering the value that reporting can offer. Of course, we will continue to seek to improve ourselves wherever the opportunity presents itself, but for now I wanted to be much clearer on our current stance and our plans moving forwards. Whilst we will add new features and we are looking at cool things we can do with so much data, none of that will come at a privacy cost to either our customers or their visitors. We have some exciting things planned for the future and I'm excited about doing them all whilst demonstrating the importance of remaining a privacy-respecting service.