How the EU made our website slow

Yep, you did read that right, this is a blog post about how the EU made the Report URI website slow. I've talked about the VAT MOSS scheme in the EU before and here's a little more information about the problems it introduces.


EU VAT MOSS

So what the heck is VAT MOSS? The European Union Value Added Tax Mini One-Stop Shop, of course! If you want to know more about VAT MOSS (you don't) you can read my blog post (you shouldn't) all about it, Overcoming the hurdles of VAT and VAT MOSS in the EU. The TL;DR is that we have to charge different rates of VAT to people in 28 different member states based on evidence of where they are at the time of purchase and whether or not they are registered with their local VAT authority. The huge pain of complying with VAT MOSS is detailed in the linked blog post, but we hit other issues along the way after we were compliant.


VIES

Wikipedia actually has a pretty good explanation of what VIES is:


The VAT Information Exchange System (VIES) is an electronic means of transmitting information relating to VAT registration (i.e., validity of VAT numbers) of companies registered in the European Union. EU law requires that, where goods or services are procured within the EU, VAT must be paid only in the member state where the purchaser resides. For this reason, suppliers need an easy way to validate the VAT numbers presented by purchasers. This validation is performed through VIES.


The important thing in that explanation is the 'validity of VAT numbers'. When someone in the EU purchases something from us, we can't just ask for their VAT number like we can for their name, address, card details, etc. We have to ask for their VAT number and then check that it's valid and that it's theirs. This is where VIES comes in.




You can go to the VIES VAT number validation tool online and validate any EU VAT number there. If the VAT number is valid, the tool will confirm that and show the organisation or person it was issued to. This is the data that we must validate against those registering for Report URI. Ok, that's not so bad you might think, but of course we need a better way to do this as a service: we need an API.


The VIES API

Information on the API that VIES provides can be found in their WSDL here: http://ec.europa.eu/taxation_customs/vies/checkVatService.wsdl

Interestingly the WSDL is available only over HTTP and doesn't work over HTTPS.




That's less than ideal but also only the start of the problems. We have to use this service if we want to have customers in the EU, and we kinda want to have customers in the EU. When you subscribe to an account we take your VAT number and call VIES to validate it. Depending on whether or not the validation succeeds, the price of your subscription can change, as it determines whether or not you pay VAT on the total price. Organisations that are VAT registered don't want to pay VAT, so we have to validate VAT numbers and reflect that in the pricing by "reverse charging" the VAT. The problem was that we seemed to have somewhat reliable issues when calling in to validate VAT numbers.
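To make the pricing consequence concrete, here is a minimal sketch of how a validation result might feed into a quoted price. The function name, its parameters and the example rate are all hypothetical, and this is not Report URI's actual billing code. It also ignores every other rule (for example, sales within the seller's own country); it only illustrates that a validated VAT number means no VAT is added to the price.

```php
<?php

/**
 * Hypothetical helper: quote a price in pence from a net price, the
 * customer's local VAT rate (as a percentage) and whether their VAT
 * number passed validation. A simplification, not real billing code.
 */
function quotePrice(int $netPence, float $vatRatePercent, bool $vatNumberValid): array
{
    // A validated VAT number means the VAT is reverse charged: we add none.
    $vatPence = $vatNumberValid ? 0 : (int) round($netPence * $vatRatePercent / 100);

    return [
        'net'   => $netPence,
        'vat'   => $vatPence,
        'total' => $netPence + $vatPence,
    ];
}

// Example with made-up numbers: a net price of 1000 at a 20% rate.
$consumer = quotePrice(1000, 20.0, false); // VAT added on top
$business = quotePrice(1000, 20.0, true);  // VAT reverse charged
```

The point of the sketch is that the quoted total cannot be known until the validation call has returned, which is why a slow or failing VIES sits right in the middle of the billing flow.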


Monitoring the VIES API

I decided to set up a little monitor to regularly poll the VIES API, see if it was alive and notify us if it was having problems. After all, our billing flow is dependent on this API being online, so if it is down we really need to know. I thought 1 minute was a reasonable interval to poll the API and see if it was responding, but VIES didn't agree. It turns out that if you poll the API every minute you get your IP banned from calling the API...

I created a really hacky PHP script to monitor the API and I was just going to run it from my local server at first, here it is:


<?php

// One known-valid VAT number per member state, keyed by VIES country code.
$countryCodes = array(
        'BE' => '0420429375',
        'BG' => '130460283',
        'CZ' => '8007244542',
        'DK' => '73444217',
        'DE' => '129273398',
        'EE' => '100072174',
        'IE' => '9513488W',
        'EL' => '094468339',
        'ES' => 'A28229813',
        'FR' => '27402835961',
        'HR' => '24640993045',
        'IT' => '01114601006',
        'CY' => '10139104E',
        'LV' => '40003245752',
        'LT' => '100006256115',
        'LU' => '20981643',
        'HU' => '10773381',
        'MT' => '12701906',
        'NL' => '808936955B01',
        'AT' => 'U37207205',
        'PL' => '5220002334',
        'PT' => '509250149',
        'RO' => '160796',
        'SI' => '94995737',
        'SK' => '2021879959',
        'FI' => '07055792',
        'SE' => '556138653201',
        'GB' => '277368458',
);

// Build the SOAP client from the VIES WSDL (only served over HTTP).
$client = new SoapClient("http://ec.europa.eu/taxation_customs/vies/checkVatService.wsdl");

// Each run appends one log line: a timestamp, then "CC:result" per country.
$output = date('d M Y H:i:s') . " ";

foreach($countryCodes as $countryCode => $vatNumber)
{
        echo $countryCode . "\r\n";
        $error = '';
        $start = microtime(true);
        try {
                // Dump the response so it can be inspected when run by hand.
                var_dump($client->checkVat(array('countryCode' => $countryCode, 'vatNumber' => $vatNumber)));
        } catch(Exception $e) {
                var_dump($e);
                $error = $e->getMessage();
        }
        $end = microtime(true);
        $time = $end - $start;
        // Log the elapsed seconds on success, or the error message on failure.
        $output .= $countryCode . ":" . ($error == '' ? $time : $error) . " - ";
}

// Drop the trailing "- " from the last separator and append the line to the log.
file_put_contents('/home/scott/VIES/vies.log', substr($output, 0, -2) . "\r\n", FILE_APPEND);

All it was doing was calling into VIES and checking a single, valid VAT number for each member state that VIES provides data for. You don't actually query VIES as such, they appear to be a proxy to each of the VAT authorities in each member state so VIES could be up whilst half of the member state systems were down. The only real way to know is to make a query through to each member state. I fired this up on cron to run every minute...


13 Feb 2018 03:44:01 BE:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - BG:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - CZ:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - DK:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - DE:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - EE:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - IE:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - EL:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - ES:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - FR:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - HR:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - IT:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - CY:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - LV:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - LT:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - LU:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - HU:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - MT:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - NL:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - AT:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - PL:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - PT:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - RO:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - SI:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - SK:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - FI:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - SE:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - GB:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED 

It didn't take long for them to ban hammer me and it seems that stopping the requests didn't get the ban removed by the following day. After some messing around I just power cycled my router to get a new IP and start again with a less aggressive frequency of every 10 minutes.


14 Feb 2018 09:10:01 BE:0.36851215362549 - BG:0.35953402519226 - CZ:0.36895704269409 - DK:1.6945331096649 - DE:0.25840401649475 - EE:0.29912781715393 - IE:0.17529106140137 - EL:0.32289981842041 - ES:0.21482586860657 - FR:0.89746594429016 - HR:0.29334688186646 - IT:0.97457313537598 - CY:0.4465069770813 - LV:0.29882597923279 - LT:0.29937291145325 - LU:0.11224389076233 - HU:0.23431587219238 - MT:0.34735703468323 - NL:0.19270086288452 - AT:0.18781495094299 - PL:0.24466705322266 - PT:3.0486500263214 - RO:0.20380306243896 - SI:0.17909097671509 - SK:0.45247483253479 - FI:0.29300093650818 - SE:3.619619846344 - GB:0.24547100067139 

I could now time how long it took for each query to run and see if there were particular problems. In the section of logs I'm using for this blog post the first entry is 14 Feb 2018 09:10:01 and the last entry is 13 Jul 2018 14:10:02. There are 3,580 tests in that time against each of the 28 member states, so let's see how many problems there were.
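For anyone wanting to do the same kind of analysis, here is a rough sketch of how log lines in the format above could be parsed and the errors tallied. The function names are my own invention for illustration, and the parsing assumes exactly the format the monitoring script writes: a 20-character timestamp, a space, then "CC:value" entries separated by " - ", with no error message ever containing " - " itself.

```php
<?php

/**
 * Parse one line of vies.log into [countryCode => seconds (float) or error (string)].
 * Hypothetical helper; assumes the format written by the monitoring script.
 */
function parseViesLogLine(string $line): array
{
    $results = [];
    // Skip the "d M Y H:i:s" timestamp (20 characters) plus the space after it.
    $body = trim(substr($line, 21));
    foreach (explode(' - ', $body) as $entry) {
        if (strpos($entry, ':') === false) {
            continue; // not a "CC:value" entry, e.g. an empty line
        }
        // Split on the first colon only; error messages can contain colons.
        [$countryCode, $value] = explode(':', $entry, 2);
        $results[$countryCode] = is_numeric($value) ? (float) $value : $value;
    }
    return $results;
}

/** Count failed checks per error message across many log lines. */
function countErrors(array $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        foreach (parseViesLogLine($line) as $result) {
            if (is_string($result)) {
                $counts[$result] = ($counts[$result] ?? 0) + 1;
            }
        }
    }
    arsort($counts); // most frequent error first
    return $counts;
}

// Shortened versions of the two log lines shown above.
$blocked = '13 Feb 2018 03:44:01 BE:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED - BG:javax.xml.rpc.soap.SOAPFaultException: IP_BLOCKED ';
$healthy = '14 Feb 2018 09:10:01 BE:0.36851215362549 - BG:0.35953402519226 ';

$errors = countErrors([$blocked, $healthy]);
```

Anything that parses as a number is treated as a response time in seconds and anything else as an error message, which mirrors what the monitoring script writes for each country.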


VIES Errors

There are a bunch of errors that the API returns and some of them happen a lot more frequently than others. Here are some of the worst offenders:


MS_UNAVAILABLE
The application at the Member State is not replying or not available. Please refer to the Technical Information page to check the status of the requested Member State, try again later.
Count: 1,680


MS_MAX_CONCURRENT_REQ
Your Request for VAT validation has not been processed; the maximum number of concurrent requests for this Member State has been reached. Please re-submit your request later or contact TAXUD-VIESWEB@ec.europa.eu for further information. Your request cannot be processed due to high traffic towards the Member State you are trying to reach. Please try again later.
Count: 232


TIMEOUT
The application did not receive a reply within the allocated time period, try again later.
Count: 283


looks like we got no XML document
No message provided.
Count: 213


Redirection limit reached, aborting
No message provided.
Count: 25


Error Fetching http headers
No message provided.
Count: 20


Not Found
No message provided.
Count: 4


So, out of the 100,240 tests we conducted there were 2,457 errors where we didn't get a proper response. That's a 2.45% failure rate on requests to an API that is critical in our billing flow! Not so great. That's also just failures, when things didn't work. What about the times when things do work, just really, really slowly? Finland had a busy hour or so on 26th May:


26 May 2018 17:10:01 BE:1.3399569988251 - BG:1.0218980312347 - CZ:0.41921615600586 - DK:1.7808570861816 - DE:9.8924231529236 - EE:11.54936504364 - IE:TIMEOUT - EL:0.3336398601532 - ES:0.63100099563599 - FR:1.6053969860077 - HR:0.95138692855835 - IT:1.2633180618286 - CY:0.43682408332825 - LV:3.1926281452179 - LT:0.27870512008667 - LU:3.0638310909271 - HU:1.0205821990967 - MT:0.37093305587769 - NL:0.22589421272278 - AT:0.60714602470398 - PL:0.58894491195679 - PT:0.31774806976318 - RO:0.59447383880615 - SI:0.56872487068176 - SK:1.6284410953522 - FI:46.033823013306 - SE:0.96296286582947 - GB:5.7091720104218 

Germany also had a rough day on 15th Feb where their response time was over 40 seconds for several hours:


15 Feb 2018 09:10:01 BE:10.121844053268 - BG:4.1402130126953 - CZ:8.3899168968201 - DK:45.707659959793 - DE:33.148834943771 - EE:53.348067998886 - IE:Error Fetching http headers - EL:looks like we got no XML document - ES:35.409417152405 - FR:looks like we got no XML document - HR:looks like we got no XML document - IT:Error Fetching http headers - CY:looks like we got no XML document - LV:looks like we got no XML document - LT:26.82378911972 - LU:Error Fetching http headers - HU:25.858185052872 - MT:Error Fetching http headers - NL:looks like we got no XML document - AT:looks like we got no XML document - PL:looks like we got no XML document - PT:47.369332075119 - RO:27.334869861603 - SI:34.250901937485 - SK:looks like we got no XML document - FI:Error Fetching http headers - SE:40.345087051392 - GB:looks like we got no XML document 

There are many occasions when services take too long to respond and the only choice we have is to sit and wait, right in the middle of our billing flow. Here are the response times and their counts:


50+s   x 1
40-50s x 4
30-40s x 5
20-30s x 20
10-20s x 381
1-10s  x 9,929
<1s    x 87,780

Whilst the vast majority of API responses are fairly quick at <1s, there are 10,340 requests taking over a second and all the way up to 50+s. That's 10.32% of requests against the VIES API that are adding an absolute minimum of 1 second to our process and on average a lot more.
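As a rough illustration of how the buckets and the percentage above fit together, here is a small sketch. The function names are hypothetical, and treating each lower bound as inclusive is my assumption, since the table doesn't say which side a boundary value falls on.

```php
<?php

/**
 * Map a response time in seconds onto the buckets used in the table above.
 * Treating each lower bound as inclusive is an assumption.
 */
function responseTimeBucket(float $seconds): string
{
    if ($seconds >= 50) { return '50+s'; }
    if ($seconds >= 40) { return '40-50s'; }
    if ($seconds >= 30) { return '30-40s'; }
    if ($seconds >= 20) { return '20-30s'; }
    if ($seconds >= 10) { return '10-20s'; }
    if ($seconds >= 1)  { return '1-10s'; }
    return '<1s';
}

/** Count how many timings fall into each bucket. */
function bucketResponseTimes(array $timings): array
{
    $counts = [];
    foreach ($timings as $seconds) {
        $bucket = responseTimeBucket($seconds);
        $counts[$bucket] = ($counts[$bucket] ?? 0) + 1;
    }
    return $counts;
}

// A few timings taken from the 26 May log line above (BE, DE, EE, EL, FI).
$counts = bucketResponseTimes([
    1.3399569988251, 9.8924231529236, 11.54936504364, 0.3336398601532, 46.033823013306,
]);

// Counts copied from the table above, used to check the "over a second" share.
$tableCounts = ['50+s' => 1, '40-50s' => 4, '30-40s' => 5, '20-30s' => 20, '10-20s' => 381, '1-10s' => 9929];
$slowRequests = array_sum($tableCounts);             // 10340
$slowShare = round($slowRequests / 100240 * 100, 2); // 10.32
```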


Only use it when necessary

Some of the things we were doing were less than ideal and we had to change them. For example, whenever we fetched the pricing for our plans we'd apply VAT or not based on where you were and whether or not you had a VAT number registered on your account. We used to validate that VAT number each time we used it, which in hindsight wasn't a great idea, so we stopped doing that. Right now we only validate the VAT number when we're about to quote pricing in the billing flow and nowhere else. That means that you won't see the appropriate rate of VAT in places like our homepage, where we list our plans under the Pricing section, but there's not much we can do about that. If we want VAT-inclusive prices there we have to validate the VAT number, and that means hitting VIES, which was literally slowing down our homepage. Yes, we could do this once, store it against your account and then validate it again when we're about to bill/charge you (which we also have to do to be compliant), but all of this just adds complexity and mess. The only real solution was to show prices without VAT across the site and then add VAT in the billing flow. This wasn't something that I really wanted to do but we had little choice.
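For the curious, this is roughly what the "validate once, store it, validate again at billing" option mentioned above might look like. It is a hypothetical sketch of the approach we decided against, not code we run: the class name and methods are made up, and the validator is any callable standing in for a real VIES call.

```php
<?php

/**
 * Hypothetical sketch of storing a validation result for display pages
 * while still re-validating at billing time. $validator stands in for a
 * real VIES call and must return true or false.
 */
final class StoredVatStatus
{
    /** @var callable */
    private $validator;

    /** @var array<string, bool> stored results keyed by "CC:number" */
    private $stored = [];

    public function __construct(callable $validator)
    {
        $this->validator = $validator;
    }

    /** For display-only pages: reuse the stored answer, calling out at most once. */
    public function forDisplay(string $countryCode, string $vatNumber): bool
    {
        $key = $countryCode . ':' . $vatNumber;
        if (!array_key_exists($key, $this->stored)) {
            $this->stored[$key] = ($this->validator)($countryCode, $vatNumber);
        }
        return $this->stored[$key];
    }

    /** For billing: always validate again and refresh the stored answer. */
    public function forBilling(string $countryCode, string $vatNumber): bool
    {
        $key = $countryCode . ':' . $vatNumber;
        return $this->stored[$key] = ($this->validator)($countryCode, $vatNumber);
    }
}

// Usage with a fake validator that counts how often it is called.
$calls = 0;
$status = new StoredVatStatus(function (string $cc, string $number) use (&$calls): bool {
    $calls++;
    return true;
});
$status->forDisplay('GB', '277368458'); // first call hits the validator
$status->forDisplay('GB', '277368458'); // served from the stored result
$status->forBilling('GB', '277368458'); // always hits the validator
```

Even in this toy form you can see the extra moving parts: two code paths, a stored value that can go stale, and a price on the homepage that might not match the one in the billing flow.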

All in all I guess my biggest worry here is that something happens to the VIES API. If we have a particularly busy period there is a chance we could get our IP banned if we poll the service too often. There's also a chance, given that it already seems to have slow spells and periods of being unavailable, that someone could DDoS this thing into oblivion. My guess is that it's not running on particularly modern hardware and it doesn't seem to have a CDN fronting it as the domain resolves to IP addresses owned by "Commission Europeenne".


;; ANSWER SECTION:
ec.europa.eu.           78      IN      A       147.67.119.171
ec.europa.eu.           78      IN      A       147.67.119.136
ec.europa.eu.           78      IN      A       147.67.136.136
ec.europa.eu.           78      IN      A       147.67.119.61
ec.europa.eu.           78      IN      A       147.67.136.171
ec.europa.eu.           78      IN      A       147.67.136.61

For now there is no alternative and we have to keep using it. Let's just hope they can maintain it and improve its reliability.