I recently made some changes to report-uri.io to introduce some sensible usage limits. As part of those limits I'd already introduced the inbound rate limit but hadn't yet implemented the removal of historic data. That's where Azure Functions are helping me out.


Removing old data

You can read more details in my blog post Introducing sensible limits to report-uri.io if you'd like, but the short version is that storing all historic data in the service had me fast approaching 2TB of data on disk! With usage growing every single month, the cost of having so much data on disk, and the rate at which it was growing, needed to be addressed. As a result I introduced some fairly relaxed limits on how long I'd store data for, which meant that I needed a method to grab some statistics on previous data and then delete it.


The problem

The code to do what I need is simple. I store users' report data in time-based partitions in Table Storage, so I have to query out a partition, tally up some totals that I use on the graph page and then delete the data from the partition. Sounds easy, and it is. The problem is that my server infrastructure is hosted outside Azure, over at DigitalOcean. I have the infrastructure to handle this without a worry, but doing it remotely generates a lot of outbound bandwidth from Azure, and that outbound bandwidth costs money. Moving the job to an Azure Function keeps the traffic inside Azure and completely removes the bandwidth cost.
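To illustrate the tally step, here is a minimal sketch in plain PHP. It is not the code running on report-uri.io: the function name `tallyReports`, the `type` field and the array shape are all assumptions made for the example. In a real Function the entities would be read from a Table Storage partition through the SDK and deleted once the totals were saved.

```php
<?php
// Hypothetical roll-up of one partition's worth of report entities.
// Each entity is assumed to be an associative array with a 'type' key;
// the real schema is not described in this post.
function tallyReports(array $entities)
{
    $totals = ['total' => 0, 'byType' => []];

    foreach ($entities as $entity) {
        // Entities without a type are grouped under 'unknown'.
        $type = isset($entity['type']) ? $entity['type'] : 'unknown';

        if (!isset($totals['byType'][$type])) {
            $totals['byType'][$type] = 0;
        }

        $totals['byType'][$type]++;
        $totals['total']++;
    }

    return $totals;
}

// Example partition contents (made-up data).
$partition = [
    ['type' => 'csp'],
    ['type' => 'csp'],
    ['type' => 'hpkp'],
];

print_r(tallyReports($partition));
```

Keeping the tally as a pure function over arrays means it can be tested without touching storage, and the delete only needs to happen once the totals have been written somewhere safe.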


Creating an Azure Function

You can read the full details of Azure Functions over on the Azure site, but the idea is that Functions allow you to run snippets of code in response to an event. There are loads of different events you can use to trigger a Function, but for me the timer trigger was perfect.


azure function logo


First, you need to create a Function. Navigate to the appropriate section of the Azure Portal and create a Function App using your usual naming conventions and settings. Once the Function App is created, open it and create a new Function.


create PHP function


Once the Function has been created you can add the code that you'd like the Function to execute when it is triggered. I had already written the PHP I needed to roll up and purge historic data so I pasted it into the Code field.


the PHP code


Once the code was saved I needed to install the Azure SDK for PHP. This is really easy and there are some nice instructions on the Azure GitHub page. You need to create 2 files locally and zip them to upload to your Function. Download and save composer.phar from this link and create composer.json with the following contents.

{
    "require": {
        "microsoft/windowsazure": "^0.4"
    }
}

Add both of these files to composer.zip and they are ready for upload. To upload them you can use Kudu, so open your browser and navigate to the Kudu dashboard for your Function App.

https://<functionappname>.scm.azurewebsites.net

Once you have Kudu open select 'Debug console' then 'CMD' from the navigation menu.


Kudu debug console


In the file browser navigate to site/wwwroot/<functionname> or use the CMD prompt if you like. Once there you can drag and drop the composer.zip file you created earlier onto the file browser and it will upload and unzip it for you. To install the SDK run the following command.

php composer.phar install

You are now ready to go with the Azure SDK for PHP inside your Function and can use it as you normally would. The last thing left to do is set up a trigger for the Function. I'm going to use the timer trigger, which handily accepts a Cron expression. Back in the Azure Portal, click on the Integrate tab for the Function and add a new Trigger.


add a timer trigger


Most Cron expressions have only 5 fields, but the Azure format has 6 because it adds a seconds field at the start. If you aren't familiar with Cron expressions you can read more on the Wikipedia page but, in essence, they allow you to define time intervals in a simple format.

┌───────────── sec (0 - 59)
│ ┌───────────── min (0 - 59)
│ │ ┌────────────── hour (0 - 23)
│ │ │ ┌─────────────── day of month (1 - 31)
│ │ │ │ ┌──────────────── month (1 - 12)
│ │ │ │ │ ┌───────────────── day of week (0 - 6)
│ │ │ │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
* * * * * *
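
As a quick sanity check on the field order, the sketch below splits a six-field expression into named parts, following the diagram above. The function name `splitCronExpression` and the key names are made up for the example, and it only checks the number of fields, not whether each value is in range.

```php
<?php
// Split a six-field Cron expression into named fields.
// Field order follows the diagram above: seconds come first.
function splitCronExpression($expression)
{
    $names = ['second', 'minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];

    // Split on any run of whitespace after trimming the ends.
    $fields = preg_split('/\s+/', trim($expression));

    if (count($fields) !== count($names)) {
        throw new InvalidArgumentException(
            'Expected 6 fields, got ' . count($fields)
        );
    }

    // Pair each name with the field in the same position.
    return array_combine($names, $fields);
}

print_r(splitCronExpression('0 */1 * * * *'));
```

Passing a traditional five-field expression such as `*/5 * * * *` throws an exception here, which is a cheap way to catch an expression copied from a regular crontab with the seconds field missing.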

I use the following Cron expressions to run my Functions at certain intervals.


Every minute

0 */1 * * * *

Every hour

0 0 */1 * * *

Every day

0 0 0 */1 * *

Every week

0 0 0 * * 0

Every month

0 0 0 1 * *

Once the Cron expression is saved the timer will be activated and the function will now be called as often as you specified. This means I can now sit back and the data will be automatically rolled up and deleted from Table Storage at the appropriate time.


In action

Since I started writing this blog post and first set the scripts running, they've churned through an awful lot of data, and right before publication I received my latest monthly Azure bill. I've now managed to reduce the amount of data on disk to a little over 1.1TB and the scripts still have a little way to go. I will probably tweet about the latest numbers once everything is finished and up to date.