I'm a sucker for pretty graphs. I love dashboards. That feeling of knowing just how everything is going at any given time is quite soothing. I recently replaced my server monitoring software with something new and I think it's so awesome I'm opening my dashboard up to the public.
Monitor all the things
I used to use New Relic to monitor my server and application health and wrote a pretty extensive blog post titled Monitoring Server and Application Health with New Relic that covered everything. It was good and it met my requirements of keeping track of important metrics on my servers like CPU/RAM/Network/Disk, and I had alarms that would trigger when things started to head into the red. It required several different agents on the endpoint though, and the dashboards were functional but not the best looking I'd ever seen. But monitoring is important so I stuck with it, until now.
Introducing netdata
I came across netdata in GitHub's 'State of the Octoverse' where they list the most starred repositories. There are quite a few cool things in there like D3 and electron that I already know of right alongside netdata. I could talk about netdata and how awesome it is, how pretty the graphs are, I could even include screenshots, but why don't we just go and look at it!
https://horizon.scotthelme.co.uk
Benefits
There are loads of different aspects that I love about netdata so I will grab a few of the highlights for me and look at those. You should check out their site for more info though.
No centralised collection
In netdata there is no centralised collection point for your metrics. The agents installed on your endpoints do not send their data anywhere; it all resides locally. This was a bit of an odd one to get my head around at first but it does make it incredibly easy to deploy and incredibly scalable too. When you load up a dashboard, your browser becomes the central point, connecting to each of the netdata agents to fetch their data.
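To make the "no central collector" idea a little more concrete, here is a minimal sketch of asking each agent for its own data. It assumes netdata's `/api/v1/data` endpoint with the `chart`, `after`, `points` and `format` parameters; the host names are hypothetical placeholders and the function name is just for illustration.

```shell
#!/bin/sh
# Build the data API URL for a single agent.
# Assumptions: hypothetical host names; 19999 is the port used in this post.
netdata_url() {
    host=$1; chart=$2; seconds=$3
    printf 'http://%s:19999/api/v1/data?chart=%s&after=-%s&points=1&format=json\n' \
        "$host" "$chart" "$seconds"
}

# Print one URL per agent. Each one could then be fetched with: curl -s "$url"
for host in web-01.example.com web-02.example.com; do
    netdata_url "$host" system.cpu 60
done
```

Nothing sits in the middle here: every URL points straight at the agent that holds the data, which is roughly what the dashboard in your browser does for each graph.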
Custom dashboards
The default netdata dashboard for each server is staggering and contains pretty much all of the metrics you could ever think of. I wanted somewhere that I could get a quick overview of all of my servers though, and that's the dashboard I built that you can see at https://horizon.scotthelme.co.uk/. It's literally just a few snippets of HTML, there's nothing difficult about it and you can get one up and running in no time.
Lightweight
The agent itself is pretty easy going on resources and certainly has a much smaller footprint than previous tools I've used. It only takes up ~16MB of RAM to keep my 1 hour history in memory, but you can increase that to go further back in time if you like.
Setup
I added netdata to all of my servers which you can see in the dashboard on Horizon and setup is really simple. This is how I did it on my Ubuntu 14.04/16.04 servers.
sudo apt-get install zlib1g-dev uuid-dev libmnl-dev gcc make git autoconf autoconf-archive autogen automake pkg-config curl
cd ~
git clone https://github.com/firehol/netdata.git --depth=1
cd netdata
sudo ./netdata-installer.sh
Netdata is now up and running and you can go and view the dashboard for the server you just installed it on. It will be available on port 19999 assuming you have it open. You could just leave it as it currently stands but this isn't really a finished installation. I wanted to sit my dashboards behind Nginx and you can do that locally or on another host like I have.
Nginx reverse proxy
Netdata runs on all of my servers but they are proxied through Nginx on my blog server and I use a firewall rule to ensure that my server is the only one that can access them. If netdata and Nginx are on the same host then you don't need to worry about this step.
sudo ufw allow from 107.170.218.42 to any port 19999
sudo ufw allow from 2604:a880:1:20::207:b001 to any port 19999
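If more than one address needs access, the same rule can be generated per address rather than typed out. This is only a sketch: the function name is made up, it prints the commands instead of running them, and the addresses are placeholders from documentation-only ranges.

```shell
#!/bin/sh
# Print (rather than run) one ufw rule per trusted proxy address.
# The addresses passed in below are placeholders, not real servers.
ufw_rules() {
    for addr in "$@"; do
        printf 'sudo ufw allow from %s to any port 19999\n' "$addr"
    done
}

ufw_rules 203.0.113.10 2001:db8::10
```

Once the printed rules look right you can run them, and `sudo ufw status numbered` will list what is currently in place.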
I use ufw, UncomplicatedFirewall, because it does exactly what it says on the tin and is much nicer than trying to play around with iptables. If you use a different firewall then you need to add similar rules so that only the server doing the proxying can reach port 19999, or have netdata listen only on localhost. The next step is configuring Nginx to reverse proxy the connection to your netdata install(s). If you're running them on the same server then you won't need the upstream, you can just proxy pass to localhost.
upstream reporturi-shared-01 {
    server 1.2.3.4:19999;
    keepalive 64;
}

# Declare multiple upstreams here.

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name horizon.scotthelme.co.uk;
    root /var/www/horizon/;
    index index.html;

    # The remainder of your usual config here.

    location / {
        try_files $uri $uri/ /index.html;
    }

    location ~ "/(?<behost>reporturi-(?:shared|www|dev|test)-[0-9]{2}|scotthel(?:me|)|securityheaders)/(?<ndpath>.*)" {
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Server $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_http_version 1.1;
        proxy_pass_request_headers on;
        proxy_set_header Connection "keep-alive";
        proxy_store off;
        proxy_pass http://$behost/$ndpath$is_args$args;
        gzip on;
        gzip_proxied any;
        gzip_types *;
    }
}
Nginx is now configured to listen for requests to the specific dashboards and will proxy them. In the example above a request to horizon.scotthelme.co.uk/reporturi-shared-01/ would be passed to the upstream defined at the top of the config. Any requests to horizon.scotthelme.co.uk would be served out of my root directory as usual, more on that later.
Update 3rd March 2017: The above Nginx config was updated to add a regex match in the location block for my upstreams to fix an open proxy bug found by @AlekMuzo.
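The point of that regex is that only a known list of names can ever end up in $behost, so a request can't talk Nginx into proxying to an arbitrary host. Here is a rough way to try paths against the same allowlist from a shell. It is an approximation, not Nginx's own matching: it uses grep's extended regex syntax, drops the named groups, and writes (?:me|) as (me)?. The function name and the test paths are made up for illustration.

```shell
#!/bin/sh
# Return 0 when the request path names one of the allowed upstreams.
# ERE approximation of the PCRE in the location block above.
is_allowed_path() {
    printf '%s\n' "$1" |
        grep -Eq '/(reporturi-(shared|www|dev|test)-[0-9]{2}|scotthel(me)?|securityheaders)/'
}

for path in /reporturi-shared-01/api/v1/charts /evil.example.com/index.html; do
    if is_allowed_path "$path"; then
        echo "proxied:     $path"
    else
        echo "not proxied: $path"
    fi
done
```

Anything that doesn't match falls through to the `location /` block and is served from the root directory instead of being proxied.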
Enable Nginx monitoring
If you want netdata to output detailed metrics on Nginx then you need to enable the Nginx status stub. On the site being monitored add the following to the bottom of one of your virtual host configs.
server {
    listen 127.0.0.1:80;
    server_name 127.0.0.1;

    location /stub_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }

    location ~ ^/(status|ping)$ {
        allow 127.0.0.1;
        deny all;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;
    }
}
This will tell Nginx to make the status stub available on localhost at /stub_status, which is where netdata will look for it. Netdata will now use that to produce information for Nginx. The second location block is there to do the same for PHP, which we now need to enable. Restart Nginx to apply the changes.
sudo service nginx restart
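Once Nginx has restarted you can check the stub by hand before relying on netdata to find it. The sketch below pulls the active connection count out of stub_status style output; the helper name is made up and the sample numbers are invented, only the layout follows what the stub prints.

```shell
#!/bin/sh
# Extract the active connection count from stub_status output on stdin.
active_connections() {
    awk '/^Active connections:/ { print $3 }'
}

# Sample output in the stub_status layout; the numbers are made up.
# Against a live server: curl -s http://127.0.0.1/stub_status | active_connections
active_connections <<'EOF'
Active connections: 3
server accepts handled requests
 120 120 340
Reading: 0 Writing: 1 Waiting: 2
EOF
```

If the curl request comes back with a 403 or 404 instead, the server block above probably isn't the one answering on 127.0.0.1.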
Enable PHP monitoring
You need to make 3 small changes to your PHP config to allow netdata to monitor it.
sudo nano /etc/php/5.6/fpm/pool.d/www.conf
You will need to update the path if your version of PHP is different but the file will be the same and you need to uncomment the following 3 lines.
pm.status_path = /status
ping.path = /ping
ping.response = pong
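If you'd rather not open an editor on every server, the same three lines could be uncommented with sed. This is a sketch that assumes the directives are commented out with a leading `;`, as they are in the stock www.conf; the function name is made up, and it only reads from stdin so you can see the result before touching the real file.

```shell
#!/bin/sh
# Uncomment the three status/ping directives in a pool config read from stdin.
# Assumes each directive is commented with a leading ';'.
enable_fpm_status() {
    sed -E 's/^;[[:space:]]*(pm\.status_path|ping\.path|ping\.response)[[:space:]]*=/\1 =/'
}

# Sample input: the last line is an unrelated directive and is left alone.
enable_fpm_status <<'EOF'
;pm.status_path = /status
;ping.path = /ping
;ping.response = pong
;pm.max_requests = 500
EOF
```

To try it on the real file, pipe it through the function first (for example with `enable_fpm_status < /etc/php/5.6/fpm/pool.d/www.conf`) and compare the output with the original before replacing anything.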
This will make the status data available to netdata so go ahead and restart PHP to bring the changes into effect.
sudo service php5.6-fpm restart
After updating both of these you can restart the netdata service to see the new graphs in your dashboard.
sudo service netdata restart
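If the PHP graphs don't show up, it's worth checking that PHP-FPM is actually answering on the paths you just enabled. The sketch below reads a named field out of the plain text status page; the helper name is made up and the sample values are invented. It also assumes the pool is listening on 127.0.0.1:9000 as in the Nginx config above, so adjust that if yours uses a socket.

```shell
#!/bin/sh
# Print the value of one field from PHP-FPM status output on stdin.
fpm_field() {
    awk -F': +' -v key="$1" '$1 == key { print $2 }'
}

# Sample status output; the values are made up.
# Against a live server: curl -s http://127.0.0.1/status | fpm_field 'active processes'
fpm_field 'active processes' <<'EOF'
pool:                 www
process manager:      dynamic
idle processes:       4
active processes:     1
total processes:      5
EOF
```

A quick `curl -s http://127.0.0.1/ping` should also come back with `pong`, which is the ping.response value set earlier.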
Custom dashboards
These things are super easy to set up. Each of the CPU graphs you see on the main page is produced with the following HTML.
<div class="row">
    <div data-netdata="system.cpu" data-host="https://horizon.scotthelme.co.uk/reporturi-shared-01/" data-gauge-max-value="100" data-chart-library="gauge" data-width="50%" data-after="-540" data-points="540" data-title="CPU" data-units="%" data-colors="#FF9331" class="netdata-container" style="width: 200px; height: 140px;"></div>
</div>
<div class="row">
    <div class="col-xs-12"><p class="text-center"><a href="https://horizon.scotthelme.co.uk/reporturi-shared-01/" class="btn btn-primary">Click Me!</a></p></div>
</div>
You can customise everything about it, use different metrics, colours, bounds on the graph values, types of graph, anything. I've published my full code up on GitHub here so feel free to use that as a base for anything you create.
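Because each gauge only really differs in its data-host, the markup for a whole row of servers can be generated rather than copied and pasted. This is just a sketch of that idea: the function name is made up, it keeps only a subset of the attributes from the snippet above, and `reporturi-shared-02` is a hypothetical second backend.

```shell
#!/bin/sh
# Print one CPU gauge per backend name. The base URL is the one used in this
# post; the backend name is whatever upstream Nginx proxies to.
cpu_gauge() {
    printf '<div data-netdata="system.cpu" data-host="https://horizon.scotthelme.co.uk/%s/" data-chart-library="gauge" data-gauge-max-value="100" data-title="CPU" data-units="%%" class="netdata-container" style="width: 200px; height: 140px;"></div>\n' "$1"
}

# reporturi-shared-02 is a hypothetical second backend for illustration.
for backend in reporturi-shared-01 reporturi-shared-02; do
    cpu_gauge "$backend"
done
```

The output can be pasted into a row in your dashboard page, and adding a server later is just one more name in the loop.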