Getting started with InfluxDB

This evening I got super frustrated waiting an age for the last month's summary of my power consumption to load from my Graphite server, which stores time series data for my home automation setup. I decided there had to be a better way, and after reading a whole bunch of time series database comparison articles I decided to give InfluxDB a go. I didn't care about the historic data, so I forcefully deleted my old Debian Jessie Graphite VM (for some reason the package maintainer only packaged the front end for Debian Stretch and not the backend carbon stuff) and spun up a fresh Debian Stretch VM to run InfluxDB. Here is how I got it up and running...

Before you start following along because you want your own super awesome InfluxDB server, please bear in mind I only set up a single node, which is probably suitable for dev purposes only. If you'd like to run this in production you'll probably want clustering (a feature of InfluxDB Enterprise rather than the open source build) for scaling and redundancy.

First I looked up what size of VM to create using the InfluxDB hardware sizing guide, which gives some rough guidelines on what sort of VM you should choose. You can find the hardware sizing guide here... https://docs.influxdata.com/influxdb/v1.4/guides/hardware_sizing. In the end, as I will be generating far fewer writes, queries and series than the "Low" load tier, I decided to go for the minimum requirements for a low load server: 2 cores, 2GB RAM and fast-ish 15K SAS disks (no SSDs in my hypervisor, I am afraid). Following a minimal installation of Debian Stretch I configured the host with my SSH keys and the usual stuff I like to configure, and then continued on to the InfluxDB installation.

First install the apt-transport-https package to allow installation from apt repositories served over HTTPS. It is assumed you are sudo su'd up to the root user account:

apt-get install apt-transport-https

Next add the following to your /etc/apt/sources.list file:

deb https://repos.influxdata.com/debian stretch stable
deb https://packagecloud.io/grafana/stable/debian/ stretch main

These are the official InfluxDB and Grafana repositories for Debian. Next add the keys for these repos:

curl -sL https://packagecloud.io/gpg.key | apt-key add -
curl -sL https://repos.influxdata.com/influxdb.key | apt-key add -

Next refresh the package lists, then install InfluxDB and Grafana and enable them at boot time:

apt-get update
apt-get install influxdb grafana
systemctl daemon-reload
systemctl enable grafana-server
systemctl start grafana-server
systemctl enable influxdb
systemctl start influxdb
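
Before continuing you can verify InfluxDB is up by hitting its /ping endpoint, which returns an HTTP 204 with no body when the service is healthy:

curl -sI http://localhost:8086/ping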

Now InfluxDB is up and running, you should be able to access it via the command line client. We'll create our first database and apply a 520 week (about 10 years) retention policy to it; of course you should tailor the retention policy to your own use case and requirements:

root@home:~# influx
Connected to http://localhost:8086 version 1.4.2
InfluxDB shell version: 1.4.2
> CREATE DATABASE iot_devices;
> USE iot_devices;
> CREATE RETENTION POLICY "10_years" ON iot_devices DURATION 520w REPLICATION 1 DEFAULT;
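
You can confirm the policy took effect (and is the default) with:

> SHOW RETENTION POLICIES ON iot_devices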

Now let's insert some data into the InfluxDB database. I previously created a Python script which takes data packets received on a Software Defined Radio from my CurrentCost CT Clamp (for measuring power consumption) and inserts them into Graphite; I have modified it to push the data into InfluxDB instead. Here is the resulting script:

#!/usr/bin/python3


import subprocess
import threading
import json
import time
import datetime
from influxdb import InfluxDBClient


# Configuration values.
device_id = 1636
rtl_433_cmd = "/usr/local/bin/rtl_433 -F json"
influx_db_host = "localhost"
influx_db_port = 8086
influx_db_name = "iot_devices"
influx_db_user = None
influx_db_pass = None

# Create the InfluxDB client once up front rather than once per packet;
# it does not open a connection until a write is made.
influx_client = InfluxDBClient(influx_db_host,
                               influx_db_port,
                               influx_db_user,
                               influx_db_pass,
                               influx_db_name)


def shovel_results():
    # rtl_433 was started with universal_newlines=True, so stdout is in
    # text mode and readline() returns '' (not b'') at end of stream.
    for line in iter(p.stdout.readline, ''):
        try:
            data = json.loads(line.strip())
        except ValueError:
            data = None

        if not data:
            print("Received invalid data from RTL433.")
        else:
            try:
                print("Received valid packet from RTL433: %s" % data)

                if data.get('dev_id', None) == device_id:
                    power = data.get('power0', None)
                    print("Packet from CurrentCost meter, power usage: %s watts" % power)

                    # InfluxDB stores timestamps in UTC, so use utcnow()
                    # rather than the local-time now().
                    db_json = [{
                               "measurement": "power_consumption",
                               "tags": {"device_id": device_id},
                               "time": datetime.datetime.utcnow(),
                               "fields": {"power": power}
                              }]
                    influx_client.write_points(db_json)
            except Exception as err:
                print("Something went wrong :-(  : %s" % str(err))


p = subprocess.Popen(rtl_433_cmd.split(),
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     universal_newlines=True)

t = threading.Thread(target=shovel_results)
t.start()

try:
    while True:
        time.sleep(1)
        if p.poll() is not None:
            break

finally:
    p.terminate()
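
To keep the script running across reboots, a minimal systemd unit does the job. This is a sketch assuming you save the script as /usr/local/bin/currentcost_influx.py and make it executable; the paths and unit name here are my own choices:

[Unit]
Description=CurrentCost CT Clamp to InfluxDB shovel
After=influxdb.service

[Service]
ExecStart=/usr/local/bin/currentcost_influx.py
Restart=always

[Install]
WantedBy=multi-user.target

Save this as /etc/systemd/system/currentcost_influx.service, then run systemctl daemon-reload, systemctl enable currentcost_influx and systemctl start currentcost_influx.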

If you read my previous post about posting this data into Graphite you'll notice the script is almost identical, but it now submits data to InfluxDB instead. Of course you'll need the InfluxDB module for Python, which can be installed directly from PyPI using the following command:

pip install influxdb

Now, upon running the insertion script, any data received from my CurrentCost CT Clamp will be parsed by the script and inserted into InfluxDB; these data packets arrive about every 6 seconds from the wireless CT Clamp. If we check using the CLI client we can see the data being inserted into the database:

root@home:~# influx
Connected to http://localhost:8086 version 1.4.2
InfluxDB shell version: 1.4.2
> use iot_devices;
Using database iot_devices
> SELECT * FROM power_consumption ORDER BY time DESC LIMIT 10;
name: power_consumption
time                device_id power
----                --------- -----
1516565681222598912 1636      527
1516565675455642880 1636      531
1516565669164477184 1636      529
1516565663397319168 1636      533
1516565657630265088 1636      535
1516565651338936064 1636      531
1516565639804201984 1636      528
1516565633513384192 1636      531
1516565627745722112 1636      531
1516565621454888192 1636      531
>
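
Since the syntax is SQL-like, aggregate queries work much as you'd expect. For example, this query (a sketch reusing the measurement and field names from my script above) averages power over 5-minute windows for the last hour:

> SELECT mean("power") FROM power_consumption WHERE time > now() - 1h GROUP BY time(5m)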

Data from the CT Clamp is now being recorded in the InfluxDB database. Next let's configure Grafana to visualise it. To access the Grafana UI, visit your server's IP in a web browser on port 3000 and log in with the default username and password, "admin". Once logged in you may change your username and password in the Admin > Global Users section of the menu. Next go to the Home Dashboard.

Before drawing any pretty graphs we need to add our InfluxDB data source. Click the "Create your first data source" link on the default dashboard and complete the form, pointing it at your InfluxDB database. Once the data source has been added we can build a custom dashboard to graph the time series data.
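
If you'd rather script this step, Grafana also exposes an HTTP API for adding data sources; something along these lines should work (a sketch assuming the default admin:admin credentials and no auth on InfluxDB):

curl -s -X POST http://admin:admin@localhost:3000/api/datasources \
     -H "Content-Type: application/json" \
     -d '{"name": "iot_devices", "type": "influxdb", "access": "proxy", "url": "http://localhost:8086", "database": "iot_devices"}'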

After adding the data source, navigate back to the home dashboard and click the "Create your first dashboard" link; this opens the dashboard builder in edit mode. To add a graph, click and drag the graph icon into the dashboard, which will add a blank graph. To edit it, click the graph's panel and then click the edit link in the pop-up menu.

Another panel will load allowing you to edit the graph's preferences. Go to the "general" tab of this panel and add a title for your graph, in my case "Power Consumption".

Next click on the "metrics" tab of the graph editing panel, select the data source for the InfluxDB database, and start building the query by selecting the measurement, field and other elements. As you edit the metrics settings you should see Grafana draw the proposed graph in the graph's panel.
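
Behind the scenes the query builder is just generating InfluxQL. For my graph it produces something along these lines, where $timeFilter and $__interval are Grafana macros that substitute the dashboard's time range and an automatic grouping interval:

SELECT mean("power") FROM "power_consumption" WHERE $timeFilter GROUP BY time($__interval) fill(null)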

Once the graph looks satisfactory, press the "X" in the top right corner of the graph editing panel, then click the save button in the main header panel to save your dashboard; since this is the first time we are saving it, you will be asked for a dashboard name. Once saved, you can access the dashboard again from the dashboard selection drop-down in the main header panel. If you'd like to change the time window for the dashboard, just edit the time frame on the right hand side of the main header panel.

As you can see, InfluxDB and Grafana can make visually pleasing dashboards and store vast amounts of time series data. Hopefully querying InfluxDB will not suffer from the slowness I experienced with the Graphite API. I will leave this running for several months, adding more metrics from my other devices, and then report back on whether the querying time is to my satisfaction. I also need to look into backing up the data stored within InfluxDB to my NAS periodically in case the server ever fails and requires restoration.
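
For the backups, InfluxDB 1.4 ships with a backup tool built into the influxd binary; something like this, run periodically against an NFS mount of the NAS, should do the job (the mount path here is just an example):

influxd backup -database iot_devices /mnt/nas/influxdb_backups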

Hopefully this gives you enough to get started with InfluxDB. Of course we did not cover clustering, user auth or other such topics, and it is unlikely your source of data will be the same as mine, so you will need to adapt the guide to your use case and requirements. The InfluxDB docs are super comprehensive and have been a great help in getting me this far; if you'd like to read more, check out https://docs.influxdata.com/influxdb/v1.4/
