Verifying Replication Consistency with Percona’s pt-table-checksum

Replication is an important concept for improving database performance and security. In this blog post, I would like to demonstrate how the consistency between a MySQL master and a slave can be verified. We will create two Docker containers, one for the master one for the slave.

Installing the Percona Toolkit

The Percona Toolkit is a collection of useful utilities, which can be obained for free from the company’s portal. The following commands install the prerequisits, download the package and eventually the package.

sudo apt-get install -y wget libdbi-perl libdbd-mysql-perl libterm-readkey-perl libio-socket-ssl-perl
sudo dpkg -i percona-toolkit_3.0.4-1.xenial_amd64.deb 

Setting up a Test Environment with Docker

The following command creates and starts a docker container. Note that these are minimal examples and are not suitable for a serious environment.

docker run --name mysql_master -e MYSQL_ALLOW_EMPTY_PASSWORD=true -d mysql:5.6 --log-bin \
   --binlog-format=ROW --server-id=1```

Get the IP address from the master container:

Get the IP of the master

docker inspect mysql_master | grep IPAddress

“SecondaryIPAddresses”: null, “IPAddress”: “"```

You can connect to this container like this and verify the server id:

stefan@Lenovo ~/Docker-Projects $ mysql -u root -h
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.35-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like 'server_id';
| Variable_name | Value |
| server_id     | 1     |
1 row in set (0,00 sec)

We repeat the command for the slave, but use a different id. port and name:

docker run --name mysql_slave -e MYSQL_ALLOW_EMPTY_PASSWORD=true -d mysql:5.6 --server-id=2```

For simplicity, we did not use Docker links, but will rather use IP addresses assigned by Docker directly.

## Replication Setup

First, we need to setup a user with replication privileges. This user will connect from the slave to the master.

On the host, interact with the master container

Get the IP address of the slave container

$ docker inspect mysql_slave | grep IPAddress “SecondaryIPAddresses”: null, “IPAddress”: “”, “IPAddress”: “”,

Login to the MySQL console of the master

Grant permissions


Get the current binlog position

mysql> SHOW MASTER STATUS; +——————-+———-+————–+——————+——————-+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +——————-+———-+————–+——————+——————-+ | mysqld-bin.000002 | 346 | | | | +——————-+———-+————–+——————+——————-+ 1 row in set (0,00 sec)```

Now log into the slave container and add the connection details for the master:

## Connect to the MySQL Slave instance
$ mysql -u root -h

### Setup the slave

Query OK, 0 rows affected, 2 warnings (0,05 sec)

### Start and check
mysql>   start slave;
Query OK, 0 rows affected (0,01 sec)

mysql> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_User: percona
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000002
          Read_Master_Log_Pos: 346
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 284
        Relay_Master_Log_File: mysqld-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Now our simple slave setup is running.

Get some test data

Lets download the Sakila test database and import it into the master. It will be replicated immediately.

~/Docker-Projects $ tar xvfz sakila-db.tar.gz 

mysql -u root -h < sakila-db/sakila-schema.sql 
mysql -u root -h < sakila-db/sakila-data.sql```

Verify that the data is on the slave as well:

mysql -u root -h -e “USE sakila;SHOW TABLES;” +—————————-+ | Tables_in_sakila | +—————————-+ | actor | | actor_info | | address | | category | | city | | country | | customer | … | store | +—————————-+

After our setup is completed, we can proceed with Percona pt-table checksum.

# Percona pt-table-checksum

The Percona pt-table-checksum tool requires the connection information of the master and the slave in a specific format. This is called the DSN (data source name), which is a coma separated string. We can store this information in a dedicated database called percona in a table called dsns. We create this table on the master. Note that the data gets replicated to the slave within the blink of an eye.

CREATE DATABASE percona; USE percona;

CREATE TABLE DSN-Table ( id int(11) NOT NULL AUTO_INCREMENT, dsn varchar(255) NOT NULL, PRIMARY KEY (id) );

The next step involves creating permissions on the slave and the master!


The percona user is needed to run the script. Note that the IP address is this time from the (Docker) host, having the IP by default. In real world scenarios, this script would either be run on the master or on the slave directly.

Now we need to add the information about the slave to the table we created. The Percona tool could also read this from the process list, but it is more reliable if we add the information ourselves. To do so, we add a record to the table we just created, which describes the slave DSN:

INSERT INTO percona.DSN-Table VALUES (1,'h=,u=percona,p=SECRET,P=3306');```

The pt-table-checksum tool the connects to the master instance and the the slave. It computes checksums of all databases and tables and compares results. You can use the tool like this:

pt-table-checksum –replicate=percona.checksums –create-replicate-table –empty-replicate-table
–recursion-method=dsn=t=percona.DSN_Table -h -P 3306 -u percona -pSECRET TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 09-10T10:13:11 0 0 0 1 0 0.020 mysql.columns_priv 09-10T10:13:11 0 0 3 1 0 0.016 mysql.db 09-10T10:13:11 0 0 0 1 0 0.024 mysql.event 09-10T10:13:11 0 0 0 1 0 0.014 mysql.func 09-10T10:13:11 0 0 40 1 0 0.026 mysql.help_category 09-10T10:13:11 0 0 614 1 0 0.023 mysql.help_keyword 09-10T10:13:11 0 0 1224 1 0 0.022 mysql.help_relation 09-10T10:13:12 0 0 585 1 0 0.266 mysql.help_topic 09-10T10:13:12 0 0 0 1 0 0.031 mysql.ndb_binlog_index 09-10T10:13:12 0 0 0 1 0 0.024 mysql.plugin 09-10T10:13:12 0 0 6 1 0 0.287 mysql.proc 09-10T10:13:12 0 0 0 1 0 0.031 mysql.procs_priv 09-10T10:13:12 0 1 2 1 0 0.020 mysql.proxies_priv 09-10T10:13:12 0 0 0 1 0 0.024 mysql.servers 09-10T10:13:12 0 0 0 1 0 0.017 mysql.tables_priv 09-10T10:13:12 0 0 1820 1 0 0.019 mysql.time_zone 09-10T10:13:12 0 0 0 1 0 0.015 mysql.time_zone_leap_second 09-10T10:13:12 0 0 1820 1 0 0.267 mysql.time_zone_name 09-10T10:13:13 0 0 122530 1 0 0.326 mysql.time_zone_transition 09-10T10:13:13 0 0 8843 1 0 0.289 mysql.time_zone_transition_type 09-10T10:13:13 0 1 4 1 0 0.031 mysql.user 09-10T10:13:13 0 0 1 1 0 0.018 percona.DSN_Table 09-10T10:13:13 0 0 200 1 0 0.028 09-10T10:13:13 0 0 603 1 0 0.023 sakila.address 09-10T10:13:13 0 0 16 1 0 0.033 sakila.category 09-10T10:13:13 0 0 600 1 0 0.023 09-10T10:13:13 0 0 109 1 0 0.029 09-10T10:13:14 0 0 599 1 0 0.279 sakila.customer 09-10T10:13:14 0 0 1000 1 0 0.287 09-10T10:13:14 0 0 5462 1 0 0.299 sakila.film_actor 09-10T10:13:14 0 0 1000 1 0 0.027 sakila.film_category 09-10T10:13:14 0 0 1000 1 0 0.032 sakila.film_text 09-10T10:13:14 0 0 4581 1 0 0.276 sakila.inventory 09-10T10:13:15 0 0 6 1 0 0.030 sakila.language 09-10T10:13:15 0 0 16049 1 0 0.303 sakila.payment 09-10T10:13:15 0 0 16044 1 0 0.310 sakila.rental 09-10T10:13:15 0 0 2 1 0 0.029 sakila.staff 09-10T10:13:15 0 0 2 1 0 0.020

The result shows a difference in the MySQL internal table for permissions. This is obviously not what we are interested in, as permissions are individual to a host. So we rather exclude the MySQL internal database and also the percona database, because it is not what we are interested in. Also in order to test it the tool works, we delete the last five category assignments from the table with <span class="lang:default decode:true crayon-inline">mysql -u root -h -e &#8220;DELETE FROM sakila.film_category WHERE film_id > 995;&#8221;</span> and update a row in the city table with&nbsp;

mysql -u root -h -e “update SET city='Innsbruck’ WHERE city_id=590;“```

Now execute the command again:

pt-table-checksum --replicate=percona.checksums --create-replicate-table --empty-replicate-table \
   --recursion-method=dsn=t=percona.DSN_Table --ignore-databases mysql,percona -h -P 3306 -u percona -pSECRET
09-10T10:46:33      0      0      200       1       0   0.017
09-10T10:46:34      0      0      603       1       0   0.282 sakila.address
09-10T10:46:34      0      0       16       1       0   0.034 sakila.category
09-10T10:46:34      0      1      600       1       0   0.269
09-10T10:46:34      0      0      109       1       0   0.028
09-10T10:46:34      0      0      599       1       0   0.285 sakila.customer
09-10T10:46:35      0      0     1000       1       0   0.297
09-10T10:46:35      0      0     5462       1       0   0.294 sakila.film_actor
09-10T10:46:35      0      1     1000       1       0   0.025 sakila.film_category
09-10T10:46:35      0      0     1000       1       0   0.031 sakila.film_text
09-10T10:46:35      0      0     4581       1       0   0.287 sakila.inventory
09-10T10:46:35      0      0        6       1       0   0.035 sakila.language
09-10T10:46:36      0      0    16049       1       0   0.312 sakila.payment
09-10T10:46:36      0      0    16044       1       0   0.320 sakila.rental
09-10T10:46:36      0      0        2       1       0   0.030 sakila.staff
09-10T10:46:36      0      0        2       1       0   0.027

You see that there is a difference in the tables and in the table sakila.film_category. The tool does not report the actual number of differences, but rather the number of different chunks. To get the actual differences, we need to use a different tool, which utilises the checksum table that the previous step created.

Show the differences with pt-tabel-sync

The pt-table-sync tool is the counter part for the pt-table-checksum util. It can print or even replay the SQL statements that would render the slave the same state again to be in sync with the master. We can run a dry-run first, as the tool is potentially dangerous.

pt-table-sync --dry-run  --replicate=percona.checksums --sync-to-master h= -P 3306 \
   -u percona -pSECRET --ignore-databases mysql,percona
# NOTE: --dry-run does not show if data needs to be synced because it
#       does not access, compare or sync data.  --dry-run only shows
#       the work that would be done.
# Syncing via replication P=3306,h=,p=...,u=percona in dry-run mode, without accessing or comparing data
#      0       0      0      0 Chunk     08:57:51 08:57:51 0
#      0       0      0      0 Nibble    08:57:51 08:57:51 0    sakila.film_category

With –dry-run, you only see affected tables, but not the actual data because it does not really access the databases tables in question. Use –print additionally or instead of dry-run to get a list:

pt-table-sync --print --replicate=percona.checksums --sync-to-master h= -P 3306 \
  -u percona -pSECRET --ignore-databases mysql,percona
REPLACE INTO `sakila`.`city`(`city_id`, `city`, `country_id`, `last_update`) VALUES \
   ('590', 'Yuncheng', '23', '2006-02-15 04:45:25') 
  \ /*percona-toolkit src_db:sakila src_tbl:city  ...
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ... 
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('997',... 
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('998', '11 ...
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('999', '3', ...
REPLACE INTO `sakila`.`film_category`(`film_id`, `category_id`, `last_update`) VALUES ('1000', '5', ... 

The command shows how we can rename back from Innsbruck to Yuncheng again and also provides the INSERT statements to get the deleted records back.When we replace –print with –execute, the data gets written to the master and replicated to the slave. To allow this, we need to set the permissions on the master

GRANT INSERT, UPDATE, DELETE ON sakila.* TO 'percona'@'';
pt-table-sync --execute  --replicate=percona.checksums --check-child-tables \ 
  --sync-to-master h= -P 3306 -u percona -pSECRET --ignore-databases mysql,percona
REPLACE statements on can adversely affect child table `sakila`.`address` 
   because it has an ON UPDATE CASCADE foreign key constraint. 
   See --[no]check-child-tables in the documentation for more information. 
   --check-child-tables error  while doing on

This error indicates that updating the city table has consequences, because it is a FK to child tables. In this example, we are bold and ignore this warning. This is absolutely not recommended for real world scenarios.

pt-table-sync --execute  --replicate=percona.checksums --no-check-child-tables \
   --no-foreign-key-checks --sync-to-master h= -P 3306 -u percona -pSECRET \ 
   --ignore-databases mysql,percona

The command–no-check-child-tables ignores child tables and the command –no-foreign-key-checks ignores foreign keys.

Run the checksum command again to verify that the data has been restored:

pt-table-checksum --replicate=percona.checksums --create-replicate-table --empty-replicate-table \ 
   --recursion-method=dsn=t=percona.DSN_Table --ignore-databases mysql,percona 
   -h -P 3306 -u percona -pSECRET

09-10T11:24:42      0      0      200       1       0   0.268
09-10T11:24:42      0      0      603       1       0   0.033 sakila.address
09-10T11:24:42      0      0       16       1       0   0.029 sakila.category
09-10T11:24:42      0      0      600       1       0   0.275
09-10T11:24:42      0      0      109       1       0   0.023
09-10T11:24:43      0      0      599       1       0   0.282 sakila.customer
09-10T11:24:43      0      0     1000       1       0   0.046
09-10T11:24:43      0      0     5462       1       0   0.284 sakila.film_actor
09-10T11:24:43      0      0     1000       1       0   0.036 sakila.film_category
09-10T11:24:43      0      0     1000       1       0   0.279 sakila.film_text
09-10T11:24:44      0      0     4581       1       0   0.292 sakila.inventory
09-10T11:24:44      0      0        6       1       0   0.031 sakila.language
09-10T11:24:44      0      0    16049       1       0   0.309 sakila.payment
09-10T11:24:44      0      0    16044       1       0   0.325 sakila.rental
09-10T11:24:44      0      0        2       1       0   0.029 sakila.staff
09-10T11:24:44      0      0        2       1       0   0.028

0 DIFFS, we are done!

Fixing Random Freezes with Ubuntu 16.04 LTS, Intel Skylake and an Nvidia GPU

My Lenovo ThinkCentre m900 (10FHCTO1WW) with an Intel i7-6700 showed weird and random freezes from day 1 when trying to install Mint 18 / Ubuntu 16 with any kernel newer than 3x. After investigating for quite some hours, I gave up and installed an Ubuntu 14.04 LTS on it. The device is certified to it, but the old version did not support all features and even some basic things such as audio did not work. At lest the random freezes were gone and I could work with that machine. Now that the system will not receive updates soon, I gave it another try and setup Mint 18.2 (Sonya). Unfortunately, the Lenovo machine froze again after a few minutes, filling up the log again with the following error messages.

Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025547] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [chrome:13814]
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025549] Modules linked in: bnep ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink ...
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025614] CPU: 6 PID: 13814 Comm: chrome Not tainted 4.8.0-53-generic #56~16.04.1-Ubuntu
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025614] Hardware name: LENOVO 10FHCTO1WW/30BC, BIOS FWKT5FA   11/08/2016
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025615] task: ffff8fd736e12dc0 task.stack: ffff8fd71781c000
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025615] RIP: 0010:[<ffffffff90d0b339>]  [&lt;ffffffff90d0b339&gt;] smp_call_function_many+0x1f9/0x250
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025619] RSP: 0018:ffff8fd71781fc00  EFLAGS: 00000202
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025619] RAX: 0000000000000003 RBX: 0000000000000200 RCX: 0000000000000007
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025620] RDX: ffff8fd85dddd920 RSI: 0000000000000200 RDI: ffff8fd85dd9a288
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025620] RBP: ffff8fd71781fc38 R08: 0000000000000000 R09: 00000000000000bf
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025620] R10: 0000000000000008 R11: ffff8fd85dd9a288 R12: ffff8fd85dd9a288
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025621] R13: ffff8fd85dd9a280 R14: ffffffff90c723c0 R15: 0000000000000000
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025622] FS:  00007f6120196a80(0000) GS:ffff8fd85dd80000(0000) knlGS:0000000000000000
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025622] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025622] CR2: 00003a66b8d61000 CR3: 0000000449abd000 CR4: 00000000003406e0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025623] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025623] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025624] Stack:
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025624]  000000000001a240 0100000000000001 ffff8fd6f7434d80 ffffffff90c723c0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025626]  0000000000000000 ffff8fd71781fd10 ffff8fd71781fc68 ffff8fd71781fc60
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025627]  ffffffff90d0b46d ffff8fd6f7434d80 ffff8fd85ddd4508 ffff8fd71781fd08
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025628] Call Trace:
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025630]  [<ffffffff90c723c0>] ? leave_mm+0xd0/0xd0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025631]  [<ffffffff90d0b46d>] on_each_cpu+0x2d/0x60
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025632]  [<ffffffff90c72c2b>] flush_tlb_kernel_range+0x4b/0x80
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025634]  [<ffffffff90de9f56>] __purge_vmap_area_lazy+0x2d6/0x320
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025635]  [<ffffffff90dea0b7>] vm_unmap_aliases+0x117/0x140
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025636]  [<ffffffff90c6e1ae>] change_page_attr_set_clr+0xee/0x4f0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025638]  [<ffffffff90c6f21f>] set_memory_ro+0x2f/0x40
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025639]  [<ffffffff90d7f11a>] bpf_prog_select_runtime+0x2a/0xd0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025641]  [<ffffffff9139a2af>] bpf_prepare_filter+0x37f/0x3f0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025642]  [<ffffffff9139a47c>] bpf_prog_create_from_user+0xbc/0x120
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025643]  [<ffffffff90d43b30>] ? proc_watchdog_cpumask+0xe0/0xe0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025644]  [<ffffffff90d4410e>] do_seccomp+0x12e/0x610
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025645]  [<ffffffff90c991c6>] ? SyS_prctl+0x46/0x490
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025646]  [<ffffffff90d446fe>] SyS_seccomp+0xe/0x10
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025648]  [<ffffffff9149a876>] entry_SYSCALL_64_fastpath+0x1e/0xa8
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025648] Code: 94 33 00 3b 05 ed 3a e5 00 89 c1 0f 8d 99 fe ff ff 48 98 49 8b ...```

I started the investigation again and found a different trace, which pointed to the graphics card. The important hint and solution came from [SO][1]. Following a few other forum posts, it became clear that the Nvidia drivers do not play nicely with recent kernels for some specific Nvidia cards ind combination with newer kernels. So I followed the proposed steps and disabled the card complete. Just removing the card in the BIOS and uninstalling the drivers was not enough. I also had to blacklist the modules for thenouveau kernel driver.

  1. Disable the Nvidia card in the BIOS and use the Intel onchip GPU
  2. Remove all Nvidia packages:  
    `sudo apt-get remove nvidia* && sudo apt autoremove`
  3. Blacklist the module:   
    `sudo vim /etc/modprobe.d/blacklist.conf`</p> <pre class=""><code>blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off</code>```

    <pre class=""><code>echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf</code>```

    <pre class=""><code>sudo update-initramfs -u</code>```

  4. Reboot

The card is not used any more and the freezes stopped.

*-display UNGEFORDERT Beschreibung: VGA compatible controller Produkt: GK208 [GeForce GT 720] Hersteller: NVIDIA Corporation Physische ID: 0 Bus-Informationen: pci@0000:01:00.0 Version: a1 Breite: 64 bits Takt: 33MHz Fähigkeiten: pm msi pciexpress vga_controller cap_list Konfiguration: latency=0```

I hope I do not have to remove this article again and the system remains as stable as it is now for six hours.

Deploying MySQL in a Local Development Environment

Installing MySQL via apt-get is a simple task, but the migration between different MySQL versions requires planning and testing. Thus installing one central instance of the database system might not be suitable, when the version of MySQL or project specific settings should be switched quickly without interfering with other applications. Using one central instance can quickly become cumbersome. In this article, I will describe how any number of MySQL instances can be stored and executed from within a user’s home directory.

Adapting MySQL Data an Log File Locations

Some scenarios might require to run several MySQL instances at once, other scenarios cover sensitive data, where we do not want MySQL to write any data on non-encrypted partitions. This is especially true for devices which can get easily stolen, for instance laptops. If you use a laptop for developing your applications from time to time, chances are good that you need to store sensitive data in a database, but need to make sure that the data is encrypted when at rest. The data stored in a database needs to be protected when at rest.

This can be solved with full disk encryption, but this technique has several disadvantages. First of all, full disk encryption only utilises one password. This entails, that several users who utilise a device need to share one password, which reduces the reliability of this approach. Also when the system needs to be rebooted, full disk encryption can become an obstacle, which increases the complexity further.

Way easier to use is the transparent home directory encryption, which can be selected during many modern Linux setup procedures out of the box. We will use this encryption type for this article, as it is reasonable secure and easy to setup. Our goal is to store all MySQL related data in the home directory and run MySQL with normal user privileges.

Creating the Directory Structure

The first step is creating a directory structure for storing the data. In this example, the user name is stefan, please adapt to your needs.

Flashing a NanoPc T3 with DietPi

The NanoPc T3 is a 64 bit octa core single board computer, quite similar to the famous Raspberry Pi boards. It is also often referred to as NanoPi T3 as well.

Hardware Specification

The single board computer has eight cores with up to 1.4GhZ and 1 GB of DDR3 RAM. It has a lot of nice interfaces, the specification below is taken from [here][1].

SoC: Samsung S5P6818 Octa-Core Cortex-A53, 400M Hz - 1.4G Hz
    Power Management Unit: AXP228 PMU, it supports software power-off and wake-up.
    System Memory: 1GB/2GB 32bit DDR3 RAM
    Storage: 1 x SD Card Socket
    Ethernet: Gbit Ethernet(RTL8211E)
    WiFi: 802.11b/g/n
    Bluetooth: 4.0 dual mode
    Antenna: Porcelain Antenna IPX Interface
    eMMC: 8GB
    Video Input: DVP Camera/MIPI-CSI (two camera interfaces)
    Video Output: HDMI Type-A / LVDS / Parallel RGB-LCD / MIPI-DSI (four video output interfaces)
    Audio: 3.5 mm audio jack / via HDMI
    Microphone: onboard Microphone
    USB Host: 4 x USB 2.0 Host, two type A ports and two 2.54 mm pitch pin-headers
    MicroUSB: 1 x MicroUSB 2.0 Client, Type A
    LCD Interface: 0.5mm pitch 45 pin FPC seat, full color RGB 8-8-8
    HDMI: 1.4A Type A, 1080P
    DVP Camera: 0.5mm pitch 24 pin FPC seat
    GPIO: 2.54 mm pitch 30 pin-header
    Serial Debug Port: 2.54mm pitch 4-pin-header
    User Key: K1 (power), Reset
    LED: 1 x power LED and 2 x GPIO LED
    Other Resources: CPU’s internal TMU
    RTC Battery: RTC Battery Seat
    Heat Sink: 1 x Heat Sink with mounting holes
    Power: DC 5V/2A
    PCB: Six Layer
    Dimension: 100 mm x 60 mm
    OS/Software: uboot, Android and Debian


The device offers quite a lot considering its small measurements. The picture below is an overview picture taken from [here][2].

[][3]The device with the heat sink and attached cables is shown below.


Comparison with the Raspberry Pi Model 3B

It costs about twice as much as the Raspberry Pi 3, but comes with eight cores at 1.4GHz instead of four cores with 1.2GHz, GBit Ethernet instead of just 100 MBit and several additional interfaces. It has a dedicated power switch, supports soft poweroff and provides reset and boot buttons. It comes with an SD card slot instead of micro SD, has only two standard USB ports but also one micro USB port. This port however is not for powering the device, but only for data.

Some remarks at First

The board can get quite warm, so I would recommend buying the heat sins that fit directly on the board as well. The wifi signal is also rather weak, I would recommend investing in the external antenna if the device is in an area with low signal reception. Also it requires an external 5V power source and does not provide a micro USB port for power like similar boards use.

Buying and Additional Information

The board can be obtained for 60$ from [here][5] and there also exists a [wiki page][1] dedicated to the T3. The images are stored at a One-Click share hoster and the download is very slow. Also the files are not that well organized and can be easily confused with other platforms offered by the same company.

  • Nano PC T3 ($60)
  • Heat sink ($1.99)
  • Power supply ($20)
  • SD card (~ $10)

Additionally there is shipping ($20 to Europe) and also very likely some toll to pay.

Initial Setup

The NanoPi T3 has an internal eMMC storage with 8GB capacity. It comes pre-installed with Android, which is not really useful for my applications. Instead, there exist different ISO images wich can be obtained here. The wiki page documents how to create bootable SD cards with Windows and Linux and there are also scripts offered, which automate the process. Unfortunately, the scripts are not documented well and some of the links are already broken, which reduces the usability of the provided information. Also as the images should be downloaded from some Sharehoster, there is no way of verifying, what kind of image you actually obtained. This is a security risk and not applicable in many scenarios. Fortunately, there also exist alternative images which are more transparent to use.

By default, the device boots from the eMMC flash storage. By pressing the boot button in the lower right corner, we can also boot from the SD card. This is a nice feature, but if you want to reboot the system unattended, then we need to replace the default operating system. In the course of this article, we are going to write an alternative Debian image to the flash memory and boot this OS automatically.


[DietPi][6] is a Debian based distribution, which claims to be an optimized and lightweight alternative for single board PCs. The number of supported devices is impressive and luckily, also the NanoPC T3 is in the list. It also comes with a list of nice features for the configuration and the backup of the system. DietPi can be dowloaded [here][7] and the documentation is available [here][8].

The following steps are requried:

  1. Download the DietPi Image
  2. Write the image to the SD card
  3. Mount the SD card on your desktop and copy the DietPi image to the card
  4. Boot the NanoPC T3 from the card
  5. Flash the DietPi image to the eMMC
  6. Reboot
  7. Configure

Creating a Bootable SD Card

The fist step involves creating a bootable SD card by writing the DietPi image with dd to the card. To do so, download the DietPi image to your local Desktop and then write the file with dd. The process does not differ from other single board machines and is described [here][9]. The next step might seem a bit odd. After you finished writing the SD card, mount it on your local Dekstop and copy the DietPi image to the tmp directory of the SD card.The reason we do this is that we need to have a running Linux system so that we can flash the integrated eMMC of the T3. We then use the DietPi Linux zu actually flash the eMMC of the T3 also with the DietPi image. By copying the image we save some time for downloading and we have the image right available in the next step.

Boot the SD Card

Make sure the T3 is powered off and insert the SD card into the board. Hold and keep pressed the boot button and flip the power switch. The T3 then should boot into the DietPi system. It is easier if you attach a monitor and a keyboard to the system for the further configuration. Alternatively, you can also configure the networking settings in advance, by mounting the SD card at the Desktop and edit the configuration files there, but as we simply use this system for installing the actual operating system, this might be a bit too much effort. Press CRTL+ALT+F2 to switch to a new TTY and login. The standard login for the DietPi system is with the user root and password dietpi.

First, create a backup of the original eMMC content, just in case anything does wrong. Use fdisk, to see the available drives.

Switch the Git Clone Protocol from HTTPS to SSH

Gitlab offers several options for interacting with remote repositories: git, http, https and ssh. The first option – git – is the native transport protocol and does not encrypt the traffic. The same applies for http, rendering https and ssh the only feasible protocols if you commit and retrieve data via insecure networks. Ssh and https are also both available via the web interfaces of Github and Gitlab. In both systems you can simply copy and paste the clone URLs including the protocol. The following screenshot shows the Github version.



The simplest way to fetch the repository is to just copy the default HTTPS URL and clone it to the local drive. Git will ask you for the Github credentials.

:~/Projects$ git clone
Klone nach 'test-project' ...
Username for '': 

You will be asked for the credentials every time you interact with the Github remote repository. Per default, git stores credentials for 5 minutes. Instead of waiting so long, we can just drop the credentials and proceed with an empty cache again.

git credential-cache exit```

To make our live a little easier, we can store the username. In this example, we store this information only locally, valid for this cloned repository only. The same settings can also be applied globally.

<pre class="theme:github lang:default decode:true">git config ""
git config "username"

Git will store that information locally (i.e. inside the repository) in the config file:

Switching Kernels: Using Python 2.7 and Python 3.5 in Jupyter Notebooks

Jupyter Notebooks are a great way for working with Python interactively. The integration of Python code into documents is very useful for reports or for writing executable documentation of algorithms and functions. The text can be structured and exported in various formats. With the ever increasing popularity of Python based on the data science hype, more and more libraries are available. Although Python3 is considered to be the future of Python, consensus on the question Python 2.7 vs Python 3.5 is not yet reached. There are quite a few differences and Python 3 is not backwards compatible and therefore the code cannot be executed with both versions without modification. When you install Jupyter Notebooks via Anaconda, Python3 is recommended but Python 2.7 packages also exist.

As there is a large number of libraries, which have not yet been ported to Python 3, it can be useful to switch between the language version within a Jupyter Notebook. The following example assumes that you have both Python versions already installed.

Installing a new Kernel

In Jupyter Notebooks, the kernel is responsible for executing Python code. When you install the Anaconda System for Python3, this version also becomes the default for the notebooks. In order to enable Python 2.7 in your notebooks, you need to install a new kernel like this:

A MySQL 5.7 Cluster Based on Ubuntu 16.04 LTS – Part 2

In a recent article, I described how to setup a basic MySQL Cluster with two data nodes and a combined SQL and management node. In this article, I am going to highlight a hew more things and we are going to adapt the cluster a little bit.

Using Hostnames

For making our lives easier, we can use hostnames which are easier to remember than IP addresses. Hostnames can be specified for each VM in the file /etc/hosts. For each request to the hostname, the operating system will lookup the corresponding IP address. We need to change this file on all three nodes to the following example:

A MySQL 5.7 Cluster Based on Ubuntu 16.04 LTS – Part 1

A Cluster Scenario

In this example we create the smallest possible MySQL cluster based on four nodes running on three machines. Node 1 will run the cluster management software, Node 2 and Node 3 will serve as dats nodes and Node 4 is the MySQSL API, which runs on the same VM on Node 1.

Parsing SQL Statements

JDBC and the Limits of ResultSet Metadata

For my work in the area of [data citation][1], I need to analyse queries, which are used for creating subsets. I am particularly interested in query parameters, sortings and filters. One of the most commonly used query languages is SQL, which is used by many relational database management systems such as MySQL. In some cases, the interaction with databases is abstract, meaning that there is hardly any SQL statements executed directly. The SQL statements are rather built on the fly by object relational mappers such as Hibernate. Other scenarios use SQL statements as String and also prepared statements, which are executed via JDBC. However,  analysing SQL statements is tricky as the language is very flexible.

In order to understand what columns have been selected, it is sufficient to utilise the ResultSet Metadata and retrieve the column names from there. In my case I need to extract this imformation from the query in advance and potentially enforce a specific sorting by adding columns to the ORDER BY clause. In this scenario, I need to parse the SQL statement and retrieve this information from the statement itself. Probably the best way to do this would be to implement a parser for the SQL dialect with [ANTLR][2] (ANother Tool for Language Recognition). But this is quite a challenge, so I decided to take a shortcut: FoundationDB.

The FoundationDB Parser

[FoundationDB][3] was a NoSQL database which provided several layers for supporting different paradigms at once. I am using past tense here, because the project got [acquired by Apple in 2015][4] and since then does pursue the open source project any more. However, the Maven libraries for the software are still available at [Maven Central][5]. FoundationDB uses its own SQL parser, which understands standard SQL queries. These queries can be interpreted as a tree and the parser library allows traversing SQL statements and analyse the nodes. We can use this tree to parse and interpret SQL statements and extract additional information.

The Foundations of FoundationDB

The FoundationDB parser can be included into your own project with the following Maven dependency:

Create a Category Page for one Specific Category and Exclude this Category from the Main Page in WordPress

This blog serves as my digital notebook for more than eight years and I use to to collect all sorts of things, that I think are worth storing and sharing. Mainly, I blog about tiny technical bits, but recently I also started to write about my life here in Innsbruck, where I try to discover what this small city and its surroundings has to offer. The technical articles are written in English, as naturally the majority of visitors understands this language. The local posts are in German for the same reason. My intention was to separate this two topics in the blog and lot let the posts create any clutter between languages.

Child Themes

When tinkering with the code of your WordPress blog, it is strongly recommended to deploy and use a child theme. This allows to reverse changes easily and more importantly, allows to update the theme without having to re-implement your adaptions after each update. Creating a child theme is very easy and described here. In addition I would recommend using some sort of code versioning tool, such as Git.

Excluding a Category from the Main Page

WordPress offers user defined categories out of the box and category pages for each category. This model does not fit well for my blog, where I have static pages and a time series of blog posts on the main page. In order to prevent that the posts about Innsbruck show up at the main page, the category ‘Innsbruck’ needs to be excluded. We can create or modify the file functions.php in the child theme folder and add the following code.

add_action( 'wp_enqueue_scripts', 'theme_enqueue_styles' );
function theme_enqueue_styles() {
        wp_enqueue_style( 'parent-style', get_template_directory_uri() . '/style.css' );


// Exclude Innsbruck Category from Main Page
function exclude_category($query) {
     if ( $query->is_home() ) {
         // Get the category ID of the category Insbruck
         $innsbruckCategory = get_cat_ID( 'Innsbruck' );
         // Add a minus in front of the string
         $query->set('cat','-' . $innsbruckCategory);

     return $query;
add_filter('pre_get_posts', 'exclude_category');


This adds a filter which gets executed before the posts are collected. We omit all posts of the category Innsbruck, by adding a minus as prefix of the category ID. Of course you could also lookup the category ID in the administration dashboard, by hovering with your mouse over the category name and save one database query.

A Custom Page Specific for one Category

In the second step, create a new page in the dashboard. This page will contain all posts of the Innsbruck category that we will publish. Create the file page.php in your child theme folder and use the following code:

 * The template for displaying all pages.
 * This is the template that displays all pages by default.
 * Please note that this is the WordPress construct of pages
 * and that other 'pages' on your WordPress site will use a
 * different template.
 * @package dazzling

    <div id="primary" class="content-area col-sm-12 col-md-8">
        <main id="main" class="site-main" role="main">

     // Specify the arguments for the post query
     $args = array(
        'cat' => '91', // Innsbruck category id
        'post_type' => 'post',
        'posts_per_page' => 5,
        'paged' => ( get_query_var('paged') ? get_query_var('paged') : 1),

    if( is_page( 'innsbruck' )) {

<?php while ( have_posts() ) : the_post(); ?>
    <?php get_template_part( 'content', 'post' ); ?>
        // If comments are open or we have at least one comment, load up the comment template
        if ( comments_open() || '0' != get_comments_number() ) :
<?php endwhile; // end of the loop. ?>

<div class="navigation">
    <div class="alignleft">&lt;?php next_posts_link('&laquo; Ältere Beiträge') ?&gt;&lt;/div&gt;
    <div class="alignright">&lt;?php previous_posts_link('Neuere Beiträge &raquo;') ?&gt;&lt;/div&gt;

    </main>&lt;!-- #main --&gt;
</div>&lt;!-- #primary --&gt;
<?php get_sidebar(); ?>
<?php get_footer(); ?>

In this code snippet, we define a set of arguments, which are used for filtering the posts of the desired category. In this example, I used the id of the Innsbruck category (91) directly. We define that we want to display posts only, 5 per page. An important aspect is the pagination. When we only display posts of one category, we need to make sure that Worpress counts the pages correctly. Otherwise the page would always display the same posts, regardless how often the user clicks on the next page button. The reason is that this button uses the global paged variable, which is set correctly in the example above.

The if conditional makes sure that only the pages from the Innsbruck category are displayed. The while loop then iterates over all posts and displays them. At the bottom we can see the navigational buttons for older and newer posts of the Innsbruck category.

Persistent Data in a MySQL Docker Container

Running MySQL in Docker

In a recent article on Docker in this blog, we presented some basics for dealing with data in containers. This article will present another popular application for Docker: MySQL containers. Running MySQL instances in Docker allows isolating database infrastructure with ease.

Connecting to the Standard MySQL Container

The description of the MySQL docker image provides a lot of useful information how to launch and connect to a MySQL container. The first step is to create standard MySQL container from the latest available image.

sudo docker run \
   -p 3307:3306 

This creates a MySQL container where the root password is set to secret. As the host is already running its own MySQL instance (which has nothing to do with this docker example), the standard port 3306 is already taken. Thus we publish utilise the port 3307 on the host system and forward it to the 3306 standard port from the container.

Connect from the Host

We can then connect from the command line like this:

mysql -uroot -psecret -h -P3307

We could also provide the hostname localhost for connecting to the container, but as the MySQL client per default assumes that a localhost connection is via a socket, this would not work. Thus when using the hostname localhost, we needed to specify the protocol TCP, wo that the client connects via the network interface.

mysql -uroot -psecret -h localhost --protocol TCP -P3307

Connect from other Containers

Connecting from a different container to the MySQL container is pretty straight forward. Docker allows to link two containers and then use the exposed ports between them. The following command creates a new ubuntu container and links to the MySQL container.

sudo docker run -it --name ubuntu-container --link mysql-instance:mysql-link ubuntu:16.10 bash

After this command, you are in the terminal of the Ubuntu container. We then need to install the MySQL client for testing:

# Fetch the package list
root@7a44b3e7b088:/# apt-get update
# Install the client
root@7a44b3e7b088:/# apt-get install mysql-client
# Show environment variables
root@7a44b3e7b088:/# env

The last command gives you a list of environment variables, among which is the IP address and port of the MySQL container.


You can then connect either manually of by providing the variables

mysql -uroot -psecret -h

If you only require a MySQL client inside a container, simply use the MySQL image from docker. Batteries included!

Persistent Docker Containers

Docker Fundamentals

Docker has become a very popular tool for orchestrating services. Docker it much more lightweight than virtual machines. For instance do containers not require a boot process. Docker follows the philosophy that one container serves only one process. So in contrast to virtual machines which often bundle several services together, Docker is built for running single services per container. If you come from the world of virtualised machines, Docker can be a bit confusing in the beginning, because it uses its own terminology. A good point to start is as always the documentation and there are plenty of great tutorials out there.

Images and Containers

Docker images serves as templates for the containers. As images and containers both have hexadecimal ids they are very easy to confuse. The following example shows step by step how to create a new container based on the Debian image and how to open shell access.

# Create a new docker container based on the debian image
sudo docker create -t --name debian-test debian:stable 
# Start the container
sudo docker start  debian-test
# Check if the container is running
sudo docker ps -a
# Execute bash to get an interactive shell
sudo docker exec -i -t debian-test bash

A shorter variant of creating and launching a new container is listed below. The run command creates a new container and starts it automatically. Be aware that this creates a new container every time, so assigning a container name helps with not confusing the image with the container. The command run is in particular tricky, as you would expect it to run (i.e. launch) a container only. In fact, it creates a new one and starts it.

sudo docker run -it --name debian-test debian:stable bash

Important Commands

The following listing shows the most important commands:

# Show container status
sudo docker ps -a
# List available images
sudo docker images 
# Start or stop a container
sudo docker start CONTAINERNAME
sudo docker stop CONTAINERNAME
# Delete a container
sudo docker rm CONTAINERNAME

You can of course create your own images, which will not be discussed in this blog post. It is just important to know that you can’t move containers from your host so some other machine directly. You would need to commit the changes made to the image and create a new container based on that image. Please be aware that this does not include the actual data stored in that container! You need to manually export any data and files from the original container and import it in the new container again. This is another trap worth noting. You can, however,  also mount data in the image, if the data is available at the host at the time of image creation. Details on data in containers can be found here.

Persisting Data Across Containers

The way how Docker persists data needs getting used to in the beginning, especially as it is easy to confuse images with containers. Remember that Docker images serve only as the template. So when you issue the command sudo docker run …  this actually creates a container from an image first and then starts it. So whenever you issue this command again, you will end up with a new container which does share any data with the previously created container.

Docker 1.9 introduced data volume containers, which allow to create dedicated data containers which can be used from several other containers. Data volume containers can be used for persisting data. The following listing shows how to create a data volume container and mount the volume in a container.

# Create a data volume
sudo docker volume create --name data-volume-test
# List all volumes
sudo docker volume ls
# Delete the container
sudo docker rm debian-test
# Create a new container, now with the data volume 
sudo docker create -v data-volume-test:/test-data -t --name debian-test debian:stable
# Start the container
sudo docker start debian-test
# Get the shell
sudo docker exec -i -t debian-test bash

After we logged into the shell, we can see the data volume we mounted on the directory test-data:

root@d4ac8c89437f:/# ls -la
total 76
drwxr-xr-x  28 root root 4096 Aug  3 13:11 .
drwxr-xr-x  28 root root 4096 Aug  3 13:11 ..
-rwxr-xr-x   1 root root    0 Aug  3 13:10 .dockerenv
drwxr-xr-x   2 root root 4096 Jul 27 20:03 bin
drwxr-xr-x   2 root root 4096 May 30 04:18 boot
drwxr-xr-x   5 root root  380 Aug  3 13:11 dev
drwxr-xr-x  41 root root 4096 Aug  3 13:10 etc
drwxr-xr-x   2 root root 4096 May 30 04:18 home
drwxr-xr-x   9 root root 4096 Nov 27  2014 lib
drwxr-xr-x   2 root root 4096 Jul 27 20:02 lib64
drwxr-xr-x   2 root root 4096 Jul 27 20:02 media
drwxr-xr-x   2 root root 4096 Jul 27 20:02 mnt
drwxr-xr-x   2 root root 4096 Jul 27 20:02 opt
dr-xr-xr-x 267 root root    0 Aug  3 13:11 proc
drwx------   2 root root 4096 Jul 27 20:02 root
drwxr-xr-x   3 root root 4096 Jul 27 20:02 run
drwxr-xr-x   2 root root 4096 Jul 27 20:03 sbin
drwxr-xr-x   2 root root 4096 Jul 27 20:02 srv
dr-xr-xr-x  13 root root    0 Aug  3 13:11 sys
drwxr-xr-x   2 root root 4096 Aug  3 08:26 <span style="color: #0000ff;"><strong>test-data</strong></span>
drwxrwxrwt   2 root root 4096 Jul 27 20:03 tmp
drwxr-xr-x  10 root root 4096 Jul 27 20:02 usr
drwxr-xr-x  11 root root 4096 Jul 27 20:02 var```

We can navigate into that folder and create a 100 M data file with random data.

root@d4ac8c89437f:~# cd /test-data/ root@d4ac8c89437f:/test-data# dd if=/dev/urandom of=100M.dat bs=1M count=100 100+0 records in 100+0 records out 104857600 bytes (105 MB) copied, 6.69175 s, 15.7 MB/s root@d4ac8c89437f:/test-data# du -h . 101M .

When we exit the container, we can see the file in the host file system  here:

stefan@stefan-desktop:~$ sudo ls -l /var/lib/docker/volumes/data-volume-test/_data insgesamt 102400 -rw-r–r– 1 root root 104857600 Aug 3 15:17 100M.dat```

We can use this volume transparently in the container, but it is not depending on the container itself. So whenever we have to delete to container or want to use the data with a different container, this solution works perfectly. Thw following command shows how we mount the same volume in an Ubuntu container and execute the ls command to show the content of the directory.

stefan@stefan-desktop:~$ sudo docker run -it -v data-volume-test:/test-data-from-debian --name ubuntu-test ubuntu:16.10 ls -l /test-data-from-debian
total 102400
-rw-r--r-- 1 root root 104857600 Aug  3 13:17 100M.dat

You can display a lot of usefil information about a container with the inspect command. It also shows the data container and where it is mounted.

sudo docker inspect ubuntu-test

        "Mounts": [
                "Name": "data-volume-test",
                "Source": "/var/lib/docker/volumes/data-volume-test/_data",
                "Destination": "/test-data-from-debian",
                "Driver": "local",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"

We delete the ubuntu container and create a new one. We then start the container, open a bash session and write some test data into the directory.

stefan@stefan-desktop:~$ sudo docker create -v data-volume-test:/test-data-ubuntu -t –name ubuntu-test ubuntu:16.10 f3893d368e11a32fee9b20079c64494603fc532128179f0c08d10321c8c7a166 stefan@stefan-desktop:~$ sudo docker start ubuntu-test ubuntu-test stefan@stefan-desktop:~$ sudo docker exec -it ubuntu-test bash root@f3893d368e11:/# cd /test-data-ubuntu/ root@f3893d368e11:/test-data-ubuntu# ls 100M.dat root@f3893d368e11:/test-data-ubuntu# touch ubuntu-writes-a-file.txt

When we check the Debian container, we can immediately see the written file, as the volume is transparently mounted.

stefan@stefan-desktop:~$ sudo docker exec -i -t debian-test ls -l /test-data total 102400 -rw-r–r– 1 root root 104857600 Aug 3 13:17 100M.dat -rw-r–r– 1 root root 0 Aug 3 13:42 ubuntu-writes-a-file.txt```

Please be aware that the docker volume is just a regular folder on the file system. Writing from both containers the same file can lead to data corruption. Also remember that you can read and write the volume files directly from the host system.

Backups and Migration

Backing up data is also an important aspect when you use named data volumes as shown above. Currently, there is no way of moving Docker containers or volumes natively to a different host. The intention of Docker is to make the creation and destruction  of containers very cheap and easy. So you should not get too attached to your containers, because you can re-create them very fast. This of course is not true for the data stored in volumes. So you need to take care of your data yourself, for instance by creating automated backups like this sudo tar cvfz Backup-data-volume-test.tar.gz /var/lib/docker/volumes/data-volume-test and re-store the data when needed in a new volume. How to backup volumes using a container is described here.

Plotting Colourful Graphs with R, RStudio and Ggplot2

The Aesthetics of Data Science

Data visualization is a powerful tool for communicating results and recently receives more and more attention due to the hype of data science. Integrating a meaningful graph into a paper or your thesis could improve readability and understandability more than any formulas or extended textual descriptions can. There exists a variety of different approaches for visualising data. Recently a lot of new Javascript based frameworks have gained quite some momentum, which can be used in Web applications and apps. A more classical work horse for data science is the R project and its plotting engine ggplot2. The reason why I decide to stick with R is its popularity and flexibility, which is still  impressive. Also with RStudio, there exists a convenient IDE which provides useful features for data scientists.

Plotting Graphs

In this blog post, I demonstrate how to plot time series data and use colours to highlight a specific aspect of data. As almost all techniques, R and ggplot2 require practise and training, which I realised again today when I spent quite a bit of time struggling with getting a simple plot right.

Currently I am evaluating two systems I developed and I needed to visualize their storage and execution time demands in comparison. My goal was to create a plot for each non-functional property, the execution time and the storage demand, while each plot should depict both systems’ performance. Each system runs a set of operations, think of create, read, update and delete operations (CRUD). Now for visualizing which of these operations has the most effects on the system, I needed to colourise each operation within one graph. This is the easy part. What was more tricky is to provide for each graph a defined set of colours, which can be mapped to each instance of the variable. Things which have the same meaning in both graphs should visualized in the same way, which requires a little hack.


Install the following packages via apt

sudo apt-get install r-base r-recommended r-cran-ggplot2

and RStudio by downloading the deb – File from the project homepage.

Evaluation Data

As an example,we plan to evaluate the storage demand of two different systems and compare the results. Consider the following sample data.

# Set seed to get the same random numbers for this example
# Generate 200 random data records
N <- 200
# Generate a random, increasing sequence of integers that we assume is the storage demand in some unit
storage1 =sort(sample(1:100000, size = N, replace = TRUE),decreasing = FALSE)
storage2 = sort(sample(1:100000, size = N, replace = TRUE),decreasing = FALSE)
# Define the operations availabel and draw a random sample
operationTypes = c('CREATE','READ','UPDATE','DELETE')
operations = sample(operationTypes,N,replace=TRUE)
# Create the dataframe
df  df
     id storage1 storage2 operations
1     1       24      238     CREATE
2     2      139     1755     UPDATE
3     3      158     1869     UPDATE
4     4      228     2146       READ
5     5      395     2967     DELETE
6     6      734     3252     CREATE
7     7      789     4049     DELETE
8     8     2909     4109       READ
9     9     3744     4835     CREATE
10   10     3894     4990       READ


We created a random data set simulating the characteristics of system measurement data. As you can see, we have a list of operations of the four types CREATE, READ, UPDATE and DELETE and a measurement value for the storage demand in both systems.

The Simple Plot

Plotting two graphs of thecolumns storage1 and storage2 is straight forward.

# Simple plot
p1 <- ggplot(df, aes(x,y)) +
  geom_point(aes(x=id,y=storage1,color="Storage 1")) +
  geom_point(aes(x=id,y=storage2,color="Storage 2")) +
  ggtitle("Overview of Measurements") +
  xlab("Number of Operations") +
  ylab("Storage Demand in MB") +
  scale_color_manual(values=c("Storage 1"="forestgreen", "Storage 2"="aquamarine"), 
                     name="Measurements", labels=c("System 1", "System 2"))


We assign for each point plot a color. Note that the color nme “Storage 1” for instance of course does not denote a color, but it assignes a level for all points of the graph. This level can be thought of as a category, which ensures that all the points which belong to the same category have the same color. As you can see at the definition of the color scale, we assign the actual color to this level there.  This is the result:

Plotting Levels

A common task is to visualise categories or levels of measurement data. In this example, there are four different levels we could observe: CREATE, READ, UPDATE and DELETE.

# Plot with levels
p1 <- ggplot(df, aes(x,y)) +
  geom_point(aes(x=id,y=storage1,color=operations)) +
  geom_point(aes(x=id,y=storage2,color=operations)) +
  ggtitle("Overview of Measurements") +
  labs(color="Measurements") +

Instead of assigning two colours, one for each graph, we can also assign colours to the operations. As you can see in the definition of the graphs and the colour scale, we map the colours to the variable operations instead. As a result we get differently coloured points per operation, but we get these of course for both graphs in an identical fashion as the categories are the same for both measurements. The result looks like this:

Now this is obviously not what we want to achieve as we cannot differentiate between the two graphs any more.

Plotting the same Levels for both Graphs in Different Colours

This last part is a bit tricky, as ggplot2 does not allow assigning different colour schemes within one plot. There do exist some hacks for this, but the solution does not improve the readability of the code in my opinion. In order to apply different colour schemes for the two graphs while still using the categories, I appended two extra columns to the data set. If we append some differentiation between the two graphs and basically double the categories from four to eight, where each graph now uses its own four categories, we can also assign distinct colours to them.

df$operationsStorage1 <- paste(df$operations,"-Storage1", sep = '')
df$operationsStorage2 <- paste(df$operations,"-Storage2", sep = '')

p3 <- ggplot(df, aes(x,y)) +
  geom_point(aes(x=id,y=storage1,color=operationsStorage1)) +
  geom_point(aes(x=id,y=storage2,color=operationsStorage2)) +
  ggtitle("Overview of Measurements") +
  xlab("Number of Operations") +
  ylab("Storage Demand in MB") +
  labs(color="Operations") +

We then assign the new column for each system individually as colour value. This ensures that each graph only considers the categories that we assigned in this step. Thus we can assign a different color scheme for wach graph and print the corresponding colours in the label (legend) next to the chart. This is the result:

Now we can see which operation was used at every measurement and still be able to distinguish between the two systems.

Timelape Photography with the Camera Module V2 and a Raspberry Pi Model B

Recently, I bought a camera module for the Raspberry Pi and experimented a little bit with the possibilities a scriptable camera provides. The new Camera Module V2 offers 8.08 MP from a Sony sensor and can be controlled with a well documented Python library. It allows to take HD videos and shoot still images. Assembly is easy, but as the camera is attached with a rather short ribbon cable, which renders the handling is a bit cumbersome. For the moment, a modified extra hand from my soldering kit acts as a makeshift.

Initial Setup

The initial setup is easy and just requires a few steps, which is not surprising because most of the documentation is targeted to kids in order to encourage their inner nerd. Still works for me as well 🙂

Attach the cable to the raspberry pi as described here. You can also buy an adapter for the Pi Zero. Once the camera is wired to the board, activate the module with the tool raspi-config.

Then you are ready to install the Python library with sudo apt-get install python3-picamera, add your user to the video group with usermod -a -G video USERNAME  and then reboot the Raspberry. After you logged in again, you can start taking still images with the simple command raspistill -o output.jpg. You can find some more documentation and usage examples here.

Timelapse Photography

What I really enjoy is making timelapse videos with the Raspberry Pi, which gives a nice effect for everyday phenomena and allows to observe processes which are usually too slow to follow. The following Gif shows a melting ice cube. I took one picture every five seconds.

A Small Python Script

The following script creates a series of pictures with a defined interval and stores all images with a filename indicating the time of shooting in a folder. It is rather self explanatory. The camera needs a little bit of time to adjust, so we set the adjustTime variable to 5 seconds. Then we take a picture every 300 seconds, each image has a resolution of 1024×768 pixels.

import os
import time
import picamera
from datetime import datetime

# Grab the current datetime which will be used to generate dynamic folder names
d =
initYear = "%04d" % (d.year)
initMonth = "%02d" % (d.month)
initDate = "%02d" % (
initHour = "%02d" % (d.hour)
initMins = "%02d" % (d.minute)
initSecs = "%02d" % (d.second)

folderToSave = "timelapse_" + str(initYear) + str(initMonth) + str(initDate) +"_"+ str(initHour) + str(initMins)

# Set the initial serial for saved images to 1
fileSerial = 1

# Create and configure the camera

# Create and configure the camera
with picamera.PiCamera() as camera:
    camera.resolution = (1024, 768)
    #camera.exposure_compensation = 5

    # Start the preview and give the camera a couple of seconds to adjust

        start = time.time()
        while True:
            d =
            # Set FileSerialNumber to 000X using four digits
            fileSerialNumber = "%04d" % (fileSerial)

            # Capture the CURRENT time (not start time as set above) to insert into each capture image filename
            hour = "%02d" % (d.hour)
            mins = "%02d" % (d.minute)
            secs = "%02d" % (d.second)
            camera.capture(str(folderToSave) + "/" + str(fileSerialNumber) + "_" + str(hour) + str(mins) + str(secs) + ".jpg")

            # Increment the fileSerial
            fileSerial += 1

    except KeyboardInterrupt:
        print ('interrupted!')
        # Stop the preview and close the camera

finish = time.time()
print("Captured %d images in %d seconds" % (fileSerial,finish - start))

This script then can run unattended and it creates a batch of images on the Raspberry Pi.

Image Metadata

The file name preserves the time of the shot, so that we can see later when a picture was taken. But the tool also stores EXIF metadata, which can be used for processing. You can view the data with the exiftool.

>ExifTool Version Number         : 9.46
File Name                       : 1052.jpg
Directory                       : .
File Size                       : 483 kB
File Modification Date/Time     : 2016:07:08 08:49:52+02:00
File Access Date/Time           : 2016:07:08 09:19:14+02:00
File Inode Change Date/Time     : 2016:07:08 09:17:52+02:00
File Permissions                : rw-r--r--
File Type                       : JPEG
MIME Type                       : image/jpeg
Exif Byte Order                 : Big-endian (Motorola, MM)
Make                            : RaspberryPi
Camera Model Name               : RP_b'imx219'
X Resolution                    : 72
Y Resolution                    : 72
Resolution Unit                 : inches
Modify Date                     : 2016:07:05 08:37:33
Y Cb Cr Positioning             : Centered
Exposure Time                   : 1/772
F Number                        : 2.0
Exposure Program                : Aperture-priority AE
ISO                             : 50
Exif Version                    : 0220
Date/Time Original              : 2016:07:05 08:37:33
Create Date                     : 2016:07:05 08:37:33
Components Configuration        : Y, Cb, Cr, -
Shutter Speed Value             : 1/772
Aperture Value                  : 2.0
Brightness Value                : 2.99
Max Aperture Value              : 2.0
Metering Mode                   : Center-weighted average
Flash                           : No Flash
Focal Length                    : 3.0 mm
Maker Note Unknown Text         : (Binary data 332 bytes, use -b option to extract)
Flashpix Version                : 0100
Color Space                     : sRGB
Exif Image Width                : 1024
Exif Image Height               : 768
Interoperability Index          : R98 - DCF basic file (sRGB)
Exposure Mode                   : Auto
White Balance                   : Auto
Compression                     : JPEG (old-style)
Thumbnail Offset                : 1054
Thumbnail Length                : 24576
Image Width                     : 1024
Image Height                    : 768
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
Aperture                        : 2.0
Image Size                      : 1024x768
Shutter Speed                   : 1/772
Thumbnail Image                 : (Binary data 24576 bytes, use -b option to extract)
Focal Length                    : 3.0 mm
Light Value                     : 12.6

Processing Images

The Raspberry Pi would need a lot of time to create an animated Gif or a video from these images. This is why I decided to add new images automatically to a Git repository on Github and fetch the results on my Desktop PC. I created a new Git repository and adapted the script shown above to store the images within the folder of the repository. I then use the following script to add and push the images to Github using a cronjob.

>cd /home/stefan/Github/Timelapses
now=$(date +"%m_%d_%Y %H %M %S")
echo $now
git pull
git add *.jpg
git commit -am "New pictures added $now"
git push

You can add this to you user’s cron table with crontab -e and the following line, which adds the images every 5 minutes,

*/5	*	*	*	*	/home/stefan/Github/Timelapses/

On a more potent machine, you can clone the repository and pull the new images like this:

cd /home/stefan-srv/Github/Timelapses
now=$(date +"%m_%d_%Y %H %M %S")
echo "$now"
git pull --rebase

The file names are convenient for being able to read the date when it was taken, but most of the Linux tools require the files to be named within a sequence. The following code snippet renames the files into a sequence with four digits and pads them with zeros if possible.

for i in *.jpg; do
  new=$(printf "%04d.jpg" "$a") #04 pad to length of 4
  mv -- "$i" "$new"
  let a=a+1

Animated Gifs

Imagemagick offers a set of great tools for images processing. With its submodule convert, you can create animated Gifs from a series of images like this:

convert -delay 10 -loop 0 *.jpg Output.gif

This adds a delay after each images and loops the gif images infinitely. ImageMagick requires a lot of RAM for larger Gif images and does not handle memory allocation well, but the results are still nice. Note that the files get very large, so a smaller resolution might be more practical.

Still Images to Videos

The still images can also be converted in videos. Use the following command to create an image with 10 frames per second:

>avconv -framerate 10 -f image2 -i %04d.jpg -c:v h264 -crf 1

Example: Nordkette at Innsbruck, Tirol

This timelapse video of the Inn Valley Range in the north of the city of Innsbruck has been created by taking a picture with a Raspberry Pi Camera Module V2 every 5 minutes. This video consists of 1066 still images.

IntelliJ IDEA and the ClassNotFoundException

When compiling nested Maven projects in Idea, sometimes the compiler complains about a missing class file.

This occurs on several occasions, depending which part of a project is compiled and what dependencies have been considered. If the project is large this can easily happen, when a specific class should be compiled without having the complete context available. Besides the tipps such as invalidate caches and the ones I found here and here, editing the build configuration of a project helps. Add the a task “Make Project” and the correct class files should be compiled and available.