Allgemein

A MySQL 5.7 Cluster Based on Ubuntu 16.04 LTS – Part 1

A Cluster Scenario

In this example we create the smallest possible MySQL cluster based on four nodes running on three machines. Node 1 will run the cluster management software, Node 2 and Node 3 will serve as dats nodes and Node 4 is the MySQSL API, which runs on the same VM on Node 1.

Das Phrasensammelsurium

Wer viel liest, dem stechen sie unweigerlich ins Auge: sinnbefreite und nervige Phrasen, die sich wie aus dem Nichts in verschiedenen Medien erscheinen und sich plötzlich überall ausbreiten. Journalisten, Autoren und Wissenschafter sind meist Vielleser und machen sich – ganz unbewusst – Ausdrucksweisen, einzelne Begriffe und ganze Phrasen zu eigen. Diese übernehmen sie dann in ihren eigenen Wortschatz und das Drama nimmt seinen Lauf.

Aufgrund mangelnden Detailwissens meinerseits möchte und kann ich gar nicht auf die sprachwissenschaftlichen Hintergründe eingehen. Man kann sich aber beispielsweise mit Hilfe des Google Ngram Viewers ansehen, wann bestimmte Begriffe im deutschen Buch-Korpus auftauchen. Leider gibt es diese Daten für die deutsche Sprache nur bis einschließlich 2008, weswegen ganz aktuelle Phrasen noch nicht enthalten sind. Abgebildet ist ein Beispiel für das Wort Narrativ, das nun allenthalben herhalten muss.

Selbstverständlich gibt es verschiedenste Projekte, wie beispielsweise die Floskelwoche oder dieser Artikel im Österreichischen Journalist, die sich ganz diesem Thema verschrieben haben. Ich möchte hier dennoch meine persönliche Liste nervtötender Phrasen und Füllwörter in mehr oder weniger alphabetischer Reihenfolge festhalten. Einsendungen sind sehr willkommen.

  • “Aber nun der Reihe nach …”
  • Alternativlos
  • “Am Ende des Tages”
  • “Ganz einem Thema verschrieben”
  • Inflationär
  • “X kann Y”. Beispiel: “Karl-Heinz kann Social Media”
  • Narrativ
  • “X neu denken”.
  • Postfaktisch
  • “So muss X”. Beispiel: “So muss Technik
  • Spannend!
  • “Wir sind X”. Beispiel: “Wir sind Papst

Phrase einreichen

Sollsteinhaus

Das Sollsteinhaus befindet sich oberhalb von Hochzirl und lässt sich öffentlich sehr gut erreichen. Mit der S5 geht es vom Haupt- oder Westbahnhof in einer guten Viertelstunde bis zum Bahnhof Hochzirl. Von Innsbruck kommend kann man direkt durch das am (Zug-) Ende des nördlichen (bergseitigen) Bahnsteig gelegene Gatter gehen und man befindet sich bereits am gut ausgeschilderten Wanderweg 213 zum Sollsteinhaus. Zu Beginn geht es gemütlich durch den Wald und nach wenigen Minuten erreicht man eine Forststraße. Diese gestaltet sich etwas steil und kräftezehrend. Dieser Forststraße folgt man etwa 1.5 Stunden entland der sehr guten Beschilderung, bis man die Materialseilbahn des Sollsteinhauses erreicht. Nach einer weiteren halben Stunde erreicht man die private Solnalm, die man nicht fälschlicherweise schon für das ziel halten sollte. Weiter geht es wieder eintlang schmaler Wege, durch ein Bachbett und in wenigen Serpentinen hinauf bis zum Sollsteinhaus auf 1805m Seehöhe. Leider wird das Sollsteinhaus seit 25.09.2016 (Stand 27.09.2016) renoviert und ist daher geschlossen. Die Wanderung ist auch beim Portal Almenrausch gut beschrieben. Die Gesamtwanderzeit betrug bis zum Sollsteinhaus etwa 2.5 Stunden.

Persistent Data in a MySQL Docker Container

Running MySQL in Docker

In a recent article on Docker in this blog, we presented some basics for dealing with data in containers. This article will present another popular application for Docker: MySQL containers. Running MySQL instances in Docker allows isolating database infrastructure with ease.

Connecting to the Standard MySQL Container

The description of the MySQL docker image provides a lot of useful information how to launch and connect to a MySQL container. The first step is to create standard MySQL container from the latest available image.

sudo docker run \
   --name=mysql-instance 
   -e MYSQL_ROOT_PASSWORD=secret 
   -p 3307:3306 
   -d 
   mysql:latest

This creates a MySQL container where the root password is set to secret. As the host is already running its own MySQL instance (which has nothing to do with this docker example), the standard port 3306 is already taken. Thus we publish utilise the port 3307 on the host system and forward it to the 3306 standard port from the container.

Connect from the Host

We can then connect from the command line like this:

mysql -uroot -psecret -h 127.0.0.1 -P3307

We could also provide the hostname localhost for connecting to the container, but as the MySQL client per default assumes that a localhost connection is via a socket, this would not work. Thus when using the hostname localhost, we needed to specify the protocol TCP, wo that the client connects via the network interface.

mysql -uroot -psecret -h localhost --protocol TCP -P3307

Connect from other Containers

Connecting from a different container to the MySQL container is pretty straight forward. Docker allows to link two containers and then use the exposed ports between them. The following command creates a new ubuntu container and links to the MySQL container.

sudo docker run -it --name ubuntu-container --link mysql-instance:mysql-link ubuntu:16.10 bash

After this command, you are in the terminal of the Ubuntu container. We then need to install the MySQL client for testing:

# Fetch the package list
root@7a44b3e7b088:/# apt-get update
# Install the client
root@7a44b3e7b088:/# apt-get install mysql-client
# Show environment variables
root@7a44b3e7b088:/# env

The last command gives you a list of environment variables, among which is the IP address and port of the MySQL container.

MYSQL_LINK_NAME=/ubuntu-container/mysql-link
HOSTNAME=7a44b3e7b088
TERM=xterm
MYSQL_LINK_ENV_MYSQL_VERSION=5.7.14-1debian8
MYSQL_LINK_PORT=tcp://172.17.0.2:3306
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MYSQL_LINK_PORT_3306_TCP_ADDR=172.17.0.2
MYSQL_LINK_PORT_3306_TCP=tcp://172.17.0.2:3306
PWD=/
MYSQL_LINK_PORT_3306_TCP_PORT=3306
SHLVL=1
HOME=/root
MYSQL_LINK_ENV_MYSQL_MAJOR=5.7
MYSQL_LINK_PORT_3306_TCP_PROTO=tcp
MYSQL_LINK_ENV_GOSU_VERSION=1.7
MYSQL_LINK_ENV_MYSQL_ROOT_PASSWORD=secret
_=/usr/bin/env

You can then connect either manually of by providing the variables

mysql -uroot -psecret -h 172.17.0.2
mysql -uroot -p$MYSQL_LINK_ENV_MYSQL_ROOT_PASSWORD -h $MYSQL_LINK_PORT_3306_TCP_ADDR -P $MYSQL_LINK_PORT_3306_TCP_PORT

If you only require a MySQL client inside a container, simply use the MySQL image from docker. Batteries included!

Persistent Docker Containers

Docker Fundamentals

Docker has become a very popular tool for orchestrating services. Docker it much more lightweight than virtual machines. For instance do containers not require a boot process. Docker follows the philosophy that one container serves only one process. So in contrast to virtual machines which often bundle several services together, Docker is built for running single services per container. If you come from the world of virtualised machines, Docker can be a bit confusing in the beginning, because it uses its own terminology. A good point to start is as always the documentation and there are plenty of great tutorials out there.

Images and Containers

Docker images serves as templates for the containers. As images and containers both have hexadecimal ids they are very easy to confuse. The following example shows step by step how to create a new container based on the Debian image and how to open shell access.

# Create a new docker container based on the debian image
sudo docker create -t --name debian-test debian:stable 
# Start the container
sudo docker start  debian-test
# Check if the container is running
sudo docker ps -a
# Execute bash to get an interactive shell
sudo docker exec -i -t debian-test bash

A shorter variant of creating and launching a new container is listed below. The run command creates a new container and starts it automatically. Be aware that this creates a new container every time, so assigning a container name helps with not confusing the image with the container. The command run is in particular tricky, as you would expect it to run (i.e. launch) a container only. In fact, it creates a new one and starts it.

sudo docker run -it --name debian-test debian:stable bash

Important Commands

The following listing shows the most important commands:

# Show container status
sudo docker ps -a
# List available images
sudo docker images 
# Start or stop a container
sudo docker start CONTAINERNAME
sudo docker stop CONTAINERNAME
# Delete a container
sudo docker rm CONTAINERNAME

You can of course create your own images, which will not be discussed in this blog post. It is just important to know that you can’t move containers from your host so some other machine directly. You would need to commit the changes made to the image and create a new container based on that image. Please be aware that this does not include the actual data stored in that container! You need to manually export any data and files from the original container and import it in the new container again. This is another trap worth noting. You can, however,  also mount data in the image, if the data is available at the host at the time of image creation. Details on data in containers can be found here.

Persisting Data Across Containers

The way how Docker persists data needs getting used to in the beginning, especially as it is easy to confuse images with containers. Remember that Docker images serve only as the template. So when you issue the command sudo docker run …  this actually creates a container from an image first and then starts it. So whenever you issue this command again, you will end up with a new container which does share any data with the previously created container.

Docker 1.9 introduced data volume containers, which allow to create dedicated data containers which can be used from several other containers. Data volume containers can be used for persisting data. The following listing shows how to create a data volume container and mount the volume in a container.

# Create a data volume
sudo docker volume create --name data-volume-test
# List all volumes
sudo docker volume ls
# Delete the container
sudo docker rm debian-test
# Create a new container, now with the data volume 
sudo docker create -v data-volume-test:/test-data -t --name debian-test debian:stable
# Start the container
sudo docker start debian-test
# Get the shell
sudo docker exec -i -t debian-test bash

After we logged into the shell, we can see the data volume we mounted on the directory test-data:

root@d4ac8c89437f:/# ls -la
total 76
drwxr-xr-x  28 root root 4096 Aug  3 13:11 .
drwxr-xr-x  28 root root 4096 Aug  3 13:11 ..
-rwxr-xr-x   1 root root    0 Aug  3 13:10 .dockerenv
drwxr-xr-x   2 root root 4096 Jul 27 20:03 bin
drwxr-xr-x   2 root root 4096 May 30 04:18 boot
drwxr-xr-x   5 root root  380 Aug  3 13:11 dev
drwxr-xr-x  41 root root 4096 Aug  3 13:10 etc
drwxr-xr-x   2 root root 4096 May 30 04:18 home
drwxr-xr-x   9 root root 4096 Nov 27  2014 lib
drwxr-xr-x   2 root root 4096 Jul 27 20:02 lib64
drwxr-xr-x   2 root root 4096 Jul 27 20:02 media
drwxr-xr-x   2 root root 4096 Jul 27 20:02 mnt
drwxr-xr-x   2 root root 4096 Jul 27 20:02 opt
dr-xr-xr-x 267 root root    0 Aug  3 13:11 proc
drwx------   2 root root 4096 Jul 27 20:02 root
drwxr-xr-x   3 root root 4096 Jul 27 20:02 run
drwxr-xr-x   2 root root 4096 Jul 27 20:03 sbin
drwxr-xr-x   2 root root 4096 Jul 27 20:02 srv
dr-xr-xr-x  13 root root    0 Aug  3 13:11 sys
drwxr-xr-x   2 root root 4096 Aug  3 08:26 <span style="color: #0000ff;"><strong>test-data</strong></span>
drwxrwxrwt   2 root root 4096 Jul 27 20:03 tmp
drwxr-xr-x  10 root root 4096 Jul 27 20:02 usr
drwxr-xr-x  11 root root 4096 Jul 27 20:02 var```


We can navigate into that folder and create a 100 M data file with random data.

root@d4ac8c89437f:~# cd /test-data/ root@d4ac8c89437f:/test-data# dd if=/dev/urandom of=100M.dat bs=1M count=100 100+0 records in 100+0 records out 104857600 bytes (105 MB) copied, 6.69175 s, 15.7 MB/s root@d4ac8c89437f:/test-data# du -h . 101M .



When we exit the container, we can see the file in the host file system  here:

stefan@stefan-desktop:~$ sudo ls -l /var/lib/docker/volumes/data-volume-test/_data insgesamt 102400 -rw-r–r– 1 root root 104857600 Aug 3 15:17 100M.dat```

We can use this volume transparently in the container, but it is not depending on the container itself. So whenever we have to delete to container or want to use the data with a different container, this solution works perfectly. Thw following command shows how we mount the same volume in an Ubuntu container and execute the ls command to show the content of the directory.

stefan@stefan-desktop:~$ sudo docker run -it -v data-volume-test:/test-data-from-debian --name ubuntu-test ubuntu:16.10 ls -l /test-data-from-debian
total 102400
-rw-r--r-- 1 root root 104857600 Aug  3 13:17 100M.dat

You can display a lot of usefil information about a container with the inspect command. It also shows the data container and where it is mounted.

sudo docker inspect ubuntu-test

...
        "Mounts": [
            {
                "Name": "data-volume-test",
                "Source": "/var/lib/docker/volumes/data-volume-test/_data",
                "Destination": "/test-data-from-debian",
                "Driver": "local",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
...```


We delete the ubuntu container and create a new one. We then start the container, open a bash session and write some test data into the directory.

stefan@stefan-desktop:~$ sudo docker create -v data-volume-test:/test-data-ubuntu -t –name ubuntu-test ubuntu:16.10 f3893d368e11a32fee9b20079c64494603fc532128179f0c08d10321c8c7a166 stefan@stefan-desktop:~$ sudo docker start ubuntu-test ubuntu-test stefan@stefan-desktop:~$ sudo docker exec -it ubuntu-test bash root@f3893d368e11:/# cd /test-data-ubuntu/ root@f3893d368e11:/test-data-ubuntu# ls 100M.dat root@f3893d368e11:/test-data-ubuntu# touch ubuntu-writes-a-file.txt



When we check the Debian container, we can immediately see the written file, as the volume is transparently mounted.

stefan@stefan-desktop:~$ sudo docker exec -i -t debian-test ls -l /test-data total 102400 -rw-r–r– 1 root root 104857600 Aug 3 13:17 100M.dat -rw-r–r– 1 root root 0 Aug 3 13:42 ubuntu-writes-a-file.txt```

Please be aware that the docker volume is just a regular folder on the file system. Writing from both containers the same file can lead to data corruption. Also remember that you can read and write the volume files directly from the host system.

Backups and Migration

Backing up data is also an important aspect when you use named data volumes as shown above. Currently, there is no way of moving Docker containers or volumes natively to a different host. The intention of Docker is to make the creation and destruction  of containers very cheap and easy. So you should not get too attached to your containers, because you can re-create them very fast. This of course is not true for the data stored in volumes. So you need to take care of your data yourself, for instance by creating automated backups like this sudo tar cvfz Backup-data-volume-test.tar.gz /var/lib/docker/volumes/data-volume-test and re-store the data when needed in a new volume. How to backup volumes using a container is described here.

Plotting Colourful Graphs with R, RStudio and Ggplot2

The Aesthetics of Data Science

Data visualization is a powerful tool for communicating results and recently receives more and more attention due to the hype of data science. Integrating a meaningful graph into a paper or your thesis could improve readability and understandability more than any formulas or extended textual descriptions can. There exists a variety of different approaches for visualising data. Recently a lot of new Javascript based frameworks have gained quite some momentum, which can be used in Web applications and apps. A more classical work horse for data science is the R project and its plotting engine ggplot2. The reason why I decide to stick with R is its popularity and flexibility, which is still  impressive. Also with RStudio, there exists a convenient IDE which provides useful features for data scientists.

Plotting Graphs

In this blog post, I demonstrate how to plot time series data and use colours to highlight a specific aspect of data. As almost all techniques, R and ggplot2 require practise and training, which I realised again today when I spent quite a bit of time struggling with getting a simple plot right.

Currently I am evaluating two systems I developed and I needed to visualize their storage and execution time demands in comparison. My goal was to create a plot for each non-functional property, the execution time and the storage demand, while each plot should depict both systems’ performance. Each system runs a set of operations, think of create, read, update and delete operations (CRUD). Now for visualizing which of these operations has the most effects on the system, I needed to colourise each operation within one graph. This is the easy part. What was more tricky is to provide for each graph a defined set of colours, which can be mapped to each instance of the variable. Things which have the same meaning in both graphs should visualized in the same way, which requires a little hack.

Prerequisits

Install the following packages via apt

sudo apt-get install r-base r-recommended r-cran-ggplot2

and RStudio by downloading the deb – File from the project homepage.

Evaluation Data

As an example,we plan to evaluate the storage demand of two different systems and compare the results. Consider the following sample data.

# Set seed to get the same random numbers for this example
set.seed(42);
# Generate 200 random data records
N <- 200
# Generate a random, increasing sequence of integers that we assume is the storage demand in some unit
storage1 =sort(sample(1:100000, size = N, replace = TRUE),decreasing = FALSE)
storage2 = sort(sample(1:100000, size = N, replace = TRUE),decreasing = FALSE)
# Define the operations availabel and draw a random sample
operationTypes = c('CREATE','READ','UPDATE','DELETE')
operations = sample(operationTypes,N,replace=TRUE)
# Create the dataframe
df  df
     id storage1 storage2 operations
1     1       24      238     CREATE
2     2      139     1755     UPDATE
3     3      158     1869     UPDATE
4     4      228     2146       READ
5     5      395     2967     DELETE
6     6      734     3252     CREATE
7     7      789     4049     DELETE
8     8     2909     4109       READ
9     9     3744     4835     CREATE
10   10     3894     4990       READ

....

We created a random data set simulating the characteristics of system measurement data. As you can see, we have a list of operations of the four types CREATE, READ, UPDATE and DELETE and a measurement value for the storage demand in both systems.

The Simple Plot

Plotting two graphs of thecolumns storage1 and storage2 is straight forward.

# Simple plot
p1 <- ggplot(df, aes(x,y)) +
  geom_point(aes(x=id,y=storage1,color="Storage 1")) +
  geom_point(aes(x=id,y=storage2,color="Storage 2")) +
  ggtitle("Overview of Measurements") +
  xlab("Number of Operations") +
  ylab("Storage Demand in MB") +
  scale_color_manual(values=c("Storage 1"="forestgreen", "Storage 2"="aquamarine"), 
                     name="Measurements", labels=c("System 1", "System 2"))

print(p1)

We assign for each point plot a color. Note that the color nme “Storage 1” for instance of course does not denote a color, but it assignes a level for all points of the graph. This level can be thought of as a category, which ensures that all the points which belong to the same category have the same color. As you can see at the definition of the color scale, we assign the actual color to this level there.  This is the result:

Plotting Levels

A common task is to visualise categories or levels of measurement data. In this example, there are four different levels we could observe: CREATE, READ, UPDATE and DELETE.

# Plot with levels
p1 <- ggplot(df, aes(x,y)) +
  geom_point(aes(x=id,y=storage1,color=operations)) +
  geom_point(aes(x=id,y=storage2,color=operations)) +
  ggtitle("Overview of Measurements") +
  labs(color="Measurements") +
  scale_color_manual(values=c("CREATE"="darkgreen", 
                              "READ"="darkolivegreen", 
                              "UPDATE"="forestgreen", 
                              "DELETE"="yellowgreen"))
print(p1)

Instead of assigning two colours, one for each graph, we can also assign colours to the operations. As you can see in the definition of the graphs and the colour scale, we map the colours to the variable operations instead. As a result we get differently coloured points per operation, but we get these of course for both graphs in an identical fashion as the categories are the same for both measurements. The result looks like this:

Now this is obviously not what we want to achieve as we cannot differentiate between the two graphs any more.

Plotting the same Levels for both Graphs in Different Colours

This last part is a bit tricky, as ggplot2 does not allow assigning different colour schemes within one plot. There do exist some hacks for this, but the solution does not improve the readability of the code in my opinion. In order to apply different colour schemes for the two graphs while still using the categories, I appended two extra columns to the data set. If we append some differentiation between the two graphs and basically double the categories from four to eight, where each graph now uses its own four categories, we can also assign distinct colours to them.

df$operationsStorage1 <- paste(df$operations,"-Storage1", sep = '')
df$operationsStorage2 <- paste(df$operations,"-Storage2", sep = '')

p3 <- ggplot(df, aes(x,y)) +
  geom_point(aes(x=id,y=storage1,color=operationsStorage1)) +
  geom_point(aes(x=id,y=storage2,color=operationsStorage2)) +
  ggtitle("Overview of Measurements") +
  xlab("Number of Operations") +
  ylab("Storage Demand in MB") +
  labs(color="Operations") +
  scale_color_manual(values=c("CREATE-Storage1"="darkgreen", 
                              "READ-Storage1"="darkolivegreen", 
                              "UPDATE-Storage1"="forestgreen", 
                              "DELETE-Storage1"="yellowgreen",
                              "CREATE-Storage2"="aquamarine", 
                              "READ-Storage2"="dodgerblue",
                              "UPDATE-Storage2"="royalblue",
                              "DELETE-Storage2"="turquoise"))
print(p3)

We then assign the new column for each system individually as colour value. This ensures that each graph only considers the categories that we assigned in this step. Thus we can assign a different color scheme for wach graph and print the corresponding colours in the label (legend) next to the chart. This is the result:

Now we can see which operation was used at every measurement and still be able to distinguish between the two systems.

Timelape Photography with the Camera Module V2 and a Raspberry Pi Model B

Recently, I bought a camera module for the Raspberry Pi and experimented a little bit with the possibilities a scriptable camera provides. The new Camera Module V2 offers 8.08 MP from a Sony sensor and can be controlled with a well documented Python library. It allows to take HD videos and shoot still images. Assembly is easy, but as the camera is attached with a rather short ribbon cable, which renders the handling is a bit cumbersome. For the moment, a modified extra hand from my soldering kit acts as a makeshift.

Initial Setup

The initial setup is easy and just requires a few steps, which is not surprising because most of the documentation is targeted to kids in order to encourage their inner nerd. Still works for me as well 🙂

Attach the cable to the raspberry pi as described here. You can also buy an adapter for the Pi Zero. Once the camera is wired to the board, activate the module with the tool raspi-config.

Then you are ready to install the Python library with sudo apt-get install python3-picamera, add your user to the video group with usermod -a -G video USERNAME  and then reboot the Raspberry. After you logged in again, you can start taking still images with the simple command raspistill -o output.jpg. You can find some more documentation and usage examples here.

Timelapse Photography

What I really enjoy is making timelapse videos with the Raspberry Pi, which gives a nice effect for everyday phenomena and allows to observe processes which are usually too slow to follow. The following Gif shows a melting ice cube. I took one picture every five seconds.

A Small Python Script

The following script creates a series of pictures with a defined interval and stores all images with a filename indicating the time of shooting in a folder. It is rather self explanatory. The camera needs a little bit of time to adjust, so we set the adjustTime variable to 5 seconds. Then we take a picture every 300 seconds, each image has a resolution of 1024×768 pixels.

import os
import time
import picamera
from datetime import datetime

# Grab the current datetime which will be used to generate dynamic folder names
d = datetime.now()
initYear = "%04d" % (d.year)
initMonth = "%02d" % (d.month)
initDate = "%02d" % (d.day)
initHour = "%02d" % (d.hour)
initMins = "%02d" % (d.minute)
initSecs = "%02d" % (d.second)

folderToSave = "timelapse_" + str(initYear) + str(initMonth) + str(initDate) +"_"+ str(initHour) + str(initMins)
os.mkdir(folderToSave)

# Set the initial serial for saved images to 1
fileSerial = 1

# Create and configure the camera
adjustTime=5
pauseBetweenShots=300

# Create and configure the camera
with picamera.PiCamera() as camera:
    camera.resolution = (1024, 768)
    #camera.exposure_compensation = 5

    # Start the preview and give the camera a couple of seconds to adjust
    camera.start_preview()
    try:
        time.sleep(adjustTime)

        start = time.time()
        while True:
            d = datetime.now()
            # Set FileSerialNumber to 000X using four digits
            fileSerialNumber = "%04d" % (fileSerial)

            # Capture the CURRENT time (not start time as set above) to insert into each capture image filename
            hour = "%02d" % (d.hour)
            mins = "%02d" % (d.minute)
            secs = "%02d" % (d.second)
            camera.capture(str(folderToSave) + "/" + str(fileSerialNumber) + "_" + str(hour) + str(mins) + str(secs) + ".jpg")

            # Increment the fileSerial
            fileSerial += 1
            time.sleep(pauseBetweenShots)

    except KeyboardInterrupt:
        print ('interrupted!')
        # Stop the preview and close the camera
        camera.stop_preview()

finish = time.time()
print("Captured %d images in %d seconds" % (fileSerial,finish - start))

This script then can run unattended and it creates a batch of images on the Raspberry Pi.

Image Metadata

The file name preserves the time of the shot, so that we can see later when a picture was taken. But the tool also stores EXIF metadata, which can be used for processing. You can view the data with the exiftool.

>ExifTool Version Number         : 9.46
File Name                       : 1052.jpg
Directory                       : .
File Size                       : 483 kB
File Modification Date/Time     : 2016:07:08 08:49:52+02:00
File Access Date/Time           : 2016:07:08 09:19:14+02:00
File Inode Change Date/Time     : 2016:07:08 09:17:52+02:00
File Permissions                : rw-r--r--
File Type                       : JPEG
MIME Type                       : image/jpeg
Exif Byte Order                 : Big-endian (Motorola, MM)
Make                            : RaspberryPi
Camera Model Name               : RP_b'imx219'
X Resolution                    : 72
Y Resolution                    : 72
Resolution Unit                 : inches
Modify Date                     : 2016:07:05 08:37:33
Y Cb Cr Positioning             : Centered
Exposure Time                   : 1/772
F Number                        : 2.0
Exposure Program                : Aperture-priority AE
ISO                             : 50
Exif Version                    : 0220
Date/Time Original              : 2016:07:05 08:37:33
Create Date                     : 2016:07:05 08:37:33
Components Configuration        : Y, Cb, Cr, -
Shutter Speed Value             : 1/772
Aperture Value                  : 2.0
Brightness Value                : 2.99
Max Aperture Value              : 2.0
Metering Mode                   : Center-weighted average
Flash                           : No Flash
Focal Length                    : 3.0 mm
Maker Note Unknown Text         : (Binary data 332 bytes, use -b option to extract)
Flashpix Version                : 0100
Color Space                     : sRGB
Exif Image Width                : 1024
Exif Image Height               : 768
Interoperability Index          : R98 - DCF basic file (sRGB)
Exposure Mode                   : Auto
White Balance                   : Auto
Compression                     : JPEG (old-style)
Thumbnail Offset                : 1054
Thumbnail Length                : 24576
Image Width                     : 1024
Image Height                    : 768
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
Aperture                        : 2.0
Image Size                      : 1024x768
Shutter Speed                   : 1/772
Thumbnail Image                 : (Binary data 24576 bytes, use -b option to extract)
Focal Length                    : 3.0 mm
Light Value                     : 12.6

Processing Images

The Raspberry Pi would need a lot of time to create an animated Gif or a video from these images. This is why I decided to add new images automatically to a Git repository on Github and fetch the results on my Desktop PC. I created a new Git repository and adapted the script shown above to store the images within the folder of the repository. I then use the following script to add and push the images to Github using a cronjob.

>cd /home/stefan/Github/Timelapses
now=$(date +"%m_%d_%Y %H %M %S")
echo $now
git pull
git add *.jpg
git commit -am "New pictures added $now"
git push

You can add this to you user’s cron table with crontab -e and the following line, which adds the images every 5 minutes,

*/5	*	*	*	*	/home/stefan/Github/Timelapses/addToGit.sh

On a more potent machine, you can clone the repository and pull the new images like this:

cd /home/stefan-srv/Github/Timelapses
now=$(date +"%m_%d_%Y %H %M %S")
echo "$now"
git pull --rebase

The file names are convenient for being able to read the date when it was taken, but most of the Linux tools require the files to be named within a sequence. The following code snippet renames the files into a sequence with four digits and pads them with zeros if possible.

>a=1
for i in *.jpg; do
  new=$(printf "%04d.jpg" "$a") #04 pad to length of 4
  mv -- "$i" "$new"
  let a=a+1
done

Animated Gifs

Imagemagick offers a set of great tools for images processing. With its submodule convert, you can create animated Gifs from a series of images like this:

convert -delay 10 -loop 0 *.jpg Output.gif

This adds a delay after each images and loops the gif images infinitely. ImageMagick requires a lot of RAM for larger Gif images and does not handle memory allocation well, but the results are still nice. Note that the files get very large, so a smaller resolution might be more practical.

Still Images to Videos

The still images can also be converted in videos. Use the following command to create an image with 10 frames per second:

>avconv -framerate 10 -f image2 -i %04d.jpg -c:v h264 -crf 1 out.mov

Example: Nordkette at Innsbruck, Tirol

This timelapse video of the Inn Valley Range in the north of the city of Innsbruck has been created by taking a picture with a Raspberry Pi Camera Module V2 every 5 minutes. This video consists of 1066 still images.

Encrypt a USB Drive (or any other partition) Using LUKS

Did you ever want to feel like secret agent or do you really need to transport and exchange sensitive data? Encrypting your data is not much effort and can be used to protect a pen drive or any partition and the data on it from unauthorized access. In the following example you see how to create an encrypted partition on a disk. Note two things: If you accidentally encrypt the wrong partition, the data is lost. For ever. So be careful when entering the commands below. Secondly, the method shown below only protects the data at rest. As soon as you decrypt and mount the device, the data can be read from everyone else if you do not use correct permissions.

Preparation

Prepare a mount point for your data and change ownership.

# Create a mount point
sudo mkdir /media/cryptoUSB
# Set permissions for the owner
sudo chown stefan:stefan /media/cryptoUSB

Create an Encrypted Device

Encrypt the device with LUKS. Note that all data on the partition will be overwritten during this process.

# Create encrypted device 
sudo cryptsetup --verify-passphrase luksFormat /dev/sdX -c aes -s 256 -h sha256

# From the man page:
       --cipher, -c 
              Set the cipher specification string.
       --key-size, -s 
              Sets  key  size in bits. The argument has to be a multiple of 8.
              The possible key-sizes are limited by the cipher and mode used.
       --verify-passphrase, -y
              When interactively asking for a passphrase, ask for it twice and
              complain  if  both  inputs do not match.
       --hash, -h 
              Specifies the passphrase hash for open (for  plain  and  loopaes
              device types).

# Open the Device
sudo cryptsetup luksOpen /dev/sdX cryptoUSB
# Create a file system (ext3)
sudo mkfs -t ext3 -m 1 -O dir_index,filetype,sparse_super /dev/mapper/cryptoUSB
# Add a label
sudo tune2fs -L Crypto-USB /dev/mapper/cryptoUSB
# Close the devicesudo cryptsetup luksClose cryptoUSB

Usage

The usage is pretty simple. With a GUI you will be prompted for decrypting the device. At the command line, use the following commads to open and decrypt the device.

# Open the Device
sudo cryptsetup luksOpen /dev/sdcX cryptoUSB
# Mount it
sudo mount /dev/mapper/cryptoUSB /media/cryptoUSB

When you are finished with your secret work, unmount and close the device properly.

sudo umount /media/cryptoUSB 
sudo cryptsetup luksClose cryptoUSB

A Reasonable Secure, Self-Hosted Password Database with Versioning and Remote Access

The average computer users needs to memorize at least 17 passwords for private accounts. Power users need to handle several additional accounts for work too and memorizing (good and complex) passwords quickly becomes a burden if not yet impossible. To overcome the memory issue, there exists a variety of tools, which allow to store passwords and associated metadata in password stores. Typically, a password manager application consists of a password file, which contains the passwords and the metadata in structured form, and an application, assisting the user in decrypting and encrypting the passwords. A typical example is Keepass, which is an open source password management application. Keypass uses a master password in order to encrypt the password file. An additional key file can be used in order to increase security by requiring a second factor to open the password database. There exists a very large variety of ports of this software, which allow to open, edit and store passwords on virtually any platform. As the passwords are stored in a single file, a versioning mechanism is required, which allows to track changes in the passwords on all devices and merge them together in order to keep the synchronized. There also exist online services which handle versioned password storage, but obviously this requires to give away sensitive information and to trust the provider for handling the passwords safely. Storing the encrypted password file in a cloud drive such as Dropbox, Google Drive or Microsoft Azure also solves the versioning issue partially, but still the data is out there on foreign servers. For this reason, the new Raspberry Pi Zero is a low cost, low power device, which can be turned into a privately managed and reasonable secure, versioned password store under your own control.

What is needed?

  1. A Raspberry Pi (in fact, a Linux system of any kind, in this example we use a new Zero Pi)
  2. Power supply
  3. SD micro card
  4. USB Hub
  5. Wifi Dongle
  6. USB Keyboard

Preparing the Raspberry Pi Zero

The Raspbian operating system can be easily installed by dumping the image to the SD micro card. As the Zero Pi does not come with an integrated network interface, a Wifi dongle can be used for enabling wireless networking. You can edit the config file  directly on the SD card, by opening it on a different PC with any editor, and provide the SSID and the shared secret already in advance.

# File /etc/wpa_supplicant/wpa_supplicant.conf
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="MyWIFI
    psk="SECRET"
}

Then place the card again in the Pi and boot it with a keyboard and wifi dongle attached and the Pi connected to a screen. Boot the device and login with standard credentials, which are the user name pi and the password raspberry.

sudo adduser stefan # add new user
sudo apt-get install openssh-server git-core # Install ssh server and git
passwd # change the default password
sudo adduser stefan sudo # add the new user to the sudoers

In the next step, it is recommended to use a static IP address for the Pi, as we need to configure port forwarding for a specific IP address for the router in a later step. Open the interfaces file and provide a static IP address as follows:

# File: /etc/network/interfaces
allow-hotplug wlan0
iface wlan0 inet static
    address 192.168.0.100
    netmask 255.255.255.0
    gateway 192.168.0.1
    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

You can then remove the HDMI cable and also the keyboard, as now SSH is available via the static IP address we just defined above.  The next step covers the installation of the [Git server][1] and the configuration of public key authentication.

sudo adduser git # add a new git user
su git # change into the git account
cd /home/git # change to home directory
mkdir .ssh # create the directory for the keys
chmod 700 .ssh # secure permissions
touch .ssh/authorized_keys  # create file for authorized keys
chmod 600 .ssh/authorized_keys # secure permissions for this file```


We are now ready to create a key pair consting of a private and a public key. You can do this on your normal pc or on the Pi directly.

ssh-keygen -t rsa # Create a key pair and provide a strong password for the private key```

Note that you can provide a file name during the procedure. The tool creates a key pair consisting of a private and a public key. The public key ends with the suffix pub.

# Folder ~/Passwordstore $ ll
insgesamt 32
drwxr-xr-x  2 stefan stefan 4096 Mär 13 22:40 .
drwxr-xr-x 10 stefan stefan 4096 Mär 13 22:38 ..
-rw-------  1 stefan stefan 1766 Mär 13 22:40 pi_git_rsa
-rw-r--r--  1 stefan stefan  402 Mär 13 22:40 pi_git_rsa.pub

If you created the key files on a different PC than the Pi, you need to upload the public key to the Pi. We can do this with the following command:

cat ~/Passwordstore/pi_git_rsa.pub | ssh git@192.168.0.100 "cat >&gt;  ~/.ssh/authorized_keys"```


If you generated the keys directly on the Pi it is sufficient to cat the key into the file directly. After you managed this step, verify that the key has been copied correctly. If the file looks similar like the following example, it worked.

git@zeropi:~/.ssh $ cat authorized_keys ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDZ7MZYvI……..wnQqchM1 stefan@pc



We can the test key based SSH authentication with the following command.

ssh -i pi_git_rsa git@192.168.0.100 # connect with using the private key```

You are then prompted to connect to the Pi by using the private key password you specified earlier. Note that this password differs from the one we created for the git user. A less secure but more convenient solution is to leave the password empty during the key pair creation process. If the password has not been set, then everyone can connect to the Pi who gets hold of the private key.  By the way, additional interesting facts about passwords can be found here.

In order to increase convenience, you can add a short cut for this connection, by editing the /home/user/.ssh/config file. Simply add the following record for the password store SSH connection.

Host passwords
hostname 192.168.0.100
port 22
user git
IdentityFile    /home/stefan/passwort/pi_git_rsa

Now you can connect to the Pi by typing the following command `ssh passwords . Note that now you need to provide the password for the key file instead of the user password. Delete the pre-installed user pi from the system:

sudo userdel pi```


The default Raspbian partition configuration only utilises 2 GB of your SD card. This can become insuficient quickly. There exists a convenient tool which allows to increase the root partition to the full size of your SD card. Simply run the following command and select the appropriate menu item.

<span class="lang:default decode:true  crayon-inline ">sudo raspi-config</span>

## Prepare the Git Repository

In the following, we create an empty git repository which we will use for versioning the password database from Keepass.

mkdir Password-Repository git@zeropi:~ $ cd Password-Repository/ git@zeropi:~/Password-Repository $ git init –bare Initialisierte leeres Git-Repository in /home/git/Password-Repository/



The repository on the Pi is now ready for ingesting the passwords.

## Checkout the new Repository on your PC and add the Password File

Now that the repository is inititalized, we can start versioning the password file with git.  Clone the repository and add the password file to git, by copying the password file into the cloned repository directory.

git clone passwords:/home/git/Password-Repository cp /home/user/oldLocation/Password-Database.kdb ~/Password-Repository git add Password-Database.kdb git commit -m “initial commit”



The last step is to push the  newly committed password file to  the remote repository. You can improve the security by not adding the key file for KeePass into the repository.

git push origin master```

The basic setup is now completed and you can clone this repository on any device, in order to have the latest password file available.

Checkout the Password Repository on Your Phone

There exists a variety of Git clients for Android, which can deal with identity files and private key authentication. I have good experience with Pocket Git. Clone the repository by using the URL like this:

ssh://git@pi.duckdns.org:1234/home/git/Password-Repository```


### Versioning the Password File: Pull, Commit and Push

Handling versions of the password file follows the standard git procedure. The only difference is, that in contrast to source code files for which git is usually used for, the encrypted password database does not allow for diffs. So you cannot find differences between to versions of the password database. For this reason, you need to make sure that you get the latest version of the password database before you edit the file. Otherwise you need to merge the file manually.  In order to avoid this, follow these steps from within the repository everytime you plan additions, edits or deletes of the password database.

  1. git pull
  2. \## make your changes
  3. git commit -m &#8220;describe your changes&#8221;
  4. git push

## Enabling Remote Access

You can already access the Git repository locally in your own network. But in order to retrieve, edit and store passwords from anywhere, you need to enable port forwarding and Dynamic DNS. Port forwarding is pretty easy. Enter your router&#8217;s Web interface, browse to the port forwarding options and specify an external and internal port which points to the IP of the Raspberry Pi.

  * IP Address 192.168.0.100
  * Internal port 22
  * External port (22100)
  * Protocol: both

Now the SSH service and therefore the Git repository becomes available via the external port 22100. As we left the internal port at the default, no changes for the SSH service are required.

For Dynamic DNS I regularly use <a href="http://www.duckdns.org" target="_blank">Duck DNS</a>, which is a free service for resolving dynamic IP addresses to a static host name. After registering for the service, you can choose a host name and download the installer. There exists an installer particularly for the <a href="https://www.duckdns.org/install.jsp" target="_blank">Raspberry Pi</a>. Follow this instructions and exchange the token and the domain name in the file to match your account.  You can now use the domain you registered for accessing the service from other machines outside your network.

## Security Improvements

The setup so far is reasonably secure, as only users having the key file and its password may authenticate with the Git repository user. It is in general good practice to disallow root to connect via SSH and to restrict remote access. Ensure that all other users on the system can only connect via SSH if and only if they use public key based authentication. Always use passwords for the key file, so that if someone should get hold of your keys, the still require a password.

You can also disable password login for the user git explicitly and allow passwords for local users. Add these lines in the sshd config file.

Match User git
PasswordAuthentication no

Match address 192.168.0.0/24 PasswordAuthentication yes



If you know the IP addresses where you will update the password file in advance, consider limiting access only to these. The git user can authenticate with the key, but still may have too many privilieges and also could execute potentially harmful commands. Ensure that the git user is not in the list of superusers:

grep -Po ‘^sudo.+:\K.*$’ /etc/group```

The user git should not be in the output list. In order to limit the commands that the git user may execute, we can specify a list of allowed commands executable via SSH and utilise a specialised shell, which only permits git commands. Prepend the public key of the git user in the authorised_keys file as follows:

no-port-forwarding,no-agent-forwarding ssh-rsa AAAAB ........```


In addition, we can change the default shell for the user git. Switch to a different user account with sudo privileges and issue the following command:

sudo usermod -s /usr/bin/git-shell git```

This special shell is called git-shell and comes with the git installation automatically. It only permits git specific commands, such as push and pull, which is sufficient for our purpose. If you now connect to the Pi with the standard SSH command, the connection will be refused:

stefan $ ssh passwords 
Enter passphrase for key '/home/passwort/pi_git_rsa': 

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Mar 20 20:48:04 2016 from 192.168.0.13
fatal: Interactive git shell is not enabled.
hint: ~/git-shell-commands should exist and have read and execute access.
Connection to 192.168.0.100 closed.

Firewall

The Uncomplicated FireWall (ufw) is way less comlex to setup than classic IP tables and provides exactly what the name implies: a simple firewall. You can install and initialize it as follows:

sudo apt-get install ufw # Insatall it
sudo ufw default deny incoming # Deny all incoming traffic 
sudo ufw allow ssh # Only allow incoming SSH
sudo ufw allow out 80 # Allow outgoing port 80 for the Duck DNS request
sudo ufw enable # Switch it on
sudo ufw status verbose # Verify status```


The great tutorials at <a href="https://www.digitalocean.com/community/tutorials/how-to-setup-a-firewall-with-ufw-on-an-ubuntu-and-debian-cloud-server" target="_blank">Digital Ocean</a> provide more details.

## Conclusion

In this little tutorial. we installed a Git server on a Raspberry Pi Zero (or any other Linux machine) and created a dedicated user for connecting to the service. The user requires a private key to access the service and the git server only permits key based logins from users other than users from the local network. The git user may only use a restricted shell and cannot login interactively. The password file is encrypted and all versions of the passwords are stored within the git repository.



<div class="twttr_buttons">
  <div class="twttr_twitter">
    <a href="http://twitter.com/share?text=A+Reasonable+Secure%2C+Self-Hosted+Password+Database+with+Versioning+and+Remote+Access" class="twitter-share-button" data-via="" data-hashtags=""  data-size="default" data-url="https://blog.stefanproell.at/2016/03/20/a-reasonable-secure-password-database-with-versioning-and-remote-access/"  data-related="" target="_blank">Tweet</a>
  </div>
  
  <div class="twttr_followme">
    <a href="https://twitter.com/@stefanproell" class="twitter-follow-button" data-show-count="true" data-size="default"  data-show-screen-name="false"  target="_blank">Follow me</a>
  </div>
</div>

 [1]: https://git-scm.com/book/en/v2/Git-on-the-Server-Setting-Up-the-Server

Add Self-Signed Certificates to your local trusted CA

When you use different self hosted services, such as your own Gitlab instance for example, you probably use self signed certificates for securing the connection. Self-signed certificates have the disadvantage that Browsers and other applications do not trust them and therefore either display error messages or refuse to connect at all. The IDEA IntelliJ IDE for instance does not clone from repositories which are protected by self signed certificates. Instead of disabling the SSL certificate verification completely, it is recommended to add your self signed certificate to the local certificate store.

Ubuntu and other Linux distributions store certification authorities in the file /etc/ssl/certs/ca-certificates.crt. This file is a list of trusted public keys, which is compiled by appending all certificates with the file ending .crt in the directory <span class="lang:default decode:true crayon-inline ">/etc/ssl/certs . Once a CA is in this list, the operating system trusts all certificates which have been signed by this CA. You could simply add the public part of the server certificate to this file, but it might get overwritten once the root CAs are updated. Therefore it is the better practice to simply store your self signed certificate in the mentioned directory and trigger the creation of the ca-certificates.crt file manually.

Fetch the Certificate from your Server

First, retrieve the certificate from your server. The command below trims away the information that is not needed and writes the certificate public key into a new file within the certificate directory. Run this command as root.

echo | openssl s_client -connect $SERVERADDRESS:443 2>/dev/null | openssl x509 &gt; /usr/local/share/ca-certificates/$SERVERADDRESS.crt

Update the Certificate Authority File

Then, by simply running the command sudo update-ca-certificates , the certificate will be added and trusted, as a link is created within the directory /etc/ssl/certs. The output of the command should mention that one certificate was added. Note that the command creates a new symbolic link from the file  $SERVERADDRESS.crtto$SERVERADDRESS.pem`. This certificate will then be accepted by all applications, which utilise the ca-certificates file.

Create an ER Diagram of an Existing SQLite Database (or many other RDBMS)

Visualisation helps solving problems and is therefore an important tool database design. Many database providers have their product specific tools for re-engineering existing schemata, but self-contained, serverless, embedded relational database management systems (RDBMS) such as SQLite often come without much tooling support. The extremely small footprint of SQLite provides a very powerful tool for implementing database driven applications without the hassle of database administration, user privilege management and other demanding tasks that come with more complex systems. There does not exist a workbench-like tool for SQLite, but we can use the open source SchemaCrawler for analysing database schemata and table relationships. The tool provides a plethora of commands and options, in this post we will only cover the diagramming part, which allows creating ER diagrams of the table.

After downloading and extracting the tool to your local drive, you can find a lot of examples included. The tool can handle SQLite, Oracle,  MS SQL Server, IBM DB2, MySQL, MariaDB, Postgres and Sybase database servers and is therefore very versatile. You will need Java 8 in order to run it. Have a look at the script below, which creates a PNG image of the database schema of the Chinook test database.

#!/bin/bash
# The path of the unzipped SchemaCrawler directory
SchemaCrawlerPATH=/media/stefan/Datenbank/schemacrawler-14.05.04-main
# The path of the SQLite database
SQLiteDatabaseFILE=/media/stefan/Datenbank/ChinookDatabase1.4_CompleteVersion/Chinook_Sqlite.sqlite
# The type of the database system.
RDBMS=sqlite
# Where to store the image
OutputPATH=/media/stefan/Datenbank/ER.png
# Username and password need to be empty for SQLite
USER=
PASSWORD=

java -classpath $(echo ${SchemaCrawlerPATH}/_schemacrawler/lib/*.jar | tr ' ' ':') schemacrawler.Main -server=${RDBMS} -database=${SQLiteDatabaseFILE} -outputformat=png -outputfile=${OutputPATH} -command=graph -infolevel=maximum -user=${USER} -password=${PASSWORD}

The SchemaCrawlerPATH variable contains the path to the directory where we unzipped the SchemaCrawler files to. This is needed in order to load all the required libraries into the classpath below. We then specify the SQLite database file, define the RDBMS and provide an output path where we store the image. Additionally, we provide an empty user name and password combination. SQLite does not provide user authentication, thus those two parameters need to be empty, SchemaCrawler simply ignores them. Then we can execute the command and the tool generates the PNG of the ER diagram for us.

You can find a lot of examples also online, which gives you an overview of the features of this tool. One of the main purposes of SchemaCrawler is to generate diffable text outputs of the database schemata. In combination with a source code version management tool such as Git or Subversion, you can create clean and usable reports of your databases and keep track of the changes there. You can retrieve an overview of the options  with the following command.

java -classpath $(echo ${SchemaCrawlerPATH}/_schemacrawler/lib/*.jar | tr ' ' ':') schemacrawler.Main -?

You can also HTML reports with the following command:

java -classpath $(echo ${SchemaCrawlerPATH}/_schemacrawler/lib/*.jar | tr ' ' ':') schemacrawler.Main -server=${RDBMS} -database=${SQLiteDatabaseFILE} -outputformat=html -outputfile=report.html -command=details -infolevel=maximum -user=${USER} -password=${PASSWORD}

Other available output formats are plain text, CSV or JSON.

Importing your Repositories from Github to your Gitlab Instance

Github is a great service and offers a free micro account for students, which includes five private repositories. Still this might not be enough private repositories for some of us, especially if you start a lot of side projects (which are hardly finished :-)) and want to keep your code and data organised, versioned and collaborative but non-public (because it is not finished yet :-)). For this and other reasons I am also running a private Gitlab instance, which basically offers the same features as Github and can be hosted on your private server.

I still use Github for some projects, which means that I continuously run out of private repositories there as only five are included in my plan. Now Gitlab (the private instance) offers a great feature, which allows one to hook the Github account into the Gitlab instance and import code hosted in private repositories at Github into your Gitlab instance, where no limits for private repositories apply. Obviously the repositories could also be transferred manually, but the import feature via the Web interface is very convenient.

Installing the oAuth feature for Github and other services is very simple and described here for Gihub and there in general. Read both documentations carefully, as using the wrong settings can lead to a serious security issues. After following the instructions carefully, the warning that oAuth needs to be configured kept popping up in the Gitlab dashboard:

To enable importing projects from GitHub, as administrator you need to configure OAuth integration.

The final hint was given in this SO post here. The official documentation failed to mention that the changes in the configuration file must be updated by reconfiguring Gitlab. The following list of steps should lead to success:

  1. Login into Github and open your account settings
  2. Select Applications on the left and chose the Developer Applications tab
  3. Register a new application by providing the name (e.g. Gitlab Access), a homepage (can be anything), a description (optional) and the callback URL (e.g. https://gitlab.example.org).
  4. Add the initial OmniAuth configuration as described in the official documentation.
  5. Provide the settings for Github.
  6. Save the file and execute the following two commands.
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart

The first time you select “Import project from Github”, you will be redirected to the Github authentication page, where you need to grant permission for the Gitlab user to retrieve the repositories. After you imported the private repositories into Gitlab, you can delete them from Github and reclaim some free space.

A quick fix for the ‘ascii’ codec can’t encode character error in Python and the csvkit tool

For a current project I need to migrate large volumes of CSV data into a relational database management system. The Python driven csvkit is the swiss army knife of CSV tools and very handy for this purpose. However, a few CSV files cause troubles when I tried to pipe the SQL CREATE statements I created with the csvkit tool to a file.

csvsql -i sqlite -d ';' -e 'utf8' --db-schema test_schema --table test_table inputfile.csv > output.sql
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 46: ordinal not in range(128)```


Adding the &#8211;verbose flag gives more clarity

csvsql -i sqlite -d ‘;’ -e ‘utf8’ –db-schema test_schema –table test_table inputfile.csv > output.sql Traceback (most recent call last): File “/usr/local/bin/csvsql”, line 9, in load_entry_point(‘csvkit==0.9.1’, ‘console_scripts’, ‘csvsql’)() File “/usr/local/lib/python2.7/dist-packages/csvkit/utilities/csvsql.py”, line 161, in launch_new_instance utility.main() File “/usr/local/lib/python2.7/dist-packages/csvkit/utilities/csvsql.py”, line 134, in main self.output_file.write('%s\n’ % sql.make_create_table_statement(sql_table, dialect=self.args.dialect)) UnicodeEncodeError: ‘ascii’ codec can’t encode character u’\ufeff’ in position 46: ordinal not in range(128)```

Although the input file is already in UTF-8, the file is written in ASCII, which causes an error if Umlauts are included.
Adding the following code after the import commands at the top of the csvsql.py file sets the correct encoding for the output file in this Python script.

reload(sys)
sys.setdefaultencoding('utf-8')

This is rather a quick fix than an elegant solution, but it allows me to continue with my work.

Den UPC Wireless Cable Router durch den eigenen WLAN-Router ersetzen

Nachdem das Netzteil meines UPC Modems das zeitliche gesegnet hat, pilgerte ich zum UPC Store am Keplerplatz und ließ das gesamte Gerät tauschen. Der Austausch erfolgte unkompliziert, doch leider hatte ich anstelle des althergebrachten Modems nun endgültig einen Wifi-Router erhalten, wogegen ich mich bisher immer gesträubt hatte. Schließlich betreibe ich einen liebevoll eingerichteten Netgear WNR 3500L mit der altenativen Firmware TomatoUSB und hätte keine Lust, mein Setup auf dem UPC Gerät umzusetzen und auf Funktionen zu verzichten. Das Verwenden eines eigenen Routers stellt an und für sich kein Problem dar und wird auch in der UPC FAQ beschrieben.

Die Switch Variante

Der UPC Router kann auch als Switch eingesetzt werden. In diesem Modus kann eine Bridge zwischen beiden Geräten hergestellt werden. Das UPC Gerät leitet in diesem Modus alle Anfragen direkt an den eigenen Router durch. Explizites Portforwarding ist bei dieser Variante nicht nötig, sie erfolgt lediglich am eigenen Router. Um diesen Modus zu aktivieren, wählen wir die entsprechende Option in den UPC Systemeinstellungen und setzen die Auswahl auf Bridge Modus. Danach starten wir den UPC Router neu. Dadurch erhält das Gerät die IP-Adresse 192.168.100.1. Diese IP Adresse tragen wir nun im TomatoVPN Interface für das Modem ein, wie die folgende Abbildung zeigt:

Nachdem Sie den Bridge Modus aktiviert haben, können Sie den UPC Router nicht mehr aus dem Netzwerk erreichen. Das Administrationsinterface ist lediglich von einem direkt am UPC Router angeschlossenen PC möglich. Mehr Informationen zum Bridge Modus finden sich hier.

Die Router Variante

Das UPC Gerät funktioniert standardmäßig als Router. Damit zwei Router verwendet werden können, müssen auch alle Ports zwischen den beiden Routern weitergeleitet werden. Zunächst muss wie beschrieben das WLAN Modul deaktiviert werden. Danach geht es an die lokalen Netzwerkeinstellungen. Leider sind die Beschreibungen jedoch nicht ganz vollständig, es wurde darauf vergessen, den DNS Servereintrag ebenfalls auf die Class A IP Adresse 10.0.0.1 (also der IP des UPC Routers) zu setzen. Durch die falsche IP Adresse des DNS Servers, können logischerweise keine Hostnamen aufgelöst werden und das Surfen macht nur jenen mit ausgezeichnetem IP-Adressen-Gedächtnis Spaß. Im folgenden Screenshot sind die funktionierenden Settings eingetragen.

Das Diagramm des kleinen Heimnetzwerks ist im folgenden Listing abgebildet, erstellt habe ich es mit ASCII-Flow.

+------------+
                                                                          |            |
                                                                        +-+  Android   |
+-------------------+    +------------------+       +---------------+   | |            |
|                   |    |    UPC Router    |       |Netgear Router |   | +------------+
|      Internet     +----+   IP 10.0.0.1    +-------+IP 192.168.0.1 +---+
|                   |    |WAN 234.124.122.13|       | WAN 10.0.0.2  |   | +------------+
+-------------------+    +------------------+       +---------------+   | |            |
                                                                        +-+   Desktop  |
                                                                          |            |
                                                                          +------------+

Falls Sie Ports per Port Forwarding freigegeben haben, müssen Sie diese natürlich zuerst vom UPC Router auf ihren eigenen Router weiterleiten, also z.B. den Port 443 zum Zielrechner 10.0.0.2.

Migrating Markup Languages

There exists a broad range of different markup languages, which allow you to structure, highlight and format your documents nicely. Different wiki platforms utilise various markup languages for rendering texts, Github and other code sharing platforms use their own dedicated languages for their documentation files. These markup languages such as Textile or Markdown work rather similar, as they provide users with a basic set formats and structural elements for text. In order to avoid vendor lock-in or when you need to move to a different system, you will need to extract the information from the old markup and make it available in a different language. Copying the rendered text is insufficient, as the semantics contained in the markup is lost. Therefor you need to apply a tool such as pandoc for this task. The pandoc website says that the tool

can read Markdown, CommonMark, PHP Markdown Extra, GitHub-Flavored Markdown, and (subsets of) Textile, reStructuredText, HTML, LaTeX, MediaWiki markup, TWiki markup, Haddock markup, OPML, Emacs Org mode, DocBook, txt2tags, EPUB, ODT and Word docx; and it can write plain text, Markdown, CommonMark, PHP Markdown Extra, GitHub-Flavored Markdown, reStructuredText, XHTML, HTML5, LaTeX (including beamer slide shows), ConTeXt, RTF, OPML, DocBook, OpenDocument, ODT, Word docx, GNU Texinfo, MediaWiki markup, DokuWiki markup, Haddock markup, EPUB (v2 or v3), FictionBook2, Textile, groff man pages, Emacs Org mode, AsciiDoc, InDesign ICML, and Slidy, Slideous, DZSlides, reveal.js or S5 HTML slide shows. It can also produce PDF output on systems where LaTeX or ConTeXt is installed.

Pandoc’s enhanced version of Markdown includes syntax for footnotes, tables, flexible ordered lists, definition lists, fenced code blocks, superscripts and subscripts, strikeout, metadata blocks, automatic tables of contents, embedded LaTeX math, citations, and Markdown inside HTML block elements.

Install the tool from the Ubuntu repositories:

sudo apt-get install pandoc

You then can convert or migrate between the available formats, for instance from markdown to textile (i.e. Github to Redmine):

pandoc -f markdown_github -t textile inputFile.markdown -o outputFile.textile

For migrating your Github wiki pages for instance, you can checkout the wiki from Github and convert all markdown documents in a loop:

# clone repository
git clone https://gitlab.example.org/user/project.wiki.git
cd project.wiki

for fileName in *.markdown; do
    # Remove the extension markdown and replace it with textile
    pandoc -f markdown_github -t textile $fileName -o "${fileName#"markdown"}".textile
done