MySQL

Parsing SQL Statements

JDBC and the Limits of ResultSet Metadata

For my work in the area of data citation, I need to analyse the queries that are used for creating subsets. I am particularly interested in query parameters, sortings and filters. One of the most commonly used query languages is SQL, which is used by many relational database management systems such as MySQL. In some cases, the interaction with databases is abstracted away, meaning that hardly any SQL statements are executed directly; instead, the statements are built on the fly by object relational mappers such as Hibernate. Other scenarios use SQL statements as plain strings or as prepared statements, which are executed via JDBC. However, analysing SQL statements is tricky, as the language is very flexible.

If we only need to understand which columns have been selected, it is sufficient to utilise the ResultSet metadata and retrieve the column names from there. In my case, however, I need to extract this information from the query in advance and potentially enforce a specific sorting by adding columns to the ORDER BY clause. In this scenario, I need to parse the SQL statement and retrieve this information from the statement itself. Probably the best way to do this would be to implement a parser for the SQL dialect with ANTLR (ANother Tool for Language Recognition). But this is quite a challenge, so I decided to take a shortcut: FoundationDB.
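
For comparison, reading the column names from the ResultSet metadata is a one-liner per column once the query has been executed. The following minimal JDBC sketch illustrates this (the connection URL, credentials and the query are placeholders); the catch is that the query has to be executed first, which is exactly what I want to avoid:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class MetadataExample {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection data and query
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb", "user", "password");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM tableA")) {

            // The metadata describes the columns of the already executed query
            ResultSetMetaData meta = rs.getMetaData();
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                System.out.println("Column " + i + ": " + meta.getColumnName(i));
            }
        }
    }
}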

The FoundationDB Parser

FoundationDB was a NoSQL database which provided several layers for supporting different paradigms at once. I am using the past tense here because the project was acquired by Apple in 2015, and the open source project has not been pursued since. However, the Maven libraries for the software are still available at Maven Central. FoundationDB came with its own SQL parser, which understands standard SQL queries. Such a query can be interpreted as a tree, and the parser library allows traversing SQL statements and analysing their nodes. We can use this tree to parse and interpret SQL statements and extract additional information.

The Foundations of FoundationDB

The FoundationDB parser can be included into your own project with the following Maven dependency:

<dependency>
    <groupId>com.foundationdb</groupId>
    <artifactId>fdb-sql-parser</artifactId>
    <version>1.6.1</version>
</dependency>

The usage of the parser is straightforward. We use the following example SQL statement as input:

	SELECT a.firstColumn, b.secondColumn, b.thirdColumn 
	FROM tableA AS a, tableB AS b 
	WHERE a.firstColumn = b.secondColumn AND 
	b.thirdColumn < 5 
	ORDER BY a.thirdColumn,a.secondColumn DESC

The following function calls the parser and prints the tree of the statement.

	/**
	 * Print the parse tree of a SQL statement
	 * @param sqlString
	 */
	public void parseSQLString(String sqlString) {
		SQLParser parser = new SQLParser();
		StatementNode stmt;
		try {
			stmt = parser.parseStatement(sqlString);
			stmt.treePrint();
		} catch (StandardException e) {
			e.printStackTrace();
		}
	}

The resulting tree is listed below. The statement has also been normalized, which ensures a stable sequence of the parameters.

name: null
updateMode: UNSPECIFIED
statementType: SELECT
resultSet: 	
	com.foundationdb.sql.parser.SelectNode@21b8d17c
	isDistinct: false
	resultColumns: 		
		com.foundationdb.sql.parser.ResultColumnList@6433a2

		[0]:		
		com.foundationdb.sql.parser.ResultColumn@5910e440
		exposedName: firstcolumn
		name: firstcolumn
		tableName: null
		isDefaultColumn: false
		type: null
		expression: 			
			com.foundationdb.sql.parser.ColumnReference@6267c3bb
			columnName: firstcolumn
			tableName: a
			type: null
		[1]:		
		com.foundationdb.sql.parser.ResultColumn@533ddba
		exposedName: secondcolumn
		name: secondcolumn
		tableName: null
		isDefaultColumn: false
		type: null
		expression: 			
			com.foundationdb.sql.parser.ColumnReference@246b179d
			columnName: secondcolumn
			tableName: b
			type: null
		[2]:		
		com.foundationdb.sql.parser.ResultColumn@7a07c5b4
		exposedName: thirdcolumn
		name: thirdcolumn
		tableName: null
		isDefaultColumn: false
		type: null
		expression: 			
			com.foundationdb.sql.parser.ColumnReference@26a1ab54
			columnName: thirdcolumn
			tableName: b
			type: null
	fromList: 		
		com.foundationdb.sql.parser.FromList@3d646c37

		[0]:		
		com.foundationdb.sql.parser.FromBaseTable@41cf53f9
		tableName: tablea
		updateOrDelete: null
		null
		correlation Name: a
		a
		[1]:		
		com.foundationdb.sql.parser.FromBaseTable@5a10411
		tableName: tableb
		updateOrDelete: null
		null
		correlation Name: b
		b
	whereClause: 		
		com.foundationdb.sql.parser.AndNode@2ef1e4fa
		operator: and
		methodName: and
		type: null
		leftOperand: 			
			com.foundationdb.sql.parser.BinaryRelationalOperatorNode@306a30c7
			operator: =
			methodName: equals
			type: null
			leftOperand: 				
				com.foundationdb.sql.parser.ColumnReference@b81eda8
				columnName: firstcolumn
				tableName: a
				type: null
			rightOperand: 				
				com.foundationdb.sql.parser.ColumnReference@68de145
				columnName: secondcolumn
				tableName: b
				type: null
		rightOperand: 			
			com.foundationdb.sql.parser.BinaryRelationalOperatorNode@27fa135a
			operator: <
			methodName: lessThan
			type: null
			leftOperand: 				
				com.foundationdb.sql.parser.ColumnReference@46f7f36a
				columnName: thirdcolumn
				tableName: b
				type: null
			rightOperand: 				
				com.foundationdb.sql.parser.NumericConstantNode@421faab1
				value: 5
				type: INTEGER NOT NULL
orderByList: 	
	com.foundationdb.sql.parser.OrderByList@2b71fc7e
	allAscending: false
	[0]:	
	com.foundationdb.sql.parser.OrderByColumn@5ce65a89
	ascending: true
	nullsOrderedLow: false
	columnPosition: -1
	expression: 		
		com.foundationdb.sql.parser.ColumnReference@25f38edc
		columnName: thirdcolumn
		tableName: a
		type: null
	[1]:	
	com.foundationdb.sql.parser.OrderByColumn@1a86f2f1
	ascending: false
	nullsOrderedLow: false
	columnPosition: -1
	expression: 		
		com.foundationdb.sql.parser.ColumnReference@3eb07fd3
		columnName: secondcolumn
		tableName: a
		type: null

This tree offers a lot of information which can also be used programmatically. At the top of the output, we can see that the statement was a SELECT statement and that it was not DISTINCT. Then follows the ResultSet, which contains a list of the three ResultColumns that have been specified in the SELECT clause. We can see the column names and the tables from which they are drawn. The next block provides the referenced tables (the FROM list) and their alias names. The WHERE block contains the operands which have been used for filtering, and last but not least, there is the list of ORDER BY clauses and their sorting directions.

The Visitor

In order to access the information shown above programmatically, we need to visit the nodes of the tree one by one. This can be achieved with the visitor pattern, which traverses all the nodes of the tree. The following listing shows how the visitor pattern can be used for accessing the list of columns from the SELECT clause.

	/**
	 * Return a list of columns of a SELECT clause
	 * @param sql
	 * @return
	 */
	public ArrayList<ResultColumn> selectColumnsList(String sql) {
        SQLParser parser = new SQLParser();
        BooleanNormalizer normalizer = new BooleanNormalizer(parser);
        StatementNode stmt = null;

        try {
            stmt = parser.parseStatement(sql);
            stmt = normalizer.normalize(stmt);
        } catch (StandardException e) {
            e.printStackTrace();
        }

        final ArrayList<ResultColumn> columns = new ArrayList<ResultColumn>();
        Visitor v = new Visitor() {

            @Override
            public boolean visitChildrenFirst(Visitable node) {
                if (node instanceof SelectNode)
                    return true;
                return false;
            }

            @Override
            public Visitable visit(Visitable node) throws StandardException {
                if (node instanceof ResultColumn) {
                    ResultColumn resultColumn = (ResultColumn) node;
                    columns.add(resultColumn);
                    System.out.println("Column " + columns.size()+ ": " + resultColumn.getName());
                }
                return null;
            }

            @Override
            public boolean stopTraversal() {
                return false;
            }

            @Override
            public boolean skipChildren(Visitable node) throws StandardException {
                if (node instanceof FromList) {
                    return true;
                }
                return false;
            }
        };

        try {
            stmt.accept(v);
        } catch (StandardException e) {
            e.printStackTrace();
        }

        return columns;

    }

In this code example, we define a visitor which traverses all the ResultColumn nodes. Every time the current node is an instance of ResultColumn, we add this node to our list of columns. The nodes are only visited if they are children of a SELECT statement, which is our entry point into the tree. We leave the tree when we reach the FROM list. We then apply the visitor to the statement, which initiates the traversal. As a result, we receive the list of columns which make up the result set.
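
Calling the method is straightforward. The following usage sketch assumes that the method lives in a small wrapper class, here called SqlAnalyzer, which is not part of the original listing:

// SqlAnalyzer is an assumed wrapper class containing selectColumnsList()
SqlAnalyzer analyzer = new SqlAnalyzer();
String sql = "SELECT a.firstColumn, b.secondColumn, b.thirdColumn "
           + "FROM tableA AS a, tableB AS b "
           + "WHERE a.firstColumn = b.secondColumn AND b.thirdColumn < 5 "
           + "ORDER BY a.thirdColumn, a.secondColumn DESC";

for (ResultColumn column : analyzer.selectColumnsList(sql)) {
    // getName() returns the column name as seen by the parser
    System.out.println(column.getName());
}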

In order to get the list of ORDER BY columns, we can utilise a similar approach. The following function gives an example:

	/**
	 * Return list of ORDER BY clauses
	 * @param sqlText
	 * @return
	 */
	public OrderByList orderByColumns(String sqlText){

		SQLParser parser = new SQLParser();
		BooleanNormalizer normalizer = new BooleanNormalizer(parser);
		StatementNode stmt;
        OrderByList orderByList = null;
        try {
			stmt = parser.parseStatement(sqlText);
			stmt = normalizer.normalize(stmt);
            CursorNode node = (CursorNode) stmt;
            orderByList = node.getOrderByList();
            int i=0;
            for(OrderByColumn orderByColumn : orderByList){
                i++;
                String direction;
                if(orderByColumn.isAscending()){
                    direction="ASC";
                }else{
                    direction="DESC";
                }
                System.out.println("ORDER BY Column " +i+ ": " +orderByColumn.getExpression().getColumnName()+ " Direction: " + direction );

            }
        } catch (StandardException e) {
			e.printStackTrace();
		}
		return orderByList;


	}

This time, we retrieve the list of ORDER BY columns directly from the CursorNode. Similar principles can be used to manipulate SQL statements, for instance to enforce a different sorting.
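
As a very simple illustration of such a manipulation, the following sketch inspects the parsed ORDER BY list and extends the statement text if a desired sort column is missing. It is only a sketch: it assumes that plain string concatenation is acceptable, that pkColumn names the column we want to sort by, and that the method lives in the same class as orderByColumns() from above.

/**
 * Sketch: enforce a sort on the given column by appending it to the statement,
 * guided by the information obtained from the parser.
 */
public String enforceSorting(String sqlText, String pkColumn) {
    OrderByList orderByList = orderByColumns(sqlText);

    if (orderByList != null) {
        // Check whether the column is already part of the ORDER BY list
        for (OrderByColumn orderByColumn : orderByList) {
            if (pkColumn.equalsIgnoreCase(orderByColumn.getExpression().getColumnName())) {
                return sqlText;
            }
        }
        // An ORDER BY clause exists, but without our column: append it as an additional criterion
        return sqlText + ", " + pkColumn + " ASC";
    }
    // No ORDER BY clause at all: append one
    return sqlText + " ORDER BY " + pkColumn + " ASC";
}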

Persistent Data in a MySQL Docker Container

Running MySQL in Docker

In a recent article on Docker in this blog, we presented some basics for dealing with data in containers. This article will present another popular application for Docker: MySQL containers. Running MySQL instances in Docker allows isolating database infrastructure with ease.

Connecting to the Standard MySQL Container

The description of the MySQL Docker image provides a lot of useful information on how to launch and connect to a MySQL container. The first step is to create a standard MySQL container from the latest available image.

sudo docker run \
   --name=mysql-instance \
   -e MYSQL_ROOT_PASSWORD=secret \
   -p 3307:3306 \
   -d \
   mysql:latest

This creates a MySQL container in which the root password is set to secret. As the host is already running its own MySQL instance (which has nothing to do with this Docker example), the standard port 3306 is already taken. Thus we publish port 3307 on the host system and forward it to the container's standard port 3306.

Connect from the Host

We can then connect from the command line like this:

mysql -uroot -psecret -h 127.0.0.1 -P3307

We could also provide the hostname localhost for connecting to the container, but as the MySQL client by default assumes that a localhost connection goes via a socket, this would not work. Thus, when using the hostname localhost, we need to specify the TCP protocol so that the client connects via the network interface.

mysql -uroot -psecret -h localhost --protocol TCP -P3307
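
The same connection can also be opened programmatically, for instance via JDBC. The following minimal sketch assumes that the MySQL Connector/J driver is on the classpath and that the container from above is running:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DockerMySQLConnect {
    public static void main(String[] args) throws SQLException {
        // Connect to the published port 3307 on the host
        String url = "jdbc:mysql://127.0.0.1:3307/mysql";
        try (Connection con = DriverManager.getConnection(url, "root", "secret")) {
            System.out.println("Connected: " + !con.isClosed());
        }
    }
}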

Connect from other Containers

Connecting from a different container to the MySQL container is pretty straightforward. Docker allows linking two containers and then using the exposed ports between them. The following command creates a new Ubuntu container and links it to the MySQL container.

sudo docker run -it --name ubuntu-container --link mysql-instance:mysql-link ubuntu:16.10 bash

After this command, you are in the terminal of the Ubuntu container. We then need to install the MySQL client for testing:

# Fetch the package list
root@7a44b3e7b088:/# apt-get update
# Install the client
root@7a44b3e7b088:/# apt-get install mysql-client
# Show environment variables
root@7a44b3e7b088:/# env

The last command gives you a list of environment variables, among which is the IP address and port of the MySQL container.

MYSQL_LINK_NAME=/ubuntu-container/mysql-link
HOSTNAME=7a44b3e7b088
TERM=xterm
MYSQL_LINK_ENV_MYSQL_VERSION=5.7.14-1debian8
MYSQL_LINK_PORT=tcp://172.17.0.2:3306
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MYSQL_LINK_PORT_3306_TCP_ADDR=172.17.0.2
MYSQL_LINK_PORT_3306_TCP=tcp://172.17.0.2:3306
PWD=/
MYSQL_LINK_PORT_3306_TCP_PORT=3306
SHLVL=1
HOME=/root
MYSQL_LINK_ENV_MYSQL_MAJOR=5.7
MYSQL_LINK_PORT_3306_TCP_PROTO=tcp
MYSQL_LINK_ENV_GOSU_VERSION=1.7
MYSQL_LINK_ENV_MYSQL_ROOT_PASSWORD=secret
_=/usr/bin/env

You can then connect either manually or by using the environment variables:

mysql -uroot -psecret -h 172.17.0.2
mysql -uroot -p$MYSQL_LINK_ENV_MYSQL_ROOT_PASSWORD -h $MYSQL_LINK_PORT_3306_TCP_ADDR -P $MYSQL_LINK_PORT_3306_TCP_PORT

If you only require a MySQL client inside a container, simply use the MySQL image from docker. Batteries included!

Plotting Colourful Graphs with R, RStudio and Ggplot2

The Aesthetics of Data Science

Data visualization is a powerful tool for communicating results, and it has recently received more and more attention due to the hype around data science. Integrating a meaningful graph into a paper or your thesis can improve readability and understandability more than any formulas or extended textual descriptions. There exists a variety of approaches for visualising data. Recently, a lot of new JavaScript based frameworks have gained quite some momentum; these can be used in Web applications and apps. A more classical work horse for data science is the R project and its plotting engine ggplot2. The reason why I decided to stick with R is its popularity and flexibility, which is still impressive. Also, with RStudio there exists a convenient IDE which provides useful features for data scientists.

Plotting Graphs

In this blog post, I demonstrate how to plot time series data and use colours to highlight a specific aspect of the data. Like almost all techniques, R and ggplot2 require practice and training, which I realised again today when I spent quite a bit of time struggling to get a simple plot right.

Currently I am evaluating two systems I developed, and I needed to visualise their storage and execution time demands in comparison. My goal was to create a plot for each non-functional property, the execution time and the storage demand, where each plot should depict both systems' performance. Each system runs a set of operations; think of create, read, update and delete operations (CRUD). For visualising which of these operations has the most effect on the system, I needed to colourise each operation within one graph. This is the easy part. What was more tricky is to provide a defined set of colours for each graph, which can be mapped to each instance of the variable. Things which have the same meaning in both graphs should be visualised in the same way, which requires a little hack.

Prerequisites

Install the following packages via apt

sudo apt-get install r-base r-recommended r-cran-ggplot2

and RStudio by downloading the .deb file from the project homepage.

Evaluation Data

As an example, we plan to evaluate the storage demand of two different systems and compare the results. Consider the following sample data.

# Set seed to get the same random numbers for this example
set.seed(42);
# Generate 200 random data records
N <- 200
# Generate a random, increasing sequence of integers that we assume is the storage demand in some unit
storage1 =sort(sample(1:100000, size = N, replace = TRUE),decreasing = FALSE)
storage2 = sort(sample(1:100000, size = N, replace = TRUE),decreasing = FALSE)
# Define the available operations and draw a random sample
operationTypes = c('CREATE','READ','UPDATE','DELETE')
operations = sample(operationTypes,N,replace=TRUE)
# Create the dataframe
df <- data.frame(id = seq(1, N), storage1, storage2, operations)
df
     id storage1 storage2 operations
1     1       24      238     CREATE
2     2      139     1755     UPDATE
3     3      158     1869     UPDATE
4     4      228     2146       READ
5     5      395     2967     DELETE
6     6      734     3252     CREATE
7     7      789     4049     DELETE
8     8     2909     4109       READ
9     9     3744     4835     CREATE
10   10     3894     4990       READ

....

We created a random data set simulating the characteristics of system measurement data. As you can see, we have a list of operations of the four types CREATE, READ, UPDATE and DELETE and a measurement value for the storage demand in both systems.

The Simple Plot

Plotting two graphs of the columns storage1 and storage2 is straightforward.

# Simple plot
p1 <- ggplot(df, aes(x,y)) +
  geom_point(aes(x=id,y=storage1,color="Storage 1")) +
  geom_point(aes(x=id,y=storage2,color="Storage 2")) +
  ggtitle("Overview of Measurements") +
  xlab("Number of Operations") +
  ylab("Storage Demand in MB") +
  scale_color_manual(values=c("Storage 1"="forestgreen", "Storage 2"="aquamarine"), 
                     name="Measurements", labels=c("System 1", "System 2"))

print(p1)

We assign a colour to each point plot. Note that the colour name "Storage 1", for instance, does of course not denote an actual colour, but assigns a level to all points of that graph. This level can be thought of as a category, which ensures that all points belonging to the same category get the same colour. As you can see in the definition of the colour scale, we assign the actual colour to this level there. This is the result:

Plotting Levels

A common task is to visualise categories or levels of measurement data. In this example, there are four different levels we could observe: CREATE, READ, UPDATE and DELETE.

# Plot with levels
p1 <- ggplot(df, aes(x,y)) +
  geom_point(aes(x=id,y=storage1,color=operations)) +
  geom_point(aes(x=id,y=storage2,color=operations)) +
  ggtitle("Overview of Measurements") +
  labs(color="Measurements") +
  scale_color_manual(values=c("CREATE"="darkgreen", 
                              "READ"="darkolivegreen", 
                              "UPDATE"="forestgreen", 
                              "DELETE"="yellowgreen"))
print(p1)

Instead of assigning two colours, one for each graph, we can also assign colours to the operations. As you can see in the definition of the graphs and the colour scale, we map the colours to the variable operations instead. As a result we get differently coloured points per operation, but of course we get them in an identical fashion for both graphs, as the categories are the same for both measurements. The result looks like this:

Now this is obviously not what we want to achieve as we cannot differentiate between the two graphs any more.

Plotting the same Levels for both Graphs in Different Colours

This last part is a bit tricky, as ggplot2 does not allow assigning different colour schemes within one plot. There exist some hacks for this, but in my opinion they do not improve the readability of the code. In order to apply different colour schemes to the two graphs while still using the categories, I appended two extra columns to the data set. If we append a differentiation between the two graphs and thereby double the categories from four to eight, so that each graph uses its own four categories, we can also assign distinct colours to them.

df$operationsStorage1 <- paste(df$operations,"-Storage1", sep = '')
df$operationsStorage2 <- paste(df$operations,"-Storage2", sep = '')

p3 <- ggplot(df, aes(x,y)) +
  geom_point(aes(x=id,y=storage1,color=operationsStorage1)) +
  geom_point(aes(x=id,y=storage2,color=operationsStorage2)) +
  ggtitle("Overview of Measurements") +
  xlab("Number of Operations") +
  ylab("Storage Demand in MB") +
  labs(color="Operations") +
  scale_color_manual(values=c("CREATE-Storage1"="darkgreen", 
                              "READ-Storage1"="darkolivegreen", 
                              "UPDATE-Storage1"="forestgreen", 
                              "DELETE-Storage1"="yellowgreen",
                              "CREATE-Storage2"="aquamarine", 
                              "READ-Storage2"="dodgerblue",
                              "UPDATE-Storage2"="royalblue",
                              "DELETE-Storage2"="turquoise"))
print(p3)

We then assign the new column of each system individually as the colour value. This ensures that each graph only considers the categories that we assigned in this step. Thus we can assign a different colour scheme for each graph and print the corresponding colours in the legend next to the chart. This is the result:

Now we can see which operation was used at every measurement and still be able to distinguish between the two systems.

New Review: iX 11/2015

Our book was reviewed by Bernhard Röhrig in issue 11/2015 of iX, the magazine for professional information technology.

The review praised the technical quality, the scope, the test database and the layout; criticism concerned the scattering of new features throughout the book as well as some minor details on specific features.

New Review: Linux Magazin – All About MySQL

The Linux Magazin published the following review in its issue 09/2015:


That this book intends to leave few questions unanswered is evident from its sheer dimensions: more than 800 pages make for a solid tome. Accordingly, the authors precede their extensive explanations of MySQL with an introduction to databases in general and familiarise the reader with the fundamental techniques of data modelling.

The second part then deals systematically with MySQL administration: it explains the architecture of the database, the installation and the built-in tools, with a detour into the topics of backup and monitoring, as well as performance optimisation with regard to hardware and configuration. Query optimisation, in turn, is given a chapter of its own. Replication, high availability and security are also discussed in separate chapters.

The very extensive third part is devoted to software development in and for MySQL: from stored procedures and SQL programming to interfaces to other programming languages. It also introduces NoSQL with MySQL from the ground up.

The authors know how to convey their material thoroughly and in a readily understandable manner. Numerous examples illustrate the content. This makes the book a recommendation for aspiring DBAs, but also for the seasoned admin, who will find it useful as a reference.

Source: Linux Magazin


More information about the current and earlier editions of the comprehensive handbook can be found here.

The Third Edition of the MySQL Handbook

After we learned in December that stocks of the second edition were running low, we authors set about planning the third, updated edition of the MySQL book. From January to April 2015, Eva Zangerle, Wolfgang Gassler and I worked on the new edition of our book. We ploughed through pages of release notes, incorporated reader feedback and requests, and of course took almost every new feature into account. Naturally, we also regenerated the data set completely from scratch and fed it with millions of entries. We re-tested all examples ourselves once more so that nothing would escape us, and fixed the odd blunder along the way.

This edition has now been published under the publisher's new name, Rheinwerk Verlag, and has been available since 25 May. You can obtain the book from well-stocked bookshops, directly from Rheinwerk Verlag (free shipping) and of course from the big retailers. Details about the third edition and the process behind a new edition can be found here.

There was always snow at Christmas – Retrieving and processing historical weather data with Forecast IO

The weather was unusually mild this winter and many people seemed to reminisce about the days of their childhood. In our memories there was always snow at Christmas, but is that actually true? The curious among us know what to do, so let's see what the Internet has to offer for verifying our hypothesis. Unfortunately, there are hardly any open weather data sources that allow retrieving the weather conditions for an arbitrary historical date. In this short blog post I would like to show you how you can combine a few APIs and a little Java programming in order to retrieve the appropriate data. The complete project can be downloaded from GitHub.

What we need

We will use the following technologies to get what we want:

  • Google Geo-Coding API to retrieve longitude and latitude coordinates
  • Forecast IO to retrieve the weather data for a specific location
  • MySQL database to store the data
  • Hibernate
  • Maven

What we want

We want to retrieve weather data for a specific location at a given time. This data should be stored in a relational database which allows processing the data conveniently.

How to get started

Recently I switched from Eclipse to IntelliJ and I am quite happy with it. Whatever IDE you prefer, create a new Maven project and copy the following dependencies into the pom.xml in order to retrieve the libraries.

<dependencies>
    <!-- Forecast IO API Wrapper -->
    <dependency>
        <groupId>com.github.dvdme</groupId>
        <artifactId>ForecastIOLib</artifactId>
        <version>1.5.1</version>
    </dependency>

    <!-- Google Geo API -->
    <dependency>
        <groupId>com.google.code.geocoder-java</groupId>
        <artifactId>geocoder-java</artifactId>
        <version>0.16</version>
    </dependency>

    <!-- Hibernate -->
    <dependency>
        <groupId>org.hibernate</groupId>
        <artifactId>hibernate-core</artifactId>
        <version>4.3.4.Final</version>
    </dependency>
    <dependency>
        <groupId>org.hibernate</groupId>
        <artifactId>hibernate-envers</artifactId>
        <version>4.3.5.Final</version>
    </dependency>

    <!-- MySQL -->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.31</version>
    </dependency>
</dependencies>

Setup the Backend

In order to store the data we want to retrieve, we will create a Java POJO mapping the data structure from the server response. We also need to set up MySQL and Hibernate. The structure of the project is shown in the following image:
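
The WeatherData POJO itself is not listed in full in this post. A reduced sketch of how such an entity could look is shown below; only a handful of fields are included, and the remaining setters used later in this post are assumed to follow the same pattern:

package WeatherData;

import java.util.Date;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

@Entity
public class WeatherData {

    @Id
    @GeneratedValue
    private long id;

    private int hour;
    private String summary;
    private double temperature;

    @Temporal(TemporalType.TIMESTAMP)
    private Date weather_timestamp;

    private String address;
    private String latitude;
    private String longitude;

    // Setters for the attributes used in this post (getters and further fields omitted)
    public void setHour(int hour) { this.hour = hour; }
    public void setSummary(String summary) { this.summary = summary; }
    public void setTemperature(double temperature) { this.temperature = temperature; }
    public void setWeather_timestamp(Date weatherTimestamp) { this.weather_timestamp = weatherTimestamp; }
    public void setAddress(String address) { this.address = address; }
    public void setLatitude(String latitude) { this.latitude = latitude; }
    public void setLongitude(String longitude) { this.longitude = longitude; }
}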

Create a new database in MySQL and assign user rights. You could also consider using an embedded database if you do not already have a running MySQL instance. The Hibernate configuration file is shown below:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE hibernate-configuration PUBLIC "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
<hibernate-configuration>
<session-factory>

<property name="hibernate.dialect">org.hibernate.dialect.MySQL5InnoDBDialect</property>
<property name="show_sql">false</property>
<!--  drop the tables every time: create-drop -->
<!--  use auto or update or create-drop -->
<property name="hibernate.hbm2ddl.auto">validate</property>

<property name="hibernate.current_session_context_class">thread</property>
<property name="format_sql">true</property>
<property name="use_sql_comments">false</property>

<property name="hibernate.connection.autocommit">true</property>

<property name="hibernate.jdbc.batch_size">20</property>

<mapping class="WeatherData.WeatherData"/>

</session-factory>

</hibernate-configuration>

The Hibernate session management is controlled with the following class. As you can see, sensitive information is read from Java property files.

public class HibernateUtil {

    private static SessionFactory sessionFactory;
    private static ServiceRegistry serviceRegistry;

    static {
        try {
            String filename = "db.properties";
            Properties prop = Helpers.readPropertyFile(filename);

            String dbhost = prop.getProperty("dbhost");
            String dbport = prop.getProperty("dbport");
            String dbname = prop.getProperty("dbname");
            String dbuser = prop.getProperty("dbuser");
            String dbpw = prop.getProperty("dbpassword");

            String mysqlString = "jdbc:mysql://" + dbhost + ":" + dbport + "/" + dbname;
            System.out.println("db string: " + mysqlString);

            Properties extraProperties = new Properties();
            extraProperties.setProperty("hibernate.connection.url", mysqlString);
            extraProperties.setProperty("hibernate.connection.username", dbuser);
            extraProperties.setProperty("hibernate.connection.password", dbpw);

            Configuration configuration = new Configuration();
            configuration = configuration.configure("hibernate.cfg.xml");
            configuration = configuration.addProperties(extraProperties);
            configuration.configure();

            serviceRegistry = new StandardServiceRegistryBuilder().applySettings(configuration.getProperties()).build();
            sessionFactory = configuration.buildSessionFactory(serviceRegistry);

        } catch (HibernateException he) {
            System.err.println("Error creating Session: " + he);
            throw new ExceptionInInitializerError(he);
        }
    }

    public static SessionFactory getSessionFactory() {
        return sessionFactory;
    }
}

The database access information is read at runtime from the file db.properties; this gives you more flexibility and easier access to the sensitive data. We will use the same technique for the API key that you need in order to use the Forecast API (see below). The property file for the database access simply contains this data:

# host address
dbhost=localhost
# port
dbport=3306
# database name
dbname=<<databasename>>
#user name
dbuser=<<username>>
# password
dbpassword=<<password>>
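
The Helpers.readPropertyFile method used above is not part of the listings in this post. A minimal sketch of how it could be implemented looks like this (it assumes the property file lies in the working directory):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class Helpers {

    /**
     * Read a property file from the working directory and return its content.
     */
    public static Properties readPropertyFile(String filename) {
        Properties prop = new Properties();
        try (FileInputStream input = new FileInputStream(filename)) {
            prop.load(input);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return prop;
    }
}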

The Hibernate snippet for storing the data is very simple: it opens a session and stores the data, which is a WeatherData object.

this.session = HibernateUtil.getSessionFactory().openSession();
this.session.beginTransaction();
this.session.save(weatherData);
this.session.getTransaction().commit();
this.session.flush();
this.session.close();

Google Geo Coder API

Google offers a convenient API which provides the geo coordinates for any specific address you supply. The following snippet shows how to retrieve the longitude and latitude coordinates:

final Geocoder geocoder = new Geocoder();
GeocoderRequest geocoderRequest = new GeocoderRequestBuilder().setAddress(locationAddress).setLanguage("en").getGeocoderRequest();
GeocodeResponse geocoderResponse = null;

try {
    geocoderResponse = geocoder.geocode(geocoderRequest);
} catch (IOException e) {
    e.printStackTrace();
}

List<GeocoderResult> geoResultList = geocoderResponse.getResults();

System.out.println("Retrieved geo");
for (GeocoderResult result : geoResultList) {
    address = result.getFormattedAddress();
    GeocoderGeometry geometry = result.getGeometry();
    latitude = String.valueOf(geometry.getLocation().getLat());
    longitude = String.valueOf(geometry.getLocation().getLng());

    System.out.println(result.getFormattedAddress() + " lat: " + latitude + " long: " + longitude);
}

Now that we have the coordinates, we can pass this data to the Forecast API.

Powered by Forecast IO

Forecast IO is a service which offers a REST API that you can call for retrieving historical weather data for any specific location. You need to register if you want to call the API, which gives you 1000 calls per day for free. The API is very well described and simple to use. We will use the Forecast IO API Wrapper in order to call the API from within Java. The API key is also stored in a property file, called api.properties. Copy your key into this file:

forecast-key=<<YOUR API CODE>>

In Java we read the key with the following snippet:

// set the API key
Helpers helper;
helper = new Helpers();
String filename = "api.properties";
Properties prop = null;
prop = Helpers.readPropertyFile(filename);
this.APIKEY = prop.getProperty("forecast-key");

Now you can access the API with the wrapper library. The code snippet retrieves the hourly weather data for a specified date.

        ForecastIO fio = new ForecastIO(this.APIKEY); //instantiate the class with the API key.
        fio.setUnits(ForecastIO.UNITS_SI);             //sets the units as SI - optional

        fio.setLang(ForecastIO.LANG_ENGLISH);

        fio.setTime(dateString);

        fio.getForecast(latitude, longitude);
      //  System.out.println("Latitude: " + fio.getLatitude());
      //  System.out.println("Longitude: " + fio.getLongitude());
      //  System.out.println("Timezone: " + fio.getTimezone());

        String key ="";
        String value ="";

        FIOHourly hourly = new FIOHourly(fio);

We then need to store the data in a Java object in order to persist it in the database.

        for (int i = 0; i < hourly.hours(); i++) {
            String[] h = hourly.getHour(i).getFieldsArray();
            String hour = String.valueOf(i + 1);
            System.out.println("Hour #" +hour);

            /**
             * Populate the map of data values
             */
            weatherDataHashMap.clear();
            weatherDataHashMap.put("Hour",hour);
            for (int j = 0; j < h.length; j++){

                key = h[j];
                value = hourly.getHour(i).getByKey(h[j]);
                if (value == null){
                    System.out.println("value war NULL");
                    value="";
                }
                System.out.println(key + ": " + value);
                System.out.println("\n");

                weatherDataHashMap.put(key,value);

            }


In a last step we need to populate the Java object and persist it in the database.

            /**
             * Create the weather object
             */
            WeatherData hourData = new WeatherData();
            System.out.println("---------- " + weatherDataHashMap.get("Hour"));
            hourData.setHour(this.parseIntValue(weatherDataHashMap.get("Hour")));
            hourData.setSummary(weatherDataHashMap.get("summary"));
            hourData.setIcon(weatherDataHashMap.get("icon"));
            hourData.setWindspeed(this.parseDoubleValue(weatherDataHashMap.get("windSpeed")));
            Date measureData = this.getDateFromString(weatherDataHashMap.get("time"));
            hourData.setWeather_timestamp(measureData);
            hourData.setHumidity(this.parseDoubleValue(weatherDataHashMap.get("humidity")));
            hourData.setVisibility(this.parseDoubleValue(weatherDataHashMap.get("visibility")));
            hourData.setWindBearing(this.parseIntValue(weatherDataHashMap.get("windBearing")));
            hourData.setApparentTemperature(this.parseDoubleValue(weatherDataHashMap.get("apparentTemperature")));
            hourData.setPrecipProbability(this.parseDoubleValue(weatherDataHashMap.get("precipProbability")));
            hourData.setPrecipIntensity(this.parseDoubleValue(weatherDataHashMap.get("precipIntensity")));
            hourData.setDewPoint(this.parseDoubleValue(weatherDataHashMap.get("dewPoint")));
            hourData.setTemperature(this.parseDoubleValue(weatherDataHashMap.get("temperature")));
            hourData.setPrecipType(this.removeQuotes(weatherDataHashMap.get("precipType")));
            hourData.setAddress(address);
            hourData.setLatitude(latitude);
            hourData.setLongitude(longitude);

            this.persist(hourData);
        }
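
The small conversion helpers used above (parseIntValue, parseDoubleValue, getDateFromString and removeQuotes) are not included in this post either. Defensive sketches could look like the following; the Forecast IO time field is assumed to be a Unix timestamp in seconds:

    // Sketches of the helper methods used above (assumed implementations)
    private int parseIntValue(String value) {
        try {
            return Integer.parseInt(removeQuotes(value));
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    private double parseDoubleValue(String value) {
        try {
            return Double.parseDouble(removeQuotes(value));
        } catch (NumberFormatException e) {
            return 0.0;
        }
    }

    private String removeQuotes(String value) {
        if (value == null) {
            return "";
        }
        return value.replace("\"", "");
    }

    private Date getDateFromString(String unixSeconds) {
        try {
            // Forecast IO delivers the time as a Unix timestamp in seconds
            return new Date(Long.parseLong(removeQuotes(unixSeconds)) * 1000L);
        } catch (NumberFormatException e) {
            return null;
        }
    }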



If you loop over the data you can retrieve a lot of interesting weather data from the service and perform your analysis. You can use the MySQL Workbench directly or export the data, for instance, into CSV.


The Chart

Now that we have the data, we can use a visualization API such as Google Charts (https://developers.google.com/chart/) for displaying and interpreting the data. All you need to do is export the data into an appropriate format. The following SQL statement retrieves all measurements at 11 p.m. for the available Christmas days and transforms them into the format which is expected by the Google Charts API.

SELECT DISTINCT CONCAT("[new Date (",YEAR(weather_timestamp),").12.24,",temperature,"],") FROM weatherDB.WeatherData WHERE hour=23 ORDER BY weather_timestamp DESC;



You can then use a few lines of HTML and JavaScript code to render the data with the Google Charts API.

Obviously you could also automate the creation of the chart as well. The complete example shown in this post can be downloaded from my GitHub page.

Virtual Machines – The Perfect Test Environment

If you really want to get to grips with MySQL, you naturally have to try everything out, turn every knob and test every little setting in detail. Since intensive tests should not be carried out on a production system, it makes sense to switch to a virtual environment that is isolated from the actual system. Such a test environment can be set up very easily with the free software VirtualBox. I have written a detailed guide on this in the blog for the MySQL book.

MySQL on DigitalOcean Servers Crashing

The smallest instance of a DigitalOcean droplet only provides 512 MB of RAM. Do not forget that this is rather low in comparison with other Web servers. If you do not check your configuration files for Apache and MySQL and leave them at their defaults, you can easily run out of memory, because DO droplets come without swap by default. In the case of MySQL, you will be confronted with this error message:

InnoDB: Fatal error: cannot allocate memory for the buffer pool

The solution is rather simple, as provided by a comment in the support forum. Just follow this guide in order to enable some swap space. As the system resides completely on an SSD, you should not notice a huge performance impact.

MySQL 5.6 – The Second Edition of the Handbook for Administrators

Fanfare, joy, fireworks! A few days ago the second edition of our MySQL administration handbook was published. Eva Zangerle, Wolfgang Gassler and I got back to work over the last months and incorporated all the important and multifaceted new features of MySQL version 5.6 into a new edition. The first version of this book was published two years ago. We received great feedback, discussed with readers and learned a lot ourselves in the process. Not only the book cover is fresh and shines in new splendour; the brand-new edition also covers a great many new features. We adapted the sometimes subtle changes of variables, settings and tuning parameters to the current version 5.6, and we also reworked entire chapters. In the second edition we included completely new topics and features that distinguish the new GA version of MySQL.

The most notable new features covered in this edition include:

  •  the new NoSQL interface for scalable web applications
  •  the full-text index, now also available for InnoDB
  •  the new PERFORMANCE_SCHEMA for performance analysis
  •  current programming interfaces such as Node.js
  •  extended security features

Of course, these new features are presented and explained using the well-established, continuous practical example. Current details and news can be found on our book blog <www.mysqladmin.at>