Graphics with Processing

When I want to visualize data, I mostly use the ggplot2 library in R. My impression is that this package provides the most advanced data visualization environment in R. But there is one thing you can’t do with ggplot2: create interactive graphics.

If I need an interactive graphic, Processing is my choice. Processing is a dialect of Java and provides a (simple) programming environment for creating data visualizations. The cool thing about Processing is that you can display your interactive graphics on a website using the JavaScript library processing.js and the HTML5 canvas tag.
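
To give an impression of how compact an interactive sketch can be, here is a minimal Processing example (just an illustration, not from any particular tutorial) that draws a circle following the mouse:

    // minimal interactive sketch: a circle follows the mouse
    void setup() {
      size(400, 300);
    }

    void draw() {
      background(255);                  // clear the frame
      fill(0, 102, 153);                // fill color of the circle
      ellipse(mouseX, mouseY, 30, 30);  // mouseX/mouseY track the cursor
    }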

There is a fair number of tutorials on the web. I recommend getting your hands on Visualizing Data, written by one of the developers of Processing, Ben Fry. It has some very useful examples and is written in a super accessible way.

Other tutorials I found useful:

There is one thing that took me a while to figure out: if you use processing.js, you need to put the Processing code inside the HTML body tag! If it is outside (in the head tag), it won’t work.
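
Here is a minimal sketch of what such a page might look like (mysketch.pde is a placeholder for your own sketch file; processing.js also supports inlining the Processing code in a script tag):

    <!DOCTYPE html>
    <html>
      <head>
        <!-- loading the library itself in the head is fine -->
        <script src="processing.js"></script>
      </head>
      <body>
        <!-- the sketch and its canvas must sit inside the body tag -->
        <canvas data-processing-sources="mysketch.pde"></canvas>
      </body>
    </html>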


How MySQL supports data collection and analysis

Frequently I scrape textual data from the web or from digital documents (e.g. PDFs) and combine these data with other data for an analysis. I use not only Python for scraping and refining, but also R, and I usually switch between both environments during data collection, refining, and analysis. A typical workflow goes like this: scrape the data and do a basic cleanup (strip HTML etc.), send the data to R for some aggregation and restructuring (merge with other data etc.), send the data to Processing to make some nice interactive graphics (if necessary), and finally go back to R and run a model. The key question is: how do you exchange the data between these three environments? The most obvious answer: use CSV files. But in my experience, a local MySQL database is more useful, since it makes it easier to:

  • subset the data and selectively import/export data to/from each environment (see the sketch after this list)
  • search the data directly without importing the full dataset (R, for example, is super slow at searching text vectors)
  • use separate tables to make every refining step reversible
  • migrate the data to the web later
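
To give a rough idea of the first two points, here is a minimal sketch in Python using the MySQL-Python driver mentioned below. The database, table, and credentials are made-up placeholders, not a fixed recipe:

    import MySQLdb

    # connect to the local database (credentials are placeholders)
    db = MySQLdb.connect(host="localhost", user="root", passwd="secret",
                         db="scraped_texts")
    cur = db.cursor()

    # store a scraped document
    cur.execute("INSERT INTO documents (url, text) VALUES (%s, %s)",
                ("http://example.com", "some scraped text ..."))
    db.commit()

    # search the text inside MySQL instead of loading the full
    # dataset into R or Python first
    cur.execute("SELECT url FROM documents WHERE text LIKE %s", ("%keyword%",))
    for (url,) in cur.fetchall():
        print url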

Of course, all these advantages only really apply if you work with big datasets. On the other hand, if you work with textual data, you will certainly approach “big” quickly.

There are tons of tutorials on the web explaining how to set up MySQL and use it with Python, R, and Processing. Here is a list of those that I found most helpful at the beginning:

Some hints:

  • Install the 5.1 version of MySQL – not 5.5! It looks like the new version has some bugs (see also here). After the installation, my MySQL server didn’t start at all.
  • If you get an error while installing MySQL-Python (the driver to connect from Python to MySQL) via easy_install, use this instead (replace XYZ with your MySQL version!):

        PATH=$PATH:/usr/local/mysql-XYZ/bin sudo easy_install -Z MySQL-python

    No worries, this modifies your PATH only for this one command – not permanently! (Source)
  • If you want to play around without installing MySQL, download XAMPP and create a socket using

        sudo ln -s /tmp/mysql.sock /Applications/XAMPP/xamppfiles/var/mysql/mysql.sock

    That way, R/Python/Processing can connect to it.
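
To check that the connection actually works, a quick test from Python might look like this (a minimal sketch; user and password are placeholders for your own setup, and the socket path may differ):

    import MySQLdb

    # connect through the local socket (path and credentials are placeholders)
    db = MySQLdb.connect(unix_socket="/tmp/mysql.sock", user="root", passwd="")
    cur = db.cursor()

    # if this prints the server version, the setup works
    cur.execute("SELECT VERSION()")
    print cur.fetchone()[0]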