How to change your Dropbox folder name

Somebody wrote a nice Python script to change the name of a Dropbox folder (currently the preference settings only let you change the location of the folder, but not its name!). Here is the forum link with the script. You can also do it manually if you have a Mac. Just navigate to ~/.dropbox and spot the .db file (on my machine it's config.db). Open it with sqlite3 ~/.dropbox/config.db and change the 'value' column in the row where key='dropbox_path' to the desired path, e.g. via:

UPDATE config SET value='~/Cloud' WHERE key='dropbox_path';
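If you prefer to script the edit rather than type SQL at the sqlite3 prompt, the same update can be done with Python's standard sqlite3 module. This is just a sketch: the function name is mine, and it assumes Dropbox is not running while you touch config.db.

```python
import sqlite3

def update_dropbox_path(db_file, new_path):
    """Rewrite the 'dropbox_path' row in a Dropbox config.db.

    db_file is typically ~/.dropbox/config.db; quit Dropbox before editing.
    """
    conn = sqlite3.connect(db_file)
    with conn:  # commits the transaction on success
        conn.execute(
            "UPDATE config SET value = ? WHERE key = 'dropbox_path'",
            (new_path,),
        )
    conn.close()
```

Using a `?` placeholder instead of pasting the path into the SQL string avoids quoting problems if the new path contains special characters.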

How to set up R and JAGS on Amazon EC2

Yesterday I set up an Amazon EC2 machine and installed R, JAGS and rjags. Unfortunately, I was unable to compile the latest version of JAGS: the configure step failed with an error saying that a specific LAPACK library was missing. I tried to install the missing library manually, but for some reason JAGS didn't recognize the new version. After an hour, I decided to go with the older JAGS version. Here is how I installed all three packages.

[If you have never worked on an EC2 machine: here are two helpful tutorials [1, 2] on how to start an EC2 instance and install the command line tools on your computer to access it via ssh.]

Log in via ssh. In a first step we install an older version of R; the older version is necessary in order to get JAGS 1.0.4 running later. After downloading the R source file, we unzip it, cd into the folder and install a couple of additional libraries and compilers that we need to compile R (and later JAGS). We then configure the R installer without any X-window support and install everything. The final make install makes sure that you can start R by simply typing 'R'.

cd ~
wget http://cran.r-project.org/src/base/R-2/R-2.10.0.tar.gz
tar xf R-2.10.0.tar.gz
cd ~/R-2.10.0
sudo yum install gcc
sudo yum install gcc-c++
sudo yum install gcc-gfortran
sudo yum install readline-devel
sudo yum install make
./configure --with-x=no
make
sudo make install

Next, try to start R and install the coda package:

R
install.packages("coda")

Now we proceed with installing JAGS. First, get some libraries and the JAGS source code. Unzip it, cd into the folder and run configure, specifying the shared library folder. Specifying the shared library path is crucial; otherwise rjags fails to load in R.

sudo yum install blas-devel lapack-devel
cd ~/download/
tar xf JAGS-1.0.4.tar.gz
cd ~/download/JAGS-1.0.4
./configure --libdir=/usr/local/lib64
make
sudo make install

In a last step, download the appropriate rjags version and install it.

cd ~/download/
R CMD INSTALL --configure-args='--with-jags-modules=/usr/local/lib64/JAGS/modules/' rjags_1.0.3-13.tar.gz

Sources of inspiration. Here, here, here and here.

Visualizing networks with ggplot2 in R

When I had to visualize some network data last semester in my social network analysis class, I wasn't happy with the plot function in R's sna package. It is not very flexible and doesn't allow much customization of the figure. Thus, I decided to write a little function to visualize network data with the ggplot2 engine.

The biggest challenge in network visualization is usually coming up with the coordinates of the nodes in two-dimensional space. The sna package relies on a set of functions that calculate optimal coordinates with respect to some criterion. Two of the most prominent algorithms (Fruchterman & Reingold's force-directed placement algorithm and Kamada & Kawai's algorithm) are implemented in the sna functions gplot.layout.fruchtermanreingold and gplot.layout.kamadakawai. Both can be used in my function below.

In the first part of the function, the layout function calculates the coordinates for every node in two-dimensional space. The function then takes the node coordinates and combines them with the edge list data to come up with the coordinate pairs that characterize the edges in the network.
In the middle part the data are passed to the ggplot function and used to plot the nodes (a set of points) and the edges (a set of segments). The series of opts() calls discards the default grid and other default layout elements from the ggplot figure. The last part of the code generates a random network and passes it to the plot function.
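The edge-coordinate step (pasting plotcord[edglist[,1],] next to plotcord[edglist[,2],]) is just index lookups. In Python, with made-up coordinates and edges, the same idea amounts to:

```python
# Sketch of the edge-coordinate step: given node coordinates and an edge
# list of node-index pairs, build one (X1, Y1, X2, Y2) row per edge so that
# each row holds both endpoints of one segment.
coords = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]  # node positions (hypothetical)
edgelist = [(0, 1), (1, 2)]                    # edges as index pairs

segments = [coords[i] + coords[j] for i, j in edgelist]
```

Each row of segments can then be drawn as one line segment, which is exactly what geom_segment does with the edges data frame below.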


library(network)
library(sna)
library(ggplot2)

plotg <- function(net, value=NULL) {
	m <- as.matrix.network.adjacency(net) # get sociomatrix
	# get coordinates from Fruchterman and Reingold's force-directed placement algorithm.
	plotcord <- data.frame(gplot.layout.fruchtermanreingold(m, NULL))
	# or get them from Kamada-Kawai's algorithm:
	# plotcord <- data.frame(gplot.layout.kamadakawai(m, NULL))
	colnames(plotcord) = c("X1","X2")
	edglist <- as.matrix.network.edgelist(net) # get edge list
	edges <- data.frame(plotcord[edglist[,1],], plotcord[edglist[,2],])
	plotcord$elements <- as.factor(get.vertex.attribute(net, "elements"))
	colnames(edges) <-  c("X1","Y1","X2","Y2")
	edges$midX  <- (edges$X1 + edges$X2) / 2
	edges$midY  <- (edges$Y1 + edges$Y2) / 2
	pnet <- ggplot()  + 
			geom_segment(aes(x=X1, y=Y1, xend = X2, yend = Y2), 
				data=edges, size = 0.5, colour="grey") +
			geom_point(aes(X1, X2,colour=elements), data=plotcord) +
			scale_colour_brewer(palette="Set1") +
			scale_x_continuous(breaks = NA) + scale_y_continuous(breaks = NA) +
			# discard default grid + titles in ggplot2 
			opts(panel.background = theme_blank()) + opts(legend.position="none")+
			opts(axis.title.x = theme_blank(), axis.title.y = theme_blank()) +
			opts( legend.background = theme_rect(colour = NA)) + 
			opts(panel.background = theme_rect(fill = "white", colour = NA)) + 
			opts(panel.grid.minor = theme_blank(), panel.grid.major = theme_blank())
	return(print(pnet))
}

g <- network(150, directed=FALSE, density=0.03)
classes <- rbinom(150,1,0.5) + rbinom(150,1,0.5) + rbinom(150,1,0.5)
set.vertex.attribute(g, "elements", classes)
plotg(g)


I was too lazy to make this function more general (and more user-friendly). That's why, for most practical purposes, it needs to be modified to produce pretty visualizations, but nevertheless I hope it provides a useful jumping-off point for others. Some of my plots from the class are below; I included them to show the flexibility of using the ggplot2 engine instead of sna's default plot function. Unfortunately, I can't post the data for these networks.

Update: Another interesting approach to visualizing geocoded network data in R is explained in the FlowingData blog.

Chrome Browser History in R

Visualizing the number of visits per website over the months is pretty easy. I took the following steps to extract the data and visualize them.

1) Find the data. Google Chrome stores the browsing history in SQLite files. If you have a Mac, you can easily look into them: first navigate in the Terminal to the folder with the files (usually ~/Library/Application\ Support/Google/Chrome/Default) and then type sqlite3 Archived\ History. This opens the SQLite client. You can now look at the layout of the table visits in the file Archived History via:

.schema visits
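The same schema information is available programmatically, because .schema just reads the sqlite_master table. Here is a small sketch (the function name is mine):

```python
import sqlite3

def table_schema(db_file, table):
    """Return the CREATE TABLE statement stored in sqlite_master,
    i.e. what .schema prints in the sqlite3 client."""
    conn = sqlite3.connect(db_file)
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    conn.close()
    return row[0] if row else None
```

Calling table_schema("Archived History", "visits") should print the same definition that .schema visits shows interactively.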

2) Extract the data. Next, I wanted a csv file with the timestamp of the visit and the URL. I extracted the data with a modified Python script that I found here.

import os
import datetime
import sqlite3
import codecs
import re

# regular expression to extract the domain from a URL
pattern = r"(((http)|(https))(://)(www.)|().*?)\.[a-z]*/"

SQL_STATEMENT = 'SELECT urls.url, visits.visit_time FROM visits, urls WHERE visits.url = urls.id;'

storage = codecs.open('out.csv', 'w', 'utf-8')

def date_from_webkit(webkit_timestamp):
    # Chrome stores timestamps as microseconds since 1601-01-01
    epoch_start = datetime.datetime(1601, 1, 1)
    delta = datetime.timedelta(microseconds=int(webkit_timestamp))
    return epoch_start + delta

paths = ["~/Archived History", "~/History"]

for path in paths:
    c = sqlite3.connect(os.path.expanduser(path))
    for row in c.execute(SQL_STATEMENT):
        date_time = date_from_webkit(row[1])
        url = re.search(pattern, row[0])
        try:
            urlc = url.group(0)
        except AttributeError:
            urlc = "ERROR"
        storage.write(str(date_time)[0:19] + "\t" + urlc + "\n")

The script opens two history files from the Google Chrome browser and selects two variables from them (urls.url, visit_time). Using the function date_from_webkit it converts the timestamp variable to a more readable format ("%Y-%m-%d %H:%M:%S"). It also extracts the domain of the URL using the regular expression defined in the variable pattern. The last step writes out a csv file with a timestamp column and a shortened URL.
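To see why the 1601 epoch matters, here is the conversion helper on its own with a quick sanity check (the sample timestamp below is made up, not taken from a real history file):

```python
import datetime

def date_from_webkit(webkit_timestamp):
    # Chrome/WebKit timestamps count microseconds since 1601-01-01 00:00,
    # not seconds since the Unix epoch (1970-01-01).
    epoch_start = datetime.datetime(1601, 1, 1)
    return epoch_start + datetime.timedelta(microseconds=int(webkit_timestamp))

# a timestamp of 0 is the epoch itself
assert date_from_webkit(0) == datetime.datetime(1601, 1, 1)
```

Feeding a Unix-style epoch into this function would land you somewhere in the 17th century, which is an easy way to spot when the wrong epoch is being assumed.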

3) Visualize. The output of the Python script can easily be imported into R for any kind of analysis. I made the graphic above with the following code:


# Load the required packages
library(plyr)
library(reshape)
library(ggplot2)

# Import the data
data <- read.csv("out.csv", sep="\t")
colnames(data) <- c("time","url")
data$time <- as.POSIXct(data$time)
data$day <- format(data$time, format="%Y-%m-%d")

# Count visits per day
lgrep <- function(x, pat){ c <- grep(pat, x$url); return(length(c)) } <- ddply(data, .(day), "lgrep", pat="", .progress="text")
counts.mail <- ddply(data, .(day), "lgrep", pat="", .progress="text")
counts.facebook <- ddply(data, .(day), "lgrep", pat="facebook", .progress="text") <- ddply(data, .(day), "lgrep", pat="spiegel", .progress="text")
counts.nytimes <- ddply(data, .(day), "lgrep", pat="nytimes", .progress="text") <- ddply(data, .(day), "lgrep", pat="wikipedia", .progress="text")
counts.leo <- ddply(data, .(day), "lgrep", pat="dict.leo", .progress="text")
counts.hulu <- ddply(data, .(day), "lgrep", pat="hulu", .progress="text")

# Make new data.frame
df <- data.frame($day,$lgrep, GMail = counts.mail$lgrep, Facebook=counts.facebook$lgrep,$lgrep, NYTimes=counts.nytimes$lgrep,$lgrep, Leo=counts.leo$lgrep, hulu=counts.hulu$lgrep) 
em <- melt(df, id = "day")

# Plot it 
ggplot(aes(as.Date(day), value, colour = variable), data = em) + 
	scale_x_date('') + 
	stat_smooth() + 
	scale_y_continuous('visits') + 
	geom_line(alpha=0.10) +  
	geom_point(alpha=0.20) + 
	opts(legend.title = theme_text(colour = 'white', size = 0)) +