Gnuplot and the Beta Distribution

I recently discovered gnuplot, a command-line plotting tool, while trying to get some work done with Maxima, a computer algebra system. Gnuplot doesn’t make plots as pretty as R’s ggplot2 or d3, but it is flexible and quickly gives you a visual impression of your math functions. Here is a little animation that I made with the Beta distribution: code and result.


Visualizing networks with ggplot2 in R

When I had to visualize some network data last semester in my social network analysis class, I wasn’t happy with the plot function in R’s sna package. It is not very flexible and doesn’t let you modify the graph figure easily. Thus, I decided to write a little function to visualize network data with the ggplot2 engine.

The biggest challenge in network visualization is usually to come up with the coordinates of the nodes in two-dimensional space. The sna package provides a set of functions that calculate coordinates that are optimal with respect to some criterion. Two of the most prominent algorithms (Fruchterman & Reingold’s force-directed placement algorithm and Kamada-Kawai’s) are implemented in the sna package functions gplot.layout.fruchtermanreingold and gplot.layout.kamadakawai. Both can be used in my function below.
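As a minimal sketch of what these layout functions return (assuming the sna package is installed; the 10-node random graph from rgraph is only an illustration and not part of my data), both can be called directly on a sociomatrix and give back one row of x/y coordinates per node:

```r
library(sna)

# Illustrative input: a random undirected sociomatrix with 10 nodes.
set.seed(1)
m <- rgraph(10, tprob = 0.3, mode = "graph")

# Each layout function takes the sociomatrix and a list of layout
# parameters (NULL = defaults) and returns an n x 2 coordinate matrix.
xy_fr <- gplot.layout.fruchtermanreingold(m, NULL)
xy_kk <- gplot.layout.kamadakawai(m, NULL)

print(dim(xy_fr))
print(dim(xy_kk))
```

Because both functions share the same calling convention, swapping one for the other only changes where the nodes end up, not the rest of the plotting code.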

In the first part of the function, the layout function calculates the coordinates for every node in two-dimensional space. In lines 14 to 18 the function takes the node coordinates and combines them with the edge list to produce the coordinate pairs that describe the edges in the network.
In the middle part the data are passed to the ggplot function and used to plot the nodes (a set of points) and edges (a set of segments). In lines 26 to 30 I discard the default grid and other default layout elements from the ggplot figure. The last part of the code generates a random network and passes it to the plot function.

library(network)
library(ggplot2)
library(sna)
library(ergm)


plotg <- function(net, value=NULL) {
	m <- as.matrix.network.adjacency(net) # get sociomatrix
	# get coordinates from Fruchterman and Reingold's force-directed placement algorithm.
	plotcord <- data.frame(gplot.layout.fruchtermanreingold(m, NULL)) 
	# or get them from Kamada-Kawai's algorithm: 
	# plotcord <- data.frame(gplot.layout.kamadakawai(m, NULL)) 
	colnames(plotcord) <- c("X1","X2")
	edglist <- as.matrix.network.edgelist(net)
	edges <- data.frame(plotcord[edglist[,1],], plotcord[edglist[,2],])
	plotcord$elements <- as.factor(get.vertex.attribute(net, "elements"))
	colnames(edges) <-  c("X1","Y1","X2","Y2")
	edges$midX  <- (edges$X1 + edges$X2) / 2
	edges$midY  <- (edges$Y1 + edges$Y2) / 2
	pnet <- ggplot()  + 
			geom_segment(aes(x=X1, y=Y1, xend = X2, yend = Y2), 
				data=edges, size = 0.5, colour="grey") +
			geom_point(aes(X1, X2,colour=elements), data=plotcord) +
			scale_colour_brewer(palette="Set1") +
			scale_x_continuous(breaks = NULL) + scale_y_continuous(breaks = NULL) +
			# discard default grid + titles in ggplot2 
			theme(panel.background = element_blank()) + theme(legend.position="none")+
			theme(axis.title.x = element_blank(), axis.title.y = element_blank()) +
			theme(legend.background = element_rect(colour = NA)) + 
			theme(panel.background = element_rect(fill = "white", colour = NA)) + 
			theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank())
	return(print(pnet))
}


g <- network(150, directed=FALSE, density=0.03)
classes <- rbinom(150,1,0.5) + rbinom(150,1,0.5) + rbinom(150,1,0.5)
set.vertex.attribute(g, "elements", classes)

plotg(g)
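The step that turns node coordinates into edge segments can be tried in isolation with base R only. This is a small sketch with a made-up 4-node layout and a made-up 3-edge list (both hypothetical, chosen so the result is easy to check by eye); it uses the same indexing trick as plotg:

```r
# Hypothetical layout: one row per node, columns are the x and y positions.
plotcord <- data.frame(X1 = c(0, 1, 1, 0), X2 = c(0, 0, 1, 1))

# Hypothetical edge list: each row holds the indices of an edge's two endpoints.
edglist <- matrix(c(1, 2,
                    2, 3,
                    3, 1), ncol = 2, byrow = TRUE)

# Look up the coordinates of the first and the second endpoint of every
# edge and bind them side by side: one row per edge, four columns.
edges <- data.frame(plotcord[edglist[, 1], ], plotcord[edglist[, 2], ])
colnames(edges) <- c("X1", "Y1", "X2", "Y2")
rownames(edges) <- NULL

print(edges)
```

Each row of edges now holds a start point (X1, Y1) and an end point (X2, Y2), which is the shape geom_segment expects.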

I was too lazy to make this function more general (and user friendly). That’s why, for most practical purposes, it needs to be modified to produce pretty visualizations. Nevertheless, I hope that it provides a useful jumping-off point for others. Some of my plots from the class are below. I included them to show the flexibility you gain by using the ggplot2 engine instead of sna’s default plot function. Unfortunately I can’t post the data for these networks.

Update: Another interesting approach to visualizing geocoded network data in R is explained in the FlowingData blog.


Graphics with Processing

When I want to visualize data, I mostly use the ggplot2 library in R. My impression is that this package provides the most advanced data visualization environment in R. But there is one thing that you can’t do with ggplot2: creating interactive graphics.

If I need an interactive graphic, Processing is my choice. Processing is a dialect of Java and provides a (simple) programming environment for creating data visualizations. The cool thing about Processing is that you can display your interactive graphics on a website using the JavaScript library processing.js and the HTML5 canvas-tag.

There are a fair number of tutorials on the web. I recommend getting your hands on Visualizing Data, written by one of the developers of Processing, Ben Fry. It has some useful examples and is written in a very accessible way.

Other tutorials I found useful:

There is one thing that took me a while to figure out: if you use processing.js, you need to put the Processing code inside the HTML body tag! If it is outside (in the head tag), it won’t work.