Visualizing networks with ggplot2 in R

When I had to visualize some network data last semester in my social network analysis class, I wasn't happy with the plot function in R's sna package. It is not very flexible and makes it hard to customize the figure. So I decided to write a little function to visualize network data with the ggplot2 engine.

The biggest challenge in network visualization is usually coming up with coordinates for the nodes in two-dimensional space. The sna package relies on a set of layout functions that calculate coordinates that are optimal with respect to some criterion. Two of the most prominent algorithms are implemented in the sna functions gplot.layout.fruchtermanreingold and gplot.layout.kamadakawai. Fruchterman and Reingold's force-directed placement algorithm lets all nodes repel each other while edges pull connected nodes together. Kamada and Kawai's algorithm places nodes so that their distances in the plot approximate their shortest-path distances in the network. Both can be used in my function below.
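
To make the force-directed idea more concrete, here is a minimal base-R sketch of the Fruchterman-Reingold model. The function name fr_layout and its parameters (number of iterations, seed, starting temperature, cooling rate) are my own illustrative choices and not part of sna. It is a simplified version that skips the frame constraints of the original algorithm. sna's gplot.layout.fruchtermanreingold is implemented differently and takes its tuning parameters through its layout.par argument, so the two will not produce identical coordinates.

```r
# Simplified Fruchterman-Reingold force-directed layout (illustrative sketch,
# not sna's implementation).
# adj: symmetric 0/1 adjacency matrix; returns an n x 2 matrix of coordinates.
fr_layout <- function(adj, niter = 300, seed = 42) {
  n <- nrow(adj)
  set.seed(seed)
  pos <- matrix(runif(2 * n, -1, 1), ncol = 2)  # random start in [-1, 1]^2
  k <- sqrt(4 / n)   # ideal distance: sqrt(area / n), area of [-1, 1]^2 is 4
  temp <- 0.2        # maximum step per iteration, shrinks as the layout "cools"
  for (iter in seq_len(niter)) {
    disp <- matrix(0, n, 2)
    for (v in seq_len(n)) {
      # vectors pointing from every node u to node v
      delta <- matrix(pos[v, ], n, 2, byrow = TRUE) - pos
      dist <- sqrt(rowSums(delta^2))
      dist[v] <- Inf                # a node exerts no force on itself
      dist <- pmax(dist, 1e-9)      # avoid division by zero for coincident nodes
      repel <- k^2 / dist           # every pair of nodes pushes apart
      attract <- ifelse(adj[v, ] > 0, dist^2 / k, 0)  # edges pull together
      attract[v] <- 0
      # net displacement of v: unit vectors weighted by (repulsion - attraction)
      disp[v, ] <- colSums(delta / dist * (repel - attract))
    }
    step <- pmax(sqrt(rowSums(disp^2)), 1e-9)
    pos <- pos + disp / step * pmin(step, temp)  # move at most `temp` per node
    temp <- temp * 0.98                          # geometric cooling schedule
  }
  pos
}

# Usage: lay out a ring of six nodes
ring <- matrix(0, 6, 6)
for (i in 1:6) {
  j <- i %% 6 + 1
  ring[i, j] <- ring[j, i] <- 1
}
xy <- fr_layout(ring)
```

The cooling step matters. Early iterations can move nodes a lot to untangle the graph, and later iterations only make small adjustments, so the layout settles instead of oscillating.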

In the first part of the function, the layout function calculates the coordinates of every node in two-dimensional space. The next few lines combine these node coordinates with the edge list. As a result, each edge is described by the coordinates of its start node and the coordinates of its end node.
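
The coordinate lookup for the edges is just row indexing. Every row of the edge list holds the indices of the two nodes it connects, so indexing the coordinate data frame with the first and second column gives the start and end point of each segment. Here is a small sketch with made-up coordinates for four nodes. The values are purely illustrative, and the sketch does not need the network package:

```r
# Made-up layout for four nodes: column X1 holds x, column X2 holds y
plotcord <- data.frame(X1 = c(0, 1, 1, 0), X2 = c(0, 0, 1, 1))

# Edge list: one row per edge, giving the indices of its two end nodes
edglist <- matrix(c(1, 2,
                    2, 3,
                    3, 4), ncol = 2, byrow = TRUE)

# Look up the coordinates of both end nodes of every edge
edges <- data.frame(plotcord[edglist[, 1], ], plotcord[edglist[, 2], ])
colnames(edges) <- c("X1", "Y1", "X2", "Y2")  # start (X1, Y1), end (X2, Y2)
print(edges)
```

Note that the column names mean different things in the two data frames. In the node table, X1 and X2 are the x and y coordinates. In the edge table, (X1, Y1) is the start point and (X2, Y2) is the end point of a segment.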
In the middle part, the data are passed to the ggplot function, which draws the nodes as a set of points and the edges as a set of segments. The last few calls in the plot specification discard ggplot's default grid, axis titles and legend. The last part of the code generates a random network and passes it to the plot function.
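
The plotting step boils down to two layers: segments for the edges and points for the nodes. Here is a minimal sketch with hand-made toy data: four nodes in two groups and three edges, with all values invented for illustration. It uses theme() and element_blank() to strip the grid, axis titles and legend. These replaced opts() and theme_blank() in later ggplot2 releases.

```r
library(ggplot2)

# Toy node table (X1 = x, X2 = y) and edge table (start and end points)
nodes <- data.frame(X1 = c(0, 1, 1, 0), X2 = c(0, 0, 1, 1),
                    elements = factor(c("a", "a", "b", "b")))
edges <- data.frame(X1 = c(0, 1, 1), Y1 = c(0, 0, 1),
                    X2 = c(1, 1, 0), Y2 = c(0, 1, 1))

p <- ggplot() +
  # edges first, so the node markers are drawn on top of the line ends
  geom_segment(aes(x = X1, y = Y1, xend = X2, yend = Y2),
               data = edges, colour = "grey") +
  geom_point(aes(X1, X2, colour = elements), data = nodes, size = 3) +
  scale_x_continuous(breaks = NULL) + scale_y_continuous(breaks = NULL) +
  # plain white panel without grid, axis titles or legend
  theme(panel.background = element_rect(fill = "white", colour = NA),
        panel.grid = element_blank(),
        axis.title = element_blank(),
        legend.position = "none")
print(p)
```

ggplot2 renders layers in the order they are added. Adding the segments before the points keeps the nodes visible where several edges meet.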

```r
library(network)
library(ggplot2)
library(sna)
library(ergm)


plotg <- function(net, value = NULL) {  # `value` is currently unused
	m <- as.matrix.network.adjacency(net) # get sociomatrix
	# get coordinates from Fruchterman and Reingold's force-directed placement algorithm.
	plotcord <- data.frame(gplot.layout.fruchtermanreingold(m, NULL))
	# or get them from Kamada-Kawai's algorithm:
	# plotcord <- data.frame(gplot.layout.kamadakawai(m, NULL))
	colnames(plotcord) <- c("X1", "X2")  # X1 = x, X2 = y of each node
	edglist <- as.matrix.network.edgelist(net)  # one row per edge: tail, head
	# look up the coordinates of both end nodes of every edge
	edges <- data.frame(plotcord[edglist[, 1], ], plotcord[edglist[, 2], ])
	colnames(edges) <- c("X1", "Y1", "X2", "Y2")
	plotcord$elements <- as.factor(get.vertex.attribute(net, "elements"))
	# edge midpoints (not used below, but handy for edge labels)
	edges$midX <- (edges$X1 + edges$X2) / 2
	edges$midY <- (edges$Y1 + edges$Y2) / 2
	pnet <- ggplot() +
			# linewidth needs ggplot2 >= 3.4; older versions use size
			geom_segment(aes(x = X1, y = Y1, xend = X2, yend = Y2),
				data = edges, linewidth = 0.5, colour = "grey") +
			geom_point(aes(X1, X2, colour = elements), data = plotcord) +
			scale_colour_brewer(palette = "Set1") +
			scale_x_continuous(breaks = NULL) + scale_y_continuous(breaks = NULL) +
			# discard default grid, axis titles and legend
			# (opts()/theme_blank() from old ggplot2 are now theme()/element_blank())
			theme(panel.background = element_rect(fill = "white", colour = NA),
				panel.grid.minor = element_blank(), panel.grid.major = element_blank(),
				axis.title.x = element_blank(), axis.title.y = element_blank(),
				legend.position = "none", legend.background = element_rect(colour = NA))
	print(pnet)
	invisible(pnet)
}


# random undirected network with 150 nodes and tie density 0.03
g <- network(150, directed = FALSE, density = 0.03)
# vertex classes 0 to 3: the sum of three coin flips per node
classes <- rbinom(150, 1, 0.5) + rbinom(150, 1, 0.5) + rbinom(150, 1, 0.5)
set.vertex.attribute(g, "elements", classes)

plotg(g)
```

I was too lazy to make this function more general (and user-friendly), so for most practical purposes it needs to be modified to produce a pretty visualization. Nevertheless, I hope it provides a useful starting point for others. Some of my plots from the class are below. I included them to show the flexibility of the ggplot2 engine compared with sna's default plot function. Unfortunately, I can't post the data for these networks.

Update: Another interesting approach to visualizing geocoded network data in R is explained on the FlowingData blog.
