IR dyad-year dataset with 10 lines of dplyr + tidyr code

Directed (country)-dyad-year datasets are quite common in International Relations (IR) research, and from time to time one needs an empty one. Creating one used to be a hassle requiring 50+ lines of code (see example in PERL or R). Luckily, with tidyr, dplyr, and piping, building such a dataset takes at most 10 lines of code:

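library(dplyr); library(tidyr)  # provide %>%, select(), filter(), left_join(), expand()
system <- read.csv("./raw data/states2011.csv") %>% select(ccode, styear, endyear)
system <- system %>% expand(statea = ccode, stateb = ccode, year = seq(1816, 2011)) %>%
  filter(statea != stateb) %>%                       # drop self-dyads
  left_join(system, by = c("statea" = "ccode")) %>%  # membership spell(s) of state A
  filter(year >= styear & year <= endyear) %>%       # keep years in which A exists
  select(-styear, -endyear) %>%
  left_join(system, by = c("stateb" = "ccode")) %>%  # membership spell(s) of state B
  filter(year >= styear & year <= endyear) %>%       # keep years in which B exists
  select(-styear, -endyear)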

First, the code block creates all possible country-country-year pairings (line 3) and then filters out the dyad-years which are inadmissible either because a) the dyad involves the same country (line 4) or b) at least one of the dyad-members does not exist in a particular year (lines 5-10).
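
To make the filtering logic easier to follow, here is a minimal sketch of the same steps on a toy membership table. The table `members`, its country codes, and its years are made up for illustration and are not Correlates of War data; the result is stored in a separate object, `dyads`, so the membership table stays available for inspection.

```r
library(dplyr)
library(tidyr)

# Hypothetical membership table (made-up codes and years, not COW data)
members <- data.frame(
  ccode   = c(1, 2, 3),
  styear  = c(2000, 2001, 2000),
  endyear = c(2002, 2002, 2000)
)

dyads <- members %>%
  # all country-country-year combinations: 3 * 3 * 3 = 27 rows
  expand(statea = ccode, stateb = ccode, year = 2000:2002) %>%
  # drop self-dyads: 18 rows remain
  filter(statea != stateb) %>%
  # keep only years in which state A is a system member
  left_join(members, by = c("statea" = "ccode")) %>%
  filter(year >= styear & year <= endyear) %>%
  select(-styear, -endyear) %>%
  # keep only years in which state B is a system member
  left_join(members, by = c("stateb" = "ccode")) %>%
  filter(year >= styear & year <= endyear) %>%
  select(-styear, -endyear)

nrow(dyads)
```

In this toy setup only two states exist in any given year (1 and 3 in 2000, 1 and 2 in 2001 and 2002), so each year contributes two directed dyads and `dyads` ends up with six rows.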

Data Source: Correlates of War System Membership 2011

2 Comments on “IR dyad-year dataset with 10 lines of dplyr + tidyr code”

  1. pinheiro.f@gmail.com says:

    Thanks for the code. But notice something: it produces repeated dyads. For example, I have Brazil and US in 1990 and US and Brazil in 1990 in the same dataset. How can I solve that using dplyr or tidyr?
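
One possible way to handle the question above, offered as a sketch rather than as the author's answer: the dataset is directed, so each pair appears once per ordering by design. If a non-directed dataset is wanted, a common approach is to keep only the rows where the first country code is smaller than the second. The data frame `directed` below is a made-up example with illustrative codes, not output from the post's code.

```r
library(dplyr)

# Hypothetical directed dyad-years: every pair appears in both orderings
directed <- data.frame(
  statea = c(2, 140, 2, 200),
  stateb = c(140, 2, 200, 2),
  year   = c(1990, 1990, 1990, 1990)
)

# Keep one row per unordered pair and year: the one with the smaller code first
undirected <- directed %>% filter(statea < stateb)

nrow(undirected)
```

Applied to the post's pipeline, this would amount to replacing `filter(statea != stateb)` with `filter(statea < stateb)`, which drops self-dyads and the mirrored ordering in a single step.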

