How to change your Dropbox folder name

Somebody wrote a nice Python script to change the name of a Dropbox folder (currently the preference settings only let you change the location of the folder, but not its name!). Here is the forum link with the script. You can also do it manually if you have a Mac. Just navigate to ~/.dropbox and find the .db file (on my machine it's config.db). Open it with sqlite3 ~/.dropbox/config.db and change the 'value' column of the row where key='dropbox_path' to the desired path, e.g. via UPDATE config SET value='~/Cloud' WHERE key='dropbox_path';


Chrome Browser History in R

Visualizing the number of visits per website over several months is pretty easy. I took the following steps to extract the data and visualize them.

1.) Find the data. Google Chrome stores the browsing history in SQLite files. If you have a Mac, you can easily look into them by navigating in the Terminal to the folder with the files (usually ~/Library/Application\ Support/Google/Chrome/Default) and then typing: sqlite3 Archived\ History. This opens the SQLite client. You can now look at the layout of the visits table in the file Archived History via:

.tables
.schema visits
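The same inspection also works from Python if you prefer not to use the sqlite3 shell; a small sketch (run it from the profile folder mentioned above, or adjust the path):

import sqlite3

# Open the Archived History file from the Chrome profile folder
conn = sqlite3.connect("Archived History")

# Equivalent of .tables: list all tables in the file
for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"):
    print(name)

# Equivalent of .schema visits: show the CREATE statement of the visits table
print(conn.execute("SELECT sql FROM sqlite_master WHERE name = 'visits'").fetchone()[0])

conn.close()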

2.) Extract the data. Next I wanted a csv file with the timestamp of each visit and the URL. I extracted the data with a modified version of a Python script that I found here.

import os
import datetime
import sqlite3
import codecs
import re

# Regular expression that grabs the domain part of a URL (up to the first "/" after the TLD)
pattern = "(((http)|(https))(://)(www.)|().*?)\.[a-z]*/"

# Join the visits table with the urls table to get the URL and the visit timestamp
SQL_STATEMENT = 'SELECT urls.url, visits.visit_time FROM visits, urls WHERE visits.url = urls.id;'

storage = codecs.open('out.csv', 'w', 'utf-8')

def date_from_webkit(webkit_timestamp):
    # Chrome stores visit times as microseconds since 1601-01-01 (the WebKit epoch)
    epoch_start = datetime.datetime(1601, 1, 1)
    delta = datetime.timedelta(microseconds=int(webkit_timestamp))
    return epoch_start + delta

# The two history files (here assumed to be in the home directory)
paths = ["~/Archived History", "~/History"]

for path in paths:
    c = sqlite3.connect(os.path.expanduser(path))
    for row in c.execute(SQL_STATEMENT):
        date_time = date_from_webkit(row[1])
        url = re.search(pattern, row[0])
        try:
            urlc = url.group(0)
        except AttributeError:  # no match for this URL
            urlc = "ERROR"
        storage.write(str(date_time)[0:19] + "\t" + urlc + "\n")
    c.close()

storage.close()

The script opens the two history files from the Google Chrome browser and selects two variables from them (urls.url, visit_time). Using the function date_from_webkit it walks through the timestamp variable and converts it to a more readable format ("%Y-%m-%d %H:%M:%S"). It also extracts the domain of the URL using the regular expression defined in the variable pattern. The last step writes a tab-separated csv file (out.csv) with a timestamp column and the shortened URL.
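As a quick sanity check of the timestamp conversion: Chrome's WebKit timestamps count microseconds since 1601-01-01, so a value of 3,600,000,000 should land exactly one hour after that epoch.

import datetime

# One hour (in microseconds) after the WebKit epoch
webkit_epoch = datetime.datetime(1601, 1, 1)
print(webkit_epoch + datetime.timedelta(microseconds=3600 * 10**6))  # 1601-01-01 01:00:00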

3.) Visualize. The output of the Python script can be easily imported into R for any kind of analysis. I made the graphic above with the following code:

library(plyr)
library(ggplot2)
library(reshape2)  # for melt()

# Import the data
data <- read.csv("out.csv", sep="\t", header=FALSE, stringsAsFactors=FALSE)
colnames(data) <- c("time","url")
data$time <- as.POSIXct(data$time)
data$day <- format(data$time, format="%Y-%m-%d")

# Count visits per day
# Count the rows of a day whose URL matches a pattern
lgrep <- function(x, pat){ length(grep(pat, x$url)) }

counts.google <- ddply(data, .(day), "lgrep", pat="www.google", .progress="text")
counts.mail <- ddply(data, .(day), "lgrep", pat="mail.google", .progress="text")
counts.facebook <- ddply(data, .(day), "lgrep", pat="facebook", .progress="text")
counts.spiegel <- ddply(data, .(day), "lgrep", pat="spiegel", .progress="text")
counts.nytimes <- ddply(data, .(day), "lgrep", pat="nytimes", .progress="text")
counts.wiki <- ddply(data, .(day), "lgrep", pat="wikipedia", .progress="text")
counts.leo <- ddply(data, .(day), "lgrep", pat="dict.leo", .progress="text")
counts.hulu <- ddply(data, .(day), "lgrep", pat="hulu", .progress="text")

# Make new data.frame
df <- data.frame(day=counts.google$day, Google=counts.google$lgrep, GMail = counts.mail$lgrep, Facebook=counts.facebook$lgrep, SPIEGEL=counts.spiegel$lgrep, NYTimes=counts.nytimes$lgrep, Wikipedia=counts.wiki$lgrep, Leo=counts.leo$lgrep, hulu=counts.hulu$lgrep) 
em <- melt(df, id = "day")

# Plot it 
ggplot(data = em, aes(as.Date(day), value, colour = variable)) + 
	scale_x_date('') + 
	stat_smooth() + 
	scale_y_continuous('visits') + 
	geom_line(alpha=0.10) +  
	geom_point(alpha=0.20) + 
	opts(legend.title = theme_text(colour = 'white', size = 0)) + 
	scale_colour_brewer(palette="Set1")

How MySQL supports data collection and analysis

Frequently I scrape textual data from the Internet or from digital documents (e.g. PDFs) and combine these data with other data for an analysis. I use not only Python for scraping and refining, but also R, and I usually switch between both environments during data collection, refining and analysis. A typical workflow goes like this: scrape the data and do a basic cleanup (strip HTML etc.), send the data to R for some aggregation and re-structuring (merge with other data etc.), send the data to Processing to make some nice interactive graphics (if necessary) and finally go back to R and run a model. The key question is: how to exchange the data between these three environments? The most obvious answer: use csv files. But in my experience, a local MySQL database is more useful, since it is then easier to (a rough sketch of the workflow follows the list):

  • subset the data and selectively import/export data to/from each environment
  • search the data directly without importing the full dataset (R is super-slow at searching text vectors)
  • keep every refining step reversible by writing each step to a separate table
  • migrate the data to the web later on
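As a rough sketch of that workflow in Python (database, table and column names are made up for illustration; it assumes the MySQL-Python driver mentioned in the hints below):

import MySQLdb

# Connect to a hypothetical local database used as the exchange point
conn = MySQLdb.connect(host="localhost", user="root", passwd="", db="scraping")
cur = conn.cursor()

# One table per refining step keeps every step reversible
cur.execute("""CREATE TABLE IF NOT EXISTS articles_raw (
                   id INT AUTO_INCREMENT PRIMARY KEY,
                   url VARCHAR(255),
                   body LONGTEXT)""")

# Write freshly scraped rows from Python ...
cur.executemany("INSERT INTO articles_raw (url, body) VALUES (%s, %s)",
                [("http://example.com/1", "some scraped text")])
conn.commit()

# ... and later pull only a subset back out (or read the same table from R, e.g. via RMySQL)
cur.execute("SELECT url, body FROM articles_raw WHERE body LIKE %s", ("%scraped%",))
rows = cur.fetchall()
conn.close()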

Of course, all these advantages only apply if you work with big datasets. On the other hand, if you work with textual data, you are certainly approaching “big” quickly.

There are tons of tutorials on the web explaining how to set up MySQL and use it with Python, R and Processing. Here is a list of those that I found most helpful at the beginning:

Some hints:

  • Install the 5.1 version of MySQL, not 5.5! It looks like the newer version has some bugs (see also here); after the installation my MySQL server didn’t start at all.
  • If you get an error while installing MySQL-Python (the driver that connects Python to MySQL) via easy_install, use this (replace XYZ with your MySQL version!): PATH=$PATH:/usr/local/mysql-XYZ/bin sudo easy_install -Z MySQL-python No worries, this only modifies your PATH for this single command, not permanently! (Source)
  • If you want to play around without installing MySQL, download XAMPP and create a socket symlink using sudo ln -s /tmp/mysql.sock /Applications/XAMPP/xamppfiles/var/mysql/mysql.sock so that R/Python/Processing can connect to it (a minimal connection test is sketched below).
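To check that the driver and the socket are set up correctly, a minimal connection test could look like this (user, password and socket path are placeholders for whatever you configured):

import MySQLdb

# Minimal connection test; adjust credentials and socket path to your setup
conn = MySQLdb.connect(host="localhost", user="root", passwd="",
                       unix_socket="/tmp/mysql.sock")
cur = conn.cursor()
cur.execute("SELECT VERSION()")
print(cur.fetchone())
conn.close()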