Introduction to OpenCPU for R on EC2 with Python

OpenCPU is, simply put, a server implementing a RESTful web API for remotely executing R functions and retrieving results. In this tutorial I am going to showcase how OpenCPU can be installed on an EC2 instance running Ubuntu 14.04. Python and its requests package come into play for conveniently handling the HTTP communication. First and foremost, thanks to the effort Jeroen Ooms put into developing OpenCPU and composing its documentation, the whole process is comparatively easy and painless.

In case you are merely interested in the API interactions, feel free to skip the first three sections. You can also install OpenCPU locally or simply use public.opencpu.org/ocpu. You may find an IPython Notebook listing the successive API calls for public.opencpu.org/ocpu here.
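To give a rough idea of what such an API interaction looks like, here is a minimal sketch, assuming the public server mentioned above is reachable and that the function lives in an installed R package. The helper names ocpu_function_url and call_r_function are made up for illustration; the URL layout /library/{package}/R/{function}/json follows OpenCPU's documented convention for calling a function and getting JSON back.

```python
def ocpu_function_url(base, package, function, fmt="json"):
    """Build the URL for calling an R function on an OpenCPU server.

    `base` is the server root including the /ocpu prefix,
    e.g. "https://public.opencpu.org/ocpu" (assumed to be reachable).
    """
    return f"{base.rstrip('/')}/library/{package}/R/{function}/{fmt}"


def call_r_function(base, package, function, **params):
    """POST the arguments to the function and return the parsed JSON result.

    Hypothetical helper: it needs the third-party `requests` package and a
    running OpenCPU server, so the import is kept local to this function.
    """
    import requests

    response = requests.post(ocpu_function_url(base, package, function), data=params)
    response.raise_for_status()
    return response.json()


# Example (performs a real HTTP request, hence commented out):
# call_r_function("https://public.opencpu.org/ocpu", "stats", "rnorm", n=3)
```

Only the URL builder runs without network access; the actual call is left commented out on purpose.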

OAuth 2.0 for Google (Analytics) API with Python Explained

In this tutorial I am going to explain how OAuth 2.0 works and how to apply it for interacting with the Google Analytics API using Python. Google provides a Python package for that purpose, which so far only supports Python 2 though … well.

OAuth2 seems to be quite a mess at first and Google’s documentation on this subject is not that well organized in my opinion. So with this article I do my best to save you the sweat I had to invest. After all it’s not that complicated anyway, as you will probably agree.
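As a taste of the first step of the OAuth 2.0 authorization code flow, here is a minimal sketch that only builds the URL the user is sent to for granting access. It does not use Google's Python package; the function name build_authorization_url and the example client ID and redirect URI are placeholders I made up, and the read-only Analytics scope is just one plausible choice.

```python
from urllib.parse import urlencode

# Google's OAuth 2.0 authorization endpoint and the read-only Analytics scope.
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
ANALYTICS_READONLY_SCOPE = "https://www.googleapis.com/auth/analytics.readonly"


def build_authorization_url(client_id, redirect_uri, scope=ANALYTICS_READONLY_SCOPE):
    """Return the URL where the user grants access (step 1 of the code flow).

    `client_id` and `redirect_uri` come from your own API project settings;
    the values used in the example below are placeholders.
    """
    params = {
        "response_type": "code",    # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "access_type": "offline",   # also request a refresh token
    }
    return GOOGLE_AUTH_ENDPOINT + "?" + urlencode(params)


url = build_authorization_url("my-client-id", "http://localhost:8080/callback")
```

After the user consents, the authorization code returned to the redirect URI still has to be exchanged for an access token, which this sketch leaves out.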

As a Data Scientist it is my Obligation to support #nobagida, #nopegida and any other #no[a-z]{2}gida today :)

Political Opinion on a Scale from 0 to 2π

Just came back with my girlfriend from the demonstration at Sendlinger Tor. Noticed quite a few Palestinian flags being waved around – fair enough – but I thought to myself that I would actually like to see one or two Israeli flags as well. Later we went across the street to have a look at the pegida guys, where I noticed no fewer than two Israeli flags. That was kind of weird … but of course a lot of pegida's presentation revolves around emphasizing how not-Nazi they are – which is slightly odd given the occasional neo-Nazi hanging around with them. Also, given their focus on how bad Muslims are, to those poorly educated people it might seem plausible to show off how pro-Semitic they are because Jews supposedly share some of their views.

Germans used to have more Sex in Summer!

Wow – what a headline … okay, I admit it's phrased quite sensationally, given that it anticipates just one possible interpretation of increasingly more births around summer and autumn compared to spring … but I guess I just get more proactive at marketing with every post I publish!

Okay, enough of that, here's the deal: looking at the monthly variation of the seasonal component of live births reveals strong trends. Around the 1960s relatively more births fell in the first half of the year than around 2010, and the opposite relation manifests in the second half. Now, assuming a standard pregnancy duration of 9 months, we can calculate the most likely month of intercourse, and that's where it gets juicy! BTW, fewer children being born in spring means less anxious generations ahead, but more on that later.
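The back-calculation itself is simple modular arithmetic. A minimal sketch, assuming a flat 9 months of pregnancy and months numbered 1 to 12 (the function name conception_month is mine):

```python
def conception_month(birth_month, pregnancy_months=9):
    """Shift a birth month (1 = January ... 12 = December) back by the
    assumed pregnancy duration and wrap around the year."""
    if not 1 <= birth_month <= 12:
        raise ValueError("birth_month must be between 1 and 12")
    return (birth_month - 1 - pregnancy_months) % 12 + 1


# A birth in September points back to December of the previous year.
september_births = conception_month(9)
```

So a seasonal peak of births in late summer translates into a peak of conceptions around the turn of the year.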

Hierarchical Clustering with R (feat. D3.js and Shiny)

Agglomerative hierarchical clustering is a simple, intuitive and well-understood method for clustering data points. I used it with good results in a project to estimate the true geographical position of objects based on measured estimates. With this tutorial I would like to describe the basics of this method, how to implement it in R with hclust and some ideas on how to decide where to cut the tree. This was also a great opportunity for composing another Shiny/D3.js app (GitHub, shinyapps.io), something I have wanted to do for a while now. At the end of the text I write a bit about what I learned in that regard.
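To illustrate the basic idea independently of R, here is a naive pure-Python sketch of agglomerative clustering with complete linkage. It is not hclust, just the same principle spelled out: start with one cluster per point, repeatedly merge the two closest clusters and stop once the closest pair is further apart than a chosen cut height. The function names and the sample points are made up.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def complete_linkage(a, b, points):
    """Distance between two clusters = largest pairwise point distance."""
    return max(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points, cut_height):
    """Merge clusters bottom-up until the closest pair exceeds cut_height.

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = complete_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cut_height:
            break  # this is where the tree gets "cut"
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
groups = agglomerate(points, cut_height=5)
```

The nested search makes this far too slow for large data sets, but it shows why the choice of the cut height decides how many clusters come out.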

MongoDB – State of the R

Naturally there are two reasons why you would need to access MongoDB from R:

  1. MongoDB is already used for whatever reason and you want to analyze the data stored therein
  2. You decide you want to store your data in MongoDB instead of using native R technology like data.table or data.frame

In-memory data storage like data.table is very fast especially for numerical data, provided the data actually fits into your RAM – but even then MongoDB comes along with a bag of goodies making it a tempting choice for a number of use cases:

  • flexible schema-less data structures
  • spatial and textual indexing
  • spatial queries
  • persistence of data
  • easily accessible from other languages and systems
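To make the spatial and textual features a bit more tangible, here is a small sketch of what such query documents look like, written as plain Python dictionaries. The helper names, the field name "location" and the coordinates are made up; the operators $near, $geometry, $maxDistance, $text and $search are MongoDB's own, and they assume that a 2dsphere index and a text index, respectively, exist on the collection.

```python
import json


def near_query(field, lon, lat, max_distance_m):
    """Query document for everything within max_distance_m metres of a point.

    Assumes `field` holds GeoJSON points and carries a 2dsphere index.
    """
    return {
        field: {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_distance_m,
            }
        }
    }


def text_query(search_terms):
    """Query document for a full-text search (assumes a text index)."""
    return {"$text": {"$search": search_terms}}


# Hypothetical example values; GeoJSON lists the longitude first.
query = near_query("location", 11.57, 48.13, 500)
query_as_json = json.dumps(query)
```

A driver would pass such a document to its find method; the same JSON structure applies when querying from R.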

In case you would like to learn more about MongoDB, I have good news for you: MongoDB Inc. provides a number of very well made online courses catering to various languages. You may find an overview here.

Twitter’s REST API v1.1 with R (for Linux and Windows)

In this tutorial I am going to describe a straightforward way to make use of Twitter's REST API v1.1. For that purpose I composed a little package (RTwitterAPI), so that requesting data just needs the API URL, the API parameters and a vector containing the OAuth parameters.

Before you can get started you have to log in to your Twitter account on dev.twitter.com, create an application and generate an "Access Token" for it. So let's jump right in and fetch the IDs of 10 followers of @hrw (Human Rights Watch). The necessary code is located on GitHub as a package named RTwitterAPI, which may be installed using devtools::install_github().
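Just to show what "API URL plus API parameters" boils down to for this example, here is a minimal Python sketch that assembles the request URL for the followers/ids endpoint. This is not the RTwitterAPI package; the function name is made up, and the OAuth signing that API v1.1 requires for every request is deliberately left out.

```python
from urllib.parse import urlencode

TWITTER_API_BASE = "https://api.twitter.com/1.1"


def followers_ids_url(screen_name, count):
    """Build the (still unsigned) request URL for GET followers/ids.

    The request must additionally carry an OAuth Authorization header,
    which this sketch does not construct.
    """
    params = {"screen_name": screen_name, "count": count}
    return f"{TWITTER_API_BASE}/followers/ids.json?{urlencode(params)}"


request_url = followers_ids_url("hrw", 10)
```

The access token generated for your application is what goes into the signature of that request.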

Reasonable Inheritance of Cluster Identities in Repetitive Clustering

… or Inferring Identity from Observations

Let’s assume the following application:

A conservation organisation starts a project to geographically catalogue the remaining representatives of an endangered plant species. For that purpose hikers are encouraged to communicate the location of the plant if they encounter it. Because those hikers use GPS technology ranging from cheap smartphones to high-end GPS devices, and because weather and environmental circumstances differ, the measurements are of varying accuracy. The goal of the conservation organisation is to build up a map locating all found plants with an ID assigned to them. Now every time a new location measurement is entered into the system, a clustering is applied to identify related measurements, i.e. measurements belonging to the same plant.

Interactive Heatmaps with Google Maps API v3

Thanks to the Google Maps API it is pretty easy to code up a small JavaScript program that turns a bunch of points into an interactively explorable and lovely looking heatmap. You’re welcome to give it a try on heatmap.joyofdata.de where you can load a CSV to display its contained points. The CSV is supposed to be semicolon delimited and contain at least two columns “lat” and “lon” for the geographical location and an optional third numerical column “weight”. The order of the columns does not matter. And of course the parsing is done with Papa Parse – what else!
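The expected CSV layout can be sketched with a few lines of Python. This is not the app's JavaScript code, just an illustration of the format; the function name read_points and the default weight of 1.0 for a missing or empty "weight" value are my assumptions.

```python
import csv
import io


def read_points(text):
    """Parse semicolon-delimited CSV text with "lat" and "lon" columns
    and an optional "weight" column, in any column order.

    Returns a list of (lat, lon, weight) tuples; rows without a weight
    get 1.0 (an assumption of this sketch).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    points = []
    for row in reader:
        weight = row.get("weight")
        points.append((
            float(row["lat"]),
            float(row["lon"]),
            float(weight) if weight not in (None, "") else 1.0,
        ))
    return points


sample = "lon;lat;weight\n77.2;28.6;3\n72.8;19.0;\n"
parsed = read_points(sample)
```

Because the columns are looked up by name, swapping "lat" and "lon" in the header makes no difference to the result.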

Parsing a Local CSV File with JavaScript and Papa Parse

In this tutorial I am going to show you how to read a local CSV file using JavaScript and parse it with the Papa Parse library. In case you are interested in a working example, have a look at heatmap.joyofdata.de, for which you will find a detailed description here.

For whatever reason JavaScript developed into an awesome tool for breathing life into data. But before you can visualize data, you have to read and parse it. JSON is a cinch with JavaScript anyway, and CSVs are now a cinch, too, thanks to Papa Parse. Yes, I kind of don’t like the name either, but let’s be pragmatic. Thanks to HTML5’s rather new File API it became possible to read locally stored files with JavaScript from a browser. Due to security concerns, though, this file access is restricted to files that have been opened manually by the user. That means it is not (yet) possible to simply read arbitrary files from a folder.
