Notes on R

R – an interpreted programming environment for statistical computing and graphics

I am coming to R from Python and other programming languages, so these notes are comparative rather than exhaustive. See the following for a good guide.

R has a mechanism for installing packages locally (akin to using virtualenv for isolated Python package installations):

 install.packages

Lists are like dictionaries

$ accesses a list element by name (like ['key'] or dot access in Python)
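For comparison, a minimal sketch of the Python side of this analogy. The names person and person_ns are made up for illustration: a dict gives lookup by key, and a SimpleNamespace gives the dot-style access that R spells with $.

```python
from types import SimpleNamespace

# A dict plays the role of a named R list: values are looked up by key.
person = {"name": "Ada", "langs": ["R", "Python"]}
by_key = person["name"]

# Attribute (dot) access is the closest Python spelling of R's lst$name.
person_ns = SimpleNamespace(**person)
by_dot = person_ns.name

print(by_key, by_dot)
```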

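data.frames are like pandas DataFrames

The dot in names such as data.frame is just a naming convention, not attribute access

<- and = are both used for assignment (<= is the less-than-or-equal comparison, not assignment)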

Speed tips:
Vectorise (use map/reduce-like functions) rather than looping in R
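As a rough illustration of the map/reduce style mentioned above, here is a small Python sketch with made-up values. It only shows the shape of the idiom, not the speed-up: in R the gain comes from vectorised operations running in compiled code rather than in an interpreted loop.

```python
from functools import reduce
from operator import add

values = [1.5, 2.0, 3.5, 4.0]  # illustrative data only

# Explicit loop: build the result one element at a time.
squares_loop = []
for v in values:
    squares_loop.append(v * v)

# map: apply the same function to every element in one expression.
squares_map = list(map(lambda v: v * v, values))

# reduce: fold the mapped values down to a single number.
total = reduce(add, squares_map, 0.0)

print(squares_map, total)
```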

Can easily import/export CSV
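In R this is done with functions such as read.csv and write.csv. For comparison, a minimal sketch of the same round trip using Python's standard csv module; the rows are made up and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Illustrative rows only; csv reads every value back as a string.
rows = [
    {"city": "Oslo", "temp_c": "4.5"},
    {"city": "Lima", "temp_c": "19.0"},
]

# Export: write a header line followed by one line per row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["city", "temp_c"])
writer.writeheader()
writer.writerows(rows)

# Import: rewind the buffer and read the rows back as dicts.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))

print(loaded)
```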

Space tip – If you need to store large amounts of data, consider using a binary format rather than ASCII or other plain-text files.
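A rough sketch of why this matters, in Python with made-up readings: the same 10,000 floating-point values are encoded once as plain text and once as raw 8-byte doubles. The exact text size depends on the values, so treat the printed numbers as indicative only.

```python
from array import array

# Illustrative data: 10,000 floating-point readings.
readings = [i / 7 for i in range(10_000)]

# Plain text: one decimal representation per line.
text_bytes = "\n".join(repr(x) for x in readings).encode("ascii")

# Binary: each value stored as a raw 8-byte double.
binary_bytes = array("d", readings).tobytes()

# Reading the binary form back restores the values exactly.
restored = array("d")
restored.frombytes(binary_bytes)

print(len(text_bytes), len(binary_bytes))
```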

Can access DBs directly, using data frames

Note to self: look up NetCDF (commonly used as a climate data format)

Nice built-in datasets

Very easily generate charts and save the output to jpeg, pdf, etc. See for more options:

Plot.ly

Talk entitled: Collaborative, streaming, 3D, and interactive matplotlib, ggplot2, and MATLAB plots in an IPython Notebook with Plotly by Chris Parmer & Carole Griffiths

The idea behind plot.ly was to bring web standards to graphing and data analysis.

Reason for being: the founders had seen teams making graphs in various tools and struggling to share them. This could literally mean working in a dedicated graphing tool and then creating and emailing screenshots, because colleagues all used different tools.

The team wrote a wrapper for IPython to translate graphs into plot.ly-based graphs.

Graphing news feed – https://plot.ly/feed/#sob

Overview

  • creates shareable links to interactive d3.js graphs
  • includes plotting of streaming data
  • limited by the browser – around 50,000 points before the browser slows; can reach 200,000 points in some cases by optimising the rendering of overlapping points
  • like github for graphs
  • aimed at small to medium data at present
  • currently around 20 developers – doubling every few months over the last 6 months
  • 18-month-old project
  • can be installed locally on a private network under a paid-for model; the free tool allows a limited number of private graphs, say 20

Wrappers/APIs

Various wrappers exist – https://plot.ly/api/

  • ggplotly for R
  • matplotlib
  • MATLAB
  • plot.ly itself – also has a spreadsheet-like interface online
  • any graph can be edited via code or through the online GUI

User base

  • data journalists
  • engineers
  • etc

Various output formats, such as SVG and PNG, are available via RESTful calls.

Open source library – cached request model – to handle connectivity breaks.

Can pull graphs back into Python. Can also extract the data used in a graph as JSON.

No limit on data storage at this stage; uploads of around half a million rows have been made.

Technology

  • d3.js for the graphs
  • Uses MathJax – for adding notation/formulas to graphs
  • Plot.ly is a layer written on top of d3.js

JavaScript makes this graphing tool available to more people, as the technology is already in their browser.

Roadmap

  • Better data API
  • Adding datasets directly to plot.ly, so that several graphs can reference a single dataset
  • Symbolic formulas, to plot transformations, say in physics
  • Keeping a trace of the transformations, i.e. every step applied along the way
  • Improved 3D maps