Plot.ly

Talk entitled: Collaborative, streaming, 3D, and interactive matplotlib, ggplot2, and MATLAB plots in an IPython Notebook with Plotly by Chris Parmer & Carole Griffiths

The idea behind plot.ly was to bring web standards to graphing and data analysis.

Reason for being: the team had seen people make graphs in various tools and then struggle to share them. This could literally mean working with a dedicated graphing tool and then creating and emailing screenshots, because colleagues all used different tools.

The team wrote a wrapper for IPython to translate graphs into plot.ly-based graphs.

Graphing news feed: https://plot.ly/feed/#sob

Overview

  • creates shareable links to interactive d3.js graphs
  • includes plotting streaming data
  • limited by the browser – around 50,000 points before the browser slows – can get to 200,000 points in some cases by optimising the rendering of overlapping points
  • like github for graphs
  • aimed at small to medium data at present
  • currently around 20 developers – doubling every few months over the last 6 months
  • 18 month old project
  • can be installed locally on a private network – paid-for model. The free tool allows a limited number of private graphs, say 20.
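
The point limit above comes from the browser having to draw every point, even when many land on the same pixel. As a rough illustration of thinning overlapping points, here is a minimal sketch that keeps one point per screen pixel. This is an assumption about the general idea, not Plotly's actual optimisation; the function name `thin_overlapping_points` and the default plot size are hypothetical.

```python
def thin_overlapping_points(points, x_range, y_range, width_px=800, height_px=600):
    """Keep the first point that lands on each pixel and drop later overlaps.

    Hypothetical helper for illustration only; not part of any Plotly API.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    seen = set()
    kept = []
    for x, y in points:
        # Map data coordinates to integer pixel coordinates.
        px = int((x - x_min) / (x_max - x_min) * (width_px - 1))
        py = int((y - y_min) / (y_max - y_min) * (height_px - 1))
        if (px, py) not in seen:
            seen.add((px, py))
            kept.append((x, y))
    return kept


# Two of these three points fall on the same pixel, so only two are kept.
sample = [(0.0, 0.0), (0.0001, 0.0001), (1.0, 1.0)]
thinned = thin_overlapping_points(sample, x_range=(0.0, 1.0), y_range=(0.0, 1.0))
```

The trade-off is that dropped points are no longer available for hover or zoom, so a real implementation would need to re-thin when the view changes.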

Wrappers/APIs

Various wrappers exist – https://plot.ly/api/

  • ggplotly for R
  • matplotlib
  • MATLAB
  • plot.ly – has a spreadsheet-like interface online too
  • any graph can be edited via code or through online GUI

User base

  • data journalists
  • engineers
  • etc

Various output formats, such as SVG and PNG, are available via RESTful calls.

Open source library – cached request model – to handle connectivity breaks.

Can pull graphs back into Python. Also can extract JSON output of the data used in the graph.
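
To make the JSON idea concrete, here is a minimal sketch of reading graph data back out of a JSON document. It assumes a figure shaped as a `data` list of traces plus a `layout` object; the sample values and the helper name `extract_series` are invented for illustration, and no network call is made.

```python
import json

# Hypothetical JSON for a small line chart; the values are invented for illustration.
figure_json = """
{
  "data": [
    {"type": "scatter", "name": "temperature", "x": [1, 2, 3], "y": [20.5, 21.0, 19.8]}
  ],
  "layout": {"title": "Example graph"}
}
"""

figure = json.loads(figure_json)


def extract_series(fig):
    """Return {trace name: list of (x, y) pairs} from a figure dictionary."""
    return {trace["name"]: list(zip(trace["x"], trace["y"])) for trace in fig["data"]}


series = extract_series(figure)
```

Because the data travels with the graph, anyone who can open the graph can, in principle, rebuild or restyle it in their own tool.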

No limit on data storage at this stage. Datasets of around half a million rows have been uploaded.

Technology

  • d3.js for the graphs
  • Uses MathJax – for adding notations/formulas to graphs
  • Plot.ly is a layer written on top of d3.js

JavaScript makes this graphing tool available to more people, as the technology is already in their browser.

Roadmap

  • Better data API
  • Adding datasets directly to plot.ly – several graphs referencing a single dataset.
  • Symbolic formulas, to plot transformations say in Physics
  • Keeping a trace of transformations, i.e. all the steps of each transformation
  • Improve 3D maps

The IPython notebook is for everyone (Gautier Hayoun)

Installation

pip install ipython[notebook]

or

pip install ipython[all]

NB: I had trouble installing and needed to upgrade my version of pip

More info: http://ipython.org/install.html

Commands

ipython notebook
  • it runs in the background
  • it opens in your browser
  • opens a listing of programs in the same folder

What can it do

Each notebook can be divided into different cells

  • a markdown cell for documentation or introductions
  • code cells, which can even run other installed languages such as Perl and Ruby
  • easy to edit multiple lines of code
  • cells share variable scope
  • can also run a limited set of local shell commands e.g. capturing shell command output in a variable
    x = !ls ..;
  • can display HTML in cells
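
The `x = !ls ..` syntax only works inside IPython, where it stores the command's output as a list-like object of lines. As a rough comparison, here is a minimal sketch of doing much the same in plain Python with `subprocess`. The helper name `capture_lines` is hypothetical, and the Python interpreter stands in for `ls` so the example behaves the same on any platform.

```python
import subprocess
import sys


def capture_lines(command):
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# The current Python interpreter is used as a portable stand-in for `ls ..`.
x = capture_lines([sys.executable, "-c", "print('a.txt'); print('b.txt')"])
```

In a notebook the one-line `!` form is usually more convenient; the plain Python version is handy once the exploration turns into a script.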

Use cases

Individual exploration

  • play with external data
  • easy to tweak and see output in situ

Suggestions from the audience

  • can be used to iterate on rewriting code in CPython, as you can see results side by side

Collaboration

  • very readable
  • self contained
  • you can share .ipynb files
  • you can host a server yourself. WARNING: there is shell access! So it is not suitable for public access.

Suggestions

  • SageMathCloud: provides a FREE hosted service, which has been extended for asynchronous use (e.g. for Google Docs-like collaboration) and for git snapshots
  • PythonAnywhere

Publishing

  • nbviewer serves read only notebooks, nbviewer.ipython.org – is like gist for notebooks
  • can export as HTML, python scripts, various other formats including as presentations
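
Sharing and exporting work because an .ipynb file is plain JSON. As an illustration of what an export to a Python script roughly involves, here is a minimal sketch that pulls the cells out of a hand-written notebook. It assumes the nbformat 4 layout (a top-level `cells` list); `notebook_to_script` is a hypothetical helper, not the real export tool, which handles far more cases.

```python
import json

# A hand-written minimal notebook, assuming the nbformat 4 layout.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Intro\n", "Some notes."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [],
         "source": ["total = 1 + 2\n", "print(total)"]},
    ],
})


def notebook_to_script(text):
    """Turn notebook JSON into script text: code stays as-is, markdown becomes comments."""
    notebook = json.loads(text)
    chunks = []
    for cell in notebook["cells"]:
        source = "".join(cell["source"])
        if cell["cell_type"] == "code":
            chunks.append(source)
        elif cell["cell_type"] == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


script = notebook_to_script(notebook_json)
```

Since the file is just JSON, it also diffs and versions reasonably well, although stored outputs can make the diffs noisy.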

Overall

Our speaker, Gautier Hayoun, wanted to stress that the IPython Notebook is for everyone, not only data scientists. It tells the story of your data processing. It can also be used to parse server logs or other data for ad hoc queries.
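
As an example of the kind of ad hoc log query that fits in a single notebook cell, here is a minimal sketch that counts HTTP status codes. The sample lines are invented and follow the common access-log style; real logs may need a different pattern, and `count_statuses` is a hypothetical helper name.

```python
import re
from collections import Counter

# Invented sample lines in the common access-log style; real logs may differ.
log_lines = [
    '127.0.0.1 - - [10/Oct/2014:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2014:13:55:40 +0000] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.7 - - [10/Oct/2014:13:56:01 +0000] "POST /login HTTP/1.1" 200 128',
]

# Capture the request method, the path and the three-digit status code.
LINE_PATTERN = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')


def count_statuses(lines):
    """Count how often each HTTP status code appears; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match:
            counts[match.group("status")] += 1
    return counts


status_counts = count_statuses(log_lines)
```

In a notebook the natural next step is to tweak the pattern or the grouping in the same cell and re-run it, which is exactly the quick feedback loop the talk was advocating.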

An interesting talk and a fine moustache ;)