Plot.ly

Talk entitled: Collaborative, streaming, 3D, and interactive matplotlib, ggplot2, and MATLAB plots in an IPython Notebook with Plotly by Chris Parmer & Carole Griffiths

The idea behind plot.ly was to bring web standards to graphing and data analysis.

Reason for being: the team had seen people making graphs in various tools and then struggling to share them. This could literally mean working with a dedicated graphing tool and then creating and emailing screenshots, because colleagues all used different tools.

The team wrote a wrapper for IPython to translate graphs into plot.ly-based graphs.

Graphing news feed – https://plot.ly/feed/#sob

Overview

  • creates shareable links to interactive d3.js graphs
  • includes plotting streaming data
  • limited by the browser – around 50,000 points before the browser slows – can reach 200,000 points in some cases by optimising the rendering of overlapping points
  • like github for graphs
  • aimed at small to medium data at present
  • currently around 20 developers – doubling every few months over the last 6 months
  • 18 month old project
  • can be installed locally on a private network – paid-for model. The free tool allows a limited number of private graphs, say 20.
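
The talk did not describe how the rendering of overlapping points is optimised. One common approach, sketched here purely as an illustration, is to keep only one point per screen pixel. The function name `thin_overlapping` and the default canvas size are hypothetical and are not part of plot.ly.

```python
def thin_overlapping(points, width=800, height=600):
    """Keep the first point that lands in each pixel cell.

    Hypothetical helper: an illustration of dropping overlapping points,
    not the algorithm plot.ly actually uses.
    """
    if not points:
        return []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Avoid dividing by zero when all points share one coordinate.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0

    seen = set()
    kept = []
    for x, y in points:
        # Map the data coordinates onto an integer pixel grid.
        cell = (int((x - x_min) / x_span * (width - 1)),
                int((y - y_min) / y_span * (height - 1)))
        if cell not in seen:
            seen.add(cell)
            kept.append((x, y))
    return kept


sample = [(0.0, 0.0), (0.0001, 0.0001), (1.0, 1.0)]
thinned = thin_overlapping(sample, width=10, height=10)
print(len(sample), "->", len(thinned))
```

The trade-off is that points hidden behind others are lost, which is usually invisible on screen but matters if the user zooms in afterwards.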

Wrappers/APIs

Various wrappers exist – https://plot.ly/api/

  • ggplotly for R
  • matplotlib
  • MATLAB
  • plot.ly – also has a spreadsheet-like interface online
  • any graph can be edited via code or through online GUI

User base

  • data journalists
  • engineers
  • etc

Various output formats, such as SVG and PNG, are available via RESTful calls.

Open source library – cached request model – to handle connectivity breaks.
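
The notes do not say how the cached request model works. One possible reading is that requests are queued locally and replayed once the connection returns. The sketch below shows that idea only; the class `CachedSender` and the `send` callable are hypothetical and are not taken from the plot.ly library.

```python
from collections import deque


class CachedSender:
    """Queue payloads locally and replay them in order when sending works.

    Hypothetical sketch: `send` is any callable that raises ConnectionError
    while the network is unavailable.
    """

    def __init__(self, send):
        self._send = send
        self._pending = deque()

    def submit(self, payload):
        """Cache the payload, then try to deliver everything queued so far."""
        self._pending.append(payload)
        return self.flush()

    def flush(self):
        """Send queued payloads oldest first; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # still offline, keep the rest cached
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._pending)
```

A payload is removed from the queue only after `send` returns without error, so a break in connectivity delays delivery rather than losing data.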

Can pull graphs back into Python. Also can extract JSON output of the data used in the graph.
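
As a rough illustration of working with a graph's JSON, the sketch below parses a small figure description and pulls out the plotted values. It assumes the usual Plotly figure layout of a `data` list of traces plus a `layout` object; the sample values and the helper name `extract_traces` are made up for the example.

```python
import json

# Made-up sample in the data/layout shape; not output from a real graph.
figure_json = """
{
  "data": [
    {"type": "scatter", "name": "series A", "x": [1, 2, 3], "y": [4, 1, 7]}
  ],
  "layout": {"title": "Example"}
}
"""


def extract_traces(figure_text):
    """Return (trace name, list of (x, y) pairs) for each trace."""
    figure = json.loads(figure_text)
    return [(trace.get("name"), list(zip(trace["x"], trace["y"])))
            for trace in figure["data"]]


for name, pairs in extract_traces(figure_json):
    print(name, pairs)
```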

No limit on data storage at this stage. Datasets of, say, half a million rows have been uploaded.

Technology

  • d3.js for the graphs
  • Uses MathJax – for adding notations/formulas to graphs
  • Plot.ly is a layer written on top of d3.js

JavaScript makes this graphing tool available to more people, as the technology is already in their browser.

Roadmap

  • Better data API
  • Adding datasets directly to plot.ly – several graphs referencing a single dataset.
  • Symbolic formulas, to plot transformations say in Physics
  • To keep a trace of the transformations, all the steps to transformation.
  • Improve 3D maps

The IPython notebook is for everyone (Gautier Hayoun)

Installation

pip install ipython[notebook]

or

pip install ipython[all]

NB: I had trouble installing and needed to upgrade my version of pip

More info: http://ipython.org/install.html

Commands

ipython notebook
  • it runs in the background
  • it opens in your browser
  • opens a listing of programs in the same folder

What can it do

Each notebook can be divided into different cells

  • a markdown cell for documentation or introductions
  • code cells, which can even run other installed languages such as Perl or Ruby
  • easy to edit multiple lines of code
  • cells share variable scope
  • can also run a limited set of local shell commands e.g. capturing shell command output in a variable
    x = !ls ..;
  • can display HTML in cells
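
The `x = !ls ..` capture shown above is IPython syntax and will not run in a plain Python script. Outside the notebook, a similar result can be had with the standard library, as in this minimal sketch. The helper name `capture_lines` is hypothetical, and the child command is a small Python one-liner so that the example runs on any platform.

```python
import subprocess
import sys


def capture_lines(command):
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str
        check=True,               # raise if the command fails
    )
    return result.stdout.splitlines()


# Stand-in for a shell command such as `ls ..`.
lines = capture_lines([sys.executable, "-c", "print('a'); print('b')"])
print(lines)
```

In IPython the captured value is a list-like object with one entry per output line, so the two approaches give comparable results.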

Use cases

Individual exploration

  • play with external data
  • easy to tweak and see output in situ

Suggestions from the audience

  • can be used to iterate on rewriting code in CPython, as you can see results side by side

Collaboration

  • very readable
  • self contained
  • you can share .ipynb files
  • you can host a server yourself. WARNING: there is shell access! So not suitable for public access.

Suggestions

  • SageMathCloud: provides a FREE hosted service, which has been extended for asynchronous use (e.g. for Google Docs-like collaboration) and for git snapshots
  • PythonAnywhere

Publishing

  • nbviewer serves read-only notebooks – nbviewer.ipython.org – and is like gist for notebooks
  • can export as HTML, Python scripts, and various other formats, including presentations
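
A notebook file is JSON, so the idea behind the Python script export can be sketched in a few lines. This is a simplified approximation, not the real export tool: it assumes the nbformat 4 layout with a top-level `cells` list (older notebook files are laid out differently), and the helper name `code_cells_to_script` is hypothetical.

```python
import json

# A tiny made-up notebook in the nbformat 4 layout.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Intro"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1\n", "y = x + 1"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print(y)"]},
    ],
})


def code_cells_to_script(text):
    """Join the source of every code cell, skipping markdown cells."""
    notebook = json.loads(text)
    chunks = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # The source may be stored as one string or a list of lines.
            chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks) + "\n"


print(code_cells_to_script(notebook_text))
```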

Overall

Our speaker, Gautier Hayoun, wanted to stress that the IPython Notebook is for everyone, not only data scientists. It tells the story of your data processing. Or it can be used to parse server logs or other data for ad hoc queries.

An interesting talk and a fine moustache ;)

Keynote speaker Friday

Notes from Friday’s keynote speech at PyConUK

Used in animation to move assets

Used as a teaching language

What are the issues

Some embedded systems will never upgrade to Python 3.

Interoperability:

Ability to wrap and call other specialist languages like Fortran, C++.

We have to continue to be able to call into other libraries of new languages via wrappers etc.

Ability to hack

Look up the birth and death of javascript

Look at writer of django talking about javascript

Competition Go

Neat built in threaded support with channels etc

gc?

Feels like Python pared down, aimed to be low level

gofmt

go run is so fast it almost feels like not a compiled language

However, read about eventloop or goloop?

Deploys neatly

OpenStack parts rewritten in Go: faster, less memory

We need to invest in improving deployment

Dev8D 2011 – day 2

I attended Dexy and Molly workshops and Ask the Experts discussions.

Ask the Experts: how to go about dealing with dirty data

Top tips

  • Google Refine – CSV clean up; output into other formats
  • AntiWord – Word formats to plain text converter
  • FMT (formatting)
  • Beautiful Soup (Python) – scraper
  • ScraperWiki – remember this can be useful – can be used like a remote data store
  • Python unicodedata.normalize – normalises text, e.g. into normal form C; flattening accented characters needs a decomposed form (NFD/NFKD) with the combining marks then removed
  • Mozilla has auto detect character encoding tools
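
A minimal sketch of the `unicodedata.normalize` tip: form C composes a letter and its combining accent into one character, while stripping accents means decomposing first and then dropping the combining marks. The helper names `to_nfc` and `strip_accents` are hypothetical.

```python
import unicodedata


def to_nfc(text):
    """Compose characters into Unicode normal form C."""
    return unicodedata.normalize("NFC", text)


def strip_accents(text):
    """Decompose, then drop combining marks, leaving the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


composed = to_nfc("e\u0301")          # 'e' followed by a combining acute accent
print(len("e\u0301"), len(composed))  # two code points become one
print(strip_accents("caf\u00e9"))
```

Normalising to one form before comparing or de-duplicating strings avoids treating visually identical values as different.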

Dev8D 2011 – day 1

For the second year I have attended Dev8D, the JISC funded developer conference.  An excellent opportunity to network with other developers in Higher Education and to learn about new technologies.

Below are some highlights of the sessions and talks I attended on day 1.

Blackboard

About

  • written in Java
  • had a stable API since 2001
  • many recent API and plugin development improvements
  • direct database querying is now licensed
  • entity relationship diagram is published
  • lots of existing plugins, including text/SMS services
  • plugin exposure can be targeted to specific users
  • supports plugins written in languages other than Java
  • uses SOAP but a Newcastle chap has created a REST API
  • CourseSites.com offers 5 free courses (supports subset of OpenID like providers)
  • granular security policies for web service API – down to the per-function level

General VLE titbits

More VLEs are moving towards using *IMS common cartridges* providing greater interoperability

Look into further

  • LIS
  • SIS
  • OCELOT community
  • SCORM stands for Sharable Content Object Reference Model
  • IMS LTI (Learning Tools Interoperability) standard uses SOAP

Lightning talks

See: http://data.dev8d.org/2011/programme/session-type.php?type=http://data.dev8d.org/2011/programme/dev8d_programme.rdf%23sessiontype-Lightning

Molly

  • feed aggregator
  • HE produced using a sustainable approach
  • Python/Django
  • Opensource
  • XML
  • JSON
  • HTML5
  • Format X – future formats
  • can still target handsets – smart/feature phone
  • very inclusive
  • long term support, University of Oxford has committed 2.5 FTE to project
See related application: MyMobile Bristol

www.dreamspark.com

  • Free Microsoft professional tools for students.
  • .NET

Naturelocator

The JISC funded “Nature Locator” project will help the researchers by creating mobile applications that provide geo-tagged photographs, and visualisation tools to facilitate crowd-sourced verification of the data submitted during 2011

Related technologies

  • Titanium – translates your hard won web skills into native applications that perform and look just like they were written in Objective-C [iPhone and iPad] or Java [Android].
  • PhoneGap – an opensource mobile phone framework using HTML5, CSS3, JavaScript and Cloud deployment

All your bases belong to us: L10N & I18N @ Dev8D

Malte Ressin discusses his PhD research at Thames Valley University and the issues that affect internationalised and localised projects
  • pluralisation (2 kinds in Russian)
  • sort order (in Spanish ll is one letter not double l)
  • affects many areas: sales, UI design, legal, marketing, publisher, …
  • early research required to discover any cultural content guidelines
Contact Malte Ressin if interested in being part of his case studies
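
Two of the issues listed above can be sketched in code. Both functions are illustrations under stated assumptions rather than anything shown in the talk. `spanish_key` is a hypothetical, heavily simplified sort key that treats "ll" as a single letter after "l" and ignores accents and other digraphs. `russian_plural_category` follows the CLDR plural categories for whole numbers, reading the "2 kinds" note as the two plural forms ("few" and "many") that exist besides the singular.

```python
# Alphabet with "ll" as its own letter after "l" (simplified).
ALPHABET = list("abcdefghijkl") + ["ll"] + list("mnopqrstuvwxyz")
RANK = {letter: position for position, letter in enumerate(ALPHABET)}


def spanish_key(word):
    """Sort key that counts 'll' as one letter sorting after 'l'."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ll", i):
            key.append(RANK["ll"])
            i += 2
        else:
            # Unknown characters sort after the whole alphabet.
            key.append(RANK.get(word[i], len(ALPHABET)))
            i += 1
    return key


def russian_plural_category(n):
    """Return the plural category for a non-negative whole number."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


words = ["luz", "llama", "lobo", "lima"]
print(sorted(words))                   # plain code-point order
print(sorted(words, key=spanish_key))  # 'llama' now sorts after 'luz'
print([russian_plural_category(n) for n in (1, 2, 5, 11, 21, 22)])
```

In a real project these rules would normally come from a localisation library or locale data rather than being hand-written.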

What makes Dexy so Sexy for creating beautiful code documentation?

Dexy is an open source document automation tool that can help you create documents using your favourite programming languages and your favourite software.
  • Output formats include: CSV, latex, PDF, and WordPress posts
  • Can incorporate testing output (cucumber or was that water??)
  • Can automate screen shots – part of implementation of the test
  • Python command line tool

Lucero

Linking University Content for Education and Research Online
http://lucero-project.info/lb/

Look up

Uses of RDFa to expose pricing details to Google

Archives Hub Data and APIs

  • union catalogue
  • could be institutional records, …
  • use EAD (Encoded Archival Description – XML)
  • provides searches/indexes of the archives
  • uses CQL (Contextual Query Language)
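
To give a feel for CQL, the sketch below builds a simple two-clause query and places it in a search URL. Everything specific here is an assumption for illustration: the endpoint `http://example.org/sru` is a placeholder, the SRU-style parameters are assumed because the notes do not say how queries are submitted, and the `dc.title` and `dc.creator` index names may not match what the Archives Hub exposes.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/sru"  # hypothetical endpoint, not the real service


def cql_term(index, value, relation="="):
    """Build one CQL clause such as: dc.title = "railway"."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '{} {} "{}"'.format(index, relation, escaped)


def build_search_url(cql, max_records=10):
    """Wrap a CQL query in SRU-style searchRetrieve parameters (assumed)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "maximumRecords": max_records,
    }
    return BASE_URL + "?" + urlencode(params)


query = cql_term("dc.title", "railway") + " and " + cql_term("dc.creator", "brunel")
url = build_search_url(query)
print(query)
print(url)
```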

LOCAH

  • project using Linked Data
  • related to COPAC data

Look up related projects

DBpedia – a community effort to allow you to ask sophisticated queries against Wikipedia

Molly 1

About

  • feed aggregator
  • Opensource
  • Python >= 2.6 < 3
  • Django 1.2
  • PostgreSQL, but other DBs could be used
  • mobile web vs native
  • could be just used as an aggregation service not for mobile web
  • produced by University of Oxford
  • nearing version 1.0
  • akin to Java based MyMobile Bristol application developed by ILRT, University of Bristol during the same period

Features

  • Geodata including Open Street Map
  • nearest real time bus info (Oxford specific)
  • contacts via LDAP
  • library search
  • maps – well targeted for feature (non smart) phones
  • podcasts
  • feedback/voting
  • url shortener
  • QR codes (2D barcodes)
  • OAuth
  • batch processing
  • easy to override blocks of templates and media
  • easy to plug in different data providers
  • installer will ask most setting questions
University of Oxford runs it on Ubuntu 10.04 LTS under VMware ESXi, with 2GB RAM
Also learnt: OAuth supports disabling device access from the source application. Useful if a device is lost.