Navigating the magical data visualisation forest

Speaker: Dr Margriet Groenendijk from IBM

Margriet is a tech enthusiast at IBM and gave a lightning talk at the Django Bristol Bath meetup as well.

Slides are online here: https://www.slideshare.net/PoleSystematicParisRegion/a-beginners-guide-to-weather-climate-data-margriet-groenendijk

This talk is about using Jupyter notebooks, https://jupyter.org/, for data analysis and visualisations.

NB: These can be run on the desktop and are available in the cloud as well.

Libraries Margriet has used:

PixieDust, https://github.com/pixiedust/pixiedust, is an add-on for Jupyter notebooks: https://www.ibm.com/cloud/pixiedust

It is an open source package developed by David Taieb and Margriet Groenendijk.  It is a wrapper around various libraries, which it exposes as GUI options in the Jupyter notebook.  This is an amazingly useful tool for data scientists and others who would like to explore their data without learning as much code or the differences between each library.

It can load pandas and spark data frames.  It can also load data via URLs, very helpful for cloud based notebooks.

PixieDust offers a large number of options while needing less code, which suits busy people, those exploring data, and data science newbies.

Seaborn has a nice map-based visualisation.

PixieDust integrates with Google, Mapbox, and Seaborn.

PixieApps – https://dataplatform.cloud.ibm.com/docs/content/pixiedust/pixieapps.html

The technical lead for PixieDust, David Taieb https://twitter.com/dtaieb55, has also published this book: Thoughtful Data Science – https://www.safaribooksonline.com/library/view/thoughtful-data-science/9781788839969/ 

 

The IPython notebook is for everyone (Gautier Hayoun)

Installation

pip install ipython[notebook]

or

pip install ipython[all]

NB: I had trouble installing and needed to upgrade my version of pip

More info: http://ipython.org/install.html

Commands

ipython notebook
  • it runs in the background
  • it opens in your browser
  • opens a listing of programs in the same folder

What can it do

Each notebook can be divided into different cells

  • a markdown cell for documentation or introductions
  • code cells, which can even run other installed languages such as Perl or Ruby
  • easy to edit multiple lines of code
  • cells share variable scope
  • can also run a limited set of local shell commands e.g. capturing shell command output in a variable
    x = !ls ..;
  • can display HTML in cells
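
The `x = !ls ..` line above is IPython syntax, so it only works inside a notebook or an IPython shell. As a rough sketch of what it does, here is a plain-Python approximation using only the standard library. The helper name `capture_lines` is made up for illustration, and IPython's real return value is a richer list type than the plain list used here.

```python
import subprocess
import sys


def capture_lines(args):
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# In a notebook cell you would write:  x = !ls ..
# Here the Python interpreter itself is the command, so the sketch runs on any OS.
x = capture_lines([sys.executable, "-c", "print('a.txt'); print('b.txt')"])
print(x)
```

Because cells share variable scope, `x` would then be available to any later cell in the same notebook.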

Use cases

Individual exploration

  • play with external data
  • easy to tweak and see output in situ

Suggestions from the audience

  • can be used to iterate on rewriting code in CPython, as you can see the results side by side

Collaboration

  • very readable
  • self contained
  • you can share .ipynb files
  • you can host a server yourself. WARNING: there is shell access, so it is not suitable for public access.

Suggestions

  • SageMathCloud: provides a FREE hosted service, which has been extended for asynchronous use (e.g. for Google Docs-like collaboration) and for git snapshots
  • PythonAnywhere

Publishing

  • nbviewer (nbviewer.ipython.org) serves read-only notebooks; it is like gist for notebooks
  • can export as HTML, python scripts, various other formats including as presentations
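
An .ipynb file is plain JSON, which is part of why sharing and exporting are easy. The sketch below is only an illustration, not how the real export is implemented: it builds a hypothetical two-cell notebook in the nbformat 4 layout and joins the code cells into a script. The function name `code_cells_to_script` and the cell contents are assumptions made for this example.

```python
import json

# A hypothetical two-cell notebook in the nbformat 4 layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo notebook"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["x = 1\n", "print(x + 1)"],
        },
    ],
}


def code_cells_to_script(nb):
    """Join the source of every code cell, roughly what a script export keeps."""
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # The format allows source to be one string or a list of strings.
            chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks) + "\n"


ipynb_text = json.dumps(notebook, indent=1)  # the text that would be saved as a .ipynb file
script = code_cells_to_script(json.loads(ipynb_text))
print(script)
```

In practice the nbconvert tool that ships with the notebook does the exporting; the sketch only shows that nothing magical is stored in the file.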

Overall

Our speaker, Gautier Hayoun, wanted to stress that the IPython Notebook is for everyone, not only data scientists. It tells the story of your data processing.  It can also be used to parse server logs or other data for ad hoc queries.
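
As a small illustration of the ad hoc log query idea, the kind of thing you might type into a single code cell looks like the sketch below. The log lines, the regular expression, and the function name `status_counts` are all made up for this example and were not shown in the talk.

```python
import re
from collections import Counter

# Hypothetical access-log lines in the common log format.
LOG_LINES = [
    '127.0.0.1 - - [10/Oct/2014:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2014:13:55:40 +0000] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.7 - - [10/Oct/2014:13:56:01 +0000] "POST /login HTTP/1.1" 200 128',
]

# Capture the request path and the status code that follows the quoted request.
LINE_RE = re.compile(r'"(?:GET|POST|PUT|DELETE|HEAD) (\S+) [^"]*" (\d{3})')


def status_counts(lines):
    """Count how many requests ended with each HTTP status code."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # silently skip lines that do not look like requests
            counts[match.group(2)] += 1
    return counts


counts = status_counts(LOG_LINES)
print(counts.most_common())
```

In a notebook you would tweak the pattern, re-run the cell, and see the new counts straight away, which is the "easy to tweak and see output in situ" point from earlier.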

An interesting talk and a fine moustache ;)