Machine Learning as a Service – Anand Chitipothu

This talk is about creating a simple interface for running your machine learning code.

Anand is the co-founder of rorodata, https://rorodata.com/, a Platform-as-a-Service
designed for data scientists for running machine learning code.

Machine learning libraries that were mentioned that I had not come across before:

PyTorch – https://pytorch.org/ – a deep learning framework for fast, flexible experimentation.

joblib – https://pypi.org/project/joblib/ – a set of tools that provide lightweight pipelining in Python.

The Rorodata firefly tool creates a RESTful API for your client defined functions.

The config format is YAML and not unlike defining a Bitbucket pipeline.

The end user only needs to write a Python function, define the API with firefly, and deploy; an endpoint is then created on the PaaS.
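To make the idea of "write a function, get an endpoint" concrete, here is a minimal sketch using only the Python standard library. It is not firefly's implementation or API; the registry `FUNCTIONS`, the helper `call_function`, the handler class and the `/square` path are all hypothetical names chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def square(n):
    """The kind of plain Python function a data scientist might write."""
    return n ** 2


# Hypothetical registry mapping URL paths to functions (illustration only).
FUNCTIONS = {"/square": square}


def call_function(path, body):
    """Decode JSON keyword arguments, call the function, encode the result."""
    func = FUNCTIONS[path]
    kwargs = json.loads(body) if body else {}
    return json.dumps(func(**kwargs))


class FunctionHandler(BaseHTTPRequestHandler):
    """Serve each registered function as a POST endpoint."""

    def do_POST(self):
        if self.path not in FUNCTIONS:
            self.send_error(404, "Unknown function")
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        payload = call_function(self.path, body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer((host, port), FunctionHandler).serve_forever()
```

Calling `serve()` and then posting `{"n": 4}` to `http://127.0.0.1:8000/square` should return the squared value as JSON. A tool like firefly removes the need to write this plumbing by hand.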

You can add further system requirements to the runtimes available in rorodata.

It also supports configuring CORS domains.

You can define the size and scale of the platform your code will run on.

You can also use the power of rorodata on your own servers or cloud infrastructure using https://github.com/rorodata/rorolite. NB: the size/scale config is not available there, but otherwise you can use the same code.

Navigating the magical data visualisation forest

Speaker: Dr Margriet Groenendijk from IBM

Margriet is a tech enthusiast at IBM and gave a lightning talk at the Django Bristol Bath meetup as well.

Slides are online here: https://www.slideshare.net/PoleSystematicParisRegion/a-beginners-guide-to-weather-climate-data-margriet-groenendijk

This talk is about using Jupyter notebooks, https://jupyter.org/, for data analysis and visualisations.

NB: These can be run on the desktop and are available in the cloud as well.

Libraries Margriet has used:

PixieDust, https://github.com/pixiedust/pixiedust, is an add-on for Jupyter notebooks: https://www.ibm.com/cloud/pixiedust

It is an open source package developed by David Taieb and Margriet Groenendijk. It is a wrapper around various libraries, which it exposes as GUI options in the Jupyter notebook. This is an amazingly useful tool for data scientists and others who would like to explore their data without learning as much code or the differences between each library.

It can load pandas and Spark data frames. It can also load data via URLs, which is very helpful for cloud-based notebooks.

PixieDust provides a serious number of options with less code, which suits busy people, those exploring data, and data science newbies.

seaborn has a nice map-based visualisation.

PixieDust integrates with Google, Mapbox, and seaborn.

PixieApps – https://dataplatform.cloud.ibm.com/docs/content/pixiedust/pixieapps.html

The technical lead for PixieDust, David Taieb https://twitter.com/dtaieb55, has also published this book: Thoughtful Data Science – https://www.safaribooksonline.com/library/view/thoughtful-data-science/9781788839969/ 

 

Categorizing Tweets Using Machine Learning – Halide Bey

Code in this talk can be found here https://github.com/halidebey/PyCon2018

The speaker made use of Kaggle – https://www.kaggle.com/ – the place to do data science projects.

Speaker refers to this source of data

https://www.figure-eight.com/data-for-everyone/

Discussed the approaches to machine learning

You need some knowledge of statistics and algorithms, such as LogisticRegression.
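As a rough illustration of what a logistic regression text classifier does, here is a pure-Python sketch. It is not the speaker's code and it does not use scikit-learn's `LogisticRegression`; it is a stand-in that trains bag-of-words weights with stochastic gradient descent. The tweets, labels, function names and hyperparameters are all made up for the example.

```python
import math
from collections import Counter

# Made-up training tweets; label 1 = complaint, 0 = not a complaint.
TWEETS = [
    ("my flight was delayed again", 1),
    ("lost my luggage terrible service", 1),
    ("delayed and rude staff", 1),
    ("great flight friendly crew", 0),
    ("thanks for the smooth landing", 0),
    ("lovely service great staff", 0),
]


def featurize(text):
    """Bag-of-words features: a count for each lower-cased word."""
    return Counter(text.lower().split())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, text):
    """Probability that the text belongs to class 1."""
    z = bias + sum(weights.get(w, 0.0) * c for w, c in featurize(text).items())
    return sigmoid(z)


def train(examples, epochs=500, lr=0.1):
    """Fit one weight per word by stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            error = predict_proba(weights, bias, text) - label
            for word, count in featurize(text).items():
                weights[word] = weights.get(word, 0.0) - lr * error * count
            bias -= lr * error
    return weights, bias


weights, bias = train(TWEETS)
for text, label in TWEETS:
    print(label, round(predict_proba(weights, bias, text), 3), text)
```

In practice a library implementation would add regularisation, a proper tokenizer and a held-out test set; the sketch only shows the core idea of learning a weight per word.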

Is it Shakespeare?

Using Python for authorship attribution in Renaissance drama

A lecturer, Paul Brown, and a student, Katie Jones, present their exploration of analysing old plays in an automated fashion.

For the period of interest, the authorship of at least a third of the plays is unknown.

A source of early plays online is Early English Books Online (EEBO), https://eebo.chadwyck.com/home

It is not common to use their approach to examine the full canon of work. They turned to Python to look at the treatment of whores and prostitutes during the time period.

 

Getting the Edge with Network Analysis with Alan Nir

Today I am lucky enough to be attending PyCon UK 2018.  I have chosen to get a brief introduction into network analysis.

Alan Nir is giving a high-level intro to how network analysis can benefit your work or research. It is a brief introduction, and no mathematics or algorithms will be covered in this session.

My favourite line: a network is just a collection of points & lines.

Computer science uses this terminology:

  • points AKA nodes
  • lines AKA edges

Alan’s recommendation, and the basis of his demo, is to start with the Python library NetworkX. It is a comprehensive library with a nice API and good documentation which explains the logic behind the functions. More information can be found here:

Alan created a demo of creating and visualising a network in only a few lines of Python.

You can input data in various formats including tuples, JSON, and pickles.
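A minimal sketch of building a network from tuples with NetworkX might look like the following. This is not Alan's demo; the payment records and names are invented, and it assumes `networkx` is installed.

```python
import networkx as nx

# Hypothetical payment records as (payer, payee) tuples; names are made up.
payments = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("dave", "alice"),
]

# Each tuple becomes an edge (line); the names become nodes (points).
G = nx.Graph()
G.add_edges_from(payments)

print(G.number_of_nodes(), "nodes and", G.number_of_edges(), "edges")

# G.degree yields (node, number_of_connections) pairs.
busiest = max(G.degree, key=lambda pair: pair[1])
print("most connected node:", busiest[0])
```

If matplotlib is also installed, `nx.draw(G, with_labels=True)` should be enough to get a first picture of the network.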

NB: Alan will share slides later. He spoke rather quickly and, sadly, I was sat too far away to read the projected screen.

Examples shown were networks showing Karma cheaters & Eurovision nation voters.

Demo data was from a public payment platform based in the US, Venmo.

Question: Why does this platform exist?  Why would you want to publicise your payments?

NB: The speaker obfuscated the data. Why, as it is already in the public domain? I suppose reuse rights may be in question.