A living, ever-changing library of useful technologies, libraries and platforms for data science.

Contents

Python

fast.ai

fast.ai is a machine learning and deep learning library, built on PyTorch and designed by Jeremy Howard, that aims to let developers without a maths background build world-class ML/DL models in the shortest time possible.

Accompanying the library is the fast.ai MOOC, the best ML/DL course I have completed to date.

The fast.ai blog is also an invaluable resource.

# Installation
pip install fastai

Fast progress

A simple and flexible progress bar for Jupyter Notebook and the console, developed by the fast.ai team.

# Installation
pip install fastprogress
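
A minimal usage sketch, assuming `progress_bar` can be imported from `fastprogress.fastprogress` as in the project's README. The `sum_of_squares` helper and the plain-iteration fallback are illustrative additions, not part of the library:

```python
# Use fastprogress when it is installed; otherwise fall back to plain
# iteration so that the sketch still runs.
try:
    from fastprogress.fastprogress import progress_bar
except ImportError:
    def progress_bar(iterable, **kwargs):
        return iterable


def sum_of_squares(n):
    """Sum i*i for i in range(n), showing a progress bar while looping."""
    total = 0
    for i in progress_bar(range(n)):
        total += i * i
    return total


print(sum_of_squares(1000))
```

Wrapping any iterable in `progress_bar` is enough for a single loop; the rest of the loop body stays unchanged.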

Waterfall Charts

Waterfall charts visualise how successive marginal contributions add up from a starting value (the bias).

# Installation
pip install waterfallcharts
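
To make the idea concrete, here is a small sketch of what a waterfall chart encodes: each bar starts where the previous running total ended. The labels and values are made up for illustration, and `running_totals` is a hypothetical helper. `plot_waterfall` assumes the package is imported as `waterfall_chart` and that matplotlib is installed:

```python
def running_totals(bias, contributions):
    """Return the cumulative level after the bias and after each contribution."""
    totals = [bias]
    for value in contributions:
        totals.append(totals[-1] + value)
    return totals


def plot_waterfall(labels, values):
    """Draw the chart; needs waterfallcharts and matplotlib installed."""
    import matplotlib.pyplot as plt
    import waterfall_chart  # module installed by the waterfallcharts package

    waterfall_chart.plot(labels, values)
    plt.show()


# Hypothetical example: a model bias plus three feature contributions.
labels = ["bias", "age", "income", "tenure"]
values = [100, 30, -20, 15]

print(running_totals(values[0], values[1:]))  # [100, 130, 110, 125]
```

Calling `plot_waterfall(labels, values)` should draw the same numbers as floating bars.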

TPOT

TPOT (Tree-based Pipeline Optimization Tool) is an automated machine learning tool that optimizes machine learning pipelines using genetic programming.

TPOT is built on top of scikit-learn and automates the most tedious parts of machine learning, such as feature selection, feature construction and model selection, by exploring thousands of possible pipelines to find the best one for the data. It then provides the Python code for the best pipeline it found, so you can explore and tweak it manually.

Follow the installation instructions: Installation Guide

# Then install TPOT
pip install tpot

FeatureTools

Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.

# Installation
pip install featuretools

Then follow the quickstart guide: 5 minute Quick Start
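
To illustrate what "transforming relational datasets into feature matrices" means, the sketch below aggregates a child table into one row per parent by hand. It deliberately does not use Featuretools. The tables, the `build_feature_matrix` helper and the feature names (loosely modelled on Featuretools-style naming) are assumptions made for illustration:

```python
from collections import defaultdict


def build_feature_matrix(customers, transactions):
    """Aggregate each customer's transactions into one feature row."""
    amounts = defaultdict(list)
    for tx in transactions:
        amounts[tx["customer_id"]].append(tx["amount"])

    matrix = {}
    for customer_id in customers:
        values = amounts.get(customer_id, [])
        matrix[customer_id] = {
            "COUNT(transactions)": len(values),
            "SUM(transactions.amount)": sum(values),
            # The mean is undefined for customers with no transactions.
            "MEAN(transactions.amount)": sum(values) / len(values) if values else None,
        }
    return matrix


# Hypothetical toy data: one parent table (customers), one child table (transactions).
customers = [1, 2, 3]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 5.0},
]

for customer_id, features in build_feature_matrix(customers, transactions).items():
    print(customer_id, features)
```

Automated feature engineering generates many such aggregations (and combinations of them) across all related tables, rather than the three hand-written here.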

Jupyter nbextensions

The jupyter_contrib_nbextensions package contains a collection of community-contributed, unofficial extensions that add functionality to the Jupyter Notebook. These extensions are mostly written in JavaScript and are loaded locally in your browser.

# Install Python package
pip install jupyter_contrib_nbextensions

# Install JavaScript and CSS files
jupyter contrib nbextension install --user

Documentation

Missingno

A great Python package to visually display the extent of missing values in a dataset.

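# Install Python package
pip install missingno

# Optional: sample data used in the example below
pip install quilt
quilt install ResidentMario/missingno_data

# Import
import missingno as msno

# Bar chart example
# `collisions` is a pandas DataFrame, here assumed to be loaded from the sample data above
msno.bar(collisions.sample(1000))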

Documentation
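
As a rough sketch of the quantity such a bar chart summarises, the snippet below computes the fraction of missing values per column by hand. The rows are made-up toy data and `missing_fraction` is a hypothetical helper. `plot_missing` assumes pandas, matplotlib and missingno are installed:

```python
def missing_fraction(rows):
    """Return, per column, the fraction of rows whose value is None."""
    columns = rows[0].keys()
    return {
        col: sum(row[col] is None for row in rows) / len(rows)
        for col in columns
    }


def plot_missing(rows):
    """Draw the per-column completeness bar chart with missingno."""
    import matplotlib.pyplot as plt
    import missingno as msno
    import pandas as pd

    msno.bar(pd.DataFrame(rows))
    plt.show()


# Hypothetical toy records with some missing fields.
rows = [
    {"borough": "QUEENS", "zip": 11101, "street": None},
    {"borough": None, "zip": None, "street": "BROADWAY"},
    {"borough": "BRONX", "zip": 10451, "street": None},
    {"borough": "BROOKLYN", "zip": 11201, "street": None},
]

print(missing_fraction(rows))  # {'borough': 0.25, 'zip': 0.25, 'street': 0.75}
```

Note that missingno's bar chart shows completeness (non-missing values) per column, which is the complement of the fractions printed here.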

Maths and Statistics

Linear Algebra

Statistics

Machine Learning, Deep Learning and AI