10 useful extensions for data scientists


These Jupyter Notebook extensions make the data scientist’s life easier

Data scientists spend most of their time visualizing data, preprocessing it, and tuning models based on the results. These are also the hardest parts of the process, because a good model depends on doing all three well. Here are 10 useful Jupyter Notebook extensions that help with these steps.


1. Qgrid


Qgrid is a Jupyter Notebook widget that uses SlickGrid to render pandas DataFrames in Jupyter Notebook. It lets you explore DataFrames with intuitive scrolling, sorting, and filtering controls, and edit them by double-clicking cells.

Installation

pip install qgrid #Installing with pip
conda install qgrid #Installing with conda
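
A minimal usage sketch, assuming qgrid is installed and enabled in your notebook. The sample data and the helper names `make_sample_frame` and `show_editable_grid` are made up for illustration; `qgrid.show_grid` and `get_changed_df` are the library calls.

```python
import pandas as pd


def make_sample_frame():
    """Build a small DataFrame to explore (synthetic, illustrative data)."""
    return pd.DataFrame(
        {
            "item": ["a", "b", "c"],
            "price": [1.5, 2.0, 3.25],
            "in_stock": [True, False, True],
        }
    )


def show_editable_grid(df):
    """Render df as an interactive grid; call this from a notebook cell."""
    import qgrid  # imported lazily so this module loads without qgrid

    # show_toolbar adds buttons for adding and removing rows
    return qgrid.show_grid(df, show_toolbar=True)


df = make_sample_frame()
# In a notebook:
#   grid = show_editable_grid(df)
#   grid                              # displays the widget
#   edited = grid.get_changed_df()    # DataFrame reflecting edits and filters
```

Keeping a reference to the returned widget is what lets you read the edited data back into pandas afterwards.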

2. itables


ITables turns pandas DataFrames and Series into interactive data tables, both in your notebooks and in their HTML representation. ITables uses plain JavaScript, so it works only in Jupyter Notebook, not in JupyterLab.

Installation


pip install itables

Activate interactive mode for all series and data frames like this:

from itables import init_notebook_mode
init_notebook_mode(all_interactive=True)

import world_bank_data as wb

df = wb.get_countries()
df

3. Jupyter DataTables

Data scientists and many developers work with DataFrames every day to make sense of their data. The usual workflow is to display a DataFrame, look at the schema, and then draw a few plots to check how the data is distributed, get a clearer picture, and perhaps spot something new in the table.

But what if these distribution plots were part of a standard data frame and we could quickly search the table with minimal effort? What if this view were the default view?

To draw the table, jupyter-datatables uses jupyter-require.

Installation


pip install jupyter-datatables

How to use the extension?


from jupyter_datatables import init_datatables_mode
init_datatables_mode()

4. ipyvolume


ipyvolume provides 3D plotting for Python in Jupyter, built on IPython widgets and WebGL.

Today ipyvolume can:

  • Do multi-volume rendering.
  • Render scatter plots (up to ~1 million glyphs).
  • Draw quiver plots (like a scatter plot, but with an arrow pointing in a given direction).
  • Select arbitrary regions that you draw with the mouse.
  • Render in stereo for virtual reality with Google Cardboard.
  • Animate in a d3 style, for example when the x coordinates or the colors of a scatter plot change.
  • Play animations or sequences: every property of a scatter or quiver plot can be a list of arrays, which can represent snapshots in time.

Installation


pip install ipyvolume #Installing with pip
conda install -c conda-forge ipyvolume #Installing with conda
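
As a rough sketch of the scatter plot case from the list above, the following draws a random point cloud. The helper names `make_point_cloud` and `show_scatter` and the random data are assumptions for illustration; `ipv.quickscatter` is ipyvolume's one-call scatter function.

```python
import numpy as np


def make_point_cloud(n_points=1000, seed=0):
    """Return three 1-D arrays of normally distributed coordinates."""
    rng = np.random.default_rng(seed)
    x, y, z = rng.normal(size=(3, n_points))
    return x, y, z


def show_scatter(x, y, z):
    """Render the points as spheres; call this from a notebook cell."""
    import ipyvolume as ipv  # imported lazily so this module loads without it

    return ipv.quickscatter(x, y, z, size=1, marker="sphere")


# In a notebook:
#   x, y, z = make_point_cloud()
#   show_scatter(x, y, z)
```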

5. bqplot


bqplot is a 2D visualization system for Jupyter, based on the constructs of the Grammar of Graphics.

Goals of the library

  • A complete 2D visualization framework with Python APIs.
  • A robust API to add custom interactions (pan, zoom, select, etc.).

Two APIs are provided

  • Users can create custom visualizations using an internal object model inspired by the Grammar of Graphics constructs (figure, marks, axes, scales) and enrich them with the library's interaction layer.
  • Or you can use a context-based API similar to Matplotlib’s pyplot, which provides reasonable defaults for most parameters.

Installation


pip install bqplot #Installing with pip
conda install -c conda-forge bqplot #Installing with conda
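
A short sketch of the pyplot-style API mentioned above, assuming bqplot is installed and its widgets are enabled in your notebook. The helper names `make_random_walk` and `plot_line` and the data are illustrative only.

```python
import numpy as np


def make_random_walk(n=200, seed=0):
    """Return x positions and a cumulative-sum random walk of length n."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 10.0, n)
    y = np.cumsum(rng.normal(size=n))
    return x, y


def plot_line(x, y):
    """Build a line chart with default scales and axes; call from a notebook."""
    from bqplot import pyplot as plt  # imported lazily, needs bqplot installed

    fig = plt.figure(title="Line chart")
    plt.plot(x, y)
    return fig  # a widget: it renders when it is the last expression in a cell


# In a notebook:
#   x, y = make_random_walk()
#   plot_line(x, y)
```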

6. livelossplot

Don’t train deep learning models blindly! Look at every epoch of your training!

livelossplot provides a live loss plot in Jupyter Notebook for models built with Keras, PyTorch, and other frameworks.

Installation


pip install livelossplot

How to use the extension?


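from livelossplot import PlotLossesKeras

# model, X_train, Y_train, X_test and Y_test are assumed to be defined already
model.fit(X_train, Y_train,
          epochs=10,
          validation_data=(X_test, Y_test),
          callbacks=[PlotLossesKeras()],  # redraws the loss plot after every epoch
          verbose=0)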
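
For frameworks without a ready-made callback, a hand-written loop can feed the plot directly. The sketch below assumes a livelossplot version that exposes `PlotLosses` with `update` and `send` (0.5 or later); the `simulated_losses` generator is a made-up stand-in for a real training loop.

```python
import math


def simulated_losses(epochs=10):
    """Yield fake per-epoch metrics standing in for a real training loop."""
    for epoch in range(epochs):
        loss = math.exp(-0.3 * epoch)  # smoothly decaying placeholder loss
        yield {"loss": loss, "val_loss": loss + 0.05}


def train_with_live_plot(epochs=10):
    """Redraw the loss plot after every epoch; call from a notebook cell."""
    from livelossplot import PlotLosses  # imported lazily

    liveloss = PlotLosses()
    for logs in simulated_losses(epochs):
        liveloss.update(logs)  # record this epoch's metrics
        liveloss.send()        # refresh the plot in the notebook output
```

In real code, the dictionary passed to `update` would hold the metrics computed at the end of each epoch.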

7. TensorWatch


TensorWatch is a debugging and visualization tool for data science, deep learning, and reinforcement learning from Microsoft Research. It works in Jupyter Notebook, showing real-time visualizations of your machine learning training and performing several other key analysis tasks for your models and data.

Installation


pip install tensorwatch

8. Polyaxon


Polyaxon is a platform for building, training, and monitoring large-scale deep learning applications. It aims to solve the reproducibility, automation, and scalability problems of machine learning applications. Polyaxon can be deployed in any data center or with any cloud provider, or it can be hosted and managed by Polyaxon, and it supports all major deep learning frameworks such as TensorFlow, MXNet, Caffe, Torch, and more.

Installation


pip install -U polyaxon

9. handcalcs


handcalcs is a library for automatically rendering Python calculation code in LaTeX, formatted the way you would write the calculation by hand: a symbolic formula, followed by the numeric substitutions, and then the result.

Installation


pip install handcalcs
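
To make the three-part layout concrete, here is roughly what such a rendering looks like for the Python assignment c = sqrt(a**2 + b**2). The values a = 3 and b = 4 are assumed for illustration, and this shows the general shape rather than the library's exact output.

```latex
c = \sqrt{a^{2} + b^{2}} = \sqrt{3^{2} + 4^{2}} = 5
```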

10. jupyternotify


jupyternotify provides the %%notify cell magic, which uses browser push notifications to tell you when a potentially long-running cell has finished. Use cases include training machine learning models that take a long time, grid searches, and Spark computations. After loading the extension with %load_ext jupyternotify, put %%notify at the top of a cell, switch to other work, and you will be notified the moment the cell finishes executing.

Installation


pip install jupyternotify

We hope you find these extensions useful. If you have any useful extensions in mind that were not included in this collection – share them in the comments!
