Streamline your Data Science Experience with Jupyter Hub

Improve your workflow with Jupyter Hub, ipykernel and systemd

Ashraf Miah
9 min read | Apr 4, 2021

Scope

This guide is aimed at avid users of Jupyter Notebook and/or Lab. It shows how to start the Jupyter server automatically, so that you can select your desired Python environment from the web interface and start coding.

If, like me, your typical workflow is to start a terminal and activate your desired Python virtual environment before running jupyter notebook or jupyterlab, then this guide should provide an easier and more integrated experience. This is achieved by installing jupyterhub into a dedicated virtual environment, setting up access to multiple IPython kernels and using a systemd start-up script (for relevant Linux distributions) to launch jupyterhub at system start-up.

This setup is not recommended for servers, which already have dedicated guides depending on the number of parallel users. The guide is written for Linux; Mac and Windows users can follow a similar approach, with the exception of automatically starting Jupyter Hub.

Image by the Author | See below for full Attribution details

Introduction

Like many Data Scientists, I follow the best practice of creating dedicated, controlled and known Python environments using tools like conda. I’m also an avid user of the Jupyter ecosystem, in particular Jupyter Notebooks and sometimes Jupyter Lab. My typical workflow is represented by the top branch in the diagram below:

Jupyter Workflow | by the Author

The bottom branch is the new way of working I have adopted recently. Although on the surface it looks like it saves only a single step, in reality the improvement in the experience is more significant for someone who uses Jupyter regularly. The end result is an environment where Jupyter Hub is always running in the background, and therefore both Jupyter Notebook and Lab are available on demand. Furthermore, I do not explicitly activate a particular conda environment, as it’s selectable and changeable directly from the web interface.

I appreciate much of this can be achieved without actually using Jupyter Hub itself, but I find it much easier when all the elements are combined together.

Finally, a number of users will be running Jupyter either in the cloud or on a server (local or remote), for which dedicated guides are available. The key difference in those setups is obtaining Transport Layer Security (TLS) certificates to verifiably encrypt communication between the client and the server.

Jupyter Hub Set-up

Conda Setup

The guide assumes you have a working conda environment, either with Miniconda or with Miniforge using mamba, which is my recommendation.

Enterprise / commercial users please note the license change from Anaconda regarding using the main channel for conda from August 2020, which may restrict the channels you can access.

Create Environment

Regular readers should find the following setup familiar, and it is generally self-explanatory. In your preferred terminal, type the following:

Conda environment for Jupyter Hub | by the Author

Given the imminent introduction of Python 3.10, I have bumped the Python version I use to the latest 3.8 release (3.8.8 at the time of writing). The environment also includes the popular jupyter_contrib_nbextensions package and autopep8.

Starting jupyterhub at this stage should show the following:

Screenshot of the Jupyter Hub Login Page | by the Author

The page can be accessed via http://127.0.0.1:8000 (assuming port 8000 was free). Log in with your system login details, i.e. the same username and password you use to log in to your Linux distribution.

By default, Jupyter Hub will redirect to the Jupyter Notebook server (http://127.0.0.1:8000/user/<username>/tree) but Lab can be accessed via: http://127.0.0.1:8000/user/<username>/lab as well:

Screenshot of Jupyter Notebook Server with New Notebook Menu | by the Author

When creating a new notebook, we have an unhelpful “Python 3” display entry, where it’s unclear which conda environment it refers to. The display is similar with Jupyter Lab:

Screenshot of Jupyter Lab with New Notebook Menu | by the Author

To remove any doubt, the “Python 3” entry actually refers to the jupyter environment created in this guide. The next step is to update the display name and make other conda environments accessible from the same Jupyter instance.

Jupyter Python Kernels

When you install Jupyter, it installs the IPython kernel by default, but only within the current conda environment. The following commands show the currently accessible kernels and all the environments on the system:

List of Accessible Kernels | by the Author

The snippet above shows that only a single kernel is available in the environment, compared to the five conda environments on the system in total.
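
The original terminal snippet is not reproduced here; the usual commands for this are jupyter kernelspec list and conda env list. As a rough equivalent, the sketch below scans a kernels directory and reports each kernel's display name. The function name find_kernels is my own, and the directory in the demo is the usual Linux location for kernels registered with --user, which may differ on your system.

```python
import json
from pathlib import Path


def find_kernels(kernels_dir):
    """Return {kernel directory name: display name} for every kernel.json found."""
    root = Path(kernels_dir)
    if not root.is_dir():
        return {}
    found = {}
    for spec_file in sorted(root.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text(encoding="utf-8"))
        found[spec_file.parent.name] = spec.get("display_name", "")
    return found


if __name__ == "__main__":
    # Assumed location for kernels registered with --user on Linux.
    # Kernels that belong to one conda environment live under
    # <environment prefix>/share/jupyter/kernels instead.
    user_dir = Path.home() / ".local" / "share" / "jupyter" / "kernels"
    for name, display_name in find_kernels(user_dir).items():
        print(f"{name}: {display_name}")
```

Pointing the function at the jupyter environment's own kernels directory should list the single default kernel described above.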

Rename Existing Kernels

The first step is to rename the existing kernel to something easier to recognise. The properties and metadata for each kernel are contained in a dedicated file called kernel.json. Line 5 of the kernel listing above shows the path to the file, which contains the following:

Kernel JSON file | by the Author

Line 9 contains the display_name field, which should be changed to “Jupyter (py3.8)”, either in a text editor or from the command line, giving the following:

Updated Kernel JSON file | by the Author
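
If you would rather script the change than open a text editor, the following is a minimal sketch using only the Python standard library. The helper name rename_kernel is hypothetical, and the path you pass in is whatever your own kernel listing reports; only the display_name field is touched.

```python
import json
from pathlib import Path


def rename_kernel(kernel_json, display_name):
    """Set display_name in a kernel.json file and return the previous value."""
    path = Path(kernel_json)
    spec = json.loads(path.read_text(encoding="utf-8"))
    previous = spec.get("display_name")
    spec["display_name"] = display_name
    # Every other field (argv, language, ...) is written back unchanged.
    path.write_text(json.dumps(spec, indent=1) + "\n", encoding="utf-8")
    return previous


# Example call, with the path taken from your own kernel listing:
# rename_kernel("<path to kernel directory>/kernel.json", "Jupyter (py3.8)")
```

The return value is the old name, which is handy if you want to log or undo the change.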

Refreshing either Jupyter Notebook or Lab will show the updated description:

Screenshot of Jupyter Lab with updated description | by the Author

The screenshot shows the updated display-name for the existing kernel, which is now much clearer as to the environment (Jupyter) and version of Python (py3.8).

Additional Kernels

We want to use the jupyter conda environment (and its Jupyter Hub installation) to access other conda environments directly from within the web interface. This is detailed in the IPython documentation for access to multiple kernels and demonstrated below:

Adding ds01 to shared kernel list | by the Author

The example above shows the activation of the ds01 conda environment and the installation of the IPython kernel in user space for the environment (--name) ds01 with display-name “ds01 (py3.7)”. This is also now reflected within Jupyter Notebook / Lab:

Screenshot of Jupyter Lab with ds01 environment added | by the Author
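
For reference, the registration step boils down to one ipykernel command, run with the Python interpreter of the environment being shared (so ipykernel must be installed there). The sketch below only builds and prints that command; the helper name kernel_install_command is my own, and actually running it is left commented out.

```python
import shlex
import subprocess
import sys


def kernel_install_command(name, display_name, python=sys.executable):
    """Build the argv that registers an environment's IPython kernel for the user."""
    return [
        python, "-m", "ipykernel", "install",
        "--user",                        # install into user space, not system-wide
        "--name", name,                  # internal kernel name
        "--display-name", display_name,  # label shown in Notebook / Lab
    ]


if __name__ == "__main__":
    command = kernel_install_command("ds01", "ds01 (py3.7)")
    print(shlex.join(command))
    # Uncomment to register the kernel. This must be executed by the
    # interpreter of the ds01 environment, i.e. with ds01 activated.
    # subprocess.run(command, check=True)
```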

It should be noted that the shared kernels extend beyond creating a new notebook. An existing notebook based on one kernel can be switched to using a different kernel after it has been loaded meaning that it’s possible for a single project to use multiple environments from a single interface. For example, you can have one environment dedicated to visualisations, another to deep learning and another for documentation etc.

It should also be noted that there is currently no custom sorting option for how the environments are displayed. The default is alphabetical order, so if you have a preference you can change the order of the environments by using prefixes such as 1_, 2_ or A_, B_ etc.

Having set up the desired easy-to-use jupyter environment, the next step is to automatically start Jupyter Hub.

Autoload Jupyter Hub with Systemd

This step is restricted to Linux distributions using systemd. It is based on this great Stack Overflow answer, with the addition of the Environment variable.

As a bit of background, there are multiple hurdles to overcome in getting jupyterlab to start at system start-up with systemd. Firstly, there is one layer of abstraction with systemd, followed by another layer due to conda itself. After much trial and error, these steps allow (at least on Ubuntu 20.04) the correct conda environment to be used when jupyterhub autoloads. The required systemd service file is:

Systemd Service file for Jupyter Hub | by the Author

The tag <username> should be replaced with your username. In essence there are two parts to loading Jupyter Hub at system start:

  1. Constructing the right service file with the correct commands
  2. Running the service as a user instead of root

Constructing the Service File

The two difficult parts of constructing the service file were setting the correct PATH variable and setting the correct environment variables.

The required command for ExecStart is:

/bin/bash -c 'PATH=/home/<username>/miniconda3/envs/jupyter/bin:$PATH exec jupyterhub'

It contains the absolute path to the shell (/bin/bash) and a command that updates the PATH variable to reflect the location of conda and its binaries, followed by running the jupyterhub command.

The second part is passing the correct environment variables; these can be obtained by running the env command:

List of Conda related Variables | by the Author

The list needs to be converted from a newline-delimited list into a space-delimited, single-line list for systemd. The result is then copied into the service file as the value of the Environment= parameter.
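
The conversion is easy to get wrong by hand, so here is a small sketch of it. Several things are my assumptions rather than the original service file: keeping only variables whose names start with CONDA, the [Unit] and [Install] sections (WantedBy=default.target is the usual target for user services), and the helper names. The ExecStart line is the one given above. Values containing double quotes are not handled.

```python
UNIT_TEMPLATE = """\
[Unit]
Description=Jupyter Hub

[Service]
{environment}
ExecStart=/bin/bash -c 'PATH=/home/{username}/miniconda3/envs/jupyter/bin:$PATH exec jupyterhub'

[Install]
WantedBy=default.target
"""


def conda_environment_line(env_output, prefix="CONDA"):
    """Turn newline-delimited `env` output into a single systemd Environment= line."""
    assignments = []
    for line in env_output.splitlines():
        name, sep, value = line.partition("=")
        if not sep or not name.startswith(prefix):
            continue  # assumption: only conda-related variables are needed
        pair = f"{name}={value}"
        # systemd splits assignments on whitespace, so quote any that contain it.
        if any(ch.isspace() for ch in value):
            pair = f'"{pair}"'
        assignments.append(pair)
    return "Environment=" + " ".join(assignments)


def render_unit(username, environment_line):
    """Fill the service file template for one user."""
    return UNIT_TEMPLATE.format(username=username, environment=environment_line)


if __name__ == "__main__":
    # Illustrative `env` output; every value here is hypothetical.
    sample = "\n".join([
        "SHELL=/bin/bash",
        "CONDA_PREFIX=/home/me/miniconda3/envs/jupyter",
        "CONDA_DEFAULT_ENV=jupyter",
        "CONDA_PROMPT_MODIFIER=(jupyter) ",
    ])
    print(render_unit("me", conda_environment_line(sample)))
```

In practice you would feed the function the real output of env, captured while the jupyter environment is active, and compare the result with your own service file.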

Enabling and Starting Systemd Service

With systemd it’s possible to run a service without root privileges by placing the service file, named jupyterhub.service, in ~/.config/systemd/user/. The following commands reload the list of service files, check the service’s status and start the process for testing purposes:

Systemd Service Check | by the Author

The first command ensures systemd loads the service file from the user directory. The status command confirms there are no formatting or parameter errors (which would be listed as additional messages in red). The start command then starts the service for testing, which can be verified by visiting either http://127.0.0.1:8000 or http://localhost:8000. Running the status command again shows the output of running jupyterhub.

These steps confirm that the service file works as intended. To ensure the service starts at system start-up, i.e. when the laptop starts, the service has to be explicitly enabled. The final test is of course to restart the machine and, after logging on, check that Jupyter Hub is indeed running.
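
The original terminal cast is not reproduced here, so the sketch below lists the standard systemctl --user invocations that match the sequence described: reload, status, start, status again, then enable. It only prints the commands; the helper name systemctl_user and the STEPS list are my own, and the service name is assumed to match jupyterhub.service.

```python
import shlex
import subprocess

SERVICE = "jupyterhub"  # matches ~/.config/systemd/user/jupyterhub.service


def systemctl_user(*args):
    """Build a systemctl command that targets the per-user service manager."""
    return ["systemctl", "--user", *args]


STEPS = [
    systemctl_user("daemon-reload"),    # pick up the new service file
    systemctl_user("status", SERVICE),  # check for formatting or parameter errors
    systemctl_user("start", SERVICE),   # start the service for testing
    systemctl_user("status", SERVICE),  # shows the jupyterhub output
    systemctl_user("enable", SERVICE),  # start automatically from now on
]

if __name__ == "__main__":
    for step in STEPS:
        print(shlex.join(step))
        # Uncomment to execute. status exits with a non-zero code while the
        # service is not running, so check=True is deliberately not used.
        # subprocess.run(step)
```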

Configure Jupyter Notebook Extensions

There are a number of popular extensions to the original Jupyter Notebook, which I would recommend. I won’t cover this exhaustively as there are many guides available. To access the extensions page, simply click on the NBextensions tab:

Screenshot of Nbextensions tab | by the Author

I would recommend the following extensions (you can read about each one by clicking on it):

  • Help Panel — yes you can press Shift Tab but this provides a good backup
  • Live Markdown Preview — view the markdown render as you type
  • ExecuteTime — the most useful feature is knowing the run time of a command
  • Scratchpad — press CTRL+B to get a temporary environment to run locals for example
  • Skip-Traceback — make error messages more manageable
  • Autopep8 — easily format code cells; I use it primarily for cleaning up manually typed lists with the hammer button
  • Table of Contents — can display a side menu of all the headers in a notebook

Screenshot of Configured Notebook Extensions | by the Author

Summary

Jupiter and one of its moons — Io | Image by flflflfl from Pixabay

This guide presents three components to enable a better Jupyter Notebook or Lab experience. The first step was installing Jupyter Hub into a dedicated conda environment. The second step was renaming the existing kernel to be more obvious and sharing kernels from other conda environments, each with an appropriate display name. The final step was activating the new conda environment and running jupyterhub automatically using systemd. As a bonus, some basic extensions were also recommended for Jupyter Notebook.

The end result is an easy-to-use, web-based environment that allows easy switching between conda environments.

Attribution

All gists, notebooks and terminal casts are by the author. All of the artwork is based on assets that are explicitly CC0, Public Domain or SIL OFL licensed and is therefore non-infringing. The theme is inspired by and based on my favourite vim theme: Gruvbox.

Ashraf Miah

CTO, Data Scientist & Chartered Engineer (MEng CEng EUR ING MRAeS) with over 20 years experience in the Aerospace, Rail & Energy Industry.