Streamline your Data Science Experience with Jupyter Hub
Improve your workflow with Jupyter Hub, ipykernel and systemd
Scope
If you are an avid user of Jupyter Notebooks and/or Lab, this guide shows how to automatically start a Jupyter server, allowing you to select your desired Python environment from the web interface and then start coding.
If, like me, your typical workflow was to start a terminal and activate your desired Python virtual environment before running jupyter notebook or jupyter lab, then this guide should provide an easier and more integrated experience. This is achieved by installing jupyterhub into a dedicated virtual environment, setting up access to multiple IPython kernels and using a systemd start-up script (for relevant Linux distributions) to launch jupyterhub at system start-up.
This setup is not recommended for servers (which already have dedicated guides depending on the number of parallel users); furthermore, the guide is for Linux. Mac and Windows users can follow a similar approach, with the exception of automatically starting Jupyter Hub.
Introduction
Like many Data Scientists, I follow the best practice of creating dedicated, controlled and known Python environments, using tools like conda. Similarly, I'm an avid user of the Jupyter ecosystem, in particular Jupyter Notebooks and sometimes Jupyter Lab. My typical workflow is represented by the top branch in the diagram below:
The bottom branch is the new way of working I have adopted recently. Although on the surface it looks like saving a single step, in reality, for someone engaging with Jupyter regularly, the improvement in experience is more significant. The end result is an environment where Jupyter Hub is always running in the background, so both Jupyter Notebook and Lab are available on demand. Furthermore, I no longer explicitly activate a particular conda environment, as it's selectable and changeable directly from the web interface.
I appreciate much of this can be achieved without actually using Jupyter Hub itself, but I find it much easier when all the elements are combined together.
Finally, a number of users will be running Jupyter in the cloud or on a server (either local or remote), for which there are dedicated guides available. The key difference is obtaining Transport Layer Security (TLS) certificates to verifiably encrypt communication between the client and the server.
Jupyter Hub Set-up
Conda Setup
The guide assumes you have a working conda installation, either Miniconda or Miniforge with mamba, which is my recommendation.
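If you need a fresh installation, a minimal sketch is shown below, assuming x86_64 Linux; the download URL is the upstream Miniforge installer and may change over time, and mamba is installed explicitly in case your base environment does not already include it.

```bash
# Download and run the Miniforge installer (follow the interactive prompts),
# then add mamba to the base environment.
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh
conda install -n base -c conda-forge mamba
```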
Enterprise / commercial users, please note the license change from Anaconda regarding use of the main channel for conda from August 2020, which may restrict the channels you can access.
Create Environment
For regular readers the following setup should be familiar, but it is generally self-explanatory. In your preferred terminal, type the following:
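A minimal sketch of the command, assuming the conda-forge channel and the package names below; adjust the Python pin and package list to taste.

```bash
# Create a dedicated environment holding Jupyter Hub, Notebook, Lab and the
# notebook extensions (which include autopep8), then activate it.
mamba create -n jupyter -c conda-forge python=3.8 jupyterhub jupyterlab notebook jupyter_contrib_nbextensions autopep8
conda activate jupyter
```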
With the introduction of Python 3.10 imminent, I have bumped the Python version I use to the latest 3.8 release (3.8.8 at the time of writing), and also added the popular jupyter_contrib_nbextensions package, including autopep8.
Starting jupyterhub at this stage should show the server starting up.
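A minimal sketch, assuming the jupyter environment created above:

```bash
# Activate the dedicated environment and launch Jupyter Hub in the foreground.
conda activate jupyter
jupyterhub
```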
The page can be accessed via http://127.0.0.1:8000 (assuming port 8000 was free); then use your system login details, i.e. the same username and password you use to log in to your Linux distribution.
By default, Jupyter Hub will redirect to the Jupyter Notebook server (http://127.0.0.1:8000/user/<username>/tree) but Lab can be accessed via: http://127.0.0.1:8000/user/<username>/lab as well:
When creating a new notebook, we have an unhelpful “Python 3” display entry, where it’s unclear which conda environment it refers to. The display is similar with Jupyter Lab:
If it’s unclear to you, “Python 3” actually refers to the jupyter environment created in this guide. The next step is to update the display name and make other conda environments accessible from the same Jupyter instance.
Jupyter Python Kernels
When you install Jupyter, it installs the ipython kernel by default, but only for the current conda environment (obviously). The following commands show the currently accessible kernels and all the conda environments on the system:
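A sketch of the two commands, run from within the activated jupyter environment:

```bash
# List the kernels visible to this Jupyter installation...
jupyter kernelspec list
# ...and compare with all the conda environments present on the system.
conda env list
```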
The snippet above shows that only a single kernel is available in the environment, compared to the five conda environments on the system in total.
Rename Existing Kernels
The first step is to rename the existing kernel to something easier to recognise. The properties and metadata for each kernel are contained in a dedicated file called kernel.json; the path to this file is shown in the kernelspec listing above.
The file’s display_name entry should be changed to “Jupyter (py3.8)”, either in a text editor or via the command line:
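A minimal sketch of the command-line route; the kernelspec path is a placeholder based on the output of jupyter kernelspec list, and the search string assumes the default display name is “Python 3”, so adjust both to match your system.

```bash
# Path reported by `jupyter kernelspec list` for the built-in kernel (adjust as needed).
KERNEL_JSON=/home/<username>/miniconda3/envs/jupyter/share/jupyter/kernels/python3/kernel.json
# Replace the default display name with a more descriptive one.
sed -i 's/"display_name": "Python 3"/"display_name": "Jupyter (py3.8)"/' "$KERNEL_JSON"
```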
Refreshing either Jupyter Notebook or Lab will show the updated description:
The screenshot shows the updated display name for the existing kernel, which is now much clearer as to the environment (Jupyter) and version of Python (py3.8).
Additional Kernels
We want to use the jupyter conda environment (and its Jupyter Hub installation) to access other conda environments directly from within the web interface. This is detailed in the IPython documentation on installing kernels for multiple environments and demonstrated below:
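A sketch of the commands, assuming an existing environment called ds01 running Python 3.7; ipykernel needs to be present in that environment before its kernel can be registered.

```bash
# Activate the target environment and ensure ipykernel is installed in it.
conda activate ds01
mamba install -c conda-forge ipykernel
# Register the environment's kernel in user space with a descriptive display name.
python -m ipykernel install --user --name ds01 --display-name "ds01 (py3.7)"
conda deactivate
```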
The example above shows the activation of the ds01 conda environment and the installation of the ipython kernel in user space, with the environment name (--name) ds01 and display name (--display-name) “ds01 (py3.7)”. This is also now reflected within Jupyter Notebook / Lab:
It should be noted that the shared kernels extend beyond creating a new notebook. An existing notebook based on one kernel can be switched to a different kernel after it has been loaded, meaning that it’s possible for a single project to use multiple environments from a single interface. For example, you can have one environment dedicated to visualisations, another to deep learning and another for documentation, etc.
It should also be noted that there is currently no custom sorting option for how the environments are displayed. The default is alphabetical order, so if you have a preference you can change the order of the environments by using numeric prefixes such as 1_, 2_ or A_, B_, etc.
Having set up the desired, easy-to-use jupyter environment, the next step is to automatically start Jupyter Hub.
Autoload Jupyter Hub with Systemd
This step is restricted to Linux distributions using systemd and is based on this great Stack Overflow answer, with the addition of the Environment variable.
As a bit of background, there are multiple hurdles to overcome in getting jupyterhub to start on system start-up with systemd. Firstly, there is one layer of abstraction with systemd, followed by another layer due to conda itself. With much trial-and-error, these steps allow (at least for Ubuntu 20.04) the correct conda environment to be used for jupyterhub to autoload. The required systemd service file is:
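The version below is a sketch written directly into the user unit directory; <username>, the Environment= values and the PATH are placeholders to be replaced with your own, as described in the following sections.

```bash
# Create the user unit directory and write the service file into it.
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/jupyterhub.service <<'EOF'
[Unit]
Description=JupyterHub
After=network.target

[Service]
Type=simple
# Paste your own space-delimited `env` output here (see below).
Environment=HOME=/home/<username> LANG=en_GB.UTF-8
ExecStart=/bin/bash -c 'PATH=/home/<username>/miniconda3/envs/jupyter/bin:$PATH exec jupyterhub'

[Install]
WantedBy=default.target
EOF
```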
The tag <username> should be replaced with your username. In essence there are two parts to loading Jupyter Hub at system start:
- Constructing the right service file with the correct commands
- Running the service as user instead of root
Constructing the Service File
The two difficult components in constructing the service file were setting the correct PATH variable and setting the correct environment variables.
The required command for ExecStart is:
/bin/bash -c 'PATH=/home/<username>/miniconda3/envs/jupyter/bin:$PATH exec jupyterhub'
It contains the absolute path to the shell (/bin/bash) and a command that updates the PATH variable to reflect the location of conda and its binaries, followed by running the jupyterhub command.
The second part is passing the correct environment variables; these can be obtained by running the env command.
The list needs to be converted from a newline-delimited list into a space-delimited, single-line list for systemd. These values are then copied into the service file for the Environment= parameter.
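A minimal sketch of that conversion, assuming none of the values contain spaces (quote any that do):

```bash
# Print the current environment as a single space-delimited line,
# ready to paste after Environment= in the service file.
env | tr '\n' ' '
```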
Enabling and Starting Systemd Service
With systemd it’s possible to run without root privileges by placing the service file as jupyterhub.service in ~/.config/systemd/user/. The following commands reload the list of service files, check the status of the service and start the process for testing purposes:
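For reference, the user-level systemctl commands, assuming the unit name jupyterhub.service used above:

```bash
# Pick up the new or changed unit file from ~/.config/systemd/user/.
systemctl --user daemon-reload
# Confirm the unit loads without formatting or parameter errors.
systemctl --user status jupyterhub.service
# Start the service for testing.
systemctl --user start jupyterhub.service
```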
The first command ensures systemd loads the service file from the user directory. The status command confirms there are no formatting or parameter errors (which would be listed as additional messages in red). To test the service file, the start command allows the user to start the service, which can be checked by visiting either http://127.0.0.1:8000 or http://localhost:8000. Running the status command again shows the output of running jupyterhub.
These steps confirm that the service file works as intended. To ensure the service starts at system start, i.e. when the laptop starts, the service has to be explicitly enabled, as shown below. The final test is of course to restart the machine and, after logging on, check that Jupyter Hub is indeed running.
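A minimal sketch of the enable step, again assuming the unit name above:

```bash
# Enable the user service so it starts automatically at login.
systemctl --user enable jupyterhub.service
```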
Configure Jupyter Notebook Extensions
There are a number of popular extensions to the original Jupyter Notebook which I would recommend. I won’t cover these exhaustively as there are many guides available. To access the extensions page, simply click on the NBextensions tab:
I would recommend the following extensions (you can read about each one by clicking on it):
- Help Panel — yes you can press Shift Tab but this provides a good backup
- Live Markdown Preview — view the markdown render as you type
- ExecuteTime — the most useful feature is knowing the run time of a command
- Scratchpad — press CTRL+B to get a temporary cell in which to run code, for example to inspect local variables
- Skip-Traceback — make error messages more manageable
- Autopep8 — easily format code cells; I use it primarily for cleaning up manually typed lists with the hammer button
- Table of Contents — can display a side menu of all the headers in a notebook
Summary
This guide presents three components to enable a better Jupyter Notebook or Lab experience. The first step was the installation of Jupyter Hub into a dedicated conda environment. The second step was renaming the existing kernel to something more obvious and sharing kernels from other conda environments with appropriate display names. The final step was activating the new conda environment and running jupyterhub automatically using systemd. As a bonus, some basic extensions were also recommended for Jupyter Notebook.
The end result is an easy-to-use, web-based environment that allows seamless switching between conda environments.
Attribution
All gists, notebooks and terminal casts are by the author. All of the artwork is based on assets explicitly under a CC0 / Public Domain license or the SIL OFL and is therefore non-infringing. The theme is inspired by and based on my favourite vim theme: Gruvbox.