SageMaker: install Jupyter extensions in a restart-proof way
“Every time my notebook shuts down and restarts, I lose notebook extensions and have to reinstall them from the terminal”, my teammate said. Eventually, he gave up reinstalling them.
This might be fine if you’re using SageMaker occasionally, but if you’re using it every day like he and I do, it’s a bummer. It slows you down, and you end up without the Jupyter setup you like.
This guide will show you how to install Jupyter (and JupyterLab) extensions and make them stay after notebook instance restarts. Like other guides on SageMaker, it’ll take just a few minutes to set it up.
Setup
The setup does not really make Jupyter extensions “stay” after notebook instance restarts. It reinstalls them every time your instance boots. The effect is the same, though: you get your favorite Jupyter extensions.
Preparing lifecycle configuration
To have a place to inject the installation commands, we need a lifecycle configuration attached to the notebook instance. No worries, here are the steps to set that up.
- Log in to the AWS console
- Stop your notebook instance (and wait for the instance to stop)
- Make sure you have the jq tool installed
- Copy the following code into your terminal (on your computer, not SageMaker). If you know a lifecycle configuration is already attached and you know its name, you can just set the CONFIGURATION_NAME variable and skip to configuring the extensions to install.

    # fill in the instance name here
    INSTANCE_NAME="team-ml-mario"

    # read the name of the attached lifecycle configuration (empty if there is none)
    CONFIGURATION_NAME=$(aws sagemaker describe-notebook-instance --notebook-instance-name "${INSTANCE_NAME}" | jq -e '.NotebookInstanceLifecycleConfigName | select (.!=null)' | tr -d '"')
    echo "Configuration \"$CONFIGURATION_NAME\" attached to notebook instance $INSTANCE_NAME"

    if [[ -z "$CONFIGURATION_NAME" ]]; then
      # there is no attached configuration, create a new one
      CONFIGURATION_NAME="better-sagemaker"
      echo "Creating new configuration $CONFIGURATION_NAME..."
      aws sagemaker create-notebook-instance-lifecycle-config \
        --notebook-instance-lifecycle-config-name "$CONFIGURATION_NAME" \
        --on-start Content="$(echo '#!/usr/bin/env bash' | base64)" \
        --on-create Content="$(echo '#!/usr/bin/env bash' | base64)"

      # attach the lifecycle configuration to the notebook instance
      echo "Attaching configuration $CONFIGURATION_NAME to ${INSTANCE_NAME}..."
      aws sagemaker update-notebook-instance \
        --notebook-instance-name "$INSTANCE_NAME" \
        --lifecycle-config-name "$CONFIGURATION_NAME"
    fi
That’s it, we just have to define which extensions to install.
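The stop-and-wait step from the list above can also be done from the terminal instead of the console. This is a minimal sketch, assuming the AWS CLI is already configured for the right account and region; `stop_and_wait` is a hypothetical helper name, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop a notebook instance and block until it is stopped.
# Assumes the AWS CLI is installed and configured (account, region, credentials).
stop_and_wait() {
  local instance_name="$1"
  # ask SageMaker to stop the instance; give up if the request is rejected
  aws sagemaker stop-notebook-instance \
    --notebook-instance-name "$instance_name" || return 1
  # the waiter polls the instance until it reports the Stopped status
  aws sagemaker wait notebook-instance-stopped \
    --notebook-instance-name "$instance_name"
}
```

Call it as `stop_and_wait "$INSTANCE_NAME"`. It returns a non-zero status if the stop request is rejected, so it is best used on an instance that is currently running.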
Configuring the extensions to install
Now we have to add the code that installs your extensions on every start.
- Copy the code and fill the EXTENSION_NAME variable with the name of the Jupyter extension and PIP_PACKAGE_NAME with the name of the pip package. For example, the git extension for JupyterLab is named “jupyterlab_git” and its pip package is named “jupyterlab-git”.

    export PIP_PACKAGE_NAME="jupyterlab-git"
    export EXTENSION_NAME="jupyterlab_git"

    echo "Downloading on-start.sh..."
    # save the existing on-start script into on-start.sh
    aws sagemaker describe-notebook-instance-lifecycle-config --notebook-instance-lifecycle-config-name "$CONFIGURATION_NAME" | jq '.OnStart[0].Content' | tr -d '"' | base64 --decode > on-start.sh

    echo "Adding extensions install to on-start.sh..."
    echo '' >> on-start.sh
    echo '# install jupyter extension' >> on-start.sh
    echo "export PIP_PACKAGE_NAME=\"${PIP_PACKAGE_NAME}\"" >> on-start.sh
    echo "export EXTENSION_NAME=\"${EXTENSION_NAME}\"" >> on-start.sh
    echo 'curl https://raw.githubusercontent.com/mariokostelac/sagemaker-setup/master/scripts/install-server-extension/on-start.sh | bash' >> on-start.sh

    echo "Uploading on-start.sh..."
    # update the lifecycle configuration with the updated on-start.sh script
    # (newlines are stripped because GNU base64 wraps its output at 76 characters)
    aws sagemaker update-notebook-instance-lifecycle-config \
      --notebook-instance-lifecycle-config-name "$CONFIGURATION_NAME" \
      --on-start Content="$(base64 < on-start.sh | tr -d '\n')"
- Repeat step 1 for every extension you want to install.
If you want to install nb extensions or JupyterLab extensions, use this code for nb extension and this code for JupyterLab extensions.
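Repeating the step for several extensions means running the same append several times, and running it twice for one extension would add its install lines twice. The sketch below shows one way to guard against that on the local copy of on-start.sh. `add_extension` and `INSTALLER_URL` are hypothetical names introduced here; the URL is the server-extension installer used in the step above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the install lines for one extension to a local
# copy of on-start.sh, skipping extensions that are already listed there.
INSTALLER_URL="https://raw.githubusercontent.com/mariokostelac/sagemaker-setup/master/scripts/install-server-extension/on-start.sh"

add_extension() {
  local script="$1" pip_package="$2" extension="$3"
  # -F: fixed string, -x: match the whole line, -q: no output
  if grep -qFx "export EXTENSION_NAME=\"${extension}\"" "$script" 2>/dev/null; then
    echo "${extension} is already in ${script}, skipping"
    return 0
  fi
  # same five lines the step above appends, written in one go
  {
    echo ''
    echo '# install jupyter extension'
    echo "export PIP_PACKAGE_NAME=\"${pip_package}\""
    echo "export EXTENSION_NAME=\"${extension}\""
    echo "curl ${INSTALLER_URL} | bash"
  } >> "$script"
}
```

For example, `add_extension on-start.sh jupyterlab-git jupyterlab_git` between the download and upload commands. The helper only edits the local file, so the upload still has to be run afterwards.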
Why do my Jupyter extensions disappear at all?
After all this, why did we have to go through all the effort? Why do these extensions disappear?
AWS set up SageMaker to persist only the files located in ~/SageMaker. Everything else is recreated from scratch every time you boot your notebook instance.
Jupyter extensions don’t get installed there. They are installed in the JupyterSystemEnv conda environment, which is outside that directory.
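The rule above can be expressed as a small path check. This is only an illustration of the rule as described here, not something SageMaker provides; `survives_restart` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a path lies under ~/SageMaker, the only
# location described above as surviving a notebook instance restart.
# Plain string comparison: it does not resolve symlinks or relative paths.
survives_restart() {
  local path="$1" persistent_root="${HOME}/SageMaker"
  case "$path" in
    "$persistent_root"|"$persistent_root"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A notebook stored under ~/SageMaker passes the check, while a conda environment directory elsewhere in the home directory does not, which is why the extensions have to be reinstalled on every start.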
Troubleshooting
My instance is failing to start now
It’s likely the pip package name or the extension name is wrong. Follow the steps below and remove the installation code.
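Before removing anything, it can help to see what AWS recorded about the failed start. This is a minimal sketch, assuming the AWS CLI is configured; `failure_reason` is a hypothetical helper name, and it reads the FailureReason field from the describe-notebook-instance response.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the failure reason recorded for a notebook instance.
# Assumes the AWS CLI is installed and configured.
failure_reason() {
  local instance_name="$1"
  # --query picks a single field out of the response, --output text prints it bare
  aws sagemaker describe-notebook-instance \
    --notebook-instance-name "$instance_name" \
    --query FailureReason \
    --output text
}
```

Call it as `failure_reason "$INSTANCE_NAME"` after a failed start.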
Uninstalling extensions
Follow these steps:
- Run this to download the on-start.sh script.

    echo "Downloading on-start.sh..."
    # save the existing on-start script into on-start.sh
    aws sagemaker describe-notebook-instance-lifecycle-config --notebook-instance-lifecycle-config-name "$CONFIGURATION_NAME" | jq '.OnStart[0].Content' | tr -d '"' | base64 --decode > on-start.sh

- Open on-start.sh in your editor and remove the install lines for the extensions you do not want.
- Run this to upload the changed on-start.sh.

    echo "Uploading on-start.sh..."
    # update the lifecycle configuration with the updated on-start.sh script
    # (newlines are stripped because GNU base64 wraps its output at 76 characters)
    aws sagemaker update-notebook-instance-lifecycle-config \
      --notebook-instance-lifecycle-config-name "$CONFIGURATION_NAME" \
      --on-start Content="$(base64 < on-start.sh | tr -d '\n')"
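Editing on-start.sh by hand makes it easy to leave a half-deleted line behind. A quick syntax check between the edit and the upload can catch that. This is a sketch, not part of the original steps, and `check_script` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: refuse to continue if the edited script has syntax errors.
# bash -n parses the file without executing any of it.
check_script() {
  local script="$1"
  if bash -n "$script"; then
    echo "${script}: syntax OK"
  else
    echo "${script}: syntax errors, fix them before uploading" >&2
    return 1
  fi
}
```

For example, `check_script on-start.sh` right before the upload command. It only catches shell syntax errors; a wrong pip package or extension name will still pass.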
More about the same topic
- How to make startup scripts for Jupyter kernels reliable?
- SageMaker: SSH to notebook instances
- SageMaker: save your conda environments after the machine restarts
- SageMaker: automatically stop your instances when idle