Yes, you already know Jupyter notebooks enable bad code design. They often carry hidden state, and because you execute cells out of order, notebooks are often not even runnable from scratch. Engineers running production complain, and they often throw your code away and rewrite everything because it’s not “production code.”
Still, we both know it’s hard to give up on notebooks. They are too useful for exploratory analyses, and they help us move faster. Isn’t that a strong enough argument?
I have bad news and good news for you.
Bad news first: they’re right. I know it because I belong to both camps. I often use Jupyter notebooks to experiment on new features or as an interactive console on steroids, but I also have to put all the code in production and make it tick.
Relying on Jupyter notebooks to hold the code is difficult and fragile. Their interactive, stateful design breaks some fundamentals of how software is developed.
The good news is that the code you write in notebooks is not tainted. It’s just written without testability and modularity in mind.
Here is a simple way to gradually change that! I’ll also show you how I do it with VS Code and switch between the two quickly.
1. Recognize pieces of code that can be isolated as a separate unit
This is the most difficult part, as it requires some experience and domain knowledge. My approach is to start from the bottom, looking at pieces of code that are used in multiple places. Another question that helps: which functions do I wish somebody else, or a library, had already implemented?
For example, I’ve recently been generating some models based on a dataframe with a bunch of URLs. While preprocessing the data, I wanted to “compress” URLs into URL groups and save the result in a new column. That seemed like a good piece to start with.
2. Isolate that piece into a function
Once you know what to isolate, create a new function and extract the logic there. Make sure all places in your notebook call the function instead of inlining the logic.
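To make this concrete, here is a minimal sketch of what the extracted URL-grouping function could look like. The name `url_group` and the grouping rule (keep the host, mask purely numeric path segments) are assumptions for illustration, not the actual logic from my project.

```python
from urllib.parse import urlsplit


def url_group(url: str) -> str:
    """Collapse a URL into a coarse group (hypothetical rule).

    Keeps the host and the path, drops the query string, and replaces
    purely numeric path segments with the placeholder "{id}".
    """
    parts = urlsplit(url)
    segments = [seg for seg in parts.path.split("/") if seg]
    masked = ["{id}" if seg.isdigit() else seg for seg in segments]
    return parts.netloc + "/" + "/".join(masked)


# Two product pages end up in the same group.
print(url_group("https://example.com/products/123?ref=mail"))  # example.com/products/{id}
print(url_group("https://example.com/products/456"))           # example.com/products/{id}
```

With a pandas dataframe, every place that used to inline this logic would then shrink to something like `df["url_group"] = df["url"].map(url_group)`, assuming the columns are named that way.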
3. Move the function into a .py file
Cut the function from the notebook and paste it into the .py file. Import the file and call the function from there.
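As a rough sketch, the resulting file could look like the one below. The file name `my_functions.py` is an assumption, and `square` is just the toy function used in the asserts later in this post; your own function goes in its place.

```python
# my_functions.py  (hypothetical file name, saved next to the notebook)


def square(x):
    """Return x multiplied by itself."""
    return x * x
```

In the notebook, the cell that used to define the function becomes a single line, `from my_functions import square`, and the rest of the notebook keeps calling `square(...)` as before.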
4. Turn on autoreload in your notebook
You’re experimenting and it’s possible you’ll want to change this function. Turn on autoreload in your notebook.
Every time you change your .py file, the notebook will get the new code.
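In practice this is two lines of IPython magic using the built-in `autoreload` extension. These are notebook commands rather than plain Python, so they only work inside a Jupyter/IPython session. Run them once near the top of the notebook, ideally before importing your module.

```python
%load_ext autoreload
%autoreload 2
```

Mode `2` asks IPython to reload all imported modules before running each cell, so edits saved in the .py file are picked up the next time you execute something.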
5. Bonus points: write simple tests for your function
Nothing really prevents you from adding simple tests in your notebook. The support from testing frameworks is not great, but what I often do is use plain “assert.”
assert square(2) == 4
assert square(3) == 9
There are two benefits from adding tests early, even in this crude format:
- It’s often easy to translate them into the testing framework you use.
- It helps you structure your functions better.
I don’t have a hard rule on whether I keep these tests in a notebook or in the .py file. I end up reshaping them for the unittest framework anyway. Moving them to the .py file makes the code easier to review, though.
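As an illustration of how little changes in that translation, here is a sketch of the two asserts above rewritten for `unittest`. The `square` function is defined inline only to keep the example self-contained, and the class and test names are my own choice. The suite is run through a loader and runner so the same code also works when pasted into a notebook cell.

```python
import unittest


def square(x):
    """Stand-in for the function under test."""
    return x * x


class TestSquare(unittest.TestCase):
    def test_small_integers(self):
        # The same checks as the plain asserts, using unittest's helpers.
        self.assertEqual(square(2), 4)
        self.assertEqual(square(3), 9)


# Build and run the suite explicitly instead of calling unittest.main(),
# which would try to parse the notebook's command-line arguments.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSquare)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Once the test class lives in its own .py file, you would normally drop the last two lines and run `python -m unittest` from the project directory instead.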
How do I switch between Jupyter Notebook UI and VS Code?
This process would be very painful if switching between the .py file and the Jupyter notebook were slow.
I’ve found a combination of VS Code for .py files and the Jupyter Notebook UI in the browser very efficient. VS Code supports Jupyter notebooks too, but I haven’t been equally productive in it yet. In VS Code, I open the directory that contains the notebooks and other files (if on a remote machine, I use Remote Development). Here is a quick video of it (full screen for better quality).
If you’re using PyCharm, I’ve heard it supports something similar.
Wrapping it up
Writing production-ready code is not the first thing you think of when you’re experimenting and the code and requirements change a lot. But that does not mean you have to start from scratch after the experimentation phase ends.
Instead, you can use the best of both worlds and start slowly transitioning towards production-friendly code as you’re leaving the experimentation phase. The first step in doing that is extracting functions into .py files.