Dependency management
-
Why requirements.txt isn't enough
If you're using only requirements.txt to manage your dependencies, you're in trouble.
-
The minimal conda cheatsheet
The smallest conda cheatsheet you'll find around.
-
Overview of python dependency management tools
I briefly describe each tool (pip, venv, pip-tools, pyenv, conda, pipenv, poetry and Docker), why it was created and the problems it tackles. You'll also find a table summarizing all the information and the usual setups people use.
-
Importing packages in Jupyter notebooks
Seeing “ImportError: No module named tensorflow” even though you know you installed it? Can you sometimes import packages from the console, but not from the Jupyter notebook? Does !pip install tensorflow sometimes just not work?
-
The difference between conda and pip and how not to break your environment again?
One of the first things I noticed when I moved into a Machine Learning Engineer role was the package management mess. People often ask me: “Should I use conda or pip to install packages?” “Is conda just Python with preinstalled packages?” “I am getting some compiler errors. I thought we’re using just Python?!”
AWS SageMaker
-
SageMaker: install Jupyter extensions in restart-proof way
“Every time my notebook shuts down and restarts, I lose notebook extensions and have to reinstall them from the terminal”, my teammate said. Eventually, he gave up reinstalling them.
-
SageMaker: SSH to notebook instances
If you’re using SageMaker as a development machine, you’ll need SSH access to notebook instances sooner or later.
-
SageMaker: save your conda environments after the machine restarts
Are you creating conda environments and installing packages from scratch every time you start a SageMaker machine?
-
SageMaker: automatically stop your instances when idle
If your company is running on AWS, it’s likely that AWS SageMaker is a central piece of the infrastructure you use daily. It’s fantastic how easy it is to start an instance and get a lot of CPU and GPU resources for your experimentation.
Improve your coding
-
Start structuring your code like a software engineer
You've entered the data scientist role and nobody told you that you actually have to write code like a software engineer? It's classic. CS and data engineers complain that your code isn't written "the right way". Worst of all, you know they're onto something, but nobody can help you apart from saying you'll have to learn software engineering.
-
Transform exploratory Jupyter notebook into production friendly code: step one
Yes, you already know Jupyter notebooks enable bad code design. They often carry hidden state, and because you execute cells out of order, they are often not even runnable from scratch. And the engineers running production complain. They often throw your code away and rewrite everything from scratch because it’s not “production code.”
Other
-
Nuance is difficult to sell on the Internet. Including hybrid work
I’ve had an inkling for a while that it’s difficult to explain nuance and make it appealing on the internet at the same time. It may not be difficult to explain it, but it seems really difficult to sell it.
-
How to make startup scripts for Jupyter kernels reliable?
Running some code whenever your Jupyter notebook starts is handy and easy.
-
Comparison of language identification models
Up-to-date info on language identification libraries usable in production. Accuracy, language coverage, speed and memory consumption. Everything you need as an ML engineer to pick a library quickly.
-
GitHub Actions: using Python version from .python-version file (pyenv)
GitHub Actions: how to use pyenv's Python version, the one pinned in the .python-version file.
-
Batched backpropagation: connecting the code and the math
If you want to be an effective machine learning engineer, it’s a good idea to understand how frameworks like PyTorch and TensorFlow work. You don’t need to know all the details of building the framework from scratch, but you should be comfortable with building a simple neural network using low-level building blocks.