The difference between conda and pip, and how not to break your environment again
One of the first things I noticed when I started as a Machine Learning Engineer was the package management mess. People often ask me: “Should I use conda or pip to install packages?” “Is conda just python with preinstalled packages?” “I am getting some compiler errors. I thought we’re using just python?!”
And I did not know what to answer. Everybody mixes and matches the words pip, conda and install until something starts working! And it usually works, until it breaks your environment completely. You do not know why it broke, so you remove everything and install all packages from scratch. I was there. I can guarantee it does not have to be like that!
After you read what pip and conda do and how they work, you won’t be breaking your environment again.
pip
pip (a recursive acronym for “pip Installs Packages”) is a Python package installer. It downloads and installs the packages you want to use. Conda does that as well. The difference is that pip does just that and nothing else!
Pip is very simple:
- it does not support multiple versions of the same package installed side by side - that’s why you can’t have two projects using different versions of the same package in one Python installation!
- for packages that ship no prebuilt binary for your platform, it builds them with your compiler, which could be missing or incompatible - this is where your compiler errors are coming from
- it does not manage python versions - python 2, python 3? Python 3.5 or 3.6? It just installs into whichever python it runs under and can’t help you switch.
conda
conda does far more than pip! It’s made for people like you, like me, people doing machine learning and data science.
- It allows you to use different versions of the same package.
- It gives you the power to use any python version you want!
- Its packages are compiled before they are published to the package repository, so you don’t get compiler errors.
(It actually does far more, but we’ll stop here 🤓)
Conda can do that because it is not just a package manager, but also an environment manager. It isolates different python and package versions so they do not interact with each other.
Use one conda environment per project to stay out of trouble!
If you don’t have it installed, go install Anaconda!
The system that helps me stay out of trouble is to create one conda environment per project. Whenever you start working on a new project, just run conda create -n <project_name>.
If you have a specific version of python you want to use, run conda create -n <project_name> python=3.5 (for version 3.5).
When you ask for a python version, conda installs it along with some basic packages, then prints instructions for activating the created environment. This is important to remember. You have to activate the environment explicitly every time you want to use it in a new console session. But it’s easy, just type conda activate <project_name>.
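The create step above can be wrapped in a small shell helper. This is only a sketch: the function name new_project_env and the fallback to version 3.5 are my own placeholders, not part of conda, and it assumes conda is already installed and set up for your shell.

```shell
# Hypothetical helper (not part of conda): one environment per project.
# Usage: new_project_env <project_name> [python_version]
new_project_env() {
    if [ -z "$1" ]; then
        echo "usage: new_project_env <project_name> [python_version]" >&2
        return 1
    fi
    # Fall back to 3.5 only because that is the version used in this post.
    # --yes skips the confirmation prompt.
    conda create --yes --name "$1" "python=${2:-3.5}"
}
```

After something like new_project_env my_project 3.6 finishes, you still have to run conda activate my_project yourself in every new console session.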
Installing packages
Once you activate the conda environment, you can install any package you want - just type conda install <package_name>, the same as you would with pip.
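If you would rather not depend on which environment happens to be active, conda install also accepts the target environment by name. A minimal sketch, where install_into is a hypothetical helper name of mine rather than anything conda provides:

```shell
# Hypothetical helper (not part of conda): install packages into a named
# environment without activating it first.
# Usage: install_into <project_name> <package_name>...
install_into() {
    if [ "$#" -lt 2 ]; then
        echo "usage: install_into <project_name> <package_name>..." >&2
        return 1
    fi
    local env_name="$1"
    shift
    # --name selects the target environment; the remaining arguments
    # are passed through as package names.
    conda install --yes --name "$env_name" "$@"
}
```

For example, install_into my_project numpy would install numpy into my_project even if another environment is active in the current console.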
But I often have these once-off notebooks that don’t fit into any project! 🔧
No problem, just create a onceoff environment! (hint: conda create -n onceoff). It’s possible you will break this one, but you can remove it with conda env remove -n onceoff. It is much faster than uninstalling all pip packages or all of conda 🍻.
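Because the once-off environment is disposable, resetting it can be a single step. The sketch below assumes the environment is called onceoff as above. reset_onceoff is a made-up helper name, and the fallback python version is just the one used earlier in this post.

```shell
# Hypothetical helper (not part of conda): throw away the scratch
# environment and create a fresh one.
# Usage: reset_onceoff [python_version]
# Run "conda deactivate" first if onceoff is the active environment,
# because conda refuses to remove the environment you are currently in.
reset_onceoff() {
    # Remove the old environment; ignore the error if it does not exist yet.
    conda env remove --yes --name onceoff || true
    # Recreate it empty apart from python and its basic packages.
    conda create --yes --name onceoff "python=${1:-3.5}"
}
```

Anything you installed into onceoff is gone after a reset, which is exactly the point of keeping it separate from your project environments.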
I’ve forgotten the environment name 🤦🏼‍♂️
conda env list will list all existing environments.
This setup saved me hours of debugging issues, figuring out it’s the wrong package version and then reinstalling everything from scratch. I’m interested to hear how it works for you! Or if it does not, what does? 🗣
More about the same topic
- Why requirements.txt isn't enough
- The minimal conda cheatsheet
- Overview of python dependency management tools
- Importing packages in Jupyter notebooks