A Python 2 based Machine Learning Virtual Environment
Régis KLA, April 2017
1. Introduction
This article introduces a very simple and quick way to create a special [1] virtual environment for Python for your Machine Learning (ML) Proof-Of-Concept (PoC).
The method presented here is indirect and may seem odd to the reader; however, it is safe and it works even on older (but very stable) Linux distributions like Debian, where the available packages are often outdated and create conflicts that can be very difficult to solve.
One good example of such incompatibilities is the conflict between the version of the virtualenv package and the numpy packages of a given Debian Release. To avoid this, the following layered architecture is proposed:
- layer 0: on top of the OS, create a virtual environment implemented with conda [2] (from the Anaconda 2 package [Tea16]), and
- layer 1: on top of the conda venv, install the (special) packages. In the context of this paper, they are the ones recommended for doing ML with Python 2.7: matplotlib, scipy, numpy, scikit-learn, keras.
The advanced reader may now wonder: what is odd about this architecture? Nothing, at first sight. You are right; but wait a minute and discover how layer 1 is implemented in practice...
2. Layer 0: Create the Conda Virtual Environment
Consider the following target settings:
- target directory: ${HOME}/workspace/python_venvs/ml-p2.7-conda-venv
- Python interpreter: 2.7
The whole virtual environment is expected to be contained (i.e. boxed) within the aforementioned directory, and the Python 2.7 interpreter should be available once the environment is activated.
2.1. Prepare the system
Some OS-level packages are required by Anaconda 2. You can simply install them with the package manager of your distribution. For this paper, I use a Debian 8 (Jessie) distribution.
$ sudo apt-get install build-essential python-dev
In the previous listing, the packages build-essential and python-dev are installed on the system. Please adapt the commands to your own system.
We also need to install Anaconda 2 from sources (it is safer) and use its built-in Python interpreter as the default one. Follow the instructions provided at [Tea16].
Now, let us assume the installation of Anaconda 2 is done and the install directory is: /opt/anaconda2; then one should have:
root@myhost:~# ls -l /opt/anaconda2/
total 100
drwxr-xr-x   2 root root 12288 avril 11  2016 bin
drwxr-xr-x   2 root root 12288 avril 11  2016 conda-meta
drwxr-xr-x   2 root root  4096 avril 11  2016 envs
drwxr-xr-x   3 root root  4096 avril 11  2016 etc
drwxr-xr-x   3 root root  4096 avril 11  2016 Examples
drwxr-xr-x   4 root root  4096 avril 11  2016 imports
drwxr-xr-x  38 root root  4096 avril 11  2016 include
drwxr-xr-x  10 root root 12288 avril 11  2016 lib
-rw-rw-r--   1 root root  4524 févr.  4  2016 LICENSE.txt
drwxr-xr-x 111 root root  4096 avril 11  2016 mkspecs
drwxr-xr-x 178 root root 12288 avril 11  2016 pkgs
drwxr-xr-x  13 root root  4096 avril 11  2016 plugins
drwxr-xr-x  12 root root  4096 avril 11  2016 share
drwxr-xr-x   3 root root  4096 avril 11  2016 ssl
drwxr-xr-x   3 root root  4096 avril 11  2016 tests
drwxr-xr-x   3 root root  4096 juin  30  2016 var
When done properly, your /etc/profile should resemble this:
...
export ANACONDA2_HOME="/opt/anaconda2"
export PATH="$ANACONDA2_HOME/bin:$PATH"
...
Please notice how the $ANACONDA2_HOME/bin value comes before the $PATH one. This guarantees that the binaries of Anaconda 2 appear first in the command search path, so they take precedence over the system ones.
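The left-to-right lookup can be illustrated with a minimal Python sketch. The helper name first_on_path is hypothetical (it is not part of Anaconda or of the shell); it only mimics how a shell walks the PATH entries in order and stops at the first executable match.

```python
import os


def first_on_path(command, path_value):
    """Return the first executable named `command` found in `path_value`.

    Directories are searched left to right, mimicking the shell's lookup
    order, so an earlier entry shadows a later one.
    """
    for directory in path_value.split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Shows which `python` the current PATH would select (None if absent).
    print(first_on_path("python", os.environ.get("PATH", "")))
```

With the /etc/profile above, one would expect the result to point into /opt/anaconda2/bin rather than into /usr/bin.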
Test the installation as follows:
$ source /etc/profile
$ . ~/.bashrc
$ python --version
Python 2.7.11 :: Anaconda 4.0.0 (64-bit)
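If this check has to be automated (for example in a provisioning script), the version banner can be parsed. The following is only a sketch: the function name parse_version_banner is hypothetical, and it assumes the banner has the "Python X.Y.Z ..." shape shown in the listing.

```python
import re


def parse_version_banner(banner):
    """Split a `python --version` banner into (major, minor, is_anaconda)."""
    match = re.match(r"Python (\d+)\.(\d+)", banner.strip())
    if match is None:
        raise ValueError("unrecognized banner: %r" % banner)
    major, minor = int(match.group(1)), int(match.group(2))
    # Assumption: an Anaconda build mentions "Anaconda" in its banner.
    return major, minor, "Anaconda" in banner


# Banner quoted from the listing in this section.
banner = "Python 2.7.11 :: Anaconda 4.0.0 (64-bit)"
assert parse_version_banner(banner) == (2, 7, True)
```

Note that Python 2.7 writes this banner to the standard error stream, so a script capturing it has to read stderr as well as stdout.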
2.2. Create the Virtual Environment
The following packages have to be installed:
$ sudo apt-get install virtualenv virtualenvwrapper
Now one can (really) create the conda virtual environment which can be considered as an enhanced version of the traditional virtualenv:
$ conda create --prefix=${HOME}/workspace/python_venvs/ml-p2.7-conda-venv matplotlib scipy numpy
... (may take some time)
The previous command creates the conda virtual environment and then automatically installs the three specified packages: matplotlib, scipy, and numpy. Notice that the Python 2.7 interpreter is set up as the default interpreter of the environment because it is the one used by Anaconda 2. Thus, nothing special needs to be done for this. The reader can consult the help page of conda (conda create --help) to discover other options.
Unfortunately, not all packages can be installed on the fly like this. Some are more difficult to handle and have to be installed manually inside the activated environment. First, activate it:
$ source activate ${HOME}/workspace/python_venvs/ml-p2.7-conda-venv
... (use source deactivate to deactivate the env)
(/home/user/workspace/python_venvs/ml-p2.7-conda-venv) $
... (the prompt is modified: the env is activated)
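Besides looking at the prompt, the activation can be confirmed from within Python. This is a minimal sketch with a hypothetical helper name (is_env_active); it relies on sys.prefix, which points to the root directory of the running interpreter's installation.

```python
import os
import sys


def is_env_active(expected_prefix, actual_prefix=None):
    """Return True if the running interpreter lives under `expected_prefix`."""
    if actual_prefix is None:
        actual_prefix = sys.prefix

    def normalize(path):
        # Expand "~" and resolve symlinks and trailing slashes before comparing.
        return os.path.realpath(os.path.expanduser(path))

    return normalize(expected_prefix) == normalize(actual_prefix)


if __name__ == "__main__":
    # Target directory chosen in this article; adapt it to your own setup.
    print(is_env_active("~/workspace/python_venvs/ml-p2.7-conda-venv"))
```

Run with the python of the activated environment, the script should print True; run with any other interpreter, it should print False.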
Tip: When the environment is activated, the prompt is prefixed with the path of the virtual environment. This path can be long and thus distracting. One can use the following (completely optional) hack to make things prettier. Edit the file /opt/anaconda2/bin/activate and make the following modifications:
...
# Add a new variable MY_PROMPT_PREFIX holding the last component of the env path
MY_PROMPT_PREFIX="$(basename "$1")"
...
if (( $("$_THIS_DIR/conda" ..changeps1) )); then
    CONDA_OLD_PS1="$PS1"
    # PS1="($CONDA_DEFAULT_ENV)$PS1"
    PS1="($MY_PROMPT_PREFIX)$PS1"
...
In the new version of the activate script, the full environment path that used to prefix $PS1 is replaced by its basename, which is shorter. After source deactivate followed by source activate ... again, the prompt is displayed in its short version.
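To make the effect of basename concrete, here is the same shortening expressed in Python. It is only an illustration of the idea (the function name short_prompt_prefix is hypothetical); the actual hack lives in the shell script above.

```python
import os


def short_prompt_prefix(env_path):
    """Return the last path component, ignoring any trailing separator."""
    return os.path.basename(env_path.rstrip(os.sep))


env = "/home/user/workspace/python_venvs/ml-p2.7-conda-venv"
print("(%s)" % short_prompt_prefix(env))  # prints "(ml-p2.7-conda-venv)"
```

The trailing separator is stripped first because, unlike the shell's basename command, os.path.basename returns an empty string for a path that ends with a slash.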
3. Layer 1: Install Required Packages in the Virtual Environment
Now the virtual environment (venv) is up and running. However, many important packages for doing ML with Python 2 are still missing. We will now install them manually in the activated venv.
(ml-p2.7-conda-venv):...$ pip install pillow h5py scikit-learn
When done, we are ready to install TensorFlow and Theano, which are the main dependencies of Keras (the Neural Network Python library). In this setup, the installation of TensorFlow using pip fails; thus we use conda install to get TensorFlow from conda-forge as follows:
(ml-p2.7-conda-venv):...$ conda install -c conda-forge tensorflow
When done we go back to a more “natural” way of doing things. We install Theano now using pip as follows:
(ml-p2.7-conda-venv):...$ pip install --upgrade --no-deps git+https://github.com/Theano/Theano.git
When done, Keras can be installed:
(ml-p2.7-conda-venv):...$ pip install keras
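Once all the packages are installed, a quick sanity check is to try importing each of them. The sketch below uses a hypothetical helper (missing_modules) built on the standard importlib module; note that some import names differ from the pip names (pillow is imported as PIL, scikit-learn as sklearn).

```python
import importlib


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names (not pip names) of the packages installed in layers 0 and 1.
    required = ["matplotlib", "scipy", "numpy", "PIL", "h5py",
                "sklearn", "tensorflow", "theano", "keras"]
    print(missing_modules(required))
```

An empty list means every package can be imported by the interpreter of the activated venv.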
4. Test the Virtual Environment
Now we deactivate the venv and run a final end-to-end test:
(ml-p2.7-conda-venv):...$ source deactivate
Activate and test a Python script that uses Keras:
~$ source activate ${HOME}/workspace/python_venvs/ml-p2.7-conda-venv
discarding /opt/anaconda2/bin from PATH
prepending ${HOME}/workspace/python_venvs/ml-p2.7-conda-venv/bin to PATH
(ml-p2.7-conda-venv):...$ python
Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import keras
Using TensorFlow backend.
>>>
The import of Keras as a module is successful, so it is correctly installed and configured.
5. Are You GPU Ready?
Keras means Neural Networks (NN) and thus a huge amount of computation. Having a GPU on the graphics card can therefore be highly useful to reduce the computation time.
If your graphics card is equipped with such a chipset, Keras, through TensorFlow, will automatically delegate some computation tasks to it.
One can use the following steps to check whether the machine is GPU-equipped and ready:
- Modify/create the file ${HOME}/.theanorc:
[global]
floatX = float32
device = gpu0

[nvcc]
fastmath = True

...
(ml-p2.7-conda-venv):...$ python
...
>>> import keras
Using TensorFlow backend.
Using GPU ...
>>>
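The content of ${HOME}/.theanorc can also be read back programmatically. The following is a minimal sketch with a hypothetical helper name (theano_device); it only reports which device the file requests for the Theano backend and does not prove that a GPU is actually present or used.

```python
import os

try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2


def theano_device(rc_path="~/.theanorc"):
    """Return the `device` value of the [global] section, or None if unset."""
    parser = configparser.RawConfigParser()
    # read() silently skips files that do not exist.
    parser.read(os.path.expanduser(rc_path))
    if parser.has_option("global", "device"):
        return parser.get("global", "device")
    return None


if __name__ == "__main__":
    print(theano_device())
```

With the file shown above, the function returns "gpu0"; without any .theanorc, it returns None.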
Footnotes
| [1] | The term special here denotes an environment with special libraries installed. |
| [2] | Here, it is important to use Conda instead of virtualenv. Indeed, on my Jessie flavor I noticed several conflicts between virtualenv, numpy, and pip. |
Bibliography
| [Tea16] | Continuum Analytics - Anaconda Team. Download and install Anaconda for Linux. 2016. https://www.continuum.io/downloads#_unix |