Available Python Installations
Python is an interpreted, high-level, general-purpose programming language. On our clusters, several Python installations and versions are available via modules. To check which versions are available, use the following command (the $ is the prompt; do not type it):
$ module available python
You then need to load the appropriate module for the version you want to use, as will be described below. Amongst the options we offer, you will find both branches 2.x and 3.x of python. However, the 2.x branch of the Python programming language is no longer supported by its creators, the Python Software Foundation. Therefore, it is strongly recommended that you use Python 3, unless you have a specific need for version 2. For general information about using modules, please see our page on Lmod.
Anaconda python distribution
For most people and most purposes, the Anaconda Python Distribution is preferred. The most recent Anaconda distribution has over 250 of the most widely used data science packages (and their dependencies) pre-installed. Earlier versions of Anaconda python have an extensive collection of pre-installed packages as well. With this in mind, you may prefer to use one of the Anaconda modules since you will have to install fewer python packages. To load one of the anaconda modules, enter the following command specifying the module name and, optionally, the specific version that you want:
$ module load python3.8-anaconda/2020.07
End of support for version 3.6 is scheduled for December 2021. We recommend using either version 3.9 or 3.8 unless you are constrained to an older version.
We also install the bare python language for those people who wish to develop python software and install libraries by themselves as needed. To load a module for one of the developer versions of python, use the following command:
$ module load python/3.9.1
substituting the version number you wish to use. If the version number is omitted, Lmod will load the default module which is currently set to version 3.9.1.
On our clusters, system installed versions of both Python 2 and Python 3 are available as either python2 (version 2.7.5) or python3 (version 3.6.8) without having to load a module. However, we recommend using one of the python or anaconda modules instead, since they provide more current releases, as well as the benefit of pre-installed packages. You may find the system python installations handy for quickly testing something, but they should not be a part of your general workflow.
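Because a system Python and several module-provided Pythons can coexist, it can help to confirm which interpreter is actually running. The following is a minimal sketch, not part of the cluster's tooling; the function name describe_interpreter is purely illustrative.

```python
import sys


def describe_interpreter():
    """Return the path and version of the interpreter running this code."""
    major, minor, micro = sys.version_info[:3]
    return {
        # Full path of the running interpreter (system install or module install)
        "executable": sys.executable,
        # Release number in major.minor.micro form
        "version": "{}.{}.{}".format(major, minor, micro),
        "is_python3": major >= 3,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print("{}: {}".format(key, value))
```

Running this once before and once after loading a module should show whether the module's interpreter has taken precedence over the system one.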
Running Python on the Cluster
To run Python at a prompt, simply type
To run a Python script from the command line, enter the name of the script as an argument to the python command. For example, to run the Python script my_analysis.py from the current directory, enter
$ python ./my_analysis.py
To run Python in a Jupyter Notebook using our web interface, start a session of Open OnDemand according to the instructions provided in this user guide. After choosing Jupyter Notebook from the Interactive Apps menu, you’ll need to specify the version of Anaconda Python you wish to use and provide the necessary job details.
Installing libraries and packages
Python packages are a set of python modules, while python libraries are a group of python functions aimed to carry out special tasks. There are over 137,000 python libraries and over 235,000 python packages. These libraries and packages can ease a developer’s experience and avoid the need to re-invent the wheel, as the saying goes. Henceforth, the word package(s) will be used when referring to both packages and libraries. While there are distinctions within python, the words packages and libraries are frequently interchanged, and the process for installing them is generally the same.
There are several ways that Python packages can be installed and managed. ARC support staff can install packages that can be made available to anyone who loads the appropriate module, while individuals can install packages for their own use. There are two main routes for users to install packages. One is to install directly into the user’s personal python library using the pip command. In that case, all packages (for a given version of python) are stored in the same directory. However, this approach can result in conflicts with package version requirements.
Sometimes one application needs a particular version of a package but a different application needs another version. Since the requirements conflict, installing either version will leave one application unable to run. This situation can be resolved by using virtual environments. A virtual environment is a semi-isolated Python environment that allows packages to be installed for use by a particular application or for a particular project.
Common tools used for Python package installation and environment management include:
- pip – the Python package installer, can be used on its own or within a virtual environment
- venv – an environment manager within which pip is used to install packages
- virtualenv – an environment manager within which pip is used to install packages
- conda – an environment manager and a package installer; it does not rely on pip to install packages
You should pick one method and use it exclusively to avoid mixing types of installations.
Python packages are installed for the specific version of Python that is in use during installation. If you switch from a module for one version of Python to a module for another, and either the major or minor version changes, you will have to re-install any packages/libraries to make them available in the library of the new version of Python. You need to install Python packages once for each cluster on which you wish to use them and, separately, once for each version of Python that you use. Please note that Python packages should be installed using the command line from a login node, not from within Jupyter Notebook or the JupyterLab app.
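After switching to a module for a different Python version, one way to see which packages need to be re-installed is to ask the current interpreter whether it can find them. This is a hedged sketch using only the standard library; the function name missing_packages is illustrative. It expects top-level import names, which can differ from the names used with pip.

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level import names the current interpreter cannot find."""
    # find_spec returns None when a top-level module is not on the search path
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Replace this list with the import names your own project relies on.
    wanted = ["json", "numpy", "requests"]
    print("Not available in this Python:", missing_packages(wanted))
```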
A brief description of how to use each tool to install packages and manage virtual environments, along with a description of where you can expect to find installed packages, is provided below.
To install a Python package into your personal library using pip, enter the following command, replacing <package_name> with the actual package name:
$ pip install --user <package_name>
--user tag will, by default, place packages in
?.? indicates the versioning of the Python release. The library will then be available to you for this and future sessions.
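If you are unsure where packages installed with --user end up for the interpreter you have loaded, the standard site module can report it. This is a small sketch; the function name user_install_locations is illustrative, and the paths it returns depend on your account and the Python version in use.

```python
import site


def user_install_locations():
    """Report where 'pip install --user' places packages for this interpreter."""
    return {
        # Base directory for per-user installs
        "user_base": site.getuserbase(),
        # Version-specific directory that holds the installed packages
        "user_site": site.getusersitepackages(),
        # False or None means the user directory is not on the import path
        # (for example, inside a virtual environment)
        "enabled": site.ENABLE_USER_SITE,
    }


if __name__ == "__main__":
    for key, value in user_install_locations().items():
        print("{}: {}".format(key, value))
```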
You can install a specific version of a package by giving the package name followed by == and the version number:
$ pip install --user tensorflow==2.3.2
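To confirm which version of a package the current interpreter actually sees, you can query the installed metadata. The sketch below assumes Python 3.8 or later, where importlib.metadata is part of the standard library; the function name installed_version is illustrative.

```python
from importlib import metadata  # standard library in Python 3.8 and later


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "tensorflow" matches the pip example above; substitute your own package.
    print(installed_version("tensorflow"))
```

If the pinned install above succeeded, this call should report the pinned version; a result of None suggests the package was installed for a different Python version.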
venv is the standard Python tool for creating virtual environments, and has been part of Python since version 3.3. Starting with Python 3.4, it defaults to installing pip into all created virtual environments. Installing packages into an active venv is done via the pip command, as described above.
Virtual environments are created as follows:
$ python -m venv /path/to/new/virtual/environment
Alternatively, you can change into the directory of the project you are working on and simply provide a name for the virtual environment in place of the full path:
$ cd /path/to/my/project
$ python -m venv myenv
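The same environment can also be created from inside Python with the standard venv module, which may be convenient in setup scripts. This is a sketch rather than a recommended workflow; the function name create_environment and the myenv directory name are illustrative.

```python
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment, much as 'python -m venv <env_dir>' does."""
    # with_pip=True asks venv to bootstrap pip into the new environment
    venv.create(str(env_dir), with_pip=with_pip)
    # venv writes a pyvenv.cfg file at the top of every environment it creates
    return Path(env_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    config_file = create_environment(Path.cwd() / "myenv")
    print("Created environment; configuration at", config_file)
```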
To activate a virtual environment, type:
$ source myenv/bin/activate
If you are not in your project directory, then you must provide the full path to the virtual environment you specified when creating the virtual environment:
$ source /path/to/my/project/myenv/bin/activate
When you are in an active virtual environment, you will see the name of the environment in the prompt. The PATH environment variable is updated so that the virtual environment’s bin directory is at the beginning:
(myenv) $ which pip python
~/my_project/myenv/bin/pip
~/my_project/myenv/bin/python
At this point, you would use the pip command to install any needed packages. Packages that you install using pip while in a virtual environment will be placed in the myenv folder, isolated from the global Python installation, and only available to you from within the virtual environment.
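A script can check for itself whether it is running inside a venv-style virtual environment by comparing two interpreter prefixes. The sketch below relies on sys.base_prefix, which venv sets in Python 3.3 and later; the function name in_virtual_environment is illustrative.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running from a venv environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the Python installation it was made from.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
    print("Packages will be installed under:", sys.prefix)
```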
You can deactivate a virtual environment by typing deactivate in your terminal.
virtualenv is a third-party alternative (and predecessor) to venv that is used with Python 2. If you are using the python2.7-anaconda module on either the Great Lakes or Armis2 cluster and would like to work within a virtual environment, you will need to install virtualenv.
Install virtualenv with pip:
$ pip install --user virtualenv
Create a virtual environment for a project:
$ cd project_folder
$ virtualenv <env_name>
Similar to the way venv works with Python 3, virtualenv myenv will create a folder in the current directory which will contain the Python executable files and a copy of the pip library which you can use to install other packages. To begin using the virtual environment, it has to be activated:
$ source myenv/bin/activate
The name of the current virtual environment will now appear in parentheses to the left of the prompt to let you know that it’s active.
When done working in the virtual environment, simply deactivate it:
(myenv) $ deactivate
Conda is both a package installer, like pip, and an environment manager, like venv and virtualenv. While pip, venv, and virtualenv are for Python, conda is language agnostic and works with other languages as well as Python.
To create a virtual environment for Python with conda, enter the following:
$ conda create --name conda-env python
where conda-env can be replaced with whatever name you choose for your virtual environment. Also, -n can be used in place of --name.
This environment will use the same version of Python as your current shell’s Python interpreter. To specify a different version of Python, specify the version number when creating your virtual environment as follows:
$ conda create -n conda-env python=3.7
You can install additional packages when creating an environment, by specifying them after the environment name. You can also specify which versions of packages you’d like to install.
$ conda create -n conda-env python=3.7 numpy=1.16.1 requests=2.19.1
It’s recommended to install all packages that you want to include in an environment at the same time in order to avoid dependency conflicts.
You can then activate your conda environment as follows:
$ conda activate conda-env
(conda-env) $
As when using other virtual environment programs, the name of the current virtual environment will now appear in parentheses to the left of the prompt to let you know that it’s active. When you are finished working in the environment, simply enter
$ conda deactivate
and your normal prompt will return.
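From inside Python, the active conda environment can be identified through the CONDA_DEFAULT_ENV and CONDA_PREFIX environment variables that conda activate sets. The following is a minimal sketch; the function name active_conda_environment and its optional environ argument are illustrative, not part of conda.

```python
import os


def active_conda_environment(environ=None):
    """Return the active conda environment's name and prefix, or None."""
    env = os.environ if environ is None else environ
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        # No conda environment is active in this shell
        return None
    return {"name": env.get("CONDA_DEFAULT_ENV"), "prefix": prefix}


if __name__ == "__main__":
    print(active_conda_environment())
```

Passing a plain dictionary as environ makes the function easy to try without activating anything.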
Virtual environments created with conda reside, by default, in the envs directory found in the following path: