The Euler HPC cluster is an absolutely amazing resource, for starters it’s free to use for all ETH members that have a nethz account with zero paperwork. I use it frequently for training of bayesian predictive models using MCMC sampling built with PyMC3. Here are a few tricks I learned for setting up and executing python batch jobs on the LSF system.
Connecting to Euler
Access to the HPC clusters is only possible via secure protocols (ssh, sftp, scp, rsync) and must be accessed from ETH network or via VPN connection. ETH Recommends using the Cisco AnyConnect Client which they have a quick-install script for (vpnsetup.sh found at https://sslvpn.ethz.ch). After installing Cisco AnyConnect Client:
# connect /opt/cisco/anyconnect/bin/vpn connect sslvpn.ethz.ch # disconnect /opt/cisco/anyconnect/bin/vpn disconnect
See ETH HPC docs here
After accepting the user agreement you can set up ssh-keys
ETH HPC docs for python can be found here: https://scicomp.ethz.ch/wiki/Python
Set the global python interpreter to 3.6.1. This is important for which version we use to configure our environment.
module load new gcc/4.8.2 python/3.6.1
The #! line in scripts should point to the module selected previously, not OS interpreter.
The best method I found to manage third party libraries is miniconda: https://conda.io/miniconda.html
I installed miniconda in my personal directory and manage environments from there while cloning code into my $SCRATCH diretory for processing.
# environment setup source $HOME/miniconda3/bin/activates conda config --add channels conda-forge conda env create -f environment.yml # start the environment source activate pymc3 # to export environment (not needed to run) # run this on local before and distribute .yml file to remote conda env export > environment.yml
venv (alternative to miniconda)
Alternative (more difficult) to miniconda is venv environment:
python3.6 -m venv env source env/bin/activate python3.6 -m pip install -r requirements.txt
Note: installing pygpu via pip does not appear to work. It can be removed from requirements.txt and not installed without error.
Packages are managed with requirements.txt and the following pip commands:
# install SomePackage /env/bin/pip install SomePackage==1.0.4 # generate requirements.txt from installed packages in venv /env/bin/pip freeze > requirements.txt # install all required packages /env/bin/pip install -r requirements.txt
Running Python programs as LSF batch jobs
Info about the ETH LSF batch system here
Python scripts can be run with:
bsub [LSF options] "python my_python_script.py"
where [LSF options] can be:
- -n number_of_processors
- -I (interactive)
- -o output_file
- -R “rusage[mem=XXX]”
or use their web tool to generate LSF options
bsub -n 1 -W 4:00 -oo output.txt 'python training.py --njobs 4 --draws 10000'
Large groups of batch jobs can be run from bash scripts generated during preprocessing. These are to be run also with bsub, not in the bash shell directly.
bsub < script.sh
output.txt gives the stdout and the LSF end of run information. Desired output can be copied to local using scp.
scp -r email@example.com:output ./