class: center, middle, inverse, title-slide

# A Practical Introduction to OSC

### Jelmer Poelstra
### MCIC Wooster
### 2020/12/16 (updated: 2020-12-14)

---
class: inverse middle center

# Overview

.pull-left[
### [Introduction](#introduction)
### [Connecting to OSC](#connect)
### [Transferring Files](#transfer)
### [Software at OSC](#software)
]

---
class: inverse middle center
name: overview

# Overview

.pull-left[
### [Introduction](#introduction)
### [Connecting to OSC](#connect)
### [Transferring Files](#transfer)
### [Software at OSC](#software)
]

.pull-right[
### [Starting a Compute Job](#jobs)
### [Compute Job Options](#job_options)
### [Bonus Material](#bonus)
]

---
class: inverse middle center
name: introduction

# Introduction

-----

<br><br><br><br>

### [[Back to overview]](#overview)

---

## What is the OSC?

- The **Ohio Supercomputer Center (OSC)** provides computing resources across the state of Ohio (it is not a part of OSU).

- OSC has two supercomputers, along with the infrastructure needed to use them.

--

<br>

- Research usage is charged, but through institutions and at [heavily subsidized rates](https://www.osc.edu/content/academic_fee_model_faq).

---

## Supercomputer?

- A highly interconnected set of many processors and storage units.

- Also known as:
  - A **cluster**
  - A High-Performance Computing cluster (**HPC cluster**)

<br>

--

### When do we need a supercomputer?

- Our dataset is too large to be handled (efficiently) by our computer.

- Long-running analyses can be sped up by using more computing power.

- We need to repeat a computation many times (or have many datasets).

---

## Learning about OSC

- OSC regularly holds online introductory sessions, ranging from general overviews to hands-on workshops -- see the [OSC Events page](https://www.osc.edu/events).

--

- OSC provides excellent introductory material on its [Getting Started page](https://www.osc.edu/resources/getting_started). **Do read these!**

<p align="center">
<img src=figs_OSC/gettingstarted.png width="560">
</p>

---

## Learning about OSC (cont.)

- Also have a look at the ["HOWTO" pages](https://www.osc.edu/resources/getting_started/howto), which are very specific and include more advanced material.

<br>

<p align="center">
<img src=figs_OSC/HOWTOs.png width="500">
</p>

---

## Terminology: cluster and node

#### Cluster

- Many processors and storage units tied together.

- OSC currently has two clusters: **_Pitzer_** and **_Owens_**. (Largely separate, but both access the same long-term storage space.)

<br>

--

#### Node

- Essentially a (large/powerful) computer -- one of many in a cluster.

- **Login node**: A node you end up on after logging in, which is not meant for computing. (A few per cluster.)

- **Compute node**: A node for compute jobs. (Hundreds per cluster.)

---

## Terminology: core

#### Core

- One of the (often many) *processors* in a node -- e.g. standard *Pitzer* nodes have 40-48 cores.

--

- Each core can be reserved and used independently.

- But using multiple cores for a single job is also common, to:
  - Parallelize computations across cores.
  - Access more memory.

---

## Terminology: cluster > node > core

<br>

<p align="center">
<img src=figs_OSC/cluster-node-core.png width="100%">
</p>

---
class: inverse middle center
name: connect

# Connecting to OSC

-----

<br><br><br><br>

### [[Back to overview]](#overview)

---

## Connecting to OSC with "OnDemand"

- You can make use of OSC not only through `ssh` at the command line, but also through your web browser: go to <https://ondemand.osc.edu>.

<p align="center">
<img src=figs_OSC/ondemand1.png width="900">
</p>

--

- In OnDemand, you can e.g.:
  - Browse and upload files
  - Submit and manage jobs visually
  - Access a terminal
  - Run RStudio Server

--

- **Interface Demo**

--

<br>

- See the bonus slides starting [here](#shell) for logging in from your local terminal.

---

## Login nodes: best practices

- After logging in to OSC, you're on a **login node**.

- Use login nodes for *navigation*, *housekeeping*, and *job submission*.

--

- Any process that is active for >20 minutes or uses >1 GB of memory **will be killed**.

- It is good practice to avoid even getting close to these limits: login nodes are *shared by everyone* and can get clogged.

<br>

.content-box-purple[
For more info, see OSC's page [Login environment at OSC](https://www.osc.edu/supercomputing/login-environment-at-osc).
]

---
class: inverse middle center
name: transfer

# Transferring Files with GUIs

-----

### For shell options, see the [Bonus Material](#shell_transfer).

<br><br><br><br>

### [[Back to overview]](#overview)

---

## Transferring files: OnDemand and Globus

#### For small transfers (<1 GB): use OnDemand

<p align="center">
<img src=figs_OSC/ondemand2_circle.png width="900">
</p>

--

----

<p align="center">
<img src=figs_OSC/ondemand3_circle.png width="900">
</p>

--

-----

<br>

#### For large transfers (>1 GB): use Globus

- Especially useful for very large and/or complex transfers.

- Does need a [local installation](https://www.osc.edu/resources/getting_started/howto/howto_transfer_files_using_globus_connect) and some set-up.

---
class: inverse middle center
name: software

# Software at OSC

-----

<br><br><br><br>

### [[Back to overview]](#overview)

---

## Available software at OSC

- For a list of software that has been installed at OSC, and is available to all users, see <https://www.osc.edu/resources/available_software>.

--

- Loading software is done with `module` commands:

```bash
$ module load R               # Load the default version
$ module load R/4.0.2-gnu9.1  # Load a specific version
$ module unload R
```
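- To check which versions of a program are installed before loading it, you can query the module system (standard `module` subcommands; the versions listed will differ by cluster):

```bash
$ module spider R  # Search for a module and list its available versions
$ module avail     # List modules that can be loaded right now
$ module list      # Show the modules you currently have loaded
```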
---
class: inverse middle center
name: jobs

# Starting a Compute Job
## (i.e., Accessing Compute Nodes)

-----

<br><br><br>

### [[Back to overview]](#overview)

---

## Starting a compute job: background

- To do analyses at OSC, you need access to a **compute node**.

- **Automated scheduling software** allows hundreds of people with different requirements to use the cluster efficiently and fairly.

--

- Since Dec 15, 2020, both Pitzer and Owens use the **_SLURM_ scheduler** (see [here](https://www.osc.edu/supercomputing/knowledge-base/slurm_migration) for OSC's SLURM info).

---

## Starting a compute job: how?

- In OnDemand, start an "**Interactive App**" like RStudio Server.

- Provide *SLURM* directives in or along with **scripts**.

- Run an **interactive shell job**.

--

<br>

.content-box-purple[
With all of these options, you can provide instructions to the *SLURM* scheduler, e.g. to request a specific number of nodes/cores, an amount of memory, or an amount of time.
]

---

## Starting a compute job: OnDemand Apps

<p align="center">
<img src=figs_OSC/apps.png width="570">
</p>

---
name: rstudio_server_job

## Starting a compute job: OnDemand Apps (cont.)

<p align="center">
<img src=figs_OSC/submit-RStudio.png width="400">
</p>

<p align="center">
<img src=figs_OSC/submit-RStudio2.png width="400">
</p>

---

## Starting a compute job: Shell script

- To submit a job:

```sh
$ sbatch [sbatch-options] script.sh [script-arguments]

$ sbatch hello.sh Jelmer
# Submitted batch job 2526085
```

--

<br>

- The `sbatch` instructions can also be provided at the top of your script:

```sh
#SBATCH --account=PAS0471
#SBATCH --time=00:45:00
#SBATCH --nodes=1-1
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
```
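- For reference, a minimal sketch of what a script like `hello.sh` might contain (the `#SBATCH` values and the greeting are just examples):

```sh
#!/bin/bash
#SBATCH --account=PAS0471
#SBATCH --time=00:05:00

# The first script argument (e.g. "Jelmer") is available as $1:
echo "Hello, $1, this job ran on node $(hostname)"
```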
---

## Starting a compute job: Interactive job

- To start an [interactive job](https://www.osc.edu/supercomputing/knowledge-base/slurm_migration/how_to_monitor_and_manage_jobs):

```sh
$ sinteractive         # Default: 30 minutes, 1 core, 1 node
$ sinteractive -t 120  # Two hours
```

- This will get you interactive shell access on a compute node.

---
class: inverse middle center
name: job_options

# Compute Job Options

-----

<br><br><br><br>

### [[Back to overview]](#overview)

---

## Compute job options: Nodes and cores

- Only ask for more than one **node** when you have explicit parallelization (*you probably don't*).

- By default, R will not even use multiple **cores**, though some of the packages we'll work with *can* do this.

<br> <br> <br> <br> <br> <br> <br> <br>

```sh
#SBATCH --nodes=1-1          # 1 node min. and max.

# Two ways to ask for two cores:
#SBATCH --cpus-per-task=2    # Preferred for multi-threading
#SBATCH --ntasks-per-node=2  # Preferred for multi-process
```
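- As a sketch of how reserved cores are typically put to use in a job script -- `some_tool` and its `--threads` flag are made-up placeholders; `$SLURM_CPUS_PER_TASK` is set by *SLURM*:

```sh
#SBATCH --cpus-per-task=2

# Point the program at the cores you reserved; the flag name varies
# by program ("some_tool" and "--threads" are hypothetical):
some_tool --threads "$SLURM_CPUS_PER_TASK" input.txt
```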
---

## Compute job options: Time

- You need to specify a **time limit** for your job (= "wall time").

- Your job **gets killed** as soon as it hits the time limit!

--

<br>

- **Ask for (much) more time than you think you will need.**
  - Your project will be charged for the time *actually used*.
  - Slight trade-off: shorter jobs are likely to start sooner.

<br> <br>

```sh
# Acceptable time formats include "minutes", "minutes:seconds",
# "hours:minutes:seconds", "days-hours", "days-hours:minutes"
# and "days-hours:minutes:seconds".

#SBATCH --time=60        # 1 hour
#SBATCH --time=03:30:00  # 3.5 hours
#SBATCH --time=5-0       # 5 days
```

---

## Compute job options: Memory

- Specify a maximum amount of RAM (Random Access Memory) -- not to be confused with disk storage.

- Your job **gets killed** when it hits the memory limit!

- Still, the default amount of memory will often be enough. R reads all data into memory, so larger datasets need more memory.

```sh
#SBATCH --mem=20G  # Default unit is MB; use "G" for GB
```

--

## Compute job options: Project

- Your job always needs to be billed to a project. (If you have only one project, there is no need to specify it.)

```sh
#SBATCH --account=PAS0471
```

---
class: center middle inverse

# Questions?

-----

---
class: inverse center middle
name: bonus

# Bonus Material

-----

### I: [OSC File Systems](#filesystems)
### II: [OSC Access & File Transfer In Your Shell](#shell)
### III: [Monitoring *SLURM* Jobs](#monitor)
### IV: [More on Software at OSC](#conda)

<br><br>

### [[Back to overview]](#overview)

---
class: inverse center middle
name: filesystems

# Bonus Material I:
# OSC File Systems

<br><br><br><br>

### [[Back to overview]](#overview)

---
background-color: #ededed

## OSC file systems

- OSC has several file systems with different typical use cases; we'll discuss them in terms of short-term versus long-term storage.

---
background-color: #ededed

## File systems: long-term storage locations

### Home dir

- Can always be accessed with the shortcuts **`$HOME`** and **`~`**.

- 500 GB capacity, daily back-ups.

- Your home dir path will include your first project ID, e.g. `/users/PAS0471/jelmer`. (However, this is not the project dir!)

--

### Project dir

- `/fs/project/<projectID>`. (MCIC project: `/fs/project/PAS0471`.)

- The total storage limit is whatever the PI sets it to. Charges are for the reserved space, not actual usage.

- Daily back-ups.

---
background-color: #ededed

## File systems: short-term storage locations

### scratch

- `/fs/scratch/<projectID>`, sometimes `/fs/ess/scratch/<projectID>`.

- Fast I/O; good to use for large input and output files.

- Temporary: files are deleted after 120 days, and not backed up.

--

### `$TMPDIR`

- Represents storage space *on your compute node*. 1 TB max.

- Available in a job script through the environment variable `$TMPDIR`.

- **Deleted when the job ends** -- so copy to/from it in your job scripts!

- See [this bonus slide](#tmpdir) for some example code for working with `$TMPDIR`.

---
name: tmpdir
background-color: #ededed

## Working with `$TMPDIR` in scripts

#### Copy input to $TMPDIR, copy output to home:

```bash
cp -R $HOME/my/data/ $TMPDIR/

# [...analyze data with I/O happening in $TMPDIR...]

cp -R $TMPDIR/* $HOME/my/data/
```

<br>

#### This trick will copy the data even if the job is killed:

```bash
trap "cd $PBS_O_WORKDIR; mkdir $PBS_JOBID; cp -R $TMPDIR/* $PBS_JOBID; exit" TERM
```
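#### The trap above uses PBS-style variables; a sketch of the same trick with *SLURM*'s native `$SLURM_SUBMIT_DIR` and `$SLURM_JOB_ID`:

```bash
# Same idea, using the environment variables SLURM sets in batch jobs:
trap "cd $SLURM_SUBMIT_DIR; mkdir $SLURM_JOB_ID; cp -R $TMPDIR/* $SLURM_JOB_ID; exit" TERM
```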
---
background-color: #ededed

## File systems: in closing

<br>

#### If this was a bit much:

If you work with "small" NGS datasets (like those from MiSeq runs), you may be able to **stick to your home dir** both for storage and computing.

<br>

#### See [here](https://www.osc.edu/supercomputing/storage-environment-at-osc/available-file-systems) for more info about the file systems.

---
class: inverse center middle
name: shell

# Bonus Material II:
# OSC Access & File Transfer In Your Shell

-----

<br><br><br><br>

### [[Back to overview]](#overview)

---
background-color: #ededed

## Connecting to OSC with `ssh`

- If you want to log in from your *own local terminal*, you need to **connect through `ssh`**.

- Basic `ssh` usage:

```sh
$ ssh <username>@pitzer.osc.edu  # e.g. jelmer@pitzer.osc.edu
$ ssh <username>@owens.osc.edu   # e.g. jelmer@owens.osc.edu
```

- You will be prompted for your OSC password.

---
background-color: #ededed

## `ssh` set-up: avoid being prompted for your password

1. On your own computer, generate a key:

```bash
$ ssh-keygen -t rsa
```

2. On your own computer, transfer the key to the remote computer:

```bash
$ cat ~/.ssh/id_rsa.pub | ssh <user>@owens.osc.edu 'mkdir -p .ssh && cat >> .ssh/authorized_keys'
```

3. On the remote computer (OSC), set the permissions:

```bash
$ chmod 700 .ssh; chmod 640 .ssh/authorized_keys
```

<br>

- For more details, see this [Tecmint post](https://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/).

---
background-color: #ededed

## `ssh` set-up: use shortcuts

1. Create a file called `~/.ssh/config`:

```bash
$ touch ~/.ssh/config
```

2. Open the file and add your alias(es):

```bash
Host <arbitrary-alias-name>
    HostName <remote-name>
    User <user-name>
```

--

<br>

- This is what it looks like on my machine, so I can log in using "`ssh jo`": <br/>

<p align="center">
<img src=figs_OSC/ssh.png width="400">
</p>

---
background-color: #ededed
name: shell_transfer

## Transferring files: using your *local* shell

#### For small transfers (<1 GB): use `scp` / `rsync`

```bash
# scp
$ scp /path/in/local/file.txt <user>@owens.osc.edu:/path/in/remote/
```

```bash
# rsync (recommended, especially to keep folders synced)
$ rsync -avrz --progress /path/in/local/dir \
    <user>@owens.osc.edu:/path/in/remote/
```
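- To preview what `rsync` *would* transfer without actually copying anything, add its standard dry-run flag:

```bash
# -n (--dry-run) lists the files that would be transferred:
$ rsync -avrzn --progress /path/in/local/dir \
    <user>@owens.osc.edu:/path/in/remote/
```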
-----

#### For any transfer size: use `sftp`

```bash
$ sftp sftp.osc.edu

sftp> put /path/in/local/file.txt /path/in/remote
sftp> put file.txt  # From local working dir to $HOME in remote
sftp> get /path/in/remote/file.txt /path/in/local/
```

---
name: monitor
class: inverse center middle

# Bonus Material III:
# Monitoring *SLURM* Jobs

-----

<br><br><br><br>

### [[Back to overview]](#overview)

---
background-color: #ededed

## Monitoring *SLURM* Jobs

- *Queued* and *running* jobs can be monitored using the `squeue` command. (Make sure to specify that you want to see *your* jobs, or *all* jobs on the entire cluster will be shown.)

- When a job is queued (waiting to start), the "`ST`" (status) column will read "`PD`":

```sh
$ squeue -u $USER
# JOBID   PARTITION  NAME      USER    ST  TIME  NODES  NODELIST(REASON)
# 2526085 serial-40  01-cutad  jelmer  PD  0:00  1      (None)
```

- When a job is running, the "`ST`" column will read "`R`":

```sh
$ squeue -u $USER
# JOBID   PARTITION  NAME      USER    ST  TIME  NODES  NODELIST(REASON)
# 2526085 serial-40  01-cutad  jelmer  R   0:02  1      p0002
```

- As soon as a job has finished running, it will disappear from this list.

---

- You can see more details about any job, including recently finished jobs, using `scontrol show job`:

```sh
$ scontrol show job 2526085  # Replace the number with your JOBID
# UserId=jelmer(33227) GroupId=PAS0471(3773) MCS_label=N/A
# Priority=200005206 Nice=0 Account=pas0471 QOS=pitzer-default
# JobState=RUNNING Reason=None Dependency=(null)
# Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
# RunTime=00:02:00 TimeLimit=01:00:00 TimeMin=N/A
# SubmitTime=2020-12-14T14:32:44 EligibleTime=2020-12-14T14:32:44
# AccrueTime=2020-12-14T14:32:44
# StartTime=2020-12-14T14:32:47 EndTime=2020-12-14T15:32:47 Deadline=N/A
# SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-14T14:32:47
# Partition=serial-40core AllocNode:Sid=pitzer-login01:57954
# ReqNodeList=(null) ExcNodeList=(null)
# NodeList=p0002
# BatchHost=p0002
# NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
# TRES=cpu=1,mem=4556M,node=1,billing=1,gres/gpfs:project=0
# Socks/Node=* NtasksPerN:B:S:C=1:0:*:1 CoreSpec=*
# MinCPUsNode=1 MinMemoryCPU=4556M MinTmpDiskNode=0
# Features=(null) DelayBoot=00:00:00
# OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
# [...]
```
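- For jobs that finished a while ago, `sacct` queries *SLURM*'s accounting database (a sketch; the `--format` fields shown are just one reasonable choice):

```sh
$ sacct -j 2526085 --format=JobID,JobName,State,Elapsed,MaxRSS
# Shows the status, run time, and peak memory use of a job,
# including long after it has finished.
```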
---
name: conda
class: inverse center middle

# Bonus Material IV:
# More on software at OSC

-----

<br><br><br><br>

### [[Back to overview]](#overview)

---

## What if software isn't installed at OSC?

### 1. Install it yourself

- Possible, but can be tricky due to the lack of admin rights.

--

### 2. Get others to install it

- For commonly used software, send an email to <oschelp@osc.edu>.

- I can help with (more niche) bioinformatics software.

--

### 3. Alternative approaches

- The **`conda`** software management system and **`Singularity` containers** are alternatives when installation is tricky.

- These are also superior for reproducibility purposes. A `conda` example follows on the next slides.

---
background-color: #ededed

## `conda`

**`conda` is a software environment manager.**

- Install software *into a conda environment* -- no need for admin (`sudo`) rights.

- Takes care of software dependencies.

- You can easily have different environments, each with a different version of the same software, and load and unload them much like the `module` system.

- A *lot* of bioinformatics software is available through `conda`.

---
background-color: #ededed

## `conda` in practice

- Load the `conda` module at OSC (it is part of the Python module):

```bash
$ module load python/3.6-conda5.2
```

<br>

- Create a new environment with "`multiqc`" installed:

```bash
$ conda create -y -n my-multiqc-env -c bioconda multiqc
```

<br>

- Activate the environment and see if we can run `multiqc`:

```bash
[<user>@owens-login01 ~]$ conda activate my-multiqc-env

# Note the environment indicator in the prompt:
(my-multiqc-env) [<user>@owens-login01 ~]$ multiqc --help
```
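- To leave the environment, or come back to it in a later session (standard `conda` subcommands):

```bash
$ conda deactivate               # Leave the active environment
$ conda env list                 # List all your environments
$ conda activate my-multiqc-env  # Reactivate it later
```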