The Ohio Supercomputer Center (OSC)
Week 1 – Lecture C
1 Introduction
1.1 Overview
This session introduces high-performance computing and the Ohio Supercomputer Center (OSC).
This is only meant as a brief overview to give you some context about the working environment that you will start using next week: you will do all your coding and computing at OSC during this course.
Along the way, you’ll learn a lot more about most topics that we will touch on here. Specifically, a deeper dive into OSC will follow in week 5.
1.2 Learning goals
In this session, you will learn:
- What a supercomputer is and why it is useful
- What resources the Ohio Supercomputer Center (OSC) provides
- How to access OSC resources through its OnDemand web portal
2 High-performance computing
A supercomputer (also known as a “compute cluster” or simply a “cluster”) consists of many computers that are connected by a high-speed network, and that can be accessed remotely by its users.
Supercomputers provide high-performance computing (HPC) resources, which consist of two main aspects:
- “Compute”: computing power to run your data processing and analysis
- Storage: space for (long-term) storage of your data and results
This is what Cardinal, one of the OSC supercomputers, physically looks like:
Here are some possible reasons to use a supercomputer instead of your own laptop or desktop:
- Your analyses take a long time to run, or need a large number of processors or a large amount of memory.
- You need to run an analysis many times.
- You need to store a lot of data.
- Your analyses require software available only for the Linux operating system, but you have Windows.
- Your analyses require specialized hardware, such as GPUs (Graphics Processing Units).
When you’re working with omics data, many of these reasons typically apply. This can make it hard or impossible to run all your analyses on your personal workstation, and supercomputers provide a solution.
3 The Ohio Supercomputer Center (OSC)
The Ohio Supercomputer Center (OSC) is a facility provided by the state of Ohio. It has several supercomputers, lots of storage space, and an excellent infrastructure for accessing these resources.
Access to OSC’s compute and storage goes through OSC “Projects”:
- A project can be tied to a research project or lab, or be educational like this course’s project, PAS2880.
- Each project has a budget in terms of “compute hours” and storage space.
- As a user, it’s possible to be a member of multiple different projects.
- OSC projects are typically requested and managed by PIs.
OSC has three main websites — in this course, we will almost exclusively use the first:
- https://ondemand.osc.edu: A web portal to use OSC resources through your browser (login needed).
- https://my.osc.edu: Account and project management (login needed).
- https://osc.edu: General website with information about the supercomputers, installed software, and usage.
4 The structure of a supercomputer center
4.1 Terminology
Let’s start with some (super)computing terminology, going from smaller things to bigger things:
- Node: A single computer that is a part of a supercomputer.
- Supercomputer / Cluster: A collection of connected computers. OSC currently has three: “Ascend”, “Cardinal”, and “Pitzer”.
- Supercomputer Center: A facility like OSC that has one or more supercomputers.
4.2 Supercomputer components
We can think of a supercomputer as having three main parts:
- File Systems: Where files are stored (these are shared between the OSC supercomputers!)
- Login Nodes: The handful of computers everyone shares after logging in
- Compute Nodes: The many computers you can reserve to run your analyses
We will briefly discuss these below, and come back to them in more detail later in the course.
File systems
OSC has several distinct file systems:
File system | Located within “path” | Main purpose |
---|---|---|
Project | /fs/ess/ | OSC’s main data storage location |
Scratch | /fs/scratch/ | Additional, temporary storage |
Home | /users/ | General, personal files not tied to research projects or courses |
During the course, we’ll work in the project folder of the course’s OSC Project PAS2880: /fs/ess/PAS2880.
Paths, like those shown in the table above, specify the locations of folders and files on a computer. You will learn more about them in the next few weeks.
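As a small illustration of how such paths are put together, here is a minimal Python sketch. It only manipulates path text and does not touch any real files, so it runs on any computer. The helper name which_file_system is hypothetical, and the three roots are simply copied from the table above.

```python
from pathlib import PurePosixPath

# File-system roots copied from the table above
FILE_SYSTEMS = {
    "project": PurePosixPath("/fs/ess"),
    "scratch": PurePosixPath("/fs/scratch"),
    "home": PurePosixPath("/users"),
}


def which_file_system(path: str) -> str:
    """Return the name of the OSC file system that a path falls under.

    Hypothetical helper for illustration: it compares path text only
    and never checks whether the path actually exists.
    """
    p = PurePosixPath(path)
    for name, root in FILE_SYSTEMS.items():
        # A path belongs to a file system if it is the root itself
        # or if the root is one of its parent folders
        if p == root or root in p.parents:
            return name
    return "other"


# The course's project folder is the project root plus the project ID
course_dir = FILE_SYSTEMS["project"] / "PAS2880"
print(course_dir)
print(which_file_system(str(course_dir)))
```

The first printed line is the course folder path, and the second is the name of the file system it belongs to.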
Login Nodes
Login nodes are an initial landing spot for everyone who logs in to a supercomputer. There are only a handful of them on each supercomputer, they are shared among everyone, and they cannot be reserved for exclusive usage.
Therefore, login nodes are meant only for things like organizing files and creating scripts for compute jobs. They are not meant for serious computing – in other words, they don’t provide compute, which is the function of compute nodes.
Compute Nodes
Data processing and analysis is done on compute nodes. You can only use compute nodes after putting in a request for compute resources (a “compute job”).
A job scheduler program called Slurm, which you’ll learn to use later in this course, then assigns the requested resources: you may, for example, get exclusive access to a specific compute node for two hours.
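To make the idea of a resource request a little more concrete, here is a rough Python sketch that writes out the text of a minimal Slurm batch script. The helper name make_job_script is hypothetical. The #SBATCH options shown (--account, --time, --nodes) are standard Slurm options picked for illustration, not necessarily the settings this course will use, and asking for one node does not by itself guarantee exclusive access to it.

```python
def make_job_script(account: str, hours: int, nodes: int = 1) -> str:
    """Render the text of a minimal Slurm batch script.

    Hypothetical helper for illustration: it only builds a string
    and does not submit anything to the scheduler.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --account={account}",       # OSC project to bill
        f"#SBATCH --time={hours:02d}:00:00",  # time limit as HH:MM:SS
        f"#SBATCH --nodes={nodes}",           # number of compute nodes
        "",
        # The job's actual work goes below the #SBATCH lines
        'echo "Running on compute node: $(hostname)"',
    ]
    return "\n".join(lines) + "\n"


# Request one node for two hours under the course project
script = make_job_script(account="PAS2880", hours=2)
print(script)
```

On a real cluster, text like this would be saved to a file and handed to Slurm, which queues the job until the requested resources are free.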
The processing and analysis of data on a supercomputer is typically done by running code through scripts.
If you have some familiarity with doing so on a laptop or desktop, you may wonder what (else) works differently on a supercomputer like those at OSC. You’ll learn much more about these differences later in the course, but here is an overview:
- “Non-interactive” computing is common: It is common to write and then submit scripts to a queue instead of running programs interactively.
- Software: You generally can’t install software like on a personal computer, and a lot of installed software needs to be “loaded” before you can use it.
- Operating system: Supercomputers run on the Linux operating system rather than on Windows or macOS.
- Login versus compute nodes: As mentioned, the nodes you end up on after logging in are not meant for heavy computing and you have to request access to compute nodes to run most analyses.
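The login-versus-compute distinction can also be checked from inside a script. The Python sketch below relies on the fact that Slurm sets the environment variable SLURM_JOB_ID inside a running job. The helper name running_in_slurm_job is hypothetical, and the sketch assumes the variable is absent on a login node outside of any job.

```python
import os


def running_in_slurm_job(env=None) -> bool:
    """Return True if the environment looks like that of a Slurm job.

    Assumption: Slurm sets SLURM_JOB_ID inside jobs, and the variable
    is absent on a login node outside of any job.
    """
    if env is None:
        env = os.environ
    return "SLURM_JOB_ID" in env


if running_in_slurm_job():
    print("Inside a compute job: fine to run analyses here.")
else:
    print("Not inside a compute job: keep to light tasks like organizing files.")
```

A check like this is only a convenience. The scheduler, not your script, decides which resources you actually get.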
5 OSC OnDemand
The OSC OnDemand web portal is an amazing resource that allows you to use a web browser to access OSC resources. For example, it offers access to:
- A file browser/explorer
- A Unix shell
- “Interactive Apps”: programs such as RStudio and VS Code
Go to https://ondemand.osc.edu and log in (use the boxes on the left-hand side). Once logged in, you should see a landing page similar to the one below:
We will now go through some of the dropdown menus in the blue bar along the top.