Resource Constraints

To use HiPerGator effectively, each user must be aware of the resource constraints imposed by both UFRC and our research group. Respecting these constraints keeps resource usage efficient across the labs that share the computing resources.

Quality of Service (QOS) Limits

Each account has two QOS levels: high-priority investment QOS and low-priority burst QOS. The burst QOS allows short-term borrowing of unused resources from other groups. Each user is associated with a scheduler account that determines the available QOS levels, including those from secondary group memberships.

QOS levels determine the available computational resources (CPU cores, memory, maximum run time) and their priority. For detailed information, visit the HiPerGator QOS Limits Documentation.

CPU Cores and Memory (RAM) Resource Limits

CPU cores and RAM are allocated to jobs independently, as requested in job scripts. Take the QOS limits, hardware limitations, and other users' needs into account to keep resource allocation efficient and fair. HiPerGator's compute nodes have finite resources (CPU cores, memory, memory bandwidth, network bandwidth, and local storage); fully consuming one resource can waste the others and degrade overall system performance.

When a job is submitted without an explicit resource request, default limits apply: 1 CPU core, 4 GB of memory, and a 10-minute time limit. Use --mem to request total job memory, --mem-per-cpu for per-core memory, and --time to set an appropriate time limit within your QOS constraints.
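
For example, a minimal batch-script header that overrides these defaults might look like the following; the job name and values are placeholders, not recommendations:

#!/bin/bash
#SBATCH --job-name=example_job    # Placeholder job name
#SBATCH --cpus-per-task=4         # Request 4 CPU cores
#SBATCH --mem=16gb                # Total memory for the job (use --mem-per-cpu to size per core instead)
#SBATCH --time=02:00:00           # 2-hour wall time, within the QOS limit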

To avoid jobs being held in the pending state because of QOS limits, keep your resource requests within those limits. If a request cannot be satisfied within the QOS limits or hardware constraints at all, the scheduler will reject the job with an error.
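
When a job does queue but stays pending, the reason column reported by standard Slurm tooling usually shows which limit was hit; for example:

squeue -u $USER -o "%.10i %.9P %.20j %.8T %.20r"   # The last column (%r) shows the pending reason, e.g. a QOS limit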

Use the slurmInfo command to view a summary of active jobs for a group:

slurmInfo -pu -g groupname

Time Limits

Different hardware configurations have their own time limits. An example time request in a job script:

#SBATCH --time=4-00:00:00 # Walltime in d-hh:mm:ss (here, 4 days); hh:mm:ss is also accepted

Compute Partitions

Partitions include hpg-default, hpg2-compute, and bigmem. If no partition is specified for a job, hpg-default and hpg2-compute are selected by default.

1. Investment QOS

  • Default: 10 minutes
  • Maximum: 31 days (744 hours)

2. Burst QOS

  • Default: 10 minutes
  • Maximum: 4 days (96 hours)
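
As a sketch, a batch job that explicitly selects a partition and the burst QOS might include the directives below; the account and QOS names are placeholders, and the groupname-b burst naming is an assumption to adapt to your own group:

#SBATCH --account=groupname       # Scheduler account (substitute your group name)
#SBATCH --qos=groupname-b         # Burst QOS; assumed to follow the groupname-b naming convention
#SBATCH --partition=bigmem        # Explicit partition; omit to fall back to the defaults above
#SBATCH --time=3-00:00:00         # 3 days, within the 4-day burst maximum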

Interactive Work

Partitions for interactive work include hpg-dev, gpu, and hpg-ai.

  • Default time limit: 10 minutes
  • hpg-dev Maximum: 12 hours
  • gpu:
    • 12 hours for srun .... --pty bash -i sessions
    • 72 hours for Jupyter sessions in Open OnDemand
  • hpg-ai:
    • 12 hours for srun .... --pty bash -i sessions
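
A minimal interactive session on the development partition could be started as follows; the core, memory, and time values are placeholders and should stay within the limits above:

srun --partition=hpg-dev --cpus-per-task=2 --mem=8gb --time=04:00:00 --pty bash -i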

Jupyter

  • JupyterHub: Sessions have preset individual limits shown in the menu
  • JupyterLab in Open OnDemand: Maximum of 72 hours for the GPU partition; other partitions follow the standard limits

GPU/HPG-AI Partitions

  • Default: 10 minutes
  • Maximum: 14 days
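
A sketch of a batch GPU request using generic Slurm syntax (the GPU count is a placeholder, and the specific GPU type to request is omitted; hpg-ai may require different options):

#SBATCH --partition=gpu           # Or hpg-ai, depending on the hardware needed
#SBATCH --gres=gpu:1              # Request a single GPU
#SBATCH --time=1-00:00:00         # 1 day, within the 14-day maximum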

BMI Resource Usage Guidelines

As members of Dr. Wu’s research group (ID: 4014, [yonghui.wu]), we share significant computational resources across various labs within our division. Currently, we have:

  • CPUs: 2,570
  • GPUs: 300
  • Memory: 35 TB

To ensure fair and efficient use of these resources, please adhere to the following guidelines:

Resource Registration and Usage

  • Prior Registration: Before submitting a job that will exceed 8 CPUs, 1 GPU, or 128 GB of memory for more than 3 hours, check the HPG usage table and register your requirements there.
  • Resource Limits: Do not reserve more than 64 CPUs, 500 GB of memory, or 4 GPUs for any single job without additional coordination.
  • Duration of Use: Limit node reservations to no more than 7 days unless additional arrangements are made.

Storage Usage

  • Blue Storage: Do not use more than 250 GB.
  • Orange Storage: Usage is capped at 1 TB.
  • Storage Coordination: Contact Aokun for further coordination if the remaining storage space in either Blue or Orange storage falls below 1 TB.
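
To stay within these caps, you can periodically check how much space your own directories consume; the paths below assume the usual /blue/<group>/<user> and /orange/<group>/<user> layout and are placeholders to adapt:

du -sh /blue/groupname/$USER /orange/groupname/$USER   # Summarize your usage in each storage system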

Policy for Coordination

  • If you need more than 256 CPUs, 1 TB of memory, 16 GPUs, or require resources for more than 7 days, you must first send a request to Aokun Chen 📧 or Xing He 📧, with your supervisor cc’ed, to justify your need and obtain approval.