PACE
- project directory: `~/p-fschaefer7-0`
- home directory: `/storage/home/hcoda1/6/shuan7`
- scratch (temp): `~/scratch`
logging in
- need to run GT VPN (GlobalProtect)
- logging in (kitty sets `$TERM` wrong, so override it; see the wrapper sketch after this list):
TERM=xterm-color ssh shuan7@login-phoenix.pace.gatech.edu
- use the GT password
- to see headnodes: `pace-whoami`
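- to avoid typing the TERM override every time, a small wrapper in the local shell config works; this is just a sketch (the function name pace is made up, not a PACE-provided command):
# goes in the local ~/.bashrc (or equivalent)
pace() {
    # kitty's xterm-kitty terminfo usually isn't installed on the login nodes,
    # so force a TERM they understand before ssh-ing in
    TERM=xterm-color ssh shuan7@login-phoenix.pace.gatech.edu "$@"
}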
transferring files
- just use `scp`/`rsync` ...
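- an example transfer (just a sketch; the local results/ directory is made up, and the remote path is relative to the home directory):
# push the contents of a local results/ directory into ~/p-fschaefer7-0/results/ on Phoenix
rsync -avz --progress results/ shuan7@login-phoenix.pace.gatech.edu:p-fschaefer7-0/results/
# or a one-off recursive copy with scp
scp -r results shuan7@login-phoenix.pace.gatech.edu:p-fschaefer7-0/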
submitting jobs
- account: `gts-fschaefer7`
- see accounts: `pace-quota`
- see queue status: `pace-check-queue -c inferno`
- make slurm file:
#!/bin/bash
#SBATCH -J cknn-cg                  # job name
#SBATCH --account=gts-fschaefer7    # charge account
#SBATCH --nodes=1                   # number of nodes
#SBATCH --ntasks-per-node=1         # tasks per node
#SBATCH --cpus-per-task=8           # cores per task
#SBATCH --mem-per-cpu=22gb          # memory per core
#SBATCH -t 48:00:00                 # duration of the job (hh:mm:ss)
#SBATCH -q inferno                  # QOS name
#SBATCH -o jobs/cg_%j.out           # combined output and error messages file (jobs/ must already exist)

cd "$SLURM_SUBMIT_DIR"              # change to the directory the job was submitted from

# load modules
module load anaconda3

# enter conda virtual environment
conda activate ./venv

# run commands
lscpu
lsmem
time srun python -m experiments.cg
- submit job to scheduler (two queues: `inferno` and `embers`): `sbatch job.sbatch`
- job inherits the current directory, so you have to run `sbatch` from the proper directory!
- submitted job status: `squeue -u shuan7`
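- a submit-and-check sketch (assumes the script above is saved as job.sbatch; sacct needs job accounting, which Slurm clusters normally have):
cd ~/p-fschaefer7-0                    # sbatch inherits the current directory
JOBID=$(sbatch --parsable job.sbatch)  # --parsable prints only the job id
squeue -j "$JOBID"                     # still pending / running?
sacct -j "$JOBID" --format=JobID,State,Elapsed,MaxRSS   # summary once it has finished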
interactive session
- request interactive session:
salloc -A gts-fschaefer7 -q inferno -N 1 --ntasks-per-node=4 -t 1:00:00
- for gpu:
salloc -A gts-fschaefer7 -q inferno -N 1 --gres=gpu:A100:1 --mem-per-gpu=12G -t 0:15:00
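- once the allocation is granted you should get a shell on the compute node (depending on cluster config, salloc may instead leave you on the login node, in which case `srun --pty bash` steps into the allocation); a rough gpu-session sequence:
nvidia-smi                     # confirm the A100 is visible
module load anaconda3
conda activate ./venv
python -m experiments.cg       # or whatever needs the gpu interactively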