Lab 3: DelftBlue cluster setup
In this manual you will learn how to access the account created for this course on the DelftBlue high-performance computing cluster. Test that you can log in to the server before the lab, so that any problems can be solved before the lab session.
Warning: DelftBlue manages jobs through a queuing system, so you cannot predict exactly when your job will start.
Software
You can use any terminal application to connect to DelftBlue. If you are not on the campus network, connect with eduVPN first. Once you are on the campus network, either on campus or through eduVPN, open a terminal on Linux, Windows, or macOS.
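If you are unsure whether you are on the campus network, a quick reachability check can save you from waiting on an ssh timeout. The function below is a sketch that assumes a bash shell with the timeout command from GNU coreutils (standard on Linux; on macOS it may have to be installed separately, and on Windows you would need WSL or Git Bash). The name can_reach is our own, hypothetical choice; port 22 is the standard SSH port.

```shell
# Requires bash (for /dev/tcp) and the timeout command from GNU coreutils.
# can_reach HOST [PORT] [SECONDS]   (hypothetical helper name)
# Returns 0 if a TCP connection to HOST:PORT opens within SECONDS, else non-zero.
can_reach() {
  local host="$1" port="${2:-22}" secs="${3:-5}"
  # The inner bash tries to open the socket on file descriptor 3; timeout
  # stops it if the host silently drops packets instead of refusing them.
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

Usage:
can_reach login.delftblue.tudelft.nl && echo "reachable" || echo "not reachable: connect to eduVPN first"
A failed check most likely means you are not on the campus network, but it can also be caused by a local firewall or a cluster outage; a successful check only shows that the SSH port answers, not that your login will work.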
How to connect
Replace <NETID> with your NetID in the ssh command below and run it in your terminal.
ssh <NETID>@login.delftblue.tudelft.nl
The login node then asks for your TU Delft password; after you enter it, you should see the cluster prompt. This first login is essential, because it ensures your personal home folder on the cluster is created. Contact a TA if you run into any issues.
If you want to use VS Code to connect and access your files over SSH, follow the instructions in DelftBlue integrated development using an IDE.
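Typing the full host name every time is optional. As a sketch, assuming you want a short alias (the name delftblue below is our own, hypothetical choice), the function prints an entry for the OpenSSH client configuration file ~/.ssh/config that you append once. The VS Code Remote-SSH extension reads the same file, so the alias should also show up in its host list.

```shell
# delftblue_ssh_entry NETID   (hypothetical helper name)
# Prints a Host entry for ~/.ssh/config. The alias "delftblue" is a
# hypothetical name chosen for this example; any name works.
delftblue_ssh_entry() {
  local netid="$1"
  if [ -z "$netid" ]; then
    echo "usage: delftblue_ssh_entry <NETID>" >&2
    return 1
  fi
  cat <<EOF
Host delftblue
    HostName login.delftblue.tudelft.nl
    User ${netid}
EOF
}
```

Usage (on your own machine, with <NETID> replaced as before):
mkdir -p ~/.ssh && delftblue_ssh_entry <NETID> >> ~/.ssh/config
ssh delftblue
You still need to be on the campus network or eduVPN; the alias only shortens the command.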
Download the template on DelftBlue
Run the following commands from your home directory (run cd ~ first if you are elsewhere) to download the template and unzip it there.
curl https://cese.ewi.tudelft.nl/computer-engineering/docs/CESE4130_Lab3.zip --output CESE4130_Lab3.zip
unzip CESE4130_Lab3.zip -d CESE4130_Lab3
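If the download fails, for example because of a mistyped URL or a network problem, curl as written above still saves whatever the server returned, and unzip then reports a corrupt archive. A more defensive variant is sketched below; the function name fetch_and_unzip is hypothetical. It adds curl's -f option (exit with an error on HTTP failures) and -L option (follow redirects), and checks the archive with unzip -t before extracting.

```shell
# fetch_and_unzip URL ZIPFILE DESTDIR   (hypothetical helper name)
# Downloads URL to ZIPFILE and extracts it into DESTDIR, stopping at the first failure.
fetch_and_unzip() {
  local url="$1" zipfile="$2" destdir="$3"
  # -f: exit non-zero on HTTP errors (e.g. 404) instead of saving the
  #     server's error page as if it were the zip file; -L: follow redirects.
  if ! curl -fL "$url" --output "$zipfile"; then
    echo "download failed: $url" >&2
    rm -f "$zipfile"   # do not leave a partial or empty file behind
    return 1
  fi
  # -t tests the archive's integrity without extracting; -q keeps it quiet.
  if ! unzip -tq "$zipfile" >/dev/null; then
    echo "not a valid zip archive: $zipfile" >&2
    return 1
  fi
  unzip -q "$zipfile" -d "$destdir"
}
```

Usage:
cd ~ && fetch_and_unzip https://cese.ewi.tudelft.nl/computer-engineering/docs/CESE4130_Lab3.zip CESE4130_Lab3.zip CESE4130_Lab3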
You are now ready to use the template.
Running the test on DelftBlue
Use the following commands to submit a simple device query test to a GPU node.
cd CESE4130_Lab3/device-query/
sbatch job.sh
Then run squeue -u $USER to check the status of your jobs in the queue. Because this test runs for only a very short time, the queue may already be empty if the job started right after the sbatch command. However, an empty queue does not always mean the job ran successfully: if there is a problem with the script or command, the job manager terminates the job right after submission, and you will also see an empty queue.
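Because an empty queue alone cannot distinguish a finished job from one that was terminated immediately, it can help to ask Slurm's accounting database for the job's final state. The sketch below, with the hypothetical name submit_and_wait, submits a script, polls squeue until the job has left the queue, and then reads the state with sacct. It assumes job accounting (and therefore sacct) is enabled on the cluster, which is common on Slurm systems but not guaranteed.

```shell
# submit_and_wait SCRIPT [POLL_SECONDS]   (hypothetical helper name)
# Returns 0 only if Slurm reports the job as COMPLETED.
submit_and_wait() {
  local script="$1" poll="${2:-10}" jobid state
  # --parsable makes sbatch print only "jobid" or "jobid;cluster".
  jobid="$(sbatch --parsable "$script")" || return 1
  jobid="${jobid%%;*}"   # keep only the numeric job ID
  echo "submitted job $jobid"
  # -h drops the header; the output is empty once the job has left the queue.
  while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
    sleep "$poll"
  done
  # -X: job allocation only (no per-step lines), -n: no header, -o: columns.
  state="$(sacct -X -n -j "$jobid" -o State | tr -d '[:space:]')"
  echo "job $jobid finished with state: ${state:-UNKNOWN}"
  [ "$state" = "COMPLETED" ]
}
```

Usage, from the device-query folder:
submit_and_wait job.sh
States such as FAILED, CANCELLED, or TIMEOUT point to a problem worth showing to a TA together with the output file.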
In the picture below, you can see that (1) the submitted job is in the 'R' state, which means it is running on the target node, and (2) the queue then becomes empty. If the target node is not ready yet, the job stays in the queue; this is shown as "PD" (pending) in the "ST" (state) column.
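The ST column uses short state codes. To see them spelled out together with the reason a job is still pending, a small wrapper around squeue like the one below can help. The function names are hypothetical, only the most common Slurm state codes are covered (the squeue manual page lists them all), and the format specifiers %i (job ID), %t (compact state), and %R (pending reason or node list) come from squeue's -o option.

```shell
# describe_state CODE   (hypothetical helper name)
# Translates a compact Slurm job state, as shown in squeue's ST column, into words.
describe_state() {
  case "$1" in
    PD) echo "pending (waiting for resources or priority)" ;;
    R)  echo "running on the allocated node" ;;
    CG) echo "completing (the job is finishing up)" ;;
    CD) echo "completed" ;;
    F)  echo "failed" ;;
    CA) echo "cancelled" ;;
    TO) echo "timed out" ;;
    *)  echo "other state: $1 (see the squeue manual page)" ;;
  esac
}

# show_my_jobs   (hypothetical helper name)
# Lists your jobs with the state spelled out and the reason or node list.
show_my_jobs() {
  squeue -h -u "$USER" -o "%i %t %R" | while read -r id st reason; do
    echo "job $id: $(describe_state "$st") [$reason]"
  done
}
```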
Once the queue is empty, an output file appears in the current folder, and you can view its contents with cat ce-lab3-part0-*.out. In a successful run, you should see device information, e.g. the specifications of an NVIDIA A100 or NVIDIA Volta GPU. If you see an error message in the output file, contact a TA.
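If you submit the test more than once, several output files can match the pattern. The helper below (hypothetical name check_output) picks the most recently modified one, prints it, and searches it case-insensitively for the word "error" as a rough first check. It assumes the file names follow the ce-lab3-part0-*.out pattern above; a clean search result does not prove the run succeeded, so still read the device information yourself.

```shell
# check_output [GLOB]   (hypothetical helper name)
# Prints the newest file matching GLOB; returns 1 if none exists,
# 2 if the file contains the word "error" (case-insensitive).
check_output() {
  local pattern="${1:-ce-lab3-part0-*.out}" latest
  # $pattern is left unquoted on purpose so the shell expands the glob;
  # ls -t sorts by modification time, newest first.
  latest="$(ls -t $pattern 2>/dev/null | head -n 1)"
  if [ -z "$latest" ]; then
    echo "no output file matching $pattern yet" >&2
    return 1
  fi
  echo "== $latest =="
  cat "$latest"
  if grep -qi "error" "$latest"; then
    echo "possible error found in $latest; contact a TA" >&2
    return 2
  fi
}
```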
You can now continue with the Submitting jobs on DelftBlue section.