ITG Unix Support

Viewing the status of submitted jobs

List of jobs in queue

The easiest way to see the status of submitted jobs is to visit the IonMan status portal at http://ionman.chem.indiana.edu/status/ or http://ionman2.chem.indiana.edu/status/.

The next easiest way is to run the jobstatus command, which prints output like the following:

  jobid |  host |     user |   duration | sample | cmd                                     
---------------------------------------------------------------------------
 n1-242 |   n13 | tstrombe |   03:20:32 | 0823H1 | runjob 0823H1.inf (go3d)       
 n1-243 |   n15 | mplasenc | 1+01:08:01 | test5  | runjob test5.inf (wait n15-8)     
 n13-20 |   n15 | tstrombe |   03:17:00 | 0823H1 | pp3d pp3d.inp       
 n15-8  | diana | tstrombe |   00:19:47 | test5  | runmascot mak2.inp    
  • jobid is the internal Condor queue id. It has two parts: the node that submitted the job, and a job number that increments with each submission.
  • host is the name of the machine on which the job is currently executing.
  • user is the name of the user the job is running as.
  • duration is the time since the job was submitted, in days, hours, minutes, and seconds (for example, 1+01:08:01 is 1 day, 1 hour, 8 minutes, and 1 second).
  • sample is the name of the sample the job is working on.
  • cmd is the command that is currently running on the host.
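
The field descriptions above can be turned into a small parser when the table needs to be processed by a script. The following is only a sketch in Python: it assumes the output looks exactly like the sample shown on this page (six columns separated by "|"), and the function names parse_duration and parse_jobstatus are made up for illustration; they are not part of jobstatus.

```python
from datetime import timedelta


def parse_duration(text):
    """Convert '03:20:32' or '1+01:08:01' (days+HH:MM:SS) to a timedelta."""
    days, _, clock = text.strip().rpartition("+")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days or 0), hours=hours,
                     minutes=minutes, seconds=seconds)


def parse_jobstatus(text):
    """Return one dict per job row found in jobstatus-style output."""
    jobs = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|", 5)]
        # Skip the header row, the dashed rule, and any non-table line.
        if len(fields) != 6 or fields[0] == "jobid":
            continue
        jobid, host, user, duration, sample, cmd = fields
        # The jobid is "<submitting node>-<job number>", e.g. n1-242.
        node, _, number = jobid.rpartition("-")
        jobs.append({
            "jobid": jobid,
            "submit_node": node,
            "job_number": int(number),
            "host": host,
            "user": user,
            "duration": parse_duration(duration),
            "sample": sample,
            "cmd": cmd,
        })
    return jobs


# Two rows copied from the sample output on this page.
SAMPLE = """\
  jobid |  host |     user |   duration | sample | cmd
---------------------------------------------------------------------------
 n1-242 |   n13 | tstrombe |   03:20:32 | 0823H1 | runjob 0823H1.inf (go3d)
 n1-243 |   n15 | mplasenc | 1+01:08:01 | test5  | runjob test5.inf (wait n15-8)
"""

for job in parse_jobstatus(SAMPLE):
    print(job["jobid"], job["host"], job["user"], job["duration"])
```

Storing the duration as a timedelta makes it easy to sort jobs by age or to flag anything that has been queued longer than a chosen limit.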

When the cmd field says "wait n15-8", runjob is waiting for the job with jobid n15-8 to complete before it moves on to the next step.
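
To find out which job a waiting runjob depends on, the jobid can be pulled out of the cmd text. This is a rough sketch in Python that assumes the wait note always appears as "(wait <jobid>)", as it does in the sample output; the helper name waited_jobid is hypothetical.

```python
import re

# Matches "(wait n15-8)" and captures the jobid inside the parentheses.
# The exact format is an assumption based on the sample output.
WAIT_PATTERN = re.compile(r"\(wait\s+([\w-]+)\)")


def waited_jobid(cmd):
    """Return the jobid a command is waiting on, or None if it is not waiting."""
    match = WAIT_PATTERN.search(cmd)
    return match.group(1) if match else None


print(waited_jobid("runjob test5.inf (wait n15-8)"))
print(waited_jobid("runjob 0823H1.inf (go3d)"))
```

Looking up the returned jobid in the jobstatus listing shows which subjob is holding things up (in the sample, n15-8 is the runmascot step).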

jobstatus also accepts command-line arguments; see its man page for details.

Seeing what is going on with a particular job

If you go into the jobs/queue/queueid/sample folder (substituting the job's queue id and sample name), you will see three primary files of interest:

  • condor/output.txt - The standard output of runjob. Most job output ends up here.
  • condor/error.txt - The error output of runjob. If you have an error, this is the first place to look.
  • perf.txt - A list of the commands that have already completed, and how long each one took.
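
The three files above can be located and inspected from a script. The sketch below, in Python, builds the paths from the directory layout described on this page and returns the last lines of a file. The helper names job_files and tail are made up for illustration, and the example values ("jobs" as the root, "n1-242" as the queue id) are assumptions; in particular, this page does not confirm that the queue id in the path is the same as the jobid shown by jobstatus.

```python
from pathlib import Path


def job_files(jobs_root, queue_id, sample):
    """Return the paths of the three primary files for one job."""
    job_dir = Path(jobs_root) / "queue" / queue_id / sample
    return {
        "output": job_dir / "condor" / "output.txt",
        "error": job_dir / "condor" / "error.txt",
        "perf": job_dir / "perf.txt",
    }


def tail(path, count=20):
    """Return the last `count` lines of a text file, or [] if it is missing."""
    path = Path(path)
    if not path.is_file():
        return []
    return path.read_text(errors="replace").splitlines()[-count:]


if __name__ == "__main__":
    # Hypothetical values; substitute your own root, queue id, and sample.
    files = job_files("jobs", "n1-242", "0823H1")
    for line in tail(files["error"]):
        print(line)
```

Checking error.txt first, then output.txt, then perf.txt follows the order suggested above: errors, general output, and finally which steps have finished.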

The condor/ directory also has other output.txt and error.txt files, one per subjob that is launched. They may be worth checking out!
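
Since there can be many subjob files, it may help to list only the error files that actually contain something. This Python sketch assumes that every error file has "error" somewhere in its file name; the exact naming of the per-subjob files is not given on this page, so adjust the check to match what you see in your condor/ directory. The function name nonempty_error_files is hypothetical.

```python
from pathlib import Path


def nonempty_error_files(condor_dir):
    """Return files in condor_dir whose name contains 'error' and that are not empty."""
    condor_dir = Path(condor_dir)
    if not condor_dir.is_dir():
        return []
    return sorted(
        path for path in condor_dir.iterdir()
        # Assumption: error files can be recognized by 'error' in the name.
        if path.is_file() and "error" in path.name and path.stat().st_size > 0
    )


if __name__ == "__main__":
    # Hypothetical path; substitute your own queue id and sample name.
    for path in nonempty_error_files("jobs/queue/n1-242/0823H1/condor"):
        print(path, path.stat().st_size, "bytes")
```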

 

Reference http://wiki.chem.indiana.edu/HPC/ViewingTheStatusOfSubmittedJobs
