Viewing the status of submitted jobs
List of jobs in queue
The easiest way to see the status of submitted jobs is to visit the ionman status portal at http://ionman.chem.indiana.edu/status/ or http://ionman2.chem.indiana.edu/status/. The second easiest way is the jobstatus command, which prints output like the following:
jobid | host | user | duration | sample | cmd
---------------------------------------------------------------------------
n1-242 | n13 | tstrombe | 03:20:32 | 0823H1 | runjob 0823H1.inf (go3d)
n1-243 | n15 | mplasenc | 1+01:08:01 | test5 | runjob test5.inf (wait n15-8)
n13-20 | n15 | tstrombe | 03:17:00 | 0823H1 | pp3d pp3d.inp
n15-8 | diana | tstrombe | 00:19:47 | test5 | runmascot mak2.inp
- jobid is the internal condor queue id. It is made up of two parts: the node that submitted the job, and a job number that increments with each submission.
- host is the name of the machine the job is currently executing on.
- user is the name of the user the job is executing as.
- duration is the number of days, hours, minutes, and seconds since the job was submitted.
- sample is the name of the sample the node is working on.
- cmd is the command that is currently running on the host.
When the cmd field says "wait n15-8", runjob is waiting for the job with jobid n15-8 to complete before it moves on to the next step. jobstatus also takes some arguments; see the manpage for details.
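If you want to work with this listing in a script, the fields described above can be pulled apart fairly easily. The following is a minimal Python sketch, not part of the ionman tools. It assumes the output is pipe-delimited exactly as in the sample above and that durations look like HH:MM:SS with an optional leading "D+" for days (inferred from the sample, not from the manpage). The names parse_duration and parse_jobstatus are hypothetical.

```python
import re
from datetime import timedelta

FIELDS = ["jobid", "host", "user", "duration", "sample", "cmd"]

# Sample text copied from the jobstatus listing shown above.
SAMPLE = """\
jobid | host | user | duration | sample | cmd
---------------------------------------------------------------------------
n1-242 | n13 | tstrombe | 03:20:32 | 0823H1 | runjob 0823H1.inf (go3d)
n1-243 | n15 | mplasenc | 1+01:08:01 | test5 | runjob test5.inf (wait n15-8)
n13-20 | n15 | tstrombe | 03:17:00 | 0823H1 | pp3d pp3d.inp
n15-8 | diana | tstrombe | 00:19:47 | test5 | runmascot mak2.inp
"""


def parse_duration(text):
    """Convert '03:20:32' or '1+01:08:01' into a timedelta (assumed format)."""
    days, _, clock = text.rpartition("+")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days or 0), hours=hours,
                     minutes=minutes, seconds=seconds)


def parse_jobstatus(text):
    """Return one dict per job row, skipping the header and separator lines."""
    jobs = []
    for line in text.splitlines():
        # Split at most 5 times so a '|' inside cmd would stay in the cmd field.
        parts = [part.strip() for part in line.split("|", len(FIELDS) - 1)]
        if len(parts) != len(FIELDS) or parts[0] == "jobid":
            continue
        job = dict(zip(FIELDS, parts))
        job["duration"] = parse_duration(job["duration"])
        # "(wait n15-8)" in cmd means runjob is blocked on that jobid.
        match = re.search(r"\(wait (\S+)\)", job["cmd"])
        job["waiting_on"] = match.group(1) if match else None
        jobs.append(job)
    return jobs


if __name__ == "__main__":
    for job in parse_jobstatus(SAMPLE):
        print(job["jobid"], job["duration"], job["waiting_on"])
```

In practice you would feed the function the real text printed by jobstatus instead of SAMPLE; the waiting_on field then tells you which jobid to look at next when a runjob appears to be stalled.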
Seeing what is going on with a particular job
If you go into the jobs/queue/queueid/sample folder, you will see three primary files of interest:
- condor/output.txt - The output of runjob. Most standard job output is here.
- condor/error.txt - The error output of runjob. If you have an error, this is the first place to look.
- perf.txt - A list of the commands that have already completed, and how long each one took.
The condor/ directory also has other output.txt and error.txt files, one per subjob that is launched. They may be worth checking out!
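To check these files quickly for a given job, something like the following Python sketch may help. It is only an illustration: summarize_job_dir is a hypothetical helper, and because the naming of the per-subjob files is not spelled out here, the sketch simply lists every other file it finds in condor/ rather than guessing a pattern.

```python
from pathlib import Path

# The three primary files described above, relative to the sample folder.
PRIMARY_FILES = ["condor/output.txt", "condor/error.txt", "perf.txt"]


def summarize_job_dir(job_dir):
    """Report the size of each primary file and list other files in condor/.

    Returns (summary, others): summary maps each primary file to its size
    in bytes, or None if it is missing; others is a sorted list of the
    remaining file names in condor/ (the per-subjob output and error files).
    """
    job_dir = Path(job_dir)
    summary = {}
    for rel in PRIMARY_FILES:
        path = job_dir / rel
        summary[rel] = path.stat().st_size if path.is_file() else None

    others = []
    condor = job_dir / "condor"
    if condor.is_dir():
        others = sorted(
            entry.name
            for entry in condor.iterdir()
            if entry.is_file() and f"condor/{entry.name}" not in PRIMARY_FILES
        )
    return summary, others


if __name__ == "__main__":
    import sys

    # Usage: pass the path to a jobs/queue/queueid/sample folder.
    summary, others = summarize_job_dir(sys.argv[1] if len(sys.argv) > 1 else ".")
    for name, size in summary.items():
        print(name, "missing" if size is None else f"{size} bytes")
    print("other condor files:", ", ".join(others) or "none")
```

A nonzero size for condor/error.txt is a hint to read that file first, and the list of other condor files shows which subjob logs are available to inspect.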