hailholyghost
November 12th, 2011, 10:18 PM
Hello,
I am attempting to load the output of a command into an array to minimize hard drive use.
This is the original script (which works):
#!/bin/bash
qstat -u $USER > tmp
awk '/R/{sum+=$7} END {print "There are", sum, "processors in use"}' tmp
awk '/R/{print $7}' tmp | wc -l | ~/pca '$1-1' | awk '{print "There are", $0, "jobs running"}'
awk '/Q/{print $7}' tmp | wc -l | ~/pca '$1-1' | awk '{print "There are", $0, "jobs in queue"}'
awk '/C/{print $7}' tmp | wc -l | awk '{print "There are", $0, "jobs stopped"}'
rm tmp
The problem with this script is that it writes to a file called "tmp" on the HD, which requires a lot of I/O.
I am trying to alter the code so the output of qstat -u $USER, which looks like:
[dave@bluehive ~]$ qstat -u $USER
bhsn:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
1101607.bhsn-int     dave     standard gacc.s.syn.99     15026     1   8     -- 119:5 R 101:2
1101611.bhsn-int     dave     standard gacc.s.anti.tor   17515     1   8     -- 119:5 R 100:1
1101612.bhsn-int     dave     standard s.syn.ol          25678     1   8     -- 119:5 R 100:0
1105954.bhsn-int     dave     standard s.anti.ol         25377     1   8     -- 119:5 R 87:22
1108616.bhsn-int     dave     standard lna.caau.99       32329     1   8     -- 119:5 R 39:37
1116620.bhsn-int     dave     long     n.syn.ol          21717     1   8     -- 336:0 R 13:55
1128672.bhsn-int     dave     standard rna.caau.ol       31860     1   8     -- 119:5 R 01:47
is written to an array instead.
My first attempts have failed (I've commented out the last 4 commands and am only trying the first one for now):
#!/bin/bash
jobs=($(qstat -u $USER))
awk '/R/{sum+=$7} END {print "There are", sum, "processors in use"}' "${jobs[@]}"
#awk '/R/{print $7}' | wc -l | ~/pca '$1-1' | awk '{print "There are", $0, "jobs running"}'
#awk '/Q/{print $7}' tmp | wc -l | ~/pca '$1-1' | awk '{print "There are", $0, "jobs in queue"}'
#awk '/C/{print $7}' tmp | wc -l | awk '{print "There are", $0, "jobs stopped"}'
#rm tmp
I'm trying this because hard drive I/O has massively slowed down my calculations in the past. I have two questions:
1. Does writing to an array, as opposed to writing to a temporary file and reading from that file, speed up calculations? I would expect so, because the array lives in RAM, whereas the file "tmp" is on the hard drive.
2. How can I make these awk and 'wc -l' commands work on the array "jobs" the same way they worked on the file "tmp"?
Thanks so much!
-Dave
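One way to approach question 2 without a temporary file or an array is to pipe the qstat output straight into a single awk program. This is only a sketch under assumptions: the helper name summarize_jobs is made up for illustration, and the field numbers (task count in field 7, job state in field 10) are read off the sample listing above, so they may differ on other systems. Testing field 10 for an exact state, rather than matching /R/ anywhere on the line, should also skip header lines such as "Req'd", which may be why the original script subtracts 1. Note that jobs=($(...)) splits the output on whitespace into individual words, and awk treats any arguments after the program text as file names, which is likely why the array attempt fails.

```shell
#!/bin/bash
# Sketch: summarize qstat-style output in one awk pass, with no temporary file.
# summarize_jobs is a hypothetical helper name; it reads the listing on stdin.
summarize_jobs() {
    awk '
        # Assumption from the sample listing: field 7 is the task count
        # and field 10 is the job state (R, Q or C).
        $10 == "R" { running++; procs += $7 }
        $10 == "Q" { queued++ }
        $10 == "C" { stopped++ }
        END {
            # Adding 0 forces a numeric 0 when a counter was never set.
            print "There are", procs + 0, "processors in use"
            print "There are", running + 0, "jobs running"
            print "There are", queued + 0, "jobs in queue"
            print "There are", stopped + 0, "jobs stopped"
        }
    '
}

# Intended use on the cluster (not run here):
#   qstat -u "$USER" | summarize_jobs

# Demonstration with two rows copied from the sample listing.
sample='1101607.bhsn-int dave standard gacc.s.syn.99 15026 1 8 -- 119:5 R 101:2
1116620.bhsn-int dave long n.syn.ol 21717 1 8 -- 336:0 R 13:55'

# Feed the text to the function through a pipe and keep the report in a variable.
summary=$(printf '%s\n' "$sample" | summarize_jobs)
printf '%s\n' "$summary"
```

On the cluster this would be run as qstat -u "$USER" | summarize_jobs. The pipe hands the data from one process to the next in memory, so the script itself never creates a file, and qstat is still called only once.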