Howto : Install Torque/PBS (job scheduler/manager) for a workstation


avelldiroll
October 31st, 2006, 11:50 AM
Disclaimer :
* These are quick'n'dirty notes rather than a real Howto, so feel free to dislike the presentation.
* This was done on Dapper (6.06). It should mostly work on Edgy (6.10), but I haven't played with Edgy much yet, so I don't know whether upstart (the new init system) still handles boot scripts the same way (update-rc.d).
* I assume people interested in job scheduling are comfortable with the command line and know when they need to be root. I may have forgotten to say so in places; please bear with me and feel free to point out where it bothers you.

Background :
I work on several clusters daily, some of which I am in charge of. For those I don't use Ubuntu or any other Debian-based distro; I mostly use Rocks cluster http://www.rocksclusters.org for its quick installation process.
Recently I got my hands on a 4-core machine (2 dual-core CPUs) that I wanted to share with other people for small calculations (so no cluster here). I installed Ubuntu (my distro of choice for the desktop, since the machine is also used to visualise data), but I didn't find any job management system in the repositories (apart from cron, which is too basic, and drqueue, which is dedicated to 3D rendering). I then looked for an Ubuntu howto and didn't find any, hence this post. Finally I looked for Linux job management tools meant for a single workstation (a master with no compute nodes) and didn't find any either; if anybody has heard of one, I would be happy to know about it.
So I went back to the beast I knew and set up Torque/PBS, with the uneasy feeling that I was hammering a nail with a sledgehammer.

Howto:

Installing Torque PBS on a workstation

(This mostly follows the quickstart guide: http://www.clusterresources.com/wiki/doku.php?id=torque:appendix:l_torque_quickstart_guide)

* Get the latest torque tarball from http://www.clusterresources.com/downloads/torque/

* Compile it and install it somewhere it won't bother you (the actual tarball and directory names include the version number; run make install as root):

tar -xzvf torque.tar.gz
cd torque
./configure --prefix=/opt/local
make
make install
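If the setup step later complains that libtorque.so.0 cannot be found (as reported further down this thread), the dynamic linker probably does not know about /opt/local/lib. A minimal sketch, assuming the --prefix=/opt/local used above and a system whose /etc/ld.so.conf includes /etc/ld.so.conf.d/*.conf (otherwise add the directory to /etc/ld.so.conf itself); the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: tell the dynamic linker where Torque's shared
# libraries (libtorque.so.*) were installed.
# Assumes ./configure --prefix=/opt/local, so the libraries are in /opt/local/lib.
register_torque_libdir() {
    prefix=${1:-/opt/local}
    conf_dir=${2:-/etc/ld.so.conf.d}   # read through "include" in /etc/ld.so.conf
    printf '%s\n' "$prefix/lib" > "$conf_dir/torque.conf" || return 1
    echo "$conf_dir/torque.conf"       # report which file was written
}
```

As root, `register_torque_libdir && ldconfig` rebuilds the linker cache, which avoids copying the libraries into /usr/lib by hand.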

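* Launch the setup tool (torque.setup, at the top of the unpacked source tree) as root, giving it an existing user name that will become the Torque administrator (Admin_User here). The Torque binaries must already be in your PATH (see the export PATH line below), otherwise it fails with errors such as "pbs_server: not found":

./torque.setup Admin_User

(This starts pbs_server at the end.)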
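When torque.setup stops with messages such as "pbs_server: not found" or "qmgr: not found" (see the last reply in this thread), the installed binaries are not on the PATH of the shell running it. A small check, sketched with a hypothetical function name, lists which of the Torque commands used in this howto the current PATH cannot find:

```shell
#!/bin/sh
# Hypothetical helper: print each Torque command used in this howto that is
# not reachable through the current PATH, one name per line.
# Returns 1 if anything is missing, 0 otherwise.
missing_torque_cmds() {
    status=0
    for cmd in pbs_server pbs_mom pbs_sched qmgr qterm qsub; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return $status
}
```

Run it in the same shell (or sudo session) you will use for torque.setup; on some systems sudo resets PATH, so the result can differ from your user shell.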

* Quick'n'dirty configuration for a 4-CPU workstation

(The Torque executables must be in your PATH, for root as well as for your user, since torque.setup and the daemons are run as root. If you used the same installation directory as I did, you should add the following line to your ~/.bashrc (and root's):
export PATH=$PATH:/opt/local/bin:/opt/local/sbin
)
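Appending that line by hand is fine, but running the step twice leaves duplicates. A minimal sketch of an idempotent version, assuming the /opt/local prefix; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: append the Torque PATH line to a shell startup file,
# but only if that exact line is not already there (safe to run twice).
add_torque_path() {
    rcfile=$1                      # e.g. "$HOME/.bashrc"
    prefix=${2:-/opt/local}        # the --prefix given to ./configure
    line="export PATH=\$PATH:$prefix/bin:$prefix/sbin"
    # grep -qxF: quiet, whole-line, fixed-string match
    grep -qxF "$line" "$rcfile" 2>/dev/null || printf '%s\n' "$line" >> "$rcfile"
}
```

For example `add_torque_path "$HOME/.bashrc"`, then `. ~/.bashrc` to pick it up in the current shell.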
(By default TORQUECFG is /var/spool/torque.)

TORQUECFG=/var/spool/torque
cd $TORQUECFG

Edit server_priv/nodes with your favourite editor :
vi server_priv/nodes
and add the following line :
myworkstation np=4
(myworkstation: the machine's host name, as printed by hostname
4: the number of CPUs)
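The same line can be generated from the machine itself. A minimal sketch, assuming the default TORQUECFG of /var/spool/torque and a Linux /proc/cpuinfo; the function name is hypothetical, and unlike the manual edit it replaces the whole nodes file:

```shell
#!/bin/sh
# Hypothetical helper: write server_priv/nodes for a single workstation.
# Overwrites the file, which is what we want on a fresh one-node setup.
write_nodes_file() {
    cfg=${1:-/var/spool/torque}
    host=${2:-$(hostname)}                          # name pbs_server knows the machine by
    ncpus=${3:-$(grep -c '^processor' /proc/cpuinfo)}  # one "processor" entry per CPU
    printf '%s np=%s\n' "$host" "$ncpus" > "$cfg/server_priv/nodes"
}
```

Running `write_nodes_file` with no arguments (as root) would produce a line like the one above for the current machine.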

Tell the client daemon (pbs_mom) which machine runs the server:
vi mom_priv/config
and add the following line (Torque spells this directive $pbsserver):
$pbsserver 127.0.0.1
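The MOM configuration can be written the same way. A sketch with a hypothetical function name, assuming the directive is spelled $pbsserver followed by the server's address, as in Torque's MOM configuration documentation:

```shell
#!/bin/sh
# Hypothetical helper: point pbs_mom at the machine running pbs_server.
# Assumption: the directive is "$pbsserver <host>", as in Torque's MOM docs.
write_mom_config() {
    cfg=${1:-/var/spool/torque}
    server=${2:-127.0.0.1}
    # Single quotes stop the shell from expanding $pbsserver itself
    printf '%s %s\n' '$pbsserver' "$server" > "$cfg/mom_priv/config"
}
```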

Start the client daemon :
pbs_mom

Restart the pbs server daemon :
qterm
pbs_server

Launch the scheduler daemon :

pbs_sched

You can see the current Torque/PBS config :
qmgr -c "list server"
qmgr -c "list queue batch"

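And set some options (here for a 4-core workstation; query_other_jobs lets every user see all jobs in qstat, and resources_max.ncpus caps the number of CPUs a job in the batch queue may request):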
qmgr -c "set server query_other_jobs = True"
qmgr -c "set queue batch resources_max.ncpus=4"
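The two settings can also be kept in one place and fed to qmgr on standard input, which qmgr reads directives from when no -c option is given. A sketch with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: print the qmgr directives for an N-CPU workstation,
# one per line, so they can be piped into qmgr or kept in a file.
workstation_qmgr_directives() {
    ncpus=${1:-4}
    printf '%s\n' \
        "set server query_other_jobs = True" \
        "set queue batch resources_max.ncpus = $ncpus"
}
```

As root, `workstation_qmgr_directives 4 | qmgr` applies them; `qmgr -c "list queue batch"` shows the result.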

Finally, you may want the three daemons to be started at boot time.

For that purpose, create these three files (based on /etc/init.d/skeleton):
/etc/init.d/pbs_server
/etc/init.d/pbs_mom
/etc/init.d/pbs_sched
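The three scripts below differ only in a few lines (NAME, DESC and the header comments). Rather than editing three copies by hand, one could write the pbs_mom script first and derive the other two from it. A sketch with a hypothetical function name, using plain POSIX sed (descriptions containing "|" or "&" would need escaping):

```shell
#!/bin/sh
# Hypothetical helper: derive another Torque init script from an existing one
# by rewriting its NAME=, DESC=, Provides and Short-Description lines.
clone_init_script() {
    src=$1                         # e.g. /etc/init.d/pbs_mom
    name=$2                        # e.g. pbs_sched
    desc=$3                        # e.g. "PBS Scheduler Daemon"
    dest_dir=${4:-/etc/init.d}
    sed -e "s|^NAME=.*|NAME=$name|" \
        -e "s|^DESC=.*|DESC=\"$desc\"|" \
        -e "s|^# Provides:.*|# Provides: $name|" \
        -e "s|^# Short-Description:.*|# Short-Description: $desc|" \
        "$src" > "$dest_dir/$name" || return 1
    chmod 755 "$dest_dir/$name"    # init scripts must be executable
}
```

For example `clone_init_script /etc/init.d/pbs_mom pbs_sched "PBS Scheduler Daemon"` and `clone_init_script /etc/init.d/pbs_mom pbs_server "PBS Server"`.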

###/etc/init.d/pbs_mom###

#! /bin/sh
### BEGIN INIT INFO
# Provides: pbs_mom
# Required-Start: $local_fs $remote_fs $network
# Required-Stop: $local_fs $remote_fs $network
# Default-Start: 2 3 4 5
# Default-Stop: S 0 1 6
# Short-Description: PBS MOM Client Daemon
# Description: Starts and stops the Torque/PBS pbs_mom daemon.
### END INIT INFO
#
# Based on /etc/init.d/skeleton by Miquel van Smoorenburg
# <miquels@cistron.nl> and Ian Murdock <imurdock@gnu.ai.mit.edu>.
#

set -e

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/local/bin:/opt/local/sbin
DESC="PBS MOM Client Daemon"
NAME=pbs_mom
DAEMON=/opt/local/sbin/$NAME
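PBS_HOME=/var/spool/torque
# Torque does not write /var/run/$NAME.pid: each daemon keeps its PID in a
# lock file instead (mom_priv/mom.lock, sched_priv/sched.lock, server_priv/server.lock)
PIDFILE=$PBS_HOME/${NAME#pbs_}_priv/${NAME#pbs_}.lock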
SCRIPTNAME=/etc/init.d/$NAME

# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0

# Read config file if it is present.
#if [ -r /etc/default/$NAME ]
#then
# . /etc/default/$NAME
#fi

#
# Function that starts the daemon/service.
#
d_start() {
start-stop-daemon --start --quiet --pidfile $PIDFILE \
--exec $DAEMON \
|| echo -n " already running"
}

#
# Function that stops the daemon/service.
#
d_stop() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME \
|| echo -n " not running"
}

#
# Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME --signal 1
}

case "$1" in
start)
echo -n "Starting $DESC: $NAME"
d_start
echo "."
;;
stop)
echo -n "Stopping $DESC: $NAME"
d_stop
echo "."
;;
#reload)
#
# If the daemon can reload its configuration without
# restarting (for example, when it is sent a SIGHUP),
# then implement that here.
#
# If the daemon responds to changes in its config file
# directly anyway, make this an "exit 0".
#
# echo -n "Reloading $DESC configuration..."
# d_reload
# echo "done."
#;;
restart|force-reload)
#
# If the "reload" option is implemented, move the "force-reload"
# option to the "reload" entry above. If not, "force-reload" is
# just the same as "restart".
#
echo -n "Restarting $DESC: $NAME"
d_stop
# One second might not be time enough for a daemon to stop,
# if this happens, d_start will fail (and dpkg will break if
# the package is being upgraded). Change the timeout if needed
# be, or change d_stop to have start-stop-daemon use --retry.
# Notice that using --retry slows down the shutdown process somewhat.
sleep 1
d_start
echo "."
;;
*)
echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
exit 3
;;
esac

exit 0


###/etc/init.d/pbs_sched###

#! /bin/sh
### BEGIN INIT INFO
# Provides: pbs_sched
# Required-Start: $local_fs $remote_fs $network
# Required-Stop: $local_fs $remote_fs $network
# Default-Start: 2 3 4 5
# Default-Stop: S 0 1 6
# Short-Description: PBS Scheduler Daemon
# Description: Starts and stops the Torque/PBS pbs_sched daemon.
### END INIT INFO
#
# Based on /etc/init.d/skeleton by Miquel van Smoorenburg
# <miquels@cistron.nl> and Ian Murdock <imurdock@gnu.ai.mit.edu>.
#

set -e

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/local/bin:/opt/local/sbin
DESC="PBS Scheduler Daemon"
NAME=pbs_sched
DAEMON=/opt/local/sbin/$NAME
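PBS_HOME=/var/spool/torque
# Torque does not write /var/run/$NAME.pid: each daemon keeps its PID in a
# lock file instead (mom_priv/mom.lock, sched_priv/sched.lock, server_priv/server.lock)
PIDFILE=$PBS_HOME/${NAME#pbs_}_priv/${NAME#pbs_}.lock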
SCRIPTNAME=/etc/init.d/$NAME

# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0

# Read config file if it is present.
#if [ -r /etc/default/$NAME ]
#then
# . /etc/default/$NAME
#fi

#
# Function that starts the daemon/service.
#
d_start() {
start-stop-daemon --start --quiet --pidfile $PIDFILE \
--exec $DAEMON \
|| echo -n " already running"
}

#
# Function that stops the daemon/service.
#
d_stop() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME \
|| echo -n " not running"
}

#
# Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME --signal 1
}

case "$1" in
start)
echo -n "Starting $DESC: $NAME"
d_start
echo "."
;;
stop)
echo -n "Stopping $DESC: $NAME"
d_stop
echo "."
;;
#reload)
#
# If the daemon can reload its configuration without
# restarting (for example, when it is sent a SIGHUP),
# then implement that here.
#
# If the daemon responds to changes in its config file
# directly anyway, make this an "exit 0".
#
# echo -n "Reloading $DESC configuration..."
# d_reload
# echo "done."
#;;
restart|force-reload)
#
# If the "reload" option is implemented, move the "force-reload"
# option to the "reload" entry above. If not, "force-reload" is
# just the same as "restart".
#
echo -n "Restarting $DESC: $NAME"
d_stop
# One second might not be time enough for a daemon to stop,
# if this happens, d_start will fail (and dpkg will break if
# the package is being upgraded). Change the timeout if needed
# be, or change d_stop to have start-stop-daemon use --retry.
# Notice that using --retry slows down the shutdown process somewhat.
sleep 1
d_start
echo "."
;;
*)
echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
exit 3
;;
esac

exit 0


###/etc/init.d/pbs_server###

#! /bin/sh
### BEGIN INIT INFO
# Provides: pbs_server
# Required-Start: $local_fs $remote_fs $network
# Required-Stop: $local_fs $remote_fs $network
# Default-Start: 2 3 4 5
# Default-Stop: S 0 1 6
# Short-Description: PBS Server
# Description: Starts and stops the Torque/PBS pbs_server daemon.
### END INIT INFO
#
# Based on /etc/init.d/skeleton by Miquel van Smoorenburg
# <miquels@cistron.nl> and Ian Murdock <imurdock@gnu.ai.mit.edu>.
#

set -e

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/local/bin:/opt/local/sbin
DESC="PBS Server"
NAME=pbs_server
DAEMON=/opt/local/sbin/$NAME
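PBS_HOME=/var/spool/torque
# Torque does not write /var/run/$NAME.pid: each daemon keeps its PID in a
# lock file instead (mom_priv/mom.lock, sched_priv/sched.lock, server_priv/server.lock)
PIDFILE=$PBS_HOME/${NAME#pbs_}_priv/${NAME#pbs_}.lock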
SCRIPTNAME=/etc/init.d/$NAME

# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0

# Read config file if it is present.
#if [ -r /etc/default/$NAME ]
#then
# . /etc/default/$NAME
#fi

#
# Function that starts the daemon/service.
#
d_start() {
start-stop-daemon --start --quiet --pidfile $PIDFILE \
--exec $DAEMON \
|| echo -n " already running"
}

#
# Function that stops the daemon/service.
#
d_stop() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME \
|| echo -n " not running"
}

#
# Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME --signal 1
}

case "$1" in
start)
echo -n "Starting $DESC: $NAME"
d_start
echo "."
;;
stop)
echo -n "Stopping $DESC: $NAME"
d_stop
echo "."
;;
#reload)
#
# If the daemon can reload its configuration without
# restarting (for example, when it is sent a SIGHUP),
# then implement that here.
#
# If the daemon responds to changes in its config file
# directly anyway, make this an "exit 0".
#
# echo -n "Reloading $DESC configuration..."
# d_reload
# echo "done."
#;;
restart|force-reload)
#
# If the "reload" option is implemented, move the "force-reload"
# option to the "reload" entry above. If not, "force-reload" is
# just the same as "restart".
#
echo -n "Restarting $DESC: $NAME"
d_stop
# One second might not be time enough for a daemon to stop,
# if this happens, d_start will fail (and dpkg will break if
# the package is being upgraded). Change the timeout if needed
# be, or change d_stop to have start-stop-daemon use --retry.
# Notice that using --retry slows down the shutdown process somewhat.
sleep 1
d_start
echo "."
;;
*)
echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
exit 3
;;
esac

exit 0




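Now make the scripts executable and update the rc links (as root):

chmod 755 /etc/init.d/pbs_server /etc/init.d/pbs_mom /etc/init.d/pbs_sched
update-rc.d pbs_server defaults 95
update-rc.d pbs_mom defaults 96
update-rc.d pbs_sched defaults 97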


And you are done.

A sample job script for qsub using LAM/MPI (submit it with qsub yourscript.sh):

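#!/bin/bash
#PBS -l ncpus=4

# Torque starts jobs in your home directory; go back to where qsub was run
cd "$PBS_O_WORKDIR"

echo $PBS_JOBID
echo "Start time :"
date

# Start the LAM/MPI runtime and run the program on 4 processes
lamboot
mpirun -np 4 your_mpi_command

echo "End time :"
date

# Clean up leftover LAM state, then shut the LAM runtime down
lamclean
lamhalt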
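The script hard-codes 4 processes. With Torque it is also common to request `#PBS -l nodes=1:ppn=4` instead of ncpus=4; in that case $PBS_NODEFILE lists the host once per granted CPU, so the count can be derived from it. A sketch under that assumption, with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper for a job script submitted with "#PBS -l nodes=1:ppn=4":
# Torque writes one line per granted CPU slot to $PBS_NODEFILE.
count_slots() {
    nodefile=${1:-$PBS_NODEFILE}
    # grep -c '' counts lines, including a final line without a newline
    grep -c '' "$nodefile"
}
```

Inside the job script, `mpirun -np "$(count_slots)" your_mpi_command` then follows whatever ppn value was requested.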


HTH

motin
February 25th, 2007, 07:37 PM
Great that you explain how to configure autostart of the services! I had been looking all over for this, to no avail.

However, when I ran the setup script after installing, I got a lot of complaints about libtorque.so.0 not being found.

I had to copy the libs manually:
cp ./src/lib/Libpbs/.libs/libtorque.so.0 /usr/lib/libtorque.so.0
cp ./src/lib/Libpbs/.libs/libtorque.so.0.0.0 /usr/lib/libtorque.so.0.0.0

In case someone has the same problem...

widedangel
September 16th, 2009, 09:35 AM
When I execute
./torque.setup ADMIN_NAME
it gives me the following errors:
./torque.setup: 31: pbs_server: not found
./torque.setup: 33: qmgr: not found
ERROR: cannot set TORQUE admins
./torque.setup: 39: qterm: not found

Can someone help me?