Disclaimer :
* This is more some quick'n dirty notes than a real Howto, so feel free to dislike the presentation.
* This was done on dapper, it should work mostly on edgy but as I didn't play with it much yet I don't know if upstart (the new init system) is still configured that way (update-rc.d)
* I tend to consider people interested in job scheduling to be CLI friendly and able to know when to be root ... as I might forget to be that precise, please bear with me and feel free to comment on where it bothers you.
Background :
I am working on different clusters on a daily basis some of them I am in charge with. To configure those I am not using Ubuntu or any Debian based distro, I am mostly using Rocks cluster http://www.rocksclusters.org for its quick installation process.
Recently I put my hand on a 4-cores machine (2*dualcore), that I wanted to share with other people for small calculations (so no cluster here). I installed Ubuntu (my distro of choice for desktop - the machine being also used for visualisation of data) and I didn't find any job management system in the repositories (appart from cron (too basic) and drqueue (dedicated to 3D rendering)). I then searched for some Ubuntu howto and didn't find any, hence this post. Finally I searched for some linux job management tools for workstations (on master only) and didn't find any ... if anybody has heard of one I would be happy to know about it.
So I went back and used the beast I knew, I set up Torque/PBS with the upsetting feeling that I was hammering a nail with a sledgehammer.
Howto:
Installing Torque PBS on a workstation
( This is mostly following the quickstart guide : http://www.clusterresources.com/wiki...ickstart_guide)
* Get the latest torque tarball from http://www.clusterresources.com/downloads/torque/
* Compile and install it somewhere it won't bothers you
Code:
tar -xzvf torque.tar.gz
cd torque
./configure --prefix=/opt/local
make
make install
* Launch the setup tool (in the torque folder from the tarball) indicating an existing user name (Admin_User here)
Code:
torque.setup Admin_USER
(This launch pbs_server at the end)
* Quick'n dirty configuration for a 4 cpus workstation
(the torque executables should be in your path, if you used the same installation directory as I did you sould have hat the following line to your ~/.bashrc :
export PATH=$PATH:/opt/local/bin:/opt/local/sbin
)
(By default $(TORQUECFG)=/var/spool/torque )
Edit server_priv/nodes with your favourite editor :
Code:
vi server_priv/nodes
and add the following line :
(myworkstation : the name you gave to the machine
4 : the number of cpus)
Set the client server :
and add the following line :
Code:
$pbs_server = 127.0.0.1
Start the client daemon :
Restart the pbs server daemon :
Launch the scheduler daemon :
You can see the current Torque/PBS config :
Code:
qmgr -c "list server"
qmgr -c "list queue batch"
And set some options (here for a 4-core worstation) :
Code:
qmgr -c "set server query_other_jobs = True"
qmgr -c "set queue batch resources_max.ncpus=4
"
Finally you may want the 3 servers to be launched at boot time :
For that purpose, you need to create those 3 files (based on /etc/init.d/skeleton) :
/etc/init.d/pbs_server
/etc/init.d/pbs_mom
/etc/init.d/pbs_sched
###/etc/init.d/pbs_mom###
Code:
#! /bin/sh
### BEGIN INIT INFO
# Provides: skeleton
# Required-Start: $local_fs $remote_fs
# Required-Stop: $local_fs $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: S 0 1 6
# Short-Description: Example initscript
# Description: This file should be used to construct scripts to be
# placed in /etc/init.d.
### END INIT INFO
#
# Author: Miquel van Smoorenburg <miquels@cistron.nl>.
# Ian Murdock <imurdock@gnu.ai.mit.edu>.
#
# Please remove the "Author" lines above and replace them
# with your own name if you copy and modify this script.
#
# Version: @(#)skeleton 2.85-23 28-Jul-2004 miquels@cistron.nl
#
set -e
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/local/bin:/opt/local/sbin
DESC="PBS MOM Client Daemon"
NAME=pbs_mom
DAEMON=/opt/local/sbin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0
# Read config file if it is present.
#if [ -r /etc/default/$NAME ]
#then
# . /etc/default/$NAME
#fi
#
# Function that starts the daemon/service.
#
d_start() {
start-stop-daemon --start --quiet --pidfile $PIDFILE \
--exec $DAEMON \
|| echo -n " already running"
}
#
# Function that stops the daemon/service.
#
d_stop() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME \
|| echo -n " not running"
}
#
# Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME --signal 1
}
case "$1" in
start)
echo -n "Starting $DESC: $NAME"
d_start
echo "."
;;
stop)
echo -n "Stopping $DESC: $NAME"
d_stop
echo "."
;;
#reload)
#
# If the daemon can reload its configuration without
# restarting (for example, when it is sent a SIGHUP),
# then implement that here.
#
# If the daemon responds to changes in its config file
# directly anyway, make this an "exit 0".
#
# echo -n "Reloading $DESC configuration..."
# d_reload
# echo "done."
#;;
restart|force-reload)
#
# If the "reload" option is implemented, move the "force-reload"
# option to the "reload" entry above. If not, "force-reload" is
# just the same as "restart".
#
echo -n "Restarting $DESC: $NAME"
d_stop
# One second might not be time enough for a daemon to stop,
# if this happens, d_start will fail (and dpkg will break if
# the package is being upgraded). Change the timeout if needed
# be, or change d_stop to have start-stop-daemon use --retry.
# Notice that using --retry slows down the shutdown process somewhat.
sleep 1
d_start
echo "."
;;
*)
echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
exit 3
;;
esac
exit 0
###/etc/init.d/pbs_sched###
Code:
#! /bin/sh
### BEGIN INIT INFO
# Provides: skeleton
# Required-Start: $local_fs $remote_fs
# Required-Stop: $local_fs $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: S 0 1 6
# Short-Description: Example initscript
# Description: This file should be used to construct scripts to be
# placed in /etc/init.d.
### END INIT INFO
#
# Author: Miquel van Smoorenburg <miquels@cistron.nl>.
# Ian Murdock <imurdock@gnu.ai.mit.edu>.
#
# Please remove the "Author" lines above and replace them
# with your own name if you copy and modify this script.
#
# Version: @(#)skeleton 2.85-23 28-Jul-2004 miquels@cistron.nl
#
set -e
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/local/bin:/opt/local/sbin
DESC="PBS Scheduler Daemon"
NAME=pbs_sched
DAEMON=/opt/local/sbin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0
# Read config file if it is present.
#if [ -r /etc/default/$NAME ]
#then
# . /etc/default/$NAME
#fi
#
# Function that starts the daemon/service.
#
d_start() {
start-stop-daemon --start --quiet --pidfile $PIDFILE \
--exec $DAEMON \
|| echo -n " already running"
}
#
# Function that stops the daemon/service.
#
d_stop() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME \
|| echo -n " not running"
}
#
# Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME --signal 1
}
case "$1" in
start)
echo -n "Starting $DESC: $NAME"
d_start
echo "."
;;
stop)
echo -n "Stopping $DESC: $NAME"
d_stop
echo "."
;;
#reload)
#
# If the daemon can reload its configuration without
# restarting (for example, when it is sent a SIGHUP),
# then implement that here.
#
# If the daemon responds to changes in its config file
# directly anyway, make this an "exit 0".
#
# echo -n "Reloading $DESC configuration..."
# d_reload
# echo "done."
#;;
restart|force-reload)
#
# If the "reload" option is implemented, move the "force-reload"
# option to the "reload" entry above. If not, "force-reload" is
# just the same as "restart".
#
echo -n "Restarting $DESC: $NAME"
d_stop
# One second might not be time enough for a daemon to stop,
# if this happens, d_start will fail (and dpkg will break if
# the package is being upgraded). Change the timeout if needed
# be, or change d_stop to have start-stop-daemon use --retry.
# Notice that using --retry slows down the shutdown process somewhat.
sleep 1
d_start
echo "."
;;
*)
echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
exit 3
;;
esac
exit 0
###/etc/init.d/pbs_server###
Code:
#! /bin/sh
### BEGIN INIT INFO
# Provides: skeleton
# Required-Start: $local_fs $remote_fs
# Required-Stop: $local_fs $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: S 0 1 6
# Short-Description: Example initscript
# Description: This file should be used to construct scripts to be
# placed in /etc/init.d.
### END INIT INFO
#
# Author: Miquel van Smoorenburg <miquels@cistron.nl>.
# Ian Murdock <imurdock@gnu.ai.mit.edu>.
#
# Please remove the "Author" lines above and replace them
# with your own name if you copy and modify this script.
#
# Version: @(#)skeleton 2.85-23 28-Jul-2004 miquels@cistron.nl
#
set -e
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/local/bin:/opt/local/sbin
DESC="PBS Server"
NAME=pbs_server
DAEMON=/opt/local/sbin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0
# Read config file if it is present.
#if [ -r /etc/default/$NAME ]
#then
# . /etc/default/$NAME
#fi
#
# Function that starts the daemon/service.
#
d_start() {
start-stop-daemon --start --quiet --pidfile $PIDFILE \
--exec $DAEMON \
|| echo -n " already running"
}
#
# Function that stops the daemon/service.
#
d_stop() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME \
|| echo -n " not running"
}
#
# Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
start-stop-daemon --stop --quiet --pidfile $PIDFILE \
--name $NAME --signal 1
}
case "$1" in
start)
echo -n "Starting $DESC: $NAME"
d_start
echo "."
;;
stop)
echo -n "Stopping $DESC: $NAME"
d_stop
echo "."
;;
#reload)
#
# If the daemon can reload its configuration without
# restarting (for example, when it is sent a SIGHUP),
# then implement that here.
#
# If the daemon responds to changes in its config file
# directly anyway, make this an "exit 0".
#
# echo -n "Reloading $DESC configuration..."
# d_reload
# echo "done."
#;;
restart|force-reload)
#
# If the "reload" option is implemented, move the "force-reload"
# option to the "reload" entry above. If not, "force-reload" is
# just the same as "restart".
#
echo -n "Restarting $DESC: $NAME"
d_stop
# One second might not be time enough for a daemon to stop,
# if this happens, d_start will fail (and dpkg will break if
# the package is being upgraded). Change the timeout if needed
# be, or change d_stop to have start-stop-daemon use --retry.
# Notice that using --retry slows down the shutdown process somewhat.
sleep 1
d_start
echo "."
;;
*)
echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
exit 3
;;
esac
exit 0
Now update the rc's :
Code:
update-rc.d pbs_server defaults 95
update-rc.d pbs_mom defaults 96
update-rc.d pbs_sched defaults 97
And you are done.
A sample script for qsub using lam/mpi would be :
Code:
#!/bin/bash
#PBS -l ncpus=4
echo $PBS_JOBID
echo "Start time :"
date
lamboot
mpirun -np 4 your_mpi_command
echo "End Time :"
date
lamclean
lamhalt
HIH
Bookmarks