
Thread: Run slurm in 18.04

  1. #1

    Run slurm in 18.04

    Hi,

    As the TORQUE resource manager is no longer open source, I decided to switch to SLURM. The installation instructions that you can find via Google are not up to date for 18.04, so I am posting the instructions here for reference:

    # Install munge and slurm:
    $ sudo apt install munge slurm-wlm
    # Open /usr/share/doc/slurmctld/slurm-wlm-configurator.easy.html in a browser and generate the configuration file in the browser.
    # I am using just one node, so I used the host name for the ControlMachine, the NodeName and the ClusterName.
    # The unit for RealMemory seems to be MB, so use 65536 for example if the node has 64GB.
    # However, on my first attempt the node got stuck in state Draining with reason Low Real Memory, so I left RealMemory out on my second attempt.
    # vi /etc/slurm-llnl/slurm.conf and copy/paste the configuration file from the browser. My configuration file is:
    Code:
    # slurm.conf file generated by configurator easy.html.
    # Put this file on all nodes of your cluster.
    # See the slurm.conf man page for more information.
    #
    ControlMachine=<YOUR-HOST-NAME>
    #ControlAddr=
    #
    #MailProg=/bin/mail
    MpiDefault=none
    #MpiParams=ports=#-#
    ProctrackType=proctrack/pgid
    ReturnToService=1
    SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
    #SlurmctldPort=6817
    SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
    #SlurmdPort=6818
    SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
    SlurmUser=slurm
    #SlurmdUser=root
    StateSaveLocation=/var/lib/slurm-llnl/slurmctld
    SwitchType=switch/none
    TaskPlugin=task/none
    #
    #
    # TIMERS
    #KillWait=30
    #MinJobAge=300
    #SlurmctldTimeout=120
    #SlurmdTimeout=300
    #
    #
    # SCHEDULING
    FastSchedule=1
    SchedulerType=sched/builtin
    #SchedulerPort=7321
    SelectType=select/linear
    #
    #
    # LOGGING AND ACCOUNTING
    AccountingStorageType=accounting_storage/none
    ClusterName=<YOUR-HOST-NAME>
    #JobAcctGatherFrequency=30
    JobAcctGatherType=jobacct_gather/none
    #SlurmctldDebug=3
    SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
    #SlurmdDebug=3
    SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
    #
    #
    # COMPUTE NODES
    NodeName=<YOUR-HOST-NAME> CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
    PartitionName=long Nodes=<YOUR-HOST-NAME> Default=YES MaxTime=INFINITE State=UP
    # Enable and start the manager slurmctld:
    $ sudo systemctl enable slurmctld
    $ sudo systemctl start slurmctld
    # Enable and start the agent slurmd:
    $ sudo systemctl enable slurmd
    $ sudo systemctl start slurmd
    # Check the status of the manager and the agent:
    $ sinfo
    Code:
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    long*        up   infinite      1   idle <YOUR-HOST-NAME>
    $ scontrol show node
    Code:
    NodeName=<YOUR-HOST-NAME> Arch=x86_64 CoresPerSocket=4   CPUAlloc=0 CPUErr=0 CPUTot=4 CPULoad=0.48
       AvailableFeatures=(null)
       ActiveFeatures=(null)
       Gres=(null)
       NodeAddr=<YOUR-HOST-NAME> NodeHostName=<YOUR-HOST-NAME> Version=17.11
       OS=Linux 4.15.0-38-generic #41-Ubuntu SMP Wed Oct 10 10:59:38 UTC 2018 
       RealMemory=1 AllocMem=0 FreeMem=58769 Sockets=1 Boards=1
       State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
       Partitions=long 
       BootTime=2018-10-28T06:59:30 SlurmdStartTime=2018-10-28T07:03:34
       CfgTRES=cpu=4,mem=1M,billing=4
       AllocTRES=
       CapWatts=n/a
       CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
       ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
    # Create a shell script and make it executable:
    $ vi submit.sh
    Code:
    #!/bin/bash
    sleep 30
    env
    $ chmod +x submit.sh
    # Submit the shell script:
    $ sbatch submit.sh
    # Check the status of the cluster and the queue:
    $ sinfo
    Code:
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    long*        up   infinite      1  alloc <YOUR-HOST-NAME>
    $ squeue
    Code:
                 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                     5      long submit.s    <YOUR-USER-NAME>  R       0:19      1 <YOUR-HOST-NAME>
    # Check for output after 30 seconds:

    $ cat slurm-<JOBID>.out
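A note on the RealMemory step above: the Low Real Memory drain usually means the configured RealMemory is larger than what slurmd detects on the node, and a 64GB machine typically reports somewhat less than 65536 MB. Running `slurmd -C` prints the detected hardware, including RealMemory, in slurm.conf format. As a rough cross-check, the sketch below derives a value from /proc/meminfo. The function name mem_total_mb is made up for illustration, and the result should only be close to what slurmd reports, so treat the `slurmd -C` value as the one to use.

```shell
#!/bin/bash
# Print total physical memory in MB from a meminfo-style file.
# MemTotal is given in kB; int() rounds down, so the result never
# exceeds what the kernel reports.
# The optional first argument (default /proc/meminfo) exists only
# to make the function easy to try on a sample file.
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}
```

For example, `echo "RealMemory=$(mem_total_mb)"` prints a line you can compare with the output of `slurmd -C`. If the node is already drained, fixing slurm.conf, restarting slurmctld and slurmd, and then running `sudo scontrol update NodeName=<YOUR-HOST-NAME> State=RESUME` should bring it back to idle.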

    Regards,
    Gijsbert

  2. #2
    Join Date: Apr 2019
    Beans: 1

    Re: Run slurm in 18.04

    Thanks for this! It got me running slurm on my new machine in 10 minutes.
    I would just like to add that if you have a GPU, you have to add GresTypes and Gres to the conf file, and also create a gres.conf file.
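To make the GPU remark concrete, here is a minimal sketch for a single node with one NVIDIA card. The NodeName line is the one from the first post with Gres added; the device path /dev/nvidia0 is an assumption and should be checked on the actual machine.

```
# slurm.conf: declare the GPU resource type and attach one GPU to the node
GresTypes=gpu
NodeName=<YOUR-HOST-NAME> CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 Gres=gpu:1 State=UNKNOWN

# gres.conf (in the same directory as slurm.conf): map the GPU to its device file
# /dev/nvidia0 is an assumption for a single card; check with: ls /dev/nvidia*
Name=gpu File=/dev/nvidia0
```

After restarting slurmctld and slurmd, a job can then ask for the GPU with `sbatch --gres=gpu:1 submit.sh`.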

  3. #3
    Join Date: Jun 2019
    Beans: 1

    Re: Run slurm in 18.04

    I think this post is very simple and easy for novices to follow. I'm trying to install slurm on my personal computer (one computer or node, one board for the CPU, and one GPU).
    However, I ran into trouble when I ran

    sudo systemctl start slurmd

    The message is

    Job for slurmd.service failed because the control process exited with error code.
    See "systemctl status slurmd.service" and "journalctl -xe" for details.

    So I ran the commands it suggested:

    root@noki:/etc/slurm-llnl# systemctl status slurmd.service
    ● slurmd.service - Slurm node daemon
    Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
    Active: failed (Result: exit-code) since Mon 2019-06-03 01:33:15 KST; 19s ago
    Docs: man:slurmd(8)
    Process: 4164 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=1/FAILURE)


    Jun 03 01:33:15 noki systemd[1]: Starting Slurm node daemon...
    Jun 03 01:33:15 noki slurmd[4164]: fatal: Unable to determine this slurmd's NodeName
    Jun 03 01:33:15 noki systemd[1]: slurmd.service: Control process exited, code=exited status=1
    Jun 03 01:33:15 noki systemd[1]: slurmd.service: Failed with result 'exit-code'.
    Jun 03 01:33:15 noki systemd[1]: Failed to start Slurm node daemon.

    root@noki:/etc/slurm-llnl# journalctl -xe
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: GeForce 7 (G7x)
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: GeForce 8 (G8x)
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: GeForce GTX 200 (NVA0)
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: GeForce GTX 400 (NVC0)
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) modesetting: Driver for Modesetting Kernel Drivers: kms
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) FBDEV: driver for framebuffer: fbdev
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) VESA: driver for VESA chipsets: vesa
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) systemd-logind: releasing fd for 226:0
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Loading sub module "fb"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) LoadModule: "fb"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Loading /usr/lib/xorg/modules/libfb.so
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Module fb: vendor="X.Org Foundation"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: compiled for 1.20.1, module version = 1.0.0
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: ABI class: X.Org ANSI C Emulation, version 0.4
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Loading sub module "wfb"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) LoadModule: "wfb"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Loading /usr/lib/xorg/modules/libwfb.so
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Module wfb: vendor="X.Org Foundation"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: compiled for 1.20.1, module version = 1.0.0
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: ABI class: X.Org ANSI C Emulation, version 0.4
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Loading sub module "ramdac"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) LoadModule: "ramdac"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Module "ramdac" already built-in
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (WW) Falling back to old probe method for modesetting
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (WW) Falling back to old probe method for fbdev
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Loading sub module "fbdevhw"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) LoadModule: "fbdevhw"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Module fbdevhw: vendor="X.Org Foundation"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: compiled for 1.20.1, module version = 0.0.2
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: ABI class: X.Org Video Driver, version 24.0
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) NVIDIA(0): Creating default Display subsection in Screen section
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: "Default Screen Section" for depth/fbbpp 24/32
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (==) NVIDIA(0): Depth 24, (==) framebuffer bpp 32
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (==) NVIDIA(0): RGB weight 888
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (==) NVIDIA(0): Default visual is TrueColor
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Applying OutputClass "nvidia" options to /dev/dri/card0
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (**) NVIDIA(0): Option "AllowEmptyInitialConfiguration"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (**) NVIDIA(0): Enabling 2D acceleration
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Loading sub module "glxserver_nvidia"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) LoadModule: "glxserver_nvidia"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Loading /usr/lib/x86_64-linux-gnu/xorg/libglxserver_nvidia.so
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: compiled for 4.0.2, module version = 1.0.0
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: Module class: X.Org Server Extension
    Jun 03 01:03:51 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) NVIDIA GLX Module 415.27 Thu Dec 20 17:12:23 CST 2018
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:1:0:0
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(0): DFP-0
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(0): DFP-1
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(0): DFP-2
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(0): DFP-3
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(0): DFP-4
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(0): DFP-5
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(0): DFP-6 (boot)
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(0): DFP-7
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) NVIDIA(0): NVIDIA GPU GeForce GTX 1080 (GP104-A) at PCI:1:0:0 (GPU-0)
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(0): Memory: 8388608 kBytes
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(0): VideoBIOS: 86.04.60.00.f9
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (II) NVIDIA(0): Detected PCI Express Link width: 16X
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0): DFP-0: disconnected
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0): DFP-0: 330.0 MHz maximum pixel clock
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0):
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0): DFP-1: disconnected
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0): DFP-1: Internal TMDS
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0): DFP-1: 165.0 MHz maximum pixel clock
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0):
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0): DFP-2: disconnected
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0): DFP-2: Internal DisplayPort
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0): DFP-2: 1440.0 MHz maximum pixel clock
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0):
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0): DFP-3: disconnected
    Jun 03 01:03:52 noki /usr/lib/gdm3/gdm-x-session[1120]: (--) NVIDIA(GPU-0): DFP-3: Internal TMDS

    What should I do?


    --------------------------------------------------------------------------------------------------------------------------------------------------

    I switched between root and user noki several times, and that might have caused more problems...

    sudo systemctl start slurmctld

    doesn't work either.
    Last edited by noki-lee; June 2nd, 2019 at 06:55 PM.
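The relevant line in the output above is "fatal: Unable to determine this slurmd's NodeName"; the gdm-x-session lines from journalctl are unrelated X server messages. This error commonly appears when no NodeName entry in slurm.conf matches the short host name of the machine (what `hostname -s` prints). The sketch below checks for that mismatch. The function names are made up for illustration, and it only handles plain names, not bracket ranges such as node[1-4].

```shell
#!/bin/bash
# Print every NodeName declared in a slurm.conf-style file, one per line.
# Commented-out lines are skipped; host ranges are not expanded.
list_node_names() {
    awk '/^[[:space:]]*NodeName=/ {
        split($1, kv, "=")   # first field looks like NodeName=<name>
        print kv[2]
    }' "$1"
}

# Succeed if the given host name is exactly one of the NodeName entries.
node_name_matches() {
    local conf="$1" host="$2"
    list_node_names "$conf" | grep -Fxq -- "$host"
}
```

A possible way to use it: `node_name_matches /etc/slurm-llnl/slurm.conf "$(hostname -s)" || echo "no NodeName entry for $(hostname -s)"`. If the message is printed, changing ControlMachine, NodeName and the Nodes= value of the partition to the name reported by `hostname -s` and restarting both services is the usual fix.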

  4. #4
    Join Date: Feb 2020
    Beans: 1

    Re: Run slurm in 18.04

    I have installed slurm according to your guidelines, and I can run jobs on my own node. Now I want to build a slurm cluster together with other nodes that have slurm installed the same way. What should I do?
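As a general outline only, not a tested recipe for this setup: every node needs the same /etc/munge/munge.key and the same slurm.conf, all host names must resolve from every node, slurmctld runs only on the control machine, and slurmd runs on each compute node. The fragment below shows the lines that would change relative to the configuration in the first post. The host names head01, node01 and node02 and the cluster name are hypothetical, and the CPU counts are placeholders.

```
# slurm.conf, identical on every node (only the changed lines are shown)
# head01, node01, node02 and mycluster are hypothetical names
ControlMachine=head01
ClusterName=mycluster
NodeName=node01 CPUs=4 State=UNKNOWN
NodeName=node02 CPUs=4 State=UNKNOWN
PartitionName=long Nodes=node01,node02 Default=YES MaxTime=INFINITE State=UP
```

After copying the files, restart munge and slurmd on each compute node and slurmctld on the control machine, then check with `sinfo` that all nodes show up as idle.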

  5. #5
    Join Date
    Jul 2008
    Location
    The Left Coast of the USA
    Beans
    Hidden!
    Distro
    Kubuntu

    Re: Run slurm in 18.04

    @pmko

    This thread is going on a year old. It is unlikely that you will get an answer by hijacking it.

    Please start a new thread of your own. You may reference this thread in your new one.

    Closed. Rest In Peace.
