Thread: Script to tune mdadm raid5 (others too, maybe)

  1. #1
    Join Date
    Apr 2010
    Beans
    11
    Distro
    Ubuntu 9.10 Karmic Koala

    Script to tune mdadm raid5 (others too, maybe)

    edit: new version of the script; you now have to make fewer adjustments. It should be pretty safe to run on almost any system. Regards, Alex

    I just wrote a script to improve my mdadm-raid's performance and felt like sharing. Maybe someone else can use it or get some inspiration from it.

    Happy tuning,
    Alex

    PS: Do NOT just run it as-is. Depending on your configuration, it might seriously screw up your system if you don't alter it accordingly!

    Code:
    #!/bin/bash
    ###############################################################################
    #  simple script to set some parameters to increase performance on a mdadm
    # raid5 or raid6. Adjust the ## parameters ##-section to your system!
    #
    #  WARNING: depending on stripesize and the number of devices the array might
    # use QUITE a lot of memory after optimization!
    #
    #  27may2010 by Alexander Peganz
    ###############################################################################
    
    
    ## parameters ##
    MDDEV=md51              # e.g. md51 for /dev/md51
    CHUNKSIZE=1024          # in kb
    BLOCKSIZE=4             # of file system in kb
    NCQ=disable             # disable, enable. anything else keeps current setting
    NCQDEPTH=31             # 31 should work for almost anyone
    FORCECHUNKSIZE=true     # force max sectors kb to chunk size > 512
    DOTUNEFS=false          # run tune2fs, ONLY SET TO true IF YOU USE EXT[34]
    RAIDLEVEL=raid5         # raid5, raid6
    
    
    ## code ##
    # test for privileges
    if [ "$(whoami)" != 'root' ]
    then
      echo $(date): Need to be root >> /data51/smbshare1/#tuneraid.log
      exit 1
    fi
    
    # set number of parity devices
    NUMPARITY=1
    if [[ $RAIDLEVEL == "raid6" ]]
    then
      NUMPARITY=2
    fi
    
    # get all devices
    DEVSTR="`grep \"^$MDDEV : \" /proc/mdstat` eol"
    while \
     [ -z "`expr match \"$DEVSTR\" '\(\<sd[a-z]1\[[12]\?[0-9]\]\((S)\)\? \)'`" ]
    do
      DEVSTR="`echo $DEVSTR|cut -f 2- -d \ `"
    done
    
    # get active devices list and spares list
    DEVS=""
    SPAREDEVS=""
    while [ "$DEVSTR" != "eol" ]; do
      CURDEV="`echo $DEVSTR|cut -f -1 -d \ `"
      if [ -n "`expr match \"$CURDEV\" '\(\<sd[a-z]1\[[12]\?[0-9]\]\((S)\)\)'`" ]
      then
        SPAREDEVS="$SPAREDEVS${CURDEV:2:1}"
      elif [ -n "`expr match \"$CURDEV\" '\(\<sd[a-z]1\[[12]\?[0-9]\]\)'`" ]
      then
        DEVS="$DEVS${CURDEV:2:1}"
      fi
      DEVSTR="`echo $DEVSTR|cut -f 2- -d \ `"
    done
    NUMDEVS=${#DEVS}
    NUMSPAREDEVS=${#SPAREDEVS}
    
    # test if number of devices makes sense
    if [ ${#DEVS} -lt $[1+$NUMPARITY] ]
    then
      echo $(date): Need more devices >> /data51/smbshare1/#tuneraid.log
      exit 1
    fi
    
    # set read ahead
    RASIZE=$[$NUMDEVS*($NUMDEVS-$NUMPARITY)*2*$CHUNKSIZE]   # in 512b blocks
    echo read ahead size per device: $RASIZE blocks \($[$RASIZE/2]kb\)
    MDRASIZE=$[$RASIZE*$NUMDEVS]
    echo read ahead size of array: $MDRASIZE blocks \($[$MDRASIZE/2]kb\)
    blockdev --setra $RASIZE /dev/sd[$DEVS]
    if [ $NUMSPAREDEVS -gt 0 ]                              # skip if there are no spares
    then
      blockdev --setra $RASIZE /dev/sd[$SPAREDEVS]
    fi
    blockdev --setra $MDRASIZE /dev/$MDDEV
    
    # set stripe cache size
    STRCACHESIZE=$[$RASIZE/8]                               # in pages per device
    echo stripe cache size of devices: $STRCACHESIZE pages \($[$STRCACHESIZE*4]kb\)
    echo $STRCACHESIZE > /sys/block/$MDDEV/md/stripe_cache_size
    
    # set max sectors kb
    DEVINDEX=0
    MINMAXHWSECKB=$(cat /sys/block/sd${DEVS:0:1}/queue/max_hw_sectors_kb)
    until [ $DEVINDEX -ge $NUMDEVS ]
    do
      DEVLETTER=${DEVS:$DEVINDEX:1}
      MAXHWSECKB=$(cat /sys/block/sd$DEVLETTER/queue/max_hw_sectors_kb)
      if [ $MAXHWSECKB -lt $MINMAXHWSECKB ]
      then
        MINMAXHWSECKB=$MAXHWSECKB
      fi
      DEVINDEX=$[$DEVINDEX+1]
    done
    if [ $CHUNKSIZE -le $MINMAXHWSECKB ] &&
      ( [ $CHUNKSIZE -le 512 ] || [[ $FORCECHUNKSIZE == "true" ]] )
    then
      echo setting max sectors kb to match chunk size
      DEVINDEX=0
      until [ $DEVINDEX -ge $NUMDEVS ]
      do
        DEVLETTER=${DEVS:$DEVINDEX:1}
        echo $CHUNKSIZE > /sys/block/sd$DEVLETTER/queue/max_sectors_kb
        DEVINDEX=$[$DEVINDEX+1]
      done
      DEVINDEX=0
      until [ $DEVINDEX -ge $NUMSPAREDEVS ]
      do
        DEVLETTER=${SPAREDEVS:$DEVINDEX:1}
        echo $CHUNKSIZE > /sys/block/sd$DEVLETTER/queue/max_sectors_kb
        DEVINDEX=$[$DEVINDEX+1]
      done
    fi
    
    # enable/disable NCQ
    DEVINDEX=0
    if [[ $NCQ == "enable" ]] || [[ $NCQ == "disable" ]]
    then
      if [[ $NCQ == "disable" ]]
      then
        NCQDEPTH=1
      fi
      echo setting NCQ queue depth to $NCQDEPTH
      until [ $DEVINDEX -ge $NUMDEVS ]
      do
        DEVLETTER=${DEVS:$DEVINDEX:1}
        echo $NCQDEPTH > /sys/block/sd$DEVLETTER/device/queue_depth
        DEVINDEX=$[$DEVINDEX+1]
      done
      DEVINDEX=0
      until [ $DEVINDEX -ge $NUMSPAREDEVS ]
      do
        DEVLETTER=${SPAREDEVS:$DEVINDEX:1}
        echo $NCQDEPTH > /sys/block/sd$DEVLETTER/device/queue_depth
        DEVINDEX=$[$DEVINDEX+1]
      done
    fi
    
    # tune2fs
    if [[ $DOTUNEFS == "true" ]]
    then
      STRIDE=$[$CHUNKSIZE/$BLOCKSIZE]
      STRWIDTH=$[$CHUNKSIZE/$BLOCKSIZE*($NUMDEVS-$NUMPARITY)]
      echo setting stride to $STRIDE blocks \(${CHUNKSIZE}kb\)
      echo setting stripe-width to $STRWIDTH blocks \($[$STRWIDTH*$BLOCKSIZE]kb\)
      tune2fs -E stride=$STRIDE,stripe-width=$STRWIDTH /dev/$MDDEV
    fi
    
    # exit
    echo $(date): Success >> /data51/smbshare1/#tuneraid.log
    exit 0
    Last edited by apeganz; June 15th, 2010 at 04:51 PM.
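
    For readers puzzling over the "set max sectors kb" step: max_sectors_kb is the largest single request the block layer will send to a disk, and it cannot exceed the drive's max_hw_sectors_kb. The script only raises it to the chunk size when the chunk fits under the smallest hardware limit, and for chunks above 512 kb only when FORCECHUNKSIZE is true. Below is a minimal sketch of just that decision as a standalone function. The function name want_max_sectors and the sample numbers are made up for illustration and are not part of the original script.

```shell
#!/bin/bash
# Decide whether max_sectors_kb should be set to the chunk size.
# Usage: want_max_sectors <chunk_kb> <smallest_max_hw_sectors_kb> <force: true|false>
# (hypothetical helper; mirrors the condition used in the script above)
want_max_sectors() {
  [ "$1" -le "$2" ] && { [ "$1" -le 512 ] || [ "$3" = true ]; }
}

# Made-up combinations of chunk size, hardware limit and FORCECHUNKSIZE.
for args in "256 1024 false" "1024 32767 false" "1024 32767 true" "1024 512 true"
do
  if want_max_sectors $args
  then
    echo "$args -> set max_sectors_kb to chunk size"
  else
    echo "$args -> keep current max_sectors_kb"
  fi
done
```

    Only the first and third combinations lead to a change; the last one is refused because the chunk is larger than what the hardware accepts.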

  2. #2
    Join Date
    Jan 2006
    Beans
    2

    Re: Script to tune mdadm raid5 (others too, maybe)

    Thanks much for sharing your script. I've run it on my new Arch Linux system and it works great.

    I edited out:


    #blockdev --setra $RASIZE /dev/sd[$SPAREDEVS]


    I did that because I have no spare devices, though I don't think it really mattered in the end anyway.

  3. #3
    Join Date
    Oct 2004
    Beans
    161

    Re: Script to tune mdadm raid5 (others too, maybe)

    Any benchmarks?
    Archlinux / Ubuntu.

  4. #4
    Join Date
    Aug 2007
    Beans
    53

    Re: Script to tune mdadm raid5 (others too, maybe)

    This worked for me with my Seagate 2 TB drives in a RAID 5 array.

    I did have to modify some lines in the script: the lines that expect /dev/sd[a-z]1 had to be changed to /dev/sd[a-z]3 in my case, and then it ran.

    Before:
    Write speed 55 MB/s
    Read speed 115 MB/s

    After
    Write speed is 107 MB/s
    Read speed is 164 MB/s

    A significant improvement!
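
    The partition number does not have to be hard-coded. Here is a rough sketch of a member-device parser that accepts any partition number (or none). The function name md_members and the sample line are invented for illustration, and it only knows about the (S) spare marker, so failed (F) devices would still be listed as active.

```shell
#!/bin/bash
# List the member devices of one md array from /proc/mdstat-style text on stdin.
# Usage: md_members <mddev> <active|spare>
# (hypothetical helper, not part of the original script)
md_members() {
  local mddev=$1 mode=$2
  grep "^$mddev : " |                          # header line of this array only
    tr ' ' '\n' |                              # one word per line
    grep -E '^sd[a-z]+[0-9]*\[[0-9]+\]' |      # sdX or sdXN followed by [slot]
    if [ "$mode" = spare ]; then grep '(S)$'; else grep -v '(S)$'; fi |
    sed 's/\[.*//'                             # strip [slot] and any (S)
}

# Made-up sample line, not taken from a real system.
SAMPLE='md0 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0] sde3[4](S)'

echo active: $(echo "$SAMPLE" | md_members md0 active)
echo spare: $(echo "$SAMPLE" | md_members md0 spare)
```

    On a live system the input would be /proc/mdstat itself, e.g. md_members md0 active < /proc/mdstat.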

  5. #5
    Join Date
    May 2011
    Beans
    10

    Re: Script to tune mdadm raid5 (others too, maybe)

    I came across this post while searching for a way to change the stripe cache size and read ahead values in my RAID 5 array (10.4 desktop).

    A few questions:

    If I put this script in /etc/init.d and then run update-rc.d with the defaults option, will it set these parameters every time the system boots? I've tried setting them manually in rc.local, but that doesn't appear to work. I've read a few posts asking how to make these parameter changes permanent, and there don't seem to be any definitive solutions.

    How does one find out the chunk size and block size of one's array? Would the disk utility provide this info?

    And finally, what is the algorithm for setting these parameters in this script based upon?

    Thanks in advance for any answers to my queries,

    Nick

    PS Also thanks to the OP for sharing this script!
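
    On the chunk size and block size question: /proc/mdstat shows the chunk size on the status line of each array, mdadm --detail prints a "Chunk Size" line, and for ext filesystems dumpe2fs -h reports "Block size". A small sketch of pulling the chunk size out of mdstat-style text follows. The function name md_chunk_kb, the device name md0 and the sample text are assumptions made for the example.

```shell
#!/bin/bash
# Print the chunk size in kb of one md array, given /proc/mdstat-style text on stdin.
# Usage: md_chunk_kb <mddev>
# (hypothetical helper; assumes mdstat reports the chunk as "<N>k chunk")
md_chunk_kb() {
  # Take the array's header line plus the status line after it,
  # then pull the number out of "<N>k chunk".
  grep -A1 "^$1 : " | sed -n 's/.* \([0-9][0-9]*\)k chunk.*/\1/p'
}

# Made-up sample text for illustration.
SAMPLE='md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      5860535808 blocks level 5, 512k chunk, algorithm 2 [4/4] [UUUU]'

echo "chunk size: $(echo "$SAMPLE" | md_chunk_kb md0) kb"

# On a live system (as root; /dev/md0 is an assumed device name):
#   md_chunk_kb md0 < /proc/mdstat
#   mdadm --detail /dev/md0 | grep 'Chunk Size'
#   dumpe2fs -h /dev/md0 | grep 'Block size'     # ext2/3/4 only
```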

  6. #6
    Join Date
    Jul 2011
    Beans
    23

    Re: Script to tune mdadm raid5 (others too, maybe)

    This is a nice script, but it is lacking some documentation.

    1. How did you get to the 'formulas' to calculate the read ahead size for the component drives and the entire array?

      Code:
      RASIZE=$[$NUMDEVS*($NUMDEVS-$NUMPARITY)*2*$CHUNKSIZE]   # in 512b blocks
      echo read ahead size per device: $RASIZE blocks \($[$RASIZE/2]kb\)
      MDRASIZE=$[$RASIZE*$NUMDEVS]
      echo read ahead size of array: $MDRASIZE blocks \($[$MDRASIZE/2]kb\)
      This formula works great for me on a 7-disk RAID-6 array, but I'd like to understand it.

      I guess you wrote 512-byte blocks, because you assume that the logical sector size for all components is 512 bytes?
    2. The stripe cache size formula reduces the read and write performance of my array. I've checked the kernel documentation for MD but it doesn't give much information. For sequential read/write throughput I'm better off setting it to 32768. Are there any obvious disadvantages to setting it to the maximum value?
    3. What is 'set max sectors kb' about and is there a reason to not increase it above 512 kB (since you have a specific FORCECHUNKSIZE variable for that)?
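
    To make the numbers behind questions 1 and 2 concrete, here is the same arithmetic run standalone for a 7-disk RAID-6. The chunk size of 512 kb and the 4 kb page size are assumptions, since neither is stated above. Two facts that are not assumptions: blockdev --setra always counts in 512-byte sectors, whatever the drive's logical sector size is, and the kernel md documentation gives the memory used by the stripe cache as page size * number of member disks * stripe_cache_size.

```shell
#!/bin/bash
# Reproduce the script's arithmetic without touching any device.
NUMDEVS=7        # active member disks
NUMPARITY=2      # raid6
CHUNKSIZE=512    # in kb; assumed, not stated in the post
PAGEKB=4         # assumed page size in kb

# Same formulas as the quoted script. blockdev --setra counts 512-byte sectors.
RASIZE=$((NUMDEVS * (NUMDEVS - NUMPARITY) * 2 * CHUNKSIZE))
MDRASIZE=$((RASIZE * NUMDEVS))
STRCACHESIZE=$((RASIZE / 8))

# Memory held by the stripe cache: page size * member disks * stripe_cache_size.
cache_mem_kb() { echo $(($1 * NUMDEVS * PAGEKB)); }

echo "read ahead per device: $RASIZE sectors ($((RASIZE / 2))kb)"
echo "read ahead of array:   $MDRASIZE sectors ($((MDRASIZE / 2))kb)"
echo "script stripe cache:   $STRCACHESIZE pages ($(cache_mem_kb $STRCACHESIZE)kb of RAM)"
echo "stripe cache at 32768: 32768 pages ($(cache_mem_kb 32768)kb of RAM)"
```

    Under these assumptions the script's formula asks for about 122 MB of RAM for the stripe cache, while 32768 would hold 896 MB, so available memory is the obvious price of going to the maximum.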

  7. #7
    Join Date
    Nov 2007
    Beans
    187

    Re: Script to tune mdadm raid5 (others too, maybe)

    Hi,

    I modified the original script a bit to make it more automatic and more usable for newbies like me.
    There is still some stuff that I don't understand, but I think it's a bit more readable now.

    Use it at your own risk!

    Code:
    #!/bin/bash
    ###############################################################################
    #  simple script to set some parameters to increase performance on a mdadm
    # raid5 or raid6. Adjust the ## parameters ##-section to your system!
    #
    #  WARNING: depending on stripesize and the number of devices the array might
    # use QUITE a lot of memory after optimization!
    #
    #  27may2010 by Alexander Peganz
    #  18/01/2012 Alfonso made the script more verbose 
    ###############################################################################
    
    # use bash -x scriptname.sh to debug
    
    ## parameters ##
    LOGFILE=/tmp/tune_raid.log      # just a log file
    BLOCKSIZE=4                     # of file system in kb
                                    # this is a parameter of mkfs when you created your filesystem
                                    # e.g. mkfs.ext4 -b 4096 -E stride=16,stripe-width=32 /dev/md0
                                    # if you don't know your block size, run dumpe2fs -h /dev/md0 | grep "Block size"
    NCQ=disable                     # disable, enable. anything else keeps current setting
                                    # to be honest I couldn't find any clear and short article
                                    # on NCQ, but I saw it is already disabled on all my disks,
                                    # so I'll keep this option set to disable
    NCQDEPTH=31                     # 31 should work for almost anyone
                                    # you only have to care about this if you want to enable NCQ
    FORCECHUNKSIZE=true             # force max sectors kb to chunk size > 512
                                    # I tried to figure out what this is... but I don't know...
    DOTUNEFS=false                  # run tune2fs, ONLY SET TO true IF YOU USE EXT[34]
    EXECUTE=false                   # if "true", run actual commands. Otherwise only inform you
    
    
    echo check $LOGFILE for messages in case of error.
    
    ## code ##
    # test for privileges
    if [ "$(whoami)" != 'root' ]
    then
      echo $(date): Need to be root >> $LOGFILE
      exit 1
    fi
    
    if [ $EXECUTE == "true" ]
    then
      echo
      echo "************************************************"
      echo "You are about to make real changes to the system"
      echo "************************************************"
      echo
      read -p "Press [Enter] to continue or Ctrl-C to abort"
    fi
    
    # find out which one is your md
    # note that the script only works for one md. If you have more than one,
    # uncomment the MDDEV=md0 line below and set it to the array you want
    MDDEV="`grep '^md' /proc/mdstat | head -1 | awk '{print $1}'`"
    # MDDEV=md0
    
    if [ -z "$MDDEV" ]
    then
      echo $(date): Something wrong, I can\'t find any md >> $LOGFILE
      exit 1
    fi
    
    #
    # find out which RAID level
    #
    RAIDLEVEL="`mdadm --detail /dev/$MDDEV | grep raid | tr " " "\n" | grep raid`"
    if [ -z "$RAIDLEVEL" ]
    then
      echo $(date): Something wrong, I can\'t find which raidlevel you are using >> $LOGFILE
      exit 1
    fi
    if [ $RAIDLEVEL != "raid5" ] && [ $RAIDLEVEL != "raid6" ]
    then
      echo $(date): Something wrong, this script only works for raid5 and raid6 >> $LOGFILE
      exit 1
    fi
    
    #
    # find out your chunk size
    #
    # this expression takes the output of /proc/mdstat
    # then takes the line with chunk
    # then cuts it in many lines
    # then takes the line with chunk
    # then prints the first word
    # then removes all the letters and keeps the number
    # CAREFUL! CHECK WHETHER YOUR /proc/mdstat REPORTS THE CHUNK SIZE IN A DIFFERENT UNIT (e.g. mega instead of kilo)
    # that would BREAK the script
    CHUNKSIZE="`cat /proc/mdstat | grep chunk | tr "," "\n" | grep chunk | awk '{print $1}' | tr -d 'a-z'`"
    
    # set number of parity devices
    NUMPARITY=1
    if [[ $RAIDLEVEL == "raid6" ]]
    then
      NUMPARITY=2
    fi
    
    #
    # get the letter of all NON spare devices from cat /proc/mdstat
    #
    # this expression takes the output of /proc/mdstat
    # then takes the line of our md
    # then changes spaces into new lines
    # then takes only lines starting with sd
    # then takes the lines without (S) at the end of the string (which are the NON spare disks)
    # then take the 3rd character (a for sda1, etc)
    # and then remove new lines to make a single string
    DEVS="`cat /proc/mdstat | grep $MDDEV | tr " " "\n" | grep '^sd' | grep -v \(S\)$ | awk '{print substr($0,3,1)}' | tr -d "\n"`"
    
    #
    # get the letter of all spare devices from cat /proc/mdstat
    #
    # this expression takes the output of /proc/mdstat
    # then takes the line of our md
    # then changes spaces into new lines
    # then takes only lines starting with sd
    # then takes the lines with (S) at the end of the string (which are the spare disks)
    # then take the 3rd character (a for sda1, etc)
    # and then remove new lines to make a single string
    SPAREDEVS="`cat /proc/mdstat | grep $MDDEV | tr " " "\n" | grep '^sd' | grep \(S\)$ | awk '{print substr($0,3,1)}' | tr -d "\n"`"
    
    NUMDEVS=${#DEVS}
    NUMSPAREDEVS=${#SPAREDEVS}
    
    # test if number of devices makes sense
    if [ ${#DEVS} -lt $[1+$NUMPARITY] ]
    then
      echo $(date): Need more devices >> $LOGFILE
      exit 1
    fi
    
    # set read ahead
    RASIZE=$[$NUMDEVS*($NUMDEVS-$NUMPARITY)*2*$CHUNKSIZE]   # in 512b blocks
    echo suggested read ahead size per device: $RASIZE blocks \($[$RASIZE/2]kb\)
    MDRASIZE=$[$RASIZE*$NUMDEVS]
    echo suggested read ahead size of array: $MDRASIZE blocks \($[$MDRASIZE/2]kb\)
    echo RUN blockdev --setra $RASIZE /dev/sd[$DEVS]
    echo your current value for readahead is `blockdev --getra /dev/sd[$DEVS]`
    if [ $EXECUTE == "true" ]
    then
      blockdev --setra $RASIZE /dev/sd[$DEVS]
    fi
    if [ $NUMSPAREDEVS -gt 0 ]
    then
      echo RUN blockdev --setra $RASIZE /dev/sd[$SPAREDEVS]
      echo your current value for readahead is `blockdev --getra /dev/sd[$SPAREDEVS]`
      if [ $EXECUTE == "true" ]
      then
        blockdev --setra $RASIZE /dev/sd[$SPAREDEVS]
      fi
    fi
    echo RUN blockdev --setra $MDRASIZE /dev/$MDDEV
    echo your current value for readahead is `blockdev --getra /dev/$MDDEV`
    if [ $EXECUTE == "true" ]
    then
      blockdev --setra $MDRASIZE /dev/$MDDEV
    fi
    
    # set stripe cache size
    STRCACHESIZE=$[$RASIZE/8]                               # in pages per device
    echo suggested stripe cache size of devices: $STRCACHESIZE pages \($[$STRCACHESIZE*4]kb\)
    echo RUN echo $STRCACHESIZE \> /sys/block/$MDDEV/md/stripe_cache_size
    echo current value of /sys/block/$MDDEV/md/stripe_cache_size is `cat /sys/block/$MDDEV/md/stripe_cache_size`
    if [ $EXECUTE == "true" ]
    then
      echo $STRCACHESIZE > /sys/block/$MDDEV/md/stripe_cache_size
    fi
    
    # set max sectors kb
    DEVINDEX=0
    MINMAXHWSECKB=$(cat /sys/block/sd${DEVS:0:1}/queue/max_hw_sectors_kb)
    until [ $DEVINDEX -ge $NUMDEVS ]
    do
      DEVLETTER=${DEVS:$DEVINDEX:1}
      MAXHWSECKB=$(cat /sys/block/sd$DEVLETTER/queue/max_hw_sectors_kb)
      if [ $MAXHWSECKB -lt $MINMAXHWSECKB ]
      then
        MINMAXHWSECKB=$MAXHWSECKB
      fi
      DEVINDEX=$[$DEVINDEX+1]
    done
    if [ $CHUNKSIZE -le $MINMAXHWSECKB ] &&
      ( [ $CHUNKSIZE -le 512 ] || [[ $FORCECHUNKSIZE == "true" ]] )
    then
      echo setting max sectors kb to match chunk size
      DEVINDEX=0
      until [ $DEVINDEX -ge $NUMDEVS ]
      do
        DEVLETTER=${DEVS:$DEVINDEX:1}
        echo RUN echo $CHUNKSIZE \> /sys/block/sd$DEVLETTER/queue/max_sectors_kb
        echo current value of /sys/block/sd$DEVLETTER/queue/max_sectors_kb is `cat /sys/block/sd$DEVLETTER/queue/max_sectors_kb`
        if [ $EXECUTE == "true" ]
        then
          echo $CHUNKSIZE > /sys/block/sd$DEVLETTER/queue/max_sectors_kb
        fi
        DEVINDEX=$[$DEVINDEX+1]
      done
      DEVINDEX=0
      until [ $DEVINDEX -ge $NUMSPAREDEVS ]
      do
        DEVLETTER=${SPAREDEVS:$DEVINDEX:1}
        echo RUN echo $CHUNKSIZE \> /sys/block/sd$DEVLETTER/queue/max_sectors_kb
        echo current value of /sys/block/sd$DEVLETTER/queue/max_sectors_kb is `cat /sys/block/sd$DEVLETTER/queue/max_sectors_kb`
        if [ $EXECUTE == "true" ]
        then
          echo $CHUNKSIZE > /sys/block/sd$DEVLETTER/queue/max_sectors_kb
        fi
        DEVINDEX=$[$DEVINDEX+1]
      done
    fi
    
    # enable/disable NCQ
    DEVINDEX=0
    if [[ $NCQ == "enable" ]] || [[ $NCQ == "disable" ]]
    then
      if [[ $NCQ == "disable" ]]
      then
        NCQDEPTH=1
      fi
      echo setting NCQ queue depth to $NCQDEPTH
      until [ $DEVINDEX -ge $NUMDEVS ]
      do
        DEVLETTER=${DEVS:$DEVINDEX:1}
        echo RUN echo $NCQDEPTH \> /sys/block/sd$DEVLETTER/device/queue_depth
        echo current value of /sys/block/sd$DEVLETTER/device/queue_depth is `cat /sys/block/sd$DEVLETTER/device/queue_depth`
        if [ $EXECUTE == "true" ]
        then
          echo $NCQDEPTH > /sys/block/sd$DEVLETTER/device/queue_depth
        fi
        DEVINDEX=$[$DEVINDEX+1]
      done
      DEVINDEX=0
      until [ $DEVINDEX -ge $NUMSPAREDEVS ]
      do
        DEVLETTER=${SPAREDEVS:$DEVINDEX:1}
        echo RUN echo $NCQDEPTH \> /sys/block/sd$DEVLETTER/device/queue_depth
        echo current value of /sys/block/sd$DEVLETTER/device/queue_depth is `cat /sys/block/sd$DEVLETTER/device/queue_depth`
        if [ $EXECUTE == "true" ]
        then
          echo $NCQDEPTH > /sys/block/sd$DEVLETTER/device/queue_depth
        fi
        DEVINDEX=$[$DEVINDEX+1]
      done
    fi
    
    # tune2fs
    if [[ $DOTUNEFS == "true" ]]
    then
      STRIDE=$[$CHUNKSIZE/$BLOCKSIZE]
      STRWIDTH=$[$CHUNKSIZE/$BLOCKSIZE*($NUMDEVS-$NUMPARITY)]
      echo setting stride to $STRIDE blocks \($CHUNKSIZE kb\)
      echo setting stripe-width to $STRWIDTH blocks \($[$STRWIDTH*$BLOCKSIZE] kb\)
      echo RUN tune2fs -E stride=$STRIDE,stripe-width=$STRWIDTH /dev/$MDDEV
      echo PLEASE NOTE: RUN tune2fs ONLY if you use ext3 or ext4
      if [ $EXECUTE == "true" ]
      then
        read -p "Press [Enter] to execute tune2fs or Ctrl-C to abort"
        tune2fs -E stride=$STRIDE,stripe-width=$STRWIDTH /dev/$MDDEV
      fi
    fi
    
    if [ $EXECUTE != "true" ]
    then
      echo
      echo PLEASE NOTE: this script did NOTHING! It simply informed you of your current parameters and suggested some changes.
      echo YOU HAVE TO run the suggested commands yourself if you want to apply them, or run this script again with EXECUTE set to true.
    fi
    
    # exit
    echo $(date): Success >> $LOGFILE
    exit 0
    Use it at your own risk!


    EDIT: based on further tests, this tuning seems to degrade performance on my system.
    So in the end I wrote my own script to tune the settings.
    Last edited by alfonso78; January 28th, 2012 at 06:27 PM.
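
    For anyone checking the tune2fs step by hand, this is a minimal sketch of the stride and stripe-width arithmetic the script uses. The function name ext_stripe_opts and both sets of sample values are assumptions for illustration. The first call happens to give the stride=16,stripe-width=32 pair shown in the mkfs.ext4 comment of the script, which is what a 3-disk RAID-5 with 64 kb chunks and 4 kb blocks would produce.

```shell
#!/bin/bash
# Print the ext3/ext4 extended options the script would hand to tune2fs.
# Usage: ext_stripe_opts <chunk_kb> <block_kb> <num_devs> <num_parity>
# (hypothetical helper, not part of the original script)
ext_stripe_opts() {
  local stride=$(($1 / $2))               # filesystem blocks per chunk
  local width=$((stride * ($3 - $4)))     # filesystem blocks per full data stripe
  echo "stride=$stride,stripe-width=$width"
}

ext_stripe_opts 64 4 3 1      # 3-disk raid5, 64k chunk (assumed values)
ext_stripe_opts 512 4 7 2     # 7-disk raid6, 512k chunk (assumed values)
```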

  8. #8
    Join Date
    Sep 2010
    Beans
    14

    Re: Script to tune mdadm raid5 (others too, maybe)

    Hey guys

    Thanks for all the good information. I have used this, along with what I could find elsewhere, to tweak and benchmark my own system.

    My benchmarks are at the link below. I would be interested to see how my system compares to others.

    http://middoraid.blogspot.com.au/2013/01/tweaking.html

    Let me know what you think.

    P.S. I also documented every step of the RAID assembly.
