
Thread: URGENT: UNIX command line to solve the median problem.

  1. #1
    Join Date
    Feb 2009
    Beans
    6

    URGENT: UNIX command line to solve the median problem.

    My input:
    10
    15
    20

    My desired output:
    Median = 15


    From the three figures above, I know that 15 is the median. How can I use the UNIX command line to find it automatically? My senior advised me that I can use the 'awk' command together with an 'if'/'else' condition. I hope a UNIX expert can help me solve this problem and give me the correct command line to use. Thanks a lot for your help.

  2. #2
    Join Date
    Feb 2007
    Location
    Romania
    Beans
    Hidden!

    Re: URGENT: UNIX command line to solve the median problem.

    Homework?

    Yes, you can solve it with awk. You don't need if/else statements.

    http://www.gnu.org/software/gawk/manual/gawk.html

    The arithmetic mean of a list of numbers is the sum of all of the list divided by the number of items in the list.

  3. #3
    Join Date
    Oct 2005
    Location
    Albany, NY
    Beans
    842
    Distro
    Ubuntu

    Re: URGENT: UNIX command line to solve the median problem.

    Quote Originally Posted by sisco311 View Post
    Homework?
    The arithmetic mean of a list of numbers is the sum of all of the list divided by the number of items in the list.

    Although your math is perfect, the OP is looking for the median, not the mean.

    I don't use awk unless I absolutely have to. There are a zillion ways to force a computer to find the median. OpenOffice.org Calc can certainly do this, although it's hardly a unix command line application. It's not overly difficult to write a simple script in python to find the median value, but you do have to be careful. Being lazy, I would not bother writing a script to calculate the median value of a list. Instead, I would use a program like R to calculate the median. Another option is PSPP. Since you specifically asked for "command-line", I'll show you how to do this in R. First, we will need to install it.
    Code:
    sudo apt-get install r-recommended
    You can start R by typing "R" at the command prompt. The following syntax will return the median value of a list.
    Code:
    > options(STERM='iESS', editor='emacsclient')
    > example <- c(10, 15, 25)
    > median(example)
    [1] 15
    Let's break this down. The first line isn't necessary; it just tells you that I use Emacs to interact with R. Lines 2 and 3 are actual R syntax. Line 4 is R telling me that 15 is the median.

    Of course, your example is the easiest possible case, because it has an odd number of elements. Here's another example, with an even number of elements.
    Code:
    > example <- c(10, 15, 25, 17)
    > median(example)
    [1] 16
    The median is a funny thing. Used correctly it _can_ be a useful measure of central tendency, but it can also be misleading. As a policy analyst, I often use it to compare things like income, because it is not affected by outliers as dramatically as the mean is.
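    To illustrate the point about outliers on the command line, here is a minimal sketch that prints both the mean and the median of a list. The helper name mean_and_median and the income figures are made up for illustration; the input is assumed to be one number per line with no blank lines.

```shell
#!/bin/sh
# mean_and_median is a hypothetical helper name, not from the thread.
# Assumes one number per line and no blank lines.
mean_and_median() {
    sort -n "$@" | awk '
        { v[NR] = $1; sum += $1 }      # keep the sorted values, add them up
        END {
            if (NR == 0) exit 1        # no data: nothing to report
            if (NR % 2) med = v[(NR + 1) / 2]
            else        med = (v[NR / 2] + v[NR / 2 + 1]) / 2
            print "mean=" (sum / NR), "median=" med
        }'
}

# Made-up incomes: four ordinary values and one extreme outlier.
printf '%s\n' 30000 32000 35000 38000 1000000 | mean_and_median
# prints: mean=227000 median=35000
```

    The single outlier drags the mean far above every ordinary value, while the median stays in the middle of the group.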

    If this is for a school assignment, I would be curious to learn where / what class. I do enjoy it when professors ask their students to use command-line tools. For extra credit you can sit down tomorrow and show me how to calculate the median in PostgreSQL or MySQL.
    Please Insert Funny Statement Here.

  4. #4
    Join Date
    Mar 2007
    Beans
    268

    Re: URGENT: UNIX command line to solve the median problem.

    ... and I have saved many weeks (months) of work over my life by using the command line extensively. Who wants to open OOcalc or R every time a small job is needed? And especially if one has 100s of files to process, which I often do.

    I must admit I haven't spotted an elegant solution to this problem with awk alone, but as usual one can adopt the unix way and string together small commands. I use csh, and the following works for a data file with an odd number of lines:

    set n=`wc -l input_file` && sort -n input_file | awk '{if(NR==(N+1)/2) print }' N=$n[1]

    First $n is set to the output of wc -l. That is actually "3 input_file" rather than just the number of lines, so we need $n[1] to extract the number 3. (I don't know bash well enough to do the same there, but I presume it is trivial.)
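    For readers on bash or plain sh, a rough equivalent of the csh one-liner might look like the sketch below. The function name median_odd is hypothetical, and like the original it only handles an odd number of lines, one number per line.

```shell
#!/bin/sh
# median_odd is a hypothetical helper name, not from the thread.
# Assumes an ODD number of lines, one number per line.
median_odd() {
    # Feeding the file to wc on stdin makes it print only the count,
    # so there is no filename to strip off as with $n[1] in csh.
    n=$(wc -l < "$1")
    # Sort numerically, then print the middle line.
    sort -n "$1" | awk -v N="$n" 'NR == (N + 1) / 2 { print }'
}

demo=$(mktemp)                      # scratch file for the demonstration
printf '%s\n' 20 10 15 > "$demo"
median_odd "$demo"                  # prints 15
rm -f "$demo"
```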

    I added sort -n in case the data are not in numerical order. If you have to cope with both odd and even number of lines, you will need an if, else construction in awk. Then it starts getting messier, and I would have written a perl/python script for that.
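    The odd/even case described above does not have to get very messy. As a minimal sketch (the function name median is my own, and the input is assumed to be one number per line with no blank lines), awk can store the sorted values in an array and pick the middle one, or average the two middle ones, in an END block:

```shell
#!/bin/sh
# median is a hypothetical helper name, not from the thread.
# Assumes one number per line and no blank lines.
median() {
    # With no file arguments, sort reads standard input.
    sort -n "$@" | awk '
        { v[NR] = $1 }                 # remember every sorted value
        END {
            if (NR == 0) exit 1        # empty input has no median
            if (NR % 2)                # odd count: the middle value
                print v[(NR + 1) / 2]
            else                       # even count: mean of the middle two
                print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

printf '%s\n' 10 15 20 | median        # prints 15
printf '%s\n' 10 15 25 17 | median     # prints 16
```

    It can also be called with a filename, for example: median input_file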

    If you know you have 3 lines and they are already sorted, then it is trivial:

    awk '{if(NR==2) print }' input_file

    (Beat that for speed, R, OOcalc!)

    The above example is slightly messy I admit, but awk is often great for small jobs, e.g. to find the sum of the numbers in 3rd column of a file:

    awk '{ tot += $3 }END{ print tot }' input_file

    To find the max:

    awk 'BEGIN{max=-999e19} {if($3>max) max=$3}END{print max}' input_file

    etc.
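    As a variation on the sum and max examples, here is a sketch that seeds min and max from the first line instead of a sentinel such as -999e19, so it also works when every value is very negative. The helper name col_stats and its argument order are assumptions for illustration.

```shell
#!/bin/sh
# col_stats is a hypothetical helper name, not from the thread.
# Usage: col_stats COLUMN FILE
col_stats() {
    awk -v c="$1" '
        { x = $c + 0 }                 # force the field to a number
        NR == 1 { min = max = x }      # seed from the first line
        { sum += x
          if (x < min) min = x
          if (x > max) max = x }
        END { if (NR) print "sum=" sum, "min=" min, "max=" max }
    ' "$2"
}

demo=$(mktemp)                         # scratch file for the demonstration
printf '%s\n' 'a b -5' 'a b -30' 'a b -20' > "$demo"
col_stats 3 "$demo"                    # prints: sum=-55 min=-30 max=-5
rm -f "$demo"
```

    For the one-column input in the first post, the column would be 1 rather than 3.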
    Last edited by xadder; February 4th, 2009 at 12:33 PM.

  5. #5
    Join Date
    Mar 2007
    Location
    Finland
    Beans
    256
    Distro
    Ubuntu 9.10 Karmic Koala

    Re: URGENT: UNIX command line to solve the median problem.

    Great examples, xadder! I agree with you that the command line saves a lot of time for small jobs like this, even though I do use R for bigger problems.

    Patrick, this thread has exactly what you need: it's a command-line tool written in Perl just for your purpose (the name of the tool is average, but it can also calculate the median):
    http://ubuntuforums.org/showthread.php?t=1028667

  6. #6
    Join Date
    Oct 2005
    Location
    Albany, NY
    Beans
    842
    Distro
    Ubuntu

    Re: URGENT: UNIX command line to solve the median problem.

    R is easy to script and I often use it for small jobs simply because it's easy to use. I prefer to re-use tested code rather than over-think easy problems. I prefer:

    Code:
    median(k)
    over
    Code:
    set n=`wc -l input_file` && sort -n input_file | awk '{if(NR==(N+1)/2) print }' N=$n[1]
    There is also the added advantage that I know R correctly handles lists with an even number of elements. With ad-hoc solutions (solutions not included in a distributed program or library) I would first have to read / understand the code to be sure that it operated correctly.

    If R is simply too much effort, I would also recommend googling for numpy, which can also calculate the median. I didn't think of it last night when I wrote my original reply. Since it is a python library, it might appeal more to people who believe R is too difficult to script (although I don't understand this at all). There is probably something similar for perl but I don't know perl very well.

    EDIT: Regarding the thread referred to by ahmatti, I find it interesting that most of the responses are examples of how to use R to do the same thing. But I want to thank ahmatti for posting that thread here, because I have now learned about the "little r" program, which is designed to help you use R in a bash script. The number of uses for this little trick is incredible.
    Last edited by gunksta; February 4th, 2009 at 11:04 PM.
    Please Insert Funny Statement Here.

  7. #7
    Join Date
    Feb 2009
    Beans
    6

    Re: URGENT: UNIX command line to solve the median problem.

    Quote Originally Posted by xadder View Post
    ... and I have saved many weeks (months) of work over my life by using the command line extensively. Who wants to open OOcalc or R every time a small job is needed? And especially if one has 100s of files to process, which I often do.

    The command lines you gave can't find the median, max, or total sum for me. Do you have any other suggestion? Anyway, thanks for your advice. Have a nice day.

  8. #8
    Join Date
    Feb 2009
    Beans
    6

    Re: URGENT: UNIX command line to solve the median problem.

    Quote Originally Posted by sisco311 View Post
    Homework?

    Yes, you can solve it with awk. You don't need if/else statements.

    http://www.gnu.org/software/gawk/manual/gawk.html

    The arithmetic mean of a list of numbers is the sum of all of the list divided by the number of items in the list.
    Do you have a better way to find the median? Thanks for your advice. I hope to receive your reply soon.

  9. #9
    Join Date
    Feb 2009
    Beans
    6

    Re: URGENT: UNIX command line to solve the median problem.

    Quote Originally Posted by gunksta View Post
    Although your math is perfect, the OP is looking for the median, not the mean.

    gunksta,
    Do you have an easier way to find the median? My teacher only allows us to use the command line or a script to find the median. I would appreciate your advice.

  10. #10
    Join Date
    Feb 2009
    Beans
    6

    Re: URGENT: UNIX command line to solve the median problem.

    Quote Originally Posted by ahmatti View Post
    Great examples, xadder! I agree with you that the command line saves a lot of time for small jobs like this, even though I do use R for bigger problems.

    Patrick, this thread has exactly what you need: it's a command-line tool written in Perl just for your purpose (the name of the tool is average, but it can also calculate the median):
    http://ubuntuforums.org/showthread.php?t=1028667
    Ahmatti,
    Could you type out the Perl script that you mentioned? Sorry, I have just started to learn UNIX and am still very new to this operating system. That's why I feel very stressed about this homework from my lecturer. I hope you can give me a suggestion. Thanks a lot.
