Jota37
January 2nd, 2009, 05:37 PM
Hi,
Please excuse a bit of self-promotion for my little program, which I hope might be useful to someone else. If you use it, please let me know about any bugs you find or any improvements you'd suggest.
The problem:
I work in bioinformatics/genomics and keep dealing with large tables of information (e.g. long tabular BLAST reports, which can easily run to hundreds of thousands of rows). Sometimes I want some simple statistics about a certain column, say the average, median, maximum and minimum values, etc. I do almost all my work on the command line (often on a remote computer), so I'd rather stay there. Besides, firing up a spreadsheet program and opening a file just to do this takes quite some time, and the spreadsheet program almost always has a low limit on the number of rows it can handle.
So, I thought, there must already be a command-line program on *nix systems that does this, right? Use "cut -f" to get the column you want, then pipe the output to a program that gives you, say, the average or whatever. After a lot of searching with "apropos", I concluded that no such program was included. Web searches didn't turn up anything that simple either: I found lots of libraries and complex, interactive software, but nothing like what I was looking for. If such a tool does exist, please let me know!
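For anyone curious what the "cut -f" step actually does, here is a minimal Python sketch of tab-delimited field extraction. It is only an illustration: the `cut_field` function and the script name in the usage comment are hypothetical and not part of Average, and it approximates just two behaviours of `cut -f` without `-s` (TAB is the default delimiter, and lines containing no delimiter are passed through unchanged).

```python
import sys
from typing import Iterable, Iterator


def cut_field(lines: Iterable[str], field: int) -> Iterator[str]:
    """Yield the 1-based `field` from each tab-separated line, roughly like `cut -f`."""
    if field < 1:
        raise ValueError("fields are numbered from 1, as in cut -f")
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            # Like cut without -s: lines with no delimiter pass through intact.
            yield line
            continue
        parts = line.split("\t")
        # A field past the end of the line comes out as an empty string.
        yield parts[field - 1] if field <= len(parts) else ""


if __name__ == "__main__":
    # Hypothetical usage: python3 cut_sketch.py 3 < table.tsv
    for value in cut_field(sys.stdin, int(sys.argv[1])):
        print(value)
```

In practice the real `cut` is the right tool; the sketch just makes explicit which column ends up on each output line before it is piped onward.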
My solution:
Scratching my own itch, I wrote a simple program to deal with these problems. It's called "average" (I chose that very unimaginative name because nothing by that name existed on our GNU/Linux or other *nix machines here at the uni, so a name conflict is less likely), and it is available on SourceForge (http://sourceforge.net/projects/average/).
Quick description:
Average is a simple and fast command-line Perl utility for calculating basic statistics on a list of numbers (one number per line of input; any non-numerical data is ignored), and it is licensed under the GPL v. 3. It works as a traditional Unix filter and should work on any system where Perl is available. Average can currently calculate the arithmetic mean, standard deviation, median and sum of all values, and show the maximum and minimum values and the total number of items (a.k.a. "n"). The user can specify the number of decimal places to show in the results.
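The statistics listed above are straightforward to compute over a stream of lines. The following is a minimal Python sketch of the same idea, not Average's actual Perl code: the names `parse_numbers`, `summarize` and `report` are hypothetical, and using the sample (n - 1) standard deviation is an assumption, since the description does not say which variant Average reports.

```python
import math
import statistics
import sys
from typing import Iterable


def parse_numbers(lines: Iterable[str]) -> list[float]:
    """Keep lines that parse as a number; silently skip everything else."""
    values = []
    for line in lines:
        try:
            value = float(line.strip())
        except ValueError:
            continue  # non-numerical data is ignored
        if math.isfinite(value):  # float() also accepts "nan"/"inf"; drop them
            values.append(value)
    return values


def summarize(values: list[float]) -> dict[str, float]:
    """Compute n, sum, mean, standard deviation, median, min and max."""
    if not values:
        raise ValueError("no numeric input")
    return {
        "n": len(values),
        "sum": math.fsum(values),
        "mean": statistics.fmean(values),
        # Assumption: sample standard deviation (n - 1 denominator).
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def report(stats: dict[str, float], decimals: int = 2) -> str:
    """Format one statistic per line; the item count stays an integer."""
    return "\n".join(
        f"{name}\t{value}" if isinstance(value, int) else f"{name}\t{value:.{decimals}f}"
        for name, value in stats.items()
    )


if __name__ == "__main__":
    # Hypothetical usage: cut -f 3 table.tsv | python3 stats_sketch.py 4
    places = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    print(report(summarize(parse_numbers(sys.stdin)), places))
```

Reading everything into a list keeps the median simple; a purely streaming version could compute n, sum, mean, min and max in one pass, but an exact median still needs all the values.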
I hope someone out there finds it useful. I use it constantly, and it saves me a lot of time. :-)
Thanks!