Perl parse tab del file
daveli
December 3rd, 2009, 05:24 PM
Dear experts:
I can't find information on how to remove a header from a tab-delimited text file after reading it in a Perl script. An example would be something like:
```
head -10 mydata.txt
plate_number: 1123479
user: jpt
scanner: 13_A

Obs d1 d2 d3 d4
1 13 12 9 17
2 4 0 -2 15
3 12 9 9 15
4 14 7 9 14
5 9 11 8 14
```
Notice the header: three lines of text followed by a blank line. How can I get the data in the columns into variables so I can do some statistics with them?
Thanks very much for your help!
dave
myrtle1908
December 4th, 2009, 12:06 AM
You can create a hash of arrays where each data row is an array in the hash. There were no tabs in your data, so I assumed a space delimiter.
```perl
use strict;
use warnings;

my %dat;
while (<DATA>) {
    next if $. < 5;              # skip the first 4 lines (3 header lines + the blank line)
    next unless s/^(.*?)\s//;    # grab the row number; I assumed this given it was incremental
    $dat{$1} = [split];          # split the remainder of the line on whitespace into an array
}

# do something with the data, e.g. print it
for my $row (sort keys %dat) {
    print "Row $row: @{$dat{$row}}[0..3]\n";
}

__DATA__
plate_number: 1123479
user: jpt
scanner: 13_A

Obs d1 d2 d3 d4
1 13 12 9 17
2 4 0 -2 15
3 12 9 9 15
4 14 7 9 14
5 9 11 8 14
```
This yields:
```
Row 1: 13 12 9 17
Row 2: 4 0 -2 15
Row 3: 12 9 9 15
Row 4: 14 7 9 14
Row 5: 9 11 8 14
Row Obs: d1 d2 d3 d4
```
You may not want the 'Obs' row in the data, but you weren't specific about this, so I left it in.
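A small follow-up sketch, not from the original post: `sort keys %dat` compares keys as strings, so with ten or more rows '10' would sort before '2', and 'Obs' ends up after the digits. Assuming the real observations are labelled with integers, filtering the keys on `/^\d+$/` drops the 'Obs' header row and `<=>` sorts the rest numerically. The sample rows are embedded in a string here only so the snippet runs on its own.

```perl
use strict;
use warnings;

# Sample rows as they look after the 4-line header has been skipped
# (embedded in a string so the sketch is self-contained).
my $text = <<'END';
Obs d1 d2 d3 d4
1 13 12 9 17
2 4 0 -2 15
3 12 9 9 15
4 14 7 9 14
5 9 11 8 14
END

open my $fh, '<', \$text or die "Cannot open in-memory data: $!";
my %dat;
while (<$fh>) {
    next unless s/^(.*?)\s//;    # same trick as above: key = first field
    $dat{$1} = [split];
}
close $fh;

# Assumption: observations are labelled with integers, so 'Obs' is the
# only non-numeric key; keep numeric keys and sort them as numbers.
my @rows = sort { $a <=> $b } grep { /^\d+$/ } keys %dat;
for my $row (@rows) {
    print "Row $row: @{ $dat{$row} }\n";
}
```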
daveli
December 4th, 2009, 09:03 AM
Thanks a lot, myrtle1908. Nearly there!
So that is a hash of arrays, right? I'll have to read up on that.
I get the idea of removing the header by looking for the line of DATA that has 5 fields or columns (or more, depending on the data). But then you store "rows" in the hash of arrays, not "columns", and columns are what I need: I would like to do some statistics with them, for example mean(d1), or even test whether d1 is statistically different from d4.
By the way, can you explain the substitution you use to get the row number in a bit more detail? The Obs column is incremental this time, but it could be plain text, for example patient names if this were a collection of patient data.
`next unless s/^(.*?)\s//`
I would really appreciate your help here.
Best
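One way to get columns rather than rows is to key the hash on the column names from the "Obs d1 d2 d3 d4" line and push each value onto that column's array. This is a sketch under two assumptions: the header ends at the first blank line, and the line right after it holds the column names. The data is embedded in a string so the example runs standalone; `open my $fh, '<', 'mydata.txt'` would read the real file instead. `split ' '` splits on any run of whitespace, so it handles tabs as well as spaces, and `sum` comes from List::Util, which ships with Perl.

```perl
use strict;
use warnings;
use List::Util qw(sum);

my $text = <<'END';
plate_number: 1123479
user: jpt
scanner: 13_A

Obs d1 d2 d3 d4
1 13 12 9 17
2 4 0 -2 15
3 12 9 9 15
4 14 7 9 14
5 9 11 8 14
END

open my $fh, '<', \$text or die "Cannot open data: $!";

# Skip the metadata lines up to and including the first blank line.
while (my $line = <$fh>) {
    last if $line =~ /^\s*$/;
}

# Assumption: the next line holds the column names.
my $header = <$fh>;
defined $header or die "No column header found\n";
my @names = split ' ', $header;

# Hash of arrays keyed by column name, e.g. $col{d1} = [13, 4, 12, 14, 9].
my %col;
while (my $line = <$fh>) {
    my @fields = split ' ', $line;
    next unless @fields == @names;    # skip lines whose field count doesn't match (e.g. blank lines)
    push @{ $col{ $names[$_] } }, $fields[$_] for 0 .. $#names;
}
close $fh;

# Mean of each data column (the first column, Obs, holds labels, so skip it).
my %mean;
for my $name (@names[1 .. $#names]) {
    my @values = @{ $col{$name} };
    $mean{$name} = sum(@values) / @values;
    printf "mean(%s) = %.2f\n", $name, $mean{$name};
}
```

With the sample data this gives mean(d1) = 10.40 and mean(d4) = 15.00; each column is now an ordinary array, so it can be handed to any other statistics routine.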
ghostdog74
December 4th, 2009, 11:07 AM
Since your header always comes before the first blank line:
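```perl
#!/usr/bin/perl
use strict;
use warnings;

my $in_body = 0;    # becomes true once the first blank line has been seen
while (<>) {
    if (/^\s*$/) { $in_body = 1; next }    # a blank line (also with \r\n endings) ends the header
    print if $in_body;                     # print every non-blank line after it
}
```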
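A related trick, not part of the reply above: setting Perl's input record separator `$/` to the empty string switches on "paragraph mode", in which a single read returns everything up to the next blank line. That pulls in the whole header at once, so the plate_number, user and scanner values can be kept in a hash instead of being discarded. A sketch, assuming every header line has the form `key: value`; again the data sits in a string to keep the example self-contained.

```perl
use strict;
use warnings;

my $text = <<'END';
plate_number: 1123479
user: jpt
scanner: 13_A

Obs d1 d2 d3 d4
1 13 12 9 17
2 4 0 -2 15
END

open my $fh, '<', \$text or die "Cannot open data: $!";

# Paragraph mode: one read returns everything up to the first blank line.
my $header_block = do { local $/ = ''; <$fh> };

# Assumption: each header line looks like "key: value".
my %meta = $header_block =~ /^(\w+):\s*(.*?)\s*$/mg;

# $/ is back to "\n" here, so the rest is read line by line as usual.
my @data_lines = <$fh>;
close $fh;

print "$_ = $meta{$_}\n" for sort keys %meta;
print "First data line: $data_lines[0]";
```

Because `local` restores `$/` when the `do` block ends, the column-name line and the data lines afterwards are read normally.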
myrtle1908
December 4th, 2009, 12:45 PM
> I would like to do some statistics with the columns, for example mean(d1) or even see if d1 is statistically different from d4.
Given that your data is basic tabular/CSV, I would use the DBD::CSV module, which lets you query the data with SQL. I'm fairly certain the Perl SQL::Statement module supports some math functions, although it has been a while since I got my hands dirty with it.
http://search.cpan.org/~hmbrand/DBD-CSV-0.26/lib/DBD/CSV.pm
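As a plain-Perl alternative to the SQL route (a different technique from the DBD::CSV suggestion, needing no extra modules), the sketch below answers "is d1 different from d4?" with a paired t statistic. It assumes d1 and d4 in the same row are two measurements on the same observation; if they were independent samples, a two-sample test (e.g. Welch's) would be the appropriate choice instead. The two arrays are typed in from the sample data, and turning t into a p-value still needs the t distribution with n - 1 degrees of freedom (a printed table, or a CPAN module such as Statistics::Distributions).

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Columns d1 and d4 from the sample data (e.g. taken from a hash of column arrays).
my @d1 = (13, 4, 12, 14, 9);
my @d4 = (17, 15, 15, 14, 14);

# Paired t statistic: t = mean(diff) / (sd(diff) / sqrt(n)), with n - 1 degrees of freedom.
sub paired_t {
    my ($x, $y) = @_;
    my $n = @$x;
    die "Columns must have the same length (at least 2)\n" unless $n == @$y && $n > 1;
    my @diff = map { $x->[$_] - $y->[$_] } 0 .. $n - 1;
    my $mean = sum(@diff) / $n;
    my $var  = sum(map { ($_ - $mean) ** 2 } @diff) / ($n - 1);    # sample variance
    die "All differences are identical; t is undefined\n" if $var == 0;
    return ($mean / sqrt($var / $n), $n - 1);
}

my ($t, $df) = paired_t(\@d1, \@d4);
printf "t = %.3f with %d degrees of freedom\n", $t, $df;    # prints: t = -2.548 with 4 degrees of freedom
```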
> By the way, can you explain a bit further the substitution you make to get the row number? The Obs column is incremental this time, but it could be plain text, like names if it was a collection of data for patients for example.
> `next unless s/^(.*?)\s//`
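Here we simply capture (and then remove) everything up to the first whitespace character, and use the captured text ($1) as the hash key, e.g. 'Obs', '1', '2', etc. Because the non-greedy `.*?` stops at the first whitespace, this works just as well for text keys such as patient names, as long as the names themselves contain no spaces.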
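To see what the substitution does with text labels, here is a quick check using hypothetical names (not from the thread). A label without spaces is captured whole; a label containing a space is cut at the first space, and the rest of it ends up among the data fields. For labels like that, a genuinely tab-delimited file split with `split /\t/` would be the safer choice.

```perl
use strict;
use warnings;

# Hypothetical rows keyed by name instead of by number.
my @lines = ("Alice 13 12 9 17\n", "Mary Ann 4 0 -2 15\n");

my %dat;
for my $line (@lines) {
    # The non-greedy (.*?) stops at the first whitespace character; that text
    # is captured in $1 and removed from $line together with the whitespace.
    next unless $line =~ s/^(.*?)\s//;
    $dat{$1} = [ split ' ', $line ];
}

for my $key (sort keys %dat) {
    print "$key => @{ $dat{$key} }\n";
}
```

This prints "Alice => 13 12 9 17" and "Mary => Ann 4 0 -2 15", showing how a space inside a label shifts the columns.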