
Deleting lines from a text file



jamefarm
April 9th, 2010, 04:57 PM
Hello, I have several text files that I need to clean up. Here are a few sample lines from one of them:



HIGHLIGHTS WEBCASTS CONFERENCES SPLITS IPOs ECONOMIC EVENTS
SYMBOL COMPANY EPS ESTIMATE EPS ACTUAL PREV. YEAR ACTUAL Date/Time (ET)
BAC Bank of America Corporation 0.09 0.00 0.44 20100416
EVBS Eastern Virginia Bankshares 0.12 0.00 0.10 20100416
FHN First Horizon National Corporation -0.16 0.00 -0.37 20100416
GCI Gannett 0.41 0.00 0.25 20100416
GE General Electric 0.16 0.00 0.26 20100416
SYMBOL COMPANY EPS ESTIMATE EPS ACTUAL PREV. YEAR ACTUAL Date/Time (ET)
CDL Taurex Resources plc 0.00 0.00 0.00 20100416- 20100416
CDL Taurex Resources plc 0.00 0.00 0.00 20100416- 20100416
ACU Acme United 0.06 0.00 0.01 20100416- 20100416


What I need to do is delete the lines that begin with HIGHLIGHTS or SYMBOL (lines 1, 2, and 8 in the example).

I also need to delete all lines after the second SYMBOL header line, such as lines 9, 10, and 11. These lines end with "<date>- <date>".

Any ideas?

gmargo
April 9th, 2010, 07:28 PM
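Here's a super simple Perl filter (http://en.wikipedia.org/wiki/Filter_%28Unix%29). It reads the files named on the command line (or standard input if none are given) and prints every line except the header lines and the lines ending in a date range: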


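#!/usr/bin/perl
use strict;
use warnings;

# Read lines from the files named on the command line, or from STDIN.
while (<>)
{
    next if /^HIGHLIGHTS/                  # page header line
         || /^SYMBOL/                      # column header line
         || /\s\d{8}\s*-\s*\d{8}\s*\z/;    # line ending in "<date>- <date>"
    print;
}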

diesch
April 9th, 2010, 09:07 PM
Using egrep is shorter here:

egrep -v '(^HIGHLIGHTS|^SYMBOL)|([0-9]+- [0-9]+$)' textfile.txt
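`egrep` is the same as `grep -E` (newer GNU grep releases warn that `egrep` is obsolescent). Because `grep -v` writes to standard output, cleaning several files means redirecting each result to its own output file. A minimal sketch, assuming the inputs are `*.txt` and using a hypothetical `.clean` suffix for the outputs; the sample file is built from lines in the question.

```shell
# Sketch only: scratch directory with a sample built from the question.
cd "$(mktemp -d)" || exit 1
cat > earnings.txt <<'EOF'
HIGHLIGHTS WEBCASTS CONFERENCES SPLITS IPOs ECONOMIC EVENTS
SYMBOL COMPANY EPS ESTIMATE EPS ACTUAL PREV. YEAR ACTUAL Date/Time (ET)
BAC Bank of America Corporation 0.09 0.00 0.44 20100416
FHN First Horizon National Corporation -0.16 0.00 -0.37 20100416
GE General Electric 0.16 0.00 0.26 20100416
SYMBOL COMPANY EPS ESTIMATE EPS ACTUAL PREV. YEAR ACTUAL Date/Time (ET)
CDL Taurex Resources plc 0.00 0.00 0.00 20100416- 20100416
ACU Acme United 0.06 0.00 0.01 20100416- 20100416
EOF

for f in *.txt; do
    # -E: extended regex (same as egrep); -v: keep lines that do NOT match.
    # Each result goes to a new file, e.g. earnings.txt -> earnings.clean.
    grep -E -v '^(HIGHLIGHTS|SYMBOL)|[0-9]+- [0-9]+$' "$f" > "${f%.txt}.clean"
done

cat earnings.clean
```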

drvista
April 9th, 2010, 09:12 PM
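To remove the HIGHLIGHTS line:

sed '/^HIGHLIGHTS/d' textfile.txt

The SYMBOL and date-range lines can be removed the same way with additional /pattern/d expressions.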
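A sketch of a `sed` command that covers all three kinds of unwanted lines, assuming POSIX basic regular expressions (where `\{8\}` means "exactly 8 of the preceding item"). The output goes to a new file (`cleaned.txt` is a hypothetical name) rather than using `-i`, because GNU sed and BSD sed disagree on the syntax of `-i`.

```shell
# Sketch only: scratch directory with a sample built from the question.
cd "$(mktemp -d)" || exit 1
cat > textfile.txt <<'EOF'
HIGHLIGHTS WEBCASTS CONFERENCES SPLITS IPOs ECONOMIC EVENTS
SYMBOL COMPANY EPS ESTIMATE EPS ACTUAL PREV. YEAR ACTUAL Date/Time (ET)
BAC Bank of America Corporation 0.09 0.00 0.44 20100416
FHN First Horizon National Corporation -0.16 0.00 -0.37 20100416
GE General Electric 0.16 0.00 0.26 20100416
SYMBOL COMPANY EPS ESTIMATE EPS ACTUAL PREV. YEAR ACTUAL Date/Time (ET)
CDL Taurex Resources plc 0.00 0.00 0.00 20100416- 20100416
ACU Acme United 0.06 0.00 0.01 20100416- 20100416
EOF

# One delete command per kind of unwanted line.
sed -e '/^HIGHLIGHTS/d' \
    -e '/^SYMBOL/d' \
    -e '/[0-9]\{8\}- [0-9]\{8\}$/d' \
    textfile.txt > cleaned.txt

cat cleaned.txt
```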

Trumpen
April 9th, 2010, 09:19 PM
Here is a fourth way involving awk:


awk '/^SYMBOL/{x++} x<2 && !/^(HIGHLIGHTS|SYMBOL)/' file
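The awk version works by counting SYMBOL header lines in `x`: a line is printed only while fewer than two headers have been seen and the line is not itself a header, so everything from the second header onward is dropped. If several files are given to a single awk call, `x` keeps counting across files, so once the first file's second header is reached nothing from later files is printed. Resetting the counter when `FNR == 1` (the first line of each input file) avoids that. A sketch, assuming the inputs are `*.txt`, that writes one cleaned output per input with a hypothetical `.clean` suffix:

```shell
# Sketch only: scratch directory with two copies of a sample from the question.
cd "$(mktemp -d)" || exit 1
cat > a.txt <<'EOF'
HIGHLIGHTS WEBCASTS CONFERENCES SPLITS IPOs ECONOMIC EVENTS
SYMBOL COMPANY EPS ESTIMATE EPS ACTUAL PREV. YEAR ACTUAL Date/Time (ET)
BAC Bank of America Corporation 0.09 0.00 0.44 20100416
FHN First Horizon National Corporation -0.16 0.00 -0.37 20100416
GE General Electric 0.16 0.00 0.26 20100416
SYMBOL COMPANY EPS ESTIMATE EPS ACTUAL PREV. YEAR ACTUAL Date/Time (ET)
CDL Taurex Resources plc 0.00 0.00 0.00 20100416- 20100416
ACU Acme United 0.06 0.00 0.01 20100416- 20100416
EOF
cp a.txt b.txt

awk '
    # First line of each input file: close the previous output,
    # reset the header counter and choose the new output name.
    FNR == 1 { if (out != "") close(out); x = 0; out = FILENAME ".clean" }
    # Count SYMBOL header lines seen so far in this file.
    /^SYMBOL/ { x++ }
    # Keep data lines that appear before the second SYMBOL header.
    x < 2 && !/^(HIGHLIGHTS|SYMBOL)/ { print > out }
' *.txt

cat a.txt.clean b.txt.clean
```

Each `.clean` file ends up with the BAC, FHN and GE lines; without the `FNR == 1` reset, `b.txt.clean` would never be written at all.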