Deleting lines from a text file

jamefarm

April 9th, 2010, 04:57 PM

Hello, I have several text files that I need to clean up. Here are a few sample lines from one of them:

HIGHLIGHTS WEBCASTS CONFERENCES SPLITS IPOs ECONOMIC EVENTS
SYMBOL COMPANY EPS ESTIMATE EPS ACTUAL PREV. YEAR ACTUAL Date/Time (ET)
BAC Bank of America Corporation 0.09 0.00 0.44 20100416
EVBS Eastern Virginia Bankshares 0.12 0.00 0.10 20100416
FHN First Horizon National Corporation -0.16 0.00 -0.37 20100416
GCI Gannett 0.41 0.00 0.25 20100416
GE General Electric 0.16 0.00 0.26 20100416
SYMBOL COMPANY EPS ESTIMATE EPS ACTUAL PREV. YEAR ACTUAL Date/Time (ET)
CDL Taurex Resources plc 0.00 0.00 0.00 20100416- 20100416
CDL Taurex Resources plc 0.00 0.00 0.00 20100416- 20100416
ACU Acme United 0.06 0.00 0.01 20100416- 20100416

What I need to do is delete the lines that begin with HIGHLIGHTS or SYMBOL (lines 1, 2, and 8 in the example).

I also need to delete every line after the second SYMBOL header line, such as lines 9, 10, and 11. These lines end with "<date>- <date>".

Any ideas?

gmargo

April 9th, 2010, 07:28 PM

Here's a super simple Perl filter (http://en.wikipedia.org/wiki/Filter_%28Unix%29):

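#!/usr/bin/perl
use strict;
use warnings;

# Read lines from the files named on the command line (or STDIN)
# and print only the ones worth keeping.
while (<>)
{
    next if /^HIGHLIGHTS/                    # page-header line
         || /^SYMBOL/                        # column-header line
         || /\s\d{8}\s*-\s*\d{8}\s*\z/s;     # ends in "<date>- <date>"
    print;
}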

diesch

April 9th, 2010, 09:07 PM

Using egrep is shorter here:

egrep -v '(^HIGHLIGHTS|^SYMBOL)|([0-9]+- [0-9]+$)'
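`egrep` is the same as `grep -E`, and newer GNU grep releases warn that `egrep` is obsolescent, so `grep -E` is the safer spelling today. Because grep writes to standard output, cleaning several files means one run per file. A minimal sketch; the function name `clean_copies` and the `.clean` suffix are hypothetical choices:

```shell
# Hypothetical helper: writes a cleaned copy of each file as <name>.clean.
# -E : extended regular expressions (what egrep does)
# -v : print only the lines that do NOT match
clean_copies() {
    for f in "$@"; do
        grep -E -v '^(HIGHLIGHTS|SYMBOL)|[0-9]+- [0-9]+$' "$f" > "$f.clean"
    done
}
```

Note that grep exits with status 1 when it prints nothing, which only matters if a file contains no lines worth keeping.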

drvista

April 9th, 2010, 09:12 PM

sed /HIGHLIGHTS/d textfile.txt
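`/HIGHLIGHTS/d` on its own removes only the HIGHLIGHTS lines. A sketch that applies all three deletion rules from the question, one `d` command per pattern; sed uses POSIX basic regular expressions, so the repeat count is written `\{8\}`. The name `clean_sed` is hypothetical:

```shell
# Hypothetical helper: prints the input with the unwanted lines deleted.
# Reads the named files, or standard input when no files are given.
clean_sed() {
    sed -e '/^HIGHLIGHTS/d' \
        -e '/^SYMBOL/d' \
        -e '/[0-9]\{8\}- [0-9]\{8\}$/d' "$@"
}
```

GNU sed also accepts `-i` to edit the files in place instead of printing the result.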

Trumpen

April 9th, 2010, 09:19 PM

Here is a fourth way involving awk:

awk '/SYMBOL/{x++} x<2 && !/^(HIGHLIGHTS|SYMBOL)/' file
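In the awk version, `/SYMBOL/{x++}` counts the header lines, and once the second one has been seen `x` is 2, so nothing after it is printed. The counter is not reset between files, though, so passing several files at once would drop everything after the first file. A sketch that resets it with `FNR == 1` (the line number within the current file); the name `clean_awk` is hypothetical, and the kept lines of all files are printed one after another:

```shell
# Hypothetical helper: prints the kept lines of every file, in order.
clean_awk() {
    awk 'FNR == 1 { x = 0 }                  # new file: reset the counter
         /SYMBOL/ { x++ }                    # count SYMBOL header lines
         x < 2 && !/^(HIGHLIGHTS|SYMBOL)/    # print lines before 2nd header
        ' "$@"
}
```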