
Thread: Sed: remove all but last consecutive matching line?

  1. #1
    Join Date
    Oct 2006
    Location
    Slovakia
    Beans
    590
    Distro
    Xubuntu 11.10 Oneiric Ocelot

    Sed: remove all but last consecutive matching line?

    Hello,

    I've got this problem: Whenever two or more lines matching "@@@@@" occur consecutively, I would like to remove all but the last one. Example file:

    Code:
    @@@@@ One
    Another line
    
    @@@@@ One
    @@@@@ Two
    @@@@@ Three
    @@@@@ Four
    Yet another line
    Desired outcome:

    Code:
    @@@@@ One
    Another line
    
    @@@@@ Four
    Yet another line
    I'd like to solve this in sed. Thanks very much in advance for any suggestion!
    Last edited by mahy; April 11th, 2011 at 08:58 AM. Reason: more precise example
    בראשית ברא אלהים את השמים ואת הארץ׃

  2. #2
    Join Date
    Sep 2010
    Beans
    62

    Re: Sed: remove all but last consecutive matching line?

    Why sed only?
    Code:
    $ ruby -00 -ne '$_.gsub!(/^.*(@@@@@)/m,"\\1"); puts $_' file
    @@@@@ One
    Another line
    
    @@@@@ Four
    Yet another line
    Or even awk:

    Code:
    $ awk 'BEGIN{RS="";ORS="\n\n"}{ gsub(/.*@@@@@/,"@@@@@") }1'  file
    Last edited by kurum!; April 11th, 2011 at 09:22 AM.

  3. #3
    Join Date
    Oct 2006
    Location
    Slovakia
    Beans
    590
    Distro
    Xubuntu 11.10 Oneiric Ocelot

    Re: Sed: remove all but last consecutive matching line?

    Unfortunately, we don't have Ruby installed on our company servers, and, AFAIK, sed is generally faster than awk. The awk solution looks good, though; I'll have to try it out. Thanks.
    בראשית ברא אלהים את השמים ואת הארץ׃

  4. #4
    Join Date
    Oct 2006
    Location
    Slovakia
    Beans
    590
    Distro
    Xubuntu 11.10 Oneiric Ocelot

    Re: Sed: remove all but last consecutive matching line?

    I tried the AWK script and it has some problems:

    1.) All characters before "@@@@@" in every matching line get deleted (my example did not include any such characters).
    2.) There is also a special line containing more than five @'s; that one should not be matched.
    3.) All lines after the last line matching @@@@@ in the whole input file get deleted.
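
    The three points above can be sketched as a line-based awk filter instead of a paragraph-mode substitution. This is only a sketch under assumptions not confirmed in the thread: a "matching" line is taken to be one that contains a run of exactly five @ characters anywhere in the line, and the function name keep_last_marker is made up for illustration.

```shell
# Hypothetical helper: keep only the last line of each run of consecutive
# marker lines. Assumption: a marker is a run of exactly five @ characters
# anywhere in the line; longer runs of @ do not count.
keep_last_marker() {
    awk '
    # Pad with spaces so a run at the start or end of the line is still
    # surrounded by a non-@ character on both sides.
    function is_marker(s) { return (" " s " ") ~ /[^@]@@@@@[^@]/ }
    {
        # Drop the previous line only when it and the current line are both markers.
        if (NR > 1 && !(is_marker(prev) && is_marker($0))) print prev
        prev = $0
    }
    # The final line is always kept, so nothing after the last marker is lost.
    END { if (NR > 0) print prev }
    ' "$@"
}

# Example: text before the marker is preserved, and the 7-@ line is left alone.
printf '%s\n' 'x @@@@@ One' 'y @@@@@ Two' '@@@@@@@ special' 'tail line' | keep_last_marker
```

    Because whole lines are printed or skipped rather than substituted, text before the marker is never touched. A line that contains both a five-@ run and a longer run would still count as a marker under this assumption.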
    בראשית ברא אלהים את השמים ואת הארץ׃

  5. #5
    Join Date
    Sep 2010
    Beans
    62

    Re: Sed: remove all but last consecutive matching line?

    Originally posted by mahy:
    AFAIK, sed is generally faster than awk.
    That's so not true.

  6. #6
    Join Date
    Sep 2010
    Beans
    62

    Re: Sed: remove all but last consecutive matching line?

    Originally posted by mahy:
    I tried the AWK script and it has some problems:

    1.) All characters before "@@@@@" in every matching line get deleted (my example did not include any such characters).
    2.) There is also a special line containing more than five @'s; that one should not be matched.
    3.) All lines after the last line matching @@@@@ in the whole input file get deleted.
    The awk script does not have a problem. The problem lies with your examples. I fail to see how my awk command could be expected to work when the examples you provided differ from what you have now described. Does it not occur to you that you should provide clear examples with your question, so that people do not waste time giving you answers that do not meet your requirements?

    To show you that it does work with your initial example:
    Code:
    $ cat file
    @@@@@ One
    Another line
    
    @@@@@ One
    @@@@@ Two
    @@@@@ Three
    @@@@@ Four
    Yet another line
    
    $ awk 'BEGIN{RS="";ORS="\n\n"}{ gsub(/.*@@@@@/,"@@@@@") }1'  file
    @@@@@ One
    Another line
    
    @@@@@ Four
    Yet another line

  7. #7
    Join Date
    Oct 2006
    Location
    Slovakia
    Beans
    590
    Distro
    Xubuntu 11.10 Oneiric Ocelot

    Re: Sed: remove all but last consecutive matching line?

    I'm sorry for wasting your time. However, I couldn't post the actual file, because it is confidential company material. I tried to make the example both simple and informative, but I forgot to take all the cases into consideration. Sorry for that.

    As for the sed vs. awk thing, I can't really comment on it. My colleagues told me to prefer sed, that's all.
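
    For the sed-only route, the simplified example from the first post can be handled with the usual N/P/D sliding-window idiom. This is a sketch rather than a tested answer for the real file: it assumes GNU sed, assumes the marker sits at the start of the line, and does not treat the line with more than five @'s specially. The function name last_of_run is made up for illustration.

```shell
# Hypothetical helper: drop every marker line that is directly followed by
# another marker line. Assumption: markers start at the beginning of the line.
last_of_run() {
    # $!N  append the next input line to the pattern space (not on the last line)
    # P    print the first line of the pair, unless both lines start with @@@@@
    # D    delete that first line and restart the cycle with the remainder
    sed '$!N; /^@@@@@.*\n@@@@@/!P; D' "$@"
}

# The example file from the first post, fed through the filter.
printf '%s\n' '@@@@@ One' 'Another line' '' '@@@@@ One' '@@@@@ Two' \
    '@@@@@ Three' '@@@@@ Four' 'Yet another line' | last_of_run
```

    The pattern space never holds more than two lines, so memory use stays constant regardless of how long a run of marker lines is.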
    בראשית ברא אלהים את השמים ואת הארץ׃
