Thread: shell script to delete before/after a pattern

  1. #1
    Join Date
    Aug 2009
    Beans
    105

    shell script to delete before/after a pattern

    I need to delete n lines above a PATTERN and m lines below it.
    I tried using sed:
    Code:
    sed '/PATTERN/,+12d' old.xml> new.xm
    But this only deletes the matching line and the 12 lines after it; I also need to delete lines before the pattern.

    Please suggest

  2. #2
    Join Date
    Feb 2007
    Location
    Romania
    Beans
    Hidden!
    Distro
    Ubuntu Development Release

    Re: shell script to delete before/after a pattern

    Sounds like your homework...

    Anyway, I'd use ed: http://wiki.bash-hackers.org/howto/edit-ed

    Something like:
    Code:
    ed -s file <<< $'g/PATTERN/-X,.d\n,p'
    ed -s file <<< $'g/PATTERN/.,+Xd\n,p'
    I will let you figure out how to combine the commands in a single one and how to write to the file directly instead of printing the result to stdout.

  3. #3
    Join Date
    Aug 2009
    Beans
    105

    Re: shell script to delete before/after a pattern

    Well, I am too grown up for homework, but yes, it's related to my work. I have to analyze around 100,000 XML files and need to delete a specific set of lines from them, so I need a script.
    I have used sed to delete lines after a pattern, but couldn't find anything useful for deleting n lines before a pattern.

  4. #4
    Join Date
    May 2007
    Location
    USA
    Beans
    318
    Distro
    Kubuntu 8.04 Hardy Heron

    Re: shell script to delete before/after a pattern

    I whipped up an AWK program which I think does what you want. No guarantees though.

    Code:
    #! /usr/bin/awk -f
    
    # Deletes n lines before pattern and m lines after.
    # It is your own responsibility to evaluate the fitness of this program.
    
    BEGIN {
        # set the following initializers as you please
        # FS=" "
        # OFS=" "
        n=5
        m=3
    }
    
    /line 11/ { # insert your desired regex "pattern" between the slashes
        pattern_line=NR
        filename=FILENAME
        exit
    }
    
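    END {
        # Second pass: re-read the file and print only the lines outside
        # the deleted window. The "> 0" test matters: getline returns -1 on
        # error (for example when nothing matched and filename is empty),
        # and a bare while (getline < filename) would then loop forever.
        while ((getline < filename) > 0) {
            ++line
            if (line < pattern_line - n) print
            if (line == pattern_line)    print
            if (line > pattern_line + m) print
        }
        close(filename)
    }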
    Here's the input file I fed it.

    Code:
    $ cat data.xml
    line 1
    line 2
    line 3
    line 4
    line 5
    line 6
    line 7
    line 8
    line 9
    line 10
    line 11
    line 12
    line 13
    line 14
    line 15
    line 16
    line 17
    line 18
    line 19
    line 20
    $
    Here's the output produced.

    Code:
    $ ./del-n-before-m-after.awk data.xml
    line 1
    line 2
    line 3
    line 4
    line 5
    line 11
    line 15
    line 16
    line 17
    line 18
    line 19
    line 20
    $
    See man awk for details.

    If the program doesn't work for you then it would help to know which AWK you have.

    Code:
    awk -Wversion
    Last edited by Telengard C64; December 20th, 2011 at 07:58 AM.

  5. #5
    Join Date
    May 2006
    Beans
    1,787

    Re: shell script to delete before/after a pattern

    Quote Originally Posted by Blackbug View Post
    Well, I am too grown up for homework, but yes, it's related to my work. I have to analyze around 100,000 XML files and need to delete a specific set of lines from them, so I need a script.
    I have used sed to delete lines after a pattern, but couldn't find anything useful for deleting n lines before a pattern.
    Are you sure your XML files are in such a well-defined format that you can use standard text tools on them? It may be better to use XML tools like XSLT.

  6. #6
    Join Date
    Aug 2009
    Beans
    105

    Re: shell script to delete before/after a pattern

    Quote Originally Posted by sisco311 View Post
    Sounds like your homework...

    Anyway, I'd use ed: http://wiki.bash-hackers.org/howto/edit-ed

    Something like:
    Code:
    ed -s file <<< $'g/PATTERN/-X,.d\n,p'
    ed -s file <<< $'g/PATTERN/.,+Xd\n,p'
    I will let you figure out how to combine the commands in a single one and how to write to the file directly instead of printing the result to stdout.

    Thanks for the help. I used ed to delete the lines before the pattern.
    My script is not an ideal way to do things, but it worked, and I removed the necessary elements from 100,000 XML files.

    Code:
     
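    # Usage: script FILE  (writes the result to FILE.final)
    FILENAME=$1
    TEMP_FILE="$FILENAME.temp"
    TEMP_FILE1="$FILENAME.temp1"
    # Insert a marker line "TEMP" before each <PATTERN> line (GNU sed syntax).
    sed '/<PATTERN>/ i\TEMP' "$FILENAME" > "$TEMP_FILE"
    # Delete the marker and the 91 lines before it, then print the buffer.
    ed -s "$TEMP_FILE" <<< $'g/TEMP/-91,.d\n,p' > "$TEMP_FILE1"
    # Delete the <PATTERN> line and the 12 lines after it.
    sed '/<PATTERN>/,+12d' "$TEMP_FILE1" > "$FILENAME.final"
    rm "$TEMP_FILE" "$TEMP_FILE1"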
    I was short of time, so I didn't look for the best way; I will optimize it now.
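
    For reference, the same cut might be done without the marker line and the ed step. This is only a sketch under a few assumptions: it handles just the first match in each file, it removes the matching line as well (as the script above does), and the function name strip_block and its argument order are invented here.

```shell
#!/bin/sh
# strip_block FILE BEFORE AFTER PATTERN   (hypothetical helper)
# Deletes the first line matching PATTERN (a grep basic regex), plus BEFORE
# lines above it and AFTER lines below it, rewriting FILE in place.
strip_block() {
    file=$1 before=$2 after=$3 pattern=$4

    # Line number of the first match; empty if the pattern is absent.
    hit=$(grep -n -e "$pattern" "$file" | head -n 1 | cut -d: -f1)
    [ -n "$hit" ] || return 0          # nothing to delete

    start=$((hit - before))
    [ "$start" -ge 1 ] || start=1      # clamp at the top of the file
    end=$((hit + after))               # sed tolerates an end past the last line

    tmp=$(mktemp) || return 1
    # Copy back with cat (instead of mv) so FILE keeps its permissions.
    sed "${start},${end}d" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

    With the numbers from the script above, a loop such as for f in ./*.xml; do strip_block "$f" 91 12 '<PATTERN>'; done would process a directory. Since the files are rewritten in place, it is worth trying on copies first.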

    Thanks for suggestions
    Last edited by Blackbug; December 20th, 2011 at 10:27 AM.

  7. #7
    Join Date
    Aug 2009
    Beans
    105

    Re: shell script to delete before/after a pattern

    Quote Originally Posted by Telengard C64 View Post
    I whipped up an AWK program which I think does what you want. No guarantees though.

    Thanks for your script; it was really nice and useful, but the XML tags in my files somehow caused errors with it. I didn't have time to solve the issue, so I opted for the workaround posted above.

    Thanks anyway

  8. #8
    Join Date
    Feb 2007
    Location
    Romania
    Beans
    Hidden!
    Distro
    Ubuntu Development Release

    Re: shell script to delete before/after a pattern

    Quote Originally Posted by Arndt View Post
    It may be better to use XML tools like xslt.
    +1

    You can't realistically parse tag-based markup languages like HTML and XML using Bash or utilities such as grep, sed or cut.
