
Thread: sed help

  1. #1
    Join Date
    Sep 2008
    Location
    Italy
    Beans
    96
    Distro
    Ubuntu Studio 12.04 Precise Pangolin

    sed help

    I've been playing around with sed, can somebody help figure this out?

    I want to remove all text in an xml file between two <caption> tags. I've used the following command:
    Code:
    sed 's/<caption>[A-Za-z]*<\/caption>/<caption><\/caption>/g' file.xml >file2.xml
    This works, but only if there is one word between the tags. I've tried modifying the command in various ways to remove multiple words, but without success. Do any sed gurus out there have advice?

  2. #2
    Join Date
    Jul 2007
    Beans
    414
    Distro
    Xubuntu 13.04 Raring Ringtail

    Re: sed help

    Quote Originally Posted by AstroLlama View Post
    I've been playing around with sed, can somebody help figure this out?

    I want to remove all text in an xml file between two <caption> tags. I've used the following command:
    Code:
    sed 's/<caption>[A-Za-z]*<\/caption>/<caption><\/caption>/g' file.xml >file2.xml
    This works, but only if there is one word between the tags. I've tried modifying the command to remove multiple words in various ways but to no success. any sed gurus out there have advice?
    I think you need a space inside the [A-Za-z] bracket expression:

    Code:
    sed 's/<caption>[A-Z a-z]*<\/caption>/<caption><\/caption>/g' file.xml >file2.xml

  3. #3
    Join Date
    Jul 2009
    Location
    London
    Beans
    1,480
    Distro
    Ubuntu 10.10 Maverick Meerkat

    Re: sed help

    Hi,
    another possibility would be:
    Code:
    sed 's/<caption>[^<]*<\/caption>/<caption><\/caption>/g'
    which also handles punctuation and other non-alphabetic characters between the caption tags
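
To see how the three bracket expressions suggested so far differ, here is a small sketch. The sample captions and the function names are made up for illustration; only the sed patterns come from the posts above.

```shell
#!/bin/sh
# Hypothetical sample data: a one-word, a multi-word, and a punctuated caption.
sample='<caption>Sunset</caption>
<caption>Two words</caption>
<caption>Rome, 2010!</caption>'

# Letters only (the original attempt).
letters_only() {
    sed 's/<caption>[A-Za-z]*<\/caption>/<caption><\/caption>/g'
}

# Letters and spaces.
letters_and_spaces() {
    sed 's/<caption>[A-Z a-z]*<\/caption>/<caption><\/caption>/g'
}

# Any character except "<".
anything_but_lt() {
    sed 's/<caption>[^<]*<\/caption>/<caption><\/caption>/g'
}

# Run each variant over the same input and print the result.
for f in letters_only letters_and_spaces anything_but_lt; do
    echo "== $f"
    printf '%s\n' "$sample" | "$f"
done
```

On this input the first variant empties only the one-word caption, the second also empties the two-word one, and the third empties all three. None of them copes with a caption that itself contains another tag.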

  4. #4
    Join Date
    May 2008
    Beans
    Hidden!

    Re: sed help

    Quote Originally Posted by AstroLlama View Post
    I've been playing around with sed, can somebody help figure this out?

    I want to remove all text in an xml file between two <caption> tags. I've used the following command:
    Code:
    sed 's/<caption>[A-Za-z]*<\/caption>/<caption><\/caption>/g' file.xml >file2.xml
    This works, but only if there is one word between the tags. I've tried modifying the command to remove multiple words in various ways but to no success. any sed gurus out there have advice?
    This works, too, and shortens the command a bit:
    Code:
    sed 's/\(<caption>\)[^<]*\(<\/caption>\)/\1\2/g' file.xml >file2.xml
    Using [^<]* matches any run of characters other than <, so the match stops at the </caption> tag. Putting <caption> and <\/caption> inside \( and \) captures them as groups, so you only need \1\2 in the replacement part.
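
One place the \( \) groups pay off is when the tag name varies, since the tags no longer have to be spelled out twice. A rough sketch, where empty_tag is a made-up helper name and the tag name is assumed to contain no regex metacharacters:

```shell
#!/bin/sh
# empty_tag is a hypothetical helper, not part of sed.
# $1 is a plain tag name, assumed to contain no regex metacharacters.
empty_tag() {
    # \1 and \2 put the captured opening and closing tags back unchanged.
    sed "s/\(<$1>\)[^<]*\(<\/$1>\)/\1\2/g"
}

# Only the caption is emptied; the title is left alone.
printf '%s\n' '<title>Holiday photos</title><caption>Rome, 2010!</caption>' | empty_tag caption
```

The double quotes let the shell expand $1, while the backslashes still reach sed untouched.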

  5. #5
    Join Date
    May 2008
    Beans
    Hidden!

    Re: sed help

    Quote Originally Posted by DaithiF View Post
    Hi,
    another possibility would be:
    Code:
    sed 's/<caption>[^<]*<\/caption>/<caption><\/caption>/g'
    which will handle cases where there are punctuation or other non-alphabetic characters in the caption tags
    Ah! I had the preview open while I went off and did something else, and when I came back someone had already posted what I had!

    I should start refreshing the post again before hitting submit.

    AND I should start editing my post so as not to post twice in a row...

  6. #6
    Join Date
    Jul 2006
    Location
    somewhere :)
    Beans
    535
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: sed help

    Quote Originally Posted by mobilediesel View Post
    This works, too, and shortens the command a bit:
    Code:
    sed 's/\(<caption>\)[^<]*\(<\/caption>\)/\1\2/g' file.xml >file2.xml
    Using [^<]* will match any character up to the </caption> tag. Putting <caption> and <\/caption> inside ( and ) means you only need \1\2 in the replacement end.
    one thing worth noting is that the delimiter in sed's s command doesn't have to be a '/', so you could instead write it like this:
    Code:
    sed 's:\(<caption>\)[^<]*\(</caption>\):\1\2:g' file.xml > file2.xml
    which is a bit more legible
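
A quick way to convince yourself that the delimiter is only a matter of readability is to run the same substitution both ways. This is just a sketch; the function names and the sample line are invented for the comparison.

```shell
#!/bin/sh
# Same substitution twice: once with "/" (so "/" must be escaped),
# once with "|" as the delimiter (no escaping needed in </caption>).
with_slash() { sed 's/<caption>[^<]*<\/caption>/<caption><\/caption>/g'; }
with_pipe()  { sed 's|<caption>[^<]*</caption>|<caption></caption>|g'; }

line='<caption>Rome, 2010!</caption>'   # hypothetical sample line
printf '%s\n' "$line" | with_slash
printf '%s\n' "$line" | with_pipe
```

Both commands print the same emptied caption. Whichever character you pick as the delimiter has to be escaped if it appears in the pattern or the replacement.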

  7. #7
    Join Date
    May 2008
    Beans
    Hidden!

    Re: sed help

    Quote Originally Posted by howlingmadhowie View Post
    one thing worth noting is that the command separator for sed doesn't have to be a '/', so you could instead write it like this:
    Code:
    sed 's:\(<caption>\)[^<]*\(</caption>\):\1\2:g' file.xml > file2.xml
    which is a bit more legible
    I keep forgetting that for some reason. There are plenty of instances where using a different separator makes sed WAY easier to work with.

  8. #8
    Join Date
    Sep 2006
    Beans
    2,914

    Re: sed help

    Quote Originally Posted by AstroLlama View Post
    I've been playing around with sed, can somebody help figure this out?

    I want to remove all text in an xml file between two <caption> tags. I've used the following command:
    Code:
    sed 's/<caption>[A-Za-z]*<\/caption>/<caption><\/caption>/g' file.xml >file2.xml
    This works, but only if there is one word between the tags. I've tried modifying the command to remove multiple words in various ways but to no success. any sed gurus out there have advice?

    None of the sed solutions provided so far handle tags whose content spans multiple lines.
    So why not ditch sed and use awk instead? (A multi-character RS needs gawk or another awk that supports it.)

    Code:
    # more file
    <tag1>
      i want text here
      <tag2>
        remove text here
      </tag2>
      you want text here too
    </tag1>
    
    $ awk -vRS='</tag2>' '{ gsub(/<tag2>.*/,"")}1' file
    <tag1>
      i want text here
    
    
      you want text here too
    </tag1>
    Best of all, use a proper XML parser if you can.
    Last edited by ghostdog74; July 24th, 2010 at 02:59 AM.
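
If you would rather stay with sed, the multi-line case can be approached by joining lines until the caption is closed and only then substituting. The following is a sketch under several assumptions, not a tested general solution: the input is well formed, captions contain no nested tags, and your sed lets [^<] match a newline inside the pattern space (GNU sed does). The function name and sample input are made up.

```shell
#!/bin/sh
# strip_multiline_captions is a hypothetical helper name.
strip_multiline_captions() {
    # While the pattern space ends inside an unclosed <caption>,
    # append the next input line (N) and check again (ba).
    # Once the caption is closed, empty it as before.
    sed -e ':a' \
        -e '/<caption>[^<]*$/{' \
        -e 'N' \
        -e 'ba' \
        -e '}' \
        -e 's/<caption>[^<]*<\/caption>/<caption><\/caption>/g'
}

# Hypothetical input with a caption spread over two lines.
printf '%s\n' '<photo>' '<caption>line one' 'line two</caption>' '</photo>' |
    strip_multiline_captions
```

Unlike the awk example, this keeps the <caption> tags themselves and removes only what is between them. For anything more complicated than this, a real XML parser is still the safer choice.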

  9. #9
    Join Date
    Sep 2008
    Location
    Italy
    Beans
    96
    Distro
    Ubuntu Studio 12.04 Precise Pangolin

    Re: sed help

    Thanks for the quick replies. Studying these examples has proven to be a great lesson in how to use sed.

    DaithiF
    Quote Originally Posted by DaithiF View Post
    Code:
    sed 's/<caption>[^<]*<\/caption>/<caption><\/caption>/g'
    which will handle cases where there are punctuation or other non-alphabetic characters in the caption tags
    Quote Originally Posted by mobilediesel View Post
    This works, too, and shortens the command a bit:
    Code:
    sed 's/\(<caption>\)[^<]*\(<\/caption>\)/\1\2/g' file.xml >file2.xml
    Using [^<]* will match any character up to the </caption> tag. Putting <caption> and <\/caption> inside ( and ) means you only need \1\2 in the replacement end.
    mobilediesel, though your command suggestion is slightly longer, the use of parentheses and numbered references has other interesting applications that are sure to be helpful when doing more complicated sed work in the future.

    thanks again!
