Thread: How to delete all instances of "<a href=(maybe a no/ letter/ sign)</a>" from a file.

  1. #1
    Join Date
    Jul 2012
    Beans
    31

    How to delete all instances of "<a href=(maybe a no/ letter/ sign)</a>" from a file.

    Hello folks,
    Once again I have hundreds of large files from which I want to delete all internal and external links along with their text. Any idea how I can do this with sed or any other utility, on the command line or in a script?

    Basically, I have to delete
    Code:
    <a href=(maybe a no/ letter/ sign)</a>
    The bracketed portion varies a lot, so I am really looking for a flexible wildcard pattern.

    Thanks for your time.

  2. #2
    Join Date
    May 2009
    Location
    Courtenay, BC, Canada
    Beans
    1,661

    Code:
    sed -ie 's|<[a-zA-Z] href.*\/[a-zA-Z]>||' fileOrFiles

  3. #3
    Join Date
    Jul 2012
    Beans
    31

    I think I made a small mistake, sorry. The actual string patterns are
    Code:
    <a href="#[1-9]|[1-4][0-9]|50">See [1-9]|[1-4][0-9]|50</a>
    and
    Code:
    <a href="#[1-9]|[1-4][0-9]|50">See [1-9]|[1-4][0-9]|50</a> and <a href="#[1-9]|[1-4][0-9]|50"> [1-9]|[1-4][0-9]|50</a>
    sed has to delete both the first form and the second one.

    Thanks, guys, for your time and help.

  4. #4
    Join Date
    May 2009
    Location
    Courtenay, BC, Canada
    Beans
    1,661

    Originally posted by HiImTye:
    Code:
    sed -ie 's|<[a-zA-Z] href.*\/[a-zA-Z]>||' fileOrFiles
    will match, for example:
    Code:
    <a href=blah blah junk></a>
    <Z hrefblahblahblah/q>
    etc

  5. #5
    Join Date
    Jul 2012
    Beans
    31

    I suspect the "e" in the command is causing trouble. Besides that, your code does remove some <a href= ...../> links, but it also deletes the rest of the line after the link anchor, and it does not remove every instance of <a href=**>.
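
Both symptoms described here can be reproduced on a small made-up line. The sketch below assumes GNU sed; the sample text, the temporary directory, and the file name page.html are invented for illustration. It shows that the greedy .* in the post #2 pattern swallows everything between the first link and the last closing tag on a line, and that -ie is read as -i with the backup suffix "e", so a copy of each file is left behind with an extra "e" on its name.

```shell
# Sample line with two links and ordinary text between them (made up).
line='keep <a href="#1">See 1</a> middle <a href="#2">See 2</a> end'

# Pattern from post #2: .* is greedy, so one match runs from the first
# "<a href" to the last "/a>" on the line and takes "middle" with it.
greedy=$(printf '%s\n' "$line" | sed -e 's|<[a-zA-Z] href.*\/[a-zA-Z]>||')
printf '%s\n' "$greedy"

# -ie means "edit in place, keep a backup with suffix e" (not -i plus -e).
tmp=$(mktemp -d)
printf '%s\n' "$line" > "$tmp/page.html"
sed -ie 's|<[a-zA-Z] href.*\/[a-zA-Z]>||' "$tmp/page.html"
files=$(ls "$tmp")   # lists the edited file and the backup copy
printf '%s\n' "$files"
rm -rf "$tmp"
```

Writing the options separately, as in sed -i -e '...', avoids the stray backup files.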

  6. #6
    Join Date
    May 2009
    Location
    Courtenay, BC, Canada
    Beans
    1,661

    I didn't understand your last comment. Give some specific examples of what you need that isn't being done, or that you need done differently, and I can give you some examples of how to achieve them.

  7. #7
    Join Date
    Mar 2008
    Beans
    1,219

    To catch and strip only <a></a> tags, I'd suggest:
    Code:
    sed -e 's|</\?a[^<>]*>||g' < infile > outfile
    Should work in most situations.
    Last edited by prodigy_; May 30th, 2013 at 10:44 AM.
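
As a quick check, the command from post #7 can be run on a line shaped like the patterns given in post #3. The sample text is made up, and \? is a GNU sed extension in basic regular expressions, so this sketch assumes GNU sed.

```shell
# Made-up line with the two link shapes from post #3.
sample='Intro <a href="#12">See 12</a> and <a href="#7"> 7</a> done.'

# Strip opening and closing <a> tags; the text between them stays.
stripped=$(printf '%s\n' "$sample" | sed -e 's|</\?a[^<>]*>||g')
printf '%s\n' "$stripped"
# prints: Intro See 12 and  7 done.
```

Only the tags are removed here; the link text ("See 12", " 7") is kept.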

  8. #8
    Join Date
    Jul 2012
    Beans
    31

    Yes, it is working nicely. Could you please explain the code a little, so I can use it in an informed way?

    Thanks for your time and help.

  9. #9
    Join Date
    Mar 2008
    Beans
    1,219

    Well, the s command tells sed we're doing pattern replacement (substitution). The | characters separate the pattern to match, the substitution string (which is empty in this case) and the flags. Finally, the g flag tells sed to do more than one substitution per line if necessary.

    </\?a[^<>]*> is the pattern we're looking for. The outer < and > are literals. /\? matches a slash or nothing. a is a literal. [^<>]* matches zero or more occurrences of anything that isn't < or > (regular expressions are greedy, so with a plain .* in its place sed would remove a lot more than we wanted).

    If you want to learn sed, start here (nice tutorial with lots of examples):
    http://www.grymoire.com/Unix/Sed.html
    Last edited by prodigy_; May 31st, 2013 at 09:53 AM.
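
The first post asked for the links to go together with their text, while the pattern explained above removes only the tags. It also matches any tag whose name merely starts with "a", such as <abbr>. A possible variant is sketched below, assuming sed supports -E (extended regular expressions, available in GNU and BSD sed) and that each link sits on one line with no nested tags inside it. The sample line is made up.

```shell
# Made-up line: one link plus an <abbr> element that should survive.
sample='Intro <a href="#12">See 12</a> and <abbr title="x">HTML</abbr> done.'

# <a, then optionally a space and attributes, then >, then link text
# containing no "<", then the closing </a>. The whole element is deleted.
no_links=$(printf '%s\n' "$sample" | sed -E 's|<a( [^<>]*)?>[^<]*</a>||g')
printf '%s\n' "$no_links"

# For comparison, the tags-only pattern also strips the <abbr> tags.
tags_only=$(printf '%s\n' "$sample" | sed -e 's|</\?a[^<>]*>||g')
printf '%s\n' "$tags_only"
```

Links that span several lines or contain other tags (for example <b>) are not matched by this variant; a real HTML parser is a safer choice for those cases.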
