Hi, I am stuck on this, have spent hours trying to find an answer :/
I am trying to use sed to strip away content in .html files. Sometimes the HTML tags are on their own lines and sometimes they are not, as in the examples below:
Example a.html:
Code:
<html>
<head>
<title>Example A</title>
</head>
<body style="styleA">
<p>Some text
</body>
</html>
Example b.html:
Code:
<html><head><title>Example B</title></head><body style="styleB"><p>Some text</body></html>
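To reproduce this, the two files can be created as in the sketch below. The temporary directory is just an assumption so that nothing existing gets overwritten; the file contents are the two examples above.

```shell
# Work in a throwaway directory (an assumption; any empty directory will do).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# a.html: every tag on its own line.
cat > a.html <<'EOF'
<html>
<head>
<title>Example A</title>
</head>
<body style="styleA">
<p>Some text
</body>
</html>
EOF

# b.html: the whole document on a single line.
printf '%s\n' '<html><head><title>Example B</title></head><body style="styleB"><p>Some text</body></html>' > b.html
```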
With the following sed command I can add a newline after a body tag, regardless of whether the tag is already followed by a newline or not:
Code:
tobbe@virtualbox:~/$ sed 's/<[\/]*body[^>]*>/&\
/g' a.html
<html>
<head>
<title>Example A</title>
</head>
<body style="styleA">

<p>Some text
</body>

</html>
tobbe@virtualbox:~/$
tobbe@virtualbox:~/$
tobbe@virtualbox:~/$ sed 's/<[\/]*body[^>]*>/&\
/g' b.html
<html><head><title>Example B</title></head><body style="styleB">
<p>Some text</body>
</html>
BUT I only want to add the newline when the tag is NOT already followed by a newline. So I tried the same command with the search criteria extended by "NOT a newline", which I thought was '^$', but then nothing is changed at all, i.e.
Code:
tobbe@virtualbox:~/$ sed 's/<[\/]*body[^>]*>^$/&\
/g' b.html
<html><head><title>Example B</title></head><body style="styleB"><p>Some text</body></html>
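One idea I have not verified beyond these two files: as far as I understand, `^` and `$` only anchor a match to the start and end of a line and do not mean "not a newline", and sed removes the trailing newline before it matches anything. If that is right, "not followed by a newline" could be written as "followed by at least one more character on the same line". A minimal sketch under that assumption (the function name `break_after_body` and the temporary directory are made up for illustration):

```shell
# Hypothetical helper wrapping the sed sketch so it can be reused.
break_after_body() {
    # \(.\) demands one more character on the same line after the tag,
    # so a tag that already ends its line does not match and stays as-is.
    # \1 is the tag, \2 is the character that followed it.
    sed 's/\(<\/*body[^>]*>\)\(.\)/\1\
\2/g' "$@"
}

# Recreate the two sample files in a throwaway directory (an assumption).
cd "$(mktemp -d)" || exit 1
printf '%s\n' '<html>' '<head>' '<title>Example A</title>' '</head>' \
    '<body style="styleA">' '<p>Some text' '</body>' '</html>' > a.html
printf '%s\n' '<html><head><title>Example B</title></head><body style="styleB"><p>Some text</body></html>' > b.html

break_after_body a.html   # tags already end their lines, so no change is expected
break_after_body b.html   # a line break is expected after each body tag
```

One limitation of this sketch: because the character after the tag is consumed by the match, two body tags directly next to each other (for example `<body></body>` followed by more text) would only get a line break after the first one.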