Solution option:
Code:
echo "please enter path of file to parse, relative to current directory:"
echo "$(pwd)"
read -r file
cat "$file" | grep "<hdwd>" | sed -e 's_<hdwd>__g' -e 's_</hdwd>__g' -e '/^<entry/d' > "$file.parsed"
cat "$file.parsed"
echo "Parsed file can be found at:"
echo "$file.parsed"
Explanation:
Code:
echo "please enter path of file to parse, relative to current directory:"
echo "$(pwd)"
read -r file
The first line asks the user to type the path to the file.
$(pwd) is not a variable but a command substitution: the shell runs the pwd command and echo prints its output, which is the current working directory.
The last line reads the user's input and stores it in the variable "file".
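As a small illustration of the difference between running pwd and reading a variable, here is a minimal sketch. The names from_command and from_variable are made up for this example and are not part of the script above.

```shell
#!/bin/sh
# $(pwd) runs the pwd command and substitutes whatever it prints.
from_command=$(pwd)
# $PWD is an actual shell variable that the shell keeps up to date.
from_variable=$PWD

echo "command substitution: $from_command"
echo "shell variable:       $from_variable"
```

Both lines should normally show the same directory, which is why either form works for telling the user where relative paths start from.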
Code:
cat "$file" | grep "<hdwd>" | sed -e 's_<hdwd>__g' -e 's_</hdwd>__g' -e '/^<entry/d' > "$file.parsed"
Broken down:
The cat command at the start reads the file and writes its contents to standard output, which the pipe character "|" passes on to the next command.
The grep command then prints (to standard out) every line containing "<hdwd>", and that output is piped again to the next command.
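To see the cat and grep steps on their own, here is a minimal sketch run against a made-up sample file. The sample contents are an assumption for illustration only, since the layout of the real file is not shown in this post. It also shows that grep can read the file directly, which gives the same lines without cat.

```shell
#!/bin/sh
# Hypothetical sample input; the real dictionary file may look different.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<entry id="1">
<hdwd>alpha</hdwd>
<def>first letter</def>
<entry id="2"><hdwd>beta</hdwd>
EOF

# As in the script: cat writes the file to the pipe, grep keeps matching lines.
piped=$(cat "$sample" | grep "<hdwd>")
# grep can also open the file itself, with the same result.
direct=$(grep "<hdwd>" "$sample")

printf '%s\n' "$piped"
rm -f "$sample"
```

Only the two lines containing "<hdwd>" come through; the "<def>" line and the first "<entry" line are dropped.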
Code:
|sed -e 's_<hdwd>__g' -e 's_</hdwd>__g' -e '/^<entry/d'
This is the most difficult part. Sed is used here to do 3 things, in sequence, to each line it receives from grep:
1. <hdwd> gets replaced by nothing (deleted basically)
2. </hdwd> gets replaced by nothing (deleted basically)
3. all remaining lines starting with "<entry" get deleted. I noticed these lines were still there after grep. They get through because they also contain "<hdwd>" somewhere on the same line, and this expression removes them.
The final redirection (the ">" at the end of the line) sends sed's standard output to a new file named after the original file with ".parsed" appended.
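The three sed expressions can be tried in isolation with a minimal sketch like the one below. The input lines are invented for illustration and only stand in for what grep might pass on.

```shell
#!/bin/sh
# Hypothetical lines, standing in for grep's output.
grep_output='<hdwd>alpha</hdwd>
<entry id="2"><hdwd>beta</hdwd>
<hdwd>gamma</hdwd> extra text'

# Each line goes through all three expressions in order:
# strip <hdwd>, strip </hdwd>, then delete the line if it starts with <entry.
result=$(printf '%s\n' "$grep_output" |
  sed -e 's_<hdwd>__g' -e 's_</hdwd>__g' -e '/^<entry/d')

printf '%s\n' "$result"
```

With this input the result is "alpha" and "gamma extra text". Note that the third expression deletes the whole line, so a headword that shares a line with "<entry" (here "beta") is dropped along with it.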
Code:
cat "$file.parsed"
echo "Parsed file can be found at:"
echo "$file.parsed"
This prints the parsed file to the terminal and lets the user know where to find the newly created file.
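For anyone who wants something a bit more defensive, here is one possible variant wrapped in a function. This is only a sketch: the function name parse_headwords is made up for this example, and the check for a missing file is an addition that the script above does not have.

```shell
#!/bin/sh
# parse_headwords is a hypothetical helper name, not part of the original script.
parse_headwords() {
  file=$1
  # Stop early if the path does not point to an existing regular file.
  if [ ! -f "$file" ]; then
    echo "cannot find file: $file" >&2
    return 1
  fi
  # Same grep and sed steps as above; grep reads the file directly,
  # and the quotes keep paths containing spaces in one piece.
  grep "<hdwd>" "$file" |
    sed -e 's_<hdwd>__g' -e 's_</hdwd>__g' -e '/^<entry/d' > "$file.parsed"
  echo "Parsed file can be found at:"
  echo "$file.parsed"
}
```

It could be called as parse_headwords "my dictionary.xml", and the result would be written to "my dictionary.xml.parsed" next to the input file.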
I hope this helps. I will follow up in a new post with a script to parse all files in a directory.
Good luck