pataphysician
June 2nd, 2010, 12:13 AM
I'm trying to write a shell script that will pull all of the tags from an XML file. I'm finding that my regular expressions are either too greedy or too lazy. A simple example: pretend I want to pull all the text that exists between quotes:
line='"bob," she said, "i have had enough."'
regexa='\"([^"]+)\"'
regexb='\"(.*)\"'
[[ $line =~ $regexa ]] # or b
i=1
n=${#BASH_REMATCH[@]}
while [[ $i -lt $n ]]
do
echo ${BASH_REMATCH[$i]}
let i++
done
If I use regexa, I only get 'bob,' returned. If I use regexb, I get everything between the first and last quote ('bob," she said, "i have had enough.') as one result. What expression can I use to get "bob," as one part and "i have had enough." as another? (Without the quotes.)
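For reference, here is a rough sketch of one way this could be approached. It rests on the assumption that a single [[ =~ ]] test only ever reports the first match, so no one expression will return both parts; the test has to be repeated on whatever is left of the line. The names rest and matches are made up for illustration, and the sketch assumes a bash new enough to support += on arrays.

```shell
#!/usr/bin/env bash
line='"bob," she said, "i have had enough."'
regex='"([^"]+)"'      # same idea as regexa: a quote, non-quote characters, a quote

rest=$line             # hypothetical name: the part of the line not yet scanned
matches=()             # hypothetical name: collected text between quotes

while [[ $rest =~ $regex ]]; do
    # BASH_REMATCH[1] is the captured group, without the surrounding quotes.
    matches+=("${BASH_REMATCH[1]}")
    # Drop everything up to and including the whole match, then test again.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "${matches[@]}"
```

With the sample line this should print bob, on one line and i have had enough. on the next. The expression stays the non-greedy-style regexa; the loop is what moves on to the second pair of quotes.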