Replace Unicode Characters

March 15th, 2008, 12:50 AM

I'm trying to use xsel to feed whatever text is selected into festival to be converted to speech, but some of the text comes back with Unicode characters in it for things festival should be able to read, like the dash in:

(23–27 keV)

$xsel -o
(23\u201327 keV)


$xsel -o -u
(23–27 keV)

Is there any way of replacing these basic Unicode characters with their ASCII equivalents?
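One option skips escape codes entirely. It assumes that xsel -o -u hands back the characters as UTF-8, which the <U2013> that uni2ascii reports further down suggests. sed can then swap the common typographic characters for ASCII look-alikes directly. The function name to_ascii and the particular mapping (en dash to "-", em dash to "--", curly quotes to straight quotes, ellipsis to "...") are illustrative choices, not anything xsel or festival define:

```shell
# to_ascii (hypothetical helper): replace a few common typographic Unicode
# characters in UTF-8 input with plain ASCII stand-ins so festival can read them.
# The mapping is an assumption; add one -e expression per extra character.
to_ascii() {
  sed -e 's/–/-/g' \
      -e 's/—/--/g' \
      -e "s/‘/'/g" \
      -e "s/’/'/g" \
      -e 's/“/"/g' \
      -e 's/”/"/g' \
      -e 's/…/.../g'
}
# Usage: xsel -o -u | to_ascii | festival --tts
```

Used as: xsel -o -u | to_ascii | festival --tts. Each character gets its own -e expression rather than a bracket expression such as [‘’], because under a non-UTF-8 locale sed would treat that bracket as a set of individual bytes and mangle the output.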

March 15th, 2008, 09:38 AM
I can get a slightly more readable version with the line:

$xsel -o -u | uni2ascii -a A
(23<U2013>27 keV)

This would at least let me use sed on it. Is there some way of running sed to replace codes like this with ASCII equivalents? As I understand it, the common Unicode codes are the same as the ASCII codes, only with leading zeros.
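Only code points U+0000 to U+007F coincide with ASCII (U+0041 is ASCII 0x41, "A"). A character like U+2013, the en dash, therefore has no ASCII code to fall back to and needs an explicit stand-in. Below is a minimal sed sketch over uni2ascii's <Uxxxx> output, assuming the mapping suits festival. The helper name codes_to_ascii is hypothetical. The [Cc]/[Dd] classes are there only because the case of the hex digits uni2ascii prints is assumed, not checked:

```shell
# codes_to_ascii (hypothetical helper): rewrite the <Uxxxx> escapes produced by
# "uni2ascii -a A" as ASCII stand-ins. Only a few code points are mapped;
# codes not listed here pass through unchanged.
codes_to_ascii() {
  sed -e 's/<U2013>/-/g' \
      -e 's/<U2014>/--/g' \
      -e "s/<U2018>/'/g" \
      -e "s/<U2019>/'/g" \
      -e 's/<U201[Cc]>/"/g' \
      -e 's/<U201[Dd]>/"/g' \
      -e 's/<U2026>/.../g'
}
# Usage: xsel -o -u | uni2ascii -a A | codes_to_ascii | festival --tts
```

Used as: xsel -o -u | uni2ascii -a A | codes_to_ascii | festival --tts. Any code missing from the table still reaches festival as raw <Uxxxx> text, so the table needs extending as new characters turn up.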

March 15th, 2008, 10:43 AM
OK. I have this, but there has to be a better way:


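#!/bin/sh
# stdtext: turn UTF-8 text on stdin into plain text on stdout.
# uni2ascii rewrites non-ASCII characters as HTML entities; lynx -dump renders the page back to text.
# Note: a literal '<' or '&' in the input will be read as HTML markup.
{
  echo "<html><body>"
  uni2ascii -a Q -a D
  echo "</body></html>"
} | lynx -stdin -dump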

Then, to have festival read the selection, save the script above as stdtext somewhere on your PATH and add this as a command in the window manager:

xsel -o -u | stdtext | festival --tts

March 26th, 2008, 09:38 PM
Any ideas?