ladybug118
April 3rd, 2008, 02:00 AM
Hi there,
I'm sort of a beginner to Perl and DEFINITELY new to Unicode issues. Apologies if this is a totally newbie question, but I can't seem to figure it out on my own.
I have two files containing Unicode strings that I'd like to compare. One is a JavaScript file, so the Unicode characters are represented as \uXXXX (an escape sequence, I think?). The other is a UTF-8 text file, so the characters appear correctly when viewed in my browser (Safari for Mac) with the text encoding set to UTF-8. Is there a function in Perl that would let me convert one representation to the other? I am using "encoding(utf8)" to read both files... maybe there is another way I should be reading the .js file so that all \uXXXX sequences are converted as the file is opened/read? Or do I have to write a function from scratch to do this (I hope not)?
For example:
In the .js file, the registered trademark symbol appears as \u00AE
But in the .txt file, it appears as ® <--registered trademark symbol
Thanks!
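For reference, the conversion being asked about can be sketched in a few lines of Perl with a regex substitution. This is only a rough sketch, not a built-in: the helper name unescape_js_unicode is made up for illustration, and it assumes the .js text has already been read in as a decoded string. It also assumes every \uXXXX in the file really is an escape (it does not handle an escaped backslash such as \\u00AE, or other JavaScript escapes like \xXX).

```perl
use strict;
use warnings;

# Hypothetical helper (not a core Perl function): replace JavaScript-style
# \uXXXX escapes in a string with the characters they stand for.
sub unescape_js_unicode {
    my ($text) = @_;

    # First combine UTF-16 surrogate pairs (\uD800-\uDBFF followed by
    # \uDC00-\uDFFF) into a single code point above U+FFFF.
    $text =~ s{\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})}
              {chr(0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))}ge;

    # Then convert the remaining four-digit escapes one by one.
    $text =~ s{\\u([0-9a-fA-F]{4})}{chr(hex($1))}ge;

    return $text;
}

# Example: the string as it appears in the .js file versus the .txt file.
my $from_js  = 'Acme\u00AE';     # single quotes keep the backslash literal
my $from_txt = "Acme\x{AE}";     # U+00AE, the registered trademark symbol

print "match\n" if unescape_js_unicode($from_js) eq $from_txt;
```

After this step both strings hold the same characters, so an ordinary eq comparison works, provided the .txt file was read through an encoding layer such as "encoding(utf8)" as described above.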