String variable under Python: Unicode problem



yanes
November 14th, 2010, 05:47 PM
Hi all,
I have written an application that manipulates file paths. It works perfectly with some paths,
but it fails on files or directories whose names contain French accented characters or Arabic text.
How can I make Python accept and open files with Unicode-encoded paths?

Thanks in advance :)

Arndt
November 14th, 2010, 07:18 PM
I don't know what exactly is needed for this to work, but I tried this experiment, which worked nicely:


$ python
Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> open("pélé à gagné","w")
<open file 'pélé à gagné', mode 'w' at 0xb77774d0>
>>>
$ ls 'pélé à gagné'
pélé à gagné
$ echo $LANG
en_US.UTF-8

Does the same thing work for you? Does your LANG environment variable perhaps not contain a reference to UTF-8?
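
A minimal sketch for checking the same thing from inside Python rather than from the shell. The helper name describe_encodings is made up for illustration; the three calls it wraps are standard library functions. If any of the values comes back as ASCII (sometimes reported as ANSI_X3.4-1968), non-ASCII file names and console output are likely to cause trouble.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python uses for file names, the locale, and stdout.

    describe_encodings is a hypothetical helper, not part of any library.
    """
    return {
        "filesystem": sys.getfilesystemencoding(),
        "locale": locale.getpreferredencoding(),
        # stdout may be replaced by an object without an encoding attribute
        # (for example inside some IDEs), so fall back to None.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in sorted(describe_encodings().items()):
    print("%-10s %s" % (name, value))
```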

mo.reina
November 14th, 2010, 07:30 PM
>>> s = 'hello byte string'
>>> u = unicode(s)
>>> u
u'hello byte string'
>>> u.encode()
'hello byte string'
>>>
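
Note that the session above only works because the string is pure ASCII: in a default Python 2 install, unicode(s) with no encoding argument decodes with the ASCII codec. A hedged sketch of the same round trip with the encoding named explicitly, written so that it runs on both Python 2.6+ and Python 3 (to_text and to_bytes are made-up helper names, and UTF-8 is an assumption about how the bytes were produced):

```python
def to_text(raw, encoding="utf-8"):
    """Decode a byte string into text, naming the encoding explicitly."""
    return raw.decode(encoding)


def to_bytes(text, encoding="utf-8"):
    """Encode text back into a byte string."""
    return text.encode(encoding)


# UTF-8 bytes for the directory name used later in this thread.
raw = b"/home/yanes/T\xc3\xa9l\xc3\xa9chargements"

text = to_text(raw)        # bytes -> text
round_trip = to_bytes(text)  # text -> bytes, identical to the input
```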

worksofcraft
November 14th, 2010, 09:22 PM
Python 2 source code defaults to ASCII.
To define a different source code encoding, you need a "magic comment" placed in the source file as either the first or the second line:


#!/usr/bin/python
# -*- coding: utf-8 -*-


I think you also have to make sure that your locale is using utf-8 and not one of the other iso codes.

nvteighen
November 14th, 2010, 09:30 PM
Please show us code that reflects this behavior.

cgroza
November 14th, 2010, 09:33 PM
Python source code defaults to ASCII.
To define a different source code encoding, you need a "magic comment" placed into the source files either as first or second line in the file:


#!/usr/bin/python
# -*- coding: utf-8 -*-

I think you also have to make sure that your locale is using utf-8 and not one of the other iso codes.

^^This is the solution.

wmcbrine
November 15th, 2010, 02:27 AM
Please show us code that reflects this behavior.

+1

nvteighen
November 15th, 2010, 08:05 AM
Yup it works:



#!/usr/bin/python
#-*- coding: utf-8 -*-

def main():
    filename = "Εὐριπίδου Βάκχαι"
    message = " Ἥκω Διὸς παῖς τήνδε Θηβαίων χθόνα..."
    myfile = open(filename, "w")
    myfile.write(message)
    myfile.close()

if __name__ == "__main__":
    main()
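
A variant of the script above, as a sketch only: it keeps the file name and the message as text and lets io.open (available since Python 2.6) do the encoding, instead of relying on the source file's bytes happening to match the locale. The helper name write_text_file and the use of a temporary directory are assumptions for illustration, and it presumes a filesystem encoding that can represent the name (UTF-8 here).

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_text_file(directory, name, message):
    """Write message to directory/name as UTF-8 and return the full path.

    write_text_file is a hypothetical helper, not from the original post.
    """
    path = os.path.join(directory, name)
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(message)
    return path


# Use a scratch directory so the example does not touch the current one.
target_dir = tempfile.mkdtemp()
path = write_text_file(target_dir, u"pélé à gagné.txt", u"Ἥκω Διὸς παῖς")
```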


Call me stupid, but I thought -*- coding: utf-8 -*- was an Emacs or vim thing, not a Python issue... :P

worksofcraft
November 15th, 2010, 08:41 AM
Call me stupid, but I thought -*- coding: utf-8 -*- was an Emacs or vim thing, not a Python issue... :P

No, you are correct :) The designers of Python decided to stay compatible with a number of existing practices instead of inventing a new one of their own. I think all it actually needs is the word "coding", followed by ":" or "=" and a valid encoding name, in a comment on the first or second line.
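
To illustrate, here is a simplified sketch of that rule using the pattern quoted in the Python language reference, coding[=:]\s*([-\w.]+). The function name declared_encoding is made up, and the real tokenizer applies a few extra checks that this sketch leaves out.

```python
import re

# Pattern as given in the Python language reference for encoding declarations.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source_lines):
    """Return the encoding named in a comment on line 1 or 2, or None.

    Simplified, hypothetical helper: only the first two lines are checked,
    and only lines that are comments are considered.
    """
    for line in source_lines[:2]:
        if line.lstrip().startswith("#"):
            match = CODING_RE.search(line)
            if match:
                return match.group(1)
    return None


emacs_style = ["#!/usr/bin/python", "# -*- coding: utf-8 -*-"]
vim_style = ["# vim: set fileencoding=latin-1 :"]
```

Both the Emacs-style and the vim-style comments are picked up, because each contains "coding" followed by ":" or "=" and an encoding name.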

yanes
November 18th, 2010, 12:24 AM
Please show us code that reflects this behavior.

This is the code (the exact function that raises the error):


--My Plain code :
def askUserForFilename(self, **dialogOptions):
    dialog = wx.FileDialog(self, **dialogOptions)
    if dialog.ShowModal() == wx.ID_OK:
        userProvidedFilename = True
        self.filename = dialog.GetFilename()
        self.dirname = dialog.GetDirectory()
        print type(self.dirname)
        print "fileName : " ,self.filename
        print "DirName : " , self.dirname
    else:
        userProvidedFilename = False
    dialog.Destroy()
    return userProvidedFilename

--Python Output :
Event handler `On_Open' invoked!
<type 'unicode'>
fileName : dsc_1576.jpg
DirName : /home/yanes/Téléchargements
/home/yanes/Téléchargements/dsc_1576.jpg
Traceback (most recent call last):
File "img_r_2.py", line 207, in On_Open
print str(self.dirname+"/"+ self.filename)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 13: ordinal not in range(128)


As you can see, the problem is in the variable "self.dirname", which is the result of a wx.FileDialog.
The second line of the output is the type() of this variable: Python knows that it is a unicode string. That's strange.
If it recognizes it as unicode, why does it raise this error?

Another thing that I think is important:
From the module os.path I tried this:


Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os.path
>>> os.path.supports_unicode_filenames
False
>>>

So what do you think ?

Arndt
November 18th, 2010, 09:31 AM
I can duplicate the problem, without any actual file handling. The simplest way is to do


$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os.path
>>> os.path.supports_unicode_filenames
False
>>> s = u"/home/yanes/Téléchargements"
>>> s
u'/home/yanes/T\xe9l\xe9chargements'
>>> print s
/home/yanes/Téléchargements
>>> type(s)
<type 'unicode'>
>>> str(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 13: ordinal not in range(128)
>>>


In the documentation, however, I see this: "...the slightly different str() function). The latter function is implicitly used when an object is written by the print() function." which is obviously not the case (documentation for 2.6 and 2.7 is the same here, and I'm using 2.6).

Can you just lose the 'str' call? Or is it inside some library code?
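
A sketch of what dropping the str() call could look like: the directory and file name stay as text all the way through, and os.path.join builds the path instead of manual "/" concatenation. The function name full_path is made up for illustration, and the two literals stand in for what the dialog returned in the output above.

```python
# -*- coding: utf-8 -*-
import os.path


def full_path(dirname, filename):
    """Join directory and file name as text, with no str() conversion.

    full_path is a hypothetical helper, not from the original program.
    """
    return os.path.join(dirname, filename)


# Stand-ins for dialog.GetDirectory() and dialog.GetFilename().
path = full_path(u"/home/yanes/Téléchargements", u"dsc_1576.jpg")
```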

You can apparently transform the string so it becomes palatable to 'str':


>>> s = u"/home/yanes/Téléchargements"
>>> s2 = s.encode('utf-8')
>>> s2
'/home/yanes/T\xc3\xa9l\xc3\xa9chargements'
>>> type(s2)
<type 'str'>
>>> str(s2)
'/home/yanes/T\xc3\xa9l\xc3\xa9chargements'
>>> print(s2)
/home/yanes/Téléchargements
>>> print str(s2)
/home/yanes/Téléchargements
>>>
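
The mechanism behind both sessions, as a hedged sketch: in Python 2, str() on a unicode object encodes it with the default codec, which is ASCII unless it has been changed. The code below makes that encoding step explicit, so it behaves the same on Python 2.6+ and Python 3. The variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
text = u"/home/yanes/Téléchargements"

# Encoding with ASCII fails on the first accented character, which is
# the same failure the traceback above reports for str(s).
try:
    text.encode("ascii")
    failing_position = None
except UnicodeEncodeError as error:
    failing_position = error.start  # index of the first unencodable character

# Encoding with UTF-8 succeeds and yields the byte string shown above.
utf8_bytes = text.encode("utf-8")
```

The failing index is the one the traceback reports, position 13, which is the first "é".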

yanes
November 18th, 2010, 10:53 PM
OK, I'll try the unicode() function.
To be exact, the problem was caused by the str() function. After removing it, the script executed successfully from the console:


python myscript.py

but it raises an error under the SPE Python IDE.
I don't understand the cause ;)

I'll try it now, in the hope that it will not raise the error 'No such file or directory' :) ;)
Thanks to all of you.

yanes
November 20th, 2010, 11:42 PM
Very well, your hint works fine, Mr Arndt (http://ubuntuforums.org/member.php?u=106064).

Thank you. My program can now open files with French, Arabic, and some other arbitrary characters that I tested.