Thread: [Python] Get unicode strings from sys.argv

  1. #1
    Join Date
    Nov 2005
    Location
    Leeds, UK
    Beans
    1,634
    Distro
    Ubuntu Development Release

    [Python] Get unicode strings from sys.argv

    Hi,

    Is it possible to pass a Python script a Unicode string as a command-line argument? I've searched for a way to do it, but I've only managed to find a Windows method, not one for Linux. Is this possible?

    Thanks!

  2. #2
    Join Date
    Mar 2005
    Beans
    947
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: [Python] Get unicode strings from sys.argv

    I get UTF-8 strings from sys.argv in Ubuntu, if that's what I pass to it. (UTF-8 is the normal system encoding for Ubuntu.) So, for example, if you want a Unicode string from the first argument, it would be "unicode(sys.argv[1], 'utf-8')" or "sys.argv[1].decode('utf-8')".
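    For comparison, here is a minimal Python 3 sketch of the same step. It assumes Python 3 on Linux, and the helper name `arg_as_text` is hypothetical. Python 3 already decodes `sys.argv` into `str`. To mirror `sys.argv[1].decode('utf-8')`, the sketch first recovers the original bytes with `os.fsencode` (the approach the `sys.argv` documentation suggests) and then decodes them explicitly:

```python
import os
import sys


def arg_as_text(index: int) -> str:
    """Return command-line argument `index` as text (Python 3 sketch).

    On Unix, Python 3 has already decoded sys.argv with the filesystem
    encoding (UTF-8 on a typical Ubuntu system) and the 'surrogateescape'
    handler. os.fsencode turns the str back into the original bytes, and
    decoding those bytes as UTF-8 is the counterpart of Python 2's
    sys.argv[1].decode('utf-8').
    """
    raw = os.fsencode(sys.argv[index])  # bytes as the shell passed them
    return raw.decode("utf-8")          # raises UnicodeDecodeError if not UTF-8


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(arg_as_text(1))
```

    In everyday Python 3 code on a UTF-8 system, `sys.argv[1]` is already the text you want. The explicit round trip is only useful when you need to insist on UTF-8 regardless of the locale.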

  3. #3
    Join Date
    Nov 2005
    Location
    Leeds, UK
    Beans
    1,634
    Distro
    Ubuntu Development Release

    Re: [Python] Get unicode strings from sys.argv

    Quote Originally Posted by wmcbrine
    I get UTF-8 strings from sys.argv in Ubuntu, if that's what I pass to it. (UTF-8 is the normal system encoding for Ubuntu.) So, for example, if you want a Unicode string from the first argument, it would be "unicode(sys.argv[1], 'utf-8')" or "sys.argv[1].decode('utf-8')".
    Thanks! I eventually realised that sys.argv[1].decode('utf-8') does the trick. I still don't understand Unicode properly, but that can wait. Thanks again!

  4. #4
    Join Date
    Aug 2007
    Location
    127.0.0.1
    Beans
    1,800
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: [Python] Get unicode strings from sys.argv

    Then you have some reading to do:

    "Friendly" explanation (2003):
    http://www.joelonsoftware.com/articles/Unicode.html

    Mostly technical explanation of how unicode works:
    http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8

  5. #5
    Join Date
    Nov 2005
    Location
    Leeds, UK
    Beans
    1,634
    Distro
    Ubuntu Development Release

    Re: [Python] Get unicode strings from sys.argv

    I do understand Unicode in general; I meant that I don't understand how Python deals with it. My understanding is that Python 3 treats Unicode text as the normal str type, whereas Python 2 uses a separate unicode type. When decoding a string, the encoding you pass to the function is the encoding the string is already in, am I right? I would have thought that python would be able to automatically get the string's encoding?
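    Yes: the argument to decode names the encoding the bytes are already in, and nothing inside the bytes records which encoding that was. A small Python 3 sketch (where `bytes` plays the role of Python 2's `str`) decodes the same five bytes three ways:

```python
# "café" as encoded by UTF-8: five bytes. Nothing in this bytes object
# records that UTF-8 was the encoding used to produce it.
raw = b"caf\xc3\xa9"

as_utf8 = raw.decode("utf-8")        # 'café'  : the encoding that produced the bytes
as_latin1 = raw.decode("iso8859-1")  # 'cafÃ©' : also "succeeds", but is wrong text

try:
    raw.decode("ascii")              # byte 0xc3 is outside ASCII, so this raises
except UnicodeDecodeError as exc:
    print("ascii failed:", exc.reason)

print(as_utf8, as_latin1)
```

    Both non-failing decodes are "valid"; only the caller knows which one is correct.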

  6. #6
    Join Date
    Mar 2005
    Beans
    947
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: [Python] Get unicode strings from sys.argv

    Quote Originally Posted by durand
    I would have thought that python would be able to automatically get the string's encoding?
    How? That information is not recorded within the string.

    Now, you can do a semi-automatic UTF-8 recognition by relying on the fact that most non-ASCII, non-UTF-8 text will not validate as UTF-8, but you have to choose a fallback encoding, such as ISO8859-1:

    Code:
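    try:
        # Python 2: text is a byte string, e.g. sys.argv[1]
        utext = unicode(text, 'utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8; ISO8859-1 accepts every byte value, so this cannot fail
        utext = unicode(text, 'iso8859-1')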
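    A Python 3 rendering of the same fallback, as a sketch. The function name `decode_with_fallback` is hypothetical, and it assumes the input is already bytes, for example obtained with `os.fsencode(sys.argv[1])`:

```python
import os
import sys


def decode_with_fallback(raw: bytes, fallback: str = "iso8859-1") -> str:
    """Decode `raw` as UTF-8, falling back to `fallback` if that fails.

    ISO8859-1 maps every one of the 256 byte values to a character, so with
    the default fallback this never raises; it may, however, return the
    wrong text for bytes that were really in some other encoding.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(decode_with_fallback(os.fsencode(arg)))
```

    The order matters. Trying ISO8859-1 first would always succeed, so UTF-8 input such as 馬山木火 (3 bytes per kanji in UTF-8) would come back as 12 Latin-1 characters of mojibake instead of 4 kanji.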

  7. #7
    Join Date
    Nov 2005
    Location
    Leeds, UK
    Beans
    1,634
    Distro
    Ubuntu Development Release

    Re: [Python] Get unicode strings from sys.argv

    Well, the input would be Japanese kanji such as 馬山木火, so it will most likely be UTF-8. I don't think it matters much anyway, as this script is just for me, but I'll keep that in mind for the future. Thanks.
