Thread: Python interpreter coding?

  1. #1
    Join Date
    Apr 2008
    Beans
    320
    Distro
    Xubuntu 17.10 Artful Aardvark

    Python interpreter coding?

    Code:
    # -*- coding: utf-8 -*-
    The above is a line of code I like to put in most of my Python scripts. This makes everything work as if it's UTF-8 without having to go through lots of coding differences and rigmarole (no typing u"á"; just "á"). Particularly in this case, I don't end up having to see and deal with sequences like this: '\xc3\xa1'

    Anyway, I'm just wondering how to do the same thing from the Python interpreter. Any ideas? Is there a config setting somewhere?

    I'm aware that half of the escape sequence output I'm getting is probably the terminal's fault, but I'm just wondering what the equivalent to the code I cited above is.
    Last edited by kumoshk; May 10th, 2013 at 03:36 AM.

  2. #2
    Join Date
    Apr 2013
    Location
    43.49°N 7.46°E
    Beans
    117
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Python interpreter coding?

    You might try creating a one-line script containing that code and pointing the PYTHONSTARTUP variable at it, i.e. assigning the script's full pathname to it in your .bash_profile or .profile.
    If you already have a startup script invoked this way, just put that line at the top of it.

  3. #3
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    2,172
    Distro
    Ubuntu 16.04 Xenial Xerus

    Re: Python interpreter coding?

    Quote: Originally posted by kumoshk
    Code:
    # -*- coding: utf-8 -*-
    The above is a line of code I like to put in most of my Python scripts. This makes everything work as if it's UTF-8 without having to go through lots of coding differences and rigmarole (no typing u"á"; just "á").
    Then you are mistaken. This "pragma" only applies to the way Python reads your source code, not to how it handles characters. What appears as 'à' is really two bytes in your source code, and if you did put that "coding" pragma you would have a two-byte character string. You get away with "à" because it fits in a single byte and it's part of the default encoding. Regular Python 2 strings contain bytes.
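    The difference between characters and bytes can be seen in a minimal sketch like the one below. It is only an illustration: the names text and raw are made up for the example, and the character is written as an escape so the snippet does not depend on any source-file encoding.

```python
from __future__ import print_function

# One character of text (LATIN SMALL LETTER A WITH ACUTE), written as an
# escape so the example does not depend on the encoding of this file.
text = u"\u00e1"

# The same character as UTF-8 encoded bytes: this is what a UTF-8 source
# file or a UTF-8 data file actually stores.
raw = text.encode("utf-8")

print(len(text))   # number of characters
print(len(raw))    # number of bytes
print(repr(raw))   # the escape-sequence form of the bytes

# Decoding the bytes as UTF-8 gives back the original character.
print(raw.decode("utf-8") == text)
```

    The escape sequence from the question, '\xc3\xa1', is simply the repr() of those two bytes.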
    Quote: Originally posted by kumoshk

    Particularly in this case, I don't end up having to see and deal with sequences like this: '\xc3\xa1'

    Anyway, I'm just wondering how to do the same thing from the Python interpreter. Any ideas? Is there a config setting somewhere?

    I'm aware that half of the escape sequence output I'm getting is probably the terminal's fault, but I'm just wondering what the equivalent to the code I cited above is.
    If you want to "interpret" the stream of bytes in a file as UTF-8 encoded characters, you need to use the "codecs" module.
    Code:
    import codecs
    # filePath is the path of the file to read; reads return decoded text, not raw bytes
    infile = codecs.open(filePath, 'r', encoding='utf-8')
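    As a rough, self-contained sketch of the same idea, the snippet below writes a small sample file and reads it back twice, once as raw bytes and once through codecs.open. The file name sample.txt and the variables file_path, raw and text are hypothetical and exist only for this example.

```python
import codecs
import os
import tempfile

# Hypothetical sample file, created only so the example can run on its own.
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "sample.txt")

# Write one accented character followed by a newline, encoded as UTF-8.
with codecs.open(file_path, "w", encoding="utf-8") as out_file:
    out_file.write(u"\u00e1\n")

# Reading in binary mode returns the undecoded bytes.
with open(file_path, "rb") as raw_file:
    raw = raw_file.read()

# Reading through codecs.open decodes those bytes into text.
with codecs.open(file_path, "r", encoding="utf-8") as in_file:
    text = in_file.read()

print(repr(raw))   # three bytes: two for the character, one for the newline
print(len(text))   # two characters: the letter and the newline

# Clean up the temporary file and directory.
os.remove(file_path)
os.rmdir(tmp_dir)
```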
