Thread: [Python] Unsigned 64-bit integer causing addressing error

  1. #1
    Join Date
    Jun 2007
    Location
    The Fiery Desert
    Beans
    141
    Distro
    Ubuntu 7.04 Feisty Fawn

    [Python] Unsigned 64-bit integer causing addressing error

    I am working with large binary files (several gigabytes each), and I need to be able to address specific bytes in those files. So I keep addresses of data within these files using 64-bit integers, specifically numpy types int64 and uint64. However, I noticed that the unsigned version is giving an error:

    Code:
    >>> ids1 = np.fromfile('TR_45_3 ch 0 cluster 1.txt', dtype = np.uint64, sep = ' ')
    >>> ids1[-1]
    4999998
    >>> tes4530.goto(ids1[-1])

    Traceback (most recent call last):
      File "<pyshell#55>", line 1, in <module>
        tes4530.goto(ids1[-1])
      File "xxx\tes_utility.py", line 32, in goto
        self.f.seek(newpos * 2 * self.rec_len)
    OverflowError: Python int too large to convert to C long

    However, when I used the signed version numpy.int64, there is no problem:

    Code:
    >>> ids1 = np.fromfile('TR_45_3 ch 0 cluster 1.txt', dtype = np.int64, sep = ' ')
    >>> ids1[-1]
    4999998
    >>> tes4530.goto(ids1[-1])

    What is the difference between these two types that is causing this error?

    (The text files hold positions of records. In this case, the records are stored using 16-bit integers, and each record is 256 data points - hence the multipliers in the f.seek() command. tes4530 is an instance of a class that handles the binary data file, and goto() sets the byte position at the beginning of the record with the given number.

    Also, numpy is loaded using "import numpy as np".)
    Die writing: I write crazy things
    PhD Wander: I write crazy things about my travels
    RedScout: Lenovo 3000 C100, 1.5GHz Celeron M, 1.25GB RAM, 120GB HDD

  2. #2
    Join Date
    Dec 2009
    Location
    germany
    Beans
    1,020
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: [Python] Unsigned 64-bit integer causing addressing error

    Hi,
    I'm not familiar with this programming language (for me, C and assembler are the bible of programming languages), but I guess that dtype = np.uint64 doesn't match your goto(ids1[-1]) call.
    Just a feeling.
    Cheers
    "What is the robbing of a bank compared to the FOUNDING of a bank?" Berthold Brecht

  3. #3
    Join Date
    Jun 2007
    Location
    The Fiery Desert
    Beans
    141
    Distro
    Ubuntu 7.04 Feisty Fawn

    Re: [Python] Unsigned 64-bit integer causing addressing error

    Thank you for the suggestion!

    I don't think calling ids[] with a negative index is the issue. In Python, using the negative index simply means counting backwards from the end of the array.

    The error occurs if I use positive indices, too. It somehow has to do with the type of the numbers stored in ids[].

  4. #4
    Join Date
    Jun 2010
    Beans
    92

    Re: [Python] Unsigned 64-bit integer causing addressing error

    A signed integer uses the highest bit to denote whether the integer is positive or negative, so only 63 bits are left to represent the magnitude.

    The unsigned integer uses all 64 bits for the magnitude, so its maximum value is 2^63 greater than that of the signed integer.
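
The two ranges described above can be checked with a few lines of Python. This is only a sketch using the standard library; the names BITS, signed_max, unsigned_max and fits_signed are made up for the illustration, and the struct format codes 'q' and 'Q' stand in for a signed and an unsigned 64-bit field.

```python
import struct

BITS = 64

# Largest values representable in 64 bits.
signed_max = 2 ** (BITS - 1) - 1   # one bit is reserved for the sign
unsigned_max = 2 ** BITS - 1       # all 64 bits hold the magnitude

print(signed_max)
print(unsigned_max)
print(unsigned_max - signed_max == 2 ** 63)

# struct enforces the same limits: 'q' is signed, 'Q' is unsigned (both 64-bit).
struct.pack('<Q', unsigned_max)         # fits in the unsigned field
try:
    struct.pack('<q', signed_max + 1)   # one past the signed maximum
    fits_signed = True
except struct.error:
    fits_signed = False
print(fits_signed)
```

Neither limit is anywhere near the record numbers in the first post, so the range difference by itself would not be expected to matter for values around five million.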

  5. #5
    Join Date
    Dec 2009
    Location
    germany
    Beans
    1,020
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: [Python] Unsigned 64-bit integer causing addressing error

    Hi,
    Counting back from the end of the array? How do you define the end of the array? Is it element blablu[0-1] or blablu[max_elements-1]? If it is what I think, 0-1, then it lies before the start of your array, and if the array is the first definition (as it would be in C), that is your stack (hackers like this because the stack contains very interesting things).
    Cheers

  6. #6
    Join Date
    Feb 2009
    Beans
    1,468

    Re: [Python] Unsigned 64-bit integer causing addressing error

    Don't guess, test. Negative indices in Python count backwards from len(array).

    Code:
    v = ['h', 'e', 'l', 'l', 'o']
    print(v[-1])
    will print 'o'.

  7. #7
    Join Date
    Jun 2007
    Location
    The Fiery Desert
    Beans
    141
    Distro
    Ubuntu 7.04 Feisty Fawn

    Re: [Python] Unsigned 64-bit integer causing addressing error

    Quote: Originally posted by xb12x
    A signed integer uses the highest bit to denote whether the integer is positive or negative. So in this case only 63 bits are used to denote the abs value.

    The unsigned integer uses all 64 bits for the abs value. So the abs value for the unsigned integer can be 2^63 greater.
    I thought this might be a problem, but the values I use are too small to run into that boundary. I'm at the point where a 32-bit integer just barely doesn't cover it. 63- and 64-bits provide far more address space than I am currently using.

    Also, if I use lower-valued addresses, it doesn't matter whether they are stored as signed or unsigned. So it's not the number of bits used to store the address.

    Maybe it's something internal with how numpy stores and converts signed and unsigned integers?

  8. #8
    Join Date
    Jun 2010
    Beans
    92

    Re: [Python] Unsigned 64-bit integer causing addressing error

    Quote: Originally posted by Erdaron
    Also, if I use lower-valued addresses, it doesn't matter whether they are stored as signed or unsigned. So it's not the number of bits used to store the address.
    I suspect it does matter. I suspect your signed variables are being misinterpreted as unsigned, and vice versa.

    Negative signed integers are represented in memory using the two's complement of the absolute value.

    8-bit variables using two's complement for negative numbers:
    +1 is represented by 00000001b
    -1 is represented by 11111111b
    10000001b unsigned is 129 (absolute value is 129)
    10000001b signed is -127 (absolute value is 127)

    32-bit:
    A signed variable containing 0xFFFFFFFF is -1
    An unsigned variable containing 0xFFFFFFFF is 4,294,967,295
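
The bit patterns listed above can be reproduced in Python by packing a value as unsigned and reading the same bytes back as signed. This is a small sketch with the standard struct module; the helper name reinterpret is invented for the example.

```python
import struct

def reinterpret(value, unsigned_fmt, signed_fmt):
    """Pack value as an unsigned field, then read the same bytes as signed."""
    raw = struct.pack(unsigned_fmt, value)
    return struct.unpack(signed_fmt, raw)[0]

# 8-bit: 10000001b is 129 unsigned and -127 signed.
as_int8 = reinterpret(0b10000001, '<B', '<b')

# 32-bit: 0xFFFFFFFF is 4,294,967,295 unsigned and -1 signed.
as_int32 = reinterpret(0xFFFFFFFF, '<I', '<i')

print(as_int8, as_int32)
```

Values whose top bit is clear, such as the record numbers in this thread, read the same either way.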

  9. #9
    Join Date
    Feb 2009
    Beans
    1,468

    Re: [Python] Unsigned 64-bit integer causing addressing error

    How is goto defined, and what are newpos and self.rec_len? Could you be experiencing overflow in an intermediate calculation? Doesn't seem likely... but you never know.

    Also, what platform is this? Could long actually be smaller than 64 bits (as I believe it is on Windows)?
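
The width of a C long on the machine at hand can be checked from Python itself. A minimal sketch using ctypes from the standard library; the variable names long_bits and long_max are just for illustration.

```python
import ctypes

# Size of the platform's C long, in bits.
long_bits = ctypes.sizeof(ctypes.c_long) * 8

# Largest value a signed C long of that width can hold.
long_max = 2 ** (long_bits - 1) - 1

print(long_bits)
print(long_max)
```

On 64-bit Windows this is expected to report 32 bits, while 64-bit Linux typically reports 64.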

  10. #10
    Join Date
    Jun 2007
    Location
    The Fiery Desert
    Beans
    141
    Distro
    Ubuntu 7.04 Feisty Fawn

    Re: [Python] Unsigned 64-bit integer causing addressing error

    Here is the relevant part of the class and method definition:
    Code:
    class TESstream1:
        #Objects of this class allow loading and searching data from
        # a binary file containing single-channel data
        def __init__(self, fname, rec_len = 256, outtype = np.float32):
            #rec_len - number of data points per record
            #outtype - number type used in the outputs
            self.records = os.path.getsize(fname) / (2 * rec_len)
            self.f = open(fname, 'rb')
            self.rec_len = rec_len
            self.fname = fname
            self.outtype = outtype
            self.[attribute name lost] = 2 ** 15 #offset to compensate for data being unsigned
        def goto(self, newpos):
            #go to the beginning of the record specified by newpos
            self.f.seek(newpos * 2 * self.rec_len)

    The data file stores a large number of records, and each record is a fixed number of data points. The data are stored as uint16, so 2 bytes per data point. There are no separators between records.

    newpos specifies the record at whose start the pointer should park. These are the addresses stored in ids. Since there are rec_len points per record, and 2 bytes per point, newpos * 2 * rec_len is the byte address of the start of the record.
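
One way to take numpy's scalar types out of the offset calculation is to convert the record number to a plain Python int before multiplying. The following is only a sketch under that assumption: RecordFile is a hypothetical, stripped-down stand-in for the class in this post, and nothing in the thread confirms that this change removes the error on the setup described.

```python
import os
import tempfile

BYTES_PER_POINT = 2  # uint16 data, as described in the post


class RecordFile(object):
    """Hypothetical minimal stand-in for the class shown in the post."""

    def __init__(self, fname, rec_len=256):
        self.rec_len = rec_len
        self.f = open(fname, 'rb')

    def goto(self, newpos):
        # int() turns a numpy integer scalar (or any integer-like value)
        # into a plain Python int, so the product is ordinary Python arithmetic.
        offset = int(newpos) * BYTES_PER_POINT * self.rec_len
        self.f.seek(offset)
        return offset

    def close(self):
        self.f.close()


# Small demonstration on a temporary file holding four empty records.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as out:
    out.write(b'\x00' * (4 * BYTES_PER_POINT * 256))

rf = RecordFile(path)
offset = rf.goto(3)
position = rf.f.tell()
rf.close()
os.remove(path)

print(offset, position)
```

Keeping the conversion inside goto() means callers can pass either numpy values or ordinary integers without caring which one they have.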

    I am working in Windows 7 x64, using 64-bit versions of Python 2.7.3, with numpy 1.5.1.

    Also, by trying various numbers, I converged on the boundary at which I start getting the error. Passing values up to 4194303 to goto() is fine. Beginning with 4194304, the error message begins to appear.
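
The boundary reported above lines up with a 32-bit signed limit on the byte offset, which can be seen by multiplying it out. A short sketch; the constant and function names are made up for the illustration, and rec_len is taken as the default of 256.

```python
REC_LEN = 256            # data points per record (default in the class above)
BYTES_PER_POINT = 2      # uint16 data
INT32_MAX = 2 ** 31 - 1  # largest value of a 32-bit signed integer


def byte_offset(record):
    """Byte address of the start of the given record."""
    return record * BYTES_PER_POINT * REC_LEN


last_ok = byte_offset(4194303)    # last record number that worked
first_bad = byte_offset(4194304)  # first record number that failed

print(last_ok, last_ok <= INT32_MAX)
print(first_bad, first_bad > INT32_MAX)
print(first_bad == 2 ** 31)
```

Record 4194304 is 2^22, and 2^22 * 512 is exactly 2^31, one more than a 32-bit signed integer can hold. That is consistent with the suggestion in post #9 that a C long may be only 32 bits wide on Windows, although it does not by itself explain why the int64 version succeeds.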
