Results 1 to 10 of 10

Thread: Searching/indexing a binary file

  1. #1
    Join Date
    Apr 2007
    Beans
    Hidden!

    Searching/indexing a binary file

    I have a large file with variable length records. The data was written sequentially with a field length prefixed to the start of the record.

    I want to write an application which can seek/search this file for a specific tag/content.

    I am having trouble developing an alogrithm to search the file. If the records were fixed length I could easily develop a binary-chop or something similar which calculates the position by offset and record length.

    But how can I randomly seek within a variable length record collection? I would have trouble identifying what was data and record length...

    Perhaps I should try and read and encode the file into another format first?


  2. #2
    Join Date
    Jun 2007
    Location
    Maryland, US
    Beans
    6,288
    Distro
    Kubuntu

    Re: Searching/indexing a binary file

    Seems to me that you have already answered your question.

    Having fixed record sizes would help immensely, however if you are searching for a particular record and the records are not sorted by some "key", doing the binary search wouldn't be of much help.

    In cases where the records are fixed-sized and contain a "key" (such as a time-stamp or unique ID), and the records are sorted in relation to this key, then you would definitely be able to perform a binary search.

  3. #3
    Join Date
    Oct 2005
    Location
    Seattle, WA
    Beans
    494
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Searching/indexing a binary file

    You could go through the file and create an index of fixed length. As you said, convert the file first.

    Are you looking to do this quickly? Because this solution would not do so.
    www.runtime-era.com - Blog About My Journey as a Developer

  4. #4
    Join Date
    Apr 2006
    Location
    Hamilton, New Zealand
    Beans
    198
    Distro
    Ubuntu 9.04 Jaunty Jackalope

    Re: Searching/indexing a binary file

    How much control do you have over the file format? If you really need to have variable length entries what about including an index at the start/end of the file which contains where abouts in the file the record starts. As you already know the size of each item and presumably the number of items in the file (at the time of writing) it shouldn't be too much of a problem to add the index.

  5. #5
    Join Date
    Apr 2007
    Beans
    Hidden!

    Re: Searching/indexing a binary file

    I think I will initially write some code to build a seperate index file as suggested. Re-encoding the data file to fixed widths may result in too much disk space usage.

    Each field has a unique ID, so I will store that and a "seek position" into the file.

    thanks for your ideas.

  6. #6
    Join Date
    Feb 2008
    Location
    Cape Town, South Africa
    Beans
    Hidden!
    Distro
    Ubuntu 8.04 Hardy Heron

    Re: Searching/indexing a binary file

    Maybe it is time to get a Data Base involved?

  7. #7
    Join Date
    Oct 2005
    Location
    Seattle, WA
    Beans
    494
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Searching/indexing a binary file

    Quote Originally Posted by rubinboy View Post
    Maybe it is time to get a Data Base involved?
    +1

    I am doing a similar project and just so ya know, if the file is huge (> 2gig) make sure you look out for the type of your offset. I'm not sure what language you are using, but make sure you store offsets in the index as 64bit integers.

    Just a heads up!

    Goodluck
    www.runtime-era.com - Blog About My Journey as a Developer

  8. #8
    Join Date
    Feb 2006
    Location
    Seattle-Eastside
    Beans
    309
    Distro
    Ubuntu 8.04 Hardy Heron

    Re: Searching/indexing a binary file

    You could also scan the entire file at startup, create an array or hash of the records and sizes. Each entry could contain the offset to where that record is within the file.

  9. #9
    Join Date
    Oct 2005
    Location
    Seattle, WA
    Beans
    494
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Searching/indexing a binary file

    Quote Originally Posted by KingTermite View Post
    You could also scan the entire file at startup, create an array or hash of the records and sizes. Each entry could contain the offset to where that record is within the file.
    That would take up a huge amount of memory if we are talking about large files.
    www.runtime-era.com - Blog About My Journey as a Developer

  10. #10
    Join Date
    Feb 2008
    Location
    Cape Town, South Africa
    Beans
    Hidden!
    Distro
    Ubuntu 8.04 Hardy Heron

    Re: Searching/indexing a binary file

    Quote Originally Posted by era86 View Post
    That would take up a huge amount of memory if we are talking about large files.
    HUGE, if the file was 2gigs, we are talking about alot more then 2gigs of ram just to start up.... I would recommend writeing a script that ports this data over to a mysqlDB

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •