
Thread: File header size in Bytes

  1. #11
    Join Date
    Aug 2010
    Location
    Lancs, United Kingdom
    Beans
    1,510
    Distro
    Ubuntu Mate 16.04 Xenial Xerus

    Re: File header size in Bytes

    As has been said, unlike .bmp files, many file formats do not specify any "header" data.

    All of the major current LibreOffice native formats are zip archives (.odt, .ods etc.). All of the major current Microsoft Office native formats are also zip archives (.docx, .xlsx, .pptx etc.) but not the old ones (.doc, .xls, .ppt etc.).

    A zip archive normally begins with the header of the first file contained within the archive. This is not a header for the archive itself, and it is of varying length (30 + X + Y bytes, where X and Y are two lengths, the file name length and the extra field length, both specified in the first 30 bytes). The interesting part, if you like, is the Central Directory, which is located at the end of the archive and is also of varying size.
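
    To make the 30 + X + Y arithmetic concrete, here is a minimal Python sketch. It assumes the data really does start with a local file header (signature PK\x03\x04); the names first_local_header_size and make_demo_archive are made up for this illustration, and the demo archive is built in memory rather than taken from a real document.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a zip local file header (little-endian):
# signature, version, flags, method, mod time, mod date,
# crc32, compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct("<4s5H3I2H")


def first_local_header_size(data: bytes) -> int:
    """Return 30 + X + Y for the first local file header in data."""
    fields = LOCAL_HEADER.unpack_from(data, 0)
    if fields[0] != b"PK\x03\x04":
        raise ValueError("data does not start with a zip local file header")
    name_len, extra_len = fields[-2], fields[-1]
    return LOCAL_HEADER.size + name_len + extra_len


def make_demo_archive(name: str, payload: bytes) -> bytes:
    """Build a tiny uncompressed (stored) archive in memory for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


if __name__ == "__main__":
    demo = make_demo_archive("mimetype", b"application/vnd.oasis.opendocument.text")
    size = first_local_header_size(demo)
    print("first local header is", size, "bytes")
    # Because the entry is stored uncompressed, its data starts right here:
    print(demo[size:size + 20])
```

    Note that this only gets you past the first entry's header. To find a particular file inside the archive you would still go through the Central Directory, which is what the zipfile module does for you.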

    For a .pdf file, all you are guaranteed up front is %PDF- followed by a (varying length) version number, e.g. %PDF-1.4. Similar to a zip archive, the interesting part is at the end of the file: the Trailer Dictionary, followed by a startxref line and then %%EOF. The startxref line gives the byte offset of the xref (cross reference) table, which will often be just before the trailer but need not be.
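
    As a rough illustration of reading a .pdf from both ends, the Python sketch below pulls the version out of the first line and the xref offset out of the tail. It assumes the common layout where a startxref keyword and a decimal byte offset sit just before the final %%EOF; pdf_version and startxref_offset are names invented for this example, and it does no real PDF parsing.

```python
import re


def pdf_version(data: bytes) -> str:
    """Return the version string from the %PDF-x.y marker at the start."""
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    if match is None:
        raise ValueError("data does not start with %PDF-")
    return match.group(1).decode("ascii")


def startxref_offset(data: bytes, tail_size: int = 1024) -> int:
    """Return the byte offset given after the last 'startxref' keyword.

    Only the final tail_size bytes are searched, since the keyword is
    expected near the end of the file.
    """
    tail = data[-tail_size:]
    offsets = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not offsets:
        raise ValueError("no startxref found near the end of the data")
    return int(offsets[-1])


if __name__ == "__main__":
    # Hand-built, PDF-shaped bytes for the demo (not a usable document).
    body = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"
    xref_pos = len(body)
    demo = (
        body
        + b"xref\n0 1\n0000000000 65535 f \n"
        + b"trailer\n<< /Size 1 >>\n"
        + b"startxref\n" + str(xref_pos).encode("ascii") + b"\n%%EOF\n"
    )
    print(pdf_version(demo))
    print(startxref_offset(demo))
```

    Even then, the offset only tells you where the cross reference data starts; the actual content still has to be found through the objects it points to.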

  2. #12
    Join Date
    Jan 2010
    Location
    Wheeling WV USA
    Beans
    1,343
    Distro
    Xubuntu 18.04 Bionic Beaver

    Re: File header size in Bytes

    if you want to get information from a file, you will need more than just the size of the header. you will need to know where the file puts the information you want. maybe you want to collect the frequency count for each letter and word in the document. to do something like that, you will need to know a lot more than what byte offset gets you past the header. you'll need to know how to uncompress it. you will also need to know how it interleaves the formatting and positioning of the letters and words.
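
    A small Python sketch of the letter-count example, to show how little the header size helps. It assumes an OpenDocument-style zip archive with the text in a member called content.xml; the demo archive and its simplified XML (no real ODF namespaces) are invented for this illustration, as are the names letter_counts and make_demo_document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def letter_counts(archive_bytes: bytes, member: str = "content.xml") -> Counter:
    """Count letters in the text nodes of one XML member of a zip archive."""
    # Step 1: uncompress the member (zipfile finds it via the Central Directory).
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        xml_data = zf.read(member)
    # Step 2: separate the text from the formatting markup around it.
    root = ET.fromstring(xml_data)
    text = "".join(root.itertext())
    # Step 3: count letters, ignoring case, spaces and punctuation.
    return Counter(ch.lower() for ch in text if ch.isalpha())


def make_demo_document() -> bytes:
    """Build a tiny compressed archive with simplified, made-up markup."""
    xml_data = b"<doc><p>Hello <b>World</b></p></doc>"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", xml_data)
    return buf.getvalue()


if __name__ == "__main__":
    counts = letter_counts(make_demo_document())
    for letter, n in sorted(counts.items()):
        print(letter, n)
```

    None of the three steps uses a header size anywhere, and a real document would need more care (namespaces, styles, text split across elements).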

    i cannot imagine any end goal that only needs the size of the header and that can work for more than a small handful of file types.
