Thread: Parsing data for file signatures

  1. #1
    Join Date
    Oct 2009
    Location
    California
    Beans
    Hidden!
    Distro
    Ubuntu Studio 12.10 Quantal Quetzal

    Question Parsing data for file signatures

    I'm wondering if there is a way to parse through an unidentified file to identify file signatures, especially if it appears to contain multiple files.

    For example, a picture embedded within a document, or an image file such as foo.png spliced together with another file such as music.mp4.

    The purpose is partly to learn how to look for, or script a way of detecting, content where it doesn't belong (much like what malware does). I tried using the file command, but it only returns the first thing it finds, and the -k option doesn't seem to solve the problem.

    For finding an mp4 within the png file (which wouldn't open, since the mp4 inside would corrupt it), I was thinking of using grep with some sort of signature.

    So if the signature is 'ftyp' and the file is called merged.corruption:

    Code:
    grep 'ftyp' merged.corruption
    would return a match if the signature is detected.
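    A minimal sketch of that grep idea, assuming GNU grep on Linux: with -o, -b, -a and -F, grep prints the byte offset of each match instead of only saying that the file matches. The find_sig helper and the demo file created with mktemp are hypothetical, made up for illustration.

```shell
#!/bin/sh
# find_sig (hypothetical helper): print the 0-based byte offset of every
# occurrence of a literal signature $1 in file $2.
#   LC_ALL=C  treat the input as raw bytes, not UTF-8 text
#   -a        search binary data as if it were text (no "Binary file matches")
#   -o -b     print only the match, prefixed with its byte offset in the file
#   -F        the signature is a fixed string, not a regular expression
find_sig() {
    LC_ALL=C grep -obaF -- "$1" "$2" | cut -d: -f1
}

# Demo input: 23 bytes standing in for a PNG, followed by the start of an
# MP4 (a 4-byte box size, then the box type "ftyp").
demo=$(mktemp)
printf '\211PNG\r\n\032\nfake image data' > "$demo"
printf '\000\000\000\030ftypisom' >> "$demo"

find_sig ftyp "$demo"    # prints 27
```

    In an MP4 the 4-byte box size comes just before 'ftyp', so the embedded file would start 4 bytes before the reported offset (at 23 in this demo).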

    I found some lists of signatures on Google:

    http://www.garykessler.net/library/file_sigs.html
    http://en.wikipedia.org/wiki/List_of_file_signatures

    Is there a better or easier way?
    Can this be done with the file command?
    Are there any considerations I should keep in mind when using grep or file, such as binary and ASCII data within the same file?
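    One way to look for several signatures at once, sketched under the assumption of GNU grep and a POSIX shell; the comments list the binary-versus-text pitfalls that matter here. The scan_sigs name and its five-entry signature table are illustrative only, not a complete list.

```shell
#!/bin/sh
# scan_sigs (hypothetical helper): print "offset type" for every hit of a few
# well-known signatures in file $1, sorted by offset.
# Things to keep in mind when grepping binary data:
#  - run grep in the C locale so bytes >= 0x80 are not read as broken UTF-8;
#  - use -a, otherwise grep only prints "Binary file ... matches";
#  - grep works line by line, so a pattern cannot span a newline byte; only
#    the bytes before the first \n of a signature are used (e.g. \211PNG);
#  - shell variables cannot hold NUL bytes, so signatures containing \000
#    cannot be passed to grep this way.
scan_sigs() {
    # Each entry is "label:signature", the signature written as a printf
    # format string (octal escapes, and %% for a literal %).
    for entry in 'png:\211PNG' 'jpeg:\377\330\377' 'pdf:%%PDF-' \
                 'zip:PK\003\004' 'mp4:ftyp'; do
        label=${entry%%:*}
        sig=$(printf "${entry#*:}")
        LC_ALL=C grep -obaF -- "$sig" "$1" | while IFS=: read -r off _; do
            echo "$off $label"
        done
    done | sort -n
}

if [ $# -gt 0 ]; then
    scan_sigs "$1"
fi
```

    Saved as a script and run with a file name (for example sh scan_sigs.sh merged.corruption), it prints one line per hit. Remember that an mp4 hit marks 'ftyp', which sits 4 bytes after the start of the file.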
    User: To friend or not to friend--that is the question:
    Whether 'tis nobler to take an arrow to the knee or to suffer
    the slights and ads of outrageous fortune
    Or to take arms against a sea of trolls, and by opposing feed them. www.evicsis.com

  2. #2
    Join Date
    Sep 2011
    Beans
    1,531

    Re: Parsing data for file signatures

    You could look at Scalpel to carve the file into pieces. Perhaps you could automate it by comparing the carved files to known signatures and reporting any that don't match the original file type.
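    To see what carving means at its simplest, here is a hand-rolled sketch (not Scalpel itself): once an offset is known, everything from that byte onward is copied into a new file, which can then be checked with the file command to see whether it matches the type the signature suggested. The carve_from helper and the demo data are hypothetical.

```shell
#!/bin/sh
# carve_from (hypothetical helper): copy everything from a byte offset to the
# end of a file. Scalpel finds start and end using header/footer signatures
# and a maximum size; this sketch only covers the "data was appended" case.
carve_from() {
    # $1 = input file, $2 = 0-based start offset, $3 = output file.
    # tail -c +N starts output at byte N counting from 1, hence the +1.
    tail -c +"$(( $2 + 1 ))" "$1" > "$3"
}

# Demo: a fake 23-byte "PNG" followed by the start of an MP4 box.
demo=$(mktemp)
printf '\211PNG\r\n\032\nfake image data' > "$demo"
printf '\000\000\000\030ftypisom' >> "$demo"

# 'ftyp' sits at offset 27; the MP4 box (and file) starts 4 bytes earlier.
carved=$(mktemp)
carve_from "$demo" 23 "$carved"
```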

  3. #3
    Join Date
    Nov 2011
    Location
    /dev/root
    Beans
    7,174

    Re: Parsing data for file signatures

    PhotoRec is a tool to recover lost files. When the file system or partition table is destroyed but the data is still lying on the disk or USB flash drive, it can find and recover that data using the typical signatures of various file types. As the name indicates, it was originally designed to recover photo files (for example JPEG files), but it has since been developed into a general recovery tool.

    Maybe it can also help you look for 'files in files'.

  4. #4
    Join Date
    Oct 2009
    Location
    California
    Beans
    Hidden!
    Distro
    Ubuntu Studio 12.10 Quantal Quetzal

    Re: Parsing data for file signatures

    Scalpel looks like it is perfect for the job!

    Now I only need to figure out how to find and add specific signatures.
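    For adding signatures, a sketch assuming the usual scalpel.conf layout, where each uncommented line gives the extension, whether the header/footer match is case sensitive (y/n), the maximum number of bytes to carve, the header, and an optional footer, with bytes written as \xNN escapes. The entry below uses the 8-byte PNG file header and the fixed bytes that end every PNG (the "IEND" chunk type plus its CRC); the 20000000-byte limit is an arbitrary choice. The comments in the scalpel.conf installed with the package are the authority on the exact syntax.

```
# extension  case-sensitive  max-bytes  header                              footer
png          y               20000000   \x89\x50\x4e\x47\x0d\x0a\x1a\x0a    \x49\x45\x4e\x44\xae\x42\x60\x82
```

    It could then be run with something like scalpel -c mysigs.conf -o carved merged.corruption, where mysigs.conf and carved are hypothetical names.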

    Thank you, Ms. Daisy.

    sudodus, I'll be sure to take a look at PhotoRec.

  5. #5
    Join Date
    Sep 2011
    Beans
    1,531

    Re: Parsing data for file signatures

    Quote:
    The purpose is partly to learn how to manually look/script a way of detecting content where it doesn't belong (much like what malware does).
    For this purpose I'd recommend identifying known file types and also looking for calls to URLs. For example, a Word document that makes a call to http;//somebadsite.c0m/nasty.exe would be an interesting find, yet it may not contain any signature for an embedded file type. You might be able to script that by searching for http://*.* strings, but I'm not sure how you would eliminate false positives such as a document with a harmless link in the body.
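    A rough sketch of the URL search, assuming GNU grep: pull anything shaped like an http or https URL out of the raw bytes and de-duplicate it. The list_urls name is hypothetical, and deciding which hits are harmless is left open, as above.

```shell
#!/bin/sh
# list_urls (hypothetical helper): print the distinct http/https URLs that
# appear as plain ASCII anywhere in a (possibly binary) file $1.
# A URL is taken to run until the first whitespace, control byte, double
# quote or angle bracket. Harmless links are reported too: this narrows the
# search, it does not decide what is malicious.
list_urls() {
    LC_ALL=C grep -aoE 'https?://[^[:space:][:cntrl:]"<>]+' "$1" | LC_ALL=C sort -u
}
```

    Text stored as 16-bit characters (as older .doc files may do) or inside a compressed container (a .docx is a ZIP archive) will not show up this way; for the 16-bit little-endian case, GNU strings -e l can extract the text first.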

  6. #6
    Join Date
    Aug 2009
    Beans
    Hidden!

    Re: Parsing data for file signatures

    I suggest adding statistical analysis to the suggestions above. Looking for changes in entropy might come in handy when no signature is available, or as a second-opinion method.
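    A sketch of the entropy idea, assuming od and awk are available (both are standard on Ubuntu): split the file into fixed-size blocks and compute the Shannon entropy H = -sum(p_i * log2 p_i) of the byte frequencies p_i in each block. block_entropy and its default block size are illustrative assumptions, not a standard tool.

```shell
#!/bin/sh
# block_entropy (hypothetical helper): print "offset entropy" for each block
# of $2 bytes (default 4096, an arbitrary choice) of file $1. Entropy is the
# Shannon entropy of the byte values in the block, in bits per byte:
# 0 means one repeated byte, 8 means all 256 values occur equally often.
block_entropy() {
    # od -An -v -tu1: every byte as an unsigned decimal, no address column,
    # and no "*" abbreviation of repeated lines.
    od -An -v -tu1 "$1" | awk -v bs="${2:-4096}" '
        BEGIN { bs += 0 }                 # make sure bs compares as a number
        function flush(   b, p, h) {
            if (n == 0) return
            h = 0
            for (b in c) { p = c[b] / n; h -= p * log(p) / log(2) }
            printf "%d %.3f\n", start, h
            split("", c)                  # clear the byte counts
            start += n
            n = 0
        }
        { for (i = 1; i <= NF; i++) { c[$i]++; if (++n == bs) flush() } }
        END { flush() }'
}
```

    For example, block_entropy merged.corruption 512 prints one line per 512-byte block. Compressed or encrypted regions tend to sit near 8 while text and padding sit lower, so a sustained jump between neighbouring blocks is a hint, not proof, that the content changes there.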

  7. #7
    Join Date
    Oct 2009
    Location
    California
    Beans
    Hidden!
    Distro
    Ubuntu Studio 12.10 Quantal Quetzal

    Re: Parsing data for file signatures

    Looking for URLs is a good idea. It would be fairly simple to find them with the strings command, but it would need a list to determine whether a URL is a bad one. Maybe resolving the domain name and comparing the result to networks that are known to send out malware (or are statistically more likely to) would help identify potentially harmful addresses.
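    A sketch of the list-lookup half of that idea, in plain POSIX shell: extract the host name from each URL and check it against a local blocklist. The flag_urls name and the blocklist file are hypothetical; you would have to supply the list yourself.

```shell
#!/bin/sh
# flag_urls (hypothetical helper): read URLs on standard input and print the
# ones whose host name is listed in blocklist file $1 (one host per line).
# It does not resolve names or check network ranges, and it does not handle
# user@host forms.
flag_urls() {
    while IFS= read -r url; do
        host=${url#*://}          # drop the scheme
        host=${host%%[/:?#]*}     # cut at the first / : ? or #
        # -x whole-line match, -i ignore case, -F fixed string
        if [ -n "$host" ] && grep -qixF -- "$host" "$1"; then
            printf '%s\n' "$url"
        fi
    done
}
```

    Combined with strings, that might look like strings -a suspicious.doc | grep -oE 'https?://[^[:space:]"<>]+' | flag_urls bad-hosts.txt (file names hypothetical). For the resolving step, getent hosts NAME prints the addresses the system resolver returns for NAME; comparing those against known-bad network ranges is not covered here.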

    Unspawn, I will certainly also need to look into statistical analysis.
    I wonder whether similar types of malware have similar entropy values, and whether the entropy of normal files will show a significant difference.
    Sounds like a great experiment to try out!
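    For that experiment, a sketch that prints one whole-file entropy figure per file (bits per byte, 0 to 8), again assuming od and awk; file_entropy is a hypothetical name.

```shell
#!/bin/sh
# file_entropy (hypothetical helper): print "entropy filename" for each file
# named on the command line. One number per file is coarse: compressed
# media, archives and packed executables all score high, so treat it as one
# signal among several.
file_entropy() {
    for f in "$@"; do
        od -An -v -tu1 "$f" | awk -v name="$f" '
            { for (i = 1; i <= NF; i++) { c[$i]++; n++ } }
            END {
                h = 0
                for (b in c) { p = c[b] / n; h -= p * log(p) / log(2) }
                printf "%.3f %s\n", h, name
            }'
    done
}
```

    For example, file_entropy good/* samples/* | sort -n lists the files from lowest to highest entropy (directory names hypothetical). Expect compressed formats such as PNG, MP4 and ZIP to score high even when they are harmless.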
