
Thread: Could something like this be done or did I watch too many movies?

  1. #21
    Join Date
    Dec 2004
    Location
    Manchester
    Beans
    2,086
    Distro
    Ubuntu Mate 15.10 Wily Werewolf

    Re: Could something like this be done or did I watch too many movies?

    Quote Originally Posted by koenn View Post
    2- hash/fingerprint functions, maybe ?
    Theoretically, each file produces a unique hash.
    So if the server sends you a hash of the file you requested, all you have to do is recreate a file that, given the function, creates the same hash. You then know that the file you created is identical to the one on the server.

    Unfortunately, creating the correct file may be difficult.
    Hashes are not unique; it's just that hash collisions are rare (and for a good hash algorithm it is hard to find a collision).

    For example, suppose you take a 1 MB file (8 million bits) and a 256-bit hash. There are 2^8000000 possible 1 MB files (in normal notation that is a number with about 2.4 million digits, so it won't fit in this post). There are only 2^256 possible hash values (115792089237316195423570985008687907853269984665640564039457584007913129639936). So for each possible hash there must be many possible files.
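
    As a quick Python illustration (a toy sketch: tiny_hash is just SHA-256 truncated to one byte so collisions show up within a few hundred inputs; real hashes are 16-32 bytes, which is why real collisions are so rare):
    Code:
    import hashlib

    def tiny_hash(data):
        # truncate SHA-256 to a single byte: only 256 possible values,
        # so by the pigeonhole principle a collision must appear within
        # the first 257 distinct inputs
        return hashlib.sha256(data).digest()[0]

    seen = {}
    for i in range(1000):
        data = str(i).encode()
        h = tiny_hash(data)
        if h in seen:
            print("collision:", seen[h], "and", data, "both hash to", h)
            break
        seen[h] = data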

    For roughly the same reason, a lossless compression algorithm can't be guaranteed to compress every file.

    If you take the
    AAABB BBCCC CCDEE (15 "bits") -> 3A 4B 5C D 2E (9 "bits")
    example from above, you can see that some inputs (actually most) can't be compressed:
    ABDED CADEA DCAEC (15 "bits") -> 1A 1B 1D 1E 1D 1C 1A 1D 1E 1A 1D 1C 1A 1E 1C (30 "bits")

    Compression only works if there is redundancy in the input that can be squeezed out.
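
    A minimal run-length encoder in Python makes this concrete (rle_encode is just a sketch; it always writes a count, so it outputs 3A4B5C1D2E rather than the hand-written 3A 4B 5C D 2E above, and single characters genuinely get longer, exactly as in the second example):
    Code:
    from itertools import groupby

    def rle_encode(s):
        # each run of identical characters becomes "<count><char>"
        return "".join(f"{len(list(g))}{c}" for c, g in groupby(s))

    print(rle_encode("AAABBBBCCCCCDEE"))  # 3A4B5C1D2E (10 chars, shorter than 15)
    print(rle_encode("ABDEDCADEADCAEC"))  # 1A1B1D...  (30 chars, double the input)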

  2. #22
    Join Date
    Aug 2005
    Beans
    6,024

    Re: Could something like this be done or did I watch too many movies?

    I think you watched too many movies. You can't really create something out of nothing.

  3. #23
    Join Date
    Nov 2006
    Location
    Belgium
    Beans
    3,025
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Could something like this be done or did I watch too many movies?

    Quote Originally Posted by ssam View Post
    Hashes are not unique; it's just that hash collisions are rare (and for a good hash algorithm it is hard to find a collision).

    For example, suppose you take a 1 MB file (8 million bits) and a 256-bit hash. There are 2^8000000 possible 1 MB files (in normal notation that is a number with about 2.4 million digits, so it won't fit in this post). There are only 2^256 possible hash values (115792089237316195423570985008687907853269984665640564039457584007913129639936). So for each possible hash there must be many possible files.
    Well, I said "theoretically" - and even so, I don't know enough math to be conclusive. So yes, it's "lossy" in that the file you get is not necessarily identical to the file you requested.

    In any case, what you describe, although it's correct, is merely implementation. As you demonstrate, the collision probability will depend on the size of the hash in relation to the key space (how large the files are vs how large the hashes are). With large hashes and relatively small source files, you could get (near) zero collisions.

    Whether that would be at all practical, I don't know. Is there a reasonable chance that sufficiently large hashes would be so big that, overall, you don't gain anything? Probably, yes. Do I think it's likely you can save on transfer cost with something like "fuzzy rsync" or similar techniques? Again, yes.
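
    You can actually put numbers on this with Python's arbitrary-precision integers (purely a counting exercise, using ssam's 1 MB / 256-bit figures):
    Code:
    files = 2 ** 8_000_000      # distinct 1 MB (8 million bit) files
    hashes = 2 ** 256           # distinct 256-bit hash values
    per_hash = files // hashes  # average number of files sharing one hash
    # per_hash is still astronomically large: about 2.4 million decimal digits
    print(per_hash.bit_length())  # 7999745 bits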

  4. #24
    Join Date
    Sep 2005
    Location
    Dallas, Tx
    Beans
    266
    Distro
    Xubuntu 12.10 Quantal Quetzal

    Re: Could something like this be done or did I watch too many movies?

    This is basically how filetypes work.

  5. #25
    Join Date
    Jun 2007
    Location
    Paraparaumu, New Zealand
    Beans
    Hidden!

    Re: Could something like this be done or did I watch too many movies?

    Quote Originally Posted by koenn View Post
    The point of all compression is to look for repeating patterns, and reproduce them; and often to do so requires less information than the actual file.
    IMO it's not just repetition; redundancy can be exploited as well.

    Some of the compression algorithms I've briefly looked at go over my head, but one which is a bit easier to grasp is the case of a simple ASCII text file with no fancy formatting, UTF or anything like that. In such a scenario only 7 out of every 8 bits would typically be used, making it unnecessary to transmit the full 8 bits for every byte. Yes, I know, not a particularly massive saving in space and bandwidth in this example, but the idea of an algorithm optimised for a particular kind of data has been used elsewhere, e.g. JPEG.
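
    Here is a rough sketch of that 8-into-7 packing in Python (pack7 is a made-up name, it assumes strictly 7-bit ASCII input, and the unpacking step is left out for brevity):
    Code:
    def pack7(data: bytes) -> bytes:
        # concatenate the low 7 bits of each byte into one big integer,
        # then slice it back into 8-bit bytes: 8 characters fit in 7 bytes
        bits = 0
        for b in data:
            bits = (bits << 7) | (b & 0x7F)
        nbits = 7 * len(data)
        return bits.to_bytes((nbits + 7) // 8, "big")

    text = b"Hello, plain ASCII!"
    print(len(text), "->", len(pack7(text)))  # 19 -> 17 bytes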

  6. #26
    Join Date
    Apr 2008
    Location
    LOCATION=/dev/random
    Beans
    5,767
    Distro
    Ubuntu Development Release

    Re: Could something like this be done or did I watch too many movies?

    A similar process is used by the zsync file transfer program.

    I download the 13.04 testing images, which are updated every day. Using zsync, the latest version is checked against the version already on my machine, and only the pieces of the ISO file that are different from the ones I already have are downloaded.

    In the example below you can see that zsync realised I already had 91.4% of the file on my machine, so I only had to download 70MB to get the new ISO instead of the entire 800MB file.

    Code:
    rob@arch:~/ISOs/Linux/Ubuntu/Raring$ zsync http://cdimage.ubuntu.com/daily-live/current/raring-desktop-amd64.iso.zsync
    #################### 100.0% 323.8 kBps DONE     
    
    reading seed file raring-desktop-amd64.iso: ***************************************************************
    ***********************************************************************************************************
    ***********************************************************************************************************
    ***********************************************************************************************************
    ***********************************************************************************************************
    ***********************************************************************************************************
    ***********************************************************************************************************
    *************************************************************
    Read raring-desktop-amd64.iso. Target 91.4% complete.      
    downloading from http://cdimage.ubuntu.com/daily-live/current/raring-desktop-amd64.iso:
    #################### 100.0% 1645.9 kBps DONE     
    
    verifying download...checksum matches OK
    used 749846528 local, fetched 70852261
    rob@arch:~/ISOs/Linux/Ubuntu/Raring$
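
    Roughly, the idea underneath looks like the toy Python sketch below. This only compares blocks at aligned offsets; the real rsync/zsync algorithms use rolling checksums so matching data can be found even when it has shifted, and their block sizes and hash choices differ:
    Code:
    import hashlib

    BLOCK = 4096

    def block_hashes(data):
        # hash every fixed-size block of a file
        return [hashlib.md5(data[i:i + BLOCK]).digest()
                for i in range(0, len(data), BLOCK)]

    def delta_fetch(local, remote):
        # compare per-block hashes and "download" only the blocks
        # that differ (pretend 'remote' sits across the network)
        old, new = block_hashes(local), block_hashes(remote)
        rebuilt, fetched = bytearray(), 0
        for i, h in enumerate(new):
            chunk = remote[i * BLOCK:(i + 1) * BLOCK]
            if i < len(old) and old[i] == h:
                rebuilt += local[i * BLOCK:(i + 1) * BLOCK]  # reuse local block
            else:
                rebuilt += chunk                             # fetch changed block
                fetched += len(chunk)
        return bytes(rebuilt), fetched

    local = b"A" * 20000
    remote = b"A" * 8192 + b"B" * 4096 + b"A" * 7712
    out, fetched = delta_fetch(local, remote)
    print(out == remote, "fetched", fetched, "of", len(remote))  # True 4096 of 20000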

  7. #27
    Join Date
    Aug 2005
    Beans
    6,024

    Re: Could something like this be done or did I watch too many movies?

    I'm beginning to think people are missing what the OP proposes.

    He's not talking about compression; most files you download from the net are already heavily compressed.

    He seems to be talking about creating a file locally from nothing but something akin to an MD5 hash sent by the far end.

    I'm no mathematician, but this is not going to happen with normal algorithms unless we start delving into quantum teleportation.

  8. #28
    Join Date
    Dec 2004
    Location
    Manchester
    Beans
    2,086
    Distro
    Ubuntu Mate 15.10 Wily Werewolf

    Re: Could something like this be done or did I watch too many movies?

    Quote Originally Posted by koenn View Post
    Well, I said "theoretically" - and even so, I don't know enough math to be conclusive. So yes, it's "lossy" in that the file you get is not necessarily identical to the file you requested.

    In any case, what you describe, although it's correct, is merely implementation. As you demonstrate, the collision probability will depend on the size of the hash in relation to the key space (how large the files are vs how large the hashes are). With large hashes and relatively small source files, you could get (near) zero collisions.

    Whether that would be at all practical, I don't know. Is there a reasonable chance that sufficiently large hashes would be so big that, overall, you don't gain anything? Probably, yes. Do I think it's likely you can save on transfer cost with something like "fuzzy rsync" or similar techniques? Again, yes.
    For there to be enough possible hashes for all the possible files, the hash would need to be as big as the file.

    Most real files do have enough redundancy to allow compression. Many transfer protocols (HTTP, rsync, SSH) have options to enable lossless compression, and some mobile web browsers can connect to a proxy server that does lossy compression.
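
    For instance (illustrative invocations only; the paths and hosts are placeholders):
    Code:
    rsync -z source/ host:dest/                 # -z compresses data in transit
    ssh -C user@host                            # -C enables gzip compression
    curl --compressed http://example.com/file   # ask the server for gzip/deflate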

  9. #29
    Join Date
    Jan 2010
    Location
    Hyperborea
    Beans
    2,045
    Distro
    Ubuntu 16.04 Xenial Xerus

    Re: Could something like this be done or did I watch too many movies?

    So it's like this: don't send me a psychical file, send me a description of that file, and I will make it psychical here locally.
    You have already solved the problem! With your psychic files all you have to do is think of a file and it will immediately appear on your PC.
    Wow! I want a psychic computer too!

  10. #30
    Join Date
    Nov 2012
    Beans
    265
    Distro
    Ubuntu 13.04 Raring Ringtail

    Re: Could something like this be done or did I watch too many movies?

    I meant to write physical

    In any case, what I was asking originally is: can something be created out of nothing (or by borrowing parts of bits already on the computer), just by getting information about what that file looks like?

    Just like Cheesemill said, something similar to zsync, but unique in its own way. Zsync is the closest to this.
