
[all variants] It's 2009. Why are we still guessing file types?



bobince
August 3rd, 2009, 09:46 PM
Seriously. Content-sniffing causes no end of problems with files appearing as the wrong type. The case programmers notice most is misdetection of programming languages based on keywords (and, most commonly of all, template files such as JSP being misdetected as HTML because they contain some tags), but setting file type by sniffing content is utterly bogus on every level, as well as being slow.

Yes, I know Windows-style reliance on file extension is also bad, but at least it's bad in a consistent way everyone understands. You don't find a text file you download mysteriously mutating into a .desktop file or a GIF or something else random when it arrives just because it has something near the beginning that can be construed as a magic word. GNOME and KDE are using both file extension and file contents as type markers, ensuring we get the worst of both worlds.
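To illustrate the "magic word" failure, here is a deliberately naive sniffer in Python. The prefix table is a hypothetical, much-simplified stand-in for a real magic database such as shared-mime-info's (which also uses priorities and offsets), but the failure mode is the same: a plain-text file that happens to start with a magic string gets reclassified.

```python
# A deliberately naive content sniffer, for illustration only: it reports
# the type of the first "magic" byte prefix the data starts with.
MAGIC_PREFIXES = {
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"%PDF-": "application/pdf",
    b"[Desktop Entry]": "application/x-desktop",
}


def sniff(data: bytes, default: str = "text/plain") -> str:
    """Return the MIME type implied by the first matching magic prefix."""
    for prefix, mime in MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return mime
    return default


if __name__ == "__main__":
    # A plain-text note that happens to begin with a magic string.
    note = b"GIF89a is the animated variant; GIF87a is the original.\n"
    print(sniff(note))  # image/gif, although the file is really text
    print(sniff(b"[Desktop Entry] is the first line of a .desktop file\n"))
```

Nothing in the bytes says "this is a text file about GIFs", so any sniffer that trusts a leading prefix will get this wrong.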

There is a proper fix for this, mentioned at freedesktop.org: storing a media type string in the extended file attribute (xattr) ‘user.mime_type’. Why are we not using this, and when can we start? All current Linux filesystems support xattrs, but GNOME gvfs*, and as far as I can tell KDE too, are paying no attention to the xattr even if you set it manually from the command line.

(*: at least under Ubuntu; GNOME claims to support it but it doesn't work for me at all.)
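A minimal sketch of the lookup order argued for above, assuming Linux and Python 3's `os.setxattr`/`os.getxattr`: read `user.mime_type` first and only fall back to guessing from the file name when the attribute is absent or unsupported. The helper names `set_mime_type` and `get_mime_type` are hypothetical; this is not how gvfs or KIO actually resolve types.

```python
import mimetypes
import os

# Attribute name mentioned at freedesktop.org for a file's media type.
MIME_XATTR = "user.mime_type"


def set_mime_type(path: str, mime: str) -> None:
    """Record the file's media type explicitly in an extended attribute."""
    os.setxattr(path, MIME_XATTR, mime.encode("ascii"))


def get_mime_type(path: str) -> str:
    """Prefer the explicit xattr; fall back to guessing from the file name."""
    try:
        return os.getxattr(path, MIME_XATTR).decode("ascii")
    except OSError:
        # No attribute set, or the filesystem lacks user xattr support.
        pass
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"


if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile(suffix=".txt") as f:
        print(get_mime_type(f.name))  # guessed from the .txt extension
        try:
            set_mime_type(f.name, "text/markdown")
            print(get_mime_type(f.name))  # the explicit xattr now wins
        except OSError as err:
            print("user xattrs unavailable here:", err)
```

The extension is only consulted when no one has stated the type; the file contents are never consulted at all.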

I admit it will take a long time to persuade applications to write their files' mime_type xattrs as a matter of course, but until gvfs and KIO at least recognise the mime_type xattr, and Nautilus and Dolphin offer a user interface to set it explicitly, we'll be stuck with the current scheme of “look at the filename and guess; look at the file contents and guess again”, which is utterly unsatisfactory.

asmoore82
August 3rd, 2009, 10:03 PM
> Yes, I know Windows-style reliance on file extension is also bad, but at least it's bad in a consistent way everyone understands.

You have grossly over-estimated the populace.

People are _still_ clueless about file extensions, but what else can you expect when Windows and Mac _still_ hide them by default?

It's a damn shame ... I work for the school system, and the standard 8th grade technology test has a question like:

Which of the following files is most likely a picture?
A. 01.TXT
B. 01.ZIP
C. 01.JPG
D. 01.EXE

... and the students have been using computers with the extensions hidden ...

Chowderpilot
August 4th, 2009, 12:58 AM
That's a valid point. I would also submit that the inability (or at least the difficulty) of arbitrarily executing code on Linux would help keep it safe even if file extensions were standardised. While I haven't had any real issue with file extensions, I can see how they might be a barrier to entry into Linux for some folks.