Would you use a database file system?

**starcraft.man** · September 23rd, 2007

I don't see a need for a database file system. I keep all the folders I use often on my desktop/in places and none are more than 3 clicks away from me. I believe some people just need to be a bit better organized.

**happysmileman** · September 23rd, 2007

Originally Posted by Npl

Need searching for songs by artist, album, whatever: use a Player that uses a database. I dont need a filesystem that tries to accomplish everything.

Agreed, Amarok creates database of my music, that's all I need... If I want to look at a PDF file or anything I know exactly where they are in /home/paul/

**dyssident** · September 23rd, 2007

i think we can all agree that the way apps like amarok use metadata is elegant and generally cool. just imagine if such functionality was built into standard libraries so that all apps could take advantage of it seamlessly, instead of each building their own indexing/metadata reading stuff like amarok has.

imagine telling gedit to open all files that mention walruses or telling gimp to open all images containing walruses. for that matter, imagine telling an command line app to do the same thing. unless you already had your filesystem organized this way, it would be time consuming to do, likely requiring some advanced 'find' knowledge.

im not a fan of tagging. its really a substitute for effective artificial intelligence. plus, metadata like tags doesnt travel well. few file formats, most notably many commonly used image types, dont have an equivalent of an ID3 tag.

files already tell you alot about themselves with their content. my hope is that one fine day well be able to use machine learning and image recognition to identify the main ideas of text, music and images so that tagging wont be necessary.

**thisllub** · September 23rd, 2007

None of this discussion focuses on the issue of efficiency.

Consider a system that requires storage of > 1 million documents. Databases handle this easily through B-Tree indexes that exponentially reduce filesystem accesses. File systems aren't anywhere near as efficient.

Have you ever tried "mv thesefiles/* thosefiles/." with thousands of files? How about files within a date range?
You can do it with find and xargs but it is clumsy.

A command "move thesefiles/* where dateaccessed between last monday and today to thosefiles" would be a HUGE improvement.

A database file system that had efficient large binary file storage combined with SQL access and compatibility with the current hierarchical structure and commands would be a huge leap forward.

**Npl** · September 23rd, 2007

Originally Posted by thisllub

None of this discussion focuses on the issue of efficiency.

I already said its inefficient

Originally Posted by thisllub

Consider a system that requires storage of > 1 million documents. Databases handle this easily through B-Tree indexes that exponentially reduce filesystem accesses. File systems aren't anywhere near as efficient.

You conviniently forgot to tell that those structures only contain a single property - be it filename, date, accessrights, etc..
you need 1 tree for EACH property. (Add another index for full-textsearch if you consider it part of a DBFS)

Originally Posted by thisllub

Have you ever tried "mv thesefiles/* thosefiles/." with thousands of files? How about files within a date range?
You can do it with find and xargs but it is clumsy.

A command "move thesefiles/* where dateaccessed between last monday and today to thosefiles" would be a HUGE improvement.

Dont forget you have a DB for the WHOLE partition, so you dont have to search 1000s of files which dates match, but all files on the partition (10000s? 100000s?). Then you also have to do a second search which path begins with "*currentdir*/thesefiles/" . And then take those files which were found in both searches.

Originally Posted by thisllub

A database file system that had efficient large binary file storage combined with SQL access and compatibility with the current hierarchical structure and commands would be a huge leap forward.

There are ALOT of operations that will be way slower.
Browsing directors wil require DB-Search instead just grabbing a few lists, writing files will need updating alot of your B-Trees (not just one), those Btrees also need to be stored somewhere too (both on HDD and - if you want any useable performance - cached in Memory)...

Why would you want to force a damn-clumsy-in-many-respects DB upon everything? Im not arguing that there are also many cases where a DB is valuable, but in those cases, just teach the program to use a DB-library. Or in your case it would be thinkable to bind a DB to a directory you regulary need to do complex searching (but then you could just store/load directly into a Database as well).

In short - replacing filesystem is bad, augmenting where needed good.

**thisllub** · September 23rd, 2007

Originally Posted by Npl

Dont forget you have a DB for the WHOLE partition, so you dont have to search 1000s of files which dates match, but all files on the partition (10000s? 100000s?). Then you also have to do a second search which path begins with "*currentdir*/thesefiles/" . And then take those files which were found in both searches.

That is not how an indexed search works.

Originally Posted by Npl

There are ALOT of operations that will be way slower.
Browsing directors wil require DB-Search instead just grabbing a few lists, writing files will need updating alot of your B-Trees (not just one), those Btrees also need to be stored somewhere too (both on HDD and - if you want any useable performance - cached in Memory)...

Some of that is true but compared to a journaling filesystem the difference in writes would not be that great. For individual document retrieval the performance would be greater where there are large numbers of files.

Originally Posted by Npl

Why would you want to force a damn-clumsy-in-many-respects DB upon everything? Im not arguing that there are also many cases where a DB is valuable, but in those cases, just teach the program to use a DB-library. Or in your case it would be thinkable to bind a DB to a directory you regulary need to do complex searching (but then you could just store/load directly into a Database as well).

In short - replacing filesystem is bad, augmenting where needed good.

I have to disagree.
I see no need to replace filesystems on partitions that are used for what the average user does i.e. a standard install, but once the numbers of files gets into the thousands, the standard file management tools struggle to cope.

Applications like complex websites would be far easier to build, maintain and backup.

**happysmileman** · September 23rd, 2007

Originally Posted by dyssident

i think we can all agree that the way apps like amarok use metadata is elegant and generally cool. just imagine if such functionality was built into standard libraries so that all apps could take advantage of it seamlessly, instead of each building their own indexing/metadata reading stuff like amarok has.

Agreed... Amarok should index our music... Kaffeine should index all your video... KWord should index all your ODF's etc... (Yes I'm just looking through KMenu for examples)

Having entire FS databased is stupid... No-one wants to database all the crap in /bin and /libs that they'll never directly use (well they'll use some stuff in /bin)

**Npl** · September 24th, 2007

Originally Posted by thisllub

That is not how an indexed search works.

I worded it wrong, you are searching through file-descriptors, is that what you mean?. But tell me how you intend to search for a range of dates with one global index.

Originally Posted by thisllub

Some of that is true but compared to a journaling filesystem the difference in writes would not be that great. For individual document retrieval the performance would be greater where there are large numbers of files.

and why would you think so? In an hierarchical filesystem you can arrange the filedescriptors in an sorted list or BTree aswell, giving it an edge if all required files are only in a couple directories.

Originally Posted by thisllub

I have to disagree.
I see no need to replace filesystems on partitions that are used for what the average user does i.e. a standard install, but once the numbers of files gets into the thousands, the standard file management tools struggle to cope.

Depends highly on what you want to do with it. Other than having complex searches on a big number of files I dont see the point of it.

**Sluipvoet** · September 24th, 2007

I keep my /home folder tidy and organised with hierarchial folders and subfolders. I know exactly where to look for something.
I just can't image doing the same with a DBFS.

PS. This labels vs. folders thing kinda reminds me of the epiphany bookmarking system. And that Epiphany bookmarking system made me switch from GNOME to KDE.