 |
06-20-2008
|
#1 (permalink)
| | Just Joined!
Join Date: Jun 2008
Posts: 2
| filename indexing in linux
Greetings (first time here!),
I was wondering how Linux filename access works. I mean, performance-wise, why don't BerkeleyDB, SQL databases, etc. just create a file for every record?
For example, if you're indexing something by username, why not just create a file for every username? I'm guessing there's a reason, e.g. file systems don't index as efficiently as SQL databases, but I'm not exactly sure, and I don't know the jargon to search for it on Google.
So yeah, does anyone know?
Thanks in advance |
| |
06-20-2008
|
#2 (permalink)
| | Linux Engineer
Join Date: Nov 2007 Location: Córdoba (Spain)
Posts: 1,016
| It's an old debate. Databases and filesystems have some aspects in common at the conceptual level, but they are quite different in some respects. There have also been some attempts to *merge* both paradigms, like:
Slashdot | How To Implement A Database Oriented File System
https://jdbfs.dev.java.net/
Database filesystem
All of these are quite different in purpose and philosophy. Also well known nowadays are the virtual filesystems that some desktops use to abstract the underlying real filesystems and hardware devices. For example, I know that KDE (4.x) has something in that regard involving CLucene that can use database backends for some purposes, though I must admit that I am particularly ignorant about these things.
But to answer your question (as far as I can), here are some random things that come to mind about traditional databases vs. (equally traditional) filesystems:
- searches on db's are damn quick compared to ANY fs
- db's allow you to attach an arbitrary amount of metadata to any element, while fs's are usually more limited in this regard
- any app can access a file on your filesystem, while retrieving data from a database needs specific support
- indexing lots of files is slow at filesystem level, while indexing elements in a database is easier, for a simple reason: fs's are tied to a layout that is, partly, on-disk, while db's operate at a higher level and don't care about the disk layout
- db's allow simpler backups and integrated security maintenance
- large db's need space to scale. Keep in mind that if you put EVERYTHING, including images and videos, directly into your db, the db will grow and the RAM requirements can grow very quickly. This is why really big projects usually pick a hybrid solution: if the files are really big, they save only pointers to files in the filesystem instead of storing the whole files inside the database
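The hybrid approach from the last point can be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's built-in `sqlite3` module as the database, and the `media` table and the `store_media` / `load_media` helpers are hypothetical names made up for the example, not anything from a particular project.

```python
import sqlite3
import tempfile
from pathlib import Path


def store_media(conn, media_dir, name, payload):
    """Write the payload to a file and record only its path and size in the db."""
    path = Path(media_dir) / name
    path.write_bytes(payload)
    conn.execute(
        "INSERT INTO media (name, path, size) VALUES (?, ?, ?)",
        (name, str(path), len(payload)),
    )
    conn.commit()
    return path


def load_media(conn, name):
    """Look up the pointer in the db, then read the bytes from the filesystem."""
    row = conn.execute("SELECT path FROM media WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    return Path(row[0]).read_bytes()


def demo():
    """Store one 4 KiB blob on disk, find it again through the db, return its length."""
    with tempfile.TemporaryDirectory() as media_dir:
        conn = sqlite3.connect(":memory:")
        # Hypothetical schema: the db holds metadata and a pointer, not the blob itself.
        conn.execute(
            "CREATE TABLE media (name TEXT PRIMARY KEY, path TEXT NOT NULL, size INTEGER NOT NULL)"
        )
        store_media(conn, media_dir, "clip.bin", b"\x00" * 4096)
        data = load_media(conn, "clip.bin")
        conn.close()
        return len(data)


if __name__ == "__main__":
    print(demo())
```

The database stays small because each row is just a name, a path and a size, while the bulky bytes live in ordinary files that any other program can still open.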
What suits better in general? Impossible to say, it really depends on the nature of the project and the developer.
Cheers. |
| |
06-20-2008
|
#3 (permalink)
| | Just Joined!
Join Date: Jun 2008
Posts: 2
| Thanks for the swift reply! Good to know there has been plenty of discussion on this before; now there's material to look at. Quote: |
searches on db's are damn quick compared to ANY fs
| With that, do you mean searches on fields? I guess you have to have indexes, and once that happens it becomes a db eh
I was curious how dbs would fare against fs, if all you're going to be searching for is the record name. For example, having all the records named "john.record, mary.record" etc, and never searching for the fields inside, but rather only straight file access by filename..
Would that be faster than having a db, with a field for the name ('john', 'mary') and indexing that, then using a SELECT clause to access them? Quote: |
indexing lots of files is slow at filesystem level,
| I don't quite understand.. care to elaborate?
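The comparison asked about above (one file per record, opened by name, versus an indexed name column queried with SELECT) is easy enough to try out. Here is a minimal sketch, assuming Python with its built-in `sqlite3` module standing in for "a db"; the `records` table, the helper names and the record counts are all made up for the experiment. The timings it returns depend heavily on the filesystem, the cache state and the number of records, so treat them as something to measure yourself rather than as an answer.

```python
import sqlite3
import tempfile
import time
from pathlib import Path


def build_stores(root, names):
    """Create one '<name>.record' file per name and an equivalent SQLite table."""
    records = Path(root) / "records"
    records.mkdir()
    for name in names:
        (records / f"{name}.record").write_text(f"data for {name}")
    conn = sqlite3.connect(str(Path(root) / "records.db"))
    # PRIMARY KEY on name makes SQLite maintain an index on that column.
    conn.execute("CREATE TABLE records (name TEXT PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)", [(n, f"data for {n}") for n in names]
    )
    conn.commit()
    return records, conn


def read_from_fs(records, name):
    """Straight file access by filename, e.g. john.record."""
    return (records / f"{name}.record").read_text()


def read_from_db(conn, name):
    """Indexed lookup on the name column."""
    row = conn.execute("SELECT body FROM records WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


def compare(count=2000, lookups=500):
    """Return (results_match, seconds_for_fs_lookups, seconds_for_db_lookups)."""
    names = [f"user{i}" for i in range(count)]
    with tempfile.TemporaryDirectory() as root:
        records, conn = build_stores(root, names)
        wanted = names[:: max(1, count // lookups)]
        start = time.perf_counter()
        fs_results = [read_from_fs(records, n) for n in wanted]
        fs_time = time.perf_counter() - start
        start = time.perf_counter()
        db_results = [read_from_db(conn, n) for n in wanted]
        db_time = time.perf_counter() - start
        conn.close()
    return fs_results == db_results, fs_time, db_time


if __name__ == "__main__":
    same, fs_time, db_time = compare()
    print(f"same results: {same}, fs: {fs_time:.4f}s, db: {db_time:.4f}s")
```

Running it right after building the stores mostly measures warm caches; a fairer test would use many more records and repeat the lookups after a reboot or with the caches dropped.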
Thanks! |
| |
06-20-2008
|
#4 (permalink)
| | Linux Enthusiast
Join Date: Mar 2007
Posts: 523
| I dunno, I should probably stay out of this. But anyway, compare the performance of the find command to the updatedb & slocate combo. You'll find updatedb takes about as long as find / -name '*foo' -print. But after that, slocate is instant, whereas find will have to work its way through the entire system again (at least after a reboot; I'm disregarding the intricacies of memory management on running systems).
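The difference between the two approaches can be sketched in a few lines. This is a toy model in Python and an assumption on my part, not how updatedb or slocate are actually implemented: the real tools keep their database on disk and match against whole paths, while this sketch keeps the index in memory and matches filename suffixes only. All function names are invented for the example.

```python
import os
import tempfile
from collections import defaultdict
from pathlib import Path


def search_by_walking(root, suffix):
    """Like find: visit every directory again on each search."""
    return sorted(
        os.path.join(dirpath, name)
        for dirpath, _dirs, files in os.walk(root)
        for name in files
        if name.endswith(suffix)
    )


def build_index(root):
    """Like updatedb: one full walk, remembering where every filename lives."""
    index = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            index[name].append(os.path.join(dirpath, name))
    return index


def search_index(index, suffix):
    """Like slocate: answer from the saved index without touching the directory tree."""
    return sorted(
        path
        for name, paths in index.items()
        if name.endswith(suffix)
        for path in paths
    )


def demo():
    """Build a tiny tree, search it both ways, return both result lists (relative paths)."""
    with tempfile.TemporaryDirectory() as root:
        for sub, name in [("a", "xfoo"), ("a/b", "yfoo"), ("c", "bar")]:
            directory = Path(root) / sub
            directory.mkdir(parents=True, exist_ok=True)
            (directory / name).write_text("")
        index = build_index(root)
        walked = search_by_walking(root, "foo")
        indexed = search_index(index, "foo")
        return (
            [os.path.relpath(p, root) for p in walked],
            [os.path.relpath(p, root) for p in indexed],
        )


if __name__ == "__main__":
    print(demo())
```

Building the index costs one full walk, the same work as a single search by walking, but every later lookup only touches the index. The price is that the index goes stale as soon as files are added or removed, which is why updatedb has to be rerun periodically.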
__________________
Can't tell an OS by it's GUI
|
| |
06-20-2008
|
#5 (permalink)
| | Linux Enthusiast
Join Date: Apr 2006 Location: Saint Paul, MN, USA / CentOS, Solaris, SuSE, Xandros
Posts: 675
| Hi.
I have had only a small exposure to MUMPS. If you like odd languages and systems, you may be interested in it. Quote: |
The most outstanding, and unusual, design feature of MUMPS is that database interaction is transparently built into the language. The MUMPS language assumes the presence of a MUMPS hierarchical database, which is implicitly "opened" for every application.
| More at MUMPS - Wikipedia, the free encyclopedia for an overview and a Google search for other pages ... cheers, drl
__________________ Welcome - get the most out of the forum by reading forum basics and guidelines: click here.
90% of questions can be answered by using man pages, Quick Search, Advanced Search, Google search, Wikipedia.
We look forward to helping you with the challenge of the other 10%.
( Mn, 2.6.n, AMD-64 3000+, ASUS A8V Deluxe, 1 GB, SATA + IDE, Matrox G400 AGP )
Last edited by drl; 06-20-2008 at 03:04 PM.
Reason: Clarify.
|
| |
06-20-2008
|
#6 (permalink)
| | Linux Engineer
Join Date: Nov 2007 Location: Córdoba (Spain)
Posts: 1,016
| Quote:
Originally Posted by underthesun
With that, do you mean searches on fields? I guess you have to have indexes, and once that happens it becomes a db eh
| Well, it's not just the indexes. Most of the db vs. fs debate is just a question of semantics from my point of view, but the main technical difference is that fs's are dependent on the disk layout, at a physical and logical level, while databases don't have to worry about that stuff. Filesystems need to understand clusters, sectors and inodes, deal with fragmentation and the like; db's just store records in one big open space. But at some point both have to interact, so it's not a simple question.
I don't have deep knowledge of any of these fields, and most of what I know is just theory from my college days, so I will not go into deeper stuff because I would surely misguide you. Having given some pointers in my other post, I will leave it to you to search for more low-level info if you are interested. Or wait until a more knowledgeable person answers, which is also possible. Quote: |
I was curious how dbs would fare against fs, if all you're going to be searching for is the record name. For example, having all the records named "john.record, mary.record" etc, and never searching for the fields inside, but rather only straight file access by filename..
| Well, my guess here is that all this info is cached when you open a database, so even file access might be a bit faster when done within the range of the files pertaining to a given db. But just for that: caching or whatever. However, it's just a guess. Don't trust it. Quote: |
Would that be faster than having a db, with a field for the name ('john', 'mary') and indexing that, then using a SELECT clause to access them?
| No idea. Quote: Quote: |
indexing lots of files is slow at filesystem level,
| I don't quite understand.. care to elaborate?
| Well, maybe I did not word it correctly. Sorry, I am not a native English speaker. I meant that even if you maintain some kind of indexing technique, let's say in a C program, you are still bound to the intrinsic "problems" of the filesystem. It doesn't matter how fast your program is, because accessing files in a filesystem is slow. By contrast, accessing records in a database is usually faster because everything is done in RAM and then dumped to disk; besides that, the two operation models are quite different. As I said above, I don't have the technical knowledge to go further without risking inaccuracies (if I haven't made some already :P ).
All in all, I hope it helped somehow. Cheers.  |
| | |
| |