Beagle - Desktop Searching with Linux
A user's personal file space is sacred, and generally reflects that person. Many times I wake up in the morning and cannot find my shoes. Unfortunately, the first solution that pops into my head is Google.
Luckily for most Windows users, Google has released 'Google Desktop Search', a utility that indexes a hard drive and brings Google-like search to personal documents. Unfortunately, Google seems to have forgotten its Linux heritage (the Google crawler, indexing, and search engines run on a heavily modified version of Linux): no such application has been ported to Linux.
Consequently, only months after Google's beta release of GDS, other companies scrambled to provide similar solutions. (I wish to note that Google DID NOT invent the desktop search; they just noticed a potential market and entered it with one of the best offerings.) The new Spotlight feature for OS X and the promise of a better search engine for Windows have made a powerful desktop search the 'must have' of next-generation desktops.
My home folder at home is an incredible mess of MP3s, essays, and source code. Throw in my Maildir, and it gets even worse. Once every month or so I attempt to make sense of it all, sorting so thoroughly that files often end up buried 5 or 6 layers deep. This lasts for 2-3 days before I start to get lazy again and tell myself that I'll just extract the tar here now and move it later. Soon I have accumulated several hundred files in home again, and the process repeats. (The only thing remotely organized is my Music, all in one directory, managed by RhythmBox...) Like some horrible RPM-induced circular dependency (sorry, Debian fan, that was a low blow), I seem doomed to repeat the process over and over.
The sales pitch for a desktop search is as follows: Don't organize your stuff. That's right, just leave it there. Why worry about complex directory structures that you find far too complicated or time-consuming to maintain? With a truly comprehensive desktop search, you just need to vaguely remember what you're looking for and you're in good shape.
Unfortunately, all my reading had yet to turn up a variant with a Linux port. My first experiment was with the find and locate (slocate in my case) commands. While locate was fast, I was stuck with filenames alone, nothing related to content. Find struggled with speed and was only really useful for plain text files; documents with markup (think OO.org) were difficult. The results were also simply every file that contained the specified string, with no ranking by relevance.
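To make that limitation concrete, here is a minimal Python sketch of the brute-force approach that find plus a string match amounts to. The function name naive_content_search and the sample file names are hypothetical, made up for illustration; this is not how find or locate are implemented. It rereads every file on each query and returns matches in directory-walk order, with no notion of relevance.

```python
import os
import tempfile


def naive_content_search(root, needle):
    """Return paths under root whose text contains needle, in walk order (unranked)."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # Treat everything as text; binary or marked-up formats
                # will mostly yield noise, which is exactly the problem.
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    if needle in handle.read():
                        matches.append(path)
            except OSError:
                continue  # unreadable file, skip it
    return matches


if __name__ == "__main__":
    # Hypothetical sample data in a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "essay.txt"), "w") as handle:
            handle.write("Notes about the wedding")
        with open(os.path.join(root, "todo.txt"), "w") as handle:
            handle.write("Find my shoes")
        print(naive_content_search(root, "wedding"))
```

Every query costs a full pass over the disk, which is why this approach gets slow on a cluttered home directory.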
A long and painful series of Google searches finally led me to my savior, and to what this article is actually about: a desktop search for Linux called Beagle. It is part of the Gnome desktop and (a little surprisingly) written in C# using Mono. Beagle is the result of a massive redirect of the Dashboard project (no, not the Apple Dashboard, the Gnome one). Dashboard was a program intended to index a user's personal files and display pertinent information as the user went about basic desktop tasks. Beagle came from the realization that a strong and independent backend was the ideal way to handle this. Beagle is really just an indexing daemon, and the application 'Best' is the search front end. Regardless, Beagle proves that Linux can remain competitive and innovative in the desktop market.
Beagle is still in an alpha stage, with the latest snapshot being version 0.12. Packages exist for both SuSE and Ubuntu (along with others), but no one on the development team would be quick to recommend it for large-scale deployment yet. For home or personal use, however, it fits the bill quite nicely.
The initial setup is the kind that all of the world's anti-Linux zealots love to cite: a long dependency list, kernel compiles, and modification of all sorts of system configuration. Luckily, the build process is well documented and easy to follow, and much of it is copy and paste. For Beagle to perform at its best, it requires both an inotify-enabled kernel and extended attributes on the target file system. The current upstream kernel includes both of these capabilities by default (extended attributes for ext3, the most common Linux fs), so we can expect the install to get considerably easier in the next several months.
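If you are unsure whether your target file system accepts extended attributes, a quick probe like the following Python sketch can tell you. This is my own illustration, not part of Beagle or its documentation: the function name supports_user_xattrs and the attribute name user.xattr_probe are made up, and it relies on the Linux-only os.setxattr and os.getxattr calls in Python's standard library.

```python
import os
import tempfile


def supports_user_xattrs(directory):
    """Return True if a file created in directory accepts a user.* attribute."""
    if not hasattr(os, "setxattr"):
        return False  # platform without the Linux xattr calls
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        # "user.xattr_probe" is a hypothetical name used only for this check.
        os.setxattr(path, "user.xattr_probe", b"1")
        return os.getxattr(path, "user.xattr_probe") == b"1"
    except OSError:
        # Typically raised when the file system does not support user xattrs.
        return False
    finally:
        os.remove(path)  # always clean up the probe file


if __name__ == "__main__":
    home = os.path.expanduser("~")
    print(home, supports_user_xattrs(home))
```

A False result for your home directory suggests the file system needs to be mounted with extended attribute support before installing.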
Once installed, the Beagle daemon runs as an unprivileged user, and will actually refuse to run as root. The home directory is indexed by default, but extra directories can be added (I included /var/www and /var/mail/kevin). Beagle quickly starts the indexing process. It can index a wide assortment of files, including OO.org documents, MS Word documents, AbiWord documents, music files (metadata), images (metadata), man pages, virtually any text file (including source files, DocBook, RTF and HTML), PDF, Gaim logs, Evolution mail and address books, the Tomboy note system, and several RSS aggregators. In short, almost all the data that a desktop user accesses regularly. The initial indexing ran overnight and was light on system resources. (A user can also force faster indexing by turning the 'nice' mode off.)
A screenshot of Beagle in action!
What made Beagle impressive was not only the speed and precision of searches (generally quite high, once I removed the copy of the OpenOffice source I had forgotten about) but also its integration with other programs and the Gnome desktop. Snippets of matching files are shown Google style, and logged conversations also show the status of the conversation's participants. A plug-in for Firefox indexes web pages as they are viewed, creating a massive web history index (which can prove very handy).
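For readers curious why an indexing daemon answers so much faster than scanning files on every query, here is a toy inverted index in Python with simple ranking and snippets. It is a generic sketch under my own assumptions, not Beagle's actual engine: the class TinyIndex, its methods, the term-count scoring, and the sample documents are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


class TinyIndex:
    """Toy inverted index; add each document once, then query many times."""

    def __init__(self):
        self.postings = defaultdict(Counter)  # term -> {document name: occurrences}
        self.texts = {}                       # document name -> original text

    def add(self, name, text):
        self.texts[name] = text
        for term in tokenize(text):
            self.postings[term][name] += 1

    def search(self, query):
        """Return documents containing every query term, best match first."""
        terms = tokenize(query)
        if not terms:
            return []
        candidates = None
        for term in terms:
            docs = set(self.postings.get(term, ()))
            candidates = docs if candidates is None else candidates & docs
        # Score by total occurrences of the query terms (a deliberately crude rank).
        scores = {doc: sum(self.postings[t][doc] for t in terms) for doc in candidates}
        return sorted(scores, key=lambda doc: (-scores[doc], doc))

    def snippet(self, name, query, width=30):
        """Return a short excerpt around the first query term found in the document."""
        text = self.texts[name]
        lowered = text.lower()
        for term in tokenize(query):
            pos = lowered.find(term)
            if pos != -1:
                start = max(0, pos - width)
                return text[start:pos + len(term) + width].strip()
        return text[:2 * width].strip()


if __name__ == "__main__":
    index = TinyIndex()
    index.add("notes.txt", "Jane's wedding was lovely. Wedding photos are on the camera.")
    index.add("chat.log", "Asked Jane about the wedding photo")
    for doc in index.search("wedding"):
        print(doc, "->", index.snippet(doc, "wedding"))
```

The expensive work happens once, when a document is added; each search is then a handful of dictionary lookups rather than a pass over the disk.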
While we're talking about Beagle's extensibility, I want to give a quick nod to some of Beagle's features that went untested in this review. The first is a web interface to the search, primarily intended to work in conjunction with the second: the ability to perform distributed searches over a network. While the whole system is still considered in beta, this feature is one of the least developed. Still, should it reach a solid implementation, it would give business users another strong incentive to go Open Source.
If inotify support is included in the kernel, then results are updated in real time. For example, say I run a search on Jane's Wedding, looking for some photos. It turns up a few, but apparently I don't have the one I'm looking for, so I IM Jane and ask her if she has it. The conversation now appears in the search results. She e-mails me the photo, and as it arrives, the search results update again to show the e-mail. (All of this functionality can be viewed as a demo here!) It is a very cool feature that ups the 'wow' factor of Beagle, and it is probably the feature that won Beagle its place in the Ubuntu, Debian, Gentoo and SuSE package repositories.
While Beagle is definitely still a late-alpha/early-beta release (crashes and resource hogging are to be expected), it shows great promise. If you're willing to restart the indexing daemon every few hours, then you are in for a treat: Beagle is not only wicked cool, but fast to boot.
You can find Beagle here:
http://beaglewiki.org/Main_Page