US researchers have tweaked open-source techniques to gather better bug reports from software users
Open-source fans have long espoused the idea that "given enough eyeballs, all bugs are shallow" -- but the concept is getting a new twist in a university project.
Researchers at the University of California, Berkeley, and at Stanford University have released versions of several open-source software packages modified to send debugging information to a central site, letting people who use the software contribute to the bug-hunting effort.
If the Cooperative Bug Isolation Project can get enough people to use the special versions, those users will, in effect, give debuggers many more peepholes into the software's inner workings.
"We're actually trying to enlist some of the users' horsepower to really find the bug, to give the engineer some information that will lead him to the bug more directly," said Ben Liblit, a Berkeley graduate student and project member.
Microsoft has long used software called Dr. Watson to harvest debugging information over the Internet, but the Berkeley-Stanford project takes an open-source tack on the problem. It provides software called "sampler" that open-source programmers can add to their own software to aid in debugging.
When the sampler software is inserted into a program -- a process that happens through a lightly modified version of the widely used GCC compiler -- the resulting program is "instrumented" with instructions that capture data as it runs. For example, it can record which path the program takes each time it reaches a choice of directions, Liblit said.
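The idea of recording which direction a program takes at each decision point can be sketched in a few lines. This is a toy illustration, not the project's actual GCC-level instrumentation; the function names and the site labels are invented for the example.

```python
# Toy sketch of branch instrumentation: each time the program reaches a
# decision point, a counter records which direction it took.
from collections import Counter

branch_counts = Counter()  # key: (site label, branch outcome) -> hit count

def instrumented_branch(site, condition):
    """Record which way a branch went, then pass the condition through."""
    branch_counts[(site, bool(condition))] += 1
    return condition

def absolute_value(x):
    # The program behaves exactly as before; the wrapper only records
    # the branch outcome at this decision point.
    if instrumented_branch("absolute_value:x<0", x < 0):
        return -x
    return x

absolute_value(-3)
absolute_value(5)
print(branch_counts[("absolute_value:x<0", True)])   # prints 1 (branch taken)
print(branch_counts[("absolute_value:x<0", False)])  # prints 1 (branch not taken)
```

A real compiler pass inserts equivalent counters automatically at every branch, so the programmer never writes these wrappers by hand.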
One key part of the project is ensuring that the sampler software doesn't bog down the program; the project's goal was to slow performance by no more than 5 percent, Liblit said. To keep the overhead that low, the sampler software records information only occasionally, based on a randomisation scheme. One thing that's recorded every time, though, is whether the program exited properly or crashed.
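The sampling idea can be sketched as a simple coin flip per observation. This is a deliberately naive stand-in: the names are hypothetical, and the real system uses a more efficient scheme than calling a random-number generator at every branch, but the effect is the same -- only a small, random fraction of observations is ever recorded.

```python
import random
from collections import Counter

SAMPLE_RATE = 0.01  # record roughly 1 in 100 observations, keeping overhead low

sampled_counts = Counter()

def sampled_branch(site, condition):
    """Occasionally record the branch outcome; always pass the condition through."""
    if random.random() < SAMPLE_RATE:
        sampled_counts[(site, bool(condition))] += 1
    return condition

def run_once(x):
    # Program logic is unchanged; the recording fires only on sampled observations.
    if sampled_branch("run_once:x<0", x < 0):
        return -x
    return x

random.seed(0)
for i in range(10_000):
    run_once(i - 5_000)

total = sum(sampled_counts.values())
print(total)  # roughly 100 of the 10,000 observations were recorded
```

The exit status (clean exit versus crash), by contrast, would be recorded unconditionally, since it costs almost nothing and is needed to label every run.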
A debugger can then compare two sets of data: the choices the computer made when the program behaved properly and the choices it made when it failed. While many of those choices will likely be identical, the hope is that where differences are found, a debugger will be able to pinpoint what parts of the program are likely suspects, Liblit said.
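That comparison can be sketched as a simple ranking: score each recorded condition by how much more often it held in crashing runs than in clean ones. This is a toy stand-in for the project's statistics, with invented predicate labels and a hypothetical data format (each run represented as the set of conditions observed true during it).

```python
def rank_suspects(failing_runs, passing_runs):
    """Rank predicates by how much more often they hold in failing runs.

    Each run is a set of predicate labels observed true during that run.
    """
    predicates = set().union(*failing_runs, *passing_runs)
    scores = {}
    for p in predicates:
        fail_rate = sum(p in run for run in failing_runs) / len(failing_runs)
        pass_rate = sum(p in run for run in passing_runs) / len(passing_runs)
        # A predicate seen in most failures but few successes scores high.
        scores[p] = fail_rate - pass_rate
    return sorted(scores, key=scores.get, reverse=True)

failing = [{"ptr==NULL", "len>0"}, {"ptr==NULL"}]
passing = [{"len>0"}, {"len>0"}, set()]
print(rank_suspects(failing, passing)[0])  # prints ptr==NULL
```

Here "ptr==NULL" tops the list because it appears in every failing run and no passing run, while "len>0" appears in both kinds of run and so carries little signal.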
"It's hard to predict how valuable the data will be, but I think the idea of doing it statistically over a large number of runs is quite valuable," Illuminata analyst Jonathan Eunice said of the project.
Enlisting software users

The researchers now face a new challenge beyond engineering and math: finding cooperative software users, Eunice said.
"It becomes a social issue," he said. "Can you instrument a large number of users and get truly representative information?... To do statistics, you need the vast number of runs."
One way to get more participants would be to get the open-source projects, or even companies, to release feedback-enabled versions of software. But currently, the project members don't plan on making the sampler-enabled programs more widely available, Liblit said.
"If we could get Red Hat to pick this up, that would be wonderful and amazing," he said, but he recognises that his experimental research realm is different from the corporate world where programmers are cautious.
The project so far has released six software packages that run on Linux -- the Evolution email software, the Gaim instant-messaging program, the Gnumeric spreadsheet, the Gimp graphics editor, the Rhythmbox music player and the Nautilus file manager.
In the overall software industry, engineers are starved for debugging data, and any increase is a good thing, Eunice said.
"Any metrics are better than the state of the art, which is almost no metrics," he said.
The Cooperative Bug Isolation Project is part of Berkeley's Open Source Quality effort, which began in 2000.