A FLOSS Research Toolbox

It is remarkable how, when I look through my box of FLOSS research tools, so many of them are pre-existing tools written by others. In the toolbox (or more precisely, in the directory called “tools”) there are also many self-authored programs or glue code, usually put together in a scripting language, but nevertheless the overall contents of the toolbox are a result of my own strategy. When seeking to put a new tool in the box, the first thing is try to borrow my neighbour’s instead. Searching on the Web or through research papers may reveal a program that is already capable of doing what you need.

Here is a handful of them that I have found useful so far (and which I will blog about in future):

SLOCCount

sloccount
A splendid little utility for counting the lines of executable code in a project, with additional capabilities for distinguishing between an abundance of programming languages, counting by directory, and even estimating the cost of producing the project!

Doxygen

doxygen
A documentation system usable for a handful of popular programming languages. You can construct documentation from markup embedded in the code (akin to Javadoc), or extract the structure from undocumented code. The various output formats, and even the intermediate files meant exclusively for Doxygen’s usage, mean that there’s lots of mining to be done by the keen researcher.

Statcvs/Statsvn

statsvn
Retrieves information from a CVS/Subversion repository and generates various tables and charts describing the project development. Formats output in HTML and XML for parsing at your pleasure.

FLOSSmole

flossmole
Less a tool, but more of a database produced by a webcrawling tool. FLOSSmole provides downloadable raw data about FLOSS projects in multiple formats.

What is noticeable about these (with the probable exception of FLOSSMole) is that I am using them in different ways to their mainstream usage. I will not presume to claim knowledge of the authors’ original intents, but did Doxygen’s project manager, Dimitri van Heesch, ever imagine that someone would be using the “useless” intermediate files left over by Doxygen (which are normally deleted) to perform complex coupling analyses? No doubt he would not be displeased if he knew, but I am sure his mind is most occupied with how it is put to its advertised purposes. On the other hand, it is very difficult to imagine my self-authored tools put to any use other than the ones for which I created them, which is why they exist in the first place: their purpose is so specific, nobody else has yet needed their capabilities.

My FLOSS research toolbox (and, I would venture, those of other FLOSS researchers) has been opportunistically built up over time. I think this is necessarily so. When the capability you want already exists in another program then you take it, in finest FLOSS tradition, thereby enjoying the many fruits of collaborative work like passing around new features, sharing bug-fixes, and preventing duplication of effort. When no tool can suit my purpose (or be adapted to do so), then I must fill the gap with my own creation. No slight revision of the research goals to suit a “near enough” program I have found — tool availability must never dictate the direction of the research — I just have to grit my teeth and hack out some code.

Then I feel like a real programmer again.

CSMR – Day 2

CSMR 2009CSMR 2009 soldiers on.

Today’s keynote was delivered by Tibor Gyimothy which looked at software metrics from the developer’s point of view. He presented the results of a study where developers were surveyed for their opinions on various metrics.

The developers were divided based upon such attributes as experience, platform knowledge (Java, C/C++, C#, SQL), and open source participation. They were questioned upon their opinions of metrics that measured size, complexity, coupling, and cloning, and the results were analyzed for correlations. The kinds of questions included “which metrics effect your understanding”, or “which metrics affect the testing effort”? There were many interesting differences between groups (too many to mention all), but examples included:

  • It was agreed that complexity, coupling, and clones affect understanding, but experienced developers disagreed with inexperienced developers whether size was an important factor.
  • Experienced developers were more insistent that smaller classes help testing.
  • Inexperienced developers tend to reject large generated classes, but experienced ones accept them.
  • Experienced developers prefer absolute metric values rather than the change in the values.

Aside from the keynote, architecture certainly seems to feature heavily this year at CSMR — maybe I have a faulty memory, but it seems to be more so than before. The panel discussion, apparently the first such discussion at CSMR, stimulated some slightly intense debate on the subject of collaborative tool use between academia and industry. It was a shame that we had to cut short, because I do love a good debate. I hope that I see such discussions in future conferences, but I would suggest that it be conducted with a little tighter discipline. Participants were allowed to make slide presentations of arguments and run over allotted time; I’d prefer a format where the chairman presents the arguments and takes a firmer hold of the participants (like the TV programme we have in the UK Question Time).

Looking forward to tomorrow: the software evolution track and the European projects track featuring some meaty FLOSS stuff as well as yet more evolution.

In The Beginning…

Why write a blog?

Well, why not. It seems like everyone else is.

I’ve been racking my brains to decide what I have to blog, or rather what is interesting enough to share with people. My field is computers; specifically research. I’ve been spending a few years researching free/open source software now, and I think I’ve got into the stride of things enough now to start to write about it.

In this blog, most of the time I plan my entries to fall into one of three categories:

  1. Posts about my research: I’ll share my various little findings that might be of interest to people who want to understand more about free/open source. I’ll try and make them as to understand as possible — if you want the real technical treatment, I’ll point you to the technical paper.
  2. About approaches to research: I also want to pass on the methods and tools you can use to carry out research on software. I hope this will be of interest to practitioners as well as researchers.
  3. Videos: Another little pet project of mine (called Computer Floss) is to produce a series of videos for a general audience that explains all the various facets of open source. I’ve already begun, and you can see them over at:

http://youtube.com/user/directrod

Don’t ask why my username there is directrod.