It is remarkable how, when I look through my box of FLOSS research tools, so many of them are pre-existing tools written by others. In the toolbox (or more precisely, in the directory called “tools”) there are also many self-authored programs or glue code, usually put together in a scripting language, but nevertheless the overall contents of the toolbox are a result of my own strategy. When seeking to put a new tool in the box, the first thing is try to borrow my neighbour’s instead. Searching on the Web or through research papers may reveal a program that is already capable of doing what you need.
Here is a handful of them that I have found useful so far (and which I will blog about in future):
SLOCCount
A splendid little utility for counting the lines of executable code in a project, with additional capabilities for distinguishing between an abundance of programming languages, counting by directory, and even estimating the cost of producing the project!
Doxygen
A documentation system usable for a handful of popular programming languages. You can construct documentation from markup embedded in the code (akin to Javadoc), or extract the structure from undocumented code. The various output formats, and even the intermediate files meant exclusively for Doxygen’s usage, mean that there’s lots of mining to be done by the keen researcher.
Statcvs/Statsvn
Retrieves information from a CVS/Subversion repository and generates various tables and charts describing the project development. Formats output in HTML and XML for parsing at your pleasure.
FLOSSmole
Less a tool, but more of a database produced by a webcrawling tool. FLOSSmole provides downloadable raw data about FLOSS projects in multiple formats.
What is noticeable about these (with the probable exception of FLOSSMole) is that I am using them in different ways to their mainstream usage. I will not presume to claim knowledge of the authors’ original intents, but did Doxygen’s project manager, Dimitri van Heesch, ever imagine that someone would be using the “useless” intermediate files left over by Doxygen (which are normally deleted) to perform complex coupling analyses? No doubt he would not be displeased if he knew, but I am sure his mind is most occupied with how it is put to its advertised purposes. On the other hand, it is very difficult to imagine my self-authored tools put to any use other than the ones for which I created them, which is why they exist in the first place: their purpose is so specific, nobody else has yet needed their capabilities.
My FLOSS research toolbox (and, I would venture, those of other FLOSS researchers) has been opportunistically built up over time. I think this is necessarily so. When the capability you want already exists in another program then you take it, in finest FLOSS tradition, thereby enjoying the many fruits of collaborative work like passing around new features, sharing bug-fixes, and preventing duplication of effort. When no tool can suit my purpose (or be adapted to do so), then I must fill the gap with my own creation. No slight revision of the research goals to suit a “near enough” program I have found — tool availability must never dictate the direction of the research — I just have to grit my teeth and hack out some code.
Then I feel like a real programmer again.