A FLOSS Research Toolbox

It is remarkable how, when I look through my box of FLOSS research tools, so many of them are pre-existing tools written by others. In the toolbox (or more precisely, in the directory called “tools”) there are also many self-authored programs and pieces of glue code, usually put together in a scripting language, but nevertheless the overall contents of the toolbox are a result of my own strategy. When seeking to put a new tool in the box, the first thing is to try to borrow my neighbour’s instead. Searching on the Web or through research papers may reveal a program that is already capable of doing what you need.

Here is a handful of them that I have found useful so far (and which I will blog about in future):

SLOCCount

A splendid little utility for counting the lines of executable code in a project, with additional capabilities for distinguishing between an abundance of programming languages, counting by directory, and even estimating the cost of producing the project!
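To give a flavour of what such a count involves, here is a minimal sketch in Python. It is not SLOCCount's own code: the function names are hypothetical, it only recognises whole-line comments with a single prefix, and it would wrongly count docstrings and block comments as code. The effort formula is the Basic COCOMO organic-mode equation (person-months = 2.4 × KSLOC^1.05), which SLOCCount reports as its default model.

```python
def count_sloc(source: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor whole-line comments.

    Deliberately naive: one comment prefix, no block comments, no strings.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


def basic_cocomo_effort(sloc: int) -> float:
    """Estimate effort in person-months with Basic COCOMO, organic mode."""
    ksloc = sloc / 1000.0
    return 2.4 * ksloc ** 1.05


# A made-up sample file, held in a string so the sketch is self-contained.
sample = """\
# a whole-line comment is skipped
import sys

def main():
    print(sys.argv)  # a trailing comment still counts as a code line
"""

sloc = count_sloc(sample)
print(sloc, "source lines; estimated effort:", basic_cocomo_effort(sloc), "person-months")
```

A real counter has to know the comment syntax of every language it meets, which is exactly the tedious work that makes borrowing the existing tool so attractive.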

Doxygen

A documentation system usable for a handful of popular programming languages. You can construct documentation from markup embedded in the code (akin to Javadoc), or extract the structure from undocumented code. The various output formats, and even the intermediate files meant exclusively for Doxygen’s usage, mean that there’s lots of mining to be done by the keen researcher.

Statcvs/Statsvn

Retrieves information from a CVS/Subversion repository and generates various tables and charts describing the project development. Formats output in HTML and XML for parsing at your pleasure.
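As a rough illustration of the kind of parsing this invites, the sketch below counts commits per author from a Subversion XML log of the sort produced by `svn log --xml`. It does not use StatSVN's own output; the function name is hypothetical and the sample log entries are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def commits_per_author(svn_log_xml: str) -> Counter:
    """Tally <logentry> elements by their <author> child."""
    root = ET.fromstring(svn_log_xml)
    counts = Counter()
    for entry in root.iter("logentry"):
        # Some entries may carry no author, so fall back to a placeholder.
        author = entry.findtext("author", default="(unknown)")
        counts[author] += 1
    return counts


# Invented sample data in the shape of `svn log --xml` output.
sample_log = """<log>
  <logentry revision="3">
    <author>alice</author>
    <date>2009-03-02T10:15:00.000000Z</date>
    <msg>Fix off-by-one in parser</msg>
  </logentry>
  <logentry revision="2">
    <author>bob</author>
    <date>2009-03-01T17:40:00.000000Z</date>
    <msg>Add build script</msg>
  </logentry>
  <logentry revision="1">
    <author>alice</author>
    <date>2009-02-28T09:00:00.000000Z</date>
    <msg>Initial import</msg>
  </logentry>
</log>"""

for author, commits in commits_per_author(sample_log).most_common():
    print(author, commits)
```

For a real project the string would be replaced by the contents of a saved log file; the tallying logic stays the same.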

FLOSSmole

Less a tool, but more of a database produced by a webcrawling tool. FLOSSmole provides downloadable raw data about FLOSS projects in multiple formats.

What is noticeable about these (with the probable exception of FLOSSmole) is that I am using them in different ways to their mainstream usage. I will not presume to claim knowledge of the authors’ original intents, but did Doxygen’s project manager, Dimitri van Heesch, ever imagine that someone would be using the “useless” intermediate files left over by Doxygen (which are normally deleted) to perform complex coupling analyses? No doubt he would not be displeased if he knew, but I am sure his mind is mostly occupied with how it is put to its advertised purposes. On the other hand, it is very difficult to imagine my self-authored tools put to any use other than the ones for which I created them, which is why they exist in the first place: their purpose is so specific that nobody else has yet needed their capabilities.

My FLOSS research toolbox (and, I would venture, those of other FLOSS researchers) has been opportunistically built up over time. I think this is necessarily so. When the capability you want already exists in another program then you take it, in finest FLOSS tradition, thereby enjoying the many fruits of collaborative work like passing around new features, sharing bug-fixes, and preventing duplication of effort. When no tool can suit my purpose (or be adapted to do so), then I must fill the gap with my own creation. No slight revision of the research goals to suit a “near enough” program I have found — tool availability must never dictate the direction of the research — I just have to grit my teeth and hack out some code.

Then I feel like a real programmer again.

A Review of “Inside the Anthill: Open Source Means Business”

A recent radio broadcast on BBC Radio 4 (your grandmother’s favourite radio station) entitled “Inside the Anthill: Open Source Means Business” was advertised as “Gerry Northam goes behind the scenes to investigate ‘open source’ computer software”. (Spot the irony in going “behind the scenes” to investigate something that is done openly and transparently.) But let us immediately get one thing straight: the programme was mostly about the principles of openness and distributed collaborative projects in general, rather than exclusively about FLOSS. There is nothing wrong with that, of course, but I sympathize with the purist who finds it grating when the two are conflated. Nor will it help the purist that the host does not get it quite right on occasion, such as when he describes Linux as the first major open source project.

But still, this is radio for the generation of grandmother, not Grand Theft Auto. Perhaps we should forgive some over-simplification? After all, the programme is clearly aimed at those who know little more than the phrase “open source” and who know it has something to do with computers. When the host is interviewing FLOSS developers (which is also when the programme is at its most interesting), he restricts his questions to the basics. The guys at Mozilla get the “why get involved?”, “how do you co-ordinate it all?”, and “who makes the decisions?” questions, while Linux, which seems to be held up as the exemplary FLOSS project, gets “why is it not more popular?”, “are people paid?”, and “where does the money come from?” The host promptly follows the money to IBM, and listens as members of the Linux Technology Centre give glimpses of their modus operandi.

After this, the show leaves the techies behind, and talks to people who apply the principle of open source outside of the computer world. We are treated to people from organizations like Wikipedia and Goldcorp, and to other observers, who give their predictions about open collaboration. Here is where my interest began to wane, because the talk starts to become a little woolly as the interviewees leave aside the specifics, and predict how businesses and governments will take the open principle and the new technology to become more democratic, cheaper, faster, better, etc.

Finally, the show links back to its title by breaking down the analogy it set up in the first place (an open source project as a colony of ants), stating that a true anthill needs no hierarchy or centralized decision-making, things which are seen in the examples examined. (Think Linus and his lieutenants, or the guys who decide that Firefox needs to go to 3.5.)

In summary, being only half an hour in length, the programme could not have hoped to go into any real depth, but it may stoke the fires of general interest in the uninitiated listener.

What’s in a Name? Free vs. Libre vs. Open Source

This is the next episode of Computer Floss, a YouTube series aimed at educating viewers about FLOSS. This one attempts to clear up the naming issues and the so-called differences between Free/Libre/Open Source. Here’s a transcript:

So far in this series, the terms “free software” and “open source” have been used somewhat synonymously, and this has been a little naughty of me, because there are actual differences in the usage of these terms, even though the extent of the difference is arguable. Let’s take a closer look.

Richard Stallman: freedoooom!

As explained in previous videos, if you’ve been following the plot, when the father-figure of the movement, Richard Stallman, put together his ideas for freely using, changing and distributing source code, he named it “free software”. He tackled the obvious ambiguity in this name by distinguishing between “free-as-in-speech” and “free-as-in-beer” to make it clear that making money from free software is certainly permitted.

But to Stallman the moral issues were of supreme importance. To him, denying other people the right to fix, adapt or improve software themselves, and to share these changes, was immoral and anathema to progress. He argued it was divisive of society, and hindered the ability of people to learn and help others. What a nice chap.

And so, the free software movement continued. In the late 1990s, the term “open source” was coined; this was the result of the coming together of a number of programmers and businessmen, among them Eric Raymond, whose essay “The Cathedral and the Bazaar” had kick-started the meeting and went on to become an important book about the theories behind open source. These were guys who agreed with the principles of Stallman’s Free Software Foundation, but didn’t like the name or the moralising attitudes behind it, so they set up the Open Source Initiative and infused it with a politically neutral and business-friendly image, like a marketing make-over.

Eric Raymond: Marketing savvy and a business friendly imaaaage!

Yes, there was a divergence. You might have in your mind an image of the old groups of radical left-wingers, who would form political groups, only to splinter predictably into warring factions even though they believed the same things.

Here we go again, you might think, another movement splintering. Just hurry up, collapse and let us get on with our lives. But there’s a curious difference in this case: whilst the radical left-wing hippies of yesteryear shared the same ideologies but differed over the practicalities, free software and open source form the mirror image of this: it is their ideologies and motivations that differ, but in practice they do most things the same. The Free Software Foundation and the Open Source Initiative both do things like approve licences, support communities and provide consultations; there’s considerable overlap in what they do, and sometimes they even collaborate.

And that’s why the whole thing doesn’t collapse in on itself. Let’s say you want to get involved: You could join up with the free software lot and be motivated by promoting the user’s freedoms, or you can go over to the open source bunch and espouse the pragmatic and economic benefits. You can even give both of them the finger and just concentrate on developing stuff that others can use, change and redistribute, because even then, you’re still helping everyone in the community, regardless of their persuasion.

As if I’ve not bored or confused you enough, there’s even a third label that the Europeans have come up with. The chaps on the continent use different words to describe “free as in freedom” and “free as in price” — how cunning of them — the freedom sense being translated as “libre”. Libre software then, is an unambiguous name for free software.

So, to answer the question “What’s in a Name?” — it turns out, not much. Free, Libre, Open Source: they’re just labels, and by their definitions they pretty much describe the same thing. But they can be misused or misunderstood like any other label, so rather than rely on the name, just ask those three magic questions: can I see the source? can I change the source? can I distribute the source? If you get a yes to all three, you’ve got a piece of FLOSS.

Want to Know the Facts? Turn off the TV and Log On

I’m going off-topic slightly here, but, in a way, I still remain on it.

Swine flu is upon us. Members of the public who know nothing of epidemiology want to know the facts about the virus. Meet Thunderf00t, a YouTube user and producer of some rather excellent videos, which are at their best when debunking the pseudo-scientific and countering with commentaries on the beauties of real science. Recently, he has been posting videos on swine flu, simply explaining the facts in front of a whiteboard like a lecturer.

[Embedded video: http://www.youtube.com/v/l8pSPfZFysg]
[Embedded video: http://www.youtube.com/v/br80HSGGPek]

Now, I do not know about the quality of the TV news you receive locally, but if I contrast the way Thunderf00t here disseminates swine flu information with what I have seen on television news in the UK, then I am afraid the spontaneous voluntary effort by Thunderf00t wins hands down in veracity, relevance and comprehensiveness. A 10-minute piece from him is full of the most relevant preventative information, the motivations behind what the WHO advocates, and definitions of all those terms you might have heard being passed around. Conversely, a 10-minute piece on the BBC News channel is cluttered with useless footage like reporters interviewing neighbours of a young couple in England diagnosed with the illness, and asking them what they feel about it, as if anybody cares. I do not watch much television news these days; I have not seen how Sky News or ITV News have handled the outbreak, but given how the BBC is relatively sedate and calm in its delivery, I would not expect them to be any better in this respect.

Thunderf00t even slips in a rejoinder to an American politician whose peculiar ideology of “government = bad” dismisses any notion that public agencies should get involved in preventative measures.

But returning to topic, this reflects a wider problem with the delivery of TV news, something already commented upon superbly by people like Charlie Brooker and Adam Curtis. To be fair, it is probably not the institutions themselves that are the problem; after all, if you go to the BBC News website (and hunt around a little bit), you will find a decent page with information about swine flu, which, admittedly, they still cannot resist peppering with bits of irrelevance.

If I want the facts given with an objective delivery, it seems television news is far from the first place to go.

On the Veracity of Sources

When I want to learn about something in free/open source software more generally, there are a number of different types of sources to look towards, each one with its own advantages and considerations, and each with an intended audience. So knowing about all of these types of sources serves as a good indicator of where to start looking. Judging the most suitable place from which to obtain veracious information about FLOSS reminds me a little of the same problem in science.

If you want to learn about the latest from the world of quantum physics, where do you go? A research paper? That is surely guaranteed to bring fresh news from the physics trenches, but it will certainly assume some domain knowledge on the part of the reader and a grasp of some sophisticated mathematics. Failing that, you could wait for someone to write a more understandable treatment of the subject — a magazine article in New Scientist will probably not take long to become available; but alternatively a book will contain more detail if you can wait. The nuances within such an example do not differ wildly from those you might observe with the same question in FLOSS.

So what types of sources exist for FLOSS and how are they useful or problematic? This is my take on the taxonomy.

Research works

Not too long after “open source” was coined as a phrase in 1998, serious research institutes began to look into it, hence we now have a decade’s worth of peer-reviewed research carried out by universities, institutes and other organisations. In addition to the many stand-alone works, there have been a number of research projects devoted to FLOSS that routinely publish their findings, including CALIBRE, SQO-OSS and FLOSSMetrics as examples in Europe alone. Some academic conferences, particularly those dedicated to software engineering, now even explicitly refer to FLOSS as a topic of research. Such publications are the things to look through for the latest information, but they typically assume a rather high level of familiarity with the concepts, and occasionally a good knowledge of maths and programming. They are normally published in conference proceedings and journals, which might cost a pretty penny to access.

Technical reports

Many organisations, typically with some particular expertise within FLOSS, release technically-oriented reports on their research or experiences. They may sometimes be released as part of wider research in which the organisation is involved, and the Internet is often utilised as the distributive medium these days. They are very much like research papers, but it is likely they have not been subjected to a wider peer-review process.

Books

Now the field really opens up. Many, many books have been authored over the last ten years on FLOSS, aimed both at specialists and at a more general audience. You can purchase an entire book dedicated to the sed stream editor, or you can read Eric Raymond’s general thesis on the open source movement (which I think is readable by any interested layperson), so the understandability of information within books is much more wide-ranging than that of peer-reviewed research papers. And yet, rather similarly to research works, choosing your sources can depend on reputation (of the publisher as well as the author); but having said that, if you are looking for a book as a place to begin your quest for information, you probably have insufficient first-hand knowledge about reputations. In such a case it would be prudent to check your choice with someone more “in the know”.

Magazine articles

Like books, magazine articles on FLOSS may be intended for the specialist (such as those found in the IEEE publication “Computer”), or the more general audience, and may or may not appear in a software-oriented publication. Once again, it is necessary to be discerning depending upon a combination of the author’s credentials, the depth of information given and the rigour demonstrated (for example, a FLOSS article from the BBC News, regardless of how well written, is unlikely to be of sufficient depth to do anything other than spark an interest in an otherwise unfamiliar topic). But furthermore one must acknowledge the probable brevity of articles appearing in magazines or newspapers. I would suggest an article is a useful way to whet your newly developing appetite, or a quick way to keep up with the Joneses.

Websites, blogs, etc.

There are numerous and diverse resources on the Internet addressing free software with greatly varying quality and intent, and, for organisations or projects concerned with FLOSS, they may be the primary method of contact and dissemination. Their reliability may be judged by the identities of the authors or publishing organisation, or by the opportunity for corroboration. Some websites and blogs are the Internet presence of authors accessible via other media (for example, a number of published FLOSS researchers maintain blogs and websites, such as Diomidis Spinellis, Paul Adams, Martin Krafft, and many more), but in a number of other cases the authors remain anonymous, which presents the difficulty of establishing their credentials. Even then, this does not necessarily preclude a website being useful for research: Groklaw is one of the best-known websites devoted mostly to discussing FLOSS legal issues, and whilst its authors are anonymous it provides copies of or references to actual legal proceedings.

Now, not to destroy my taxonomy having just built it, but this is clearly not the only useful way of looking at it. I doubt whether one single class of person uses a single type of source to the exclusion of all others, and the types of sources are certainly not mutually exclusive in terms of their veracity: A well-written blog post by an expert has the potential to be a more veracious source than a book or article that achieves only mediocrity.

In search of the ultimate idiom with which to conclude, I’ll borrow from my British heritage and say “It’s all swings and roundabouts.”

Why I Like Linux (One Tiny Reason Further)

I use the Linux operating system at my place of work (I’ll refrain from revealing the flavour, lest we descend into religious wars). Because we do not have servers to provide common data storage or processing power off-site, I leave my machine running constantly and it acts as a server so I may access my materials whenever and wherever I need. This week, a kernel update was rolled out, meaning I needed to reboot. Just out of interest I wanted to see how long the machine had been running since the last reboot so I did:

$ uptime
15:02:01 up 63 days

Now I realize 63 days is peanuts in server time, but I still continue to be impressed that after more than eight weeks my system was as quick and responsive as when freshly booted… maybe I’m still coloured by my earlier experiences with a certain popular operating system. It certainly made me think of when a friend of mine, who uses Windows Vista and keeps her machine running overnight like me, remarked that it was time to reboot her machine because it had been running all week and was starting to run really slowly.

Is this really still an issue with Windows machines? I’ve still never heard a satisfactory answer as to why this happens.

CSMR – Day 3

And so CSMR concludes. Still recovering from the conference dinner (surreally hosted in the Hotel Alcatraz, which used to be a prison — and that’s not a joke). Enjoyed it here, conversed with some really interesting chaps, and one or two personal heroes.

Let me wrap up with some bite-sized snippets:

  • Acceptance rate of papers: 31%
  • Best paper award: “Incremental Clone Detection” by Nils Göde and Rainer Koschke
  • Special award: Harry Sneed (“for leadership and many contributions to the practice and principled growth of software maintenance techniques and their industrialization.”)
  • Population of Kaiserslautern: 100,000 approx.
  • Time I have to wake up tomorrow to begin my journey home: 0545.