On the Veracity of Sources

When I want to learn about something in free/open source software more generally, there are a number of different types of sources to turn to, each with its own advantages and considerations, and each with an intended audience. Knowing about all of these types of sources is a good indicator of where to start looking. Judging the most suitable place from which to obtain veracious information about FLOSS reminds me a little of the same problem in science.

If you want to learn about the latest from the world of quantum physics, where do you go? A research paper? That is surely guaranteed to bring fresh news from the physics trenches, but it will certainly assume some domain knowledge on the part of the reader and a grasp of some sophisticated mathematics. Failing that, you could wait for someone to write a more understandable treatment of the subject — a magazine article in New Scientist will probably not take long to become available; but alternatively a book will contain more detail if you can wait. The nuances within such an example do not differ wildly from those you might observe with the same question in FLOSS.

So what types of sources exist for FLOSS and how are they useful or problematic? This is my take on the taxonomy.

Research works

Not too long after “open source” was coined as a phrase in 1998, serious research institutes began to look into it, hence we now have a decade’s worth of peer-reviewed research carried out by universities, institutes and other organisations. In addition to the many stand-alone works, there have been a number of research projects devoted to FLOSS that routinely publish their findings, including CALIBRE, SQO-OSS and FLOSSMetrics as examples in Europe alone. Some academic conferences, particularly those dedicated to software engineering, now even explicitly refer to FLOSS as a topic of research. Such publications are the things to look through for the latest information, but they typically assume a rather high level of familiarity with the concepts, and occasionally a good knowledge of maths and programming. They are normally published in conference proceedings and journals, which might cost a pretty penny to access.

Technical reports

Many organisations, typically with some particular expertise within FLOSS, release technically-oriented reports on their research or experiences. They may sometimes be released as part of wider research in which the organisation is involved, and these days the Internet is often the distribution medium. They are very much like research papers, but it is likely they have not been subjected to a wider peer-review process.

Books

Now the field really opens up. Many, many books have been authored over the last ten years on FLOSS, aimed both at specialists and at a more general audience. You can purchase an entire book dedicated to the sed stream editor, or you can read Eric Raymond’s general thesis on the open source movement (which I think is readable by any interested layperson), so the accessibility of information within books is much more wide-ranging than that of peer-reviewed research papers. And yet, rather as with research works, choosing your sources can depend on reputation (of the publisher as well as the author). Having said that, if you are looking for a book as a place to begin your quest for information, you probably have insufficient first-hand knowledge about reputations. In such a case it would be prudent to check your choice with someone more “in the know”.

Magazine articles

Like books, magazine articles on FLOSS may be intended for the specialist (such as those found in the IEEE publication “Computer”), or for a more general audience, in which case they may or may not appear in a software-oriented publication. Once again, it is necessary to be discerning, weighing up the author’s credentials, the depth of information given and the rigour demonstrated (for example, a FLOSS article from the BBC News, regardless of how well written, is unlikely to be of sufficient depth to do anything other than spark an interest in an otherwise unfamiliar topic). Furthermore, one must acknowledge the probable brevity of articles appearing in magazines or newspapers. I would suggest an article is a useful way to whet your newly developing appetite, or a quick way to keep up with the Joneses.

Websites, blogs, etc.

There are numerous and diverse resources on the Internet addressing free software with greatly varying quality and intent, and, for organisations or projects concerned with FLOSS, they may be the primary method of contact and dissemination. Their reliability may be judged by the identities of the authors or publishing organisation, or by the opportunity for corroboration. Some websites and blogs are the Internet presence of authors accessible via other media (for example, a number of published FLOSS researchers maintain blogs and websites, such as Diomidis Spinellis, Paul Adams, Martin Krafft, and many more), but in a number of other cases the authors remain anonymous, which presents the difficulty of establishing their credentials. Even then this does not necessarily preclude a website being useful for research: Groklaw is one of the most well-known websites devoted mostly to discussing FLOSS legal issues, and whilst its authors are anonymous it provides copies of or references to actual legal proceedings.

Now, not to destroy my taxonomy having just built it, but this is clearly not the only useful way of looking at it. I doubt whether one single class of person uses a single type of source to the exclusion of all others, and the types of sources are certainly not mutually exclusive in terms of their veracity: A well-written blog post by an expert has the potential to be a more veracious source than a book or article that achieves only mediocrity.

In search of the ultimate idiom with which to conclude, I’ll borrow from my British heritage and say “It’s all swings and roundabouts.”

CSMR – Day 3

And so CSMR concludes. Still recovering from the conference dinner (surreally hosted in the Hotel Alcatraz, which used to be a prison — and that’s not a joke). Enjoyed it here, conversed with some really interesting chaps, and one or two personal heroes.

Let me wrap up with some bite-sized snippets:

  • Acceptance rate of papers: 31%
  • Best paper award: “Incremental Clone Detection” by Nils Göde and Rainer Koschke
  • Special award: Harry Sneed (“for leadership and many contributions to the practice and principled growth of software maintenance techniques and their industrialization.”)
  • Population of Kaiserslautern: 100,000 approx.
  • Time I have to wake up tomorrow to begin my journey home: 0545.

CSMR – Day 2

CSMR 2009 soldiers on.

Today’s keynote was delivered by Tibor Gyimothy, who looked at software metrics from the developer’s point of view. He presented the results of a study in which developers were surveyed for their opinions on various metrics.

The developers were divided based upon such attributes as experience, platform knowledge (Java, C/C++, C#, SQL), and open source participation. They were asked for their opinions of metrics that measured size, complexity, coupling, and cloning, and the results were analyzed for correlations. The kinds of questions included “which metrics affect your understanding?” or “which metrics affect the testing effort?” There were many interesting differences between groups (too many to mention all), but examples included:

  • It was agreed that complexity, coupling, and clones affect understanding, but experienced developers disagreed with inexperienced developers whether size was an important factor.
  • Experienced developers were more insistent that smaller classes help testing.
  • Inexperienced developers tend to reject large generated classes, but experienced ones accept them.
  • Experienced developers prefer absolute metric values rather than the change in the values.

Aside from the keynote, architecture certainly seems to feature heavily this year at CSMR. Maybe I have a faulty memory, but it seems to be more so than before. The panel discussion, apparently the first such discussion at CSMR, stimulated some slightly intense debate on the subject of collaborative tool use between academia and industry. It was a shame that we had to cut it short, because I do love a good debate. I hope that I see such discussions in future conferences, but I would suggest that they be conducted with a little more discipline. Participants were allowed to make slide presentations of arguments and run over their allotted time; I’d prefer a format where the chairman presents the arguments and takes a firmer hold of the participants (like Question Time, the TV programme we have in the UK).

Looking forward to tomorrow: the software evolution track and the European projects track featuring some meaty FLOSS stuff as well as yet more evolution.

CSMR – Day 1

Guten Tag.

How embarrassing. The weather in Kaiserslautern is bad, I’m an Englishman…. and I don’t have an umbrella.

But at least the Fraunhofer IESE Centre is a wonderful environment. Which leads me to quickly express my admiration of the German approach to technical research and development. Briefly, there are four “actors” in their setup: the universities and industry, which scarcely need elaborating on, are two of them. The other two exist in-between these other players. The Max-Planck institutes are outlets of basic research funded mostly by the state. The Fraunhofer institutes are centres of applied science that are mostly funded by contract work. Look them up to learn more.

Onto the conference itself. Today, the keynote was delivered by Dieter Rombach. He argued that when software engineering is being taught, too scientific an approach is taken, and also that people are not sufficiently versed in software engineering principles.

Many maintenance tasks, he argues, can be anticipated, and yet they are not prepared for. For example, if you develop software that is dependent upon the CPU, why should you not develop it in a way that makes it as simple as possible to adapt to a new CPU? When developers in the 1960s and 1970s developed systems and saved a few bytes by storing the year as two digits, their systems broke when the year 2000 arrived: the shame is not on them for failing to anticipate it, but on us for not learning from their mistakes.

To prepare for maintenance, Rombach advocates these principles:

  • An adroit use of the fundamentals: e.g. divide and conquer, traceability
  • Use of software product lines
  • Empirically proven best practices

Also worth mentioning is Carola Lilienthal’s paper on analyzing large-scale architectures and suggestions for keeping their complexity under control. Her approach is to compare the intended architecture of the system (e.g. layered architecture) to the actual architecture derived from code analysis. Borrowing from cognitive psychology, her paper proposes three aspects of architectural complexity to be applied: modularity, ordering (whether the relationships between elements form a directed, acyclic graph), and pattern conformity. A recommendation is made to begin with a reference architecture and progress to a layered architecture implementing the interfaces as the system grows.
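To make the "ordering" aspect a little more concrete, here is a minimal sketch of what such a check might look like. It is my own illustration and not the method from the paper: the module names and the dictionary-of-dependencies format are invented for the example, and the check is an ordinary depth-first search for cycles.

```python
def is_acyclic(dependencies):
    """Return True if the dependency graph has no cycles.

    `dependencies` maps each module name to the modules it uses.
    This input format is an assumption made for the example.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = IN_PROGRESS
        for dep in dependencies.get(node, ()):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # We have walked back onto our own path: a cycle.
                return False
            if dep_state == UNSEEN and not visit(dep):
                return False
        state[node] = DONE
        return True

    # Start a search from every module not already explored.
    return all(state.get(n, UNSEEN) != UNSEEN or visit(n)
               for n in dependencies)


# Hypothetical intended architecture: strictly layered.
layered = {"ui": ["logic"], "logic": ["storage"], "storage": []}

# Hypothetical actual architecture: storage has crept back up to the UI.
tangled = {"ui": ["logic"], "logic": ["storage"], "storage": ["ui"]}
```

Calling `is_acyclic(layered)` gives `True` and `is_acyclic(tangled)` gives `False`, which is the kind of gap between intended and actual architecture the comparison is meant to expose.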

Going to the CSMR Conference


This year’s European Conference on Software Maintenance and Reengineering is in Kaiserslautern, Germany. I’ll be there, but my fellow researchers in the Centre of Research on Open Source Software won’t: we’ve been accepted to so many different conferences that dividing ourselves up is the only way we can afford to attend them all. So for the benefit of my colleagues (and you, dear reader, if you so wish) I’ll try to find time to blog about the more notable presentations I see there.