Archive for November, 2009

Accolade for Saros

Posted on November 26th, 2009 by Karl Beecher

Saros has just recently been named one of the best 30 Eclipse plugins by AjaxLine. Way to go Saros team!

No sooner do I join the team than Saros is winning accolades. Coincidence?… Well, yes actually. I’ve yet to contribute a thing so far!

An Introduction to Saros

Posted on November 25th, 2009 by Karl Beecher

I feel as though I am almost completely adjusted and acclimatised to my new surroundings, and furthermore I am now becoming fully immersed in my new work here at the Free University of Berlin (FU). Therefore, I think I can begin to introduce the new work I am now part of. My research concerns have now diversified a little: they continue to include FLOSS and empirical methods, but they also now encompass Agile Methods. Let me explain…

I am now part of Saros, an on-going project here at the Institute of Computer Science. Saros is a free/open source Eclipse plug-in that enables distributed collaborative software development within the Eclipse environment. Developers can work simultaneously on shared documents over a network, allowing concurrent access to geographically dispersed programmers. Changes made are instantaneously seen by all participants, and Saros makes it clear who made what changes.

A screenshot of Saros

A screenshot of Saros (click to enlarge)

The plugin is useful in a number of scenarios, such as collaborative development, joint code reviews, or knowledge transfer. But particularly cool, is the support for distributed pair programming, by allocating the roles of “driver” or “observer” to everyone involved. In time, more features will help Saros replicate the pair programming experience in a distributed environment.

It seems I have joined a project that is already thriving thanks to the hard work and dedication of both the researchers and the team of bachelors/masters students who have contributed to the software in the previous months. Feedback from users is coming in steadily, the number of downloads is now measurable in thousands, and there are a few “testimonials” to Saros from outside parties (locatable on the FU website).

And there is plenty of life left in the project because a steady stream of students continue to sign up to work on new aspects. Together with the existing team they have some exciting new features in the works that will enrich the feature set of Saros.

Why not go to the SourceForge page and try it out for yourself?

Techniques for Selection

Posted on November 5th, 2009 by Karl Beecher
Analysis steps

Analysis steps

As we have seen before, this figure shows the stages of a typical approach to a post-hoc study of FLOSS, like a digital archaeologist. The figure shows a series of stages, each of which includes some number of steps, and yields some outcomes. Each outcome may or may not feed into the following stage. In this post, I will discuss the selection stage. Remember that this is the method I have preferred so far, and is the method that a number of my peers have used, in whole or in part. It is not the method.


The point of selection is to choose the metrics that are going to be used to measure the attributes you are interested in, and also to compose a list of projects to study. On selecting metrics, I cannot be especially general. They are very closely tied to the goals of your study and a way needs to be found to come up with an effective way of measuring your success at achieving those goals. My personal preference is the “Goal-Question-Metric” (GQM) Approach. You can read about it elsewhere, but what attracts me to the GQM Approach is that it is a software engineering specific method that helps you come up with the right measures by forming questions needed to achieve the goals you have set. These questions can also “suggest” the hypotheses needed in your study. It is not perfect, but “Goal-Question-Metric” is a useful parallel to “Research Question-Hypotheses-Measures”. I do think it important to do metric selection first; the reason will be apparent in what I shall say next.

Additionally, you will probably want to set some parameters for the investigation, usually to ensure your investigation remains valid. For example, if you looking into some aspect of, say, forum activity then it probably makes no sense to include projects for which no forum activity exists. (At the same time, you should report what proportion of your initial sample is disqualified.) This may impact the pool of projects you can choose from by eliminating some potentials, but it should not impact the metrics you choose — the investigation should be guided by what you want to measure, not how easy something is to measure. Sometimes this is made quite simple for you. For instance, FLOSSMole is a service that provides you with meta-data about individual FLOSS projects in nicely-formatted lists. If you wish to prune such a list then it is easy to write a software tool to do it quickly for you, leaving only the “valid” candidates. Ask nicely and you could borrow mine.

So-called “filters” I have found myself testing for in past work have included:

  • Programming language
  • Version control system used
  • Product size
  • Development status

These factors can impact the validity of the study (e.g. can different programming languages be compared fairly), or be technical considerations (e.g. do I have tools that can analyse these programming languages). Both need careful thought.

Further considerations include your selection method, i.e. how are you choosing the projects to study? If you are examining a very small number of projects, be sure your choice has some careful thought behind it. Generalizing from simply analysing one or a couple of projects can be tricky; a more focused comparative analysis, such as the work by Schach et al comparing four different Unix-like operating system kernels is probably more productive at that level. If you seek to generalize FLOSS as a phenomenon from your analysis, some different works have now been carried out (including my own) that do so by analysing large samples of projects. In this case I think the consensus is that random selection of a filtered population of projects is the best approach.

And so the end of the selection process should be a list of projects you wish to analyse that feeds through to the next stage: retrieval.