An email trickling into my inbox reminds me to repeat my existing stance on requests to complete surveys about open source software development: I don't do them.
The availability of developers' email addresses in OSS projects may make people think they could gain some insight by asking those developers questions as part of a research project, but consider this:
- You won't be the first person to have thought of this, and tried to conduct a survey.
- The only people answering your survey will be those who either enjoy filling in surveys, or who haven't yet been approached repeatedly.
- Therefore your sample set will be utterly unrealistic, consisting of people new to open source (and not yet bored of completing surveys), or of people who like filling in surveys.
- Accordingly, any conclusions you come to could be discounted on the basis of this unrepresentative, self-selecting sample set.
Here, then, are some better ideas than yet another SurveyMonkey email soliciting answers whose significance can be disputed:
- Look at the patch history for a project and identify the bodies of code with the highest rate of change, and the lowest. Why the differences? Is the code with the highest velocity the most unreliable, or merely the most important?
- Look at the stack traces in the bug reports. Do they correlate with the high-churn modules identified above?
- Does the frequency of stack traces against a source module increase after a patch to that area ships, or does it decrease? That is, do patches actually reduce the number of defects or, as Brooks observed in The Mythical Man-Month, simply move them around?
- Perform automated complexity analysis on source. Are the most complex bits the least reliable? What is their code velocity?
- Is the amount of discussion on a patch related to the complexity of the destination code, or of the code in the patch itself?
- Does the complexity of a project increase or decrease over time?
- Does the code coverage of a project increase or decrease over time?
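The first idea, measuring rate of change from patch history, can be sketched by tallying how often each file appears in the project's commit log. This is a minimal illustration over a hard-coded sample of `git log --name-only` output; in practice you would pipe the real command's output through the same counting logic, and the file paths here are invented for the example.

```python
from collections import Counter

# Sample of `git log --since="1 year ago" --name-only --pretty=format:`
# output; the paths are hypothetical. In practice, read this from the
# real command's stdout instead.
sample_log = """\
src/core/scheduler.c
src/core/scheduler.c
src/net/http.c
src/core/scheduler.c
docs/README.md
"""

# Count how many commits touched each file: a crude churn metric.
churn = Counter(line for line in sample_log.splitlines() if line.strip())

for path, commits in churn.most_common():
    print(f"{commits:4d}  {path}")
```

Sorting by count surfaces the highest-velocity files at the top; the files that never appear in the window are the lowest.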
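Correlating bug reports with modules, as the stack-trace ideas suggest, amounts to extracting the source file from each trace frame and tallying per-file counts. A minimal sketch, assuming Java-style stack traces; the class and file names are made up for the example.

```python
import re
from collections import Counter

# Hypothetical Java-style stack-trace lines scraped from bug reports.
traces = """\
    at org.example.core.Scheduler.run(Scheduler.java:88)
    at org.example.net.HttpClient.send(HttpClient.java:42)
    at org.example.core.Scheduler.tick(Scheduler.java:131)
"""

# Pull the source file out of each frame and tally hits per file.
frame = re.compile(r"\((\w+\.java):\d+\)")
hits = Counter(m.group(1) for m in frame.finditer(traces))

print(hits.most_common())
```

Bucketing the same counts by report date, before and after a patch ships, would answer the did-the-patch-help question.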
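For the automated complexity analysis, a rough cyclomatic-style estimate can be computed from a parse tree alone: one plus the number of decision points per function. This sketch uses Python's stdlib `ast` module on an inline example; real tooling (and real source) would be more thorough.

```python
import ast

# Decision-point node types for a crude cyclomatic-complexity estimate.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def complexity(func: ast.FunctionDef) -> int:
    # 1 (straight-line path) + one per branch point found in the body.
    return 1 + sum(isinstance(node, BRANCH_NODES)
                   for node in ast.walk(func))

# Inline sample source; in practice, parse each file in the repository.
source = """
def simple(x):
    return x + 1

def branchy(x):
    if x > 0:
        for i in range(x):
            if i % 2:
                x -= 1
    return x
"""

tree = ast.parse(source)
for func in tree.body:
    if isinstance(func, ast.FunctionDef):
        print(func.name, complexity(func))
```

Joining these scores against the churn and stack-trace counts above is then a simple per-file merge.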
[photo: ski lifts in the cloud, Austria, december 2013]