The picture of open-source software painted by the popular media tends to be superficial and simplistic. Open source is touted as a miraculous way to produce software at no cost. For anyone developing software professionally, all this open-source hype no doubt seems pretty farfetched. Let's take a closer look and try to shed some light on what open source is really all about.
If you read the newspaper, open source seems to have started with the Linux operating system back in 1991 (or more likely, in 1997 or 1998 when whoever wrote the article finally heard about Linux). The actual facts are a bit different: Open source is as old as computer programming. If you had wandered into places such as MIT or Stanford in the 1960s, you would have seen that sharing software source code was assumed. Early development of the ARPAnet was helped by freely available source code, a practice that continued as it grew into today's Internet. The Berkeley version of Unix dates from the mid-1970s. All in all quite a distinguished lineage.
The creation of software by a loosely coupled group of volunteers seems a thoroughly contemporary phenomenon, based on the free outlook of the 1960s--a kind of fallout of free love and hippiedom--but the concept of distributed group development is hardly new.
On Guy Fawkes Day, 1857, Richard Chenevix Trench, addressing the Philological Society, proposed the production of a new, complete English dictionary based on finding the earliest occurrences of each of the English words ever in printed use. That is, the dictionary would be constructed by reading every book ever written and noting down exactly where in each book a significant use of every word occurred; these citations would be used to write definitions and short histories of the words' uses. In order to do this, Trench proposed enlisting the volunteer assistance of individuals throughout English-speaking countries by advertising for their assistance.
Over a period of 70 years, many hundreds of people sent in over 6 million slips with words and their interesting occurrences in thousands of books. This resulted in the Oxford English Dictionary, the ultimate authority on English: 20 volumes containing 300,000 words and about 2.5 million citations, or roughly 8.3 citations per entry.
Compare this with the best effort by an individual: Samuel Johnson, over a 9-year period, using the same methodology and a handful of assistants called amanuenses, produced a two-volume dictionary with about 40,000 words and, in most cases, one citation per entry. Johnson's dictionary is a monument to individual effort and a work of art, revealing as much about Johnson as about the language he perceived around him, while the OED is the standard benchmark for dictionaries, the final arbiter of meaning and etymology.
The stereotype for the sort of person who contributes to an open-source project is that of a hobbyist or student, someone you perhaps wouldn't take too seriously. After all, full-time professional programmers don't have time for such things. Well, first, students and hobbyists can often write very good code. Next, lots of professionals do like to program on their own time. A study done by the Boston Consulting Group [1] found that over 45% of those participating in open-source projects were experienced, professional programmers, with another 20% being system administrators, IT managers, or academics. The same study found that over 30% of these professionals were paid by their day job to develop open-source software. Both Sun and IBM have engineering teams contributing to various parts of the Apache web server. Most companies building PC peripherals now write the needed device drivers for Linux as well as Windows. In fact, the open-source process encourages the replacement of contributions from less capable programmers with code from more capable ones.
How can a bunch of random programmers, with no quality assurance (QA) folks, produce code with any degree of quality? Isn't open-source software full of bugs? Well, there may initially be as many bugs in open source as in proprietary code, but because the code is open, more developers will actually look at it, catching many bugs in the process. Also, everyone using the code is essentially doing QA: they report any bugs that they find, and because they have access to the source code, they often fix the bugs themselves.
In 2003, Reasoning, Inc., used its defect discovery tool to perform a defect analysis [2] of the Apache web server and of Tomcat, a mechanism for extending Apache with Java servlets. For Apache, the tool found 31 defects in 58,944 source lines, a defect density of 0.53 defects per thousand lines of source code (KSLC). In a sampling of 200 projects totaling 35 million lines of code, 33% had a defect density below 0.36 defects/KSLC, 33% had a defect density between 0.36 and 0.71 defects/KSLC, and the remaining 33% had a defect density above 0.71 defects/KSLC. This puts Apache squarely in the middle of the studied quality distribution. For Tomcat, the tool found 17 defects in 70,988 source lines, a defect density of 0.24 defects/KSLC. This is below the 0.36 defects/KSLC threshold, which puts Tomcat in the top third for quality.
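The defect densities quoted above are simple ratios, and a short sketch makes the arithmetic explicit. The function names and the bucket labels below are illustrative assumptions, not part of the Reasoning study; only the defect counts, line counts, and the 0.36 and 0.71 thresholds come from the text.

```python
def defect_density(defects: int, source_lines: int) -> float:
    """Return defects per thousand source lines (defects/KSLC)."""
    return defects / (source_lines / 1000)


def quality_bucket(density: float, low: float = 0.36, high: float = 0.71) -> str:
    """Place a density in one of the three bands described in the text.

    The band names are hypothetical labels chosen for this sketch.
    """
    if density < low:
        return "best third"
    if density <= high:
        return "middle third"
    return "worst third"


# Figures taken from the paragraph above.
apache = defect_density(31, 58_944)
tomcat = defect_density(17, 70_988)

for name, density in (("Apache", apache), ("Tomcat", tomcat)):
    print(f"{name}: {density:.2f} defects/KSLC ({quality_bucket(density)})")
```

Rounded to two places, the sketch gives 0.53 for Apache and 0.24 for Tomcat, matching the reported figures.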
If you still don't believe that open-source software is of similar quality to most commercial software, just take a look at some open-source software you use every day. Assuming you make any use of the Internet, you are relying on open-source code such as BIND, which is at the heart of the Domain Name System (DNS); sendmail, which probably transports most email; and Apache, which as of February 2004 was running on over 67% of the world's web servers. Then there's Linux, which has won several awards for quality and has a notably longer mean time between reboots than some other major PC operating systems.
Having thousands of people fixing bugs might work, but how can you possibly coordinate the work of that number of developers? Without central control, how can it possibly be an efficient process? Well, it may not be efficient, but why does it need to be? When you have limited resources, efficiency is important, but in an open-source effort with lots of developers, it doesn't matter if some go off and write a module that is eventually rejected. Open-source efforts often progress in small steps. If several people are working on different solutions to a problem, as long as one eventually produces a solution, you are making forward progress. If two solutions are produced, that's even better: just pick the better one. Also, with the ease of email and newsgroups, the various people working on the problem will probably find each other and spontaneously self-organize to produce a result that is better than any of them alone could have produced--all without any central control.
Why should your company pay you to write free software? Well, your company may already be doing that. Are you working on a product that is sold, or on one that is distributed for free? Are you working on something used only internally? Is the income generated from selling the software you write greater than the cost to produce it? The profit may come from other activities. The same holds for free software: your company will continue to make its money from selling hardware (e.g., servers, storage, and workstations); proprietary software; books; and consulting, training, and support.
For example, O'Reilly and Associates sells enough Perl books to pay the main Perl developers to work on new Perl features. Several of the main Linux developers are employed by Red Hat, which makes its money by packaging up free software. Cygnus (now part of Red Hat) sells support for the GNU compiler and debugger, which its employees continue to develop and give away. Sun sells servers, but gives away Java.
Look at the sections "The Business Model Must Reinforce the Open-source Effort" and (in Chapter 4) "Business Reasons for Choosing to Open Source Your Code" for more details about how your company can make money from open-source software development. Keep in mind, however, that roughly 90% of the money spent on software development is for custom software that is never sold; commercial software represents less than 10% of the total investment.
Myth 6: By Making Your Software Open Source You'll Get Thousands of Developers Working on It for No Cost
That would be nice, but in reality most open-source projects have only a few dozen core developers doing most of the work, with maybe a few hundred other developers contributing occasional bug reports, bug fixes, and possible enhancements. Then there are the thousands of users, who may contribute bug reports and requests for new features. The users also post messages asking how to use the software and, in a healthy project, the more experienced users post answers to those questions. Some users may even help write documentation.
Hewlett-Packard and Intel report a 5:1 or 6:1 ratio of community to corporate developers for open-source projects the two companies have been involved with [3]. Our belief is that this is a little high, but it isn't too far off.
Another source of data is SourceForge, which has about 80,000 projects with 90,000 developers. The distribution of developers across projects there follows a power law: about 60,000 projects have zero or one active developer, 3000 have three, five have 30, and one has 100. To factor out the large number of inactive or dead projects on SourceForge, a study in May 2002 by Krishnamurthy [4] looked at participation only in mature, active projects and found the average number of developers per project to be four. Only 19% of the 100 projects studied had more than 10 developers, whereas 22% had only one developer associated with them.
It's true that you don't need to pay any outside developers who choose to work on your project. However, you do need to pay the cost of maintaining the infrastructure necessary for open-source development (e.g., a CVS code server, a bug database, project mailing lists, and a project website), along with the people to integrate the contributions you get. You won't get something for nothing, but for a successful open-source project you can get back more than what you put in.
Experience with conventional, proprietary software development teaches that the larger the project, the more resources are needed for coordination and design. For an open-source project where all the discussion is via mailing lists and where there is no formal management structure, it seems that it would be impossible to organize the developers efficiently. Hence, open source might work for small projects, but not for large ones.
In his essay, The Mythical Man-Month, Frederick P. Brooks states that adding more developers to a late project will just make it later. In an open-source project, developers can be added at any time with no forewarning. One issue with Brooks' Law, and with the various studies that have subsequently either supported or qualified it, is that it rests on a tacit assumption about the development process. Although rarely stated, the assumption is that the development team will be made up of individual contributors, each working on a separate part of the software, forming an efficient allocation of developers to the code. As it turns out, neither extreme programming nor open source obeys that assumption. Moreover, these studies assume that developers are a scarce resource, which is not true for open source.
Although it has been difficult to set up proper experiments to test how extreme programming affects Brooks' Law, one preliminary study [5] showed that when a programmer was added to create a pair-programming situation, the added programmer could immediately contribute by observing and pointing out simple errors and by asking probing questions that served to clarify the thought processes of the primary programmer. Thus, the new programmer could be productive immediately, although not as productive as a full-speed developer. The difficulty in experimental methodology is to obtain a valid comparison between an extreme programming project and a traditional one.
In an open-source project, developers are no longer treated as a scarce resource that must be used efficiently. Therefore, a developer added to a project doesn't need to have a separate part carved out. Moreover, a new developer can probably contribute immediately in the same way as in extreme programming by finding (and fixing) simple errors and asking probing questions. In his essay, Brooks points out that new developers must be trained, that larger teams require greater overhead to communicate with each other, and that not every task may be partitioned.
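Brooks's point about communication overhead is commonly illustrated by counting pairwise communication paths, which grow as n(n-1)/2 for a team of n people who must all coordinate with one another. The text here does not give that formula, so treat the sketch below as an illustrative assumption; the function name and the team sizes are hypothetical.

```python
def communication_paths(team_size: int) -> int:
    """Count distinct pairs in a team where everyone coordinates with everyone."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Hypothetical team sizes, chosen only to show how quickly the count grows.
for size in (3, 10, 50):
    print(f"{size} developers -> {communication_paths(size)} pairwise paths")
```

The quadratic growth applies to a group that must coordinate as a single team. Occasional contributors who mostly interact with a module owner add far fewer paths.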
For an open-source project, it is important to distinguish between the developers who make up the core team--the module owners and the few developers with check-in privileges--and the much larger number of occasional contributors. The core team is always going to be too small, and all the lessons of conventional software development apply to it, including Brooks' Law. However, it is with the larger group of contributors that open source changes the rules: These are the folks who can track down obscure bugs and create fixes for them, help each other to get up to speed on the code, implement features from the project's wishlist, or explore and experiment with more radical modifications--all activities that free up the core team to focus on its own main work.
Instead of controlling and scheduling developers, open source relies on the developers' self-organizing based on their interests and abilities. Instead of a management structure to coordinate everyone's work, open-source development requires resources to evaluate and integrate developer contributions. Moreover, those resources can draw on the total community of developers and are not limited to any core group. To see this, look at the success of some of the large open-source projects such as Apache or Linux.
Your company still owns any source code that it releases under an open-source license because your company still owns the copyright. The open-source license grants others the right to examine and use the source code, but it does not affect your company's ownership of the code. As the copyright owner, your company can release the source code under another license or use it in a proprietary product. Only if the source code were distributed containing an explicit disclaimer of copyright protection by your company would the software pass to the public domain and thereby no longer be owned by your company [6].
However, source code contributed back to your company by outside developers is owned by the author, who holds the copyright for it. Under some licenses, such as the Sun Community Source License (SCSL), your company would be able to use the contributed code without restrictions. Under an open-source license, such as the GPL or the Mozilla Public License (MPL), your company would be bound by the terms of the license just like any other developer.
Similarly, your company still owns the patents embodied in any source released under an open-source license, but if your company does not explicitly state the uses to which any such patents may be put, others might be free to use those patents.
An open-source community is a community surrounding an open-source artifact, but it may not be an open community: it might not be open to just anyone who wants to join, and a member, once in, might not know how to move ahead and become a leader. The community can be as closed, idiosyncratic, and undemocratic as it wants to be. The license guarantees that everyone in the community has certain rights with respect to the code, but in general it does not say anything about the community.