Draft Open Content Alliance Timeline

Some of the comments I received on the Google timeline were very helpful to me, so I decided to post the timeline for the other digitization case in my dissertation, the Open Content Alliance (OCA). In many ways, the OCA has been harder to unravel into a linear narrative, partially because the documentation on it is much sparser than that on Google Books. Still, between the press coverage and a set of interviews I conducted with the project’s early leaders (marked INT for confidential interviews; Brewster Kahle opted to make his interview open), I think I’ve made a fair start. (And in fact, because I included every detail I came across on this one, it’s actually substantially longer than the Google timeline turned out to be.)

Please do let me know if I am missing things or have particular facts wrong.

Here is what I have (citations follow below):

1980-mid-1990s Brewster Kahle begins to think about building a universal digital library, but recognizes several gaps in the existing technology, and sets about trying to fill them, working on increasing the capacity of digital storage, building out digital networks, and working out online revenue streams for publishers (Interview with Kahle, 2011)
1995 Kahle sells WAIS, an early publishing and distributed search system, to AOL for $15 million (“Staff Bios”, Hogge 2005)
1996 Kahle founds web analytics company Alexa Internet, named for the Library of Alexandria, in collaboration with Bruce Gilliat. As part of its functionality, Alexa crawls and archives webpages (“Staff Bios”, Tong 2002).  Simultaneously, Kahle founds the Internet Archive, which uses Alexa’s archived web-crawls as the foundation for its archive of internet content, made accessible via the Wayback Machine (Hogge 2005).
1999 Kahle sells Alexa Internet to Amazon for $250 million; the Internet Archive retains the right to receive web crawls from Alexa (“Technology”, Hogge 2005, Hardy 2009).
2000 The Million Book Project, an NSF-funded large-scale digitization project geared toward addressing a particular set of technological research questions, is initiated by a group of computer scientists at Carnegie Mellon, in collaboration with universities in China and India, the Bibliotheca Alexandrina in Egypt, and other partners (St. Clair 2008, 152-53).
2004 The Internet Archive signs on to assist the Million Book Project (by then also referred to as the Universal Library Project) with permanent archiving, quality control, and materials acquisition (“Frequently Asked Questions” 2007). Following on their experiences with Million Book, IA begins to develop its own book scanning technology (both hardware and software), which would eventually be called the Scribe (Interview with Kahle, 2011).
2004 (Autumn) The Internet Archive and the University of Toronto begin a pilot project to test the IA’s scanning process. Within the next year, they scan about 2,000 books (Carlson and Young 2005, Bengtson 2006)
2004 (December 14) Google Print Library Project announced (Google 2004).
2005 (Early) Conversations begin within the Internet Archive and its social sphere about starting up a more open book scanning project, as an alternative to Google’s; the initial thought is to fund the project via contributions from libraries (INT).
2005 (Early/Mid) Sumir Meghani of Yahoo! approaches the Internet Archive and proposes the concept of the Open Content Alliance, a collaborative project to scan works out of copyright or otherwise openly available for scanning. Yahoo! puts up $150,000 in funding to start, targeting a collection of works in American Studies at the University of California (INT, Albanese 2005, Johnson 2007, 5).
2005 (October 2) Open Content Alliance announced by Brewster Kahle on the Yahoo! Search Blog (2005a). Founding partners include: The Internet Archive (technology/content/hosting/administration), Yahoo! (technology/indexing/financial), Adobe Systems (technology), The European Archive (content), HP Labs (technology), The UK National Archives (content), O’Reilly Media (content), Prelinger Archives (content), University of California (content), University of Toronto (content) (“Consortium Forms OCA to Bring Additional Content Online” 2005)
2005 (By October 14) The project is endorsed by the Association of Learned and Professional Society Publishers (ALPSP) (Carlson and Young 2005).
2005 (Mid-October) Partners added: Lulu (a print on demand service), LibriVox (a producer of audio editions), the Biodiversity Heritage Library, the Smithsonian Institution Libraries, and eleven university libraries (McMaster, Memorial University of Newfoundland, the University of Ottawa, University of British Columbia, York University, Columbia University, Emory University, Johns Hopkins University, the University of Virginia, Rice University and the University of Pittsburgh) (Auchard 2005, Crawford 2005, Suber 2005).
2005 (October 25) Microsoft joins the OCA, pledging funding and tech/scanning assistance (Auchard 2005); simultaneously announces MSN Book Search (“MSN Search Announces MSN Book Search” 2005, Albanese 2005, Crawford 2005).
2005 (October 27) The Research Libraries Group (RLG) signs on to provide the OCA with bibliographic information from its union catalog (“RLG Joins Open Content Alliance” 2005, Crawford 2005).
2005 (By November 2) The project is endorsed by the Association of American Publishers (AAP) and the American Association of University Presses (AAUP) (Suber 2005).
2005 (November 4) British Library signs on to scan 100,000 books with Microsoft (“Microsoft Scans British Library” 2005, Crawford 2005, Kupferschmid 2005).
2005 (November 8) Official launch of OCA book scanning; Microsoft pledges to fund the digitization of 150,000 books by the end of 2006 (Kahle 2005b).
2005 (November 9) WSJ reports that the IA/Toronto pilot project has scanned 2800 books, at a cost of $108,250, over the course of the past year (Crawford 2005).
2005 (December 29) A group of 27 Canadian research libraries announces that they are jointly establishing a project called Alouette Canada, a digitization alliance intended to work collaboratively with the OCA (Crawford 2006).
2006 (March) Partner added: University of North Carolina-Chapel Hill (Library and SILS; providing content and expertise, respectively) (“UNC-Chapel Hill Library and Library School Join Open Content Alliance” 2006, Albanese 2006b).
2006 (September) The University of California joins Google Book Search; Kahle sees this as a betrayal of the OCA’s principles on UC’s part, going so far as to claim that UC is “effectively giving their library to a single corporation,” despite its then-ongoing participation in the OCA (INT, Albanese 2006a).
2006 (December 6) Microsoft Live Search Books releases its first beta version (at http://books.live.com – now defunct) (Guren 2006).
2006 (December 19) IA receives $1 million from the Sloan Foundation to scan specific collections from the Boston Public Library, The Getty Research Institute, The Metropolitan Museum of Art, UC-Berkeley’s Bancroft Library, and Johns Hopkins (“Sloan Foundation Grant Awarded” 2006,  “The Internet Archive Receives Grant” 2006, Internet Archive and Boston Public Library 2007).
2006 (December 20) IA announces that it has digitized and made available 100,000 books, largely from members of the Open Content Alliance (“Milestone Achieved” 2006).
2007 (March) The OCA reaches 130,000 volumes scanned, all available via the IA Text Archive (Notess 2007).
2007 (April) Partner added: University of Illinois (“U of Illinois Joins Open Content Alliance” 2007).
2007 (June 25) The Internet Archive successfully petitions to be declared a library by the State of California, in order to gain eligibility for state-administered federal grants (Kahle 2007a, McCoy 2007).
2007 (July 16) At Kahle’s request, open access advocate and entrepreneur Aaron Swartz signs on to help build the architecture for Open Library. His goal in the design, as he puts it, is to create “a website with a page for every book, collecting everything we can find out about it from libraries, publishers, reviewers, and of course, book lovers” (Swartz 2007, Kniffel 2008, quoting Swartz).
2007 (October) The OCA reaches 200,000 volumes scanned. Eight scanning centers are in operation, in three countries: the US, Canada, and the UK (Goth 2007, Kahle 2007b, Ashmore and Grogg 2008). IA releases a rewritten version of its scanning software, Scribe2, which promises greater format flexibility, less bandwidth usage, and support for new cameras (Internet Archive 2007, Kahle 2007b).
2007 (November) Partner added: Boston Library Consortium (19 member libraries, all contributing public domain materials only, self-funded). The BLC publicly announces that it was approached by Google first, but rejected them in favor of OCA (“Boston Library Consortium and Open Content Alliance to Provide Digitized Books” 2007, Albanese 2007a, Hane 2007).
2007 (November 15) A set of OCA partners – IA, BPL, MBL-WHOI, and Universidad Francisco Marroquín – announce a plan to scan out-of-print, in-copyright works for distribution via a new form of digital interlibrary loan, which they will develop (Albanese 2007b).
2007 (December) The OCA reaches 250,000 volumes scanned (Hane 2007).
2007 (December 15) Yale University signs on with Microsoft to scan up to 100,000 books outside of the OCA, and on terms more like Google’s than like IA’s (Albanese 2007c).
2008 (January-April) With funding from Microsoft, IA deploys five more US-based scanning centers under the banner of the OCA (INT, Kahle 2008).
2008 (February) The Boston Public Library begins to offer scan-on-demand interlibrary loan (ILL) services for public domain works using its onsite Scribe workstations. This reduces the turnaround time from weeks to days, and makes it possible to fulfill ILL requests that would otherwise have been denied due to the condition and/or rarity of the item requested (Colford 2008).
2008 (February 19) Partner added: Triangle Research Libraries Network, a consortium composed of the research libraries at Duke University, North Carolina Central University, North Carolina State University, and the University of North Carolina at Chapel Hill (the last of which was already an OCA partner) (“Triangle Research Libraries Network” 2008,  “TRLN Libraries” 2008,  “TRLN Member Libraries” 2008).
2008 (May 23) Microsoft announces it is bowing out of book scanning, having spent $10 million on its efforts. The company’s departure leaves a major gap in OCA funding (INT, Albanese 2008b, Guess 2008, Kahle 2008, Nadella 2008).
2008 (August) Maura Marx, then head of the Boston Public Library’s Digital Content Program, is hired to be the first Executive Director of the OCA. However, she never actually assumes the role, but instead founds a separate initiative, Open Knowledge Commons, with Sloan funding (INT, “People” 2008, Berry 2009).
2008 (November) HathiTrust launched, incorporating scanned content from OCA as well as Google Books and other digitization projects (Albanese 2008a).
2008 (December) Open Library and Boston Public Library jointly begin to offer a scan-on-demand service for public domain works that have been indexed by Open Library, but have not yet been made available in full text (“Have a Hand in Scan-on-Demand” 2008).
2009 (January) The OCA reaches 1 million volumes scanned – including 300,000 donated by Microsoft after the discontinuation of Live Search Books (O’Leary 2009).
2009 (July 5) Last mention of the term “Open Content Alliance” on the organization’s own blog (Kahle 2009). After about this point, the project under that name is effectively defunct, though various pieces of it persist, and the term still pops up occasionally in discussions of book digitization.
2009 (December) IA and the OCA form the Open Book Alliance to oppose the Google Books Settlement Agreement. Yahoo, Microsoft, Amazon, the Special Libraries Association and the New York Library Association soon join (Oder, et al. 2009).
2010 (February) The IA debuts BookServer, “a distributed system for lending and vending on the Internet” at the O’Reilly Media Tools of Change for Publishing Conference. It allows individuals to buy or check out in-copyright but out-of-print materials, dovetailing with Open Library and connecting with libraries and retailers (Hadro 2010a).
2010 (Spring) IA works with the City of San Francisco to hire over 125 workers for its scanning project, as subsidized labor under the Temporary Assistance for Needy Families (TANF) program (INT, Miller 2010).
2010 (May 6) In the final post on the Open Content Alliance blog, Kahle announces that the IA will be making 1 million books, both in and out of copyright, accessible to the print disabled via Open Library in the open DAISY talking book format (Hadro 2010b, Kahle 2010).
2010 (June) The first 200 or so ebook versions of out-of-print, in-copyright books go live for lending via Open Library. They are readable for two-week periods using Adobe Digital Editions Software (Rapp 2010).
2011 (April) IA announces that 85,000 in-copyright, out-of-print titles, contributed by 150 public and academic libraries, will be made available via Open Library, but only to patrons actually physically located in those 150 libraries (though once patrons download the books, they can use them on their personal devices outside the library for the duration of the loan) (Rapp 2011c).
2011 (June) OCLC researchers develop “oclcBot,” a piece of software that matches up records from Open Library to records from OCLC, checks to see if the Open Library record has an OCLC number (a unique identifier commonly used across library systems), and inserts one if none is present (Rapp 2011b).
2011 (July) Kahle announces the establishment of the Internet Archive’s physical book archive, hoping to obtain “one copy of everything ever published.” The archive is launched with an initial collection of 450,000 items, accumulated as part of IA’s various digitization efforts, and seeks to build its collection through donations and by gathering up items deaccessioned by other libraries (INT, Rapp 2011a).
2011 (October) The state librarians of all 50 U.S. states vote unanimously to enter into a memorandum of understanding with IA, pledging their support for the Open Library’s online lending program (Kelley 2011).

Works Cited

(linked where possible)

Albanese, Andrew. “AAP Sues Google over Scan Plan.” Library Journal 130, no. 19 (November 15 2005): 17-18.

———. “BLC, OCA Join in Digitization Effort.” Library Journal 132, no. 17 (October 15 2007a): 15-16.

———. “Hathitrust Is Launched.” Library Journal 133, no. 18 (November 1 2008a): 13.

———. “Microsoft Gives up Scan Plan.” Library Journal 133, no. 12 (July 15 2008b): 14.

———. “OCA to Scan Orphan Works.” Library Journal 132, no. 19 (November 15 2007b): 16-17.

———. “UC Joins Google’s Scan Plan.” Library Journal 131, no. 14 (September 1 2006a): 14-15.

———. “UNC Library, SILS Join Content Alliance.” Library Journal 131, no. 6 (April 1 2006b): 21-22.

———. “Yale, Microsoft Join in Scan Plan.” Library Journal 132, no. 20 (December 15 2007c): 19.

Ashmore, Beth, and Jill E. Grogg. “The Race to the Shelf Continues – the Open Content Alliance.” Searcher 16, no. 1 (January 2008): 18-23.

Auchard, Eric. “Microsoft Joins Yahoo on Digital Library Alliance.” Yahoo News, October 26, 2005.

Bengtson, Jonathan B. “The Birth of the Universal Library.” Library Journal 131, no. 6 (Spring 2006): 2-7.

Berry, John N., III. “Chicago Hope.” Library Journal 134, no. 10 (June 1 2009): 22.

“Boston Library Consortium and Open Content Alliance to Provide Digitized Books.” College & Research Libraries News 68, no. 10 (2007): 624-25.

Carlson, Scott, and Jeffrey R. Young. “Yahoo Works with Academic Libraries on a New Project to Digitize Books.” Chronicle of Higher Education 52, no. 8 (October 14 2005): A34. (BEHIND PAYWALL)

Colford, Michael R. “Rethinking Resource Sharing: Boston Public Library Provides Scan-on-Demand for Interlibrary Loan.” ASCLA 30, no. 1 (2008): 5-6.

“Consortium Forms OCA to Bring Additional Content Online.” Advanced Technology Libraries 34, no. 11 (2005): 9-10.

Crawford, Walt. “Discovering Books: The OCA/GBS Saga Continues.” Cites & Insights 6, no. 6 (Spring 2006).

———. “OCA and GLP 2: Steps on the Digitization Road.” Cites & Insights 5, no. 14 (2005).

“Frequently Asked Questions About the Million Book Project.” 2007.

Google, Inc. “Google Checks out Library Books.”  2004.

Goth, G. “Digital Libraries Are Taking Form.” IEEE Distributed Systems Online 8, no. 12 (2007): 1-3.

Guess, Andy. “Post-Microsoft, Libraries Mull Digitization.” Inside Higher Ed, May 30, 2008.

Guren, Cliff. “Live Search Books Beta Release.” Bing Community (blog). December 5, 2006.

Hadro, Josh. “Infotech.” Library Journal 135, no. 5 (March 15 2010a): 16.

———. “Infotech.” Library Journal 135, no. 10 (June 1 2010b): 18.

Hane, Paula J. “Free Content Options Continue to Shake Things Up.” Information Today 24, no. 11 (2007): 7-12. (PAYWALL)

Hardy, Quentin. “The Big Deal: Brewster Kahle.” Forbes, November 27, 2009.

“Have a Hand in Scan-on-Demand.” Library Journal 133, no. 20 (December 15 2008): 27.

Hogge, Becky. “Brewster Kahle.” New Statesman 134, no. 4762 (2005): 26.

Internet Archive. “Internet Archive Scribe2.” Launchpad.net, 2007. Accessed August 21, 2013.

Internet Archive, and Boston Public Library. “The John Adams Library Collection – Cooperative Agreement.” April 13, 2007.

“The Internet Archive Receives Grant from Alfred P. Sloan Foundation to Digitize and Provide Open Online Access to Historical Collections from Five Major Libraries.” December 20, 2006.

Johnson, Richard K. “In Google’s Broad Wake: Taking Responsibility for Shaping the Global Digital Library.” ARL Bimonthly Report 250 (2007): 1-15.

Kahle, Brewster. Interviewed by Elisabeth A. Jones. September 7, 2011. In person, at the Internet Archive, San Francisco, CA.

———. “Achievements for Humanity.” Opencontentalliance.org (blog). July 5, 2009.

———. “Announcing the Open Content Alliance.” Yahoo! Search Blog (blog). October 2, 2005a.

———. “Books Scanning to Be Publicly Funded.”  2008.

———. “Bookscanning Launch and Vision of an Open Library.”  2005b.

———. “Internet Archive Officially a Library.” Internet Archive Blogs (blog). June 25, 2007a.

———. Libraries Going Open. San Francisco: Internet Archive, 2007b.

———. “Over 1 Million Digital Books Now Available Free to the Print-Disabled.” Opencontentalliance.org (blog). May 6, 2010.

Kelley, Michael. “Newsdesk.” Library Journal 136, no. 20 (December 1 2011): 14.

Kniffel, Leonard. “Backed by Internet Archive, Entrepreneur Takes on OCLC.” American Libraries  (April 2008).

Kupferschmid, Keith. “Are Authors and Publishers Getting Scroogled?” Information Today 22, no. 11 (2005): 1-5.

McCoy, Adrian. “The Internet Gives Birth to an ‘Official’ Online Library.” Pittsburgh Post-Gazette, June 22, 2007.

“Microsoft Scans British Library.” BBC News, November 4, 2005.

“Milestone Achieved.” Opencontentalliance.org (blog). December 20, 2006.

Miller, Robert. “Saveusjobspasssenatebills4213.” Internet Archive, 2010.

“MSN Search Announces MSN Book Search.” Advanced Technology Libraries 34, no. 12 (2005): 6.

Nadella, Satya. “Book Search Winding Down.” Bing Community (blog). May 23, 2008.

Notess, Greg R. “Search Engine Update.” Online 30, no. 1 (January/February 2006): 15. (PAYWALL)

———. “Search Engine Update.” Online 31, no. 2 (March/April 2007): 14. (PAYWALL)

O’Leary, Mick. “Open Content Alliance Embodies Open Source Movement.” Information Today 26, no. 1 (January 2009): 37-43. (PAYWALL…ironically)

Oder, Norman, Lynn Blumenstein, and Josh Hadro. “Newsdesk.” Library Journal 134, no. 20 (December 15 2009): 14-17.

P1. Interviewed by Elisabeth A. Jones. September 6, 2011. In person.

P2. Interviewed by Elisabeth A. Jones. September 14, 2011. In person.

P3. Interviewed by Elisabeth A. Jones. September 14, 2011. In person.

P5. Interviewed by Elisabeth A. Jones. September 9, 2011. In person.

P11. Interviewed by Elisabeth A. Jones. October 27, 2011. Skype.

“People.” Technicalities 28, no. 6 (2008): 23.

Rapp, David. “Infotech.” Library Journal 135, no. 13 (August 1 2010): 16.

———. “Infotech.” Library Journal 136, no. 6 (April 1 2011a): 16.

———. “Infotech.” Library Journal 136, no. 10 (June 1 2011b): 20.

———. “Infotech.” Library Journal 136, no. 12 (July 1 2011c): 18.

“RLG Joins Open Content Alliance.” College & Research Libraries News 66, no. 11 (December 2005): 770.

“Sloan Foundation Grant Awarded.” Opencontentalliance.org (blog). December 19, 2006.

St. Clair, Gloriana. “The Million Book Project in Relation to Google.” Journal of Library Administration 47, no. 1/2 (2008): 151-63.

“Staff Bios.” Internet Archive.

Suber, Peter. “The Open Content Alliance.” SPARC Open Access Newsletter, November 2, 2005.

Swartz, Aaron. “Announcing the Open Library.” Raw Thought (blog). July 16, 2007.

“Technology.” Alexa Internet.

Tong, Judy. “Responsible Party – Brewster Kahle; a Library of the Web, on the Web.” New York Times, September 8, 2002.

“The Triangle Research Libraries Network (TRLN) Member Libraries Join Open Content Alliance.” D-Lib Magazine 14, no. 3/4 (2008).

“TRLN Libraries Join Open Content Alliance.” Advanced Technology Libraries (2008). Academic OneFile.

“TRLN Member Libraries Join Open Content Alliance.” Library Hi Tech News 25, no. 4 (2008): 21.

“U of Illinois Joins Open Content Alliance.” Advanced Technology Libraries 36, no. 4 (2007).

“UNC-Chapel Hill Library and Library School Join Open Content Alliance.” College & Research Libraries News 67, no. 3 (March 2006): 140.

