Draft Open Content Alliance Timeline

Some of the comments I received on the Google timeline were very helpful to me, so I decided to post the timeline for the other digitization case in my dissertation, the Open Content Alliance (OCA). In many ways, the OCA has been harder to unravel into a linear narrative, partially because the documentation on it is much sparser than that on Google Books. Still, between the press coverage and a set of interviews I conducted with the project’s early leaders (marked INT for confidential interviews; Brewster Kahle opted to make his interview open), I think I’ve made a fair start. (And in fact, because I included every detail I came across on this one, it’s actually substantially longer than the Google timeline turned out to be.)

Please do let me know if I am missing things or have particular facts wrong.

Here is what I have (citations follow below):

1980-mid-1990s Brewster Kahle begins to think about building a universal digital library, but recognizes several gaps in the existing technology, and sets about trying to fill them, working on increasing the capacity of digital storage, building out digital networks, and working out online revenue streams for publishers (Interview with Kahle, 2011)
1995 Kahle sells WAIS, an early publishing and distributed search system, to AOL for $15 million (“Staff Bios”, Hogge 2005)
1996 Kahle founds web analytics company Alexa Internet, named for the Library of Alexandria, in collaboration with Bruce Gilliat. As part of its functionality, Alexa crawls and archives webpages (“Staff Bios”, Tong 2002).  Simultaneously, Kahle founds the Internet Archive, which uses Alexa’s archived web-crawls as the foundation for its archive of internet content, made accessible via the Wayback Machine (Hogge 2005).
1999 Kahle sells Alexa Internet to Amazon for $250 million; the Internet Archive retains the right to receive web crawls from Alexa (“Technology”, Hogge 2005, Hardy 2009).
2000 The Million Book Project, an NSF-funded large-scale digitization project geared toward addressing a particular set of technological research questions, is initiated by a group of computer scientists at Carnegie Mellon, in collaboration with universities in China and India, the Biblioteca Alexandrina in Egypt, and other partners (St. Clair 2008, 152-53).
2004 The Internet Archive signs on to assist the Million Book Project (by then also referred to as the Universal Library Project) with permanent archiving, quality control, and materials acquisition (“Frequently Asked Questions” 2007). Following on their experiences with Million Book, IA begins to develop its own book scanning technology (both hardware and software), which would eventually be called the Scribe (Interview with Kahle, 2011).
2004 (Autumn) The Internet Archive and the University of Toronto begin a pilot project to test the IA’s scanning process. Within the next year, they scan about 2,000 books (Carlson and Young 2005, Bengtson 2006)
2004 (December 14) Google Print Library Project announced (Google 2004).
2005 (Early) Conversations begin within the Internet Archive and its social sphere about starting up a more open book scanning project, as an alternative to Google’s; initial thought was to fund the project via contributions from libraries (INT).
2005 (Early/Mid) Sumir Meghani of Yahoo! approaches the Internet Archive, proposes the concept of the Open Content Alliance, a collaborative project to scan works out of copyright or otherwise openly available for scanning. Yahoo! puts up $150,000 in funding to start, targeting a collection of works in American Studies at the University of California (INT, Albanese 2005, Johnson 2007, 5).
2005 (October 2) Open Content Alliance announced by Brewster Kahle on the Yahoo! Search Blog (2005a). Founding partners include: The Internet Archive (technology/content/hosting/administration), Yahoo! (technology/indexing/financial), Adobe Systems (technology), The European Archive (content), HP Labs (technology), The UK National Archives (content), O’Reilly Media (content), Prelinger Archives (content), University of California (content), University of Toronto (content) (“Consortium Forms OCA to Bring Additional Content Online” 2005)
2005 (By October 14) The project is endorsed by the Association of Learned and Professional Society Publishers (ALPSP) (Carlson and Young 2005).
2005 (Mid-October) Partners added: Lulu (a print on demand service), LibriVox (a producer of audio editions), the Biodiversity Heritage Library, the Smithsonian Institution Libraries, and eleven university libraries (McMaster, Memorial University of Newfoundland, the University of Ottawa, University of British Columbia, York University, Columbia University, Emory University, Johns Hopkins University, the University of Virginia, Rice University and the University of Pittsburgh) (Auchard 2005, Crawford 2005, Suber 2005).
2005 (October 25) Microsoft joins the OCA, pledging funding and tech/scanning assistance (Auchard 2005); simultaneously announces MSN Book Search (“MSN Search Announces MSN Book Search” 2005, Albanese 2005, Crawford 2005).
2005 (October 27) The Research Libraries Group (RLG) signs on to provide the OCA with bibliographic information from its union catalog (“RLG Joins Open Content Alliance” 2005, Crawford 2005).
2005 (By November 2) The project is endorsed by the Association of American Publishers (AAP) and the American Association of University Presses (AAUP) (Suber 2005).
2005 (November 4) British Library signs on to scan 100,000 books with Microsoft (“Microsoft Scans British Library” 2005, Crawford 2005, Kupferschmid 2005).
2005 (November 8) Official launch of OCA book scanning; Microsoft pledges to fund the digitization of 150,000 books by the end of 2006 (Kahle 2005b).
2005 (November 9) WSJ reports that the IA/Toronto pilot project has scanned 2800 books, at a cost of $108,250, over the course of the past year (Crawford 2005).
2005 (December 29) A group of 27 Canadian research libraries announces that they are jointly establishing a project called Alouette Canada, a digitization alliance intended to work collaboratively with the OCA (Crawford 2006).
2006 (March) Partner added: University of North Carolina-Chapel Hill (Library and SILS; providing content and expertise, respectively) (“UNC-Chapel Hill Library and Library School Join Open Content Alliance” 2006, Albanese 2006b).
2006 (September) The University of California joins Google Book Search; Kahle sees this as a betrayal of the OCA’s principles on UC’s part, going so far as to claim that UC is “effectively giving their library to a single corporation,” despite its then-ongoing participation in the OCA (INT, Albanese 2006a).
2006 (December 6) Microsoft Live Search Books releases its first beta version (at http://books.live.com – now defunct) (Guren 2006).
2006 (December 19) IA receives $1 million from the Sloan Foundation to scan specific collections from the Boston Public Library, The Getty Research Institute, The Metropolitan Museum of Art, UC-Berkeley’s Bancroft Library, and Johns Hopkins (“Sloan Foundation Grant Awarded” 2006,  “The Internet Archive Receives Grant” 2006, Internet Archive and Boston Public Library 2007).
2006 (December 20) IA announces that it has digitized and made available 100,000 books, largely from members of the Open Content Alliance (“Milestone Achieved”).
2007 (March) The OCA reaches 130,000 volumes scanned, all available via the IA Text Archive (Notess 2007).
2007 (April) Partner added: University of Illinois (“U of Illinois Joins Open Content Alliance” 2007).
2007 (June 25) The Internet Archive successfully petitions to be declared a library by the State of California, in order to gain eligibility for state-administered federal grants (Kahle 2007a, McCoy 2007).
2007 (July 16) At Kahle’s request, open access advocate and entrepreneur Aaron Swartz signs on to help build the architecture for Open Library. His goal in the design, as he puts it, is to create “a website with a page for every book, collecting everything we can find out about it from libraries, publishers, reviewers, and of course, book lovers” (Swartz 2007, Kniffel 2008, quoting Swartz).
2007 (October) The OCA reaches 200,000 volumes scanned. Eight scanning centers are in operation, in three countries: the US, Canada, and the UK (Goth 2007, Kahle 2007b, Ashmore and Grogg 2008). IA releases a rewritten version of its scanning software, Scribe2, which promises greater format flexibility, less bandwidth usage, and support for new cameras (Internet Archive 2007, Kahle 2007b).
2007 (November) Partner added: Boston Library Consortium (19 member libraries, all contributing public domain materials only, self-funded). The BLC publicly announces that it was approached by Google first, but rejected them in favor of OCA (“Boston Library Consortium and Open Content Alliance to Provide Digitized Books” 2007, Albanese 2007a, Hane 2007).
2007 (November 15) A set of OCA partners – IA, BPL, MBL-WHOI, and Universidad Francisco Marroquín – announce a plan to scan out-of-print, in-copyright works for distribution via a new form of digital interlibrary loan, which they will develop (Albanese 2007b).
2007 (December) The OCA reaches 250,000 volumes scanned (Hane 2007).
2007 (December 15) Yale University signs on with Microsoft to scan up to 100,000 books outside of the OCA, and on terms more like Google’s than like IA’s (Albanese 2007c).
2008 (January-April) With funding from Microsoft, IA deploys five more US-based scanning centers under the banner of the OCA (INT, Kahle 2008).
2008 (February) The Boston Public Library begins to offer scan-on-demand interlibrary loan (ILL) services for public domain works using its onsite Scribe workstations. This reduces the turnaround time from weeks to days, and makes it possible to fulfill ILL requests that would otherwise have been denied due to the condition and/or rarity of the item requested (Colford 2008).
2008 (February 19) Partner added: Triangle Research Libraries Network, a consortium composed of the research libraries at Duke University, North Carolina Central University, North Carolina State University, and the University of North Carolina at Chapel Hill (the last of which was already an OCA partner) (“Triangle Research Libraries Network” 2008,  “TRLN Libraries” 2008,  “TRLN Member Libraries” 2008).
2008 (May 23) Microsoft announces it is bowing out of book scanning, having spent $10 million on its efforts. The company’s departure leaves a major gap in OCA funding (INT, Albanese 2008b, Guess 2008, Kahle 2008, Nadella 2008).
2008 (August) Maura Marx, then head of the Boston Public Library’s Digital Content Program, is hired to be the first Executive Director of the OCA. However, she never actually assumes the role, but instead founds a separate initiative, Open Knowledge Commons, with Sloan funding (INT, “People” 2008, Berry 2009).
2008 (November) HathiTrust launched, incorporating scanned content from OCA as well as Google Books and other digitization projects (Albanese 2008a).
2008 (December) Open Library and Boston Public Library jointly begin to offer a scan-on-demand service for public domain works that have been indexed by Open Library, but have not yet been made available in full text (“Have a Hand in Scan-on-Demand” 2008).
2009 (January) The OCA reaches 1 million volumes scanned – including 300,000 donated by Microsoft after the discontinuation of Live Search Books (O’Leary 2009).
2009 (July 5) Last mention of the term “Open Content Alliance” on the organization’s own blog (Kahle 2009). After about this point, the project under that name is effectively defunct, though various pieces of it persist, and the term still pops up occasionally in discussions of book digitization.
2009 (December) IA and the OCA form the Open Book Alliance to oppose the Google Books Settlement Agreement. Yahoo, Microsoft, Amazon, the Special Libraries Association and the New York Library Association soon join (Oder, et al. 2009).
2010 (February) The IA debuts BookServer, “a distributed system for lending and vending on the Internet” at the O’Reilly Media Tools of Change for Publishing Conference. It allows individuals to buy or check out in-copyright but out-of-print materials, dovetailing with Open Library and connecting with libraries and retailers (Hadro 2010a).
2010 (Spring) IA works with the City of San Francisco to hire over 125 workers for its scanning project, as subsidized labor under the Temporary Assistance for Needy Families (TANF) program (INT, Miller 2010).
2010 (May 6) In the final post on the Open Content Alliance blog, Kahle announces that the IA will be making 1 million books, both in and out of copyright, accessible to the print disabled via Open Library in the open DAISY talking book format (Hadro 2010b, Kahle 2010).
2010 (June) The first 200 or so ebook versions of out-of-print, in-copyright books go live for lending via Open Library. They are readable for two-week periods using Adobe Digital Editions Software (Rapp 2010).
2011 (April) IA announces that 85,000 in-copyright, out-of-print titles, contributed by 150 public and academic libraries, will be made available via Open Library, but only to patrons actually physically located in those 150 libraries (though once patrons download the books, they can use them on their personal devices outside the library for the duration of the loan) (Rapp 2011c).
2011 (June) OCLC researchers develop “oclcBot,” a piece of software that matches up records from Open Library to records from OCLC, checks to see if the Open Library record has an OCLC number (a unique identifier commonly used across library systems), and inserts one if none is present (Rapp 2011b).
2011 (July) Kahle announces the establishment of the Internet Archive’s physical book archive, hoping to obtain “one copy of everything ever published.” The archive is launched with an initial collection of 450,000 items, accumulated as part of IA’s various digitization efforts, and seeks to build its collection through donations and by gathering up items deaccessioned by other libraries (INT, Rapp 2011a).
2011 (October) The state librarians of all 50 U.S. states vote unanimously to enter into a memorandum of understanding with IA, pledging their support for the Open Library’s online lending program (Kelley 2011).

Works Cited

(linked where possible)

Albanese, Andrew. “AAP Sues Google over Scan Plan.” Library Journal 130, no. 19 (November 15 2005): 17-18.

———. “BLC, OCA Join in Digitization Effort.” Library Journal 132, no. 17 (October 15 2007a): 15-16.

———. “Hathitrust Is Launched.” Library Journal 133, no. 18 (November 1 2008a): 13.

———. “Microsoft Gives up Scan Plan.” Library Journal 133, no. 12 (July 15 2008b): 14.

———. “OCA to Scan Orphan Works.” Library Journal 132, no. 19 (November 15 2007b): 16-17.

———. “UC Joins Google’s Scan Plan.” Library Journal 131, no. 14 (September 1 2006a): 14-15.

———. “UNC Library, SILS Join Content Alliance.” Library Journal 131, no. 6 (April 1 2006b): 21-22.

———. “Yale, Microsoft Join in Scan Plan.” Library Journal 132, no. 20 (December 15 2007c): 19.

Ashmore, Beth, and Jill E. Grogg. “The Race to the Shelf Continues – the Open Content Alliance.” Searcher 16, no. 1 (January 2008): 18-23.

Auchard, Eric. “Microsoft Joins Yahoo on Digital Library Alliance.” Yahoo News, October 26, 2005.

Bengtson, Jonathan B. “The Birth of the Universal Library.” Library Journal 131, no. 6 (Spring 2006): 2-7.

Berry, John N., III. “Chicago Hope.” Library Journal 134, no. 10 (June 1 2009): 22.

“Boston Library Consortium and Open Content Alliance to Provide Digitized Books.” College & Research Libraries News 68, no. 10 (2007): 624-25.

Carlson, Scott, and Jeffrey R. Young. “Yahoo Works with Academic Libraries on a New Project to Digitize Books.” Chronicle of Higher Education 52, no. 8 (October 14 2005): A34. (BEHIND PAYWALL)

Colford, Michael R. “Rethinking Resource Sharing: Boston Public Library Provides Scan-on-Demand for Interlibrary Loan.” ASCLA 30, no. 1 (2008): 5-6.

“Consortium Forms OCA to Bring Additional Content Online.” Advanced Technology Libraries 34, no. 11 (2005): 9-10.

Crawford, Walt. “Discovering Books: The OCA/GBS Saga Continues.” Cites & Insights 6, no. 6 (Spring 2006).

———. “OCA and GLP 2: Steps on the Digitization Road.” Cites & Insights 5, no. 14 (2005).

“Frequently Asked Questions About the Million Book Project.” 2007.

Google, Inc. “Google Checks out Library Books.”  2004.

Goth, G. “Digital Libraries Are Taking Form.” IEEE Distributed Systems Online 8, no. 12 (2007): 1-3.

Guess, Andy. “Post-Microsoft, Libraries Mull Digitization.” Inside Higher Ed, May 30, 2008.

Guren, Cliff. “Live Search Books Beta Release.” Bing Community (blog). December 5, 2006.

Hadro, Josh. “Infotech.” Library Journal 135, no. 5 (March 15 2010a): 16.

———. “Infotech.” Library Journal 135, no. 10 (June 1 2010b): 18.

Hane, Paula J. “Free Content Options Continue to Shake Things Up.” Information Today 24, no. 11 (2007): 7-12. (PAYWALL)

Hardy, Quentin. “The Big Deal: Brewster Kahle.” Forbes, November 27, 2009.

“Have a Hand in Scan-on-Demand.” Library Journal 133, no. 20 (December 15 2008): 27.

Hogge, Becky. “Brewster Kahle.” New Statesman 134, no. 4762 (2005): 26.

Internet Archive. “Internet Archive Scribe2.” Launchpad.net, 2007. Accessed August 21, 2013.

Internet Archive, and Boston Public Library. “The John Adams Library Collection – Cooperative Agreement.” April 13, 2007.

“The Internet Archive Receives Grant from Alfred P. Sloan Foundation to Digitize and Provide Open Online Access to Historical Collections from Five Major Libraries.” December 20, 2006.

Johnson, Richard K. “In Google’s Broad Wake: Taking Responsibility for Shaping the Global Digital Library.” ARL Bimonthly Report 250 (2007): 1-15.

Kahle, Brewster. Interviewed by Elisabeth A. Jones. September 7, 2011. In person, at the Internet Archive, San Francisco, CA.

———. “Achievements for Humanity.” Opencontentalliance.org (blog). July 5, 2009.

———. “Announcing the Open Content Alliance.” Yahoo! Search Blog (blog). October 2, 2005a.

———. “Books Scanning to Be Publicly Funded.”  2008.

———. “Bookscanning Launch and Vision of an Open Library.”  2005b.

———. “Internet Archive Officially a Library.” Internet Archive Blogs (blog). June 25, 2007a.

———. Libraries Going Open. San Francisco: Internet Archive, 2007b.

———. “Over 1 Million Digital Books Now Available Free to the Print-Disabled.” Opencontentalliance.org (blog). May 6, 2010.

Kelley, Michael. “Newsdesk.” Library Journal 136, no. 20 (December 1 2011): 14.

Kniffel, Leonard. “Backed by Internet Archive, Entrepreneur Takes on OCLC.” American Libraries  (April 2008).

Kupferschmid, Keith. “Are Authors and Publishers Getting Scroogled?” Information Today 22, no. 11 (2005): 1-5.

McCoy, Adrian. “The Internet Gives Birth to an ‘Official’ Online Library.” Pittsburgh Post-Gazette, June 22, 2007.

“Microsoft Scans British Library.” BBC News, November 4, 2005.

“Milestone Achieved.” Opencontentalliance.org (blog). December 20, 2006.

Miller, Robert. “Saveusjobspasssenatebills4213.” Internet Archive, 2010.

“MSN Search Announces MSN Book Search.” Advanced Technology Libraries 34, no. 12 (2005): 6.

Nadella, Satya. “Book Search Winding Down.” Bing Community (blog). May 23, 2008.

Notess, Greg R. “Search Engine Update.” Online 30, no. 1 (January/February 2006): 15. (PAYWALL)

———. “Search Engine Update.” Online 31, no. 2 (March/April 2007): 14. (PAYWALL)

O’Leary, Mick. “Open Content Alliance Embodies Open Source Movement.” Information Today 26, no. 1 (January 2009): 37-43. (PAYWALL…ironically)

Oder, Norman, Lynn Blumenstein, and Josh Hadro. “Newsdesk.” Library Journal 134, no. 20 (December 15 2009): 14-17.

P1. Interviewed by Elisabeth A. Jones. September 6, 2011. In person.

P2. Interviewed by Elisabeth A. Jones. September 14, 2011. In person.

P3. Interviewed by Elisabeth A. Jones. September 14, 2011. In person.

P5. Interviewed by Elisabeth A. Jones. September 9, 2011. In person.

P11. Interviewed by Elisabeth A. Jones. October 27, 2011. Skype.

“People.” Technicalities 28, no. 6 (2008): 23.

Rapp, David. “Infotech.” Library Journal 135, no. 13 (August 1 2010): 16.

———. “Infotech.” Library Journal 136, no. 6 (April 1 2011a): 16.

———. “Infotech.” Library Journal 136, no. 10 (June 1 2011b): 20.

———. “Infotech.” Library Journal 136, no. 12 (July 1 2011c): 18.

“RLG Joins Open Content Alliance.” College & Research Libraries News 66, no. 11 (December 2005): 770.

“Sloan Foundation Grant Awarded.” Opencontentalliance.org (blog). December 19, 2006.

St. Clair, Gloriana. “The Million Book Project in Relation to Google.” Journal of Library Administration 47, no. 1/2 (2008): 151-63.

“Staff Bios.” Internet Archive.

Suber, Peter. “The Open Content Alliance.” SPARC Open Access Newsletter, November 2, 2005.

Swartz, Aaron. “Announcing the Open Library.” Raw Thought (blog). July 16, 2007.

“Technology.” Alexa Internet.

Tong, Judy. “Responsible Party – Brewster Kahle; a Library of the Web, on the Web.” New York Times, September 8, 2002.

“The Triangle Research Libraries Network (TRLN) Member Libraries Join Open Content Alliance.” D-Lib Magazine 14, no. 3/4 (2008).

“TRLN Libraries Join Open Content Alliance.” Advanced Technology Libraries (2008). Academic OneFile.

“TRLN Member Libraries Join Open Content Alliance.” Library Hi Tech News 25, no. 4 (2008): 21.

“U of Illinois Joins Open Content Alliance.” Advanced Technology Libraries 36, no. 4 (2007).

“UNC-Chapel Hill Library and Library School Join Open Content Alliance.” College & Research Libraries News 67, no. 3 (March 2006): 140.

New Google Books Library Project Timeline: Now With (more) Citations!

It seemed like this inspired some interest even in sketchy form, so I thought I should flesh it out a bit, and make it more useful. I’ve added citations to the info provided below. Where the source is my own dissertation interviews (with librarians and Googlers involved in the project), I’ve used the notation (INT). I still consider this a living document, and so reserve the right to add to it as I continue to write things up more formally.

1996 Backrub, precursor to PageRank, developed based on principles/ideas from scholarly publishing/citation linking (widely known; e.g. Battelle 2005, Levy 2011, also the official Google Books History page)
1998 Google search engine launches
2001 First talks between Larry Page and librarians at the University of Michigan (INT)
2002 Talks underway in earnest at UM and Stanford; Larry Page still personally involved; Monthly calls btw Google & UM (INT)
2003 Oxford, NYPL, Harvard brought in on discussions (INT)
2003 (December) Google Print publisher project first publicly discussed
2004 (Spring) Contracts signed with G5 libraries (e.g., Michigan’s; the others are not public, but logic would suggest that they must have been signed around the same time)
2004 Pilot scanning at U. of Michigan; role of library partnerships manager(s) created (INT)
2004 (August 19) Google IPO (widely publicized; the founders’ IPO letter is one source)
2004 (December 14) Google Print Library Project announced (Harvard internally 1 day before everyone else)
2005 Scanning begins at G5 libraries (Stanford = March); Google starts building out additional scanning centers (INT)
2005 Dan Clancy hired to manage project (INT; also on his public LinkedIn profile)
2005 (September-October) Lawsuits filed by AG, AAP (about a million possible refs – most comprehensive for all things lawsuit-related is The Public Index)
2005 (November) Google changes name of project to Google Book Search
2006 Google hits 1 million volumes; rate of scanning increases; Doug Kuch hired to run logistics for GBLP (all INT). Three U.S. scanning centers in operation (to my knowledge): Mountain View (CA), Ann Arbor (MI), Cambridge (MA) (INT).
2006 Partners added: U of California (August), Universidad Complutense de Madrid (September), U of Wisconsin (October), U of Virginia (November)
2007 Partners added: U of Texas (January), Biblioteca de Catalunya (January), Princeton (February), Bavarian State Library (March), Lausanne (May), Ghent (May), Mysore (May), CIC (June), Keio (July), Cornell (August), Columbia (December)
2007 NYPL begins to offer access to its scans via its catalog
2008 Google finishes scanning materials from Harvard and Oxford (INT) – I suspect NYPL also finished around this time, but have no direct substantiation for that.
2008 (February) University of Michigan reaches 1 million books scanned
2008 (October) Settlement Agreement first proposed (info at The Public Index); HathiTrust Launched
2008 (November) Google reaches 7 million volumes (6 million from libraries)
2009 (November) Revised Settlement Agreement put forward; some library contracts revised in its wake (in anticipation of its approval) (info at The Public Index; also amended contracts for Michigan, Wisconsin, and Texas)
2009 (December) French court loss; no more scanning in-copyright books in France
2010 Partners added: Italian Ministry of Culture (March), Austrian National Library (June), Dutch National Library (July)
2010 (June) Google reaches 12 million volumes
2011 Partners added: Czech National Library (February), British Library (June)
2011 (March) Settlement Agreement rejected by court
2011 (October) Google shuts down Mountain View scanning center, leaving Ann Arbor as the sole remaining scanning center in the United States; rate of scanning decreased (INT)
2012 (January) HathiTrust reaches 10 million volumes
2012 (March) Google reaches 20 million volumes
2013 (April) Google reaches 30 million volumes [Update 7/15/13: This is not a figure Google stands behind; their public number is still 20 million (so says a source at Google)]

Google Books Library Project Timeline: What am I missing?

UPDATE: This version of the timeline has been superseded. New version here: New Google Books Library Project Timeline: Now With (more) Citations! (Thanks for all the input!)

I’ve been working on finishing up the pre-writing for my last two dissertation case chapters, and yesterday I put together this timeline of the Google Books Library Project, based on a mix of data from my interviews and from other primary and secondary sources. I thought I’d throw it up here to see if anybody out there on the interwebs might see anything that’s glaringly missing and should be there. (Plus, I figured there was a chance that others might find it useful, even in its current sketchy form.)

There are a few things in particular which, looking at what I already have, I’d love to know/put dates to, if anyone knows them offhand (otherwise I’m sure I can dig most of them up somewhere…maybe even in my own EndNote library…):

  • When did the first library-scanned book go live on the Google site? (I’m guessing sometime in 2005, but a month would be cool.)
  • When did each library start offering access to their Google scans through their OPAC, if they have done so? (I have a date for NYPL, but not the others)
  • Where outside the U.S. did/does Google have scanning centers? When did each close, if it has?
  • Where was the Google scanning center on the East coast? I assume there was one nearer to Harvard & NYPL than Ann Arbor, but perhaps that’s not the case?
  • The 30 million volumes figure at the bottom comes from a recent NYRB article by Darnton, and it’s uncited there – does anyone know a more official source for it? (I believe it, it’d just be nice to have something more solid.)

So here goes:

1996 Backrub, precursor to PageRank, developed based on principles/ideas from scholarly publishing/citation linking
1998 Google search engine launches
2001 First talks between Larry Page and librarians at the University of Michigan
2002 Talks underway in earnest at UM and Stanford; Larry Page still personally involved; Monthly calls btw Google & UM
2003 Oxford, NYPL, Harvard brought in on discussions
2003 (December) Google Print publisher project first publicly discussed
2004 (Spring) Contracts signed with G5 libraries
2004 Pilot scanning at U. of Michigan; role of library partnerships manager(s) created
2004 (August 19) Google IPO
2004 (December 14) Google Print Library Project announced (Harvard internally 1 day before everyone else)
2005 Scanning begins at G5 libraries (Stanford = March); Google starts building out additional scanning centers
2005? Dan Clancy hired to manage project
2005 (September-October) Lawsuits filed by AG, AAP
2005 (November) Google changes name of project to Google Book Search
2006 Google hits 1 million volumes; rate of scanning increases; Doug Kuch hired to run logistics for GBLP
2006 Partners added: U of California (August), Universidad Complutense de Madrid (September), U of Wisconsin (October), U of Virginia (November)
2007 Partners added: U of Texas (January), Princeton (February), Bavarian State Library (March), Lausanne (May), Ghent (May), Mysore (May), CIC (June), Keio (July), Cornell (August), Columbia (December)
2007 NYPL begins to offer access to its scans via its catalog
2008 Google finishes scanning materials from Harvard and Oxford
2008 (February) University of Michigan reaches 1 million books scanned
2008 (October) Settlement Agreement first proposed; HathiTrust Launched
2008 (November) Google reaches 7 million volumes (6 million from libraries)
2009 (July) Partner added: Biblioteca de Catalunya
2009 (November) Revised Settlement Agreement put forward; some library contracts revised in its wake (in anticipation of its approval)
2009 (December) French court loss; no more scanning in-copyright books in France
2010 Partners added: Italian Ministry of Culture (March), Austrian National Library (June), Dutch National Library (July)
2010 (June) Google reaches 12 million volumes
2011 Partners added: Czech National Library (February), British Library (June)
2011 (March) Settlement Agreement rejected by court
2011 (October) Google shuts down Mountain View scanning center, leaving Ann Arbor as the sole remaining scanning center in the United States; rate of scanning decreased
2012 (January) HathiTrust reaches 10 million volumes
2012 (March) Google reaches 20 million volumes
2013 (April) Google reaches 30 million volumes

*Updates since first posting are noted in pink*