A bit of entertainment

A few weeks ago, my school put out a promo video for our undergrad Informatics program, for which they interviewed a student who happened to be TA’ing in the course I was teaching at the time. The film crew came to class to film her a little more, and ended up taking some pretty entertaining footage of me teaching, two bits of which made the final edit.

Note: I half wish I could say that these clips aren’t reflective…but they probably are. I’m…um…animated. 🙂

Enjoy!

Monograph Costs and Urban Legends–What’s Wrong with This Picture?

[Re-blog of my guest post on The Scholarly Kitchen, which ran today. This version has been lightly edited to make it fit the context of this blog.]

A recent post by Kent Anderson on The Scholarly Kitchen described the ease with which a bit of misinformation can blow up into an “academic urban legend” through the magic of citation networks. While Anderson’s tale centered on the iron content of spinach, the phenomenon of misinformation creeping into the conventional wisdom is hardly limited to the field of nutritional science. In fact, those of us interested in the future of publishing and libraries fall into these patterns ourselves. As in other domains, some of the data we rely upon as well-established and unquestionable sometimes turns out to be quite questionable indeed. This post will describe – and begin to question – one such touchstone within our field: the Association of Research Libraries’ (ARL’s) annual graph, “Monograph and Serials Costs in ARL Libraries.” The most recent version of the graph appears below:

[Figure: ARL graph, “Monograph and Serials Costs in ARL Libraries”]

What’s wrong with this picture?

Every year for the past three decades, ARL has released a set of statistics about its member institutions – everything from the number of graduate students at their home universities to the average prices of the monographs they purchase. Since its addition in 1989, one of the most frequently-cited pieces of this annual report has been that year’s version of the graph pictured above. It is especially common to see elements of this graph, or even the graph itself, referenced in discussions of the economic issues surrounding electronic serials subscriptions, analyses of the perils facing library collection budgets, and arguments in favor of open access publishing – indeed, it has been used that way previously on the Scholarly Kitchen itself – and a quick search for the exact title of the graph (sans date range) in Google Scholar pulls up nearly 100 hits. So well-established is this image, in fact, that one article on library collection development trends [paywall] goes so far as to assert that it “is quite possibly the best-known contemporary symbol of today’s library in today’s marketplace.” At present, ARL is undergoing a significant reorientation of its data tracking practices in this area, so it is unclear whether further revisions of this graph will be produced. However, it is quite unlikely that the many versions already out there will cease to be used in analyses of the library and publishing world – although, as I will suggest, perhaps they should be. (Stay tuned.)

Recent versions of the graph have reported on four trends relevant to materials purchasing in major research libraries (the members of ARL):
1. Serial Expenditures,
2. Monograph Unit Cost,
3. Monograph Expenditures, and
4. Monographs Purchased.

The data here come from surveys of ARL member libraries, conducted annually by ARL itself. And on their face, the trends that the graph portrays seem very reasonable. They tell a familiar story, in line with the conventional wisdom: serials expenditures are skyrocketing, while all the figures for monographs meander along well below. And this familiar story, backed up repeatedly by this exact graph and its direct predecessors, has been used to make countless policy arguments at levels ranging from the departmental to the international.

There is just one problem with this graph – and by extension, our community’s (over)reliance upon it: the trends it portrays cannot logically coexist.

Setting aside the Serials trend line for a moment, let’s take a closer look at the three lines for monographs, and particularly, for the sake of illustration, the trends these lines portray in the period from 2008 to 2011. In that period, we can observe the following:

– Monograph Unit Costs are Rising
– Monographs Purchased are Rising
– Monograph Expenditures are…Falling?

This is plainly impossible. Say the average cost of a book rises from $50 to $60 over the same period in which the average number of monographs libraries purchase rises from 20,000 to 25,000. Basic arithmetic tells us that monograph expenditures must also rise, from $1 million to $1.5 million ($50 * 20,000 = $1,000,000; $60 * 25,000 = $1,500,000). There is – or ought to be – a precise arithmetic relationship among these values. And yet the canonical ARL graph shows expenditures falling while prices and purchasing both rise.
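For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses the illustrative numbers from the paragraph above, not ARL data, and the function name is my own invention.

```python
def total_expenditure(unit_cost, units_purchased):
    """Total spend is the price per item times the number of items bought."""
    return unit_cost * units_purchased


# Illustrative figures from the example above (not actual ARL data).
before = total_expenditure(50, 20_000)
after = total_expenditure(60, 25_000)

print(before)  # 1000000
print(after)   # 1500000

# When both positive factors rise, their product has to rise as well.
print(after > before)  # True
```

As long as the same set of purchases sits behind all three numbers, there is no way for both inputs to go up while the product goes down.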

How can this be?

The answer to this question reveals a deep and potentially fatal weakness in the methodology behind the graph. The problem is that all of the trend lines it portrays are derived from different samples that are of different sizes. For the most recent version of the graph, the Monograph Unit Cost figure represents the median of 57 libraries’ data, the Monograph Expenditures figure represents the median of 97 libraries’ data, and the Monographs Purchased figure represents the median of 58 libraries’ data (as shown in the Excel file ARL provides alongside the graph).

This is not good statistical practice; indeed, it renders the trends in the graph completely non-comparable.

Yet, it is easy to see how this might have happened. One suspects that the story runs something like this: All of ARL’s survey data for each year is initially aggregated by variable and analyzed in isolation. At some point, someone has the idea to extend this analysis of each isolated variable longitudinally, comparing the variables’ values to past values of those same variables. But to make sure that the data is comparable year-over-year, they limit the sample to the libraries with complete data for the period, within each variable. So far, so good.

However, different libraries have answered – or have abstained from answering – different questions. Thus, when the variables get aggregated into a cross-variable comparative graph, the samples ought to have been re-adjusted, to ensure that the sample data was not only complete and comparable along one dimension (each variable over time), but along all of those relevant to analysis – including, and especially, comparability of the population of libraries for each variable. But this does not seem to have been done.
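To make the sampling issue concrete, here is a rough sketch in Python. The libraries, figures, and function names are all hypothetical; I have no knowledge of how ARL actually processes its survey data. It contrasts medians taken over whichever libraries answered each question with medians taken over only those libraries that answered every question.

```python
from statistics import median

# Hypothetical survey responses; None marks a question a library skipped.
responses = {
    "Library A": {"unit_cost": 50, "purchased": 20_000, "expenditure": 1_000_000},
    "Library B": {"unit_cost": None, "purchased": None, "expenditure": 3_000_000},
    "Library C": {"unit_cost": 60, "purchased": 30_000, "expenditure": 1_800_000},
    "Library D": {"unit_cost": 55, "purchased": None, "expenditure": 500_000},
}
VARIABLES = ("unit_cost", "purchased", "expenditure")


def sample_sizes(data):
    """Count how many libraries answered each question."""
    return {v: sum(1 for r in data.values() if r[v] is not None) for v in VARIABLES}


def per_variable_medians(data):
    """Median of each variable over whichever libraries answered it."""
    return {
        v: median(r[v] for r in data.values() if r[v] is not None)
        for v in VARIABLES
    }


def common_sample(data):
    """Keep only the libraries that answered every question."""
    return {
        name: r
        for name, r in data.items()
        if all(r[v] is not None for v in VARIABLES)
    }


print(sample_sizes(responses))  # each variable rests on a different sample size
shared = common_sample(responses)
print(sorted(shared))           # the libraries all three lines can share
print(per_variable_medians(shared))
```

Aligning the samples this way does not force the three medians to multiply out exactly, but it does ensure that all three trend lines describe the same population of libraries.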

Additionally, it will come as no surprise to readers of the Scholarly Kitchen that the way in which libraries define the term “monograph” has been evolving. Where in past decades it could be more or less depended upon to refer exclusively to bound paper books, in recent years more and more libraries have been reporting ebook statistics to ARL under this same heading. Yet, this evolution has not occurred uniformly across ARL members, and that has caused further problems with the organization’s statistics. Indeed, as ARL’s Martha Kyrillidou recently noted to me via email, ARL no longer tracks “monographs,” per se, but asks its members about “one time purchases” instead.

A bit more digging into the ARL archives, moreover, reveals a further wrinkle: the logically incompatible trends portrayed in the graph have only appeared in iterations published since 1999; previous iterations appear at least superficially logical in their trends. And indeed, on page 5 of the 1993-1994 edition, explicit mention is made of the sampling issue, and how it has been dealt with: “The graphs are based on time series that start in 1986, and they depict only those libraries that have had no missing data in the respective variables since 1986. Although these graphs are based on less than the full population of 108 academic libraries, additional analysis has been carried out to ensure that the time series trends represent population trends.” Based on the inconsistencies enumerated above, however, it would appear that these procedures may no longer be followed.

ARL is one of very few organizations that collect this sort of broad-scale longitudinal data on libraries’ behavior and environment – and taken in their full context and with the requisite grains of salt, these data are exceedingly valuable resources for thinking about the present and future of both academic libraries and scholarly publishing. Yet, in the case of ARL’s “Monograph and Serial Costs” graph, this necessary context seems to have fallen away: the evolving image has become increasingly entrenched as unquestioned truth, even as the trends it portrays have diverged further and further from what is logically possible. The effort to ground arguments for open access, for changes to library practice, and for new forms of scholarly publishing in empirical data is undoubtedly positive. However, as we go about assembling data for these purposes, we must continue to look upon it with a critical eye – no matter how stable and objective the source may appear.

My Dissertation: a Preview

Earlier this week, at long last, I submitted my dissertation, Constructing the Universal Library, to my reading committee. I also passed it along to all of those interviewed for it (at least those for whom I could locate a current email address), just as a final check that the document accurately reflects their views and experiences.

The defense is scheduled for Monday, May 12, 2014, at 9 AM, in the UW Libraries’ Allen Auditorium, and it’s open to the public – so if you’re in Seattle, please feel free to come.

In the meantime, I thought I’d post the abstract and table of contents, just to give a preview of what this behemoth* actually says.**

So, without further ado…

What we talk about when we talk about the Google Books fair use decision

[Cross-posted at the Library Juice Blog]

[First, disclosure: I am currently affiliated with the University of Michigan Libraries, and was also so affiliated when the Google Books lawsuits were filed in 2005. I also worked in Media Relations for the UM side of the project in 2006-07. And, of course, I’ve spent the last several years working on a dissertation in which the Google Books Library Project is perhaps the central case (it’s certainly the longest chapter). These experiences have undoubtedly shaped the views that follow. And now, disclaimer: these are my own opinions, and do not reflect the views of any of my employers, past or present. Also, I am not a lawyer, and nothing here should be construed as legal advice.]

Last Thursday, when Judge Chin handed down his decision granting Google’s motion for summary judgment in the Authors Guild’s 8-year-old* copyright lawsuit against it, I shared the elation of many in the library, tech, and research communities who, like me, have been following the case since the beginning.
Like them, I truly believe that the ruling is a victory for libraries, for innovation, and for research. It supports and confirms Judge Baer’s earlier decision in the AG’s case against HathiTrust, and in so doing provides strong reassurance that future digitization projects – whether executed by libraries or by other private or public entities – should be able to proceed with some confidence that as long as certain boundaries are respected, such digitization will be found fair and legal.

Reading the early celebratory analyses, I initially felt I had little to say – others had summed it up so well.

However, this morning I read the chain of emails re-posted to the Library Juice Blog from the Progressive Librarians Guild discussion list and Social Responsibilities Round Table discussion list, and it made me feel like I might have something to say after all – and when Library Juice’s founder, Rory Litwin, approached me directly to see if I had any thoughts, that sealed it. And here we are.

In that chain of emails, several progressive-leaning librarians expressed a great deal of skepticism regarding the idea that the Google Books fair use decision was actually “a victory for libraries,” on a number of grounds. Most of these rationales rested on a fundamental distrust of Google as a corporation, and of its motives for getting involved in scanning books.

OK, fine. No need to trust Google. No need to like or respect their motives.

But here’s the thing: however you might feel about Google or its motives, those feelings are irrelevant to thinking about the implications of Judge Chin’s decision for libraries.

Yes, Google undoubtedly plans to make money off these scans – though as the opinion notes, not by selling the scans in question, and also not by selling advertising around them.** But does that inherently make them evil from a library perspective? Don’t libraries do business with a lot of other corporations who do much worse things to information access than Google? (I’m looking at you, Elsevier…Wiley…Springer…) And what’s more, don’t libraries pay these corporations millions of dollars per year to provide their services? Google’s library partners never paid Google a red cent for scanning their books (which is not to say it was cost-free – only that Google didn’t charge libraries for its scanning service). So why is one acceptable, and the other not?***

Of course, there are many more substantial critiques that can be made of the Google Books Library Project from a library perspective. Among the most compelling, in my view, are the privacy implications for readers using the service (which are terrifying, if you think about it) and the frankly crappy metadata, which can’t help but impede any kind of research executed using the corpus (but especially the kind of “big data” work that is so in vogue these days). These critiques also appeared in the re-posted email thread.

But these critiques, as important as they are, are no more relevant to thinking about whether or not Judge Chin’s decision was a victory for libraries than the more subjective distaste for Google described above. They don’t matter either. Not here.

Judge Chin’s decision is beneficial for libraries not because it benefits Google (though of course it does) but because of the way the law works – that is, based on precedent. This decision sets the precedent that scanning books for the purpose of indexing – even books in copyright, and even without the copyright-holder’s permission – is fair use, so long as access to the actual digital versions of those in-copyright books is limited in particular ways. Judge Baer’s decision set a very similar precedent. And those precedents are immensely valuable to libraries who wish to go forward with digitizing and broadening access to their collections, whether they choose to do so in partnership with a corporation like Google, with a nonprofit like the Internet Archive, with a collection of their institutional peers, or with nobody but their own staff.

The nature of legal precedent is such that you don’t have to like the party that wins, and you don’t have to like what it’s doing, in order for that precedent to benefit you. Heck, I seem to recall that at least half of the cases we read in Intellectual Property & Information Law centered on pornographers, hate groups, and other unsympathetic protagonists – and those sketchy characters often won, but that didn’t mean the decisions set bad precedents from the perspective of library values and ethics. Often just the reverse.

Moreover, Judge Chin’s decision also benefits some library projects more directly – especially HathiTrust. Since HathiTrust is mostly composed of Google scans, it would have suffered a significant blow if the Authors Guild had gotten its way, since it would probably have had to stop using all the scans of in-copyright works that Google had made, both for search and retrieval and, one suspects, for providing access to the print-disabled (though, I am not a lawyer – if Chin’s ruling had conflicted with Baer’s here, I’m not sure exactly what would have happened). Judge Chin’s ruling undoubtedly has the folks involved with both HathiTrust itself and the HathiTrust Research Center breathing a massive sigh of relief.

So yes, I’m sticking with my view, and the ALA’s view, and the view of many others, that Judge Chin’s decision was a massive victory for libraries. Because though the case was about Google, the decision is about more than that. It’s about the rights of information users – whether corporate, public, or individual – to make use of copyrighted works in transformative ways that do not imperil the economic well-being of the copyright holders, in a world where copyright terms last far longer than they truly should. For libraries, it’s about lowering the level of tension surrounding the legal risks of digitization, and of making secondary uses of externally-digitized works. It’s about the public good. Google may be a massive part of the information ecosystem, and its influence may be deeply questionable in many ways – but in the context of this decision, Google is only a tiny piece of what matters.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*Seriously, if this lawsuit was a person, it would have just started the second grade.

**Most likely, as one of my dissertation research participants speculated, the benefit of Google Books to the Google bottom line will be indirect, through increased eyeballs and increased data-banks that help to improve the algorithm and sell advertising in other parts of the Google megasphere.

***Also worth noting: Google does not have, and has never had, a monopoly on these scans. Heck, that’s part of what the Authors Guild was suing over – that Google was providing the scans to libraries, with few limitations on what those libraries could do with them. And hey look! The libraries almost immediately started pooling their scans (along with other scans they’d created under other projects), and made HathiTrust! It’s almost as though the Google scans are accessible through a provider within the library world, which might be more beholden to library ethics and metadata standards! Well go figure!

Draft Open Content Alliance Timeline

Some of the comments I received on the Google timeline were very helpful to me, so I decided to post the timeline for the other digitization case in my dissertation, the Open Content Alliance (OCA). In many ways, the OCA has been harder to unravel into a linear narrative, partially because the documentation on it is much sparser than that on Google Books. Still, between the press coverage and a set of interviews I conducted with the project’s early leaders (marked INT for confidential interviews; Brewster Kahle opted to make his interview open), I think I’ve made a fair start. (And in fact, because I included every detail I came across on this one, it’s actually substantially longer than the Google timeline turned out to be.)

Please do let me know if I am missing things or have particular facts wrong.

Here is what I have (citations follow below):

1980-mid-1990s Brewster Kahle begins to think about building a universal digital library, but recognizes several gaps in the existing technology, and sets about trying to fill them, working on increasing the capacity of digital storage, building out digital networks, and working out online revenue streams for publishers (Interview with Kahle, 2011)
1995 Kahle sells WAIS, an early publishing and distributed search system, to AOL for $15 million (“Staff Bios”, Hogge 2005)
1996 Kahle founds web analytics company Alexa Internet, named for the Library of Alexandria, in collaboration with Bruce Gilliat. As part of its functionality, Alexa crawls and archives webpages (“Staff Bios”, Tong 2002).  Simultaneously, Kahle founds the Internet Archive, which uses Alexa’s archived web-crawls as the foundation for its archive of internet content, made accessible via the Wayback Machine (Hogge 2005).
1999 Kahle sells Alexa Internet to Amazon for $250 million; the Internet Archive retains the right to receive web crawls from Alexa (“Technology”, Hogge 2005, Hardy 2009).
2000 The Million Book Project, an NSF-funded large-scale digitization project geared toward addressing a particular set of technological research questions, is initiated by a group of computer scientists at Carnegie Mellon, in collaboration with universities in China and India, the Biblioteca Alexandrina in Egypt, and other partners (St. Clair 2008, 152-53).
2004 The Internet Archive signs on to assist the Million Book Project (by then also referred to as the Universal Library Project) with permanent archiving, quality control, and materials acquisition (“Frequently Asked Questions” 2007). Following on their experiences with Million Book, IA begins to develop its own book scanning technology (both hardware and software), which would eventually be called the Scribe (Interview with Kahle, 2011).
2004 (Autumn) The Internet Archive and the University of Toronto begin a pilot project to test the IA’s scanning process. Within the next year, they scan about 2,000 books (Carlson and Young 2005, Bengtson 2006)
2004 (December 14) Google Print Library Project announced (Google 2004).
2005 (Early) Conversations begin within the Internet Archive and its social sphere about starting up a more open book scanning project, as an alternative to Google’s; initial thought was to fund the project via contributions from libraries (INT).
2005 (Early/Mid) Sumir Meghani of Yahoo! approaches the Internet Archive, proposes the concept of the Open Content Alliance, a collaborative project to scan works out of copyright or otherwise openly available for scanning. Yahoo! puts up $150,000 in funding to start, targeting a collection of works in American Studies at the University of California (INT, Albanese 2005, Johnson 2007, 5).
2005 (October 2) Open Content Alliance announced by Brewster Kahle on the Yahoo! Search Blog (2005a). Founding partners include: The Internet Archive (technology/content/hosting/administration), Yahoo! (technology/indexing/financial), Adobe Systems (technology), The European Archive (content), HP Labs (technology), The UK National Archives (content), O’Reilly Media (content), Prelinger Archives (content), University of California (content), University of Toronto (content) (“Consortium Forms OCA to Bring Additional Content Online” 2005)
2005 (By October 14) The project is endorsed by the Association of Learned and Professional Society Publishers (ALPSP) (Carlson and Young 2005).
2005 (Mid-October) Partners added: Lulu (a print on demand service), LibriVox (a producer of audio editions), the Biodiversity Heritage Library, the Smithsonian Institution Libraries, and eleven university libraries (McMaster, Memorial University of Newfoundland, the University of Ottawa, University of British Columbia, York University, Columbia University, Emory University, Johns Hopkins University, the University of Virginia, Rice University and the University of Pittsburgh) (Auchard 2005, Crawford 2005, Suber 2005).
2005 (October 25) Microsoft joins the OCA, pledging funding and tech/scanning assistance (Auchard 2005); simultaneously announces MSN Book Search (“MSN Search Announces MSN Book Search” 2005, Albanese 2005, Crawford 2005).
2005 (October 27) The Research Libraries Group (RLG) signs on to provide the OCA with bibliographic information from its union catalog (“RLG Joins Open Content Alliance” 2005, Crawford 2005).
2005 (By November 2) The project is endorsed by the Association of American Publishers (AAP) and the American Association of University Presses (AAUP) (Suber 2005).
2005 (November 4) British Library signs on to scan 100,000 books with Microsoft (“Microsoft Scans British Library” 2005, Crawford 2005, Kupferschmid 2005).
2005 (November 8) Official launch of OCA book scanning; Microsoft pledges to fund the digitization of 150,000 books by the end of 2006 (Kahle 2005b).
2005 (November 9) WSJ reports that the IA/Toronto pilot project has scanned 2800 books, at a cost of $108,250, over the course of the past year (Crawford 2005).
2005 (December 29) A group of 27 Canadian research libraries announces that they are jointly establishing a project called Alouette Canada, a digitization alliance intended to work collaboratively with the OCA (Crawford 2006).
2006 (March) Partner added: University of North Carolina-Chapel Hill (Library and SILS; providing content and expertise, respectively) (“UNC-Chapel Hill Library and Library School Join Open Content Alliance” 2006, Albanese 2006b).
2006 (September) The University of California joins Google Book Search; Kahle sees this as a betrayal of the OCA’s principles on UC’s part, going so far as to claim that UC is “effectively giving their library to a single corporation,” despite its then-ongoing participation in the OCA (INT, Albanese 2006a).
2006 (December 6) Microsoft Live Search Books releases its first beta version (at http://books.live.com – now defunct) (Guren 2006).
2006 (December 19) IA receives $1 million from the Sloan Foundation to scan specific collections from the Boston Public Library, The Getty Research Institute, The Metropolitan Museum of Art, UC-Berkeley’s Bancroft Library, and Johns Hopkins (“Sloan Foundation Grant Awarded” 2006,  “The Internet Archive Receives Grant” 2006, Internet Archive and Boston Public Library 2007).
2006 (December 20) IA announces that it has digitized and made available 100,000 books, largely from members of the Open Content Alliance (“Milestone Achieved”).
2007 (March) The OCA reaches 130,000 volumes scanned, all available via the IA Text Archive (Notess 2007).
2007 (April) Partner added: University of Illinois (“U of Illinois Joins Open Content Alliance” 2007).
2007 (June 25) The Internet Archive successfully petitions to be declared a library by the State of California, in order to gain eligibility for state-administered federal grants (Kahle 2007a, McCoy 2007).
2007 (July 16) At Kahle’s request, open access advocate and entrepreneur Aaron Swartz signs on to help build the architecture for Open Library. His goal in the design, as he puts it, is to create “a website with a page for every book, collecting everything we can find out about it from libraries, publishers, reviewers, and of course, book lovers” (Swartz 2007, Kniffel 2008, quoting Swartz).
2007 (October) The OCA reaches 200,000 volumes scanned. Eight scanning centers are in operation, in three countries: the US, Canada, and the UK (Goth 2007, Kahle 2007b, Ashmore and Grogg 2008). IA releases a rewritten version of its scanning software, Scribe2, which promises greater format flexibility, less bandwidth usage, and support for new cameras (Internet Archive 2007, Kahle 2007b).
2007 (November) Partner added: Boston Library Consortium (19 member libraries, all contributing public domain materials only, self-funded). The BLC publicly announces that it was approached by Google first, but rejected them in favor of OCA (“Boston Library Consortium and Open Content Alliance to Provide Digitized Books” 2007, Albanese 2007a, Hane 2007).
2007 (November 15) A set of OCA partners – IA, BPL, MBL-WHOI, and Universidad Francisco Marroquín – announce a plan to scan out-of-print, in-copyright works for distribution via a new form of digital interlibrary loan, which they will develop (Albanese 2007b).
2007 (December) The OCA reaches 250,000 volumes scanned (Hane 2007).
2007 (December 15) Yale University signs on with Microsoft to scan up to 100,000 books outside of the OCA, and on terms more like Google’s than like IA’s (Albanese 2007c).
2008 (January-April) With funding from Microsoft, IA deploys five more US-based scanning centers under the banner of the OCA (INT, Kahle 2008).
2008 (February) The Boston Public Library begins to offer scan-on-demand interlibrary loan (ILL) services for public domain works using its onsite Scribe workstations. This reduces the turnaround time from weeks to days, and makes it possible to fulfill ILL requests that would otherwise have been denied due to the condition and/or rarity of the item requested (Colford 2008).
2008 (February 19) Partner added: Triangle Research Libraries Network, a consortium composed of the research libraries at Duke University, North Carolina Central University, North Carolina State University, and the University of North Carolina at Chapel Hill (the last of which was already an OCA partner) (“Triangle Research Libraries Network” 2008,  “TRLN Libraries” 2008,  “TRLN Member Libraries” 2008).
2008 (May 23) Microsoft announces it is bowing out of book scanning, having spent $10 million on its efforts. The company’s departure leaves a major gap in OCA funding (INT, Albanese 2008b, Guess 2008, Kahle 2008, Nadella 2008).
2008 (August) Maura Marx, then head of the Boston Public Library’s Digital Content Program, is hired to be the first Executive Director of the OCA. However, she never actually assumes the role, but instead founds a separate initiative, Open Knowledge Commons, with Sloan funding (INT, “People” 2008, Berry 2009).
2008 (November) HathiTrust launched, incorporating scanned content from OCA as well as Google Books and other digitization projects (Albanese 2008a).
2008 (December) Open Library and Boston Public Library jointly begin to offer a scan-on-demand service for public domain works that have been indexed by Open Library, but have not yet been made available in full text (“Have a Hand in Scan-on-Demand” 2008).
2009 (January) The OCA reaches 1 million volumes scanned – including 300,000 donated by Microsoft after the discontinuation of Live Search Books (O’Leary 2009).
2009 (July 5) Last mention of the term “Open Content Alliance” on the organization’s own blog (Kahle 2009). After about this point, the project under that name is effectively defunct, though various pieces of it persist, and the term still pops up occasionally in discussions of book digitization.
2009 (December) IA and the OCA form the Open Book Alliance to oppose the Google Books Settlement Agreement. Yahoo, Microsoft, Amazon, the Special Libraries Association and the New York Library Association soon join (Oder, et al. 2009).
2010 (February) The IA debuts BookServer, “a distributed system for lending and vending on the Internet” at the O’Reilly Media Tools of Change for Publishing Conference. It allows individuals to buy or check out in-copyright but out-of-print materials, dovetailing with Open Library and connecting with libraries and retailers (Hadro 2010a).
2010 (Spring) IA works with the City of San Francisco to hire over 125 workers for its scanning project, as subsidized labor under the Temporary Assistance for Needy Families (TANF) program (INT, Miller 2010).
2010 (May 6) In the final post on the Open Content Alliance blog, Kahle announces that the IA will be making 1 million books, both in and out of copyright, accessible to the print disabled via Open Library in the open DAISY talking book format (Hadro 2010b, Kahle 2010).
2010 (June) The first 200 or so ebook versions of out-of-print, in-copyright books go live for lending via Open Library. They are readable for two-week periods using Adobe Digital Editions Software (Rapp 2010).
2011 (April) IA announces that 85,000 in-copyright, out-of-print titles, contributed by 150 public and academic libraries, will be made available via Open Library, but only to patrons actually physically located in those 150 libraries (though once patrons download the books, they can use them on their personal devices outside the library for the duration of the loan) (Rapp 2011c).
2011 (June) OCLC researchers develop “oclcBot,” a piece of software that matches up records from Open Library to records from OCLC, checks to see if the Open Library record has an OCLC number (a unique identifier commonly used across library systems), and inserts one if none is present (Rapp 2011b).
2011 (July) Kahle announces the establishment of the Internet Archive’s physical book archive, hoping to obtain “one copy of everything ever published.” The archive is launched with an initial collection of 450,000 items, accumulated as part of IA’s various digitization efforts, and seeks to build its collection through donations and by gathering up items deaccessioned by other libraries (INT, Rapp 2011a).
2011 (October) The state librarians of all 50 U.S. states vote unanimously to enter into a memorandum of understanding with IA, pledging their support for the Open Library’s online lending program (Kelley 2011).

Works Cited

(linked where possible)

Albanese, Andrew. “AAP Sues Google over Scan Plan.” Library Journal 130, no. 19 (November 15 2005): 17-18.

———. “BLC, OCA Join in Digitization Effort.” Library Journal 132, no. 17 (October 15 2007a): 15-16.

———. “Hathitrust Is Launched.” Library Journal 133, no. 18 (November 1 2008a): 13.

———. “Microsoft Gives up Scan Plan.” Library Journal 133, no. 12 (July 15 2008b): 14.

———. “OCA to Scan Orphan Works.” Library Journal 132, no. 19 (November 15 2007b): 16-17.

———. “UC Joins Google’s Scan Plan.” Library Journal 131, no. 14 (September 1 2006a): 14-15.

———. “UNC Library, SILS Join Content Alliance.” Library Journal 131, no. 6 (April 1 2006b): 21-22.

———. “Yale, Microsoft Join in Scan Plan.” Library Journal 132, no. 20 (December 15 2007c): 19.

Ashmore, Beth, and Jill E. Grogg. “The Race to the Shelf Continues – the Open Content Alliance.” Searcher 16, no. 1 (January 2008): 18-23.

Auchard, Eric. “Microsoft Joins Yahoo on Digital Library Alliance.” Yahoo News, October 26, 2005.

Bengtson, Jonathan B. “The Birth of the Universal Library.” Library Journal 131, no. 6 (Spring 2006): 2-7.

Berry, John N., III. “Chicago Hope.” Library Journal 134, no. 10 (June 1 2009): 22.

“Boston Library Consortium and Open Content Alliance to Provide Digitized Books.” College & Research Libraries News 68, no. 10 (2007): 624-25.

Carlson, Scott, and Jeffrey R. Young. “Yahoo Works with Academic Libraries on a New Project to Digitize Books.” Chronicle of Higher Education 52, no. 8 (October 14 2005): A34. (BEHIND PAYWALL)

Colford, Michael R. “Rethinking Resource Sharing: Boston Public Library Provides Scan-on-Demand for Interlibrary Loan.” ASCLA 30, no. 1 (2008): 5-6.

“Consortium Forms OCA to Bring Additional Content Online.” Advanced Technology Libraries 34, no. 11 (2005): 9-10.

Crawford, Walt. “Discovering Books: The OCA/GBS Saga Continues.” Cites & Insights 6, no. 6 (Spring 2006).

———. “OCA and GLP 2: Steps on the Digitization Road.” Cites & Insights 5, no. 14 (2005).

“Frequently Asked Questions About the Million Book Project.” 2007.

Google, Inc. “Google Checks out Library Books.” 2004.

Goth, G. “Digital Libraries Are Taking Form.” IEEE Distributed Systems Online 8, no. 12 (2007): 1-3.

Guess, Andy. “Post-Microsoft, Libraries Mull Digitization.” Inside Higher Ed, May 30, 2008.

Guren, Cliff. “Live Search Books Beta Release.” Bing Community (blog). December 5, 2006.

Hadro, Josh. “Infotech.” Library Journal 135, no. 5 (March 15 2010a): 16.

———. “Infotech.” Library Journal 135, no. 10 (June 1 2010b): 18.

Hane, Paula J. “Free Content Options Continue to Shake Things Up.” Information Today 24, no. 11 (2007): 7-12. (PAYWALL)

Hardy, Quentin. “The Big Deal: Brewster Kahle.” Forbes, November 27, 2009.

“Have a Hand in Scan-on-Demand.” Library Journal 133, no. 20 (December 15 2008): 27.

Hogge, Becky. “Brewster Kahle.” New Statesman 134, no. 4762 (2005): 26.

Internet Archive. “Internet Archive Scribe2.” Launchpad.net, 2007. Accessed August 21, 2013.

Internet Archive, and Boston Public Library. “The John Adams Library Collection – Cooperative Agreement.” April 13, 2007.

“The Internet Archive Receives Grant from Alfred P. Sloan Foundation to Digitize and Provide Open Online Access to Historical Collections from Five Major Libraries.” December 20, 2006.

Johnson, Richard K. “In Google’s Broad Wake: Taking Responsibility for Shaping the Global Digital Library.” ARL Bimonthly Report 250 (2007): 1-15.

Kahle, Brewster. Interviewed by Elisabeth A. Jones. September 7, 2011. In person, at the Internet Archive, San Francisco, CA.

———. “Achievements for Humanity.” Opencontentalliance.org (blog). July 5, 2009.

———. “Announcing the Open Content Alliance.” Yahoo! Search Blog (blog). October 2, 2005a.

———. “Books Scanning to Be Publicly Funded.” 2008.

———. “Bookscanning Launch and Vision of an Open Library.” 2005b.

———. “Internet Archive Officially a Library.” Internet Archive Blogs (blog). June 25, 2007a.

———. Libraries Going Open. San Francisco: Internet Archive, 2007b.

———. “Over 1 Million Digital Books Now Available Free to the Print-Disabled.” Opencontentalliance.org (blog). May 6, 2010.

Kelley, Michael. “Newsdesk.” Library Journal 136, no. 20 (December 1 2011): 14.

Kniffel, Leonard. “Backed by Internet Archive, Entrepreneur Takes on OCLC.” American Libraries (April 2008).

Kupferschmid, Keith. “Are Authors and Publishers Getting Scroogled?” Information Today 22, no. 11 (2005): 1-5.

McCoy, Adrian. “The Internet Gives Birth to an ‘Official’ Online Library.” Pittsburgh Post-Gazette, June 22, 2007.

“Microsoft Scans British Library.” BBC News, November 4, 2005.

“Milestone Achieved.” Opencontentalliance.org (blog). December 20, 2006.

Miller, Robert. “Saveusjobspasssenatebills4213.” Internet Archive, 2010.

“MSN Search Announces MSN Book Search.” Advanced Technology Libraries 34, no. 12 (2005): 6.

Nadella, Satya. “Book Search Winding Down.” Bing Community (blog). May 23, 2008.

Notess, Greg R. “Search Engine Update.” Online 30, no. 1 (January/February 2006): 15. (PAYWALL)

———. “Search Engine Update.” Online 31, no. 2 (March/April 2007): 14. (PAYWALL)

O’Leary, Mick. “Open Content Alliance Embodies Open Source Movement.” Information Today 26, no. 1 (January 2009): 37-43. (PAYWALL…ironically)

Oder, Norman, Lynn Blumenstein, and Josh Hadro. “Newsdesk.” Library Journal 134, no. 20 (December 15 2009): 14-17.

P1. Interviewed by Elisabeth A. Jones. September 6, 2011. In person.

P2. Interviewed by Elisabeth A. Jones. September 14, 2011. In person.

P3. Interviewed by Elisabeth A. Jones. September 14, 2011. In person.

P5. Interviewed by Elisabeth A. Jones. September 9, 2011. In person.

P11. Interviewed by Elisabeth A. Jones. October 27, 2011. Skype.

“People.” Technicalities 28, no. 6 (2008): 23.

Rapp, David. “Infotech.” Library Journal 135, no. 13 (August 1 2010): 16.

———. “Infotech.” Library Journal 136, no. 6 (April 1 2011a): 16.

———. “Infotech.” Library Journal 136, no. 10 (June 1 2011b): 20.

———. “Infotech.” Library Journal 136, no. 12 (July 1 2011c): 18.

“RLG Joins Open Content Alliance.” College & Research Libraries News 66, no. 11 (December 2005): 770.

“Sloan Foundation Grant Awarded.” Opencontentalliance.org (blog). December 19, 2006.

St. Clair, Gloriana. “The Million Book Project in Relation to Google.” Journal of Library Administration 47, no. 1/2 (2008): 151-63.

“Staff Bios.” Internet Archive.

Suber, Peter. “The Open Content Alliance.” SPARC Open Access Newsletter, November 2, 2005.

Swartz, Aaron. “Announcing the Open Library.” Raw Thought (blog). July 16, 2007.

“Technology.” Alexa Internet.

Tong, Judy. “Responsible Party – Brewster Kahle; a Library of the Web, on the Web.” New York Times, September 8, 2002.

“The Triangle Research Libraries Network (TRLN) Member Libraries Join Open Content Alliance.” D-Lib Magazine 14, no. 3/4 (2008).

“TRLN Libraries Join Open Content Alliance.” Advanced Technology Libraries (2008). Academic OneFile.

“TRLN Member Libraries Join Open Content Alliance.” Library Hi Tech News 25, no. 4 (2008): 21.

“U of Illinois Joins Open Content Alliance.” Advanced Technology Libraries 36, no. 4 (2007).

“UNC-Chapel Hill Library and Library School Join Open Content Alliance.” College & Research Libraries News 67, no. 3 (March 2006): 140.

Research Methods and the Library Professional (talk)

This morning I gave a talk at the University of Michigan’s Hatcher Graduate Library about data collection methods in professional library settings, as part of the library’s Emergent Research Conversation Series. It seemed pretty well-received (highest attendance ever for this series!), and I definitely had fun talking about these things with those who attended.

The slides and handout will be posted at the library website shortly, along with [shudder] a video recording of the actual talk. But I figured it couldn’t hurt to also post them here. In addition to the SlideShare version below, here are PDF versions of the slides and the handout that went along with them. The handout is essentially a bibliography, which recommends several general methodology texts and also provides citations to articles using each of the different data collection methods discussed in LIS settings (mostly as recommended in Wildemuth 2009).

Enjoy!