Symposium Notes, somewhat delayed

So, I spent much of last weekend at the UMich Library’s Symposium on Digitization – “Scholarship and Libraries in Transition: A Dialogue about the Impacts of Mass Digitization Projects” – and I had to take notes for my internship. It was truly exhausting – ended up typing 30 pages’ worth of notes, all told – not much room for cognitive thought. I’m still processing. But I thought the notes might be useful to someone, so I’m posting them here. Unfortunately, I can’t get them to look particularly pretty in Blogger; if anyone has ideas on transitioning bullets easily from Word to Blogger, please let me know. In the meantime, have a blast…I’m sure they’ll be fascinating to all comers…heh…


Digitization Symposium
March 10, 2006

Intro: NCLIS rep
• Libraries – Enlighten, entertain, educate
• Past – constrained by costs, geography, technology
• Mass digital conversion of past decade “puts patrons [etc] at the heart of media” – engage in new ways, liberated from the constraints of the analog world
• Can now reach almost anyone anywhere, anytime
• More ways to connect with patrons
• Also challenges: interoperability, content theft
• Solvable through cooperation – libraries and public as well
• Symposium – hoping to highlight the MD projects going on at Michigan, active dialogue about digitization from many points of view, engage libraries at a national level in a discussion of impact of Mass Digit. Initiatives

Brenda Johnson, AUL for Public Services; JPW AUL for Tech Services & IT

President Coleman
• I recognize this speech…
• Jefferson, Library of Congress, value of books, fire, etc.
• Google, digitization
• Safeguarding human knowledge, protect, preserve…
• Michigan = leader in digital preservation; long before Google, long after
• Digital copies of work may be the only ones that survive into the future {think so? What about all the issues with the impermanence of digital formats?}
• Need to embrace the Internet, realize that’s how students do research these days {But then, is it necessarily good to encourage students to run to Google for all their information needs?}
• March 10 is the anniversary of the telephone – earth-shattering technology, changed how the world worked; could convey emotion, which telegraph wasn’t so good at {basically a killer app for the analog world}
• Q&A
o Should talk about “rising” into the public domain rather than “falling” into the public domain
o U of Colorado: what did Pat Schroeder think about your talk?
• MSC: had a lovely time at the AAP meeting, was a lively and spirited discussion
• New business models will evolve; publishers and authors will be great beneficiaries of this technology
• Fair use will be decided by the courts
• Shouldn’t be afraid to let each other know how we feel – purpose for symposium
• 10-20 years we’ll look back and say this tech opened access, allowed for the assertion of creativity in a previously unimaginable way
• Not only noble, also promises to change the economic dynamic of people around the world

Panel 1: Impact of Mass Digitization on Libraries (Barbara Allen, Michael Keller, Josie Parker [moderator], Karin Wittenborg)
• JP: I am neither a scholar nor a researcher, probably the only one on any panel
• JP: Panelists will speak for 10 min each, then open floor for questions
• JP: Believes she is probably the only public library director to blog on her library website
o Need to define what we mean by “public”
o Change is not free; change costs a lot; necessary to recognize the need for it
o Need to deliver a return on the public’s investment
o Without what the university does, it would be difficult for the public library to do what it does
• BA (CIC):
o Thank you to Pres. Coleman for support of both digitization and other initiatives such as affirmative action; references Woodrow Wilson (infighting in Washington nothing compared to Princeton)
o Window of opportunity, before public policy is fully set, to begin experimentation and partnership to drive these efforts
o Research library trends that indicate changes in user behavior
• 2002 ARL report (123 Research libraries): circulation below 1991 levels for the first time – fewer people are coming into the library, fewer are using reference
• However, ILL has increased 140%
• Users have strong preference for using digital materials, even when they’re available in print
• Example: JSTOR – print materials used 672 times, digital materials 12,000 during control period
• Expenditures for University libraries have doubled (?), but libraries are buying fewer books
• 1994 – 63 libraries, $11 million on electronic resources
• 2004 – $270 million on ER
• Most ER from commercial publishers; licensing digital journals
o What we do know and don’t know
• OCLC – 32 million records representing books
• 40% of all books are held uniquely by one institution in that database
• 50% are works published before 1977
o Trends
• Public libraries, other libraries, can rethink their space – gathering places for inquiry
• New organizing principles for freely available as well as licensed digital materials
• Create value for all the partners in the chain
• Collaborate in creating large scale digital libraries – makes little sense to digitize the same material multiple times; can realize efficiencies & economies of scale by working together, centralizing efforts
• Let the Google project run, then look to see what’s missing; digitize items that are unique to indiv. libraries
• Thos. Jefferson – university/library as builder of public good
• Break down artificial walls to access, in ways consistent with our values and in ways consistent with the public good
• Michael Keller
o Litigation underway; has faith that the lawsuits will serve the public good in the end
o Challenges notion that library is only a physical entity containing books and materials
o “The Library of the Mind” – 1972 – Venn(?): things that contradict our hypotheses are the library of the scholar
o library is not just a building with books and services – pervades over the network, is also an ideal
o In the decade and a half it took Stanford to convert card catalog to OPAC, usage increased 50% (circulation) – not just books taken out, but books taken off shelves and used
o Hard to measure use of digital resources
o Use of digital information fostered enthusiasm for Google partnership
o June 2001 – Rick Prelinger and him involved – Meeting – (Paul Allen?) – large supercomputer in a satellite, synchronous with earth, containing all the world’s information, to improve the lives of everyone in the world
• Highwire Press journals: Google spidered them in a deal much like the current one with HarperCollins
• Caused an order of magnitude increase in use of those articles; now even more
• Increasing intellectual access to collections
o Ways to look at digitization other than intellectual access:
• Sales of current books increase when you can search the words inside (like on Amazon)
• Stanford intends to do analysis and research on digital copies of books; very important
• Not just about access, not just about snippets; about research, about innovation, about energy that has made the U.S. leaders in the advance of civilization
o Stanford intends to (within the university, not necessarily outside):
• Taxonomically index them – ideas, not just words
• Associative searching – Akihiko Takano
• Citation linking from the footnotes of books
• Provide new ways of navigating information landscapes – especially through GUIs
• Provide readers with alerts when new items that match their interests come into the collection
• Provide book recommendation services like Amazon
• Will not purvey private information about patrons, but want to show usage by faculty, undergrad, grad
• Highlight words to find definitions, place names to find maps, etc.
o Help readers to advance themselves in ways that they see fit
o Make full use of copyright law and fair use; new decisions on orphan works
o Discover which books have not had their copyrights renewed
o Potential positive impacts in Arabic-speaking countries/Middle East
• Karin Wittenborg (UVA)
o Thomas Jefferson is “really thrilled about Mass Digitization”
o Not an impartial observer – believes GBS is one of the most important things to happen during her career, and that mass digit’n will change everything for libraries, users, the public, etc.
o Can’t even imagine how radical the change will be
o Deeply disappointed and “irked” at the opposition to mass digitization on the part of some librarians, many publishers, and some authors, as well as some others who have chosen to weigh in
• Among the arguments: digit’n not the solution to everything; quality not up to standards; copyright
• Ultimately, adds up to: it changes the status quo; she believes that is a good thing
o Optimistic about the future of digitization
• Quality will improve to near-perfect
• Perpetual access will become possible
• Inexpensive portable devices for accessing information
• Move from mostly text to all formats and all languages
• Access will be affordable
o Libraries
• Reaction to Google announcement: What a relief
• Libraries were beginning to work on digitization initiatives, gave her nightmares of preservation microfilm initiatives – what would standards be, who would do what, etc.
• Libraries don’t have resources to do all these things, Google does; allows libraries to focus on other initiatives
o Copyright: should serve scholars and public interest rather than commercial interests – need to work for change
o Space: virtual library vs. physical library; won’t see the disappearance of the latter in the near term, but depends on the extent to which libraries can reinvent themselves
• “Libraries are sinkholes for space”
• Some believe we won’t need them at all, soon
• Space usage has already changed a lot; idea of the library as intellectual crossroads, place for dialogue; opportunity for library programming not done previously
• Less custodians of physical collections than facilitators of intellectual life
o Long term is uncertain
• Depends on librarians as to whether there will be a long term future for libraries
• Q&A
o Hearings about Sec. 108 – in the new digital world, what points could this group communicate on these issues?
• MK: need to make clear that sec 108 provides not only for the creation of the digital copy, but also for the use of it
o Difference w/ JSTOR, etc. of putting journals online; when Google makes public domain works available, what will be the effects in the libraries – what are the libraries planning on doing with the paper copies once they’re digitally available?
• KW: less about what we’ll do with existing collections than what this project will do for us; print-on-demand, etc; we will have books for the foreseeable future, since people aren’t really reading books online yet; wondering about whether or not to build more high density storage, possibility of discarding common/old journals
• MK: What do you do with the physical backset when the backset’s been digitized? Keep one copy, but not multiple ones; usage explodes on digital copies
• BA: Only time will tell how users will interact with this information differently
o Library space: if libraries change from depositories, what would you like to see happen with the space?
• MK: Books are not going away! But Stanford is planning a bookless Engineering library; planning more services, communal spaces, etc. In 20 yrs will have more bookless libraries. People like to study in libraries…
• KW: They’re rethinking library space. Moving some library ops out of prime real estate, make more space for digital services. Better serve underserved graduate students. Continual renewal of buildings to suit the new ways people will work
• BA: Indiana U. opened up a commons where students could work in groups, collaborate on digital projects, etc.
o Purdue U Press: Faculty research being produced and then sold back to libraries – system must change; digitization democratizes research; would be nice if U’s decided that if knowledge is a public good, they should support U presses more strongly. What do you think of digital repositories now, and what role will they play in the future?
• BA: Big 10 produces 20% of PhDs in the country; would like to encourage collaboration among these scholars, also work with them to develop new systems for informational communication/integration; excellent opportunities from the international perspective.
• MK: As dig. repositories prove themselves to be worthy archives, we may see more publishers discarding their print-on-paper versions, which may lead to more efficiency in scholarly communication; challenge is to discover how to make it easy for students & researchers to utilize such info-silos for their research
o Notre Dame Libraries: as others use the digital copies to make their own, the library will become less central as an information repository; what will they do then?
• MK: Thinks it’s a mischaracterization; doubts people will go around making lots of copies. Adds that services that he hopes to make available are already being developed for the web.
• KW: currently, librarians are in a unique place with regard to info. Can add a lot of value by adding tools, architecture; but would be kidding ourselves if we think it’ll stay our sole responsibility; perhaps think about outsourcing more
o Minnesota: Thanks for drawing the legal fire for the rest of us, Google partners! Problem of how to rationalize the more finite task of digitization along with organization/preservation of born-digital content.
• MK: no lawsuits against libraries yet! Feels there is no rush to digitize, yet the GBS project has inspired others to compete, and digitize more. Question of born-digital materials is very important, and we’re working on that; so, not only how to digitize, but how to preserve. Great challenge for future.
• BA: Digitization is consonant with stewardship of physical objects. Opportunities abound…
• KW: thinks there is a rush. Students want interactive teaching and learning, new ways of getting to information; need to move and solve problems that come along
o Music librarian (Madison?): set of dysfunctions emerging in his mind, suggesting we’ll run into issues about collaboration, related to duplication. Importance of value added initiatives – what about the universities whose collections aren’t the ones being digitized? The research opportunities are limited to those with the digital copy. Also, issue of things out of print but in copyright…
• MK: copyright used to be harmonized with patents – 14 years. Now, disharmony with patents. Would love to see the strict constructionists on the court bring us back into harmony. Books tend to go out of print within 5 yrs., then pass into orphan work status after a bit. Need public policy folks to change policies to create more subtle regulations in these areas.
• BA: Has Article 1 of Constitution in front of her! Promote progress of useful arts, limitations; commercial interests are not allowed to deny public access to information (at least, not by the constitution). Non-duplication: availability of deep and rich special collections
o GPO: Authentication. Are there standards or practices for giving public trust in the information they are seeing?
• MK: major issue. Need to make sure that digital avatars are accurate representations of the originals. Potential for use of hashes, other technologies…
o U of M (Peter Honeyman): issue of DRM, DMCA
• MK: HarperCollins shows us that some publishers are supportive of creating new ways of accessing materials; much of it is “by arrangement,” which is troubling
• KW: Pressure from scholars for born-digital projects…at UVA, there’s a special asst. who is an attorney, gives faculty members advice on copyright, etc for digital projects.
o U of M (missed name): Fan of digitization; but the conversation has context that has not been explicitly stated: assuming pervasive, persistent digital environment in which to access these resources. Does not address the issue of what happens when that is not the case, such as Hurricane Katrina. Need more disaster planning, third-world countries, etc; core knowledge sets that should be duplicated, made geographically redundant
• BA: Great opportunity for collaboration; but don’t let the perfect get in the way of the good.
o Katrina = excellent argument for digitization. Also, issue of digitization for developing countries. If it’s in digital form and accessible on the web, people in developing countries will find it if they need it (according to their own testimony); if it’s in print, no way.
• Closing comments:
o MK: Filo & Yang started Yahoo in 1994; Brin & Page started Google in 1998. Dozens preceded and succeeded these projects; there are unimaginable things to come.
• One more comment/question: born digital stuff – is anyone making a physical copy of things like Slate? Also, need to proselytize about fair use rights to users/public.
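A note on the GPO authentication question above: MK only mentioned “hashes” in passing, so what follows is just a minimal sketch of the general idea, not anything he described. It assumes SHA-256 from Python’s standard hashlib module, and the function names (fingerprint, is_unaltered) and the sample bytes are made up for illustration. The library publishes a fingerprint of the digital file; anyone holding a copy can recompute it and compare.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, published_fingerprint: str) -> bool:
    """True if the data still matches the fingerprint published earlier."""
    return fingerprint(data) == published_fingerprint


# Hypothetical stand-in for the bytes of a scanned page image.
original = b"page image bytes as scanned"
published = fingerprint(original)

# A copy that differs by a single byte no longer matches.
tampered = b"page image bytes as scanned."

print(is_unaltered(original, published))
print(is_unaltered(tampered, published))
```

Note that this only shows a file has not changed since the fingerprint was published; it says nothing about whether the scan was an accurate representation of the printed original in the first place.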

Tim O’Reilly Keynote: “Reading 2.0”
• Reading 2.0 because there’s a meeting next week on the future of reading in the digital age
• What job does a book do?
o We think of books as artifacts, but we don’t think of the job that they actually do
o Internet is world’s most successful ebook
o Job of book – create imaginative world in which to immerse yourself – so maybe World of Warcraft’s a book
o Reference, like encyclopedia – so maybe Google, Wikipedia are books
o O’Reilly books – educational, mainly, but also fun for many (esp. the “Hacks” series); new magazine, Make = “Martha Stewart for geeks”
• Graph: Britannica vs. Wikipedia (usage)
o Wikipedia consistently rising, Britannica flatlining way down near the bottom
• eBook reading devices – stuff with lots of DRM, little available content is not going to take over the world
o Thing that made iPod possible was the widely available capability to burn & rip CDs
o No equivalent for books so far
• Davenport Group: DocBook (1988)
o How do we make an online library?
o Led to Safari Bookshelf
• Lets you do a lot of things
• Can find other material that looks similar – links to sections in other books that cover the same material – only possible in the electronic medium
• What job does a library do?
o Not just dissemination of information, also preservation of information
o WWW pioneers didn’t really preserve their early history
• Viola archive – early web browsing, now offline, inaccessible
• Applicable to Eolas lawsuit, but can’t bring it in, because it’s not accessible
o Wayback Machine – important innovation, not soon enough for Viola
• Graph: vs. Library of Congress – same idea as Britannica/Wikipedia graph
• So what’s the good of the Google Library Project?
o With iPod and CD burning, music industry got very nervous
o Yet, iPod couldn’t have happened w/o Napster – needed a body of content to draw on
o Internet access used to be free – Usenet, cooperative networking
o Free will be replaced by sales as technology matures
o LastFM – helps you find artists you might like based on artists you say you already like – part of new economy in music
o Same will happen for books if we let it go
o Internet BookMobile – still not terribly easy to “rip” a book; means positive externalities are still further off
o GBS is a good step in the direction of allowing for the positive externalities
o 75% plus of books are in the gray area between in-print-in-copyright and public domain – nightmare to find and purchase rights, because in many cases nobody knows
o Strong vs. Loose DRM
• Cat to the vet vs. Dog to the vet – hold a cat loosely, hold a dog tightly; DRM should be more like a cat.
o Long Tail
• Chris Anderson
• Search spurs demand, increases sales (suggested by Amazon’s search inside the book)
o Does Online Search drive discovery?
• Compare sales of physical books vs. e-books
• Sales of physical books as reported by bookscan
• E-books – sales of same books, but in e-format on Safari…
• Missed second part of slide
o Out of 17,754 physical ISBNs sold, about 2,000 are available in Safari
o Safari page views pretty much track with physical book sales; however, in some places, there are little spikes in the Safari line toward the end of the tail – people find them again because they’re online and searchable – the graph of the average sales makes the spikes even more pronounced
• Additional demand from long tail effect – in Safari, represents 23% of views; physical books 6% of sales
• Books can deliver value online that they can’t deliver in print (or at the very least, that they aren’t delivering in print)
o Once books aren’t available in bookstores, people access them more frequently online (percentagewise)
o Reference-oriented series have a disproportionate share of GBS views relative to their bookscan units
o Future of learning may be online, entirely
• Questions for publishers:
o Does search help or hurt book sales? – probably neither. There’s a lot of latency; the biggest upside in search is for books already out of print. Of course, Safari’s 20% of O’Reilly’s business…moving online can keep things in print.
• Visions of the future:
o Web2.0 – about information businesses, overlaps with world of book; software as a service; succeed by harnessing collective intelligence
o Users add value: e.g. Amazon user reviews – key to competitive advantage in Internet apps is the value that users themselves add to that which you’ve provided already
• Craigslist – 7th most trafficked site on the internet has only 18 employees – it’s all about the value added by users
o Perpetual Beta
• When devices and programs are connected to the Internet, applications are no longer software artifacts, they are ongoing services
• E.g. Google Maps, Gmail, Flickr, SafariU
• Rough Cuts – access to books before they’re finished; gives O’Reilly access to info about user behavior – huge percentage of digital-only buyers are international
o Software above the level of a single device
• The PC is no longer the only access device for internet applications, and applications that are limited to a single device are less valuable than those that are connected
• For example, Google uses Linux, so if you use Google, you use Linux
• iTunes: system that is the paradigm for the future – part of the application resides on the internet, part on a handheld, and part on the PC
• Need to take these lessons from the web to the future of the book
o Data is the next “Intel Inside”
• Applications are increasingly data-driven; therefore, owning a unique, hard-to-recreate source of data may lead to an Intel-style single-source competitive advantage
• Potential problem – data being consolidated in the hands of a single provider is a dangerous thing; Internet makes good model
• People should have their own digital content; we can all bring our own stuff to the party; work towards interoperability, mobility
• Public domain stuff in GBS is subject to the same DRM as the copyrighted stuff in the publisher program – is that really beneficial?
o A Platform beats an Application every time
• E.g. Microsoft Windows
• Two types of platform:
• One ring to rule them all
• Small pieces loosely joined (e.g. the Internet)
• Composite applications
• E.g. – mashup of and Google Maps
• Also, licensing – O’Reilly licenses content for integration into Dreamweaver
• But what about a web services-based help system? Provide an O’Reilly books “Help” API
o Quote: “I’m an inventor. I became interested in long term trends because an invention has to make sense in the world in which it is finished, not the world in which it is started” (Ray Kurzweil)
• Question: what about losing content/data scrubbing?
o Need to make sure we don’t protect too much; if we have all the rights to our stuff and we want to bury something, we can, which may be a disservice to the public; O’Reilly tries to get the data out there in multiple formats
o Need to get “Bookster” out there somewhere along the line

Panel II: Research, Teaching, and Learning (Jean-Claude Guédon, Ed Tenner, Ann Wolpert)
Ed Tenner:
• Thanks U of M for Making of America – helped with his book research
• Writes about unintended consequences, so wishes to focus on that
• Bill Gates – The Road Ahead
o Claimed that technology would raise educational standards for everyone
o Recent news items have called this into question:
• British literacy concerns among University freshmen
• 2005 U.S. literacy rates not looking good – only 31% of college grads can read a complex book and extrapolate from it
• Web use is up; people should be better able to use information, but it’s not necessarily the case
• Chart: suggests that the levels of literacy among the college-educated & grad-educated population are actually sinking (fewer are “proficient,” more are “intermediate” or “basic”)
• Google simplicity to blame? May trick people into believing they’re good searchers, because the engine works well, when really it’s mediocre. The triumph of the “good enough” – trajectory towards mediocrity.
• World History – What would happen if a student entered “world history” into a Google search?
o You don’t get decent surveys of the field until at least the second page of hits
• Clusty: groups search results into more or less topical clusters
• Wikipedia: leaves out everything about World History scholarship before 1980 – not a very good starting point
• Encyclopedia Britannica, on the other hand, doesn’t even list World History
• Arguments:
o Academics should join in the project of open source content creation like Wikipedia
o Should learn tradecraft of search engine optimization so that academic pages don’t get ranked so low
o Search engines should have a scholarly mode of functioning that integrates more authority-of-information criteria in its rankings
• The search engine tends to turn up information that’s only good enough; in the 21st century, good enough isn’t
• Questions:
o Mike Keller: it’s not the search engine’s fault that the results are bad. For one thing, lots of stuff is hidden behind logins, etc, not visible to search engines. Need to teach researchers/students to search beyond the obvious
• ET: Wrote a piece on undergrad info use, agrees that we can’t entirely rely on the search engines, need to build skills to search more deeply
o You say that there are mistakes in Wikipedia – did you fix them?
• ET: I will do that, and possibly write about my experiences
o Recent research & projects that are going on – Strategy Hub as an example of teaching people how to search using expert methods; and building on the first question: is this information out there waiting to be found?
• ET: Would love to hear about the projects; future belongs with just such projects
Jean-Claude Guédon
• Need to think about what happens to the essence of the documents themselves and our relationship with these documents
• When printing became mass printing process, what shifts occurred in our relationship to documents?
o Before print, the quality, value, trustworthiness of a document was tied to its genealogy – where it was copied from (the chain of copies)
o When print came around, people just kind of grabbed manuscripts and started printing them; people questioned their authority/trustworthiness
o When there were many versions, we came around to “critical editions” combining and annotating them
o Text became no longer the direct transmission of knowledge from source to reader; became a representation, an object
o Problems: critical editions of Shakespeare from 1603 & 1623 – didn’t account enough for the possibility that he might’ve changed his mind in 20 years
• Wikipedia as an example again
o What is it? Online encyclopedia done collaboratively. So which is the good one? Today’s? Yesterday’s? Last year’s? A traditional encyclopedia is a fixed snapshot at some point in history; Wikipedia is actually a process, which will be ongoing into the future
o More interactive than traditional encyclopedia – cannot do anything with Britannica (but “throw it at the wall”); can edit and contribute to Wikipedia
• Right now, we’re mesmerized by Google
o Terribly useful but not wholly adequate
o He loves to Google himself; much better than a mirror
o Googles friends, enemies
o It’s a way to look for people to contact for more info; to build relationships
• Examples of utility of Google project:
o PhD theses: contain literature reviews
• Could run analyses on these literature reviews that would help to create visualizations of current epistemology, scholarship, ideas
o We digitize millions of books; perhaps we could create a concordance for these books and then make lists of the least frequently used words to help pinpoint technical terms, etc.; could help build communities and commentaries surrounding specific interests
• Further toward creation of H.G. Wells’ “World Brain”
• Questions:
o Margaret Hedstrom: senses a bit of a contradiction – the practice of citation came along with the critical edition; even if an undergraduate cites a lousy entry in Wikipedia, by the time someone goes to check it, it might have changed. So, what effects do you think recent developments have on citation?
• JCG: RFC analogy (? – I missed the explanation). Wikipedia is interesting because it itself is an experiment. Another analogy would be Free Software development – people’s names stick to their additions to the code, so it’s easy to figure out where problems come from. Need to create some sort of attribution system. Attribution is also part of the incentive to create, in addition to helping create textual authority.
o Rare terms idea – like Amazon’s “Statistically Improbable Phrases” – possibly also like some ideas to do with the Semantic Web.
• JCG: Part of it, to his understanding, has to do with standardized ontologies (there’s more to his answer, largely to do with disambiguation of language; I admit I couldn’t follow parts of it)
o We need to think about open-source image of the things that we digitize; when we get text, we’ve added value to our project already; using the page images we can recreate most of the scholarship; Full text is different from open source image; need to make original images available
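A note on JCG’s concordance idea above (index the words across the digitized books, then list the least frequently used ones as candidate technical terms): here is a minimal sketch of how that might look on a toy scale. Everything in it is an assumption for illustration, not something from the talk: the two “books” are invented sentences, the function names are hypothetical, and a real project would need proper tokenization, OCR cleanup, and far better filtering than a minimum word length.

```python
import re
from collections import Counter, defaultdict


def build_concordance(books):
    """Count every word and record which titles each word appears in."""
    counts = Counter()
    where = defaultdict(set)
    for title, text in books.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            counts[word] += 1
            where[word].add(title)
    return counts, where


def rarest_words(counts, n=5, min_length=4):
    """Return the n least frequent words with at least min_length letters.

    Ties are broken alphabetically so the result is deterministic.
    """
    candidates = [(count, word) for word, count in counts.items()
                  if len(word) >= min_length]
    return [word for count, word in sorted(candidates)[:n]]


# Invented sample texts standing in for digitized books.
books = {
    "Book A": "The library keeps the books and the library lends the books.",
    "Book B": "The palimpsest was kept in the library with the books.",
}

counts, where = build_concordance(books)
for word in rarest_words(counts, n=5):
    print(word, counts[word], sorted(where[word]))
```

On a corpus this tiny most rare words are just ordinary words that happen to appear once; the hope in JCG’s version is that over millions of books the rare end of the list starts to surface specialist vocabulary, and the record of which books contain each term points to the communities that care about it.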
Ann Wolpert
• This panel was to confront the question of whether Google has confronted the issue of scholarly inquiry at the same level as it has more generalized inquiry
• MIT has been more interested in “born digital” works, and developing means and methods to manage those works, than in digitization per se
• MIT has been working with Google Scholar, however
• Beta test products: run the sorts of experiments needed to determine whether these search tools can serve the needs of a scholarly community
• Question with Google Scholar is whether you can create an environment that serves the needs of the scholarly community
• Also working to resolve Google searches against proprietary journals owned by libraries like MIT to pull up the best results; push library-owned journal results to the top, use direct linking to journals, rather than having to jump back and forth
• Use of online journals is very similar to potential for mass digitization of book content
• E-books have not achieved a similar status to online journals – don’t get the same kind of traction
o Not really clear why – some on supply side, some on demand side
o Students sometimes object to being asked to use electronic books
o Limited data to guide us
• Where e-resources are available, people vote with their mice
o 85% regularly use online resources
o more use them in a library than off-campus
o 32% aware they could use Google Scholar to access library resources
o 61% thought this feature was important for the future of their research
• Resources themselves rank lower than finding tools in importance
• Web sites most frequently consulted in studies or work
o Libraries
o Google
o MIT home page
o Departmental home pages
• When looking for books:
o Barton (OPAC)
o Amazon
o Google
• Bottom places:
o E-book databases or gateways
• When looking for facts:
o Google
o Wikipedia
o Printed handbooks, dictionaries, etc.
• Bottom places:
o People!
• What students, researchers, faculty want:
o A single interface for searching across a variety of information sources
o Expanded online content, especially older materials
o More access to all library material via commercial search engines
o A “wizard” to help choose the best tools for a topic
• What might we learn from these responses:
o Want help sorting out the chaos; right kind of assistance matters
o Users want to be involved in design
o People know that some high-value info might not be freely available
o [Missed 2 bullets]
• Ongoing mkt research will be necessary
o Standard questions?
o Time series, right experiments, maintain domain expertise, should devise/promote economic models that work for the academy
• Questions:
o Will you make your questionnaire available?
• Sure!
o What was staff reaction to the survey results?
• People were most satisfied with the staff (they just don’t use them)
• Full Panel Discussion:
o JCG: the Ebook question is interesting; most ebooks don’t have the clarity, definition necessary for wider usage; maybe we’re trying to force an old form of document into a new medium where it doesn’t fit – maybe we should try to think more outside the box
o ET: As a writer, if I had unlimited access to visual materials, and the ability to market a product and get paid for it…ebooks replicate all the constraints of the printed page, only adding more – I would love to see them get liberated. Unfortunately, the current copyright system isn’t really made for this kind of transition.
o AW: Most faculty can produce an article, as a practical matter; producing a book is a whole different undertaking. Much of what publishers do is cajole, harass, baby, annoy, do what’s necessary to get an author to produce a work by a date certain. Book publishing is essentially a consignment process; no upfront money, publisher takes all the risk, has to be marketed, sent into a distribution channel that’s almost as dysfunctional, and if the book doesn’t sell, the vendors return the books for credit. In some ways, this mold just needs to be broken – for instance, many faculty insist upon a hard copy…
o ET: Used to work with the Princeton U Press, and for scientists, books are often seen as a distraction from your real work (unlike journal articles, which are your real work). However, science books retain their value longer (at least, in some areas…math, physics). There’s a way in which the irrationality of the book publishing world is part of the appeal of the process – someone loves you enough to print your stuff, and even make you a cloth bound edition.

Panel III: Publishing (L. Suzanne DeBell, Daniel Greenstein, Alicia Wise, Mark Sandler [moderator])
Mark Sandler:
• Publishing initially meant “to make public”
• Now lots of people are publishers in that sense
• Librarians and publishers share many interests: free expression, defense of authorship & challenging ideas
Suzanne DeBell (ProQuest):
• Mass digitization – frightening concept
o Don’t hear enough about nuance, serendipity
o Only positive if you can find what you need
• Web search – “where good enough really is good enough”
• Evidence Matters – new tool from ProQuest for use in Evidence-Based medicine
• [Much of the rest of the talk felt like a sales pitch…]
Alicia Wise:
• Was there life before Google?
• Digitization has been going on for at least a decade, probably longer
o By archives, museums, libraries
o By publishers
• Publishers and Google
o Publisher project – sounds great!
o Lots of publishers have done deals with Google
o Publisher process not as conducive to smaller publishers
• Publishers and Google Library Partnership
o Danger of concentrating so much of the world’s information in the hands of one corporate actor
o Lack of nuance – the challenges for business models & practices will likely be different for different kinds of publishers (specialty magazines vs. encyclopedias, for example)
• Publishers and Google have more in common than they disagree on
o Vision: getting everyone access to the info they need and want
o Legal framework: digitization falls under the spheres of copyright law and contract law – may need to update those legal precepts to deal with new issues in the digital era
o Real costs: creativity, innovation, marketing, cataloging – all cost something
• New issues
o Technology: can digitize faster, and many more works are born digital
o Funding: fragmented, more and more from commercial sources rather than foundations or government
o Stakeholders: diverse – everyone in the information/entertainment “value chain”
• Potential problems
o Copyright, changing roles…
• Vision:
o Users will increasingly expect convenient access to information
o Demand for convenience will fuel innovation
o New services will need to be underpinned by personalizable and securable infrastructure, must ensure privacy
o Info and literature will be freely accessible, but not free of charge. Online content and services will fuel the economy
o Info and lit will be accessible in socially responsible ways, taking into account freedom of expression, ability to pay, and the environment
Daniel Greenstein
• Increasing awareness of strangeness of cycle of scholarly publishing among faculty – sell your IP rights to a publisher so that the publisher can sell them back to your university so that you can use them in your class
• Tendency towards open access
• Faculty are becoming more well-versed in copyright issues, and how to play the copyright system
• Change is hard – people react to it differently
• To him, Google’s making available the backlist, which has never been available before – why are publishers complaining?
• Open Content Alliance:
o Trying to scale up to 5000 books/month
o Aim is to ensure/build out the underlying infrastructure to make other opportunities come to pass
• Wish list:
o Trusted, third-party preservation
o Open services definition
o Collection support tools
o Transparency…
• OCA formats
o JPEG2000 archive master
o PDF (color)
• Need will to ensure that while knowledge becomes universally accessible,
o Knowledge that rises to the public domain stays there
o …[missed point]
• Molly: A lot of people have talked about standards; a lot of publishers have differing feelings on standards, DRM – what are the panel’s feelings on standards/sharing standards?
o DG: Standards change; libraries have actually given up on an old standard (TIFF) in favor of new ones (JPEG2000)
o SD: Cost is a driving factor, [also echoed what DG said]
• Many publishers were cutting deals with Google Print; seems like some of the lawsuits arise from resentment of Google’s success; somewhat parallel to how many authors feel about publishers; thoughts?
o AW: doesn’t believe publishers are really motivated by jealousy; however, Google could become a potential competitor
• Orphan works issue; one of the difficulties with dealing with books in this orphan category is that they cost exorbitant amounts of money once they’re a certain time out of print – how does the panel feel about this kind of economy?
o DG: Advantage of Google for publishers is that it makes their backlist work better; feels like if this kind of project makes print-on-demand projects work better too, that’s probably a good thing
o SD: Publishers spend a lot of time clearing rights to works; may be some resentment about Google’s not doing that; many publishers do choose to use works for which they can’t locate the rights holder at their own risk
• (Margaret Hedstrom) Myth 1: Google’s digitization will give it a monopoly over the books – Michigan will keep its paper copy, and will get a digital copy; Myth 2: Mass digitization – shouldn’t get too excited about digitization of all human knowledge – there are millions of pages of manuscripts and letters, which are largely excluded from these conversations so far
o SD: feels that ephemera like the last stuff mentioned presents a great opportunity for digitization
• Function of publication is very different from the interests of publishers – what’s the purpose of locking up the 50-year-old stuff?
• Publishers aren’t just bad guys – need to make their money back somehow. Might have to go to a licensing model for books, if this is what’s going to be done with them. Question: if all of this is going to become a public good, who’s going to pay for it?
o DG: I am! Assumes Google is paying, corporations are paying…
• Bob Frost – question relating to erosion of first sale rights through DRM schemes – need an openly developed, collaboratively developed DRM, not something unilaterally developed by corporations
o AW: In complete agreement.

Adam Smith, Google
• How to retain an open posture and listen to the community (of librarians, and more generally), yet continue to set own priorities?
• The Facts:
o Create a comprehensive, searchable, virtual card catalog of all books in all languages, while respecting copyright
o Most books are out of print (~95%)
o 3 different ways to view a book – public domain, publisher program, and snippets
• Snippets’ usefulness will be dependent on extent of metadata creation
• No ads on snippet books
• Google Scholar
o Working on providing best possible scholarly search
• Working on applying information gleaned from usage to improve Google Scholar, etc.
• Serendipitous discovery
o Vanity search – discovering your past, your family’s past
• Comprehensiveness requires collaboration
o 67% of monographs in OCLC are not held by the 5 libraries involved in the project
• Seeking more library partnerships
o Europe, Asia, Latin America
o Additional US partners
• Inclusion of other digitization efforts
• Library of Congress
o Funding for World Digital Library efforts
• Collaborating with OCLC
• Metadata and Google
o Google is a heavy user of metadata in both Book Search and Scholar
o Absolutely essential
• Enabling Virtual Collections
o Improving URLs to facilitate linking, persistence
• Users are creating virtual collections of GBS materials
• User-created GBS add-ons
o Firefox extension to allow individuals to discover whether the book is in their local library
• Web of Content: discovering linkages between books
o References
o Authorship
o Individuals
o Concordance
o Temporal relationships
o Topical similarity
• Access
o New models for publishers – digital marketplace for books; to generate more book revenue for publishers
• For customers to view the books online
• US and UK publishers can sign up, set prices, and choose which books to include
• At this time, consumers will not be able to purchase access to books
o Public domain books
• Today, users can download & save pages from books
o Library search
• Find the closest library with the book
• Discovery process to access process
o Library links
• Facilitate access to library resources
• Automated for on-campus users
• Easy to configure for libraries
• Final thoughts
o Google creates index, complements libraries
o Will drive usage & sales
o Challenges are real (technical, logistical, legal, etc)
o So much more still to do…working together
o We listen – give feedback!
• Questions:
o What is Google doing to cull duplicates from different libraries?
• Still in analysis phase; between external metadata and metadata in book itself, they’ll get to know where the duplicates are; haven’t focused on it intensely to date
o UConn: Since we have to do this all together, and we all have unique stuff, open standards would be strongly appreciated – examples, exceptions, etc. Creating cultural memory of record online – will there be editability through time to correct errors, etc?
• Working on it
• We have PhD experts in OCR technologies, thinking about where we are, what is possible to improve
o UMich: Library catalog info being included in book search; so why aren’t we getting the results we’d expect with that data included?
• It’s a goal, definitely; tackling it piece by piece, in priority order
o Ferris State: In GBS, Public Domain stuff subject to same DRM as stuff still under copyright (as O’Reilly said)?
• No DRM on public domain stuff – except that you have to download it page by page…
o UMich: Reference works will not be available via snippets – what criteria do you use for those decisions?
• Others make those decisions at Google; exact algorithms, policies, he’s unable to discuss
o UMich, etc: I scan books and redistribute them in the public domain; I’d like you to have them, so that people can search them and use them – what can I do?
• Contact him (Adam Smith); interface isn’t perfect for including the scans of others
o Potential for user-mediated correction of OCR errors, and other user participation?
• [no response]
o UMich: How does Google determine copyright status for works published outside the US?
• Not a lawyer, not gonna comment on the legalities
o Why not library lookup universally for books with ISBNs?
• Doing their best to improve that


Digitization Symposium
March 11, 2006

Panel IV: Economics (Paul Courant, Hal Varian, Karl Pohrt, Ronald Milne [moderator])
Ronald Milne (intro)
• Strongly feels publishers still have a leading role to play, though business models may have to change
• Strong faculty support for project at Oxford – all very keen to say what should be digitized first
• Questions: should libraries continue with digitization efforts if Google’s doing it anyway? What might this project mean for the future of libraries in general? And more specifically, effects on collection development, acquisitions, storage…?
• Storage problems are huge at Oxford – 3 linear miles of new additions every year (?)
• Can save incredible amounts of money if you don’t have to build new shelving facilities ($55 million)
• If the digitized copy is sufficient for the reader, where does that leave the publisher or the bookseller?
• Codex is a very useful thing
• We’ve been hearing about the death of the traditional book for ages; hasn’t happened yet
• Transitional phase
Paul Courant
• Distinguish between 2 words – economist & alchemist – haha.
• Comments on yesterday’s discussion
o On utopia that Karin Wittenborg sketched:
• We have media technologies that make so much possible – easier to search, edit, collaborate, distribute, make things pretty – should be able to take advantage of this capacity and do a better job of scholarship
• Shouldn’t so much worry about the library business, the publishing business, etc; should worry about scholarship, and by extension, human welfare
• What matters is less any particular digitization effort – it’s the digitization of lots of stuff, in general, that matters
• Economics of academic libraries – supply and demand
• Old way: Marginal cost of getting a book to someone in finite time much lower if book is in neighborhood – just send someone to get the book (or go get it yourself) – physical libraries were a point of competition, a draw for faculty – technology and geography combined to reify the physical library
• Libraries funded as public goods within the university – excellent example of local public goods
• Digital capabilities make this local public good a global one
• Once something is in digital form on a server, the cost of adding another reader anywhere in the world is virtually zero
o Changes business model for universities
o Physical libraries aren’t as much of a competitive advantage in wooing faculty
o For most material, doesn’t matter where it is
o Value of libraries & collections will be established on the fly (happens to a certain extent now)
• Finding and using physical books begins to seem like a horrible inconvenience
• We won’t have the same kind of collections we had before
o Global scale
o Organization becomes much more difficult
• One requirement of public goods is an assured method for their production
• Hates the phrase scholarly communication – prefers “scholarship” and there is no scholarship w/o communication, so the phrase is redundant – if you don’t share your ideas, it’s not scholarship
o Publish or perish is a moral imperative, beyond being an annoying inconvenience
• Mechanism of scholarship – whatever we think we know, we put in a library, where others can get it out, know where it comes from, that it has authority, same thing every time, for hundreds of years
o Scholars are really bad at this – think of ideas, get it out there, then move on to the next thing – why librarians and archives are so important
o Ability to have a system that continues to allow us to put things in a library and get them out again later = imperative
• Only people trusted for this exercise are librarians
• Money
o We spend a lot in this area
o A lot of this money is redeployable to organizing things in a different way
o Libraries as museums as well as places for finding and making sense of published information
• Making sense of = prof’s jobs – algorithmic tools aren’t good enough w/o human help to make it good enough
o Clearance of rights
• Can’t have a market that works well unless the rights are well established
• Market in this case works very poorly, because rights are NOT well established
• Preservation of culture
o Want to be able to get back at our own history
o Naturally worked with books – last a long time, good descriptions of what’s been going on
o Even film and video aren’t as good as books for this yet
o Need to articulate the political problem of having the good stuff available, maintaining access to our heritage – otherwise we risk having access to only the junk in the long run
• Key point: Express demand, show the value of scholarship.
• Questions:
o Interesting to hear this conversation taking place from Canada where there’s already a public lending right (publishers are compensated for lending from libraries) – question of how this kind of system might be translated into argument for digitization here.
• “In the utopian world, you wouldn’t do it” — marginal cost of getting digital materials to user is zero, so you should charge zero; that said, a pay-per-view model may be more sustainable/saleable than some others
• In general, the idea of a pay per view model troubles PC
• In publishing, “most people who write books don’t expect to make a dime, and they aren’t far off” – there are millions and millions of volumes that nobody is making a dime off of, and there’s no real reason for it
o Scott Dennis: Isn’t the current rights management situation the result of economic factors? Why should we think it’s going to change?
• If we take human greed as being an economic force, then yes…but “invisible hand” argument doesn’t apply to public goods – even Adam Smith got that
• If economics is about getting value out of scarce resources, then the current configuration of copyright law, in the context of these resources and of providing access to them, becomes an obstacle
Hal Varian
• Defense of GLP from the perspective of economics and law
• [Outlining the project…]
• Legal argument: Fair use
o Four factors
• Ad model tied to queries not content; transformative nature of use
• Fact rather than fiction in most cases
• Tiny selections of content
• Not a substitute for entire work; instead is potentially a complement
o Precedent
• Kelly v. Arriba Soft – transformative, commercial, entire work copied (but transformed), no impact on market for the work
o Opt in vs. Opt out
• Transaction costs
• Transaction is valuable when gains to one party exceed costs to another
• Transaction costs can destroy otherwise valuable deals
• Large transaction costs for opt-in, small for opt-out
o Finding rights holders
• How do you know who the rights holder is?
• Seeking heirs, potentially all of them…
• Contractual modifications affecting rights
• Many undocumented agreements and rights arrangements
• Orphan works problem
o Size of collection ~25 million books
• Intractable rights clearance problem
• Very uncertain revenue stream
o Negotiation costs of opt-out
• Send Google an email or call an 800 number w/ book info
• Economics
o Whose behavior is going to change?
• Will publishers and authors write & publish fewer books?
• Lower quality?
• Lower profits?
• Easier to find books?
o Google’s mission – organizing world’s information
• Includes info in libraries
• Library project uses same model as web; fair use + opt out
• Provides valuable user service
• Imposes minimal costs on publishers
o Who will make the catalogs?
• Typically rather poor incentives to do it (for publishers, etc)
• In future must minimize human intervention
• Prior search & negotiation would impose huge transaction costs on the cataloging industry
• Questions:
o Alicia Wise: analysis of transaction costs is well established in economic principle; however, what’s missing is any analysis of the benefits of collective licensing – established for photocopying in library environment, and for broadcast media. Also, what about transaction costs for publishers of opting out?
• In the Google case, costs of opting out are costs of sending an email
• Much less than rights clearance
o UWisc – Why even have an opt out? Publishers can’t opt out of library photocopying…if it’s fair use, why bother with the opt out?
• Speculates that Google used the web model; seemed clear, worked well for web; likely served as default model
o Analysis seems to skip over something from publisher session – fundamental distrust of Google’s intentions/motivations, possibility of leaking information…how do you deal with that kind of distrust?
• Google sees itself as trying to organize the world’s information
• Google doesn’t really own any content itself
• Other companies are interested in becoming content aggregators; doesn’t seem to be Google’s mission
• Copyright law still exists, should Google overreach or let things “leak out”
• Pretty easy to scan things in already; already a “leaky” system
Karl Pohrt
• Independent bookseller for 30 years, has owned bookstore for 25 years
• Nonprofit sector is not immune from many of the economic factors that roil the world of retail
• Technology moves fast, economies are sluggish
• How can we change business models so that we can move into the future, not be “tossed onto the slag heap of history”?
• Difficult for people in large institutions to understand the hesitations of smaller players in the book economy
• Context: “A Madhyamikan Viewpoint”
o Buddhist philosophy
o Strategy – attack all philosophical viewpoints
o Goal – liberation from delusion
• Bookselling is a much more perilous practice for smaller sellers these days
o Big box, Amazon…everyone’s selling books
o Many independent stores closing, few opening
• 60% of book sales happen outside book stores (includes the internet)
o Predictions had it that the percentage of book sales on the internet would be much larger by now; but it’s nonetheless significant
• Decline in literary reading foreshadows decline in civic participation
o Also correlates with increase in participation in electronic media
• Independent sector is underperforming in sales of most popular titles, but dramatically exceeds its market share in somewhat less popular titles (ranked 150-400)
o Interesting for helping publishers launch books
• Rapidity of tech change makes it difficult to see opportunities
• If everything works like it should with GBS/GLP, everyone will be helped, nobody will get hurt
o Yet, nothing really works out perfectly
o Beware the conditional statement
o There are always unintended effects
• Textbooks, for example: now there are models involving renting textbooks
o Move to decrease use of textbooks by colleges/universities
• Sony reader
o E-ink display technology
o 10,000 Titles available for it, but only from Sony website
• Disintermediation – removing the bookseller from the equation
• Retail bookstore = “2% business”
o Need your support and patronage
o If independent booksellers disappear, choices will get worse for consumers
o Internet can’t replace experience of browsing bookstores
• More from the Buddhists: First mark of existence is impermanence; since we all desire permanence, second mark of existence is sorrow
• Book recommendation: Accelerando, by Charles Stross – novel(?) involving aliens, IP
o There is a technical companion to the book on the Wikipedia website – “transmedia”
o There is also a website that has the text online for free – has seen thousands of direct downloads
• Hermeneutics – difference between monk reading block-printed religious text after years of preparation (in reading it, committing it to memory) and buying it at Shaman Drum and reading it in a coffee shop
General Questions:
• Should government agencies be involved in digitization projects, since they serve the public good? Shouldn’t it come from the public, rather than from corporations like Google?
o HV: Certainly a role for Congress to play in dealing with transaction costs related to orphan works, etc.
• Contrarian views: if books are on the web, the codex rules – print on demand becomes much more appealing to users – could streamline economics of book production, perhaps creates opportunity for independent booksellers (who could get into the p.o.d. business). Collective licensing works well, is cost effective, inexpensive, easy, once they met with artists’ representatives {not sure who “they” are here} – easy to opt out if Google is the only indexer; but what happens when projects proliferate? What about projects that authors don’t know about? Also, isn’t there an argument for libraries acting collectively to digitize themselves?
o KP: Has actually been involved with beta testing a machine for publishers printing on demand in the store; but the technology hasn’t really been further developed. Would love to try something like that
o HV: Opt-in, Opt-out is not a legal concept; if it’s fair use, opt-out isn’t necessary, it’s just something Google’s offering. In discussion in Washington, everyone likes the idea of having a centralized rights clearinghouse where permissions could be obtained.
• Bigger issue may be born-digital objects; have to devise some mechanism for placing things into the public sphere, irretrievably and unambiguously, taking into account multiple versions, etc.
o PC: sure. Rich description, as distinct from fixity, is what we’ve always had; cool thing about new technologies is that people can gloss one another’s work in new and different ways

Panel V: Public Policy (James Hilton, Bruce James, Brian Kahin, Nancy Davenport [moderator])
Bruce James
• Forenotes:
o Spends a lot of time attending seminars across the country, and this is the best, most efficiently run one he’s been to
o He’s a Buckeye
o Also chairman of the board of a large private university, and recently had a meeting with others about the out-of-control costs of higher education
• When will it cost more to buy one year of higher education than it will to buy a new house? – 2058 (or so?)
• Clearly out of control
• Need to find new ways of providing education that utilize the new technologies available to us now
• With libraries going into digital form, could be worth considering having a universal digital library
• GPO – roots go back to 1813
• Creation of the Constitution very contentious
• Eventually people just got worn down and signed it
• Soon began to second-guess, began to remove power from the federal government – Bill of Rights, more…
• Among these changes, it was decreed that all information produced by the gov’t will be in the public domain, and GPO was created
o Gov’t has proactive duty to make sure its info is widely available throughout US
o Led to creation of federal depository library system
• Unique in terms of scope and what it contains
• 1250 institutions, 53 get one copy of every document; story of America available throughout the US
• Worked very well till 1993, when Congress ordered the GPO to put gov’t info on the internet
o 92% of all Gov’t info is on the web; for the other 8%, there are diverse reasons why not (maps, other weird materials)
• Is thrilled about Google project – seems to think the gov’t and libraries think too much; Google’s just going in, acting, making mistakes, and fixing things as they go along
• Concerns
o Need to assure you that gov’t docs are authentic – that the document you see is the one the author originally wrote
• Watermarking – has to look valid 200 years into the future
o Perpetuity – for these purposes, the time that the U.S. will continue to exist
• 200 years? 500? Maybe 1000?
• What companies that existed 200 yrs ago still exist today? 100?
• The U.S. will last longer than any company
• We won’t trust any company to be the authentic provider of the government’s information
• Paper’s not going away, but much gov’t info is now born digital
• Federal Register – had 35,000 paid subscribers a few years ago, now fewer than 2,000 (for paper) – now, millions of electronic readers every day
• 9/11 commission report
o GPO was going to provide it for $60, limited availability; commercial publishers were offering to sell for $10 everywhere
o All-time GPO bestseller, but private sector had better ideas for it
o Need to partner with private sector to get these things done, but keep it mostly in the hands of the gov’t (public sector)
Brian Kahin
• “Toward a Public Information Infrastructure” – prior to birth of internet
• Hal Varian made a lot of his points, so he restructured his presentation a little – more on the “fuzzier” issues
• Fair use doctrine – four factors – interpretations of those factors by the courts
• Courts don’t tend to think much about transaction costs
• There is also a debate in the scholarly community about the extent to which fair use analysis maybe should or does depend on transaction costs
• Copyright used to be opt-in – if you didn’t claim your rights, you lost them entirely
• “equitable rule of reason” – Stewart v. Abend, 495 US 207 (1990)
• What’s evolved has been free information, advertiser-supported information, government information – removes transaction costs
• Google’s taken the ad model and evolved it in a sophisticated way
• Move away from the value chain model (as in motion picture industry) – towards more of a value-added model – the more people make software for your platform, the more valuable your platform is
o Soft relationships, no discernable barriers from the user perspective
• Google’s success is in marrying content and electronic commerce – something that neither libraries nor publishers have been able to do
• Open Content Alliance – looks like a big playpen, no rules, lots of heterogeneity
o Google, by contrast, is very centrally planned, comprehensive, can see the vision
o Critical mass problem
• People don’t search by publisher (heck, they hardly search by author)
• Fear that Google is the next Microsoft; that it will benefit from similar network effects; that it will become much more threatening to publishers
o Yet, Microsoft’s business model is based on retail sales of licensed software
o Google’s model is based on low transaction costs, service model, open system {open to what extent though, and in what ways?}
• The web was successful because it could be implemented for free, in a distributed fashion…very low transaction costs – built on internet, which in turn was built on deregulated telecommunications industry
• Google has a strong image for pursuing its public policy aims
o No ads on snippets; decouples the money stream from the public service orientation
• Perhaps deepest fear of publishers is that Google has mastered the “attention economy” which is essentially what their business is built on
James Hilton
• “The emergence of the ‘pure property’ view of the world of ideas and expression undermines the soul of the academy and…” [changed slide too fast]
• Too many IP fences being thrown up around ever-diminishing slices of property
• Scope of copyright and patent have both increased – for copyright, longer terms and wider breadth; for patent, mostly wider breadth
• “The Proud Family” – Disney cartoon
o Episode where Penny decides she needs to earn some money, goes to work in CD store, but buys so many CDs that she ends up owing the place money
o Gets approached by a dude who introduces her to filesharing
o Upgrades computer, internet connection
o But CD store goes out of business, mall goes out of business, tax revenue drops off, public services are discontinued
o Cops show up, Penny stops filesharing, and the world is restored
o {Is he serious? Did this actually happen on a cartoon?}
• Kids “rationalize the crap out of” file sharing
• Patent
o Some silly ones: Amazon’s patent on one-click ordering; Priceline’s patent on the business practice of allowing consumers to name a price and allowing vendors to agree or disagree
• People really have come to think IP means protecting ideas – the silly patents play into that
• IBM – generates business method patents, which they largely don’t exploit, so that when they infringe a patent they can turn around and say, well, you’ve probably infringed one of ours
• Conversations w/ Publishers often go down this track:
o Indexing provided by Google may be a transformative/fair use, but only if you ask us for permission
o JH’s response – that’s not the definition of fair use; if it’s fair, you don’t have to ask
o Under licensing, that might be the case; under constitutional copyright law, it’s not, but publishers increasingly see it that way
• It’s about that 70% – the stuff not in the public domain, not in print – publishers want a piece of the economic action
o But it’s a revenue stream that nobody ever saw coming; no author could have anticipated this as an incentive to create
o Bookstores with coffee shops analogy – people go read the book in the coffee shop, then don’t buy it – should publishers be entitled to part of the coffee revenue?
• Jefferson quote on the inexclusivity of ideas – lighting tapers, etc.
• In the academy…
o Property think is creeping in
o MBA students are making their profs sign non-disclosure agreements so that their profs won’t take their ideas and process-patent them
o Who “owns” class notes?
o Who “owns” the right to publish?
• Research collaborations, differing conclusions – who gets to publish?
o Who owns collaboration?
• Faculty are worried about students stealing their IP
• Faculty may be more likely to contract out the work they need done so that their students won’t claim rights to it
• Options for future:
o Protest!
o Creative Commons
• Great system; wouldn’t be needed if copyright law still worked
o Participate in open software creation
• Institutional effort behind open IP
o Examine new forms of scholarship/publishing
o “Don’t let tech transfer be the tail that wags the university dog”
o Google Library Project – Yay for digitization!
• What’s great about it is that it goes for that 70%
• From a policy standpoint, initiating a conversation about fair use, the public good – a positive thing in itself
• Hates Intellectual Property as a term
o The current conception is like oil wells – not replenishable
o IP is really more like wheat fields – can re-sow, replenish, nourish
• Rick Prelinger: clarification – vagueness that Brian mentioned isn’t really the case; likes to think of OCA not in comparison to Google, but more in the more general area of open content, which predates GBS; OCA is planned to be harmonious with GBS; about librarians thinking about how to build a body of material available for sharing; content that individuals can build value-added services on
• Mike Keller: 2 court cases – Perfect 10 (fair use rejected) & Blake Field (Field v. Google; fair use upheld)
o BK: not familiar with second case; Perfect 10:
• 2 distinctions – Google was picking up unauthorized images from Perfect 10 at 3rd-party sites; thumbnail issue was treated in same way as the Arriba Soft decision, but was looked on more negatively because Google had ads next to thumbnails; another issue is Perfect 10’s sales model for thumbnails
• U. Utah Press – dichotomy of publishers vs. Google, publishers vs. mass digitization may be somewhat false – many support the project. Still, there’s a keen awareness that Google’s a corporation with its own interests in mind, and those interests aren’t always transparent.
• GPO’s commitment to being the provider of Gov info – does that extend to cataloging, etc?
o BJ: Yes. Working hardest on authentication. Also, there are many sources that point to GPO’s content (LoC, etc). Essentially the repository for gov docs. Want to add to the collection, going back in time. Have found funding and individuals to do these things.
• National Weather Service – Santorum’s proposal to make that information less available so it wouldn’t compete with commercial services; cessation of provision of taxpayer-funded research information a few years ago…where are things like this headed?
o BJ: GPO forwards everything; perspective on gov’t competition with private industry changes from time to time depending on the administration; some industries rise up to meet gov’t inefficiencies, but he believes gov’t will grow more efficient
• Trustworthiness of GPO docs – issue of scrubbing, disappearance of documents from GPO website – why should we trust you?
o BJ: fair question – there are some legitimate reasons to withdraw documents, but to the extent that it’s capricious, he’s taken some steps to make that sort of thing happen less.

Closing Remarks: Clifford Lynch
• Nowadays is an adjunct prof at Berkeley’s School of Information (as it has now been renamed)
• During the building of Melvyl (UC online catalog), had a consultant in, who pointed out that they’d not asked the question “what happens if we succeed?” – guiding point for talk
• Digitizing the public domain
o Easy to say, well, there’s so much that’s not in the public domain! But there is so much that is, and it’s legally uncontroversial to digitize it.
o “Large-scale” rather than “mass” digitization – thinks “mass” invites thinking about digitization in a rather uncritical way
• Push towards maximizing scope
o Books are important as a societal totem, but the public domain contains so much more – film, artworks, special collections – all of this is important, not just the books
o All the policy issues show up for non-book materials too, and in many cases they’re much more intractable (e.g. rights clearance for amateur photo collections)
• Importance of social feedback
o Will never get everything exactly perfect
o Scope is huge – it’s all of our culture
• Meaning of public domain
o Among other things, can make copies without anyone suing you
o But having a copy of something in the public domain does not place an imperative on the owner to make it freely accessible – e.g. museums control use/photography of their particular collections
o Google Book Search model – have to download pages one at a time; tedious
o Some are allowing downloading of entire work
o What about the collection level? What about people who want to build their own personal collections, or rehost public domain materials? Why would people want to?
o Libraries don’t ask their patrons why they want things; they just help people find what they’re looking for
o Raising capital for digitization of public domain items; trading short-term exclusivity, etc. to build public-private partnerships to finance the creation of these initiatives
• Problem is less legal than public policy
o We have some laws on the books that reflect really, really ill-formed public policy
o Wishes someone would write an intensive book on how the CTEA happened – who twisted whose arms, how the various arms of the content industry worked in pushing that through; it seems to him, perhaps naively, that the extension made a much bigger difference to the film industry than to the traditional publishing industry – would be interesting to know who argued for it vs. who actually won
o Subsidiary legal and policy issues – orphan works – more than just books
• Orphan works are an even bigger problem for many non-book areas
o Need a more rational framework for term of copyright – would make many problems more tractable
• Transaction costs would lessen
• The farther back you reach into the past, the worse the transaction costs are likely to get
• Interesting to analyze the comparison between transaction costs and age of the materials
o Need to think more about stewardship, sustainability regarding mass digitization projects
• Dark archives
• Limits on ability to replicate material in case of disaster
• Google reminds us that we’ve moved beyond individuals interacting with individual texts
o Complex algorithms for searching the internet
o Will probably make equally complex mathematical analyses of books, once they’re scanned in
o Indexing – will it become specially privileged under copyright; socially privileged?
o Loads of technologies and loads of investment in data mining technologies
o Where are bodies of texts (to mine) going to come from? Libraries? Google’s digital archive? Will others be able to do similar things?
o One of the great wildcards in the role of libraries in digitization going forward
o 2 stages to big computations – acquisition stage (e.g. web crawling), then ranking or correlation, or whatever other kind of mining you want to do
• acquisition stage is hugely expensive; hard for individuals who just have ideas to do it
o Licensing makes the picture even more intractable – imagine trying to cut licenses with every purveyor of copyrighted content in order to let you mine it
• The items already in library collections = 2 kinds of things: works of scholarship & “evidence,” raw materials that support scholarship but are not specifically scholarly texts
o Need to be careful about conflating the markets for scholarly works & raw materials
o There are gray areas – multiple-use materials – esp. in engineering, applied sciences
o Already huge progress in retrospective digitization of scholarly journals
o Moving wall approach – some scholarly journals open access after a few years
• The academy has a lot of ability to shape how scholarly communication looks, in shaping access policies, etc.; less true with broader commercial/consumer marketplaces
• Unintended consequences will be substantial, more striking than we realize today
• [missed the question] – digitization helps with finding linkages; with the explosion in the quantity of information available, people are finding it ever more difficult to put the pieces together
• And then, my battery died.

