Friday, October 17, 2008
Data Curation: Issues and Challenges (8:30-9:30 AM; Reactor Panel 9:30-10:15)
James Mullins (convener/moderator)
Liz Lyon (panelist)
Fran Berman (panelist)
Sayeed Choudhury (reactor)
Pam Bjornson (reactor)
Intro (JM): [Missed due to computer issues.]
- How can libraries & librarians engage with science (science conceived broadly, as research)
- Will give examples from the UK
- Will not suggest robots to deal with the data, like in Nature
- DCC SCARP Project
- Immersive case studies
- Neuro-imaging case study – MRI images & demographic data
- 10 years’ worth of this data
- Multi-center project, with different scanners and data standards – shared terminology and ontologies needed
- DCC Curation Lifecycle model
- Weekly meetings on the neuro project are really important for cross-disciplinary interaction (“Heedful interaction”)
- Thinking of data-sharing as a form of trade – “give to get”
- Ethics & confidentiality issues – removing ears & face from brain scans
- Funding from JISC to work with chemists at Southampton
- Cloud of repositories containing crystallographic data
- DCC examined how the chemists were curating their data; software platform, metadata structure, etc.
- Standards – DCC recommended particular management software & metadata application
- Scaling up – DCC made a number of recs on this too
- Practice challenges – all things librarians are used to dealing with for literature already – need to transition to data
- Understanding risks, awareness
- Community consensus, advocacy
- …about 8 more
- Librarians need to get embedded, involved in research process – rapid integration
- Shared Research Data Service Feasibility Study
- Objectives – assess UK’s data service needs, make recs for strategy
- National infrastructure for data curation
- What’s already there
- What’s happening elsewhere
- Other possible models
- 3 options: do nothing, have a centralized approach, or mixed model (going with the third – first two were “thrown out”)
- University of Oxford Case Study – interviews with researchers
- Data Audit Framework – JISC has taken that on (out of Dealing With Data recs), funded DCC to develop it (LL provided flyers about it)
- Launched Oct 1, 2008
- Self-audit tool, but could be facilitated
- Schema for describing data assets in detail
- Pilot at Edinburgh school of Geosciences
- Found that
- Needs lots of planning
- Sometimes datasets are vague, ambiguous
- Needs support from senior mgt
- Preliminary but positive
- Req for institution-wide data policy & Guidelines
- Req for researcher training
- Req for auditor training
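The Data Audit Framework’s “schema for describing data assets” can be pictured as a simple inventory record that auditors fill in and then rank. The sketch below is hypothetical throughout: the field names (`owner`, `file_format`, `size_gb`, `priority`), the three-level priority scale, and the sample entries are illustrative assumptions, not the framework’s actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    """Hypothetical priority levels an auditor might assign to a data asset."""
    HIGH = 3
    MEDIUM = 2
    LOW = 1


@dataclass
class DataAsset:
    """One entry in a hypothetical departmental data-asset inventory."""
    title: str
    owner: str        # researcher or group responsible for the asset
    file_format: str  # e.g. "CSV"
    size_gb: float
    priority: Priority


def top_assets(inventory: list[DataAsset], n: int) -> list[DataAsset]:
    """Rank assets by priority level, then by size, and keep the first n."""
    ranked = sorted(inventory, key=lambda a: (a.priority.value, a.size_gb), reverse=True)
    return ranked[:n]


# Hypothetical inventory entries, for illustration only.
inventory = [
    DataAsset("Borehole logs", "Group A", "CSV", 12.0, Priority.HIGH),
    DataAsset("Field photos", "Group B", "JPEG", 40.0, Priority.LOW),
    DataAsset("Seismic traces", "Group C", "SEG-Y", 300.0, Priority.HIGH),
]
print([a.title for a in top_assets(inventory, 2)])  # ['Seismic traces', 'Borehole logs']
```

Breaking ties by size is an arbitrary choice here; an actual audit might weigh reuse potential or the cost of re-creating the data instead.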
- Transition or Transform? Librarians…
- Lots of opps for action – leadership by senior managers, faculty audit coordination, advocacy, awareness-raising workshops, data doc best practices, etc.
- People and Skills
- JISC funded a piece of work by Key Perspectives – Swan 2008
- Only 5 data librarians in the UK?? – all accidental
- Need more data librarians
- Bringing diverse communities together – data center managers, IR managers, librarians, funders, and policy makers
- Up next at DCC: Roles and responsibilities for effective data management (Nov 2008)
- DCC Digital Curation 101
- One-week “summer” school in October
- Focused on data/curation lifecycle
- Targeting bench scientists and also info professionals
- Hope to do it again
- Shifting goals
- Open Science seems like the future
- Friendfeed, GalaxyZoo, Second Life
- Sharing of workflows, with experiments blogged as they happen; putting results data right up on the web
- Transition or Transform? The Role…
- Room for a different sort of person to come into library team – from librarianship to informatics
- Issues from the perspectives of some of the folks on the research side
- Research Today:
- Researchers are detectives, trying to assemble the most interesting & useful tools they can
- PSID (Panel Study of Income Dynamics): social scientists interested in understanding the nature of social organizations and behavior – longitudinal study of families over 40 years
- Earthquake simulations – SCEC (Southern California Earthquake Center) – long-term collection of seismic data
- Protein Data Bank – worldwide reference coll of protein structure info
- What is needed?
- In the 80s, figured it was either your local computer, or some big computer (HPC)
- In the 90s, data becomes part of the question – more or less data, more or less processing power (2×2 square)
- 2000s, Network dimensions – now a 3D graph – could also need more or less connectivity
- For many, data is a means to an end
- Use of data is v. important
- Critical services – visualization, portal creation, collection publication, analysis, mining, hosting, preservation services, etc.
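The shift from one axis (local computer vs. big computer), to a 2×2 square (data × compute), to a 3D space (data × compute × network) can be made concrete by enumerating the regions. This is a minimal sketch; reducing each axis to “low”/“high” and the `classify` helper are hypothetical simplifications of the framework as noted here.

```python
from itertools import product

AXES = ("data", "compute", "network")
LEVELS = ("low", "high")

# 1990s view: data x compute gives a 2x2 square of 4 regions.
square = list(product(LEVELS, repeat=2))
# 2000s view: adding connectivity gives a 2x2x2 cube of 8 regions.
cube = list(product(LEVELS, repeat=3))
print(len(square), len(cube))  # 4 8


def classify(needs: dict[str, bool]) -> tuple[str, ...]:
    """Place a project in the cube; True means it needs 'more' along that axis."""
    return tuple("high" if needs[axis] else "low" for axis in AXES)


# A hypothetical project needing lots of data and compute but modest connectivity.
print(classify({"data": True, "compute": True, "network": False}))  # ('high', 'high', 'low')
```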
- Research & Data: “Data Driven Cosmology” – simulating the first billion years of the universe after the Big Bang
- Composing simulation outputs from different timeframes builds up a light-cone volume – a more comprehensive picture of the formation of early galaxies and other events of cosmic evolution
- ENZO – simulator of cosmic evolution after the Big Bang
- Runs on a supercomputer, spits out petabytes of data
- Calculates the growth of cosmic structure – stars to galaxies to clusters… with dark matter, self-gravity, etc.
- Uses adaptive mesh refinement (AMR) to provide high spatial resolution in 3D – puts a big grid on the sky, then goes down 7 levels of refinement – non-uniform grid, effectively 65,000+ cells on a side (65,000+ cubed)
- Scientists want to map this in more detail, and add radiation to the model
- Interdisciplinary team, including computer scientists
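The “65,000+” figure is consistent with simple AMR arithmetic. Assuming (this is not stated in the talk) a root grid of 512 cells per side and a refinement factor of 2 per level, 7 levels give 512 × 2⁷ = 65,536 cells per side at the finest level:

```python
def effective_resolution(root_cells: int, levels: int, refine_by: int = 2) -> int:
    """Cells per side at the finest level, as if that level covered the whole box."""
    return root_cells * refine_by ** levels


# Assumption: a 512-cell root grid; the 7 levels come from the talk.
n = effective_resolution(512, 7)
print(n)                # 65536
print(f"{n ** 3:.3e}")  # 2.815e+14 cells if the whole volume were refined uniformly
```

AMR refines only where structure is forming, so the number of cells actually computed and stored is far smaller than the uniform-grid figure; that is what the “non-uniform grid” buys.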
- Spectrum of Preservation Profiles
- Short term to long term
- Small-scale to large-scale
- Well tended to poorly tended
- More restrictive policy & regulation to less restrictive
- Has a data mgt & sustainability plan, or not
- Synergy between Researcher Needs & Library Strengths
- Research comm. is focused on
- Targeted solutions (customization)
- Researchers need help w/ things librarians are good at – all things that researchers are not incentivized to do
- Dev reliable mgt, pres, & use environments
- Proper curation & annotation
- Navigating policy, regulation, intellectual property
- Long-Lived Research Data
- Historical Data
- 2008 Cyber-election
- fundraising via website
- YouTube videos
- Blogs as vehicles for issue discussions
- Online organizing
- Needs to be preserved!
- Life Sciences Data
- PDB – worldwide repository for 3-D protein structures
- $80+ bn investment in research resulting in protein structures
- Cultural Data
- Historical photographs – generated in both research communities and private homes – so much now born digital
- Preserving Long-lived Research Data
- What should we save? (value, policy, regulation)
- Who should pay for it? (economics)
- How much data is there? – enough that saving everything isn’t an option (at least by 2007)
- By 2023, the amt of digital data will exceed Avogadro’s number
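For scale: the notes don’t record the unit, so assuming the comparison is in bytes (and using decimal SI prefixes, 1 ZB = 10²¹ B, 1 YB = 10²⁴ B), Avogadro’s number of bytes is roughly 0.6 yottabytes. If the unit were bits, divide by 8 (about 75 ZB).

```latex
N_A \approx 6.022 \times 10^{23}
\;\Rightarrow\;
6.022 \times 10^{23}\,\mathrm{B}
= 602.2 \times 10^{21}\,\mathrm{B}
\approx 602\,\mathrm{ZB}
\approx 0.6\,\mathrm{YB}
```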
- What do we want to save?
- Key word – not “save,” but “we”
- Who are “we”?
- Official & historically valuable data (census info, presidential emails, Shoah collection, etc.)
- We=Research Community
- PDB, National Virtual Observatory, etc.
- My medical record, Quicken data, digital family photos, etc.
- What do we have to save?
- HIPAA, Sarbanes-Oxley, OMB
- Research sector is often unaware of the regulations that apply to us
- Economics: Preserving research data incurs real costs
- Even with decline in cost of storage, costs still rise – people costs don’t go down, nor energy costs, networking, space, etc.
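The point about rising costs can be written as a simple cost model. The symbols are illustrative assumptions, not from the talk: V is data volume, s the storage price per unit volume, and P, E, N, S the people, energy, networking, and space costs named above.

```latex
\begin{aligned}
C(t) &= V(t)\,s(t) + P(t) + E(t) + N(t) + S(t) \\
\frac{dC}{dt} &= \dot V\,s + V\,\dot s + \dot P + \dot E + \dot N + \dot S \\
\frac{dC}{dt} > 0 &\iff \dot V\,s + \dot P + \dot E + \dot N + \dot S > V\,\lvert \dot s \rvert
\quad (\text{when } \dot s < 0)
\end{aligned}
```

So a falling storage price (ṡ < 0) lowers total cost only if its savings outweigh both data growth and any growth in the non-storage terms, which, per the talk, don’t go down.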
- UCSD perspective
- Relationship between SDSC & UCSD libraries
- Together, working on some kind of preservation grid for LoC – disaster preparedness
- Blue-Ribbon task force on sustainable digital preservation & access
- Cost framework – what are the key cost categories of digital preservation?
- Set of economic models that provide alternatives, work in diff kinds of regimes
- Pros & cons, costs, tradeoffs of each
- Real world conditions where best suited
- Actionable recommendations – things to use as models
- Recognition of the benefits of preservation
- Appropriate incentives
- …missed 1 or 2
- Organization & Governance
- BRTF deliverables
- Yr 1: Positivist: What is?
- Yr 2: Normative: What should be?
- Go beyond tendency to just call for more research & resources for data preservation
- Other deliverables: community outreach
- When the public understands something, and gets excited about it (like with climate change), it gets easier to make things happen, work for change
- Most important lesson – “Don’t just copy what we’re doing” – first look at your institutional environment, and assess what is best there.
- Data Flow (Levels of Data)
- Level zero – generated directly by telescope – binary
- Processing @ Fermilab
- Calibrated, refined, checked, verified
- Level 3 data – Loaded into SQL database, released to public
- Data releases are put on a website – Sloan Digital Sky Survey
- High usage at high schools
- The public doesn’t deal with Level zero data – but at higher levels, you start to interact with the public in very direct ways
- That’s the point at which scientists tend to start asking for help
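The last step in this data flow, loading calibrated Level 3 data into an SQL database for public release, is what makes the data usable by non-specialists such as the high-school users mentioned above. The sketch below uses Python’s built-in `sqlite3` module with a hypothetical `objects` table and made-up rows; it does not reproduce the actual Sloan Digital Sky Survey schema.

```python
import sqlite3

# Hypothetical Level 3 rows: already calibrated, checked, and verified.
calibrated_rows = [
    (1, 180.10, 2.50, 17.2),  # (obj_id, ra_deg, dec_deg, r_mag)
    (2, 180.40, 2.70, 19.8),
    (3, 181.00, 3.10, 16.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, r_mag REAL)"
)
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", calibrated_rows)

# A public user asks for bright objects (a lower magnitude means brighter).
bright = conn.execute(
    "SELECT obj_id, r_mag FROM objects WHERE r_mag < ? ORDER BY r_mag", (18.0,)
).fetchall()
print(bright)  # [(3, 16.5), (1, 17.2)]
conn.close()
```

Nobody querying at this level ever touches the binary Level 0 stream; the SQL interface is the point where public interaction begins.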
- Key considerations
- Work within existing scientific/research systems – need to embed data management into existing workflows, and not try to force some totally new process onto researchers
- Consider gateways for those systems as part of infrastructure development – no one right gateway or metadata standard
- Focus on both human and tech components of infrastructure – human part tends to hide in plain sight
- Human interoperability is more difficult than technical interoperability – [important point for SC] no formula for having healthy interpersonal relationships; some heuristics, but nothing universal
- Questions (1):
- How do we transfer principles into new practices, especially given scale & complexity?
- What are the fundamental differences between data and collections? Human readable vs. machine readable?
- What about the “cloud” or the “crowd”?
- Can Flickr help us with data curation?
- Questions (2):
- How does a partnership audit data (and associated services) distributed across the network?
- Are audits about “completeness” or perhaps about transparency and reliability?
- Where are the existing data curators? Maybe we shouldn’t use the terms data librarian or data scientist or humanist. (car as “e-horse” – maybe we need a totally different term)
- Questions (3):
- What are the requirements? Are there common requirements, which may be most appropriate area for libraries?
- Are there unifying concepts or themes? “One scientist’s noise is another’s signal”
- Question: Theme arising – inflow, data flow, workflow – questions arise related to skills, long term commitment
- Question: On the cloud – if we use web scale tools, how much work can be done in a federated kind of way? Q’s of coherence, linkages…are there models?
- Question: Data curation is domain specific and labor intensive – prospects for change on either count?
- LL: Can’t stay so labor intensive; need to work on tools for automated metadata generation
- FB: There’s a lot of complementarity between research & commercial spheres of use – but the link is tenuous. Researchers build a lot of specialized tools, sometimes with knowledge of others in existence, sometimes not. These aren’t at the level of the tools out in the private sector. Private sector is good at “ilities” – scalability, …. Not so much the custom, but the general.
Q: How were 25 “top” data sets selected in the Edinburgh geology project? Did this exercise facilitate interactions with faculty? Are there valuable linkages that were made?
- LL: Work was done at Edinburgh, so not hers… sets chosen in dialogue with the geologists. Doing more audits with the departments so they can get a fuller picture.
Q: We hear a lot about the skills librarians need – what about the values? Open access, etc. – not just something they’re responsible for, but something in line with particular values – is there value in librarians talking about these things in terms of values, or would scientists just not care? Is there a need or an opportunity for librarians to educate in-team data curators about values, and not just practices?
- SC: Definitely yes. Often the values just aren’t on scientists’ radar screens. At Johns Hopkins, they had discussions with scientists about this, and the scientists were receptive, though they did wonder about the time costs. But yes, letting scientists know that there’s more you can do than just access is important.
- FB: Realistically, schedules are really full, professionally and personally. What’s of value to researchers is when you can “bring it home” to what they’re working on. Giving them information in terms they can immediately understand (immediate resonance) is important. Many researchers don’t have a good sense about IP, copyright, various regulations, and that’s good to know; once relationships are built, this could be something for librarians to inform scientists about.
Q: Seems like one great role for librarians is to help researchers find information they just weren’t aware was out there. Thinks about searching for data, and thinks one challenge is that we think of data in terms of sets – metadata is produced for the whole set, and yet, data exists as points within the set as well, and that’s not always visible in current data search systems.
- JM: Had an issue with this at Purdue – interdisciplinary group got together, realized they had no shared vocabulary. Recognized that this could be a role for librarians, and the librarians set up a set of websites that helped to bridge the gap – not providing access to data, but providing them with the knowledge/vocab to go and find the data themselves.
Q: Do the scientific professional societies discuss this too? Or just information scientists? Maybe we could go talk at such societies’ conferences?
- LL: In crystallography, felt it was vital that the professional societies were involved; was quite influential in convincing the scientists to participate (Royal Society of Chemistry) – gets at the trigger points for the scientists. Librarians can act as facilitators. DCC forum was another place where they got all sorts of people together in one room (data mgrs & scientists, etc), and people thought that was very helpful.
- SC: JH is working with the American Astronomical Society – needed their backing to make stuff happen. Those prof. societies can act as mediators too – one of the partners in this overall picture.
- PB: Another avenue = international societies like CODATA; various levels of activity in diff countries, but great to have one place where librarians can interact and are seen as an essential part of discussions of these things
Q (Brian Schottlaender, UCSD University Librarian): “Curation” – Maybe we’re making ourselves unnecessarily anxious about vocabulary. If you think of curation as a series of mgt activities, you start to just think about whether you have adequate staff to deal with those activities – maybe don’t need one person to handle “cradle to grave.”
- JM: Concluding remarks – maybe younger people will be better at handling this stuff [seems like more on the “digital natives” meme] – less translation required
Supporting Virtual Organizations (11:00-11:45 AM; Reactor Panel 11:45-12:30)
Wendy Lougee (Convener/Moderator)
Tom Finholt (Panelist)
Mark Lundstrom (Panelist)
Medha Devare (Reactor)
- Looking backward and then forward
- Changing nature of geographically distributed collaboration
- Changes have a history (practices & tech evolve)
- Changes can be described in terms of scale and other factors
- Much of what’s before, we can think of as “convenient antecedents”
- Collaboratories, video conferencing
- No traditional referent for things like crowdsourcing
- Exciting for research because it means there’s lots of mysteries, but can also be baffling
- Scale of past work = 10s to 100s of participants; theoretical orientation = social psych; tech paradigm = CSCW; Characteristic research Q = “being there”
- Future: 1000s to millions; sociological & economic; social computing; “beyond being there”
- Lessons – book edited by Gary Olson, Ann Zimmerman, et al. on collaboratory research at U of M, organized through the Collaboratory for Research on Electronic Work (CREW)
- Domain Scientists – tend to be hierarchical/seniority-biased, individualistic, adversarial/competitive, skeptical of new tech/v. risk-averse
- CI Developers – tend to be egalitarian/talent-biased, project-modeled, adversarial/competitive, open to new tech/v. risk-seeking
- These two communities are very different
- Plan for first contact – it’s often difficult
- Communication – can easily become a tower of Babel over differing interpretations of terms like “requirements”
- Common ground is important to find
- New Challenges
- Past experience hasn’t necessarily prepared us
- Beyond Being There: A Research Program for Virtual Organizations
- NSF wkshps (2) – 2007-08
- Computational science perspective emphasizing middleware – trust, security, resource discovery
- Organizational science perspective emphasizing distributed teams, collaboration, etc
- Sometimes talked past each other, but interesting things emerged
- First efforts often = awkward hybrids – need to be tolerant and patient, since it’s hard to know where future greatness might lie
- Virtual Radical Collocation
- Worked from studies of collocated software engineers; trying to reproduce the advantages without needing the collocation
- OptIPortal, for example
- Create advantages of physical proximity at a distance
- Add new capabilities – multi-megapixel visualization
- So, has both the benefits of collocation and those of dispersion
- Idea that we might not set out to explicitly organize an activity, but we put out a call, and others come forward and solve it
- Examples: InnoCentive, “Games with a purpose” (e.g. reCAPTCHA)
- Don’t know who is going to do the work
- Effort voluntarily contributed
- Mechanism/incentive design becomes very important
- Delegation of Organizational Work
- Negotiating/monitoring trust relationships, access permissions, etc.
- E.g. MEDICUS project – federated access to medical images – requires a lot of middleware to manage authentication, privacy, etc.
- Rating systems – way of figuring out authority in a delegated fashion – borrowed from Slashdot et al by NanoHUB et al
- Unique aspects: need to figure out how to support/supplant social ties; organizing w/o the work of organizing; q’s of who to trust, permissions, etc. are managed by middleware
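Rating systems as a way of “figuring out authority in a delegated fashion” can be sketched as a reputation-weighted average, where ratings from raters the community already trusts count for more. This is a generic technique swapped in for illustration; the function name and rater weights are hypothetical, and this is not the actual Slashdot or nanoHUB algorithm.

```python
def weighted_rating(ratings: list[tuple[float, float]]) -> float:
    """Average (score, rater_reputation) pairs, weighting each score by reputation."""
    total_weight = sum(rep for _, rep in ratings)
    if total_weight == 0:
        raise ValueError("no reputation-weighted ratings to average")
    return sum(score * rep for score, rep in ratings) / total_weight


# Hypothetical ratings of one simulation tool on a 1-5 scale: (score, rater reputation).
tool_ratings = [(5.0, 3.0), (2.0, 1.0), (4.0, 2.0)]
print(round(weighted_rating(tool_ratings), 2))  # 4.17
```

An unweighted mean of the same scores would be 3.67; the weighting lets the low score from the low-reputation rater count for less, with no central editor making that call.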
- Group work is inevitable – now with an extra degree of difficulty added by virtual organizing – may never meet our co-workers
- Broader continuum of activities covered by geographically-distributed work
- Emerging modes of contribution/participation are not as amenable to intentional tech choice or org design
- Need more research on incentives and delegation mechanisms
- What is nanoscience/technology as a community
- What is nanoHUB
- Challenges & Opportunities
- Network for Computational Nanotechnology
- According to NSF, “an infrastructure & research network”
- Connect simulation-builders w/researchers
- Bridge divides
- 1 nanometer = half the diameter of a strand of DNA – and we can see stuff that small now, and work with it
- Silicon microelectronics make lots of things possible, and they rest on nanoscience – or at least making them smaller does, since that is what puts more transistors on a chip
- CSE – 3 reasons to simulate: to explore uncharted territory; resolve well-posed questions; make good design choices – most of computational nanoscience focuses on the first, because the territory’s relatively undefined
- 2 kinds of results: answers & understanding, and software
- 2 kinds of engineers: builders and analysts
- nanoHUB – online simulation to connect simulation tool developers and users
- Want to make it easy to assemble workflows from existing pieces of code, to share them, and for others to build on and improve them
- Also want to make it easy for people to come find simulations and use them seamlessly through the web browser, with the option to download the source code – once it ran seamlessly through the browser, though, people generally stopped downloading the code
- Provides not only the tool, but reviews of the tool, citations to papers, user guides and tutorials, directions for citing the tool
- Focused on people who aren’t necessarily focused on computation
- Simple interfaces for complex tools, so that they’re easy to use in the classroom
- Began to add tutorials & seminars in voice-over powerpoint (through Breeze, then new Adobe product…)
- For learning new techniques, etc.
- Podcasts also
- YouTube also
- Complete short courses on topics for which there aren’t textbooks yet; or no courses exist yet
- Capabilities for online meetings
- Example: Supriyo Datta
- All of his materials, short courses, etc. are placed on the nanoHUB
- Very specialized, high-level stuff, but gets like 15,000 hits(/year?)
- Biggest things: online seminars & lectures (796), Simulation tools (120+), one other…
- >80,000 users per year, growing sharply – expects more than 100,000 by next year
- Just under 50% of users are in the U.S.
- V. generic underlying technology – has led to discussion of spinning the underlying structure off into a generalized tool for hub development in other areas (HUBzero)
- Promote diffusion of knowledge
- Give researchers impact
- Provide a new avenue for “publication”
- Facilitate the use of simulation in the classroom
- Promote collaboration across disciplines
- Issues & Challenges:
- Finding content in a growing collection
- Permanence of content
- Quality of content (anyone can upload, so there’s a range of quality – should there be monitoring?)
- Handling data (in addition to sim tools and ed/training resources)
- Intellectual Property issues
- Financial model for sustainability
- People issues
- To learn more: www.HUBzero.org
- Idea of the embedded librarian – really interesting. When she first took her position at Cornell, she had been a postdoc in agriculture, so she felt like a scientist embedded in the library. Keeping the lines of communication open was the primary idea behind hiring experts like herself and others in Mann Library.
- Virtual Organizations – lots of definitions – key characteristics:
- Boundary crossing
- Pooling of competencies
- Participants or activities geographically separated
- Fluid – changing participants, activities
- …2 more
- Library contributions
- Tech choices, tools – librarians/IT staff tend to be quite up-to-date on tools/tech
- Tech support, guidance – providing human aspect of tech implementation – has proven effective/helpful
- Subject expertise
- Understanding of research landscape
- Trust – librarians can be trusted arbiters of information
- Vision – user needs of the future? Seems like librarians are probably well-positioned to think about this
- VIVO – University-wide research and expertise discovery tool
- Most of the info in there already exists in different silos
- Event info, HR info, news service, grants, publications, courses, faculty updates/edits
- Helps reduce the fragmentation of university data sources
- Info brought in through automated feeds
- Will soon be able to allow individuals to self-edit, update their own info
- Created to support two interdisciplinary research initiatives, entirely within the library
- Now extending beyond life sciences into the entire U, with administration’s support
- VIVO does faceted search – same search pulls up people in diverse departments interested in similar things from diff perspectives
- Shows faculty affiliations, publications, research focus, co-investigators
- Trying to figure out ways of scaling this up, making it an interuniversity effort
- Feels that demand for this sort of service will only increase
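The VIVO pattern described above (pulling records from separate campus feeds into one profile per person, then offering faceted search) can be sketched in a few lines. Everything here is hypothetical: the feed structure, field names, and sample records are illustrative and do not reflect VIVO’s actual data model.

```python
from collections import Counter, defaultdict

# Hypothetical records from separate campus "silos", keyed by a shared person ID.
hr_feed = [
    {"id": "p1", "name": "A. Smith", "department": "Plant Biology"},
    {"id": "p2", "name": "B. Jones", "department": "Computer Science"},
    {"id": "p3", "name": "C. Lee", "department": "Animal Science"},
]
grants_feed = [
    {"id": "p1", "keywords": ["genomics", "drought"]},
    {"id": "p2", "keywords": ["genomics", "databases"]},
    {"id": "p3", "keywords": ["nutrition"]},
]

# Merge the feeds into one profile per person.
profiles: dict[str, dict] = defaultdict(lambda: {"keywords": set()})
for rec in hr_feed:
    profiles[rec["id"]].update(name=rec["name"], department=rec["department"])
for rec in grants_feed:
    profiles[rec["id"]]["keywords"].update(rec["keywords"])


def search(term: str) -> tuple[list[str], Counter]:
    """Return matching names plus a department facet with hit counts."""
    hits = [p for p in profiles.values() if term in p["keywords"]]
    facets = Counter(p["department"] for p in hits)
    return sorted(p["name"] for p in hits), facets


names, facets = search("genomics")
print(names)         # ['A. Smith', 'B. Jones']
print(dict(facets))  # {'Plant Biology': 1, 'Computer Science': 1}
```

The department facet is what surfaces people interested in the same topic from different perspectives, here plant biology and computer science.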
- DataStar – Supports data-sharing among researchers
- Librarian provides guidance to users/scientists on data formatting, technologies/tools to use
- Reinventing the library?
- Rick Luce – librarians as middleware
- Would like to be able to see VIVO profiles in DataStar, and bring up DataStar data in VIVO
- At Purdue, librarians have been brought on as co-PIs, etc.
- Redefining ourselves as librarians
- There are some references for that redefinition, but it’s not all there
- What is “that” – what is the “that” that we deal with – uses example of nanoHUB tools, simulations, etc.
- Community aspect – how is participation perceived in this virtual organization? Incentives for both researchers and librarians would seem to be important.
- TF: Traditional way to gain entry = named as co-PIs on CI projects – formal, top-down; over time, became more bottom-up; people want to be where excitement and utility are – nanoHUB works because it’s useful, serves a need, does cool stuff, but probably wouldn’t have been explicitly planned to be what it eventually did become. Now it’s the network externality that defines your popularity/impact – have to have mechanisms that afford that kind of engagement – SB: “net cred”?
- ML: If you have to work really hard to incentivize the faculty, you’re probably not doing the right thing. There are now a few examples of senior faculty using the service (nH) that they can point to, in order to gain more buy-in. Open sourcing can do excellent things for impact – might not know who developed things, but those things can have tremendous impact on a whole field (ex: SPICE). Seems like it’ll have easier uptake in the younger generation.
- SB: Also thinks of little kids now doing gaming, in terms of the younger gen
Q (Fran Berman): Seems like striking examples of VOs have some commercial economic models behind them to sustain them – what might be the economic sustaining factors for projects like nanoHUB?
- ML: Not sure – getting some interest from textbook publishers, for partnerships… Or maybe premium services… But want to be careful about making barriers that limit the number of users. Might be able to gain some leverage if it becomes part of the expected institutional IT infrastructure.
- TF: “Open Source Ecologies” – for-profit actors pump money in to build the core code base, upon which they then build in order to sell the value-added; Another example might be JSTOR – tiered subscription model. Locking down IP probably not the way; however, some really want that. Another example: the Internet emerged from initial defense funding, but ended up taking on a life of its own, became self-supporting.
Q (MD): To what extent do you think libraries should be proactive…or reactive? How do we negotiate that?
- TF: People in the library are more in touch with the cutting edge of the tech than the faculty are – faculty look to them for support. With every entering cohort of students, there’s a whole new set of tech you need to deal with. Learning technologies will be a focus. Librarians have a particularly unique perspective on openness and access that’s not necessarily “in the gene code” of programmers/researchers.
Q (W. Michener): What have been some surprises in terms of VOs that have worked – how people spend their days, etc.?
- TF: VOs tend to consist of people who are invisible from the POV of the larger organization. May not have titles like Dean, administrator, etc. – may not even be on the org chart. Need to accord those people the respect they deserve; advancement opportunities, etc.
- ML: Can be too much to keep up with…lots of demand! nH folks are struggling with that.
Lessons & New Roles: The Experience of Health Sciences Libraries (1:15-2:00 pm)
Neil Rambo (Convener/moderator)
Linda Watson (panelist)
Betsy L. Humphreys (panelist)
- Health Sciences Libraries seem to feel like they’ve got it figured out; other libraries say, “yeah, but that’s just Health Sciences – it doesn’t apply to us”
- However, there is stuff to learn
- NLM – has a major impact, esp in the context of digital data management and standards development (rel. to PubMed)
- Science Libraries split onto two campuses, but with good working relationship
- Are we creating an obstacle for users by segregating services geographically?
- Libraries aren’t just separate – their webspaces are
- How can we build more inter-institutional collaboration & integration?
- Interdisciplinary Challenge (biomed research, but also sci more generally)
- As partnerships form, barriers surface; possible that they might kill the partnerships
- Sharing is important and difficult – technologically and non-technologically
- What can we learn from Academic Health Centers/Health Libraries
- 3 core domains: patient care/consumer health, professional education, interdisciplinary research – lead to
- 3 outcomes – evidence-based care, …missed
- AAHSL Demographics
- 60% Health Sciences, 24% Univ Library, 12% University, 4% Other
- Only 29% in standalone facilities – 71% are integrated into health care facilities [this would seem to be an argument in favor of UW’s distributed library system…]
- Not about collection size – more about skills & services
- Average FTE of about 35 within HSLs
- Advantages of HSLs:
- All about science
- NLM infrastructure to draw on
- Information not artifact (mostly)
- Liaison/Teacher, not bibliographer
- Access not preservation (mostly) (There is a sense that NLM will handle this)
- “Ithaka Alarm” doesn’t sound as loud in HSLs
- CTSA – Clinical and Translational Science Awards
- NIH goal to develop an infrastructure to support effective & efficient translation of scientific discoveries into medical practice – funding initiative
- Librarians are less involved in this than they have been in past similar initiatives (e.g. IAIMS)
- Libraries in CTSA: http://ctsa-lib.blogspot.com
- Roles for Health Sci Librarians
- Expert Searching – systematic reviews, support of evidence based medicine
- Teaching – integrated into clinical teaching rounds & morning report, working with AMIA on information competencies for docs
- Informationist – facilitates integration into clinician or researcher or public health workflow
- Bioinformatics – Extent of engagement varies
- Is at the NLM, but has been thinking about what she would be doing if she were an academic health sciences librarian
- Worth considering:
- Whatever has worked somewhere else
- Blurring distinctions between data, scientific publications, and synthesized knowledge
- Research methods that span scientific disciplines (library faculty as potential solution to the problem of scientists’ lack of training in effective data management)
- Likely expansion of data-sharing mandates (NSF probably next)
- Seems like there will be more, and they’ll get more specific
- NIH is doing a 5-year review of the data-sharing policy
- Open Access Data
- Human Genome project was a pioneer – openness would validate the high monetary investment by governments
- Seems like even in a tough economic time for research, we’ll see not shrinking data sharing req’s, but an expansion.
- New Player: Board on Research Data and Information, National Research Council
- Things like an incentive system for contributions to broad-based e-science
Education for New Roles (2:15-3:15 pm)
Betsy Wilson (convener/moderator)
Ron Larsen (panelist)
Catherine Blake (panelist)
Carole Palmer (panelist)
- Education for Cyberscholarship
- Abstract View
- Qualitatively different opportunities for new forms of research & scholarship – come from big science, long tail of small science
- Need a content infrastructure to support novel forms of research – content itself is becoming infrastructure
- Blend of interdisciplinary research and development, engaging scientists, social scientists, humanists
- Emerging forms of research
- Humans read one doc at a time – computers can read a lot more, and can uncover patterns
- Accelerating the exchange of ideas
- Emerging infrastructure
- Collections of digital content
- Web services
- Uneven progress
- Primary research data often discarded after publication
- When saved, rarely publicly accessible
- When published, frequently incompatible with e-science (doesn't work with the tools others use)
- Approaching a tipping point
- Digital content the norm in most disciplines
- Infrastructure lagging
- Peter Murray-Rust
- The linkages between the current literature contain a bunch of undiscovered science…
- …but there’s apathy
- The Goal
- Ensure that all publicly funded research products….ACCESS, STANDARDS
- Capture content
- Make it broadly accessible
- Enable innovative value-added services
- Curate & Preserve
- Resistance to Change
- Scale and complexity
- Open access to science and public scholarship
- Trying to address an entire system that recognizes that we’re suffering from a dearth of new ideas – invest some money in trying some crazy radical things
- Idea of roadmap = tie together role of NIH etc, institutions, etc
- Education for Cyberscholarship
- What does the science librarian/informatics professional need to know/be able to do?
- Seek the high ground
- Assure linkage to institutional mission
- Create new value-added services
- Serve on disciplinary research teams
- Measure, assess, revise…
- DCC lifecycle model
- Good frame for the way science is changing
- Creation – looks more like folks hunched over computer screens than folks hunched over microscopes (that too, though)
- Collection & Annotation
- Identification & Cataloging – crowdsourcing for things like photos; collectively identifying resources
- Storage & Preservation
- Barriers to access removed
- More sources of info
- NIH-mandated access
- No single point of access
- Diff levels of access required – HIPAA, maintaining cultural norms
- Use and Reuse
- Data and text mining
- Data-oriented roles
- Data Consultant – best practices on data organization & sharing
- Data Distributor
- Data Manager
- Data Services Provider – preprocessing so that data mining tools can be used
- Data and Text Analyst – applying visualization, etc.
- Embedded Roles (Data Scientist)
- Information organization, conceptual modeling, etc. – OAIS modeling/representations
- Conceptual vs. relational roles
- Good database design – enforcement of data quality; ongoing maintenance
- Overview of text mining (in fast-forward…no notes)
Carole L. Palmer
- Concrete things!
- Programs at GSLIS at UIUC
- Preparing e-science info specialists
- Three approaches:
- Biological information specialist
- Data curation concentration (within MLIS program)
- Summer institutes for practicing librarians and information service providers
- Landmark meetings – 1948, 1952, 1958 – interrelations of information systems, complexity of formats, prepublication document components, speed of circulation, interdisciplinarity
- Not solved, but hey, we’ve been talking about them for a long time!
- 1980s – predicted revolution of scholarly info processing
- 1991 – Online Journal of Current Clinical Trials – not functional till 2000, and then basically just a database
- Application and further dev of grounding theories:
- Value added (Taylor)
- Applications & services (Shera)
- Coordinate across sciences
- Metascience responsibilities
- Foundations in user communities
- Greatest role for research library in SMALL science
- Big Data projects tend to have it covered; don’t need us
- Small science projects do
- BIS curriculum – core courses in GSLIS, biology, CS
- Data curation concentration – both distance & on-campus – 4 core courses, and some electives
- Assignments: 20 case studies of curation problems; critiques of data mgt plans
- BIS Student profiles
- Most have bio degrees, some Masters/PhDs, a few w/ CS minor
- First grad in 2007
- 8 students in progress
- Many LIS students taking the BIS classes
- Losing some to LIS due to CS req’s and financial aid
- Data curation student profiles
- Most are distance students
- Many working full time elsewhere
- 3 grads so far
- 21 currently enrolled
- More internship opportunities than students
- Demand from practicing academic libraries
- Summer Institutes – 1st held in June
- 30 participants
- 26 from ARL institutions
- 10 presenters
- 6-person panel
- Topic areas included digital data, data integrity & authenticity, etc.
- Lots of partnerships with other institutions, gov’t and academic
Q (Medha Devare): Engagement – seems like for librarians to engage effectively with researchers, they need confidence, maybe built through some subject knowledge. What happens to library school grads when they go out to libraries, expected to liaise with various science departments, but feel like they can’t communicate? Is there a thought about having a general science track, where students can take a little chem., bio, environmental science courses?
- CP: In the biological areas, they have that – need grad-level bio in order to get the degree. Would certainly be possible for a student to take a similar model, but get the degree in LIS. Has been wondering about doing similar things for a general science specialist, but so far hasn’t made sense to do that. It’s hard to recruit students for this kind of program – disgruntled bio students with an interest in information…
- CB: Presented idea to colleagues, they said, oh, so you want students to have two degrees – but that’s not the point. Idea: bring in scientists to come in for an hour to discuss what it means to discover something in their field; what the process is from idea to publication; what tools are involved in that workflow…would come from lots of disciplines in the sciences.
- RL: Definitely a PR point for the iSchools – want to have these programs to demonstrate commitment to the area.
Q: Tech is necessary but not sufficient to deal with a lot of issues in science librarianship – only a minority of library schools offer a science librarianship course even once per year… They're just not doing it well. There is now a debate about whether an MLS is actually necessary, or whether a science degree might be better for working in a Sci/Eng library.
- CP: Echoes the question – will BIS students be able to apply for new jobs?
Q: 2 aspects to librarianship – one is person-to-person. But another part is computers being able to process things much faster. In library schools, are we teaching people how to build the rules of association across metadata so that we can use computers to do what they do best?
Summary Reactor Panel (3:30-4:15 pm)
Wendy Lougee (convener/moderator)
Carol Mandel (reactor)
Becky Lyon (reactor)
Neil Rambo (reactor)
- One thing the reactors could do would be to just emphasize points already covered, but they probably won’t. Going to give us some things to think about going home.
- Extraordinary day & a half
- Feels encouraged, excited about next steps
- No hand-wringing, just a lot of ways forward
- Way forward: 2 spheres
- Both on 3 dimensions
- …and also Inter-organizational
- If we just think about the reinvention we've been doing profession-wide in the past 5 years – digital librarianship is new, for example; there didn't use to be many digital librarians. Now DLF is getting bigger all the time
- This is like the seedling phase of that, in a new space
- Can draw on lessons from those past experiences
- Digital Libraries: has the D-Lib journal, forums, etc.
- Need to push more broadly, stretch the concepts
- Thinking about profession-building in a broad sense as well as a specific sense
- The profession won’t spring like Athena from the head of Zeus, overnight, but it is building
- Success stories are so important – can build the profession institutionally, as Purdue has
- Our organizations need to support us in development opportunities; CNI/ARL have an ongoing role here
- Leads into relationship building – need to build relationships with scientists and learn about what they’re doing; need to do structured interviews with people, probably every year, because things change from year to year – “heedful interaction”
- Organizationally, there are things our orgs can do – importance of domain interaction, reaching out to domain research organizations
- Process had already started, reached a new phase today
- Not a part of the academic community
- “I’m from the government, and we’re here to help”
- New directions
- Have been in the mode of building databases, repository structure
- Moving into developing discovery tools
- Advanced linking so that scientists, librarians & others can make new discoveries
- Datasets – putting data up for people to use; e.g. GenBank, dbGaP, ClinicalTrials.gov
- Natural language processing
- UMLS – brings together diff medical vocabularies
- Privileged to be in an organization that’s already brought together a multidisciplinary team to deal with these issues (NCBI)
- Has a high degree of confidence in the future, that the discovery tools NLM (et al?) is developing will help advance e-Science
- In BMHI, have Bench-to-Bedside; here it’s….[lost the analogy – lab to library?]
- Interactive publications – where underlying data is there in a form that researchers can manipulate in order to make new discoveries
- Working with the Optical Society of America on doing something like this
- New development that will come eventually
- Institute at the Woods Hole Oceanographic Institution
- Brings together librarians & researchers & MDs – maybe need to do more of that, in science
- Resources, for sure
- What might ARL do?
- Consider integrating e-science as a strategic priority that cuts across the three existing strategic initiatives
- Policy related to it will affect all of us, and it’s something ARL should take the lead on
- IMLS, CNI, NLM are working on alliances
- No one size fits all solution (“when you’ve seen one research library, you’ve seen one research library”)
- Things that struck Neil:
- “We are the people that we’ve been waiting for” – taken from…(Obama?)
- 2 things mentioned repeatedly – economic support & sustainability
- But we do those things every day; we figure out if things will be shared services where we tax departments, fee-for-service for outsiders, endowments/grants, incorporating advertisements, donation model…
- A lot of things we need to do and will try to do won’t scale, but we need to do them anyway – just try some things on lower/smaller levels
- Nimble, agile, focused – need to be; but to support team science et al, we may need to look into the atomization of the research library – create teams of librarians or other info professionals who can engage with researchers in ways that might shift from one month to the next, according to research priorities
- Recruiting – hard to incentivize within our own institutions, especially for hourly staff, for whom it’d be a dead end
- Moving toward a multi-professional workplace, better get used to it
- Hybrid professional – what do they look like, what do they need to know – very few come from information schools/library schools – more are coming out of CS or other applied programs – we need to be on the lookout for them, find things to turn them loose on – create opportunities, pilot programs
- Need to be creative in coming up with funding models
- Echoes point about alliances with other organizations – how do we decide that we're going to do X because we're good at it (or whatnot), but not Y because we can get that from elsewhere – risky to rely on others
Becky Lyon: Library as space – not ready to declare library dead… Feels there’s still a place for the physical library in the university, and in e-science as well.
Q (Betsy Wilson): How do we take what we’ve learned or heard back to our institutions? How do we follow up on this?
WL: e-Science working group has talked about using ARL’s collaborative platform for discussion on this topic, but we’d need to structure an underlying goal for that discussion space
Liz Lyon: Could set up a Nature Network – then can connect with scientists as well as others
Barbara Dewey: Seems like e-Science should be a strategic direction not just for the library, but for the University (?) – hopes we reflect on a broader take on this.
Carol Hutchence – Put notes in some kind of order, and work the listservs in areas like physics, math, etc. – mostly global, these days – in turn, that discussion might guide what ARL might build as a platform
Q (?? (F)) – from Rutgers – Has been inspired to discuss these issues with higher-level folks, but wants guidance on how to mobilize her library’s team to think about/talk about these issues? How to scope initiatives for a particular institution?
Q (?? (M)) – Collaborations with industry would certainly be worth looking into
Q (?? (F)) – Some of these practices are discipline-wide; is there a way we might spread the successful practices more broadly; how can we continue to share that info broadly, take concrete steps in dividing up the work… We’ve been trying to think of how to bring together an e-Science team, probably starting within the library, and then bringing in folks from the Office of Research, others?
Catherine Blake – Presentations will be available
Summation and Closing Observations (4:15-4:30)
- Thanks for joining us
- Helpful program, clear that it’s generated a lot of thinking/energy around the topic
- Going forward:
- Subscribe to the CNI-Announce list!
- Announcements of important reports et al in this area go up there
- E.g. UKOLN report
- Will be taking up a number of the issues touched on here
- A few comments: to highlight specific points CL thinks are important, to make explicit things that were implicit, and to point at some directions where it might be important to look in the future
- The nature of research practice in science is changing in fundamental ways – and it’s not going to stop.
- Infusion of data, computing, distributed sensor networks – and a big generational shift too
- We need to get out more! Bringing in scientists to talk to us about their research practices, and engaging in other such activities, is "absolutely strategic" at this point
- Distributed sensor networks, virtual organizations, large-scale scanning systems – different scientific process which will have continuing ramifications for us
- Science doesn’t just generate data – it’s contextualized by software, workflow, cultural apparatuses, etc – need to think more holistically about scientific knowledge in the digital world, and not just about the ‘bits themselves’
- We talk about data curation, but we haven’t been talking about software curation – and we need to.
- We came to data curation from a frame of data preservation – what’s going on now is much less about preservation than it is about reuse – using data in novel ways to find new things and promote scholarship. And reuse will speak to scholars much more than preservation (sterile warehouse image, auditor image). Need to speak to how reuse will facilitate advancement and discovery. Will help us build connections to ongoing scientific work.
- Scale: “Doing this at scale is one of our great challenges” – important to remember that when you look at a projection that looks ridiculous, it usually means something is going to change – for example, projections that everyone would be a programmer – instead our whole way of thinking about computers/programming changed.
- Thinks we’ll see these issues pushed back as far as K-12 – it’s a data rich world
- 3 other notes about scaling up:
- Diff between marquee science and small science – right to focus on small science, that’s where the challenge is.
- CIOs are starting to pay a lot of attention to CI planning campuswide – EDUCAUSE is heavily involved in that. There’s an opp to fit this into a broader discussion about how universities/institutions need to change
- We’re going to need some inter-institutional collaboration on an unprecedented scale – will need to pool and share our expertise. Currently lack mechanisms to do that. But talking about how we might start to do that within ARL would be an appropriate step.
- Abstraction is an important issue
- To what extent can we do this on a cross-disciplinary basis, and to what extent do we need to get deep into discipline-specific information? – fundamental question, but poorly understood
- Variety of roles that librarians (et al) may take on: providers of reference, preservers, standards developers, etc. – need to be open to that variety. Probably asking a lot to ask that science librarians do everything across all the disciplines at every level – there will be specialization.
- IT is a young profession compared to librarianship; thus more dynamic/flexible over the last couple decades – perhaps analogous fluctuations will happen as libraries redefine their relationship to science
- VALUES – libraries and librarians have served as advocates for openness, preservation, and all sorts of things on behalf of their institutions, users, and broader society
- We need to start talking about how openness can benefit society and the sharers
- We heard about the difficulties that are still built into the scholarly literature – constructed to bury data, be computationally unfriendly
- At the level of the individual scholar, open data may not make much difference, but at a collective level, it makes a huge difference
- Clouds – cloud computing/storage has a role, but we bring some values into play that aren’t very popular in cloud computing
- Cloud computing is opaque – they won’t tell you what they do with your data – need to reflect on that lack of transparency, think about analogy to credit default market…
- Need to be able to do things like intelligently assess risk, speak up for openness
- Sustainability, Openness, Open Data: how do we balance? Is there a nasty tradeoff we’ll have to make? Engineering openness into the systems we create, maybe. Need to be careful what we ask for in this area (and esp. from associations); may end up with something more or other than we bargained for.
- Need to remain mindful not just of skills, expertise, resources, but also some thoughtful consideration of values we want to reflect in the system we ultimately build to address these needs.