Saturday, December 1, 2012

Week 13 - Legal Rights and the Future of Digital Libraries

Lesk - Chapter 11, Intellectual Property Rights

One insightful bit of this piece was its articulation of Bridgeman Art Library, Ltd. v. Corel Corp., 36 F. Supp. 2d 191. Lesk reads this case as holding that a photograph of an artwork is not copyrightable if the photograph was taken merely to depict the artwork. This is especially important for digital libraries, which routinely digitize images of important works of art.

Another potential pitfall for English-language digital libraries is the United Kingdom's protection of "typographical arrangement." Because a public-domain book may be set in a protected arrangement, one may need to investigate the legal issues before digitizing certain books.

Muddiest Point:  What is the best resource or book for librarians to turn to for copyright help on large digitization projects?

Stiglitz - Intellectual Property Rights and Wrongs

Stiglitz reminds us that when designing a digital library, we must take into account how users from less advantaged regions may access our material. We should champion open-source software and open-access scholarship while maintaining high quality. In fact, this has already been done: both DSpace and Greenstone are open-source software available to any librarian with an Internet connection.

Lynch - Where Do We Go from Here?

Lynch does an excellent job of summarizing the history of digital libraries. I am glad he acknowledges the crucial role played by government-funded and led initiatives. Very often, a national government can play a pivotal role in innovation. Innovation is not the exclusive domain of decentralized lone rangers.

Second, Lynch raises many good fields of research for others to pursue in the future. However, because this article is a bit dated (2005), many of these fields -- personal information management, long-term relationships between humans and information collections and systems (e.g., human-computer interaction) -- have since become well developed.

Knowledge Lost in Information - Report

The Ubiquitous Knowledge Environment, or "information ether," is now becoming a reality. I believe the scope of digital libraries includes cloud-supported libraries of videos and music. These are now readily accessible at a moment's touch from anywhere with wireless coverage. Also, Google Cloud Print lets one register devices and then print from one's Gmail account from anywhere one can access it.

Even though these may not seem research-related, these above technologies represent "individualized, customized, human-centric computing."



Wednesday, November 21, 2012

Week 12 - Security and Economics

Arms - Economic and Legal Issues

This article captures the crux of digital information's economics from this quote:

With digital information, . . . once the first copy has been mounted online, the distribution costs are tiny. In economic terms, the cost is almost entirely fixed cost. The marginal cost is near zero. As a result, once sales of a digital product reach the level that covers the costs of creation, all additional sales are pure profit [emphasis mine].

This made me think of all the profit that Apple is making from selling music and videos through iTunes! When the content is good, and the interface is popular, people are willing to pay =) .
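Arms's fixed-versus-marginal-cost point can be sketched with some toy arithmetic; all of the figures below are purely hypothetical:

```python
# Toy break-even arithmetic for a digital product (all figures hypothetical).
fixed_cost = 50_000.00   # one-time cost to create and mount the first copy
marginal_cost = 0.01     # near-zero cost to deliver each additional copy
price = 0.99             # per-copy sale price

# Roughly how many copies must sell before sales cover the cost of creation.
break_even = fixed_cost / (price - marginal_cost)
print(round(break_even))   # about 51,020 copies

# Past break-even, almost every cent of each additional sale is profit.
profit_per_copy = price - marginal_cost
print(profit_per_copy)     # 0.98
```

The point of the sketch is that `marginal_cost` barely matters: nearly the entire price of every copy sold past break-even is profit, which is exactly why digital storefronts scale so well.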

Arms - Implementing Policies for Access Management

I really enjoyed this piece's explanation of security and implementation. It offered a very good piece of "design best practice" for user-interface authentication, embodied in this quote:

The least intrusive situation is when authentication is keyed to some hidden information, such as the IP address of the user's computer, or where the user logs in once and an authentication token is passed onto other computers[.]
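The two mechanisms in the quote can be sketched in a few lines. This is only an illustration, not how any particular library system implements it; the network range, usernames, and shared secret are all made up:

```python
# Sketch of the two "least intrusive" schemes from the quote:
# (1) authentication keyed to hidden information (the client's IP address),
# (2) a single login that mints a token other servers can verify.
import hashlib
import hmac
import ipaddress

LICENSED_NETWORK = ipaddress.ip_network("192.0.2.0/24")  # hypothetical campus range
SECRET_KEY = b"shared-secret"  # known to all cooperating servers (hypothetical)

def authorized_by_ip(client_ip: str) -> bool:
    """Hidden check: is the request coming from the licensed network?"""
    return ipaddress.ip_address(client_ip) in LICENSED_NETWORK

def issue_token(username: str) -> str:
    """After one login, mint a signed token to pass to other computers."""
    sig = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    return f"{username}:{sig}"

def verify_token(token: str) -> bool:
    """Any cooperating server can check the token without re-prompting."""
    username, sig = token.rsplit(":", 1)
    expected = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

print(authorized_by_ip("192.0.2.42"))       # True: inside the licensed range
print(verify_token(issue_token("reader")))  # True: token honored downstream
```

Either way, the user never sees a second login prompt, which is exactly what makes these schemes "least intrusive."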

I also found the illustrations showing particular roles and attributes for users and institutions to be very helpful.

Lesk - Economics

I greatly appreciated the reference to the work on library ROI by José-Marie Griffiths and Don King. However, because this work is a little old (2003), I would like to know:

Muddiest Point:  Has there been more recent work on a library's return on investment that gives librarians strategies for showing their worth to their home institution?

Second - I believe more recent work has been done on how the administrative costs of OA journals can be borne by the authors rather than the users. This may be a better model: the prestige of having one's article accepted by a prestigious OA journal outweighs a minimal submission cost of $2.99.

Kohl - Safeguarding Digital Library Contents and Users

Finally, I have a clear understanding of how encryption and decryption keys work in the digital world! The public/private key relationships were explained very clearly.
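The key relationship can even be demonstrated in miniature. The sketch below is textbook toy RSA with tiny primes, not anything a real system would use (real keys are thousands of bits and use padding), but it shows how the public and private exponents invert each other:

```python
# Toy RSA: a public/private key pair built from two tiny primes.
p, q = 61, 53
n = p * q                  # modulus, part of both keys (3233)
phi = (p - 1) * (q - 1)    # Euler's totient of n (3120)
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

message = 42
ciphertext = pow(message, e, n)    # anyone can encrypt with the PUBLIC key
recovered = pow(ciphertext, d, n)  # only the PRIVATE key holder can decrypt
print(recovered == message)        # True
```

The asymmetry is the whole trick: publishing `(e, n)` lets anyone send you an encrypted message, while `d` stays secret; run in the other direction, the same math yields digital signatures.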

Tuesday, November 20, 2012

Week 11 - Social Issues

Borgman - Social Aspects of Digital Libraries

The great value of this early piece is summed up in these two sentences:

#1. Digital libraries are a set of electronic resources and associated technical capabilities for creating, searching, and using information.

#2. Digital libraries are constructed -- collected and organized -- by a community of users, and their functional capabilities support the information needs and uses of that community.

I am glad that, as early as 1996, researchers called for an empirical approach to digital library design, with a focus on users!

Roush - Infinite Library

This piece correctly pointed out three directions which Google's digitization project can go:

Door One - a private firm begins to purchase rights to things already in the public domain, in order to privatize them

Door Two - Parallel public and private databases coexist peacefully. Google could keep one copy of each library's collection for itself and give away the other copy.

Door Three - Private companies offer commercial access to digital books while public entities, such as libraries, are allowed to provide free access for research and scholarship.

I love this quote: "Libraries and publishing have always existed in the physical world without damaging each other; in fact, they support each other. What we would like to see is this tradition not die with this digital transformation."

Arms - A Viewpoint Analysis of the Digital Library

Again, a great emphasis on the user's perspective in light of interoperability.

Muddiest Point:  I have no questions this week - everything was easy to understand.

Thursday, November 1, 2012

Week 10 - Interaction and Evaluation

Arms - Chapter 8

One of the best insights from this reading was from this quote: "The functions offered to the user depend upon . . . structural metadata." Because this reading comes from 1999, I wonder whether structural metadata is still crucial for user functionality in digital libraries.

Muddiest Point:  In 2012, which structural metadata is most crucial for user functionality?

This reading also gave me a good sense of what Java is, its distinction from JavaScript, and how Java functions.

Muddiest Point:  Dr. He, could you please tell us some best practices for selecting servers, middleware, and CMSes? For a medium-sized digital library, what type of server, database language, and middleware should a library purchase? Is there a website or journal a librarian should follow for this type of selection?

Kling & Elliot - Digital Library Design for Usability

I felt that this article was too vague, perhaps because it is a little outdated. I appreciate the authors outlining a usability-engineering life-cycle model in Section 6.3, but I want concrete steps: how to conduct a user study, how to develop a questionnaire, how to go about creating a prototype. Too vague, too little.

Saracevic - Evaluation of Digital Libraries, An Overview

This article raised more questions than it gave answers. That is fine, because these were questions which needed to be asked. Just one example would be:

To what extent are user studies also evaluation studies?
To what extent are studies of specific user behavior in digital libraries also evaluation studies?

Also, I believe his list of factors under Section 6, "Criteria," can act as a checklist for what digital library designers should keep in mind when they begin development.

Hearst - Search User Interfaces

This reading gave solid practical advice on designing search interfaces from a user-based perspective. Its assertions were backed up with stats and studies. Thanks for having us read this article!

Thursday, October 25, 2012

Week 9 - Reading Notes

Hedstrom - Research Challenges in Digital Archiving and Long-term Preservation

This short essay summarizes the main challenges to digital preservation:  (a) the collections are heterogeneous and ever-growing; (b) one must digitally preserve for the long-term; (c) both infrastructure and technologies must be affordable.

Because this was so short, I wonder whether this was an old paper given at a workshop before the advent of OAIS.

Lavoie - The Open Archival Information System Reference Model: Introductory Guide

This has to be one of the best pieces I have read this semester. It clearly sets forth how OAIS got started. Second, it explains to a would-be architect the different steps and components one may need to build an OAIS-compliant archive. I give three examples below:

"The first responsibility of an OAIS-type archive is to establish criteria for determining which materials are appropriate for inclusion in the archival store." (page 4)

"The second responsibility emphasizes the need for the OAIS to obtain sufficient intellectual property rights [ . . . ]" (page 4)

"Another responsibility of an OAIS-type archive is to determine the scope of its primary user community."

I especially appreciated the visual diagrams which showed the actors and different stages on page eight.

Preservation Management of Digital Materials: The Handbook

I felt that this reading seemed to repeat much of the material covered in the lecture and in previous readings. Because I actually studied Digital Preservation in an Archives context before, I already knew much of the material.

Littman - Actualized Preservation Threats

Muddiest Point - I know MODS in passing, but I would appreciate a more in-depth explanation and demonstration. Thank you!

I find it very helpful to know in advance some of the failures that took place. However, I wonder whether the utility of this paper is limited, because it was published in 2007, and most technology has now moved on.

Monday, October 22, 2012

Week 8 - Reading Notes

OAI for Beginners - I greatly enjoyed reading about the history of how OAI developed. I especially appreciated the clear distinction drawn between Data Providers and Service Providers, and the pictures illustrating the functions and interrelationship of these two types of Providers. However, because OAI depends heavily on HTTP, and because I have not yet learned HTTP, there were some parts of this tutorial which I did not understand.

Muddiest Point:  Because we are already learning HTML, XML, DTDs, and XML Schema in library school, why don't we also learn some basic HTTP? If such important metadata standards rely on HTTP, then we should learn this in library school.

The Truth About Federated Searching - This article held some very valuable insights for me. First, Hane reminds the reader that federated search engines must demonstrate to the library that they can search the library's databases using the library's own authentication, both locally and remotely. Second, I was surprised to learn that federated searching cannot improve on a native database's search capabilities. A federated search engine can only use the capabilities of the native database.

Muddiest Point:  PittCat subscribes to Summon. Has Summon demonstrated its value by effectively employing authentication and the capabilities of Pitt's subscribed databases?

Z39.50 Information Retrieval Standard - For the most part, I enjoyed how this article explained the history. I appreciated learning the origins of Z39.50, even that NISO was once the Z39 committee of ANSI.

However, the article also assumed a lot of background knowledge of TCP/IP, the protocols on which Z39.50 seems to be based. Again, as with the OAI for Beginners article, we at Pitt's iSchool probably do not have an adequate background in TCP/IP. We should, and I believe Pitt should teach it to us. I will most likely learn it on my own via Lynda, but if an important standard employs these protocols, Pitt should teach them.

I was disappointed that this article did not use helpful diagrams, as the OAI article had. Therefore, many of the complex relationships between server and client were a little hard to visualize.

Lossau - Search Engine Technology and Digital Libraries - This article correctly points out that a library's vision should not be focused on its own collection, but should be broader. A library should focus on building search services targeting virtual collections of material even within the deep web. However, this drives home the importance of interoperable, accepted standards across all types of digital objects and their repositories.




Friday, October 5, 2012

Week 7 - Reading Notes

Lesk - Chapter Four

Frankly, I found this chapter outdated; much of the technology it describes is now obsolete.

For example, in its section on pictures, only GIF and JPEG are described. However, from my training as an archivist and amateur photographer, it is common knowledge that TIFF is preferred over either of these formats as an archival master. JPEG is a lossy format, so image quality degrades each time a file is edited and re-saved, and GIF is limited to a 256-color palette; an uncompressed TIFF suffers from neither problem. And TIFF was certainly around when this book was published in 2005: the format dates back to 1986.

With respect to Automatic Speech Recognition, the greatest contemporary example in everyday life seems to be Apple's Siri. Even though it is proprietary, I would love to know how it works. Because Siri is so new, Lesk does not mention it in this 2005 book.

Hawking - Web Search Engines, Parts 1 & 2

These two articles were fantastic. These are the most clearly written articles I have ever read about what exactly a web search engine does.

That being said, there were some muddy points where I did not know what Hawking was talking about -

Muddiest Point #1:  From Part 1, page 87: "Excluded Content. Before fetching a page from a site, a crawler must fetch that site's robots.txt file to determine whether the webmaster has specified that some or all of the site should not be crawled." What are the reasons that a webmaster would specify certain sections not be crawled? What would those sections be generally?
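On that first muddy point: webmasters typically exclude areas that are private, duplicative, or expensive to serve (admin pages, search-result pages, CGI scripts). Python's standard library can even parse the exclusion rules; the file below is a made-up example, not any real site's robots.txt:

```python
# Parse a hypothetical robots.txt and ask what a crawler may fetch.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The crawler must consult these rules before fetching each page.
print(parser.can_fetch("MyCrawler", "http://example.org/admin/users"))    # False
print(parser.can_fetch("MyCrawler", "http://example.org/catalog/item1"))  # True
```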

Muddiest Point #2: From Part 2, page 88: "An inverted file is a concatenation of the postings list for each distinct term." Would we be able to see a visual example of such a list in class? I did look up the definition of "concatenation," but I am having a hard time visualizing this. Also, Hawking did not define what a "postings list" is, so I need clarification on that, too.
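As I understand it, a postings list is simply the list of documents a term occurs in, and the inverted file is all those lists laid end to end. A tiny sketch over three made-up documents:

```python
# A tiny inverted file: for each distinct term, a "postings list" of the
# IDs of the documents containing that term (real systems also store
# positions and frequencies).
from collections import defaultdict

docs = {
    1: "digital libraries store digital objects",
    2: "search engines index digital text",
    3: "libraries preserve text",
}

inverted = defaultdict(list)
for doc_id, text in docs.items():
    for term in dict.fromkeys(text.split()):  # count each term once per doc
        inverted[term].append(doc_id)

print(inverted["digital"])    # [1, 2]  <- the postings list for "digital"
print(inverted["libraries"])  # [1, 3]

# The inverted *file* is these postings lists concatenated on disk, with a
# dictionary mapping each term to the offset where its list begins.
```

Answering a query then amounts to intersecting postings lists: documents matching both "digital" and "libraries" are those in both lists, here just document 1.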

Henzinger et al. - Challenges in Web Search Engines

This was an extremely good piece of writing. All of the terms were well-defined by the authors before being used in an explanation. I especially enjoyed the descriptions of text spam, link spam, and cloaking. I had never known exactly how website creators try to improve their rankings in search results; now I know!

Monday, October 1, 2012

Week 6 - Reading Notes

Bryan - An Introduction to the Extensible Markup Language (XML)

Out of all four articles, I believe this one explained the function and advantages of XML over its predecessors most clearly. For example, Bryan writes that XML "is designed to make it easy to interchange structured documents over the Internet." Another succinct definition, which encapsulates XML's function, is where Bryan states that XML "is a formal language that can be used to pass information about the component parts of a document to another computer system."

I believe that one advantage which XML brings is that it forces its coders to think in a very disciplined fashion. Bryan describes this where he states, "To allow the computer to check the structure of a document users must provide it with a document type definition that declares each of the permitted entities, elements and attributes and the relationships among them."

Ogbuji - A survey of XML standards: Part I

Muddiest Point:  I felt very frustrated that Ogbuji did not bother to classify these many standards by like function. As a result, although I may know how a specific standard functions, I do not know its relationship to XML or how to employ it. I feel this type of explanatory text is only useful if one has already seen each of these standards in action. Perhaps a fifteen-minute demonstration video would be better.

Tidwell - Introduction to XML

Tidwell's guide really helps me create DTDs. I use it whenever I have a DTD assignment. However, I consult the W3Schools reference first, because it is online and therefore likely updated more often. Tidwell's piece is ten years old and in danger of becoming obsolete.

Friday, September 21, 2012

Week 5 - Reading Notes

Gartner - Metadata for digital libraries: state of the art and future directions

This piece is one of the most well-written assigned readings I have come across. Finally, in plain English, a librarian has articulated why metadata schemes employing XML may be able to interoperate so well:

   "XML has the crucial feature that a marked-up file can embed others encoded in different XML schemas directly within it (if, of course, it follows a schema that is designed with this function in its specification). This is made possible by a feature known as XML namespaces."

I was also very grateful that the author pointed out one technical problem which may arise involving namespace definitions: XML schemas within the METS framework may incorporate subsidiary schemas whose namespace definitions conflict with those in METS.
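A minimal sketch of what a namespace buys: a wrapper document can embed an element from a different schema, with each tag qualified by its namespace URI so the two vocabularies cannot collide. The wrapper element here is illustrative, not real METS, though the Dublin Core namespace URI is the genuine one:

```python
# Parse a document that embeds a Dublin Core element via an XML namespace.
import xml.etree.ElementTree as ET

xml_doc = """\
<wrapper xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Les Fleurs du Mal</dc:title>
</wrapper>
"""

root = ET.fromstring(xml_doc)

# The parser expands the dc: prefix into the full namespace URI, so the
# embedded element is unambiguous no matter what other schemas are mixed in.
title = root.find("{http://purl.org/dc/elements/1.1/}title")
print(title.tag)   # {http://purl.org/dc/elements/1.1/}title
print(title.text)  # Les Fleurs du Mal
```

The conflict Gartner warns about arises when two subsidiary schemas bind the same prefix or redefine the same names; the URIs, not the prefixes, are what must stay consistent.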

Librarians who are not IT-literate so often become dependent on things interoperating immediately and automatically that we lack the sophistication to troubleshoot when they do not.

This piece also finally identified the mystery organization behind METS and MODS, namely, the Library of Congress's Network Development and MARC Standards Office.

Muddiest Point:  Who are the other organizations involved in promulgating standards? Is there enmity or competition between them, or is there a sincere commitment to openness?

Gilliland - Setting the Stage

This article identified important functions of metadata, such as certifying the authenticity and degree of completeness of the content, and providing some information that might have been provided in a traditional, in-person reference or research setting.

Muddiest Point:  Because metadata performs such important functions, should there be a code of ethics for people entering the metadata? If the object is something extremely important, like a volume from the Vatican, might there be the opportunity for an unethical monk to misrepresent the object and squirrel it away into another, hidden category?

Weibel - Border Crossings

I absolutely loved Weibel's analogy between changing train gauges on the tracks from Beijing to Siberia and how metadata schemes encounter interoperability challenges. As someone who has taken that train, I can fully relate!

Thursday, September 20, 2012

Week 4 - Reading Notes

Lesk - Chapters 2.1, 2.2, 2.7, 3

From Chapter 3.7, I especially liked learning about the Gallica Collection available at the Bibliothèque nationale de France, because I speak and read French. After discovering this, I went online to Gallica itself, and was amazed to see the editor's proof of Les Fleurs du Mal, a very important book of 19th-century French poetry . . . with the editor's and author's own handwritten corrections in the margins of the scanned version! Amazing. From a scholarship standpoint, a researcher no longer has to travel to Paris to read this vital copy; one can read it from the comfort of one's own laptop.  For an example, see http://gallica.bnf.fr/ark:/12148/btv1b86108314/f42.image

Arms - Chapter 9

On Panel 9.3, I was surprised to learn that EAD is in fact a sophisticated type of DTD. I have been using EAD for many years and have used it to mark up more than 31 linear feet of archival material. One day I hope to see the actual source of the EAD DTD, because it is open, and I would like to help maintain it.

Lynch - Identifiers and Their Role in Networked Information Applications

Muddiest Point:  What are the politics behind competing standards? It seems that one group, like the Internet Engineering Task Force, will first try to develop a persistent identifier, like URN, only to be usurped by another group like OCLC who is trying to push PURL. Then, much later, NISO will push SICI, while AAP and CNRI will now push for DOI.

Who is right and who is wrong? Who is stepping on the other's toes? Why are there so many competing handles?

Paskin - Digital Object Identifier System

This article greatly clarified for me the relationship between DOI, URLs, and the digital object itself. I believe Figure 1 brought it home for me by showing that where the URL for the digital object changed, the DOI itself did not have to alter. The independence of a DOI from an object's URL guarantees its persistence over time.
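The indirection in Figure 1 can be sketched as a lookup table. The resolver, DOI, and URLs below are all made up for illustration; a real resolver is the global Handle System, not a Python dict:

```python
# Sketch of DOI resolution: the persistent identifier maps to the object's
# *current* URL; when the object moves, only the resolver's table changes.
registry = {"10.1000/example.123": "http://oldhost.org/papers/123.pdf"}

def resolve(doi: str) -> str:
    """What a DOI/Handle resolver does: look up the current location."""
    return registry[doi]

print(resolve("10.1000/example.123"))  # old location

# The publisher moves the file; citations using the DOI never break,
# because the registry entry is updated while the DOI stays the same.
registry["10.1000/example.123"] = "http://newhost.org/archive/123.pdf"
print(resolve("10.1000/example.123"))  # new location, same identifier
```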



Wednesday, September 19, 2012

Week 3 - Reading Notes

Witten, Bainbridge & Boddie: Greenstone, Open-Source Digital Library Software

The New Zealand Digital Library project's Greenstone software has features which appeal to me. First, it has interfaces in many different languages, including Chinese. Second, it has user activity logs which record every query made to its collection. This could greatly assist in building a taxonomy for a site, to choose labels and to create an index or FAQ. Third, I was impressed by the fact that the U.N. freely distributed Greenstone via CD-ROM to developing countries.

Smith et al.: DSpace - An Open Source Dynamic Digital Repository

It seems that DSpace has some major advantages over Greenstone. First, DSpace employs more standards in its protocols, such as the Dublin Core metadata standard and OAI-PMH, and thus seems more interoperable. Second, DSpace is able to accommodate a complicated workflow system in an organization, with different players performing different functions (submitter, reviewer, metadata editor, etc.).

Muddiest Point:  Does DSpace employ user activity logs, in order to capture user queries to refine its taxonomy and labels?

Biswas and Paul - An Evaluative Study - Special Reference to DSpace and Greenstone Digital Library

I really enjoyed the fact that this article pointed out some of the major features of DSpace, like its Lucene search engine, its Handle system, and its employment of OAI-PMH. However, I was dismayed by the numerous misspellings in the article. I appreciated that the authors pointed out one particular weakness of DSpace at that time: it lacked support for the Metadata Encoding and Transmission Standard (METS).

Week 2 - Reading Notes

Suleman and Fox: A Framework for Building Open Digital Libraries

In this reading, I really enjoyed the authors' clear articulation of principles to guide an Open Digital Library Design:

  1.  All Digital Library services should be encapsulated within components that are extensions of Open Archives.
  2. All access to the Digital Library services should be through their extended OAI interfaces.
  3. The semantics of the OAI protocol should be extended or overloaded as allowed by the OAI protocol, but without contradicting the essential meaning.
  4. All Digital Library services should get access to other data sources using the extended OAI protocol.
  5. Digital Libraries should be constructed as networks of extended Open Archives.
However, my question is: Is OAI the acknowledged industry standard for interoperability between Digital Libraries? What is the difference between OAI-PMH and Dublin Core? Do those standards describe different things?

Arms, Blanchi, and Overly:  An Architecture for Information in Digital Libraries

This piece helped give me a clear understanding of the outline of information architecture in a digital library. I especially appreciated its definitions of data types, structural metadata, and meta-objects. I liked the rules which were made at the outset:
  1. All data is given an explicit data type.
  2. All metadata is encoded explicitly.
  3. Handles are given to individual items of intellectual property.
  4. Meta-objects are used to aggregate digital objects.
  5. Handles are used to identify items in meta-objects.
Because this article gave clear definitions to its outline, it provoked no questions from me.

Payette, Blanchi, Lagoze and Overly:  Interoperability for Digital Objects and Repositories: The Cornell/CNRI Experiments

This article had a great definition for interoperability:

   [I]nteroperability is defined as the ability of digital library components or services to be functionally and logically interchangeable by virtue of their having been implemented in accordance with a set of well-defined, publicly known interfaces. . . . When repositories and digital objects are created in this manner, the overall effect can be a federation of repositories that aggregate content with very different attributes, but that can be treated in the same manner due to their shared interface definitions.

My question is:  What are the best interoperability protocol standards for digital libraries today?  Where can I go to find such information, and information on how those standards are evolving? This article was written in 1999; where can I find new stuff?

I appreciated the Arms chapter that gave an overview of the Web and its history, but I have read it many times now, so I don't have that many questions.



Thursday, August 16, 2012

Week 1 - My Reading Questions

Questions from Week 1 Readings

During this week's reading, Borgman states that "computer science researchers sometimes counter that LIS researchers are bound by a narrow paradigm and pay insufficient attention to computer science's accomplishments."

I would really like Borgman to describe in detail which specific accomplishments computer science has contributed. I would like to ask whether those accomplishments relate directly to perennial problems in librarianship, like findability and interoperability.

Specifically, one great problem which I perceive in librarianship is that, for a given discipline of study, the hierarchy of secondary materials and the specialized nomenclature can be either sharply distinct from or confusingly similar to those of another discipline. Which CS accomplishments can help us resolve this problem?