Thursday, October 25, 2012

Week 9 - Reading Notes

Hedstrom - Research Challenges in Digital Archiving and Long-term Preservation

This short essay summarizes the main challenges to digital preservation: (a) the collections to be preserved are heterogeneous and ever-growing; (b) preservation must hold up over the long term; (c) both the infrastructure and the technologies must be affordable.

Because it was so short, I wonder whether this is an older paper given at a workshop before the advent of OAIS.

Lavoie - The Open Archival Information System Reference Model: Introductory Guide

This has to be one of the best pieces I have read this semester. First, it clearly sets forth how OAIS came about. Second, it explains to a would-be architect the different steps and components one may need to build an OAIS-compliant archive. I give three examples below:

"The first responsibility of an OAIS-type archive is to establish criteria for determining which materials are appropriate for inclusion in the archival store." (page 4)

"The second responsibility emphasizes the need for the OAIS to obtain sufficient intellectual property rights [ . . . ]" (page 4)

"Another responsibility of an OAIS-type archive is to determine the scope of its primary user community."

I especially appreciated the visual diagrams which showed the actors and different stages on page eight.

Preservation Management of Digital Materials: The Handbook

I felt that this reading repeated much of the material covered in the lecture and in previous readings. Because I have studied digital preservation in an archives context before, I already knew much of it.

Littman - Actualized Preservation Threats

I find it very helpful to know in advance about some of the failures that have taken place. However, I wonder whether the paper's utility is now limited: it was published in 2007, and most of the technology has since moved on.

Muddiest Point:  I know MODS only in passing, but I would appreciate a more in-depth explanation and demonstration. Thank you!

Monday, October 22, 2012

Week 8 - Reading Notes

OAI for Beginners - I greatly enjoyed reading about the history of how OAI developed. I especially appreciated the clear distinction drawn between Data Providers and Service Providers, and the diagrams illustrating the functions of, and interrelationship between, these two types of providers. However, because OAI-PMH depends heavily on HTTP, and because I have not yet learned HTTP, there were some parts of this tutorial which I did not understand.

Muddiest Point:  Because we are already learning HTML, XML, DTDs, and XML Schema in library school, why don't we also learn some basic HTTP? If important metadata protocols like OAI-PMH rely on HTTP, then library school should cover it too.
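To answer part of my own question, an OAI-PMH request turns out to be nothing more than an HTTP GET with a couple of query parameters. Here is a small Python sketch; the repository base URL is made up, and a real harvester would also handle resumption tokens.

```python
# Minimal sketch of an OAI-PMH harvest over HTTP. The base URL below is
# hypothetical; a real repository publishes its own OAI-PMH endpoint.
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.edu/oai"  # hypothetical endpoint
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records(metadata_prefix="oai_dc"):
    """Issue one ListRecords request and return the parsed XML response."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    with urlopen(f"{BASE_URL}?{query}") as response:   # a plain HTTP GET
        return ET.fromstring(response.read())

# Print the identifier of each harvested record.
root = list_records()
for header in root.findall(".//oai:header", OAI_NS):
    print(header.findtext("oai:identifier", namespaces=OAI_NS))
```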

The Truth About Federated Searching - This article held some very valuable insights for me. First, Hane reminds the reader that federated search engines must demonstrate to the library that they can search the library's databases using the library's own authentication, both locally and remotely. Second, I was surprised to learn that federated searching cannot improve on a native database's search capabilities. A federated search engine can only use the capabilities of the native database.

Muddiest Point:  PittCat subscribes to Summon. Has Summon demonstrated its value by effectively employing authentication and the capabilities of Pitt's subscribed databases?

Z39.50 Information Retrieval Standard - For the most part, I enjoyed how this article explained the standard's history. I appreciated learning the origins of Z39.50, including the fact that NISO began as the Z39 committee of ANSI.

However, the article also assumed a lot of background knowledge of TCP/IP, the protocols on which Z39.50 runs. Again, as with the OAI for Beginners article, we at Pitt's iSchool probably do not have an adequate background in TCP/IP, and we should. I will most likely go learn it on my own via Lynda, but I believe Pitt should teach TCP/IP to us if an important standard is built on these protocols.
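In the meantime, here is a rough sketch of what "runs over TCP/IP" means in practice. HTTP is just lines of text sent down a TCP connection; Z39.50 works the same way at the transport level (a TCP connection, conventionally to port 210), although its messages are binary-encoded rather than plain text.

```python
# Open a raw TCP connection and speak HTTP by hand, to show that the web's
# protocols are just structured bytes on top of TCP/IP. example.com is a
# reserved test domain, so this is safe to run.
import socket

with socket.create_connection(("example.com", 80), timeout=10) as sock:
    # An HTTP/1.1 request is just CRLF-terminated header lines.
    request = (
        "GET / HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    sock.sendall(request.encode("ascii"))

    # Read the raw response bytes back off the same TCP connection.
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

print(response.decode("utf-8", errors="replace")[:500])  # status line + headers
```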

I was disappointed that this article did not use helpful diagrams, as the OAI article had. Therefore, many of the complex relationships between server and client were a little hard to visualize.

Lossau - Search Engine Technology and Digital Libraries - This article correctly points out that a library's vision should not be focused solely on its own collection, but should be broader: a library should build search services that target virtual collections of material, even within the deep web. This drives home the importance of interoperable, widely accepted standards across all types of digital objects and their repositories.

Friday, October 5, 2012

Week 7 - Reading Notes

Lesk - Chapter Four

Frankly, I found this chapter to be outdated: much of the technology it describes is now obsolete.

For example, in its section about pictures, only GIF and JPEG are described. However, from my training as an archivist and amateur photographer, it is common knowledge that TIFF is preferred over both of these formats for preservation masters: JPEG's lossy compression throws away a little more image data every time the file is edited and re-saved, and GIF is limited to a 256-color palette, whereas an uncompressed or losslessly compressed TIFF keeps every pixel intact. Nor can TIFF be excused as too new for a book published in 2005; the format dates back to 1986, so its absence is hard to explain.
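To convince myself of the difference, here is a small experiment using the Pillow imaging library (the filename sample.tif is just a stand-in for any photograph on disk): re-encoding a JPEG over and over drifts away from the original, while a TIFF copy round-trips exactly.

```python
# A small experiment (assumes the Pillow library; "sample.tif" is a stand-in
# for any photograph). Re-saving a JPEG repeatedly causes generation loss
# because JPEG compression is lossy; a TIFF saved without lossy compression
# keeps every pixel intact.
from PIL import Image

original = Image.open("sample.tif").convert("RGB")

# Generation loss: re-save the image as JPEG ten times in a row.
current = original
for generation in range(10):
    current.save("generation.jpg", format="JPEG", quality=75)
    with Image.open("generation.jpg") as reopened:
        current = reopened.convert("RGB")   # fully load, detach from the file

# Lossless round trip: save the original as TIFF (uncompressed by default).
original.save("copy.tif", format="TIFF")
tiff_copy = Image.open("copy.tif").convert("RGB")

jpeg_drift = sum(abs(a - b) for a, b in zip(original.tobytes(), current.tobytes()))
tiff_drift = sum(abs(a - b) for a, b in zip(original.tobytes(), tiff_copy.tobytes()))
print(f"JPEG drift after 10 re-saves: {jpeg_drift}; TIFF drift: {tiff_drift} (expect 0)")
```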

With respect to Automatic Speech Recognition, the greatest contemporary example in everyday life seems to be Apple's Siri. Even though it is proprietary, I would love to know how it works. Because Siri is so new (it launched only in 2011), Lesk understandably does not mention it in this 2005 book.

Hawking - Web Search Engines, Parts 1 & 2

These two articles were fantastic; they are the most clearly written explanations I have read of what exactly a web search engine does.

That being said, there were some muddy points where I did not know what Hawking was talking about:

Muddiest Point #1:  From Part 1, page 87: "Excluded Content. Before fetching a page from a site, a crawler must fetch that site's robots.txt file to determine whether the webmaster has specified that some or all of the site should not be crawled." What are the reasons that a webmaster would specify certain sections not be crawled? What would those sections be generally?
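From a little poking around, the usual candidates seem to be back-office pages, scripts, internal search results (which generate an endless number of URLs), and unpublished or duplicate content. The robots.txt below is entirely made up, but it shows the check a polite crawler performs before fetching anything.

```python
# A hypothetical robots.txt and the check a crawler performs before fetching.
# The excluded sections are typical examples: admin screens, scripts, internal
# search results, and unpublished material.
from urllib.robotparser import RobotFileParser

HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/      # back-office pages
Disallow: /cgi-bin/    # scripts, not content
Disallow: /search      # crawl trap: one URL per query
Disallow: /drafts/     # unpublished material
"""

parser = RobotFileParser()
parser.parse(HYPOTHETICAL_ROBOTS_TXT.splitlines())

for path in ("/articles/2012/oai-notes.html", "/admin/login", "/search?q=xml"):
    allowed = parser.can_fetch("MyCrawler", path)
    print(f"{path}: {'fetch' if allowed else 'skip'}")
```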

Muddiest Point #2: From Part 2, page 88: "An inverted file is a concatenation of the postings list for each distinct term." Would we be able to see a visual example of this in class? I did look up the definition of the term "concatenation," but I am having a hard time visualizing it. Also, Hawking did not define what a "postings list" is, so I need clarification on that, too.
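While waiting for class, I tried to sketch it for myself in Python. As I understand it, a postings list is simply the list of document IDs in which a term occurs, and the inverted file strings all of those lists together, keyed by term. The three "documents" below are made up.

```python
# A toy inverted file built from three made-up documents. Each distinct term
# gets a postings list: the IDs of the documents that contain it (real engines
# also store positions and weights). The inverted file is all of these postings
# lists laid out one after another, keyed by term.
from collections import defaultdict

documents = {
    1: "digital libraries preserve digital objects",
    2: "search engines crawl the web",
    3: "web search engines index digital libraries",
}

inverted_file = defaultdict(list)          # term -> postings list
for doc_id, text in sorted(documents.items()):
    for term in set(text.split()):         # each doc contributes its ID once per term
        inverted_file[term].append(doc_id)

for term in sorted(inverted_file):
    print(f"{term:10} -> {inverted_file[term]}")

# e.g. "digital -> [1, 3]" is the postings list for the term "digital";
# an AND query for "digital search" intersects two postings lists:
print(set(inverted_file["digital"]) & set(inverted_file["search"]))   # {3}
```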

Henzinger et al. - Challenges in Web Search Engines

This was an extremely good piece of writing: all of the terms were well defined before the authors used them in their explanations. I especially enjoyed the descriptions of text spam, link spam, and cloaking. I had never known exactly how website creators try to improve their rankings in search results; now I know!
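As a note to myself, cloaking boils down to the server checking who is asking and lying to the crawler. A toy sketch (the page content is invented; the bot names are the familiar ones):

```python
# Bare-bones illustration of cloaking: the server inspects the User-Agent and
# serves keyword-stuffed text to crawlers while showing people a normal page.
KNOWN_CRAWLERS = ("googlebot", "bingbot", "slurp")

def serve_page(user_agent: str) -> str:
    """Return different HTML depending on whether the visitor looks like a crawler."""
    if any(bot in user_agent.lower() for bot in KNOWN_CRAWLERS):
        # Keyword-stuffed page shown only to search engines.
        return "<html><body>cheap flights cheap flights best cheap flights deal</body></html>"
    # Ordinary page shown to human visitors.
    return "<html><body>Welcome to our travel agency.</body></html>"

print(serve_page("Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(serve_page("Mozilla/5.0 (Windows NT 6.1) Firefox/16.0"))
```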

Monday, October 1, 2012

Week 6 - Reading Notes

Bryan - An Introduction to the Extensible Markup Language (XML)

Out of all four articles, I believe this one explained the function and advantages of XML over its predecessors most clearly. For example, Bryan writes that XML "is designed to make it easy to interchange structured documents over the Internet." Another succinct definition of XML, one which encapsulates its function, comes where Bryan states that "XML is a formal language that can be used to pass information about the component parts of a document to another computer system."

I believe that one advantage which XML brings is that it forces its coders to think in a very disciplined fashion. Bryan describes this where he states, "To allow the computer to check the structure of a document users must provide it with a document type definition that declares each of the permitted entities, elements and attributes and the relationships among them."
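To see that discipline in action, here is a tiny made-up catalog with its DTD, checked with the third-party lxml library (assuming it is installed): the DTD declares exactly which elements and attributes are permitted, and anything else is rejected.

```python
# A made-up example of DTD-enforced discipline: the DTD declares the permitted
# elements and attributes, and the validator rejects documents that stray from
# it. Requires the third-party lxml library.
from io import StringIO
from lxml import etree

dtd = etree.DTD(StringIO("""
<!ELEMENT catalog (book+)>
<!ELEMENT book (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ATTLIST book year CDATA #REQUIRED>
"""))

valid_doc = etree.fromstring("""
<catalog>
  <book year="2002"><title>Introduction to XML</title><author>Tidwell</author></book>
</catalog>
""")

invalid_doc = etree.fromstring("""
<catalog>
  <book><title>Missing year and author</title></book>
</catalog>
""")

print(dtd.validate(valid_doc))              # True
print(dtd.validate(invalid_doc))            # False: no year attribute, no <author>
print(dtd.error_log.filter_from_errors())   # the validator explains what is wrong
```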

Ogbuji - A survey of XML standards: Part I

Muddiest Point:  I felt very frustrated that Ogbuji did not seem to bother classifying these many standards by similar function. As a result, although I may know what a specific standard does, I do not know its relationship to XML or how to employ it. I feel this type of survey is only useful if one has already seen each of these standards in action. Perhaps a fifteen-minute demonstration video would be better.

Tidwell - Introduction to XML

Tidwell's guide really helps me create DTDs; I use it all the time when I have a DTD assignment. However, I consult the W3Schools reference first, because it is online and therefore more likely to be kept up to date. Tidwell's piece is ten years old and is in danger of becoming obsolete.