Friday, October 5, 2012

Week 7 - Reading Notes

Lesk - Chapter Four

Frankly, I found this chapter outdated, because much of the technology it describes seems obsolete.

For example, in its section about pictures, only GIF and JPEG were described. However, from my training as an archivist and amateur photographer, it is common knowledge that TIFF files are preferred over either of these formats for preservation. This is because JPEG uses lossy compression, so each time a JPEG file is edited and re-saved it loses some image quality (simply opening a file does not change it), and GIF is limited to a palette of 256 colors. A TIFF file can be stored uncompressed or with lossless compression, so it avoids both problems. Could it be that TIFF files were not around when this book was published in 2005? No: according to Wikipedia, TIFF dates from the mid-1980s, and the current version of its specification, TIFF 6.0, was published in 1992.

With respect to Automatic Speech Recognition, the greatest contemporary example in everyday life seems to be Apple's Siri. Even though it is proprietary, I would love to know how it works. Because Siri is so new, Lesk does not mention it in this 2005 book.

Hawking - Web Search Engines, Parts 1 & 2

These two articles were fantastic. These are the most clearly written articles I have ever read about what exactly a web search engine does.

That being said, there were some muddy points where I did not know what Hawking was talking about:

Muddiest Point #1: From Part 1, page 87: "Excluded Content. Before fetching a page from a site, a crawler must fetch that site's robots.txt file to determine whether the webmaster has specified that some or all of the site should not be crawled." What are the reasons that a webmaster would specify that certain sections not be crawled? What would those sections generally be?
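To make the robots.txt check concrete for myself, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` user agent, the `example.com` URLs, and the `allowed_urls` helper are all invented for illustration; they do not come from Hawking's article. The excluded paths reflect my guess at typical cases (an admin area and a shopping cart), which I assume are kept out of search results because they are private or useless to searchers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that the given robots.txt lets user_agent fetch."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical pages a crawler has discovered on one site.
candidates = [
    "http://example.com/index.html",
    "http://example.com/admin/login",
    "http://example.com/cart/checkout",
    "http://example.com/articles/1",
]

crawlable = allowed_urls(ROBOTS_TXT, "ExampleBot", candidates)
print(crawlable)
```

A real crawler would download the file from the site (for example with the parser's `set_url` and `read` methods) rather than use a hard-coded string, but the allow/deny decision is the same.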

Muddiest Point #2: From Part 2, page 88: "An inverted file is a concatenation of the postings list for each distinct term." Would we be able to see a visual example of this list in class? I did look up the definition of the term "concatenation," but I am having a hard time visualizing this. Also, Hawking did not define what a "postings list" is, so I need clarification on that, too.
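My current best understanding, which may well be wrong, is that a postings list is the list of documents in which one term occurs, and that the inverted file is all of those lists written end to end, with a separate term dictionary recording where each list starts. The sketch below is a simplified illustration under that assumption: the three tiny documents and the function names are invented, and each posting holds only a document ID, whereas real postings usually carry more information such as term frequencies or positions.

```python
from collections import defaultdict

# Three made-up documents, keyed by document ID.
DOCS = {
    1: "web search engines crawl the web",
    2: "search engines build an index",
    3: "crawlers fetch pages",
}


def build_postings(docs):
    """Map each distinct term to the sorted list of document IDs containing it."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        # set() so a term repeated within one document is recorded only once.
        for term in set(docs[doc_id].split()):
            postings[term].append(doc_id)
    return dict(postings)


def concatenate(postings):
    """Join all postings lists into one flat list plus a term dictionary.

    The dictionary maps term -> (offset, length) inside the flat list.
    """
    inverted_file = []
    dictionary = {}
    for term in sorted(postings):
        dictionary[term] = (len(inverted_file), len(postings[term]))
        inverted_file.extend(postings[term])
    return inverted_file, dictionary


def lookup(term, inverted_file, dictionary):
    """Return the postings list for a term by slicing the inverted file."""
    if term not in dictionary:
        return []
    offset, length = dictionary[term]
    return inverted_file[offset:offset + length]


postings = build_postings(DOCS)
inverted_file, dictionary = concatenate(postings)

print(inverted_file)   # every postings list, one after another
print(dictionary)      # where each term's list begins and how long it is
print(lookup("search", inverted_file, dictionary))
```

Seen this way, "concatenation" just means the individual lists are stored back to back in one long sequence, and the dictionary is what lets the engine jump straight to the slice for a query term.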

Henzinger et al. - Challenges in Web Search Engines

This was an extremely good piece of writing. The authors defined every term clearly before using it in an explanation. I especially enjoyed the descriptions of text spam, link spam, and cloaking. I had never known exactly how website creators try to improve their rankings in search results; now I know!
