CUSO Winter School in Computer Science

Information Technology on the Web

January 28 - February 1, 2008
Champéry, Switzerland

The course is composed of three parts:

1. Web search (D. Hawking & A. Broder)

This lecture will focus on the following topics:

  • Evaluating quality of search results (test collections, side-by-side evaluation, C-TEST framework, tuning)
  • The Web from an enterprise perspective (i.e. visibility of organisational websites in global search engines)
  • Enterprise-scale webs
  • Topic-specific search portals and information quality
  • Retrieval based on external descriptions of documents
  • Distributed search and personal metasearch
  • Efficient generation of document summaries
  • Link graphs, web communities, spam rejection
  • Web-scale crawling
  • Indexing and retrieval, near-duplicate detection, ad generation
  • Algorithmic advertising

2. Accessing XML content: An information retrieval perspective (M. Lalmas) — [slides]

With XML as the evolving standard for structured documents, there is an increasing demand for appropriate XML access methods. The development of approaches to access XML content has generated a wealth of issues that are being addressed by the database (DB) and information retrieval (IR) communities. The DB community has traditionally focused on developing query languages and efficient evaluation algorithms for highly structured content. In contrast, the IR community has traditionally focused on searching unstructured content, and has developed various techniques for ranking query results and evaluating their effectiveness.

This lecture will concentrate on the work pursued by the IR community, whose main aim is to provide content-oriented access to XML documents. Such access is more precise because it retrieves XML document components (the so-called XML elements) instead of whole documents in response to users' queries. The lecture will introduce the major XML-related standards and their role in information retrieval. It will cover structured text models, which were investigated before XML, and their relation to XML, as well as indexing and searching algorithms for XML. It will then discuss current XML retrieval approaches, covering both extensions of older methods to XML and new models and methods developed specifically for content-oriented XML retrieval. The lecture will finish with issues related to the evaluation of content-oriented XML retrieval, as carried out within INEX (the Initiative for the Evaluation of XML Retrieval).
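As a rough illustration of element-level retrieval, the sketch below scores every element of a small XML document against a query and returns the matching elements, not just the document as a whole. The sample document, the function names (`tokenize`, `rank_elements`) and the scoring rule (the fraction of an element's words that are query terms) are assumptions made for this example only; they are not the models presented in the lecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOC = """<article>
  <title>Web search basics</title>
  <section>
    <title>Crawling</title>
    <p>A crawler fetches pages and follows links.</p>
  </section>
  <section>
    <title>Ranking</title>
    <p>Ranking orders XML elements by estimated relevance to the query.</p>
    <p>Link analysis can also inform ranking.</p>
  </section>
</article>"""


def tokenize(text):
    """Lowercase the text and split it into words, dropping basic punctuation."""
    words = (w.strip(".,;:").lower() for w in text.split())
    return [w for w in words if w]


def rank_elements(xml_source, query):
    """Return (score, path, tag) for every element containing a query term.

    The score is the fraction of the element's words (including the words
    of its descendants) that are query terms, so small focused elements
    can outrank the large elements that contain them.
    """
    root = ET.fromstring(xml_source)
    query_terms = set(tokenize(query))
    results = []

    def visit(elem, path):
        tokens = tokenize(" ".join(elem.itertext()))
        if tokens:
            matches = sum(1 for t in tokens if t in query_terms)
            if matches:
                results.append((matches / len(tokens), path, elem.tag))
        seen = {}  # per-tag counters used to build XPath-like positions
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            visit(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    visit(root, f"/{root.tag}[1]")
    results.sort(key=lambda r: (-r[0], r[1]))
    return results


if __name__ == "__main__":
    for score, path, tag in rank_elements(DOC, "ranking relevance"):
        print(f"{score:.2f}  {path}")
```

With this toy scoring rule, the second section and its children rank above the enclosing article, which is the behaviour that content-oriented XML retrieval aims for. Practical systems use far more careful ranking models and must also handle overlapping results.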

This lecture is based on tutorials given at the ACM SIGIR conferences in Seattle, 2006 (together with Ricardo Baeza-Yates) and Amsterdam, 2007 (together with Sihem Amer-Yahia, Ricardo Baeza-Yates, and Mariano Consens).

3. Service Oriented Architectures (M. Little) — [slides]

Conceptually, a distributed application consists of several distinct fragments split between the original calling process (client) and a remote (server) process responsible for executing the requested operations locally. Both the client and server are typically designed and implemented as if the application were to execute in a traditional centralised environment. Unfortunately, this encourages an architecture that ties data and its processing together, leading to tightly coupled applications. Such applications can be brittle when failures occur or when new services/objects need to be swapped in to replace old ones.

SOA is an architectural style to achieve loose coupling among interacting software agents. A service is a unit of work done by a service provider to achieve desired end results for a consumer. Both provider and consumer are roles played by software agents on behalf of their owners. SOA is deliberately unprescriptive about what happens behind service endpoints: we are ultimately only concerned with the transfer of structured data between parties, plus any meta-level information to safeguard such transfers (e.g., by encrypting or digitally signing messages).
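The following minimal sketch illustrates the idea of a consumer and a provider that share nothing but a structured message format. The service name (`quote`), the message fields, the shared key and all function names are hypothetical, and an HMAC over a shared key stands in for a real digital signature to keep the example short.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; a real deployment would rely on proper key
# management and, for true digital signatures, on public-key cryptography.
SHARED_KEY = b"demo-key"


def sign(content):
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of content."""
    payload = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hmac.new(SHARED_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_message(service, body):
    """Consumer side: wrap structured data in a signed message."""
    content = {"service": service, "body": body}
    return json.dumps({"content": content, "signature": sign(content)})


# Provider side. What happens behind the endpoint is invisible to the consumer.
def quote_service(body):
    return {"total": body["quantity"] * body["unit_price"]}


SERVICES = {"quote": quote_service}


def handle(raw_message):
    """Provider endpoint: verify the message, dispatch it, return a reply."""
    message = json.loads(raw_message)
    content = message["content"]
    if not hmac.compare_digest(message["signature"], sign(content)):
        return json.dumps({"error": "bad signature"})
    result = SERVICES[content["service"]](content["body"])
    return json.dumps({"result": result})


def request_quote(quantity, unit_price):
    """Consumer: depends only on the message format, not on the provider's code."""
    raw = make_message("quote", {"quantity": quantity, "unit_price": unit_price})
    return json.loads(handle(raw))["result"]["total"]


if __name__ == "__main__":
    print(request_quote(3, 5))
```

Because the consumer only builds and reads messages, `quote_service` could be replaced by a different implementation, or moved to another process, without any change on the consumer side.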

SOA breaks the three-tier approach by inserting a new interface layer to de-couple the core business logic and database (back-end implementation choices) from the presentation layer and other applications. SOA turns business functions into services that can be reused and accessed through standard interfaces. This presentation will give an overview of information processing in SOA.


Monday     From 15h00   Registration
           16h30-18h45  Web search, part I (D. Hawking)
Tuesday    08h30-11h30  Web search, part I (D. Hawking)
           16h30-18h45  Web search, part II (A. Broder)
Wednesday  08h30-11h30  Web search, part II (A. Broder)
           16h30-18h45  Accessing XML content (M. Lalmas)
Thursday   08h30-11h30  Accessing XML content (M. Lalmas)
           16h30-18h45  Service Oriented Architectures (M. Little)
Friday     08h30-11h30  Service Oriented Architectures (M. Little)
           11h30-12h00  Wrap-up