Track: User Interfaces and Accessibility
Paper Title:
Visualizing Structural Patterns in Web Collections
Authors:
Abstract:
We present a tool, DescribeX, suitable for exploring and
visualizing the structural patterns present in collections of
XML documents. DescribeX can be employed by developers
to interactively discover, for example, those XPath expressions
that will actually return elements known to occur in
the collection.
Many collections of XML documents on the Web are difficult to describe because they use different schemas, the schemas may be extended through namespaces, and the document instances are often complex and ad hoc in structure. Collected feeds are one example: such collections consist of documents with multiple schemas (e.g. Atom, RSS, and RDF), in multiple versions (e.g. RSS 1.0 and RSS 2.0), which have been further extended by schemas from several namespaces (e.g. Dublin Core, iTunes Podcast, Microsoft Simple List Extensions). Another example, not involving feeds, is a collection created from traces of web service requests.
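
To make the notion of structural patterns in a mixed-schema collection concrete, the following is a minimal sketch that counts the root-to-element label paths occurring in a tiny collection. It is an illustration only and not DescribeX's method: the two sample documents, and the names `DOCS`, `label_paths`, and `path_summary`, are assumptions introduced here. Paths are printed with namespace URIs in braces, as Python's `xml.etree.ElementTree` reports them, so they are not literal XPath expressions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature collection: one RSS 2.0 feed extended with
# Dublin Core, and one Atom feed. Real collections would be far larger.
DOCS = [
    """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
         <channel><title>t</title>
           <item><title>a</title><dc:creator>x</dc:creator></item>
           <item><title>b</title></item>
         </channel></rss>""",
    """<feed xmlns="http://www.w3.org/2005/Atom">
         <title>t</title>
         <entry><title>c</title></entry>
       </feed>""",
]


def label_paths(root):
    """Yield the root-to-element label path of every element under root."""
    stack = [(root, "/" + root.tag)]
    while stack:
        elem, path = stack.pop()
        yield path
        for child in elem:
            stack.append((child, path + "/" + child.tag))


def path_summary(docs):
    """Count how many elements in the whole collection share each label path."""
    counts = Counter()
    for text in docs:
        counts.update(label_paths(ET.fromstring(text)))
    return counts


summary = path_summary(DOCS)
for path, n in sorted(summary.items()):
    print(n, path)
```

A path that is absent from the summary, such as `/rss/channel/entry`, corresponds to an expression that would return nothing on this collection, which is the kind of question a developer might want answered before writing a query.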