Track: XML and Web Data
Paper Title:
Visibly Pushdown Automata for Streaming XML
Authors:
Abstract:
We propose the study of visibly pushdown automata (VPA) for
processing XML documents. VPAs are pushdown automata where the
input determines the stack operation, and XML documents are
naturally visibly pushdown, with the VPA pushing onto the stack on
open-tags and popping the stack on close-tags. In this paper we
demonstrate the power and ease of use of visibly pushdown automata
in the design of streaming algorithms for XML documents.
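As a minimal sketch, assume a document has already been tokenized into a stream of ("open", tag) and ("close", tag) events. This event format is a hypothetical stand-in, not an XML parser API. The class below simulates a deterministic VPA over such a stream. The kind of each event alone fixes the stack action: push on an open-tag, pop on a close-tag. So the stack never holds more than d symbols for a document of depth d. The example automaton checks only that close-tags match their open-tags, and it pushes tag names as stack symbols. That is a finite stack alphabet only under the assumption of a fixed tag vocabulary.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

Event = Tuple[str, str]  # hypothetical event format: ("open", tag) or ("close", tag)


class DVPA:
    """Deterministic VPA over a tag stream: open-tags are calls (push),
    close-tags are returns (pop); the event kind alone fixes the stack action."""

    def __init__(self, initial: Hashable,
                 call: Callable[[Hashable, str], Tuple[Hashable, Hashable]],
                 ret: Callable[[Hashable, str, Hashable], Hashable],
                 accepting: Callable[[Hashable], bool]) -> None:
        self.initial = initial
        self.call = call            # (state, tag) -> (next state, symbol to push)
        self.ret = ret              # (state, tag, popped symbol) -> next state
        self.accepting = accepting  # final-state test

    def accepts(self, events: Iterable[Event]) -> bool:
        state: Hashable = self.initial
        stack: List[Hashable] = []  # at most d symbols for a depth-d document
        for kind, tag in events:
            if kind == "open":
                state, symbol = self.call(state, tag)
                stack.append(symbol)
            elif kind == "close":
                if not stack:  # close-tag with no matching open-tag
                    return False
                state = self.ret(state, tag, stack.pop())
            else:
                raise ValueError(f"unknown event kind: {kind!r}")
        return not stack and self.accepting(state)


# Illustrative two-state DVPA: accept iff every close-tag matches its open-tag.
well_formed = DVPA(
    initial="ok",
    call=lambda q, tag: (q, tag),  # push the tag name itself
    ret=lambda q, tag, top: q if (q == "ok" and top == tag) else "err",
    accepting=lambda q: q == "ok",
)
```

For instance, `well_formed.accepts([("open", "a"), ("open", "b"), ("close", "b"), ("close", "a")])` is true. Swapping the two close-tags makes it false. Each event costs a constant amount of work, and the only unbounded memory is the stack.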
We study the problem of type-checking streaming XML documents against SDTD schemas, and the problem of typing the tags in a streaming XML document according to an SDTD schema. For the latter problem, we consider both pre-order typing and post-order typing of a document, which dynamically determine types at open-tags and close-tags, respectively, as soon as they are met. We also generalize the problems of pre-order and post-order typing to prefix querying. We show that a deterministic VPA yields an algorithm for answering, in one pass, the set of all answers to any query with the property that whether a node satisfies the query is determined solely by the prefix of the document leading to the node. All the streaming algorithms we develop in this paper are based on the construction of deterministic VPAs; hence, for any fixed problem, the algorithms process each element of the input in constant time and use O(d) space, where d is the depth of the document.
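The sketch below illustrates pre-order typing combined with streaming validation. It does not reproduce the paper's SDTD construction. Instead, it assumes a restricted single-type schema in which each element's type is fixed by its parent's type and its own tag. Each type's content model is given as a DFA over the types of its children. The names `type_stream`, `child_type`, `content`, and the pseudo-type `#doc` are all hypothetical.

The control state is the pair (current type, DFA state), and each stack symbol is (parent type, resumed parent DFA state, open tag). Both sets are finite for a fixed schema, so the procedure behaves like a deterministic VPA. Each open-tag is typed as soon as it is read, and each close-tag checks the finished element's content model.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Event = Tuple[str, str]  # hypothetical event format: ("open", tag) or ("close", tag)
DFA = Tuple[str, Dict[Tuple[str, str], str], FrozenSet[str]]  # (start, delta, finals)

ROOT = "#doc"  # hypothetical pseudo-type for the virtual document node


def type_stream(events: Iterable[Event],
                child_type: Dict[Tuple[str, str], str],
                content: Dict[str, DFA]) -> Tuple[bool, List[str]]:
    """One pass: assign a type at every open-tag (pre-order) and validate.
    Returns (valid, types assigned so far, in document order)."""
    cur_type, cur_q = ROOT, content[ROOT][0]
    stack: List[Tuple[str, str, str]] = []  # (parent type, parent DFA state, open tag)
    types: List[str] = []
    for kind, tag in events:
        if kind == "open":
            t = child_type.get((cur_type, tag))  # type depends on parent type + tag
            if t is None:
                return False, types
            nxt = content[cur_type][1].get((cur_q, t))  # advance parent's content DFA
            if nxt is None:
                return False, types
            stack.append((cur_type, nxt, tag))  # parent resumes in state nxt
            types.append(t)                     # type known at the open-tag
            cur_type, cur_q = t, content[t][0]
        elif kind == "close":
            if not stack:
                return False, types
            parent_type, parent_q, open_tag = stack.pop()
            # The tag must match, and the children must form a word of the content model.
            if open_tag != tag or cur_q not in content[cur_type][2]:
                return False, types
            cur_type, cur_q = parent_type, parent_q
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return (not stack and cur_q in content[ROOT][2]), types


# Illustrative schema: one <book> containing a <title> followed by any number of <author>s.
CHILD_TYPE = {(ROOT, "book"): "book", ("book", "title"): "title", ("book", "author"): "author"}
EMPTY: DFA = ("e", {}, frozenset({"e"}))
CONTENT: Dict[str, DFA] = {
    ROOT: ("s0", {("s0", "book"): "s1"}, frozenset({"s1"})),
    "book": ("b0", {("b0", "title"): "b1", ("b1", "author"): "b1"}, frozenset({"b1"})),
    "title": EMPTY,
    "author": EMPTY,
}
```

Each event performs a constant number of dictionary lookups, and the stack grows only with the nesting depth. This mirrors the constant-time, O(d)-space behaviour claimed above, but only for this restricted schema class.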