Track: XML and Web Data
Paper Title:
Mapping-Driven XML Transformation
Authors:
Abstract:
Clio is an existing schema-mapping tool that provides user-friendly means to
manage the complex task of transforming and integrating heterogeneous data,
such as XML on the Web or in XML databases. By means of mappings from source to
target schemas, Clio can help users conveniently
establish the precise semantics of data transformation and integration. In this
paper, we study how to implement such data transformation efficiently, that is,
how to generate target data from the source data based on schema mappings. We
present a three-phase framework for high-performance XML-to-XML
transformation based on schema mappings, and discuss methodologies and algorithms
for implementing these phases. In particular, we elaborate on novel techniques
such as streamed extraction of mapped source values and scalable disk-based
merging of overlapping data (including duplicate elimination). We compare our
transformation framework with alternative methods such as using XQuery or SQL/XML
provided by current commercial databases. The results demonstrate that the
three-phase framework, despite its simplicity, is highly scalable and
outperforms the alternative methods by orders of magnitude.
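
As a rough illustration of two of the techniques named above (streamed extraction of mapped source values, and merging with duplicate elimination), the following Python sketch may help. It is not the paper's implementation: the function names, the `name` element, and the sample document are all hypothetical, and the sorted runs are kept in memory here, whereas a disk-based variant would presumably write each run to a temporary file before merging.

```python
import heapq
import io
import xml.etree.ElementTree as ET
from itertools import groupby


def extract_values(xml_stream, tag):
    """Yield the text of each <tag> element while streaming through the input."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == tag:
            yield elem.text or ""
            elem.clear()  # drop the element's content once it has been consumed


def sorted_runs(values, run_size):
    """Split a value stream into sorted runs of at most run_size items.

    Assumption: a disk-based version would spill each run to a file instead.
    """
    run = []
    for value in values:
        run.append(value)
        if len(run) == run_size:
            yield sorted(run)
            run = []
    if run:
        yield sorted(run)


def merge_distinct(runs):
    """Merge sorted runs; duplicates become adjacent, so groupby removes them."""
    return [key for key, _group in groupby(heapq.merge(*runs))]


# Hypothetical source document with one duplicated value.
source = io.BytesIO(
    b"<depts><dept><name>R&amp;D</name></dept>"
    b"<dept><name>Sales</name></dept>"
    b"<dept><name>R&amp;D</name></dept></depts>"
)
runs = list(sorted_runs(extract_values(source, "name"), run_size=2))
distinct_names = merge_distinct(runs)
```

The sketch sorts fixed-size runs and merges them because that keeps the working set bounded by the run size, which is one common way to make duplicate elimination scale beyond memory; the paper's own merging algorithm may differ.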