Track: Systems
Paper Title:
Mirror Site Maintenance Based on Evolution Associations of Web Directories
Authors:
Abstract:
Mirroring Web sites is a widely used technique in the
Web community. A mirror site should be updated frequently so
that it reflects the content of the original site. Existing
mirroring tools apply page-level strategies that check every
page of a site, which is inefficient and expensive. In this paper,
we propose a novel site-level mirror maintenance
strategy. Our approach studies the evolution of Web directory
structures and mines association rules between ancestor-descendant
Web directories. The discovered rules capture correlations in how
Web directories evolve. Thus, when maintaining the
mirror of a Web site (directory), we can selectively skip
subdirectories whose significant changes are negatively correlated
with those of the site itself. The preliminary experimental
results show that our approach improves the efficiency of the
mirror maintenance process significantly, at the cost of a slight
loss in the "freshness" of the mirrors.
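To make the idea of evolution associations concrete, the following is a minimal sketch under stated assumptions; it is not the authors' mining algorithm. It assumes a hypothetical change history: one set per past maintenance interval, listing the directory paths that underwent a significant change. For each ancestor-descendant pair it computes the lift of the rule "ancestor changes => descendant changes". A lift below 1 is read as negative correlation, and such descendants become candidates to skip. The names `mine_rules`, `dirs_to_skip`, the history format, and the lift threshold of 1.0 are all illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format: one set per past maintenance interval,
# holding the directory paths that underwent a significant change.
History = List[Set[str]]
Rules = Dict[Tuple[str, str], float]


def is_ancestor(a: str, d: str) -> bool:
    """True if directory `a` is a proper ancestor of `d` (e.g. /site of /site/news)."""
    return d.startswith(a.rstrip("/") + "/")


def mine_rules(history: History, dirs: List[str]) -> Rules:
    """Lift of 'ancestor changes => descendant changes' for every ancestor-descendant pair.

    lift = P(A and D) / (P(A) * P(D)); lift < 1 suggests negative correlation.
    Pairs where either directory never changed are left out (lift undefined).
    """
    n = len(history)
    rules: Rules = {}
    for a in dirs:
        for d in dirs:
            if not is_ancestor(a, d):
                continue
            p_a = sum(a in s for s in history) / n
            p_d = sum(d in s for s in history) / n
            p_ad = sum(a in s and d in s for s in history) / n
            if p_a > 0 and p_d > 0:
                rules[(a, d)] = p_ad / (p_a * p_d)
    return rules


def dirs_to_skip(root: str, rules: Rules, threshold: float = 1.0) -> Set[str]:
    """Descendants of `root` whose lift with `root` falls below `threshold`."""
    return {d for (a, d), lift in rules.items() if a == root and lift < threshold}


# Illustrative (made-up) history of four maintenance intervals.
history: History = [
    {"/site", "/site/news"},
    {"/site", "/site/news"},
    {"/site/archive"},
    {"/site", "/site/news"},
]
dirs = ["/site", "/site/news", "/site/archive"]
rules = mine_rules(history, dirs)
print(dirs_to_skip("/site", rules))  # {'/site/archive'}
```

In this toy history, `/site/news` changes whenever `/site` does (lift about 1.33), while `/site/archive` never changes together with `/site` (lift 0), so only `/site/archive` is flagged for skipping. Lift is used here as one simple, common measure of positive versus negative association; the paper's actual rule form and thresholds may differ.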