Track: XML
Paper Title:
Adaptive Record Extraction From Web Pages
Authors:
Abstract:
We describe an adaptive method for extracting records from
web pages. Our algorithm combines a weighted tree-matching
metric with clustering to obtain data extraction patterns.
We experimentally compare our method with state-of-the-art approaches
and show that it is very competitive
for rigidly structured records (such as product descriptions)
and far superior for loosely structured records (such as blog
entries).
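The abstract does not say how the tree-matching metric is weighted or which clustering method is used. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' algorithm. It scores two DOM subtrees with the classic simple tree matching recurrence, in which children are matched in sibling order like a longest-common-subsequence table. Each matched node pair is weighted by a hypothetical factor `decay ** depth`, so agreement near a record's root counts more. The score is normalized to [0, 1], and candidate records are then grouped by single-link clustering with a hypothetical similarity `threshold`. All names (`Node`, `weighted_match`, `similarity`, `cluster`) are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical, minimal stand-in for a DOM element: a tag and ordered children."""
    tag: str
    children: list[Node] = field(default_factory=list)


def weighted_match(a: Node, b: Node, depth: int = 0, decay: float = 0.5) -> float:
    """Simple tree matching with an assumed weight of decay**depth per matched node pair."""
    if a.tag != b.tag:
        return 0.0  # roots differ, so no node in these subtrees can be matched
    m, n = len(a.children), len(b.children)
    # M[i][j] = best weight matching the first i children of a with the first j
    # children of b while preserving sibling order (an LCS-style table).
    M = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i - 1][j],
                M[i][j - 1],
                M[i - 1][j - 1]
                + weighted_match(a.children[i - 1], b.children[j - 1], depth + 1, decay),
            )
    return decay**depth + M[m][n]


def similarity(a: Node, b: Node) -> float:
    """Normalize to [0, 1]: a tree's self-match equals the total weight of its nodes."""
    return 2 * weighted_match(a, b) / (weighted_match(a, a) + weighted_match(b, b))


def cluster(trees: list[Node], threshold: float = 0.7) -> list[list[int]]:
    """Single-link clustering: link any pair whose similarity reaches the threshold."""
    parent = list(range(len(trees)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(trees)):
        for j in range(i + 1, len(trees)):
            if similarity(trees[i], trees[j]) >= threshold:
                parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for i in range(len(trees)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy candidate records: two product-like list items and one blog-style entry.
p1 = Node("li", [Node("img"), Node("span"), Node("span")])
p2 = Node("li", [Node("img"), Node("span")])
blog = Node("div", [Node("h2"), Node("p"), Node("p")])

if __name__ == "__main__":
    print(round(similarity(p1, p2), 3))  # 0.889
    print(cluster([p1, p2, blog]))       # [[0, 1], [2]]
```

Weighting shallow nodes more heavily is only one plausible reading of "weighted". A different weighting (for example, by subtree size) or a different clustering method would fit into the same structure. A real extractor would then derive an extraction pattern from each cluster, a step this sketch omits.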