Top of Menu Home CFP Program Committees Key Dates Location Hotel Registration Students Sponsors Media Submission Tutorials Workshops Travel Info Proceedings

Poster Papers

Track: Search

Paper Title:
Web Page Classification with Heterogeneous Data Fusion

Authors:

Abstract:
Web pages are more than text and they contain much contextual and structural information, e.g., the title, meta data, the anchor text, etc., each of which can be seen as a data source or a representation. Due to the different dimensionality and different representing forms of these heterogeneous data sources, simply putting them together would not greatly enhance the classification performance. We observe that via a kernel function, different dimensions and types of data sources can be represented into a common format of kernel matrix, which can be seen as a generalized similarity measure between web pages. In this sense, a kernel learning approach is employed to fuse these heterogeneous data sources. The experimental results on a collection of the ODP database validate the advantages of the proposed method over any single data source and the uniformly weighted combination of heterogeneous data sources.

PDF version























sponsors