Track: Data Mining
Paper Title:
A New Suffix Tree Similarity Measure for Document Clustering
Authors:
Abstract:
In this paper, we propose a new similarity measure that computes the pairwise
similarity of text-based documents using the suffix tree document model. By
applying the new suffix tree similarity measure in the Group-average Agglomerative
Hierarchical Clustering (GAHC) algorithm, we developed a new
suffix tree document clustering algorithm (NSTC). Our experimental results on
two standard document clustering benchmark corpora, OHSUMED and RCV1, indicate
that the new clustering algorithm is a very effective document clustering
algorithm. Compared with the results of the traditional keyword tf-idf similarity measure
in the same GAHC algorithm, NSTC achieved an improvement of 51% in the
average F-measure score. Furthermore, we apply the new clustering
algorithm to the analysis of Web documents in online forum communities. A
topic-oriented clustering algorithm is developed to help people assess,
classify, and search the Web documents in a large forum community.
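
As a rough illustration of the clustering step named in the abstract, the sketch below runs group-average agglomerative clustering over a precomputed pairwise similarity matrix. It is not the authors' implementation: the function names `average_link` and `gahc`, the `min_similarity` stopping threshold, and the toy matrix are all assumptions made for this example, and the matrix stands in for whatever document similarity is used (suffix tree based or tf-idf based). Group-average is taken here in its simple form, the mean similarity over all cross-cluster document pairs.

```python
from itertools import combinations


def average_link(sim, a, b):
    """Mean pairwise similarity between the members of clusters a and b."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def gahc(sim, min_similarity):
    """Repeatedly merge the most similar pair of clusters.

    Stops when no pair of clusters has an average similarity of at
    least min_similarity (a hypothetical stopping rule for this sketch).
    """
    clusters = [[i] for i in range(len(sim))]  # start with one document per cluster
    while len(clusters) > 1:
        # Find the pair of clusters with the highest group-average similarity.
        score, x, y = max(
            (average_link(sim, clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if score < min_similarity:
            break
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy symmetric similarity matrix for four documents (invented values).
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(gahc(sim, min_similarity=0.5))  # prints [[0, 1], [2, 3]]
```

Because the clustering step only consumes the similarity matrix, swapping one similarity measure for another leaves the procedure itself unchanged, which is the kind of comparison the abstract describes between the suffix tree measure and the tf-idf baseline.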