Track: Data Mining
Paper Title:
Scaling Up All Pairs Similarity Search
Authors:
Abstract:
Given a large collection of sparse vector data in a high-dimensional space, we investigate the problem of finding all pairs of vectors whose similarity score (as determined by a function such as cosine similarity) is above a given threshold. We propose novel optimization and indexing techniques for this problem, resulting in an algorithm that is both faster and simpler than previous state-of-the-art approaches. We demonstrate the effectiveness of our algorithm on the public DBLP dataset and on two real-world web applications: generating recommendations for the Orkut social network, and computing pairs of similar queries from search snippet data among the 5 million most frequently issued Google queries. Our algorithm is between 5 and 20 times faster than previous algorithms on these datasets.
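To make the problem statement concrete, the following is a minimal sketch of a plain inverted-index approach to all-pairs cosine similarity search. It is a simple baseline for illustration only and does not reproduce the paper's optimization or indexing techniques. It assumes each sparse vector is a Python dict mapping a feature id to a weight; the function names `normalize` and `all_pairs_cosine` are hypothetical.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector (feature id -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {f: w / norm for f, w in vec.items()} if norm > 0 else {}


def all_pairs_cosine(vectors, threshold):
    """Return sorted (i, j, score) tuples for every pair i < j whose
    cosine similarity is at least `threshold`.

    Hypothetical baseline (not the paper's algorithm): each vector is scored
    against the postings of vectors already indexed, then added to the index.
    """
    unit = [normalize(v) for v in vectors]
    index = defaultdict(list)  # feature id -> list of (vector id, weight)
    results = []
    for j, y in enumerate(unit):
        scores = defaultdict(float)  # earlier vector id -> accumulated dot product
        for f, w in y.items():
            for i, wx in index.get(f, ()):
                scores[i] += w * wx
        for i, s in scores.items():
            if s >= threshold:
                results.append((i, j, s))
        # Index y only after scoring it, so each pair is reported once (i < j).
        for f, w in y.items():
            index[f].append((j, w))
    return sorted(results)


if __name__ == "__main__":
    docs = [{0: 1.0, 1: 1.0}, {0: 1.0, 1: 1.0}, {2: 1.0}, {0: 1.0}]
    for i, j, s in all_pairs_cosine(docs, 0.5):
        print(i, j, round(s, 3))
```

Because vectors are unit-normalized, the dot product equals the cosine similarity, and only vectors sharing at least one feature are ever compared. Vectors with no common features (such as the third one above) are never touched. The cost of this baseline still grows with the length of the posting lists, which is the kind of overhead that pruning and indexing optimizations aim to reduce.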