Finding Community Structure in Mega-scale Social Networks [Extended Abstract]^{1, 2}

Ken Wakita

Tokyo Institute of Technology
Tokyo, Japan

Toshiyuki Tsurumi

Tokyo Institute of Technology
Tokyo, Japan

tsurumi2@is.titech.ac.jp

Copyright is held by the author/owner(s).
WWW 2007, May 8-12, 2007, Banff, Alberta, Canada.
ACM 978-1-59593-654-7/07/0005.

ABSTRACT

Community analysis algorithm proposed by Clauset, Newman, and Moore (CNM algorithm) finds community structure in social networks. Unfortunately, CNM algorithm does not scale well and its use is practically limited to networks whose sizes are up to 500,000 nodes. We show that this inefficiency is caused from merging communities in unbalanced manner and that a simple heuristics that attempts to merge community structures in a balanced manner can dramatically improve community structure analysis. The proposed techniques are tested using data sets obtained from existing social networking service that hosts 5.5 million users. We have tested three three variations of the heuristics. The fastest method processes a SNS friendship network with 1 million users in 5 minutes (70 times faster than CNM) and another friendship network with 4 million users in 35 minutes, respectively. Another one processes a network with 500,000 nodes in 50 minutes (7 times faster than CNM), finds community structures that has improved modularity, and scales to a network with 5.5 million. Further detail is reported in [3].

Categories & Subject Descriptors

H.2.8 [Database applications]: Datamining; G.2.2 [Graph Theory]: Graph algorithms

General Terms

Algorithms, Performance

Keywords

Community analysis, clustering, social networking service

1 Introduction

Finding community structure in complex networks is an important first step to grasp inherent complex structure of social networks. Many research activities attempt to define the notion of communities and propose community analysis algorithms [2], [1].

We implemented a fast community analysis algorithm proposed by Clauset, Newman, and Moore [1] (CNM algorithm) and applied it to analyze various subsets of an acquaintance relationship network obtained from a social networking system (SNS). It performs well for a mid-scale subset of the network that consists of less than 500,000 users it was incapable to analyze larger networks.

We observed that merging in coupling pair of community structures, balancedness of the pair has great impact on computational efficiency of CNM algorithm. Based on this observation, we have implemented a slightly modified versions of CNM algorithm and observed remarkable speedup and slight improvement of the modularity.

2 CNM Algorithm

Newman and Girvan attempt to measure the quality of network clustering by means of modularity [2]. Their algorithm (CNM algorithm) is a bottom-up agglomerative clustering which continuously finds and merges pairs of clusters trying to maximize modularity of the community structure in a greedy manner [1].

The authors have programed CNM algorithm and attempted to analyze an friendship network of an SNS called “mixi³” that hosted about one million users in October 2005. In spite of the good scalability as advertised in [1], we had to stop the experiment after a week when less than 10% of the whole analysis was finished.

To figure out the performance bottleneck, we conducted community analysis on a various subsets of mixi SNS network. The mixi SNS assigns each user an incrementa serial registration ID . Let us represent the mixi SNS network by a graph G_mixi= (U, F), where U= { 1, 2, … } is the set of user IDs and F⊂ U× Uis a set of friendship relationship. A subset of this graph can be given as follows:
G_mixiⁿ= (U_n, F ∩ (U_n × U_n)) where U_n= { u ∈ U | u≤n}

Figure 1 illustrates time required for analysis of various subsets: G_mixi^N. Each bar of the graph depicts time required to merge 10,000 community pairs. For most of the computation time is consumed for the first half of the merging process which decreases dramatically for the latter half.

[1] discusses the computational complexity of CNM algorithm can be approximated by O(n log²n), for nnodes but this approximation contracicts super quadratic cost illustrated in Figure 1.

3 Heuristics

4 Evaluation

	G_mixi^200K	G_mixi^400K	G_mixi^600K	G_mixi^800K	G_mixi^1M	G_mixi^4M
CNM	42.2	197	NA	NA	NA	NA
HE	2.15	6.80	13.6	24.5	36.2	243
HE'	8.52	35.5	68.2	124	173	3,400
HN	0.43	1.16	2.05	3.17	4.47	33.3

References

[1] A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70:066111, 2004.

[2] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69:026113, 2004.

[3] Ken Wakita and Toshiyuki Tsurumi. Finding community structure in mega-scale social networks, February 2007, cs.CY/0702048.

^*1 A long version of this paper is available online.

^*2 This work is supported in part by the Grant-in Aid (No. 18300041) of MEXT, Japan.

^*3 mixi is the largest invitation-based SNS in Japan and it hosts about 7 million users (Feb. 2007).

Finding Community Structure in Mega-scale Social Networks [Extended Abstract]1, 2

Ken Wakita

Tokyo Institute of Technology Tokyo, Japan

wakita@is.titech.ac.jp

Toshiyuki Tsurumi

Tokyo Institute of Technology Tokyo, Japan

tsurumi2@is.titech.ac.jp

ABSTRACT

Categories & Subject Descriptors

General Terms

Keywords

1 Introduction

2 CNM Algorithm

3 Heuristics

4 Evaluation

References

Finding Community Structure in Mega-scale Social Networks [Extended Abstract]^{1, 2}

Tokyo Institute of Technology
Tokyo, Japan

Tokyo Institute of Technology
Tokyo, Japan