Large-Scale Graph Approximation and Clustering
We consider exploiting data’s structure for sparse graph approximations. Graphs play increasingly important roles in learning problems: manifold learning, kernel learning, and spectral clustering. Specifically, in this paper, we concentrate on nearest neighbor sparse graphs which are widely adopted in learning problems due to its spatial efficiency. Nonetheless, we raise an even challenging problem: can we save more memory space while keep competitive performance for the sparse graph? To this end, first, we propose to partition the entire graph into intra- and inter-graphs by exploring the graph structure, and use both of them for graph approximation. Therefore, neighborhood similarities within each cluster, and between different clusters are well preserved. Second, we improve the space use of the entire approximation algorithm. Specially, a novel sparse inter-graph approximation algorithm is proposed, and corresponding approximation error bound is provided. Third, extensive experiments are conducted on 11 real-world datasets, ranging from small- to large-scales, demonstrating that when using less space, our approximate graph can provide comparable or even better performance, in terms of approximation error, and clustering accuracy. In large-scale test, we use less than 1/100 memory of comparable algorithms, but achieve very appealing results.
- Ming Shao, Xindong Wu, and Yun Fu, Scalable Nearest Neighbor Sparse Graph Approximation by Exploiting Graph Structure, IEEE Transactions on Big Data (TBD), vol. 2, no. 4, pages 365–380, 2016. [pdf] [bib]
Fast Deep Clustering
Clustering has been one of the most critical unsupervised learning techniques that has been widely applied in data mining problems. As one of its branches, graph clustering enjoys its popularity due to its appealing performance and strong theoretical supports. However, the eigen-decomposition problems involved are computationally expensive. In this paper, we propose a deep structure with a linear coder as the building block for fast graph clustering, called Deep Linear Coding (DLC). Different from conventional coding schemes, we jointly learn the feature transform function and discriminative codings, and guarantee that the learned codes are robust in spite of local distortions. In addition, we use the proposed linear coders as the building blocks to formulate a deep structure to further reﬁne features in a layerwise fashion. Extensive experiments on clustering tasks demonstrate that our method performs well in terms of both time com-plexity and clustering accuracy. On a large-scale benchmark dataset (580K), our method runs 1500 times faster than the original spectral clustering.