TY - GEN
T1 - Distributed query processing on compressed graphs using K2-trees
AU - Álvarez-García, Sandra
AU - Brisaboa, Nieves R.
AU - Gómez-Pantoja, Carlos
AU - Marin, Mauricio
PY - 2013/1/1
Y1 - 2013/1/1
N2 - Compact representation of Web and social graphs can be made efficiently with the K2-tree as it achieves compression ratios about 5 bits per link for web graphs and about 20 bits per link for social graphs. The K 2-tree also enables fast processing of relevant queries such as direct and reverse neighbours in the compressed graph. These two properties make the K2-tree suitable for inclusion in Web search engines where it is necessary to maintain very large graphs and to process on-line queries on them. Typically these search engines are deployed on dedicated clusters of distributed memory processors wherein the data set is partitioned and replicated to enable low query response time and high query throughput. In this context a practical strategy is simply to distribute the data on the processors and build local data structures for efficient retrieval in each processor. However, the way the data set is distributed on the processors can have a significant impact in performance. In this paper, we evaluate a number of data distribution strategies which are suitable for the K2-tree and identify the alternative with the best general performance. In our study we consider different data sets and focus on metrics such as overall compression ratio and parallel response time for retrieving direct and reverse neighbours.
AB - Compact representation of Web and social graphs can be made efficiently with the K2-tree as it achieves compression ratios about 5 bits per link for web graphs and about 20 bits per link for social graphs. The K 2-tree also enables fast processing of relevant queries such as direct and reverse neighbours in the compressed graph. These two properties make the K2-tree suitable for inclusion in Web search engines where it is necessary to maintain very large graphs and to process on-line queries on them. Typically these search engines are deployed on dedicated clusters of distributed memory processors wherein the data set is partitioned and replicated to enable low query response time and high query throughput. In this context a practical strategy is simply to distribute the data on the processors and build local data structures for efficient retrieval in each processor. However, the way the data set is distributed on the processors can have a significant impact in performance. In this paper, we evaluate a number of data distribution strategies which are suitable for the K2-tree and identify the alternative with the best general performance. In our study we consider different data sets and focus on metrics such as overall compression ratio and parallel response time for retrieving direct and reverse neighbours.
UR - http://www.scopus.com/inward/record.url?scp=84893920259&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-02432-5_32
DO - 10.1007/978-3-319-02432-5_32
M3 - Conference contribution
AN - SCOPUS:84893920259
SN - 9783319024318
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 298
EP - 310
BT - String Processing and Information Retrieval - 20th International Symposium, SPIRE 2013, Proceedings
PB - Springer Verlag
T2 - 20th International Symposium on String Processing and Information Retrieval, SPIRE 2013
Y2 - 7 October 2013 through 9 October 2013
ER -