WebGraph is a software framework designed for analyzing web graphs. It comprises the following components:
1. Codes - a set of simple codes, the ζ (zeta) codes, that are particularly suitable for storing web graphs and, more generally, integers with a power-law distribution in a certain exponent range. We study their properties both experimentally and mathematically.
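WebGraph's codes are the ζ_k codes. Below is a minimal sketch of ζ_k encoding and decoding for positive integers. It assumes the usual definition: find h such that x lies in [2^(hk), 2^((h+1)k) - 1], write a unary prefix for h, then write x - 2^(hk) with a minimal (truncated) binary code over that interval. To keep the bits easy to inspect, they are written to a '0'/'1' string. The class `ZetaCode` is hypothetical and is not WebGraph's bit-stream API.

```java
/**
 * Hypothetical sketch of zeta_k coding for positive integers. Bits are written to a
 * '0'/'1' string for readability; this is not WebGraph's bit-stream API.
 */
final class ZetaCode {

    /** Appends the low `width` bits of `value`, most significant bit first. */
    private static void appendBits(StringBuilder out, long value, int width) {
        for (int i = width - 1; i >= 0; i--) {
            out.append(((value >>> i) & 1) == 0 ? '0' : '1');
        }
    }

    /** Encodes x >= 1 with the zeta_k code (k >= 1). */
    static String encode(int x, int k) {
        if (x < 1 || k < 1) throw new IllegalArgumentException("need x >= 1 and k >= 1");
        int msb = 31 - Integer.numberOfLeadingZeros(x); // floor(log2 x)
        int h = msb / k;                                // x lies in [2^(hk), 2^((h+1)k) - 1]
        long left = 1L << (h * k);                      // first value of that interval
        long m = x - left;                              // offset of x within the interval
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < h; i++) out.append('0');    // unary prefix: h zeros, then a one
        out.append('1');
        // Minimal binary code of m over 2^((h+1)k) - 2^(hk) values:
        // offsets below 2^(hk) use one bit less than the others.
        if (m < left) {
            appendBits(out, m, h * k + k - 1);
        } else {
            appendBits(out, m + left, (h + 1) * k);
        }
        return out.toString();
    }

    /** Decodes one zeta_k codeword that starts at the beginning of `bits`. */
    static long decode(String bits, int k) {
        int pos = 0;
        int h = 0;
        while (bits.charAt(pos++) == '0') h++;          // read the unary prefix
        long left = 1L << (h * k);
        long v = 0;
        for (int i = 0; i < h * k + k - 1; i++) {       // read the shorter width first
            v = 2 * v + (bits.charAt(pos++) - '0');
        }
        if (v >= left) {                                // long form: one more bit, remove offset
            v = 2 * v + (bits.charAt(pos++) - '0') - left;
        }
        return left + v;
    }
}
```

With k = 1 the code coincides with Elias γ (for example, 4 becomes `00100`). Larger k spends more bits on small values and fewer on large ones. For instance, ζ_2 encodes 1 as `10` where γ uses a single bit.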
2. Compression algorithms - WebGraph's compression algorithms combine gap compression, referentiation (à la LINK), intervalisation, and ζ codes to achieve high compression ratios. For instance, a snapshot of about 18,500,000 pages of the .uk domain gathered by UbiCrawler is compressed at 2.22 bits per link. The algorithms are controlled by several parameters, which provide different tradeoffs between compression ratio and access speed.
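Here is a simplified sketch of two of these ideas, assuming sorted successor lists. Referentiation describes node x's successors relative to a reference list, here as a plain copy list with one bit per reference successor. The remaining "extra" successors are then gap-encoded. The first gap is taken relative to x and mapped to a natural number (0, -1, 1, -2, ... to 0, 1, 2, 3, ...) because it may be negative, and each later gap is the difference from the previous successor minus one. The class name and the copy-list representation are illustrative assumptions. WebGraph's actual format, with copy blocks, intervals and residuals, is more elaborate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: copy list against a reference list, plus gap-encoded extras. */
final class GapReferenceSketch {

    /** Maps a signed integer to a natural number: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ... */
    static long toNat(long d) {
        return d >= 0 ? 2 * d : -2 * d - 1;
    }

    /** Inverse of toNat. */
    static long fromNat(long n) {
        return (n & 1) == 0 ? n >>> 1 : -((n + 1) >>> 1);
    }

    /** Bit i is true iff the i-th reference successor is also a successor of x (both lists sorted). */
    static boolean[] copyList(int[] reference, int[] successors) {
        boolean[] copy = new boolean[reference.length];
        for (int i = 0; i < reference.length; i++) {
            copy[i] = Arrays.binarySearch(successors, reference[i]) >= 0;
        }
        return copy;
    }

    /** Successors of x that the reference list does not cover, in increasing order. */
    static int[] extras(int[] reference, int[] successors) {
        List<Integer> out = new ArrayList<>();
        for (int s : successors) {
            if (Arrays.binarySearch(reference, s) < 0) out.add(s);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** First entry relative to x (sign-mapped), then differences between neighbours minus one. */
    static long[] gaps(int x, int[] sorted) {
        long[] g = new long[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            g[i] = i == 0 ? toNat((long) sorted[0] - x) : (long) sorted[i] - sorted[i - 1] - 1;
        }
        return g;
    }

    /** Inverse of gaps. */
    static int[] ungaps(int x, long[] g) {
        int[] s = new int[g.length];
        for (int i = 0; i < g.length; i++) {
            s[i] = (int) (i == 0 ? x + fromNat(g[0]) : s[i - 1] + g[i] + 1);
        }
        return s;
    }

    /** Rebuilds the successor list of x from the reference, the copy list and the extra gaps. */
    static int[] decode(int x, int[] reference, boolean[] copy, long[] extraGaps) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < reference.length; i++) {
            if (copy[i]) out.add(reference[i]);
        }
        for (int e : ungaps(x, extraGaps)) out.add(e);
        return out.stream().mapToInt(Integer::intValue).sorted().toArray();
    }
}
```

Take node 16 with reference list {13, 15, 16, 17, 18, 19, 23, 24, 203} and successors {15, 16, 17, 22, 23, 24, 315, 316, 317, 3041}. The copy list is 0,1,1,1,0,0,1,1,0. The extras 22, 315, 316, 317, 3041 become the gaps 12, 292, 0, 0, 2723, which are small numbers a code such as ζ_k stores compactly. The run 315, 316, 317 is the kind of consecutive block that intervalisation targets.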
3. Access algorithms - WebGraph's access algorithms let users access a compressed graph without decompressing it. They use lazy techniques that delay decompression until it is actually necessary.
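The lazy idea can be illustrated with an iterator that turns gaps into node ids one at a time, only when the caller asks for the next successor. A caller who stops early never pays for the rest of the list. This is a hypothetical in-memory simplification: `LazySuccessors` is not a WebGraph class, and real decoding reads from a compressed bit stream rather than an array. It assumes a gap convention in which the first gap is the offset from x to the first successor, mapped to a natural number (0, -1, 1, -2, ... to 0, 1, 2, 3, ...). Each later gap is the difference from the previous successor minus one.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/**
 * Hypothetical sketch of lazy successor access: each gap is converted into a node id
 * only when nextInt() is called, never ahead of time.
 */
final class LazySuccessors implements PrimitiveIterator.OfInt {
    private final int x;        // node whose successors are enumerated
    private final long[] gaps;  // gap-encoded successor list (first gap sign-mapped relative to x)
    private int pos = 0;        // number of gaps converted so far
    private int last;           // most recently returned successor

    LazySuccessors(int x, long[] gaps) {
        this.x = x;
        this.gaps = gaps;
    }

    @Override
    public boolean hasNext() {
        return pos < gaps.length;
    }

    @Override
    public int nextInt() {
        if (!hasNext()) throw new NoSuchElementException();
        long g = gaps[pos];
        if (pos == 0) {
            // Undo the 0, -1, 1, -2, ... -> 0, 1, 2, 3, ... mapping of the first gap.
            long d = (g & 1) == 0 ? g >>> 1 : -((g + 1) >>> 1);
            last = (int) (x + d);
        } else {
            last = (int) (last + g + 1);
        }
        pos++;
        return last;
    }

    /** How many gaps have actually been converted so far. */
    int decoded() {
        return pos;
    }
}
```

For example, `new LazySuccessors(16, new long[] {12, 292, 0, 0, 2723})` yields 22 and then 315. At that point `decoded()` reports that only two of the five gaps have been processed.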
4. Implementation of algorithms in Java - our platform includes a complete, fully documented implementation of these algorithms in Java. The package has a well-defined API and includes several classes for modifying and recompressing a graph, so that different settings can be tested.
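Testing a setting by recompressing amounts to an experiment like the following, shown here without the library. For each candidate ζ_k parameter, compute the total number of bits that a list of positive integers (for instance, gaps shifted by one) would take, and keep the cheapest k. The codeword-length formula is derived from the usual ζ_k definition. `ZetaSweep` and its methods are hypothetical and are not part of WebGraph's API.

```java
/**
 * Hypothetical parameter sweep: total zeta_k length of a list of positive integers
 * for several values of k, using the codeword-length formula.
 */
final class ZetaSweep {

    /** Length in bits of the zeta_k codeword of x >= 1. */
    static int zetaLength(int x, int k) {
        if (x < 1 || k < 1) throw new IllegalArgumentException("need x >= 1 and k >= 1");
        int h = (31 - Integer.numberOfLeadingZeros(x)) / k; // x lies in [2^(hk), 2^((h+1)k) - 1]
        long left = 1L << (h * k);
        long m = x - left;
        // Unary prefix of h + 1 bits, then a minimal binary code of hk + k - 1 or (h + 1)k bits.
        return (h + 1) + (m < left ? h * k + k - 1 : (h + 1) * k);
    }

    /** Total bits needed to store all values with zeta_k. */
    static long totalBits(int[] values, int k) {
        long sum = 0;
        for (int v : values) sum += zetaLength(v, k);
        return sum;
    }

    /** The k in 1..maxK with the smallest total; ties go to the smaller k. */
    static int bestK(int[] values, int maxK) {
        int best = 1;
        long bestBits = totalBits(values, 1);
        for (int k = 2; k <= maxK; k++) {
            long bits = totalBits(values, k);
            if (bits < bestBits) {
                best = k;
                bestBits = bits;
            }
        }
        return best;
    }
}
```

For the values {1, 2, 3, 4}, `bestK(values, 3)` returns 1 (12 bits, against 13 for k = 2 and 15 for k = 3). For a single large value such as 1000, a larger k wins: `bestK(new int[] {1000}, 6)` returns 5 (12 bits, against 19 with k = 1).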
5. Data sets - WebGraph provides very large data sets, some with a billion links. These are either taken from public sources such as WebBase or gathered by UbiCrawler.
Using WebGraph is easy: the package installs quickly and the data sets are straightforward to download. Even with as little as 256 MB of RAM, a user can access and analyze a very large web graph. This makes it practical to study phenomena such as PageRank and the distributional properties of the web graph.
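PageRank is a good example of an analysis that needs only repeated sequential scans of every successor list, which is the access pattern a compressed graph serves well. The following is a minimal power-iteration sketch over plain in-memory successor arrays. It assumes a damping factor `alpha` and a uniform redistribution of the rank held by dangling nodes. It is not WebGraph code, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical power-iteration PageRank over successor lists held in memory. */
final class PageRankSketch {

    /**
     * succ[x] holds the successors of node x. Iterates until the L1 change between
     * successive rank vectors drops below tol, or maxIter iterations have run.
     */
    static double[] pageRank(int[][] succ, double alpha, double tol, int maxIter) {
        int n = succ.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);                       // start from the uniform vector
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[n];
            double dangling = 0;                          // rank held by nodes without successors
            for (int x = 0; x < n; x++) {                 // one sequential pass over all lists
                if (succ[x].length == 0) {
                    dangling += rank[x];
                    continue;
                }
                double share = rank[x] / succ[x].length;
                for (int y : succ[x]) next[y] += share;
            }
            // Teleportation plus redistributed dangling rank, spread uniformly.
            double base = (1 - alpha) / n + alpha * dangling / n;
            double diff = 0;
            for (int y = 0; y < n; y++) {
                next[y] = base + alpha * next[y];
                diff += Math.abs(next[y] - rank[y]);
            }
            rank = next;
            if (diff < tol) break;
        }
        return rank;
    }
}
```

On the 3-cycle 0 → 1 → 2 → 0, every node keeps rank 1/3. In general the ranks sum to 1, because the rank of dangling nodes is redistributed rather than lost.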