Google N-Gram-Patterns seeks to build a co-occurrence network based on n-gram data provided by Google Inc.
Version: 1.0
This project presents an easy and fast way to analyze the n-gram data contributed by Google Inc.
Operating System: Linux
Google n-gram data contains a huge amount of word information based on real-life search queries entered by internet users. The sheer size of the data makes it hard to analyze the data set as a whole. In this project, we present a possible parallel solution for building and accessing a co-occurrence network from Google n-gram data.
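To make the idea of a co-occurrence network concrete, here is a minimal serial sketch in C. It is not the project's code: the names CoocNet, word_id and add_ngram, the fixed capacity limits, and the choice to link every pair of distinct words in an n-gram with an undirected edge weighted by the n-gram count are all assumptions made for illustration. The parallel (MPI) part is left out entirely.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 64   /* illustrative capacity, far too small for real data */
#define MAX_LEN   32   /* maximum word length including the terminator */

/* Hypothetical in-memory network: a vocabulary plus a dense weight matrix. */
typedef struct {
    char words[MAX_WORDS][MAX_LEN];
    int  n_words;
    long weight[MAX_WORDS][MAX_WORDS];
} CoocNet;

/* Return the node id of a word, adding it to the vocabulary if it is new. */
int word_id(CoocNet *net, const char *w) {
    for (int i = 0; i < net->n_words; i++)
        if (strcmp(net->words[i], w) == 0)
            return i;
    assert(net->n_words < MAX_WORDS && strlen(w) < MAX_LEN);
    strcpy(net->words[net->n_words], w);
    return net->n_words++;
}

/* Add one space-separated n-gram: every pair of distinct words in it
 * gets an undirected edge whose weight grows by the n-gram's count. */
void add_ngram(CoocNet *net, const char *ngram, long count) {
    char buf[256];
    int ids[8], n = 0;

    /* strtok modifies its input, so work on a local copy. */
    strncpy(buf, ngram, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (char *tok = strtok(buf, " "); tok != NULL && n < 8;
         tok = strtok(NULL, " "))
        ids[n++] = word_id(net, tok);

    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (ids[i] != ids[j]) {          /* no self-loops */
                net->weight[ids[i]][ids[j]] += count;
                net->weight[ids[j]][ids[i]] += count;
            }
}
```

A caller would zero the structure first (for example with memset), call add_ngram once per input record, and then read net.weight[a][b] for the ids returned by word_id. A dense matrix is used here only to keep the sketch short; a data set of this size would need a sparse, distributed representation.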
Moreover, we use the co-occurrence network to find relationships (paths) between words in this large corpus. We also built a common C/MPI library that similar co-occurrence network analysis programs can share. The method was tested on both the Blade and Altix systems at MSI, University of Minnesota, Twin Cities campus.
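One simple way to find a path between two words in such a network is a breadth-first search, which returns a path with the fewest edges. The sketch below is a serial illustration only, under stated assumptions: the function name find_path, the dense adjacency matrix, and the MAX_NODES limit are hypothetical, words are represented by integer node ids, and nothing here is taken from the project's C/MPI library.

```c
#include <assert.h>

#define MAX_NODES 64   /* illustrative capacity */

/* Breadth-first search over an adjacency matrix with n nodes.
 * Writes the node sequence from src to dst into path and returns the
 * number of nodes on it, or 0 if dst cannot be reached from src. */
int find_path(int n, unsigned char adj[][MAX_NODES],
              int src, int dst, int path[]) {
    int prev[MAX_NODES], queue[MAX_NODES];
    int head = 0, tail = 0;

    assert(n > 0 && n <= MAX_NODES);
    assert(src >= 0 && src < n && dst >= 0 && dst < n);

    for (int i = 0; i < n; i++)
        prev[i] = -1;                 /* -1 means "not visited yet" */
    prev[src] = src;
    queue[tail++] = src;

    /* Each node is enqueued at most once, so the queue cannot overflow. */
    while (head < tail && prev[dst] == -1) {
        int u = queue[head++];
        for (int v = 0; v < n; v++)
            if (adj[u][v] && prev[v] == -1) {
                prev[v] = u;
                queue[tail++] = v;
            }
    }
    if (prev[dst] == -1)
        return 0;

    /* Count the nodes by walking back from dst, then fill path in order. */
    int len = 1;
    for (int v = dst; v != src; v = prev[v])
        len++;
    int i = len - 1;
    for (int v = dst; i >= 0; v = prev[v])
        path[i--] = v;
    return len;
}
```

With edges 0-1, 1-2 and 2-3, a call such as find_path(4, adj, 0, 3, path) fills path with 0, 1, 2, 3. Breadth-first search ignores edge weights; a weighted variant would need a different algorithm, such as Dijkstra's.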