Arch is an intranet search extension for Apache Nutch. It is designed to improve on existing corporate search engines and runs on a highly scalable platform. Arch includes blind test evaluation tools for measuring its effectiveness.
Arch introduces a novel method that delivers high-precision search results, addressing the fundamental problem of low-quality intranet search. To back up this claim, blind test evaluation tools are included in the software. They let you deploy Arch and compare its results with those of your current search engine and/or Google using a blind test methodology.
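The idea behind a blind test can be sketched in a few lines of Python. This is only an illustration of the general methodology, not Arch's actual evaluation tool: the function names `make_trial` and `tally`, the engine names, and the sample results are all hypothetical.

```python
import random
from collections import Counter


def make_trial(query, results_by_engine, rng):
    """Hide engine identities behind anonymous labels in random order.

    results_by_engine maps a (hypothetical) engine name to its result list.
    """
    engines = list(results_by_engine)
    rng.shuffle(engines)
    labels = [chr(ord("A") + i) for i in range(len(engines))]
    # The key maps each label back to an engine; the evaluator never sees it.
    key = dict(zip(labels, engines))
    shown = {label: results_by_engine[engine] for label, engine in key.items()}
    return {"query": query, "shown": shown, "key": key}


def tally(trials, votes):
    """Count wins per engine, given the preferred label for each trial."""
    wins = Counter()
    for trial, label in zip(trials, votes):
        wins[trial["key"][label]] += 1
    return wins


# Illustrative data only: two engines answering the same query.
rng = random.Random()
trial = make_trial(
    "vacation policy",
    {"arch": ["hr/leave.html", "hr/faq.html"], "current": ["news/2009.html"]},
    rng,
)
# An evaluator would be shown trial["shown"] and asked to pick a label.
```

Because the labels are assigned in random order for every query, an evaluator cannot tell which engine produced which list, and votes are mapped back to engines only after they have been cast.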
Apart from search quality, Arch has the features essential to corporate environments. Document-level security ensures that users can only access documents they are authorized to view. Index updates are inexpensive: Arch can keep indexes up to date without regular site recrawling. It provides 24/7 availability, meaning a working index is always available even if a crawl fails. Arch supports simultaneous indexing and search of multiple websites and lets you search and administer any site independently if needed. Websites can also be added and removed dynamically.
Other features include faceted search out of the box, an automatically generated site directory, low-cost support after deployment, a dual interface (PHP and Java) for easy deployment and customization, an extensive and extensible set of parsers for various file formats (HTML, PHP, PDF, MS Office, Open Office, and more), a modular, plugin-based architecture that can be customized and extended easily, and, last but not least, included source code.
Arch is built for high performance and scalability, and it can run on computer clusters to index very large data sets. If you are tired of your current corporate search engine and the problems that come with it, Arch is worth checking out.
Version 1.9.2: Improved document parsing, ported to Nutch 1.9.