Google Könguló

Könguló, the personal spider for Google Desktop Search

Overview

This plugin for Google Desktop Search (GDS) is a simple web spider (Könguló is Icelandic for spider). It crawls the websites you specify, such as intranet sites, and feeds their pages into GDS, so that results from them appear when you perform a desktop search and you can browse their contents offline via the GDS cache.

Features include:

Follows links in HTML frame, image and anchor tags
Supports HTTP and HTTPS protocols, and HTML and plain text filetypes
Obeys robots.txt
Supports basic and digest HTTP authentication, and lets you specify usernames and passwords for multiple resources
Can run in a loop, recrawling previously crawled pages every X minutes
When recrawling, uses the If-Modified-Since HTTP header to minimize transfers
Lets you specify a regular expression to limit crawls to, e.g., your intranet domain
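
A few of the features above (the URL regular expression, robots.txt, and If-Modified-Since) can be sketched with Python's standard library. This is only an illustration under stated assumptions, not Könguló's actual code: the function names allowed_by_rules and conditional_request, the user agent string, and the example URLs are all made up for this sketch.

```python
import re
import urllib.request
import urllib.robotparser
from email.utils import formatdate

# Hypothetical user agent name; not necessarily what Könguló sends.
USER_AGENT = "example-spider"


def allowed_by_rules(url, robots_lines, url_pattern):
    """Return True if url matches the crawl regex and robots.txt permits it."""
    if not re.search(url_pattern, url):
        return False
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)  # lines of an already-fetched robots.txt
    return parser.can_fetch(USER_AGENT, url)


def conditional_request(url, last_fetch_epoch=None):
    """Build a GET request that asks for the page only if it has changed."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    if last_fetch_epoch is not None:
        # HTTP dates are expressed in GMT, hence usegmt=True.
        request.add_header(
            "If-Modified-Since", formatdate(last_fetch_epoch, usegmt=True)
        )
    return request
```

When such a request is sent with urllib.request.urlopen and the page is unchanged, the server answers 304 Not Modified, which urllib reports as an HTTPError with code 304; a crawler can treat that as "keep the copy already indexed".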

Bad things that would be nice to fix include:

No GUI, just your friend the command line
No persistence between sessions; it would be nicer if the record of which pages have already been fetched, and their last-modified timestamps, were stored and reused next time
No support for form-based authentication
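
As a rough idea of what the missing persistence could look like, the sketch below stores a mapping from URL to last-modified string in a JSON file. Nothing here exists in Könguló: the file format and the names load_state and save_state are hypothetical, chosen only for illustration.

```python
import json
import os


def load_state(path):
    """Return {url: last_modified_string} from a previous session, or {}."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def save_state(path, state):
    """Write the state to a temporary file, then rename it over the old one."""
    temp_path = path + ".tmp"
    with open(temp_path, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2, sort_keys=True)
    # Renaming last means an interrupted write leaves the old file untouched.
    os.replace(temp_path, path)
```

A crawler could call load_state at startup, send each stored timestamp back in an If-Modified-Since header, and call save_state after every crawl pass.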

Könguló is distributed under the terms of the BSD License.

For downloads, news, and other information, visit our Project Page.

Example

This is by no means a complete example; it simply gives you a feel for what Könguló can do.

Index your intranet wiki page one level deep:

    kongulo.exe --depth=1 http://mywiki/wiki-index.cgi

Re-crawl every 30 minutes:

    kongulo.exe --loop --sleep=30 -d 1 http://mywiki/wiki-index.cgi

Download

For downloads, visit our Project Page.

Installation

See the README file for installation instructions.

Documentation

README

News

Initial Launch, May 2005

See the Project Page for news archives.

Google Groups

Google-Desktop-Plugin-Kongulo
codesite-discuss -- general discussion

Bug reports and patches

Code.google.com is Google's open-source project, releasing useful pieces of Google software into the wild. Keep watching for more.