Könguló, the personal spider for Google Desktop Search


This plugin for Google Desktop Search is a simple web spider (Könguló is Icelandic for spider). It crawls websites you specify, such as intranet sites, and feeds their pages into Google Desktop Search, so that they show up in your desktop search results and you can browse their contents offline via GDS's cache.

Features include:

  • Follows links in HTML frame, image and anchor tags
  • Supports HTTP and HTTPS protocols, and HTML and plain text filetypes
  • Obeys robots.txt
  • Supports basic and digest HTTP authentication, and lets you specify usernames and passwords for multiple resources
  • Can run in a loop, recrawling previously crawled pages every X minutes
  • When recrawling, uses the If-Modified-Since HTTP header to minimize transfers
  • Lets you specify a regular expression to limit crawls to, e.g., your intranet domain
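
To make the link-following, robots.txt and regular-expression features above more concrete, here is a minimal sketch using only the Python standard library. It is not Könguló's actual code: the names LinkExtractor, extract_links, allowed_links and the "example-spider" user agent are hypothetical, and it assumes the scope expression is matched against the start of each absolute URL.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Tags to follow and the attribute that carries the link
# (frame, image and anchor tags, as in the feature list).
LINK_ATTRS = {"frame": "src", "img": "src", "a": "href"}


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects absolute URLs from frame, img and a tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every followable link found in the given HTML text."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def allowed_links(links, scope_pattern, robots_lines, user_agent="example-spider"):
    """Keep links that match the scope regex (assumption: anchored at the
    start of the URL) and that the given robots.txt lines permit."""
    scope = re.compile(scope_pattern)
    robots = RobotFileParser()
    robots.parse(robots_lines)
    return [u for u in links if scope.match(u) and robots.can_fetch(user_agent, u)]
```

For example, calling allowed_links with the pattern r"https?://mywiki/" and the robots.txt lines "User-agent: *" and "Disallow: /private/" would keep links under http://mywiki/ while dropping off-site links and anything under /private/.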

Bad things that would be nice to fix include:

  • No GUI, just your friend the command line
  • No persistence between sessions; it'd be nicer if the record of which pages have already been fetched, and their last-modified timestamps, were stored and reused next time
  • No support for form-based authentication
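
As a rough illustration of the missing persistence described above, the sketch below stores a URL-to-timestamp map in a JSON file and turns a stored timestamp into an If-Modified-Since header on the next run. Könguló does not do this today; the file format and the names load_state, save_state and conditional_headers are assumptions made purely for illustration.

```python
import json
import os


def load_state(path):
    """Return {url: last_modified_string}; empty if no state file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_state(path, state):
    """Write the crawl state so a later session can reuse it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2, sort_keys=True)


def conditional_headers(state, url):
    """Headers for a recrawl: send If-Modified-Since only when a
    Last-Modified value was recorded for this URL in an earlier session."""
    last_modified = state.get(url)
    return {"If-Modified-Since": last_modified} if last_modified else {}
```

A server that honours the header answers 304 Not Modified for unchanged pages, so a spider restarted with saved state could skip re-downloading them.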

Könguló is distributed under the terms of the BSD License.

This is by no means a complete example; it simply gives you a feel for what Könguló can do.

Index your intranet wiki page one level deep:

  kongulo.exe --depth=1 http://mywiki/wiki-index.cgi

Re-crawl every 30 minutes:

  kongulo.exe --loop --sleep=30 -d 1 http://mywiki/wiki-index.cgi


See the README file for installation instructions.




  • Initial Launch, May 2005

Google Groups

Bug reports and patches

