Könguló (Icelandic for "spider") is a simple web spider plugin for
Google Desktop Search (GDS). It crawls the websites you specify,
e.g. intranet websites, and dumps them into GDS so that their pages
show up in your results when you perform a desktop search, and so that
you can browse their contents offline via GDS's cache.
- Follows links in HTML frame, image and anchor tags
- Supports HTTP and HTTPS protocols, and HTML and plain text filetypes
- Obeys robots.txt
- Supports basic and digest HTTP authentication, and lets you specify
usernames and passwords for multiple resources
- Can run in a loop, recrawling over previously crawled pages every
- When recrawling, uses If-Modified-Since HTTP header to minimize
- You can specify a regular expression to limit crawls to e.g. your
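The link-following, regular-expression and robots.txt features listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not Könguló's actual code: the names LinkExtractor and crawlable_links, the tag-to-attribute mapping, and the choice to anchor the regular expression at the start of the URL are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects URLs from frame, image, and anchor tags (hypothetical helper)."""

    # Tag name -> attribute that carries the link (assumed mapping).
    LINK_ATTRS = {"a": "href", "frame": "src", "img": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def crawlable_links(html, base_url, match, robots, agent="*"):
    """Return links in `html` that match the regex and are allowed by robots.txt."""
    extractor = LinkExtractor(base_url)
    extractor.feed(html)
    pattern = re.compile(match)
    return [
        url
        for url in extractor.links
        if pattern.match(url) and robots.can_fetch(agent, url)
    ]
```

In this sketch, robots is a urllib.robotparser.RobotFileParser that has already been loaded with the site's robots.txt, and match plays the role of the crawl-limiting regular expression.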
Known limitations that would be nice to fix include:
- No GUI, just your friend the command line
- No persistence between sessions; it would be nicer if the list of pages
already fetched, along with their last-modified timestamps, were stored
and reused on the next run
- No support for form-based authentication
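For anyone wanting to tackle the persistence item, one possible approach is sketched below. It is not part of Könguló: the JSON file layout (a map from URL to last-modified time in seconds since the epoch) and the function names load_state, save_state and conditional_headers are assumptions chosen for illustration. The stored timestamp is turned back into an If-Modified-Since header on the next run.

```python
import json
import os
from email.utils import formatdate


def load_state(path):
    """Load the {url: last_modified_epoch_seconds} map, or {} if no file exists."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def save_state(path, state):
    """Write to a temporary file first, then rename it over the old state file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def conditional_headers(state, url):
    """Build an If-Modified-Since header for a URL fetched in an earlier session."""
    timestamp = state.get(url)
    if timestamp is None:
        return {}  # Never fetched before: do an unconditional request.
    return {"If-Modified-Since": formatdate(timestamp, usegmt=True)}
```

A crawler built this way would call load_state at startup, pass conditional_headers(state, url) along with each request, update the map whenever a page comes back with a new Last-Modified value, and call save_state before exiting.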
Könguló is distributed under the terms of the
For downloads, news, and other information, visit our
These examples are by no means complete; they simply give you a
feel for what Könguló can do.
Index your intranet wiki page one level deep:
kongulo.exe --depth=1 http://mywiki/wiki-index.cgi
Re-crawl every 30 minutes:
kongulo.exe --loop --sleep=30 -d 1 http://mywiki/wiki-index.cgi
For downloads, visit our
See the README file for installation instructions.
See the Project Page for news archives.
Bug reports and patches