Könguló, the personal spider for Google Desktop Search

Overview

This plugin for Google Desktop Search is a simple web spider (Könguló is Icelandic for spider). It crawls the websites you specify, such as intranet sites, and feeds their pages into Google Desktop Search. Those pages then show up in your desktop search results, and you can browse their contents offline via GDS's cache.

Features include:

  • Follows links in HTML frame, image and anchor tags
  • Supports HTTP and HTTPS protocols, and HTML and plain text filetypes
  • Obeys robots.txt
  • Supports basic and digest HTTP authentication and lets you specify usernames and passwords for multiple resources
  • Can run in a loop, recrawling over previously crawled pages every X minutes
  • When recrawling, uses If-Modified-Since HTTP header to minimize transfers
  • Accepts a regular expression to limit crawls to, for example, your intranet domain
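
To make a few of these features concrete, here is a minimal sketch of how a spider might collect links from frame, image and anchor tags, keep only those matching a regular expression, and drop anything robots.txt disallows. It is not Könguló's actual code: the names LinkCollector, crawlable_links, the "ExampleSpider" user agent, and the sample page and robots.txt rules are all hypothetical, and only the Python standard library is used.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Tag -> attribute that carries the link. These mirror the frame, image
# and anchor tags named in the feature list (an assumption of this sketch).
LINK_ATTRS = {"a": "href", "frame": "src", "img": "src"}


class LinkCollector(HTMLParser):
    """Collects absolute URLs from frame, image and anchor tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def crawlable_links(html, base_url, match_pattern, robots_lines,
                    agent="ExampleSpider"):
    """Return links that match the regex and are allowed by robots.txt."""
    collector = LinkCollector(base_url)
    collector.feed(html)

    robots = RobotFileParser()
    robots.parse(robots_lines)  # robots.txt content, one rule per line

    allowed = re.compile(match_pattern)
    return [url for url in collector.links
            if allowed.search(url) and robots.can_fetch(agent, url)]


# Hypothetical sample data for illustration only.
PAGE = """
<html><body>
<a href="/wiki/Home">Home</a>
<a href="/private/notes">Notes</a>
<img src="logo.png">
<a href="http://example.org/elsewhere">Elsewhere</a>
</body></html>
"""
ROBOTS = ["User-agent: *", "Disallow: /private/"]

if __name__ == "__main__":
    print(crawlable_links(PAGE, "http://mywiki/index.html",
                          r"^http://mywiki/", ROBOTS))
```

With the sample data, the link to example.org is rejected by the regular expression and the /private/ link by robots.txt, leaving the wiki home page and the logo image.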

Bad things that would be nice to fix include:

  • No GUI, just your friend the command line
  • No persistence between sessions; it would be nicer if the list of pages already fetched, along with their last-modified timestamps, were stored and reused in the next session
  • No support for form-based authentication
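
As a rough illustration of the persistence idea, the sketch below stores each fetched URL with its last-modified timestamp in a JSON file and turns a stored timestamp into an If-Modified-Since header for the next crawl. Könguló does not do this today; the file format and the function names load_state, save_state and conditional_headers are assumptions made for the example.

```python
import json
import os
import tempfile
from email.utils import formatdate


def load_state(path):
    """Return the saved {url: last_modified_epoch} map, or {} if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def save_state(path, state):
    """Write the {url: last_modified_epoch} map to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2, sort_keys=True)


def conditional_headers(state, url):
    """Build an If-Modified-Since header for a URL seen in an earlier session."""
    stamp = state.get(url)
    if stamp is None:
        return {}  # never fetched before: request the page unconditionally
    # HTTP dates are expressed in GMT, hence usegmt=True.
    return {"If-Modified-Since": formatdate(stamp, usegmt=True)}


if __name__ == "__main__":
    # Hypothetical state file and URL, for illustration only.
    state_file = os.path.join(tempfile.mkdtemp(), "crawl-state.json")
    save_state(state_file, {"http://mywiki/wiki-index.cgi": 1117584000})

    restored = load_state(state_file)
    print(conditional_headers(restored, "http://mywiki/wiki-index.cgi"))
    print(conditional_headers(restored, "http://mywiki/new-page"))
```

A server that honors the header can answer 304 Not Modified for unchanged pages, so a restarted crawl would transfer only what changed since the previous session.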

Könguló is distributed under the terms of the BSD License.

For downloads, news, and other information, visit our Project Page.

Example

This is by no means a complete example; it simply gives you a feel for what Könguló can do.

Index your intranet wiki page one level deep:

    kongulo.exe --depth=1 http://mywiki/wiki-index.cgi

Re-crawl every 30 minutes:

    kongulo.exe --loop --sleep=30 -d 1 http://mywiki/wiki-index.cgi

Download

For downloads, visit our Project Page.

Installation

See the README file for installation instructions.

Documentation

README

News

  • Initial Launch, May 2005

See the Project Page for news archives.

Google Groups

Bug reports and patches


Code.google.com is Google's open-source project, releasing useful pieces of Google software into the wild. Keep watching for more.