=== Könguló, the personal spider for Google Desktop Search ===


--- Overview ---

This plugin for Google Desktop Search is a simple web spider (Könguló is
Icelandic for spider) that crawls websites you specify, e.g. intranet
websites, and dumps them into GDS so that you will see results from them
when you perform a desktop search, and can browse their contents offline
via GDS's cache.

Features include:
* Follows links in HTML frame, image and anchor tags
* Supports HTTP and HTTPS protocols, and HTML and plain text filetypes
* Obeys robots.txt
* Knows basic and digest HTTP authentication and allows you to specify
  your usernames and passwords for multiple resources
* Can run in a loop, recrawling over previously crawled pages every
  X minutes
* When recrawling, uses If-Modified-Since HTTP header to minimize
  transfers
* You can specify a regular expression to limit crawls to e.g. your
  intranet domain

Bad things that would be nice to fix include:
* No GUI, just your friend the command line
* No persistence between sessions; it'd be nicer if the state of which
  pages have already been fetched, and their last-modified timestamp,
  were stored and reused next time
* No support for form-based authentication.


--- Installation ---

To package Kongulo as a Windows executable, do:

  python setup.py py2exe

This will create a 'dist' subdirectory with (amongst other things) a
file named 'kongulo.exe'. This is the executable file you run from the
command line.

The first time you run kongulo.exe, Google Desktop Search will prompt
you to ask if you would like to install this plugin.


--- Usage ---

PLEASE NOTE: Indiscriminate web crawling can put a large load on web
servers. Please limit the use of this and other such tools to small sets
of web pages, preferably only those on your intranet.
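
The Overview lists obeying robots.txt and sending If-Modified-Since on
recrawls among the features; both help keep the load on web servers low.
As a rough illustration of those two ideas (this is not the code in
kongulo.py), here is a minimal sketch using the Python 3 standard
library. The names USER_AGENT, allowed_by_robots and conditional_request
are made up for this example. Könguló itself targets Python 2.4, where
the corresponding modules are robotparser and urllib2.

```python
# Illustrative sketch only; not taken from kongulo.py.
# Written for Python 3 (Könguló itself targets Python 2.4).
import urllib.request
import urllib.robotparser
from email.utils import formatdate

USER_AGENT = "example-spider"  # hypothetical user-agent name


def allowed_by_robots(robots_txt, url, agent=USER_AGENT):
    """Return True if the given robots.txt text permits fetching url."""
    parser = urllib.robotparser.RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def conditional_request(url, last_fetch_time=None, agent=USER_AGENT):
    """Build a GET request; add If-Modified-Since if a prior fetch time is known."""
    request = urllib.request.Request(url, headers={"User-Agent": agent})
    if last_fetch_time is not None:
        # HTTP dates are expressed in GMT; last_fetch_time is seconds
        # since the epoch.
        request.add_header("If-Modified-Since",
                           formatdate(last_fetch_time, usegmt=True))
    return request


# Example data: a robots.txt that blocks /private/ for every crawler.
robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(robots, "http://mywiki/wiki-index.cgi"))
print(allowed_by_robots(robots, "http://mywiki/private/notes.html"))

# Request objects normalize header names, hence the lowercase lookup key.
req = conditional_request("http://mywiki/wiki-index.cgi", last_fetch_time=0)
print(req.get_header("If-modified-since"))
```

A server that honours the header answers 304 Not Modified with no body
when the page is unchanged, so a recrawl only transfers pages that have
changed. With urllib.request that response surfaces as an HTTPError
whose code is 304, which a crawler would treat as "keep the cached
copy".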

To get usage help, run

  kongulo.exe -h

Examples:

* To crawl your (hypothetical) intranet Wiki, you might point Könguló to
  the Wiki's index page and tell it to crawl one level deep:

    kongulo.exe --depth=1 http://mywiki/wiki-index.cgi

* To make Könguló check the Wiki for changes every 30 minutes, you could
  do this:

    kongulo.exe --loop --sleep=30 -d 1 http://mywiki/wiki-index.cgi

* Now let's imagine your Wiki requires login, and your username is
  'joi'. To have Könguló prompt you for your password and use that to
  log in so that it can crawl the Wiki, you could do this:

    kongulo.exe --passwords=joi@mywiki -d 1 http://mywiki/wiki-index.cgi

Note that you need to rerun Könguló (or use the --loop parameter to have
it run constantly) if you want new changes to your website to be picked
up and added to the Google Desktop Search index.


--- This & That ---

This module requires Python 2.4. It also requires the win32all
extensions for Windows, and will not function unless Google Desktop
Search 1.0 or later is installed on the machine.

This README file is all the user documentation currently available for
Könguló. Documentation for developers can be found in the source file,
kongulo.py

License: BSD
URL: http://code.google.com
SF Project Page: https://sourceforge.net/projects/goog-kongulo/
Email: opensource@google.com