WebMoose is a web robot. WebMoose visits WWW sites and downloads their HTML pages. It processes the downloaded HTML and generates a database of statistics, including:

- HTML keyword usage frequency
- page size
- server name and version

When WebMoose notices a link, it tosses it in its database and visits the link sometime later. WebMoose tries not to flail on HTTP server sites, but because of the algorithm used, WebMoose might hit a particular site several times in a row before wandering off to another site.
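The processing pass described above can be sketched in a few lines of Python. (WebMoose itself is written in C++ with MFC; this is just an illustration, and all of the names in it are mine, not WebMoose's.) The standard-library HTML parser counts tag usage and collects links for the robot to visit later:

```python
from collections import Counter, deque
from html.parser import HTMLParser

class PageScanner(HTMLParser):
    """Hypothetical sketch of a robot's HTML pass: count tag usage
    and collect links to visit sometime later."""
    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()   # HTML keyword usage frequency
        self.links = []               # links noticed along the way

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<html><body><a href="http://example.com/">hi</a><p>x</p><p>y</p></body></html>'
scanner = PageScanner()
scanner.feed(page)

page_size = len(page)          # the page-size statistic
queue = deque(scanner.links)   # tossed in the database, visited later
```

A real robot would also record the server name and version from the HTTP response headers, which the parser above never sees.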
You can't, just yet. This is mainly because WebMoose is still under development, so I haven't let it run long enough to get any meaningful results.
I do. My name is Mike Blaszczak. Visit my web page, if you're curious, or write to me directly.
I don't know. I'm writing it in my spare time, and I have plenty of other things to do. Some of those other things are actually more interesting than WebMoose, so it's possible that WebMoose may never be finished.
You can't. WebMoose isn't done, and even when it is done (if it is ever done), I might not release its source because there's just too much chance for abuse.
WebMoose itself runs on a 200 MHz Pentium Pro® system via an ISDN connection to The Microsoft Network. WebMoose was written using MFC 4.2 and Microsoft Visual C++ 4.2. WebMoose runs under Windows 95. WebMoose talks over a local Ethernet connection to a 90 MHz Pentium system running Windows NT Server 4.0. This box runs Microsoft SQL Server Version 6.0, and stores information about everything that WebMoose has found lately.
I generally run WebMoose for a few hours late on weekend evenings. I run WebMoose against a local web server to test it, so it doesn't often get out in public.
The Standard for Robot Exclusion gives webmasters a way to have web robots, like WebMoose, pass their site by completely. The standard is simple and flexible: it affords the server administrator a way to exclude robots by name, and to exclude robots from certain parts of the server. For now, WebMoose doesn't follow the standard, but I'm working on it. (This is why, for now, I don't let the moose roam very far.) I'll probably implement handling of this standard before going much further with the development of the tool. The presence or absence of a ROBOTS.TXT file, and the proper response to a request for such a file (whether it exists or not), are other statistics that WebMoose will keep.
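The check the standard calls for is simple enough to sketch with Python's standard-library robots.txt parser. The rules below are invented for illustration; they show the two things the standard provides: excluding a robot by name, and fencing off parts of the server.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical ROBOTS.TXT, in the format the Standard for Robot
# Exclusion defines: per-robot sections, each with Disallow rules.
robots_txt = """\
User-agent: WebMoose
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant robot asks before every fetch:
rp.can_fetch("WebMoose", "http://example.com/private/page.html")  # disallowed
rp.can_fetch("WebMoose", "http://example.com/index.html")         # allowed
```

In practice the robot would fetch http://site/robots.txt itself and, per the statistics mentioned above, note whether the file existed and how the server responded.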
In all HTTP requests it makes, WebMoose identifies itself with a User-Agent: header that looks like this:

    User-Agent: WebMoose/h.kk.bbbb

where h is a digit identifying the major version of the moose, and kk is a pair of digits identifying the minor version of the moose. bbbb is a string of four digits identifying the build number of the moose. The build number increments each and every time the moose is recompiled, and that happens very often since WebMoose is still under development. At the exact time of this writing, WebMoose uses the string:

    User-Agent: WebMoose/0.01.0077

to identify itself. Undoubtedly, the last four digits have been incremented since, because I just thought of something else to fix.
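The h.kk.bbbb scheme is easy to pick apart mechanically; here is a sketch of doing so with a regular expression (the function name is mine, not part of WebMoose):

```python
import re

# Matches the h.kk.bbbb scheme described above: one major-version
# digit, two minor-version digits, four build-number digits.
UA_PATTERN = re.compile(r"WebMoose/(\d)\.(\d{2})\.(\d{4})")

def parse_moose_version(header: str):
    """Return (major, minor, build) from a WebMoose User-Agent header."""
    m = UA_PATTERN.search(header)
    if m is None:
        raise ValueError("not a WebMoose User-Agent header")
    return tuple(int(part) for part in m.groups())

major, minor, build = parse_moose_version("User-Agent: WebMoose/0.01.0077")
# major == 0, minor == 1, build == 77
```

A server log analyzer could use the build number this way to see exactly which incarnation of the moose came calling.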