whitehouse.gov/robots.txt

Major Variola (ret) mv at cdc.gov
Thu Dec 11 09:49:32 PST 2003


I'd suggest "wget" for spidering sites. It can be told to ignore
robots.txt files. It is also good for mirroring sites that you
suspect may be taken down. Windows and Unix versions are available.
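
Here is a minimal sketch, in Python, of the kind of wget invocation described above. It assumes a wget that accepts the wgetrc command `-e robots=off` (GNU Wget does). The helper names `build_mirror_command` and `mirror` are hypothetical. The flags beyond ignoring robots.txt are my own choices, not the poster's: link conversion, page requisites, no-parent, and a polite wait between requests.

```python
import shutil
import subprocess


def build_mirror_command(url: str, wait_seconds: int = 1) -> list[str]:
    """Build a wget argument list that mirrors `url` while ignoring robots.txt.

    Hypothetical helper; the flag selection is an assumption for illustration.
    """
    return [
        "wget",
        "--mirror",                # recursive, timestamped, unlimited depth
        "-e", "robots=off",        # wgetrc command: do not honor robots.txt
        "--convert-links",         # rewrite links so the local copy browses offline
        "--page-requisites",       # also fetch images, CSS, etc. each page needs
        "--no-parent",             # never ascend above the starting directory
        f"--wait={wait_seconds}",  # pause between requests to spare the server
        url,
    ]


def mirror(url: str) -> int:
    """Run wget if it is on PATH and return its exit status."""
    if shutil.which("wget") is None:
        raise FileNotFoundError("wget is not on PATH")
    return subprocess.run(build_mirror_command(url), check=False).returncode


if __name__ == "__main__":
    # Print the command rather than run it; the URL is illustrative only.
    print(" ".join(build_mirror_command("http://www.whitehouse.gov/")))
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before starting a long crawl. Passing a list to `subprocess.run` also avoids shell quoting problems with odd URLs.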


More information about the cypherpunks-legacy mailing list