grarpamp at gmail.com
Thu Sep 9 23:08:43 PDT 2021
This one was easy enough to write: fetch the page, parse it, and
then fetch the real page data. GUI browsers do all that
behind the scenes.
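A minimal Python sketch of that two-step fetch. It assumes the page shell references its data as a `.json` URL in a `src` or `href` attribute. Real frameworks vary, and many embed the endpoint in inline script instead. The names `find_json_refs`, `fetch` and `fetch_page_data` are hypothetical, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class JsonRefCollector(HTMLParser):
    """Collect src/href attribute values that point at .json resources."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Ignore any query string when checking the extension.
            if name in ("src", "href") and value and value.split("?", 1)[0].endswith(".json"):
                self.refs.append(value)


def find_json_refs(html, base_url):
    """Parse the page shell and return absolute URLs of its JSON data."""
    parser = JsonRefCollector()
    parser.feed(html)
    parser.close()
    return [urllib.parse.urljoin(base_url, ref) for ref in parser.refs]


def fetch(url, timeout=30):
    """Plain GET, decoded with the charset the server declares (UTF-8 fallback)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_page_data(url):
    """Fetch the shell, find the data URLs, then fetch and decode each one."""
    shell = fetch(url)
    return [json.loads(fetch(ref)) for ref in find_json_refs(shell, url)]
```

Keeping the parse step separate from the network calls means it can be tried on a saved page without touching the site.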
Wget '-r' style spidering would be needed for users
to copy sites that use js/json frameworks sitewide.
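The recursive part of a `wget -r` style spider is a breadth-first walk over same-host links. This is a sketch only. `spider` and `extract_links` are hypothetical names, `fetch` is any callable that returns HTML for a URL, and `max_pages` is an assumed safety cap. Links that only exist after JS runs would not be found this way.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Absolute link URLs with #fragments stripped, so /#top == /."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    out = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        out.append(absolute)
    return out


def spider(start_url, fetch, max_pages=100):
    """Breadth-first crawl restricted to the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        for link in extract_links(html, url):
            # Off-host links (and mailto: etc., which have no netloc) are skipped.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing `fetch` in, rather than calling the network directly, lets the same walk run over a dict of saved pages for testing.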
I'll look into this thread to handle more complex sites,
and to keep from writing and maintaining site-specific
Endless scroll, pages with 1MB of var-encoded,
obfuscated js spread across multiple script sources
just to display 100 lines of text... such wow.
Whatever happened to plain old html, some href
links, and form fields to submit?
Ecom marketing cluttering search... :(
There might be a use for a spider that streams everything
it scrapes in realtime into one formatted output pipe,
instead of writing to disk first and then feeding the indexer.
Something like continuous WARC to stdout/pipe.
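One way to read "continuous WARC to stdout" is to serialize each fetched page as a WARC/1.1 `resource` record and flush it to the pipe immediately, so a downstream indexer can consume records as they arrive. This is a minimal sketch. It assumes only the mandatory headers plus `WARC-Target-URI` and `Content-Type`, and it has not been checked against a WARC validator. `warc_resource_record` and `stream_records` are hypothetical names.

```python
import sys
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri, payload, content_type="text/html"):
    """Serialize one fetched page as a WARC/1.1 'resource' record (bytes)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.1\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Header block, blank line, payload, then the two CRLFs that end a record.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


def stream_records(pages, out=None):
    """Write (uri, payload_bytes) pairs to out (default stdout) as they arrive."""
    if out is None:
        out = sys.stdout.buffer
    for uri, payload in pages:
        out.write(warc_resource_record(uri, payload))
        out.flush()  # push each record down the pipe immediately


if __name__ == "__main__":
    stream_records([("https://example.org/", b"<html>hello</html>")])
```

If `pages` is a generator fed by the spider, something like `python spider.py | some-indexer` (both names hypothetical) never touches the disk.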
Parallel threads working different trees of the namespace.
Auto nudge in case one gets stuck in the weeds.
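A sketch of the parallel and nudge ideas. It assumes "trees of the namespace" means the first path segment of each URL. It also assumes a "nudge" simply abandons a request that exceeds its timeout and retries it a bounded number of times. `fetch(url, timeout)` is a hypothetical callable expected to raise `TimeoutError` on a stall. On Python 3.10+, `socket.timeout` is an alias of `TimeoutError`.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def subtree_of(url):
    """Partition key: first path segment, e.g. /docs/a.html -> 'docs'."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else ""


def crawl_subtree(urls, fetch, timeout=10.0, retries=2):
    """Fetch one subtree's URLs sequentially, retrying requests that stall."""
    results = {}
    for url in urls:
        for _ in range(retries + 1):
            try:
                results[url] = fetch(url, timeout)
                break
            except TimeoutError:
                continue  # the "nudge": drop the stalled request and try again
        else:
            results[url] = None  # still stuck after every retry; move on
    return results


def parallel_crawl(urls, fetch, max_workers=4, **kwargs):
    """One task per subtree, run on up to max_workers threads; results merged."""
    trees = defaultdict(list)
    for url in urls:
        trees[subtree_of(url)].append(url)
    merged = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(crawl_subtree, group, fetch, **kwargs)
                   for group in trees.values()]
        for future in futures:
            merged.update(future.result())
    return merged
```

Python threads cannot be killed from outside, so this kind of nudge only works if `fetch` honors its timeout.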
Seeding inventory requires the destination
market to provide an API for loading, or the seeder
to write the shim loader, which you could then sell.
Connect the CLI to the darknets, one giant API :)