
-----BEGIN PGP SIGNED MESSAGE-----

On Mon, 8 Jul 1996, Igor Chudov @ home wrote:
Ben Holiday wrote:
If you have access to a shell, and to the news spool, you can generate some quick lists by hopping into the directory of any newsgroup that interests you and doing:
cat * | tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq > my-big-ol-wordlist
With most Unixes, that will generate an alphabetized list of all the unique words in your source text, converted to lowercase. I've had some problems with tr on a few machines, however. Adding a '-c' after 'uniq' will tell you how many times each word occurred (useful for grepping out words that appear too infrequently or too frequently).
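As a rough sketch of the 'uniq -c' idea above, the counts can be filtered with awk to keep only words inside a frequency window. The function names wordfreq and filter_words, and the sample sentence, are made up for illustration; they are not part of the original recipe. The sed step is an added assumption to drop the empty line that tr emits when the input starts with a non-letter.

```shell
#!/bin/sh
# wordfreq (hypothetical name): read text on stdin and print
# "count word" lines, one per unique lowercased word.
wordfreq() {
    tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' | sed '/^$/d' | sort | uniq -c
}

# filter_words MIN MAX (hypothetical name): keep only the words whose
# count lies between MIN and MAX, inclusive, and print the bare words.
filter_words() {
    awk -v min="$1" -v max="$2" '$1 >= min && $1 <= max { print $2 }'
}

# Sample run on a made-up sentence; in a spool directory this would be
# something like: cat * | wordfreq | filter_words 2 50
printf 'The cat and the dog and the bird\n' | wordfreq | filter_words 2 2
```

The sample run keeps only the words that occur exactly twice. On machines where tr misbehaves, the bracketed forms '[A-Z]' and '[a-z]' may be worth trying, since some older tr implementations expect ranges written that way.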
Actually I am fairly sure that your selection of words will be mediocre at best. There are words (such as nethermost, insatiable, insufferable) that are almost never used in news.
According to Altavista:

nethermost - 45
insatiable - 200
insufferable - 200

I know I have too much free time.

- -- Mark
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
markm@voicenet.com              | finger -l for PGP key 0xe3bf2169
http://www.voicenet.com/~markm/ | d61734f2800486ae6f79bfeb70f95348
"Freedom is the freedom to say that two plus two make four. If that is
granted, all else follows." --George Orwell, _1984_

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3
Charset: noconv

iQCVAwUBMeHp9bZc+sv5siulAQHjCgP6A/OuKaX/NwlkO9zhzbX2sBdKzajdKHHC
FegZI5jIMd9hSFUb1iPUzw5H8YVaCQFDrighNnxLYvncAHB5dxAnRz52XjH4PFxj
kDsH3CC3fN+x3Oh88HOwfcDKMiEAFbUkj+xSR5w6yxPt3mg9E27/xPef1Yg8bUWl
gbsK/V0emcU=
=Pr0B
-----END PGP SIGNATURE-----