[ot][spam][crazy] Being a tiny part of a goal

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Sat Apr 2 07:14:20 PDT 2022


spm.SentencePieceTrainer.train can take a sentence_iterator keyword
parameter (kwarg)

this likely iterates over sentences to train on.

whether each sentence needs a trailing linebreak, or can easily include
linebreaks, is unknown.
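
a minimal sketch of how the kwarg might be used, under assumptions: the
helper names iter_sentences and train_tokenizer are made up for
illustration, and vocab_size=1000 is an arbitrary choice. the
sentence_iterator and model_writer kwargs are the ones shown in the
sentencepiece python README. since linebreak handling is unknown here, the
sketch sidesteps it by splitting on linebreaks first, so each yielded
sentence contains none.

```python
import io


def iter_sentences(texts):
    """Yield one non-empty sentence per line, with no linebreaks inside.

    `texts` is any iterable of strings (hypothetical input format).
    """
    for text in texts:
        for line in text.splitlines():
            line = line.strip()
            if line:
                yield line


def train_tokenizer(texts, vocab_size=1000):
    """Train a sentencepiece model in memory; return the serialized bytes.

    vocab_size is an assumed value; it has to suit the corpus size.
    """
    # imported here so iter_sentences stays usable without the library
    import sentencepiece as spm

    model = io.BytesIO()
    spm.SentencePieceTrainer.train(
        sentence_iterator=iter_sentences(texts),
        model_writer=model,
        vocab_size=vocab_size,
    )
    return model.getvalue()
```

the returned bytes could then likely be loaded with
spm.SentencePieceProcessor(model_proto=...). a very small corpus will
probably need a much smaller vocab_size than the one assumed above.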

so some data could help.

we are taxed. this is one reason to use data manually transcribed by
others. but the real reason is that it is just the tokenizer, so the extra
information doesn't flow through the model in a certain way.