[ot][spam][crazy] Being a tiny part of a goal

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Sat Apr 2 06:38:08 PDT 2022


Some cross-posts ended up in the other thread.

This looks useful:

https://github.com/google/sentencepiece/blob/master/python/README.md#training-without-local-filesystem

import urllib.request
import io
import sentencepiece as spm

# Streams the training corpus from a URL as a sentence iterator and
# writes the trained model into a BytesIO buffer.
model = io.BytesIO()
with urllib.request.urlopen(
    'https://raw.githubusercontent.com/google/sentencepiece/master/data/botchan.txt'
) as response:
  spm.SentencePieceTrainer.train(
      sentence_iterator=response, model_writer=model, vocab_size=1000)

# Optionally write the serialized model to a file.
# with open('out.model', 'wb') as f:
#   f.write(model.getvalue())

# Load the model directly from the serialized bytes.
sp = spm.SentencePieceProcessor(model_proto=model.getvalue())
print(sp.encode('this is test'))
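
The same pattern should work for a corpus that is already in memory, since sentence_iterator only needs something that yields lines. A minimal sketch, assuming sentencepiece is installed; iter_sentences and train_in_memory are hypothetical helper names of mine, not part of the library, and only the train() keywords shown in the README snippet above are used:

```python
import io
from typing import Iterable, Iterator, Union


def iter_sentences(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield non-empty, stripped text lines from a bytes or str line source."""
    for raw in lines:
        # HTTP responses and binary files yield bytes; lists usually hold str.
        text = raw.decode("utf-8", errors="replace") if isinstance(raw, bytes) else raw
        text = text.strip()
        if text:
            yield text


def train_in_memory(lines: Iterable[Union[bytes, str]], vocab_size: int) -> bytes:
    """Train a model from an in-memory line source and return its serialized bytes.

    Assumption: the sentencepiece package is available. It is imported here,
    lazily, so iter_sentences can be used on its own without it.
    """
    import sentencepiece as spm

    model = io.BytesIO()
    spm.SentencePieceTrainer.train(
        sentence_iterator=iter_sentences(lines),
        model_writer=model,
        vocab_size=vocab_size,
    )
    return model.getvalue()
```

Something like train_in_memory(open('corpus.txt', 'rb'), vocab_size=1000) would then replace the urlopen block, with corpus.txt standing in for any local text. The returned bytes can be passed to SentencePieceProcessor(model_proto=...) as above. Note that vocab_size has to be small enough for the corpus, or training is expected to fail on tiny inputs.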