[ot][spam][crazy] Quickly autotranscribing xkcd 4/1 correctly

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Sat Apr 2 02:22:57 PDT 2022


here is example code for speech transcription from the huggingface
docs. note that this approach is meant for short chunks of speech,
not a long recording. the dataset wrapper obscures how the audio is
handed to the model, but underneath it is just an array of sample
values.
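
concretely, reading a wav with soundfile (the filename below is just
a placeholder) yields a plain numpy float array of samples, which is
all the model ultimately consumes:

import soundfile as sf

speech, sample_rate = sf.read("example.wav")  # placeholder path
# e.g. <class 'numpy.ndarray'> float64 (n_samples,) for mono audio
print(type(speech), speech.dtype, speech.shape)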

import torch
from transformers import Speech2TextProcessor, Speech2TextForConditionalGeneration
from datasets import load_dataset
import soundfile as sf

# pretrained speech-to-text model and its matching processor
model = Speech2TextForConditionalGeneration.from_pretrained("facebook/s2t-small-librispeech-asr")
processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")

def map_to_array(batch):
    # read each audio file into a plain float array of samples
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# tiny librispeech split used by the docs for demonstration
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

# the processor turns raw samples into log-mel input features plus an attention mask
inputs = processor(ds["speech"][0], sampling_rate=16_000, return_tensors="pt")
generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])

# decode the generated token ids back into text
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
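
to make the short-chunk limitation concrete, here is a minimal
untested sketch that skips the dataset entirely: it reads a local
file (the path "recording.wav" is a placeholder, assumed to already
be 16 kHz mono) and transcribes it in fixed 20-second chunks, which
is the crude way to handle a longer recording with this model.

import soundfile as sf
from transformers import Speech2TextProcessor, Speech2TextForConditionalGeneration

model = Speech2TextForConditionalGeneration.from_pretrained("facebook/s2t-small-librispeech-asr")
processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")

# placeholder path; assumed to be 16 kHz mono already
speech, sample_rate = sf.read("recording.wav")
assert sample_rate == 16_000, "model expects 16 kHz audio"

# naive fixed-size chunking; a real long-form pipeline would overlap chunks
# or split on silence so words are not cut in half
chunk_samples = 20 * sample_rate
pieces = []
for start in range(0, len(speech), chunk_samples):
    chunk = speech[start:start + chunk_samples]
    inputs = processor(chunk, sampling_rate=sample_rate, return_tensors="pt")
    generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
    pieces.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))

transcription = " ".join(pieces)
print(transcription)

cutting on fixed boundaries will mangle words at the seams; the point
is only that the model needs arrays of samples, not the dataset
machinery.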

