[ot][spam][crazy] Quickly autotranscribing xkcd 4/1 correctly
Undiscussed Horrific Abuse, One Victim of Many
gmkarl at gmail.com
Sat Apr 2 02:30:37 PDT 2022
Here is the Python documentation on the soundfile.read function. We'd
be reading in chunks that are appropriately sized for the transformer
model in use, and resampling them to match the sample rate the model
expects.
Help on function read in module soundfile:
read(file, frames=-1, start=0, stop=None, dtype='float64',
always_2d=False, fill_value=None, out=None, samplerate=None,
channels=None, format=None, subtype=None, endian=None, closefd=True)
Provide audio data from a sound file as NumPy array.
By default, the whole file is read from the beginning, but the
position to start reading can be specified with `start` and the
number of frames to read can be specified with `frames`.
Alternatively, a range can be specified with `start` and `stop`.
If there is less data left in the file than requested, the rest of
the frames are filled with `fill_value`.
If no `fill_value` is specified, a smaller array is returned.
Parameters
----------
file : str or int or file-like object
The file to read from. See :class:`SoundFile` for details.
frames : int, optional
The number of frames to read. If `frames` is negative, the whole
rest of the file is read. Not allowed if `stop` is given.
start : int, optional
Where to start reading. A negative value counts from the end.
stop : int, optional
The index after the last frame to be read. A negative value
counts from the end. Not allowed if `frames` is given.
dtype : {'float64', 'float32', 'int32', 'int16'}, optional
Data type of the returned array, by default ``'float64'``.
Floating point audio data is typically in the range from
``-1.0`` to ``1.0``. Integer data is in the range from
``-2**15`` to ``2**15-1`` for ``'int16'`` and from ``-2**31`` to
``2**31-1`` for ``'int32'``.
.. note:: Reading int values from a float file will *not*
scale the data to [-1.0, 1.0). If the file contains
``np.array([42.6], dtype='float32')``, you will read
``np.array([43], dtype='int32')`` for ``dtype='int32'``.
Returns
-------
audiodata : numpy.ndarray or type(out)
A two-dimensional (frames x channels) NumPy array is returned.
If the sound file has only one channel, a one-dimensional array
is returned. Use ``always_2d=True`` to return a two-dimensional
array anyway.
If `out` was specified, it is returned. If `out` has more
frames than available in the file (or if `frames` is smaller
than the length of `out`) and no `fill_value` is given, then
only a part of `out` is overwritten and a view containing all
valid frames is returned.
samplerate : int
The sample rate of the audio file.
Other Parameters
----------------
always_2d : bool, optional
By default, reading a mono sound file will return a
one-dimensional array. With ``always_2d=True``, audio data is
always returned as a two-dimensional array, even if the audio
file has only one channel.
fill_value : float, optional
If more frames are requested than available in the file, the
rest of the output is filled with `fill_value`. If
`fill_value` is not specified, a smaller array is returned.
out : numpy.ndarray or subclass, optional
If `out` is specified, the data is written into the given array
instead of creating a new array. In this case, the arguments
`dtype` and `always_2d` are silently ignored! If `frames` is
not given, it is obtained from the length of `out`.
samplerate, channels, format, subtype, endian, closefd
See :class:`SoundFile`.
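
A rough sketch of the chunked read and resample described at the top,
using only the `start`, `frames`, `dtype`, `always_2d` and `fill_value`
arguments documented above. The helper names, the 30-second chunk
length and the 16000 Hz target rate are my assumptions for
illustration, not anything a particular model requires. The linear
interpolation is a stand-in with no anti-aliasing filter; a real
pipeline would likely want a proper resampler.

```python
def chunk_ranges(total_frames, chunk_frames):
    """Yield (start, frames) pairs covering total_frames in fixed-size steps."""
    if chunk_frames <= 0:
        raise ValueError("chunk_frames must be positive")
    for start in range(0, total_frames, chunk_frames):
        yield start, min(chunk_frames, total_frames - start)


def resample_linear(samples, src_rate, dst_rate):
    """Resample a 1-D sequence by linear interpolation (no anti-aliasing)."""
    if src_rate == dst_rate or len(samples) == 0:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def read_chunks(path, chunk_seconds=30.0, model_rate=16000):
    """Yield mono chunks of the file at `path`, resampled to model_rate.

    chunk_seconds and model_rate are hypothetical defaults.
    """
    import soundfile as sf  # imported lazily; the helpers above don't need it

    info = sf.info(path)
    chunk_frames = int(chunk_seconds * info.samplerate)
    for start, _valid in chunk_ranges(info.frames, chunk_frames):
        # fill_value pads the final short chunk with zeros so every chunk
        # has the same length, as the documentation above describes.
        data, rate = sf.read(path, frames=chunk_frames, start=start,
                             dtype='float32', always_2d=True,
                             fill_value=0.0)
        mono = data.mean(axis=1)           # average channels down to mono
        yield resample_linear(mono.tolist(), rate, model_rate)
```

Leaving out `fill_value` would instead return a shorter array for the
last chunk, which may be preferable if the model handles padding itself.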