[ot][spam][random][crazy][random][crazy]

Undescribed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Sat Nov 12 14:49:57 PST 2022


ok, to debug this, I've got to understand it again !!!!

i'll seed numpy's random number generator so the numbers are deterministic.

 16 B   def test():
 17  ->     np.random.seed(0)
 18         randvec = np.random.random(16)

(Pdb) n
> /shared/src/scratch/fourier.py(18)test()
-> randvec = np.random.random(16)
(Pdb)
> /shared/src/scratch/fourier.py(19)test()
-> ift16 = create_freq2time(1, 1, 16, 16)
(Pdb) p randvec
array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ,
       0.64589411, 0.43758721, 0.891773  , 0.96366276, 0.38344152,
       0.79172504, 0.52889492, 0.56804456, 0.92559664, 0.07103606,
       0.0871293 ])

The failing test treats randvec as a frequency vector, so I think
its first element is a DC offset with magnitude 0.5488.
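Quick sanity check (a standalone sketch, not part of the test file): bin 0 of np.fft.fftfreq is frequency zero, and a spectrum that is nonzero only in bin 0 inverse-transforms to a constant signal, i.e. a pure DC offset. Under numpy's convention the constant comes out as coefficient / N.

```python
import numpy as np

np.random.seed(0)
randvec = np.random.random(16)

# fftfreq bin 0 is frequency zero: the DC bin.
freqs = np.fft.fftfreq(len(randvec))
assert freqs[0] == 0.0

# A spectrum with only bin 0 set inverse-transforms to a constant
# signal; numpy's ifft applies a 1/N factor, so the constant is
# randvec[0] / 16 here.
dc_only = np.zeros(len(randvec), dtype=complex)
dc_only[0] = randvec[0]
time_signal = np.fft.ifft(dc_only)
assert np.allclose(time_signal, randvec[0] / len(randvec))
```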

 27
 28         # sample data at a differing rate
 29 B       time_rate = np.random.random() * 2
 30         freq_rate = 1.0
 31         freqs = np.fft.fftfreq(len(randvec))
 32  ->     rescaling_ift = create_freq2time(freq_rate, time_rate, len(randvec), len(randvec))
 33         rescaling_ft = create_time2freq(time_rate, freq_rate, len(randvec), len(randvec))

(Pdb) p time_rate
0.04043679488065144

Oh, that's very slow. It's interesting that it should still be able to
extract the wave, since it's all simulated math. I wonder what that
means for physical systems: are signals _really_ made of _precise_
sinusoids? If they are, you should be able to extract them at a
samplerate that's off by a factor of ~25, with only a small number of
samples! I guess you'd need incredibly good measuring equipment.

So I'm going to step into freq2time and look at how the matrix is
generated. The lines that follow in the test generate the data
manually, and that's a prime candidate for mistakes, since it's
otherwise untested.

(Pdb) list
  1     import numpy as np
  2
  3     def create_freq2time(freq_rate, time_rate, freq_count, time_count):
  4         freqs = np.fft.fftfreq(freq_count)
  5         offsets = np.arange(freq_count) * freq_rate / time_rate
  6  ->     mat = np.exp(2j * np.pi * np.outer(freqs, offsets))
  7         return mat / freq_count # scaled to match numpy convention
  8
  9     def create_time2freq(time_rate, freq_rate, time_count, freq_count):
 10         if time_count != freq_count:
 11             raise NotImplementedError("differing input and output samplecounts")
(Pdb) p freqs
array([ 0.    ,  0.0625,  0.125 ,  0.1875,  0.25  ,  0.3125,  0.375 ,
        0.4375, -0.5   , -0.4375, -0.375 , -0.3125, -0.25  , -0.1875,
       -0.125 , -0.0625])
(Pdb) p offsets
array([  0.        ,  24.72995209,  49.45990418,  74.18985626,
        98.91980835, 123.64976044, 148.37971253, 173.10966462,
       197.83961671, 222.56956879, 247.29952088, 272.02947297,
       296.75942506, 321.48937715, 346.21932924, 370.94928132])

Since the time samplerate is very low, it makes sense that the offsets
used to evaluate the frequencies are very large. That much looks
right.
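To double-check those offsets, here's a standalone sketch reproducing them from the same seed (the constants in the asserts are copied from the pdb output above). Each time step spans freq_rate / time_rate ~= 24.73 units of phase evaluation, which is just the reciprocal of the slow samplerate.

```python
import numpy as np

np.random.seed(0)
randvec = np.random.random(16)       # consumes the same 16 draws as the test
time_rate = np.random.random() * 2   # the 17th draw, times 2
freq_rate = 1.0

# offsets as create_freq2time computes them: one per output sample,
# spaced by the ratio of the two rates.
offsets = np.arange(len(randvec)) * freq_rate / time_rate
assert np.isclose(time_rate, 0.04043679488065144)
assert np.isclose(offsets[1], 1 / time_rate)   # ~24.73 phase units per step
assert np.isclose(offsets[1], 24.72995209)
assert np.isclose(offsets[-1], 370.94928132)
```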

Time to think about the matrix. The frequencies are the first
dimension (the outer one), and the offsets are the second dimension
(the inner one). So the frequencies will be the .... rows. And the
offsets will be the ... columns? I think? Yes, moving within a row
steps through the inner dimension, which is the columns.

So visually, looking at the mat, it will be laid out like the actual
signal, with time going from left to right, and the vertical parts
will be the frequencies, with DC at the top.
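A toy example to confirm the orientation of np.outer (a standalone sketch; the tiny 2x3 arrays are made up for illustration): the first argument runs down the rows, the second across the columns, so the frequency axis is vertical and time runs left to right.

```python
import numpy as np

# np.outer(a, b)[i, j] == a[i] * b[j]: the first argument indexes
# rows, the second indexes columns.
freqs = np.array([0.0, 0.25])        # toy frequencies (rows)
offsets = np.array([0.0, 1.0, 2.0])  # toy time offsets (columns)
mat = np.exp(2j * np.pi * np.outer(freqs, offsets))

assert mat.shape == (len(freqs), len(offsets))
# Row 0 is the DC row: all ones.
assert np.allclose(mat[0], 1.0)
# Row 1 advances a quarter cycle per column: 1, i, -1.
assert np.allclose(mat[1], [1, 1j, -1])
```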

(Pdb) n
> /shared/src/scratch/fourier.py(7)create_freq2time()
-> return mat / freq_count # scaled to match numpy convention
(Pdb) p mat / freq_count
array([[ 0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ],
       [ 0.0625    +0.j        , -0.05994975-0.01767137j,
         0.05250712+0.03390062j, -0.04077949-0.04736331j,
         0.02572393+0.05696077j, -0.0085691 -0.06190978j,
        -0.00928505+0.06180645j,  0.02638146-0.05665923j,
        -0.04132493+0.04688816j,  0.05289596-0.03329065j,
        -0.06015026+0.01697635j,  0.06249581+0.00072336j,
        -0.05974121-0.01836403j,  0.05211125+0.03450605j,
        -0.04022859-0.04783211j,  0.02506296+0.05725468j],

There's the first row of DC components, and the second row handles
the first frequency, 0.0625 cycles/sample, or 1/16th of a cycle per
sample.

I'm immediately noticing that I divided by freq_count. I don't believe
I did that in my comparison code.

What does it mean here? This is a matrix for converting from
frequency space to time space. When rendering a wave, the spectrum is
multiplied by this matrix, which sums all the frequency components.

So, if I am rendering a wave manually, I should multiply all the
sinusoids by randvec, and then take the _average_ of the result, not
the sum, because of the numpy convention of storing spectra scaled
by the sample count.
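To convince myself the scaling is right, here's a quick standalone sketch (restating create_freq2time from the listing above, not the full test file): with both rates at 1, multiplying a spectrum by the matrix reproduces np.fft.ifft exactly, 1/N factor included, which is why the manual rendering needs a mean rather than a sum.

```python
import numpy as np

def create_freq2time(freq_rate, time_rate, freq_count, time_count):
    freqs = np.fft.fftfreq(freq_count)
    offsets = np.arange(freq_count) * freq_rate / time_rate
    mat = np.exp(2j * np.pi * np.outer(freqs, offsets))
    return mat / freq_count  # scaled to match numpy convention

np.random.seed(0)
randvec = np.random.random(16)

# With equal rates the matrix is exactly the inverse DFT, including
# the 1/N factor that numpy's ifft applies.
ift = create_freq2time(1, 1, 16, 16)
assert np.allclose(randvec @ ift, np.fft.ifft(randvec))
# The first time sample is the average of all frequency components.
assert np.isclose((randvec @ ift)[0].real, randvec.mean())
```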

Since my random number generator is seeded to 0 now, I can halt the
debugging, add the change, and resume.

It passes now. I don't believe the functions needed any changes.

Here's the new test:
    # sample data at a differing rate
    time_rate = np.random.random() * 2
    freq_rate = 1.0
    freqs = np.fft.fftfreq(len(randvec))
    rescaling_ift = create_freq2time(freq_rate, time_rate, len(randvec), len(randvec))
    rescaling_ft = create_time2freq(time_rate, freq_rate, len(randvec), len(randvec))
    rescaled_time_data = np.array([
        np.mean([
            randvec[freqidx] * np.exp(2j * np.pi * freqs[freqidx] * sampleidx / time_rate)
            for freqidx in range(len(randvec))
        ])
        for sampleidx in range(len(randvec))
    ])
    assert np.allclose(rescaled_time_data, randvec @ rescaling_ift)
    unscaled_freq_data = rescaled_time_data @ rescaling_ft
    unscaled_time_data = unscaled_freq_data @ ift16
    assert np.allclose(unscaled_freq_data, randvec)
    assert np.allclose(unscaled_time_data, randvec2time)

So, next would be trying to extract my funny-sampled random data. I am
hoping it can be exactly recovered!


More information about the cypherpunks mailing list