Ok, to debug this, I've got to understand it again! I'll seed numpy's random number generator so the numbers are deterministic.

 16 B       def test():
 17  ->         np.random.seed(0)
 18             randvec = np.random.random(16)
(Pdb) n
/shared/src/scratch/fourier.py(18)test()
-> randvec = np.random.random(16)
(Pdb)
/shared/src/scratch/fourier.py(19)test()
-> ift16 = create_freq2time(1, 1, 16, 16)
(Pdb) p randvec
array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ,
       0.64589411, 0.43758721, 0.891773  , 0.96366276, 0.38344152,
       0.79172504, 0.52889492, 0.56804456, 0.92559664, 0.07103606,
       0.0871293 ])
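As a quick aside, here's a tiny standalone snippet (not part of the test file) to sanity-check what bin 0 means under numpy's convention: the mean of the inverse transform should come out to bin 0 divided by the sample count.

```python
import numpy as np

np.random.seed(0)
randvec = np.random.random(16)

# Under numpy's convention the time-domain mean equals bin 0 divided
# by N, so bin 0 (0.5488...) is the DC term scaled by the sample count.
dc = np.fft.ifft(randvec).mean()
assert np.isclose(dc, randvec[0] / 16)
```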
The failing test is treating randvec as a frequency vector, so I think that 0.5488 in bin 0 is the magnitude of a DC offset.

 27
 28             # sample data at a differing rate
 29 B           time_rate = np.random.random() * 2
 30             freq_rate = 1.0
 31             freqs = np.fft.fftfreq(len(randvec))
 32  ->         rescaling_ift = create_freq2time(freq_rate, time_rate, len(randvec), len(randvec))
 33             rescaling_ft = create_time2freq(time_rate, freq_rate, len(randvec), len(randvec))
(Pdb) p time_rate
0.04043679488065144

Oh, that's very slow. It's interesting that it should still be able to extract the wave, since it's all simulated math. I wonder what that means for physical systems: are signals _really_ made of _precise_ sinusoids? If they are, you should be able to extract them at a samplerate that is 2000% wrong, with only a small number of samples! I guess you'd need incredibly good measuring equipment.

So I'm going to step into create_freq2time and look at how the matrix is generated. The test lines that follow the breakpoint generate the data manually, and that code is a big candidate for mistakes, since it's otherwise untested.

(Pdb) list
  1     import numpy as np
  2
  3     def create_freq2time(freq_rate, time_rate, freq_count, time_count):
  4         freqs = np.fft.fftfreq(freq_count)
  5         offsets = np.arange(freq_count) * freq_rate / time_rate
  6  ->     mat = np.exp(2j * np.pi * np.outer(freqs, offsets))
  7         return mat / freq_count  # scaled to match numpy convention
  8
  9     def create_time2freq(time_rate, freq_rate, time_count, freq_count):
 10         if time_count != freq_count:
 11             raise NotImplementedError("differing input and output samplecounts")
(Pdb) p freqs
array([ 0.    ,  0.0625,  0.125 ,  0.1875,  0.25  ,  0.3125,  0.375 ,
        0.4375, -0.5   , -0.4375, -0.375 , -0.3125, -0.25  , -0.1875,
       -0.125 , -0.0625])
(Pdb) p offsets
array([  0.        ,  24.72995209,  49.45990418,  74.18985626,
        98.91980835, 123.64976044, 148.37971253, 173.10966462,
       197.83961671, 222.56956879, 247.29952088, 272.02947297,
       296.75942506, 321.48937715, 346.21932924, 370.94928132])

Since the time samplerate is very low, it makes sense that the offsets used to evaluate the frequencies are very large. That much looks right.

Time to think about the matrix. The frequencies are the first argument to np.outer and the offsets are the second, so the frequencies become the rows and the offsets the columns: moving within a row steps along the last dimension, which is the columns. So visually the matrix is laid out like the actual signal, with time going from left to right, and the frequencies stacked vertically with DC at the top.

(Pdb) n
/shared/src/scratch/fourier.py(7)create_freq2time()
-> return mat / freq_count  # scaled to match numpy convention
(Pdb) p mat / freq_count
array([[ 0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ,
         0.0625    +0.j        ,  0.0625    +0.j        ],
       [ 0.0625    +0.j        , -0.05994975-0.01767137j,
         0.05250712+0.03390062j, -0.04077949-0.04736331j,
         0.02572393+0.05696077j, -0.0085691 -0.06190978j,
        -0.00928505+0.06180645j,  0.02638146-0.05665923j,
        -0.04132493+0.04688816j,  0.05289596-0.03329065j,
        -0.06015026+0.01697635j,  0.06249581+0.00072336j,
        -0.05974121-0.01836403j,  0.05211125+0.03450605j,
        -0.04022859-0.04783211j,  0.02506296+0.05725468j],
There's the first row of DC components, and the second row handling the first frequency, 0.0625, i.e. 1/16th of a cycle per sample. I immediately notice that I divided by freq_count. I don't believe I did that in my comparison code. What does it mean here? This is a matrix for converting from frequency space to time space. When rendering a wave, multiplying by this matrix sums all the frequency components. So if I am rendering a wave manually, I should multiply all the sinusoids by randvec and then take the _average_ of the result, not the sum, because of the numpy convention of storing spectrums scaled by the sample count.

Since my random number generator is seeded to 0 now, I can halt the debugging, add the change, and resume. It passes now. The functions didn't need any changes, I don't believe. Here's the new test:

    # sample data at a differing rate
    time_rate = np.random.random() * 2
    freq_rate = 1.0
    freqs = np.fft.fftfreq(len(randvec))
    rescaling_ift = create_freq2time(freq_rate, time_rate, len(randvec), len(randvec))
    rescaling_ft = create_time2freq(time_rate, freq_rate, len(randvec), len(randvec))
    rescaled_time_data = np.array([
        np.mean([
            randvec[freqidx] * np.exp(2j * np.pi * freqs[freqidx] * sampleidx / time_rate)
            for freqidx in range(len(randvec))
        ])
        for sampleidx in range(len(randvec))
    ])
    assert np.allclose(rescaled_time_data, randvec @ rescaling_ift)
    unscaled_freq_data = rescaled_time_data @ rescaling_ft
    unscaled_time_data = unscaled_freq_data @ ift16
    assert np.allclose(unscaled_freq_data, randvec)
    assert np.allclose(unscaled_time_data, randvec2time)

So, next would be trying to extract my funny-sampled random data. I'm hoping it can be exactly recovered!
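For my own future reference, the mean-not-sum convention boils down to this: numpy's ifft divides by N, so a manual reconstruction from a spectrum has to average the weighted sinusoids. A minimal standalone check of that (not from the test file, just the same identity at the ordinary matched samplerate):

```python
import numpy as np

np.random.seed(0)
X = np.random.random(16)          # a frequency-domain vector
freqs = np.fft.fftfreq(len(X))

# numpy's ifft divides by N, so the manual reconstruction takes the
# mean of the weighted sinusoids rather than their sum.
manual = np.array([
    np.mean(X * np.exp(2j * np.pi * freqs * n))
    for n in range(len(X))
])
assert np.allclose(manual, np.fft.ifft(X))
```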