I'm thinking, though: if you had a model like that (theirs isn't released yet; I should have picked an older paper, though they do of course describe how to train it), you could use synthetic data to gently finetune it to simply report the magnitude (or whatnot) of the sound in many parallel recordings, and then triangulate the location (see the sketch after this note). Alternatively, you could make each receiver also an emitter and generate real-world data to finetune for location directly.

Making each receiver an emitter rings of dystopian scenarios where a mind-control boss loudly tells everyone what to do all day. It reminds one that microphones everywhere have an issue: they can be accessed illegitimately to spy. Of course, everyone already has a cell phone anyway, which has multiple microphones and speakers, so it's kind of a moot point. Hey, if we made it for cell phones, then people could decide individually whether to run it, and in what manner. Maybe somebody could make one that detects how people behave when their cell phone is hacked? Anyway, it's just an intro puzzle, like all of them.
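A minimal sketch of the triangulation step, assuming the hypothetical finetuned model already yields a per-receiver range estimate (from magnitude or whatnot). The receiver positions, the range values, and everything else here are illustrative placeholders, not any real API:

```python
import numpy as np
from scipy.optimize import least_squares

# Known microphone positions in meters (assumed for illustration).
receivers = np.array([
    [0.0, 0.0],
    [5.0, 0.0],
    [0.0, 5.0],
    [5.0, 5.0],
])

# Stand-in for the model's per-receiver distance estimates.
ranges = np.array([3.2, 3.1, 3.3, 3.9])

def residuals(p):
    # One residual per receiver: geometric distance from candidate
    # point p to each mic, minus the model's range estimate.
    return np.linalg.norm(receivers - p, axis=1) - ranges

# Least-squares fit starting from the centroid; with three or more
# non-collinear receivers this pins down a 2-D source location even
# when the range estimates are noisy.
fit = least_squares(residuals, x0=receivers.mean(axis=0))
print("estimated source position:", fit.x)
```

The same least-squares shape works in 3-D with a fourth mic out of the plane, and with time-of-arrival differences instead of absolute ranges if the model is finetuned to output those instead.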