Some idle comments on speech over data...
A little bit of context first, since this message is going out to two lists to save me explaining things twice.

I posted on the cypherpunks list recently about a little hardware hack I had planned, asking for technical help (which I received - thanks to those of you who sent schematics etc!). I also received a few mails from people who didn't know the background to what I was doing and thought it was a fairly worthless exercise. So rather than explain again to everyone individually, I'll make this post to cypherpunks for everyone who is interested.

Now, there's a second list I'm on - netphone@moink.nmsu.edu - which was set up some time ago as a discussion group for people working on various independent projects to do with speech over data, mainly so that we could get together and swap notes. The group went rather quiet a few months back after a post from Henning Schulzrinne telling us about nevot - a speech over internet project that seemed pretty well advanced - so we all thought 'fine, this is being taken care of, there's less urgency for us to hack something in a hurry'.

Also at the time there was a lot of talk about various people working on secure speech over modems using either the Zyxel modem with built-in codec, or soundblaster cards. Unfortunately none of the people working in this area would stand up and be counted - all you'd hear would be pgp mail saying 'don't worry, it's happening, be cool, shut up, and don't rock the boat in public...' - for some reason the guys on this project seemed more paranoid than most. (The netphone group by the way was about *voice* over *net* - if someone happened to add encryption later, fine, but it wasn't in the group's stated goals, so we were all pretty open about what we were doing ourselves...) As far as I can see, nothing has actually happened on any of those secret projects - I suspect the problems were too difficult, or the kids doing them were just all talk.
Anyway, nevot looked like the way things were going to happen - except that when you actually fetched the code, you discovered that unless you had hardware support, none of the sound compression schemes gave good enough performance for real-time speech over v32bis, especially since you also had the overhead of slip or ppp. And it was *very much* a Unix program; it'll take a long time to port it to DOS I suspect, yet DOS (unfortunately, but it's a fact of life) is what most of the hardware out there is running... - very few of us can afford high-powered private sparcs, or 64K comms, which is what nevot et al need. (By 'et al', I actually mean Van Jacobson's 'vt', though I haven't actually seen that program yet, if it exists; I'm not sure of any other compressed-speech over net programs. 'netfone' isn't compressed and needs an ethernet.)

So, the situation is that we have a nice bit of research done on protocols and necessary housekeeping stuff like network lag recovery and silence detection, but no systems that run on cheap hardware or v32bis modems, which many more of us have access to than networked unices.

Well, one big problem now seems to be solved: Tony Robinson (on the netphone list; don't think he's a cypherpunk - pops up in comp.speech now and then) has written a new piece of ADPCM code with some new algorithms of his own, and he gets 8-bit samples down to 3 bits with it, which is pretty damn good, but even better than that it's *fucking fast*... - orders of magnitude faster than real-time. Anyone who has played with the GSM 'toast' program or the CELP demo that's around somewhere will appreciate what that means... And of course the compression is sampling rate independent. So I reckon that taking something like /dev/audio's 8000s/sec and just averaging successive samples down to 4000s/sec will give us a baud rate that fits very nicely into v32bis, thank you very much.
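To make the arithmetic concrete, here is a rough Python sketch of the rate-halving idea and the bit budget it implies. It is only an illustration under stated assumptions: the function names are made up for this example, the samples are assumed to be linear integers (uLaw bytes from /dev/audio would need decoding before averaging), 14400 is the top V.32bis line rate, and the figure ignores slip/ppp overhead. The second function is the matching playback step, plain linear interpolation back up to the original rate.

```python
V32BIS_BPS = 14400  # top V.32bis line rate, before any slip/ppp overhead


def halve_rate(samples):
    """Average successive pairs of linear samples (8000/sec in, 4000/sec out).

    A trailing unpaired sample is dropped.
    """
    return [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples) - 1, 2)]


def double_rate(samples):
    """Linear interpolation back up: insert the midpoint after each sample.

    The last sample has no successor, so it is simply repeated.
    """
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) // 2)
    return out


def payload_bps(sample_rate, bits_per_sample):
    """Raw bit rate of the compressed speech, with no protocol overhead."""
    return sample_rate * bits_per_sample


if __name__ == "__main__":
    print(halve_rate([0, 10, 20, 30]))            # [5, 25]
    print(double_rate([5, 25]))                   # [5, 15, 25, 25]
    print(payload_bps(8000, 3) <= V32BIS_BPS)     # False: 24000 is too much
    print(payload_bps(4000, 3) <= V32BIS_BPS)     # True: 12000 fits
```

At 3 bits per sample the full 8000s/sec stream would need 24000 bits/sec, while the halved stream needs 12000, which is where the headroom under v32bis comes from.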
And I know that 4000s/sec is adequate for speech because I've run experiments and tried it - trivial linear interpolation between samples to scale it back up to 8000s/sec, and out to /dev/audio, is perfectly intelligible. Not *great*, but intelligible.

I've passed this info on to Henning for nevot, if he's willing to try it, though I haven't received a reply. It's quite possible one of us is having mail problems. If there are any other nevot hackers here, and you want to have a go yourself, Tony's code is available from

  svr-ftp.eng.cam.ac.uk:comp.speech/sources/shorten-2.alpha.2.tar.Z

I really urge anyone experimenting with sound compression to try it.

Now, the next news: I told cypherpunks that my old uncle, who was an *electrical* engineer in the coal mines (ie high voltage stuff etc) is now taking night classes in electronics, just for fun - and was talking with me recently about suggestions for projects for his class... they're a bit short of ideas, and I'm *never* short of ideas :-) [so far they've been doing silly stuff like doorbell chimes, or a detector for finding mains cables in walls] So I suggested that he builds a cheap sound sampler that I can plug into the netphone project...

Well, lots of people wrote to me saying that this was wasted effort since a mass-market solution should be based on available kit like a soundblaster. To which I don't disagree - it should, and it will; it's just that the old guy is going to build *something* and I'd rather it was useful. [Thanks to all the people who suggested various ISDN codecs by the way, and gave info on driving PC ports - any more info is still welcome. Faxes of datasheets wouldn't go amiss either! - it's hard for me to get that sort of stuff...]

*but*... the scheme I'm going to use this sampler in *is* amenable to a mass-market solution.
The netphone people have heard this already, but here it is again for the cypherpunks:

<<<< start of repost from netphone list from 3 or 4 months ago:

: OK, here's the philosophy... we should have separate bits of hardware for
: each identifiable task, and string them together like unix pipes. That way
: we minimise CPU overhead *and* allow any individual task to be replaced
: by any hardware/software combination we already have that does the job.
:
: Specifically, I'm thinking of this:
:
: A) A cheap digitiser as suggested earlier. This has a mic socket for your
: average cheap microphone as with any cassette recorder, and a parallel port
: output that's compatible with bi-directional printer ports such as the one
: on the IBM PC. In the middle is an a2d. Maybe uLaw, maybe not.
:
: B) We have a compression board that has a parallel input port (getting data
: from the above) and a serial output port. In the middle is either a CPU
: or custom hardware. Doesn't matter.
:
: C) We have a crypto board which has a serial input port and a serial output
: port. In the middle is a CPU. Almost anything will do.
:
: With these, we can build any system we like. The three products are
: independent, thus letting us develop them in parallel, and (C) is probably
: just your own computer anyway. If you've got a really beefy CPU, all three
: might be real unix software pipes...
:
: The compression board can be either a micro running ADPCM (like shorten V2)
: or a DSP as discussed. I'd say we try both - you now have *all* the spec
: you need to build it: Centronics interface parallel in, taking bytes at
: 8000/sec, and rs232 out, writing data at 9600bps (leaving bandwidth for
: network layer over v32bis slip or ppp...). Again, if you have a spare
: machine, you could simply use a PC for this task as well. Would have to be
: your best one, but fast 486's will cope with some of the good compression
: schemes mentioned.
:
: The crypto board is an optional extra that doesn't affect the design of the
: rest of the system at all and needn't be discussed. If you're a PC user and
: don't have the CPU for crypto as well as everything else you're doing, borrow
: another PC and have it run as a filter reading from COM1 and writing to
: COM2... PCs are cheap and lying around all over the place. Or *any* old
: computer you have with two serial ports. An Amiga or whatever old junk
: you thought you'd never use again...
:
: So, in summary: you could implement this with existing kit *now* using
: one average sparc for the 8000/sec samples, one good PC for the high-quality
: compression, and one average PC for a crypto filter. Then feed the output
: into your internet machine (in my case, another PC talking SLIP down a
: v32bis modem) ... and then you replace the computers one by one as
: you build the custom devices.
:
: Pipelined parallelism is the way to go folks. I'm convinced this is
: the best way to get this project started. And the modular approach will
: be really attractive to home hackers who get the design off the net or
: more likely a magazine... not too much to get working at once - keeps them
: interested. Hey *there's* a thought! Whatever happened to Steve 'Circuit
: Cellar' Ciarcia? A three-part hardware series in Byte synchronised with
: articles on Clipper and 'how you can do it yourself' :) would be one hell
: of a coup... Anyone got good contacts at Byte?
:
: Graham
:
: PS *Do* we have any electronics hackers here, or should I go recruiting...?
: PPS Speech *output* left as an exercise for the reader :)
end of repost
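For anyone who'd rather see the 'string them together like unix pipes' idea as code, here is a toy Python sketch. Everything in it is an assumption made up for illustration: the stage names, the nibble-packing 'compressor' (which is *not* Tony Robinson's shorten code) and the constant-XOR 'crypto' stage (which is *not* real encryption, just a stand-in so there's a third box). The only point is that each stage reads a byte stream and writes one, so any stage can be swapped out or dropped without touching the others.

```python
from functools import reduce


def digitise(raw):
    """Stage A stand-in: hand over one sample byte at a time."""
    for b in raw:
        yield b


def compress(samples):
    """Stage B placeholder: keep the top 4 bits of each sample, two per byte.

    Crude and lossy; a real box would run proper ADPCM here instead.
    """
    it = iter(samples)
    for first in it:
        second = next(it, 0)  # pad with silence if the count is odd
        yield (first & 0xF0) | (second >> 4)


def encrypt(stream, key=0x5A):
    """Stage C placeholder: XOR with a constant. NOT secure, illustration only."""
    for b in stream:
        yield b ^ key


def pipeline(source, *stages):
    """Chain the stages like a shell pipe: source | stage1 | stage2 | ..."""
    return reduce(lambda data, stage: stage(data), stages, source)


if __name__ == "__main__":
    mic = bytes([0x12, 0x34, 0xAB, 0xCD])
    # Full chain: sampler -> compressor -> crypto
    print(bytes(pipeline(mic, digitise, compress, encrypt)).hex())  # 49f6
    # Crypto is an optional extra: drop it and nothing else changes
    print(bytes(pipeline(mic, digitise, compress)).hex())           # 13ac
```

Because the stages are lazy generators, nothing is buffered beyond what each one needs, which is roughly how the separate hardware boxes would behave.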
Since then, I've thought about the detailed design of the sampler part, and that's what I'm going to ask my uncle to build. In fact, I just went out and bought the $7 microphone today, and the kit box :-) For the benefit of the netphone list who didn't see my cypherpunks post, and to give a bit more detail to the c-p's who suggested there were fatal flaws in what I was thinking about, here's the hardware plan:

There's a cheap microphone, and an a2d and a parallel port, and a timer. The PC reads from the printer port and fetches a byte when the status byte says 'ready'. The PC does *not* have to do accurate timing, or work under interrupt - all it has to do is make sure it's dealt with the sample before it's time for another one. The sampler hardware does the 8000/sec (or more likely 4000/sec, switchable) timing, and doesn't say 'data ready' after a read until that length of time has elapsed.

I chose interruptless parallel rather than interrupt-driven serial because serial would either have to do expensive on-chip compression, or drive the PC at very high speed - and I know from extensive comms experience that PCs *really* can't stand up to much over 19.2K - low-end PCs *certainly* can't. With this scheme there's virtually no overhead at all in getting the data to the cpu - it's almost an idle process. Interrupt driven parallel wouldn't be too bad either, at 4000/sec, though it does make the code harder to write.

I'm very wary of predicating a project on tricky PC assembler - it can't be easy, otherwise there'd already be code (for example) to do something like getting bytes in a /dev/audio-like manner from say a SoundBlaster card.
And I haven't found it yet if there is (though someone mailed me today to say he thought he had a copy of such code somewhere).

Anyway, the point is that this box can be replaced by a soundblaster or by reading from /dev/audio - it's just another pipe in the series, but having a design lets us push it in some electronics magazine and get public awareness up another notch. But it's no big deal; it's not critical to the project.

To the people who said it really should be serial and compressed - yes, I agree - that's the *next* box - a little micro running Tony's code that has parallel in and serial out. Or that micro can be a PC as it stands - no problem. PCs are cheap and plentiful. And the next box takes that serial stream and encrypts it. That too can be a PC, or if you have a powerful PC, make it a process on the one that did the compression. And finally the serial data can be shoved down a modem directly, or you can use vt's protocols etc and send it over the net, or the same protocols over slip and send *that* down a modem. The modular approach lets you do all sorts of things.

Anyway, I was fed up with people talking about this project but never seeing anything working (except for Henning's laudable efforts - shame about the low-end users) so I got off my butt and am doing something about it myself.
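The polled, interruptless read from the sampler box described in the hardware plan can be sketched like this. It's a simulation only, not driver code: the READY bit value, the FakeSampler class and the 'one sample every N polls' timing are all hypothetical stand-ins for the real box and the real printer-port status byte. What it tries to show is that the box does the timing and the PC merely polls.

```python
READY = 0x80  # hypothetical 'data ready' bit in the status byte


class FakeSampler:
    """Stand-in for the sampler box: a byte becomes ready every `period` polls."""

    def __init__(self, data, period=3):
        self.data = list(data)
        self.period = period
        self.ticks = 0

    def status(self):
        # The box, not the PC, keeps time; each poll here counts as one tick.
        self.ticks += 1
        if self.data and self.ticks >= self.period:
            return READY
        return 0

    def read(self):
        # After a read the box withholds 'ready' until the next sample is due.
        self.ticks = 0
        return self.data.pop(0)


def fetch_samples(port, count):
    """Poll the status byte and fetch a sample whenever it says 'ready'.

    Returns the samples plus the number of polls that found nothing to do.
    The caller must not ask for more samples than the box will produce.
    """
    samples, idle_polls = [], 0
    while len(samples) < count:
        if port.status() & READY:
            samples.append(port.read())
        else:
            idle_polls += 1  # time the PC could spend compressing instead
    return samples, idle_polls


if __name__ == "__main__":
    print(fetch_samples(FakeSampler([1, 2, 3]), 3))  # ([1, 2, 3], 6)
```

The idle polls are the slack the PC has between samples; as long as it gets back to the port before the next sample is due, nothing is lost.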
If anyone else wants to join in, the tasks needing done are:

* add Tony Robinson's lossy compression to nevot

* get a nevot-compatible program running on DOS and Mac over SLIP (major project here, but I know at least one guy who's starting some work on a Mac project, and I want to twist his arm to make it nevot compatible rather than Mac-proprietary, if he's listening :-) )

* help me (in the next couple of weeks, since the old guy's class restarts soon) with details of chips that he can use in a sampler - I've been told that there are several 'combo codecs' or maybe isdn codecs that do almost all the work - if we can make something that's 100% data compatible (uLaw) with a Sun, so much the better! (makes for easier development cycle and testing over the internet)

* make nevot baud-rate/lag adaptive so that it works when scaled down to 14.4Kbaud and below (say when modems adapt to noise and run at 9600 - no problem, adjust the sample rate to 3000s/sec or whatever as appropriate)

* experiment with crude zero-crossing algorithms (the kind they used in kiddy micros 15 years ago with 1-bit speakers) to hack a *really* low baud rate fallback algorithm to add to the protocols in nevot so you can *guarantee* some speech getting through under even the worst conditions. (We're talking around 4800baud here folks... maybe even 2400 if in dire straits - there may be times when getting the info over is more important than sounding like a dying dalek...) [btw, the zero-crossing stuff is also sometimes known as 'time encoded speech']

* hack up a much cruder system than nevot, which works in half-duplex mode, for low-powered systems that can't do incoming and outgoing speech compression/decompression at once - make it a sort of old-fashioned ham-radio interface, where you do the equivalent of 'over' at the end of an utterance, and the whole lot is sent, stored, and played back at the right speed, even if transmission over the medium isn't fast enough to keep up with speech.
Such a program would *guarantee* that even the world's slowest modem would still allow crypto speech, even though the interface would take some getting used to for modern kids who never had the pleasure of half-duplex comms :-) This system needn't assume any specific underlying protocol - udp, tcp/ip, appletalk, whatever - just treat the comms medium as an error-corrected byte-stream and use what's available. Ie it'll work even if all you have is a 2400bd v42 modem... Oh, and make this code *portable* - the only device dependent bit it needs is 'put byte to comms port' and 'get byte from comms port' - you shouldn't even need to poll the port to see if data is ready, if you do it properly - remember, it's half-duplex: put stuff in the protocol you invent to turn the line around... Pretend it's very fast turnaround voice-mail if that makes you feel any better about it ;-)

--

That's about it for now. Actually I'm getting married in a couple of days (to another cypherpunk as it happens) so I'll be mostly off the net for two weeks, but please write if you've anything to say and I'll answer all your mail as soon as I'm back on line.

Thanks for reading all this!

Graham

PS If anyone has comments to make to everyone rather than to me, the cypherpunks list is cypherpunks@toad.com and the netphone list is netphone@moink.nmsu.edu - I recommend technical discussions to the latter and general comments or politics or questions to the former.
gtoal@gtoal.com