dev.nlited.com


Recording Audio


2016-02-16 23:27:52 chip Page 1568 📢 PUBLIC

Feb 16 2016


Capturing audio and encoding to MP3 using only javascript.

I wanted to record audio from a web page. I had assumed this would require some sort of Java or (shudder) ActiveX plug-in, but found that HTML5 now includes a "UserMedia" interface (getUserMedia) that allows the browser and javascript to capture audio on the user's machine. I found an example that pointed me in the right direction, and I was able to write an audio capture and MP3 encoder using only javascript.

TL;DR

I wrote my own PCM audio recorder and joined it to the lamejs library. The finished code is here: mp3.htm. A demo page is here.

WebRTC currently only works with Firefox. Temasys provides a plugin for IE and Safari, but I haven't tried it yet.


Gory Details: ImOk

I created a simple page to experiment with audio capture and encoding. To keep things simple, I am assuming the audio is 1 channel at a 44.1 kHz sample rate. I am using Firefox to view the page.

This code only works with Firefox. Browser-compatibility fixes would be most welcome.

The first step is to create a simple page to provide a UI and to contain the javascript. The lamejs docs suggest the encoder should be able to encode in real time. So instead of the usual sequence (start recording, capture, stop recording, export to WAV, encode), I will encode each audio buffer as it arrives and capture only the MP3 stream. There will be no separate encode step.

Note that the audio control and download link do not have a "src" or "href" attribute; these will be supplied later, when the output stream has been created.


mp3.htm:
    <html>
    <head>
      <title>MP3 Encoder</title>
      <script src="js/sprintf.js"></script>
      <script src="js/lame.min.js"></script>
    </head>
    <body>
      <h1>MP3 Encoder</h1>
      <audio id="AudioPlayer" controls="controls">
        <source type="audio/mpeg">
      </audio><br>
      <button id="btnPower" onclick="onPower()">Power</button>
      <button id="btnRecord" onclick="onRecord()" disabled>Record</button>
      <button id="btnStop" onclick="onStop()" disabled>Stop</button>
      <p>
        <a id="DownloadLink" download="recorded.mp3">recorded.mp3</a>
        <span id="status"></span>
      </p>
      <div id="log" style="width:100%; border:solid thin; margin:8px; padding:4px;">
        Waiting for scripts...<br>
      </div>
      <script id="taskUI" type="text/javascript">
        function onPower(btn) {
        }
        function onRecord(btn) {
        }
        function onStop(btn) {
        }
      </script>
    </body>
    </html>

I saved this to webv6/sites/imok/pub/mp3.htm and access it through my local nginx web server, using Firefox, as
http://localhost:8082/imok/mp3.htm

One of my goals is to avoid front-loading the page as much as possible. I won't initialize any of the audio components or the encoder library until the user requests it. The lame.min.js library is 156 KB. I am embedding it in the page for now; later I want to load it only when the user clicks the Power button.

I need to have some sort of debug feedback, so I create a simple log window and status update.


mp3.htm:
    <script id="logger" type="text/javascript">
      function log(fmt,args) {
        var text= this.sprintf.apply(this,arguments);
        var element= document.querySelector('#log');
        element.innerHTML+= text+"<br>\n";
      }
      function status(fmt,args) {
        var text= this.sprintf.apply(this,arguments);
        var element= document.getElementById('status');
        element.innerHTML= text;
      }
      (function(window) {
        log("Window loaded.");
      })(window);
    </script>

Clicking the Power button will create the audio interface.


mp3.htm:
    <script id="taskUI" type="text/javascript">
      var gAudio= null;        //Audio context
      var gAudioSrc= null;     //Audio source
      var gNode= null;         //The audio processor node
      var gIsRecording= false;
      var gPcmCt= 0;
      function onPower(btn) {
        if(!gAudio) {
          PowerOn();
        } else {
          PowerOff();
        }
      }
      function PowerOn() {
        try {
          //Browser compatibility
          if(!window.AudioContext)
            window.AudioContext= window.webkitAudioContext;
          navigator.getUserMedia= navigator.getUserMedia
            || navigator.webkitGetUserMedia
            || navigator.mozGetUserMedia
            || navigator.msGetUserMedia;
          if(!window.AudioContext) {
            log("ERR: No AudioContext.");
          } else if(!navigator.getUserMedia) {
            log("ERR: No UserMedia interface.");
          } else if(!(gAudio= new AudioContext())) {
            log("ERR: Unable to create Audio interface.");
          } else {
            var caps= { audio: true };
            navigator.getUserMedia(caps,onUserMedia,onFail);
          }
        } catch(ex) {
          log("ERR: Unable to find any audio support.");
        }
        function onFail(ex) {
          log("ERR: getUserMedia failed: %s",ex);
        }
      }
      function onUserMedia(stream) {
        if(!(gAudioSrc= gAudio.createMediaStreamSource(stream))) {
          log("ERR: Unable to create audio source.");
        } else {
          gPcmCt= 0;
          log("Power ON.");
          document.getElementById('btnRecord').disabled= false;
          document.getElementById('btnStop').disabled= false;
        }
      }
      function PowerOff() {
        if(gIsRecording) {
          log("ERR: PowerOff: You need to stop recording first.");
        } else {
          gAudioSrc= null;
          gAudio= null;
          log("Power OFF.");
          document.getElementById('btnRecord').disabled= true;
          document.getElementById('btnStop').disabled= true;
        }
      }
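
The vendor-prefixed navigator.getUserMedia used above was later superseded by the promise-based navigator.mediaDevices.getUserMedia. As one possible compatibility fix, the sketch below prefers the newer call and falls back to the prefixed ones. The helper name pickGetUserMedia is hypothetical (it is not part of mp3.htm), and passing the navigator object in as a parameter is my own choice so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper (not part of mp3.htm): returns a function that takes
// getUserMedia constraints and returns a Promise, or null if unsupported.
function pickGetUserMedia(nav) {
  // Prefer the standard promise-based API when the browser provides it.
  if (nav && nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === 'function') {
    return function (constraints) {
      return nav.mediaDevices.getUserMedia(constraints);
    };
  }
  // Fall back to the older callback-style, vendor-prefixed variants.
  var legacy = nav && (nav.getUserMedia || nav.webkitGetUserMedia ||
                       nav.mozGetUserMedia || nav.msGetUserMedia);
  if (!legacy) return null;
  return function (constraints) {
    return new Promise(function (resolve, reject) {
      legacy.call(nav, constraints, resolve, reject);
    });
  };
}

// In the page this might replace the direct call in PowerOn():
//   var getMedia= pickGetUserMedia(navigator);
//   if(getMedia) getMedia({ audio: true }).then(onUserMedia, onFail);
```

Either branch hands back the same shape (constraints in, promise out), so the rest of PowerOn() would not need to know which API the browser actually offered.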

Clicking the Record button will create the audio processor node. The term "node" refers to its role within the media "graph" that controls the flow of the media data from source, through various filters, to its final destination. The media data will arrive in sample buffers through the onaudioprocess callback. As long as I am done processing each buffer before the next one is ready, I can do everything synchronously in a single thread. Later, I will move the encoder into a worker task that will isolate the encoder from the UI task.


mp3.htm:
    var gNode= null;   //The audio processor node
    var gPcmCt= 0;
    function onRecord(btn) {
      var creator;
      if(!gAudio) {
        log("ERR: No Audio source.");
      } else if(gIsRecording) {
        log("ERR: Already recording.");
      } else {
        if(!gNode) {
          //Older browsers named this createJavaScriptNode; both live on the context.
          //gCfg (buffer size, channel count) is defined with the encoder setup.
          if(!(creator= gAudioSrc.context.createScriptProcessor || gAudioSrc.context.createJavaScriptNode)) {
            log("ERR: No processor creator?");
          } else if(!(gNode= creator.call(gAudioSrc.context,gCfg.bufSz,gCfg.chnlCt,gCfg.chnlCt))) {
            log("ERR: Unable to create processor node.");
          }
        }
        if(!gNode) {
          log("ERR: onRecord: No processor node.");
        } else {
          gNode.onaudioprocess= onAudioProcess;
          gAudioSrc.connect(gNode);
          gNode.connect(gAudioSrc.context.destination);
          gIsRecording= true;
          log("RECORD");
        }
      }
    }
    function onStop(btn) {
      if(!gAudio) {
        log("ERR: onStop: No audio.");
      } else if(!gAudioSrc) {
        log("ERR: onStop: No audio source.");
      } else if(!gIsRecording) {
        log("ERR: onStop: Not recording.");
      } else {
        gNode.onaudioprocess= null;
        gAudioSrc.disconnect(gNode);
        gNode.disconnect();
        gIsRecording= false;
        log("STOP");
      }
    }
    function onAudioProcess(e) {
      var inBuf= e.inputBuffer;
      var samples= inBuf.getChannelData(0);
      var sampleCt= samples.length;
      gPcmCt+= sampleCt;
      status("%d",gPcmCt);
    }
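
The "done before the next buffer is ready" constraint can be made concrete with a little arithmetic. This sketch (the function name bufferDurationMs is mine) computes how long one buffer of audio lasts, which is the time budget for each onaudioprocess call. The numbers assume 4096-sample buffers at 44.1 kHz, the values this page settles on in its configuration.

```javascript
// Time budget per audio callback: one buffer holds bufSz samples per channel,
// and the browser produces sampleRate samples per second.
function bufferDurationMs(bufSz, sampleRate) {
  return (bufSz / sampleRate) * 1000;
}

// Assumed values: 4096-sample buffers at 44100 Hz.
var budgetMs = bufferDurationMs(4096, 44100);
console.log(budgetMs.toFixed(1) + " ms per buffer"); // prints "92.9 ms per buffer"
```

So each callback has roughly 93 ms to convert and encode its buffer before the next one arrives. A smaller buffer lowers latency but tightens that budget proportionally.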

I should now be able to click the Power button, the browser will ask if the page can use the microphone, and the audio source will be created. Clicking Record will create the audio capture node and audio buffers should start arriving until I click Stop. I am not doing anything with the audio yet, except counting the samples.

It is time to deploy the lamejs library. I want to initialize lamejs during the PowerOn sequence, after the audio source has been created and I know the audio parameters, such as sample rate and number of channels. At this point, I am assuming the audio matches my default configuration. Later I will configure the audio source or adjust my configuration to match.


mp3.htm:
    var gLame= null;     //The LAME encoder library
    var gEncoder= null;  //The MP3 encoder object
    var gStrmMp3= [];    //Collection of MP3 buffers
    var gCfg= {
      chnlCt: 1,
      bufSz: 4096,
      sampleRate: 44100,
      bitRate: 128
    };
    var gMp3Ct= 0;
    function onUserMedia(stream) {
      if(!(gAudioSrc= gAudio.createMediaStreamSource(stream))) {
        log("ERR: Unable to create audio source.");
      } else if(!(gEncoder= Mp3Create())) {
        log("ERR: Unable to create MP3 encoder.");
      } else {
        gStrmMp3= [];
        gMp3Ct= 0;
        log("Power ON.");
      }
    }
    function PowerOff() {
      if(gIsRecording) {
        log("ERR: PowerOff: You need to stop recording first.");
      } else {
        gEncoder= null;
        gLame= null;
        gNode= null;
        gAudioSrc= null;
        gAudio= null;
        log("Power OFF.");
      }
    }
    function onRecord(btn) {
      var creator;
      if(!gAudio) {
        log("ERR: No Audio source.");
      } else if(!gEncoder) {
        log("ERR: No encoder.");
      } else if(gIsRecording) {
        log("ERR: Already recording.");
      } else {
        ...
      }
    }
    function Mp3Create() {
      if(!(gLame= new lamejs())) {
        log("ERR: Unable to create LAME object.");
      } else if(!(gEncoder= new gLame.Mp3Encoder(gCfg.chnlCt,gCfg.sampleRate,gCfg.bitRate))) {
        log("ERR: Unable to create MP3 encoder.");
      } else {
        log("MP3 encoder created.");
      }
      return(gEncoder);
    }
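
As a rough sanity check on the configuration above, the expected data rates can be worked out ahead of time. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes 16-bit PCM samples and that bitRate is given in kilobits per second, which is how the lamejs examples use it.

```javascript
// Uncompressed PCM rate: 16-bit (2-byte) samples, per channel, per second.
function pcmBytesPerSecond(sampleRate, chnlCt) {
  return sampleRate * chnlCt * 2;
}

// MP3 rate at a constant bit rate given in kilobits per second (assumption).
function mp3BytesPerSecond(bitRateKbps) {
  return (bitRateKbps * 1000) / 8;
}

// Values copied from gCfg above.
var cfg = { chnlCt: 1, sampleRate: 44100, bitRate: 128 };
var pcm = pcmBytesPerSecond(cfg.sampleRate, cfg.chnlCt); // 88200
var mp3 = mp3BytesPerSecond(cfg.bitRate);                // 16000
var percent = (mp3 * 100) / pcm;
console.log(pcm + " B/s PCM, " + mp3 + " B/s MP3, " + percent.toFixed(2) + "%");
// prints "88200 B/s PCM, 16000 B/s MP3, 18.14%"
```

If the encoder is behaving, the MP3 output should settle at roughly 18% of the 16-bit PCM byte count for this configuration.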

Run this just to make sure the calls to the lamejs library are working. I am still not doing anything with the audio yet.

It is finally time to encode the audio buffers. The browser delivers the audio samples as floating-point values, nominally between -1.0 and +1.0 (which is nothing but a tremendous waste of CPU cycles), so the first task is to convert them back to 16-bit signed integers. I scale each sample by 0x8000 (32768) and clamp the result to the range -32767 to +32767 (0x7FFF in hex) to avoid nasty audio pops due to integer overflow. I submit the 16-bit samples to lamejs and add the output mp3 buffer to my mp3 stream. Note that gStrmMp3 will be an array of buffers, not just a simple data buffer.


mp3.htm:
    function onAudioProcess(e) {
      var inBuf= e.inputBuffer;
      var samples= inBuf.getChannelData(0);
      var sampleCt= samples.length;
      var samples16= convertFloatToInt16(samples);
      if(samples16.length > 0) {
        gPcmCt+= samples16.length*2;
        var mp3buf= gEncoder.encodeBuffer(samples16);
        var mp3Ct= mp3buf.length;
        if(mp3Ct>0) {
          gStrmMp3.push(mp3buf);
          gMp3Ct+= mp3Ct;
        }
        status("%d / %d: %2.2f%%",gPcmCt,gMp3Ct,(gMp3Ct*100)/gPcmCt);
      }
    }
    function convertFloatToInt16(inFloat) {
      var sampleCt= inFloat.length;
      var outInt16= new Int16Array(sampleCt);
      for(var n1=0;n1<sampleCt;n1++) {
        //This is where I can apply waveform modifiers.
        var sample16= 0x8000*inFloat[n1];
        sample16= (sample16 < -32767) ? -32767 : (sample16 > 32767) ? 32767 : sample16;
        outInt16[n1]= sample16;
      }
      return(outInt16);
    }
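
To see why the clamp matters: an Int16Array does not saturate on overflow, it wraps around. The snippet below is a small standalone demonstration, not part of mp3.htm, and the helper names toInt16Unclamped and toInt16Clamped are made up for illustration.

```javascript
// Scale a float sample to 16 bits WITHOUT clamping, to show the failure mode.
function toInt16Unclamped(sample) {
  var out = new Int16Array(1);
  out[0] = 0x8000 * sample; // Int16Array stores the value modulo 2^16
  return out[0];
}

// Same scaling with the clamp used in convertFloatToInt16().
function toInt16Clamped(sample) {
  var s = 0x8000 * sample;
  s = (s < -32767) ? -32767 : (s > 32767) ? 32767 : s;
  var out = new Int16Array(1);
  out[0] = s;
  return out[0];
}

console.log(toInt16Unclamped(1.0)); // -32768: full-scale positive wraps to negative
console.log(toInt16Clamped(1.0));   // 32767
console.log(toInt16Clamped(0.5));   // 16384
```

Without the clamp, a sample at exactly +1.0 flips to the most negative value, which is the kind of discontinuity that is heard as a pop.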

Finally, I need to handle the stop operation by flushing out the last mp3 buffer and presenting the final mp3 stream to the user. [TODO: It would be better to stop the input source and wait for the onend callback, but I could not find my way from gAudioSrc to a node that had a stop interface. This code abruptly disconnects the audio stream, dropping any buffers that are in the pipeline, which clips the last half-second of audio.] I coalesce the mp3 stream buffers into a single data blob in memory (which will be a complete mp3 file), create a local URL to the blob, then paste the URL into the audio control and the download link.


mp3.htm:
    function onStop(btn) {
      if(!gAudio) {
        log("ERR: onStop: No audio.");
      } else if(!gAudioSrc) {
        log("ERR: onStop: No audio source.");
      } else if(!gIsRecording) {
        log("ERR: onStop: Not recording.");
      } else {
        gAudioSrc.disconnect(gNode);
        gNode.disconnect();
        gIsRecording= false;
        var mp3= gEncoder.flush();
        if(mp3.length>0)
          gStrmMp3.push(mp3);
        showMp3(gStrmMp3);
        log("STOP");
      }
    }
    function showMp3(mp3) {
      //Consolidate the collection of MP3 buffers into a single data Blob.
      //(Uses the mp3 parameter rather than reaching for the gStrmMp3 global.)
      var blob= new Blob(mp3,{type: 'audio/mp3'});
      //Create a URL to the blob.
      var url= window.URL.createObjectURL(blob);
      //Paste the URL into the audio control and download links.
      var audio= document.getElementById('AudioPlayer');
      var download= document.getElementById('DownloadLink');
      audio.src= url;
      download.href= url;
    }
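
The Blob constructor accepts the array of buffers directly, so no manual copying is needed on the page. For cases where a single contiguous buffer is wanted instead (for example, to hand the data to some other API), the coalescing step could be sketched as follows. The function name concatMp3Chunks is hypothetical, and I am assuming each chunk is a typed array of bytes, such as an Int8Array.

```javascript
// Hypothetical helper: copy a list of typed-array chunks into one Uint8Array.
function concatMp3Chunks(chunks) {
  var total = 0;
  for (var i = 0; i < chunks.length; i++) total += chunks[i].byteLength;
  var out = new Uint8Array(total);
  var offset = 0;
  for (var j = 0; j < chunks.length; j++) {
    var c = chunks[j];
    // View the chunk's raw bytes so signed values are copied bit-for-bit.
    out.set(new Uint8Array(c.buffer, c.byteOffset, c.byteLength), offset);
    offset += c.byteLength;
  }
  return out;
}

var joined = concatMp3Chunks([new Int8Array([1, 2]), new Int8Array([-1])]);
console.log(joined.length, joined[2]); // 3 255
```

The two-pass approach (measure, then copy) allocates the output once instead of growing it per chunk.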

The polishing touch is to remove the reference to lamejs from the document head and load it on demand during the PowerOn operation. I add a new script to provide a function to load an external script. By calling it from onUserMedia() it will only be loaded after a valid audio device has been created. Loading the script takes time and I will be notified when the script is available by way of a callback, so I need to split onUserMedia() into pre- and post-load steps.


mp3.htm:
    <head>
      <title>MP3 Encoder</title>
      <script src="js/sprintf.js"></script>
      <!-- js/lame.min.js is no longer loaded here; it is fetched by loadScript(). -->
    </head>
    ...
    <script id="loadjs" type="text/javascript">
      function loadScript(name,path,cb) {
        var node= document.createElement('SCRIPT');
        node.type= 'text/javascript';
        node.src= path;
        //Attach the callbacks before the node is added to the document.
        if(cb!=null) {
          node.onreadystatechange= cb;
          node.onload= cb;
        }
        var head= document.getElementsByTagName('HEAD');
        if(head[0]!=null)
          head[0].appendChild(node);
      }
    </script>
    ...
    <script id="taskUI" type="text/javascript">
      var gIsLame= false;  //Has lame.min.js been loaded?
      ...
      function onUserMedia(stream) {
        if(!(gAudioSrc= gAudio.createMediaStreamSource(stream))) {
          log("ERR: Unable to create audio source.");
        } else if(!gIsLame) {
          log("Fetching lame library...");
          loadScript("lame","js/lame.min.js",LameCreate);
        } else {
          LameCreate();
        }
      }
      function LameCreate() {
        gIsLame= true;
        if(!(gEncoder= Mp3Create())) {
          log("ERR: Unable to create MP3 encoder.");
        } else {
          gStrmMp3= [];
          gPcmCt= 0;
          gMp3Ct= 0;
          log("Power ON.");
          document.getElementById('btnRecord').disabled= false;
          document.getElementById('btnStop').disabled= false;
        }
      }
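
One wrinkle with handing the same callback to both onload and onreadystatechange is that a browser supporting both events could invoke it more than once, creating the encoder twice. A small guard, sketched below, makes the callback fire a single time. The name once is my own; it is not something mp3.htm or lamejs provides.

```javascript
// Wrap a callback so that only the first invocation reaches it.
function once(fn) {
  var called = false;
  return function () {
    if (called) return;
    called = true;
    return fn.apply(this, arguments);
  };
}

// Usage sketch inside loadScript():
//   var done= once(cb);
//   node.onload= done;
//   node.onreadystatechange= done;
var hits = 0;
var done = once(function () { hits++; });
done();
done();
console.log(hits); // 1
```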

Now the initial page load is tiny and fast.

The complete code is here: mp3.htm

I tried loading the page with IE11 and it failed; I could not find the AudioContext or getUserMedia interfaces. It may be that my version of IE11 (11.0) is too old. My focus is on achieving demo-level usability -- chasing after full browser compatibility is well outside the scope.


