getUserMedia(), the device API and the state of the webcam in the browser using JavaScript and HTML5 video

Since first considering web camera applications during my MSc thesis I've been fascinated by the different uses and applications of webcams on the web. I believe that outside of Skype they are a very underutilised tool for both communication and interaction, and despite their long existence there is still a very long way to go in what they can do.

Android camera live view in a webpage getUserMedia

In this post I intend to summarise recent developments around using a webcam within web applications, and then discuss the implementations currently available for using a webcam with JavaScript.

Buzz about html5 <device>

Last year at Full Frontal 2010, Paul Rouget gave a talk entitled 'Batshit crazy stuff you'll be able to do in browsers'. He basically stole the show with a series of demos of WebGL, Audio API and File API developments in Firefox. However, the parts I was most intrigued by involved the now defunct device API: using a webcam or other similar input devices directly in the browser.

Later that evening I sat in my hotel room, after a good amount of beer, and attempted to compile a patched version of Firefox from source. After downloading the whole lot over a tethered mobile connection and merging in some patches attached to a bug report, I got a binary which crashed on startup. I tried again several times later and still didn't get the desired result. Having seen no other blog posts about it, I assume very few other people got it to work!

getUserMedia()

Now, 8 months on, the former HTML5 <device> API has been deprecated in favour of a new API based around getUserMedia(). This is part of a larger API for peer-to-peer, in-browser video communication. It allows HTML5 <video> elements to have their source defined as a stream from a device:

<!DOCTYPE html>

<h1>Simple web camera display demo</h1>
<video autoplay></video>

<script>
//Grab the elements
var video = document.getElementsByTagName('video')[0],
heading = document.getElementsByTagName('h1')[0];

//test for getUserMedia
if(navigator.getUserMedia) {
  //setup callbacks
  navigator.getUserMedia('video', successCallback, errorCallback);

  //if everything is good then set the source of the video element to the mediastream
  function successCallback( stream ) {
    video.src = stream;
  }

  //If everything isn't ok then say so
  function errorCallback( error ) {
    heading.textContent =
        "An error occurred: [CODE " + error.code + "]";
  }
}
else {
  //show no support for getUserMedia
  heading.textContent =
      "Native web camera streaming is not supported in this browser!";
}
</script>
</html>

Code sample taken from http://my.opera.com/core/blog/2011/03/23/webcam-orientation-preview and commented by me

That code produces this output when viewed in an Opera Mobile development build:

Opera getUserMedia Example

Implementations

At present, implementations of this API are very thin on the ground:

Mobile

Opera Mobile

Opera have released a build of Opera Mobile which supports using a mobile phone's camera through the new API on Android phones. I have successfully tested this on my HTC Desire Z (see screenshots). Support for the full peer-to-peer communication API is still in development, but this build covers the basics of video capture. This implementation appears to follow the published specification fairly closely.

Desktop

Webkit (Ericsson)

An even more experimental implementation of the API has been written by Ericsson. It uses a modified version of the libwebkitgtk library, which allows use of the new API in compatible desktop browsers (Epiphany is supported). The installation is simple as they've provided a PPA:

#Install the Ericsson Labs public GPG key used to verify the package signatures
wget -O- --quiet https://labs.ericsson.com/files/gpg/public.key | sudo apt-key add -

#Add the Ericsson Labs PPA
sudo add-apt-repository http://files.labs.ericsson.net/ubuntu
sudo apt-get update

# Upgrade to the Ericsson Labs modified libwebkitgtk packages
sudo apt-get -y install libwebkitgtk-1.0-0

# (optional) Install Epiphany from the default Ubuntu repository
sudo apt-get install epiphany-browser

Once this installation is complete, Epiphany (and other browsers configured to use the modified library) will have support for the API. It supports a great deal more functionality than the Opera implementation, including peer-to-peer functionality to enable serverless webcam chat type applications. Full documentation is available here.

Sadly this implementation of the API differs from the standard specification: it uses a prefixed version, webkitGetUserMedia(), along with various other annoying differences. A comparable example to the simple webcam view above could be written thus:

<html>
<body>

<h1>Simple web camera display demo</h1>
<video id='selfView' autoplay audio=muted></video>

<script>
//Grab the elements
var video = document.getElementsByTagName('video')[0],
heading = document.getElementsByTagName('h1')[0];

//test for getUserMedia
if(navigator.webkitGetUserMedia) {
  //setup callbacks
  navigator.webkitGetUserMedia('video', successCallback);

  //if everything is good then set the source of the video element to the mediastream
  function successCallback( stream ) {
    video.src = webkitURL.createObjectURL(stream);
  }
}
else {
  //show no support for getUserMedia
  heading.textContent =
      'Native web camera streaming is not supported in this browser!';
}
</script>
</body>
</html>
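
The differences between the two examples above could be hidden behind a small wrapper. What follows is only a rough sketch, assuming the two builds behave as shown in the samples: the function name requestCamera and its return values are my own invention and not part of either implementation. The navigator and window objects are passed in as arguments so the logic can be tried outside a browser.

```javascript
// Hypothetical helper (not part of any specification): tries the unprefixed
// call first, then the Ericsson WebKit prefixed one.
// nav is a navigator-like object, win a window-like object.
// onSource receives a value suitable for assigning to video.src.
function requestCamera(nav, win, onSource, onError) {
  var fail = onError || function () {};

  if (typeof nav.getUserMedia === 'function') {
    // Opera Mobile build: the stream itself is assigned to video.src
    nav.getUserMedia('video', function (stream) {
      onSource(stream);
    }, fail);
    return 'standard';
  }

  if (typeof nav.webkitGetUserMedia === 'function') {
    // Ericsson build: the stream is wrapped in an object URL first
    nav.webkitGetUserMedia('video', function (stream) {
      onSource(win.webkitURL.createObjectURL(stream));
    });
    return 'prefixed';
  }

  // Neither variant is available
  return 'none';
}

// Usage in a page (skipped when there is no DOM, e.g. under Node)
if (typeof document !== 'undefined') {
  var video = document.getElementsByTagName('video')[0];
  var mode = requestCamera(navigator, window, function (src) {
    video.src = src;
  });
  if (mode === 'none') {
    document.getElementsByTagName('h1')[0].textContent =
        'Native web camera streaming is not supported in this browser!';
  }
}
```

Returning a string rather than throwing keeps the "no support" case in the caller's hands, which is where the message shown to the user belongs.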

Annoyances

This is not an API ready for use in any kind of real-world application, as there is no implementation in any mainstream browser and the specification is still in a state of flux. There are also features still to be worked out, such as handling multiple input sources and choosing the correct one, selecting the input size and many other bits and pieces. It could be many months or years before there is a reliable implementation across browsers.

The Future

This API could be the basis for many cool web applications and could potentially replace applications like Skype, but there is a long way to go before we see standard implementations. However, many parts of the API could probably be replicated in Flash and, for the simple parts at least, this may be a realistic step on the ladder for those wishing to get going with the API. A polyfill for the simple features is on my list of things to try... don't hold your breath though!

My Plans

Recently I've been playing with the API on Opera Mobile in conjunction with jQuery Mobile; the results can be seen on GitHub. At present it contains a simple camera app which can post a snapshot of the video stream to the server as a base64-encoded image via the canvas.
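
The snapshot step might look roughly like the sketch below. This is not the code from the repository: the function names, the /upload URL and the image form field are all placeholders made up for illustration, and it assumes the browser lets a canvas draw from a video element that is showing the camera stream.

```javascript
// Draw the current video frame onto a canvas and return it as a data URL
// of the form "data:image/png;base64,....".
function captureFrame(video, canvas) {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext('2d').drawImage(video, 0, 0, canvas.width, canvas.height);
  return canvas.toDataURL('image/png');
}

// Strip the "data:image/png;base64," prefix, leaving only the base64 data.
function base64Payload(dataUrl) {
  return dataUrl.substring(dataUrl.indexOf(',') + 1);
}

// Send the base64 data as a form field.
// The field name "image" is an assumption about what the server expects.
function postSnapshot(xhr, url, dataUrl) {
  xhr.open('POST', url, true);
  xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
  xhr.send('image=' + encodeURIComponent(base64Payload(dataUrl)));
}

// Usage in a page: take a snapshot when the video is tapped or clicked.
// '/upload' is a placeholder endpoint. Skipped when there is no DOM.
if (typeof document !== 'undefined') {
  var video = document.getElementsByTagName('video')[0];
  var canvas = document.createElement('canvas');
  video.onclick = function () {
    postSnapshot(new XMLHttpRequest(), '/upload', captureFrame(video, canvas));
  };
}
```

The payload is passed through encodeURIComponent because base64 can contain + and / characters, which would otherwise be mangled in a form-encoded body.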