
AudioCaptureRaw Walkthrough: C++

Capturing the Raw Audio Stream


Beta 1 Draft Version 1.1 July 22, 2011

About This Walkthrough

In the Kinect for Windows Software Development Kit (SDK) Beta from Microsoft Research, the AudioCaptureRaw sample uses the Windows Audio Session API (WASAPI) to capture the raw audio stream from the microphone array of the Kinect for Xbox 360 sensor and write it to a .wav file. This document is a walkthrough of the sample.

Resources

For a complete list of documentation for the Kinect for Windows SDK Beta, plus related reference material and links to the online forums, see the beta SDK website at: http://research.microsoft.com/kinectsdk

Contents
Introduction
Program Description
Select a Capture Device
    Enumerate the Capture Devices
    Retrieve the Device Name
    Determine the Device Index
Prepare for Audio Capture
    Initialize Audio Engine for Capture
    Load the Format
    Initialize the Audio Engine
Capture an Audio Stream from the Microphone Array
    The Primary Thread
    The Worker Thread

License: The Kinect for Windows SDK Beta from Microsoft Research is licensed for non-commercial use only. By installing, copying, or otherwise using the beta SDK, you agree to be bound by the terms of its license. Read the license.

Disclaimer: This document is provided as-is. Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

© 2011 Microsoft Corporation. All rights reserved. Microsoft, DirectX, Kinect, LifeChat, MSDN, and Windows are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.

Introduction
The audio component of the Kinect for Xbox 360 sensor is a four-element linear microphone array. An array provides some significant advantages over a single microphone, including more sophisticated acoustic echo cancellation and noise suppression, and the ability to determine the direction of a sound source.

The primary way for C++ applications to access the Kinect sensor's microphone array is through the KinectAudio Microsoft DirectX Media Object (DMO). However, for some purposes it is useful to simply capture the raw audio streams from the array's microphones. The Kinect sensor's microphone array is a standard Windows multichannel audio-capture device, so you can also capture the audio stream by using the Windows Audio Session API (WASAPI) or by using the microphone array as a standard Windows microphone.

The AudioCaptureRaw sample uses WASAPI to capture the raw audio stream from the Kinect sensor's microphone array and write it to a .wav file. This document is a walkthrough of the sample. For more information about WASAPI, see About WASAPI on the Microsoft Developer Network (MSDN) website.

Note WASAPI is COM-based, and this document assumes that you are familiar with the basics of how to use COM objects and interfaces. You do not need to know how to implement COM objects. For the basics of how to use COM objects, see Programming DirectX with COM on the MSDN website. This MSDN topic is written for DirectX programmers, but the basic principles apply to all COM-based applications.

Program Description
AudioCaptureRaw is installed with the Kinect for Windows Software Development Kit (SDK) Beta in the \Users\Public\Documents\Microsoft Research KinectSDK Samples\Audio\AudioCaptureRaw directory. AudioCaptureRaw is a C++ console application that is implemented in the following files:

- AudioCaptureRaw.cpp contains the application's entry point and manages overall program execution.
- WASAPICapture.cpp and its associated header, WASAPICapture.h, implement the CWASAPICapture class, which handles the details of capturing the audio stream.

The basic AudioCaptureRaw program flow is as follows:

1. Enumerate the system's capture devices and select the appropriate device. Because the system might have multiple audio capture devices, the application enumerates all such devices and has the user specify the appropriate one.
2. Record 10 seconds of audio data from the device.
3. Write the recorded data to a WAVE file: out.wav.

The recording process multiplexes the streams from each microphone channel in an interleaved format (ch 1/ch 2/ch 3/ch 4/ch 1/ch 2/... and so on), with each channel's data in a 16-kilohertz (kHz), 32-bit mono pulse code modulation (PCM) format.
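Because the four channels are interleaved sample by sample, recovering a single microphone's signal is simply a matter of striding through the captured buffer. The following is a minimal sketch of that idea; it is not part of the sample, the function name is illustrative, and the sample type shown (a 32-bit integer) is an assumption that should be checked against the mix format the audio client actually reports.

#include <cstdint>
#include <cstddef>
#include <vector>

// Illustrative only: copy one channel out of an interleaved multichannel buffer.
std::vector<int32_t> ExtractChannel(const int32_t *interleaved,
                                    size_t totalSamples,   // samples across all channels
                                    size_t channelCount,   // 4 for the Kinect array
                                    size_t channelIndex)   // 0 .. channelCount - 1
{
    std::vector<int32_t> mono;
    mono.reserve(totalSamples / channelCount);
    for (size_t i = channelIndex; i < totalSamples; i += channelCount)
    {
        // Every channelCount-th sample belongs to the requested channel.
        mono.push_back(interleaved[i]);
    }
    return mono;
}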

The following is a lightly edited version of the AudioCaptureRaw output for a system with two capture devices, a Microsoft LifeChat headset and a Kinect sensor:
WASAPI Capture Shared Timer Driven Sample
Copyright (c) Microsoft. All Rights Reserved
Select an output device:
    0: Microphone Array (Kinect USB Audio) ({0.0.1.00000000} {6ed40fd5-a340-4f8a-b324-edac93fa6702})
    1: Headset Microphone (3- Microsoft LifeChat LX-3000 ) ({0.0.1.00000000} {97721472-fc66-4d63-95a2-86c1044e0893})
0
Capture audio data for 10 seconds
 1
Successfully wrote WAVE data to out.wav

The remainder of this document walks you through the application. Note This document includes code examples, most of which have been edited for brevity and readability. In particular, most routine error-handling code has been removed. For the complete code, see the sample. Hyperlinks in this walkthrough refer to content on the MSDN website.

Select a Capture Device


The application's entry point is wmain, in AudioCaptureRaw.cpp. This function manages the overall program execution, with private functions handling most of the details. WASAPI is COM-based, so AudioCaptureRaw first initializes COM, as follows:

int wmain()
{
    ...
    HRESULT hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
    ...
}


Tip Applications that have a graphical user interface (GUI) should use COINIT_APARTMENTTHREADED instead of COINIT_MULTITHREADED.

AudioCaptureRaw next calls the private PickDevice method to select the capture device, as follows:

bool PickDevice(IMMDevice **DeviceToUse, bool *IsDefaultDevice, ERole *DefaultDeviceRole)
{
    HRESULT hr;
    IMMDeviceEnumerator *deviceEnumerator = NULL;
    IMMDeviceCollection *deviceCollection = NULL;

    *IsDefaultDevice = false;

    hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_INPROC_SERVER,
                          IID_PPV_ARGS(&deviceEnumerator));
    ...
}


PickDevice calls the CoCreateInstance function to create a device enumerator object and get a pointer to its IMMDeviceEnumerator interface.

Enumerate the Capture Devices


PickDevice enumerates the system's capture devices by calling the enumerator object's IMMDeviceEnumerator::EnumAudioEndpoints method, as follows:

bool PickDevice(...)
{
    ...
    hr = deviceEnumerator->EnumAudioEndpoints(eCapture, DEVICE_STATE_ACTIVE, &deviceCollection);
    ...
}


The EnumAudioEndpoints parameter values are as follows:

1. A value from the EDataFlow enumeration that indicates the device type. eCapture directs EnumAudioEndpoints to enumerate only capture devices.
2. A DEVICE_STATE_XXX constant that specifies which device states to enumerate. DEVICE_STATE_ACTIVE directs EnumAudioEndpoints to enumerate only active devices.
3. The address of an IMMDeviceCollection interface pointer that contains the enumerated capture devices.

PickDevice then uses the IMMDeviceCollection interface to list the available capture devices and let the user select the appropriate device (presumably the Kinect sensor), as follows:

bool PickDevice(...)
{
    UINT deviceCount;
    ...
    hr = deviceCollection->GetCount(&deviceCount);

    for (UINT i = 0 ; i < deviceCount ; i += 1)
    {
        LPWSTR deviceName;
        deviceName = GetDeviceName(deviceCollection, i);
        printf_s("    %d:  %S\n", i, deviceName);
        free(deviceName);
    }
    ...
}

PickDevice first calls the collection object's IMMDeviceCollection::GetCount method to determine the number of devices in the collection and then iterates through the collection and lists the device names.

Retrieve the Device Name


PickDevice iterates through the collection and calls the private GetDeviceName method to retrieve the device name, as follows:

LPWSTR GetDeviceName(IMMDeviceCollection *DeviceCollection, UINT DeviceIndex)
{
    IMMDevice *device;
    LPWSTR deviceId;
    HRESULT hr;

    hr = DeviceCollection->Item(DeviceIndex, &device);
    hr = device->GetId(&deviceId);

    IPropertyStore *propertyStore;
    hr = device->OpenPropertyStore(STGM_READ, &propertyStore);
    SafeRelease(&device);

    PROPVARIANT friendlyName;
    PropVariantInit(&friendlyName);
    hr = propertyStore->GetValue(PKEY_Device_FriendlyName, &friendlyName);

    wchar_t deviceName[128];
    hr = StringCbPrintf(deviceName, sizeof(deviceName), L"%s (%s)",
                        friendlyName.vt != VT_LPWSTR ? L"Unknown" : friendlyName.pwszVal,
                        deviceId);

    ... // Clean up and return the device name
}
Each device in the collection is identified by a zero-based index and is represented by a device object that exposes an IMMDevice interface. The device details, including a readable friendly name, are stored in the device object's property store, which is represented by an IPropertyStore interface. A property store provides general-purpose storage. Each item is identified by a key (a PROPERTYKEY structure) that is typically named PKEY_XYZ. The key for the device's friendly name is PKEY_Device_FriendlyName.

To obtain the device's friendly name, GetDeviceName:

1. Calls the IMMDeviceCollection::Item method to retrieve the specified device object's IMMDevice interface.
2. Calls the IMMDevice::GetId method to retrieve the device ID.
3. Calls the IMMDevice::OpenPropertyStore method to get a read-only pointer to the device object's IPropertyStore interface.
4. Passes the friendly name property key to the IPropertyStore::GetValue method, which returns a PROPVARIANT structure with the device's friendly name.
5. Calls the StringCbPrintf function to format the friendly name and device ID into a single device-name string.
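The cleanup that the ellipsis in the listing stands for is not reproduced in this walkthrough. As a rough illustration only, not the sample's code, the tail of a function like this typically clears the PROPVARIANT, releases the property store, frees the device ID string that GetId allocated, and returns a heap copy of the name that the caller can free (matching the free(deviceName) call in PickDevice):

    // Illustrative cleanup sketch; the sample's actual cleanup may differ.
    PropVariantClear(&friendlyName);   // release any memory owned by the PROPVARIANT
    SafeRelease(&propertyStore);       // release the property store interface
    CoTaskMemFree(deviceId);           // GetId allocates the ID string with CoTaskMemAlloc

    return _wcsdup(deviceName);        // caller releases the copy with free()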

Determine the Device Index


The user enters an integer value that specifies the device index. PickDevice converts the string to an unsigned long and passes the index to IMMDeviceCollection::Item to retrieve the appropriate IMMDevice interface, which is then returned to wmain, as shown in the following code example:

bool PickDevice(...)
{
    ...
    wchar_t choice[10];
    _getws_s(choice);

    long deviceIndex;
    wchar_t *endPointer;
    deviceIndex = wcstoul(choice, &endPointer, 0);

    hr = deviceCollection->Item(deviceIndex, &device);
    ...
}

Prepare for Audio Capture


The audio capture process is handled by a CWASAPICapture object, as follows:

int wmain()
{
    ...
    CWASAPICapture *capturer = new (std::nothrow) CWASAPICapture(device, role);
    if (capturer->Initialize(TargetLatency))
    {
        ...
    }
    ...
}
To create the object, wmain passes the device's IMMDevice interface and a role value to the constructor. The constructor uses this input to set some private data members. The contents of the if block implement the capture process and are discussed in the next section.

wmain passes a target latency value to CWASAPICapture::Initialize to initialize the object. AudioCaptureRaw polls for data: the target latency defines the wait time between polls and also influences the size of the buffer that is shared between the application and the audio client.
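For orientation, the relationship between these values looks roughly like the following sketch. The names mirror those used in the sample (TargetLatency, TargetDurationInSec), and the 10-second duration comes from the walkthrough, but the latency value shown is an assumed example, not the sample's actual constant.

// Illustrative values; only the 10-second duration is stated by the walkthrough.
const UINT32 TargetLatency       = 20;   // assumed example, in milliseconds
const int    TargetDurationInSec = 10;   // length of the recording

// The capture worker thread shown later polls for data on a timeout of
// _EngineLatencyInMS / 2, so a smaller target latency means more frequent,
// smaller reads from the shared endpoint buffer.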

Initialize Audio Engine for Capture


CWASAPICapture::Initialize prepares the audio engine for capture, as follows:

bool CWASAPICapture::Initialize(UINT32 EngineLatency)
{
    _ShutdownEvent = CreateEventEx(NULL, NULL, 0, EVENT_MODIFY_STATE | SYNCHRONIZE);

    HRESULT hr = _Endpoint->Activate(__uuidof(IAudioClient), CLSCTX_INPROC_SERVER, NULL,
                                     reinterpret_cast<void **>(&_AudioClient));
    hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_INPROC_SERVER,
                          IID_PPV_ARGS(&_DeviceEnumerator));

    LoadFormat();
    InitializeAudioEngine();

    return true;
}
Initialize creates a shutdown event that is used later to help manage the capture process. It then calls the device's IMMDevice::Activate method to create an audio client object for the device, which is represented by an IAudioClient interface. It completes the preparation by calling the private LoadFormat and InitializeAudioEngine methods.

Load the Format


The private LoadFormat method calls the audio client's IAudioClient::GetMixFormat method to retrieve the audio stream format. It uses that information to compute the frame size and stores it for later use, as follows:

bool CWASAPICapture::LoadFormat()
{
    HRESULT hr = _AudioClient->GetMixFormat(&_MixFormat);
    _FrameSize = (_MixFormat->wBitsPerSample / 8) * _MixFormat->nChannels;
    return true;
}
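As a concrete check of that formula: if the mix format reports the four-channel, 32-bit stream described in the Program Description section, the frame size works out to 16 bytes per frame. The values below are a worked example under that assumption; always use what GetMixFormat actually reports for your device.

// Worked example of the frame-size arithmetic (assumed values: 32 bits per
// sample, 4 channels, 16,000 samples per second, 10-second recording).
const unsigned kBitsPerSample   = 32;
const unsigned kChannels        = 4;
const unsigned kSamplesPerSec   = 16000;
const unsigned kSeconds         = 10;

const unsigned kFrameSize       = (kBitsPerSample / 8) * kChannels;        // 16 bytes per frame
const unsigned kCaptureBufferSz = kSamplesPerSec * kSeconds * kFrameSize;  // 2,560,000 bytes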

Initialize the Audio Engine


Initialize calls InitializeAudioEngine to initialize the audio engine in timer-driven mode, as follows:

bool CWASAPICapture::InitializeAudioEngine()
{
    HRESULT hr = _AudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED,
                                          AUDCLNT_STREAMFLAGS_NOPERSIST,
                                          _EngineLatencyInMS * 10000,
                                          0,
                                          _MixFormat,
                                          NULL);

    hr = _AudioClient->GetService(IID_PPV_ARGS(&_CaptureClient));
    return true;
}


InitializeAudioEngine:

1. Calls the IAudioClient::Initialize method to initialize the audio client object.
2. Calls the IAudioClient::GetService method to retrieve an IAudioCaptureClient interface, which enables a client to read the input data from a capture device.

The IAudioCaptureClient pointer from the final step is stored for later use.
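The third parameter to IAudioClient::Initialize is a buffer duration expressed in REFERENCE_TIME units of 100 nanoseconds, which is why the code multiplies the latency in milliseconds by 10,000. A small helper makes the conversion explicit; the helper name is illustrative and is not part of the sample.

// Illustrative helper (not in the sample): convert milliseconds to the
// 100-nanosecond REFERENCE_TIME units that IAudioClient::Initialize expects.
#include <audioclient.h>

inline REFERENCE_TIME MillisecondsToReferenceTime(UINT32 milliseconds)
{
    // 1 ms = 1,000,000 ns = 10,000 units of 100 ns
    return static_cast<REFERENCE_TIME>(milliseconds) * 10000;
}

// Equivalent to the _EngineLatencyInMS * 10000 expression used above:
//   _AudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_NOPERSIST,
//                            MillisecondsToReferenceTime(_EngineLatencyInMS),
//                            0, _MixFormat, NULL);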

Capture an Audio Stream from the Microphone Array


The capture process works as follows:

1. The primary thread creates a worker thread to capture the data and then starts a countdown timer.
2. While the countdown timer runs, the worker thread captures audio data in the background.
3. After the countdown timer completes, the primary thread notifies the worker thread to stop capturing data and ends the process.

The Primary Thread


The code to manage this process was represented by the ellipsis in the if block that was shown at the beginning of the Prepare for Audio Capture section. The following code example shows the complete block:

int wmain()
{
    ...
    if (capturer->Initialize(TargetLatency))
    {
        size_t captureBufferSize = capturer->SamplesPerSecond() * TargetDurationInSec * capturer->FrameSize();
        BYTE *captureBuffer = new (std::nothrow) BYTE[captureBufferSize];

        if (capturer->Start(captureBuffer, captureBufferSize))
        {
            do
            {
                printf_s("    \r%d\r", TargetDurationInSec);
                Sleep(1000);
            } while (--TargetDurationInSec);
            printf_s("\n");

            capturer->Stop();

            // Save the data to a WAVE file and clean up.
            ...
        }
    }
}

Before starting the capture process, wmain first computes the size of the capture buffer, which is the product of the following:

- The sample rate, in samples per second, which is extracted from the mix format by the private CWASAPICapture::SamplesPerSecond method.
- The target duration, in seconds, which is hard-coded to 10 seconds.
- The frame size, which was computed earlier and is retrieved by the private CWASAPICapture::FrameSize method.

Start the Capture Process


wmain calls the private CWASAPICapture::Start method to start the capture process, as follows:

bool CWASAPICapture::Start(BYTE *CaptureBuffer, size_t CaptureBufferSize)
{
    HRESULT hr;

    _CaptureBuffer = CaptureBuffer;
    _CaptureBufferSize = CaptureBufferSize;

    _CaptureThread = CreateThread(NULL, 0, WASAPICaptureThread, this, 0, NULL);

    hr = _AudioClient->Start();
    return true;
}
Start:

1. Calls the CreateThread function to create the worker thread. CreateThread creates a new thread and calls CWASAPICapture::WASAPICaptureThread on that thread. WASAPICaptureThread is discussed in the following section.
2. Calls the IAudioClient::Start method to direct the audio client to start streaming data between the endpoint buffer and the audio engine.
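Because CreateThread takes a plain function pointer, WASAPICaptureThread is a static member that receives the CWASAPICapture pointer passed as the thread parameter and forwards to the per-object capture routine. The sketch below shows that typical pattern; it illustrates the mechanism and is not a copy of the sample's code.

// Typical trampoline for a CreateThread entry point (illustrative).
DWORD __stdcall CWASAPICapture::WASAPICaptureThread(LPVOID context)
{
    // CreateThread was given 'this' as the thread parameter, so recover
    // the object and run the member function that does the real work.
    CWASAPICapture *capturer = static_cast<CWASAPICapture *>(context);
    return capturer->DoCaptureThread();
}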

Manage the Capture Process


After CWASAPICapture::Start returns, wmain starts the countdown timer on the primary thread, which also provides the user with a visual indicator of the capture process. When the countdown timer is finished, wmain calls CWASAPICapture::Stop to stop the capture process, as follows:

void CWASAPICapture::Stop()
{
    HRESULT hr;

    if (_ShutdownEvent)
    {
        SetEvent(_ShutdownEvent);
    }

    hr = _AudioClient->Stop();

    if (_CaptureThread)
    {
        WaitForSingleObject(_CaptureThread, INFINITE);
        CloseHandle(_CaptureThread);
        _CaptureThread = NULL;
    }
}


Stop:

1. Raises _ShutdownEvent to notify the worker thread to stop capturing data.
2. Calls IAudioClient::Stop to direct the audio engine to stop streaming data.
3. Waits for the worker thread to signal the thread object, which indicates that the capture process is complete.
4. Closes the thread handle and clears _CaptureThread.

wmain then calls the private SaveWaveData method to write the captured data to a .wav file; for details, see the sample. Finally, wmain performs cleanup and terminates the application.
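SaveWaveData itself is not reproduced in this walkthrough. For readers who want a picture of what writing such a file involves, the following is a minimal sketch of a canonical RIFF/WAVE writer built around the captured buffer and the WAVEFORMATEX returned by GetMixFormat. It is not the sample's implementation, and the function name and structure are illustrative assumptions.

// Minimal illustrative RIFF/WAVE writer (not the sample's SaveWaveData).
// Assumes 'format' is the WAVEFORMATEX obtained from GetMixFormat and
// 'data'/'dataBytes' describe the captured buffer.
#include <windows.h>
#include <mmreg.h>
#include <cstdio>

bool WriteWaveFile(const wchar_t *path, const WAVEFORMATEX *format,
                   const BYTE *data, DWORD dataBytes)
{
    FILE *file = NULL;
    if (_wfopen_s(&file, path, L"wb") != 0 || file == NULL)
    {
        return false;
    }

    const DWORD formatBytes = sizeof(WAVEFORMATEX) + format->cbSize;
    const DWORD riffSize = 4                   // "WAVE"
                         + 8 + formatBytes     // "fmt " chunk header + body
                         + 8 + dataBytes;      // "data" chunk header + body

    // RIFF header
    fwrite("RIFF", 1, 4, file);
    fwrite(&riffSize, sizeof(riffSize), 1, file);
    fwrite("WAVE", 1, 4, file);

    // "fmt " chunk: the WAVEFORMATEX (plus any extension bytes) verbatim
    fwrite("fmt ", 1, 4, file);
    fwrite(&formatBytes, sizeof(formatBytes), 1, file);
    fwrite(format, 1, formatBytes, file);

    // "data" chunk: the captured samples
    fwrite("data", 1, 4, file);
    fwrite(&dataBytes, sizeof(dataBytes), 1, file);
    fwrite(data, 1, dataBytes, file);

    fclose(file);
    return true;
}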

The Worker Thread


On the worker thread, WASAPICaptureThread calls CWASAPICapture::DoCaptureThread to handle the capture process, as follows:

DWORD CWASAPICapture::DoCaptureThread()
{
    bool stillPlaying = true;
    HANDLE mmcssHandle = NULL;
    DWORD mmcssTaskIndex = 0;

    HRESULT hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
    mmcssHandle = AvSetMmThreadCharacteristics(L"Audio", &mmcssTaskIndex);

    while (stillPlaying)
    {
        // Capture audio stream until stopped by primary thread.
    }

    AvRevertMmThreadCharacteristics(mmcssHandle);
    CoUninitialize();
    return 0;
}
DoCaptureThread:

1. Calls the CoInitializeEx function to initialize COM for the worker thread. You must initialize COM separately for each thread.
2. Calls the AvSetMmThreadCharacteristics function to associate the worker thread with the capture task.
3. Starts a while loop to capture the data, which runs until the primary thread calls CWASAPICapture::Stop.
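AvSetMmThreadCharacteristics and AvRevertMmThreadCharacteristics are Multimedia Class Scheduler Service (MMCSS) functions; registering the thread under the "Audio" task gives it scheduling priority appropriate for low-latency audio work. If you reuse this pattern in your own project, the functions come from the Avrt header and import library:

// MMCSS thread characteristics come from the Avrt library.
#include <avrt.h>
#pragma comment(lib, "Avrt.lib")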

The following code sample shows the capture loop:

DWORD CWASAPICapture::DoCaptureThread()
{
    ...
    while (stillPlaying)
    {
        HRESULT hr;
        DWORD waitResult = WaitForSingleObject(_ShutdownEvent, _EngineLatencyInMS / 2);
        switch (waitResult)
        {
        case WAIT_OBJECT_0 + 0:
            stillPlaying = false;
            break;
        case WAIT_TIMEOUT:
            BYTE *pData;
            UINT32 framesAvailable;
            DWORD flags;

            hr = _CaptureClient->GetBuffer(&pData, &framesAvailable, &flags, NULL, NULL);
            if (SUCCEEDED(hr))
            {
                UINT32 framesToCopy = min(framesAvailable,
                    static_cast<UINT32>((_CaptureBufferSize - _CurrentCaptureIndex) / _FrameSize));
                if (framesToCopy != 0)
                {
                    if (flags & AUDCLNT_BUFFERFLAGS_SILENT)
                    {
                        ZeroMemory(&_CaptureBuffer[_CurrentCaptureIndex], framesToCopy * _FrameSize);
                    }
                    else
                    {
                        CopyMemory(&_CaptureBuffer[_CurrentCaptureIndex], pData, framesToCopy * _FrameSize);
                    }
                    _CurrentCaptureIndex += framesToCopy * _FrameSize;
                }
                hr = _CaptureClient->ReleaseBuffer(framesAvailable);
            }
            break;
        }
    }
    ...
}

For each iteration, the capture loop waits until the next frame's data has been streamed:

- If the primary thread raises _ShutdownEvent before the time-out ends, the capture loop terminates.
- If the primary thread does not raise _ShutdownEvent, the capture loop copies the available data into the capture buffer and starts the next iteration.

For More Information

For more information about implementing audio and related samples, see the Programming Guide page on the Kinect for Windows SDK Beta website at: http://research.microsoft.com/kinectsdk
