About This Walkthrough

In the Kinect for Windows Software Development Kit (SDK) Beta from Microsoft Research, the AudioCaptureRaw sample uses the Windows Audio Session API (WASAPI) to capture the raw audio stream from the microphone array of the Kinect for Xbox 360 sensor and write it to a .wav file. This document is a walkthrough of the sample.

Resources

For a complete list of documentation for the Kinect for Windows SDK Beta, plus related references and links to the online forums, see the beta SDK website at: http://research.microsoft.com/kinectsdk
Contents

Introduction
Program Description
Select a Capture Device
    Enumerate the Capture Devices
    Retrieve the Device Name
    Determine the Device Index
Prepare for Audio Capture
    Initialize Audio Engine for Capture
    Load the Format
    Initialize the Audio Engine
Capture an Audio Stream from the Microphone Array
    The Primary Thread
    The Worker Thread
License: The Kinect for Windows SDK Beta from Microsoft Research is licensed for non-commercial use only. By installing, copying, or otherwise using the beta SDK, you agree to be bound by the terms of its license. Read the license.

Disclaimer: This document is provided as-is. Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

© 2011 Microsoft Corporation. All rights reserved. Microsoft, DirectX, Kinect, LifeChat, MSDN, and Windows are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.
Introduction
The audio component of the Kinect for Xbox 360 sensor is a four-element linear microphone array. An array provides significant advantages over a single microphone, including more sophisticated acoustic echo cancellation and noise suppression, and the ability to determine the direction of a sound source.

The primary way for C++ applications to access the Kinect sensor's microphone array is through the KinectAudio Microsoft DirectX Media Object (DMO). However, for some purposes it is useful to simply capture the raw audio streams from the array's microphones. The Kinect sensor's microphone array is a standard Windows multichannel audio-capture device, so you can also capture the audio stream by using the Windows Audio Session API (WASAPI) or by using the microphone array as a standard Windows microphone. The AudioCaptureRaw sample uses the WASAPI to capture the raw audio stream from the Kinect sensor's microphone array and write it to a .wav file. This document is a walkthrough of the sample. For more information on WASAPI, see About WASAPI on the Microsoft Developer Network (MSDN) website.

Note: The WASAPI is COM-based, and this document assumes that you are familiar with the basics of how to use COM objects and interfaces. You do not need to know how to implement COM objects. For the basics of how to use COM objects, see Programming DirectX with COM on the MSDN website. This MSDN topic is written for DirectX programmers, but the basic principles apply to all COM-based applications.
Program Description
AudioCaptureRaw is installed with the Kinect for Windows Software Development Kit (SDK) Beta in the \Users\Public\Documents\Microsoft Research KinectSDK Samples\Audio\AudioCaptureRaw directory. AudioCaptureRaw is a C++ console application that is implemented in the following files:

AudioCaptureRaw.cpp contains the application's entry point and manages overall program execution.
WASAPICapture.cpp and its associated header, WASAPICapture.h, implement the CWASAPICapture class, which handles the details of capturing the audio stream.

The application performs the following steps:

1. Enumerate the system's capture devices and select the appropriate device. Because the system might have multiple audio capture devices, the application enumerates all such devices and has the user specify the appropriate one.
2. Record 10 seconds of audio data from the device.
3. Write the recorded data to a WAVE file: out.wav.
The recording process multiplexes the streams from each microphone channel in an interleaved format (ch 1/ch 2/ch 3/ch 4/ch 1/ch 2/ and so on), with each channel's data in a 16-kilohertz (kHz), 32-bit mono pulse code modulation (PCM) format.
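The interleaved layout can be sketched as follows. This is an illustrative helper, not code from the sample; the function name and the use of float samples are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration: interleave four mono channels into the
// ch1/ch2/ch3/ch4/ch1/... layout described above. Each element is one
// 32-bit sample; float is assumed here.
std::vector<float> InterleaveChannels(const std::vector<std::vector<float>>& channels)
{
    const std::size_t numChannels = channels.size();
    const std::size_t numFrames = channels.empty() ? 0 : channels[0].size();

    std::vector<float> interleaved(numChannels * numFrames);
    for (std::size_t frame = 0; frame < numFrames; ++frame)
    {
        for (std::size_t ch = 0; ch < numChannels; ++ch)
        {
            // Frame n holds one sample from every channel, in channel order.
            interleaved[frame * numChannels + ch] = channels[ch][frame];
        }
    }
    return interleaved;
}
```

One group of four samples (one per channel) is an audio "frame"; the frame size in bytes follows directly from this layout.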
The following is a lightly edited version of the AudioCaptureRaw output for a system with two capture devices (a Microsoft LifeChat headset and a Kinect sensor):
WASAPI Capture Shared Timer Driven Sample
Copyright (c) Microsoft. All Rights Reserved

Select an output device:
    0: Microphone Array (Kinect USB Audio) ({0.0.1.00000000} {6ed40fd5-a340-4f8a-b324-edac93fa6702})
    1: Headset Microphone (3- Microsoft LifeChat LX-3000 ) ({0.0.1.00000000} {97721472-fc66-4d63-95a2-86c1044e0893})
0
Capture audio data for 10 seconds
1
Successfully wrote WAVE data to out.wav
The remainder of this document walks you through the application. Note: This document includes code examples, most of which have been edited for brevity and readability. In particular, most routine error-handling code has been removed. For the complete code, see the sample. Hyperlinks in this walkthrough refer to content on the MSDN website.
Select a Capture Device

wmain calls the PickDevice function to select the capture device. PickDevice begins as follows:

bool PickDevice(IMMDevice **DeviceToUse, bool *IsDefaultDevice, ERole *DefaultDeviceRole)
{
    IMMDeviceEnumerator *deviceEnumerator = NULL;
    IMMDeviceCollection *deviceCollection = NULL;

    *IsDefaultDevice = false;
    ...
}
PickDevice then uses the IMMDeviceCollection interface to list the available capture devices and let the user select the appropriate device (presumably the Kinect sensor) as follows:
bool PickDevice(...)
{
    UINT deviceCount;
    ...
    hr = deviceCollection->GetCount(&deviceCount);

    for (UINT i = 0 ; i < deviceCount ; i += 1)
    {
        LPWSTR deviceName;
        deviceName = GetDeviceName(deviceCollection, i);
        printf_s("    %d:  %S\n", i, deviceName);
        free(deviceName);
    }
    ...
}
PickDevice first calls the collection object's IMMDeviceCollection::GetCount method to determine the number of devices in the collection, and then iterates through the collection, calling the GetDeviceName helper function to list the device names.
Retrieve the Device Name

LPWSTR GetDeviceName(IMMDeviceCollection *DeviceCollection, UINT DeviceIndex)
{
    IMMDevice *device;
    LPWSTR deviceId;
    HRESULT hr;

    hr = DeviceCollection->Item(DeviceIndex, &device);
    hr = device->GetId(&deviceId);

    IPropertyStore *propertyStore;
    hr = device->OpenPropertyStore(STGM_READ, &propertyStore);
    SafeRelease(&device);

    PROPVARIANT friendlyName;
    PropVariantInit(&friendlyName);
    hr = propertyStore->GetValue(PKEY_Device_FriendlyName, &friendlyName);

    wchar_t deviceName[128];
    hr = StringCbPrintf(deviceName, sizeof(deviceName), L"%s (%s)",
             friendlyName.vt != VT_LPWSTR ? L"Unknown" : friendlyName.pwszVal,
             deviceId);
    ... // Clean up and return the device name.
}
Each device in the collection is identified by a zero-based index and is represented by a device object that exposes an IMMDevice interface. The device details, including a readable friendly name, are stored in the device object's property store, which is represented by an IPropertyStore interface. A property store provides general-purpose storage. Each item is identified by a key, a PROPERTYKEY structure, that is typically named PKEY_XYZ. The key for the device's friendly name is PKEY_Device_FriendlyName.

To obtain the device's friendly name, GetDeviceName:

1. Calls the IMMDeviceCollection::Item method to retrieve the specified device object's IMMDevice interface.
2. Calls the IMMDevice::GetId method to retrieve the device ID.
3. Calls the IMMDevice::OpenPropertyStore method to get a read-only pointer to the device object's IPropertyStore interface.
4. Passes the friendly name property key to the IPropertyStore::GetValue method, which returns a PROPVARIANT structure with the device's friendly name.
5. Calls the StringCbPrintf function to format the name string from the PROPVARIANT structure and the device ID.
Determine the Device Index

PickDevice then reads the user's choice from the console, converts it to a device index, and retrieves the corresponding device object:

bool PickDevice(...)
{
    ...
    wchar_t choice[10];
    _getws_s(choice);

    long deviceIndex;
    wchar_t *endPointer;
    deviceIndex = wcstoul(choice, &endPointer, 0);

    hr = deviceCollection->Item(deviceIndex, &device);
    ...
}
Prepare for Audio Capture

wmain next creates a CWASAPICapture object to manage the capture process:

int wmain()
{
    ...
    CWASAPICapture *capturer = new (std::nothrow) CWASAPICapture(device, role);
    if (capturer->Initialize(TargetLatency))
    {
        ...
    }
    ...
}
To create the object, wmain passes the device's IMMDevice interface and a role value to the constructor. The constructor uses this input to set some private data members. The contents of the if block implement the capture process and are discussed in the next section. wmain passes a target latency value to CWASAPICapture::Initialize to initialize the object. AudioCaptureRaw polls for data; the target latency defines the wait time and also influences the size of the buffer that is shared between the application and the audio client.
Initialize Audio Engine for Capture

bool CWASAPICapture::Initialize(UINT32 EngineLatency)
{
    _ShutdownEvent = CreateEventEx(NULL, NULL, 0, EVENT_MODIFY_STATE | SYNCHRONIZE);

    HRESULT hr = _Endpoint->Activate(__uuidof(IAudioClient), CLSCTX_INPROC_SERVER, NULL,
                     reinterpret_cast<void **>(&_AudioClient));
    hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_INPROC_SERVER,
             IID_PPV_ARGS(&_DeviceEnumerator));

    LoadFormat();
    InitializeAudioEngine();

    return true;
}
Initialize creates a shutdown event that is used later to help manage the capture process. It then calls the device's IMMDevice::Activate method to create an audio client object for the device, which is represented by an IAudioClient interface. It completes the preparation by calling the private LoadFormat and InitializeAudioEngine methods.
The format and frame-size data from these steps are stored for later use.
Capture an Audio Stream from the Microphone Array

The Primary Thread

With the capture object initialized, wmain allocates a capture buffer and runs the capture:

int wmain()
{
    ...
    if (capturer->Initialize(TargetLatency))
    {
        size_t captureBufferSize = capturer->SamplesPerSecond() * TargetDurationInSec * capturer->FrameSize();
        BYTE *captureBuffer = new (std::nothrow) BYTE[captureBufferSize];

        if (capturer->Start(captureBuffer, captureBufferSize))
        {
            do
            {
                printf_s("  \r%d\r", TargetDurationInSec);
                Sleep(1000);
            } while (--TargetDurationInSec);
            printf_s("\n");

            capturer->Stop();
            // Save the data to a WAVE file and clean up.
            ...
        }
    }
}
Before starting the capture process, wmain first computes the size of the capture buffer, which is the product of the following:

- The sample rate, in samples per second, which is extracted from the mix format by the private CWASAPICapture::SamplesPerSecond method.
- The target duration, in seconds, which is hard-coded to 10 seconds.
- The frame size, which was computed earlier and is retrieved by the private CWASAPICapture::FrameSize method.
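The buffer-size arithmetic above can be sketched as follows; the helper name is ours, and the 16,000 samples-per-second rate and 16-byte frame (4 channels of 4-byte, 32-bit samples) are assumptions based on the format described earlier.

```cpp
#include <cstddef>

// Illustrative version of the sizing computation done in wmain:
// bytes = samples/second * seconds * bytes/frame.
std::size_t ComputeCaptureBufferSize(std::size_t samplesPerSecond,
                                     std::size_t durationInSeconds,
                                     std::size_t frameSizeInBytes)
{
    return samplesPerSecond * durationInSeconds * frameSizeInBytes;
}
```

Under those assumptions, a 10-second capture needs 16000 * 10 * 16 = 2,560,000 bytes, roughly 2.4 MB.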
wmain then calls CWASAPICapture::Start to begin capturing:

bool CWASAPICapture::Start(BYTE *CaptureBuffer, size_t CaptureBufferSize)
{
    HRESULT hr;

    _CaptureBuffer = CaptureBuffer;
    _CaptureBufferSize = CaptureBufferSize;

    _CaptureThread = CreateThread(NULL, 0, WASAPICaptureThread, this, 0, NULL);
    hr = _AudioClient->Start();

    return true;
}
Start:

1. Calls the CreateThread function to create the worker thread. CreateThread creates a new thread and calls CWASAPICapture::WASAPICaptureThread on that thread. WASAPICaptureThread is discussed in the following section.
2. Calls the IAudioClient::Start method to direct the audio client to start streaming data between the endpoint buffer and the audio engine.
When the 10 seconds have elapsed, wmain calls CWASAPICapture::Stop, which signals the shutdown event, stops the audio client, and waits for the worker thread to finish:

void CWASAPICapture::Stop()
{
    HRESULT hr;

    if (_ShutdownEvent)
    {
        SetEvent(_ShutdownEvent);
    }

    hr = _AudioClient->Stop();

    if (_CaptureThread)
    {
        WaitForSingleObject(_CaptureThread, INFINITE);
        ...
    }
}
wmain then calls the private SaveWaveData method to write the captured data to a .wav file. For details, see the sample. Finally, wmain performs cleanup and terminates the application.
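As a rough sketch of what writing the WAVE file involves (not the sample's actual SaveWaveData code), the following builds the standard 44-byte RIFF/WAVE header that precedes the raw PCM data. It assumes a little-endian host and integer PCM (WAVE format tag 1); a float-format capture would use format tag 3 instead.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative helper: build a canonical RIFF/WAVE header for a PCM stream.
std::vector<uint8_t> BuildWaveHeader(uint32_t dataBytes, uint16_t channels,
                                     uint32_t sampleRate, uint16_t bitsPerSample)
{
    std::vector<uint8_t> h(44);
    uint32_t byteRate   = sampleRate * channels * bitsPerSample / 8;
    uint16_t blockAlign = channels * bitsPerSample / 8;
    uint32_t riffSize   = 36 + dataBytes;   // file size minus the 8-byte RIFF tag
    uint32_t fmtSize    = 16;               // size of the PCM "fmt " chunk
    uint16_t formatTag  = 1;                // WAVE_FORMAT_PCM (assumed)

    std::memcpy(&h[0],  "RIFF", 4);
    std::memcpy(&h[4],  &riffSize, 4);
    std::memcpy(&h[8],  "WAVE", 4);
    std::memcpy(&h[12], "fmt ", 4);
    std::memcpy(&h[16], &fmtSize, 4);
    std::memcpy(&h[20], &formatTag, 2);
    std::memcpy(&h[22], &channels, 2);
    std::memcpy(&h[24], &sampleRate, 4);
    std::memcpy(&h[28], &byteRate, 4);
    std::memcpy(&h[32], &blockAlign, 2);
    std::memcpy(&h[34], &bitsPerSample, 2);
    std::memcpy(&h[36], "data", 4);
    std::memcpy(&h[40], &dataBytes, 4);
    return h;
}
```

The captured interleaved buffer is then written immediately after this header.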
The Worker Thread

The worker thread's processing is implemented by the private CWASAPICapture::DoCaptureThread method:

DWORD CWASAPICapture::DoCaptureThread()
{
    bool stillPlaying = true;
    HANDLE mmcssHandle = NULL;
    DWORD mmcssTaskIndex = 0;

    HRESULT hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
    mmcssHandle = AvSetMmThreadCharacteristics(L"Audio", &mmcssTaskIndex);

    while (stillPlaying)
    {
        // Capture the audio stream until stopped by the primary thread.
    }

    AvRevertMmThreadCharacteristics(mmcssHandle);
    CoUninitialize();
    return 0;
}
DoCaptureThread:

1. Calls the CoInitializeEx function to initialize COM for the worker thread. You must initialize COM separately for each thread.
2. Calls the AvSetMmThreadCharacteristics function to associate the worker thread with the "Audio" task.
3. Starts a while loop to capture the data, which runs until the primary thread calls CWASAPICapture::Stop.
DWORD CWASAPICapture::DoCaptureThread()
{
    ...
    while (stillPlaying)
    {
        HRESULT hr;
        DWORD waitResult = WaitForSingleObject(_ShutdownEvent, _EngineLatencyInMS / 2);

        switch (waitResult)
        {
        case WAIT_OBJECT_0 + 0:   // _ShutdownEvent was signaled.
            stillPlaying = false;
            break;

        case WAIT_TIMEOUT:        // Timed out; capture the available data.
            BYTE *pData;
            UINT32 framesAvailable;
            DWORD flags;

            hr = _CaptureClient->GetBuffer(&pData, &framesAvailable, &flags, NULL, NULL);
            if (SUCCEEDED(hr))
            {
                UINT32 framesToCopy = min(framesAvailable,
                    static_cast<UINT32>((_CaptureBufferSize - _CurrentCaptureIndex) / _FrameSize));
                if (framesToCopy != 0)
                {
                    if (flags & AUDCLNT_BUFFERFLAGS_SILENT)
                    {
                        ZeroMemory(&_CaptureBuffer[_CurrentCaptureIndex], framesToCopy * _FrameSize);
                    }
                    else
                    {
                        CopyMemory(&_CaptureBuffer[_CurrentCaptureIndex], pData, framesToCopy * _FrameSize);
                    }
                    _CurrentCaptureIndex += framesToCopy * _FrameSize;
                }
                hr = _CaptureClient->ReleaseBuffer(framesAvailable);
            }
            break;
        }
    }
    ...
}
For each iteration, the capture loop waits for half the engine latency before checking for new data:

- If the primary thread raises _ShutdownEvent before the time-out elapses, the capture loop terminates.
- Otherwise, the wait times out, and the capture loop copies the available data into the capture buffer and starts the next iteration.
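The same shutdown-event pattern can be expressed portably with the C++ standard library; this is an illustrative analog, not the sample's Win32 code. The condition-variable wait plays the role of WaitForSingleObject with a half-latency time-out, and the push_back stands in for copying one packet of captured data.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative primary/worker pattern: the worker polls on a half-latency
// time-out and exits when the primary thread signals shutdown.
struct Capture
{
    std::mutex m;
    std::condition_variable cv;
    bool shutdown = false;
    std::vector<int> buffer;

    void Worker(int latencyMs)
    {
        std::unique_lock<std::mutex> lock(m);
        // wait_for returns false on time-out (shutdown not yet signaled),
        // which corresponds to the WAIT_TIMEOUT case above.
        while (!cv.wait_for(lock, std::chrono::milliseconds(latencyMs / 2),
                            [this] { return shutdown; }))
        {
            buffer.push_back(0);  // "copy" one packet of captured data
        }
    }

    void Stop()  // corresponds to SetEvent(_ShutdownEvent)
    {
        { std::lock_guard<std::mutex> g(m); shutdown = true; }
        cv.notify_one();
    }
};
```

Waiting for half the latency keeps the worker responsive enough that the shared endpoint buffer is drained before it can overflow.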
For More Information

For more information about implementing audio and related samples, see the Programming Guide page on the Kinect for Windows SDK Beta website at: http://research.microsoft.com/kinectsdk