
INVESTIGATING THE USE OF SUBJECT-SPECIFIC HRTFS IN CREATING ACCURATE 3D AUDITORY ENVIRONMENTS.

BRIAN TUOHY

Sonic Arts Research Centre (SARC), Queen's University Belfast, U.K. btuohy02@qub.ac.uk

The relevance of 3D audio extends well beyond applications that are solely concerned with music. Creating immersive environments can aid the advancement of medical and technical training methods, provide more realistic gaming experiences, and add a further level of engagement to 3D movies. One method of creating such environments is to implement head-related transfer functions (HRTFs), which recreate the ear's response to sound coming from a particular point in space. These responses, however, vary from person to person due to differing physical characteristics such as the size and shape of the head, pinnae and torso. In order to model sound in a 3D environment more accurately, subject-specific HRTFs have been investigated. These take into account the unique anthropometric parameters of the listener's body in order to improve spatial impression and localisation accuracy. This paper reviews current research into the potential of HRTF personalisation as a means of creating improved virtual environments.

INTRODUCTION

In attempting to recreate a 3D auditory environment, one must first consider how we perceive sound naturally. The human ears act as antennae that allow us to receive audio cues from our surroundings [1]. Localisation of sounds in such a space is aided by the brain's recognition of differences in the sound waves received at each ear. These differences come in the form of interaural time differences (ITDs) and interaural intensity differences (IIDs). The ITD and IID reflect the earlier arrival and higher intensity of the signal at the ipsilateral ear, and hence assist in localising the origin of the sound. When a sound is located on the median plane, however, ITDs and IIDs are essentially absent. In such cases, other parts of the body help to locate the sound through diffraction (e.g. around the head) and reflection (e.g. off the torso).
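As a rough illustration of the ITD cue, the sketch below uses Woodworth's spherical-head approximation, ITD = (a/c)(θ + sin θ). This formula is not part of the present paper; the head radius (8.75 cm) and speed of sound (343 m/s) are assumed typical values, and the function name is purely illustrative.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius, in metres
SPEED_OF_SOUND = 343.0    # assumed speed of sound in air, in m/s


def woodworth_itd(azimuth_deg, head_radius=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Approximate ITD in seconds for a distant source.

    azimuth_deg is measured from straight ahead (0 degrees) and should lie
    in [-90, 90]; positive and negative angles give ITDs of opposite sign.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


# A source on the median plane (azimuth 0) produces no ITD, whereas a source
# directly to one side produces the largest value under this model.
itd_front = woodworth_itd(0.0)
itd_side = woodworth_itd(90.0)
```

Under these assumptions the ITD vanishes for a frontal source, which is consistent with the observation that sources on the median plane must be localised by other cues.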
The pinnae filter sound arriving at the ears depending on the angle of incidence. This filtering further informs the brain as to the possible location of the auditory cue. Standard stereo audio does not contain this ITD, IID and filtering information and, as such, the sound is not perceived as though it is emanating from the surrounding environment. This lack of externalisation is also a result of the absence of dynamic cues, which arise when listeners move their head in relation to the sound source and so inform them of their interaction with the environment. A third factor affecting the spatial perception of headphone audio is the lack of environmental cues, such as reflections off walls, which indicate the interaction between the sound waves and the acoustic space. These cues help to provide a sense of externalisation and distance associated with the sounds. This paper is concerned with the accurate measurement and recreation of the natural processing that occurs when sound waves interact with the human body, and how this affects perception of sound in space.

1 HEAD-RELATED TRANSFER FUNCTIONS

A head-related transfer function (HRTF) can be defined as "the acoustical transfer function between a sound source and the entrance of the ear canal" [2]. As such, an HRTF can be seen as a way of simulating spatialisation by measuring how sound waves interact with the listener's body. HRTFs seek to replicate the sound heard at the listener's ears in response to auditory stimuli located at a specific point in space [3]. This is represented as time, intensity and timbral differences between the sounds heard in the left and right ear. These functions reproduce the effects of the head, torso, and pinnae, which act as reflectors and diffusors, creating short time delays that can be implemented electronically with comb filters and similar processing methods [4]. When these differences are applied to a dry, unspatialised source, the ear interprets the resulting sound as having occurred at the modelled location in space. 3D sound systems such as virtual auditory displays (VADs) work by convolving a sound source with HRTFs in order to present a spatialised sound to a listener [5].

1.1 Calculating HRTFs

A standard method for calculating HRTFs for a human subject is demonstrated in [5]. In this method, a small microphone is placed at the entrance of each of the subject's ear canals and the canals are blocked with clay. The subject sits at the centre of a spherical array of 70 loudspeakers placed at a distance of 1.5 m, as shown in Figure 1. The speakers are located at 10-degree intervals of azimuth and elevation. The entire array is rotated at 5-degree intervals. A time-stretched pulse is generated from one loudspeaker at each angle of elevation. HRTFs are then recorded for each 5 degrees of azimuth and 10 degrees of elevation. The result is a series of recordings representing the ears' response to sounds located at 2592 different points in space. Using this database of HRTFs, it is possible to convolve any source sound with the HRTF pair measured for a chosen point, simulating the effect of the sound being located at that point in space.

First Spatial Audio Conference, SARC, Belfast, 2012 March 21 & 28
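The convolution step described above can be sketched as follows. This is a minimal illustration rather than the measurement system of [5]: it works in the time domain with head-related impulse responses (HRIRs, the time-domain counterparts of HRTFs), and the three-sample left and right responses are invented toy values that stand in for a measured pair.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def spatialise(mono, hrir_left, hrir_right):
    """Return (left, right) ear signals for a mono source and one HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIR pair (hypothetical values): the right-ear response is delayed by
# two samples and attenuated, mimicking a source on the listener's left.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

mono_source = [1.0, 0.5]
left, right = spatialise(mono_source, hrir_left, hrir_right)
```

The delay and attenuation in the right channel correspond to the ITD and IID cues; a measured HRIR pair would additionally carry the spectral filtering of the pinnae, head and torso. Practical systems usually perform the same operation with FFT-based convolution for efficiency.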

Figure 1. Spherical speaker array used in measurement of HRTFs.

1.2 Implementation of HRTFs

HRTFs have found many applications in spatial audio, such as in VADs and virtual reality game sound. However, a problem with the commercial implementation of HRTFs exists with regard to listener response to non-individualised HRTFs. It has been established that the characteristics of HRTFs change depending on the location of the sound source, but they have also been shown to vary between subjects due to anthropometric differences [6]. In this regard, Francis Rumsey compares pinnae to fingerprints because of the vast variations in size and shape that exist between individuals [7]. This makes generalisation of HRTFs difficult, as the same functions will not apply to everyone. Some evidence suggests that generalisation is possible, and that people can adjust to HRTFs that are not their own, eventually learning to localise better as familiarity increases [8]. In this sense, generalisation is best afforded by using HRTFs belonging to people who are good at localisation, as these have been found to be more useful for broader application. Moller et al. [9] tested the capacity for generalisation by calculating and examining HRTFs for 40 subjects. They found that the HRTFs of subjects differed only slightly up to around 8 kHz and that it was still possible to construct a generalised representation for most directions of incidence above 8 kHz. It was not possible, however, to simply average HRTFs for the subjects because this would result in a flattening of curves, which is not an accurate depiction of a typical response. The level of variation in the ears' response for two different subjects is illustrated in Figure 2.

Figure 2. HRTF spectra for two different subjects.

One of the primary disadvantages of using generalised HRTFs is an increase in front-back confusion and elevation inaccuracies [8]. Rumsey describes experiments in which subjects had their own ears blocked and were directly fed signals that had been processed with another person's HRTFs. The subjects' ability to localise accurately with the borrowed HRTFs was significantly reduced [7]. This has led to much research into which characteristics have most influence on HRTFs.
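The flattening effect of averaging reported by Moller et al. can be illustrated with a small synthetic example. The spectra below are invented for illustration (a flat 0 dB response with a single triangular notch per subject) and are not measured data; averaging directly in dB is also a simplifying assumption.

```python
def notch_response(n_bins, notch_bin, depth_db=-20.0, width=2):
    """Flat 0 dB magnitude response with a triangular notch at notch_bin."""
    response = []
    for k in range(n_bins):
        distance = abs(k - notch_bin)
        if distance <= width:
            response.append(depth_db * (1 - distance / (width + 1)))
        else:
            response.append(0.0)
    return response


# Two hypothetical subjects whose pinna notches fall at different frequencies.
subject_a = notch_response(32, notch_bin=10)
subject_b = notch_response(32, notch_bin=20)

# Bin-by-bin average of the two magnitude responses (in dB).
average = [(a + b) / 2 for a, b in zip(subject_a, subject_b)]

deepest_individual = min(subject_a)   # depth of subject A's own notch
deepest_average = min(average)        # depth of the deepest notch after averaging
```

In this toy case each subject has one 20 dB notch, while the averaged curve has two notches of only 10 dB, a shape that matches neither listener.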
The extent of anthropometric influence on spatial cues has been exemplified, in particular, by studies that have examined the differences between HRTFs calculated for children and those found for adults [10]. Physical characteristics, including head and torso size, have been shown to have a significant effect on HRTFs. This could prove problematic in the implementation of HRTFs in commercial applications, such as in the gaming industry. Over-generalisation of HRTF parameters could narrow the range of suitable candidates for a particular product. Hence, in order to target a wide demographic, a larger number of generalised HRTFs may be necessary in order to improve localisation accuracy. Despite the evidence pointing to the shortcomings of generic HRTFs, their use remains common practice in commercial systems. This is largely due to the time and cost involved in measuring HRTFs for a large range of subjects.


Kay Stanney points to the lack of an efficient system for providing individualised HRTFs as an obstacle preventing virtual reality applications from reaching their full potential [11]. By only using generalised HRTFs, the perceptual accuracy of virtual 3D auditory environments is compromised.

2 ANTHROPOMETRIC FACTORS AFFECTING HRTFS

2.1 Sound processing at the pinna

The importance of anthropometric features in relation to HRTFs has already been mentioned in this paper. In particular, the pinnae have been highlighted as strongly influencing the localisation of sounds. Blauert noted that even slightly altering the processing of signals at the entrance to the ears could significantly influence the quality of spatial hearing and the accuracy of localisation [12]. The concha and the fossa of the helix (shown in Figure 3) have been shown to strongly influence the perception of high-frequency directionality [13].

Figure 3. Structure of external ear.

The concha resonance is created by the main cavity in the centre of the pinna. It is understood that this is responsible for giving the impression of externalisation, that is, of sound being located outside the head rather than within it [7]. This resonance can be distorted by certain headphones, leading to poor externalisation. In response, attempts have been made to design headphones that stimulate the concha resonance from outside the ear canal in order to better apply certain features of individual HRTFs and reproduce externalised perception [8]. In [8], Tan and Gan propose a method of exciting the concha in order to introduce spatial cues in the absence of individualised HRTFs. The concha is described as being responsible for directing sound arriving at or above ear level, while sounds coming from below primarily interact with and become amplified by the fossa, as seen in Figure 4. Tan and Gan found that directly exciting the concha in order to induce individual frontal cues can reduce front-back error when used simultaneously with spectral cues from a traditional HRTF.

Figure 4. Sound waves from different locations interacting with different parts of the pinna.

The influence of the pinnae has been cited as primarily related to high frequencies (above 5 kHz). Conversely, studies in perception have indicated that the head and torso provide much of the information related to low-frequency vertical localisation (below 3 kHz) [14].

2.2 Proposed solutions to the inaccuracies of generalised HRTFs

Several methods of accounting for the inaccuracies of generalised HRTFs have been suggested. Some research proposes methods of HRTF selection based on listening tests [15]. This method uses subjective testing of a number of HRTFs from a database in order to select the record that is most suited to a specific user. Another suggested solution is to provide a number of different HRTFs and to introduce calibration test signals in order to allow users to tune their system to match their ears' response [4]. In this way, the listener can start with an HRTF of best fit, and continue to alter the effect of that HRTF by selecting additional processing methods. An example of an extensive database of HRTF measurements is the CIPIC HRTF database [16]. This database comprises HRTFs from 45 subjects and 2 mannequins at 25 azimuth and 50 elevation positions. The anthropometric characteristics of the subjects are also provided. As this database is in the public domain, it is an important resource in assessing and manipulating HRTFs to best suit the individual listener. An accurate HRTF is needed in order to properly localise sound; however, it is not practical to measure the exact HRTF for each listener at all possible sound directions. This is the motivation for attempting to estimate HRTFs with simple anthropometric measurements [2]. Physical measurements of important morphological characteristics are taken and then inserted into a formula for multiple regression analysis in order to estimate individualised HRTFs.

3 METHODS OF DELIVERING SUBJECT-SPECIFIC HRTFS

The following sections discuss a number of methods employed by different researchers in an attempt to provide accessible individualised HRTFs.

3.1 Individualised HRTFs Using Subjective Tests And Database Matching

Subjective testing of existing HRTFs is one of the most common methods of individualising measurements. Iwaya proposed a method for selecting the most suitable HRTFs for an individual subject based on a layered examination of an existing database [5]. The system developed was called Determination method of OptimuM Impulse-response by Sound Orientation (DOMISO). The selection system was based on a tournament, whereby two HRTFs would be subjectively tested against one another and the more accurate example would progress to the next stage of testing. The body of HRTFs was built from measurements of 120 individuals. The tournament would begin with the random selection of 32 sets of HRTFs. The test for localisation accuracy consisted of pink noise convolved with each of the HRTFs, tested for 13 different points of azimuth. The pattern of sound delivery was shown to the listener before the test, and the listener chose which of the two HRTFs best matched the proposed pattern. That HRTF won the match and continued to the next test. One advantage of an individualisation method such as this is that it does not require any physical measurement of listener characteristics. The time required to test the listener for the most suitable HRTF is about 15 minutes, as opposed to two hours for calculation of exact HRTFs using the method described previously in this paper. With reference to localisation accuracy, the experiments showed very little difference between the accuracy of localisation using subject-measured HRTFs and the subject-selected HRTFs that won the tournament. The tournament-selected HRTFs showed slight accuracy advantages in some cases, and the advantage over non-fitted HRTFs was significant.

Another user-selection process, which avoided actual measurement of individual HRTFs, is described in [17]. The addition of a test sequence to an existing 3D audio system allows the user to select the most suitable HRTF set. This HRTF could then be more accurately tuned with additional filtering and processing of the spectrum. This method proved particularly successful because it was built on the extensive resources included in the CIPIC HRTF database. The testing process was very short and allowed listeners to subjectively tailor the most suitable HRTF to improve its accuracy for them.

3.2 Individualised HRTFs Using Physically Measured Anthropometric Calculations

With this method, individualised HRTFs are estimated for subjects by using measurements of physical features. Inoue et al. propose the analysis of head and ear size in order to provide approximate HRTF representations using multiple regression analysis [2]. This allows the HRTF to be expressed as a function of a number of physical features. In addition to the HRTFs calculated from the anthropometric data, actual HRTFs were recorded for each participant in the manner described in Section 1.1. This study was carried out with 86 Japanese subjects (15 female, 71 male) ranging in age from 17 to 33. Nine physical features were measured (see Figure 5), namely those seen as important in the design of the KEMAR mannequin:

1) Ear length
2) Ear breadth
3) Concha length
4) Concha breadth
5) Protrusion
6) Bitragion diameter
7) Radial distances among the bitragion and the pronasale
8) Radial distances among the bitragion and the opisthocranion
9) Radial distances among the pronasale, the vertex, and the opisthocranion

Comparisons were carried out between the measured HRTFs of particular subjects and the estimated HRTFs based on their physical features. The results showed no significant difference between the measured HRTF and the estimated HRTF in terms of perception and localisation. The overall localisation was very accurate, showing that HRTF behaviour can be estimated without recording the impulse response of the individual head, by measuring the proportions and filtering accordingly. However, this study does not take into account how the calculations may vary for greater diversity in race or age. For example, much of the concern in video gaming would be related to HRTFs for children, which could be quite different from results calculated for adults.

Further research has sought to identify key anthropometric measurements (KAMs) to be considered when estimating individual HRTFs [18]. Both global and local KAMs are identified, the latter being dependent on specific source positions. A weighted correlation method is proposed, whereby local KAMs are classified according to the level of influence they have on sounds from specific locations. This study found an increase in localisation accuracy of over 9% when using KAMs compared to generalised HRTFs. If used in conjunction with a large and diverse HRTF database, it would be possible to use KAMs to classify specific groups in a population, to which certain HRTFs would be more appropriate.
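The idea of expressing an HRTF parameter as a function of physical features through multiple regression, as in the approach of Inoue et al. [2], can be sketched as below. This is only an assumed, simplified setup: the two features (ear length and concha breadth, in cm), the target (a single notch frequency in kHz) and all numerical values are hypothetical, and the paper does not state which HRTF quantities were actually regressed.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple_regression(features, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] (normal equations)."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, feature_vector):
    """Estimate the HRTF parameter for a new listener's measurements."""
    return coeffs[0] + sum(c * f for c, f in zip(coeffs[1:], feature_vector))


# Hypothetical training data: (ear length, concha breadth) in cm per subject.
features = [(6.0, 1.5), (6.5, 1.8), (7.0, 1.6), (5.8, 2.0), (6.2, 1.7)]
# Synthetic targets generated from an invented linear relation, so that the
# fit can be checked against known coefficients.
targets = [20.0 - 1.5 * length - 2.0 * breadth for length, breadth in features]

coeffs = fit_multiple_regression(features, targets)
estimate = predict(coeffs, (6.4, 1.9))
```

In a real system one such regression would be fitted for each HRTF parameter of interest, using measured features and measured HRTFs from a database of subjects.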

Figure 5. Physical features measured.

Such KAM-based classification suggests that several accurately sized HRTF options could be integrated into commercial auditory products. In highlighting the importance of anthropometric measurements in the individualisation of HRTFs, the authors suggest that head-related transfer functions might instead be referred to as "body-related transfer functions", as this would seem a more suitable term for such a broad system of influential factors. In [6], the authors attempt to reduce the number of anthropometric measurements required to accurately estimate individual HRTFs. This was done by weighting 8 components according to their level of influence on particular types of spatial cues. This method uses the CIPIC database as a starting point, and inserts the listener's anthropometric measurements into a formula to customise the HRTF.

The question of how HRTFs vary with age is addressed in [19]. This study assesses the influence of pinna, head and torso characteristics on spatial localisation. Importantly, the study looks at a large demographic, including children and adults, in order to assess the influence of anthropometric measurements at different ages. Six parameters were used to describe the head (head height and breadth, front and back vertex, chin, and ear-to-shoulder distance). Another six parameters described the pinna (outer ear height and breadth, cavum conchae height, breadth and depth, and ear rotation). The resulting HRTFs are shown to be vastly different between a young child and an older person. The distance from ear to shoulder was shown to be the most important parameter with regard to differences in HRTF. The back vertex and head breadth were also mentioned as highly influential. Conversely, the frontal vertex, head height and chin measurement had only a slight influence on the resulting HRTFs. When examining the variation in HRTFs as a product of age, the height of the ear was shown to have little influence on the IID and HRTF. One might expect a greater influence here, as this is the characteristic that varies most due to growth. The greatest influence was attributed to the dimensions of the cavum conchae. The use of physically measured anthropometric data appears to be the most accurate method of individualisation examined in this paper.

3.3 Individualised HRTFs Using Image Processing To Calculate Anthropometric Data

Although this is still quite a new area of research, some progress has been shown in using image processing as a method of estimating anthropometric data for use in the calculation of individualised HRTFs. In [20], the authors calculated the anthropometric features of subjects using a 3D laser scanner, and applied this data to filtering processes on existing HRTFs. The individual HRTFs calculated provided an approximate 30% improvement on general HRTFs measured using a mannequin. While the specialist nature of the equipment is noted, it is also suggested that such morphological calculations may theoretically be possible using high-resolution two-dimensional photography. Zotkin [1] outlines a method for simple HRTF customisation based on the CIPIC database, adjusted according to measurements of the subject's pinnae. The anthropometric measurements are calculated using image processing and the acquired values are used to customise HRTFs. In psycho-acoustical tests, the method showed a 25% improvement in azimuth localisation compared to a fixed non-individualised HRTF. A photo of each ear is taken, measurements are calculated by an operator in under a minute, and the best match is then chosen from the CIPIC database. The authors acknowledged the coarse nature of such calculations, and the existence of far more dependencies than those of the pinnae alone, but this type of approach does point to the potential for matching user-specific anthropometric measurements to the closest results from an HRTF database. Zotkin conducts further work by introducing a head and torso (HAT) model in the form of two spheres representing head and body, separated by a gap (neck) [21]. This produces increased accuracy in calculating HRTF behaviour, especially for low frequencies, which are not well represented by pinnae models alone.
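Choosing the best match from a database on the basis of ear measurements, as in the image-based approach attributed to Zotkin [1], can be sketched as a nearest-neighbour search. The distance measure (a weighted Euclidean distance), the subject identifiers and all measurement values below are assumptions made for illustration; they are not taken from the CIPIC database or from the cited method.

```python
import math


def closest_subject(database, listener, weights=None):
    """Return the ID of the database subject nearest to the listener.

    database maps a subject ID to a tuple of anthropometric measurements;
    listener is a tuple of the same measurements for the new user.
    """
    if weights is None:
        weights = [1.0] * len(listener)

    def distance(measurements):
        return math.sqrt(sum(
            w * (m - x) ** 2
            for w, m, x in zip(weights, measurements, listener)
        ))

    return min(database, key=lambda subject_id: distance(database[subject_id]))


# Hypothetical entries: (ear length, ear breadth, concha breadth) in cm.
database = {
    "subject_A": (6.4, 3.0, 1.9),
    "subject_B": (5.8, 2.7, 1.6),
    "subject_C": (7.1, 3.3, 2.1),
}

listener = (6.0, 2.8, 1.7)   # measurements taken from the listener's ear photo
best_match = closest_subject(database, listener)
```

The HRTF set stored for the returned subject would then be used for the listener. The optional weights allow more influential measurements to count for more in the comparison, in the spirit of the weighting schemes discussed in Section 3.2.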


Further systems for calculating anthropometric measurements using 2D imaging have also been suggested [22]. In that work, front, side and rear views of the subject are taken with a digital camera, and linear and circumferential measurements are calculated. Accuracy of measurement depends on the correct identification of anatomical landmarks. The system did not provide entirely accurate results but represented an important move towards the calculation of anthropometric measurements using standard photographic equipment.

4 DISCUSSION & CONCLUSIONS

This paper has presented an analysis of research into the topic of individualised HRTFs. There is a clear need for subject-specific HRTFs in order to improve the accuracy of localisation with headphone-based 3D sound systems. While most commercially available systems currently use generalised HRTFs, the research presented highlights the possibility of integrating the ability to individualise HRTFs in the future. The most accurate systems examined were based on physically measuring the anthropometric data specific to a listener, and implementing individual HRTFs based on the listener's specifications. It is clear that HRTFs are part of a modular system and do not depend solely on the pinnae or the head. Many important physical features have been mentioned, each of which influences the propagation pattern of sound waves as they travel to the ears. While the technology may not yet be available to permit users to carry out accurate calculations for entirely individualised HRTFs at home, research to date indicates significant potential for easily calibrated systems in the future. In particular, the author suggests that current advancements in digital imaging technology and products such as the Kinect could inform the next generation of anthropometric measurement and HRTF individualisation software.

REFERENCES

[1] D. N. Zotkin, R. Duraiswami, L. S. Davis, A. Mohan and V. Raykar. Virtual audio system customization using visual matching of ear parameters. Presented at IEEE International Conference on Pattern Recognition. 2002.
[2] N. Inoue, T. Kimura, T. Nishino, K. Itou and K. Takeda. Evaluation of HRTFs estimated using physical features. Acoustical Science & Technology 26(5), pp. 453-455. 2005.
[3] K. C. Pohlmann. Principles of Digital Audio.
[4] C. Roads. The Computer Music Tutorial. 1994.
[5] Y. Iwaya. Individualization of head-related transfer functions with tournament-style listening test: Listening with other's ears. Acoustical Science and Technology 27(6), pp. 340-343. 2006.
[6] Hugeng, W. Wahab and D. Gunawan. Enhanced individualization of head-related impulse response model in horizontal plane based on multiple regression analysis. Presented at International Conference on Computer Engineering and Applications (ICCEA). 2010. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5445590
[7] F. Rumsey. Spatial Audio. 2001.
[8] C. J. Tan and W. S. Gan. Direct concha excitation for the introduction of individualized hearing cues. Journal of the Audio Engineering Society 48(7), pp. 642-653. 2000.
[9] H. Moller, M. F. Sorensen, D. Hammershoi and C. B. Jensen. Head-related transfer functions of human subjects. Audio Engineering Society, AES 43(5), pp. 300-321. 1995. Available: http://www.aes.org/elib/browse.cfm?elib=7949
[10] J. Fels. From Children To Adults: How Binaural Cues And Ear Canal Impedances Grow. 2008.
[11] K. Stanney. Realizing the full potential of virtual reality: Human factors issues that could stand in the way. Presented at Virtual Reality Annual International Symposium. 1995.
[12] J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization (Revised ed.). 1997.
[13] D. Begault. Challenges to the successful implementation of 3-D sound. Journal of the Acoustical Society of America, ASA 39, p. 864. November 1991.
[14] V. R. Algazi, C. Avendano and R. O. Duda. Elevation localization and head-related transfer function analysis at low frequencies. Journal of the Acoustical Society of America, ASA 109(3), pp. 1110-1110. 2001. Available: http://link.aip.org/link/JASMAN/v109/i3/p1110/s1&Agg=doi
[15] B. U. Seeber and H. Fastl. Subjective selection of non-individual head-related transfer functions. Presented at International Conference on Auditory Display. 2003.
[16] V. R. Algazi, R. O. Duda, D. M. Thompson and C. Avendano. The CIPIC HRTF database. Presented at IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. 2001.
[17] C. J. Tan and W. S. Gan. User-defined spectral manipulation of HRTF for improved localisation in 3D sound systems. Electronics Letters 34(25), pp. 2387-2389. 1998.
[18] S. Xu, Z. Li and G. Salvendy. Identification of anthropometric measurements for individualization of head-related transfer functions. Acta Acustica United with Acustica 95(1), pp. 168-177. 2009. Available: http://openurl.ingenta.com/content/xref?genre=article&issn=16101928&volume=95&issue=1&spage=168
[19] J. Fels and M. Vorlander. Anthropometric parameters influencing head-related transfer functions. Acta Acustica United with Acustica 95(2), pp. 331-342. 2009. Available: http://openurl.ingenta.com/content/xref?genre=article&issn=16101928&volume=95&issue=2&spage=331
[20] N. Gupta, A. Barreto and M. Choudhury. Modeling head-related transfer functions based on pinna anthropometry. Presented at Second LACCEI International Latin American and Caribbean Conference for Engineering and Technology (LACCEI). 2004.
[21] D. Zotkin, J. Hwang, R. Duraiswami and L. S. Davis. HRTF personalization using anthropometric measurements. Presented at IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. 2003.
[22] C. Hung, C. P. Witana and R. S. Goonetilleke. Anthropometric measurements from photographic images. Presented at Work with Computing Systems. 2004.
