David Griesinger
Cambridge MA USA www.DavidGriesinger.com
Introduction
This paper poses fundamental questions about the properties of human hearing (the topic of this conference):
How do we localize sounds in the up/down and front/back planes?
Are the methods used different for different individuals? Can binaural recordings made for one individual be made to work for another individual without head-tracking?
Given the extremely non-uniform transfer of sound pressure from the soundfield to a human eardrum, how can we accurately perceive a frequency balance as flat? Does frequency-balanced pink noise from a frontal loudspeaker sound balanced in frequency? If not, are commercial recordings, which are equalized using loudspeakers, actually frequency balanced? If not, in what ways are they biased?
Tiny probe microphones were also built with a very soft tip. These allow binaural recording of performances at the author's eardrums, correct headphone calibration, and verification of the accuracy of the dummy head model.
A perplexing Discrepancy
Recordings made with this technology provide excellent localization accuracy.
But at least initially the timbre of the playback through carefully calibrated headphones seems incorrect. The frequencies around 3kHz seem too strong, and the bass is usually weaker than my memory of the performance.
Checking and re-checking the calibrations has convinced me the recordings and the playback are correct.
It is my memory of the performance that is flawed. The most reasonable explanation is that we continuously adapt to the frequency balance of sounds around us. We remember the timbre after such adaptation has taken place.
Over a long period of time the brain builds spectral maps for the features that define up/down and back/front information in HRTFs. When a sound is heard these features are compared to the maps, and a localization is found.
When a match has been found, the perceptible features of the particular HRTF are removed, again from a fixed spectral map.
But this spectrum is altered by an adaptive equalizer with a relatively short time constant, which acts to make all frequency bands equally perceived. The time constant of this mechanism for the author is about 5 minutes. It may be shorter for some individuals.
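The adaptive stage described above can be illustrated with a toy model. This is a minimal sketch, not a model taken from the author: it assumes that the perceived level offset in each band relaxes exponentially toward zero with a single time constant, set here to the author's figure of about 5 minutes. The function name and the three-band starting values are hypothetical.

```python
import math

def perceived_imbalance_db(initial_db, elapsed_s, time_constant_s=300.0):
    """Per-band level offset (dB) still perceived after elapsed_s seconds.

    Assumption: first-order (exponential) adaptation toward equal
    perceived level in every band; 300 s is the ~5 minute figure above.
    """
    decay = math.exp(-elapsed_s / time_constant_s)
    return [level * decay for level in initial_db]

# Hypothetical starting imbalance: weak bass, flat midrange, strong 3 kHz band.
start = [-6.0, 0.0, 4.0]
after_5_min = perceived_imbalance_db(start, 300.0)
after_30_min = perceived_imbalance_db(start, 1800.0)
```

Under these assumptions, roughly a third of the initial imbalance is still perceived after one time constant, and almost none after half an hour, which is consistent with remembering the adapted timbre rather than the initial one.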
An example
The author once noticed a gliding whistle while walking under an overhead ventilator slot that emitted broadband noise.
Walking rapidly (~3.5mph) under that noise source produced a gliding whistle, somewhat like a Doppler shift. This is the uncorrected sound of the vertical HRTFs. In spite of the lack of timbre correction, the sound was correctly localized even at much higher speeds.
No timbre shift was perceived when walking slowly under the slot (<2mph).
When there is sufficient time our brains correct the timbre, but this correction takes time: in this case, a fraction of a second.
Headphone listening
When we listen to binaural recordings with headphones the whole process is broken. Headphones match individuals very poorly (as we will see). None of the spectral features match the fixed HRTF maps. The brain is confused, and the subject perceives the sound inside the head.
But the adaptive equalizer is still active, and after a time the sound is perceived as frequency balanced.
Upward Masking
Sound enters the basilar membrane at the oval window. High frequencies excite the membrane near the entrance, passing through it and exiting through the second window below. Low frequencies travel further down the spiral before they excite the membrane and pass through.
Strong low frequencies disturb the high-frequency portion of the membrane, causing the well-known phenomenon of upward masking.
Upward masking is a purely mechanical effect, and it cannot be compensated by adaptive equalization. The high frequencies are simply not detected. Intelligibility is frequently low in acoustic spaces because there is little low-frequency absorption, so the LF acoustic power is boosted. We adapt to the frequency imbalance, and say the sound is OK but unintelligible.
If safe, comfortable probe microphones are available, it is possible to make accurate binaural recordings. First we measure the headphone response at the eardrum, H. We can then record with the same probe microphones. If we equalize the recording with the inverse of H, H⁻¹, the recording will play back with perfect fidelity.
If we want to play back the binaural recording over speakers, or if we want to play loudspeaker music over headphones, we need to measure the spectrum of a carefully equalized loudspeaker at the eardrums of the listener. This is the spectrum S. We then equalize the binaural recording with S⁻¹, and we can play it over speakers. Equalizing the phones with H⁻¹S allows playback of both binaural and loudspeaker-mixed music. H⁻¹S is the inverse of the free-field earphone response.
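The bookkeeping for these equalizations is easy to check numerically. The sketch below works with magnitude responses in dB, where cascading responses is addition and inversion is negation. The frequency grid and all dB values are made-up illustration data, not measurements from the author, and phase is ignored.

```python
import numpy as np

# Hypothetical eardrum magnitude responses in dB on a coarse frequency grid.
freqs_hz = np.array([100.0, 1000.0, 3000.0, 8000.0])
H_db = np.array([2.0, 0.0, 9.0, -4.0])   # headphone measured at the eardrum (H)
S_db = np.array([0.0, 1.0, 14.0, 3.0])   # frontal loudspeaker at the eardrum (S)

# In dB, multiplying responses is addition and inverting is negation.
recording_eq_db = -S_db          # inverse of S: binaural recording prepared for speakers
headphone_eq_db = S_db - H_db    # inverse of H times S: free-field headphone equalization

# Speaker-equalized binaural recording played over the equalized headphones:
# the net response between the original eardrum and the playback eardrum.
binaural_chain_db = recording_eq_db + headphone_eq_db + H_db

# Loudspeaker-mixed music played over the equalized headphones:
# the eardrum should receive the same spectrum as from the loudspeaker.
loudspeaker_music_db = headphone_eq_db + H_db
```

With these definitions the binaural chain comes out flat (0 dB in every band) and loudspeaker-mixed music arrives at the eardrum with spectrum S, which is what the single headphone equalization is meant to achieve for both kinds of material.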
All measurements with the probes are first convolved with this inverse function.
Second-order parametric filters are combined to produce the other equalization filters. Parametric filters can be easily inverted, and to the author they sound better than mathematical inverse filters.
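The ease of inverting a parametric filter can be shown with one common second-order design. The sketch below assumes the widely used "Audio EQ Cookbook" (RBJ) peaking form, which is an assumption here, since the text does not say which design the author used. In that form, a section with the same center frequency and Q but the opposite gain is the exact inverse. The function names and the 3 kHz, 6 dB, Q = 2 example are hypothetical.

```python
import numpy as np

def peaking_biquad(f0_hz, gain_db, q, fs_hz):
    """Second-order peaking filter coefficients (b, a), RBJ cookbook form."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0_hz / fs_hz
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * amp, -2.0 * np.cos(w0), 1.0 - alpha * amp])
    a = np.array([1.0 + alpha / amp, -2.0 * np.cos(w0), 1.0 - alpha / amp])
    return b, a

def response(b, a, freqs_hz, fs_hz):
    """Complex frequency response of a biquad at the given frequencies."""
    z_inv = np.exp(-2j * np.pi * freqs_hz / fs_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return num / den

fs = 48000.0
freqs = np.geomspace(20.0, 20000.0, 200)

# A 6 dB boost at 3 kHz and the matching 6 dB cut with the same Q.
boost = response(*peaking_biquad(3000.0, 6.0, 2.0, fs), freqs, fs)
cut = response(*peaking_biquad(3000.0, -6.0, 2.0, fs), freqs, fs)

# Cascading the two sections should give unity gain and zero phase everywhere.
cascade = boost * cut
```

Because negating the gain swaps the numerator and denominator of this section, the inverse is stable and introduces no extra delay, unlike a general inverse computed from a measured response.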
Probe Equalization
This graph shows the frequency response and time response of the digital inverse of the two probes as measured against a B&K 4133 microphone. Matlab is used to construct the precise digital inverse of the probe response, both in frequency and in time. The resulting probe response is flat from ~25Hz to 17kHz. In general, I prefer NOT to use a mathematical inverse response, as these frequently contain audible artifacts. I minimized these artifacts here by carefully truncating the measured response as a function of frequency.
In spite of the data, Hammershøi and Møller recommend using measurements at the entrance to the ear canal!
The recommendation can be disproved by a single subject.
Here are pictures of a partially blocked canal and a fully blocked canal. The following data applies to the fully blocked measurements, but the partially blocked measurements are similar.
Twenty different HRTFs were measured with a blocked canal and equalized by the above EQ, and the differences between them and the open ear canal measurements are plotted. This data supports Hammershøi and Møller's contention that the directional properties of the measured HRTFs are preserved by the blocked measurement, at least to a frequency of ~7kHz.
Note the vertical scale is ±30dB. The errors at 7-10kHz are significant.
Using the same method, I measured three headphones. Blue is the AKG 701, red is the AKG 240, and cyan is the Sennheiser 250. The curves plot the difference between the blocked and unblocked measurement, with the measured HRTF at azimuth 15, elevation 0 as a reference. The vertical scale is ±30dB. Errors of at least 10dB exist at midband.
More headphones
Blue: an old but excellent noise-protection earphone by Sharp. Red: iPod earbuds. The errors in the blocked measurements are large enough to prevent accurate localization of binaural recordings.
Analysis
The previous curves are NOT the frequency response of the headphones under test. They show the ERRORS that occur when a blocked ear canal measurement is used instead of the eardrum pressure. Because the scale of the plots is ±30dB, the difference curves look better than they really are. Errors of 10dB in frequency ranges vital for timbre are present for almost all the examples shown.
We can conclude that it is possible to use recordings from dummy heads that lack accurate ear canals IF AND ONLY IF it is possible to equalize them, either by comparison to a reference with ear canals, or by equalizing them to approximately flat for a frontal sound source. If this is done, we must also equalize the headphones at the eardrum for the same source.
We can with more assurance conclude that it is NOT possible to equalize headphones with a measurement system that does NOT include an accurate ear canal model.
Neither KEMAR nor HATS qualifies.
Measurement systems with true ear canals are a very good thing.
In addition, I have found that for many earphones it is vital to have a pinna model with the same compliance as a human ear. On-ear headphones in particular alter the concha volume, and drastic changes in the frequency response can result if the compliance is not accurate. Pinnae are complex structures with variable compliance, so this is tricky!
Equal Loudness
Top: ISO equal loudness curves for 80dB and 60dB SPL. These are averages over many individuals, so features in them are broadened.
Bottom (blue/red): averaged frontal response over a ±5 degree cone in front of the author, measured at the eardrums. The loudspeaker was equalized to 200Hz. Bottom (black/cyan): the same measurement for the author's dummy head with no equalization. The difference in eardrum impedance above 8kHz boosts the response of the dummy, but this can be removed by equalization.
Equal Loudness 2
We can measure equal loudness curves because the ear does not adapt when the stimulus is narrow band, whether noise or tone. The differences between the top and bottom curves in the previous slide can be attributed to the properties of the middle ear and the inner ear. Thus equal loudness curves are a method of measuring the effective frequency response of an individual's hearing system in the absence of short-term adaptation to the environment. They represent our sensitivity to timbre in a quiet environment, or before adaptation takes place. Their extreme lack of flatness is proof of the existence, and effectiveness, of adaptation.
The difference of the loudspeaker and headphone measurements becomes the ideal headphone correction for this individual.
This program can be used to test the variation in response of a particular headphone over a wide range of individuals.
Subjects report that the resulting equalization is very pleasant, and binaural recordings made with the author's ears reproduce well without head tracking. Music recorded for loudspeakers is judged identical in timbre in both the headphones and the loudspeaker. The equalization is also identical in timbre to a large high-quality stereo sound system.
After doing the experiment, the subjects were given the opportunity to listen to music both with the frontal equalization and with their own equal loudness equalization. (The speaker curves were not subtracted.)
The author's binaural recordings were perceived with better localization with the free-field equalization. (These recordings were equalized for free-field reproduction.) Many subjects preferred their own equal loudness equalization for other material.
This equalization requires no adaptation to a recording that has an accurately flat frequency response. The sound can be quite seductive.
Some Speculation
Equal loudness curves have two prominent features: the increase in sensitivity around 3kHz, and the decrease in sensitivity at low frequencies. Music that has been recorded with frequency-linear microphones and not post-processed often seems lacking in bass and harsh in the midrange, both on loudspeakers and on eardrum-equalized headphones. The author speculates that an unconscious collusion between loudspeaker designers and recording engineers routinely boosts the bass and tweaks the 3kHz region on commonly available recordings.
It is common to boost the bass 10dB at 60Hz in automobiles.
Floyd Toole's finding that the loudspeakers closest to frequency linear are preferred in blind listening tests may be biased by the choice of recordings used in the tests.
The spectrum of choral music in the author's unprocessed recordings shows a ~3dB peak around 3kHz. This peak is generally absent in vocalists on pop music. Perhaps they use a different singing technique, and perhaps the equalization has been adjusted closer to an equal-loudness curve.
Conclusions
Experiments and observation suggest that human hearing uses a combination of fixed spectral maps to perceive the localization of a sound, and then corrects the HRTF timbre with a similar map. These fixed maps are combined with a relatively rapid AGC system that tends to equalize loudness across frequency bands. The existence of equal loudness curves shows that for narrow band signals adaptation does not take place.
When a new, unknown broadband signal is first heard, the ear hears the timbre that reflects the equal loudness calibration. But this timbre is replaced in a short time with a more balanced timbre, and this balanced timbre is remembered. It is likely that, given the opportunity to equalize a recording to their own taste using loudspeakers with a flat frequency response, recording engineers will be sorely tempted to move toward their own equal loudness curve.
The temptation is dangerous but probably harmless. We can see that individual loudness curves can be rather different, particularly at low frequencies. But adaptation will continue to work when the recording is played back, and if the response does not match that of the listener, they will soon not notice the difference.