
Basic Features of Audio Signals

Jyh-Shing Roger Jang
http://www.cs.nthu.edu.tw/~jang
MIR Lab, CS Dept, Tsing Hua Univ.
Hsinchu, Taiwan

Audio Features
Four commonly used audio features
Volume
Pitch
Zero crossing rate
Timbre
Our goal
These features can be perceived subjectively.
But we need to compute them quantitatively for
further processing and recognition.
Audio Features in Time Domain
Audio features presented in the time domain



Intensity
Fundamental period
Timbre: Waveform within an FP
Audio Features in Frequency Domain
Volume: Magnitude of spectrum
Pitch: Distance between harmonics
Timbre: Smoothed spectrum








[Figure: spectrum annotated with intensity, pitch frequency, first formant (F1), and second formant (F2)]
Demo: Real-time Spectrogram
Try dspstfft_audio under MATLAB:
[Figure panels: Spectrogram, Spectrum]
Steps for Audio Feature Extraction
Frame blocking
Frame duration of 20 ms or so
Feature extraction
Volume, zero-crossing rate, pitch, MFCC, etc.
Endpoint detection
Usually based on volume & zero-crossing rate
Frame Blocking
Sample rate = 11025 Hz
Frame size = 256 samples
Overlap = 84 samples
(Hop size = 256 - 84 = 172 samples)
Frame rate = 11025/(256-84) ≈ 64 frames/sec
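The frame-blocking arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `frame_blocking` is hypothetical, and dropping a trailing partial frame (rather than zero-padding it) is an assumption the slides do not specify.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a sequence of samples into overlapping frames.

    Assumption: a trailing partial frame is dropped rather than zero-padded.
    """
    hop_size = frame_size - overlap
    if hop_size <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    last_start = len(signal) - frame_size
    return [signal[start:start + frame_size]
            for start in range(0, last_start + 1, hop_size)]


fs = 11025                       # sample rate in Hz (from the slide)
frame_size, overlap = 256, 84    # in samples (from the slide)
hop_size = frame_size - overlap  # 172 samples
frame_rate = fs / hop_size       # about 64.1 frames per second

one_second = list(range(fs))     # dummy signal: each sample equals its index
frames = frame_blocking(one_second, frame_size, overlap)
```

With these settings, one second of audio yields 63 complete frames, slightly fewer than the nominal frame rate suggests, because the last partial frame is dropped in this sketch.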
[Figure: waveform divided into overlapping frames, with a zoomed-in view marking one frame and the overlap between adjacent frames]
Intensity (I)
Intensity
Visual cue: Amplitude of vibration
Computation:
Volume: $vol = \sum_{i=1}^{n} |s_i|$
Log energy (in decibels): $energy = 10 \log_{10} \left( \sum_{i=1}^{n} s_i^2 \right)$
Characteristics
Influenced by
Microphone types
Microphone setups
Perceived volume is influenced by frequency and timbre
Intensity (II)
To compensate for DC drift
DC drift: The waveform does not oscillate around zero
Computation:
Volume: $vol = \sum_{i=1}^{n} |s_i - \mathrm{median}(s)|$
Log energy (in decibels): $energy = 10 \log_{10} \left( \sum_{i=1}^{n} (s_i - \mathrm{mean}(s))^2 \right)$
Theoretical background (How to prove?)
For $s = [s_1, s_2, \ldots, s_n]$:
$\arg\min_x \sum_{i=1}^{n} |s_i - x| = \mathrm{median}(s)$
$\arg\min_x \sum_{i=1}^{n} (s_i - x)^2 = \mathrm{mean}(s)$
Intensity (III)
Examples
Please refer to the online tutorial
Pitch
Definition
Pitch is known as fundamental frequency, which
is equal to the number of fundamental periods within a
second. The unit used here is Hertz (Hz).
More commonly, pitch is expressed in semitones,
which can be converted from pitch in Hertz:
$semitone = 69 + 12 \log_2 \left( \frac{Hz}{440} \right)$
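The Hertz-to-semitone conversion can be sketched as a pair of small Python functions. The names are hypothetical. The inverse function is not on the slide and is included only as a consistency check. Under this formula 440 Hz maps to semitone 69, and each doubling of frequency adds 12 semitones.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to semitones: 69 + 12 * log2(Hz / 440)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def semitone_to_hz(semitone):
    """Inverse conversion (added here for checking, not from the slide)."""
    return 440.0 * 2.0 ** ((semitone - 69.0) / 12.0)


a4 = hz_to_semitone(440.0)   # reference tone
a5 = hz_to_semitone(880.0)   # one octave above the reference
```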
Pitch Computation (I)
Pitch of tuning forks
$ff = \frac{16000}{182/5} = 439.56$ Hz (5 fundamental periods spanning 182 samples)
$pitch = 69 + 12 \log_2 \left( \frac{ff}{440} \right) = 68.98$ semitones
Pitch Computation (II)
Pitch of speech
$ff = \frac{16000}{(477 - 75)/3} = 119.403$ Hz
$pitch = 69 + 12 \log_2 \left( \frac{ff}{440} \right) = 46.42$ semitones
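The speech example measures 3 fundamental periods between sample 75 and sample 477 at a 16000 Hz sample rate. A minimal Python sketch of that calculation follows. The function and parameter names are hypothetical, and marking the period boundaries by hand is assumed, since the slide does not describe an automatic method.

```python
import math


def fundamental_frequency(fs, start_index, end_index, num_periods):
    """Fundamental frequency from a span covering several fundamental periods."""
    if end_index <= start_index or num_periods <= 0:
        raise ValueError("need end_index > start_index and num_periods > 0")
    period_in_samples = (end_index - start_index) / num_periods
    return fs / period_in_samples


def hz_to_semitone(freq_hz):
    """Semitone conversion used on the slides: 69 + 12 * log2(Hz / 440)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


ff_speech = fundamental_frequency(16000, 75, 477, 3)  # Hz
pitch_speech = hz_to_semitone(ff_speech)              # semitones
```

Averaging over several periods, rather than measuring a single one, reduces the effect of an off-by-one-sample error in the marked boundaries.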
Statistics of Mandarin Chinese
5401 characters, each associated with at least one
base syllable and a tone
411 base syllables, and most syllables have 4 tones, so we
have 1501 tonal syllables
Tone is characterized by the pitch curves:
Tone 1: high-high
Tone 2: low-high
Tone 3: high-low-high
Tone 4: high-low
Some examples of tones:
1242
1234
Taiwanese
Sinusoidal Signals
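How to generate a sinusoidal signal
fs=16000;                            % Sample rate (Hz)
duration=3;                          % Duration (sec)
f=440;                               % Frequency of the sine wave (Hz)
t=(1:fs*duration)/fs;                % Time axis (sec)
y=0.8*sin(2*pi*f*t);                 % Sine wave with amplitude 0.8
plot(t,y); axis([0.6, 0.65, -1 1]);  % Zoom in on 0.6 to 0.65 sec
sound(y, fs);                        % Play the signal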
Zero Crossing Rate
Zero crossing rate (ZCR)
The number of zero crossings in a frame.
Characteristics
Noise and unvoiced sound have high ZCR.
ZCR is commonly used in endpoint detection,
especially in detecting the start and end of
unvoiced sounds.
To distinguish noise/silence from unvoiced sound,
we usually add a bias to the signal before computing ZCR.

ZCR Computations
Two types of ZCR definition
If a sample with zero value is counted as a zero
crossing, then the value of ZCR is higher. Otherwise
it's lower.
The choice affects the ZCR, especially when the sample
rate is low.
Other considerations
Zero-justification is required.
ZCR with shift can be used to distinguish between
unvoiced sounds and silence. (How to determine
the shift amount?)
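The two ZCR definitions and the effect of a shift can be sketched in Python. This is one possible reading, not necessarily the author's implementation: a crossing is detected from the sign of the product of adjacent samples, with `<` excluding zero-valued samples and `<=` including them. The function name, the `count_zeros` and `shift` parameters, and the toy frames are all hypothetical.

```python
def zero_crossing_count(frame, count_zeros=False, shift=0.0):
    """Count zero crossings between adjacent samples of a frame.

    count_zeros=False: only strict sign changes count (product < 0).
    count_zeros=True:  pairs containing a zero sample also count (product <= 0).
    shift: bias added to every sample before counting.
    """
    shifted = [s + shift for s in frame]
    count = 0
    for prev, cur in zip(shifted, shifted[1:]):
        product = prev * cur
        if product < 0 or (count_zeros and product == 0):
            count += 1
    return count


frame = [1, 0, -1, 2, 0, 0, 3]   # toy frame containing zero-valued samples
strict = zero_crossing_count(frame)                       # lower count
inclusive = zero_crossing_count(frame, count_zeros=True)  # higher count

near_silence = [0.01, -0.01, 0.01, -0.01]  # low-level noise around zero
unvoiced = [0.5, -0.4, 0.6, -0.5]          # larger swings around zero
```

In this toy data, a shift of 0.05 removes all crossings from the low-level noise but leaves the crossings of the larger-amplitude frame intact, which is how a shifted ZCR can separate silence from unvoiced sound. The shift value itself is arbitrary here.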
ZCR
Examples
Please refer to the online tutorial.
