Visit http://bit.ly/9JXr2L for a web app, based on the method developed in this paper, and more on the topic.
J Comput Neurosci (2010) 29:171–182
DOI 10.1007/s10827-009-0180-4

Kernel bandwidth optimization in spike rate estimation

Hideaki Shimazaki · Shigeru Shinomoto

Received: 22 December 2008 / Revised: 20 May 2009 / Accepted: 23 July 2009 / Published online: 5 August 2009
© The Author(s) 2009. This article is published with open access at Springerlink.com



Abstract  Kernel smoother and a time-histogram are classical tools for estimating an instantaneous rate of spike occurrences. We recently established a method for selecting the bin width of the time-histogram, based on the principle of minimizing the mean integrated square error (MISE) between the estimated rate and the unknown underlying rate. Here we apply the same optimization principle to the kernel density estimation in selecting the width or "bandwidth" of the kernel, and further extend the algorithm to allow a variable bandwidth, in conformity with data. The variable kernel has the potential to accurately grasp non-stationary phenomena, such as abrupt changes in the firing rate, which we often encounter in neuroscience. In order to avoid possible overfitting that may take place due to excessive freedom, we introduced a stiffness constant for bandwidth variability. Our method automatically adjusts the stiffness constant, thereby adapting to the entire set of spike data. It is revealed that the classical kernel smoother may exhibit goodness-of-fit comparable to, or even better than, that of modern sophisticated rate estimation methods, provided that the bandwidth is selected properly for a given set of spike data, according to the optimization methods presented here.

Keywords  Kernel density estimation · Bandwidth optimization · Mean integrated squared error

Action Editor: Peter Latham

H. Shimazaki (B)
Grün Unit, RIKEN Brain Science Institute, Saitama 351-0198, Japan
e-mail: shimazaki@brain.riken.jp

S. Shinomoto
Department of Physics, Kyoto University, Kyoto 606-8502, Japan
e-mail: shinomoto@scphys.kyoto-u.ac.jp

1 Introduction

Neurophysiologists often investigate responses of a single neuron to a stimulus presented to an animal by using the discharge rate of action potentials, or spikes (Adrian 1928; Gerstein and Kiang 1960; Abeles 1982). One classical method for estimating the spike rate is the kernel density estimation (Parzen 1962; Rosenblatt 1956; Sanderson 1980; Richmond et al. 1990; Nawrot et al. 1999). In this method, a spike sequence is convoluted with a kernel function, such as a Gauss density function, to obtain a smooth estimate of the firing rate. The estimated rate is sometimes referred to as a spike density function. This nonparametric method is left with a free parameter, the kernel bandwidth, that determines the goodness-of-fit of the density estimate to the unknown rate underlying the data. Although theories have been suggested for selecting the bandwidth by cross-validating with the data (Rudemo 1982; Bowman 1984; Silverman 1986; Scott and Terrell 1987; Scott 1992; Jones et al. 1996; Loader 1999a, b), individual researchers have mostly chosen the bandwidth arbitrarily. This is partly because the theories have not spread to the neurophysiological society, and partly due to inappropriate basic assumptions of the theories themselves. Most optimization methods assume a stationary rate fluctuation, while the neuronal firing rate often exhibits abrupt changes, to which neurophysiologists, in particular, pay attention.

A fixed bandwidth, optimized under a stationary assumption, is too wide to extract the details of sharp activation, while in the silent period, the fixed bandwidth would be too narrow and may cause spurious undulation in the estimated rate. It is therefore desirable to allow a variable bandwidth, in conformity with data.

The idea of optimizing the bandwidth at every instant was proposed by Loftsgaarden and Quesenberry (1965). However, in contrast to the progress in methods that vary bandwidths at sample points only (Abramson 1982; Breiman et al. 1977; Sain and Scott 1996; Sain 2002; Brewer 2004), the local optimization of bandwidth at every instant turned out to be difficult because of its excessive freedom (Scott 1992; Devroye and Lugosi 2000; Sain and Scott 2002). In earlier studies, Hall and Schucany used the cross-validation of Rudemo and Bowman within local intervals (Hall and Schucany 1989), yet the interval length was left free. Fan et al. applied cross-validation to locally optimized bandwidth (Fan et al. 1996), and yet the smoothness of the bandwidth variation was left undetermined.

In this study, we first revisit the fixed kernel method, and derive a simple formula to select the bandwidth of the kernel density estimation, similar to the previous method for selecting the bin width of a peristimulus time histogram (see Shimazaki and Shinomoto 2007). Next, we introduce the variable bandwidth into the kernel method and derive an algorithm for determining the bandwidth locally in time. The method automatically adjusts the flexibility, or the stiffness, of the variable bandwidth. The performance of our fixed and variable kernel methods is compared with established density estimation methods, in terms of the goodness-of-fit to underlying rates that vary either continuously or discontinuously. We also apply our kernel methods to biological data, and examine their ability by cross-validating with the data.

Though our methods are based on the classical kernel method, their performances are comparable to various sophisticated rate estimation methods. Because of their classical simplicity, they are rather convenient for users: the methods simply suggest a bandwidth for the standard kernel density estimation.

2 Methods

2.1 Kernel smoothing

In neurophysiological experiments, neuronal response is examined by repeatedly applying identical stimuli. The recorded spike trains are aligned at the onset of the stimuli, and superimposed to form a raw density, as

x_t = (1/n) Σ_{i=1}^{N} δ(t − t_i),   (1)

where n is the number of repeated trials. Here, each spike is regarded as a point event that occurs at an instant of time t_i (i = 1, 2, · · · , N) and is represented by the Dirac delta function δ(t). The kernel density estimate is obtained by convoluting a kernel k(s) to the raw density x_t,

λ̂_t = ∫ x_{t−s} k(s) ds.   (2)

Throughout this study, an integral that does not specify bounds refers to ∫_{−∞}^{+∞}. The kernel function satisfies the normalization condition, ∫ k(s) ds = 1, a zero first moment, ∫ s k(s) ds = 0, and has a finite bandwidth, w² = ∫ s² k(s) ds < ∞. A frequently used kernel is the Gauss density function,

k_w(s) = (1/(√(2π) w)) exp(−s²/(2w²)),   (3)

where the bandwidth w is specified as a subscript. In the body of this study, we develop optimization methods that apply generally to any kernel function, and derive a specific algorithm for the Gauss density function in the Appendix.
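Concretely, the smoothing of Eqs. (1)–(3) amounts to placing a Gauss density on every superimposed spike time and dividing by the number of trials. The following is a minimal sketch, not the authors' code; the spike times, trial count, and grid below are made-up examples.

```python
import numpy as np

def gauss_kernel(s, w):
    # Gauss density k_w(s) of Eq. (3) with bandwidth w
    return np.exp(-s**2 / (2.0 * w**2)) / (np.sqrt(2.0 * np.pi) * w)

def kernel_rate(spike_times, n_trials, t_grid, w):
    # Kernel density estimate of Eq. (2): convolving the raw density
    # x_t of Eq. (1) with k_w reduces, for delta events, to a sum of
    # kernels centred on the superimposed spike times, divided by n.
    s = np.asarray(t_grid)[:, None] - np.asarray(spike_times)[None, :]
    return gauss_kernel(s, w).sum(axis=1) / n_trials

# Toy usage: six spikes pooled from n = 3 trials, rate on a 1 s grid.
spikes = [0.12, 0.15, 0.18, 0.52, 0.55, 0.80]
t = np.linspace(0.0, 1.0, 201)
rate = kernel_rate(spikes, n_trials=3, t_grid=t, w=0.05)
```

Since each kernel integrates to one, the estimate integrates to (number of spikes)/n, the mean spike count per trial.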

2.2 Mean integrated squared error optimization principle

Assuming that spikes are sampled from a stochastic process, we consider optimizing the estimate λ̂_t to be closest to the unknown underlying rate λ_t. Among several plausible optimizing principles, such as the Kullback-Leibler divergence or the Hellinger distance, we adopt, here, the mean integrated squared error (MISE) for measuring the goodness-of-fit of an estimate to the unknown underlying rate, as

MISE = ∫_a^b E(λ̂_t − λ_t)² dt,   (4)

where E refers to the expectation with respect to the spike generation process under a given inhomogeneous rate λ_t. It follows, by definition, that E x_t = λ_t.

In deriving optimization methods, we assume the Poisson nature, so that spikes are randomly sampled at a given rate λ_t. Spikes recorded from a single neuron correlate in each sequence (Shinomoto et al. 2003, 2005, 2009). In the limit of a large number of spike trains, however, mixed spikes are statistically independent and the superimposed sequence can be approximated as a single inhomogeneous Poisson point process (Cox 1962; Snyder 1975; Daley and Vere-Jones 1988; Kass et al. 2005).

2.3 Selection of the fixed bandwidth

Given a kernel function such as Eq. (3), the density function Eq. (2) is uniquely determined for a raw density Eq. (1) of spikes obtained from an experiment. A bandwidth w of the kernel may alter the density estimate, and it can accordingly affect the goodness-of-fit of the density function λ̂_t to the unknown underlying rate λ_t. In this subsection, we consider applying a kernel of a fixed bandwidth w, and develop a method for selecting w that minimizes the MISE, Eq. (4).

The integrand of the MISE is decomposed into three parts: Eλ̂_t² − 2λ_t Eλ̂_t + λ_t². Since the last component does not depend on the choice of a kernel, we subtract it from the MISE, then define a cost function as a function of the bandwidth w:

C_n(w) = MISE − ∫_a^b λ_t² dt = ∫_a^b Eλ̂_t² dt − 2 ∫_a^b λ_t Eλ̂_t dt.   (5)

Rudemo and Bowman suggested the leave-one-out cross-validation to estimate the second term of Eq. (5) (Rudemo 1982; Bowman 1984). Here, we directly estimate the second term with the Poisson assumption (see also Shimazaki and Shinomoto 2007). With E x_t = λ_t, the second term in Eq. (5) is given as

E x_t E λ̂_t = E[x_t λ̂_t] − E[(x_t − E x_t)(λ̂_t − E λ̂_t)],   (6)

where x_t and λ̂_t are correlated random variables. Using Eq. (2), the covariance (the second term of Eq. (6)) is obtained as

E[(x_t − E x_t)(λ̂_t − E λ̂_t)] = ∫ k_w(t − s) E[(x_t − E x_t)(x_s − E x_s)] ds
  = ∫ k_w(t − s) (1/n) δ(t − s) E x_s ds
  = (1/n) k_w(0) E x_t.   (7)

Here, to obtain the second equality, we used the assumption of the Poisson point process (independent spikes).

Using Eqs. (6) and (7), Eq. (5) becomes

C_n(w) = ∫_a^b Eλ̂_t² dt − 2 ∫_a^b ( E[x_t λ̂_t] − (1/n) k_w(0) E x_t ) dt.   (8)

Equation (8) is composed of observable variables only. Hence, from sample sequences, the cost function is estimated as

Ĉ_n(w) = ∫_a^b λ̂_t² dt − 2 ∫_a^b ( x_t λ̂_t − (1/n) k_w(0) x_t ) dt.   (9)

In terms of a kernel function, the cost function is written as

Ĉ_n(w) = (1/n²) Σ_{i,j} ψ_w(t_i, t_j) − (2/n²) { Σ_{i,j} k_w(t_i − t_j) − k_w(0) N }
  = (1/n²) Σ_{i,j} ψ_w(t_i, t_j) − (2/n²) Σ_{i≠j} k_w(t_i − t_j),   (10)

where

ψ_w(t_i, t_j) = ∫_a^b k_w(t − t_i) k_w(t − t_j) dt.   (11)

The minimizer of the cost function, Eq. (10), is an estimate of the optimal bandwidth, which is denoted by w*. The method for selecting a fixed kernel bandwidth is summarized in Algorithm 1. A particular algorithm developed for the Gauss density function is given in the Appendix.
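As a hedged sketch of this selection step, Eq. (10) can be evaluated with the unbounded-domain Gauss correlation derived in the Appendix; a plain grid search stands in here for whatever minimizer Algorithm 1 (not reproduced in this excerpt) prescribes.

```python
import numpy as np

def cost_fixed(spike_times, n_trials, w):
    # Estimated cost function, Eq. (10), for the Gauss kernel, using
    # the unbounded-domain correlation psi_w of the Appendix.
    t = np.asarray(spike_times)
    d2 = (t[:, None] - t[None, :]) ** 2          # squared spike-time differences
    psi = np.exp(-d2 / (4.0 * w**2)) / (2.0 * np.sqrt(np.pi) * w)
    k = np.exp(-d2 / (2.0 * w**2)) / (np.sqrt(2.0 * np.pi) * w)
    np.fill_diagonal(k, 0.0)                     # the second sum runs over i != j
    return (psi.sum() - 2.0 * k.sum()) / n_trials**2

def select_bandwidth(spike_times, n_trials, w_grid):
    # Algorithm-1-style selection: w* is the grid point minimizing Eq. (10).
    costs = [cost_fixed(spike_times, n_trials, w) for w in w_grid]
    return float(w_grid[int(np.argmin(costs))])
```

A typical call is select_bandwidth(spikes, n, np.linspace(0.01, 0.5, 100)), after which the rate is smoothed once with the chosen bandwidth.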

2.4 Selection of the variable bandwidth

The method described in Section 2.3 aims to select a single bandwidth that optimizes the goodness-of-fit of the rate estimate for an entire observation interval [a, b]. For a non-stationary case, in which the degree of rate fluctuation greatly varies in time, the rate estimation may be improved by using a kernel function whose bandwidth is adaptively selected in conformity with data. The estimate with a variable bandwidth w_t is given by

λ̂_t = ∫ x_{t−s} k_{w_t}(s) ds.   (12)

Here, we consider obtaining the variable bandwidth from a fixed bandwidth optimized in a local interval. In this approach, the interval length for the local optimization regulates the shape of the function w_t; therefore, it subsequently determines the goodness-of-fit of the estimated rate to the underlying rate. We provide a method for obtaining the variable bandwidth w_t that minimizes the MISE by optimizing the local interval length.

To select an interval length for local optimization, we introduce the local MISE criterion at time t as

localMISE = ∫ E(λ̂_u − λ_u)² ρ_W(u − t) du,   (13)

where λ̂_u = ∫ x_{u−s} k_w(s) ds is an estimated rate with a fixed bandwidth w. Here, a weight function ρ_W(u − t) localizes the integration of the squared error in a characteristic interval W centered at time t. An example of the weight function is once again the Gauss density function. See the Appendix for the specific algorithm for the Gauss weight function. As in Eq. (5), we introduce the local cost function at time t by subtracting the term irrelevant for the choice of w as

C_n^t(w, W) = localMISE − ∫ λ_u² ρ_W(u − t) du.   (14)

The fixed bandwidth optimized locally at time t is selected as the minimizer of the estimated cost function:

Ĉ_n^t(w, W) = (1/n²) Σ_{i,j} ψ_{w,W}^t(t_i, t_j) − (2/n²) Σ_{i≠j} k_w(t_i − t_j) ρ_W(t_i − t),   (15)

where

ψ_{w,W}^t(t_i, t_j) = ∫ k_w(u − t_i) k_w(u − t_j) ρ_W(u − t) du.   (16)

The derivation follows the same steps as in the previous section. Depending on the interval length W, the optimal bandwidth w* varies. We suggest selecting an interval length that scales with the optimal bandwidth as γ⁻¹ w*. The parameter γ regulates the interval length for local optimization: with small γ (≪ 1), the fixed bandwidth is optimized within a long interval; with large γ (∼ 1), the fixed bandwidth is optimized within a short interval. The interval length and fixed bandwidth, selected at time t, are denoted as W_t^γ and w̄_t^γ.

The locally optimized bandwidth w̄_t^γ is repeatedly obtained for different t (∈ [a, b]).
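A minimal sketch of the local criterion follows; this is not the authors' implementation (the Appendix derives closed forms), and ψ of Eq. (16) is instead integrated by brute-force quadrature on a user-supplied uniform grid.

```python
import numpy as np

def gauss(x, s):
    return np.exp(-x**2 / (2.0 * s**2)) / (np.sqrt(2.0 * np.pi) * s)

def local_cost(spike_times, n_trials, t, w, W, u_grid):
    # Local cost function, Eq. (15), at time t: the kernel correlation
    # psi^t_{w,W} of Eq. (16) is integrated numerically over u_grid,
    # with a Gauss weight rho_W centred at t.
    ts = np.asarray(spike_times)
    du = u_grid[1] - u_grid[0]                   # uniform grid assumed
    rho = gauss(u_grid - t, W)
    ku = gauss(u_grid[:, None] - ts[None, :], w) # k_w(u - t_i) for all u, i
    psi = (ku[:, :, None] * ku[:, None, :] * rho[:, None, None]).sum(axis=0) * du
    k = gauss(ts[:, None] - ts[None, :], w) * gauss(ts - t, W)[:, None]
    np.fill_diagonal(k, 0.0)                     # sum over i != j in Eq. (15)
    return (psi.sum() - 2.0 * k.sum()) / n_trials**2
```

Minimizing this over w at each t gives the locally optimized bandwidth; tying the weight width W to γ⁻¹w* and smoothing the result as in Eq. (17) then yields the variable bandwidth.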

Because the intervals overlap, we adopt the Nadaraya-Watson kernel regression (Nadaraya 1964; Watson 1964) of w̄_t^γ as a local bandwidth at time t:

w_t^γ = ∫ ρ_{W_s^γ}(t − s) w̄_s^γ ds / ∫ ρ_{W_s^γ}(t − s) ds.   (17)

The variable bandwidth w_t^γ obtained from the same data, but with different γ, exhibits different degrees of smoothness: with small γ (≪ 1), the variable bandwidth fluctuates slightly; with large γ (∼ 1), the variable bandwidth fluctuates significantly. The parameter γ is thus a smoothing parameter for the variable bandwidth. Similar to the fixed bandwidth, the goodness-of-fit of the variable bandwidth can be estimated from the data. The cost function for the variable bandwidth selected with γ is obtained as

Ĉ_n(γ) = ∫_a^b λ̂_t² dt − (2/n²) Σ_{i≠j} k_{w_{t_i}^γ}(t_i − t_j),   (18)

where λ̂_t = ∫ x_{t−s} k_{w_t^γ}(s) ds is an estimated rate with the variable bandwidth w_t^γ. The integral is calculated numerically. With the stiffness constant γ* that minimizes Eq. (18), local optimization is performed in an ideal interval length. The method for optimizing the variable kernel bandwidth is summarized in Algorithm 2.

3 Results

3.1 Comparison of the fixed and variable kernel methods

By using spikes sampled from an inhomogeneous Poisson point process, we examined the efficiency of the kernel methods in estimating the underlying rate. We also used a sequence obtained by superimposing ten non-Poissonian (gamma) sequences (Shimokawa and Shinomoto 2009), but there was practically no significant difference in the rate estimation from the Poissonian sequence.

Figure 1 displays the result of the fixed kernel method based on the Gauss density function. The kernel bandwidth selected by Algorithm 1 applies a reasonable filtering to the set of spike sequences. Figure 1(d) shows that the cost function, Eq. (10), estimated from the spike data is similar to the original MISE, Eq. (4), which was computed using the knowledge of the underlying rate.

Fig. 1  Fixed kernel density estimation. (a) The underlying spike rate of an inhomogeneous Poisson point process. (b) Spike sequences sampled from the underlying rate, and (c) kernel rate estimates made with three types of bandwidth: too small, optimized, and too large. The gray area indicates the underlying spike rate. (d) The cost function for bandwidth w. The solid line is the estimated cost function, Eq. (10), computed from the spike data; the dashed line is the exact cost function, Eq. (5), directly computed by using the known underlying rate.

This demonstrates that MISE optimization can, in practice, be carried out by our method, even without knowing the underlying rate.

Figure 2(a) demonstrates how the rate estimation is altered by replacing the fixed kernel method with the variable kernel method (Algorithm 2), for identical spike data (Fig. 1(b)). The Gauss weight function is used to obtain a smooth variable bandwidth. The manner in which the optimized bandwidth varies in the time axis is shown in Fig. 2(b): the bandwidth is short in a moment of sharp activation, and is long in the period of smooth rate modulation. Eventually, the sharp activation is grasped minutely and the slow modulation is expressed without spurious fluctuations. The stiffness constant γ for the bandwidth variation is selected by minimizing the cost function, as shown in Fig. 2(c).

Fig. 2  Variable kernel density estimation. (a) Kernel rate estimates. The solid and dashed lines are rate estimates made by the variable and fixed kernel methods for the spike data of Fig. 1(b). The gray area is the underlying rate. (b) Optimized bandwidths. The solid line is the variable bandwidth determined with the optimized stiffness constant γ* = 0.8, selected by Algorithm 2; the dashed line is the fixed bandwidth selected by Algorithm 1. (c) The cost function for the bandwidth stiffness constant. The solid line is the cost function for the stiffness constant γ, Eq. (18), estimated from the spike data; the dashed line is the cost function computed from the known underlying rate.

3.2 Comparison with established density estimation methods

We wish to examine the fitting performance of the fixed and variable kernel methods in comparison with established density estimation methods, by paying attention to their aptitudes for either continuous or discontinuous rate processes. Figure 3(a) shows the results for sinusoidal and sawtooth rate processes, as samples of continuous and discontinuous processes, respectively. We also examined triangular and rectangular rate processes as different samples of continuous and discontinuous processes, but the results were similar. The goodness-of-fit of the density estimate to the underlying rate is evaluated in terms of the integrated squared error (ISE) between them.

The established density estimation methods examined for comparison are the histogram (Shimazaki and Shinomoto 2007), Abramson's adaptive kernel (Abramson 1982), Locfit (Loader 1999b), and Bayesian adaptive regression splines (BARS) (DiMatteo et al. 2001; Kass et al. 2003) methods, whose details are summarized below.

A histogram method, which is often called a peristimulus time histogram (PSTH) in the neurophysiological literature, is the most basic method for estimating the spike rate. To optimize the histogram, we used a method proposed for selecting the bin width based on the MISE principle (Shimazaki and Shinomoto 2007).

Abramson's adaptive kernel method (Abramson 1982) uses the sample point kernel estimate λ̂_t = Σ_i k_{w_{t_i}}(t − t_i), in which the bandwidths are adapted at the sample points.
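Abramson's sample-point scaling, w_{t_i} = w (g/λ̂_{t_i})^{1/2} with g the geometric mean of the pilot-rate values at the spike times, can be sketched as follows; this is a sketch, not Abramson's code, and the pilot-rate values are assumed to come from a fixed-kernel estimate.

```python
import numpy as np

def abramson_bandwidths(pilot_rate_at_spikes, pilot_w):
    # Sample-point bandwidths w_ti = w * sqrt(g / rate_i), where g is
    # the geometric mean of the pilot-rate values at the spike times:
    # spikes in high-rate regions get narrower kernels, and vice versa.
    lam = np.asarray(pilot_rate_at_spikes, dtype=float)
    g = np.exp(np.mean(np.log(lam)))             # geometric mean
    return pilot_w * np.sqrt(g / lam)
```

The adapted bandwidths then enter the sample-point estimate λ̂_t = Σ_i k_{w_{t_i}}(t − t_i).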

Scaling the bandwidths as w_{t_i} = w (g/λ̂_{t_i})^{1/2} was suggested, where w is a pilot bandwidth, g = (Π_i λ̂_{t_i})^{1/N} is the geometric mean of the pilot estimates, and λ̂_t is a fixed kernel estimate with w. Abramson's method is a two-stage method, in which the pilot bandwidth needs to be selected beforehand. Here, the pilot bandwidth is selected using the fixed kernel optimization method developed in this study.

The Locfit algorithm developed by Loader (1999b) fits a polynomial to a log-density function under the principle of maximizing a locally defined likelihood. We examined the automatic choice of the adaptive bandwidth of the local likelihood, and found that the default fixed method yielded a significantly better fit. We used a nearest-neighbor-based bandwidth method, with a parameter covering 20% of the data.

The BARS (DiMatteo et al. 2001; Kass et al. 2003) is a spline-based adaptive regression method on an exponential family response model, including a Poisson count distribution. The rate estimated with the BARS is the expected spline computed from the posterior distribution on the knot number and locations with a Markov chain Monte Carlo method. The BARS is, thus, capable of smoothing a noisy histogram without missing abrupt changes. To create an initial histogram, we used a 4 [ms] bin width, which is small enough to examine rapid changes in the firing rate.

Figure 3(a) displays the density profiles of the six different methods estimated from an identical set of spike trains (n = 10) that are numerically sampled from a sinusoidal or sawtooth underlying rate (2 [s]). Figure 3(b) summarizes the goodness-of-fit of the six methods to the sinusoidal and sawtooth rates (10 [s]) by averaging over 20 realizations of samples.

For the sinusoidal rate function, representing continuously varying rate processes, the BARS is most efficient in terms of ISE performance. For the sawtooth rate function, representing discontinuous non-stationary rate processes, the variable kernel estimation developed here is the most efficient in grasping abrupt rate changes. The histogram method is always inferior to the other five methods in terms of ISE performance, due to the jagged nature of the piecewise constant function.

Fig. 3  Fitting performances of the six rate estimation methods: histogram, fixed kernel, variable kernel, Abramson's adaptive kernel, Locfit, and Bayesian adaptive regression splines (BARS). (a) Two rate profiles (2 [s]) used in generating spikes (gray area), and the rates estimated using the six different methods. The raster plot in each panel is sample spike data (n = 10, superimposed). (b) Comparison of the six rate estimation methods in their goodness-of-fit, based on the integrated squared error (ISE) between the underlying and estimated rates. The abscissa and the ordinate are the ISEs of each method applied to sinusoidal and sawtooth underlying rates (10 [s]). The mean and standard deviation of performance evaluated using 20 data sets are plotted for each method.

3.3 Application to experimental data

We examine, here, the fixed and variable kernel methods in their applicability to real biological data. In particular, the kernel methods are applied to the spike data of an MT neuron responding to a random dot stimulus (Britten et al. 2004). The rates estimated from n = 1, 10, and 30 experimental trials are shown in Fig. 4.

Fine details of rate modulation are revealed as we increase the sample size (Bair and Koch 1996). The fixed kernel method tends to choose narrower bandwidths, while the variable kernel method tends to choose wider bandwidths in the periods in which spikes are not abundant.

The performance of the rate estimation methods is cross-validated. The bandwidth, denoted as w_t for both fixed and variable, is obtained with a training data set of n trials. The error is evaluated by computing the cost function, Eq. (18), in a cross-validatory manner:

Ĉ_n(w_t) = ∫_a^b λ̂_t² dt − (2/n²) Σ_{i≠j} k_{w_{t_i}}(t_i − t_j),   (19)

where the test spike times {t_i} are obtained from the n spike sequences in the leftovers, and λ̂_t = (1/n) Σ_i k_{w_t}(t − t_i).

Fig. 4  Application of the fixed and variable kernel methods to spike data of an MT neuron (j024 with coherence 51.2% in nsa2004.1 (Britten et al. 2004)). (a–c) Analyses of n = 1, 10, and 30 spike trains: (top) spike rates [spikes/s] estimated with the fixed and variable kernel methods, represented by the gray area and solid line, respectively; (middle) optimized fixed and variable bandwidths [s], represented by dashed and solid lines, respectively; (bottom) a raster plot of the spike data used in the estimation. (d) Comparison of the two kernel methods. Bars represent the difference between the cross-validated cost functions, Eq. (19), of the fixed and variable kernel methods (fixed less variable). A positive difference indicates superior fitting performance of the variable kernel method. The cross-validated cost function is obtained as follows. The whole set of spike sequences (n_total = 60) is divided into n_total/n blocks, each composed of n (= 1, 5, 20, 30) spike sequences. A bandwidth is selected using the spike data of a training block. The cross-validated cost functions, Eq. (19), for the selected bandwidth are computed using the n_total/n − 1 leftover test blocks, and their average is computed. The cost function is repeatedly obtained n_total/n times by changing the training block. The mean and standard deviation, computed from n_total/n samples, are displayed.

Figure 4(d) shows the performance improvement by the variable bandwidth over the fixed bandwidth, as evaluated by Eq. (19). The fixed and variable kernel methods perform better for smaller and larger sizes of data, respectively. In addition, we compared the fixed kernel method and the BARS by cross-validating the log-likelihood of a Poisson process with the rate estimated using the two methods. The difference in the log-likelihoods was not statistically significant for small samples (n = 1, 5 and 10), while the fixed kernel method fitted better to the spike data with larger samples (n = 20 and 30).

4 Discussion

In this study, we developed methods for selecting the kernel bandwidth in spike rate estimation based on the MISE minimization principle. In addition to the principle of optimizing a fixed bandwidth, we further considered selecting the bandwidth locally in time, assuming a non-stationary rate modulation.

We tested the efficiency of our methods using spike sequences numerically sampled from a given rate (Figs. 1 and 2). Various density estimators constructed on different optimization principles were compared in their goodness-of-fit to the underlying rate (Fig. 3). There is in fact no oracle that selects one among various optimization principles, such as MISE minimization or likelihood maximization. Practically, reasonable principles render similar detectability for rate modulation; the kernel methods based on MISE were roughly comparable to the Locfit based on likelihood maximization in their performances. The difference of the performances is not due to the choice of principles, but rather due to techniques; kernel and histogram methods lead to completely different results under the same MISE minimization principle (Fig. 3(b)). Among the smooth rate estimators, the BARS was good at representing a continuously varying rate, while the variable kernel method was good at grasping abrupt changes in the rate process (Fig. 3(b)).

We also examined the performance of our methods in application to neuronal spike sequences by cross-validating with the data (Fig. 4). The result demonstrated that the fixed kernel method performed well in small samples. We refer to Cunningham et al. (2008) for a result on the superior fitting performance of a fixed kernel for small samples in comparison with the Locfit and BARS, as well as the Gaussian process smoother (Cunningham et al. 2008; Smith and Brown 2003; Koyama and Shinomoto 2005). The adaptive methods, however, have the potential to outperform the fixed method with larger samples derived from a non-stationary rate profile (see also Endres et al. 2008 for comparisons of their adaptive histogram with the fixed histogram and kernel method). The result in Fig. 4 confirmed the utility of our variable kernel method for larger samples of neuronal spikes.

We derived the optimization methods under the Poisson assumption, so that spikes are randomly drawn from a given rate. If one wishes to estimate the spike rate of a single or a few sequences that contain strongly correlated spikes, it is desirable to utilize the information as to the non-Poisson nature of a spike train (Cunningham et al. 2008). Note that a non-Poisson spike train may be dually interpreted, as being derived either irregularly from a constant rate, or regularly from a fluctuating rate (Koyama and Shinomoto 2005; Shinomoto and Koyama 2007). However, a sequence obtained by superimposing many spike trains is approximated as a Poisson process (Cox 1962; Snyder 1975; Daley and Vere-Jones 1988; Kass et al. 2005), for which the dual interpretation does not occur. Thus the kernel methods developed in this paper are valid for the superimposed sequence, and serve as the peristimulus density estimator for spike trains aligned at the onset or offset of the stimulus.

The kernel smoother is a classical method for estimating the firing rate, as popular as the histogram method. We have shown in this paper that the classical kernel methods perform well in the goodness-of-fit to the underlying rate. They are not only superior to the histogram method, but also comparable to modern sophisticated methods, such as the Locfit and BARS. In particular, the variable kernel method outperformed competing methods in representing abrupt changes in the spike rate, which we often encounter in neuroscience. Given their simplicity and familiarity, kernel smoothers can still be the most useful tools in analyzing spike data, provided that the bandwidth is chosen appropriately as instructed in this paper.

Acknowledgements  We thank M. Nawrot, S. Koyama, and D. Endres for valuable discussions, and the Diesmann Unit for providing the computing environment. We also acknowledge K. H. Britten, M. N. Shadlen, W. T. Newsome, and J. A. Movshon, who made their data available to the public, and W. Bair for hosting the Neural Signal Archive.

This study is supported in part by a Research Fellowship of the Japan Society for the Promotion of Science for Young Scientists to HS and Grants-in-Aid for Scientific Research to SS from the MEXT Japan (20300083, 20020012).

Open Access  This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Appendix: Cost functions of the Gauss kernel function

In this appendix, we derive definite MISE optimization algorithms, as developed in the body of the paper, for the particular Gauss density function, Eq. (3).

A.1 A cost function for a fixed bandwidth

The estimated cost function is obtained, as in Eq. (10):

n² Ĉ_n(w) = Σ_{i,j} ψ_w(t_i, t_j) − 2 Σ_{i≠j} k_w(t_i − t_j),

where, from Eq. (11),

ψ_w(t_i, t_j) = ∫_a^b k_w(t − t_i) k_w(t − t_j) dt.

A symmetric kernel function, including the Gauss function, is invariant to an exchange of t_i and t_j when computing k_w(t_i − t_j). In addition, the correlation of the kernel function, Eq. (11), is symmetric with respect to t_i and t_j. Hence, we obtain the following relationships:

Σ_{i≠j} k_w(t_i − t_j) = 2 Σ_{i<j} k_w(t_i − t_j),   (20)

Σ_{i,j} ψ_w(t_i, t_j) = Σ_i ψ_w(t_i, t_i) + 2 Σ_{i<j} ψ_w(t_i, t_j).   (21)

By plugging Eqs. (20) and (21) into Eq. (10), the cost function is simplified as

n² Ĉ_n(w) = Σ_i ψ_w(t_i, t_i) + 2 Σ_{i<j} [ ψ_w(t_i, t_j) − 2 k_w(t_i − t_j) ].   (22)

For the Gauss kernel function, Eq. (3), with bandwidth w, Eq. (11) becomes

ψ_w(t_i, t_j) = (1/(4√π w)) e^{−(t_i−t_j)²/(4w²)} [ erf((2b − t_i − t_j)/(2w)) − erf((2a − t_i − t_j)/(2w)) ],   (23)

where erf(z) = (2/√π) ∫_0^z e^{−t²} dt. A simplified equation is obtained by evaluating the MISE in an unbounded domain: a → −∞ and b → +∞. Using erf(±∞) = ±1, we obtain

ψ_w(t_i, t_j) ≈ (1/(2√π w)) e^{−(t_i−t_j)²/(4w²)}.   (24)

Substituting Eq. (24) into Eq. (22), we obtain a simple formula for selecting the bandwidth of the Gauss kernel function,

2√π n² Ĉ_n(w) = N/w + 2 Σ_{i<j} [ (1/w) e^{−(t_i−t_j)²/(4w²)} − (2√2/w) e^{−(t_i−t_j)²/(2w²)} ].   (25)

A.2 A local cost function for a variable bandwidth

The local cost function is obtained, as in Eq. (15):

n² Ĉ_n^t(w, W) = Σ_{i,j} ψ_{w,W}^t(t_i, t_j) − 2 Σ_{i≠j} k_w(t_i − t_j) ρ_W(t_i − t),

where, from Eq. (16),

ψ_{w,W}^t(t_i, t_j) = ∫_a^b k_w(u − t_i) k_w(u − t_j) ρ_W(u − t) du.

For the summations in the local cost function, Eq. (15), we have the equalities:

Σ_{i≠j} k_w(t_i − t_j) ρ_W(t_i − t) = Σ_{i<j} k_w(t_i − t_j) [ ρ_W(t_i − t) + ρ_W(t_j − t) ],   (26)

Σ_{i,j} ψ_{w,W}^t(t_i, t_j) = Σ_i ψ_{w,W}^t(t_i, t_i) + 2 Σ_{i<j} ψ_{w,W}^t(t_i, t_j).   (27)

The first equation holds for a symmetric kernel. The second equation is derived because Eq. (16) is invariant to an exchange of t_i and t_j. Using Eqs. (26) and (27), Eq. (15) can be computed as

n² Ĉ_n^t(w, W) = Σ_i ψ_{w,W}^t(t_i, t_i) + 2 Σ_{i<j} [ ψ_{w,W}^t(t_i, t_j) − k_w(t_i − t_j) ( ρ_W(t_i − t) + ρ_W(t_j − t) ) ].

t

ti , t j − kw ti − t j ρW + ρWj .

× erf − erf ,

2w 2w i< j

(23) (28)
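As a concrete illustration, selecting the fixed bandwidth via Eq. (25) amounts to a one-dimensional minimization, which can be done by grid search. The following sketch (in Python; the function names and the search grid are our own choices, not from the paper) evaluates the right-hand side of Eq. (25); since $2\sqrt{\pi}\, n^2$ is a positive constant, minimizing this quantity over $w$ is equivalent to minimizing $\hat{C}_n(w)$.

```python
import numpy as np

def fixed_kernel_cost(spikes, w):
    """Right-hand side of Eq. (25): 2*sqrt(pi)*n^2*C_n(w) for the Gauss kernel."""
    t = np.asarray(spikes, dtype=float)
    N = t.size
    i, j = np.triu_indices(N, k=1)          # all spike pairs with i < j
    d2 = (t[i] - t[j]) ** 2
    pair_terms = (np.exp(-d2 / (4 * w**2))
                  - 2 * np.sqrt(2) * np.exp(-d2 / (2 * w**2))) / w
    return N / w + 2 * pair_terms.sum()

def select_bandwidth(spikes, widths):
    """Grid search: return the candidate bandwidth minimizing the cost."""
    costs = [fixed_kernel_cost(spikes, w) for w in widths]
    return widths[int(np.argmin(costs))]
```

For long spike trains the O(N^2) pairwise sum can be restricted to pairs separated by no more than a few bandwidths, since both exponentials vanish rapidly with the spike-time difference.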

For the Gauss kernel function and the Gauss weight function with bandwidths $w$ and $W$, respectively, Eq. (16) is calculated in the unbounded domain as

$$\psi_{w,W}^t(t_i, t_j) = \frac{1}{2\pi w^2} \frac{1}{\sqrt{2\pi}\, W} \int_{-\infty}^{\infty} \exp\!\left[ -\frac{(u - t_i)^2 + (u - t_j)^2}{2w^2} - \frac{(u - t)^2}{2W^2} \right] du. \qquad (29)$$

By completing the square with respect to $u$, the exponent in the above equation is written as

$$-\frac{w^2 + 2W^2}{2 w^2 W^2} \left( u - \frac{(t_i + t_j)\, W^2 + t\, w^2}{w^2 + 2W^2} \right)^2 - \frac{\left\{ (t - t_i)^2 + (t - t_j)^2 \right\} w^2 + (t_i - t_j)^2\, W^2}{2 w^2 (w^2 + 2W^2)}. \qquad (30)$$

Using the formula $\int_{-\infty}^{\infty} e^{-A u^2}\, du = \sqrt{\pi / A}$, Eq. (16) is obtained as

$$\psi_{w,W}^t(t_i, t_j) = \frac{1}{2\pi w \sqrt{w^2 + 2W^2}} \exp\!\left[ -\frac{\left\{ (t - t_i)^2 + (t - t_j)^2 \right\} w^2 + (t_i - t_j)^2\, W^2}{2 w^2 (w^2 + 2W^2)} \right]. \qquad (31)$$

References

Abeles, M. (1982). Quantification, smoothing, and confidence limits for single-units histograms. Journal of Neuroscience Methods, 5(4), 317–325.
Abramson, I. (1982). On bandwidth variation in kernel estimates: A square root law. The Annals of Statistics, 10(4), 1217–1223.
Adrian, E. (1928). The basis of sensation: The action of the sense organs. New York: W.W. Norton.
Bair, W., & Koch, C. (1996). Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey. Neural Computation, 8(6), 1185–1202.
Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71(2), 353.
Breiman, L., Meisel, W., & Purcell, E. (1977). Variable kernel estimates of multivariate densities. Technometrics, 19, 135–144.
Brewer, M. J. (2004). A Bayesian model for local smoothing in kernel density estimation. Statistics and Computing, 10, 299–309.
Britten, K. H., Shadlen, M. N., Newsome, W. T., & Movshon, J. A. (2004). Responses of single neurons in macaque MT/V5 as a function of motion coherence in stochastic dot stimuli. The Neural Signal Archive, nsa2004.1. http://www.neuralsignal.org.
Cox, R. D. (1962). Renewal theory. London: Wiley.
Cunningham, J., Yu, B., Shenoy, K., Sahani, M., Platt, J., Koller, D., et al. (2008). Inferring neural firing rates from spike trains using Gaussian processes. Advances in Neural Information Processing Systems, 20, 329–336.
Daley, D., & Vere-Jones, D. (1988). An introduction to the theory of point processes. New York: Springer.
Devroye, L., & Lugosi, G. (2000). Variable kernel estimates: On the impossibility of tuning the parameters. In E. Giné, D. Mason, & J. A. Wellner (Eds.), High dimensional probability II (pp. 405–442). Boston: Birkhäuser.
DiMatteo, I., Genovese, C. R., & Kass, R. E. (2001). Bayesian curve-fitting with free-knot splines. Biometrika, 88(4), 1055–1071.
Endres, D., Oram, M., Schindelin, J., & Foldiak, P. (2008). Bayesian binning beats approximate alternatives: Estimating peristimulus time histograms. Advances in Neural Information Processing Systems, 20, 393–400.
Fan, J., Hall, P., Martin, M. A., & Patil, P. (1996). On local smoothing of nonparametric curve estimators. Journal of the American Statistical Association, 91, 258–266.
Gerstein, G. L., & Kiang, N. Y. S. (1960). An approach to the quantitative analysis of electrophysiological data from single neurons. Biophysical Journal, 1(1), 15–28.
Hall, P., & Schucany, W. R. (1989). A local cross-validation algorithm. Statistics & Probability Letters, 8(2), 109–117.
Jones, M., Marron, J., & Sheather, S. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91(433), 401–407.
Kass, R. E., Ventura, V., & Brown, E. N. (2005). Statistical issues in the analysis of neuronal data. Journal of Neurophysiology, 94(1), 8–25.
Kass, R. E., Ventura, V., & Cai, C. (2003). Statistical smoothing of neuronal data. Network: Computation in Neural Systems, 14(1), 5–15.
Koyama, S., & Shinomoto, S. (2005). Empirical Bayes interpretations of random point events. Journal of Physics A: Mathematical and General, 38, 531–537.
Loader, C. (1999a). Bandwidth selection: Classical or plug-in? The Annals of Statistics, 27(2), 415–438.
Loader, C. (1999b). Local regression and likelihood. New York: Springer.
Loftsgaarden, D. O., & Quesenberry, C. P. (1965). A nonparametric estimate of a multivariate density function. The Annals of Mathematical Statistics, 36, 1049–1051.
Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and its Applications, 9(1), 141–142.
Nawrot, M., Aertsen, A., & Rotter, S. (1999). Single-trial estimation of neuronal firing rates: From single-neuron spike trains to population activity. Journal of Neuroscience Methods, 94(1), 81–92.
Parzen, E. (1962). Estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3), 1065.
Richmond, B. J., Optican, L. M., & Spitzer, H. (1990). Temporal encoding of two-dimensional patterns by single units in primate primary visual cortex. I. Stimulus-response relations. Journal of Neurophysiology, 64(2), 351–369.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 27(3), 832–837.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9(2), 65–78.

Sain, S. R. (2002). Multivariate locally adaptive density estimation. Computational Statistics & Data Analysis, 39, 165–186.
Sain, S., & Scott, D. (1996). On locally adaptive density estimation. Journal of the American Statistical Association, 91(436), 1525–1534.
Sain, S., & Scott, D. (2002). Zero-bias locally adaptive density estimators. Scandinavian Journal of Statistics, 29(3), 441–460.
Sanderson, A. (1980). Adaptive filtering of neuronal spike train data. IEEE Transactions on Biomedical Engineering, 27, 271–274.
Scott, D. W. (1992). Multivariate density estimation: Theory, practice, and visualization. New York: Wiley-Interscience.
Scott, D. W., & Terrell, G. R. (1987). Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association, 82, 1131–1146.
Shimazaki, H., & Shinomoto, S. (2007). A method for selecting the bin size of a time histogram. Neural Computation, 19(6), 1503–1527.
Shimokawa, T., & Shinomoto, S. (2009). Estimating instantaneous irregularity of neuronal firing. Neural Computation, 21(7), 1931–1951.
Shinomoto, S., & Koyama, S. (2007). A solution to the controversy between rate and temporal coding. Statistics in Medicine, 26, 4032–4038.
Shinomoto, S., Kim, H., Shimokawa, T., Matsuno, N., Funahashi, S., Shima, K., et al. (2009). Relating neuronal firing patterns to functional differentiation of cerebral cortex. PLoS Computational Biology, 5, e1000433.
Shinomoto, S., Miyazaki, Y., Tamura, H., & Fujita, I. (2005). Regional and laminar differences in in vivo firing patterns of primate cortical neurons. Journal of Neurophysiology, 94(1), 567–575.
Shinomoto, S., Shima, K., & Tanji, J. (2003). Differences in spiking patterns among cortical neurons. Neural Computation, 15(12), 2823–2842.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman & Hall.
Smith, A. C., & Brown, E. N. (2003). Estimating a state-space model from point process observations. Neural Computation, 15(5), 965–991.
Snyder, D. (1975). Random point processes. New York: Wiley.
Watson, G. S. (1964). Smooth regression analysis. Sankhya: The Indian Journal of Statistics, Series A, 26(4), 359–372.
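As a closing numerical note, the closed form of Eq. (31) can be checked against direct numerical integration of Eq. (29), which also serves as a sanity check on the completed square of Eq. (30). The sketch below (in Python; the function names and grid parameters are our own choices, not from the paper) compares the two.

```python
import numpy as np

def psi_local_closed(ti, tj, t, w, W):
    """Closed form of Eq. (31) for the Gauss kernel (bandwidth w)
    and Gauss weight function (bandwidth W)."""
    num = ((t - ti) ** 2 + (t - tj) ** 2) * w**2 + (ti - tj) ** 2 * W**2
    den = 2 * w**2 * (w**2 + 2 * W**2)
    return np.exp(-num / den) / (2 * np.pi * w * np.sqrt(w**2 + 2 * W**2))

def psi_local_numeric(ti, tj, t, w, W, half_width=10.0, n=200001):
    """Riemann-sum evaluation of the integral in Eq. (29) on a wide grid,
    wide enough that the Gaussian tails are negligible."""
    u = np.linspace(t - half_width, t + half_width, n)
    du = u[1] - u[0]
    def gauss(s, b):
        return np.exp(-s**2 / (2 * b**2)) / (np.sqrt(2 * np.pi) * b)
    f = gauss(u - ti, w) * gauss(u - tj, w) * gauss(u - t, W)
    return float(np.sum(f) * du)
```

Because the integrand is a smooth, rapidly decaying product of Gaussians, the simple Riemann sum already agrees with the analytic expression to many digits.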
