ABSTRACT
We establish the consistency of nonparametric conditional quantile estimators based on artificial
neural networks. The results follow from general results on sieve estimation for dependent
processes. We also show that conditional quantiles can be learned to any pre-specified accuracy
using approximate rather than exact network optimization.
1. INTRODUCTION
In forecasting, it is by far most common to give point forecasts, the most common point
forecast being some estimate of the conditional expectation of the variable of interest given some
more or less explicit set of conditioning variables. By their very nature, point forecasts generally
provide no information about the variation of the variable of interest around its conditional
expectation. Sometimes point forecasts are augmented by "margins of error", but these margins
may be based on unwarranted assumptions (e.g. normality) about the conditional distribution of
the variable of interest.
An attractive alternative to point forecasts, whether or not accompanied by error margins, is
a centered interval forecast (e.g. Granger, White and Kamstra, 1989). The median (50%
percentile) provides a convenient central value, while the interval limits specify upper and lower
bounds such that the variable of interest lies below the lower bound or above the upper bound
with (the same) given probability, conditional on available information. For example, a 50%
centered interval forecast is such that 25% of the time the forecasted variable will lie below the
lower bound and 25% of the time the forecasted variable will lie above the upper bound.
Construction of interval forecasts is based on the conditional quantiles of the variable of
interest given a specified set of conditioning variables. Often, there is little information that
would permit a confident specification of a parametric model for the conditional quantile. Linear
models are those most frequently considered (e.g. Koenker and Bassett, 1978), but this is
probably due more to the tractability of linear models than to the certain knowledge that the
phenomena of interest are linear. In the absence of a firm model for the conditional quantile
function, it is desirable to use some flexible model specification capable of adapting to whatever
features the data may present. In this paper, we consider the use of artificial neural network
models to provide the desired flexibility in estimating conditional quantiles.
Artificial neural networks can be viewed as flexible nonlinear functional forms suitable for
approximating arbitrary mappings. As numerous authors have recently shown (e.g. Carroll and
Dickinson, 1989; Cybenko, 1989; Funahashi, 1989; Hecht-Nielsen, 1989; Hornik, 1989; Hornik,
Stinchcombe and White, 1989, 1990 (HSWa,b); Stinchcombe and White, 1989, 1990), suitable
classes of network output functions are dense in a broad range of function spaces under general
conditions. White (1990) exploited this fact to show that a class of artificial neural networks, the
"single hidden layer feedforward networks," can be used to obtain a consistent nonparametric
estimator of a square integrable conditional expectation function. White's (1990) results are
proven by applying the method of sieves (Grenander, 1981; Geman and Hwang, 1982; White and
Wooldridge, 1990). Here we give analogous results, using the method of sieves to establish the
consistency of nonparametric conditional quantile estimators based on single hidden layer
feedforward networks.
We also consider approximate conditional quantile estimators obtained by approximately
solving the optimization problem that delivers the exact quantile estimators. Our results are
formulated both for independent identically distributed (i.i.d.) random variables and for
stationary mixing or ergodic stochastic processes relevant to interval forecasting of time series.
2. MAIN RESULTS
2.a Exact Optimization
Our first assumption describes the properties of the stochastic process generating our
observations.
ASSUMPTION A.1: Observations are generated as the realization of a bounded vector-valued
stochastic process $\{Z_t\}$ defined on a complete probability space $(\Omega, \mathcal{F}, P)$, where $P$ is such that
either
(i) $\{Z_t\}$ is an independent identically distributed (i.i.d.) process; or
(ii) $\{Z_t\}$ is a stationary $\phi$- or $\alpha$-mixing process such that $\phi(k) = \phi_0 \zeta^k$ or $\alpha(k) = \alpha_0 \zeta^k$,
$\zeta < 1$, $\phi_0, \alpha_0 > 0$, $k > 0$. $\Box$
We partition $Z_t$ as $Z_t = (Y_t, X_t')'$, where $Y_t$ is a random scalar and $X_t$ is a random $r \times 1$ vector.
Without loss of generality, we may assume $Z_t : \Omega \to D^{r+1} = \times_{i=1}^{r+1} [0,1]$. In part (ii), we define
the $\phi$- and $\alpha$-mixing coefficients in the usual way as
$$ \phi(m) = \sup_{\{A \in \mathcal{F}_{-\infty}^{t},\, B \in \mathcal{F}_{t+m}^{\infty}:\, P(A) > 0\}} |P(B \mid A) - P(B)| , $$
$$ \alpha(m) = \sup_{\{A \in \mathcal{F}_{-\infty}^{t},\, B \in \mathcal{F}_{t+m}^{\infty}\}} |P(A \cap B) - P(A)P(B)| . $$
The object of our interest is the conditional $p$-quantile of $Y_t$ given $X_t$, defined by a function
$\theta_p : D^r \to D$ such that
$$ P[Y_t \le \theta_p(X_t) \mid X_t] = p , \qquad p \in (0, 1) . $$
A $(1-k) \times 100\%$ centered interval forecast is obtained using $\theta_p$ by taking the lower bound for $Y_t$
given $X_t$ as $\theta_{k/2}(X_t)$ and the upper bound for $Y_t$ given $X_t$ as $\theta_{1-k/2}(X_t)$.
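As a quick check of this coverage claim (our addition, not in the original), assuming the conditional distribution of $Y_t$ is continuous at the two quantiles,
$$ P[\theta_{k/2}(X_t) \le Y_t \le \theta_{1-k/2}(X_t) \mid X_t] = (1 - k/2) - k/2 = 1 - k , $$
with probability $k/2$ in each tail, matching the 50% example of the introduction when $k = 1/2$.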
We consider approximations to $\theta_p$ obtained as the output of a single hidden layer
feedforward network,
$$ f^q(x, \delta^q) = \beta_0 + \sum_{j=1}^{q} \beta_j \psi(\tilde{x}'\gamma_j) , \qquad (2.1) $$
where $\tilde{x} = (1, x')'$ is an $(r+1) \times 1$ vector consisting of a "bias unit" and "network inputs" $x$, the
function $\psi : \mathbb{R} \to \mathbb{R}$ is a nonlinear "hidden layer activation function" (often chosen to be a
cumulative distribution function (c.d.f.)), $q \in \mathbb{N}$ is the number of "hidden units" of the network,
and $\delta^q = (\beta', \gamma')'$ (where $\beta = (\beta_0, \beta_1, \ldots, \beta_q)'$, $\gamma = (\gamma_1', \ldots, \gamma_q')'$, $\gamma_j = (\gamma_{j0}, \gamma_{j1}, \ldots, \gamma_{jr})'$)
is an $s \times 1$ parameter vector, $s = q(r+2) + 1$. In network parlance, $\delta^q$ is the vector of "network
connection strengths." We explicitly index $\delta^q$ by $q$ in order to emphasize the dependence of the
dimension of this vector on $q$.
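To make (2.1) concrete, the following is a minimal numerical sketch of the network output function. It is our illustration rather than part of the paper: the logistic c.d.f. chosen for $\psi$ and the NumPy layout of $\beta$ and $\gamma$ are assumptions.

```python
import numpy as np

def psi(a):
    """Hidden layer activation: the logistic c.d.f., one admissible bounded choice of psi."""
    return 1.0 / (1.0 + np.exp(-a))

def network_output(x, beta, gamma):
    """Single hidden layer output f^q(x, delta^q) = beta_0 + sum_j beta_j * psi(x_tilde' gamma_j).

    x     : (r,) array of network inputs
    beta  : (q+1,) array (beta_0, beta_1, ..., beta_q)
    gamma : (q, r+1) array; row j is (gamma_j0, ..., gamma_jr), with gamma_j0 multiplying the bias unit
    """
    x_tilde = np.concatenate(([1.0], x))   # x_tilde = (1, x')'
    hidden = psi(gamma @ x_tilde)          # q hidden unit activations
    return beta[0] + beta[1:] @ hidden

# Example with q = 3 hidden units and r = 2 inputs.
rng = np.random.default_rng(0)
print(network_output(np.array([0.2, 0.7]), rng.normal(size=4), rng.normal(size=(3, 3))))
```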
To obtain our results, we use the "connectionist sieve" introduced by White (1990), defined
as follows:
DEFINITION 2.1: Let $\psi : \mathbb{R} \to \mathbb{R}$ be a given bounded function, and let $\Theta$ be a subset of the
set of functions from $D^r$ to $\mathbb{R}$. For any $q \in \mathbb{N}$ and $\Delta \in \mathbb{R}_+$ define a collection of network
output functions $T(\psi, q, \Delta)$ as
$$ T(\psi, q, \Delta) = \Bigl\{ \theta \in \Theta : \theta(x) = \beta_0 + \sum_{j=1}^{q} \beta_j \psi(\tilde{x}'\gamma_j) \text{ for all } x \text{ in } D^r,\ \sum_{j=0}^{q} |\beta_j| \le \Delta,\ \sum_{j=1}^{q} \sum_{i=0}^{r} |\gamma_{ji}| \le q\Delta \Bigr\} . $$
For given $\psi$ and sequences $\{q_n\}$, $\{\Delta_n\}$, define the sequence of (single hidden layer) connectionist
sieves $\{\Theta_n(\psi)\}$ as
$$ \Theta_n(\psi) = T(\psi, q_n, \Delta_n) , \qquad n = 1, 2, \ldots . \quad \Box $$
When $\theta \in \Theta_n(\psi)$, we have $\theta(x) = f^{q_n}(x, \delta^{q_n})$ in the notation of (2.1). By letting $q_n \to \infty$ and
$\Delta_n \to \infty$, we obtain a sequence of increasingly flexible network models.
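The two $\ell_1$ bounds of Definition 2.1 translate directly into a membership check. The sketch below is ours (the function name and array layout are assumptions); it reuses the parameter layout of the previous snippet.

```python
import numpy as np

def in_sieve(beta, gamma, q_n, delta_n):
    """True if (beta, gamma) parameterizes an element of T(psi, q_n, Delta_n):
    at most q_n hidden units, sum_j |beta_j| <= Delta_n, sum_{j,i} |gamma_ji| <= q_n * Delta_n."""
    return (gamma.shape[0] <= q_n
            and np.sum(np.abs(beta)) <= delta_n
            and np.sum(np.abs(gamma)) <= q_n * delta_n)
```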
We obtain the conditional quantile estimator (say $\hat{\theta}_n$) from a sample of size $n$ by solving the
optimization problem
$$ \min_{\theta \in \Theta_n(\psi)} \bar{Q}_n(\theta) = n^{-1} \sum_{t=1}^{n} |Y_t - \theta(X_t)|\,\bigl(p\,1[Y_t \ge \theta(X_t)] + (1-p)\,1[Y_t < \theta(X_t)]\bigr) , \qquad (2.2) $$
where $1[\,\cdot\,]$ is the indicator function for the specified event. Solving this problem is therefore
equivalent to solving the problem
$$ \min_{\delta^{q_n} \in D_n} \; n^{-1} \sum_{t=1}^{n} |Y_t - f^{q_n}(X_t, \delta^{q_n})|\,\bigl(p\,1[Y_t \ge f^{q_n}(X_t, \delta^{q_n})] + (1-p)\,1[Y_t < f^{q_n}(X_t, \delta^{q_n})]\bigr) , $$
where $D_n = \{\delta^{q_n} : \sum_{j=0}^{q_n} |\beta_j| \le \Delta_n,\ \sum_{j=1}^{q_n} \sum_{i=0}^{r} |\gamma_{ji}| \le q_n \Delta_n\}$. This is a direct non-linear analog ...
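Before turning to the growth conditions, here is a minimal sketch of the sample objective (2.2), the average "check" (pinball) loss; the vectorized NumPy form and function names are ours, not the paper's.

```python
import numpy as np

def check_loss(u, p):
    """rho_p(u) = |u| * (p * 1[u >= 0] + (1 - p) * 1[u < 0])."""
    return np.abs(u) * np.where(u >= 0.0, p, 1.0 - p)

def Q_bar_n(theta_of_x, Y, X, p):
    """Sample objective (2.2): average check loss of the residuals Y_t - theta(X_t).

    theta_of_x : callable mapping an (r,) input to a scalar, e.g. network_output above
    Y          : (n,) responses;  X : (n, r) conditioning variables;  p : quantile level in (0, 1)
    """
    residuals = Y - np.array([theta_of_x(x) for x in X])
    return np.mean(check_loss(residuals, p))
```

Minimizing this quantity over the sieve parameters then yields the estimator $\hat{\theta}_n$.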
The growth of $\Theta_n(\psi)$ must be sufficiently restricted to ensure that the uniform convergence
condition (2.3) holds. In the present application $\bar{Q}$ is defined by $\bar{Q}(\theta) = E(\bar{Q}_n(\theta))$. The required
uniform convergence holds under straightforward conditions using Theorem 4.2 and Lemma 4.3
of White (1990). To control the growth of $\Theta_n(\psi)$ appropriately, we impose conditions on $\psi$, $\{q_n\}$
and $\{\Delta_n\}$. We say that $\psi$ satisfies a Lipschitz condition if $|\psi(a_1) - \psi(a_2)| \le L |a_1 - a_2|$ for all
$a_1, a_2 \in \mathbb{R}$ and some $L \in \mathbb{R}_+$. We denote by $\mathcal{L}$ the set of all functions $\psi : \mathbb{R} \to \mathbb{R}$ such that
$\psi$ is bounded, satisfies a Lipschitz condition (for given $L < \infty$) and is either a c.d.f. or is $\ell$-finite.
The following condition imposes the appropriate structure for our problem.
ASSUMPTION A.2: $\Theta_n(\psi) = T(\psi, q_n, \Delta_n)$, $n = 1, 2, \ldots$, where $\psi \in \mathcal{L}$ and $\{q_n\}$ and $\{\Delta_n\}$ are
such that $q_n \uparrow \infty$, $\Delta_n \uparrow \infty$ as $n \to \infty$, $\Delta_n = o(n^{1/2})$, and either (i) $q_n \Delta_n^2 \log q_n \Delta_n = o(n)$ or (ii)
$q_n \Delta_n \log q_n \Delta_n = o(n^{1/2})$. $\Box$
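For concreteness (our illustration, not the paper's), one admissible choice of sequences is
$$ q_n = n^{1/4}, \qquad \Delta_n = n^{1/8} : $$
then $\Delta_n = o(n^{1/2})$, $q_n \Delta_n^2 \log(q_n \Delta_n) = \tfrac{3}{8}\, n^{1/2} \log n = o(n)$, satisfying (i), and $q_n \Delta_n \log(q_n \Delta_n) = \tfrac{3}{8}\, n^{3/8} \log n = o(n^{1/2})$, satisfying (ii).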
LEMMA 2.4: Suppose Assumptions A.1(i) and A.2(i) or A.1(ii) and A.2(ii) hold. Then
$$ P\bigl[\sup_{\theta \in \Theta_n(\psi)} |\bar{Q}_n(\theta) - E(\bar{Q}_n(\theta))| > \varepsilon\bigr] \to 0 \quad \text{as } n \to \infty . \quad \Box $$
It remains to ensure the identification condition (2.4) and the continuity of $\bar{Q}$ at $\theta_0 = \theta_p$.
Continuity is straightforward. For (2.4), the following condition suffices.
ASSUMPTION A.3: For given $p \in (0,1)$, $\theta_p : D^r \to D$ is a measurable function such that
$P[Y_t \le \theta_p(X_t) \mid X_t] = p$, and for every $\theta \in \Theta$ and all $\varepsilon > 0$ sufficiently small, $E|\theta(X_t) - \theta_p(X_t)| > \varepsilon$
implies
$$ E\bigl[1[(\theta_p(X_t) + \theta(X_t))/2 \le Y_t < \theta_p(X_t)] \mid \theta(X_t) < \theta_p(X_t)\bigr] > \delta_\varepsilon \quad \text{and} $$
$$ E\bigl[1[\theta_p(X_t) \le Y_t < (\theta_p(X_t) + \theta(X_t))/2] \mid \theta(X_t) \ge \theta_p(X_t)\bigr] > \delta_\varepsilon . \quad \Box $$
This assumption ensures that the conditional distribution of $Y_t$ given $X_t$ places positive probability in a
neighborhood of $\theta_p(X_t)$, ensuring the uniqueness (a.s.-$P$) of $\theta_p(X_t)$.
optimum. For given $\zeta > 0$, picking $\hat{\theta}_{n,\varepsilon} \in S_n(\varepsilon, \zeta)$ requires only approximate optimization.
The result for approximate estimation can now be stated.
THEOREM 2.8: Given Assumptions B.1, B.2 and A.3, for any $\varepsilon > 0$ there exist $q_\varepsilon \in \mathbb{N}$ and
$\zeta_\varepsilon > 0$ such that if $\hat{\theta}_{n,\varepsilon} \in S_n(\varepsilon, \zeta_\varepsilon)$, $n = 1, 2, \ldots$, then $\hat{\theta}_{n,\varepsilon} \in S(\varepsilon)$ a.a.$n$, a.s.-$P$. $\Box$
Thus, conditional quantile functions $\theta_p$ can be estimated to any desired degree of ($L^1$-) accuracy
using artificial neural networks having estimated parameters that approximately solve the
nonlinear quantile regression problem.
3. SOME REMARKS ON COMPUTATION
Due to the nonlinearity in parameters of the neural network output function, standard linear
programming methods for computing $\hat{\theta}_n$ or $\hat{\theta}_{n,\varepsilon}$ (e.g. Fulton, Subramanian and Carson, 1987)
cannot be used. However, the neural network output function is linear in the $\beta$ parameters. This
suggests that computation of estimates for fixed $q$ could proceed by selecting parameters $\gamma$ (e.g.
at random) and then applying standard techniques to estimate parameters $\beta$. Picking sufficiently
many values for $\gamma$ (e.g. by multi-start methods) and then estimating $\beta$ as just described should
produce useful estimates.
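A minimal sketch of this two-step strategy follows: draw $\gamma$ at random, then, because the output is linear in $\beta$, fit $\beta$ by the standard linear-programming formulation of linear quantile regression applied to the hidden unit outputs. The code is our illustration of the idea, not the paper's algorithm; the use of SciPy's linprog and all function names are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def fit_beta_lp(H, Y, p):
    """Fit beta in the quantile regression of Y on [1, H] via the standard LP:
    min p*1'u_plus + (1-p)*1'u_minus  s.t.  A beta + u_plus - u_minus = Y,  u_plus, u_minus >= 0."""
    n, q = H.shape
    A = np.column_stack([np.ones(n), H])                  # design matrix with intercept beta_0
    k = q + 1
    c = np.concatenate([np.zeros(k), p * np.ones(n), (1.0 - p) * np.ones(n)])
    A_eq = np.hstack([A, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)   # beta free, slack variables nonnegative
    res = linprog(c, A_eq=A_eq, b_eq=Y, bounds=bounds, method="highs")
    return res.x[:k], res.fun / n                          # beta and attained average check loss

def multistart_quantile_net(X, Y, p, q, n_starts=20, scale=3.0, seed=0):
    """Multi-start heuristic: random gamma draws, beta fitted by LP, best draw kept."""
    rng = np.random.default_rng(seed)
    X_tilde = np.column_stack([np.ones(len(X)), X])        # prepend the bias unit
    best = None
    for _ in range(n_starts):
        gamma = rng.normal(scale=scale, size=(q, X_tilde.shape[1]))
        H = 1.0 / (1.0 + np.exp(-(X_tilde @ gamma.T)))     # logistic hidden unit outputs
        beta, loss = fit_beta_lp(H, Y, p)
        if best is None or loss < best[0]:
            best = (loss, beta, gamma)
    return best
```

Each LP solve is exact in $\beta$ for the given $\gamma$, so the multi-start search only has to explore the $\gamma$ space.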
Alternatively, the nonsmoothness and nonlinearity of the objective function $\bar{Q}_n(\theta)$ make it
an attractive candidate for application of simulated annealing (e.g. Hajek, 1985) or genetic
algorithm (GA) optimization methods (Holland, 1975; Goldberg, 1989).
Regardless of the method applied, computing either $\hat{\theta}_n$ or $\hat{\theta}_{n,\varepsilon}$ is certain to be
computationally demanding. The attractiveness of using such estimates to produce
nonparametric interval forecasts may suffice in certain applications to justify this effort.
MATHEMATICAL APPENDIX
All notation and assumptions are as given in the text.
PROOF OF LEMMA 2.3: Let $\Sigma^r(\psi)$ denote the set of network output functions
$\{g : D^r \to \mathbb{R} \mid g(x) = \beta_0 + \sum_{j=1}^{q} \beta_j \psi(\tilde{x}'\gamma_j),\ x \in D^r,\ \beta_0 \in \mathbb{R},\ \beta_j \in \mathbb{R},\ \gamma_j \in \mathbb{R}^{r+1},\ j = 1, \ldots, q,\ q \in \mathbb{N}\}$. It follows from Theorem 2.4 of
HSWa or Corollary 3.6 of HSWb and Theorem 3.14 of Rudin (1974) that $\Sigma^r(\psi)$ is dense in
$L^1(D^r, \mu)$. Let $g$ be an arbitrary element of $\Sigma^r(\psi)$, so that for some $q \in \mathbb{N}$, $\beta_0 \in \mathbb{R}$, $\beta_j \in \mathbb{R}$,
$\gamma_j \in \mathbb{R}^{r+1}$, $j = 1, \ldots, q$, we have $g(x) = \beta_0 + \sum_{j=1}^{q} \beta_j \psi(\tilde{x}'\gamma_j)$. Because $q_n \to \infty$ and $\Delta_n \to \infty$,
we can always pick $n$ sufficiently large that $\sum_{j=0}^{q} |\beta_j| \le \Delta_n$, $\sum_{j=1}^{q} \sum_{i=0}^{r} |\gamma_{ji}| \le \Delta_n$ and $q \le q_n$.
Thus, for $n$ sufficiently large $g$ belongs to $\Theta_n(\psi) = T(\psi, q_n, \Delta_n)$ and therefore to $\cup_{n=1}^{\infty} \Theta_n(\psi)$.
Because $g$ is arbitrary, $\Sigma^r(\psi) \subset \cup_{n=1}^{\infty} \Theta_n(\psi)$. It follows from the denseness of $\Sigma^r(\psi)$ in $L^1(D^r, \mu)$
that $\cup_{n=1}^{\infty} \Theta_n(\psi)$ is dense in $L^1(D^r, \mu)$. $\Box$
PROOF OF LEMMA 2.4: We apply Theorem 4.2 and Lemma 4.3 of White (1990). For
Theorem 4.2, the complete probability space $(\Omega, \mathcal{F}, P)$ is that of Assumption A.1; the metric
space $(\Theta, \rho)$ is that of the continuous functions on $D^r$ with the uniform metric. The sequence $\{\Theta_n\}$
of Theorem 4.2 corresponds to $\{\Theta_n(\psi)\}$ here. By choice of $\psi$, $\{q_n\}$ and $\{\Delta_n\}$, this is an
increasing sequence of (separable) subsets of $\Theta$. The summands of interest are
$$ s_n(Z_t, \theta) = |Y_t - \theta(X_t)|\,\bigl(p\,1[Y_t \ge \theta(X_t)] + (1-p)\,1[Y_t < \theta(X_t)]\bigr) , $$
where $s_n$ (White's (1990) notation) does not depend on $n$ here. Note that the continuity of $\theta$
ensures that for each $\theta$ in $\Theta_n(\psi)$, $s_n(\cdot, \theta)$ is continuous on $\Omega$, as required.
Now for each $z$ in $D^{r+1}$ and $\theta^0$ in $\Theta_n(\psi)$, geometric arguments ensure that
$$ |s_n(z, \theta) - s_n(z, \theta^0)| \le \max(p, 1-p)\,|\theta(x) - \theta^0(x)| \le \max(p, 1-p)\,\rho(\theta, \theta^0) $$
for all $\theta$ in $\eta_n(\theta^0) = \{\theta \in \Theta_n(\psi) : \rho(\theta, \theta^0) < 1\}$. Consequently, putting $m_n(z, \theta^0) = 1$ and $\bar{m}_n = 1$
(White's (1990) notation) guarantees that the required domination condition
$|s_n(z, \theta) - s_n(z, \theta^0)| \le m_n(z, \theta^0)\,\rho(\theta, \theta^0)^{\lambda}$ holds with $\lambda = 1$ and $d_n(\theta^0) = 1$ in White's (1990)
notation.
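To spell out the geometric argument (our addition, not in the original): the check function is piecewise linear with slopes $p$ and $-(1-p)$, hence Lipschitz with constant $\max(p, 1-p)$,
$$ \rho_p(u) = |u|\bigl(p\,1[u \ge 0] + (1-p)\,1[u < 0]\bigr) = \begin{cases} p\,u, & u \ge 0, \\ -(1-p)\,u, & u < 0, \end{cases} $$
so $|\rho_p(u_1) - \rho_p(u_2)| \le \max(p, 1-p)\,|u_1 - u_2|$; taking $u_1 = y - \theta(x)$ and $u_2 = y - \theta^0(x)$ gives the displayed inequality.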
We also require a choice for $\bar{s}_n \ge \sup_{\theta \in \Theta_n(\psi)} |s_n(z, \theta)|$. Now
$$ |s_n(z, \theta)| \le |y - \theta(x)| \le |y| + |\theta(x)| . $$
Taking the bound on $\psi$ to be unity (without loss of generality), we have $|\theta(x)| \le \Delta_n$. With
$\Delta_n \ge 1$ we then have $|s_n(z, \theta)| \le 2\Delta_n$ for all $z \in D^{r+1}$ and $\theta \in \Theta_n(\psi)$. We therefore take
$\bar{s}_n = 2\Delta_n$.
Because the conditions of Theorem 4.2 of White (1990) thus hold, it follows from (i) of
Theorem 4.2 ($\{Z_t\}$ i.i.d.) that
$$ P\Bigl[\sup_{\theta \in \Theta_n(\psi)} \Bigl| n^{-1} \sum_{t=1}^{n} [s_n(Z_t, \theta) - E(s_n(Z_t, \theta))] \Bigr| > \varepsilon \Bigr] \le 2\, G_n(\varepsilon/6)\,\bigl[\exp(-\varepsilon n/7) + \exp\bigl(-\varepsilon^2 n / (4\bar{s}_n^2[18 + 4\varepsilon])\bigr)\bigr] \qquad \text{(a.1.i)} $$
for any $\varepsilon > 0$ and all $n$ sufficiently large, where $G_n(\varepsilon) = \exp H_n(\varepsilon)$ and $H_n(\varepsilon)$ is the metric entropy
of $\Theta_n(\psi)$. If in addition $n^{-1} \bar{s}_n^2 \to 0$ as $n \to \infty$ and for all $\varepsilon > 0$ $(\bar{s}_n/n)\, H_n(\varepsilon/6) \to 0$ as $n \to \infty$, then
the right hand side of (a.1.i) converges to zero, and we are done with the proof of case (i).
Similarly, (ii) of Theorem 4.2 of White (1990) ($\{Z_t\}$ mixing as in Assumption A.1(ii))
ensures that there exist constants $0 < c_1, c_2 < \infty$ independent of $n$ such that for any $\varepsilon > 0$ and all
$n$ sufficiently large
$$ P\Bigl[\sup_{\theta \in \Theta_n(\psi)} \Bigl| n^{-1} \sum_{t=1}^{n} [s_n(Z_t, \theta) - E(s_n(Z_t, \theta))] \Bigr| > \varepsilon \Bigr] \le c_1\, G_n(\varepsilon/6)\,\bigl[\exp(-c_2 n^{1/2}) + \exp\bigl(-c_2 \varepsilon n^{1/2} / (12\Delta_n)\bigr)\bigr] . \qquad \text{(a.1.ii)} $$
If in addition $n^{-1/2} \bar{s}_n \to 0$ as $n \to \infty$ and for all $\varepsilon > 0$ $(\bar{s}_n/n^{1/2})\, H_n(\varepsilon/6) \to 0$ as $n \to \infty$, then the
right hand side of (a.1.ii) converges to zero, and we are done with the proof of case (ii).
It remains to verify the conditions on $\bar{s}_n$ and the convergence to zero of the bounds in (a.1.i)
and (a.1.ii). Because $\bar{s}_n = 2\Delta_n$, in either case it suffices that $\Delta_n = o(n^{1/2})$. Now Lemma 4.3 of
White (1990) ensures that for all $\varepsilon > 0$ sufficiently small
$$ H_n(\varepsilon) \le \nu_n \log(8/\varepsilon) + \nu_n \log[\Delta_n + rL\Delta_n^2] + \nu_n \log \nu_n , $$
where $\nu_n = q_n(r+2) + 1$. We seek choices for $\{q_n\}$ ensuring that $(\bar{s}_n/n)\, H_n(\varepsilon/6) \to 0$ or
$(\bar{s}_n/n^{1/2})\, H_n(\varepsilon/6) \to 0$. Now for all $n$ sufficiently large, we will have $\Delta_n^2 > \Delta_n$ and $\Delta_n^2 > (1 + rL)$, as
well as $\Delta_n^2 > 48/\varepsilon$. Consequently,
$$ H_n(\varepsilon/6) \le \nu_n \log(48/\varepsilon) + \nu_n \log[\Delta_n + rL\Delta_n^2] + \nu_n \log \nu_n \le \nu_n \log \Delta_n^2 + \nu_n \log \Delta_n^2(1 + rL) + \nu_n \log \nu_n \le 6\nu_n \log \Delta_n + \nu_n \log \nu_n \le 6\nu_n \log \Delta_n \nu_n . $$
Because $q_n \Delta_n^2 \log q_n \Delta_n = o(n)$ under Assumption A.2(i), we have $\nu_n \Delta_n^2 \log \nu_n \Delta_n = o(n)$, so that
$(\bar{s}_n/n)\, H_n(\varepsilon/6) \to 0$ as required. Similarly, $(\bar{s}_n/n^{1/2})\, H_n(\varepsilon/6) \le 12\, n^{-1/2} \nu_n \Delta_n \log \nu_n \Delta_n$; because
$q_n \Delta_n \log q_n \Delta_n = o(n^{1/2})$ under Assumption A.2(ii), the desired result again follows. $\Box$
PROOF OF THEOREM 2.5: (a) The argument is sketched in the text. Note that the
compactness of $\Theta_n(\psi)$ follows because it is totally bounded and closed.
(b) The denseness of $\cup_{n=1}^{\infty} \Theta_n(\psi)$ required in Theorem 2.2(b) is established by Lemma 2.3,
and the uniform convergence condition (2.3) is established by Lemma 2.4. It remains to verify
(2.4) and the continuity at $\theta_p$ of $\bar{Q}$, where ...
... $\inf_{\theta \in \eta^c(\theta_p, \varepsilon)} \bar{Q}(\theta) - \bar{Q}(\theta_p) > \delta_\varepsilon \varepsilon / 2 > 0$, verifying (2.4). $\Box$
PROOF OF THEOREM 2.8: The argument is similar to that of Theorem 3.1 of Stinchcombe
and White (1990). For given $\varepsilon > 0$, choose $\delta_\varepsilon > 0$ as guaranteed by Assumption A.3. Given
Assumption B.2, it follows from either Theorem 2.3 or 2.6 of Stinchcombe and White (1990) and
the argument of the Corollary of HSWa that there exist $q_\varepsilon \in \mathbb{N}$ sufficiently large and $\theta_\varepsilon \in \Theta_\varepsilon(\psi, \Delta)$
such that $\rho(\theta_\varepsilon, \theta_p) < \zeta_\varepsilon/2$, $\zeta_\varepsilon \equiv \delta_\varepsilon \varepsilon/2$.
Because $\Theta_\varepsilon(\psi, \Delta)$ is the continuous image of the compact set $\{\delta^{q_\varepsilon} : |\beta_0| \le \Delta,\ |\beta_j| \le \Delta,$
$|\gamma_{ji}| \le \Delta,\ i = 0, \ldots, r,\ j = 1, \ldots, q_\varepsilon\}$, $\Theta_\varepsilon(\psi, \Delta)$ is compact. Because $\bar{Q}_n(\theta)$ is continuous on
$\Theta_\varepsilon(\psi, \Delta)$ for each realization of $\{Z_t\}$, it follows that there always exists a minimizer $\theta_{n,\varepsilon}$ of $\bar{Q}_n$
on $\Theta_\varepsilon(\psi, \Delta)$. It follows from Theorem 1 of White (1989) that $\theta_{n,\varepsilon} \to \Theta_\varepsilon^*$ a.s., where $\Theta_\varepsilon^*$ is the
set of minimizers of $\bar{Q}$ on $\Theta_\varepsilon(\psi, \Delta)$,
provided that there exists $D_t \ge |Y_t - \theta(X_t)|\,\bigl((1-p)1[Y_t \le \theta(X_t)] - p\,1[Y_t > \theta(X_t)]\bigr)$ for all
$\theta \in \Theta_\varepsilon(\psi, \Delta)$ such that $E(D_t) < \infty$. But
$$ |Y_t - \theta(X_t)|\,\bigl((1-p)1[Y_t \le \theta(X_t)] - p\,1[Y_t > \theta(X_t)]\bigr) \le 2|Y_t - \theta(X_t)| \le 2|Y_t| + 2 \sup_{\theta \in \Theta_\varepsilon(\psi, \Delta),\, x \in D^r} |\theta(x)| , $$
and Assumptions B.1 and B.2 ensure that $E(D_t) < \infty$. Note that Theorem 1 of White (1989)
assumes that $\{Y_t, X_t\}$ is an i.i.d. sequence, but the result extends to the present case by using the
ergodic theorem (e.g. Stout, 1974, Theorem 3.5.7) instead of the Kolmogorov law of large
numbers for i.i.d. sequences in establishing the uniform law of large numbers,
$$ \sup_{\theta \in \Theta_\varepsilon(\psi, \Delta)} |\bar{Q}_n(\theta) - \bar{Q}(\theta)| \to 0 \quad \text{a.s.} $$
We establish that $\hat{\theta}_{n,\varepsilon} \in S(\varepsilon)$ a.a.$n$, a.s. by contradiction. Suppose the desired conclusion
is false. Then there exists $F \in \mathcal{F}$, $P(F) > 0$, such that for each $\omega \in F$ there exists a subsequence
$\{n'\}$ for which $\hat{\theta}_{n',\varepsilon} \notin S(\varepsilon)$ for all $n'$. Without loss of generality, we may also choose $\omega$ so that
$\sup_{\theta \in \Theta_\varepsilon(\psi, \Delta)} |\bar{Q}_n(\omega, \theta) - \bar{Q}(\theta)| \to 0$ and $\theta_{n,\varepsilon}(\omega) \to \theta_\varepsilon^*$ as $n \to \infty$, as these events occur for $\omega$ in
a set of probability one.
Now $|\bar{Q}(\hat{\theta}_{n',\varepsilon}(\omega)) - \bar{Q}(\theta_p)| \le |\bar{Q}(\hat{\theta}_{n',\varepsilon}(\omega)) -
\bar{Q}(\theta_{n',\varepsilon}(\omega))| + |\bar{Q}(\theta_{n',\varepsilon}(\omega)) - \bar{Q}(\theta_\varepsilon^*)| + |\bar{Q}(\theta_\varepsilon^*) - \bar{Q}(\theta_p)|$ for $\theta_\varepsilon^* \in \Theta_\varepsilon^*$. For the last term, the
argument of Theorem 2.5 establishes that $\bar{Q}$ is minimized at $\theta_p$. Hence $\bar{Q}(\theta_p) \le \bar{Q}(\theta_\varepsilon^*) \le \bar{Q}(\theta_\varepsilon)$,
so that $0 \le \bar{Q}(\theta_\varepsilon^*) - \bar{Q}(\theta_p) \le \bar{Q}(\theta_\varepsilon) - \bar{Q}(\theta_p) \le \rho(\theta_\varepsilon, \theta_p) < \zeta_\varepsilon/2$ by choice of $\theta_\varepsilon$; thus
$|\bar{Q}(\theta_\varepsilon^*) - \bar{Q}(\theta_p)| < \zeta_\varepsilon/2$. For the second term, the fact that $\theta_{n',\varepsilon}(\omega) \to \theta_\varepsilon^*$ and the continuity of
$\bar{Q}$ imply that $|\bar{Q}(\theta_{n',\varepsilon}(\omega)) - \bar{Q}(\theta_\varepsilon^*)| < \zeta_\varepsilon/8$ for all $n'$ sufficiently large.
For the first term,
$$ |\bar{Q}(\hat{\theta}_{n',\varepsilon}(\omega)) - \bar{Q}(\theta_{n',\varepsilon}(\omega))| \le |\bar{Q}(\hat{\theta}_{n',\varepsilon}(\omega)) - \bar{Q}_{n'}(\omega, \hat{\theta}_{n',\varepsilon}(\omega))| + |\bar{Q}(\theta_{n',\varepsilon}(\omega)) - \bar{Q}_{n'}(\omega, \theta_{n',\varepsilon}(\omega))| + |\bar{Q}_{n'}(\omega, \hat{\theta}_{n',\varepsilon}(\omega)) - \bar{Q}_{n'}(\omega, \theta_{n',\varepsilon}(\omega))| < \zeta_\varepsilon/8 + \zeta_\varepsilon/8 + \zeta $$
for all $n'$ sufficiently large by the triangle inequality, the uniform law of large numbers and the
definition of $\hat{\theta}_{n',\varepsilon}$. Putting $\zeta = \zeta_\varepsilon/8$ and collecting together all the preceding inequalities, we
have $|\bar{Q}(\hat{\theta}_{n',\varepsilon}(\omega)) - \bar{Q}(\theta_p)| < \zeta_\varepsilon$ for all $n'$ sufficiently large.
We complete the proof by showing that $|\bar{Q}(\hat{\theta}_{n',\varepsilon}(\omega)) - \bar{Q}(\theta_p)| < \zeta_\varepsilon$ implies
$\rho(\hat{\theta}_{n',\varepsilon}(\omega), \theta_p) < \varepsilon$ for all $n'$ sufficiently large, contradicting $\hat{\theta}_{n',\varepsilon}(\omega) \notin S(\varepsilon)$ for all $n'$.
In the proof of Theorem 2.5 we established that given Assumption A.3,
$$ \bar{Q}(\theta) - \bar{Q}(\theta_p) > \delta_\varepsilon\, \rho(\theta, \theta_p)/2 \quad \text{for } \theta \in \eta^c(\theta_p, \varepsilon) . $$
Because $\hat{\theta}_{n',\varepsilon}(\omega) \notin S(\varepsilon)$, $\hat{\theta}_{n',\varepsilon}(\omega) \in \eta^c(\theta_p, \varepsilon)$, so
$\bar{Q}(\hat{\theta}_{n',\varepsilon}(\omega)) - \bar{Q}(\theta_p) > \delta_\varepsilon\, \rho(\hat{\theta}_{n',\varepsilon}(\omega), \theta_p)/2$ for all $n'$
sufficiently large. But
$\zeta_\varepsilon > \bar{Q}(\hat{\theta}_{n',\varepsilon}(\omega)) - \bar{Q}(\theta_p) > \delta_\varepsilon\, \rho(\hat{\theta}_{n',\varepsilon}(\omega), \theta_p)/2$ implies $\rho(\hat{\theta}_{n',\varepsilon}(\omega), \theta_p) < \varepsilon$ for all $n'$ sufficiently
large, because $\zeta_\varepsilon = \delta_\varepsilon \varepsilon/2$. We thus have a contradiction, and as $\omega \in F$ was arbitrary, the proof is
complete. $\Box$
REFERENCES
Carroll, S.M. and B.W. Dickinson (1989), "Construction of Neural Nets Using the Radon
Transform," in Proceedings of the International Joint Conference on Neural Networks,
Washington, D.C., New York: IEEE Press, pp. I:607-611.
Cybenko, G. (1989), "Approximation by Superpositions of a Sigmoidal Function," Mathematics of
Control, Signals and Systems 2, 303-314.
Fulton, M., S. Subramanian and R. Carson (1987), "Estimating Fast Regression Quantiles Using a
Modification of the Barrodale and Roberts L1 Algorithm," Department of Economics
Discussion Paper 87-8, University of California, San Diego.
Funahashi, K. (1989), "On the Approximate Realization of Continuous Mappings by Neural
Networks," Neural Networks 2, 183-192.
Geman, S. and C. Hwang (1982), "Nonparametric Maximum Likelihood Estimation by the
Method of Sieves," The Annals of Statistics 10, 401-414.
Goldberg, D. (1989). Genetic Algorithms in Search, Optimization and Machine Learning.
Reading, MA: Addison-Wesley.
Grenander, U. (1981). Abstract Inference. New York: Wiley.
Hajek, B. (1985), "A Tutorial Survey of Theory and Applications of Simulated Annealing," in
Proceedings of the 24th IEEE Conference on Decision and Control.
Hecht-Nielsen, R. (1989), "Theory of the Back-Propagation Neural Network," in Proceedings of
the International Joint Conference on Neural Networks, Washington, D.C., New York:
IEEE Press, pp. I:593-606.
Hornik, K. (1989), "Learning Capabilities of Multilayer Feedforward Networks," Technische
Universitat Wien, technical report.
Hornik, K., M. Stinchcombe and H. White (1989), "Multilayer Feedforward Networks are
Universal Approximators," Neural Networks 2, 359-366.
Hornik, K., M. Stinchcombe and H. White (1990), "Universal Approximation of an Unknown
Mapping and Its Derivatives," Neural Networks 3, (to appear).
Holland, J. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor: University of
Michigan Press.
Koenker, R. and G. Bassett (1978), "Regression Quantiles," Econometrica 46, 33-50.
Rinnooy Kan, A.H.G., C.G.E. Boender and G.Th. Timmer (1985), "A Stochastic Approach to
Global Optimization," in K. Schittkowski, ed., Computational Mathematical
Programming, NATO ASI Series, Vol. F15. Berlin: Springer-Verlag, pp. 281-308.
Rudin, W. (1974). Real and Complex Analysis. New York: McGraw Hill.
Stinchcombe, M. and H. White (1989), "Universal Approximation Using Feedforward Networks
with Non-Sigmoid Hidden Layer Activation Functions," in Proceedings of the
International Joint Conference on Neural Networks, Washington, D.C., New York: IEEE
Press, pp. I:613-617.
Stinchcombe, M. and H. White (1990), "Approximating and Learning Unknown Mappings Using
Multilayer Feedforward Networks with Bounded Weights," in Proceedings of the
International Joint Conference on Neural Networks, San Diego. New York: IEEE Press,
pp. III:7-15.
Stout, W.F. (1974). Almost Sure Convergence. New York: Academic Press.
White, H. (1989), "Learning in Artificial Neural Networks: A Statistical Perspective," Neural
Computation 1, 425-464.
White, H. (1990), "Connectionist Nonparametric Regression: Multilayer Feedforward Networks
Can Learn Arbitrary Mappings," Neural Networks 3, (to appear).
White, H. and J. Wooldridge (1990), "Some Results on Sieve Estimation with Dependent
Observations," in W. Barnett, J. Powell and G. Tauchen, eds., Nonparametric and Semi-
Parametric Methods in Econometrics and Statistics. New York: Cambridge University
Press, (to appear).
