Single Layer Perceptron for Regression and Classification
Neural Networks Assignment 1, Candidate Number: 19214
The document describes the training process of a neural network for classification. It discusses:
1. The parameters used for training, including a learning rate of 0.05, a bias of 1.0, and Sequential Gradient Descent.
2. Why shuffling the data and Sequential Gradient Descent were chosen over Batch Gradient Descent.
3. Different methods of initialising the weights, and why initialising them to place the decision boundary at x = 2 worked best on average.
Part A (1): Illustration of Training Process

Setup of training process

Parameter            Setting
η                    0.5
bias                 1.0
Error function       Perceptron criterion
Weight update rule   Sequential Gradient Descent
Activation function  Heaviside: y = +1 if (x > 0), -1 otherwise
Input patterns       class 1 = (1, 1); (0, 1), target = +1
                     class 2 = (0, 0); (1, 0), target = -1
Initial weights      w^T = (w0, w1, w2) = (0, 0, 0)

Table 1: Setup of Training Process

I am using the Perceptron criterion as error function because of the information it carries about the current error gradient, and I am using Sequential Gradient Descent because in a quick experimental evaluation I found that it converges faster than Batch Gradient Descent (see the discussion in Part A (2) below).

After initialisation, the weight vector is at the origin; hence, with the given input, 3 patterns, (1, 1), (0, 1) and (1, 0), are misclassified in the first training epoch. The input (0, 0) is classified correctly: its activation value is 0, and as the activation function has a hard limit at x > 0, the pattern is classified with a predicted target value of -1.

Plot 1: State of the Network after Initialisation

After the first epoch the weight vector is updated to w^T = (-0.5, 0, 0.5), giving a decision boundary of x2 = -w0/w2, a horizontal line at x2 = 1. In the second training epoch, 2 input patterns, (1, 1) and (1, 0), are misclassified, and the weight vector is updated to w^T = (-0.5, 0, 1). This shifts the decision boundary down towards the origin, to a horizontal line at x2 = 0.5. With this weight update, all points are correctly classified, and training therefore ends after the next epoch, in which the weights are left unchanged, as the correct decision boundary has already been learnt. The final weight vector is therefore w^T = (-0.5, 0, 1) and is displayed in Plot 3 below.

Plot 2: State of the Network after Training Epoch 1
Plot 3: State of the Network after Training Epoch 2
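The update rule just described is compact enough to state in full. The following MATLAB sketch (illustrative variable names, not the code from the appendix) trains on the four patterns above with the Perceptron criterion and Sequential Gradient Descent; with this presentation order it passes through the same per-epoch weight vectors, (-0.5, 0, 0.5) and then (-0.5, 0, 1).

    % Sketch of the Part A (1) run: Perceptron criterion with
    % Sequential Gradient Descent. Illustrative names only.
    eta  = 0.5;                         % learning rate
    bias = 1.0;                         % constant bias input
    X = [1 1; 0 1; 0 0; 1 0];           % input patterns, one per row
    t = [1; 1; -1; -1];                 % targets: class 1 = +1, class 2 = -1
    w = [0; 0; 0];                      % initial weights (w0, w1, w2)
    for epoch = 1:100
        errors = 0;
        for n = 1:size(X, 1)
            x = [bias; X(n, :)'];       % augment pattern with the bias input
            y = 2 * ((w' * x) > 0) - 1; % Heaviside: +1 if activation > 0, else -1
            if y ~= t(n)                % Perceptron criterion: only errors
                w = w + eta * t(n) * x; % contribute, and the update is applied
                errors = errors + 1;    % immediately (sequential, not batch)
            end
        end
        if errors == 0, break; end      % an epoch without updates: converged
    end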
Learnability

The Perceptron is able to learn any linearly separable input. Out of the 6 different input combinations, 4 are linearly separable and can be learnt by the Perceptron; the other 2, representing XOR and XNOR respectively, are not linearly separable and hence cannot be learnt by the Perceptron. Table 2 below gives an overview of which input combinations the Perceptron can learn.

NB: In essence, the 6 different input combinations conform to 3 unique input patterns; that is, when class 1 = (0, 0); (1, 1) and class 2 = (0, 1); (1, 0), there is another input combination with class 1 and class 2 swapped, which represents the same pattern, just with different class adherence. Hence, the Perceptron is able to learn 2 out of 3 unique input patterns.

#  Input patterns              Target  Learnable  Comment
1  class 1 = (0, 0); (0, 1)    +1      Yes
   class 2 = (1, 0); (1, 1)    -1
2  class 1 = (0, 0); (1, 0)    +1      Yes
   class 2 = (0, 1); (1, 1)    -1
3  class 1 = (0, 0); (1, 1)    +1      No         XNOR
   class 2 = (0, 1); (1, 0)    -1
4  class 1 = (0, 1); (1, 0)    +1      No         XOR
   class 2 = (0, 0); (1, 1)    -1
5  class 1 = (0, 1); (1, 1)    +1      Yes
   class 2 = (0, 0); (1, 0)    -1
6  class 1 = (1, 0); (1, 1)    +1      Yes
   class 2 = (0, 0); (0, 1)    -1

Table 2: Learnability Overview

Epochs until Convergence

For the illustrated training procedure with η = 0.5, bias = 1.0 and an initial weight vector of w^T = (0, 0, 0), the learning algorithm converged after 2-3 epochs (for the linearly separable problems). Table 3 gives a short overview of the number of epochs until convergence per input pattern for the above-mentioned starting parameters.

Input patterns                           Epochs until convergence
class 1 = (1, 0); (1, 1), target = +1    2
class 2 = (0, 0); (0, 1), target = -1
class 1 = (0, 0); (0, 1), target = +1    3
class 2 = (1, 0); (1, 1), target = -1
class 1 = (0, 0); (0, 1), target = +1    3
class 2 = (1, 0); (1, 1), target = -1
class 1 = (1, 1); (0, 1), target = +1    3
class 2 = (0, 0); (1, 0), target = -1

Table 3: Epochs until Convergence Overview

In general I found that convergence itself, that is, whether or not the problem is learnable, is independent of the values of η and the bias, and further, that the value of η doesn't affect the number of epochs taken until convergence; the bias value, however, does have an impact on the number of epochs for the given setup. Plot 4 displays the number of epochs until convergence for varying values of η = {1.5, 1.0, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001}, bias = {-2, -1.5, -1.0, -0.5, 0, 0.5, 1, 1.5, 2} and an initial weight vector of w^T = (0, 0, 0).

Plot 4: Error surface for different values of η and the bias

Part A (2): Training Process

Basic setup of training process

Parameter            Setting
η                    0.05
bias                 1.0
Error function       Perceptron criterion
Weight update rule   Sequential Gradient Descent
Activation function  Heaviside: y = +1 if (x > 0), -1 otherwise

Table 4: Basic initialisation of the free parameters

I decided to use the Perceptron criterion as error function because it gives me information about the error gradient, which is needed to perform Gradient Descent. I further chose Sequential Gradient Descent in favour of Batch Gradient Descent because I found that Sequential Gradient Descent converged quicker in my experiments: over 100 test runs, Sequential Gradient Descent converged after 17 epochs on average, whereas Batch Gradient Descent took 21 epochs on average. I further decided to stick with a bias value of 1.0 and to choose η = 0.05, which, after a few test runs, appeared to be a reasonable compromise between granularity and speed of convergence.

To Shuffle or not to Shuffle

Before every training epoch, I shuffled the whole dataset so as to not overfit the data. In general I found that when shuffling is performed, convergence usually takes longer (over 100 test runs, a network that shuffles its data before every epoch converged in 17 epochs on average, whereas a network without shuffling converged in 13 epochs on average), but often results in more solid-looking decision boundaries. Hence, I added shuffling of the input data to my training regime.
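As a sketch of the distinction (illustrative names again, assuming X, t, w, eta and bias as above): Sequential Gradient Descent applies the correction inside the pattern loop, so later patterns in the same epoch already see the new weights, whereas Batch Gradient Descent accumulates the corrections and applies them once per epoch; the per-epoch shuffle is a single randperm.

    % Sketch: one epoch under each update regime, with shuffling.
    order = randperm(size(X, 1));          % shuffle before every epoch
    Xs = X(order, :);  ts = t(order);

    w_seq = w;                             % sequential: update per pattern
    for n = 1:size(Xs, 1)
        x = [bias; Xs(n, :)'];
        if (2 * ((w_seq' * x) > 0) - 1) ~= ts(n)
            w_seq = w_seq + eta * ts(n) * x;   % takes effect immediately
        end
    end

    w_bat = w;  dw = zeros(size(w));       % batch: accumulate, apply once
    for n = 1:size(Xs, 1)
        x = [bias; Xs(n, :)'];
        if (2 * ((w_bat' * x) > 0) - 1) ~= ts(n)
            dw = dw + eta * ts(n) * x;     % weights frozen during the epoch
        end
    end
    w_bat = w_bat + dw;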
Weight Initialisation

I found that initialising the weights in specific ways has a significant impact on the number of training epochs the network takes to converge. For the given task, we were to sample 10 data points from 2 Gaussian distributions with means μ1 = 0 and μ2 = 4 respectively. Plot 5 shows the 2 distributions with the optimal decision boundary (in a Bayesian sense) at x = 2. Hence, I expected that an initial decision boundary at x = 2 would often already be the correct decision boundary for the given data points.

Plot 5: The 2 Gaussian distributions

This hypothesis turned out to be true: in an experiment with 1000 test runs, initialising the weights so that the decision boundary is a vertical line at x = 2 turned out to be the correct solution in 614 out of 1000 runs. The mean number of epochs until convergence was 15; however, this number is somewhat distorted by the fact that not all of the 1000 problems were linearly separable, in which case the algorithm terminated after 200 iterations. In comparison, random weight initialisation took at least 2 epochs to find a decision boundary, and this happened only 184 out of 1000 times, thus underlining the superiority of my weight initialisation method.

However, I found an even better weight initialisation method than setting a vertical line at x = 2: initialising the weights using the Minimum Squared Error criterion, which essentially is Gradient Descent with Least Mean Squares as error function. As the gradient is available in closed form for this setup, no Gradient Descent was needed and the Least Mean Squares solution could be obtained directly. The caveat of this method is that it may fail to find a solution even if there is one; hence I only used it as a way of initialising the weights for the network. This initialisation turned out to be a solution in 692 out of 1000 test runs, outperforming the vertical-line initialisation at x = 2. The average number of epochs until convergence for this method was ~14.6; however, as already mentioned above, this number is slightly distorted by the fact that not all problems were linearly separable. Taking the whole setup further, the Minimum Squared Error criterion would have given rise to the Ho-Kashyap procedure, which I started to implement, but lack of time prevented me from finishing an implementation [1]. Note that, for the purpose of better illustrating the network's learning progress, I generally initialised the weights with random values.

[1] Gutierrez-Osuna, Ricardo: L13: Linear Discriminant Functions. Texas: Texas A&M University. Available from: http://research.cs.tamu.edu/prism/lectures/pr/pr_l17.pdf (accessed 13th February 2014)

Convergence η Decay

I also experimented with a decay factor for the learning rate η, dividing the learning rate by 2 every 20 epochs. This approach introduces an additional advantage as well as an additional disadvantage. The merit is that if the learning rate was initially too large, which could lead to a global minimum being overshot, the decay of the learning rate acts as a regulator, scaling η down until a minimum can be reached. The drawback is that it can slow down convergence and cause the algorithm to terminate without having found a solution even if there was one. As I didn't want that to happen, I disabled the decay factor for most experiments.
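Both initialisation schemes, and the decay schedule, are simple to state. A sketch under the assumptions that the bias input is 1.0 and the 2-D patterns sit in the rows of X with targets t = ±1 (illustrative names, not the appendix code):

    % Initialisation 1: decision boundary as a vertical line at x1 = 2.
    % With a bias input of 1.0, w0 + w1*x1 + w2*x2 = 0 reduces to x1 = 2
    % for the weight vector below.
    w_line = [-2; 1; 0];

    % Initialisation 2: Minimum Squared Error weights in closed form.
    Xa = [ones(size(X, 1), 1), X];   % design matrix with the bias column
    w_mse = pinv(Xa) * t;            % least-squares solution; may fail to
                                     % separate even separable data

    % Learning-rate decay as described above: halve eta every 20 epochs.
    eta = eta0 / 2^floor(epoch / 20);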
Sigmoidal Activation vs. Heaviside Activation

The major difference between the Heaviside function and sigmoids (apart from the fact that the latter can be differentiated whereas the former cannot) is that a sigmoidal activation function such as tanh is continuous, whereas the Heaviside activation function is undefined at an activation value of 0. The value returned by a sigmoidal activation function can be interpreted as the level of confidence in the current classification decision, and indeed the logistic activation function represents the probability of the given class for a given data point, (C|x). In other words, the value returned by a sigmoidal activation function can also be seen as a distance measure to the current decision boundary: for the tanh activation function, the closer the value is to 0, the closer the current data point is to the decision boundary. A sigmoidal and the Heaviside activation function share the fact that at some point a hard limit needs to be applied in order to get a classification decision. For the Heaviside function as well as for tanh this happens at an activation value of 0, where the data point needs to be mapped to the target space in some way.

I empirically evaluated the average number of epochs until convergence for a tanh activation function and the Heaviside activation function, and found that on average a network trained with the Heaviside activation function converges slightly faster than with tanh. Over 100 test runs, a network with tanh converged after 14 epochs on average, whereas a network trained with Heaviside converged after 12.7 epochs on average. But this is not the only difference: the resulting decision boundaries usually differ as well, as Plots 6 & 7 show, where classification was carried out with the same data points and the same initialisation of the network's free parameters. However, independent of the activation function, only linearly separable problems can be solved.

Plot 6: Resulting decision boundary with the Heaviside activation function
Plot 7: Resulting decision boundary with the tanh activation function
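In code the two activations differ only in what is returned before the hard limit; the decision itself is taken at 0 in both cases. A sketch with illustrative names, for a single augmented pattern x:

    a = w' * x;                      % activation for one augmented pattern

    y_hv  = 2 * (a > 0) - 1;         % Heaviside: hard decision only

    y_th  = tanh(a);                 % tanh: graded value in (-1, 1); |y_th|
                                     % shrinks near the decision boundary,
                                     % so it can be read as confidence
    label = 2 * (y_th > 0) - 1;      % hard limit still needed for the class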
Non-linearly separable Data

If no exit criterion after a given number of epochs were supplied, the algorithm would not terminate for a non-linearly separable problem. As Plots 8-11 show, the network is somewhat desperately trying to find a solution that separates the 2 classes but doesn't find one. These 4 plots represent the network after 9, 10, 11 and 12 training epochs respectively and are based on sampling data from 2 Gaussian distributions with means μ1 = 0 and μ2 = 2. Plot 12 shows the corresponding error rate, which oscillates heavily as the algorithm tries to fit a decision boundary. In contrast, Plot 13 shows the error rate for a linearly separable problem.

Plots 8-12: Change of the decision boundary in a linearly non-separable problem
Plot 13: Error rate for a non-linearly separable problem
Plot 14: Error rate for a linearly separable problem

Illustrating the Training Process

Plots 15-21 show the learning process for a linearly separable problem with the 3 previously described weight initialisation methods. Plot 15 represents the decision boundary when the weights are initialised with the Minimum Squared Error criterion. Plots 16 & 17 show the network's learning progress when the weights are initialised by setting the decision boundary at x = 2, Plot 18 shows the decision boundary with randomly initialised weights, and Plots 19-21 show the last 3 training epochs (out of 6 in total) of the network's learning progress.

Plot 15: Network decision boundary with Minimum Squared Error criterion weight initialisation
Plot 16: Network decision boundary with x = 2 weight initialisation. Due to an outlier of class 2 at ~(1.8, 0.8), the initialised decision boundary is not yet a solution.
Plot 17: Network decision boundary with x = 2 weight initialisation. The network was able to learn a correct decision boundary after only 1 training epoch.
Plot 18: Network decision boundary with randomly initialised weights, before the first training epoch
Plot 19: Network decision boundary with randomly initialised weights, after training epoch 4 of 6
Plot 20: Network decision boundary with randomly initialised weights, after training epoch 5 of 6
Plot 21: Network decision boundary with randomly initialised weights, after training epoch 6 of 6. Decision boundary successfully learnt.

Part B (1)

Setup of training process

Parameter            Setting
η                    0.0001
bias                 3 + mean(ν)
Error function       Least Mean Squared Error
Weight update rule   Sequential Gradient Descent
Initial weights      w^T = (w0, w1) = (1, 0.4)

Table 5: Basic initialisation of the free parameters

The task for the network is to find the best-fit line for the given data points. For a regression scenario like the given one, the goal is to predict a target variable y given inputs from 1…n variables. In essence, for regression, there is no need for an activation function, as the activation produced by the network already represents the quantity of interest, although strictly speaking one could argue that the network uses the identity function as its activation function.

Network Initialisation

The quantity ν is drawn from a uniform distribution in the interval [-10, +10] and is added to the intercept term of the function y = 0.4x + 3 + ν; hence, given an infinite amount of data points for the function, I would expect the mean of ν to converge to 0. As there is only 1 input parameter, the regression line will be a straight line of the form y = kx + d, with the gradient being close to the gradient of the original function, so 0.4. I chose to initialise the weight vector as w^T = (w0, w1) = (1, 0.4) and to use a bias value of 3 + mean(ν). With an initialisation close to the underlying real function I was hoping to reduce the number of training epochs required.

As for classification, I also shuffled the data before each training epoch for the regression task, and found that network convergence took a lot longer: over 100 test runs, a network that shuffled the data before every epoch required 211 epochs on average for convergence, whereas the average convergence without shuffling was 2 epochs. On the other hand, a network that used shuffling produced slightly better lines in terms of the average squared error: again over 100 test runs, the mean average squared error for a network with shuffling was 16.02, whereas for a network without shuffling the error was 17.2.
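With the identity activation, the sequential update for this setup is the LMS (delta) rule. A minimal sketch of one training epoch, assuming the scalar inputs sit in xs, the noisy targets in ts, and b = 3 + mean(ν) as in Table 5 (illustrative names):

    % Sketch: one epoch of Sequential Gradient Descent with the LMS error.
    for n = randperm(numel(xs))          % shuffle before the epoch
        x = [b; xs(n)];                  % augmented input (bias, x)
        y = w' * x;                      % identity activation: y is the estimate
        w = w + eta * (ts(n) - y) * x;   % LMS / delta-rule update
    end
    yhat = [b * ones(numel(xs), 1), xs(:)] * w;
    mse  = mean((ts(:) - yhat).^2);      % epoch error used for the stop test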
I used Sequential Gradient Descent in conjunction with Least Mean Squares as my error function. I chose Sequential Gradient Descent in favour of Batch Gradient Descent because, in my experiments on linear regression, Sequential Gradient Descent generally converged faster and resulted in a smaller error and therefore a better regression line. For 100 test runs, Sequential Gradient Descent converged after 210 epochs on average for the given setup, whereas Batch Gradient Descent converged only after 350 epochs on average. The average of the mean squared error over 100 test runs for Sequential Gradient Descent was 17.7918, whereas for Batch Gradient Descent it was 19.2267. I chose Least Mean Squares as my error function because it is simple to implement and, for the given setup and in conjunction with Gradient Descent, is guaranteed to converge to a global minimum (as long as the other parameters, i.e. the learning rate, are set accordingly).

Convergence

For testing whether the algorithm has converged, I compared the current error to the error of the previous training epoch. If the difference between the current error and the previous error is below a predefined threshold, the algorithm stops; I most commonly used 0.0001 or 0.00001 as the threshold. The second termination criterion is reaching a predefined number of epochs; I most commonly used values between 100 and 500, which is quite low, but was sufficient for the given tasks.

Of the Virtues of Preprocessing

When running the network to find a best-fit line, I found that it makes a huge difference whether or not the data have been properly preprocessed; in the following paragraphs I will therefore frequently compare applying preprocessing to not applying any preprocessing. All my preprocessing consisted of normalising the input and target values and, after having found the best-fit line, converting the data back to its original space (see Formulas 1 & 2).

Formula 1: Data normalisation
Formula 2: Postprocessing, converting the data back to its original space

On Bias, Weights and Training Error

For the training runs where I didn't preprocess the data, finding a solution was hugely dependent on the value of the learning rate, which needed to be very small (η = 0.00001) in order for the network to converge and produce a good-fit line; however, the training error didn't decrease steadily as I would have expected, but oscillated heavily (see Plot 22). The weights for the network converged towards w1 ≈ 0.4 and w0 ≈ 1; the actual results for one test run were w1 = 0.4003 and w0 = 1.1599 respectively.

For the training runs where I normalised the data, the network was less dependent on specific values of η (I usually had η in the interval [0.0001, 0.001]). With the data normalised, the training error was now decreasing towards 0, which is illustrated in Plot 23. The weights for a network with normalised data converged towards w1 ≈ 1 and w0 ≈ 0 (the exact figures were w1 = 0.9139 and w0 = 0.0009), which makes sense as the normalised function has a gradient of k = 1 and an intercept of 0. Hence, I would conclude that the weights for a regression task converge to the gradient and the intercept of the given function. Formula 2 from above had to be applied to use the weights learnt from a normalised model together with the original data.

Plot 22: Error rate for raw input data (no normalisation or other preprocessing carried out)
Plot 23: Error rate for normalised input data
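Formulas 1 and 2 are figures in the original and their exact form is not recoverable here; a standard choice consistent with the description (normalise inputs and targets, fit, then map the line back to the original space) is z-scoring, sketched below with illustrative names:

    % One plausible reading of Formulas 1 & 2 (assumed, not recovered).
    mx = mean(xs);  sx = std(xs);        % input statistics
    mt = mean(ts);  st = std(ts);        % target statistics
    xn = (xs - mx) / sx;                 % Formula 1: normalised inputs ...
    tn = (ts - mt) / st;                 % ... and normalised targets

    % ... train on (xn, tn) to obtain a line yn = k*xn + d, then:

    % Formula 2: back to the original space,
    % y = st*(k*(x - mx)/sx + d) + mt
    k_orig = st * k / sx;
    d_orig = mt + st * d - st * k * mx / sx;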
Turning up the Noise

Increasing the noise results in data points where it is harder to recognise a straight line as the underlying function. By increasing the random fluctuations, the resulting regression line becomes more horizontal, which means that the general trend of the data, represented by the gradient of the (underlying) function, can no longer be reliably estimated. For example, the learnt weight for the gradient with ν = [-50, +50] is no longer close to 1, but only 0.61, resulting in a gradient of ~0.25 for the regression line (see Plots 24 & 25).

Plot 24: ν = [-10, +10]; the network is still able to capture the general trend of the data well, with the learnt weights converging towards w1 ≈ 0.4 and w0 ≈ 1.
Plot 25: ν = [-50, +50]; the random fluctuations significantly distort the underlying function, resulting in the regression line being more horizontal and ending up with a gradient (~0.25) quite different from the gradient of the original function (0.4).

Modifying the Underlying Function

I changed the function to y = 1.2x - 2 + ν and initialised the weights as w0 = 1, with a bias value of -2 + mean(ν), and w1 = 1.2. The resulting weights of the network again converged close to the gradient of the underlying function (w1 ≈ 1.2, the exact figure being w1 = 1.1707) and the intercept (w0 ≈ 1, the exact figure being w0 = 0.8448). For a network trained on normalised data, the weights converged towards w1 ≈ 1 and w0 ≈ 0 respectively (the exact figures being w1 = 0.9254 and w0 = 0.0002).

A Note on the Closed-Form Regression Line

For the given problem it would be possible to calculate the best-fit regression line in closed form instead of using an iterative process. As would be expected, the resulting closed-form regression line was always a better fit, in terms of minimising the least mean squared error, than the iterative approaches. However, a little surprisingly, over 1000 test runs with η = 0.00001 for the iterative process, the difference in the means of the average squared errors was quite small: the mean of the average squared errors for the closed-form approach was 16.3431, and for the iterative approach 16.3466. Out of interest, I increased the value of η to 0.001 and observed the mean of the average squared errors over 1000 test runs again, resulting in a closed-form error of 16.3606 and an iterative error of 16.4676.
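The closed-form line referred to above is the ordinary least-squares solution, which for one input follows directly from the normal equations (sketch, illustrative names):

    % Sketch: closed-form least-squares line through (xs, ts).
    Xa   = [ones(numel(xs), 1), xs(:)];  % design matrix with intercept column
    w_cf = Xa \ ts(:);                   % normal-equations solution (d; k)
    ase  = mean((ts(:) - Xa * w_cf).^2); % average squared error of the fit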
Illustration of What the Network Is Doing

Plot 26 shows the data points, the underlying original function, the closed-form regression line and the regression line retrieved from the network. Plot 27 shows the data points just with the regression line retrieved from the network. To further illustrate the learning process, Plots 28-32 show the training progress when the network weights have been randomly selected (to better illustrate the learning progress), and Plot 33 shows the corresponding error rate. Note that the algorithm converged after 11 epochs and the plots show the line-fitting progress after epochs 0-2, 6 and 11 respectively; the plots in between have been omitted for space reasons.

Plot 26: All-in-one plot containing the original function, the regression line learnt by the network and the regression line obtained in closed form
Plot 27: Plot containing just the regression line learnt by the network
Plot 28: State of the network before the start of learning
Plot 29: Regression line after the first training epoch
Plot 30: Regression line after the 2nd training epoch
Plot 31: Regression line after the 6th training epoch
Plot 32: Regression line with converged network parameters, after 11 training epochs
Plot 33: Error rate of the network

Part B (2)

Introductory Notes

For this task the dimensionality of the input space is increased to 2; the inputs are independent of each other. Performing regression for this function results in a regression plane. After having appreciated the benefits of preprocessing in the previous part, I only experimented with normalised data for this section. Normalisation was performed the same way as in Part B (1) (see Formulas 1 & 2). For weight initialisation I followed my previous approach of initialising the weight vector to be 1 for the bias weight and the gradients of the respective terms of the function otherwise; so, for the given function y = 0.4x1 + 1.4x2 - 2 + ν, the initial weight vector was w^T = (w0, w1, w2) = (1, 0.4, 1.4). I also initialised the bias to -2 + mean(ν), as previously. I further used the same error function and weight update rule, and also performed shuffling before each training epoch. Table 6 summarises the basic setup for this task. I again used Sequential Gradient Descent in conjunction with Least Mean Squares as error function for learning the network parameters, for the same reasons as stated in the previous section.

Setup of training process

Parameter            Setting
η                    0.0001
bias                 -2 + mean(ν)
Error function       Least Mean Squared Error
Weight update rule   Sequential Gradient Descent
Initial weights      w^T = (w0, w1, w2) = (1, 0.4, 1.4)

Table 6: Basic initialisation of the free parameters

On the Weight and Bias Values

As I expected, w1 and w2 converged towards 0.4 and 1.4 respectively; however, this time the bias weight was a lot more volatile, converging towards 1 in some experiments and towards 0 in others. This was a bit surprising at first, but when I started printing the mean value of the random fluctuations, mean(ν), alongside the weights, I found that the closer mean(ν) was to 0, the closer w0 was to 1 (e.g. mean(ν) = -0.159, w0 = 0.9990), and the further mean(ν) was away from 0, the closer w0 converged towards 0 (e.g. mean(ν) = 10.7259, w0 = 0.0019). In both cases the resulting value for the intercept would be in the interval [-1, +1].

The variance in the bias led me to run some more experiments, and I found that w1 and w2 actually don't converge towards 0.4 and 1.4 at all! The key was varying the value of η, where the learnt weights changed quite significantly. At η = 0.01 the values were as reported above; when decreasing the value to η = 0.00001, w1 converged towards ~0.2, so half the gradient value, and w2 converged towards ~0.7, also roughly half the gradient value. This behaviour seems to be confirmed by the values obtained from a closed-form solution. Also interestingly, the value for w0 always converged towards 0 in the closed-form approach. Table 7 summarises the findings of the previous 2 paragraphs.

Experiment  Weight  Closed form  Network (η = 0.01)  Network (η = 0.00001)  mean(ν)
1           w0      0            0.9958              0.1091                 -0.4706
            w1      0.2811       0.3977              0.2811                 -0.4706
            w2      0.7357       1.387               0.7376                 -0.4706
2           w0      0            0.0052              -0.0016                7.4016
            w1      0.2838       0.3906              0.2976                 7.4016
            w2      0.7613       1.3406              0.8366                 7.4016

Table 7: Interpretation of the learnt weights
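The closed-form column in Table 7 can be obtained with the same least-squares construction as before, now with two input columns; on normalised data the fitted intercept is 0 by construction, which would explain the zeros in the closed-form w0 column (sketch, illustrative names; x1s and x2s hold the two inputs):

    % Sketch: closed-form weights for the two-input regression plane.
    Xa   = [ones(numel(ts), 1), x1s(:), x2s(:)];  % (bias, x1, x2) columns
    w_cf = Xa \ ts(:);                            % (w0; w1; w2) in closed form
    yfit = Xa * w_cf;                             % fitted regression plane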
Modifying the Function

By changing the function to y = -1.2x1 + 0.6x2 + ν, I could validate my assumption that the weights converge roughly towards gradient/2, and definitely not towards the gradient value, as Table 8 below shows.

Experiment  Weight  Closed form  Network (η = 0.01)  Network (η = 0.00001)  mean(ν)
1           w0      0            0.999               0.6008                 -0.2257
            w1      -0.6993      -1.1902             -0.6994                -0.2257
            w2      0.4315       0.5967              0.4315                 -0.2257
2           w0      0            0.0026              0.001                  -11.1796
            w1      -0.7145      -1.1781             -0.7752                -11.1796
            w2      0.2505       0.5794              0.2948                 -11.1796

What the Network Is Doing

Plots 34-37 show that the network is trying to find the best-fit plane for the given data.

Plot 34: Displaying the best-fit regression plane, learnt by the network, for the given data points
Plot 35: Displaying the regression plane obtained in closed form
Plot 36: Displaying the regression plane obtained from the network as well as the regression plane obtained in closed form
Plot 37: Error rate of the network

Appendix

The Matlab code for this assignment is contained on the following pages.