
Automatic Left ventricle volume calculation in cardiac MRI using Convolutional Neural Network

Name: Tencia Lee & Qi Liu
Location: Los Angeles, CA & New York, NY, USA
Email: tencia@gmail.com, liu.qi.alex@gmail.com
Date: 03/16/2016
Competition: Second Annual Data Science Bowl, 1st place solution

1. Background on you/your team
Tencia: I graduated from Caltech in 2009 with a B.Sc. in applied mathematics and economics. I
then worked in quantitative finance, first as a researcher and then as a portfolio manager at a
Los Angeles-based hedge fund, for about six years. I recently transitioned to a robotics startup
as a research engineer. I became interested in deep learning almost a year ago and have been
studying and learning about it since then. I decided to enter this competition as I thought it would
be a great educational experience and a chance to apply the methods I have been learning. I
spent approximately 160 hours working on this competition.
Qi: I got my Ph.D. in theoretical physics (on lattice quantum chromodynamics and charge-parity
violation of the weak interaction) from Columbia University 4 years ago, and after that I worked
as a quantitative trader at a hedge fund. I have been taking a long vacation recently, so I have
had a lot of time to work on Kaggle competitions. This one has a large cash prize, and the
problem is complicated (being complicated means that the signal-to-noise ratio is higher on the
LB) and interesting, so I decided to give it a shot.


We originally decided to form a team because Qi was working with a dynamic programming
segmentation method and Tencia was working with neural networks, and we thought they would
be complementary to each other. However, after a few weeks it became clear that the neural
network approach was capable of higher precision, so we dropped the dynamic programming
segmentation method completely to simplify our work and code.

2. Summary
We used as our primary model an ensemble of fully convolutional neural networks which
calculated areas from DICOM images, and then used these areas to calculate heart volume at
different times in the heartbeat cycle. As part of our ensemble, we also included a fully
convolutional segmentation network for 4-chamber view DICOMs, a single-slice model, and an
age-sex model.
Our time was spent 10% on data cleaning methodology, 30% on neural network design and
experimentation, 30% on identifying model weaknesses, 20% on calculation of volume from
segmentation, and 10% on manually labeling data.
One of the trickiest aspects of this dataset was the number of cases with missing or poor-quality
data. We developed several heuristics to evaluate segmentation performance and detect
outliers, and to decide for each case which models to include.
We found that one of the most difficult problems with our approach was that the segmentation
network would often fail to recognize a ventricle. Image normalization was essential to
remediate this problem; however, we also used pseudo-active learning to select examples for
manual segmentation. In total we segmented 130 SAX view DICOM images by hand.
Our approach was guided by the view that since we are approximating a derived number, we
should find the inputs and then calculate the end result in the same way the ground truths had
been calculated. For this reason we steered away from an end-to-end prediction algorithm for
our primary model, instead opting to approximate the segmentation for each slice as closely as
possible to how a doctor would, and then calculate the heart volume using those areas.
For neural network training, we used mini-batch gradient descent with the Adam optimizer (4),
with use of convolutions and batch normalization in our networks. For ensembling, we optimized
a linear combination of different models. Training and prediction were done using two GPUs,
and all code was written in Python with the use of Theano, Lasagne, and cuDNN for neural
networks.

3. Features Selection / Engineering

Fully Convolutional Neural Network for segmenting the left ventricle
Our main model used several Fully Convolutional Neural Networks (CNN) to segment the left
ventricle for each MRI image. The output of each network was an image of the same size as the
input image, with pixels having values in the range [0, 1]. Each pixel's output represents the
probability that the corresponding pixel in the input image is part of the left ventricle.
The architecture of the networks from the CNN_B folder is as follows, where:

b = batch size

The four dimensions of the tensor represent batch size, channels or number of filters,
and the two dimensions of the image, respectively.

Conv = convolution with a stack of square kernels of (# filters) x (filter size) x (filter size)

BN = batch normalization as in reference (2), implemented in Lasagne, in reference (3)

ReLU(x) = max(0, x)

Sigmoid(x) = 1 / (1 + exp(-x))

Convolutional layers with valid padding output a value only for filter positions where
every value in the filter has a corresponding value in the input tensor (shrinks the
tensor).

Convolutional layers with full padding output a value for all filter positions for which at
least one value in the filter has a corresponding value in the input tensor (expands the
tensor), with all other values zero-padded.

MaxPool and Upscale are done across the last two dimensions of the tensor.

Layer Op/Type   #Filters / Pool / Upscale Factor   Filter Size   Padding   Output Shape
Input           -                                  -             -         (b, 1, 246, 246)
Conv+BN+ReLU    8                                  7             valid     (b, 8, 240, 240)
Conv+BN+ReLU    16                                 3             valid     (b, 16, 238, 238)
MaxPool         2                                  -             -         (b, 16, 119, 119)
Conv+BN+ReLU    32                                 3             valid     (b, 32, 117, 117)
MaxPool         2                                  -             -         (b, 32, 58, 58)
Conv+BN+ReLU    64                                 3             valid     (b, 64, 56, 56)
MaxPool         2                                  -             -         (b, 64, 28, 28)
Conv+BN+ReLU    64                                 3             valid     (b, 64, 26, 26)
Conv+BN+ReLU    64                                 3             full      (b, 64, 28, 28)
Upscale         2                                  -             -         (b, 64, 56, 56)
Conv+BN+ReLU    64                                 3             full      (b, 64, 58, 58)
Upscale         2                                  -             -         (b, 64, 116, 116)
Conv+BN+ReLU    32                                 7             full      (b, 32, 122, 122)
Upscale         2                                  -             -         (b, 32, 244, 244)
Conv+BN+ReLU    16                                 3             full      (b, 16, 246, 246)
Conv+BN+ReLU    8                                  7             valid     (b, 8, 240, 240)
Conv+sigmoid    1                                  7             full      (b, 1, 246, 246)
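The output shapes in the table follow from simple size arithmetic for valid and full convolutions, pooling, and upscaling. The sketch below traces one spatial dimension through the layers. The filter sizes and pool/upscale factors in `CNN_B_LAYERS` are assumptions inferred from the listed output shapes, not values read from the authors' code, and the function name is hypothetical.

```python
def trace_spatial_size(size, layers):
    """Return the spatial size after each layer.

    layers: list of (op, k), where op is 'valid', 'full', 'pool' or 'up'
    and k is the filter size (convolutions) or the factor (pool / upscale).
    """
    sizes = []
    for op, k in layers:
        if op == "valid":    # filter must fit entirely inside the input
            size = size - k + 1
        elif op == "full":   # filter only needs to overlap the input by one value
            size = size + k - 1
        elif op == "pool":   # pooling that drops any leftover border
            size = size // k
        elif op == "up":     # upscaling by an integer factor
            size = size * k
        else:
            raise ValueError("unknown op: %s" % op)
        sizes.append(size)
    return sizes


# Assumed filter sizes / factors, inferred from the output shapes in the table.
CNN_B_LAYERS = [
    ("valid", 7), ("valid", 3), ("pool", 2),
    ("valid", 3), ("pool", 2),
    ("valid", 3), ("pool", 2),
    ("valid", 3), ("full", 3), ("up", 2),
    ("full", 3), ("up", 2),
    ("full", 7), ("up", 2),
    ("full", 3), ("valid", 7), ("full", 7),
]

sizes = trace_spatial_size(246, CNN_B_LAYERS)
print(sizes)
```

With these assumed values the trace ends at 246, the same size as the input, which is what a per-pixel segmentation output requires.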

CNN_A networks are similar to the above with minor changes in number of layers and filter
sizes, so for brevity we omit their exact architecture here. At test time, CNN_A evaluated an
image cropped from the approximate center of the left ventricle, while CNN_B evaluated the full
image resized to match the input size.

Figure: Example outputs from the segmentation network, with ventricle highlighted in purple.

Volume calculation from segmented images
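With the segmented result from the CNN, the area $A_{i,t}$ for each slice $i$ at time $t$ can be
calculated. We remove the slices that have an area of 0. Then the volume at any time $t$ is
calculated as a sum of the pieces between adjacent slices, plus a half-thickness cap at each end:

$$ V_t = \sum_{i=1}^{N-1} \frac{A_{i,t} + A_{i+1,t}}{2}\,(z_{i+1} - z_i) + A_{1,t}\,\frac{h}{2} + A_{N,t}\,\frac{h}{2} \tag{1} $$

where $z_i$ is the slice location, $h$ is the slice thickness, and $N$ is the number of remaining
slices. We tried other methods to calculate the volume, but this method seems to give us a
slightly better result after adjustments for systematic error. It would be best if the organizer of
this competition could release the actual formula so we don't need to guess one.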
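As an illustration of the volume formula, here is a minimal NumPy sketch of the trapezoidal sum with half-thickness end caps. The function name, the units, and the explicit sorting by slice location are assumptions made for this example, not details taken from the authors' code.

```python
import numpy as np


def volume_from_areas(areas, z, h):
    """Trapezoidal volume with half-thickness end caps.

    areas: slice areas at one time point (e.g. mm^2)
    z:     slice locations (e.g. mm), same length as areas
    h:     slice thickness (e.g. mm)
    """
    areas = np.asarray(areas, dtype=float)
    z = np.asarray(z, dtype=float)
    keep = areas > 0                      # drop slices with zero area
    areas, z = areas[keep], z[keep]
    if areas.size == 0:
        return 0.0
    order = np.argsort(z)                 # assumed: slices may arrive unordered
    areas, z = areas[order], z[order]
    # Sum of trapezoids between adjacent slices.
    middle = np.sum(0.5 * (areas[:-1] + areas[1:]) * np.diff(z))
    # Half a slice thickness added at each end.
    caps = 0.5 * h * (areas[0] + areas[-1])
    return float(middle + caps)


# Three slices 10 mm apart, 8 mm thick: 3000 (middle) + 800 (caps) = 3800 mm^3.
example_volume = volume_from_areas([100.0, 200.0, 100.0], [0.0, 10.0, 20.0], 8.0)
```

The systole and diastole volumes would then be the minimum and maximum of this quantity over all time points.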

The systole and diastole volumes are taken as the minimum and the maximum of $V_t$,
respectively. We observed that for many cases our result is larger than the true value, while for
many other cases they agree very well. We believe this is a type of systematic error: our method
tends to include all good-looking slices, while a doctor might decide to remove some of them for
some reason. So we applied a correction with the formula:
$$ V_{adj} = V_{CNN} \; [\ldots]_{CNN} \tag{2} $$

for systole and diastole volume separately. Here we have two parameters to be determined, one
for systole and one for diastole (subscripted sys and dias). We also experimented with simple
linear fitting, but it did not perform as well.

We assume that the measured volume obeys a normal distribution with mean $V_{adj}$ and
standard deviation $S_v$; the normal CDF is then generated and the Continuous Ranked
Probability Score (CRPS) is calculated. We fit the standard deviation $S_v$ to be linear in $V_{adj}$,
$$ S_v = k\,V_{adj} + S_0 \tag{3} $$
with slope $k$ and intercept $S_0$. Here we have 4 parameters to be determined, 2 for the systole
and 2 for the diastole volume standard deviation. These 4 parameters, together with the two
correction parameters of equation (2), are determined simultaneously by optimizing the final
CRPS score function on the train and validation sets, and then the fitted coefficients are used
to calculate the volume and the standard deviation of the volume for test cases.
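To make the scoring step concrete, the sketch below builds a normal CDF from an adjusted volume and a linearly modelled standard deviation, then scores it against a true volume. It assumes the competition's CRPS is the mean squared difference between the predicted CDF and a step function over the integer volumes 0 to 599 mL; the function names and the slope/intercept values in the usage line are hypothetical.

```python
import math

import numpy as np


def normal_cdf_curve(mean, std, n_bins=600):
    """P(volume <= n) for n = 0 .. n_bins - 1 under a normal distribution."""
    n = np.arange(n_bins, dtype=float)
    z = (n - mean) / (std * math.sqrt(2.0))
    return np.array([0.5 * (1.0 + math.erf(v)) for v in z])


def crps(cdf, true_volume):
    """Mean squared difference between a CDF and the step at the true volume."""
    n = np.arange(cdf.size, dtype=float)
    step = (n >= true_volume).astype(float)
    return float(np.mean((cdf - step) ** 2))


def predict_cdf(v_adj, slope, intercept):
    """CDF with standard deviation modelled as linear in the adjusted volume."""
    std = slope * v_adj + intercept
    return normal_cdf_curve(v_adj, std)


# Hypothetical parameter values, for illustration only.
cdf = predict_cdf(150.0, 0.05, 5.0)
score_good = crps(cdf, 150.0)   # prediction centred on the true volume
score_bad = crps(cdf, 200.0)    # prediction 50 mL off
```

In a fitting loop, the slope, the intercept, and the volume correction parameters would all be adjusted together to minimize the mean of this score over the training cases.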

There are a few cases in which our method might fail. The CNN might fail to detect the left
ventricle, or some cases have missing data (e.g., some data sets have only 3 SAX slices). So we
developed some other, simpler models to avoid getting something completely wrong.

Other models for failure cases

One Slice Model: We took the 80th percentile of the area vector, $A_{dias}$ and $A_{sys}$ for
diastole and systole respectively, and used them to fit the volumes. The reason to use the 80th
percentile instead of the maximum is to avoid wrong or extreme contour readings from the CNN.
Because the length of the heart is more or less proportional to the diameter of the largest
cross-sectional area, we fit
$$ V = V_0 + c\,A\sqrt{A} \tag{4} $$
for diastole and systole volume separately, with fitted constants $V_0$ and $c$. This model
achieves a score of 0.015 on the train set.
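The one-slice idea can be sketched as follows: take the 80th percentile of the slice areas, form the feature A * sqrt(A) (area times a length proportional to the diameter), and fit an offset and a coefficient. Ordinary least squares is used here purely for illustration; it is an assumption, since the write-up does not say which objective was used for this fit, and the function names are hypothetical.

```python
import numpy as np


def one_slice_feature(areas, q=80):
    """q-th percentile of the slice areas, turned into A * sqrt(A)."""
    a = np.percentile(np.asarray(areas, dtype=float), q)
    return a * np.sqrt(a)


def fit_one_slice(features, volumes):
    """Least-squares fit of V = V0 + c * feature; returns (V0, c)."""
    x = np.asarray(features, dtype=float)
    y = np.asarray(volumes, dtype=float)
    design = np.column_stack([np.ones_like(x), x])   # columns: offset, feature
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    return float(coef[0]), float(coef[1])


# Synthetic check: volumes generated from a known offset and coefficient.
x_demo = np.array([1.0, 2.0, 3.0, 4.0])
v0_demo, c_demo = fit_one_slice(x_demo, 5.0 + 2.0 * x_demo)
```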

4-chamber Model: We hand-segmented 736 4-chamber view images and used them to train a
fully convolutional segmentation network in identical manner to the CNN_B main model. We
trained five networks, each with 20% of the training data held out, and used the average output
of these five as the segmentation output. The output of this network was a segmentation mask
indicating whether each pixel was part of the left ventricle, from the 4-chamber view.

Figure: Examples of segmentation output of the 4-chamber view

From here we used PCA to find the main axis of the ventricle, then calculated a disk at each
pixel along the main axis and summed the disks to arrive at a volume. This volume was then
adjusted using a linear fit, and the standard deviation for the CRPS curve was calculated as in
equation (3).
$$ V_{final} = c\,V_{seg} + V_0 \tag{5} $$
This model achieves a score of 0.017.
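A rough sketch of the PCA-and-disks step is given below. It finds the main axis of a binary mask from the covariance of its pixel coordinates, bins the pixels into one-pixel-wide steps along that axis, treats the pixel count in each bin as the disk diameter, and sums the disk volumes. The binning scheme, the `pixel_mm` parameter, and the function name are assumptions for illustration; the authors' implementation may differ.

```python
import numpy as np


def volume_from_mask(mask, pixel_mm=1.0):
    """Approximate a solid of revolution from a 2-D mask by stacking disks."""
    ys, xs = np.nonzero(mask)
    if xs.size < 2:
        return 0.0
    pts = np.column_stack([xs, ys]).astype(float)
    pts -= pts.mean(axis=0)
    # Main axis: eigenvector of the coordinate covariance with largest eigenvalue.
    evals, evecs = np.linalg.eigh(np.cov(pts.T))
    axis = evecs[:, np.argmax(evals)]
    along = pts @ axis                                  # position along the main axis
    bins = np.rint(along - along.min()).astype(int)     # one-pixel-wide steps
    counts = np.bincount(bins)                          # pixels per step ~ diameter
    diameters = counts * pixel_mm
    return float(np.sum(np.pi * (diameters / 2.0) ** 2 * pixel_mm))


# Synthetic ellipse with semi-axes 30 and 15 pixels; revolving it about the
# long axis gives a spheroid of volume 4/3 * pi * 30 * 15^2.
yy, xx = np.mgrid[-20:21, -40:41]
ellipse = (xx / 30.0) ** 2 + (yy / 15.0) ** 2 <= 1.0
approx = volume_from_mask(ellipse)
exact = 4.0 / 3.0 * np.pi * 30 * 15 ** 2
```

The raw value would still need the linear adjustment described above before being used as a prediction.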

Age-sex Model: It is clear that the size of the heart grows with age before it reaches a more or
less fixed size, and it also depends significantly on the sex of the patient. So we fit some
linear models using age and sex. Our method closely follows reference (1), which was posted in
the forum. This model scores 0.037 on the train set.

We took a combination of the one slice model and the 4-chamber model as the first default model,
and together they achieve a score of 0.0134 on the training data set. When our original SAX
view based model fails, the result is taken from this default model. If that still fails, then we
take the result from the age-sex model, which has no failure cases.
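The fallback order described here can be written as a small selection function. Representing a failed model by `None` and the names used below are conventions assumed for this sketch only.

```python
def choose_prediction(sax_pred, default_pred, age_sex_pred):
    """Return (source, prediction) from the first model that did not fail.

    A failed model is represented by None (a convention assumed here).
    The age-sex model is the last resort because it always produces a value.
    """
    if sax_pred is not None:
        return "sax", sax_pred
    if default_pred is not None:
        return "default", default_pred
    return "age_sex", age_sex_pred


# Hypothetical (systole, diastole) volumes for a case where the SAX model failed.
source, volumes = choose_prediction(None, (62.0, 148.0), (70.0, 160.0))
```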

4. Training Method(s)
Training data
For training the segmentation networks, we used the Sunnybrook contour data in a similar
manner as the Deep Learning Tutorial, as well as 130 hand-labeled SAX images selected from
the training set.
Training procedure:
The neural networks were trained to minimize a modified version of the Sørensen-Dice index:
$$ \mathrm{Loss} = -\frac{2\sum_{i,j} \mathrm{pred}_{ij}\,\mathrm{target}_{ij} + s}{\sum_{i,j} (\mathrm{pred}_{ij} + \mathrm{target}_{ij}) + s} \tag{6} $$
In equation (6), pred for each pixel was the output of the network after the sigmoid nonlinearity,
with values in the range [0, 1], and target was whether that pixel was part of the left ventricle in
the ground truth segmentation of that image, with value 0 if not, and 1 if so. We found this
objective function gave much better results than binary cross-entropy. The hyperparameter s
was used to give a nonzero loss even when an image did not contain any part of the ventricle,
and was usually set to 100.
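A NumPy sketch of this smoothed Dice objective is shown below. It is written with a leading minus sign so that minimizing the loss maximizes overlap; that sign convention is an assumption, and a real training setup would express the same formula with the framework's symbolic tensors rather than NumPy arrays.

```python
import numpy as np


def soft_dice_loss(pred, target, s=100.0):
    """Negative smoothed Dice overlap between a probability map and a binary mask."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(pred * target)
    # The smoothing term s keeps the ratio defined (and dependent on pred)
    # even when the target mask is empty.
    return -(2.0 * intersection + s) / (np.sum(pred) + np.sum(target) + s)


truth = np.zeros((8, 8))
truth[2:5, 2:5] = 1.0
loss_perfect = soft_dice_loss(truth, truth)           # best possible value
loss_empty = soft_dice_loss(np.zeros((8, 8)), truth)  # ventricle missed entirely
```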
Each neural network was trained for between 150 and 300 epochs using mini-batch gradient
descent, with batch sizes of either 8 or 16 images, using the Adam optimizer (4) with learning
rate 3e-3. Hyperparameters were hand-tuned for effectiveness, but due to time constraints were
not automatically optimized.
For some models in the ensemble, training was done with some part of the data held out as a
validation set. In these cases, the parameters saved as the result of that run were the set that
yielded the lowest validation loss. In others, training was done with all available data, and in
these cases the parameters at the end were saved.
Data preprocessing and augmentation:
Image normalization was necessary as a preprocessing step to homogenize the dataset and
facilitate neural network training. Two normalization methods were used:

Mean and standard deviation: the mean and standard deviation of the pixel values were
calculated over the middle 60% of the image, and the mean was then subtracted from
the image and the result divided by the standard deviation. This was done because
extreme values tended to occur at the edges, so this provided a more stable
normalization.

Percentile: the image intensity was rescaled, with a set percentile range being stretched
to the entire range, as in Ref. (6).
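The two normalization methods can be sketched in NumPy as follows. Interpreting "the middle 60%" as the central 60% of each image dimension, and the 1st/99th percentile defaults, are assumptions made for this example; the function names are hypothetical.

```python
import numpy as np


def normalize_center_stats(img, frac=0.6):
    """Standardize an image using statistics of its central region only."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    mh = int(round(h * (1.0 - frac) / 2.0))   # margin rows excluded on each side
    mw = int(round(w * (1.0 - frac) / 2.0))   # margin columns excluded on each side
    center = img[mh:h - mh, mw:w - mw]
    return (img - center.mean()) / (center.std() + 1e-8)


def normalize_percentile(img, low=1, high=99):
    """Stretch the [low, high] percentile range to [0, 1] and clip the rest."""
    img = np.asarray(img, dtype=float)
    lo, hi = np.percentile(img, (low, high))
    return np.clip((img - lo) / max(hi - lo, 1e-8), 0.0, 1.0)


rng = np.random.RandomState(0)
demo = rng.rand(50, 50) * 1000.0
standardized = normalize_center_stats(demo)
stretched = normalize_percentile(demo)
```

Drawing the percentile bounds at random per training sample would turn the second function into the kind of normalization-based augmentation described for CNN_A.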

Due to the small size of the segmentation training set, data augmentation during training was
essential for successful results. Both CNN_A and CNN_B used random rotations and
translations at training. CNN_B used a fixed normalization method for each training run,
whereas CNN_A used randomized (on the percentile values) image normalization as an
augmentation method. CNN_A also used random scaling of the images.
An approximate center and the bounding box of the left ventricle can be estimated from the
time variance of the images. We can also rotate the images based on the DICOM information to
align all the cases roughly into the same orientation. To diversify our list of models, we let only
CNN_A crop from the approximate center of the images and do the rotation alignment, while
CNN_B uses the full image and only rotates those few cases that have approximately 90-degree
rotations.
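One simple way to estimate a center from the time variance is a variance-weighted centroid, sketched below. This is only an illustrative assumption about how such an estimate could be made; the write-up does not describe the exact procedure, and the function name is hypothetical.

```python
import numpy as np


def estimate_center(frames):
    """Variance-weighted centroid (row, col) of a (time, height, width) stack."""
    frames = np.asarray(frames, dtype=float)
    var = frames.var(axis=0)              # per-pixel variance over the cardiac cycle
    total = var.sum()
    if total == 0:                        # static images: fall back to the image center
        return (frames.shape[1] - 1) / 2.0, (frames.shape[2] - 1) / 2.0
    rows, cols = np.indices(var.shape)
    return float((rows * var).sum() / total), float((cols * var).sum() / total)


# Synthetic stack: only a 10 x 10 block changes over time.
stack = np.zeros((10, 64, 64))
for t in range(10):
    stack[t, 20:30, 40:50] = t
center_row, center_col = estimate_center(stack)
```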

5. Interesting findings

Most of the most useful tricks we used related to detecting and fixing bad data. In many cases a
patient's folder would contain DICOM images from multiple scans, only some of which belonged
in the volume calculation, so we incorporated information from the DICOM metadata to filter
which slices should be kept or dropped. Several fairly non-intuitive methods were used at this
step, and this may have been one of the factors that separated us most effectively from the
other competitors.
Post-processing the results from the FCN output was also essential. We came up with two main
heuristics to clean the segmentation results.

Firstly, we noticed that the network would often output false positives, in which it mistakenly
thought pixels outside the ventricle were part of it. To eliminate these, for each case, we took
the segmentation mask outputs for every image from that patient and calculated the average
mask value for each pixel. This creates a mean mask image (center frame in the figure below).
Then we fit a 2D isotropic Gaussian kernel centered at the maximum value. For every
contiguous patch returned by the segmenter, we calculated the average likelihood across all of
its pixels under this Gaussian distribution, and eliminated every patch that had an average less
than a set threshold. If more than one patch remained, we eliminated all but the one with the
highest cumulative likelihood. In the figure below, the extraneous patches visible in the first
image have been removed using this method, yielding the clean mask in the third image.
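The patch-filtering heuristic can be sketched as below. For simplicity the Gaussian width `sigma` and the `threshold` are fixed, hypothetical values rather than fitted ones, the Gaussian is left unnormalized, and patches are found with a small 4-connected flood fill; all of these are assumptions of this example.

```python
from collections import deque

import numpy as np


def label_patches(mask):
    """4-connected component labelling; returns (labels, count), 0 = background."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        count += 1
        labels[start] = count
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                        and mask[rr, cc] and not labels[rr, cc]):
                    labels[rr, cc] = count
                    queue.append((rr, cc))
    return labels, count


def clean_mask(mask, mean_mask, sigma=20.0, threshold=0.1):
    """Keep the single patch most consistent with a Gaussian at the mean-mask peak."""
    cy, cx = np.unravel_index(np.argmax(mean_mask), mean_mask.shape)
    yy, xx = np.indices(mask.shape)
    like = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))
    labels, count = label_patches(np.asarray(mask) > 0)
    best, best_total = 0, -1.0
    for lab in range(1, count + 1):
        vals = like[labels == lab]
        # Drop patches with low average likelihood; keep the highest cumulative one.
        if vals.mean() >= threshold and vals.sum() > best_total:
            best, best_total = lab, vals.sum()
    if best == 0:
        return np.zeros(mask.shape, dtype=bool)
    return labels == best


demo_mask = np.zeros((50, 50), dtype=bool)
demo_mask[20:30, 20:30] = True      # plausible ventricle patch
demo_mask[0:3, 0:3] = True          # spurious patch far from the peak
demo_mean = np.zeros((50, 50))
demo_mean[25, 25] = 1.0
cleaned = clean_mask(demo_mask, demo_mean, sigma=10.0, threshold=0.1)
```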

Secondly, once the SAX view images for a patient are segmented, we compose the areas found
after applying the above filter into a matrix, with the x-axis corresponding to time and the y-axis
to slice ordering. These matrices can then be compared to each other, to infer a quality-of-read
measurement for each patient. To facilitate comparisons, the matrices were resized using
bilinear sampling to 10 x 30 and normalized to values in [0, 1], and for each of the 300 pixels
we calculated a mean and standard deviation across all patients in the training set. Then each
patient's resized matrix can be given a log-likelihood score of:
$$ LL = \sum_{t,s} \log p(a_{ts} \mid \mu_{ts}, \sigma_{ts}) \tag{7} $$
When the map contains unusual values, LL as calculated above would be low, and this usually
corresponded to the segmentation net being unable to segment some or all of the images
correctly. Therefore, in some of our models we used LL as a filter to determine when not to use
the SAX model. In retrospect, this filter was fairly imprecise because the pattern of relative areas
is not fixed across patients, and because LL considers each pixel separately without considering
the joint distribution of pixels across a map. This filter could likely be improved by adding a term
representing variation, because usually bad reads would also correspond to highly noisy or
bumpy area values in the map.
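The log-likelihood score can be sketched as a sum of independent per-pixel log densities. The Gaussian form of the density is an assumption suggested by the use of a per-pixel mean and standard deviation; the maps are taken to be already resized to 10 x 30 and normalized, and the function name is hypothetical.

```python
import numpy as np


def log_likelihood_score(area_map, mu, sigma):
    """Sum of per-pixel Gaussian log densities of a normalized area map."""
    a = np.asarray(area_map, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.maximum(np.asarray(sigma, dtype=float), 1e-6)  # avoid division by zero
    log_p = -0.5 * np.log(2.0 * np.pi * sigma ** 2) - (a - mu) ** 2 / (2.0 * sigma ** 2)
    return float(np.sum(log_p))


# Hypothetical training statistics and two maps: one typical, one with an outlier.
mu_demo = np.full((10, 30), 0.5)
sigma_demo = np.full((10, 30), 0.1)
typical = mu_demo.copy()
outlier = mu_demo.copy()
outlier[3, 5] = 1.0
ll_typical = log_likelihood_score(typical, mu_demo, sigma_demo)
ll_outlier = log_likelihood_score(outlier, mu_demo, sigma_demo)
```

Because each pixel contributes independently, a single bad slice lowers the score only modestly, which matches the imprecision noted above.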

Figure: Examples of heatmaps with low (bottom right) and high (other positions) LL scores

One interesting aspect of this dataset is that it is much more highly structured even than our
method reflects. The true shape of the ventricle is usually smooth, with no outlying values, and
the area profiles described by the SAX view and the 4-chamber view have to agree.
Nevertheless, we did not use any collaborative method for detecting failures between these two
views, or enforce any smoothness constraints. We did notice that in some cases outlier values
would make their way into the volume calculation; however, we did not have time to build a
more complete model that included all available information.

6. Simple Methods

We took an ensemble of 10 CNNs, but it only slightly improves on the result of the average of
the two configurations under CNN_A/config_v3/6.py. The training time for a single CNN is about
3 hours for input image size 256 x 256, and 2 hours for 196 x 196. The prediction time is about
20 seconds for each case.
To further simplify the model, we can directly take the volume calculated from eq. (1) that uses
the segmentation output of the CNN. The adjustment equation (2) is used to match the ground
truth provided by human annotation of the contours, which itself has a lot of uncertainty arising
from whether or not to include a slice at the base of the heart. As we examined many cases, it
seems that the ground truth results are not fully consistent in whether the end slices are kept.

Appendix
A1. Model Execution Time
For the 8 versions of CNN (2 different architectures with different input image sizes, 256 and
196, and 4 different kinds of parameters) in CNN_A, each CNN takes about 3 hours to train on a
GTX 970 GPU. It takes about 10-20 seconds to forecast each case (depending on how many SAX
slices it has). For all 700 test cases, the prediction takes less than 3.5 hours for each CNN.
For the 2 versions of CNN in CNN_B, since we train 5 folds of each of the two models (each with
20% of data held out) in addition to one full run, it takes about 30 hours to train and 8 hours to
evaluate on a GTX 980 Ti with cuDNN. The 4-chamber model trains only 5 folds (no full run),
and training and evaluation together take around 9 hours.

A2. Dependencies
All the code except that in CNN_B runs with Python 2.7.6 on Ubuntu 14.04 with a GTX 970 GPU,
and it imports the following packages: cv2 3.1.0, theano 0.8.0.dev0.dev-RELEASE,
lasagne 0.2.dev1, pandas 0.14.1, numpy 1.12.0.dev0+a2f5392. CUDA 7.5 is used and cuDNN is
enabled.
To run the code in CNN_B, you need to use cv2 version 2.4.12.

A3. How To Generate the Solution (aka README file)
A. Download and prepare data

Change the directories in SETTINGS.py to your settings, and download the Sunnybrook data
set and the train, valid, and test data sets. Append the rows from validate.csv to train.csv and
rename it as train_valid.csv.
The directory manual_data/ includes all the hand-labeled images and the contours; they are
combined with the Sunnybrook data to train the CNN networks.
B. Train CNNs to predict the contours of the LV
Part A
1. run >> bash CNN_A/run_train.sh
   a) it preprocesses the image data for the CNN net to train.
   b) it trains many versions of the CNN models with different parameters. To save time, you
      can simply run versions 3 and 6 and get a slightly worse result in 1/4 of the total
      amount of time. For each version of CNN, it takes about 3 hours to train on a GTX 970
      GPU, and 20 seconds to predict for each case.
   c) it loads the trained CNN models and predicts the contours for all cases.
   d) it extracts the sex-age information for later use to build the sex-age based default model.
2. If there are additional cases for which you need to make predictions, just run the
   run_test.sh script:
   run >> bash CNN_A/run_test.sh
   a) predicts the contours for test cases.
   b) extracts the sex-age information for test cases.
Part B
run >> python train.py

C. Calculate the volumes
Combine (average) all the processed results that contain the areas of the contours, calculate
the volumes for each case, fit simple models based on the train data set to correct systematic
errors, and predict for the unknowns.
run >> ./train_pred.py

A4. References

1) https://www.kaggle.com/c/second-annual-data-science-bowl/forums/t/18375/0-036023-score-without-looking-at-the-images
2) http://arxiv.org/abs/1502.03167
3) https://github.com/Lasagne/Lasagne/blob/master/lasagne/layers/normalization.py
4) http://arxiv.org/abs/1412.6980
5) https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
6) http://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.rescale_intensity