(http://twiecki.github.com/)
Bayesian modeling, Computational Psychiatry, and Python
MCMC sampling for dummies
Nov 10, 2015
When I give talks about probabilistic programming and Bayesian statistics, I usually gloss over the details of how inference is actually performed, essentially treating it as a black box. The beauty of probabilistic programming is that you actually don't have to understand how the inference works in order to build models, but it certainly helps.
When I presented a new Bayesian model to Quantopian's (https://quantopian.com) CEO, Fawce (https://quantopian.com/about), who wasn't trained in Bayesian stats but is eager to understand it, he started to ask about the part I usually gloss over: "Thomas, how does the inference actually work? How do we get these magical samples from the posterior?"
Now I could have said: "Well, that's easy. MCMC generates samples from the posterior distribution by constructing a reversible Markov chain that has as its equilibrium distribution the target posterior distribution. Questions?"
That statement is correct, but is it useful? My pet peeve with how math and stats are taught is that no one ever tells you about the intuition behind the concepts (which is usually quite simple) but only hands you some scary math. This is certainly the way I was taught, and I had to spend countless hours banging my head against the wall until that eureka moment came about. Usually things weren't as scary or seemingly complex once I deciphered what they meant.
This blog post is an attempt at explaining the intuition behind MCMC sampling (specifically, the Metropolis algorithm (https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm)). Critically, we'll be using code examples rather than formulas or math-speak. Eventually you'll need that, but I personally think it's better to start with an example and build the intuition before you move on to the math.
The problem and its unintuitive solution
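Let's take a look at Bayes' formula (https://en.wikipedia.org/wiki/Bayes%27_theorem):

$$P(\theta|x) = \frac{P(x|\theta)\,P(\theta)}{P(x)}$$

Here P(θ|x) is the posterior, the probability of our model parameters θ given the data x; P(θ) is the prior and P(x|θ) is the likelihood. The numerator is easy to compute, but the denominator P(x), the evidence, requires integrating over all possible parameter values:

$$P(x) = \int_\Theta P(x, \theta)\, d\theta$$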
This is the key difficulty with Bayes' formula: while the formula looks innocent enough, for even slightly non-trivial models you just can't compute the posterior in closed form.
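To make the difficulty concrete, here is a minimal sketch (not from the original post) that computes the evidence P(x) by brute-force numerical integration on a grid. It assumes a one-parameter model with a Normal(0, 1) prior on θ and a Normal(θ, 1) likelihood; the helper name `evidence_on_grid` is hypothetical. In one dimension this is cheap and accurate, which is exactly why it is misleading.

```python
import numpy as np
from scipy.stats import norm

def evidence_on_grid(data, grid):
    """Approximate P(x), the integral of P(x | theta) P(theta) over theta, on a uniform 1-D grid."""
    data = np.asarray(data, dtype=float)
    # Prior density P(theta) at every grid point (assumed Normal(0, 1)).
    prior = norm.pdf(grid, loc=0, scale=1)
    # Likelihood P(x | theta): product over independent data points, one value per grid point.
    likelihood = norm.pdf(data[np.newaxis, :], loc=grid[:, np.newaxis], scale=1).prod(axis=1)
    # Riemann sum: integrand values times the grid spacing.
    spacing = grid[1] - grid[0]
    return np.sum(likelihood * prior) * spacing

grid = np.linspace(-10, 10, 20_001)
print(evidence_on_grid(np.array([0.3]), grid))
```

For a single observation the result can be checked against the closed form: the marginal distribution of x under this model is Normal(0, √2). The catch is dimensionality: with k grid points per parameter, d parameters need k**d likelihood evaluations, so 100 points per axis and 10 parameters already means 10**20 evaluations.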
Now we might say "OK, if we can't solve something, could we try to approximate it? For example, if we could somehow draw samples from that posterior, we could Monte Carlo approximate (https://en.wikipedia.org/wiki/Monte_Carlo_method) it." Unfortunately, to directly sample from that distribution you not only have to solve Bayes' formula, but also invert it, so that's even harder.
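The Monte Carlo part of the idea fits in a few lines. The sketch below is illustrative and not from the original post: it assumes we already had independent draws from a distribution (here simply a standard normal standing in for a posterior) and approximates an expectation and a tail probability by averaging over those draws.

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend these are posterior draws; here they come from Normal(0, 1).
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Monte Carlo estimates: expectations become averages over the samples.
mean_estimate = samples.mean()            # approximates E[theta], which is 0 here
prob_estimate = (samples > 1.0).mean()    # approximates P(theta > 1), about 0.159 here
print(mean_estimate, prob_estimate)
```

Any quantity that can be written as an expectation under the posterior can be approximated this way; the hard part, which the rest of the post addresses, is getting the samples.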
Then we might say "Well, instead let's construct a Markov chain whose equilibrium distribution matches our posterior distribution." I'm just kidding, most people wouldn't say that, as it sounds bat-shit crazy. If you can't compute it and can't sample from it, then constructing a Markov chain with all these properties must be even harder.
The surprising insight, though, is that this is actually very easy, and there exists a general class of algorithms that do this called Markov chain Monte Carlo (https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo) (constructing a Markov chain to do Monte Carlo approximation).
Setting up the problem
First, let's import our modules.
In [1]:
```python
%matplotlib inline

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from scipy.stats import norm

sns.set_style('white')
sns.set_context('talk')

np.random.seed(123)
```
Let's generate some data: 20 points from a normal centered around zero. Our goal will be to estimate the posterior of the mean mu (we'll assume that we know the standard deviation to be 1).
In [2]:
```python
data = np.random.randn(20)
```

In [3]:
```python
ax = plt.subplot()
sns.distplot(data, kde=False, ax=ax)
_ = ax.set(title='Histogram of observed data', xlabel='x', ylabel='# observations');
```
Next, we have to define our model. In this simple case, we will assume that this data is normally distributed, i.e. the likelihood of the model is normal. As you know, a normal distribution has two parameters: the mean μ and the standard deviation σ. For simplicity, we'll assume we know that σ = 1, and we'll want to infer the posterior for μ. For each parameter we want to infer, we have to choose a prior. For simplicity, let's also assume a Normal distribution as a prior for μ. Thus, in stats speak our model is:

$$\mu \sim \text{Normal}(0, 1)$$
$$x|\mu \sim \text{Normal}(x; \mu, 1)$$
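In code, this model boils down to one function that scores a candidate μ by prior times likelihood. The sketch below works on the log scale, which is my choice rather than the post's (the post multiplies raw densities), and the function name `log_unnormalized_posterior` is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def log_unnormalized_posterior(mu, data, prior_mu=0.0, prior_sd=1.0, sigma=1.0):
    """log P(mu) + log P(x | mu) for mu ~ Normal(prior_mu, prior_sd), x | mu ~ Normal(mu, sigma)."""
    # Log prior density of mu.
    log_prior = norm.logpdf(mu, loc=prior_mu, scale=prior_sd)
    # Independent observations: the log-likelihood is a sum over data points.
    log_likelihood = norm.logpdf(data, loc=mu, scale=sigma).sum()
    return log_prior + log_likelihood

data = np.array([0.5, -0.2, 0.1])  # made-up observations for illustration
print(log_unnormalized_posterior(0.1, data), log_unnormalized_posterior(3.0, data))
```

A μ near the data scores higher than one far away. The missing normalizing constant is the evidence P(x), which is exactly the part we cannot compute in general.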
What is convenient is that for this model we actually can compute the posterior analytically. That's because for a normal likelihood with known standard deviation, the normal prior for mu is conjugate (https://en.wikipedia.org/wiki/Conjugate_prior) (conjugate here means that our posterior will follow the same distribution as the prior), so we know that our posterior for μ is also normal. We can easily look up on Wikipedia how to compute the parameters of the posterior. For a mathematical derivation of this, see here (http://www.bcs.rochester.edu/people/robbie/jacobslab/cheat_sheet/bayes_normal_normal.pdf).
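Written out, the conjugate update that `calc_posterior_analytical` computes is the standard normal-normal result. Assuming a prior with mean μ₀ and standard deviation σ₀, a known likelihood standard deviation σ, and n observations x₁, …, x_n, the posterior of μ is normal with variance and mean

$$\sigma_{\text{post}}^2 = \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}, \qquad \mu_{\text{post}} = \sigma_{\text{post}}^2 \left(\frac{\mu_0}{\sigma_0^2} + \frac{1}{\sigma^2}\sum_{i=1}^{n} x_i\right)$$

In words, the posterior precision (inverse variance) is the prior precision plus n times the precision of a single observation, and the posterior mean is a precision-weighted average of the prior mean and the data. With μ₀ = 0, σ₀ = 1 and σ = 1 as in our model, this gives σ²_post = 1/(1 + n) and μ_post = Σxᵢ/(1 + n).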
In [4]:
```python
def calc_posterior_analytical(data, x, mu_0, sigma_0):
    sigma = 1.
    n = len(data)
    # Conjugate normal-normal update (known sigma); sigma_post is the posterior *variance*
    mu_post = (mu_0 / sigma_0**2 + data.sum() / sigma**2) / (1. / sigma_0**2 + n / sigma**2)
    sigma_post = (1. / sigma_0**2 + n / sigma**2)**-1
    return norm(mu_post, np.sqrt(sigma_post)).pdf(x)

ax = plt.subplot()
x = np.linspace(-1, 1, 500)
posterior_analytical = calc_posterior_analytical(data, x, 0., 1.)
ax.plot(x, posterior_analytical)
ax.set(xlabel='mu', ylabel='belief', title='Analytical posterior');
sns.despine()
```
This shows our quantity of interest, the probability of μ's values after having seen the data, taking our prior information into account. Let's assume, however, that our prior wasn't conjugate and we couldn't solve this by hand, which is usually the case.
Explaining MCMC sampling with code
Now on to the sampling logic. First, you pick a starting parameter position (it can be chosen randomly); let's fix it arbitrarily to:

```python
mu_current = 1.
```
Then, you propose to move (jump) from that position somewhere else (that's the Markov part). You can be very dumb or very sophisticated about how you come up with that proposal. The Metropolis sampler is very dumb and just takes a sample from a normal distribution (no relationship to the normal we assume for the model) centered around your current mu value (i.e. mu_current) with a certain standard deviation (proposal_width) that will determine how far you propose jumps (here we're using scipy.stats.norm):

```python
mu_proposal = norm(mu_current, proposal_width).rvs()
```
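One property of this proposal matters later: a normal centered on the current value is symmetric, meaning the density of proposing b when standing at a equals the density of proposing a when standing at b. That symmetry is what lets the plain Metropolis rule ignore the proposal distribution entirely (the more general Metropolis-Hastings rule multiplies in a correction factor when the proposal is not symmetric). A small check, with arbitrary positions chosen for illustration:

```python
from scipy.stats import norm

proposal_width = 0.5
a, b = 0.3, -0.4  # arbitrary current and proposed positions

# Density of proposing b from a, and of proposing a from b.
q_b_given_a = norm(a, proposal_width).pdf(b)
q_a_given_b = norm(b, proposal_width).pdf(a)
print(q_b_given_a, q_a_given_b)  # identical: only the distance |a - b| matters
```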
Next, you evaluate whether that's a good place to jump to or not. If the resulting normal distribution with that proposed mu explains the data better than your old mu, you'll definitely want to go there. What does "explains the data better" mean? We quantify fit by computing the probability of the data, given the likelihood (normal) with the proposed parameter values (proposed mu and a fixed sigma = 1). This can easily be computed by calculating the probability for each data point using scipy.stats.norm(mu, sigma).pdf(data) and then multiplying the individual probabilities, i.e. computing the likelihood (usually you would use log probabilities, but we omit this here). Multiplying the likelihood by the prior probability of each mu (mu_prior_mu and mu_prior_sd are the prior's mean and standard deviation, 0 and 1 in our model) gives the numerator of Bayes' formula at the current and at the proposed position:

```python
# Compute likelihood by multiplying probabilities of each data point
likelihood_current = norm(mu_current, 1).pdf(data).prod()
likelihood_proposal = norm(mu_proposal, 1).pdf(data).prod()

# Compute prior probability of current and proposed mu
prior_current = norm(mu_prior_mu, mu_prior_sd).pdf(mu_current)
prior_proposal = norm(mu_prior_mu, mu_prior_sd).pdf(mu_proposal)

# Numerator of Bayes' formula
p_current = likelihood_current * prior_current
p_proposal = likelihood_proposal * prior_proposal
```
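The reason you would usually use log probabilities is numerical underflow: a product of many densities smaller than one quickly drops below the smallest representable float. A quick demonstration, with a synthetic dataset whose size is chosen only to trigger the effect:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
big_data = rng.normal(size=2000)  # synthetic data, large enough to underflow

# Product of 2000 densities, each at most about 0.4: underflows to exactly 0.0.
likelihood = norm(0, 1).pdf(big_data).prod()
# Sum of log densities: an ordinary (large negative) finite number.
log_likelihood = norm(0, 1).logpdf(big_data).sum()
print(likelihood, log_likelihood)
```

Once both likelihoods are 0.0, their ratio is undefined and the sampler breaks. On the log scale the acceptance test becomes `np.log(np.random.rand()) < log_p_proposal - log_p_current`, which is equivalent to the ratio test but stays finite.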
Up until now, we essentially have a hill-climbing algorithm that would just propose movements in random directions and only accept a jump if mu_proposal has a higher (unnormalized posterior) probability than mu_current. Eventually we'll get to the value of mu with the highest posterior probability (close to mu = 0 here), from where no more moves will be possible. However, we want to get a posterior, so we'll also have to sometimes accept moves in the other direction. The key trick is that by dividing the two probabilities,

```python
p_accept = p_proposal / p_current
```

we get an acceptance probability. You can already see that if p_proposal is larger, that probability will be > 1 and we'll definitely accept. However, if p_current is larger, say twice as large, there'll be a 50% chance of moving there:

```python
accept = np.random.rand() < p_accept

if accept:
    # Update position
    mu_current = mu_proposal
```

This simple procedure gives us samples from the posterior.
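To see that `np.random.rand() < p_accept` really accepts with probability min(1, p_accept), we can simulate the rule many times for a few ratios. This is a sanity check added for illustration, not part of the original sampler:

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials = 200_000

acceptance_rates = {}
for p_accept in [0.5, 0.1, 2.0]:
    # Same rule as the sampler: draw u ~ Uniform(0, 1) and accept if u < p_accept.
    accepted = rng.random(n_trials) < p_accept
    acceptance_rates[p_accept] = accepted.mean()

print(acceptance_rates)  # each rate is close to min(1, p_accept)
```

A ratio of 0.5 is accepted about half the time, and any ratio above 1 is always accepted, so there is no need to clip p_accept explicitly.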
Why does this make sense?
Taking a step back, note that the above acceptance ratio is the reason this whole thing works out and we get around the integration. We can show this by computing the acceptance ratio over the normalized posterior and seeing how it's equivalent to the acceptance ratio of the unnormalized posterior (let's say θ₀ is our current position, and θ is our proposal):
$$\frac{\dfrac{P(x|\theta)\,P(\theta)}{P(x)}}{\dfrac{P(x|\theta_0)\,P(\theta_0)}{P(x)}} = \frac{P(x|\theta)\,P(\theta)}{P(x|\theta_0)\,P(\theta_0)}$$
In words, when we divide the posterior of the proposed parameter setting by the posterior of the current parameter setting, P(x), that nasty quantity we can't compute, cancels out. So you can intuit that we're actually dividing the full posterior at one position by the full posterior at another position (no magic here). That way, we visit regions of high posterior probability relatively more often than those of low posterior probability.
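The cancellation can also be checked numerically for this model, because here we happen to know the normalized posterior. The sketch below recomputes the conjugate posterior inline (so it runs on its own) for synthetic data and arbitrary positions θ₀ and θ, and compares the ratio of normalized posteriors with the ratio of prior times likelihood:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
data = rng.normal(size=20)   # synthetic data
theta_0, theta = 0.1, -0.3   # current and proposed positions (arbitrary)

def unnormalized(t):
    # P(x | theta) P(theta) with a Normal(0, 1) prior and a Normal(theta, 1) likelihood.
    return norm(t, 1).pdf(data).prod() * norm(0, 1).pdf(t)

# Normalized posterior from the conjugate formulas (prior mean 0, prior sd 1, sigma 1).
n = len(data)
post_var = 1.0 / (1.0 + n)
post_mean = post_var * data.sum()
normalized = norm(post_mean, np.sqrt(post_var)).pdf

ratio_unnormalized = unnormalized(theta) / unnormalized(theta_0)
ratio_normalized = normalized(theta) / normalized(theta_0)
print(ratio_unnormalized, ratio_normalized)  # equal up to floating-point error
```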
Putting it all together
In [5]:
```python
def sampler(data, samples=4, mu_init=.5, proposal_width=.5, plot=False, mu_prior_mu=0, mu_prior_sd=1.):
    mu_current = mu_init
    posterior = [mu_current]
    for i in range(samples):
        # suggest new position
        mu_proposal = norm(mu_current, proposal_width).rvs()

        # Compute likelihood by multiplying probabilities of each data point
        likelihood_current = norm(mu_current, 1).pdf(data).prod()
        likelihood_proposal = norm(mu_proposal, 1).pdf(data).prod()

        # Compute prior probability of current and proposed mu
        prior_current = norm(mu_prior_mu, mu_prior_sd).pdf(mu_current)
        prior_proposal = norm(mu_prior_mu, mu_prior_sd).pdf(mu_proposal)

        # Unnormalized posterior (likelihood * prior) at both positions
        p_current = likelihood_current * prior_current
        p_proposal = likelihood_proposal * prior_proposal

        # Accept proposal?
        p_accept = p_proposal / p_current
        accept = np.random.rand() < p_accept

        if plot:
            plot_proposal(mu_current, mu_proposal, mu_prior_mu, mu_prior_sd, data, accept, posterior, i)

        if accept:
            # Update position
            mu_current = mu_proposal

        posterior.append(mu_current)

    return posterior

# Function to display
def plot_proposal(mu_current, mu_proposal, mu_prior_mu, mu_prior_sd, data, accepted, trace, i):
    from copy import copy
    trace = copy(trace)
    fig, (ax1, ax2, ax3, ax4) = plt.subplots(ncols=4, figsize=(16, 4))
    fig.suptitle('Iteration %i' % (i + 1))
    x = np.linspace(-3, 3, 5000)
    color = 'g' if accepted else 'r'

    # Plot prior
    prior_current = norm(mu_prior_mu, mu_prior_sd).pdf(mu_current)
    prior_proposal = norm(mu_prior_mu, mu_prior_sd).pdf(mu_proposal)
    prior = norm(mu_prior_mu, mu_prior_sd).pdf(x)
    ax1.plot(x, prior)
    ax1.plot([mu_current] * 2, [0, prior_current], marker='o', color='b')
    ax1.plot([mu_proposal] * 2, [0, prior_proposal], marker='o', color=color)
    ax1.annotate("", xy=(mu_proposal, 0.2), xytext=(mu_current, 0.2),
                 arrowprops=dict(arrowstyle="->", lw=2.))
    ax1.set(ylabel='Probability Density',
            title='current: prior(mu=%.2f) = %.2f\nproposal: prior(mu=%.2f) = %.2f'
                  % (mu_current, prior_current, mu_proposal, prior_proposal))

    # Likelihood
    likelihood_current = norm(mu_current, 1).pdf(data).prod()
    likelihood_proposal = norm(mu_proposal, 1).pdf(data).prod()
    y = norm(loc=mu_proposal, scale=1).pdf(x)
    sns.distplot(data, kde=False, norm_hist=True, ax=ax2)
    ax2.plot(x, y, color=color)
    ax2.axvline(mu_current, color='b', linestyle='--', label='mu_current')
    ax2.axvline(mu_proposal, color=color, linestyle='--', label='mu_proposal')
    ax2.annotate("", xy=(mu_proposal, 0.2), xytext=(mu_current, 0.2),
                 arrowprops=dict(arrowstyle="->", lw=2.))
    ax2.set(title='likelihood(mu=%.2f) = %.2f\nlikelihood(mu=%.2f) = %.2f'
                  % (mu_current, 1e14 * likelihood_current, mu_proposal, 1e14 * likelihood_proposal))

    # Posterior
    posterior_analytical = calc_posterior_analytical(data, x, mu_prior_mu, mu_prior_sd)
    ax3.plot(x, posterior_analytical)
    posterior_current = calc_posterior_analytical(data, mu_current, mu_prior_mu, mu_prior_sd)
    posterior_proposal = calc_posterior_analytical(data, mu_proposal, mu_prior_mu, mu_prior_sd)
    ax3.plot([mu_current] * 2, [0, posterior_current], marker='o', color='b')
    ax3.plot([mu_proposal] * 2, [0, posterior_proposal], marker='o', color=color)
    ax3.annotate("", xy=(mu_proposal, 0.2), xytext=(mu_current, 0.2),
                 arrowprops=dict(arrowstyle="->", lw=2.))
    ax3.set(title='posterior(mu=%.2f) = %.5f\nposterior(mu=%.2f) = %.5f'
                  % (mu_current, posterior_current, mu_proposal, posterior_proposal))

    # Trace: append the proposal if accepted, otherwise repeat the current value
    if accepted:
        trace.append(mu_proposal)
    else:
        trace.append(mu_current)
    ax4.plot(trace)
    ax4.set(xlabel='iteration', ylabel='mu', title='trace')
    plt.tight_layout()
```
Visualizing MCMC
To visualize the sampling, we'll create plots for some quantities that are computed. Each row below is a single iteration through our Metropolis sampler.
The first column is our prior distribution, i.e. what our belief about μ is before seeing the data. You can see how the distribution is static and we only plug in our μ proposals. The vertical lines represent our current μ in blue and our proposed μ in either red or green (rejected or accepted, respectively).
The 2nd column is our likelihood, which is what we use to evaluate how well our model explains the data. You can see that the likelihood function changes in response to the proposed μ. The blue histogram is our data. The solid line in green or red is the normal distribution of the data under the currently proposed mu. Intuitively, the more overlap there is between likelihood and data, the better the model explains the data and the higher the resulting probability will be. The dashed line of the same color is the proposed mu and the dashed blue line is the current mu.
The 3rd column is our posterior distribution. Here I am displaying the normalized posterior, but as we found out above, we can just multiply the prior value for the current and proposed μ by the likelihood value for the two μ to get the unnormalized posterior values (which we use for the actual computation), and divide one by the other to get our acceptance probability.
The 4th column is our trace (i.e. the posterior samples of μ we're generating), where we store each sample irrespective of whether it was accepted or rejected (in which case the line just stays constant).
Note that we always move to relatively more likely μ values (in terms of their posterior density), but only sometimes to relatively less likely μ values, as can be seen in some of the iterations below (the iteration number can be found at the top center of each row).
In [6]:
```python
np.random.seed(123)
sampler(data, samples=8, mu_init=1., plot=True);
```
Now the magic of MCMC is that you just have to do that for a long time, and the samples that are generated in this way come from the posterior distribution of your model. There is a rigorous mathematical proof that guarantees this, which I won't go into here.
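The proof is beyond this post, but its central claim, that the target is the equilibrium (stationary) distribution of the chain, can be checked on a toy problem. The sketch below assumes a small discrete state space with made-up unnormalized weights and a symmetric proposal that moves one step left or right (wrapping around at the ends so the proposal stays symmetric). It builds the Metropolis transition matrix and verifies that one step of the chain leaves the normalized target unchanged, and that the chain converges to it from an arbitrary start.

```python
import numpy as np

# Made-up unnormalized target over 5 states; the sampler only ever uses ratios.
weights = np.array([1.0, 3.0, 6.0, 3.0, 1.0])
target = weights / weights.sum()
n_states = len(weights)

# Metropolis transition matrix: propose the left or right neighbour (wrapping around)
# with probability 1/2 each, and accept with probability min(1, w[j] / w[i]).
P = np.zeros((n_states, n_states))
for i in range(n_states):
    for j in ((i - 1) % n_states, (i + 1) % n_states):
        P[i, j] += 0.5 * min(1.0, weights[j] / weights[i])
    # Rejected proposals keep the chain where it is.
    P[i, i] = 1.0 - P[i].sum()

# Stationarity: one step of the chain leaves the target distribution unchanged.
print(np.allclose(target @ P, target))

# Convergence: starting in state 0 with certainty, repeated steps approach the target.
dist = np.eye(n_states)[0]
for _ in range(2000):
    dist = dist @ P
print(np.round(dist, 4), np.round(target, 4))
```

Note that the normalizing constant `weights.sum()` is only used to check the result; the transition matrix itself is built from weight ratios alone, just like the sampler above.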
To get a sense of what this produces, let's draw a lot of samples and plot them.

In [7]:
```python
posterior = sampler(data, samples=15000, mu_init=1.)
fig, ax = plt.subplots()
ax.plot(posterior)
_ = ax.set(xlabel='sample', ylabel='mu');
```
This is usually called the trace. To now get an approximation of the posterior (the reason why we're doing all this), we simply take the histogram of this trace. It's important to keep in mind that although this looks similar to the data we sampled above to fit the model, the two are completely separate. The plot below represents our belief in mu. In this case it just happens to also be normal, but for a different model it could have a completely different shape than the likelihood or prior.
In [8]:
```python
ax = plt.subplot()

sns.distplot(posterior[500:], ax=ax, label='estimated posterior')
x = np.linspace(-.5, .5, 500)
post = calc_posterior_analytical(data, x, 0, 1)
ax.plot(x, post, 'g', label='analytic posterior')
_ = ax.set(xlabel='mu', ylabel='belief');
ax.legend();
```
As you can see, by following the above procedure, we get samples from the same distribution as the one we derived analytically.
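The agreement can also be checked with numbers instead of plots. The sketch below is a log-scale variant of the sampler, assuming the same model (Normal(0, 1) prior, Normal(μ, 1) likelihood) and synthetic data; the name `metropolis_log` is hypothetical. It compares the sample mean and standard deviation with the conjugate formulas.

```python
import numpy as np
from scipy.stats import norm

def metropolis_log(data, samples=20_000, mu_init=1.0, proposal_width=0.5, seed=0):
    """Metropolis sampler for mu that works with log densities to avoid underflow."""
    rng = np.random.default_rng(seed)

    def log_p(mu):
        # Unnormalized log posterior: log prior + log likelihood.
        return norm.logpdf(mu, 0, 1) + norm.logpdf(data, mu, 1).sum()

    mu_current, log_p_current = mu_init, log_p(mu_init)
    trace = np.empty(samples)
    for i in range(samples):
        mu_proposal = rng.normal(mu_current, proposal_width)
        log_p_proposal = log_p(mu_proposal)
        # Accept with probability min(1, p_proposal / p_current), computed on the log scale.
        if np.log(rng.random()) < log_p_proposal - log_p_current:
            mu_current, log_p_current = mu_proposal, log_p_proposal
        trace[i] = mu_current
    return trace

data = np.random.default_rng(123).normal(size=20)  # synthetic data
trace = metropolis_log(data)[500:]  # discard the first 500 samples as burn-in

# Conjugate posterior for prior mean 0, prior sd 1 and sigma 1.
post_var = 1.0 / (1.0 + len(data))
post_mean = post_var * data.sum()
print(trace.mean(), post_mean)
print(trace.std(), np.sqrt(post_var))
```

Caching `log_p_current` means each iteration evaluates the model only once, at the proposal; the sampler above recomputes the current value every time, which gives the same result but does twice the work.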
Proposal width
Above we set the proposal width to 0.5. That turned out to be a pretty good value. In general you don't want the width to be too narrow, because your sampling will be inefficient: it takes a long time to explore the whole parameter space and shows the typical random-walk behavior:
In [9]:
```python
posterior_small = sampler(data, samples=5000, mu_init=1., proposal_width=.01)
fig, ax = plt.subplots()
ax.plot(posterior_small);
_ = ax.set(xlabel='sample', ylabel='mu');
```

But you also don't want it to be so large that you never accept a jump:
In [10]:
```python
posterior_large = sampler(data, samples=5000, mu_init=1., proposal_width=3.)
fig, ax = plt.subplots()
ax.plot(posterior_large);
_ = ax.set(xlabel='sample', ylabel='mu');
```
Note, however, that we are still sampling from our target posterior distribution here, as guaranteed by the mathematical proof, just less efficiently:

In [11]:
```python
sns.distplot(posterior_small[1000:], label='Small step size')
sns.distplot(posterior_large[1000:], label='Large step size');
_ = plt.legend();
```
With more samples this will eventually look like the true posterior. The key is that we want our samples to be independent of each other, which clearly isn't the case here. Thus, one common metric to evaluate the efficiency of our sampler is the autocorrelation, i.e. how correlated sample i is with sample i−1, i−2, etc.:
In [12]:
```python
from pymc3.stats import autocorr

lags = np.arange(1, 100)
fig, ax = plt.subplots()
ax.plot(lags, [autocorr(posterior_large, l) for l in lags], label='large step size')
ax.plot(lags, [autocorr(posterior_small, l) for l in lags], label='small step size')
ax.plot(lags, [autocorr(posterior, l) for l in lags], label='medium step size')
ax.legend(loc=0)
_ = ax.set(xlabel='lag', ylabel='autocorrelation', ylim=(-.1, 1))
```
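`pymc3.stats.autocorr` comes from an older PyMC3 release, so it may not be importable in your installation. A plain NumPy version of the sample autocorrelation at a given lag is easy to write; the function below is a sketch of the standard estimator (normalizing by the overall variance), not a copy of the PyMC3 implementation:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of the 1-D series x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    # Covariance between the series and its lagged copy, divided by the variance.
    return np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered)

# Usage: white noise is uncorrelated; an AR(1) chain with coefficient 0.9 is not.
rng = np.random.default_rng(4)
noise = rng.normal(size=50_000)
ar1 = np.empty_like(noise)
ar1[0] = noise[0]
for t in range(1, len(noise)):
    ar1[t] = 0.9 * ar1[t - 1] + noise[t]
print(autocorr(noise, 1), autocorr(ar1, 1))  # roughly 0.0 and 0.9
```

A sampler with a tiny step size behaves much like the AR(1) chain: each sample is close to the previous one, so the autocorrelation decays slowly with the lag.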
Obviously we want to have a smart way of figuring out the right step width automatically. One common method is to keep adjusting the proposal width so that roughly 50% of proposals are rejected.
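One way to implement that idea is to tune the width during an initial burn-in phase and then keep it fixed. The sketch below is illustrative only: the multiplicative update rule, the 100-step tuning interval, the starting width and the function name `tune_proposal_width` are hypothetical choices, not PyMC3's actual tuning schedule. It assumes the same Normal(0, 1) prior and Normal(μ, 1) likelihood as above.

```python
import numpy as np
from scipy.stats import norm

def tune_proposal_width(data, n_tune=2000, interval=100, target=0.5, width=3.0, seed=0):
    """Adjust the proposal width during burn-in until about `target` of proposals are accepted."""
    rng = np.random.default_rng(seed)

    def log_p(mu):
        # Unnormalized log posterior: Normal(0, 1) prior, Normal(mu, 1) likelihood.
        return norm.logpdf(mu, 0, 1) + norm.logpdf(data, mu, 1).sum()

    mu, log_p_mu = 0.0, log_p(0.0)
    n_accepted = 0
    for i in range(1, n_tune + 1):
        proposal = rng.normal(mu, width)
        log_p_prop = log_p(proposal)
        if np.log(rng.random()) < log_p_prop - log_p_mu:
            mu, log_p_mu = proposal, log_p_prop
            n_accepted += 1
        if i % interval == 0:
            rate = n_accepted / interval
            # Accepting too often: take bigger steps; too rarely: take smaller steps.
            width *= np.exp(rate - target)
            n_accepted = 0
    return width

data = np.random.default_rng(123).normal(size=20)  # synthetic data
tuned_width = tune_proposal_width(data)
print(tuned_width)
```

The tuned width would then be passed as `proposal_width` to `sampler` for the actual run, with the tuning draws discarded. Freezing the width after burn-in matters: a width that keeps changing based on the chain's own history can break the guarantee that the samples come from the posterior.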
Extending to more complex models
Now you can easily imagine that we could also add a sigma parameter for the standard deviation and follow the same procedure for this second parameter. In that case, we would be generating proposals for mu and sigma, but the algorithm logic would be nearly identical. Or, we could have data from a very different distribution like a Binomial and still use the same algorithm and get the correct posterior. That's pretty cool and a huge benefit of probabilistic programming: just define the model you want and let MCMC take care of the inference.
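To illustrate the two-parameter case, the sketch below changes only what the state is and how the log posterior is computed; the accept/reject logic stays the same. It assumes a Normal(0, 1) prior on μ and, as a hypothetical choice, a half-normal prior with scale 1 on σ; it proposes both parameters jointly and rejects any proposal with σ ≤ 0, where the posterior density is zero. The name `sampler_2d` is hypothetical.

```python
import numpy as np
from scipy.stats import halfnorm, norm

def log_p(mu, sigma, data):
    """Unnormalized log posterior for mu ~ Normal(0, 1), sigma ~ HalfNormal(1), x ~ Normal(mu, sigma)."""
    if sigma <= 0:
        return -np.inf  # outside the support of the prior
    return (norm.logpdf(mu, 0, 1) + halfnorm.logpdf(sigma, scale=1)
            + norm.logpdf(data, mu, sigma).sum())

def sampler_2d(data, samples=10_000, init=(0.0, 1.0), proposal_width=0.2, seed=0):
    rng = np.random.default_rng(seed)
    current = np.array(init)
    log_p_current = log_p(*current, data)
    trace = np.empty((samples, 2))
    for i in range(samples):
        # Propose a move in both parameters at once.
        proposal = current + rng.normal(0, proposal_width, size=2)
        log_p_proposal = log_p(*proposal, data)
        # Same Metropolis rule as before; a -inf proposal is never accepted.
        if np.log(rng.random()) < log_p_proposal - log_p_current:
            current, log_p_current = proposal, log_p_proposal
        trace[i] = current
    return trace

data = np.random.default_rng(5).normal(loc=0.5, scale=2.0, size=200)  # synthetic data
trace = sampler_2d(data)[1000:]  # discard burn-in
print(trace.mean(axis=0))  # posterior means of mu and sigma
```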
For example, the model below can be written in PyMC3 quite easily. Here we also use the Metropolis sampler (which automatically tunes the proposal width) and see that we get essentially the same posterior. Feel free to play around with this and change the distributions. For more information, as well as more complex examples, see the PyMC3 documentation (http://pymc-devs.github.io/pymc3/getting_started/).
In [13]:
```python
import pymc3 as pm

with pm.Model():
    mu = pm.Normal('mu', 0, 1)
    sigma = 1.
    returns = pm.Normal('returns', mu=mu, sd=sigma, observed=data)

    step = pm.Metropolis()
    trace = pm.sample(15000, step)

sns.distplot(trace[2000:]['mu'], label='PyMC3 sampler');
sns.distplot(posterior[500:], label='Hand-written sampler');
plt.legend();
```
Conclusions
We glossed over a lot of detail which is certainly important, but there are many other posts that deal with that. Here, we really wanted to communicate the idea of MCMC and the Metropolis sampler. Hopefully you will have gathered some intuition which will equip you to read one of the more technical introductions to this topic.
Other, fancier MCMC algorithms like Hamiltonian Monte Carlo actually work very similarly to this; they are just much more clever in proposing where to jump next.
This blog post was written in a Jupyter Notebook; you can find the underlying notebook with all its code here (https://github.com/twiecki/WhileMyMCMCGentlySamples/blob/master/content/downloads/notebooks/MCMCsamplingfordummies.ipynb).
Posted by Thomas Wiecki, Nov 10, 2015, bayesian statistics (http://twiecki.github.com/tag/bayesianstatistics.html)
Comments
Josh L Spinoza, a month ago
This was extremely helpful. Thank you!

Charles, 3 months ago
Very insightful. Great post, thanks!

Rob Hicks, 4 months ago
This is a fantastic post and has helped my students in visualizing the sampling mechanism. I've adapted it for one of my lectures and wanted you to know I had to modify the visualization, since the maximum of the likelihood function in the second column of graphs in [6] is not constant. Since p(y|mu) is only maximized at one value of mu, the code (where you are plotting the likelihood function, not the value of the likelihood at the proposal) should be something like this:
`y = np.array([norm(loc=i, scale=1).pdf(data).prod() for i in x])`
rather than
`y = norm(loc=mu_proposal, scale=1).pdf(x)`
in the plot_proposal function.

Thomas Wiecki (Mod) > Rob Hicks, 3 months ago
Thanks, I'm glad it's useful.
What I'm doing there is just plotting the likelihood function that will be used to evaluate the data, not the actual likelihood at that point. But I can see how that's a bit misleading.

Josh L Spinoza > Thomas Wiecki, a month ago
Hi Thomas, I tried commenting on one of the posts below but it has been closed. I'm really interested in Spike and Slab priors, as you mentioned below. Is there any way to implement this in PyMC2/3? I read a paper that implements a Bernoulli for the Spike and a Gaussian for the Slab. Can that be done using PyMC2/3?

Thomas Wiecki
Hi Josh, yes, it can certainly be done. There's no example, but it shouldn't be too hard for you to code it up. If you do, please consider contributing it to pymc3 as an example.

Josh L Spinoza > Thomas Wiecki, a month ago
Thanks for the response! Is there a tutorial on adding new priors? If not, how does the function need to be set up for use in PyMC3? I'm essentially trying to do this function in Python and PyMC3 seems to be the best way! http://rpackages.ianhowson.com... I'm new at PyMC3 but I understand Bayesian inference. I know Python really well, so I wanted to give it a shot before I give up and use the R package.

Thomas Wiecki
Here is a description of how to add arbitrary distributions: http://pymcdevs.github.io/pym...

warrenon, 6 months ago
Hey Thomas, great intro! Could you provide some links to the more technical sources you mention in the conclusion? Thanks,
Warren

Thomas Wiecki
I liked this talk by Ian Murray:
Jun, 6 months ago
Excellent post, this is the most intuitive explanation of MCMC I have ever read. I really like how you use code instead of math to explain the algorithm.

Kevin, 7 months ago
Excellent article! Just one quick question, Thomas: looking at the plot where you drew 15k samples of mu from the posterior, why is it random in that range over multiple samples? Shouldn't the sample converge to a single value over samples rather than be consistent within that range? For example, in your multi-chart image, you can see the trace converge toward mu = 0 even with a sample size of about 8! Shouldn't any large guesses after that NOT be accepted, and thus every sample after that point be about 0? In code terms, I'm referring to the sampler function.

Thomas Wiecki
The key is this line: `accept = np.random.rand() < p_accept`
So yes, it's not guaranteed, but with some probability the sampler will jump to a value that has less (unnormalized) posterior probability. But certainly, very large guesses will be accepted only very rarely (that's why the sampler with the large step width accepts jumps so rarely).
It's thus not "random" in the range; the trace plot only makes it look that way. Values closer to 0 are visited more often (you can see that in the histogram).

Kevin > Thomas Wiecki, 7 months ago
Thanks for the quick reply! That explains it well, and that's what I was originally thinking as well. Another question I've had (correct me if I'm wrong) concerns your statement that MCMC is able to approximate distributions well that may not have a trivial posterior. However, in our example the normal distribution was used, which made the posterior calculation trivial. In many cases, we might not know exactly which distribution the data is sampled from. Would MCMC still work when we don't assume that x|mu comes from a normal distribution, or do we ALWAYS have to specify the underlying distribution in the likelihood?

Thomas Wiecki
Yes, MCMC will work no matter what likelihood or priors you use. But you do have to specify them somehow.
antiquechrono, 7 months ago
I have a couple of questions that are bugging me, if you don't mind answering.
When you say you are calculating the probability of each data point, these aren't really probability values but probability density values with units attached. How are you able to use the density (which doesn't sum to 1) in place of a probability value? Is it because when you divide the current and proposed density the units cancel, creating a real probability value?
Second, when most people are introduced to Bayes' Rule it's usually with a cancer example where you plug in probability values and calculate the new "posterior" probability. How are you able to go from plugging probabilities into Bayes' Rule to plugging in a likelihood and a distribution?

Thomas Wiecki
1. Good question. It's probably easiest to think about the probability mass of an infinitesimally small region around the likelihood value. See https://en.wikipedia.org/wiki/... for more info. But the normalization is a separate thing and is also present in the discrete case.
2. Every model works with a likelihood function/distribution. In the cancer example you're using a Bernoulli distribution (https://en.wikipedia.org/wiki/... as the likelihood; they just don't tell you that's what it's called so that it's less scary :). Bayes' formula stays the same in all cases though.

antiquechrono > Thomas Wiecki, 7 months ago
Thank you for taking the time to reply.
1. Are you referring to the Bayes normalization here? I was asking more along the lines of when you calculate p_accept = p_proposal / p_current. For the sake of argument, say that mu in the example from the post is measured in grams.
When you calculate something like p_proposal, it's not a normal probability value that ranges from 0 to 1. It's a density, which is basically the change in probability per unit, and not a vanilla probability. When you calculate p_accept you get (p_proposal g^-1) / (p_current g^-1), so when you do the division you are left with a unitless quantity. Is that now a real probability value?
2. Well, if you look at Bayes' Rule in your post, you call the prior p(theta). I'm having a hard time understanding how that goes from being a vanilla probability like 0.5 to being an entire distribution like Normal(0, 1).

Charles > antiquechrono, 3 months ago
I don't think of p_accept as being a formally defined probability. I think of it more as just a simple comparison that tells whether the new value (mu_proposal) is better than the current one. In regards to your second question: when you tack any value onto your mu_current list, you are essentially generating data from which to make a histogram. You can then use a density estimation function to approximate the density over the range of values you think your parameter will have. It has been a while since your question, so it may not be a question anymore. I hope this helps if it still is.

aloctavodia, 7 months ago
Very nice post! I have been teaching structural bioinformatics (a branch of science that also uses MCMC methods) to biologists. The previous course was highly conceptual and not as hands-on as I would like. I have completely changed the course: next year I will be teaching them how to code, and I have been preparing an MCMC chapter in line with your post. I think I can use the visualization part you have created :) I have also been thinking about replicating a visualization like this one http://blog.revolutionanalytic...
What about plotting the first 5 movements and then every 100 or so steps, to see the convergence of the chain? I think I will also explore IPython/Jupyter widgets (I have seen those, but I have never done anything "real" with them).
Well, thanks for sharing!

Thomas Wiecki
Thanks!
Good points, definitely feel free to use these graphs and/or code and resubmit any improvements you make.
Copyright 2013 Thomas Wiecki. Powered by Pelican (http://getpelican.com)