
MCMC sampling for dummies
Nov 10, 2015
When I give talks about probabilistic programming and Bayesian statistics, I usually gloss over the details of how inference is actually performed, treating it essentially as a black box. The beauty of probabilistic programming is that you actually don't have to understand how the inference works in order to build models, but it certainly helps.
When I presented a new Bayesian model to Quantopian's (https://quantopian.com) CEO, Fawce (https://quantopian.com/about), who wasn't trained in Bayesian stats but is eager to understand it, he started to ask about the part I usually gloss over: "Thomas, how does the inference actually work? How do we get these magical samples from the posterior?".

Now I could have said: "Well that's easy, MCMC generates samples from the posterior distribution by constructing a reversible Markov chain that has as its equilibrium distribution the target posterior distribution. Questions?".
That statement is correct, but is it useful? My pet peeve with how math and stats are taught is that no one ever tells you about the intuition behind the concepts (which is usually quite simple) but only hands you some scary math. This is certainly the way I was taught, and I had to spend countless hours banging my head against the wall until that eureka moment came about. Usually things weren't as scary or seemingly complex once I deciphered what they meant.
This blog post is an attempt at explaining the intuition behind MCMC sampling (specifically, the Metropolis algorithm (https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm)). Critically, we'll be using code examples rather than formulas or math-speak. Eventually you'll need that, but I personally think it's better to start with an example and build the intuition before you move on to the math.

The problem and its unintuitive solution
Let's take a look at Bayes formula (https://en.wikipedia.org/wiki/Bayes%27_theorem):

$$P(\theta|x) = \frac{P(x|\theta)\,P(\theta)}{P(x)}$$

We have $P(\theta|x)$, the probability of our model parameters $\theta$ given the data $x$, and thus our quantity of interest. To compute this we multiply the prior $P(\theta)$ (what we think about $\theta$ before we have seen any data) and the likelihood $P(x|\theta)$, i.e. how we think our data is distributed. This numerator is pretty easy to solve for.
However, let's take a closer look at the denominator, $P(x)$, which is also called the evidence (i.e. the evidence that the data $x$ was generated by this model). We can compute this quantity by integrating over all possible parameter values:

$$P(x) = \int P(x, \theta)\, d\theta$$

This is the key difficulty with Bayes formula: while the formula looks innocent enough, for even slightly non-trivial models you just can't compute the posterior in a closed-form way.
Now we might say: "OK, if we can't solve something, could we try to approximate it? For example, if we could somehow draw samples from that posterior we could Monte Carlo approximate (https://en.wikipedia.org/wiki/Monte_Carlo_method) it." Unfortunately, to directly sample from that distribution you not only have to solve Bayes formula, but also invert it, so that's even harder.
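
To make "Monte Carlo approximate" concrete: once you have samples from a distribution, any quantity of interest becomes a simple average over those samples. A minimal sketch (the posterior samples here are purely hypothetical, faked with a normal draw as a stand-in, since we don't yet know how to obtain real ones):

import numpy as np

np.random.seed(0)
# Hypothetical stand-in for posterior samples we don't yet know how to obtain
posterior_samples = np.random.normal(loc=0.2, scale=0.1, size=10000)

# Monte Carlo estimates are just averages over the samples
posterior_mean = posterior_samples.mean()          # E[mu | data]
prob_mu_positive = (posterior_samples > 0).mean()  # P(mu > 0 | data)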

Then we might say: "Well, instead let's construct a Markov chain that has an equilibrium distribution which matches our posterior distribution". I'm just kidding, most people wouldn't say that, as it sounds batshit crazy. If you can't compute it and can't sample from it, then constructing a Markov chain with all these properties must surely be even harder.

The surprising insight, though, is that this is actually very easy, and there exists a general class of algorithms that do this called Markov chain Monte Carlo (https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo) (constructing a Markov chain to do Monte Carlo approximation).

Setting up the problem
First, let's import our modules.
In [1]: %matplotlib inline

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm

sns.set_style('white')
sns.set_context('talk')
np.random.seed(123)
Let's generate some data: 20 points from a normal centered around zero. Our goal will be to estimate the posterior of the mean mu (we'll assume that we know the standard deviation to be 1).
In [2]: data = np.random.randn(20)
In [3]: ax = plt.subplot()
sns.distplot(data, kde=False, ax=ax)
_ = ax.set(title='Histogram of observed data', xlabel='x', ylabel='# observations');

Next, we have to define our model. In this simple case, we will assume that this data is normally distributed, i.e. the likelihood of the model is normal. As you know, a normal distribution has two parameters: mean $\mu$ and standard deviation $\sigma$. For simplicity, we'll assume we know that $\sigma = 1$ and we'll want to infer the posterior for $\mu$. For each parameter we want to infer, we have to choose a prior. For simplicity, let's also assume a Normal distribution as a prior for $\mu$. Thus, in stats speak, our model is:

$$\mu \sim \text{Normal}(0, 1)$$
$$x|\mu \sim \text{Normal}(x; \mu, 1)$$

What is convenient is that for this model we actually can compute the posterior analytically. That's because for a normal likelihood with known standard deviation, the normal prior for $\mu$ is conjugate (https://en.wikipedia.org/wiki/Conjugate_prior) (conjugate here means that our posterior will follow the same distribution as the prior), so we know that our posterior for $\mu$ is also normal. We can easily look up on Wikipedia how to compute the parameters of the posterior. For a mathematical derivation, see here (http://www.bcs.rochester.edu/people/robbie/jacobslab/cheat_sheet/bayes_normal_normal.pdf).
In [4]: def calc_posterior_analytical(data, x, mu_0, sigma_0):
    sigma = 1.
    n = len(data)
    mu_post = (mu_0 / sigma_0**2 + data.sum() / sigma**2) / (1. / sigma_0**2 + n / sigma**2)
    sigma_post = (1. / sigma_0**2 + n / sigma**2)**-1
    return norm(mu_post, np.sqrt(sigma_post)).pdf(x)

ax = plt.subplot()
x = np.linspace(-1, 1, 500)
posterior_analytical = calc_posterior_analytical(data, x, 0., 1.)
ax.plot(x, posterior_analytical)
ax.set(xlabel='mu', ylabel='belief', title='Analytical posterior');
sns.despine()

This shows our quantity of interest: the probability of $\mu$'s values after having seen the data, taking our prior information into account. Let's assume, however, that our prior wasn't conjugate and we couldn't solve this by hand, which is usually the case.

Explaining MCMC sampling with code
Now on to the sampling logic. At first, you find a starting parameter position (which can be randomly chosen); let's fix it arbitrarily to:

mu_current = 1.

Then, you propose to move (jump) from that position somewhere else (that's the Markov part). You can be very dumb or very sophisticated about how you come up with that proposal. The Metropolis sampler is very dumb and just takes a sample from a normal distribution (no relationship to the normal we assume for the model) centered around your current mu value (i.e. mu_current) with a certain standard deviation (proposal_width) that will determine how far you propose jumps (here we're using scipy.stats.norm):

proposal = norm(mu_current, proposal_width).rvs()

Next, you evaluate whether that's a good place to jump to or not. If the resulting normal distribution with that proposed mu explains the data better than your old mu, you'll definitely want to go there. What does "explains the data better" mean? We quantify fit by computing the probability of the data, given the likelihood (normal) with the proposed parameter values (proposed mu and a fixed sigma = 1). This can easily be computed by calculating the probability for each data point using scipy.stats.norm(mu, sigma).pdf(data) and then multiplying the individual probabilities, i.e. computing the likelihood (usually you would use log probabilities, but we omit this here):

likelihood_current = norm(mu_current, 1).pdf(data).prod()
likelihood_proposal = norm(mu_proposal, 1).pdf(data).prod()

# Compute prior probability of current and proposed mu
prior_current = norm(mu_prior_mu, mu_prior_sd).pdf(mu_current)
prior_proposal = norm(mu_prior_mu, mu_prior_sd).pdf(mu_proposal)

# Numerator of Bayes formula
p_current = likelihood_current * prior_current
p_proposal = likelihood_proposal * prior_proposal
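
As an aside on the log-probability point above: multiplying many small densities quickly underflows to zero, so in practice you would do the same computation on the log scale. A minimal sketch of the equivalent log-scale version (using the same hypothetical variable names as in the snippet above):

# Same quantities as above, but on the log scale to avoid numerical underflow
log_lik_current = norm(mu_current, 1).logpdf(data).sum()
log_lik_proposal = norm(mu_proposal, 1).logpdf(data).sum()

log_prior_current = norm(mu_prior_mu, mu_prior_sd).logpdf(mu_current)
log_prior_proposal = norm(mu_prior_mu, mu_prior_sd).logpdf(mu_proposal)

# Log of the unnormalized posterior (the numerator of Bayes formula)
log_p_current = log_lik_current + log_prior_current
log_p_proposal = log_lik_proposal + log_prior_proposal

# The acceptance ratio introduced below then becomes a difference of logs
p_accept = np.exp(log_p_proposal - log_p_current)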

Up until now, we essentially have a hill-climbing algorithm that would just propose movements into random directions and only accept a jump if mu_proposal has higher likelihood than mu_current. Eventually we'll get to mu = 0 (or close to it), from where no more moves will be possible. However, we want to get a posterior, so we'll also have to sometimes accept moves in the other direction. The key trick is dividing the two probabilities:

p_accept = p_proposal / p_current

we get an acceptance probability. You can already see that if p_proposal is larger, that probability will be > 1 and we'll definitely accept. However, if p_current is larger, say twice as large, there will be a 50% chance of moving there:

accept = np.random.rand() < p_accept

if accept:
    # Update position
    cur_pos = proposal
This simple procedure gives us samples from the posterior.

Why does this make sense?
Taking a step back, note that the above acceptance ratio is the reason this whole thing works out and we get around the integration. We can show this by computing the acceptance ratio over the normalized posterior and seeing how it's equivalent to the acceptance ratio of the unnormalized posterior (let's say $\theta_0$ is our current position, and $\theta$ is our proposal):

$$\frac{\dfrac{P(x|\theta)\,P(\theta)}{P(x)}}{\dfrac{P(x|\theta_0)\,P(\theta_0)}{P(x)}} = \frac{P(x|\theta)\,P(\theta)}{P(x|\theta_0)\,P(\theta_0)}$$
In words: when dividing the posterior of the proposed parameter setting by the posterior of the current parameter setting, $P(x)$, that nasty quantity we can't compute, gets canceled out. So you can intuit that we're actually dividing the full posterior at one position by the full posterior at another position (no magic here). That way, we are visiting regions of high posterior probability relatively more often than those of low posterior probability.
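
A quick numerical sanity check of this cancellation, reusing calc_posterior_analytical from above and two arbitrary (hypothetical) positions, might look like this; the two ratios should agree up to floating point error:

mu_a, mu_b = 0.2, -0.1  # two arbitrary positions

def unnormalized_posterior(mu):
    # likelihood * prior, i.e. the numerator of Bayes formula
    return norm(mu, 1).pdf(data).prod() * norm(0., 1.).pdf(mu)

ratio_unnormalized = unnormalized_posterior(mu_a) / unnormalized_posterior(mu_b)
ratio_normalized = (calc_posterior_analytical(data, mu_a, 0., 1.) /
                    calc_posterior_analytical(data, mu_b, 0., 1.))
print(ratio_unnormalized, ratio_normalized)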

Putting it all together
In [5]: def sampler(data, samples=4, mu_init=.5, proposal_width=.5, plot=False, mu_prior_mu=0, mu_prior_sd=1.):
    mu_current = mu_init
    posterior = [mu_current]
    for i in range(samples):
        # suggest new position
        mu_proposal = norm(mu_current, proposal_width).rvs()

        # Compute likelihood by multiplying probabilities of each data point
        likelihood_current = norm(mu_current, 1).pdf(data).prod()
        likelihood_proposal = norm(mu_proposal, 1).pdf(data).prod()

        # Compute prior probability of current and proposed mu
        prior_current = norm(mu_prior_mu, mu_prior_sd).pdf(mu_current)
        prior_proposal = norm(mu_prior_mu, mu_prior_sd).pdf(mu_proposal)

        # Numerator of Bayes formula: likelihood * prior
        p_current = likelihood_current * prior_current
        p_proposal = likelihood_proposal * prior_proposal

        # Accept proposal?
        p_accept = p_proposal / p_current
        accept = np.random.rand() < p_accept

        if plot:
            plot_proposal(mu_current, mu_proposal, mu_prior_mu, mu_prior_sd, data, accept, posterior, i)

        if accept:
            # Update position
            mu_current = mu_proposal

        posterior.append(mu_current)

    return posterior
# Function to display
def plot_proposal(mu_current, mu_proposal, mu_prior_mu, mu_prior_sd, data, accepted, trace, i):
    from copy import copy
    trace = copy(trace)
    fig, (ax1, ax2, ax3, ax4) = plt.subplots(ncols=4, figsize=(16, 4))
    fig.suptitle('Iteration %i' % (i + 1))
    x = np.linspace(-3, 3, 5000)
    color = 'g' if accepted else 'r'

    # Plot prior
    prior_current = norm(mu_prior_mu, mu_prior_sd).pdf(mu_current)
    prior_proposal = norm(mu_prior_mu, mu_prior_sd).pdf(mu_proposal)
    prior = norm(mu_prior_mu, mu_prior_sd).pdf(x)
    ax1.plot(x, prior)
    ax1.plot([mu_current] * 2, [0, prior_current], marker='o', color='b')
    ax1.plot([mu_proposal] * 2, [0, prior_proposal], marker='o', color=color)
    ax1.annotate("", xy=(mu_proposal, 0.2), xytext=(mu_current, 0.2),
                 arrowprops=dict(arrowstyle="->", lw=2.))
    ax1.set(ylabel='Probability Density',
            title='current: prior(mu=%.2f) = %.2f\nproposal: prior(mu=%.2f) = %.2f' % (mu_current, prior_current, mu_proposal, prior_proposal))

    # Likelihood
    likelihood_current = norm(mu_current, 1).pdf(data).prod()
    likelihood_proposal = norm(mu_proposal, 1).pdf(data).prod()
    y = norm(loc=mu_proposal, scale=1).pdf(x)
    sns.distplot(data, kde=False, norm_hist=True, ax=ax2)
    ax2.plot(x, y, color=color)
    ax2.axvline(mu_current, color='b', linestyle='--', label='mu_current')
    ax2.axvline(mu_proposal, color=color, linestyle='--', label='mu_proposal')
    ax2.annotate("", xy=(mu_proposal, 0.2), xytext=(mu_current, 0.2),
                 arrowprops=dict(arrowstyle="->", lw=2.))
    ax2.set(title='likelihood(mu=%.2f) = %.2f\nlikelihood(mu=%.2f) = %.2f' % (mu_current, 1e14 * likelihood_current, mu_proposal, 1e14 * likelihood_proposal))

    # Posterior
    posterior_analytical = calc_posterior_analytical(data, x, mu_prior_mu, mu_prior_sd)
    ax3.plot(x, posterior_analytical)
    posterior_current = calc_posterior_analytical(data, mu_current, mu_prior_mu, mu_prior_sd)
    posterior_proposal = calc_posterior_analytical(data, mu_proposal, mu_prior_mu, mu_prior_sd)
    ax3.plot([mu_current] * 2, [0, posterior_current], marker='o', color='b')
    ax3.plot([mu_proposal] * 2, [0, posterior_proposal], marker='o', color=color)
    ax3.annotate("", xy=(mu_proposal, 0.2), xytext=(mu_current, 0.2),
                 arrowprops=dict(arrowstyle="->", lw=2.))
    ax3.set(title='posterior(mu=%.2f) = %.5f\nposterior(mu=%.2f) = %.5f' % (mu_current, posterior_current, mu_proposal, posterior_proposal))

    # Trace
    if accepted:
        trace.append(mu_proposal)
    else:
        trace.append(mu_current)
    ax4.plot(trace)
    ax4.set(xlabel='iteration', ylabel='mu', title='trace')
    plt.tight_layout()

Visualizing MCMC
To visualize the sampling, we'll create plots for some quantities that are computed. Each row below is a single iteration through our Metropolis sampler.

The first column is our prior distribution: what our belief about $\mu$ is before seeing the data. You can see how the distribution is static and we only plug in our $\mu$ proposals. The vertical lines represent our current $\mu$ in blue and our proposed $\mu$ in either red or green (rejected or accepted, respectively).

The 2nd column is our likelihood and what we are using to evaluate how well our model explains the data. You can see that the likelihood function changes in response to the proposed $\mu$. The blue histogram is our data. The solid line in green or red is the likelihood with the currently proposed mu. Intuitively, the more overlap there is between likelihood and data, the better the model explains the data and the higher the resulting probability will be. The dashed line of the same color is the proposed mu and the dashed blue line is the current mu.

The 3rd column is our posterior distribution. Here I am displaying the normalized posterior, but as we found out above, we can just multiply the prior value for the current and proposed $\mu$'s by the likelihood value for the two $\mu$'s to get the unnormalized posterior values (which we use for the actual computation), and divide one by the other to get our acceptance probability.

The 4th column is our trace (i.e. the posterior samples of $\mu$ we're generating), where we store each sample irrespective of whether it was accepted or rejected (in which case the line just stays constant).

Note that we always move to relatively more likely $\mu$ values (in terms of their posterior density), but only sometimes to relatively less likely $\mu$ values, as can be seen in iteration 14 (the iteration number can be found at the top center of each row).
In [6]: np.random.seed(123)
sampler(data, samples=8, mu_init=1., plot=True);

Now the magic of MCMC is that you just have to do that for a long time, and the samples that are generated in this way come from the posterior distribution of your model. There is a rigorous mathematical proof that guarantees this, which I won't go into detail on here.
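
For the mathematically curious, the core of that proof is the detailed balance condition. As a brief sketch in our notation: write $\pi(\theta) \propto P(x|\theta)P(\theta)$ for the (unnormalized) posterior and note that our normal proposal distribution $q$ is symmetric, $q(\theta'|\theta) = q(\theta|\theta')$. The Metropolis acceptance rule $\min\!\left(1, \frac{\pi(\theta')}{\pi(\theta)}\right)$ then gives

$$\pi(\theta)\, q(\theta'|\theta)\, \min\!\left(1, \frac{\pi(\theta')}{\pi(\theta)}\right) = \pi(\theta')\, q(\theta|\theta')\, \min\!\left(1, \frac{\pi(\theta)}{\pi(\theta')}\right)$$

i.e. the probability flow from $\theta$ to $\theta'$ equals the flow from $\theta'$ to $\theta$, which is exactly the detailed balance condition that makes $\pi$ the equilibrium distribution of the chain.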
To get a sense of what this produces, let's draw a lot of samples and plot them.
In [7]: posterior = sampler(data, samples=15000, mu_init=1.)
fig, ax = plt.subplots()
ax.plot(posterior)
_ = ax.set(xlabel='sample', ylabel='mu');

This is usually called the trace. To now get an approximation of the posterior (the reason why we're doing all this), we simply take the histogram of this trace. It's important to keep in mind that although this looks similar to the data we sampled above to fit the model, the two are completely separate. The plot below represents our belief in mu. In this case it just happens to also be normal, but for a different model it could have a completely different shape than the likelihood or prior.
In [8]: ax = plt.subplot()
sns.distplot(posterior[500:], ax=ax, label='estimated posterior')
x = np.linspace(-.5, .5, 500)
post = calc_posterior_analytical(data, x, 0, 1)
ax.plot(x, post, 'g', label='analytic posterior')
_ = ax.set(xlabel='mu', ylabel='belief');
ax.legend();

As you can see, by following the above procedure, we get samples from the same distribution as what we derived analytically.

Proposal width
Above we set the proposal width to 0.5. That turned out to be a pretty good value. In general you don't want the width to be too narrow, because your sampling will be inefficient: it takes a long time to explore the whole parameter space and shows the typical random-walk behavior:
In [9]: posterior_small = sampler(data, samples=5000, mu_init=1., proposal_width=.01)
fig, ax = plt.subplots()
ax.plot(posterior_small);
_ = ax.set(xlabel='sample', ylabel='mu');

But you also don't want it to be so large that you never accept a jump:
In [10]: posterior_large = sampler(data, samples=5000, mu_init=1., proposal_width=3.)
fig, ax = plt.subplots()
ax.plot(posterior_large);
_ = ax.set(xlabel='sample', ylabel='mu');

Note, however, that we are still sampling from our target posterior distribution here, as guaranteed by the mathematical proof, just less efficiently:
In [11]: sns.distplot(posterior_small[1000:], label='Small step size')
sns.distplot(posterior_large[1000:], label='Large step size');
_ = plt.legend();

With more samples, this will eventually look like the true posterior. The key is that we want our samples to be independent of each other, which clearly isn't the case here. Thus, one common metric to evaluate the efficiency of our sampler is the autocorrelation, i.e. how correlated a sample $i$ is to sample $i-1$, $i-2$, etc.:

In [12]: from pymc3.stats import autocorr
lags = np.arange(1, 100)
fig, ax = plt.subplots()
ax.plot(lags, [autocorr(posterior_large, l) for l in lags], label='large step size')
ax.plot(lags, [autocorr(posterior_small, l) for l in lags], label='small step size')
ax.plot(lags, [autocorr(posterior, l) for l in lags], label='medium step size')
ax.legend(loc=0)
_ = ax.set(xlabel='lag', ylabel='autocorrelation', ylim=(-.1, 1))
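
If you don't have PyMC3 at hand, the same quantity can be computed with plain NumPy. This is a minimal sketch (not the PyMC3 implementation), correlating the trace with a lagged copy of itself:

def autocorr_np(x, lag):
    # Pearson correlation between the series and a lagged copy of itself
    x = np.asarray(x)
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

# e.g. autocorrelation of the medium-step-size trace at lag 10
autocorr_np(posterior, 10)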

Obviously we want to have a smart way of figuring out the right step width automatically. One common method is to keep adjusting the proposal width so that roughly 50% of proposals are rejected.
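
A very simple (hypothetical) version of that idea, not what PyMC3 does internally, is to periodically check the recent acceptance rate and scale the proposal width up when we accept too often and down when we accept too rarely:

def tune_proposal_width(proposal_width, acceptance_rate, target=.5):
    # Widen the proposal if we accept too often (steps are too timid),
    # narrow it if we accept too rarely (steps are too bold)
    if acceptance_rate > target:
        return proposal_width * 1.1
    return proposal_width * 0.9

# Example: after a batch of 100 iterations in which 70 proposals were accepted
proposal_width = tune_proposal_width(.5, acceptance_rate=.7)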

Extending to more complex models
Now you can easily imagine that we could also add a sigma parameter for the standard deviation and follow the same procedure for this second parameter. In that case, we would be generating proposals for mu and sigma, but the algorithm logic would be nearly identical (a rough sketch follows below). Or, we could have data from a very different distribution, like a Binomial, and still use the same algorithm and get the correct posterior. That's pretty cool and a huge benefit of probabilistic programming: just define the model you want and let MCMC take care of the inference.
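
To make the two-parameter case concrete, here is a rough sketch (not a complete, tuned implementation) of how the sampler above could be extended to also infer sigma. It keeps the Normal(0, 1) prior on mu and, as an assumption made just for this sketch, puts a HalfNormal(1) prior on sigma; proposals with a negative sigma fall outside the support and are rejected immediately:

from scipy.stats import norm, halfnorm

def sampler_mu_sigma(data, samples=1000, mu_init=0., sigma_init=1., proposal_width=.5):
    def log_post(mu, sigma):
        # Log of the unnormalized posterior: log-likelihood plus log-priors
        return (norm(mu, sigma).logpdf(data).sum()
                + norm(0., 1.).logpdf(mu)
                + halfnorm(scale=1.).logpdf(sigma))

    mu_current, sigma_current = mu_init, sigma_init
    trace = [(mu_current, sigma_current)]
    for i in range(samples):
        # Propose new values for both parameters (symmetric normal proposals)
        mu_proposal = norm(mu_current, proposal_width).rvs()
        sigma_proposal = norm(sigma_current, proposal_width).rvs()

        if sigma_proposal <= 0:
            # Zero posterior density outside the support of sigma: reject and stay
            trace.append((mu_current, sigma_current))
            continue

        # Metropolis acceptance on the log scale
        p_accept = np.exp(log_post(mu_proposal, sigma_proposal) -
                          log_post(mu_current, sigma_current))
        if np.random.rand() < p_accept:
            mu_current, sigma_current = mu_proposal, sigma_proposal
        trace.append((mu_current, sigma_current))

    return trace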
For example, the below model can be written in PyMC3 quite easily. Below we also use the Metropolis sampler (which automatically tunes the proposal width) and see that we get identical results. Feel free to play around with this and change the distributions. For more information, as well as more complex examples, see the PyMC3 documentation (http://pymcdevs.github.io/pymc3/getting_started/).
In [13]: import pymc3 as pm

with pm.Model():
    mu = pm.Normal('mu', 0, 1)
    sigma = 1.
    returns = pm.Normal('returns', mu=mu, sd=sigma, observed=data)

    step = pm.Metropolis()
    trace = pm.sample(15000, step)

sns.distplot(trace[2000:]['mu'], label='PyMC3 sampler');
sns.distplot(posterior[500:], label='Hand-written sampler');
plt.legend();

[100%] 15000 of 15000 complete in 1.7 sec

Conclusions
We glossed over a lot of detail which is certainly important, but there are many other posts that deal with that. Here, we really wanted to communicate the idea of MCMC and the Metropolis sampler. Hopefully you will have gathered some intuition which will equip you to read one of the more technical introductions to this topic.

Other, fancier MCMC algorithms like Hamiltonian Monte Carlo actually work very similarly to this; they are just much more clever in proposing where to jump next.
This blog post was written in a Jupyter Notebook; you can find the underlying NB with all its code here (https://github.com/twiecki/WhileMyMCMCGentlySamples/blob/master/content/downloads/notebooks/MCMCsamplingfordummies.ipynb).
Posted by Thomas Wiecki, Nov 10, 2015, in bayesian statistics (http://twiecki.github.com/tag/bayesianstatistics.html)

Comments
Josh L Spinoza (a month ago):
This was extremely helpful. Thank you!

Charles (3 months ago):
Very insightful. Great post, thanks!

Rob Hicks (4 months ago):
This is a fantastic post and has helped my students in visualizing the sampling mechanism. I've adapted it for one of my lectures and wanted you to know I had to modify the visualization, since the maximum of the likelihood function in the second column of graphs in [6] is not constant. Since p(y|mu) is only maximized at 1 value of mu, the code (where you are plotting the likelihood function, not the value of the likelihood at the proposal) should be something like this:
`y = np.array([norm(loc=i, scale=1).pdf(data).prod() for i in x])`
rather than
`y = norm(loc=mu_proposal, scale=1).pdf(x)`
in the plot_proposal function.
Thomas Wiecki (Mod) > Rob Hicks (3 months ago):
Thanks, I'm glad it's useful. What I'm doing there is just plotting the likelihood function that will be used to evaluate the data, not the actual likelihood at that point. But I can see how that's a bit misleading.

Josh L Spinoza > Thomas Wiecki (a month ago):
Hi Thomas, I tried commenting on one of the posts below but it has been closed. I'm really interested in Spike and Slab priors as you mentioned below. Is there any way to implement this in PyMC2/3? I read a paper that implements a Bernoulli for the Spike and a Gaussian for the Slab. Can that be done using PyMC2/3?

Thomas Wiecki (Mod) > Josh L Spinoza (a month ago):
Hi Josh, yes, it can certainly be done. There's no example, but it shouldn't be too hard for you to code it up. If you do, please consider contributing it to pymc3 as an example.

Josh L Spinoza > Thomas Wiecki (a month ago):
Thanks for the response! Is there a tutorial on adding new priors? If not, how does the function need to be set up for use in PyMC3? I'm essentially trying to do this function in Python and PyMC3 seems to be the best way! http://rpackages.ianhowson.com... I'm new at PyMC3 but I understand Bayesian inference. I know Python really well, so I wanted to give it a shot before I give up and use the R package.

Thomas Wiecki (Mod) > Josh L Spinoza (a month ago):
Here is a description of how to add arbitrary distributions: http://pymcdevs.github.io/pym...

warrenon (6 months ago):
Hey Thomas, great intro! Could you provide some links to the more technical sources you mention in the conclusion? Thanks, Warren

Thomas Wiecki (Mod) > warrenon (6 months ago):
I liked this talk by Ian Murray:

Jun (6 months ago):
Excellent post, this is the most intuitive explanation about MCMC I have ever read. I really like how you use code instead of math to explain the algorithm.

Kevin (7 months ago):
Excellent article! Just one quick question Thomas: looking at the plot where you drew 15k samples of mu from the posterior, why is it random in that range over multiple samples? Shouldn't the sample converge to a single value over samples rather than be consistent within that range? For example, in your multi-chart image, you can see the trace converge toward mu=0 even with a sample size of about 8! Shouldn't any large guesses after that NOT be accepted, and thus every sample after that point be about 0? In code terms, I'm referring to the sampler function.

Thomas Wiecki (Mod) > Kevin (7 months ago):
The key is this line: `accept = np.random.rand() < p_accept`
So yes, it's not guaranteed, but with some probability the sampler will jump to a value that has less (unnormalized) posterior probability. But certainly, very large guesses will be accepted only very rarely (that's why the sampler with the large step width only accepts jumps rarely).
It's thus not "random" in the range; the trace plot only makes it look that way. Values closer to 0 are visited more often (you can see that in the histogram).

Kevin > Thomas Wiecki (7 months ago):
Thanks for the quick reply! That explains it well and that's what I was originally thinking as well. Another question I've had, and correct me if I'm wrong, is about you saying that MCMC is able to approximate well distributions that may not have a trivial posterior. However, in our example the normal distribution was used and thus made the posterior calculation trivial. In many cases, we might not exactly know the distribution that the data is sampled from; would MCMC still be able to work when we don't assume that x|mu comes from a normal distribution, or do we have to ALWAYS specify the underlying distribution in the likelihood?

Thomas Wiecki (Mod) > Kevin (7 months ago):
Yes, MCMC will work no matter what likelihood or priors you use. But you do have to specify them somehow.

antiquechrono (7 months ago):
I have a couple of questions that are bugging me, if you don't mind answering.
When you say you are calculating the probability of each data point, these aren't really probability values, but probability density values with units attached. How are you able to use the density (which doesn't sum to 1) in place of a probability value? Is it because when you divide the current and proposed density the units cancel, creating a real probability value?
Second, when most people are introduced to Bayes' Rule it's usually with a cancer example where you plug in probability values and calculate the new "posterior" probability. How are you able to go from plugging probabilities into Bayes' Rule to plugging in a likelihood and a distribution?

Thomas Wiecki (Mod) > antiquechrono (7 months ago):
1. Good question. It's probably easiest to think about the probability mass of an infinitesimally small region around the likelihood value. See https://en.wikipedia.org/wiki/... for more info. But the normalization is a separate thing and is also present in the discrete case.
2. Every model works with a likelihood function/distribution. In the cancer example you're using a Bernoulli distribution (https://en.wikipedia.org/wiki/...) as the likelihood, they just don't tell you that's what it's called so that it's less scary :). Bayes formula stays the same in all cases though.

antiquechrono > Thomas Wiecki (7 months ago):
Thank you for taking the time to reply.
1. Are you referring to the Bayes normalization here? I was asking more along the lines of when you calculate p_accept = p_proposal / p_current. Say, for the sake of argument, that mu from the example in the post is measured in grams. When you calculate something like p_proposal it's not a normal probability value that ranges from 0-1; it's a density, which is basically the change in probability per unit, which is not a vanilla probability. When you calculate p_accept you get p_proposal g^-1 / p_current g^-1, so when you do the division you are left with a unitless quantity which is now a real probability value?
2. Well, if you look at Bayes' Rule from your post you call the prior p(theta). I'm having a hard time understanding how that goes from being a vanilla probability like 0.5 to being an entire distribution like Normal(0,1).

Charles > antiquechrono (3 months ago):
I don't think of p_accept as being a formally defined probability. I think of it more as just being a simple comparison that tells whether the new value (mu_proposal) is better than the current one. In regards to your second question: when you tack on any value to your mu_current list you are essentially generating data from which to make a histogram. You can then use a density-estimating function to approximate the density over the range of values you think your parameter will have. It has been a while since your question, so it may not be a question anymore. I hope this helps if it still is.

aloctavodia (7 months ago):
Very nice post! I have been teaching structural bioinformatics (a branch of science that also uses MCMC methods) to biologists. The previous course has been highly conceptual and not as hands-on as I would like it. I have completely changed the course for next year: I will be teaching them how to code, and I have been preparing an MCMC chapter in line with your post. I think I can use the visualization part you have created :) I have also been thinking of replicating a visualization like this one http://blog.revolutionanalytic...
What about plotting the first 5 movements and then every 100 or so steps, to see the convergence of the chain? I think I will also explore IPython/Jupyter widgets (I have seen those, but I have never done anything "real").
Well, thanks for sharing!

Thomas Wiecki (Mod) > aloctavodia (7 months ago):
Thanks! Good points; definitely feel free to use these graphs and/or code, and resubmit any improvements you make.
