09/09/2016

The Lost Art of C Structure Packing

Eric S. Raymond
<esr@thyrsus.com>

Table of Contents
1. Who should read this
2. Why I wrote it
3. Alignment requirements
4. Padding
5. Structure alignment and padding
6. Bitfields
7. Structure reordering
8. Awkward scalar cases
9. Readability and cache locality
10. Other packing techniques
11. Overriding alignment rules
12. Tools
13. Proof and exceptional cases
14. Related Reading
15. Version history

1. Who should read this

This page is about a technique for reducing the memory footprint of C
programs: manually repacking C structure declarations for reduced size.
To read it, you will require basic knowledge of the C programming language.

You need to know this technique if you intend to write code for
memory-constrained embedded systems, or operating-system kernels. It is
useful if you are working with application data sets so large that your
programs routinely hit memory limits. It is good to know in any application
where you really, really care about minimizing cache-line misses.

Finally, knowing this technique is a gateway to other esoteric C topics. You
are not an advanced C programmer until you have grasped it. You are not a
master of C until you could have written this document yourself and can
criticize it intelligently.

2. Why I wrote it

This webpage exists because in late 2013 I found myself heavily applying a C
optimization technique that I had learned more than two decades previously
and not used much since.

I needed to reduce the memory footprint of a program that used thousands,
sometimes hundreds of thousands, of C struct instances. The program was
cvs-fast-export and the problem was that it was dying with out-of-memory
errors on large repositories.

There are ways to reduce memory usage significantly in situations like this,
by rearranging the order of structure members in careful ways. This can lead
to dramatic gains; in my case I was able to cut the working-set size by
around 40%, enabling the program to handle much larger repositories without
dying.

But as I worked, and thought about what I was doing, it began to dawn on me
that the technique I was using has been more than half forgotten in these
latter days. A little web research confirmed that C programmers don't seem to
talk about it much any more, at least not where a search engine can see them.
A couple of Wikipedia entries touch the topic, but I found nobody who covered
it comprehensively.

There are actually reasons for this that aren't stupid. CS courses (rightly)
steer people away from micro-optimization towards finding better algorithms.
The plunging price of machine resources has made squeezing memory usage less
necessary. And the way hackers used to learn how to do it back in the day was
by bumping their noses on strange hardware architectures, a less common
experience now.

But the technique still has value in important situations, and will as long
as memory is finite. This document is intended to save C programmers from
having to rediscover the technique, so they can concentrate effort on more
important things.

3. Alignment requirements

The first thing to understand is that, on modern processors, the way your C
compiler lays out basic C datatypes in memory is constrained in order to make
memory accesses faster.

Storage for the basic C datatypes on an x86 or ARM processor doesn't normally
start at arbitrary byte addresses in memory. Rather, each type except char
has an alignment requirement; chars can start on any byte address, but 2-byte
shorts must start on an even address, 4-byte ints or floats must start on an
address divisible by 4, and 8-byte longs or doubles must start on an address
divisible by 8. Signed or unsigned makes no difference.

The jargon for this is that basic C types on x86 and ARM are self-aligned.
Pointers, whether 32-bit (4-byte) or 64-bit (8-byte), are self-aligned too.
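
The self-alignment claim can be checked on any given platform. The following is a minimal sketch, not part of the original text, assuming a C11 compiler (for `alignof` from `<stdalign.h>`); the function name `basic_types_self_aligned` is invented for illustration. It leaves `double` out of the check because of the x86 Linux exception discussed in section 8.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdio.h>

/* Print the size and alignment requirement of one type. */
#define SHOW(T) \
    printf("%-8s size %2zu  align %2zu\n", #T, sizeof(T), alignof(T))

/* Returns 1 if the checked types are self-aligned on this platform,
 * that is, if each one's alignment equals its size. */
int basic_types_self_aligned(void)
{
    SHOW(char);
    SHOW(short);
    SHOW(int);
    SHOW(long);
    SHOW(float);
    SHOW(double);
    SHOW(char *);
    return alignof(short) == sizeof(short)
        && alignof(int) == sizeof(int)
        && alignof(long) == sizeof(long)
        && alignof(float) == sizeof(float)
        && alignof(char *) == sizeof(char *);
}
```

On x86-64 and 64-bit ARM this should report that the types are self-aligned; on small embedded targets the answer may well differ.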

Self-alignment makes access faster because it facilitates generating
single-instruction fetches and puts of the typed data. Without alignment
constraints, on the other hand, the code might end up having to do two or
more accesses spanning machine-word boundaries. Characters are a special
case; they're equally expensive from anywhere they live inside a single
machine word. That's why they don't have a preferred alignment.

I said "on modern processors" because on some older ones forcing your C
program to violate alignment rules (say, by casting an odd address into an
int pointer and trying to use it) didn't just slow your code down, it caused
an illegal instruction fault. This was the behavior, for example, on Sun
SPARC chips. In fact, with sufficient determination and the right (e18)
hardware flag set on the processor, you can still trigger this on x86.

Also, self-alignment is not the only possible rule. Historically, some
processors (especially those lacking barrel shifters) have had more
restrictive ones. If you do embedded systems, you might trip over one of
these lurking in the underbrush. Be aware this is possible.
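
When code really does need to read a multi-byte value from an address with no alignment guarantee, the usual portable alternative to casting the address into an int pointer is to copy the bytes out with `memcpy`. The sketch below is an illustration rather than anything from the original text; `load_u32` and `store_u32` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address with no alignment guarantee.
 * memcpy has no alignment requirement, so this avoids the cast
 * (uint32_t *)p, which is undefined behavior when p is misaligned. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store a 32-bit value to a possibly misaligned address. */
void store_u32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Compilers for x86 and ARM typically turn these copies into a single load or store where the hardware allows it, so the safe form need not be slower.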

4. Padding

Now we'll look at a simple example of variable layout in memory. Consider the
following series of variable declarations in the top level of a C module:

    char *p;
    char c;
    int x;

If you didn't know anything about data alignment, you might assume that these
three variables would occupy a continuous span of bytes in memory. That is,
on a 32-bit machine 4 bytes of pointer would be immediately followed by
1 byte of char and that immediately followed by 4 bytes of int. And a 64-bit
machine would be different only in that the pointer would be 8 bytes.

Here's what actually happens (on an x86 or ARM or anything else with
self-aligned types). The storage for p starts on a self-aligned 4- or 8-byte
boundary depending on the machine word size. This is pointer alignment, the
strictest possible.

The storage for c follows immediately. But the 4-byte alignment requirement
of x forces a gap in the layout; it comes out as though there were a fourth
intervening variable, like this:

    char *p;      /* 4 or 8 bytes */
    char c;       /* 1 byte */
    char pad[3];  /* 3 bytes */
    int x;        /* 4 bytes */

The pad[3] character array represents the fact that there are three bytes of
waste space in the structure. The old-school term for this was "slop". The
value of the padding bits is undefined; in particular it is not guaranteed
that they will be zeroed.

Compare what happens if x is a 2-byte short:

    char *p;
    char c;
    short x;

In that case, the actual layout will be this:

    char *p;      /* 4 or 8 bytes */
    char c;       /* 1 byte */
    char pad[1];  /* 1 byte */
    short x;      /* 2 bytes */

On the other hand, if x is a long on a 64-bit machine

    char *p;
    char c;
    long x;

we end up with this:

    char *p;      /* 8 bytes */
    char c;       /* 1 byte */
    char pad[7];  /* 7 bytes */
    long x;       /* 8 bytes */
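
All three pad sizes above follow from one piece of arithmetic: round the next free offset up to the alignment of the item about to be placed. A minimal sketch of that rule follows; the helper name `padding_for` is invented here and does not come from the original text.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of padding needed so that an object with alignment `align`
 * can start at or after byte offset `offset`. `align` must be nonzero. */
size_t padding_for(size_t offset, size_t align)
{
    return (align - offset % align) % align;
}
```

On a 64-bit machine p occupies offsets 0 to 7 and c sits at offset 8, so the next free offset is 9. Then `padding_for(9, 4)` is 3 for the int, `padding_for(9, 2)` is 1 for the short, and `padding_for(9, 8)` is 7 for the long, matching pad[3], pad[1] and pad[7].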

If you have been following carefully, you are probably now wondering about
the case where the shorter variable declaration comes first:

    char c;
    char *p;
    int x;

If the actual memory layout were written like this

    char c;
    char pad1[M];
    char *p;
    char pad2[N];
    int x;

what can we say about M and N?

First, in this case N will be zero. The address of x, coming right after p,
is guaranteed to be pointer-aligned, which is never less strict than
int-aligned.

The value of M is less predictable. If the compiler happened to map c to the
last byte of a machine word, the next byte (the first of p) would be the
first byte of the next one and properly pointer-aligned. M would be zero.

It is more likely that c will be mapped to the first byte of a machine word.
In that case M will be whatever padding is needed to ensure that p has
pointer alignment: 3 on a 32-bit machine, 7 on a 64-bit machine.

Intermediate cases are possible. M can be anything from 0 to 7 (0 to 3 on
32-bit) because a char can start on any byte boundary in a machine word.

If you wanted to make those variables take up less space, you could get that
effect by swapping x with c in the original sequence.

    char *p;  /* 8 bytes */
    long x;   /* 8 bytes */
    char c;   /* 1 byte */

Usually, for the small number of scalar variables in your C programs, bumming
out the few bytes you can get by changing the order of declaration won't save
you enough to be significant. The technique becomes more interesting when
applied to nonscalar variables, especially structs.

Before we get to those, let's dispose of arrays of scalars. On a platform
with self-aligned types, arrays of char/short/int/long/pointer have no
internal padding; each member is automatically self-aligned at the end of the
previous one.

In the next section we will see that the same is not necessarily true of
structure arrays.

5. Structure alignment and padding

In general, a struct instance will have the alignment of its widest scalar
member. Compilers do this as the easiest way to ensure that all the members
are self-aligned for fast access.

Also, in C the address of a struct is the same as the address of its first
member; there is no leading padding. Beware: in C++, classes that look like
structs may break this rule! (Whether they do or not depends on how base
classes and virtual member functions are implemented, and varies by
compiler.)

(When you're in doubt about this sort of thing, ANSI C provides an offsetof()
macro which can be used to read out structure member offsets.)

Consider this struct:

    struct foo1 {
        char *p;
        char c;
        long x;
    };

Assuming a 64-bit machine, any instance of struct foo1 will have 8-byte
alignment. The memory layout of one of these looks unsurprising, like this:

    struct foo1 {
        char *p;      /* 8 bytes */
        char c;       /* 1 byte */
        char pad[7];  /* 7 bytes */
        long x;       /* 8 bytes */
    };
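
The offsetof() macro mentioned above makes this layout directly observable. The sketch below is an illustration only; the function names are invented. It prints where each member of struct foo1 lands and computes the size of the gap the compiler left before x.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct foo1 {
    char *p;
    char c;
    long x;
};

/* Print where each member of struct foo1 actually lands. */
void show_foo1_layout(void)
{
    printf("p at %zu, c at %zu, x at %zu, sizeof %zu\n",
           offsetof(struct foo1, p), offsetof(struct foo1, c),
           offsetof(struct foo1, x), sizeof(struct foo1));
}

/* Padding bytes the compiler inserted between c and x. */
size_t foo1_gap_before_x(void)
{
    return offsetof(struct foo1, x)
         - (offsetof(struct foo1, c) + sizeof(char));
}
```

On a 64-bit machine with 8-byte longs the gap should come out as 7, which is the pad[7] shown in the layout above.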

It's laid out exactly as though variables of these types had been separately
declared. But if we put c first, that's no longer true.

    struct foo2 {
        char c;       /* 1 byte */
        char pad[7];  /* 7 bytes */
        char *p;      /* 8 bytes */
        long x;       /* 8 bytes */
    };

If the members were separate variables, c could start at any byte boundary
and the size of pad might vary. Because struct foo2 has the pointer alignment
of its widest member, that's no longer possible. Now c has to be
pointer-aligned, and following padding of 7 bytes is locked in.

Now let's talk about trailing padding on structures. To explain this, I need
to introduce a basic concept which I'll call the stride address of a
structure. It is the first address following the structure data that has the
same alignment as the structure.

The general rule of trailing structure padding is this: the compiler will
behave as though the structure has trailing padding out to its stride
address. This rule controls what sizeof() will return.

Consider this example on a 64-bit x86 or ARM machine:

    struct foo3 {
        char *p;  /* 8 bytes */
        char c;   /* 1 byte */
    };

    struct foo3 singleton;
    struct foo3 quad[4];

You might think that sizeof(struct foo3) should be 9, but it's actually 16.
The stride address is that of (&p)[2]. Thus, in the quad array, each member
has 7 bytes of trailing padding, because the first member of each following
struct wants to be self-aligned on an 8-byte boundary. The memory layout is
as though the structure had been declared like this:

    struct foo3 {
        char *p;      /* 8 bytes */
        char c;       /* 1 byte */
        char pad[7];
    };

For contrast, consider the following example:

    struct foo4 {
        short s;  /* 2 bytes */
        char c;   /* 1 byte */
    };

Because s only needs to be 2-byte aligned, the stride address is just one
byte after c, and struct foo4 as a whole only needs one byte of trailing
padding. It will be laid out like this:

    struct foo4 {
        short s;      /* 2 bytes */
        char c;       /* 1 byte */
        char pad[1];
    };

and sizeof(struct foo4) will return 4.
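
The trailing-padding rule amounts to rounding the end of the member data up to a multiple of the structure's alignment. Here is a small sketch of that calculation, assuming a C11 compiler; `round_up` and `foo3_array_stride` are names invented for this example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct foo3 { char *p; char c; };
struct foo4 { short s; char c; };

/* Round `size` up to the next multiple of `align` (align nonzero). */
size_t round_up(size_t size, size_t align)
{
    return (size + align - 1) / align * align;
}

/* Distance in bytes between adjacent elements of a struct foo3 array. */
size_t foo3_array_stride(void)
{
    struct foo3 quad[4];
    return (size_t)((char *)&quad[1] - (char *)&quad[0]);
}
```

With 8-byte pointers, `round_up(9, 8)` gives the 16 reported by sizeof(struct foo3), and `round_up(3, 2)` gives the 4 reported by sizeof(struct foo4). The array stride is always exactly sizeof the element, which is why the padding has to live inside the struct.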

Here's a last important detail: If your structure has structure members, the
inner structs want to have the alignment of their longest scalar too. Suppose
you write this:

    struct foo5 {
        char c;
        struct foo5_inner {
            char *p;
            short x;
        } inner;
    };

The char *p member in the inner struct forces the outer struct to be
pointer-aligned as well as the inner. Actual layout will be like this on a
64-bit machine:

    struct foo5 {
        char c;            /* 1 byte */
        char pad1[7];      /* 7 bytes */
        struct foo5_inner {
            char *p;       /* 8 bytes */
            short x;       /* 2 bytes */
            char pad2[6];  /* 6 bytes */
        } inner;
    };

This structure gives us a hint of the savings that might be possible from
repacking structures. Of 24 bytes, 13 of them are padding. That's more than
50% waste space!
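
The waste can be measured rather than counted by hand: subtract the bytes that hold member data from sizeof. The sketch below is illustrative and the two function names are invented.

```c
#include <assert.h>
#include <stddef.h>

struct foo5 {
    char c;
    struct foo5_inner {
        char *p;
        short x;
    } inner;
};

/* Bytes of struct foo5 that hold member data. */
size_t foo5_payload(void)
{
    return sizeof(char) + sizeof(char *) + sizeof(short);
}

/* Bytes of struct foo5 that are padding (interior gap plus trailing slop). */
size_t foo5_padding(void)
{
    return sizeof(struct foo5) - foo5_payload();
}
```

On a 64-bit machine with 2-byte shorts the payload is 11 bytes and the padding is 13, as stated above. On a 32-bit machine both the total and the waste are smaller, but the padding is still nonzero.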

6. Bitfields

Now let's consider bitfields. What they give you the ability to do is declare
structure fields of smaller than character width, down to 1 bit, like this:

    struct foo6 {
        short s;
        char c;
        int flip:1;
        int nybble:4;
        int septet:7;
    };

The thing to know about bitfields is that they are implemented with word- and
byte-level mask and rotate instructions operating on machine words, and
cannot cross word boundaries. C99 guarantees that bitfields will be packed as
tightly as possible, provided they don't cross storage unit boundaries
(6.7.2.1 #10).

Assuming we're on a 32-bit machine, that implies that the layout may look
like this:

    struct foo6 {
        short s;       /* 2 bytes */
        char c;        /* 1 byte */
        int flip:1;    /* total 1 bit */
        int nybble:4;  /* total 5 bits */
        int pad1:3;    /* pad to an 8-bit boundary */
        int septet:7;  /* 7 bits */
        int pad2:25;   /* pad to 32 bits */
    };

But this isn't the only possibility, because the C standard does not specify
that bits are allocated low-to-high. So the layout could look like this:

    struct foo6 {
        short s;       /* 2 bytes */
        char c;        /* 1 byte */
        int pad1:3;    /* pad to an 8-bit boundary */
        int flip:1;    /* total 1 bit */
        int nybble:4;  /* total 5 bits */
        int pad2:25;   /* pad to 32 bits */
        int septet:7;  /* 7 bits */
    };

That is, the padding could precede rather than follow the payload bits.

Note also that, as with normal structure padding, the padding bits are not
guaranteed to be zero; C99 mentions this.

Note that the base type of a bitfield is interpreted for signedness but not
necessarily for size. It is up to implementors whether "short flip:1" or
"long flip:1" are supported, and whether those base types change the size of
the storage unit the field is packed into.

Proceed with caution and check with -Wpadded if you have it available (e.g.
under clang). Compilers on exotic hardware might interpret the C99 rules in
surprising ways; older compilers might not quite follow them.

The restriction that bitfields cannot cross machine word boundaries means
that, while the first two of the following structures pack into one and two
32-bit words as you'd expect, the third (struct foo9) takes up three 32-bit
words, in the last of which only one bit is used.

    struct foo7 {
        int bigfield:31;      /* 32-bit word 1 begins */
        int littlefield:1;
    };

    struct foo8 {
        int bigfield1:31;     /* 32-bit word 1 begins */
        int littlefield1:1;
        int bigfield2:31;     /* 32-bit word 2 begins */
        int littlefield2:1;
    };

    struct foo9 {
        int bigfield1:31;     /* 32-bit word 1 begins */
        int bigfield2:31;     /* 32-bit word 2 begins */
        int littlefield1:1;
        int littlefield2:1;   /* 32-bit word 3 begins */
    };

On the other hand, struct foo8 would fit into a single 64-bit word if the
machine has those.
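
Since bitfield layout is implementation-defined, the practical way to confirm these sizes is to ask the compiler. The following sketch is not from the original text. It declares the fields as `unsigned int` rather than `int`, because whether a plain `int` bitfield is signed is itself implementation-defined, and it assumes `int` is at least 32 bits wide.

```c
#include <assert.h>
#include <stdio.h>

struct foo7 {
    unsigned int bigfield:31;
    unsigned int littlefield:1;
};

struct foo8 {
    unsigned int bigfield1:31;
    unsigned int littlefield1:1;
    unsigned int bigfield2:31;
    unsigned int littlefield2:1;
};

struct foo9 {
    unsigned int bigfield1:31;
    unsigned int bigfield2:31;
    unsigned int littlefield1:1;
    unsigned int littlefield2:1;
};

/* Report how many bytes each bitfield struct occupies here. */
void show_bitfield_sizes(void)
{
    printf("foo7: %zu  foo8: %zu  foo9: %zu\n",
           sizeof(struct foo7), sizeof(struct foo8), sizeof(struct foo9));
}
```

If the compiler behaves as the text describes, the three sizes correspond to one, two and three 32-bit words, but treat that as something to verify rather than assume.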

7. Structure reordering

Now that you know how and why compilers insert padding in and after your
structures we'll examine what you can do to squeeze out the slop. This is the
art of structure packing.

The first thing to notice is that slop only happens in two places. One is
where storage bound to a larger data type (with stricter alignment
requirements) follows storage bound to a smaller one. The other is where a
struct naturally ends before its stride address, requiring padding so the
next one will be properly aligned.

The simplest way to eliminate slop is to reorder the structure members by
decreasing alignment. That is: make all the pointer-aligned subfields come
first, because on a 64-bit machine they will be 8 bytes. Then the 4-byte
ints; then the 2-byte shorts; then the character fields.

So, for example, consider this simple linked-list structure:

    struct foo10 {
        char c;
        struct foo10 *p;
        short x;
    };

With the implied slop made explicit, here it is:

    struct foo10 {
        char c;           /* 1 byte */
        char pad1[7];     /* 7 bytes */
        struct foo10 *p;  /* 8 bytes */
        short x;          /* 2 bytes */
        char pad2[6];     /* 6 bytes */
    };

That's 24 bytes. If we reorder by size, we get this:

    struct foo11 {
        struct foo11 *p;
        short x;
        char c;
    };

Considering self-alignment, we see that none of the data fields need padding.
This is because the stride address for a (longer) field with stricter
alignment is always a validly-aligned start address for a (shorter) field
with less strict requirements. All the repacked struct actually requires is
trailing padding:

    struct foo11 {
        struct foo11 *p;  /* 8 bytes */
        short x;          /* 2 bytes */
        char c;           /* 1 byte */
        char pad[5];      /* 5 bytes */
    };

Our repack transformation drops the size from 24 to 16 bytes. This might not
seem like a lot, but suppose you have a linked list of 200K of these? The
savings add up fast, especially on memory-constrained embedded systems or in
the core part of an OS kernel that has to stay resident.
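
The saving is easy to confirm by letting the compiler lay out both versions. A minimal sketch follows; the two helper functions are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct foo10 {              /* original member order */
    char c;
    struct foo10 *p;
    short x;
};

struct foo11 {              /* members ordered by decreasing alignment */
    struct foo11 *p;
    short x;
    char c;
};

/* Bytes saved on each list node by the reordering. */
size_t bytes_saved_per_node(void)
{
    return sizeof(struct foo10) - sizeof(struct foo11);
}

/* Bytes saved across a list of n nodes. */
size_t bytes_saved(size_t n)
{
    return n * bytes_saved_per_node();
}
```

On a 64-bit machine each node shrinks by 8 bytes, so a list of 200,000 nodes gives back 1,600,000 bytes.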

Note that reordering is not guaranteed to produce savings. Applying this
technique to a nested structure like the earlier struct foo5, we get this:

    struct foo12 {
        struct foo12_inner {
            char *p;  /* 8 bytes */
            int x;    /* 4 bytes */
        } inner;
        char c;       /* 1 byte */
    };

With padding written out, this is

    struct foo12 {
        struct foo12_inner {
            char *p;      /* 8 bytes */
            int x;        /* 4 bytes */
            char pad[4];  /* 4 bytes */
        } inner;
        char c;           /* 1 byte */
        char pad[7];      /* 7 bytes */
    };

It's still 24 bytes because c cannot back into the inner struct's trailing
padding. To collect that gain you would need to redesign your data
structures.
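
One possible redesign, shown here only as a sketch, is to flatten the inner struct into the outer one so that c can occupy what used to be the inner struct's trailing padding. The name `foo12_flat` is invented, and flattening is only an option when nothing else depends on the inner struct being a separate type.

```c
#include <assert.h>
#include <stddef.h>

struct foo12 {                  /* nested version from the text */
    struct foo12_inner {
        char *p;
        int x;
    } inner;
    char c;
};

struct foo12_flat {             /* hypothetical flattened redesign */
    char *p;
    int x;
    char c;
};

/* Bytes recovered per instance by flattening. */
size_t flatten_gain(void)
{
    return sizeof(struct foo12) - sizeof(struct foo12_flat);
}
```

On a 64-bit machine the flattened struct should be 16 bytes against 24 for the nested one. On a 32-bit machine the inner struct has no trailing padding to reclaim, so the gain may be zero.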

Since shipping the first version of this guide I have been asked why, if
reordering for minimal slop is so simple, C compilers don't do it
automatically. The answer: C is a language originally designed for writing
operating systems and other code close to the hardware. Automatic reordering
would interfere with a systems programmer's ability to lay out structures
that exactly match the byte and bit-level layout of memory-mapped device
control blocks.
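
To make the point concrete, here is a sketch of a control block for a purely hypothetical device; the register names and offsets are invented and match no real hardware. Because the compiler must keep members in declaration order, the struct can mirror the device's register map, and C11 static assertions can pin the offsets down at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register block of a hypothetical device; the layout is invented
 * for illustration only. */
struct dev_regs {
    uint8_t  status;       /* offset 0 */
    uint8_t  reserved[3];  /* offsets 1-3, padding made explicit */
    uint32_t data;         /* offset 4 */
    uint16_t control;      /* offset 8 */
    uint16_t divisor;      /* offset 10 */
};

/* Fail the build if the compiler's layout ever stops matching. */
static_assert(offsetof(struct dev_regs, data) == 4, "data offset");
static_assert(offsetof(struct dev_regs, control) == 8, "control offset");
static_assert(sizeof(struct dev_regs) == 12, "register block size");
```

Writing the reserved bytes out explicitly, rather than relying on implicit padding, keeps the layout the same even on a compiler with different alignment rules.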

8. Awkward scalar cases

Using enumerated types instead of #defines is a good idea, if only because
symbolic debuggers have those symbols available and can show them rather than
raw integers. But, while enums are guaranteed to be compatible with an
integral type, the C standard does not specify which underlying integral type
is to be used for them.

Be aware when repacking your structs that while enumerated-type variables are
usually ints, this is compiler-dependent; they could be shorts, longs, or
even chars by default. Your compiler may have a pragma or command-line option
to force the size.

The long double type is a similar trouble spot. Some C platforms implement
this in 80 bits, some in 128, and some of the 80-bit platforms pad it to 96
or 128 bits.

In both cases it's best to use sizeof() to check the storage size.
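
A check of that kind might look like the following sketch, which assumes a C11 compiler for `alignof`; the enum, the struct and the function name are all invented for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdio.h>

enum color { RED, GREEN, BLUE };

struct sample {
    enum color tint;
    long double weight;
};

/* Print the storage size of the two awkward types on this platform. */
void show_awkward_sizes(void)
{
    printf("enum color:    size %zu  align %zu\n",
           sizeof(enum color), alignof(enum color));
    printf("long double:   size %zu  align %zu\n",
           sizeof(long double), alignof(long double));
    printf("struct sample: size %zu\n", sizeof(struct sample));
}
```

Run it once per compiler and option set you care about before deciding where such members belong in a repacked struct.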

Finally, under x86 Linux doubles are sometimes an exception to the
self-alignment rule; an 8-byte double may require only 4-byte alignment
within a struct even though standalone double variables have 8-byte
self-alignment. This depends on compiler and options.

9. Readability and cache locality

While reordering by size is the simplest way to eliminate slop, it's not
necessarily the right thing. There are two more issues: readability and cache
locality.

Programs are not just communications to a computer, they are communications
to other human beings. Code readability is important even (or especially!)
when the audience of the communication is only your future self.

A clumsy, mechanical reordering of your structure can harm readability. When
possible, it is better to reorder fields so they remain in coherent groups
with semantically related pieces of data kept close together. Ideally, the
design of your structure should communicate the design of your program.

When your program frequently accesses a structure, or parts of a structure,
it is helpful for performance if the accesses tend to fit within a cache
line: the memory block fetched by your processor when it is told to get any
single address within the block. On 64-bit x86 a cache line is 64 bytes
beginning on a self-aligned address; on other platforms it is often 32 bytes.

The things you should do to preserve readability (grouping related and
co-accessed data in adjacent fields) also improve cache-line locality. These
are both reasons to reorder intelligently, with awareness of your code's data
access patterns.

If your code does concurrent access to a structure from multiple threads,
there's a third issue: cache line bouncing. To minimize expensive bus
traffic, you should arrange your data so that reads come from one cache line
and writes go to another in your tighter loops.

And yes, this sometimes contradicts the previous guidance about grouping
related data in the same cache-line-sized block. Multithreading is hard.
Cache-line bouncing and other multithread optimization issues are very
advanced topics which deserve an entire tutorial of their own. The best I can
do here is make you aware that these issues exist.
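
As one illustration of keeping reads and writes on separate cache lines, C11's `alignas` can force a write-hot field onto its own line. This is a sketch under stated assumptions, not a tuning recipe: the 64-byte line size is assumed rather than detected, and the struct and field names are invented.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: 64-byte lines, as on 64-bit x86 */

struct counter_block {
    /* Read-mostly configuration, kept together on the first line. */
    alignas(CACHE_LINE) unsigned long limit;
    unsigned long flags;
    /* Write-hot counter; alignas starts it on a separate cache line. */
    alignas(CACHE_LINE) unsigned long hits;
};
```

The price is deliberate padding: the struct grows to two full cache lines. That is the opposite of packing, which is the trade-off the text warns about.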

10. Other packing techniques

Reordering works best when combined with other techniques for slimming your
structures. If you have several boolean flags in a struct, for example,
consider reducing them to 1-bit bitfields and packing them into a place in
the structure that would otherwise be slop.

You'll take a small access-time penalty for this, but if it squeezes the
working set enough smaller, that penalty will be swamped by your gains from
avoided cache misses.
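
A sketch of the flag-squeezing idea follows, with invented struct and field names. Seven bool members push the first struct past the trailing slop, while seven 1-bit fields fit inside it on common compilers. The exact sizes are implementation-defined, so the code only relies on the bitfield version being no larger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node_bools {
    struct node_bools *next;
    short x;
    bool dirty, locked, visited, pinned, leaf, root, marked;
};

struct node_bits {
    struct node_bits *next;
    short x;
    unsigned int dirty:1;    /* the flags share storage that would */
    unsigned int locked:1;   /* otherwise be trailing padding      */
    unsigned int visited:1;
    unsigned int pinned:1;
    unsigned int leaf:1;
    unsigned int root:1;
    unsigned int marked:1;
};

/* Bytes saved per node by using bitfields for the flags. */
size_t flag_bytes_saved(void)
{
    return sizeof(struct node_bools) - sizeof(struct node_bits);
}
```

With GCC or clang on a 64-bit machine the sizes would be expected to come out as 24 and 16 bytes, but check with sizeof on your own toolchain.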

More generally, look for ways to shorten data field sizes. In
cvs-fast-export, for example, one squeeze I applied was to use the knowledge
that RCS and CVS repositories didn't exist before 1982. I dropped a 64-bit
Unix time_t (zero date at the beginning of 1970) for a 32-bit time offset
from 1982-01-01T00:00:00; this will cover dates to 2118. (Note: if you pull a
trick like this, do a bounds check whenever you set the field to prevent
nasty bugs!)
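
The bounds check might be sketched like this. It is not the code from cvs-fast-export; the names `EPOCH_1982`, `pack_time` and `unpack_time` are invented, and the sketch takes the time as an `int64_t` rather than a `time_t` so that its behavior does not depend on the platform's `time_t` width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unix time of 1982-01-01T00:00:00Z: 4383 days after the 1970 epoch. */
#define EPOCH_1982 INT64_C(378691200)

/* Convert a 64-bit Unix time to a 32-bit offset from 1982.
 * Returns false, leaving *out untouched, if the time cannot be
 * represented in the shortened field. */
bool pack_time(int64_t unix_time, uint32_t *out)
{
    if (unix_time < EPOCH_1982 || unix_time - EPOCH_1982 > UINT32_MAX)
        return false;
    *out = (uint32_t)(unix_time - EPOCH_1982);
    return true;
}

/* Recover the 64-bit Unix time from the packed offset. */
int64_t unpack_time(uint32_t offset)
{
    return EPOCH_1982 + (int64_t)offset;
}
```

A 32-bit count of seconds spans a little over 136 years, which is where the 2118 limit comes from.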

Each such field shortening not only decreases the explicit size of your
structure, it may remove slop and/or create additional opportunities for
gains from field reordering. Virtuous cascades of such effects are not very
hard to trigger.

The riskiest form of packing is to use unions. If you know that certain
fields in your structure are never used in combination with certain other
fields, consider using a union to make them share storage. But be extra
careful and verify your work with regression testing, because if your
lifetime analysis is even slightly wrong you will get bugs ranging from
crashes to (much worse) subtle data corruption.
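
One common way to reduce that risk is a tagged union: a small tag field records which member is live, and accessors check it before reading. The sketch below uses invented names throughout and is only one way to arrange this.

```c
#include <assert.h>
#include <stddef.h>

enum kind { KIND_FILE, KIND_DIR };

struct entry_plain {         /* both fields always present */
    enum kind kind;
    long file_size;
    long child_count;
};

struct entry {               /* the two fields share storage */
    enum kind kind;          /* says which union member is live */
    union {
        long file_size;      /* valid only when kind == KIND_FILE */
        long child_count;    /* valid only when kind == KIND_DIR */
    } u;
};

/* Checked accessor: returns -1 rather than reading the wrong member. */
long entry_file_size(const struct entry *e)
{
    return e->kind == KIND_FILE ? e->u.file_size : -1;
}
```

Routing every access through checked accessors like this one gives regression tests a single place to catch a wrong lifetime assumption.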

11. Overriding alignment rules

Sometimes you can coerce your compiler into not using the processor's normal
alignment rules by using a pragma, usually #pragma pack. GCC and clang have a
packed attribute, written __attribute__((packed)), that you can attach to
individual structure declarations; GCC has an -fpack-struct option for entire
compilations.

Do not do this casually, as it forces the generation of more expensive and
slower code. Usually you can save as much memory, or almost as much, with the
techniques I describe here.

The only good reason for #pragma pack is if you have to exactly match your C
data layout to some kind of bit-level hardware or protocol requirement, like
a memory-mapped hardware port, and violating normal alignment is required for
that to work. If you're in that situation, and you don't already know
everything else I'm writing about here, you're in deep trouble and I wish you
luck.
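
For reference, this is roughly what the pragma form looks like. The sketch assumes a compiler that accepts `#pragma pack(push, 1)` and `#pragma pack(pop)`, as GCC, clang and MSVC do; the struct and function names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct natural {             /* normal alignment rules */
    uint8_t tag;
    uint32_t value;
};

#pragma pack(push, 1)        /* no padding inside what follows */
struct packed {
    uint8_t tag;
    uint32_t value;          /* now at offset 1, misaligned */
};
#pragma pack(pop)            /* restore normal rules */

/* Reading the member through the struct is fine: the compiler knows it
 * is misaligned and emits suitable code. Passing &p->value around as a
 * plain uint32_t pointer would lose that knowledge. */
uint32_t packed_value(const struct packed *p)
{
    return p->value;
}
```

With 4-byte alignment for uint32_t, the natural struct is 8 bytes and the packed one is 5.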

12. Tools

The clang compiler has a -Wpadded option that causes it to generate messages
about alignment holes and padding. Some versions also have an undocumented
-fdump-record-layouts option that yields more information.

I have not used it myself, but several respondents speak well of a program
called pahole. This tool cooperates with a compiler to produce reports on
your structures that describe padding, alignment, and cache line boundaries.

I've received a report that a proprietary code auditing tool called
"PVS-Studio" can detect structure-packing opportunities.

13. Proof and exceptional cases

You can download source code for a little program that demonstrates the
assertions about scalar and structure sizes made above. It is packtest.c.

If you look through enough strange combinations of compilers, options, and
unusual hardware, you will find exceptions to some of the rules I have
described. They get more common as you go back in time to older processor
designs.

The next level beyond knowing these rules is knowing how and when to expect
that they will be broken. In the years when I learned them (the early 1980s)
we spoke of people who didn't get this as victims of "all-the-world's-a-VAX
syndrome". Remember that not all the world is a PC.

14. Related Reading

This section exists to collect pointers to essays on other advanced C topics
which I judge to be good companions to this one.

A Guide to Undefined Behavior in C and C++
Time, Clock, and Calendar Programming In C

15. Version history

1.14 @ 2015-12-19
    Typo correction: -Wpadding -> -Wpadded.
1.13 @ 2015-11-23
    Be explicit about padding bits being undefined. More about bitfields.
1.12 @ 2015-11-11
    Major revision of section on bitfields reflecting C99 rules.
1.11 @ 2015-07-23
    Mention the clang -fdump-record-layouts option.
1.10 @ 2015-02-20
    Mention attribute packed, -fpack-struct, and PVS-Studio.
1.9 @ 2014-10-01
    Added link to "Time, Clock, and Calendar Programming In C".
1.8 @ 2014-05-20
    Improved explanation for the bitfield examples.
1.7 @ 2014-05-17
    Correct a minor error in the description of the layout of struct foo8.
1.6 @ 2014-05-14
    Emphasize that bitfields cannot cross word boundaries. Idea from Dale
    Gulledge.
1.5 @ 2014-01-13
    Explain why structure-member reordering is not done automatically.
1.4 @ 2014-01-04
    A note about double under x86 Linux.
1.3 @ 2014-01-03
    New sections on awkward scalar cases, readability and cache locality,
    and tools.
1.2 @ 2014-01-02
    Correct an erroneous address calculation.
1.1 @ 2014-01-01
    Explain why aligned accesses are faster. Mention offsetof. Various minor
    fixes, including the packtest.c download link.
1.0 @ 2014-01-01
    Initial release.

http://www.catb.org/esr/structurepacking/
