

10.1 Deleting Files
One of the most common tasks that find is used for is locating files that can be deleted. This might include:
- Files last modified more than 3 years ago which haven't been accessed for at least 2 years
- Files belonging to a certain user
- Temporary files which are no longer required
This example concentrates on the actual deletion task rather than on sophisticated ways of locating the files that need to be deleted. We'll assume that the files we want to delete are old files underneath /var/tmp/stuff.
10.1.1 The Traditional Way
The traditional way to delete files in /var/tmp/stuff that have not been modified in over 90 days would have been:
find /var/tmp/stuff -mtime +90 -exec /bin/rm {} \;

The above command uses -exec to run the /bin/rm command to remove each file. This approach works and in fact would have worked in Version 7 Unix in 1979. However, there are a number of problems with this approach.
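One cautious way to try the traditional command is to run it against a scratch directory first. The sketch below assumes a POSIX shell with mktemp available; the scratch directory and the file names old.log and new.log are invented for illustration and stand in for /var/tmp/stuff.

```shell
# Scratch directory standing in for /var/tmp/stuff (hypothetical test data).
scratch=$(mktemp -d)

# One file with a modification time far in the past, one created just now.
touch -t 200001010000 "$scratch/old.log"
touch "$scratch/new.log"

# The traditional form: one /bin/rm process is started per matching file.
find "$scratch" -mtime +90 -exec /bin/rm {} \;

# List what survived the clean-up.
remaining=$(ls "$scratch")
echo "$remaining"
```

Only the file older than 90 days is removed, so the listing should show new.log alone.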
The most obvious problem with the approach above is that it causes find to fork every time it finds a file that it needs to delete, and the child process then has to use the exec system call to launch /bin/rm. All this is quite inefficient. If we are going to use /bin/rm to do this job, it is better to make it delete more than one file at a time.
The most obvious way of doing this is to use the shell's command expansion feature:
/bin/rm `find /var/tmp/stuff -mtime +90 -print`

or you could use the more modern form
/bin/rm $(find /var/tmp/stuff -mtime +90 -print)

The commands above are much more efficient than the first attempt. However, there is a problem with them. The shell has a maximum command length which is imposed by the operating system (the actual limit varies between systems). This means that while the command expansion technique will usually work, it will suddenly fail when there are lots of files to delete. Since the task is to delete unwanted files, this is precisely the time we don't want things to go wrong.
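The size of this limit can be inspected on a given system. The sketch below assumes the POSIX getconf utility is installed; the variable name arg_max is chosen here only for illustration.

```shell
# Ask the system for the maximum combined size, in bytes, of the
# arguments and environment that can be passed to a new process.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max bytes"
```

Every path name produced by the command expansion counts against this total, which is why a large enough batch of files makes the rm command fail to start.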

10.1.2 Making Use of xargs
So, is there a way to be more efficient in the use of fork() and exec() without running up against this limit? Yes, we can be almost optimally efficient by making use of the xargs command. The xargs command reads arguments from its standard input and builds them into command lines. We can use it like this:
find /var/tmp/stuff -mtime +90 -print | xargs /bin/rm

For example if the files found by find are /var/tmp/stuff/A, /var/tmp/stuff/B and /var/tmp/stuff/C then xargs might issue the commands
/bin/rm /var/tmp/stuff/A /var/tmp/stuff/B
/bin/rm /var/tmp/stuff/C

The above assumes that xargs has a very small maximum command line length. The real limit is much larger but the idea is that xargs will run /bin/rm as many times as necessary to get the job done, given the limits on command line length.
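The batching can be observed without deleting anything. In this sketch the -n 2 option is used only to imitate a very small limit, and echo prints each command that xargs builds instead of running /bin/rm; the three path names are the ones from the example and need not exist.

```shell
# -n 2 caps each command line at two arguments, imitating a tiny length
# limit; echo shows the command that would have been run.
batches=$(printf '%s\n' /var/tmp/stuff/A /var/tmp/stuff/B /var/tmp/stuff/C |
    xargs -n 2 echo /bin/rm)
echo "$batches"
```

The output is two lines, one per command that xargs would have issued.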
This usage of xargs is pretty efficient, and the xargs command is widely implemented (all modern versions of Unix offer it). So far then, the news is all good. However, there is bad news too.
10.1.3 Unusual characters in filenames
Unix-like systems allow any characters to appear in file names with the exception of the ASCII NUL character and the slash. Slashes can occur in path names (as the directory separator) but not in the names of actual directory entries. This means that the list of files that xargs reads could in fact contain white space characters: spaces, tabs and newline characters. Since by default, xargs assumes that the list of files it is reading uses white space as an argument separator, it cannot correctly handle the case where a filename actually includes white space. This makes the default behaviour of xargs almost useless for handling arbitrary data.
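The effect is easy to reproduce in a scratch directory. The sketch below assumes mktemp is available and that the scratch path itself contains no white space; the file name "two words" is invented for illustration.

```shell
scratch=$(mktemp -d)
touch "$scratch/two words"

# find prints a single path, but xargs splits it at the space, so echo
# (run once per argument because of -n 1) is invoked more than once.
args_seen=$(find "$scratch" -type f -print | xargs -n 1 echo | wc -l | tr -d ' ')
echo "files found: 1, arguments seen by xargs: $args_seen"
```

With /bin/rm in place of echo, neither half of the split name would refer to the real file, and one of them might name a different file altogether.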
To solve this problem, GNU findutils introduced the -print0 action for find. This uses the ASCII NUL character to separate the entries in the file list that it produces. This is the ideal choice of separator since it is the only character that cannot appear within a path name. The -0 option to xargs makes it assume that arguments are separated with ASCII NUL instead of white space. It also turns off another misfeature in the default behaviour of xargs, which is that it pays attention to quote characters in its input. Some versions of xargs also terminate when they see a lone _ in the input, but GNU xargs no longer does that (since it has become an optional behaviour in the Unix standard).
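The quote handling and the effect of -0 can be compared directly. This sketch assumes an xargs that supports -0 and a printf that emits a NUL byte for \0 in its format string; the sample text is invented for illustration.

```shell
# Default mode: the double quotes are interpreted, so this one line is
# turned into three arguments (a, b c, and d).
default_count=$(printf 'a "b c" d\n' | xargs -n 1 echo | wc -l | tr -d ' ')

# With -0 the input is split only at NUL and quotes are ordinary
# characters, so the same text arrives as a single argument.
nul_count=$(printf 'a "b c" d\0' | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "default: $default_count arguments, with -0: $nul_count argument"
```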
So, putting find -print0 together with xargs -0 we get this command:
find /var/tmp/stuff -mtime +90 -print0 | xargs -0 /bin/rm

The result is an efficient way of proceeding that correctly handles all the possible characters that could appear in the list of files to delete. This is good news. However, there is, as I'm sure you're expecting, also more bad news. The problem is that this is not a portable construct; although other versions of Unix (notably BSD-derived ones) support -print0, it's not universal. So, is there a more universal mechanism?

10.1.4 Going back to -exec
There is indeed a more universal mechanism, which is a slight modification to the -exec action. The normal -exec action assumes that the command to run is terminated with a semicolon (the semicolon normally has to be quoted in order to protect it from interpretation as the shell command separator). The SVR4 edition of Unix introduced a slight variation, which involves terminating the command with + instead:
find /var/tmp/stuff -mtime +90 -exec /bin/rm {} \+

The above use of -exec causes find to build up a long command line and then issue it. This can be less efficient than some uses of xargs; for example xargs allows new command lines to be built up while the previous command is still executing, and allows you to specify a number of commands to run in parallel. However, the find ... -exec ... + construct has the advantage of wide portability. GNU findutils did not support -exec ... + until version 4.2.12; one of the reasons for this is that it already had the -print0 action in any case.
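The difference between the two terminators can be counted by substituting echo for /bin/rm. The sketch below assumes mktemp is available; the files A, B and C are created in a scratch directory purely for the demonstration.

```shell
scratch=$(mktemp -d)
touch "$scratch/A" "$scratch/B" "$scratch/C"

# Terminated with ; the command runs once per file: one output line each.
per_file=$(find "$scratch" -type f -exec echo {} \; | wc -l | tr -d ' ')

# Terminated with + the names are batched; three short names fit easily
# into a single command line, so echo runs only once.
batched=$(find "$scratch" -type f -exec echo {} + | wc -l | tr -d ' ')

printf 'invocations with ";": %s, with "+": %s\n' "$per_file" "$batched"
```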
10.1.5 A more secure version of -exec
The command above seems to be efficient and portable. However, within it lurks a security problem. The problem is shared with all the commands we've tried in this worked example so far, too. The security problem is a race condition; that is, if it is possible for somebody to manipulate the filesystem that you are searching while you are searching it, it is possible for them to persuade your find command to cause the deletion of a file that you can delete but they normally cannot.
The problem occurs because the -exec action is defined by the POSIX standard to invoke its command with the same working directory as find had when it was started. This means that the arguments which replace the {} include a relative path from find's starting point down to the file that needs to be deleted. For example,
find /var/tmp/stuff -mtime +90 -exec /bin/rm {} \+

might actually issue the command:
/bin/rm /var/tmp/stuff/A /var/tmp/stuff/B /var/tmp/stuff/passwd

Notice the file /var/tmp/stuff/passwd. Likewise, the command:
cd /var/tmp && find stuff -mtime +90 -exec /bin/rm {} \+

might actually issue the command:
/bin/rm stuff/A stuff/B stuff/passwd

If an attacker can rename stuff to something else (making use of their write permissions in /var/tmp) they can replace it with a symbolic link to /etc. That means that the /bin/rm command will be invoked on /etc/passwd. If you are running your find command as root, the attacker has just managed to delete a vital file. All they needed to do to achieve this was replace a subdirectory with a symbolic link at the vital moment.
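The sequence of events can be replayed deterministically, and harmlessly, inside a scratch directory. In this sketch the directory named etc is only a stand-in for /etc, stuff.moved is an invented name, and the attacker's steps are carried out by hand at the point where the race would be won; the real /etc is never touched.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/stuff" "$scratch/etc"      # "etc" stands in for /etc
touch "$scratch/stuff/passwd" "$scratch/etc/passwd"

(
    cd "$scratch" || exit 1
    # find has already decided to remove stuff/passwd. Before rm runs,
    # the attacker swaps the directory for a symbolic link.
    mv stuff stuff.moved
    ln -s etc stuff
    # rm resolves stuff/passwd through the link and removes etc/passwd.
    /bin/rm stuff/passwd
)

ls "$scratch/etc" "$scratch/stuff.moved"
```

Afterwards the stand-in etc directory is empty, while the file that was meant to be deleted still exists under stuff.moved.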
There is, however, a simple solution to the problem. This is an action which works a lot like -exec but doesn't need to traverse a chain of directories to reach the file that it needs to work on. This is the -execdir action, which was introduced by the BSD family of operating systems. The command,
find /var/tmp/stuff -mtime +90 -execdir /bin/rm {} \+

might delete a set of files by performing these actions:
1. Change directory to /var/tmp/stuff/foo
2. Invoke /bin/rm ./file1 ./file2 ./file3
3. Change directory to /var/tmp/stuff/bar
4. Invoke /bin/rm ./file99 ./file100 ./file101
This is a much more secure method. We are no longer exposed to a race condition. For many typical uses of find, this is the best strategy. It's reasonably efficient, but the length of the command line is limited not just by the operating system limits, but also by how many files we actually need to delete from each directory.
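The change of directory can be made visible by running pwd in place of /bin/rm. This sketch assumes a find that supports -execdir and a PATH that does not include the current directory (GNU find is documented as refusing to run -execdir otherwise); the directories foo and bar and their files are created only for the demonstration.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/foo" "$scratch/bar"
touch "$scratch/foo/file1" "$scratch/bar/file99"

# pwd reports the directory that find changed into before running the
# command for each file.
dirs=$(find "$scratch" -type f -execdir pwd \; | sort)
echo "$dirs"
```

Each line of output names the directory that contains the matched file, not the directory from which find was started.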
Is it possible to do any better? In the case of general file processing, no. However, in the specific case of deleting files it is indeed possible to do better.
10.1.6 Using the -delete action
The most efficient and secure method of solving this problem is to use the -delete action:
find /var/tmp/stuff -mtime +90 -delete

This alternative is more efficient than any of the -exec or -execdir actions, since it entirely avoids the overhead of forking a new process and using exec to run /bin/rm. It is also normally more efficient than xargs for the same reason. The file deletion is performed from the directory containing the entry to be deleted, so the -delete action has the same security advantages as the -execdir action has.
The -delete action was introduced by the BSD family of operating systems.
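A quick way to check the behaviour of -delete is again a scratch directory. The sketch assumes a find that supports -delete and a system with mktemp; the file names old.tmp and new.tmp are invented for illustration.

```shell
scratch=$(mktemp -d)
touch -t 200001010000 "$scratch/old.tmp"   # modified long ago
touch "$scratch/new.tmp"                   # modified just now

# find removes the matching entries itself; no rm process is started.
find "$scratch" -mtime +90 -delete

remaining=$(ls "$scratch")
echo "$remaining"
```

Because find evaluates its expression from left to right, -delete belongs after the tests; placed before -mtime it would be applied to every entry under the starting point.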
10.1.7 Improving things still further
Is it possible to improve things still further? Not without either modifying the system library to the operating system or having more specific knowledge of the layout of the filesystem and disk I/O subsystem, or both.
The find command traverses the filesystem, reading directories. It then issues a separate system call for each file to be deleted. If we could modify the operating system, there are potential gains that could be made:
- We could have a system call to which we pass more than one filename for deletion
- Alternatively, we could pass in a list of inode numbers (on GNU/Linux systems, readdir() also returns the inode number of each directory entry) to be deleted.
The above possibilities sound interesting, but from the kernel's point of view it is difficult to enforce standard Unix access controls for such processing by inode number. Such a facility would probably need to be restricted to the superuser.
Another way of improving performance would be to increase the parallelism of the process. For example if the directory hierarchy we are searching is actually spread across a number of disks, we might somehow be able to arrange for find to process each disk in parallel. In practice GNU find doesn't have such an intimate understanding of the system's filesystem layout and disk I/O subsystem.
However, since the system administrator can have such an understanding they can take advantage of it like so:
find /var/tmp/stuff1 -mtime +90 -delete &
find /var/tmp/stuff2 -mtime +90 -delete &
find /var/tmp/stuff3 -mtime +90 -delete &
find /var/tmp/stuff4 -mtime +90 -delete &
wait

In the example above, four separate instances of find are used to search four subdirectories in parallel. The wait command simply waits for all of these to complete. Whether this approach is more or less efficient than a single instance of find depends on a number of things:
- Are the directories being searched in parallel actually on separate disks? If not, this parallel search might just result in a lot of disk head movement and so the speed might even be slower.
- Other activity: are other programs also doing things on those disks?
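A bare wait reports success however the background jobs ended, so a script that needs to know whether every find instance succeeded can wait on each process ID in turn. The following is a sketch of that variation, not part of the original example; it assumes a find with -delete, and the scratch directories stuff1 to stuff4 stand in for the ones under /var/tmp.

```shell
scratch=$(mktemp -d)
pids=""
for n in 1 2 3 4; do
    mkdir "$scratch/stuff$n"
    touch -t 200001010000 "$scratch/stuff$n/old"
    # Start one find per subdirectory in the background; remember its PID.
    find "$scratch/stuff$n" -mtime +90 -delete &
    pids="$pids $!"
done

# Waiting on each PID in turn exposes that job's exit status.
failures=0
for pid in $pids; do
    wait "$pid" || failures=$((failures + 1))
done
echo "find instances that failed: $failures"
```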
10.1.8 Conclusion
The fastest and most secure way to delete files with the help of find is to use -delete. Using xargs -0 -P N can also make effective use of the disk, but it is not as secure.
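For reference, the xargs -0 -P N form might be used as sketched below. The example assumes a find with -print0 and an xargs that supports both -0 and -P; the value 4 for N, the -n 2 batch size and the file names are arbitrary choices made for the demonstration.

```shell
scratch=$(mktemp -d)
for n in 1 2 3 4 5 6 7 8; do
    touch -t 200001010000 "$scratch/old$n"
done
touch "$scratch/new"

# -P 4 allows up to four rm processes at once; -n 2 keeps each batch
# small so that this tiny example actually produces several commands.
find "$scratch" -type f -mtime +90 -print0 | xargs -0 -n 2 -P 4 /bin/rm

remaining=$(ls "$scratch")
echo "$remaining"
```

Like every xargs pipeline in this example, this form passes full path names to /bin/rm and so remains exposed to the race condition described in section 10.1.5.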
In the case where we're doing things other than deleting files, the most secure alternative is -execdir ... +, but this is not as portable as the insecure action -exec ... +.
The -delete action is not completely portable, but the only other possibility which is as secure (-execdir) is no more portable. The most efficient portable alternative is -exec ... +, but this is insecure and isn't supported by versions of GNU findutils prior to 4.2.12.
