2015/7/18

Building ManifoldCF

Last Published: 05/05/2015 08:23:01

Building ManifoldCF
Building ManifoldCF
Building overview
Building the framework and the connectors using Apache Ant
Building and testing the legacy Alfresco connector
Building and testing the Alfresco Webscript connector
Building and running the Documentum connector
Building and running the FileNet connector
Building and running the JDBC connector, including Oracle, MSSQL, MySQL, SQLServer, and Sybase JDBC drivers
Building and running the jCIFS/Windows Shares connector
Building and running the LiveLink connector
Building the Meridio connector
Building and running the SharePoint connector
Running the Apache Solr output connector
Running the ElasticSearch output connector
Building the framework and the connectors using Apache Maven
Preparation
How to build
Building ManifoldCF's Apache2 plugin
Running ManifoldCF
Overview
Binary organization
Example deployments
Quick-start single process model
Single-process deployable war
Simplified multiprocess model using file-based synchronization
Simplified multiprocess model using ZooKeeper-based synchronization
Command-driven multiprocess model
The connectors.xml configuration file
Running connector-specific processes
Database selection
Configuring a PostgreSQL database
Configuring a MySQL database
Configuring an HSQLDB database
The ManifoldCF configuration files
properties.xml file properties
Logging configuration file properties
Running the ManifoldCF Apache2 plugin
Configuring the ManifoldCF Apache2 plugin
Running ManifoldCF with Apache Maven
Integrating ManifoldCF into another application
Integrating the Quick-Start example
Integrating a multiprocess setup
Integrating ManifoldCF with a search engine

Building ManifoldCF
ManifoldCF consists of a framework, a set of connectors, and an optional Apache2 plugin module. These can be built as follows.
Building overview
There are two ways to build ManifoldCF. The primary (and best supported) means of building is via Apache Ant. The ant build is
used to create ManifoldCF releases and to run tests, load tests, and UI tests. Maven is also supported, for development builds only. Maven
ManifoldCF builds have many restrictions and challenges and are of secondary priority for the development team.
http://manifoldcf.apache.org/release/release2.1/en_US/howtobuildanddeploy.html#Running+ManifoldCF


The ManifoldCF framework is built without any dependencies on connector code. It consists of a set of jars, a family of web applications,
and a number of java command classes. Connectors are then built that have well-defined dependencies on the framework modules. A
properly built connector typically consists of:
One or more jar files meant to be included in the library area meant for connector jars and their dependencies.
Possibly some java commands, which are meant to support or configure the connector in some way.
Possibly a connector-specific process or two, each requiring a distinct classpath, which usually serves to isolate the crawler-ui servlet, authority-service servlet, agents process, and any commands from problematic aspects of the client environment.
A recommended set of java "define" variables, which should be used consistently with all involved processes, e.g. the agents process, the application server running the authority-service and crawler-ui, and any commands. (This is historical, and no connectors as of this writing have any of these any longer.)
An individual connector package will typically supply an output connector, or a transformation connector, or a mapping connector, or a
repository connector, or sometimes both a repository connector and an authority connector. The main ant build script automatically forms
each individual connector's contribution to the overall system into the overall package.
Building the framework and the connectors using Apache Ant
To build the ManifoldCF framework code, and the particular connectors you are interested in, you currently need to do the following:
1. Check out the desired release from https://svn.apache.org/repos/asf/manifoldcf/tags, or unpack the desired source distribution.
2. cd to the top-level directory.
3. EITHER: overlay the lib directory from the corresponding lib distribution (preferred, where possible), OR run "ant make-core-deps" to build the code dependencies. The latter is the only possibility if you are building from trunk, but it is not guaranteed to work for older releases.
4. Run "ant make-deps", to download LGPL and other open source but non-Apache compatible libraries.
5. Install proprietary build dependencies. See below for details.
6. Run "ant build".
7. Install desired dependent proprietary libraries. See below for details.
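The automatable steps above (2, 3, 4, and 6) can be sketched as a small shell helper. This is only an illustration: the names run and build_manifoldcf and the DRY_RUN variable are hypothetical, the sketch assumes ant is on the PATH and that the source tree is already checked out or unpacked, and it takes the "ant make-core-deps" alternative of step 3. Steps 5 and 7 remain manual.

```shell
#!/bin/sh
# Sketch of the Ant build sequence from the numbered steps above.
# Assumptions: "run" and "build_manifoldcf" are hypothetical helper names,
# ant is installed, and the source tree already exists on disk.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: top-level directory of the ManifoldCF source tree
build_manifoldcf() {
  run cd "$1" || return 1              # step 2
  run ant make-core-deps || return 1   # step 3, second alternative
  run ant make-deps || return 1        # step 4
  run ant build                        # step 6
}
```

Setting DRY_RUN=1 before calling build_manifoldcf with the source directory prints the four commands without running them, which is a cheap way to review the sequence first.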
If you do not run the ant "make-deps" target, and you supply NO LGPL or proprietary libraries, not all capabilities of ManifoldCF will be
available. The framework itself and the following repository connectors will be built:
Alfresco Webscript connector
CMIS connector
EMC Documentum connector, built against a Documentum API stub
DropBox connector
Email connector
FileNet connector, built against a FileNet API stub
WGET-compatible filesystem connector
Generic XML repository connector
Google Drive connector
GridFS connector (MongoDB)
HDFS connector
JDBC connector, with just the PostgreSQL JDBC driver
Atlassian Jira connector
OpenText LiveLink connector, built against a LiveLink API stub
Meridio connector, built against modified Meridio API WSDLs and XSDs
RSS connector
Microsoft SharePoint connector, built against SharePoint API WSDLs
Web crawler connector
Wiki connector
The following authority connectors will be built:
Active Directory authority
Alfresco Webscript authority
CMIS authority
EMC Documentum authority
Atlassian Jira authority
LDAP authority
OpenText LiveLink authority
Meridio authority, built against modified Meridio API WSDLs and XSDs
Null authority
Microsoft SharePoint/AD authority
Microsoft SharePoint/Native authority, built against SharePoint API WSDLs
The following output connectors will be built:
WGET-compatible filesystem output connector
MetaCarta GTS output connector
Apache Solr output connector
OpenSearchServer output connector
ElasticSearch output connector
HDFS output connector
Null output connector
The following transformation connectors will be built:
Field mapping transformation connector
Document filter transformation connector
Null transformation connector
Tika extractor transformation connector
The following mapping connectors will be built:
Regular-expression mapping connector
The dependencies and build limitations of each individual LGPL and proprietary connector are described in separate sections below.
Building and testing the legacy Alfresco connector

The legacy Alfresco connector requires the Alfresco Web Services Client provided by Alfresco in order to be built. Place this jar into the
directory connectors/alfresco/lib-proprietary before you build. This will occur automatically if you execute the ant target "make-deps"
from the ManifoldCF root directory.
To run integration tests for the connector you have to copy the alfresco.war including H2 support created by the Maven module
test-materials/alfresco-4-war (using "mvn package" inside the folder) into the connectors/alfresco/test-materials-proprietary folder. Then
use "ant test" or "mvn integration-test" for the standard build to execute integration tests.
Building and testing the Alfresco Webscript connector

The Alfresco Webscript connector is built against an open-source Alfresco Indexer client, which requires a corresponding Alfresco Indexer
plugin to be installed on your Alfresco instance. This Alfresco Indexer plugin is included with ManifoldCF distributions. Installation of the
plugin should follow the standard Alfresco installation steps, as described here. See this page for configuration details, and for the plugin
itself.
Building and running the Documentum connector

The Documentum connector requires EMC's DFC product in order to be run, but is built against a set of stub classes. The stubs mimic the
class structure of DFC 6.0. If your DFC is newer, it is possible that the class structure of the DFC classes might have changed, and you
may need to build the connector yourself.
If you need to supply DFC classes during build time, copy the DFC and dependent jars to the source directory
connectors/documentum/lib-proprietary, and build using "ant build". The jars will be copied into the right place in your dist directory
automatically.
For a binary distribution, just copy the DFC jars to processes/documentum-server/lib-proprietary instead.
If you have done everything right, you should be able to start the Documentum connector's registry and server processes, as per the
instructions.
Building and running the FileNet connector

The FileNet connector requires IBM's FileNet P8 API jar in order to be run, but is usually built against a set of stub classes. The stubs
mimic the class structure of FileNet P8 API 4.5. If your FileNet is newer, it is possible that the class structure of the API might have
changed, and you may need to build the connector yourself.
If you need to supply your own Jace.jar at build time, copy it to the source directory connectors/filenet/lib-proprietary, and build using
"ant build". The Jace.jar will be copied into the right place in your dist directory automatically.
If you do not wish to build, simply copy your Jace.jar and the other dependent jars from that installation into the distribution directory
processes/filenet-server/lib-proprietary.
If correctly done, you will be able to start the FileNet connector's registry and server processes, as per the instructions.
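Copying vendor jars into a lib-proprietary directory, as described for the Documentum and FileNet connectors, can be sketched as follows. The helper name install_proprietary_jars is hypothetical, and the sketch assumes the hyphenated directory name lib-proprietary. It refuses to copy into a directory that does not exist, since a mistyped destination would otherwise go unnoticed.

```shell
#!/bin/sh
# Sketch: copy vendor-supplied jars into a lib-proprietary directory.
# Assumption: install_proprietary_jars is a hypothetical helper, not part
# of the ManifoldCF distribution.

# $1: destination lib-proprietary directory; remaining arguments: jar files
install_proprietary_jars() {
  dest="$1"
  shift
  if [ ! -d "$dest" ]; then
    echo "missing directory: $dest" >&2
    return 1
  fi
  for jar in "$@"; do
    if [ ! -f "$jar" ]; then
      echo "missing jar: $jar" >&2
      return 1
    fi
    cp "$jar" "$dest/" || return 1
  done
}
```

For a binary distribution, a call might look like install_proprietary_jars processes/filenet-server/lib-proprietary /path/to/Jace.jar, where the jar path is a placeholder for wherever your FileNet installation keeps it.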
Building and running the JDBC connector, including Oracle, MSSQL, MySQL, SQLServer, and Sybase JDBC drivers

The JDBC connector also knows how to work with Oracle, SQLServer, and Sybase JDBC drivers. In order to support these databases, start
by placing the mysql-connector-java.jar and the jtds.jar in the lib-proprietary directory. The ant target "make-deps" will do this for you
automatically. For Oracle, download the appropriate Oracle JDBC jar from the Oracle site, and copy it into the same directory before you
build ManifoldCF.
Building and running the jCIFS/Windows Shares connector

To build this connector, you need to download jcifs.jar from http://jcifs.samba.org, and copy it into the connectors/jcifs/lib-proprietary
directory before building. You can also just type "ant make-deps" from the root ManifoldCF directory and this step will be done for you.
If you have downloaded a binary distribution, place the jcifs.jar into the connector-lib-proprietary directory, and uncomment the
Windows Shares line in the connectors.xml file.
Building and running the LiveLink connector

This connector needs OpenText's LAPI package in order to be run. It is usually built against a set of stubs. The stubs, however, mimic the
class structure of LAPI 9.7.1. Later versions (such as 10.x) have a different class structure. Therefore, you may need to rebuild ManifoldCF
against your lapi.jar, in order for the connector to work properly.
If you need to supply your own lapi.jar and llssl.jar at build time, copy them to the source directory connectors/livelink/lib-proprietary, and
build using "ant build". The jars will be copied into the right place in your dist directory automatically.
If you do not wish to build, simply copy your lapi.jar and llssl.jar into the binary distribution's connector-lib-proprietary directory, and
uncomment the LiveLink-related connector lines in the connectors.xml file.
Building the Meridio connector

The Meridio connector generates interface classes using checked-in wsdls and xsds originally obtained from an installed Meridio instance
using disco.exe, and subsequently modified to work around limitations in Apache Axis. The disco.exe utility is installed as part of
Microsoft Visual Studio, and is typically found under "c:\Program Files\Microsoft SDKs\Windows\V6.x\bin". If desired, you can obtain
unmodified wsdls and xsds by interrogating the following Meridio web services:
http[s]://<meridio_server>/DMWS/MeridioDMWS.asmx
http[s]://<meridio_server>/RMWS/MeridioRMWS.asmx
Building and running the SharePoint connector

The SharePoint connector generates interface classes using checked-in wsdls originally obtained from an installed Microsoft SharePoint
instance using disco.exe. The disco.exe utility is installed as part of Microsoft Visual Studio, and is typically found under "c:\Program
Files\Microsoft SDKs\Windows\V6.x\bin". If desired, you can obtain unmodified wsdls by interrogating the following SharePoint web
services:
http[s]://<server_name>/_vti_bin/Permissions.asmx
http[s]://<server_name>/_vti_bin/Lists.asmx
http[s]://<server_name>/_vti_bin/Dspsts.asmx
http[s]://<server_name>/_vti_bin/usergroup.asmx
http[s]://<server_name>/_vti_bin/versions.asmx
http[s]://<server_name>/_vti_bin/webs.asmx
Important: For SharePoint instances version 3.0 (2007) or higher, in order to run the connector, you also must deploy a custom
SharePoint web service on the SharePoint instance you intend to connect to. This is required because Microsoft overlooked support for
web-service-based access to file and folder security information when SharePoint 2007 was released. For SharePoint version 4.0 (2010),
the service is even more critical, because backwards compatibility was not maintained and without this service no crawling can occur.
SharePoint version 5.0 (2013) also requires a plugin. Although its functionality is the same as for SharePoint 4.0, the binary you install is
built against SharePoint 2013 resources rather than SharePoint 2010 resources, so there is a different distribution.
The versions of this service can be found in the distribution directory plugins/sharepoint. Pick the version appropriate for your
SharePoint installation, and install it following the instructions in the file Installation Readme.txt, found in the corresponding
directory.
Running the Apache Solr output connector

The Apache Solr output connector requires no special attention to build or run within ManifoldCF. However, in order for Apache Solr to be
able to enforce document security, you must install and properly configure a plugin for Solr. This plugin is available for both Solr 3.x and
for Solr 4.x, and can be used either as a query parser plugin, or as a search component. Additional index fields are also required to contain
the necessary security information. Much more information can be found in the README.txt file in the plugins themselves.
The correct versions of the plugins are included in the plugins/solr directory of the main ManifoldCF distribution. You can also download
updated versions of the plugins from the ManifoldCF download page. The compatibility matrix is as follows:

Apache ManifoldCF Solr 3.x and Solr 4.x plugin compatibility
ManifoldCF versions    Plugin version
0.1.x-1.4.x            0.x
1.5.x                  1.x
>=1.6.x                2.x

If the proper version of the plugin is not deployed on Solr, documents will not be properly secured. Thus, it is essential to
verify that the proper plugin version has been deployed for the version of ManifoldCF you are using.
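The compatibility matrix can be encoded as a small lookup, which may be handy in a deployment check. This is a sketch only: the function name plugin_series_for is hypothetical, it assumes a dotted numeric version string such as 1.5.1, and it treats every release from 1.6 onward (including 2.x) as belonging to the ">=1.6.x" row.

```shell
#!/bin/sh
# Sketch: map a ManifoldCF version to the plugin series from the matrix
# above. Assumption: plugin_series_for is a hypothetical helper name.

# $1: ManifoldCF version such as 1.5.1 or 2.1
plugin_series_for() {
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # second version component
  case "$major" in
    0) echo "0.x" ;;
    1)
      if [ "$minor" -le 4 ]; then
        echo "0.x"
      elif [ "$minor" -eq 5 ]; then
        echo "1.x"
      else
        echo "2.x"
      fi
      ;;
    *) echo "2.x" ;;
  esac
}
```

The same rows appear in the ElasticSearch plugin matrix, so the lookup could be reused there under the same assumptions.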
Running the ElasticSearch output connector


The ElasticSearch output connector requires no special attention to build or run within ManifoldCF. However, in order for ElasticSearch to
be able to enforce document security, you must install, properly configure, and code against a toolkit plugin for ElasticSearch. Additional
index fields are also required to contain the necessary security information. Much more information can be found in the README.txt file
in the plugin itself.
The correct version of the plugin is included in the plugins/elasticsearch directory of the main ManifoldCF distribution. You can also
download updated versions of the plugin from the ManifoldCF download page. The compatibility matrix is as follows:

Apache ManifoldCF ElasticSearch plugin compatibility
ManifoldCF versions    Plugin version
0.1.x-1.4.x            0.x
1.5.x                  1.x
>=1.6.x                2.x

If the proper version of the plugin is not deployed and properly integrated, documents will not be properly secured.
Thus, it is essential to verify that the proper plugin version has been deployed for the version of ManifoldCF you are using.
To work with ManifoldCF, your ElasticSearch instance must also have the appropriate indexes created. Here are some simple
steps for creating an ElasticSearch index, using the curl utility:
% curl -XPUT 'http://localhost:9200/manifoldcf'
% curl -XPUT 'http://localhost:9200/manifoldcf/attachment/_mapping' -d '
{
  "attachment": {
    "_source": {
      "excludes": ["file"]
    },
    "properties": {
      "allow_token_document": {
        "type": "string"
      },
      "allow_token_parent": {
        "type": "string"
      },
      "allow_token_share": {
        "type": "string"
      },
      "attributes": {
        "type": "string"
      },
      "createdOn": {
        "type": "string"
      },
      "deny_token_document": {
        "type": "string"
      },
      "deny_token_parent": {
        "type": "string"
      },
      "deny_token_share": {
        "type": "string"
      },
      "lastModified": {
        "type": "string"
      },
      "shareName": {
        "type": "string"
      },
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "file": {
            "store": true,
            "term_vector": "with_positions_offsets",
            "type": "string"
          }
        }
      }
    }
  }
}'

This command creates an index called manifoldcf with a mapping named attachment, which has some generic fields for access
tokens and a field file which makes use of the ElasticSearch attachment-mapper plugin. It is configured for highlighting
("term_vector":"with_positions_offsets").
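Because document security depends on the access-token fields being present, it can be worth checking a mapping file before sending it to ElasticSearch. The sketch below is a rough textual check, not a JSON parse; the helper name missing_token_fields is hypothetical, and it assumes the mapping body has been saved to a local file.

```shell
#!/bin/sh
# Sketch: list access-token fields that are absent from a mapping file.
# Assumption: missing_token_fields is a hypothetical helper; it only looks
# for the quoted field names and does not validate the JSON structure.

# $1: path to a file holding the mapping JSON
missing_token_fields() {
  for field in allow_token_document allow_token_parent allow_token_share \
               deny_token_document deny_token_parent deny_token_share; do
    grep -q "\"$field\"" "$1" || echo "$field"
  done
}
```

An empty result means all six token fields from the mapping above were found; any names printed are the ones still missing.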

The following part keeps the source JSON for the file field from being saved in the index, which reduces the index size significantly. Be aware that you
should not do this if you will need to reindex data on the ElasticSearch side or if you need access to the whole document:
"_source": {
  "excludes": ["file"]
},

Building the framework and the connectors using Apache Maven


ManifoldCF includes some support for building jars under Maven. Apache Ant is considered to be ManifoldCF's primary build system, so
your mileage with Maven may vary.
The biggest limitation of the current Maven build is that it does not support any of the proprietary connectors or the multiprocess model
of execution. The build includes only the Apache-licensed and LGPL-licensed connectors, thus avoiding conditional compilation, and
executes under Maven using only the Quick-Start example.
Canned versions of all configuration files are included as resources. If you want to change the configuration in any way, you will need to
rebuild with Maven accordingly.
Preparation

No special preparation is required, other than to have access to the Apache Maven repository.
How to build

Building is straightforward. In the ManifoldCF root, type:
mvn clean install

This should generate all the necessary artifacts to run with, and also run the Hsqldb-based tests.
To build and skip only the integration tests, type:
mvn clean install -DskipITs

When you have the default package installed locally in your Maven repository, to only build ManifoldCF artifacts, type:
mvn clean package

NOTE: Due to current limitations in the ManifoldCF Maven poms, you MUST run a complete "mvn clean install" as the first step. You
cannot skip steps, or the build will fail.
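The rule in the NOTE can be sketched as a small decision helper that picks the Maven command. This is an illustration under stated assumptions: the function name choose_mvn_command is hypothetical, and the check assumes that ManifoldCF artifacts install under org/apache/manifoldcf in the local repository, which is a heuristic rather than a guarantee that a complete install has been run.

```shell
#!/bin/sh
# Sketch: choose between a full install and a package-only build.
# Assumptions: choose_mvn_command is a hypothetical helper, and the
# org/apache/manifoldcf path in the local repository is used as a rough
# indicator that "mvn clean install" has already been run once.

# $1: path to the local Maven repository (commonly ~/.m2/repository)
choose_mvn_command() {
  if [ -d "$1/org/apache/manifoldcf" ]; then
    echo "mvn clean package"
  else
    echo "mvn clean install"
  fi
}
```

The helper only prints the suggested command; running it is left to the caller, from the ManifoldCF root directory.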
Building ManifoldCF's Apache2 plugin
To build the mod-authz-annotate plugin, you need to start with a Unix system that has the apache2 development tools installed on it, plus
the curl development package (from http://curl.haxx.se or elsewhere). Then, cd to mod-authz-annotate, and type "make". The build will
produce a file called mod-authz-annotate.so, which should be copied to the appropriate Apache2 directory so it can be used as a plugin.

Running ManifoldCF
Overview
ManifoldCF consists of several components. These are enumerated below:
A database, which is where ManifoldCF keeps all of its configuration and state information, usually PostgreSQL
A synchronization directory, which is how ManifoldCF coordinates activity among its various processes
An agents process, which is the process that actually crawls documents and ingests them
A crawler-ui servlet, which presents the UI users interact with to configure and control the crawler
An authority-service servlet, which responds to requests for authorization tokens, given a user name
An api-service servlet, which responds to REST API requests
These underlying components can be packaged in many ways. For example, the three servlets can be deployed in separate war files as
separate web applications. One may also deploy all three servlets in one combined web application, and also include the agents process.
Binary organization
Whether you build ManifoldCF yourself, or download a binary distribution, you will need to know what is what in the build result. If you
build ManifoldCF yourself, the binary build result can be found in the subdirectory dist. In a binary distribution, the contents of the
distribution are the contents of the dist directory. These contents are described below.
Distribution directories and files
dist file/directory                      Meaning
connectors.xml                           an xml file describing the connectors that should be registered
connector-lib                            jars for all the connectors, referred to by properties.xml
connector-lib-proprietary                proprietary jars for all the connectors, referred to by properties.xml; not included in binary release
obfuscation-utility                      a utility to obfuscate passwords, for inclusion in properties.xml fields
lib                                      jars for all of the examples, referenced by the example scripts
lib-proprietary                          proprietary jars for all of the examples, referenced by the proprietary example scripts
processes                                scripts, classpath jars, and -D switch values needed for the required connector-specific processes
script-engine                            jars and scripts for running the ManifoldCF script interpreter
example                                  a jetty-based example that runs in a single process (except for any connector-specific processes), excluding all proprietary libraries
example-proprietary                      a jetty-based example that runs in a single process (except for any connector-specific processes), including proprietary libraries; not included in binary release
multiprocess-file-example                scripts and jars for an example that uses the multiple process model using file-based synchronization, excluding all proprietary libraries
multiprocess-file-example-proprietary    scripts and jars for an example that uses the multiple process model using file-based synchronization, including proprietary libraries; not included in binary release
multiprocess-zk-example                  scripts and jars for an example that uses the multiple process model using ZooKeeper-based synchronization, excluding all proprietary libraries
multiprocess-zk-example-proprietary      scripts and jars for an example that uses the multiple process model using ZooKeeper-based synchronization, including proprietary libraries; not included in binary release
web                                      app-server deployable web applications (wars), excluding all proprietary libraries
web-proprietary                          app-server deployable web applications (wars), including proprietary libraries; not included in binary release
doc                                      javadocs for framework and all included connectors
plugins                                  pre-built integration components to deploy on target systems, e.g. for Solr
If you downloaded the binary distribution, you may notice that the connector-lib-proprietary directory contains only a number of
<connector>-README.txt files. This is because under Apache licensing rules, incompatibly-licensed jars may not be redistributed. Each
such <connector>-README.txt describes the jars that you need to add to the connector-lib-proprietary directory in order to get the
corresponding connector working. You will also then need to uncomment the appropriate entries in the connectors.xml file accordingly to
enable the connector for use.
NOTE: The prebuilt binary distribution cannot, at this time, include support for MySQL. Nor can the JDBC Connector access MySQL,
MSSQL, SyBase, or Oracle databases in that distribution. In order to use these JDBC drivers, you must build ManifoldCF yourself. Start
by downloading the drivers and placing them in the lib-proprietary directory. The command "ant download-dependencies" will do most of
this for you, with the exception of the Oracle JDBC driver.
The directory titled processes includes separate processes which must be started in order for the associated connector to function. The
number of produced processes subdirectories may vary, because optional individual connectors may or may not supply processes that
must be run to support the connector. For each of the processes subdirectories above, any scripts that pertain to that connector-supplied
process will be placed in the root level of the subdirectory. The supplied scripts for a process generally take care of building an appropriate
classpath and setting necessary -D switches. (Note: none of the current connectors require special -D switches at this time.) If you need to
construct a classpath by hand, it is important to remember that "more" is not necessarily "better". The process deployment strategy implied
by the build structure has been carefully thought out to avoid jar conflicts. Indeed, several connectors are structured using multiple
processes precisely for that reason.
The proprietary libraries required by each secondary process in the processes subdirectories should be placed in the directory
processes/xxx/lib-proprietary. These jars are not included in the binary distribution, and you will need to supply them in order to make the process work. A
README.txt file is placed in each lib-proprietary directory describing what needs to be provided there.
The plugins directory contains components you may need to deploy on the target system to make the associated connector function
correctly. For example, the Solr connector includes plugin classes for enforcing ManifoldCF security on Solr 3.x and 4.x. See the README
file in each directory for detailed instructions on how to deploy the components.
Inside the example directory, you will find everything you need to fire up ManifoldCF in a single-process model under Jetty. Everything is
included so that all you need to do is change to that directory, and start it using the command <java> -jar start.jar. This is described in
more detail later, and is the recommended way for beginners to try out ManifoldCF. The directory example-proprietary contains an
equivalent example that includes proprietary connectors and jars. This is the standard place to start if you build ManifoldCF yourself.
Example deployments
There are many different ways to run ManifoldCF out of the box. These are enumerated below:
Quick-start single process model
Single-process deployable war
Simplified multiprocess model
Command-driven multiprocess model
Each way has advantages and disadvantages. For example, single-process models limit the flexibility of deploying ManifoldCF
components. Multiprocess models require that inter-process synchronization be properly configured. If you are just starting out with
ManifoldCF, we suggest you try the quick-start single process model first, since that is the easiest.
Quick-start single process model

You can run most of ManifoldCF in a single process, for evaluation and convenience. This single-process version uses Jetty to handle its
web applications, and Hsqldb as an embedded database. All you need to do to run this version of ManifoldCF is to follow the Ant-based
build instructions above, and then:
cd example
start[.bat|.sh]


In the quick-start model, all database initialization and connector registration takes place automatically whenever ManifoldCF is started
(at the cost of some startup delay). The crawler UI can be found at http://<host>:8345/mcf-crawler-ui. The authority service can be
found at http://<host>:8345/mcf-authority-service/UserACLs. The programmatic API is at http://<host>:8345/mcf-api-service.
You can stop the quick-start ManifoldCF at any time using ^C, or by using the script stop[.bat|.sh].
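For scripting against a running quick-start instance, the three endpoints can be derived from the host name. The sketch below only builds the URLs; the function name quickstart_urls is hypothetical, and it assumes the default port 8345 and the hyphenated web application names mcf-crawler-ui, mcf-authority-service, and mcf-api-service.

```shell
#!/bin/sh
# Sketch: print the quick-start endpoints for a given host.
# Assumptions: quickstart_urls is a hypothetical helper; port 8345 and the
# hyphenated application names are taken as the quick-start defaults.

# $1: host name (defaults to localhost)
quickstart_urls() {
  host="${1:-localhost}"
  echo "http://$host:8345/mcf-crawler-ui"
  echo "http://$host:8345/mcf-authority-service/UserACLs"
  echo "http://$host:8345/mcf-api-service"
}
```

The output can be fed line by line to a tool such as curl to confirm that each web application is answering after startup.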
Bear in mind that Hsqldb is not as full-featured a database as is PostgreSQL. This means that any performance testing you may do
against the quick-start example may not be applicable to a full installation. Furthermore, embedded Hsqldb only permits one process at a
time to be connected to its databases, so you cannot use any of the ManifoldCF commands (as described below) while the quick-start
ManifoldCF is running.
Another caveat that you will need to be aware of with the quick-start version of ManifoldCF is that it in no way removes the need for you to
run any separate processes that individual connectors require. Specifically, the Documentum and FileNet connectors require processes to
be independently started in order to function. You will need to read about these connector-specific processes below in order to use the
corresponding connectors. Scripts for running these processes can be found in the directories named processes/xxx.
Single-process deployable war

Under the distribution directory web/war, there is a war file called mcf-combined-service.war. This web application contains the exact
same functionality as the quick-start example, but bundled up as a single war instead. An example script is provided to run this web
application under Jetty. You can execute the script as follows:
cd example
start-combined[.sh|.bat]

The combined web service presents the crawler UI at the root path for the web application, which is http://<host>:8345/mcf/. The
authority service functionality can be found at http://<host>:8345/mcf/UserACLs, similar to the quick-start example. However, the
programmatic API service has a path other than the root: http://<host>:8345/mcf/api/.
The script that starts the combined-service web application uses the same database instance (Hsqldb by default) as does the quick-start,
and the same properties.xml file. The same caveats about required individual connector processes also apply as they do for the quick-start
example.
Running single-process combined war example using Tomcat

In order to run the ManifoldCF single-process combined war example under Tomcat, you will need to take the following steps:
1. Modify the Tomcat startup script, or use the Tomcat service administration client, to set a Java "-Dorg.apache.manifoldcf.configfile" switch to point to the example's properties.xml file.
2. Start Tomcat.
3. Deploy and start the mcf-combined-service web application, preferably using the Tomcat administration client.
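One common way to carry out step 1 on Unix is to add the switch to CATALINA_OPTS in Tomcat's bin/setenv.sh. The sketch below generates that line; the function name tomcat_setenv_line is hypothetical, and the approach assumes a Tomcat installation that reads bin/setenv.sh at startup and a properties.xml path without spaces.

```shell
#!/bin/sh
# Sketch: emit a setenv.sh line that points Tomcat at properties.xml.
# Assumptions: tomcat_setenv_line is a hypothetical helper; the target
# Tomcat reads bin/setenv.sh; the path contains no spaces.

# $1: absolute path to the example's properties.xml
tomcat_setenv_line() {
  printf 'CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.manifoldcf.configfile=%s"\n' "$1"
}
```

Appending the output to $CATALINA_HOME/bin/setenv.sh and restarting Tomcat would apply the switch. On Windows services, the Tomcat service administration client mentioned above is the usual place for Java options instead.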
Simplified multiprocess model using file-based synchronization

ManifoldCF can also be deployed in a simplified multiprocess model which uses files to synchronize processes. Inside the
multiprocess-file-example directory, you will find everything you need to do this. (The multiprocess-file-example-proprietary directory is similar but
includes proprietary material and is available only if you build ManifoldCF yourself.) Below is a list of what you will find in this directory.
File-based multiprocess example files and directories
multiprocess-file-example file/directory    Meaning
web                                         Web applications that should be deployed on tomcat or the equivalent, plus recommended application server -D switch names and values
processes                                   classpath jars that should be included in the classpath for all non-connector-specific processes, along with -D switches, using the same convention as described for tomcat, above
properties.xml                              an example ManifoldCF configuration file, in the right place for the multiprocess script to find it
logging.ini                                 an example ManifoldCF logging configuration file, in the right place for the properties.xml to find it
syncharea                                   an example ManifoldCF synchronization directory, which must be writable in order for multiprocess ManifoldCF to work
logs                                        where the ManifoldCF logs get written to
start-database[.sh|.bat]                    script to start the HSQLDB database
initialize[.sh|.bat]                        script to create the database instance, create all database tables, and register connectors
start-webapps[.sh|.bat]                     script to start Jetty with the ManifoldCF web applications deployed
start-agents[.sh|.bat]                      script to start the (first) agents process
start-agents-2[.sh|.bat]                    script to start a second agents process
stop-agents[.sh|.bat]                       script to stop all running agents processes cleanly
lock-clean[.sh|.bat]                        script to clean up dirty locks (run only when all webapps and processes are stopped)
Initializing the database and running

If you run the file-based multiprocess model, after you first start the database (using start-database[.sh|.bat]), you will need to initialize
the database before you start the agents process or use the crawler UI. To do this, all you need to do is run the initialize[.sh|.bat] script.
Then, you will need to start the web applications (using start-webapps[.sh|.bat]) and the agents process (using start-agents[.sh|.bat]),
and optionally the second agents process (using start-agents-2[.sh|.bat]).
Running multiprocess file-based example using Tomcat

In order to run the ManifoldCF multiprocess file-based example under Tomcat, you will need to take the following steps:
1. Start the database (using start-database[.sh|.bat])
2. Initialize the database (using initialize[.sh|.bat])
3. Start the agents process (using start-agents[.sh|.bat], and optionally start-agents-2[.sh|.bat])
4. Modify the Tomcat startup script, or use the Tomcat service administration client, to set a Java "-Dorg.apache.manifoldcf.configfile" switch to point to the example's properties.xml file.
5. Start Tomcat.
6. Deploy and start the mcf-crawler-ui, mcf-authority-service, and mcf-api-service web applications, preferably using the Tomcat administration client.
Simplified multiprocess model using ZooKeeper-based synchronization

ManifoldCF can be deployed in a simplified multiprocess model which uses Apache ZooKeeper to synchronize processes. Inside the
multiprocess-zk-example directory, you will find everything you need to do this. (The multiprocess-zk-example-proprietary directory is
similar but includes proprietary material and is available only if you build ManifoldCF yourself.) Below is a list of what you will find in this
directory.
ZooKeeper-based multiprocess example files and directories
multiprocess-zk-example file/directory    Meaning
web                                       Web applications that should be deployed on tomcat or the equivalent, plus recommended application server -D switch names and values
processes                                 classpath jars that should be included in the classpath for all non-connector-specific processes, along with -D switches, using the same convention as described for tomcat, above
properties.xml                            an example ManifoldCF configuration file, in the right place for the multiprocess script to find it
properties-global.xml                     an example ManifoldCF shared configuration file, in the right place for the setglobalproperties script to find it
logging.ini                               an example ManifoldCF logging configuration file, in the right place for the properties.xml to find it
zookeeper                                 the example ZooKeeper storage directory, which must be writable in order for ZooKeeper to work
logs                                      where the ManifoldCF logs get written to
runzookeeper[.sh|.bat]                    script to run a ZooKeeper server instance
setglobalproperties[.sh|.bat]             script to initialize ZooKeeper with properties from properties-global.xml
start-database[.sh|.bat]                  script to start the HSQLDB database
initialize[.sh|.bat]                      script to create the database instance, create all database tables, and register connectors
start-webapps[.sh|.bat]                   script to start Jetty with the ManifoldCF web applications deployed
start-agents[.sh|.bat]                    script to start the (first) agents process
start-agents-2[.sh|.bat]                  script to start a second agents process
stop-agents[.sh|.bat]                     script to stop all running agents processes cleanly
Initializing the database and running

If you run the ZooKeeper-based multiprocess example, then you must follow these steps:
1. Start ZooKeeper (using the runzookeeper[.sh|.bat] script)
2. Initialize the ManifoldCF shared configuration data (using setglobalproperties[.sh|.bat])
3. Start the database (using start-database[.sh|.bat])
4. Initialize the database (using initialize[.sh|.bat])
5. Start the agents process (using start-agents[.sh|.bat], and optionally start-agents-2[.sh|.bat])
6. Start the web applications (using start-webapps[.sh|.bat])
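
The ordering of these steps can be captured in a small wrapper script. The following is only a sketch, not part of the ManifoldCF distribution: MCF_EXAMPLE_DIR, DRY_RUN, and STARTUP_PAUSE are hypothetical names, and the script assumes that the long-running scripts (ZooKeeper, database, agents, web applications) stay in the foreground and therefore need to be backgrounded, while the two initialization scripts run to completion. The fixed pause is a crude stand-in for a real readiness check.

```shell
#!/bin/sh
# Sketch only: run the ZooKeeper-based example scripts in the documented order.
# All three variables below are hypothetical and exist only for this example.
MCF_EXAMPLE_DIR="${MCF_EXAMPLE_DIR:-.}"   # assumed location of multiprocess-zk-example
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print the steps, run nothing
STARTUP_PAUSE="${STARTUP_PAUSE:-10}"      # seconds to wait after a background start

# run_step MODE SCRIPT
#   MODE "bg": long-running process, started in the background (assumption).
#   MODE "fg": one-shot script that must finish before the next step.
run_step() {
  mode="$1"
  script="$MCF_EXAMPLE_DIR/$2"
  echo "step ($mode): $script"
  if [ "$DRY_RUN" = "1" ]; then
    return 0
  fi
  if [ "$mode" = "bg" ]; then
    "$script" &
    sleep "$STARTUP_PAUSE"   # crude wait, not a readiness check
  else
    "$script"
  fi
}

# The six steps, in the order listed above; stops at the first failing step.
start_zk_example() {
  run_step bg runzookeeper.sh &&
  run_step fg setglobalproperties.sh &&
  run_step bg start-database.sh &&
  run_step fg initialize.sh &&
  run_step bg start-agents.sh &&
  run_step bg start-webapps.sh
}
```

Calling start_zk_example with the default DRY_RUN=1 only prints the plan; setting DRY_RUN=0 would actually launch the scripts.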
Running multiprocess ZooKeeper example using Tomcat

In order to run the ManifoldCF ZooKeeper example under Tomcat, you will need to take the following steps:
1. Start ZooKeeper (using the runzookeeper[.sh|.bat] script)
2. Initialize the ManifoldCF shared configuration data (using setglobalproperties[.sh|.bat])
3. Start the database (using start-database[.sh|.bat])
4. Initialize the database (using initialize[.sh|.bat])
5. Start the agents process (using start-agents[.sh|.bat], and optionally start-agents-2[.sh|.bat])
6. Modify the Tomcat startup script, or use the Tomcat service administration client, to set a Java "-Dorg.apache.manifoldcf.configfile" switch to point to the example's properties.xml file.
7. Start Tomcat.
8. Deploy and start the mcf-crawler-ui, mcf-authority-service, and mcf-api-service web applications, preferably using the Tomcat administration client.
Command-driven multiprocess model

The most generic way of deploying ManifoldCF involves calling ManifoldCF operations using scripts. There are a number of Java classes among the ManifoldCF classes that are intended to be called directly, to perform specific actions in the environment or in the database. These classes are usually invoked from the command line, with appropriate arguments supplied, and are thus considered to be ManifoldCF commands. Basic functionality supplied by these command classes is as follows:
Create/Destroy the ManifoldCF database instance
Start/Stop the agents process
Register/Unregister an agent class (there's currently only one included)
Register/Unregister an output connector
Register/Unregister a transformation connector
Register/Unregister a repository connector
Register/Unregister an authority connector
Register/Unregister a mapping connector
Clean up synchronization directory garbage resulting from an ungraceful interruption of a ManifoldCF process
Query for certain kinds of job-related information
Individual connectors may contribute additional command classes and processes to this picture.
The multiprocess command execution scripts are delivered in the processes subdirectory. The script for executing commands is processes/executecommand[.sh|.bat]. This script requires two environment variables to be set before execution: JAVA_HOME, and MCF_HOME, which should point to ManifoldCF's home execution directory (where the properties.xml file is found).
The basic steps required to set up and run ManifoldCF in command-driven file-based multiprocess mode are as follows:
Install PostgreSQL or MySQL. The PostgreSQL JDBC driver included with ManifoldCF is known to work with version 9.1, so that version is the currently recommended one. If you want to use MySQL, the ant "download-dependencies" build target will fetch the appropriate MySQL JDBC driver.
Configure the database for your environment; the default configuration is acceptable for testing and experimentation.
Create the database instance (see commands below)
Initialize the database instance (see commands below)
Register the pull agent (org.apache.manifoldcf.crawler.system.CrawlerAgent, see below)
Register your connectors and authorities (see below)
Install a Java application server, such as Tomcat.
Deploy the war files from web/war, except for mcf-combined.war, to your application server (see below).
Set the starting environment variables for your app server to include any -D commands found in web/define. The -D commands should be of the form, "-D<file name>=<file contents>". You will also need a "-Dorg.apache.manifoldcf.configfile=<properties file>" define option, or the equivalent, in the application server's JVM startup in order for ManifoldCF to be able to locate its configuration file.
Use the processes/executecommand[.bat|.sh] command to execute the appropriate commands from the next section below, being sure to first set the JAVA_HOME and MCF_HOME environment variables properly.
Start any supporting processes that result from your build. (Some connectors such as Documentum and FileNet have auxiliary processes you need to run to make these connectors functional.)
Start your application server.
Start the ManifoldCF agents process.
At this point, you should be able to interact with the ManifoldCF UI, which can be accessed via the mcf-crawler-ui web application.
The detailed list of commands is presented below.
Commands

After you have created the necessary configuration files, you will need to initialize the database, register the "pull-agent" agent, and then register your individual connectors. ManifoldCF provides a set of commands for performing these actions, and others as well. The classes implementing these commands are specified below.
Core Command Class | Arguments | Function
org.apache.manifoldcf.core.DBCreate | dbuser [dbpassword] | Create ManifoldCF database instance
org.apache.manifoldcf.core.DBDrop | dbuser [dbpassword] | Drop ManifoldCF database instance
org.apache.manifoldcf.core.LockClean | None | Clean out synchronization directory
org.apache.manifoldcf.core.Obfuscate | string | Obfuscate a string, for use as an obfuscated parameter value

Agents Command Class | Arguments | Function
org.apache.manifoldcf.agents.Install | None | Create ManifoldCF agents tables
org.apache.manifoldcf.agents.Uninstall | None | Remove ManifoldCF agents tables
org.apache.manifoldcf.agents.Register | classname | Register an agent class
org.apache.manifoldcf.agents.UnRegister | classname | Unregister an agent class
org.apache.manifoldcf.agents.UnRegisterAll | None | Unregister all current agent classes
org.apache.manifoldcf.agents.SynchronizeAll | None | Unregister all registered agent classes that can't be found
org.apache.manifoldcf.agents.RegisterOutput | classname description | Register an output connector class
org.apache.manifoldcf.agents.UnRegisterOutput | classname | Unregister an output connector class
org.apache.manifoldcf.agents.UnRegisterAllOutputs | None | Unregister all current output connector classes
org.apache.manifoldcf.agents.SynchronizeOutputs | None | Unregister all registered output connector classes that can't be found
org.apache.manifoldcf.agents.RegisterTransformation | classname description | Register a transformation connector class
org.apache.manifoldcf.agents.UnRegisterTransformation | classname | Unregister a transformation connector class
org.apache.manifoldcf.agents.UnRegisterAllTransformations | None | Unregister all current transformation connector classes
org.apache.manifoldcf.agents.SynchronizeTransformations | None | Unregister all registered transformation connector classes that can't be found
org.apache.manifoldcf.agents.AgentRun | None | Main agents process class
org.apache.manifoldcf.agents.AgentStop | None | Stops the running agents process

Crawler Command Class | Arguments | Function
org.apache.manifoldcf.crawler.Register | classname description | Register a repository connector class
org.apache.manifoldcf.crawler.UnRegister | classname | Unregister a repository connector class
org.apache.manifoldcf.crawler.UnRegisterAll | None | Unregister all repository connector classes
org.apache.manifoldcf.crawler.SynchronizeConnectors | None | Unregister all registered repository connector classes that can't be found
org.apache.manifoldcf.crawler.ExportConfiguration | filename [passcode] | Export crawler configuration to a file
org.apache.manifoldcf.crawler.ImportConfiguration | filename [passcode] | Import crawler configuration from a file

NOTE: By adding a passcode as a second argument to the ExportConfiguration command class, the exported file will be encrypted using the AES algorithm. This can be useful to prevent repository passwords from being stored in clear text. In order to use this functionality, you must add a salt value to your configuration file. The same passcode, along with the salt value, is used to decrypt the file with the ImportConfiguration command class. See the documentation for the commands and properties above to find the correct arguments and settings.
Authorization Domain Command Class | Arguments | Function
org.apache.manifoldcf.authorities.RegisterDomain | domainname description | Register an authorization domain
org.apache.manifoldcf.authorities.UnRegisterDomain | domainname | Unregister an authorization domain

User Mapping Command Class | Arguments | Function
org.apache.manifoldcf.authorities.RegisterMapper | classname description | Register a mapping connector class
org.apache.manifoldcf.authorities.UnRegisterMapper | classname | Unregister a mapping connector class
org.apache.manifoldcf.authorities.UnRegisterAllMappers | None | Unregister all mapping connector classes
org.apache.manifoldcf.authorities.SynchronizeMappers | None | Unregister all registered mapping connector classes that can't be found

Authority Command Class | Arguments | Function
org.apache.manifoldcf.authorities.RegisterAuthority | classname description | Register an authority connector class
org.apache.manifoldcf.authorities.UnRegisterAuthority | classname | Unregister an authority connector class
org.apache.manifoldcf.authorities.UnRegisterAllAuthorities | None | Unregister all authority connector classes
org.apache.manifoldcf.authorities.SynchronizeAuthorities | None | Unregister all registered authority connector classes that can't be found

Remember that you need to include all the jars under multiprocess-file-example/processes/lib in the classpath whenever you run one of these commands! But, luckily, there are scripts which do this for you. These can be found in multiprocess-file-example/processes/executecommand[.sh,.bat]. The scripts require some environment variables to be set, such as MCF_HOME and JAVA_HOME, and expect the configuration file to be found at MCF_HOME/properties.xml.
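
As an illustration, the initial setup commands could be chained through executecommand roughly as follows. This is a hedged sketch: it assumes that executecommand.sh takes the command class name followed by that class's arguments, the MCF_HOME path and the DRY_RUN switch are hypothetical, and the file system connector class is used only as an example of a repository connector to register. JAVA_HOME must also be set before a real run.

```shell
#!/bin/sh
# Sketch only: create and initialize the database, then register the pull
# agent and one repository connector via executecommand.sh.
MCF_HOME="${MCF_HOME:-/opt/manifoldcf/multiprocess-file-example}"  # assumed path
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print commands only
export MCF_HOME

# mcf CLASS [ARGS...]: run one ManifoldCF command class.
# Assumption: executecommand.sh passes the class name and arguments to java.
mcf() {
  if [ "$DRY_RUN" = "1" ]; then
    # Note: a dry run echoes every argument, including any password.
    echo "executecommand.sh $*"
  else
    "$MCF_HOME/processes/executecommand.sh" "$@"
  fi
}

# init_mcf DBUSER DBPASSWORD: the setup sequence, stopping at the first failure.
init_mcf() {
  dbuser="$1"
  dbpassword="$2"
  mcf org.apache.manifoldcf.core.DBCreate "$dbuser" "$dbpassword" &&
  mcf org.apache.manifoldcf.agents.Install &&
  mcf org.apache.manifoldcf.agents.Register \
      org.apache.manifoldcf.crawler.system.CrawlerAgent &&
  mcf org.apache.manifoldcf.crawler.Register \
      org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector "File system"
}
```

For example, `init_mcf postgres secret` prints the four commands in order when DRY_RUN is left at 1.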
Deploying the mcf-crawler-ui, mcf-authority-service, and mcf-api-service web applications

If you built ManifoldCF using ant, then the ant build will have constructed four war files for you under web/war. You should ignore the mcf-combined war in this directory for this deployment model. If you intend to run ManifoldCF in multiprocess mode, you will need to deploy the other web applications on your application server. There is no requirement that the mcf-crawler-ui, mcf-authority-service, and mcf-api-service web applications be deployed on the same instance of the application server. With the current architecture of ManifoldCF, they must be deployed on the same physical server, however.
For each of the application servers involved with ManifoldCF, you must set the following define, so that the ManifoldCF web applications can locate the configuration file:
-Dorg.apache.manifoldcf.configfile=<configuration file path>
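
With Tomcat, one common way to supply such a define is a bin/setenv.sh file, which Tomcat's startup script reads if it exists. The fragment below is a sketch under that assumption; the MCF_HOME path is hypothetical and should point at the directory holding your properties.xml.

```shell
# Sketch of a Tomcat bin/setenv.sh fragment (not shipped with ManifoldCF).
# MCF_HOME is an assumed installation path; adjust it to your deployment.
MCF_HOME="${MCF_HOME:-/opt/manifoldcf/multiprocess-file-example}"

# Append the define to any JVM options that are already configured.
CATALINA_OPTS="${CATALINA_OPTS:-} -Dorg.apache.manifoldcf.configfile=$MCF_HOME/properties.xml"
export CATALINA_OPTS
```

Other application servers have their own mechanism for passing JVM system properties; the define itself stays the same.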

Running the agents process

The agents process is the process that actually performs the crawling for ManifoldCF. Start this process by running the command "org.apache.manifoldcf.agents.AgentRun". This class will run until stopped by invoking the command "org.apache.manifoldcf.agents.AgentStop". It is highly recommended that you stop the process in this way. You may also stop the process using a SIGTERM signal, but "kill -9" or the equivalent is NOT recommended, because that may result in dangling locks in the ManifoldCF synchronization directory. (If you have to, clean up these locks by shutting down all ManifoldCF processes, including the application server instances that are running the web applications, and invoking the command "org.apache.manifoldcf.core.LockClean".)
The connectors.xml configuration file
The quick-start, combined, and simplified multiprocess sample deployments of ManifoldCF have their own configuration file, called connectors.xml, which is used to register the available connectors in the database. The file has this basic format:
<?xml version="1.0" encoding="UTF-8" ?>
<connectors>
(clauses)
</connectors>
The following tags are available to specify your connectors and authorization domains:
<repositoryconnector name="pretty_name" class="connector_class"/>
<authorityconnector name="pretty_name" class="connector_class"/>
<mappingconnector name="pretty_name" class="connector_class"/>
<outputconnector name="pretty_name" class="connector_class"/>
<transformationconnector name="pretty_name" class="connector_class"/>
<authorizationdomain name="pretty_name" domain="domain_name"/>
The connectors.xml file typically has some connectors commented out, namely the ones built with stubs, which require you to supply a third-party library in order for the connector to run. If you build ManifoldCF yourself, the example-proprietary, multiprocess-file-example-proprietary, and multiprocess-zk-example-proprietary directories instead use connectors-proprietary.xml. The connectors you build against the proprietary libraries you supply will not have their connectors-proprietary.xml tags commented out.
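
To make the format concrete, the sketch below writes a small connectors.xml using the tags listed above. It is an illustration rather than the file shipped with ManifoldCF: the output path (CONNECTORS_FILE) is hypothetical, the authorization domain is made up, and the three connector class names are examples that should be checked against the connectors actually present in your build.

```shell
#!/bin/sh
# Sketch only: generate a minimal connectors.xml for experimentation.
# CONNECTORS_FILE is a hypothetical variable; by default a temp file is used.
CONNECTORS_FILE="${CONNECTORS_FILE:-$(mktemp)}"

cat > "$CONNECTORS_FILE" <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<connectors>
  <!-- Example class names; verify them against your own build. -->
  <repositoryconnector name="File system" class="org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector"/>
  <outputconnector name="Null" class="org.apache.manifoldcf.agents.output.nullconnector.NullConnector"/>
  <authorityconnector name="Null" class="org.apache.manifoldcf.authorities.authorities.nullauthority.NullAuthority"/>
  <!-- Hypothetical authorization domain, for illustration only. -->
  <authorizationdomain name="Example domain" domain="example"/>
</connectors>
EOF

echo "wrote $CONNECTORS_FILE"
```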
Running connector-specific processes
Connector-specific processes require the classpath for their invocation to include all the jars that are in the corresponding processes/<process_name> directory. The Documentum and FileNet connectors are the only two connectors that currently require additional processes. Start these processes using the commands listed below, and stop them with SIGTERM (or ^C, if they are running in a shell).
Connector | Process | Main class | Script name (relative to dist)
Documentum | processes/documentum-server | org.apache.manifoldcf.crawler.server.DCTM.DCTM | processes/documentum-server/run[.sh|.bat]
Documentum | processes/documentum-registry | org.apache.manifoldcf.crawler.registry.DCTM.DCTM | processes/documentum-registry/run[.sh|.bat]
FileNet | processes/filenet-server | org.apache.manifoldcf.crawler.server.filenet.Filenet | processes/filenet-server/run[.sh|.bat]
FileNet | processes/filenet-registry | org.apache.manifoldcf.crawler.registry.filenet.Filenet | processes/filenet-registry/run[.sh|.bat]

The registry process in all cases must be started before the corresponding server process, or the server process will report an error. (It will, however, retry after some period of time.) The scripts all require an MCF_HOME environment variable pointing to the place where properties.xml is found, as well as a JAVA_HOME environment variable pointing to the JDK. The server scripts also require other environment variables, consistent with the needs of the DFC or the FileNet API respectively. For example, DFC requires the DOCUMENTUM environment variable to be set, while the FileNet server script requires the WASP_HOME environment variable.
It is important to understand that the scripts work by building a classpath out of all jars that get copied into the lib and lib-proprietary directories underneath each process during the ant build. The lib-proprietary jars cannot be distributed in the binary version of ManifoldCF, so if you use this option you will still need to copy them there yourself for the processes to run. If you build ManifoldCF yourself, these jars are copied from the lib-proprietary directories underneath the documentum or filenet connector directories. For the server startup scripts to work properly, the lib-proprietary directories should have all of the jars needed to allow the API code to function.
Database selection
You have a variety of open-source databases to choose from when deploying ManifoldCF. The supported databases each have their own strengths and weaknesses, and are listed below:
PostgreSQL (preferred)
MySQL (preferred)
MariaDB (not yet evaluated)
HSQLDB
You can select the database of your choice by setting the appropriate properties in the applicable properties.xml file. The choice of database is largely orthogonal to the choice of deployment model. The ManifoldCF deployment examples provided can thus be readily altered to use the database you desire. The details and caveats of each choice are described below.
Configuring a PostgreSQL database

Despite having an internal architecture that cleanly abstracts from specific database details, ManifoldCF is currently fairly specific to PostgreSQL. There are a number of reasons for this.
ManifoldCF uses the database for its document queue, which places a significant load on it. The back-end database is thus a significant factor in ManifoldCF's performance. But, in exchange, ManifoldCF benefits enormously from the underlying ACID properties of the database.
The strategy for getting optimal query plans from the database is not abstracted. For example, PostgreSQL 8.3+ is very sensitive to certain statistics about a database table, and will not generate a performant plan if the statistics are inaccurate by even a little, in some cases. So, for PostgreSQL, the database table must be analyzed very frequently, to avoid catastrophically bad plans. But luckily, PostgreSQL is pretty good at doing analysis quickly. Oracle, on the other hand, takes a very long time to perform analysis, but its plans are much less sensitive.
PostgreSQL always does a sequential scan in order to count the number of rows in a table, while other databases return this efficiently. This has affected the design of the ManifoldCF UI.
The choice of query form influences the query plan. Ideally, this is not true, but for both PostgreSQL and for (say) Oracle, it is.
PostgreSQL has a high degree of parallelism and lack of internal single-threadedness.
ManifoldCF has been tested against versions 8.3.7, 8.4.5, 9.1, 9.2, and 9.3 of PostgreSQL. We recommend the following configuration parameter settings to work optimally with ManifoldCF:
A default database encoding of UTF-8
postgresql.conf settings as described in the table below
pg_hba.conf settings to allow password access for TCP/IP connections from ManifoldCF
A maintenance strategy involving cronjob-style vacuuming, rather than PostgreSQL autovacuum
Postgresql.conf parameters
postgresql.conf parameter | Tested value
standard_conforming_strings | on
shared_buffers | 1024MB
checkpoint_segments | 300
maintenance_work_mem | 2MB
tcpip_socket | true
max_connections | 400
checkpoint_timeout | 900
datestyle | ISO,European
autovacuum | off

Note well: The standard_conforming_strings parameter setting is important to prevent any possibility of SQL injection attacks. While ManifoldCF uses parameterized queries in almost all cases, when it does do string quoting it presumes that the SQL standard for quoting is adhered to. It is in general good practice to set this parameter when working with PostgreSQL for this reason.
A note about PostgreSQL database maintenance

PostgreSQL's architecture causes it to accumulate dead tuples in its data files, which do not interfere with its performance but do bloat the database over time. The usage pattern of ManifoldCF is such that it can cause significant bloat to occur to the underlying PostgreSQL database in only a few days, under sufficient load. PostgreSQL has a feature to address this bloat, called vacuuming. This comes in three varieties: autovacuum, manual vacuum, and manual full vacuum.
We have found that PostgreSQL's autovacuum feature is inadequate under such conditions, because it not only fights for database resources pretty much all the time, but it falls further and further behind as well. PostgreSQL's in-place manual vacuum functionality is a bit better, but is still much, much slower than actually making a new copy of the database files, which is what happens when a manual full vacuum is performed.
Dead-tuple bloat also occurs in indexes in PostgreSQL, so tables that have had a lot of activity may benefit from being reindexed at the time of maintenance.
We therefore recommend periodic, scheduled maintenance operations instead, consisting of the following:
VACUUM FULL VERBOSE
REINDEX DATABASE <the_db_name>
During maintenance, PostgreSQL locks tables one at a time. Nevertheless, the crawler UI may become unresponsive for some operations, such as when counting outstanding documents on the job status page. ManifoldCF thus has the ability to check for the existence of a file prior to such sensitive operations, and will display a useful "maintenance in progress" message if that file is found. This allows a user to set up a maintenance system that gives a ManifoldCF user adequate feedback about the overall status of the system.
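
A scheduled maintenance job along these lines might look like the sketch below. It is an illustration only: it assumes the psql client is available and can connect without prompting, the PSQL override exists purely so the commands can be dry-run (for example with PSQL=echo), and the marker file path passed as the second argument is hypothetical and must match whatever file your ManifoldCF instance has been configured to check.

```shell
#!/bin/sh
# Sketch only: PostgreSQL maintenance wrapped in a "maintenance in progress" marker.
PSQL="${PSQL:-psql}"   # hypothetical override, e.g. PSQL=echo for a dry run

# run_maintenance DBNAME MARKER_FILE
run_maintenance() {
  db="$1"     # name of the ManifoldCF database
  flag="$2"   # marker file path (hypothetical; must match your configuration)

  : > "$flag" || return 1          # create the marker before locking tables
  status=0
  "$PSQL" -d "$db" -c "VACUUM FULL VERBOSE" || status=$?
  if [ "$status" -eq 0 ]; then
    "$PSQL" -d "$db" -c "REINDEX DATABASE \"$db\"" || status=$?
  fi
  rm -f "$flag"                    # always remove the marker afterwards
  return "$status"
}
```

Such a function could be called from a cron job during a quiet period, for instance `run_maintenance dbname /path/to/marker`, where both arguments depend on your installation.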
Configuring a MySQL database

MySQL is not quite as fast as PostgreSQL, but it is a relatively close second in performance tests. Nevertheless, the ManifoldCF team does not have a large amount of experience with this database at this time. More details will be added to this section as information and experience become available.
Configuring an HSQLDB database

HSQLDB's performance seems closely tied to how much of the database can be actually held in memory. Performance at this time is about half that of PostgreSQL.
HSQLDB can be used with ManifoldCF in either an embedded fashion (which only works with single-process deployments), or in external fashion, with a database instance running in a separate process. See the properties.xml property descriptions for configuration details.
The ManifoldCF configuration files
Currently, ManifoldCF requires two configuration files: the main configuration property file, and the logging configuration file.
properties.xml file properties

The properties.xml property file path can be specified by the system property "org.apache.manifoldcf.configfile". If not specified through a -D operation, its name is presumed to be <user_home>/lcf/properties.xml. The form of the property file is XML, of the following basic form:
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
(clauses)
</configuration>
The properties.xml file allows properties to be specified. A property clause has the form:
<property name="property_name" value="property_value"/>
One of the optional properties is the name of the logging configuration file. This property's name is "org.apache.manifoldcf.logconfigfile". If not present, the logging configuration file will be assumed to be <user_home>/manifoldcf/logging.ini. The logging configuration file is a standard commons-logging property file, and should be formatted accordingly.
Note that all properties described below can also be specified on the command line, via a -D switch. If both methods of setting the property are used, the -D switch value will override the property file value.
The following table describes the configuration property file properties, and what they do:
properties.xml properties
Property | Required? | Function
org.apache.manifoldcf.login.name | No | Crawler UI login user ID (defaults to "admin")
org.apache.manifoldcf.login.password | No | Crawler UI login user password (defaults to "admin")
org.apache.manifoldcf.login.password.obfuscated | No | Obfuscated crawler UI login user password (defaults to "admin")
org.apache.manifoldcf.login.apiname | No | API login user ID (defaults to "")
org.apache.manifoldcf.login.apipassword | No | API login user password (defaults to "")
org.apache.manifoldcf.login.apipassword.obfuscated | No | Obfuscated API login user password (defaults to "")
org.apache.manifoldcf.crawleruiwarpath | Yes, for Jetty | Location of Crawler UI war
org.apache.manifoldcf.authorityservicewarpath | Yes, for Jetty | Location of Authority Service war
org.apache.manifoldcf.apiservicewarpath | Yes, for Jetty | Location of API Service war
org.apache.manifoldcf.usejettyparentclassloader | Yes, for Jetty | true for single-process example, false for multiprocess example.
org.apache.manifoldcf.connectorsconfigurationfile | No | Location of connectors.xml file, for QuickStart, so ManifoldCF can register connectors.
org.apache.manifoldcf.dbsuperusername | No | Database superuser name, for QuickStart, so ManifoldCF can create database instance.
org.apache.manifoldcf.dbsuperuserpassword | No | Database superuser password, for QuickStart, so ManifoldCF can create database instance.
org.apache.manifoldcf.dbsuperuserpassword.obfuscated | No | Obfuscated database superuser password, for QuickStart, so ManifoldCF can create database instance.
org.apache.manifoldcf.ui.maxstatuscount | No | The maximum number of documents ManifoldCF will try to count for the job status display. Defaults to 500000.
org.apache.manifoldcf.databaseimplementationclass | No | Specifies the class to use to implement database access. Default is a built-in Hsqldb implementation. Supported choices are: org.apache.manifoldcf.core.database.DBInterfacePostgreSQL, org.apache.manifoldcf.core.database.DBInterfaceMySQL, org.apache.manifoldcf.core.database.DBInterfaceMariaDB, org.apache.manifoldcf.core.database.DBInterfaceHSQLDB
org.apache.manifoldcf.postgresql.hostname | No | PostgreSQL server host name, or localhost if not specified.
org.apache.manifoldcf.postgresql.port | No | PostgreSQL server port, or standard port if not specified.
org.apache.manifoldcf.postgresql.ssl | No | Set to "true" for ssl communication with PostgreSQL.
org.apache.manifoldcf.mysql.server | No | The MySQL or MariaDB server name. Defaults to 'localhost'.
org.apache.manifoldcf.mysql.client | No | The MySQL or MariaDB client property. Defaults to 'localhost'. You may want to set this to '%' for a multi-machine setup.
org.apache.manifoldcf.hsqldbdatabasepath | No | Absolute or relative path to HSQLDB database; default is '.'.
org.apache.manifoldcf.hsqldbdatabaseprotocol | Yes, for remote HSQLDB connection | The HSQLDB JDBC protocol; choices are 'hsql', 'http', or 'https'. Default is blank (which means an embedded instance)
org.apache.manifoldcf.hsqldbdatabaseserver | Yes, for remote HSQLDB connection | The HSQLDB remote server name.
org.apache.manifoldcf.hsqldbdatabaseport | No | The HSQLDB remote server port.
org.apache.manifoldcf.hsqldbdatabaseinstance | No | The HSQLDB remote database instance name.
org.apache.manifoldcf.lockmanagerclass | No | Specifies the class to use to implement synchronization. Default is either file-based synchronization or in-memory synchronization, using the org.apache.manifoldcf.core.lockmanager.LockManager class. Options include org.apache.manifoldcf.core.lockmanager.BaseLockManager, org.apache.manifoldcf.core.FileLockManager, and org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager.
org.apache.manifoldcf.synchdirectory | Yes, if file-based synchronization class is specified | Specifies the path of a synchronization directory. All ManifoldCF process owners must have read/write privileges to this directory.
org.apache.manifoldcf.zookeeper.connectstring | Yes, if ZooKeeper-based synchronization class is specified | Specifies the ZooKeeper connection string, consisting of comma-separated hostname:port pairs.
org.apache.manifoldcf.zookeeper.sessiontimeout | No | Specifies the ZooKeeper session timeout, if ZooKeeperLockManager is specified. Defaults to 2000.
org.apache.manifoldcf.database.maxhandles | No | Specifies the maximum number of database connection handles that will be pooled. Recommended value is 200.
org.apache.manifoldcf.database.handletimeout | No | Specifies the maximum time a handle is to live before it is presumed dead. Recommend a value of 604800, which is the maximum allowable.
org.apache.manifoldcf.database.connectiontracking | No | True or false. When "true", will track all allocated database connection handles, and will dump an allocation stack trace when the pool is exhausted. Useful for diagnosing connection leaks.
org.apache.manifoldcf.logconfigfile | No | Specifies location of logging configuration file.
org.apache.manifoldcf.database.name | No | Describes database name for ManifoldCF; defaults to "dbname" if not specified.
org.apache.manifoldcf.database.username | No | Describes database user name for ManifoldCF; defaults to "manifoldcf" if not specified.
org.apache.manifoldcf.database.password | No | Describes database user's password for ManifoldCF; defaults to "local_pg_password" if not specified.
org.apache.manifoldcf.database.password.obfuscated | No | Obfuscated database user's password for ManifoldCF; defaults to "local_pg_password" if not specified.
org.apache.manifoldcf.crawler.threads | No | Number of crawler worker threads created. Suggest a value of 30.
org.apache.manifoldcf.crawler.expirethreads | No | Number of crawler expiration threads created. Suggest a value of 10.
org.apache.manifoldcf.crawler.cleanupthreads | No | Number of crawler cleanup threads created. Suggest a value of 10.
org.apache.manifoldcf.crawler.deletethreads | No | Number of crawler delete threads created. Suggest a value of 10.
org.apache.manifoldcf.crawler.historycleanupinterval | No | Milliseconds to retain history records. Default is 0. Zero means "forever".
org.apache.manifoldcf.misc | No | Miscellaneous debugging output. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.db | No | Database debugging output. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.lock | No | Lock management debugging output. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.cache | No | Cache management debugging output. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.agents | No | Agent management debugging output. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.perf | No | Performance logging debugging output. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.crawlerthreads | No | Log crawler thread activity. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.hopcount | No | Log hopcount tracking activity. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.jobs | No | Log job activity. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.connectors | No | Log connector activity. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.scheduling | No | Log document scheduling activity. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.authorityconnectors | No | Log authority connector activity. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.authorityservice | No | Log authority service activity. Legal values are INFO, WARN, or DEBUG.
org.apache.manifoldcf.salt | Yes, if file encryption is used | Specify the salt value to be used for encrypting the file to which the crawler configuration is exported.
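
As a worked illustration of the property clause format, the sketch below writes a small properties.xml that selects the PostgreSQL implementation. Property names come from the table above and the numeric values are the suggested ones; the output path (PROPERTIES_FILE), the host name, and the logging file location are assumptions made for the example, and a real deployment will need further properties (for instance a synchronization setting for multiprocess use).

```shell
#!/bin/sh
# Sketch only: generate a minimal properties.xml that selects PostgreSQL.
# PROPERTIES_FILE is a hypothetical variable; by default a temp file is used.
PROPERTIES_FILE="${PROPERTIES_FILE:-$(mktemp)}"

cat > "$PROPERTIES_FILE" <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <!-- Database selection and connection (host name is an assumption) -->
  <property name="org.apache.manifoldcf.databaseimplementationclass" value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
  <property name="org.apache.manifoldcf.postgresql.hostname" value="localhost"/>
  <property name="org.apache.manifoldcf.database.name" value="dbname"/>
  <property name="org.apache.manifoldcf.database.username" value="manifoldcf"/>
  <!-- Suggested sizing values from the property table -->
  <property name="org.apache.manifoldcf.database.maxhandles" value="200"/>
  <property name="org.apache.manifoldcf.crawler.threads" value="30"/>
  <!-- Logging configuration file (relative path is an assumption) -->
  <property name="org.apache.manifoldcf.logconfigfile" value="./logging.ini"/>
</configuration>
EOF

echo "wrote $PROPERTIES_FILE"
```

The database password is deliberately left out of the sketch; it would be supplied through org.apache.manifoldcf.database.password or its obfuscated variant.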

The following table describes 'advanced' configuration property file properties. They shouldn't need to be changed but provide a greater level of customization:
Advanced properties.xml properties
Property | Required? | Default | Function
org.apache.manifoldcf.crawler.repository.store_history | No | true | If you do not require history reports, setting this to false will disable logging to the repository history (the reports will still run, but they will not contain any content). This can increase throughput and reduce the rate of growth of the database.
org.apache.manifoldcf.db.postgres.analyze.<tablename> | No | 2000 | For PostgreSQL, specify how many changes should be carried out before carrying out an 'ANALYZE' on the specified table.
org.apache.manifoldcf.db.postgres.reindex.<tablename> | No | 250000 | For PostgreSQL, specify how many changes should be carried out before carrying out a 'REINDEX' on the specified table.
org.apache.manifoldcf.db.mysql.analyze.<tablename> | No | 2000 | For MySQL or MariaDB, specify how many changes should be carried out before carrying out an 'ANALYZE' on the specified table.
org.apache.manifoldcf.ui.maxstatuscount | No | 500000 | Set the upper limit for the precise document count to be returned on the 'Status and Job Management' page.

The configuration file can also specify a set of directories which will be searched for connector jars. The directive that adds to the class path is:
<libdir path="path"/>
Note that the path can be relative. For the purposes of path resolution, "." means the directory in which the properties.xml file is itself located.
Logging configuration file properties

The logging.ini file contains Apache commons-logging properties in a standard Java <name>=<value> format. The way the ManifoldCF logging output is formatted is controlled through this file, as are any loggers that ManifoldCF doesn't explicitly define (e.g. loggers for Apache commons-httpclient). Other resources are therefore best suited to describe the parameters that can be used and to what effect.
Running the ManifoldCF Apache2 plug in
The ManifoldCF Apache2 plugin, mod-authz-annotate, is designed to take an authenticated principal (e.g. from mod-auth-kerb) and query a set of authority services for access tokens using an HTTP request. These access tokens are then passed to a (not included) search engine UI, which can use them to help compose a search that properly excludes content that the user is not supposed to see.
The list of authority services so queried is configured in Apache's httpd.conf file. This project includes only one such service: the Java authority service, which uses authority connections defined in the crawler UI to obtain appropriate access tokens.
In order for mod-authz-annotate to be used, it must be placed into Apache2's extensions directory, and configured appropriately in the httpd.conf file.
Note: The ManifoldCF project now contains support for converting a Kerberos principal to a list of Active Directory SIDs. This functionality is contained in the Active Directory Authority. The following connectors are expected to make use of this authority:
FileNet
CIFS
SharePoint
Configuring the ManifoldCF Apache2 plug in

mod-authz-annotate understands the following httpd.conf commands:
Command | Meaning | Values
AuthzAnnotateEnable | Turn on/off the plugin | "On", "Off"
AuthzAnnotateAuthority | Point to an authority service that supports ACL queries, but not ID queries | The authority URL
AuthzAnnotateACLAuthority | Point to an authority service that supports ACL queries, but not ID queries | The authority URL
AuthzAnnotateIDAuthority | Point to an authority service that supports ID queries, but not ACL queries | The authority URL
AuthzAnnotateIDACLAuthority | Point to an authority service that supports both ACL queries and ID queries | The authority URL

Running ManifoldCF with Apache Maven


If you build ManifoldCF with Maven, then you will need to run ManifoldCF under Maven. You currently don't get a lot of options here; the only model offered is the QuickStart single-process model. To run it, all you need to do is:
cd framework/jetty-runner
mvn exec:exec

Integrating ManifoldCF into another application


ManifoldCF can be integrated into another application through a variety of methods. We'll cover these below.
Integrating the Quick Start example
The Quick Start example can readily be integrated into a single-process application by using the support methods found in the
org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner class. For the web application components of ManifoldCF, you can either use
this class to start them under Jetty, or you can choose to deploy them yourself. Please note, however, that if you start the ManifoldCF
agents process within a web application, you are effectively not running a single-process version of ManifoldCF anymore, because each
web application will effectively have its own set of static classes.
If you want to try the single-process integration, you should learn what you need by reading the Javadoc for the ManifoldCFJettyRunner
class.
Integrating a multiprocess setup
In a multiprocess setup, all of the ManifoldCF processes might as well exist on their own. You can learn how to programmatically start the
agents process by looking at the code in the AgentRun command class, as described above. Similarly, the command classes that register
connectors are very small and should be easy to understand.
Integrating ManifoldCF with a search engine
ManifoldCF's Authority Service is designed to allow maximum flexibility in integrating ManifoldCF security with search engines. The
service receives a user identity (as a set of authorization domain/user name tuples), and produces a set of tokens. It also returns a
summary of the status of all authorities that were involved in the assembly of the set of tokens, as a nicety. A search engine user interface
could thus signal the user when the results they might be seeing are incomplete, and why.
The Authority Service expects the following arguments, passed as URL arguments and properly URL-encoded:
Authority Service URL parameters

Authority Service URL parameter   Meaning
username                          the user name, if there is only one authorization domain
domain                            the optional authorization domain if there is only one authorization domain (defaults to empty string)
username_XX                       user name number XX, where XX is an integer starting at zero
domain_XX                         authorization domain XX, where XX is an integer starting at zero
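
As a rough illustration of the parameter rules in the table above, the following Python sketch assembles a URL-encoded query string for one or more identities. The function name build_authority_query is hypothetical, as are two choices made here: XX is written as a plain integer without padding, and the domain parameter is left out when a single identity has an empty domain. The service's base URL is not given in this section, so the sketch produces only the query string.

```python
from urllib.parse import urlencode


def build_authority_query(identities):
    """Build a URL-encoded query string for the Authority Service.

    identities: list of (username, domain) tuples; domain may be "".
    Hypothetical helper: the indexing format and the omission of an
    empty single domain are assumptions, not documented behavior.
    """
    if not identities:
        raise ValueError("at least one identity is required")

    if len(identities) == 1:
        # Single authorization domain: plain "username" / "domain".
        username, domain = identities[0]
        params = [("username", username)]
        if domain:
            params.append(("domain", domain))
    else:
        # Several domains: numbered parameters, starting at zero.
        params = []
        for index, (username, domain) in enumerate(identities):
            params.append((f"username_{index}", username))
            params.append((f"domain_{index}", domain))

    # urlencode percent-encodes reserved characters in names and values.
    return urlencode(params)
```

For example, build_authority_query([("alice", "")]) yields username=alice, and the result would be appended after a "?" to whatever authority service URL your deployment exposes.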

Access tokens and authority statuses are returned in the HTTP response separated by newline characters. Each line has a prefix as
follows:
Authority Service response prefixes

Authority Service response prefix   Meaning
TOKEN:                              An access token
AUTHORIZED:                         The name of an authority that found the user to be authorized
UNREACHABLEAUTHORITY:               The name of an authority that was found to be unreachable or unusable
UNAUTHORIZED:                       The name of an authority that found the user to be unauthorized
USERNOTFOUND:                       The name of an authority that could not find the user

It is important to remember that only the "TOKEN:" lines actually matter for security. Even if any of the error conditions apply, the set of
tokens returned by the Authority Service will be correctly supplied in order to apply appropriate security to documents being searched.
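
A custom integration might split the response into tokens and status lines along the following lines. This is a minimal Python sketch, not code from ManifoldCF: the function names parse_authority_response and incomplete_reasons are hypothetical, and treating every non-AUTHORIZED status as a reason to warn the user is an assumption about how a search UI might choose to behave.

```python
# Prefix names from the response table, without the trailing colon.
PREFIXES = (
    "TOKEN",
    "AUTHORIZED",
    "UNREACHABLEAUTHORITY",
    "UNAUTHORIZED",
    "USERNOTFOUND",
)


def parse_authority_response(body):
    """Group the lines of a response body by their prefix.

    Returns a dict mapping each prefix name to the list of values
    that followed it. Lines with an unknown prefix are ignored.
    """
    parsed = {name: [] for name in PREFIXES}
    for line in body.splitlines():
        # Split at the first colon only; values may contain colons.
        name, separator, value = line.partition(":")
        if separator and name in parsed:
            parsed[name].append(value)
    return parsed


def incomplete_reasons(parsed):
    """List status lines a UI might show as possible gaps in results.

    Assumption: any status other than AUTHORIZED is worth reporting.
    Only parsed["TOKEN"] should be used to filter search results.
    """
    reasons = []
    for name in ("UNREACHABLEAUTHORITY", "UNAUTHORIZED", "USERNOTFOUND"):
        for authority in parsed[name]:
            reasons.append(f"{name}: {authority}")
    return reasons
```

The tokens in parsed["TOKEN"] are what would be applied to the search query; the list from incomplete_reasons is purely informational.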
If you choose to deploy a search engine plugin supplied by the Apache ManifoldCF project (for example, the Solr plugin), you will not need
to know any of the above, since part of the plugin's purpose is to communicate with the Authority Service and apply the access tokens that
are returned to the search query automatically. Some plugins, such as the ElasticSearch plugin, are more or less like toolkits, but still hide
most of the above from the integrator. In a more highly customized system, however, you may need to develop your own code which
interacts with the Authority Service in order to meet your goals.

Copyright 2009-2015 The Apache Software Foundation.
Apache ManifoldCF, ManifoldCF, Apache Forrest, Forrest, Apache Solr, Solr, Apache, the Apache feather logo, the Apache Forrest logo, and the Apache ManifoldCF logo are
trademarks of The Apache Software Foundation. Documentum and EMC are trademarks of EMC Corporation. SharePoint, Windows, and Microsoft are trademarks of
Microsoft, Inc. FileNet P8 and IBM are trademarks of IBM, Inc. LiveLink and OpenText are trademarks of OpenText, Inc. QBase, MetaCarta, and GTS are trademarks of QBase,
Inc. Meridio and Autonomy are trademarks of Hewlett-Packard, Inc. Alfresco is a trademark of Alfresco Software, Inc. Jira is a trademark of Atlassian, Inc.