Last Published: 05/05/2015 08:23:01
Building ManifoldCF
Building ManifoldCF
Building overview
Building the framework and the connectors using Apache Ant
Building and testing the legacy Alfresco connector
Building and testing the Alfresco Webscript connector
Building and running the Documentum connector
Building and running the FileNet connector
Building and running the JDBC connector, including Oracle, MSSQL, MySQL, SQLServer, and Sybase JDBC drivers
Building and running the jCIFS/Windows Shares connector
Building and running the LiveLink connector
Building the Meridio connector
Building and running the SharePoint connector
Running the Apache Solr output connector
Running the ElasticSearch output connector
Building the framework and the connectors using Apache Maven
Preparation
How to build
Building ManifoldCF's Apache2 plugin
Running ManifoldCF
Overview
Binary organization
Example deployments
Quick-start single process model
Single-process deployable war
Simplified multi-process model using file-based synchronization
Simplified multi-process model using ZooKeeper-based synchronization
Command-driven multi-process model
The connectors.xml configuration file
Running connector-specific processes
Database selection
Configuring a PostgreSQL database
Configuring a MySQL database
Configuring an HSQLDB database
The ManifoldCF configuration files
properties.xml file properties
Logging configuration file properties
Running the ManifoldCF Apache2 plugin
Configuring the ManifoldCF Apache2 plugin
Running ManifoldCF with Apache Maven
Integrating ManifoldCF into another application
Integrating the Quick Start example
Integrating a multi-process setup
Integrating ManifoldCF with a search engine
Building ManifoldCF
ManifoldCF consists of a framework, a set of connectors, and an optional Apache2 plugin module. These can be built as follows.
Building overview
There are two ways to build ManifoldCF. The primary means of building (and the most supported) is via Apache Ant. The ant build is used to create ManifoldCF releases and to run tests, load tests, and UI tests. Maven is also supported, for development builds only. Maven ManifoldCF builds have many restrictions and challenges and are of secondary priority for the development team.
http://manifoldcf.apache.org/release/release2.1/en_US/howtobuildanddeploy.html#Running+ManifoldCF
1/17
2015/7/18
BuildingManifoldCF
The ManifoldCF framework is built without any dependencies on connector code. It consists of a set of jars, a family of web applications, and a number of java command classes. Connectors are then built that have well-defined dependencies on the framework modules. A properly built connector typically consists of:
One or more jar files meant to be included in the library area meant for connector jars and their dependencies.
Possibly some java commands, which are meant to support or configure the connector in some way.
Possibly a connector-specific process or two, each requiring a distinct classpath, which usually serves to isolate the crawler-ui servlet, authority-service servlet, agents process, and any commands from problematic aspects of the client environment.
A recommended set of java "define" variables, which should be used consistently with all involved processes, e.g. the agents process, the application server running the authority-service and crawler-ui, and any commands. (This is historical, and no connectors as of this writing have any of these any longer.)
An individual connector package will typically supply an output connector, or a transformation connector, or a mapping connector, or a repository connector, or sometimes both a repository connector and an authority connector. The main ant build script automatically forms each individual connector's contribution to the overall system into the overall package.
Building the framework and the connectors using Apache Ant
To build the ManifoldCF framework code, and the particular connectors you are interested in, you currently need to do the following:
1. Check out the desired release from https://svn.apache.org/repos/asf/manifoldcf/tags, or unpack the desired source distribution.
2. cd to the top-level directory.
3. EITHER: overlay the lib directory from the corresponding lib distribution (preferred, where possible), OR run "ant make-core-deps" to build the code dependencies. The latter is the only possibility if you are building from trunk, but it is not guaranteed to work for older releases.
4. Run "ant make-deps", to download LGPL and other open source but non-Apache compatible libraries.
5. Install proprietary build dependencies. See below for details.
6. Run "ant build".
7. Install desired dependent proprietary libraries. See below for details.
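Steps 2 through 6 can be strung together in a small shell script. The sketch below is only an illustration: the function names run and build_manifoldcf and the DRY_RUN and USE_LIB_DIST variables are hypothetical helpers, not part of ManifoldCF, and the Ant target names are assumed to be the hyphenated make-core-deps, make-deps, and build targets named in the steps. With DRY_RUN=1 the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the Ant build sequence; helper and variable names are illustrative.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: build_manifoldcf <source-root>
# The body runs in a subshell so the caller's working directory is unchanged.
build_manifoldcf() (
    cd "$1" || exit 1                        # step 2
    if [ "${USE_LIB_DIST:-0}" != "1" ]; then
        # step 3, second alternative: no lib distribution was overlaid
        run ant make-core-deps || exit 1
    fi
    run ant make-deps || exit 1              # step 4
    # step 5: proprietary build dependencies must already be in place here
    run ant build                            # step 6
)
```

For example, `DRY_RUN=1; build_manifoldcf .` lists the commands that would be run from the current directory. Set USE_LIB_DIST=1 when the lib directory was overlaid from the lib distribution.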
If you do not run the ant "make-deps" target, and you supply NO LGPL or proprietary libraries, not all capabilities of ManifoldCF will be available. The framework itself and the following repository connectors will be built:
Alfresco Webscript connector
CMIS connector
EMC Documentum connector, built against a Documentum API stub
DropBox connector
Email connector
FileNet connector, built against a FileNet API stub
WGET-compatible filesystem connector
Generic XML repository connector
Google Drive connector
GridFS connector (mongoDB)
HDFS connector
JDBC connector, with just the PostgreSQL JDBC driver
Atlassian Jira connector
OpenText LiveLink connector, built against a LiveLink API stub
Meridio connector, built against modified Meridio API WSDLs and XSDs
RSS connector
Microsoft SharePoint connector, built against SharePoint API WSDLs
Web crawler connector
Wiki connector
The following authority connectors will be built:
Active Directory authority
Alfresco Webscript authority
CMIS authority
EMC Documentum authority
Atlassian Jira authority
LDAP authority
OpenText LiveLink authority
Meridio authority, built against modified Meridio API WSDLs and XSDs
Null authority
Microsoft SharePoint/AD authority
Microsoft SharePoint/Native authority, built against SharePoint API WSDLs
The following output connectors will be built:
WGET-compatible filesystem output connector
MetaCarta GTS output connector
Apache Solr output connector
OpenSearchServer output connector
ElasticSearch output connector
HDFS output connector
Null output connector
The following transformation connectors will be built:
Field mapping transformation connector
Document filter transformation connector
Null transformation connector
Tika extractor transformation connector
The following mapping connectors will be built:
Regular-expression mapping connector
The dependencies and build limitations of each individual LGPL and proprietary connector are described in separate sections below.
Building and testing the legacy Alfresco connector
The legacy Alfresco connector requires the Alfresco Web Services Client provided by Alfresco in order to be built. Place this jar into the directory connectors/alfresco/lib-proprietary before you build. This will occur automatically if you execute the ant target "make-deps" from the ManifoldCF root directory.
To run integration tests for the connector, you have to copy the alfresco.war including H2 support, created by the Maven module test-materials/alfresco-4-war (using "mvn package" inside the folder), into the connectors/alfresco/test-materials-proprietary folder. Then use "ant test" or "mvn integration-test" for the standard build to execute integration tests.
Building and testing the Alfresco Webscript connector
The Alfresco Webscript connector is built against an open-source Alfresco Indexer client, which requires a corresponding Alfresco Indexer plugin to be installed on your Alfresco instance. This Alfresco Indexer plugin is included with ManifoldCF distributions. Installation of the plugin should follow the standard Alfresco installation steps, as described here. See this page for configuration details, and for the plugin itself.
Building and running the Documentum connector
The Documentum connector requires EMC's DFC product in order to be run, but is built against a set of stub classes. The stubs mimic the class structure of DFC 6.0. If your DFC is newer, it is possible that the class structure of the DFC classes might have changed, and you may need to build the connector yourself.
If you need to supply DFC classes during build time, copy the DFC and dependent jars to the source directory connectors/documentum/lib-proprietary, and build using "ant build". The jars will be copied into the right place in your dist directory automatically.
For a binary distribution, just copy the DFC jars to processes/documentum-server/lib-proprietary instead.
If you have done everything right, you should be able to start the Documentum connector's registry and server processes, as per the instructions.
Building and running the FileNet connector
The FileNet connector requires IBM's FileNet P8 API jar in order to be run, but is usually built against a set of stub classes. The stubs mimic the class structure of FileNet P8 API 4.5. If your FileNet is newer, it is possible that the class structure of the API might have changed, and you may need to build the connector yourself.
If you need to supply your own Jace.jar at build time, copy it to the source directory connectors/filenet/lib-proprietary, and build using "ant build". The Jace.jar will be copied into the right place in your dist directory automatically.
If you do not wish to build, simply copy your Jace.jar and the other dependent jars from that installation into the distribution directory processes/filenet-server/lib-proprietary.
If correctly done, you will be able to start the FileNet connector's registry and server processes, as per the instructions.
Building and running the JDBC connector, including Oracle, MSSQL, MySQL, SQLServer, and Sybase JDBC drivers
The JDBC connector also knows how to work with Oracle, SQLServer, and Sybase JDBC drivers. In order to support these databases, start by placing the mysql-connector-java.jar and the jtds.jar in the lib-proprietary directory. The ant target "make-deps" will do this for you automatically. For Oracle, download the appropriate Oracle JDBC jar from the Oracle site, and copy it into the same directory before you build ManifoldCF.
Building and running the jCIFS/Windows Shares connector
To build this connector, you need to download jcifs.jar from http://jcifs.samba.org, and copy it into the connectors/jcifs/lib-proprietary directory before building. You can also just type "ant make-deps" from the root ManifoldCF directory and this step will be done for you.
If you have downloaded a binary distribution, place the jcifs.jar into the connector-lib-proprietary directory, and uncomment the Windows Shares line in the connectors.xml file.
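Before building, it can be worth confirming that the jar actually sits where the build expects it. The following is a minimal sketch under stated assumptions: the helper name require_jar is hypothetical and not part of ManifoldCF, and the directory is passed in by the caller rather than discovered.

```shell
#!/bin/sh
# require_jar is an illustrative helper, not a ManifoldCF script.
# Usage: require_jar <directory> <jar-name>
# Succeeds when the jar exists in the directory; otherwise prints a hint
# to stderr and returns a non-zero status.
require_jar() {
    if [ -f "$1/$2" ]; then
        echo "found $1/$2"
    else
        echo "missing $2: copy it into $1 first" >&2
        return 1
    fi
}
```

From the source root this could be used as `require_jar connectors/jcifs/lib-proprietary jcifs.jar && ant build`. The same check applies to the other connectors that need a proprietary jar in place before the build.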
Building and running the LiveLink connector
This connector needs OpenText's LAPI package in order to be run. It is usually built against a set of stubs. The stubs, however, mimic the class structure of LAPI 9.7.1. Later versions (such as 10.x) have a different class structure. Therefore, you may need to rebuild ManifoldCF against your lapi.jar, in order for the connector to work properly.
If you need to supply your own lapi.jar and llssl.jar at build time, copy them to the source directory connectors/livelink/lib-proprietary, and build using "ant build". The lapi.jar will be copied into the right place in your dist directory automatically.
If you do not wish to build, simply copy your lapi.jar and llssl.jar into the binary distribution's connector-lib-proprietary directory, and uncomment the LiveLink-related connector lines in the connectors.xml file.
Building the Meridio connector
The Meridio connector generates interface classes using checked-in wsdls and xsds originally obtained from an installed Meridio instance using disco.exe, and subsequently modified to work around limitations in Apache Axis. The disco.exe utility is installed as part of Microsoft Visual Studio, and is typically found under "c:\Program Files\Microsoft SDKs\Windows\V6.x\bin". If desired, you can obtain unmodified wsdls and xsds by interrogating the following Meridio web services:
http[s]://<meridio_server>/DMWS/MeridioDMWS.asmx
http[s]://<meridio_server>/RMWS/MeridioRMWS.asmx
Building and running the SharePoint connector
The SharePoint connector generates interface classes using checked-in wsdls originally obtained from an installed Microsoft SharePoint instance using disco.exe. The disco.exe utility is installed as part of Microsoft Visual Studio, and is typically found under "c:\Program Files\Microsoft SDKs\Windows\V6.x\bin". If desired, you can obtain unmodified wsdls by interrogating the following SharePoint web services:
http[s]://<server_name>/_vti_bin/Permissions.asmx
http[s]://<server_name>/_vti_bin/Lists.asmx
http[s]://<server_name>/_vti_bin/Dspsts.asmx
http[s]://<server_name>/_vti_bin/usergroup.asmx
http[s]://<server_name>/_vti_bin/versions.asmx
http[s]://<server_name>/_vti_bin/webs.asmx
Important: For SharePoint instances version 3.0 (2007) or higher, in order to run the connector, you also must deploy a custom SharePoint web service on the SharePoint instance you intend to connect to. This is required because Microsoft overlooked support for web-service-based access to file and folder security information when SharePoint 2007 was released. For SharePoint version 4.0 (2010), the service is even more critical, because backwards compatibility was not maintained and without this service no crawling can occur. SharePoint version 5.0 (2013) also requires a plugin. Although its functionality is the same as for SharePoint 4.0, the binary you install is built against SharePoint 2013 resources rather than SharePoint 2010 resources, so there is a different distribution.
The versions of this service can be found in the distribution directory plugins/sharepoint. Pick the version appropriate for your SharePoint installation, and install it following the instructions in the file Installation Readme.txt, found in the corresponding directory.
Running the Apache Solr output connector
The Apache Solr output connector requires no special attention to build or run within ManifoldCF. However, in order for Apache Solr to be able to enforce document security, you must install and properly configure a plugin for Solr. This plugin is available for both Solr 3.x and for Solr 4.x, and can be used either as a query parser plugin, or as a search component. Additional index fields are also required to contain the necessary security information. Much more information can be found in the README.txt file in the plugins themselves.
The correct versions of the plugins are included in the plugins/solr directory of the main ManifoldCF distribution. You can also download updated versions of the plugins from the ManifoldCF download page. The compatibility matrix is as follows:
Apache ManifoldCF Solr 3.x and Solr 4.x plugin compatibility
ManifoldCF versions    Plugin version
0.1.x - 1.4.x          0.x
1.5.x                  1.x
>= 1.6.x               2.x
If the proper version of the plugin is not deployed on Solr, documents will not be properly secured. Thus, it is essential to verify that the proper plugin version has been deployed for the version of ManifoldCF you are using.
Running the ElasticSearch output connector
The ElasticSearch output connector requires no special attention to build or run within ManifoldCF. However, in order for ElasticSearch to be able to enforce document security, you must install, properly configure, and code against a toolkit plugin for ElasticSearch. Additional index fields are also required to contain the necessary security information. Much more information can be found in the README.txt file in the plugin itself.
The correct version of the plugin is included in the plugins/elasticsearch directory of the main ManifoldCF distribution. You can also download updated versions of the plugin from the ManifoldCF download page. The compatibility matrix is as follows:
Apache ManifoldCF ElasticSearch plugin compatibility
ManifoldCF versions    Plugin version
0.1.x - 1.4.x          0.x
1.5.x                  1.x
>= 1.6.x               2.x
If the proper version of the plugin is not deployed and properly integrated, documents will not be properly secured. Thus, it is essential to verify that the proper plugin version has been deployed for the version of ManifoldCF you are using.
To work with ManifoldCF, your ElasticSearch instance must also include the appropriate indexes. Here are some simple steps for creating an ElasticSearch index, using the curl utility:
% curl -XPUT 'http://localhost:9200/manifoldcf'
% curl -XPUT 'http://localhost:9200/manifoldcf/attachment/_mapping' -d '
{
  "attachment": {
    "_source": {
      "excludes": ["file"]
    },
    "properties": {
      "allow_token_document": { "type": "string" },
      "allow_token_parent": { "type": "string" },
      "allow_token_share": { "type": "string" },
      "attributes": { "type": "string" },
      "createdOn": { "type": "string" },
      "deny_token_document": { "type": "string" },
      "deny_token_parent": { "type": "string" },
      "deny_token_share": { "type": "string" },
      "lastModified": { "type": "string" },
      "shareName": { "type": "string" },
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "file": {
            "store": true,
            "term_vector": "with_positions_offsets",
            "type": "string"
          }
        }
      }
    }
  }
}'
This command creates an index called manifoldcf with a mapping named attachment, which has some generic fields for access tokens and a field named file which makes use of the ElasticSearch attachment mapper plugin. It is configured for highlighting ("term_vector": "with_positions_offsets").
The following part is useful for not saving the source json in the index, which reduces the index size significantly. Be aware that you shouldn't do this if you will need to re-index data on the ElasticSearch side or you need access to the whole document:
"_source": {
  "excludes": ["file"]
},
Building the framework and the connectors using Apache Maven
Preparation
No special preparation is required, other than to have access to the Apache Maven repository.
How to build
Building is straightforward. In the ManifoldCF root, type:
mvn clean install
This should generate all the necessary artifacts to run with, and also run the Hsqldb-based tests.
To build and skip only the integration tests, type:
mvn clean install -DskipITs
When you have the default package installed locally in your Maven repository, to only build ManifoldCF artifacts, type:
mvn clean package
NOTE: Due to current limitations in the ManifoldCF Maven poms, you MUST run a complete "mvn clean install" as the first step. You cannot skip steps, or the build will fail.
Building ManifoldCF's Apache2 plugin
To build the mod-authz-annotate plugin, you need to start with a Unix system that has the apache2 development tools installed on it, plus the curl development package (from http://curl.haxx.se or elsewhere). Then, cd to mod-authz-annotate, and type "make". The build will produce a file called mod-authz-annotate.so, which should be copied to the appropriate Apache2 directory so it can be used as a plugin.
Running ManifoldCF
Overview
ManifoldCF consists of several components. These are enumerated below:
A database, which is where ManifoldCF keeps all of its configuration and state information, usually PostgreSQL
A synchronization directory, which is how ManifoldCF coordinates activity among its various processes
An agents process, which is the process that actually crawls documents and ingests them
A crawler-ui servlet, which presents the UI users interact with to configure and control the crawler
An authority-service servlet, which responds to requests for authorization tokens, given a user name
An api-service servlet, which responds to REST API requests
These underlying components can be packaged in many ways. For example, the three servlets can be deployed in separate war files as separate web applications. One may also deploy all three servlets in one combined web application, and also include the agents process.
Binary organization
Whether you build ManifoldCF yourself, or download a binary distribution, you will need to know what is what in the build result. If you build ManifoldCF yourself, the binary build result can be found in the subdirectory dist. In a binary distribution, the contents of the distribution are the contents of the dist directory. These contents are described below.
Distribution directories and files
dist file/directory                     Meaning
connectors.xml                          an xml file describing the connectors that should be registered
connector-lib                           jars for all the connectors, referred to by properties.xml
connector-lib-proprietary               proprietary jars for all the connectors, referred to by properties.xml; not included in binary release
obfuscation-utility                     a utility to obfuscate passwords, for inclusion in properties.xml fields
lib                                     jars for all of the examples, referenced by the example scripts
lib-proprietary                         proprietary jars for all of the examples, referenced by the proprietary example scripts
processes                               scripts, classpath jars, and -D switch values needed for the required connector-specific processes
script-engine                           jars and scripts for running the ManifoldCF script interpreter
example                                 a jetty-based example that runs in a single process (except for any connector-specific processes), excluding all proprietary libraries
example-proprietary                     a jetty-based example that runs in a single process (except for any connector-specific processes), including proprietary libraries; not included in binary release
multiprocess-file-example               scripts and jars for an example that uses the multiple process model using file-based synchronization, excluding all proprietary libraries
multiprocess-file-example-proprietary   scripts and jars for an example that uses the multiple process model using file-based synchronization, including proprietary libraries; not included in binary release
multiprocess-zk-example                 scripts and jars for an example that uses the multiple process model using ZooKeeper-based synchronization, excluding all proprietary libraries
multiprocess-zk-example-proprietary     scripts and jars for an example that uses the multiple process model using ZooKeeper-based synchronization, including proprietary libraries; not included in binary release
web                                     app-server deployable web applications (wars), excluding all proprietary libraries
web-proprietary                         app-server deployable web applications (wars), including proprietary libraries; not included in binary release
doc                                     javadocs for framework and all included connectors
plugins                                 pre-built integration components to deploy on target systems, e.g. for Solr
If you downloaded the binary distribution, you may notice that the connector-lib-proprietary directory contains only a number of <connector>-README.txt files. This is because under Apache licensing rules, incompatibly licensed jars may not be redistributed. Each such <connector>-README.txt describes the jars that you need to add to the connector-lib-proprietary directory in order to get the corresponding connector working. You will also then need to uncomment the appropriate entries in the connectors.xml file to enable the connector for use.
NOTE: The prebuilt binary distribution cannot, at this time, include support for MySQL. Nor can the JDBC Connector access MySQL, MSSQL, SyBase, or Oracle databases in that distribution. In order to use these JDBC drivers, you must build ManifoldCF yourself. Start by downloading the drivers and placing them in the lib-proprietary directory. The command ant download-dependencies will do most of this for you, with the exception of the Oracle JDBC driver.
The directory titled processes includes separate processes which must be started in order for the associated connector to function. The number of produced processes subdirectories may vary, because optional individual connectors may or may not supply processes that must be run to support the connector. For each of the processes subdirectories above, any scripts that pertain to that connector-supplied process will be placed in the root level of the subdirectory. The supplied scripts for a process generally take care of building an appropriate classpath and setting necessary -D switches. (Note: none of the current connectors require special -D switches at this time.) If you need to construct a classpath by hand, it is important to remember that "more" is not necessarily "better". The process deployment strategy implied by the build structure has been carefully thought out to avoid jar conflicts. Indeed, several connectors are structured using multiple processes precisely for that reason.
The proprietary libraries required by the secondary processes in the processes subdirectories should be placed in the directory processes/xxx/lib-proprietary. These jars are not included in the binary distribution, and you will need to supply them in order to make the process work. A README.txt file is placed in each lib-proprietary directory describing what needs to be provided there.
The plugins directory contains components you may need to deploy on the target system to make the associated connector function correctly. For example, the Solr connector includes plugin classes for enforcing ManifoldCF security on Solr 3.x and 4.x. See the README file in each directory for detailed instructions on how to deploy the components.
Inside the example directory, you will find everything you need to fire up ManifoldCF in a single-process model under Jetty. Everything is included so that all you need to do is change to that directory, and start it using the command <java> -jar start.jar. This is described in more detail later, and is the recommended way for beginners to try out ManifoldCF. The directory example-proprietary contains an equivalent example that includes proprietary connectors and jars. This is the standard place to start if you build ManifoldCF yourself.
Example deployments
There are many different ways to run ManifoldCF out of the box. These are enumerated below:
Quick-start single process model
Single-process deployable war
Simplified multi-process model
Command-driven multi-process model
Each way has advantages and disadvantages. For example, single-process models limit the flexibility of deploying ManifoldCF components. Multi-process models require that inter-process synchronization be properly configured. If you are just starting out with ManifoldCF, we suggest you try the quick-start single process model first, since that is the easiest.
Quick-start single process model
You can run most of ManifoldCF in a single process, for evaluation and convenience. This single-process version uses Jetty to handle its web applications, and Hsqldb as an embedded database. All you need to do to run this version of ManifoldCF is to follow the Ant-based build instructions above, and then:
cd example
start[.bat|.sh]
In the quick-start model, all database initialization and connector registration takes place automatically whenever ManifoldCF is started (at the cost of some startup delay). The crawler UI can be found at http://<host>:8345/mcf-crawler-ui. The authority service can be found at http://<host>:8345/mcf-authority-service/UserACLs. The programmatic API is at http://<host>:8345/mcf-api-service.
You can stop the quick-start ManifoldCF at any time using ^C, or by using the script stop[.bat|.sh].
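A quick way to see whether the quick-start instance has come up is to request each of these URLs. The sketch below is illustrative only: mcf_urls and mcf_probe are hypothetical helper names, the hyphenated paths are assumed to match the three web applications, and the sketch makes no claim about which HTTP status each endpoint returns.

```shell
#!/bin/sh
# mcf_urls and mcf_probe are illustrative helpers, not ManifoldCF scripts.

# Usage: mcf_urls <host>
# Prints the three quick-start URLs on port 8345.
mcf_urls() {
    base="http://$1:8345"
    echo "$base/mcf-crawler-ui"
    echo "$base/mcf-authority-service/UserACLs"
    echo "$base/mcf-api-service"
}

# Usage: mcf_probe <host>
# Prints "<http status> <url>" for each URL (requires curl).
mcf_probe() {
    mcf_urls "$1" | while read -r url; do
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
        echo "$code $url"
    done
}
```

Running `mcf_probe localhost` shortly after starting the example gives a rough readiness check. curl reports a status of 000 when it cannot connect at all.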
Bear in mind that Hsqldb is not as full-featured a database as is PostgreSQL. This means that any performance testing you may do against the quick-start example may not be applicable to a full installation. Furthermore, embedded Hsqldb only permits one process at a time to be connected to its databases, so you cannot use any of the ManifoldCF commands (as described below) while the quick-start ManifoldCF is running.
Another caveat that you will need to be aware of with the quick-start version of ManifoldCF is that it in no way removes the need for you to run any separate processes that individual connectors require. Specifically, the Documentum and FileNet connectors require processes to be independently started in order to function. You will need to read about these connector-specific processes below in order to use the corresponding connectors. Scripts for running these processes can be found in the directories named processes/xxx.
Single-process deployable war
Under the distribution directory web/war, there is a war file called mcf-combined-service.war. This web application contains the exact same functionality as the quick-start example, but bundled up as a single war instead. An example script is provided to run this web application under Jetty. You can execute the script as follows:
cd example
start-combined[.sh|.bat]
The combined web service presents the crawler UI at the root path for the web application, which is http://<host>:8345/mcf/. The authority service functionality can be found at http://<host>:8345/mcf/UserACLs, similar to the quick-start example. However, the programmatic API service has a path other than the root: http://<host>:8345/mcf/api/.
The script that starts the combined-service web application uses the same database instance (Hsqldb by default) as does the quick-start, and the same properties.xml file. The same caveats about required individual connector processes also apply as they do for the quick-start example.
Running single-process combined war example using Tomcat
In order to run the ManifoldCF single-process combined war example under Tomcat, you will need to take the following steps:
1. Modify the Tomcat startup script, or use the Tomcat service administration client, to set a Java "-Dorg.apache.manifoldcf.configfile" switch to point to the example's properties.xml file.
2. Start Tomcat.
3. Deploy and start the mcf-combined-service web application, preferably using the Tomcat administration client.
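For step 1 on a Unix-style Tomcat installation, one common approach is to pass the switch through Tomcat's CATALINA_OPTS environment variable, for example from a bin/setenv.sh file. The sketch below only prints such a line. The function name tomcat_opts_line and the example path are assumptions, and the path is assumed to contain no spaces.

```shell
#!/bin/sh
# tomcat_opts_line is an illustrative helper, not part of ManifoldCF or Tomcat.
# Usage: tomcat_opts_line <absolute path to properties.xml>
# Prints a line that could be placed in $CATALINA_BASE/bin/setenv.sh.
tomcat_opts_line() {
    # Single quotes keep $CATALINA_OPTS literal so it is expanded by Tomcat's
    # startup script, not by this function.
    printf 'CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.manifoldcf.configfile=%s"\n' "$1"
}
```

For instance, `tomcat_opts_line /opt/mcf/example/properties.xml >> "$CATALINA_BASE/bin/setenv.sh"` appends the setting, where /opt/mcf is a made-up installation location.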
Simplified multi-process model using file-based synchronization
ManifoldCF can also be deployed in a simplified multi-process model which uses files to synchronize processes. Inside the multiprocess-file-example directory, you will find everything you need to do this. (The multiprocess-file-example-proprietary directory is similar but includes proprietary material and is available only if you build ManifoldCF yourself.) Below is a list of what you will find in this directory.
File-based multi-process example files and directories
multiprocess-file-example file/directory   Meaning
web                                        Web applications that should be deployed on tomcat or the equivalent, plus recommended application server -D switch names and values
processes                                  classpath jars that should be included in the classpath for all non-connector-specific processes, along with -D switches, using the same convention as described for tomcat, above
properties.xml                             an example ManifoldCF configuration file, in the right place for the multiprocess script to find it
logging.ini                                an example ManifoldCF logging configuration file, in the right place for the properties.xml to find it
syncharea                                  an example ManifoldCF synchronization directory, which must be writable in order for multiprocess ManifoldCF to work
logs                                       where the ManifoldCF logs get written to
start-database[.sh|.bat]                   script to start the HSQLDB database
initialize[.sh|.bat]                       script to create the database instance, create all database tables, and register connectors
start-webapps[.sh|.bat]                    script to start Jetty with the ManifoldCF web applications deployed
start-agents[.sh|.bat]                     script to start the (first) agents process
start-agents-2[.sh|.bat]                   script to start a second agents process
stop-agents[.sh|.bat]                      script to stop all running agents processes cleanly
lock-clean[.sh|.bat]                       script to clean up dirty locks (run only when all webapps and processes are stopped)
Initializing the database and running
If you run the file-based multi-process model, after you first start the database (using start-database[.sh|.bat]), you will need to initialize the database before you start the agents process or use the crawler UI. To do this, all you need to do is run the initialize[.sh|.bat] script. Then, you will need to start the web applications (using start-webapps[.sh|.bat]) and the agents process (using start-agents[.sh|.bat]), and optionally the second agents process (using start-agents-2[.sh|.bat]).
Running multi-process file-based example using Tomcat
In order to run the ManifoldCF multi-process file-based example under Tomcat, you will need to take the following steps:
1. Start the database (using start-database[.sh|.bat])
2. Initialize the database (using initialize[.sh|.bat])
3. Start the agents process (using start-agents[.sh|.bat], and optionally start-agents-2[.sh|.bat])
4. Modify the Tomcat startup script, or use the Tomcat service administration client, to set a Java "-Dorg.apache.manifoldcf.configfile" switch to point to the example's properties.xml file.
5. Start Tomcat.
6. Deploy and start the mcf-crawler-ui, mcf-authority-service, and mcf-api-service web applications, preferably using the Tomcat administration client.
Simplified multi-process model using ZooKeeper-based synchronization
ManifoldCF can be deployed in a simplified multi-process model which uses Apache ZooKeeper to synchronize processes. Inside the multiprocess-zk-example directory, you will find everything you need to do this. (The multiprocess-zk-example-proprietary directory is similar but includes proprietary material and is available only if you build ManifoldCF yourself.) Below is a list of what you will find in this directory.
ZooKeeper-based multi-process example files and directories
multiprocess-zk-example file/directory   Meaning
web                                      Web applications that should be deployed on tomcat or the equivalent, plus recommended application server -D switch names and values
processes                                classpath jars that should be included in the classpath for all non-connector-specific processes, along with -D switches, using the same convention as described for tomcat, above
properties.xml                           an example ManifoldCF configuration file, in the right place for the multiprocess script to find it
properties-global.xml                    an example ManifoldCF shared configuration file, in the right place for the setglobalproperties script to find it
logging.ini                              an example ManifoldCF logging configuration file, in the right place for the properties.xml to find it
zookeeper                                the example ZooKeeper storage directory, which must be writable in order for ZooKeeper to work
logs                                     where the ManifoldCF logs get written to
runzookeeper[.sh|.bat]                   script to run a ZooKeeper server instance
setglobalproperties[.sh|.bat]            script to initialize ZooKeeper with properties from properties-global.xml
start-database[.sh|.bat]                 script to start the HSQLDB database
initialize[.sh|.bat]                     script to create the database instance, create all database tables, and register connectors
start-webapps[.sh|.bat]                  script to start Jetty with the ManifoldCF web applications deployed
start-agents[.sh|.bat]                   script to start (the first) agents process
start-agents-2[.sh|.bat]                 script to start a second agents process
stop-agents[.sh|.bat]                    script to stop all running agents processes cleanly
Initializing the database and running
If you run the ZooKeeper-based multiprocess example, follow these steps:
1. Start ZooKeeper (using the runzookeeper[.sh|.bat] script)
2. Initialize the ManifoldCF shared configuration data (using setglobalproperties[.sh|.bat])
3. Start the database (using start-database[.sh|.bat])
4. Initialize the database (using initialize[.sh|.bat])
5. Start the agents process (using start-agents[.sh|.bat], and optionally start-agents-2[.sh|.bat])
6. Start the web applications (using start-webapps[.sh|.bat])
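The ordering of these steps matters, so it can help to keep it in one place. The following is a minimal sketch, not part of ManifoldCF, that only computes the script paths in the order listed above; the function name and the example directory argument are illustrative assumptions.

```python
import os

# Script base names in the order given by the numbered steps above.
STARTUP_ORDER = [
    "runzookeeper",
    "setglobalproperties",
    "start-database",
    "initialize",
    "start-agents",
    "start-webapps",
]

def startup_scripts(example_dir, windows=False, second_agent=False):
    """Return the script paths to run, in order (hypothetical helper)."""
    suffix = ".bat" if windows else ".sh"
    names = list(STARTUP_ORDER)
    if second_agent:
        # The optional second agents process follows the first one.
        names.insert(names.index("start-agents") + 1, "start-agents-2")
    return [os.path.join(example_dir, name + suffix) for name in names]
```

Each returned path could then be launched in turn with a process runner of your choice. Several of these scripts start long-running servers, so they would normally be started in the background rather than waited on.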
Running multiprocess ZooKeeper example using Tomcat
In order to run the ManifoldCF ZooKeeper example under Tomcat, you will need to take the following steps:
1. Start ZooKeeper (using the runzookeeper[.sh|.bat] script)
2. Initialize the ManifoldCF shared configuration data (using setglobalproperties[.sh|.bat])
3. Start the database (using start-database[.sh|.bat])
4. Initialize the database (using initialize[.sh|.bat])
5. Start the agents process (using start-agents[.sh|.bat], and optionally start-agents-2[.sh|.bat])
6. Modify the Tomcat startup script, or use the Tomcat service administration client, to set a Java "-Dorg.apache.manifoldcf.configfile"
switch to point to the example's properties.xml file.
7. Start Tomcat.
8. Deploy and start the mcf-crawler-ui, mcf-authority-service, and mcf-api-service web applications, preferably using the Tomcat
administration client.
Command-driven multiprocess model
The most generic way of deploying ManifoldCF involves calling ManifoldCF operations using scripts. There are a number of Java classes
among the ManifoldCF classes that are intended to be called directly, to perform specific actions in the environment or in the database.
These classes are usually invoked from the command line, with appropriate arguments supplied, and are thus considered to be
ManifoldCF commands. Basic functionality supplied by these command classes is as follows:
- Create/Destroy the ManifoldCF database instance
- Start/Stop the agents process
- Register/Unregister an agent class (there's currently only one included)
- Register/Unregister an output connector
- Register/Unregister a transformation connector
- Register/Unregister a repository connector
- Register/Unregister an authority connector
- Register/Unregister a mapping connector
- Clean up synchronization directory garbage resulting from an ungraceful interruption of a ManifoldCF process
- Query for certain kinds of job-related information
Individual connectors may contribute additional command classes and processes to this picture.
The multiprocess command execution scripts are delivered in the processes subdirectory. The script for executing commands is
processes/executecommand[.sh|.bat]. This script requires two environment variables to be set before execution: JAVA_HOME, and
MCF_HOME (which should point to ManifoldCF's home execution directory, where the properties.xml file is found).
The basic steps required to set up and run ManifoldCF in command-driven file-based multiprocess mode are as follows:
- Install PostgreSQL or MySQL. The PostgreSQL JDBC driver included with ManifoldCF is known to work with version 9.1, so that
version is the currently recommended one. If you want to use MySQL, the ant "download-dependencies" build target will fetch the
appropriate MySQL JDBC driver.
- Configure the database for your environment; the default configuration is acceptable for testing and experimentation.
- Create the database instance (see commands below)
- Initialize the database instance (see commands below)
- Register the pull agent (org.apache.manifoldcf.crawler.system.CrawlerAgent, see below)
- Register your connectors and authorities (see below)
- Install a Java application server, such as Tomcat.
- Deploy the war files from web/war, except for mcf-combined.war, to your application server (see below).
- Set the starting environment variables for your app server to include any -D commands found in web/define. The -D commands
should be of the form "-D<file name>=<file contents>". You will also need a "-Dorg.apache.manifoldcf.configfile=<properties file>"
define option, or the equivalent, in the application server's JVM startup in order for ManifoldCF to be able to locate its configuration
file.
- Use the processes/executecommand[.bat|.sh] command to execute the appropriate commands from the next section below, being
sure to first set the JAVA_HOME and MCF_HOME environment variables properly.
- Start any supporting processes that result from your build. (Some connectors, such as Documentum and FileNet, have auxiliary
processes you need to run to make these connectors functional.)
- Start your application server.
- Start the ManifoldCF agents process.
At this point, you should be able to interact with the ManifoldCF UI, which can be accessed via the mcf-crawler-ui web application.
The detailed list of commands is presented below.
Commands
After you have created the necessary configuration files, you will need to initialize the database, register the "pull agent" agent, and then
register your individual connectors. ManifoldCF provides a set of commands for performing these actions, and others as well. The classes
implementing these commands are specified below.
| Core Command Class | Arguments | Function |
|---|---|---|
| org.apache.manifoldcf.core.DBCreate | dbuser [dbpassword] | Create ManifoldCF database instance |
| org.apache.manifoldcf.core.DBDrop | dbuser [dbpassword] | Drop ManifoldCF database instance |
| org.apache.manifoldcf.core.LockClean | None | Clean out synchronization directory |
| org.apache.manifoldcf.core.Obfuscate | string | Obfuscate a string, for use as an obfuscated parameter value |

| Agents Command Class | Arguments | Function |
|---|---|---|
| org.apache.manifoldcf.agents.Install | None | Create ManifoldCF agents tables |
| org.apache.manifoldcf.agents.Uninstall | None | Remove ManifoldCF agents tables |
| org.apache.manifoldcf.agents.Register | classname | Register an agent class |
| org.apache.manifoldcf.agents.UnRegister | classname | Unregister an agent class |
| org.apache.manifoldcf.agents.UnRegisterAll | None | Unregister all current agent classes |
| org.apache.manifoldcf.agents.SynchronizeAll | None | Unregister all registered agent classes that can't be found |
| org.apache.manifoldcf.agents.RegisterOutput | classname description | Register an output connector class |
| org.apache.manifoldcf.agents.UnRegisterOutput | classname | Unregister an output connector class |
| org.apache.manifoldcf.agents.UnRegisterAllOutputs | None | Unregister all current output connector classes |
| org.apache.manifoldcf.agents.SynchronizeOutputs | None | Unregister all registered output connector classes that can't be found |
| org.apache.manifoldcf.agents.RegisterTransformation | classname description | Register a transformation connector class |
| org.apache.manifoldcf.agents.UnRegisterTransformation | classname | Unregister a transformation connector class |
| org.apache.manifoldcf.agents.UnRegisterAllTransformations | None | Unregister all current transformation connector classes |
| org.apache.manifoldcf.agents.SynchronizeTransformations | None | Unregister all registered transformation connector classes that can't be found |
| org.apache.manifoldcf.agents.AgentRun | None | Main agents process class |
| org.apache.manifoldcf.agents.AgentStop | None | Stops the running agents process |

| Crawler Command Class | Arguments | Function |
|---|---|---|
| org.apache.manifoldcf.crawler.Register | classname description | Register a repository connector class |
| org.apache.manifoldcf.crawler.UnRegister | classname | Unregister a repository connector class |
| org.apache.manifoldcf.crawler.UnRegisterAll | None | Unregister all repository connector classes |
| org.apache.manifoldcf.crawler.SynchronizeConnectors | None | Unregister all registered repository connector classes that can't be found |
| org.apache.manifoldcf.crawler.ExportConfiguration | filename [passcode] | Export crawler configuration to a file |
| org.apache.manifoldcf.crawler.ImportConfiguration | filename [passcode] | Import crawler configuration from a file |

NOTE: If you add a passcode as a second argument to the ExportConfiguration command class, the exported file will be encrypted
using the AES algorithm. This can be useful to prevent repository passwords from being stored in clear text. In order to use this functionality,
you must add a salt value to your configuration file. The same passcode, along with the salt value, is used to decrypt the file with the
ImportConfiguration command class. See the documentation for the commands and properties above to find the correct arguments and
settings.

| Authorization Domain Command Class | Arguments | Function |
|---|---|---|
| org.apache.manifoldcf.authorities.RegisterDomain | domainname description | Register an authorization domain |
| org.apache.manifoldcf.authorities.UnRegisterDomain | domainname | Unregister an authorization domain |

| User Mapping Command Class | Arguments | Function |
|---|---|---|
| org.apache.manifoldcf.authorities.RegisterMapper | classname description | Register a mapping connector class |
| org.apache.manifoldcf.authorities.UnRegisterMapper | classname | Unregister a mapping connector class |
| org.apache.manifoldcf.authorities.UnRegisterAllMappers | None | Unregister all mapping connector classes |
| org.apache.manifoldcf.authorities.SynchronizeMappers | None | Unregister all registered mapping connector classes that can't be found |

| Authority Command Class | Arguments | Function |
|---|---|---|
| org.apache.manifoldcf.authorities.RegisterAuthority | classname description | Register an authority connector class |
| org.apache.manifoldcf.authorities.UnRegisterAuthority | classname | Unregister an authority connector class |
| org.apache.manifoldcf.authorities.UnRegisterAllAuthorities | None | Unregister all authority connector classes |
| org.apache.manifoldcf.authorities.SynchronizeAuthorities | None | Unregister all registered authority connector classes that can't be found |

Remember that you need to include all the jars under multiprocess-file-example/processes/lib in the classpath whenever you run one of
these commands! But, luckily, there are scripts which do this for you. These can be found in multiprocess-file-
example/processes/executecommand[.sh|.bat]. The scripts require some environment variables to be set, such as MCF_HOME and
JAVA_HOME, and expect the configuration file to be found at MCF_HOME/properties.xml.
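As an illustration of how the command classes and the executecommand script fit together, here is a minimal sketch that assembles the first few setup invocations (create the database instance, create the agents tables, register the pull agent). It assumes the processes directory sits directly under MCF_HOME, as in the example layout; the helper names are hypothetical, and nothing is executed.

```python
import os

def build_command(mcf_home, java_home, command_class, *args, windows=False):
    """Return (argv, env) for one command invocation (hypothetical helper)."""
    script = "executecommand.bat" if windows else "executecommand.sh"
    # Assumption: the processes directory lives directly under MCF_HOME.
    argv = [os.path.join(mcf_home, "processes", script), command_class]
    argv.extend(args)
    # The script expects both variables to be set before it runs.
    env = dict(os.environ, MCF_HOME=mcf_home, JAVA_HOME=java_home)
    return argv, env

def initial_setup(mcf_home, java_home, db_user, db_password=None):
    """Build the first setup commands, in the order described above."""
    db_args = [db_user] + ([db_password] if db_password else [])
    steps = [
        ("org.apache.manifoldcf.core.DBCreate", db_args),
        ("org.apache.manifoldcf.agents.Install", []),
        ("org.apache.manifoldcf.agents.Register",
         ["org.apache.manifoldcf.crawler.system.CrawlerAgent"]),
    ]
    return [build_command(mcf_home, java_home, cls, *a) for cls, a in steps]
```

Each (argv, env) pair could be handed to subprocess.run(argv, env=env, check=True). Connector and authority registrations would follow the same pattern, using the Register commands from the tables above.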
Deploying the mcf-crawler-ui, mcf-authority-service, and mcf-api-service web applications
If you built ManifoldCF using ant, then the ant build will have constructed four war files for you under web/war. You should ignore the
mcf-combined war in this directory for this deployment model. If you intend to run ManifoldCF in multiprocess mode, you will need to
deploy the other web applications on your application server. There is no requirement that the mcf-crawler-ui, mcf-authority-service,
and mcf-api-service web applications be deployed on the same instance of the application server. With the current architecture of
ManifoldCF, they must be deployed on the same physical server, however.
For each of the application servers involved with ManifoldCF, you must set the following define, so that the ManifoldCF web applications
can locate the configuration file:
-Dorg.apache.manifoldcf.configfile=<configuration file path>
The agents process is the process that actually performs the crawling for ManifoldCF. Start this process by running the command
"org.apache.manifoldcf.agents.AgentRun". This class will run until stopped by invoking the command
"org.apache.manifoldcf.agents.AgentStop". It is highly recommended that you stop the process in this way. You may also stop the process
using a SIGTERM signal, but "kill -9" or the equivalent is NOT recommended, because that may result in dangling locks in the
ManifoldCF synchronization directory. (If you have to, clean up these locks by shutting down all ManifoldCF processes, including the
application server instances that are running the web applications, and invoking the command "org.apache.manifoldcf.core.LockClean".)
The connectors.xml configuration file
The quick-start, combined, and simplified multiprocess sample deployments of ManifoldCF have their own configuration file, called
connectors.xml, which is used to register the available connectors in the database. The file has this basic format:
<?xml version="1.0" encoding="UTF-8"?>
<connectors>
  (clauses)
</connectors>
The following tags are available to specify your connectors and authorization domains:
<repositoryconnector name="pretty_name" class="connector_class"/>
<authorityconnector name="pretty_name" class="connector_class"/>
<mappingconnector name="pretty_name" class="connector_class"/>
<outputconnector name="pretty_name" class="connector_class"/>
<transformationconnector name="pretty_name" class="connector_class"/>
<authorizationdomain name="pretty_name" domain="domain_name"/>
The connectors.xml file typically has some connectors commented out, namely the ones built with stubs, which require you to supply a
third-party library in order for the connector to run. If you build ManifoldCF yourself, the example-proprietary, multiprocess-file-
example-proprietary, and multiprocess-zk-example-proprietary directories instead use connectors-proprietary.xml. The connectors you
build against the proprietary libraries you supply will not have their connectors-proprietary.xml tags commented out.
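If you generate connectors.xml rather than edit it by hand, the structure above maps directly onto a small builder. The following is a sketch using Python's standard ElementTree module; the function name is hypothetical, and the class names in the usage note are placeholders, not real connector classes.

```python
import xml.etree.ElementTree as ET

# Connector tags taken from the list above.
CONNECTOR_TAGS = {
    "repositoryconnector", "authorityconnector", "mappingconnector",
    "outputconnector", "transformationconnector",
}

def build_connectors_xml(connectors, domains=()):
    """connectors: (tag, pretty_name, class_name) triples.
    domains: (pretty_name, domain_name) pairs."""
    root = ET.Element("connectors")
    for tag, pretty_name, class_name in connectors:
        if tag not in CONNECTOR_TAGS:
            raise ValueError("unknown connector tag: " + tag)
        ET.SubElement(root, tag, {"name": pretty_name, "class": class_name})
    for pretty_name, domain_name in domains:
        ET.SubElement(root, "authorizationdomain",
                      {"name": pretty_name, "domain": domain_name})
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

For example, build_connectors_xml([("outputconnector", "My output", "com.example.MyOutputConnector")], [("Corporate", "corp")]) yields a document with one output connector clause and one authorization domain clause.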
Running connector-specific processes
Connector-specific processes require the classpath for their invocation to include all the jars that are in the corresponding
processes/<process_name> directory. The Documentum and FileNet connectors are the only two connectors that currently require
additional processes. Start these processes using the commands listed below, and stop them with SIGTERM (or ^C, if they are running in
a shell).
Connector   Process                        Main class                                              Script name (relative to dist)
----------  -----------------------------  ------------------------------------------------------  -------------------------------------------
Documentum  processes/documentum-server    org.apache.manifoldcf.crawler.server.DCTM.DCTM          processes/documentum-server/run[.sh|.bat]
Documentum  processes/documentum-registry  org.apache.manifoldcf.crawler.registry.DCTM.DCTM        processes/documentum-registry/run[.sh|.bat]
FileNet     processes/filenet-server       org.apache.manifoldcf.crawler.server.filenet.Filenet    processes/filenet-server/run[.sh|.bat]
FileNet     processes/filenet-registry     org.apache.manifoldcf.crawler.registry.filenet.Filenet  processes/filenet-registry/run[.sh|.bat]
The registry process in all cases must be started before the corresponding server process, or the server process will report an error. (It will,
however, retry after some period of time.) The scripts all require an MCF_HOME environment variable pointing to the place where
properties.xml is found, as well as a JAVA_HOME environment variable pointing to the JDK. The server scripts also require other
environment variables, consistent with the needs of the DFC or the FileNet API respectively. For example, DFC requires the
DOCUMENTUM environment variable to be set, while the FileNet server script requires the WASP_HOME environment variable.
It is important to understand that the scripts work by building a classpath out of all jars that get copied into the lib and lib-proprietary
directories underneath each process during the ant build. The lib-proprietary jars cannot be distributed in the binary version of
ManifoldCF, so if you use this option you will still need to copy them there yourself for the processes to run. If you build ManifoldCF
yourself, these jars are copied from the lib-proprietary directories underneath the documentum or filenet connector directories. For the
server startup scripts to work properly, the lib-proprietary directories should have all of the jars needed to allow the API code to function.
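The classpath construction described above can be approximated in a few lines. This is a sketch of the idea only, not a reproduction of the shipped run scripts; the function name is hypothetical.

```python
import os
from pathlib import Path

def build_classpath(process_dir):
    """Join every jar under lib and lib-proprietary into one classpath string."""
    jars = []
    for sub in ("lib", "lib-proprietary"):
        directory = Path(process_dir) / sub
        if directory.is_dir():
            # Sort so the result is stable from run to run.
            jars.extend(sorted(str(p) for p in directory.glob("*.jar")))
    # os.pathsep is ':' on Unix-like systems and ';' on Windows.
    return os.pathsep.join(jars)
```

If the proprietary jars were never copied into lib-proprietary, the resulting classpath simply omits them, which is consistent with a server process failing at startup when the vendor API classes cannot be found.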
Database selection
You have a variety of open-source databases to choose from when deploying ManifoldCF. The supported databases each have their own
strengths and weaknesses, and are listed below:
- PostgreSQL (preferred)
- MySQL (preferred)
- MariaDB (not yet evaluated)
- HSQLDB
You can select the database of your choice by setting the appropriate properties in the applicable properties.xml file. The choice of
database is largely orthogonal to the choice of deployment model. The ManifoldCF deployment examples provided can thus be readily
altered to use the database you desire. The details and caveats of each choice are described below.
Configuring a PostgreSQL database
Despite having an internal architecture that cleanly abstracts from specific database details, ManifoldCF is currently fairly specific to
PostgreSQL. There are a number of reasons for this.
- ManifoldCF uses the database for its document queue, which places a significant load on it. The back-end database is thus a
significant factor in ManifoldCF's performance. But, in exchange, ManifoldCF benefits enormously from the underlying ACID
properties of the database.
- The strategy for getting optimal query plans from the database is not abstracted. For example, PostgreSQL 8.3+ is very sensitive to
certain statistics about a database table, and will not generate a performant plan if the statistics are inaccurate by even a little, in
some cases. So, for PostgreSQL, the database table must be analyzed very frequently, to avoid catastrophically bad plans. But
luckily, PostgreSQL is pretty good at doing analysis quickly. Oracle, on the other hand, takes a very long time to perform analysis,
but its plans are much less sensitive.
- PostgreSQL always does a sequential scan in order to count the number of rows in a table, while other databases return this
efficiently. This has affected the design of the ManifoldCF UI.
- The choice of query form influences the query plan. Ideally, this is not true, but for both PostgreSQL and for (say) Oracle, it is.
- PostgreSQL has a high degree of parallelism and lack of internal single-threadedness.
ManifoldCF has been tested against versions 8.3.7, 8.4.5, 9.1, 9.2, and 9.3 of PostgreSQL. We recommend the following configuration
parameter settings to work optimally with ManifoldCF:
- A default database encoding of UTF-8
- postgresql.conf settings as described in the table below
- pg_hba.conf settings to allow password access for TCP/IP connections from ManifoldCF
- A maintenance strategy involving cronjob-style vacuuming, rather than PostgreSQL autovacuum
postgresql.conf parameters

| postgresql.conf parameter | Tested value |
|---|---|
| standard_conforming_strings | on |
| shared_buffers | 1024MB |
| checkpoint_segments | 300 |
| maintenance_work_mem | 2MB |
| tcpip_socket | true |
| max_connections | 400 |
| checkpoint_timeout | 900 |
| datestyle | ISO,European |
| autovacuum | off |

Note well: The standard_conforming_strings parameter setting is important to prevent any possibility of SQL injection attacks. While
ManifoldCF uses parameterized queries in almost all cases, when it does do string quoting it presumes that the SQL standard for quoting
is adhered to. It is in general good practice to set this parameter when working with PostgreSQL for this reason.
A note about PostgreSQL database maintenance
PostgreSQL's architecture causes it to accumulate dead tuples in its data files, which do not interfere with its performance but do bloat the
database over time. The usage pattern of ManifoldCF is such that it can cause significant bloat to occur to the underlying PostgreSQL
database in only a few days, under sufficient load. PostgreSQL has a feature to address this bloat, called vacuuming. This comes in three
varieties: autovacuum, manual vacuum, and manual full vacuum.
We have found that PostgreSQL's autovacuum feature is inadequate under such conditions, because it not only fights for database
resources pretty much all the time, but it falls further and further behind as well. PostgreSQL's in-place manual vacuum functionality is a
bit better, but is still much, much slower than actually making a new copy of the database files, which is what happens when a manual full
vacuum is performed.
Dead-tuple bloat also occurs in indexes in PostgreSQL, so tables that have had a lot of activity may benefit from being reindexed at the
time of maintenance.
We therefore recommend periodic, scheduled maintenance operations instead, consisting of the following:
VACUUM FULL VERBOSE
REINDEX DATABASE <the_db_name>
During maintenance, PostgreSQL locks tables one at a time. Nevertheless, the crawler UI may become unresponsive for some operations,
such as when counting outstanding documents on the job status page. ManifoldCF thus has the ability to check for the existence of a file
prior to such sensitive operations, and will display a useful "maintenance in progress" message if that file is found. This allows a user to
set up a maintenance system that provides adequate feedback for a ManifoldCF user of the overall status of the system.
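A scheduled maintenance job built on these ideas might look like the sketch below. It only prepares the two SQL statements and manages the marker file; the helper names are hypothetical, the flag file path is an assumption you would align with your own ManifoldCF configuration, and running the statements (for example through psql) is left to the caller.

```python
import os
from contextlib import contextmanager

def maintenance_statements(db_name):
    """Return the two maintenance statements recommended above."""
    # Quote the identifier so unusual database names stay valid SQL.
    quoted = '"' + db_name.replace('"', '""') + '"'
    return ["VACUUM FULL VERBOSE", "REINDEX DATABASE " + quoted]

@contextmanager
def maintenance_flag(flag_path):
    """Create the marker file for the duration of the maintenance window."""
    open(flag_path, "w").close()
    try:
        yield flag_path
    finally:
        # Always remove the marker, even if maintenance fails part-way.
        os.remove(flag_path)
```

The statements would be run inside a "with maintenance_flag(path):" block, so that the "maintenance in progress" message is shown only while the database is actually being worked on.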
Configuring a MySQL database
MySQL is not quite as fast as PostgreSQL, but it is a relatively close second in performance tests. Nevertheless, the ManifoldCF team does
not have a large amount of experience with this database at this time. More details will be added to this section as information and
experience become available.
Configuring an HSQLDB database
HSQLDB's performance seems closely tied to how much of the database can actually be held in memory. Performance at this time is about
half that of PostgreSQL.
HSQLDB can be used with ManifoldCF in either an embedded fashion (which only works with single-process deployments), or in external
fashion, with a database instance running in a separate process. See the properties.xml property descriptions for configuration details.
The ManifoldCF configuration files
Currently, ManifoldCF requires two configuration files: the main configuration property file, and the logging configuration file.
properties.xml file properties
The properties.xml property file path can be specified by the system property "org.apache.manifoldcf.configfile". If not specified through a
-D operation, its name is presumed to be <user_home>/lcf/properties.xml. The form of the property file is XML, of the following basic
form:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  (clauses)
</configuration>
The properties.xml file allows properties to be specified. A property clause has the form:
<property name="property_name" value="property_value"/>
One of the optional properties is the name of the logging configuration file. This property's name is "org.apache.manifoldcf.logconfigfile". If
not present, the logging configuration file will be assumed to be <user_home>/manifoldcf/logging.ini. The logging configuration file is a
standard commons-logging property file, and should be formatted accordingly.
Note that all properties described below can also be specified on the command line, via a -D switch. If both methods of setting the
property are used, the -D switch value will override the property file value.
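The precedence rule is easy to model. The sketch below is not ManifoldCF's own loader; it reads the property clauses from a properties.xml document and then lets a dictionary of -D style overrides win, which is the behavior described above. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def load_properties(xml_text, d_switches=None):
    """Return the effective property map for a properties.xml document."""
    root = ET.fromstring(xml_text)
    # Collect every <property name="..." value="..."/> clause.
    props = {p.get("name"): p.get("value") for p in root.findall("property")}
    # Values supplied with -D take precedence over the file.
    props.update(d_switches or {})
    return props
```

For instance, if the file sets org.apache.manifoldcf.crawler.threads to 30 and the process is started with a -D switch setting it to 50, the effective value under this model is 50.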
The following table describes the configuration property file properties, and what they do:

properties.xml properties

| Property | Required? | Function |
|---|---|---|
| org.apache.manifoldcf.login.name | No | Crawler UI login user ID (defaults to "admin") |
| org.apache.manifoldcf.login.password | No | Crawler UI login user password (defaults to "admin") |
| org.apache.manifoldcf.login.password.obfuscated | No | Obfuscated crawler UI login user password (defaults to "admin") |
| org.apache.manifoldcf.login.apiname | No | API login user ID (defaults to "") |
| org.apache.manifoldcf.login.apipassword | No | API login user password (defaults to "") |
| org.apache.manifoldcf.login.apipassword.obfuscated | No | Obfuscated API login user password (defaults to "") |
| org.apache.manifoldcf.crawleruiwarpath | Yes, for Jetty | Location of Crawler UI war |
| org.apache.manifoldcf.authorityservicewarpath | Yes, for Jetty | Location of Authority Service war |
| org.apache.manifoldcf.apiservicewarpath | Yes, for Jetty | Location of API Service war |
| org.apache.manifoldcf.usejettyparentclassloader | Yes, for Jetty | true for single-process example, false for multiprocess example. |
| org.apache.manifoldcf.connectorsconfigurationfile | No | Location of connectors.xml file, for QuickStart, so ManifoldCF can register connectors. |
| org.apache.manifoldcf.dbsuperusername | No | Database superuser name, for QuickStart, so ManifoldCF can create database instance. |
| org.apache.manifoldcf.dbsuperuserpassword | No | Database superuser password, for QuickStart, so ManifoldCF can create database instance. |
| org.apache.manifoldcf.dbsuperuserpassword.obfuscated | No | Obfuscated database superuser password, for QuickStart, so ManifoldCF can create database instance. |
| org.apache.manifoldcf.ui.maxstatuscount | No | The maximum number of documents ManifoldCF will try to count for the job status display. Defaults to 500000. |
| org.apache.manifoldcf.databaseimplementationclass | No | Specifies the class to use to implement database access. Default is a built-in Hsqldb implementation. Supported choices are: org.apache.manifoldcf.core.database.DBInterfacePostgreSQL, org.apache.manifoldcf.core.database.DBInterfaceMySQL, org.apache.manifoldcf.core.database.DBInterfaceMariaDB, org.apache.manifoldcf.core.database.DBInterfaceHSQLDB |
| org.apache.manifoldcf.postgresql.hostname | No | PostgreSQL server host name, or localhost if not specified. |
| org.apache.manifoldcf.postgresql.port | No | PostgreSQL server port, or standard port if not specified. |
| org.apache.manifoldcf.postgresql.ssl | No | Set to "true" for SSL communication with PostgreSQL. |
| org.apache.manifoldcf.mysql.server | No | The MySQL or MariaDB server name. Defaults to 'localhost'. |
| org.apache.manifoldcf.mysql.client | No | The MySQL or MariaDB client property. Defaults to 'localhost'. You may want to set this to '%' for a multi-machine setup. |
| org.apache.manifoldcf.hsqldbdatabasepath | No | Absolute or relative path to HSQLDB database; default is '.'. |
| org.apache.manifoldcf.hsqldbdatabaseprotocol | Yes, for remote HSQLDB connection | The HSQLDB JDBC protocol; choices are 'hsql', 'http', or 'https'. Default is blank (which means an embedded instance) |
| org.apache.manifoldcf.hsqldbdatabaseserver | Yes, for remote HSQLDB connection | The HSQLDB remote server name. |
| org.apache.manifoldcf.hsqldbdatabaseport | No | The HSQLDB remote server port. |
| org.apache.manifoldcf.hsqldbdatabaseinstance | No | The HSQLDB remote database instance name. |
| org.apache.manifoldcf.lockmanagerclass | No | Specifies the class to use to implement synchronization. Default is either file-based synchronization or in-memory synchronization, using the org.apache.manifoldcf.core.lockmanager.LockManager class. Options include org.apache.manifoldcf.core.lockmanager.BaseLockManager, org.apache.manifoldcf.core.FileLockManager, and org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager. |
| org.apache.manifoldcf.synchdirectory | Yes, if file-based synchronization class is specified | Specifies the path of a synchronization directory. All ManifoldCF process owners must have read/write privileges to this directory. |
| org.apache.manifoldcf.zookeeper.connectstring | Yes, if ZooKeeper-based synchronization class is specified | Specifies the ZooKeeper connection string, consisting of comma-separated hostname:port pairs. |
| org.apache.manifoldcf.zookeeper.sessiontimeout | No | Specifies the ZooKeeper session timeout, if ZooKeeperLockManager is specified. Defaults to 2000. |
| org.apache.manifoldcf.database.maxhandles | No | Specifies the maximum number of database connection handles that will be pooled. Recommended value is 200. |
| org.apache.manifoldcf.database.handletimeout | No | Specifies the maximum time a handle is to live before it is presumed dead. Recommend a value of 604800, which is the maximum allowable. |
| org.apache.manifoldcf.database.connectiontracking | No | True or false. When "true", will track all allocated database connection handles, and will dump an allocation stack trace when the pool is exhausted. Useful for diagnosing connection leaks. |
| org.apache.manifoldcf.logconfigfile | No | Specifies location of logging configuration file. |
| org.apache.manifoldcf.database.name | No | Describes database name for ManifoldCF; defaults to "dbname" if not specified. |
| org.apache.manifoldcf.database.username | No | Describes database user name for ManifoldCF; defaults to "manifoldcf" if not specified. |
| org.apache.manifoldcf.database.password | No | Describes database user's password for ManifoldCF; defaults to "local_pg_password" if not specified. |
| org.apache.manifoldcf.database.password.obfuscated | No | Obfuscated database user's password for ManifoldCF; defaults to "local_pg_password" if not specified. |
| org.apache.manifoldcf.crawler.threads | No | Number of crawler worker threads created. Suggest a value of 30. |
| org.apache.manifoldcf.crawler.expirethreads | No | Number of crawler expiration threads created. Suggest a value of 10. |
| org.apache.manifoldcf.crawler.cleanupthreads | No | Number of crawler cleanup threads created. Suggest a value of 10. |
| org.apache.manifoldcf.crawler.deletethreads | No | Number of crawler delete threads created. Suggest a value of 10. |
| org.apache.manifoldcf.crawler.historycleanupinterval | No | Milliseconds to retain history records. Default is 0. Zero means "forever". |
| org.apache.manifoldcf.misc | No | Miscellaneous debugging output. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.db | No | Database debugging output. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.lock | No | Lock management debugging output. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.cache | No | Cache management debugging output. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.agents | No | Agent management debugging output. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.perf | No | Performance logging debugging output. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.crawlerthreads | No | Log crawler thread activity. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.hopcount | No | Log hopcount tracking activity. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.jobs | No | Log job activity. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.connectors | No | Log connector activity. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.scheduling | No | Log document scheduling activity. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.authorityconnectors | No | Log authority connector activity. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.authorityservice | No | Log authority service activity. Legal values are INFO, WARN, or DEBUG. |
| org.apache.manifoldcf.salt | Yes, if file encryption is used | Specify the salt value to be used for encrypting the file to which the crawler configuration is exported. |

The following table describes 'advanced' configuration property file properties. They shouldn't need to be changed but provide a greater
level of customization:
Advanced properties.xml properties

| Property | Required? | Default | Function |
|---|---|---|---|
| org.apache.manifoldcf.crawler.repository.store_history | No | true | If you do not require reports from within ManifoldCF, setting this to false will disable logging to the repository history (the reports will still run, but they will not contain any content). This can increase throughput and reduce the rate of growth of the database. |
| org.apache.manifoldcf.db.postgres.analyze.<tablename> | No | 2000 | For PostgreSQL, specify how many changes should be carried out before carrying out an 'ANALYZE' on the specified table. |
| org.apache.manifoldcf.db.postgres.reindex.<tablename> | No | 250000 | For PostgreSQL, specify how many changes should be carried out before carrying out a 'REINDEX' on the specified table. |
| org.apache.manifoldcf.db.mysql.analyze.<tablename> | No | 2000 | For MySQL or MariaDB, specify how many changes should be carried out before carrying out an 'ANALYZE' on the specified table. |
| org.apache.manifoldcf.ui.maxstatuscount | No | 500000 | Set the upper limit for the precise document count to be returned on the 'Status and Job Management' page. |

The configuration file can also specify a set of directories which will be searched for connector jars. The directive that adds to the class
path is:
<libdir path="path"/>
Note that the path can be relative. For the purposes of path resolution, "." means the directory in which the properties.xml file is itself
located.
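To make the resolution rule concrete, the sketch below resolves each libdir path against the directory that holds properties.xml. It illustrates the rule as stated here and is not ManifoldCF's own code; the function name and the sample paths in the usage note are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

def resolve_libdirs(xml_text, properties_dir):
    """Resolve every <libdir path="..."/> relative to properties_dir."""
    root = ET.fromstring(xml_text)
    resolved = []
    for element in root.findall("libdir"):
        # os.path.join leaves an absolute path unchanged, so both
        # relative and absolute libdir values are handled.
        candidate = os.path.join(properties_dir, element.get("path"))
        resolved.append(os.path.normpath(candidate))
    return resolved
```

With properties.xml in /opt/mcf and a clause of <libdir path="connector-lib"/>, this yields /opt/mcf/connector-lib, while <libdir path="."/> yields /opt/mcf itself.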
Logging configuration file properties
The logging.ini file contains Apache commons-logging properties in a standard Java <name>=<value> format. The way the ManifoldCF
logging output is formatted is controlled through this file, as are any loggers that ManifoldCF doesn't explicitly define (e.g. loggers for
Apache commons-httpclient). Other resources are therefore best suited to describe the parameters that can be used and to what effect.
Running the ManifoldCF Apache2 plugin
The ManifoldCF Apache2 plugin, mod-authz-annotate, is designed to convert an authenticated principal (e.g. from mod-auth-kerb), and
query a set of authority services for access tokens using an HTTP request. These access tokens are then passed to a (not included) search
engine UI, which can use them to help compose a search that properly excludes content that the user is not supposed to see.
The list of authority services so queried is configured in Apache's httpd.conf file. This project includes only one such service: the Java
authority service, which uses authority connections defined in the crawler UI to obtain appropriate access tokens.
In order for mod-authz-annotate to be used, it must be placed into Apache2's extensions directory, and configured appropriately in the
httpd.conf file.
Note: The ManifoldCF project now contains support for converting a Kerberos principal to a list of Active Directory SIDs. This
functionality is contained in the Active Directory Authority. The following connectors are expected to make use of this authority:
- FileNet
- CIFS
- SharePoint
Configuring the ManifoldCF Apache2 plugin
mod-authz-annotate understands the following httpd.conf commands:

| Command | Meaning | Values |
|---|---|---|
| AuthzAnnotateEnable | Turn on/off the plugin | "On", "Off" |
| AuthzAnnotateAuthority | Point to an authority service that supports ACL queries, but not ID queries | The authority URL |
| AuthzAnnotateACLAuthority | Point to an authority service that supports ACL queries, but not ID queries | The authority URL |
| AuthzAnnotateIDAuthority | Point to an authority service that supports ID queries, but not ACL queries | The authority URL |
| AuthzAnnotateIDACLAuthority | Point to an authority service that supports both ACL queries and ID queries | The authority URL |

Authority Service URL parameters

| Authority Service URL parameter | Meaning |
|---|---|
| username | the username, if there is only one authorization domain |
| domain | the optional authorization domain, if there is only one authorization domain (defaults to empty string) |
| username_XX | username number XX, where XX is an integer starting at zero |
| domain_XX | authorization domain XX, where XX is an integer starting at zero |

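As a sketch of how a client might assemble such a request, the function below builds the query string from a list of (username, domain) pairs. The base URL is a caller-supplied placeholder, since the service path is not given in this section, and writing XX as a plain decimal index is an assumption drawn from the table's wording.

```python
from urllib.parse import urlencode

def authority_query(base_url, users):
    """users: list of (username, domain) pairs; domain may be ''."""
    if len(users) == 1:
        # Single authorization domain: plain parameter names.
        username, domain = users[0]
        params = [("username", username)]
        if domain:
            params.append(("domain", domain))
    else:
        # Several domains: numbered parameters starting at zero.
        params = []
        for index, (username, domain) in enumerate(users):
            params.append(("username_%d" % index, username))
            params.append(("domain_%d" % index, domain))
    return base_url + "?" + urlencode(params)
```

urlencode takes care of escaping characters such as "@" or spaces that may appear in user names.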
Access tokens and authority statuses are returned in the HTTP response, separated by newline characters. Each line has a prefix as
follows:

Authority Service response prefixes

| Authority Service response prefix | Meaning |
|---|---|
| TOKEN: | An access token |
| AUTHORIZED: | The name of an authority that found the user to be authorized |
| UNREACHABLEAUTHORITY: | The name of an authority that was found to be unreachable or unusable |
| UNAUTHORIZED: | The name of an authority that found the user to be unauthorized |
| USERNOTFOUND: | The name of an authority that could not find the user |

It is important to remember that only the "TOKEN:" lines actually matter for security. Even if any of the error conditions apply, the set of
tokens returned by the Authority Service will be correctly supplied in order to apply appropriate security to documents being searched.
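A client that talks to the Authority Service directly therefore mostly needs to split the response by prefix. The following is a minimal parsing sketch; the function name is hypothetical, and the token values in the usage note are made up.

```python
PREFIXES = (
    "TOKEN:",
    "AUTHORIZED:",
    "UNREACHABLEAUTHORITY:",
    "UNAUTHORIZED:",
    "USERNOTFOUND:",
)

def parse_authority_response(body):
    """Group response lines by prefix; keys are the prefixes without ':'."""
    result = {prefix[:-1]: [] for prefix in PREFIXES}
    for line in body.splitlines():
        for prefix in PREFIXES:
            if line.startswith(prefix):
                result[prefix[:-1]].append(line[len(prefix):])
                break
    return result
```

Given the body "AUTHORIZED:ad\nTOKEN:t1\nTOKEN:t2", the "TOKEN" entry holds ["t1", "t2"]. Only that list feeds into the security filter; the other entries are useful for diagnostics.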
If you choose to deploy a search engine plugin supplied by the Apache ManifoldCF project (for example, the Solr plugin), you will not need
to know any of the above, since part of the plugin's purpose is to communicate with the Authority Service and apply the access tokens that
are returned to the search query automatically. Some plugins, such as the ElasticSearch plugin, are more or less like toolkits, but still hide
most of the above from the integrator. In a more highly customized system, however, you may need to develop your own code which
interacts with the Authority Service in order to meet your goals.
Last Published: 05/05/2015 08:23:01
Copyright 2009-2015 The Apache Software Foundation.
Apache ManifoldCF, ManifoldCF, Apache Forrest, Forrest, Apache Solr, Solr, Apache, the Apache feather logo, the Apache Forrest logo, and the Apache ManifoldCF logo are
trademarks of The Apache Software Foundation. Documentum and EMC are trademarks of EMC Corporation. SharePoint, Windows, and Microsoft are trademarks of
Microsoft, Inc. FileNet P8 and IBM are trademarks of IBM, Inc. LiveLink and OpenText are trademarks of OpenText, Inc. QBase, MetaCarta, and GTS are trademarks of QBase,
Inc. Meridio and Autonomy are trademarks of Hewlett-Packard, Inc. Alfresco is a trademark of Alfresco Software, Inc. Jira is a trademark of Atlassian, Inc.