Data Quality Isn't Just a Data Management Problem | Information Management Magazine
Data Quality Isn't Just a Data Management Problem
by Walter Howard
OCT 1, 2007 1:00am ET

A few months ago, I was reviewing the latest data profiling results (thank you data profiling vendors!) for three new data sources I needed to integrate into my enterprise customer data integration (CDI) hub. When I reached the ubiquitous state code field, I instinctively cringed when I glanced at the column metadata reports. In two of the files, state code was defined as a two-byte field, just what I would expect. But a review of the frequency distribution report showed the first file had 64 distinct values, while the second file had 67 distinct values. The last file was in worse shape. The column length was defined at a whopping 18 bytes with more than 260 distinct values. Now, Americans have been portrayed in the media as less than stellar when it comes to geography, but 260 state codes is, as my wife likes to say, just ridiculous. A quick check at Wikipedia (after all, I fit the media's representation of geographically challenged) confirmed my suspicions. At most, there should be 54 values, 50 states and four commonwealths. This number assumes you're overloading the meaning of "state" in your data model, which, arguably, is not good database design. A better database design would model states and commonwealths as separate columns, but I digress.

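The kind of frequency distribution check described above can be sketched in a few lines of Python. This is only an illustration, not the profiling tool used in the article: the function name `profile_state_codes` and the sample column are hypothetical, and the valid set is limited to the 50 USPS state abbreviations (extend it if your code set also covers territories or the District of Columbia).

```python
from collections import Counter

# The 50 USPS two-letter state abbreviations. Territories and the
# District of Columbia are deliberately left out of this sketch.
VALID_STATE_CODES = frozenset("""
AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
SD TN TX UT VT VA WA WV WI WY
""".split())


def profile_state_codes(values):
    """Return (frequency distribution, values outside the valid code set)."""
    freq = Counter(values)
    invalid = {v: n for v, n in freq.items() if v not in VALID_STATE_CODES}
    return freq, invalid


# Hypothetical sample column, not data from the article.
sample = ["TX", "Tx", "Texas", "TX", "MI", "mn", "", "MO", "XX"]
freq, invalid = profile_state_codes(sample)

print(len(freq), "distinct values,", len(invalid), "not in the code set")
for value, count in sorted(invalid.items()):
    print(repr(value), count)
```

A distinct-value count well above the size of the code set is the same warning sign as the 64, 67 and 260 values reported for the three files.
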
While I am happy to continue developing data integration logic to fix incorrect or inconsistent data to a conformed code set, I find myself asking why is this a data management problem? Where did the data or process rules break down in the original capture and verification of the address data to cause such an overwhelming inconsistency in the operational data? My experience has shown that making changes to data values in the warehouse to get data consistency is a recipe for disaster. In the case of numeric changes, you can lose your ability to reconcile metrics to the source system. For character data, the business has to learn new code values and meanings. And to exacerbate things, the problem is never fixed. So whose problem is data quality anyway?

My first thought goes to the data modeler who designed the online transactional processing (OLTP) database. Clearly, a domain table of state and commonwealth codes should have been defined to enforce a common code set. Creating an 18-byte state code column is clearly an egregious oversight. Today's savvy users are quick to identify holes in the edit checks performed by applications and database logic. One just has to look to the quality and content of the veritable Social Security number or address line 3 to see the result of the lack of edits on screen fields. A text field on a screen with no edit checks is open season for abuse. If the field is long enough, you will start to see XML-ish type text to distinguish the multiple concatenated attributes the business added while waiting for the IT department to release the next version of the application.

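As a rough illustration of the domain table idea, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`state_code`, `customer_address`) and the helper `try_insert` are assumptions made for the example, not the schema of any system discussed here, and only two codes are loaded to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Domain table: the only place valid codes are defined.
conn.execute("""
    CREATE TABLE state_code (
        code TEXT PRIMARY KEY CHECK (length(code) = 2),
        name TEXT NOT NULL
    )
""")
# Hypothetical address table that may only reference codes in the domain table.
conn.execute("""
    CREATE TABLE customer_address (
        id INTEGER PRIMARY KEY,
        city TEXT NOT NULL,
        state_code TEXT NOT NULL REFERENCES state_code(code)
    )
""")
conn.executemany(
    "INSERT INTO state_code (code, name) VALUES (?, ?)",
    [("TX", "Texas"), ("MI", "Michigan")],
)


def try_insert(city, state):
    """Insert an address; return False if the database rejects the state code."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer_address (city, state_code) VALUES (?, ?)",
                (city, state),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("Austin", "TX"))     # accepted: code exists in the domain table
print(try_insert("Austin", "Texas"))  # rejected by the foreign key constraint
```

Because the rule lives in the database, it applies to every writer, including loads and ad hoc updates that never pass through an application screen.
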
My second thought goes to the application development team. There is often a big push by the software development teams to remove the enforcement of any business rules by the database management system. While this potentially absolves the data modeler of guilt, the fact remains that bad data made its way into the database. The application edit checks failed to recognize the 50 valid state codes or provide any text standardization conversions. For example, I can find multiple values for Texas, including TX, Tx and Texas. And as we all know by now, as long as data can be created, updated and deleted outside of the application logic, business rules built into the application will be bypassed.

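An application-side edit check with a text standardization conversion might look something like the following. It is a minimal sketch under stated assumptions: `standardize_state` is a hypothetical helper, and the name-to-code lookup is intentionally partial (a real one would cover every state name the business expects to see).

```python
# The 50 USPS two-letter state abbreviations.
VALID_STATE_CODES = frozenset("""
AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
SD TN TX UT VT VA WA WV WI WY
""".split())

# Partial, illustrative lookup from full state names to codes.
STATE_NAME_TO_CODE = {
    "TEXAS": "TX",
    "MICHIGAN": "MI",
    "MINNESOTA": "MN",
    "MISSOURI": "MO",
}


def standardize_state(raw):
    """Return the two-letter code for a raw entry, or raise ValueError."""
    cleaned = raw.strip().upper()
    if cleaned in VALID_STATE_CODES:
        return cleaned
    if cleaned in STATE_NAME_TO_CODE:
        return STATE_NAME_TO_CODE[cleaned]
    # Reject at the point of entry instead of storing an unknown value.
    raise ValueError(f"unrecognized state value: {raw!r}")


for entry in ["TX", "Tx", "Texas"]:
    print(entry, "->", standardize_state(entry))
```

A check like this conforms the three Texas variants to one code, but it only protects data that enters through the application, which is the limitation the paragraph above points out.
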
My third thought goes to the quality assurance (QA) team. Depending on the size of the company, QA teams can be hit or miss. By that I mean larger companies tend to have dedicated QA teams to ensure the end product meets the original requirements. That's not to say data quality is better at large companies. One just needs to look to the price of data quality software to see whom the software vendors are targeting. Most likely, lost in hundreds of functional and nonfunctional requirements for an application, the fact that state code should be only two bytes in length and should conform to the USPS standard was overlooked. Without a specific requirement to test, a bad state code would pass QA with flying colors. More than likely, someone assumed everyone knew the 50 state codes and that writing validation code was a waste of time. After all, everyone knows the state abbreviations for Michigan, Minnesota and Missouri. (Don't feel bad if you have to check; I did.)

My final thought goes to the business users. At a recent client engagement, one of the senior data architects said data quality at their company is an afterthought. The business had decided early on that data integrity issues such as bad address data and invalid personal identification attributes should not be constraining business rules. These data items, while critical pieces of information, were not important enough to validate and enforce validity or conformity at the point of origination. Bad information was captured and passed on to the next application that assumed the first application had done its job. Voilà: bad data is now persistently stored in multiple data stores.

Which brings me back to the beginning. Here I sit enforcing the business rules that the OLTP modeler omitted, the application team didn't implement, the QA team overlooked and the business decided wasn't important enough. Data quality isn't just a data management problem, it's a company problem.

http://www.informationmanagement.com/issues/20071001/10936061.html