29/7/2015

Data Quality Isn't Just a Data Management Problem | Information Management Magazine

Data Quality Isn't Just a Data Management Problem
by Walter Howard
OCT 1, 2007 1:00am ET
A few months ago, I was reviewing the latest data profiling results (thank you, data profiling vendors!)
for three new data sources I needed to integrate into my enterprise customer data integration (CDI) hub.
When I reached the ubiquitous state code field, I instinctively cringed when I glanced at the column
metadata reports. In two of the files, state code was defined as a two-byte field, just what I would expect.
But a review of the frequency distribution report showed the first file had 64 distinct values, while the
second file had 67 distinct values. The last file was in worse shape. The column length was defined at a
whopping 18 bytes with more than 260 distinct values. Now, Americans have been portrayed in the
media as less than stellar when it comes to geography, but 260 state codes is, as my wife likes to say, just
ridiculous. A quick check at Wikipedia (after all, I fit the media's representation of geographically
challenged) confirmed my suspicions. At most, there should be 54 values, 50 states and four
commonwealths. This number assumes you're overloading the meaning of "state" in your data model,
which, arguably, is not good database design. A better database design would model states and
commonwealths as separate columns, but I digress.
While I am happy to continue developing data integration logic to fix incorrect or inconsistent data to a
conformed code set, I find myself asking why is this a data management problem? Where did the data or
process rules break down in the original capture and verification of the address data to cause such an
overwhelming inconsistency in the operational data? My experience has shown that making changes to
data values in the warehouse to get data consistency is a recipe for disaster. In the case of numeric
changes, you can lose your ability to reconcile metrics to the source system. For character data, the
business has to learn new code values and meanings. And to exacerbate things, the problem is never
fixed. So whose problem is data quality anyway?

My first thought goes to the data modeler who designed the online transactional processing (OLTP)
database. Clearly, a domain table of state and commonwealth codes should have been defined to enforce
a common code set. Creating an 18-byte state code column is clearly an egregious oversight. Today's
savvy users are quick to identify holes in the edit checks performed by applications and database logic.
One just has to look to the quality and content of the veritable Social Security number or address line 3 to
see the result of the lack of edits on screen fields. A text field on a screen with no edit checks is open
season for abuse. If the field is long enough, you will start to see XML-ish type text to distinguish the
multiple concatenated attributes the business added while waiting for the IT department to release the
next version of the application.
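
A domain table of this kind can be sketched with Python's built-in `sqlite3` module. The table and column names (`state_code`, `customer_address`) and the helper `try_insert` are assumptions made for illustration, and only two codes are loaded to keep the example short; a real OLTP schema would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Domain table: the only place valid codes are defined.
    CREATE TABLE state_code (
        code TEXT PRIMARY KEY CHECK (length(code) = 2),
        name TEXT NOT NULL
    );
    -- Hypothetical operational table that must use the domain codes.
    CREATE TABLE customer_address (
        id         INTEGER PRIMARY KEY,
        city       TEXT NOT NULL,
        state_code TEXT NOT NULL REFERENCES state_code(code)
    );
""")
conn.executemany(
    "INSERT INTO state_code (code, name) VALUES (?, ?)",
    [("TX", "Texas"), ("MI", "Michigan")],  # illustrative subset only
)


def try_insert(city, state):
    """Insert an address; return False if the database rejects the state code."""
    try:
        conn.execute(
            "INSERT INTO customer_address (city, state_code) VALUES (?, ?)",
            (city, state),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("Austin", "TX")      # code exists in the domain table
rejected = try_insert("Austin", "Texas")   # free text is refused by the FK
print(accepted, rejected)
```

Because the constraint lives in the database, it applies no matter which application, script or ad hoc update writes the row.
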
My second thought goes to the application development team. There is often a big push by the software
development teams to remove the enforcement of any business rules by the database management
system. While this potentially absolves the data modeler of guilt, the fact remains that bad data made its
way into the database. The application edit checks failed to recognize the 50 valid state codes or provide
any text standardization conversions. For example, I can find multiple values for Texas, including TX,
Tx and Texas. And as we all know by now, as long as data can be created, updated and deleted outside of
the application logic, business rules built into the application will be bypassed.
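
An application-side edit check with a text standardization conversion might look like the following sketch. The function name `standardize_state` and the four-entry lookup are hypothetical and deliberately incomplete; the point is only that TX, Tx and Texas should all arrive in the database as one value, and anything unrecognized should be rejected at the point of entry.

```python
# Illustrative subset of full names mapped to two-letter codes.
STATE_NAMES = {
    "TEXAS": "TX",
    "MICHIGAN": "MI",
    "MINNESOTA": "MN",
    "MISSOURI": "MO",
}
VALID_CODES = set(STATE_NAMES.values())


def standardize_state(raw):
    """Return the conformed two-letter code, or None if the input is unrecognized."""
    cleaned = raw.strip().upper()
    if cleaned in VALID_CODES:
        return cleaned
    # Fall back to a full-name lookup; unknown text yields None.
    return STATE_NAMES.get(cleaned)


for value in ["TX", "Tx", "Texas", "Tejas"]:
    print(repr(value), "->", standardize_state(value))
```

A check like this helps only for data that passes through the application, which is why it complements a database constraint rather than replacing it.
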
My third thought goes to the quality assurance (QA) team. Depending on the size of the company, QA
teams can be hit or miss. By that I mean larger companies tend to have dedicated QA teams to ensure the
end product meets the original requirements. That's not to say data quality is better at large companies.
One just needs to look to the price of data quality software to see whom the software vendors are
targeting. Most likely, lost in hundreds of functional and nonfunctional requirements for an application,
the fact that state code should be only two bytes in length and should conform to the USPS standard was
overlooked. Without a specific requirement to test, a bad state code would pass QA with flying colors.
More than likely, someone assumed everyone knew the 50 state codes and that writing validation code
was a waste of time. After all, everyone knows the state abbreviations for Michigan, Minnesota and
Missouri. (Don't feel bad if you have to check; I did.)

My final thought goes to the business users. At a recent client engagement, one of the senior data
architects said data quality at their company is an afterthought. The business had decided early on that
data integrity issues such as bad address data and invalid personal identification attributes should not be
constraining business rules. These data items, while critical pieces of information, were not important
enough to validate and enforce validity or conformity at the point of origination. Bad information was
captured and passed on to the next application that assumed the first application had done its job. Voilà,
bad data is now persistently stored in multiple data stores.

Which brings me back to the beginning. Here I sit enforcing the business rules that the OLTP modeler
omitted, the application team didn't implement, the QA team overlooked and the business decided wasn't
important enough. Data quality isn't just a data management problem, it's a company problem.

http://www.informationmanagement.com/issues/20071001/10936061.html
