In this Document
  Purpose
  Details
    Section A - Displaying Fault Event Information
      Section A.1 - Using the Fault Management Shell
      Section A.2 - Using the Standard ILOM Command Line Interface
    Section B - Submitting a Service Request
      Auto Service Request (ASR) Activated for the Product
      Submitting a Service Request Via the Support Center
    Section C - Post-Repair Procedures
      Section C.1 - Using Fault Management Shell to Clear the Fault
      Section C.2 - Using the ILOM Command Line Interface to Clear the Fault
  References
APPLIES TO:
Sun Microsystems > Servers
Information in this document applies to any platform.
PURPOSE
This article provides standard procedures for viewing details of a hardware fault diagnosed by the ILOM-based
fault managers. Information contained in this article includes the preparation required when opening a service
request and the actions required to modify the fault status after completion of the repair action.
DETAILS
Note: The examples contained in this document are representative of what will appear on your system. However,
there will be slight variations for your specific fault.
Section A - Displaying Fault Event Information
This section describes specific procedures for viewing the details of a diagnosed fault, such as the impacted
resources and the replaceable parts that have been identified as being faulty. These procedures should be
performed prior to manually submitting a service request.
The Fault Management Shell is the preferred method for displaying the details of a diagnosed fault. However,
support for this command shell varies depending on the ILOM release level and server product model.
Determine if the Fault Management Shell is supported on your product by logging in to the ILOM command line
interface as root and executing the command indicated in the procedures below.
Note: The host name may be substituted in place of the IP address of the Service Processor when logging in to
the ILOM CLI.
% ssh -l root <IP address of Service Processor>
-> show /SP/faultmgmt/shell
 /SP/faultmgmt/shell
    Targets:
    Properties:
    Commands:
        show
        start
The above indicates the Fault Management Shell is supported. Proceed to Section A.1 - Using the Fault
Management Shell.
-> show /SP/faultmgmt/shell
show: No such target /SP/faultmgmt/shell
The above indicates the Fault Management Shell is not supported on your product. Proceed to Section A.2 -
Using the Standard ILOM Command Line Interface.
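The supported/unsupported check above can be scripted once the output of `show /SP/faultmgmt/shell` has been captured as text. The following is a minimal sketch, not an Oracle-provided tool: the function name `fault_shell_supported` is hypothetical, and it assumes the two output shapes shown above are the only ones of interest.

```python
def fault_shell_supported(show_output: str) -> bool:
    """Classify captured output of `show /SP/faultmgmt/shell`.

    Returns True when the target is listed together with a `start`
    command, and False when ILOM reports "No such target".
    """
    # Remove all whitespace so the check tolerates differences in spacing.
    compact = "".join(show_output.split()).lower()
    if "nosuchtarget" in compact:
        return False
    return "/sp/faultmgmt/shell" in compact and "start" in compact


# Sample text modelled on the two outputs shown in this article.
supported = """/SP/faultmgmt/shell
   Targets:
   Properties:
   Commands:
       show
       start
"""
unsupported = "show: No such target /SP/faultmgmt/shell"

print(fault_shell_supported(supported))    # True
print(fault_shell_supported(unsupported))  # False
```

A True result corresponds to continuing with Section A.1, and a False result to Section A.2.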
Section A.1 - Using the Fault Management Shell
The following procedure assumes you are logged in to the ILOM command line interface as root per the
instructions above.
Enter the fault management shell to obtain pertinent information about the fault.
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp>
Use the 'fmadm faulty' command to identify the faulty component/FRU.
Example 1
The example output shown below identifies the suspect FRU as "/SYS/FANBD/FM0", which represents
the full physical path to the FRU. The hierarchical path "/SYS" represents the chassis, "FANBD"
represents the fan board, and "FM0" represents the fan module.
For the example below, the fan does not contain a FRUID, so the part number and serial number are
displayed as 'unknown'. When this information is available, these fields will contain valid information.
See Example 2 below.
faultmgmtsp> fmadm faulty
Time                UUID                                 msgid       Severity
2010-08-17/20:19:09 c1060771-8f6e-eb1f-a65c-bb47d261a1d4 SPT-8000-3R Major
Fault class : fault.chassis.device.fan.fail
FRU         : /SYS/FANBD/FM0
              (Part Number: unknown)
              (Serial Number: unknown)
Description : Fan tachometer speed is below its normal operating range.
Response    : The service required LED may be illuminated on the affected FRU and
              chassis. System will be powered down when the High
              Temperature threshold is reached.
Impact      : System may be powered down if redundant fan modules are not
              operational.
Action      : The administrator should review the ILOM event log for
              additional information pertaining to this diagnosis. Please refer to the Details
              section of the Knowledge Article for
              additional information.
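The hierarchical FRU path described in Example 1 can be broken into its levels with a simple split. This is only an illustrative sketch; the helper name `fru_path_components` is hypothetical and is not part of ILOM.

```python
from typing import List


def fru_path_components(path: str) -> List[str]:
    """Split a FRU path such as /SYS/FANBD/FM0 into its hierarchy levels."""
    # Empty strings produced by the leading "/" are discarded.
    return [part for part in path.strip().split("/") if part]


levels = fru_path_components("/SYS/FANBD/FM0")
print(levels)  # ['SYS', 'FANBD', 'FM0']
```

Reading the result left to right gives the chassis, the fan board, and the fan module, in that order.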
Example 2
The Example 2 output shown below identifies the suspect FRU as "/SYS/MB". The hierarchical path "/SYS"
represents the chassis, and "/MB" represents the motherboard.
faultmgmtsp> fmadm faulty
Time                UUID                                 msgid       Severity
2010-08-30/14:44:36 2a4e3a37-b243-e071-8b26-f65cb5d015f1 SPT-8000-DH Critical
Fault class : fault.chassis.voltage.fail
FRU         : /SYS/MB
              (Part Number: 541385707)
              (Serial Number: 1005LCB1018B2009T)
Description : A chassis voltage supply is operating outside of the
              allowable range.
Response    : The system will be powered off. The chassis wide service
              required LED will be illuminated.
Impact      : The system is not usable until repaired. ILOM will not allow
              the system to be powered on until repaired.
Action      : The administrator should review the ILOM event log for
              additional information pertaining to this diagnosis. Please
              refer to the Details section of the Knowledge Article for
              additional information.
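When preparing a service request, the identifying fields of the 'fmadm faulty' output in Examples 1 and 2 (UUID, message ID, FRU path, part and serial numbers) are the ones most often needed. The sketch below pulls them out of captured text. It is an illustration under assumptions: the function name `parse_fmadm_faulty` is hypothetical, and the sample text assumes the whitespace and hyphenation that the output would normally have on a system.

```python
import re
from typing import Dict

# Event line: time, UUID, message ID, severity (hyphens are optional here).
HEADER = re.compile(
    r"^(\d{4}-?\d{2}-?\d{2}/\d{2}:\d{2}:\d{2})\s+([0-9a-fA-F-]{32,36})\s+(\S+)\s+(\S+)$"
)
# Labelled fields; part and serial numbers appear inside parentheses.
FIELD = re.compile(
    r"^\s*\(?(Fault class|FRU|Part Number|Serial Number)\s*:\s*(.*?)\)?\s*$"
)


def parse_fmadm_faulty(text: str) -> Dict[str, str]:
    """Extract the identifying fields of a single fault from captured output."""
    record: Dict[str, str] = {}
    for line in text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            record["time"], record["uuid"], record["msgid"], record["severity"] = header.groups()
            continue
        field = FIELD.match(line)
        if field:
            key = field.group(1).lower().replace(" ", "_")
            record[key] = field.group(2).strip()
    return record


SAMPLE = """\
Time                UUID                                 msgid       Severity
2010-08-30/14:44:36 2a4e3a37-b243-e071-8b26-f65cb5d015f1 SPT-8000-DH Critical
Fault class : fault.chassis.voltage.fail
FRU         : /SYS/MB
              (Part Number: 541385707)
              (Serial Number: 1005LCB1018B2009T)
"""

fault = parse_fmadm_faulty(SAMPLE)
print(fault["fru"], fault["msgid"], fault["severity"])
```

The free-text fields (Description, Response, Impact, Action) are deliberately ignored here, since they span several lines and are easier to copy verbatim.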
Example 3 (ILOM 3.2+)
The following example depicts the change in 'fmadm faulty' output delivered as of ILOM 3.2. In this example, a
memory fault was diagnosed on a SPARC T5-2 system. The system, system component, and single suspect
FRU identity properties are explicitly presented in individual fields.
The System properties identify the top-level product, while the System Component properties identify the
constituent system-level component (i.e. server) of that product containing the diagnosed problem.
Additional event information presented includes:
Diag Engine - Identity of the diagnosis software that generated this event.
Problem Status - The overall status of this diagnosed problem.
Status (FRU) - Indicates the status of this FRU ("faulty" in this case).
faultmgmtsp> fmadm faulty
Time                UUID                                 msgid           Severity
2000-04-09/23:59:42 ebd48d6a-3c0c-cf8c-b29d-93c7ffc401a3 SPSUN4V-8000-CQ MAJOR
Problem Status : solved
Diag Engine    : fdd 1.0
System
   Manufacturer  : Oracle Corporation
   Name          : T5engineered
   Part_Number   : 1234
   Serial_Number : 4321
System Component
   Manufacturer  : Oracle Corporation
   Name          : SPARC T5-2
   Part_Number   : 12345678+1+1
   Serial_Number : 1239BDC0FA
Suspect 1 of 1
   Fault class : fault.memory.dimm
   Certainty   : 100%
   Affects     : /SYS/MB/CM0/CMP/MR1/BOB0/CH0/D0
   Status      : faulted but still in service
   FRU
      Status        : faulty
      Location      : /SYS/MB/CM0/CMP/MR1/BOB0/CH0/D0
      Manufacturer  : Samsung
      Name          : 8192MB DDR3 SDRAM DIMM
      Part_Number   : 07042208,M393B1K70DH0YK0
      Revision      : 04
      Serial_Number : 00CE02121585C74755
   Chassis
      Manufacturer  : Oracle Corporation
      Name          : T5chassis
      Part_Number   : abcd
      Serial_Number : dbca
Description : The number of correctable errors associated with this memory
              module has exceeded acceptable levels.
Response    : An attempt will be made to remove the affected memory from
              service.
Impact      : The dimm may be deconfigured at system restart which would
              reduce total system memory capacity.
Action      : Use 'fmadm faulty' to provide a more detailed view of this
              event. Please refer to the associated reference document at
              http://support.oracle.com/msg/SPSUN4V-8000-CQ for the latest
              service procedures and policies regarding this diagnosis.
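Because the ILOM 3.2+ layout repeats field names such as Status under different headings, it helps to group the fields by section before reading them. The following sketch does that for captured text. It is an assumption-laden illustration: the function name `parse_sections` and the catch-all section name "Event" are hypothetical, the section headings are taken only from the example above, and the indentation of real output may differ.

```python
import re
from typing import Dict

# Section headings as they appear in the example output above.
SECTION = re.compile(r"^(System Component|System|Suspect \d+ of \d+|FRU|Chassis)$")
KEY_VALUE = re.compile(r"^([A-Za-z_ ]+?)\s*:\s*(.+)$")


def parse_sections(text: str) -> Dict[str, Dict[str, str]]:
    """Group `name : value` lines under the section heading they follow."""
    sections: Dict[str, Dict[str, str]] = {}
    current = "Event"  # hypothetical name for fields that precede any heading
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Description"):
            break  # free-text fields follow; this sketch stops here
        if SECTION.match(line):
            current = line
            sections.setdefault(current, {})
            continue
        pair = KEY_VALUE.match(line)
        if pair:
            sections.setdefault(current, {})[pair.group(1).strip()] = pair.group(2).strip()
    return sections


SAMPLE = """\
2000-04-09/23:59:42 ebd48d6a-3c0c-cf8c-b29d-93c7ffc401a3 SPSUN4V-8000-CQ MAJOR
Problem Status : solved
Diag Engine    : fdd 1.0
System Component
   Name          : SPARC T5-2
Suspect 1 of 1
   Fault class : fault.memory.dimm
   Status      : faulted but still in service
   FRU
      Status        : faulty
      Location      : /SYS/MB/CM0/CMP/MR1/BOB0/CH0/D0
Description : The number of correctable errors associated with this memory
"""

parsed = parse_sections(SAMPLE)
print(parsed["FRU"]["Status"])             # faulty
print(parsed["Suspect 1 of 1"]["Status"])  # faulted but still in service
```

Keeping the two Status values apart matters because the FRU status and the status of the affected resource describe different things.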
Section A.2 - Using the Standard ILOM Command Line Interface
The following procedure assumes you are logged in to the ILOM command line interface as root per the
instructions above.
Use the commands described below to identify the faulty component/FRU.
The sample output shown below in Steps 1-3 identifies the suspect FRU as "/SYS/MB/P0", which represents the
full physical path to the FRU, where "SYS" represents the chassis, "MB" represents the motherboard, and "P0"
represents the processor.
Refer to either the service label on the top cover or the silkscreen labeling on the motherboard to locate
processor "P0".
Step 1. List all known faults in the system
Example:
-> show /SP/faultmgmt
 /SP/faultmgmt
    Targets:
        0 (/SYS/MB/P0)
    Properties:
    Commands:
        cd
        show
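The Targets list in Step 1 pairs a fault index with the path of the faulted component. A minimal sketch for collecting those pairs from captured output follows; the function name `faulted_targets` is hypothetical, and the pattern assumes each fault appears as an index followed by a path in parentheses, as in the example above.

```python
import re
from typing import Dict

# Matches lines such as "0 (/SYS/MB/P0)".
TARGET = re.compile(r"^\s*(\d+)\s*\((/[^)\s]+)\)\s*$")


def faulted_targets(show_output: str) -> Dict[int, str]:
    """Map each fault index under /SP/faultmgmt to its component path."""
    targets: Dict[int, str] = {}
    for line in show_output.splitlines():
        match = TARGET.match(line)
        if match:
            targets[int(match.group(1))] = match.group(2)
    return targets


SAMPLE = """\
 /SP/faultmgmt
    Targets:
        0 (/SYS/MB/P0)
    Properties:
    Commands:
        cd
        show
"""

print(faulted_targets(SAMPLE))  # {0: '/SYS/MB/P0'}
```

An empty result would suggest that no faults are currently listed.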
Step 2. List the state of a faulted processor
Example:
-> show /SYS/MB/P0
 /SYS/MB/P0
    Targets:
        D0
        D1
        D2
        D3
        D4
        D5
        D6
        D7
        D8
        PRSNT
        SERVICE
    Properties:
        type = Host Processor
        fru_name = Genuine Intel(R) CPU 000 @ 2.67GHz
        fru_manufacturer = Intel
        fru_version = 04
        fru_part_number = 060A
        fault_state = Faulted
        clear_fault_action = (none)
    Commands:
        cd
        show
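The Properties block in Step 2 is a list of `name = value` pairs, of which `fault_state` is the one that confirms the fault. The sketch below reads those pairs from captured output. The function names `parse_properties` and `is_faulted` are hypothetical, and the sample is an abbreviated copy of the example above.

```python
import re
from typing import Dict

PROPERTY = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$")


def parse_properties(show_output: str) -> Dict[str, str]:
    """Collect `name = value` lines from captured `show` output."""
    properties: Dict[str, str] = {}
    for line in show_output.splitlines():
        match = PROPERTY.match(line)
        if match:
            properties[match.group(1)] = match.group(2)
    return properties


def is_faulted(properties: Dict[str, str]) -> bool:
    """Return True when fault_state reports Faulted."""
    return properties.get("fault_state", "").lower() == "faulted"


SAMPLE = """\
 /SYS/MB/P0
    Properties:
        type = Host Processor
        fru_manufacturer = Intel
        fru_part_number = 060A
        fault_state = Faulted
        clear_fault_action = (none)
"""

props = parse_properties(SAMPLE)
print(is_faulted(props))         # True
print(props["fru_part_number"])  # 060A
```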
Step 3. List the contents of the ILOM event log
Example:
-> show /SP/logs/event/list
6313   Sun Dec 28 09:54:57 2008   Fault   Fault   critical
       Fault detected at time = Sun Dec 28 09:54:57 2008.
       The suspect component: /SYS/MB/P0 has fault.cpu.intel.l1itlb with probability=100.
       Refer to http://www.sun.com/msg/SPX86-8000-TX for details.
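The fault entry in the event log names the suspect component, the fault class, and a probability in a single sentence. A regular expression can pick those three values out of a captured entry, as sketched below. The function name `parse_suspect` is hypothetical, and the pattern assumes the wording shown in the example above.

```python
import re
from typing import Optional, Tuple

SUSPECT = re.compile(
    r"suspect component:\s*(\S+)\s+has\s+(fault\.\S+)\s+with probability=(\d+)"
)


def parse_suspect(entry: str) -> Optional[Tuple[str, str, int]]:
    """Return (component path, fault class, probability) or None if absent."""
    match = SUSPECT.search(entry)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))


ENTRY = (
    "Fault detected at time = Sun Dec 28 09:54:57 2008. "
    "The suspect component: /SYS/MB/P0 has fault.cpu.intel.l1itlb "
    "with probability=100."
)

print(parse_suspect(ENTRY))  # ('/SYS/MB/P0', 'fault.cpu.intel.l1itlb', 100)
```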
Section B - Submitting a Service Request
This section provides guidance on submitting a service request to Oracle Services in response to the diagnosed
fault reported.
Auto Service Request (ASR) Activated for the Product
If ASR has been activated for the product on which this problem was diagnosed, you have received, or will
receive, a notification via email confirming that a service request has been automatically opened, along with
instructions for viewing the service request.
All of the fault event telemetry required to open a service request has already been transmitted to Oracle.
Unless contacted and instructed otherwise by an Oracle service representative, no further action is required to
report this problem and open a service request.
If you are reading this article in response to a fault message or SNMP trap generated on the product, rather than
in response to the ASR notification email mentioned above, then you can check on the status of the associated
service request by logging in to My Oracle Support.
Refer to Get Proactive with Support Automation for more information on Auto Service Request (ASR) and the
currently supported products.
NOTE: ASR implements a set of rules for determining which events should result in a service request being
automatically submitted. Message IDs that do not result in a service request being automatically opened by ASR
will be so noted in the associated document for that specific Message ID.
Submitting a Service Request Via the Support Center
In cases where ASR has not been activated, open a service request by logging in to My Oracle Support and
following the indicated procedures, which will include presenting elements of the event content displayed using
the procedures provided in Section A.
Section C - Post-Repair Procedures
This section describes specific procedures that may be required to modify the status of faults that have been
repaired and return impacted resources to normal operation.
On some products the ILOM fault management function can determine if the associated FRUs have been replaced
and automatically clear the associated fault status. In some cases it cannot, and the fault status will have to be
changed manually.
To determine if the fault is still present, run the same commands applied in Section A.1 or A.2 (Fault Management
Shell or ILOM Command Line Interface), as appropriate. If the fault is no longer present, then no further action is
required. If it is still present, then follow the procedures described in Section C.1 or C.2 to manually clear the fault.
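The post-repair decision described above reduces to two questions: is the fault still listed, and is the Fault Management Shell available. The sketch below encodes that logic. The function name `post_repair_step` is hypothetical, and choosing between Section C.1 and C.2 based on shell support is an assumption drawn from the way Section A splits its procedures.

```python
def post_repair_step(fault_still_present: bool, shell_supported: bool) -> str:
    """Suggest the next post-repair step for a fault diagnosed by ILOM."""
    if not fault_still_present:
        # ILOM detected the replacement and cleared the fault on its own.
        return "No further action is required."
    if shell_supported:
        return "Clear the fault manually using Section C.1 (Fault Management Shell)."
    return "Clear the fault manually using Section C.2 (ILOM Command Line Interface)."


print(post_repair_step(fault_still_present=False, shell_supported=True))
print(post_repair_step(fault_still_present=True, shell_supported=False))
```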
In some cases, evidence of this same fault may also be stored by the Solaris fault manager. If Solaris was the
operating system running, then follow the procedures in Section C of the following document to determine if
additional post-repair action is required:
PSH Procedural Article for Solaris FMA-Based Diagnosis (Doc ID 1173733.1).
Section C.1 - Using Fault Management Shell to Clear the Fault
Enter the fault management shell.
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp>
Use 'fmadm repair' to clear the fault.
Rather than the UUID, the FRU path (/SYS/FANBD/FM0) could also be used.
Example 3
Example 3 shows the 'fmadm repair' command required after the suspect FRU has been replaced.
Using the UUID from the 'fmadm faulty' output from Example 1 above, the command would be:
faultmgmtsp> fmadm repair c1060771-8f6e-eb1f-a65c-bb47d261a1d4
Example 4
Example 4 shows the 'fmadm repair' command required after the FRU has been replaced. This
example shows the FRU path from Example 2 above being used. The command would be:
faultmgmtsp> fmadm repair /SYS/MB
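Since 'fmadm repair' accepts either the fault UUID or the FRU path, a small helper can check the argument before the command is typed or pasted into the shell. This is a sketch only: the function name `repair_command` is hypothetical, the UUID is assumed to use the standard hyphenated form, and FRU paths are assumed to start with /SYS as in the examples in this article.

```python
import re

UUID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def repair_command(target: str) -> str:
    """Build the `fmadm repair` command line for a fault UUID or FRU path."""
    target = target.strip()
    if UUID.match(target) or target.startswith("/SYS"):
        return "fmadm repair " + target
    raise ValueError("expected a fault UUID or a FRU path starting with /SYS")


# UUID from Example 1 and FRU path from Example 2.
print(repair_command("c1060771-8f6e-eb1f-a65c-bb47d261a1d4"))
print(repair_command("/SYS/MB"))
```

The helper only formats the command; it does not connect to the Service Processor or run anything.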
Section C.2 - Using the ILOM Command Line Interface to Clear the Fault
Log in to the ILOM command line interface as 'root' and use the following command to clear the fault.
Example:
-> set /SYS/MB/P0 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P0 (y/n)? y
Set 'clear_fault_action' to 'true'
REFERENCES