Tuesday, January 29, 2013

Resetting SC from the machine - v490


On a Sun Fire V490, the RSC (system controller) can be reset from the host OS using rscadm:


(MySolaris:/root)# cd /usr/platform/`uname  -i`/
(MySolaris:/usr/platform/SUNW,Sun-Fire-V490)#


(MySolaris:/usr/platform/SUNW,Sun-Fire-V490)# cd rsc/
(MySolaris:/usr/platform/SUNW,Sun-Fire-V490/rsc)#


(MySolaris:/usr/platform/SUNW,Sun-Fire-V490/rsc)# ls
rsc-config      rsc-initscript  rscadm
(MySolaris:/usr/platform/SUNW,Sun-Fire-V490/rsc)#


(MySolaris:/usr/platform/SUNW,Sun-Fire-V490/rsc)# ./rscadm resetrsc
Are you sure you want to reboot RSC (y/n)?  y
(MySolaris:/usr/platform/SUNW,Sun-Fire-V490/rsc)#


(MySolaris:/usr/platform/SUNW,Sun-Fire-V490/rsc)# uname -a
SunOS MySolaris 5.10 Generic_147440-11 sun4u sparc SUNW,Sun-Fire-V490
(MySolaris:/usr/platform/SUNW,Sun-Fire-V490/rsc)#
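
The cd steps above can be skipped by calling rscadm through its full path. The following is a minimal sketch; rscadm_path is a hypothetical helper name, not part of Solaris, and it only assembles the directory layout shown in the transcript.

```shell
# Hypothetical helper: build the rscadm path for a given platform name
# (the value printed by `uname -i`).
rscadm_path() {
    printf '/usr/platform/%s/rsc/rscadm\n' "$1"
}

# On the host itself (still asks for confirmation, as in the transcript):
# "$(rscadm_path "$(uname -i)")" resetrsc
```

The platform name is quoted because it contains a comma (SUNW,Sun-Fire-V490).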

Sunday, January 27, 2013

Login delays



su - <NIS user> is taking a long time (more than 20 seconds).


Some introduction to the problem:

When is the slowness observed: with su only, or also with telnet and ssh?

Answer: telnet and ssh are slow too. Basically, "su -" takes a long time for every user.

Is the slowness observed only up to the point of login, or also afterwards when running commands?

Answer: Only the login is slow. Once logged in, everything looks normal.


(MySolaris:/)# time su - schweitz -c "hostname"
Sun Microsystems Inc. SunOS 5.10 Generic January 2005

#############################################################
# This server is using NIS 
#############################################################
MySolaris

real 0m20.180s
user 0m0.040s
sys 0m0.070s
(MySolaris:/)#


First test:

Compare the time taken by "su schweitz" and "su - schweitz".

With "su - username", the HOME directory has to be mounted and the shell profiles (system-wide and user-specific) are executed (sh, ksh: /etc/profile, $HOME/.profile; the C shell has equivalents). Those profiles may contain commands that run slowly (like 'quota').

(MySolaris:/)# time su schweitz -c "hostname"
(MySolaris:/)# time su - schweitz -c "hostname"

Plain su is fast; su - takes a long time.

Second Test:

All users have csh as their login shell.

Rename /etc/.login (used by csh):
# mv /etc/.login /etc/.login.not

Then test "su - xxx":
# time su - schweitz -c "hostname"

After moving /etc/.login out of the way, the login is indeed fast:

(MySolaris:/)# time su - schweitz -c "hostname"
MySolaris

real 0m0.055s
user 0m0.016s
sys 0m0.027s
(MySolaris:/)# 

The real problem:

The truss output showed that the quota command takes more than 20 seconds, so the issue lies with quota. The "quota" command also checks NFS-mounted file systems, and one of the NFS servers was not responding. After removing that NFS mount, su - was fast.

The 20-second delay is caused by checking quotas on NFS-mounted file systems; it occurs when an NFS server does not respond.
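
To find which NFS mounts quota might be waiting on, the mounted NFS file systems can be listed from the mount table. This is a minimal sketch, assuming the Solaris /etc/mnttab field order (special, mount point, fstype, options, time); nfs_mount_points is a hypothetical helper name.

```shell
# Print the mount point of every NFS entry in mnttab-format input.
# Assumed field order: special, mount point, fstype, options, time.
nfs_mount_points() {
    awk '$3 == "nfs" { print $2 }'
}

# On the affected host:
# nfs_mount_points < /etc/mnttab
```

Each mount point printed can then be checked by hand (for example with df -k on that path) to see which server is not answering.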


Thursday, January 17, 2013

Clearing a minor fmadm faulty alert



(MySolaris:/)# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Dec 16 22:36:07 96c775c2-6764-6eae-ea5b-ea57f62cc2c0  FMD-8000-0W    Minor

Host        : MySolaris
Platform    : SUNW,Sun-SPARC Enterprise T5240        Chassis_id  :
Product_sn  :

Fault class : defect.sunos.fmd.nosub
FRU         : None
                  faulty

Description : The Solaris Fault Manager received an event from a component to
              which no automated diagnosis software is currently subscribed.
              Refer to http://sun.com/msg/FMD-8000-0W for more information.

Response    : Error reports from the component will be logged for examination
              by Sun.

Impact      : Automated diagnosis and response for these events will not occur.

Action      : Run pkgchk -n SUNWfmd to ensure that fault management software is
              installed properly.  Contact Sun for support.

(MySolaris:/)#



The FMADM fault currently logged on this system is caused by a logical inconsistency in the checkpointed data, which causes the system to disable cpumem-diagnosis. As described in the attached document, this in turn causes the FMD-8000-0W defect.sunos.fmd.nosub on the next transient memory error, which should have been handled by the cpumem-diagnosis module. We can clear the FMD-8000-0W, but any minor event that cpumem-diagnosis would normally handle will trigger another FMD-8000-0W defect.sunos.fmd.nosub. The resolution for this issue is below:

First roll the logs and restart the FMA daemon to keep the history.

logadm -p now -s 1b /var/fm/fmd/errlog
logadm -p now -s 1b /var/fm/fmd/fltlog

svcadm restart fmd


...wait two minutes...

Now scrub the checkpoint files


svcadm disable -st fmd
find /var/fm/fmd/ckpt -type f | xargs rm

svcadm enable fmd


...wait two minutes...

Now see if everything is clear

fmadm config - check that cpumem-diagnosis is active

fmadm faulty -a - shouldn't return anything

Also check whether any new errors were logged on fmd startup; if so, further investigation is needed:

fmdump -e

This should return nothing.
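
The cpumem-diagnosis check above can be scripted. This is a minimal sketch, assuming that fmadm config prints one module per line with the columns MODULE, VERSION, STATUS, DESCRIPTION; cpumem_active is a hypothetical helper name.

```shell
# Succeed only if cpumem-diagnosis is listed with STATUS "active".
# Assumed column layout: MODULE VERSION STATUS DESCRIPTION.
cpumem_active() {
    awk '$1 == "cpumem-diagnosis" && $3 == "active" { found = 1 }
         END { if (!found) exit 1 }'
}

# On the host:
# fmadm config | cpumem_active && echo "cpumem-diagnosis is active"
```

If the function fails, the module is either missing from the list or not in the active state, and the checkpoint cleanup did not have the intended effect.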