Sunday, December 12, 2010

Luxadm

The luxadm utility is used to manage the Sun Enterprise Network Array (SENA), specifically the Sun StorEdge A5x00 disk array, the SPARCstorage Array (SSA), and the Sun Fire 880 internal disk arrays. The command line must contain a subcommand, plus options where applicable.
luxadm works with internal Sun fibre disks as well as external disk arrays.




It has many subcommands for performing various operations on a disk or enclosure.
  • display, probe, start, stop, power_on, power_off, offline, online, forcelip, etc.
To display the connectivity status of the HBA ports:
# luxadm -e port
Found path to 3 HBA ports
/devices/pci@8,700000/SUNW,qlc@2/fp@0,0:devctl        CONNECTED
/devices/pci@8,700000/SUNW,qlc@2,1/fp@0,0:devctl     CONNECTED
/devices/pci@8,600000/SUNW,qlc@4/fp@0,0:devctl        CONNECTED


To force a loop initialization (LIP) and reinitialize the connection on a port:
# luxadm -e forcelip /devices/pci@8,700000/SUNW,qlc@2/fp@0,0:devctl
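
Two other frequently used subcommands are probe, which lists the attached enclosures and fibre devices, and display, which shows details for an enclosure or an individual device. A quick sketch (the device path below is illustrative; output omitted):

# luxadm probe
# luxadm display /dev/rdsk/c2t1d0s2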







zonecfg - Adding new lofs filesystem

zonecfg is used to create and modify a zone's configuration. The resources it adds are written to the zone's configuration file, e.g. zone1.xml under the /etc/zones/ directory.



(server1:/)# zonecfg -z zone1
zonecfg:zone1> add fs
zonecfg:zone1:fs> set dir=/application/ARC
zonecfg:zone1:fs> set special=/zones/zone1/applicationARC
zonecfg:zone1:fs> set type=lofs
zonecfg:zone1:fs> add options [rw,nodevices]
zonecfg:zone1:fs> end
zonecfg:zone1> commit
zonecfg:zone1> exit
(server1:/)#

The 'add' subcommand adds a particular resource to the zone configuration.

'end' ends the resource specification.

'commit' commits the changes and writes the configuration permanently to disk.
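
To verify the resource was added, the configuration can be queried with the 'info' subcommand, which prints back the dir, special, type, and options values entered above (a quick sketch; output omitted):

(server1:/)# zonecfg -z zone1 info fs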

Saturday, November 13, 2010

Live Upgrade - Basic

It's a method of upgrading a Solaris box while the system is operational. It is done by creating a parallel boot environment that resembles the current one and performing the upgrade on the newly created environment, all while the old environment remains fully functional.

Once the upgrade is done on the new environment, the system can be switched to it with just a reboot, reducing the downtime for an upgrade to the duration of a reboot.

It is also possible to do a flash installation on the alternate environment, which is similar to a fresh installation, even while the system is active.

One more advantage: if there is an issue booting the new environment, we can easily fall back to the old environment where the machine was known to function well.

Live Upgrade process:
1. Create a boot environment
2. Upgrade an inactive boot environment
3. Activate the inactive boot environment
4. Reboot the machine to boot from the newly created and activated BE
5. (Optional) Fall back to the original boot environment if there are issues with the new BE.

Commands involved in performing Live Upgrade:-
  • luactivate - Activate an inactive boot environment.
  • lucancel - Cancel a scheduled copy or create job.
  • lucompare - Compare an active boot environment with an inactive boot environment.
  • lumake - Recopy file systems to update an inactive boot environment.
  • lucreate - Create a boot environment.
  • lucurr - Display the name of the active boot environment.
  • ludelete - Delete a boot environment.
  • ludesc - Add a description to a boot environment name.
  • lufslist - List critical file systems for each boot environment.
  • lumount - Enable a mount of all of the file systems in a boot environment. This command enables you to modify the files in a boot environment while that boot environment is inactive.
  • lurename - Rename a boot environment.
  • lustatus - List status of all boot environments.
  • luumount - Unmount all of the file systems in a boot environment.
  • luupgrade - Upgrade an OS or install a flash archive on an inactive boot environment.
Before using Live Upgrade, three packages are required: SUNWlucfg, SUNWlur, and SUNWluu. They should be installed in the order specified.
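
Whether they are already present can be checked with pkginfo, a quick sketch (output omitted):

# pkginfo SUNWlucfg SUNWlur SUNWluu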

# lustatus
ERROR: No boot environments are configured on this system
ERROR: cannot determine list of all boot environment names

If the above error is displayed when you run the lustatus command, it is an indication that a fresh installation was performed and that Solaris Live Upgrade has not been used yet. Before any BEs can be listed in the lustatus output, a new BE must first be created on the system.
 
# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
sol10-u6                   yes      no     no        yes    -
sol10-u8                   yes      yes    yes       no     -
#

This shows there are two BEs configured, and one is active.

Normally, when a Live Upgrade is performed, the OS-critical filesystems (/, /var, /opt, /usr) are copied onto the new BE. While creating the new environment, the filesystems can be either split or merged.

For example, if /var and /opt are not separate filesystems in the current environment, we could split them into separate filesystems while creating the new environment, or vice versa.
 
Setting up New Environment:

For setting up an alternate BE, we need sufficient space. The alt-BE should have space to hold a copy of the existing BE plus the updates. Reformatting of the disk might be necessary.

Prepare the disk by creating the necessary slices, mirrors, or zpools.
Create the BE:

# lucreate -c sol10-u6 -n sol10-u8 -p rpool

# lucreate -c first_disk -m /:/dev/dsk/c0t4d0s0:ufs -n second_disk
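
The first form creates the new BE sol10-u8 in the existing ZFS root pool rpool, naming the current BE sol10-u6; the second form places the new BE's root filesystem on a separate UFS slice using the -m mount specification.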


It is also possible to detach an existing mirror and use the detached submirror as the alt-BE.

Applying the upgrades:

Once the new BE is created, upgrades are applied to it.

# luupgrade -n c0t15d0s0 -u -s /net/ins-svr/export/Solaris_10 \
combined.solaris_wos


All upgrades/patches are applied to this alternate BE.
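
Individual patches can also be applied to the inactive BE with the -t flag. A hedged sketch, assuming the patches are staged under /var/tmp/patches (the path and patch ID are illustrative):

# luupgrade -t -n sol10-u8 -s /var/tmp/patches 121430-43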

Activating the alt-BE:

Once the upgrades are done, we can make this BE the one the system boots from on the next reboot. To achieve that, we need to activate the alt-BE.

Before luactivate:
# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
s10s_u9wos_14a             yes      yes    yes       no     -
testBE                     yes      no     no        yes    -
# luactivate testBE
A Live Upgrade Sync operation will be performed on startup of boot environment .


**********************************************************************

The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.

**********************************************************************

In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:

1. Enter the PROM monitor (ok prompt).

2. Boot the machine to Single User mode using a different boot device
(like the Solaris Install CD or Network). Examples:

     At the PROM monitor (ok prompt):
     For boot to Solaris CD:  boot cdrom -s
     For boot to network:     boot net -s

3. Mount the Current boot environment root slice to some directory (like
/mnt). You can use the following commands in sequence to mount the BE:

     zpool import rpool
     zfs inherit -r mountpoint rpool/ROOT/s10s_u9wos_14a
     zfs set mountpoint= rpool/ROOT/s10s_u9wos_14a
     zfs mount rpool/ROOT/s10s_u9wos_14a

4. Run the luactivate utility without any arguments from the Parent boot
environment root slice, as shown below:

     /sbin/luactivate

5. luactivate activates the previous working boot environment and
indicates the result.

6. Exit Single User mode and reboot the machine.

**********************************************************************

Modifying boot archive service
Activation of boot environment  successful.
#
After activation, observe the difference:
# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
s10s_u9wos_14a             yes      yes    no        no     -
testBE                     yes      no     yes       no     -

Now perform the reboot to switch BEs. Thus an upgraded system is achieved with the downtime of just a reboot.

This is the core of how Live Upgrade happens. But a lot of other important details need to be taken care of depending on the type of filesystems used (SVM, VxFS, ZFS, ...). This is just an introduction.

Saturday, November 6, 2010

To turn off password aging


(Server:/)# for i in server1 server2 server3 server4 server5
> do
> ssh $i "passwd -x -1 schweitzer"
> done

passwd: password information changed for schweitzer
passwd: password information changed for schweitzer
passwd: password information changed for schweitzer
passwd: password information changed for schweitzer
passwd: password information changed for schweitzer

Extracted from man page of passwd:

     -x max              Sets maximum field  for  name.  The  max
                         field  contains  the number of days that
                         the password  is  valid  for  name.  The
                         aging for name is turned off immediately
                         if max is set to -1.
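
Conversely, aging can be re-enabled with the same family of flags, for example with a 90-day maximum, 7-day minimum, and 14-day warning (the values are illustrative):

(Server:/)# passwd -x 90 -n 7 -w 14 schweitzer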

Reset password while system uses both local & ldap accounts

On a machine where user authentication depends on both the local /etc/passwd file and LDAP, resetting a local password should be done as below.

(Server1:/)# passwd petuser
New Password:
Re-enter new Password:
Permission denied

(Server1:/)# id
uid=0(root) gid=0(root)
(Server1:/)#

This happens because the user account authentication involves both LDAP and files.

(Server1:/)# ps -ef | grep ldap
    root  2925  2430   0   Oct 27 ?           0:47 /usr/lib/ldap/ldap_cachemgr
    root 12024 22230   0 13:57:15 pts/1       0:00 grep ldap

(Server1:/)# passwd -help
usage:
        passwd [-r files | -r nis | -r nisplus | -r ldap] [name]
        passwd [-r files] [-egh] [name]
        passwd [-r files] -sa
        passwd [-r files] -s [name]
        passwd [-r files] [-d|-l|-N|-u] [-f] [-n min] [-w warn] [-x max] name
        passwd -r nis [-eg] [name]
        passwd -r nisplus [-egh] [-D domainname] [name]
        passwd -r nisplus -sa
        passwd -r nisplus [-D domainname] -s [name]
        passwd -r nisplus [-D domainname] [-l|-N|-u] [-f] [-n min] [-w warn]
                [-x max] name
        passwd -r ldap [-egh] [name]
        passwd -r ldap -sa
        passwd -r ldap -s [name]
        passwd -r ldap [-l|-N|-u] [-f] [-n min] [-w warn] [-x max] name
Invalid combination of options

So use the -r option with the passwd command to reset the local password.

(Server1:/)# passwd -r files petuser
New Password:
Re-enter new Password:
passwd: password successfully changed for petuser

Thursday, October 28, 2010

/etc/passwd and /etc/shadow

The /etc/passwd file maintains the user accounts on a Unix machine. Each entry has 7 colon-separated fields.
1:2:3:4:5:6:7
  1. Username: The user login name. Length is between 1 and 32 characters.
  2. Password: An x character indicates that the encrypted password is stored in the /etc/shadow file.
  3. User ID (UID): Each user must be assigned a user ID (UID). UID 0 (zero) is reserved for root and UIDs 1-99 are reserved for other predefined accounts. UIDs 100-999 are reserved by the system for administrative and system accounts/groups.
  4. Group ID (GID): The primary group ID (stored in the /etc/group file). The group ID must exist before you can use it.
  5. User ID Info: The comment field, used to record more information about the user.
  6. Home directory: The absolute path to the directory the user will be in when they log in. If this directory does not exist, the user's directory becomes /.
  7. Command/shell: The absolute path of a command or shell (e.g. /bin/bash). Typically, this is a shell.
The /etc/shadow file maintains user password information. The encrypted password is stored in this file, which is readable only by the root account. It has 8 fields.
1:2:3:4:5:6:7:8
  1. User name : The user login name.
  2. Password: The encrypted password. The password should be a minimum of 6-8 characters long, including special characters/digits. The length requirements can be altered through configuration files.
  3. Last password change (lastchanged): Days since Jan 1, 1970 that the password was last changed.
  4. Minimum: The minimum number of days required between password changes, i.e. the number of days that must pass before the user is allowed to change his/her password.
  5. Maximum: The maximum number of days the password is valid (after that the user is forced to change his/her password).
  6. Warn: The number of days before password expiry that the user is warned that his/her password must be changed.
  7. Inactive: The number of days after the password expires that the account is disabled.
  8. Expire: Days since Jan 1, 1970 that the account is disabled, i.e. an absolute date specifying when the login may no longer be used.
Editing these files manually is not advised. Users should be added or modified with the useradd/usermod commands.
pwconv is used to synchronize the /etc/passwd and /etc/shadow files.
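
For illustration, a hypothetical pair of entries for a user 'aaron' might look like this (the shadow hash shown is a placeholder, not a real crypt string):

aaron:x:1001:100:Aaron Smith:/home/aaron:/bin/ksh     (/etc/passwd)
aaron:EXAMPLEHASH:14899:7:90:14:30::                  (/etc/shadow)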

Friday, October 22, 2010

Upgrading zones to match global machines patch level

The global zone and non-global zones are on different patch levels. To bring them both to the same patch level, the following steps can be followed.

If the zone is configured in a cluster (VCS), stop all resources running in the service group through VCS, including the zone, but do not stop the zone root fs and DG. (If there is a problem bringing the zone down through VCS, halt it manually, e.g. zoneadm -z zone04 halt.)

Server1:/# zoneadm list -icv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   1 zone04           running    /zones/zone04                  native   shared
After the zone is down, put the zone in the configured state. (Normally VCS will put the zone in the configured state automatically when it brings the zone down; if the zone was halted manually, edit the /etc/zones/index file.)

Server1:/# zoneadm list -icv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   1 zone04           down       /zones/zone04                  native   shared


Server1:/# zoneadm list -icv
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   - zone04           configured /zones/zone04                  native   shared

Server1:/# cat /etc/zones/index
# Copyright 2004 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
# ident "@(#)zones-index        1.2     04/04/01 SMI"
#
# DO NOT EDIT: this file is automatically generated by zoneadm(1M)
# and zonecfg(1M).  Any manual changes will be lost.
#
global:configured:/:

Don't offline the whole DG; only stop the zone. The zone root must be mounted for attaching.

Attach the zone

      zoneadm -z zone04 attach -u

      - While attaching, if any package inconsistency error is thrown, remove the offending packages using pkgrm

Server1:/# zoneadm -z zone04 attach -u
/zones/zone04 must not be group readable.
/zones/zone04 must not be group executable.
/zones/zone04 must not be world readable.
/zones/zone04 must not be world executable.

Check if the zone root is mounted properly:

Server1:/# ls -ld /zones/zone04
drwxr-xr-x   3 root     root         512 Mar 18  2010 /zones/zone04

Server1:/# df -k /zones/zone04
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d10      12396483 10032378 2240141    82%    /


Not mounted - so mount the zone root properly (in this case the whole SG was brought down, so both the zone root fs and the DG were stopped).

After mounting:


Server1:/# ls -ld /zones/zone04
drwx------   5 root     root        1024 Mar 29  2010 /zones/zone04

Try attaching now.

Server1:/# zoneadm -z zone04 attach -u
zoneadm: zone 'zone04': ERROR: attempt to downgrade package SUNWlur, the source had patch 121430-43 but this system only has 121430-42

zoneadm: zone 'zone04': ERROR: attempt to downgrade package SUNWluu, the source had patch 121430-43 but this system only has 121430-42

So now we have to remove the two packages SUNWlur and SUNWluu.
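
A sketch of the removal (pkgrm asks for confirmation for each package):

Server1:/# pkgrm SUNWlur SUNWluu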
After removing the packages, attach again:

Server1:/# zoneadm -z zone04 attach -u
Getting the list of files to remove
Removing 1208 files
Remove 24 of 24 packages
Installing 23631 files
Add 415 of 415 packages
Installation of these packages generated warnings: SUNWgssc SUNWinstall-patch-utils-root SUNWkrbr SUNWmconr SUNWnisu SUNWntpr SUNWpkgcmdsr SUNWsacom SUNWwbcor VRTSjre15
Updating editable files
The file </var/sadm/system/logs/update_log> within the zone contains a log of the zone update.

Now boot the zone and bring up the resources:

Server1:/# zoneadm -z zone04 boot

Verify the patch levels of both the global and non-global zones:

zone04:/root# uname -a
SunOS zone04 5.10 Generic_142900-02 sun4u sparc SUNW,Sun-Fire-15000
zone04:/root#
zone04:/root# cat /etc/release
                      Solaris 10 10/09 s10s_u8wos_08a SPARC
           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 16 September 2009

Server1:/# uname -a
SunOS Server1 5.10 Generic_142900-02 sun4u sparc SUNW,Sun-Fire-15000
Server1:/# cat /etc/release
                      Solaris 10 10/09 s10s_u8wos_08a SPARC
           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 16 September 2009

Friday, October 8, 2010

proc Utilities

Let's look at some process-related commands.

The pgrep command displays a list of the process IDs of active processes on the system that match the pattern specified in the command line.

[Displays pid of process matching cro* pattern]
server1:/root# pgrep cro*
5399
20171
20146
20178


[Display pid of process matching cron]
server1:/root# pgrep cron
5399

[Display pid and name of process]
server1:/root# pgrep -l cron
5399 cron


[Display process associated with a user]
server1:/root# pgrep -u pagent
15267
14798
19430
19429
1731
25549

 
[Display pid and process name associated with a username]
server1:/root# pgrep -l -u pagent
15267 PatrolAgent
14798 ds_listener
19430 bgscollect
19429 bgsagent
1731 ksh
25549 ksh



pflags - Print the /proc tracing flags, the pending and held signals, and other /proc status information for each lwp in each process.

server1:/root# pflags 28453
28453: /usr/lib/ssh/sshd
data model = _ILP32 flags = ORPHAN|MSACCT|MSFORK
/1: flags = ASLEEP pollsys(0xffbff3f0,0x1,0x0,0x0)



pcred - Print the credentials (effective, real, saved UIDs and GIDs) of each process.

server1:/root# pcred 15267
15267: euid=1320 ruid=1320 suid=0 e/r/sgid=1300
groups: 1300 7929 32506 7211 32502 13500 32505 7156 32504 32503


pldd - List the dynamic libraries linked into each process, including shared objects explicitly attached using dlopen(3C).

server1:/root# pldd 1731
1731: /bin/ksh
/lib/libc.so.1
/platform/sun4u-us3/lib/libc_psr.so.1



psig - List the signal actions and handlers of each process.

server1:/root# psig 1731
1731: /bin/ksh
HUP ignored
INT caught sh_fault RESTART
QUIT ignored
ILL caught sh_done RESTART
TRAP caught sh_done RESTART
ABRT caught sh_done RESTART
EMT caught sh_done RESTART
FPE ignored
KILL default
BUS caught sh_done RESTART
SEGV default
SYS caught sh_done RESTART
PIPE ignored
ALRM caught sh_fault RESTART
TERM caught sh_done RESTART
USR1 caught sh_done RESTART
USR2 caught sh_done RESTART
CLD caught sh_fault NOCLDSTOP
PWR default
WINCH default
URG default
POLL default
STOP default
TSTP ignored
CONT default
TTIN ignored
TTOU ignored
VTALRM default
PROF default
XCPU caught sh_done RESTART
XFSZ ignored
WAITING default
LWP default
FREEZE default
THAW default
CANCEL default
LOST default
XRES default
JVM1 default
JVM2 default
RTMIN default
RTMIN+1 default
RTMIN+2 default
RTMIN+3 default
RTMAX-3 default
RTMAX-2 default
RTMAX-1 default
RTMAX default



pstack - Print a hex+symbolic stack trace for each lwp in each process.

server1:/root# pstack 1731
1731: /bin/ksh
ff2cc400 read (0, ff339c44, 1)
000233fc io_readbuff (0, ff339c44, 1, 24400, 527e0, 400) + 314
000248c4 ???????? (0, ff339c44, 53444, 1, 527e0, 5)
00024adc io_readc (2, ffbff908, 53d78, 0, ffbff90b, 53000) + 2c
00029f5c ???????? (300000, 0, 0, 53000, 53000, 0)
000299cc main (20000000, 2bc00, ffbffc24, 53000, 53000, ffff8000) + a30
00016b20 _start (0, 0, 0, 0, 0, 0) + 108



pfiles - Report information for all open files in each process. In addition, a path to the file is reported if the information is available from /proc/pid/path. This is not necessarily the same name used to open the file.

server1:/root# pfiles 1731
1731: /bin/ksh
Current rlimit: 4096 file descriptors
0: S_IFIFO mode:0000 dev:368,0 ino:751569929 uid:1320 gid:1300 size:0
O_RDWR
1: S_IFIFO mode:0000 dev:368,0 ino:751569928 uid:1320 gid:1300 size:0
O_RDWR
2: S_IFIFO mode:0000 dev:368,0 ino:751569928 uid:1320 gid:1300 size:0
O_RDWR



pwdx - Print the current working directory of each process.

server1:/proc# pwdx 1731
1731: /opt/patrol


pstop - Stop each process (PR_REQUESTED stop).

prun - Set each process running (inverse of pstop).

pwait - Wait for all of the specified processes to terminate.
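
For instance, a process can be suspended and later resumed by PID; reusing the ksh process from the earlier examples (both commands print nothing on success):

server1:/root# pstop 1731
server1:/root# prun 1731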

ptime - Time the command, like time(1), but using microstate accounting for reproducible precision. Unlike time(1), children of the command are not timed.

server1:/# ptime cat /var/tmp/1

real 0.005
user 0.001
sys 0.003



ptree - Print the process trees containing the specified pids or users, with child processes indented from their respective parent processes.

server1:/proc# ptree 23662
28453 /usr/lib/ssh/sshd
23650 /usr/lib/ssh/sshd
23652 /usr/lib/ssh/sshd
23662 -ksh
14517 isql -syb_dba -SDRSBDT5 -w0000000000000000000000000000000000000000000000000000

Thursday, October 7, 2010

su

The su command is used to switch to another user. It is most commonly employed to change from an ordinary user to root.

su [options] [commands] [-] [username]

#su root
If the correct password is provided, ownership of the session is changed to root.

The whoami command displays the current user.

The default behavior of su is to maintain the current directory and the environment variables of the original user, which means variables like PATH will still hold the original user's values. For ordinary users, PATH is usually something like /usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/username/bin; for root it generally resembles /sbin:/usr/sbin:/bin:/usr/bin.

To overcome this, use:
su -

The hyphen has two effects: (1) it switches the current directory to the home directory of the new user (e.g., to /root in the case of the root user) and (2) it changes the environment variables to those of the new user.

A common option used with su is -c, which tells su to execute the command that directly follows it on the same line, then exit and return to the original user.

eg) su - aaron -c "ls -l /home"
This will switch to user 'aaron', execute the command, and return, exiting aaron's session.

Monitoring usage of su:

Normally, su attempts are logged in the /var/adm/sulog file. This has to be set up when the system is commissioned, by editing the file /etc/default/su.

#SULOG=/var/adm/sulog  => This line should be un-commented.
 
eg)# tail /var/adm/sulog
SU 10/07 10:35 + pts/3 winsel-root
SU 10/07 15:05 - console root-daemon
SU 10/07 15:54 + console root-daemon
SU 10/07 16:28 - pts/3 winsel-root
SU 10/08 08:23 + console root-daemon

MeasureWare Agent (MWA)

MeasureWare Agent uses data source integration (DSI) technology to receive, alarm on, and log data from external data sources such as applications, databases, networks, and other operating systems.

MeasureWare Agent installs in the /opt/perf/ directory and creates its log and status files in the /var/opt/perf/ directory.

root@server1:/hroot# which mwa
/opt/perf/bin/mwa


Starting the agent:-

The mwa script starts MeasureWare Agent and all its processes, including the scopeux data collector, the midaemon (measurement interface daemon), the perflbd, the rep.server, the ttd and the alarm generator.

root@server1:/hroot# mwa start

The Perf Agent scope collector is being started.
         The ARM registration daemon ttd is already running.
         It will be signaled to reprocess its configuration file.

         The Performance collection daemon
         /opt/perf/bin/scopeux has been started.

         The coda daemon /opt/OV/lbin/perf/coda has been started.
         It will be fully operational in a few minutes.


The Perf Agent server daemons are being started.
         The Perf Agent Location Broker daemon
         /opt/perf/bin/perflbd has been started.


Stopping the agent:-

root@server1:/hroot# mwa stop

Shutting down Perf Agent collection software
NOTE:   The ARM registration daemon ttd will be left running.

Shutting down coda daemon
         Shutting down coda, pid(s) 7953


Shutting down the Perf Agent server daemons
         Shutting down the alarmgen process.  This may take a while
         depending upon how many monitoring systems have to be
         notified that Perf Agent Server is shutting down.


         The alarmgen process has terminated

         Shutting down the perflbd process

         The perflbd process has terminated

         The agdbserver process terminated

         The rep_server processes have terminated

         The Perf Agent Server has been shut down successfully


To start individual components:-

root@server1:/hroot# mwa restart scope

Shutting down Perf Agent collection software
NOTE:   The ARM registration daemon ttd will be left running.

The Perf Agent scope collector is being started.
         The ARM registration daemon ttd is already running.
         It will be signaled to reprocess its configuration file.

         The Performance collection daemon
         /opt/perf/bin/scopeux has been started.

root@server1:/hroot# ps -ef | grep /opt/perf/bin/scopeux
    root 21769  5322  1 10:22:16 pts/2     0:00 grep /opt/perf/bin/scopeux
root@server1:/var/opt/perf#


But the process has not started, so we have to check why. The status of the MWA agents is recorded in the /var/opt/perf/status.* files; each component has its own file.

root@server1:/var/opt/perf# ls
.gp                aldxc09.log        perfd              reptfile           status.perfalarm   ttd.pid
adviser.syntax     aldxd09            perfd.ini          repthead           status.perfd-5227  vppa.env
alarmdef           app-defaults       perflbd.rc         repthist           status.perflbd
alarmdef.old       datafiles          pkey               rxitemid           status.rep_server
alarmdef.org       gkey               reptSASstd         rxshorts           status.scope
aldlog09           mwakey             reptTBL            status.alarmgen    status.ttd
aldxc09            parm               reptall            status.mi          ttd.conf




Checking the status.scope file will give us the details of why the process has not started:

eg)
root@server1:/var/opt/perf# tail -f status.scope

A FILE, GROUP or USER parameter is limited to 15 characters.
A parameter was truncated.


**** /opt/perf/bin/scopeux : 09/09/10 15:11:51 ****
ERROR: Unable to read from logfile '/var/opt/perf/datafiles/logproc' - corrupted data. (PE221-24)

**** /opt/perf/bin/scopeux : 09/09/10 15:11:51 ****
COLLECTOR END. program terminated abnormally.



Action Taken:-
Move the corrupt file and restart. First stop the process

>>mwa stop
>>mv /var/opt/perf/datafiles/logproc /var/opt/perf/datafiles/logproc.bkp
>>mwa start

Wednesday, October 6, 2010

EFI Disk Label

A disk label is where the disk geometry is stored. There are two types of labels: VTOC and EFI.

The EFI label provides support for physical disks and virtual disk volumes. It is used to support disks larger than 2 TB.

The UFS file system is compatible with the EFI disk label, and you can create a UFS file system greater than 2 TB.

You can use the format -e command to label a disk smaller than 1 TB with an EFI label.

An EFI-labeled disk cannot be used for booting.



For more information
http://docs.sun.com/app/docs/doc/817-5093/disksconcepts-14?a=view

How to find out the current run level

$ who -r
 .    run-level 3  Sep 13 10:18  3  0 S
$
 
  • run-level 3 : identifies the current run level
  • Sep 13 10:18 : the date of the last run level change
  • 3 : also identifies the current run level
  • 0 : the number of times the system has been at this run level since the last reboot
  • S : the previous run level
  
In Solaris 10, check the SMF milestones and make sure the multi-user-server service is enabled and running.
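
A quick check with svcs (this is the standard Solaris 10 FMRI; the STIME column will differ on your system):

$ svcs svc:/milestone/multi-user-server:default
STATE          STIME    FMRI
online         Sep_28   svc:/milestone/multi-user-server:default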

Restricting an ftp user within his home directory

1. Go to the host and check the /etc/ftpd/ftpaccess file.

2. Add the below entry

    restricted-uid [login-id]


 eg)restricted-uid ftpuseraaron

3. This will restrict the ftp user to his home directory and deny navigation through other filesystems.

Tuesday, October 5, 2010

Why does syslog stop working?

1. It could be because of a space issue in /var.
2. It could be because of spaces in /etc/syslog.conf.

Run the dmesg command and see when the last log entry was made. Also check whether any space issue is reported. If space issues were logged, clear the file systems and restart the daemon.

The use of a space instead of a tab between the facility.level selector and the destination in /etc/syslog.conf will stop syslogd logging anything. Restore a fresh version of the syslog.conf file and restart the daemon.
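
For reference, a valid entry looks like the stock Solaris line below, where the whitespace between the selector and the file name must be a tab:

*.err;kern.debug;daemon.notice;mail.crit        /var/adm/messages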

1. Check the syslog daemon and restart it if necessary.

(server1:/)# ps -ef | grep syslog
    root   226  7244   0 10:22:41 ?           0:01 /usr/sbin/syslogd
    root  3068  1691   0 10:07:58 pts/2       0:00 grep syslog

(server1:/)# svcs svc:/system/system-log:default
STATE          STIME    FMRI
online         Sep_28   svc:/system/system-log:default
(server1:/)#
(server1:/)# svcadm restart svc:/system/system-log:default
(server1:/)#
(server1:/)# svcs svc:/system/system-log:default
STATE          STIME    FMRI
online         12:39:27 svc:/system/system-log:default
(server1:/)#



Check the messages file and confirm that logging works fine after the corrections:

dmesg (or)
(server1:/)# ls -l /var/adm/messages
-rw-r--r--   1 root     root       46666 Oct  4 12:40 /var/adm/messages
(server1:/)#

Monday, October 4, 2010

Breaking an unresponsive system

The system is unresponsive and unreachable. Checking its status through the console shows the system is running, but there is still no response from it. It looks like the system is hung, so we need to send a break from the console and reset the machine.

M9000 console; the unresponsive machine is domain 0.

XSCF> sendbreak -d 0
Send break signal to DomainID 0?[y|n] :y
XSCF>

Open another console, log in to the domain, and type sync at the ok prompt to initiate the core dump.

XSCF> console -f -d 0
Connect to DomainID 0?[y|n] :y

### System reaches OK prompt. Give sync to force coredump

{a7} ok sync
panic[cpu167]/thread=2a174821ca0: sync initiated
sched: software trap 0x7f
pid=0, pc=0xf005d18c, sp=0x2a174820cb1, tstate=0x4400001407, context=0x0
g1-g7: 10511c4, 18de000, 60, 0, 0, 0, 2a174821ca0
00000000fdb79cd0 unix:sync_handler+144 (182e400, f7, 3, 1, 1, 109f400)
%l0-3: 0000000001893e80 00000000018dddd0 00000000018ddc00 000000000000017f
%l4-7: 00000000018c1000 0000000000000000 00000000018bac00 0000000000000037
00000000fdb79da0 unix:vx_handler+80 (fdb02078, 183e038, 7fffffffffffffff, 1, 183e140, f006d515)
%l0-3: 000000000183e140 0000000000000000 0000000000000001 0000000000000001
%l4-7: 000000000182ec00 00000000f0000000 0000000001000000 0000000001019734
00000000fdb79e50 unix:callback_handler+20 (fdb02078, fdfea400, 0, 0, 0, 0)
%l0-3: 0000000000000016 00000000fdb79701 0000000000000000 0000000000000000
%l4-7: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
syncing file systems... 570 568 568 568 568 568 568 568 568 568 568 568 568 568 568 568 568 568 568 568 568 568 done (not all i/o completed)
dumping to /dev/md/dsk/d11, offset 21476081664, content: kernel
100% done: 2916208 pages dumped, compression ratio 2.90, dump succeeded

### System reboots to init level 3
rebooting...
Resetting...
.POST Sequence 01 CPU Check
LSB#02 (XSB#01-0): POST 2.11.0 (2009/06/18 09:30)
LSB#06 (XSB#03-1): POST 2.11.0 (2009/06/18 09:30)
LSB#07 (XSB#03-2): POST 2.11.0 (2009/06/18 09:30)
LSB#03 (XSB#02-0): POST 2.11.0 (2009/06/18 09:30)
LSB#01 (XSB#00-1): POST 2.11.0 (2009/06/18 09:30)
LSB#04 (XSB#02-1): POST 2.11.0 (2009/06/18 09:30)
POST Sequence 02 Banner

The machine dumps core and reboots.

Sunday, October 3, 2010

Changing UMASK for an ftp user account

UMASK is used to set the default permissions of newly created files.

This value is defined as a system-wide property in the /etc/profile file. By default it is set to 022.
Each user can override this value in their ~/.profile, thereby setting their own customized value.

To calculate the permissions that result from a specific umask value, subtract the umask from the maximum default permissions.
For files, the subtraction is done from 666; for directories, 777 is used. If umask is 022, files are created with permissions of 644 (rw-r--r--) and directories with permissions of 755 (rwxr-xr-x).
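
A quick demonstration of the arithmetic (the file and directory names are arbitrary, and the output will resemble the following):

$ umask 022
$ touch file1 ; mkdir dir1
$ ls -ld dir1 file1
drwxr-xr-x   2 aaron    staff        512 Oct  3 10:00 dir1
-rw-r--r--   1 aaron    staff          0 Oct  3 10:00 file1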

So, to set the umask value for an ftp user to 006 - GLOBALLY:

1. Edit the file /etc/inetd.conf and change the umask value as below

$vi /etc/inetd.conf

EDIT==> ftp          stream tcp6 nowait root /usr/lbin/ftpd     ftpd -l -u 006

2. Save and reinitialize the daemon.

***Do not restart the inetd daemon.*** Instead, use the command below to re-initialize it.

$inetd -c


Saturday, October 2, 2010

Replacing a faulted disk in a SVM - hotswap

Lets see how to replace a defective disk which is in 'Maintenance' state in SVM.

This is a hot swap in which the old failed disk is pulled out of the live system and a new disk is attached back into it. Before removing a disk, it must be un-configured from SVM.

The disk is part of a concatenated mirror. Six disks are organized as mirror with 3 disks forming a concatenation at each ends.

Mirror - d6
Sub-Mirror 1 - d15
Sub-Mirror 2 - d16

           d6
           | |
           | |
       d15 | | d16


The failed disk is in sub-mirror d16. To replace it, we need to detach the sub-mirror, clear it, replace the disk, recreate the sub-mirror, and reattach it.

Step-1

$metadetach d6 d16
[Detaches sub-mirror d16 from mirror d6. Use the -f option if the device is reported busy, and only if the disk is in the maintenance state]
$metadetach -f d6 d16
Step-2
$metaclear d16
[The metaclear command deletes the specified metadevice or deletes all hotspares/soft partitions. Once cleared, the device must be recreated with metainit before it can be used again]
Step-3
Here the failed disk is pulled out and the new disk is inserted. The necessary unconfiguration is done using cfgadm, and after replacement the new disk is configured so that it is recognized at the OS level.

$cfgadm -al
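
On a hot-swappable system, the disk's individual attachment point is typically unconfigured before pulling the drive and configured again after inserting the replacement. A sketch; the attachment-point name below is illustrative:

$cfgadm -c unconfigure c0::dsk/c0t1d0
[pull the failed disk and insert the new one]
$cfgadm -c configure c0::dsk/c0t1d0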

Step-4

The newly attached disk is now formatted to match the surviving sub-mirror's configuration.

$prtvtoc /dev/dsk/c2t1d0s2|fmthard -s - /dev/rdsk/c0t1d0s2
[Copy the VTOC from the surviving disk to the new disk using the fmthard command]

Step-5

$metainit d16 3 1 c0t0d0s6 1 c0t1d0s2 1 c0t2d0s2
[Recreate the sub-mirror using metainit]

Step-6

$metattach d6 d16
[This reattaches the sub-mirror and immediately starts synchronization of data from the other sub-mirror]

Step-7

$metastat -C
[This checks the metadevice states]

Thursday, September 30, 2010

vxconfigd - Volume Manager configuration daemon

The Volume Manager configuration daemon, vxconfigd, is responsible for maintaining configurations of disks and disk groups in the VERITAS Volume Manager. vxconfigd takes requests from other utilities for configuration changes, communicates those changes to the kernel, and modifies the configuration information stored on disk. vxconfigd is also responsible for initializing the Volume Manager when the system is booted.

vxconfigd -k is used if the daemon is hung. This kills the existing daemon and restarts it. Killing the old vxconfigd and starting a new one should not cause any problems for volume or plex devices that are being used by applications or that contain mounted file systems.
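
The daemon's state can be checked with vxdctl before and after the restart; a short sketch:

# vxdctl mode
mode: enabled
# vxconfigd -k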

Wednesday, September 29, 2010

Introducing the newly allocated storage disk to your system.

New LUNs have been assigned to your host and you want to use them. There is a series of steps involved: first find the allocated disks, make them available to Veritas, and finally put them to good use.

The disks from the SAN will be attached to a particular controller; to find the disks, run the cfgadm command.

List the state and condition of attachment points 
# cfgadm -al

Now configure the fc-fabric controller
# cfgadm -c configure c1 

Recreate the device tree which also clears the unwanted devices
# devfsadm -C

Now the devices should be available under /dev/rdsk
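
If the LUNs are destined for Veritas, make VxVM rescan the devices and list them; a sketch using the standard VxVM commands:

# vxdctl enable
# vxdisk list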

Tuesday, September 28, 2010

Beware of making changes when someone is monitoring you!!!

Changes to a system are a daily activity in a sysadmin's life; they cannot be avoided. And there are a lot of tools to help him make a change easily and without mistakes. But there is still a need to be careful and aware of the tools that monitor and help you. These tools might be there to help you, but if you ignore them, they can turn against you.

Always check whether any monitoring process is running before making changes to a system. Be extra careful particularly if there is an application like VCS, which monitors and also takes corrective actions when something abnormal is noticed.

Consequences of overlooking a tool like VCS:

VCS normally defines dependencies between applications.
For example, an application may depend on a file system, which in turn depends on a disk group. Likewise, an IP can depend on a file system, and a whole application may depend on that IP.

Making any changes to any of these components without the knowledge of VCS may lead to some bad things!!!

Let me explain a little about VCS first so that the scenario below is clearer.

Veritas Cluster Server is a high availability solution from Symantec. It monitors resources (file systems, disk groups, applications, IPs, HORC, etc.) and can perform failovers in case of a failure on one system, thus keeping the application available. It is one of the best ways to minimize application downtime.

Practically, resources are configured in VCS and monitoring is enabled. Dependencies are also specified, ensuring that a particular resource cannot exist without the resources it needs. VCS has an agent for each resource type, which monitors the status of the associated resources and can take appropriate actions, like online/offline, per the rules specified.

The Scenario:
There is a filesystem /opt/MyApps/Billlogs on a machine named Server1. The filesystem is configured in VCS with some dependencies: ip-MyApps2, which is an IP resource of MyApps2, depends on this filesystem.


   fsSubApp1    SubApp2
        |           |
        +-----+-----+
              |
         ip-MyApps2                   (parent)
              |
    /opt/MyApps/Billlogs              (child)
              |
          DG-MyApps


In the above configuration, the file system /opt/MyApps/Billlogs is necessary for all the resources that depend on it: ip-MyApps2, fsSubApp1, and SubApp2. If /opt/MyApps/Billlogs is unmounted, it causes a cascading effect, pulling down the resources that depend on it.

VCS continually monitors every type of resource. If a particular resource is taken offline, the dependent resources are dealt with appropriately. So if VCS is enabled and running and the resources are monitored by it, offlining or onlining a resource without the knowledge of VCS might lead to unforeseen impacts. For example, if the file system /opt/MyApps/Billlogs is unmounted through the system, VCS thinks something has gone wrong and will try to take the dependent resources offline.

But if the dependency was wrongly specified during configuration, i.e. if ip-MyApps2 does not in fact depend on /opt/MyApps/Billlogs yet the dependency was specified anyway, this could lead to downtime of ip-MyApps2 - not expected, but caused by the dependency. This mistake stems from an improper dependency configuration combined with an offlining activity done without consulting VCS.

So how to do it?
The safe way to remove a resource without affecting other running dependent applications is to first unlink them, so that no dependency remains between the resources, and then safely turn off the resource that is no longer required.

hagrp -unlink parent_group child_group

Friday, September 24, 2010

VxVM - Plex State Change Cycle

Changes in plex state are part of normal operations and do not necessarily indicate abnormalities that must be corrected.

All CLEAN (DISABLED) plexes are made ACTIVE (ENABLED) when the system starts or when the volume is started.

EMPTY=>CLEAN=>ACTIVE=>OFFLINE/IOFAIL

OFFLINE=>STALE=>ACTIVE
IOFAIL=>ACTIVE
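
As an illustration of nudging a plex through this cycle: a plex stuck in a bad state is often repaired with vxmend and the volume then restarted. A hedged sketch, where the disk group, volume, and plex names are hypothetical, and 'fix clean' applies only while the volume is stopped:

# vxmend -g mydg fix clean vol01-02
# vxvol -g mydg start vol01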



Veritas Volume Manager - Plexes

In Veritas Volume Manager, disk space is allocated as sub-disks, plexes, and eventually volumes. Contiguous disk blocks are grouped into a sub-disk, which is a portion of a VxVM disk.

A plex is a group of sub-disks. Plexes can be organized as concatenations, stripes, mirrors, or RAID-5. Plexes are used to form volumes.

The 'vxassist' command automatically creates plexes while creating volumes. A plex can also be created separately with the 'vxmake' command and attached to a volume later.

The 'vxprint' command is used to display plex information (vxprint -g <diskgroup> -l <plex>).

Plex States:

VxVM maintains the state of a plex automatically. There are many states associated with a plex that help identify the consistency of its data. These states are very important for recovering a volume after a system failure.

ACTIVE State: 
This state shows the plex is in use and I/O operations are happening on it.

CLEAN State:
If a plex has consistent data, this state is set.

EMPTY State:
This is set when a new volume is created and plex is not initialized. 

OFFLINE State:
Plex is not associated with a volume.

Plex Condition Flags:

NODEVICE:
The physical disk associated with a sub-disk of the plex is not available. Recovery is needed before the plex can be used again.

RECOVER:
The physical disk associated with the plex has been reattached, but it is not in sync with the volume and recovery is needed.

REMOVED:
The sub-disk associated with the plex is lost. Complete recovery of the sub-disk is needed.

Plex Kernel State: This indicates whether the plex is accessible to the volume driver. It is maintained internally, and state changes are reflected automatically.

DETACHED:
Plex is in maintenance state. No write access is allowed on plex.

DISABLED:
Plex is not accessible.

ENABLED:
Plex is online and read/write access is accepted.


Wednesday, September 22, 2010

Large File Support

Many operating systems were designed with restricted file size support when they were initially developed. As disk capacity and processing power increased, file sizes grew past 2 GB and 4 GB. So operating systems that had initially not taken this growth into consideration had to separately provide facilities for processing large files.

Large file support can be enabled for a file system while creating it or afterwards. While mounting a file system, there is an option to specify large file support, which checks whether the underlying file system has that support enabled.

There are options to switch between largefiles and nolargefiles. But if a file system has largefile support and contains a large file, converting it to nolargefiles will result in a mount failure.

fsadm allows you to set the largefiles option. An example of enabling largefile support using fsadm on HP-UX:

root@Server1:/hroot# fsadm -F vxfs -o largefiles /base/files
root@Server1:/hroot# umount /base/files
root@Server1:/hroot# mount /base/files
root@Server1:/hroot# mount | grep /base/files
/base/files on /dev/vg_base/files ioerror=mwdisable,largefiles,delaylog,dev=402f0009 on Thu Jul 15 16:32:19 2010


Saturday, September 18, 2010

sed - an introduction

A situation arises in which the admin is supposed to make changes to a particular file on a large number of servers, say about 500. Opening and editing the file manually takes ages, and it doesn't make any sense, particularly when there are tools available for such things.

One such powerful tool is 'sed'. sed stands for Stream Editor, and it ships with almost all Unix flavours. It requires very minimal resources to run. But it is rarely used as a common editor because it has a very difficult interface.

sed reads its input from standard input, one line at a time.
sed applies its editing commands to the input stream.
sed sends the output to standard output, from where it can be redirected.

Let's see an example.

Server1:/home/Aaron# cat file_1
line 1 This is a test file
line 2 We will use this file to test sed
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10

Server1:/home/Aaron# sed -e 's/line/LINE/g' file_1
LINE 1 This is a test file
LINE 2 We will use this file to test sed
LINE 3
LINE 4
LINE 5
LINE 6
LINE 7
LINE 8
LINE 9
LINE 10
Server1:/home/Aaron#

This is what has happened:
sed read the standard input into the pattern space, performed a sequence of editing commands (here, substitution of 'line' with 'LINE') on the pattern space, then wrote the pattern space to STDOUT.
Note: the original file is unharmed.
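
Since sed leaves the input untouched, persisting a change means redirecting the output and replacing the file. Scaled up to the many-servers scenario from the introduction, a hedged sketch (the host names and file path are illustrative):

Server1:/home/Aaron# for h in server1 server2 server3
> do
> ssh $h "sed 's/line/LINE/g' /var/tmp/file_1 > /var/tmp/file_1.new && mv /var/tmp/file_1.new /var/tmp/file_1"
> done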

There are a lot of commands associated with sed; it is one of the most powerful utilities. We will look at the most frequently used commands in a later post.

Friday, September 17, 2010

SAN Storage

SAN (storage area network) storage is a type of computer data storage system designed specifically for use with large networks. It is very expensive, but very reliable, scalable, and flexible. It is network-based: the storage box is connected to the server through switches. SAN supports RAID technologies, which provide various ways to optimize data. It provides high data availability, faster access, protection from disk failures, and faster recovery. These advantages come at a high cost.

SANs are most commonly implemented using a technology called Fibre Channel, a high-performance data communications technology that supports very fast data rates (over 2 Gbps).

A SAN presents shared pools of storage devices to multiple servers. Each server can access the storage as if it were directly attached to that server. SANs make it possible to move data between various storage devices, share data between multiple servers, and backup and restore data rapidly and efficiently.

[Figure: a simple illustration of how a SAN and a server interact (source: www.vmware.com); image not reproduced here]


How do they communicate?
The host sends an embedded access request to the SAN; the HBA and switches act as the medium through which the request is sent.
The request reaches the storage processors, which form the front interface of the SAN and communicate with the disk arrays, eventually reaching the LUNs.

Storage devices (disk arrays) use RAID to group the disks and provide various functionality. The smallest unit of storage presented is the LUN.

When provisioning storage, the administrator uses management software to create LUNs. They can create, for example, more than one LUN from one physical drive, which would then appear as two or more discrete drives to the user. Or they may create a number of LUNs that span several separate disks that form a RAID array; but, again, these will appear as discrete drives to users. 

A given host might be able to access a LUN on a storage array through more than one path. Having more than one path from a host to a LUN is called multipathing.

LUNs can be shared between several servers. While implementing failovers, LUNs can be moved from one host to another. 

Zoning:

This is a way of providing access control within a SAN. In a physical SAN, LUNs may be shared across many hosts. Zoning makes it possible to logically group hosts and storage in a SAN; it provides a form of authorization, since only the authorized hosts can see the associated devices. Zoning lets you isolate a single server to a group of storage devices or a single storage device, or associate a grouping of multiple servers with one or more storage devices, as might be needed in a server cluster deployment.

LUN Masking:

This is used to make a LUN visible to some hosts and invisible to others, protecting LUNs from servers that might harm them.

Storage management is itself a very big area in IT infrastructure management. But for a system admin, storage is an indispensable area that requires a really good understanding of how it works with servers. Hope the above description of SAN helps a little.