Tuesday, September 7, 2010

VxVM - Recovering Volumes after a Disk Failure !!!


Situation:- Host has lost connection with the SAN box which resulted in disk not available to the dg which inturns affects the mounts.

To Recover and start a Veritas Volume Manager logical volume where the volume is DISABLED ACTIVE and state of Plex is DISABLED NODEVICE. When a system encounters a problem with a volume or a plex, or if Veritas Volume Manager (VxVM) has any reason to believe that the data is not synchronized, VxVM changes the kernel state, KSTATE and state, STATE, of the volume and its plexes accordingly. 

The plex state can be stale, empty, nodevice, etc. A particular plex state does not necessarily mean that the data is good or bad. 
The plex state is representative of VxVM's perception of the data in a plex. 

vxprint displays information from records in VxVM disk group configurations, including the KSTATE and STATE of a volume and plex. When viewing the configuration records of a VxVM disk group using the vxprint utility and the KSTATE and STATE fields display DISABLED ACTIVE for the volume and DISABLED RECOVER for the plex, recovery steps need to be followed to bring the volume back to an ENABLED ACTIVE state so it can be mounted and make the file system accessible again.

Below are the steps to follow:-
1. Check the dg; If the status is disabled, deport and import the dg

(server1:/)# vxdg list
NAME         STATE           ID
MyDG-app    enabled         1232625005.170.server1
2. Check the dg details using vxprint utility check the volumes
(server1:/)# vxprint -g MyDG-app -v

TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
v  usrapp       fsgen        DISABLED 2097152  -        ACTIVE   -       -
v  usrappPS4    fsgen        DISABLED 25165824 -        ACTIVE   -       -
v  usrappSMD    fsgen        DISABLED 3145728  -        ACTIVE   -       -
v  usrappput    fsgen        DISABLED 10485760 -        ACTIVE   -       -
v  MyDG-swap   fsgen        ENABLED  62914560 -        ACTIVE   -       -
3. Try starting the volumes
(server1:/)# vxvol -g MyDG-app startall
VxVM vxvol ERROR V-5-1-1201 Volume usrapp has no associated data plexes
VxVM vxvol ERROR V-5-1-1201 Volume usrappPS4 has no associated data plexes
VxVM vxvol ERROR V-5-1-1201 Volume usrappSMD has no associated data plexes
VxVM vxvol ERROR V-5-1-1201 Volume usrappput has no associated data plexes
4. Check the vg details using vxprint -htg utility
(server1:/)#vxprint -htg MyDG-app

DG NAME         NCONFIG      NLOG     MINORS   GROUP-ID
ST NAME         STATE        DM_CNT   SPARE_CNT         APPVOL_CNT
DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE
RV NAME         RLINK_CNT    KSTATE   STATE    PRIMARY  DATAVOLS  SRL
RL NAME         RVG          KSTATE   STATE    REM_HOST REM_DG    REM_RLNK
CO NAME         CACHEVOL     KSTATE   STATE
VT NAME         NVOLUME      KSTATE   STATE
V  NAME         RVG/VSET/CO  KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
SC NAME         PLEX         CACHE    DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
DC NAME         PARENTVOL    LOGVOL
SP NAME         SNAPVOL      DCO

dg MyDG-app    default      default  76000    1226671526.111.serverold

dm EMC0_0       -            -        -        -        NODEVICE
dm EMC0_1       -            -        -        -        NODEVICE
dm EMC0_2       -            -        -        -        NODEVICE
dm EMC0_4       -            -        -        -        NODEVICE
dm EMC0_5       -            -        -        -        NODEVICE
dm EMC0_16      -            -        -        -        NODEVICE
dm EMC0_17      -            -        -        -        NODEVICE
dm EMC0_18      -            -        -        -        NODEVICE
dm EMC0_19      -            -        -        -        NODEVICE
dm EMC1_0       EMC1_0       auto     3583     70707840 -

v  appPS4       -            DISABLED ACTIVE   12582912 SELECT    -        fsgen
pl appPS4-01    appPS4       DISABLED NODEVICE 12582912 CONCAT    -        RW
sd EMC0_4-01    appPS4-01    EMC0_4   0        12582912 0         -        NDEV

v  appPS4archreorg -         DISABLED ACTIVE   10485760 SELECT    -        fsgen
pl appPS4archreorg-01 appPS4archreorg DISABLED NODEVICE 10485760 CONCAT -  RW
sd EMC0_16-01   appPS4archreorg-01 EMC0_16 0   10485760 0         -        NDEV

v  appPS4mlogA  -            DISABLED ACTIVE   1048576  SELECT    -        fsgen
pl appPS4mlogA-01 appPS4mlogA DISABLED NODEVICE 1048576 CONCAT    -        RW
sd EMC0_18-01   appPS4mlogA-01 EMC0_18 0       1048576  0         -        NDEV

v  appPS4mlogB  -            DISABLED ACTIVE   1048576  SELECT    -        fsgen
pl appPS4mlogB-01 appPS4mlogB DISABLED NODEVICE 1048576 CONCAT    -        RW
sd EMC0_19-01   appPS4mlogB-01 EMC0_19 0       1048576  0         -        NDEV

v  appPS4ologA  -            DISABLED ACTIVE   1048576  SELECT    -        fsgen
pl appPS4ologA-01 appPS4ologA DISABLED NODEVICE 1048576 CONCAT    -        RW
sd EMC0_1-02    appPS4ologA-01 EMC0_1 1048576  1048576  0         -        NDEV

v  appPS4ologB  -            DISABLED ACTIVE   1048576  SELECT    -        fsgen
pl appPS4ologB-01 appPS4ologB DISABLED NODEVICE 1048576 CONCAT    -        RW
sd EMC0_18-02   appPS4ologB-01 EMC0_18 1048576 1048576  0         -        NDEV

v  appPS4appdata1 -          DISABLED ACTIVE   241172480 SELECT   -        fsgen
pl appPS4appdata1-01 appPS4appdata1 DISABLED NODEVICE 241172480 CONCAT -   RW
sd EMC0_0-02    appPS4appdata1-01 EMC0_0 4194304 66513536 0       -        NDEV
sd EMC0_1-03    appPS4appdata1-01 EMC0_1 2097152 68610688 66513536 -       NDEV
sd EMC0_16-02   appPS4appdata1-01 EMC0_16 10485760 37437568 135124224 -    NDEV
sd EMC0_18-03   appPS4appdata1-01 EMC0_18 2097152 68610688 172561792 -     NDEV

v  appPS410264  -            DISABLED ACTIVE   16777216 SELECT    -        fsgen
pl appPS410264-01 appPS410264 DISABLED NODEVICE 16777216 CONCAT   -        RW
sd EMC0_5-01    appPS410264-01 EMC0_5 0        16777216 0         -        NDEV

v  appcle       -            DISABLED ACTIVE   4194304  SELECT    -        fsgen
pl appcle-01    appcle       DISABLED NODEVICE 4194304  CONCAT    -        RW
sd EMC0_0-01    appcle-01    EMC0_0   0        4194304  0         -        NDEV

v  appclient    -            DISABLED ACTIVE   1048576  SELECT    -        fsgen
pl appclient-01 appclient    DISABLED NODEVICE 1048576  CONCAT    -        RW
sd EMC0_1-01    appclient-01 EMC0_1   0        1048576  0         -        NDEV

v  appstage102  -            DISABLED ACTIVE   20971520 SELECT    -        fsgen
pl appstage102-01 appstage102 DISABLED NODEVICE 20971520 CONCAT   -        RW
sd EMC0_2-01    appstage102-01 EMC0_2 0        20971520 0         -        NDEV

v  apptemp      -            DISABLED ACTIVE   52428800 SELECT    -        fsgen
pl apptemp-01   apptemp      DISABLED NODEVICE 52428800 CONCAT    -        RW
sd EMC0_17-01   apptemp-01   EMC0_17  0        52428800 0         -        NDEV

v  appmntPS4    -            DISABLED ACTIVE   10485760 SELECT    -        fsgen
pl appmntPS4-01 appmntPS4    DISABLED NODEVICE 10485760 CONCAT    -        RW
sd EMC0_5-02    appmntPS4-01 EMC0_5   16777216 10485760 0         -        NDEV

v  apptemp      -            DISABLED ACTIVE   83886080 SELECT    -        fsgen
pl apptemp-01   apptemp      DISABLED NODEVICE 83886080 CONCAT    -        RW
sd EMC0_4-02    apptemp-01   EMC0_4   12582912 58124928 0         -        NDEV
sd EMC0_19-02   apptemp-01   EMC0_19  1048576  25761152 58124928  -        NDEV

v  usrapp       -            DISABLED ACTIVE   2097152  SELECT    -        fsgen
pl usrapp-01    usrapp       DISABLED NODEVICE 2097152  CONCAT    -        RW
sd EMC0_2-02    usrapp-01    EMC0_2   20971520 2097152  0         -        NDEV

v  usrappPS4    -            DISABLED ACTIVE   25165824 SELECT    -        fsgen
pl usrappPS4-01 usrappPS4    DISABLED NODEVICE 25165824 CONCAT    -        RW
sd EMC0_2-03    usrappPS4-01 EMC0_2   23068672 25165824 0         -        NDEV

v  usrappSMD    -            DISABLED ACTIVE   3145728  SELECT    -        fsgen
pl usrappSMD-01 usrappSMD    DISABLED NODEVICE 3145728  CONCAT    -        RW
sd EMC0_19-03   usrappSMD-01 EMC0_19  26809728 3145728  0         -        NDEV

v  usrappput    -            DISABLED ACTIVE   10485760 SELECT    -        fsgen
pl usrappput-01 usrappput    DISABLED NODEVICE 10485760 CONCAT    -        RW
sd EMC0_5-04    usrappput-01 EMC0_5   28311552 10485760 0         -        NDEV

v  MyDG-swap   -            ENABLED  ACTIVE   62914560 SELECT    -        fsgen
pl MyDG-swap-01 MyDG-swap  ENABLED  ACTIVE   62914560 CONCAT    -        RW
sd EMC1_0-01    MyDG-swap-01 EMC1_0  0        62914560 0         EMC1_0   ENA

5. The above command shows some plexes in NODEVICE state, so some disks might 
have failed. Check the disk status using vxdisk command

(server1:/)# vxdisk list | grep MyDG-app
EMC1_0       auto:sliced     EMC1_0       MyDG-app    online
-            -         EMC0_0       MyDG-app    failed was:EMC0_0
-            -         EMC0_1       MyDG-app    failed was:EMC0_1
-            -         EMC0_2       MyDG-app    failed was:EMC0_2
-            -         EMC0_4       MyDG-app    failed was:EMC0_4
-            -         EMC0_5       MyDG-app    failed was:EMC0_5
-            -         EMC0_16      MyDG-app    failed was:EMC0_16
-            -         EMC0_17      MyDG-app    failed was:EMC0_17
-            -         EMC0_18      MyDG-app    failed was:EMC0_18
-            -         EMC0_19      MyDG-app    failed was:EMC0_19
Also vxprint command shows the status
(server1:/)# vxprint -htg MyDG-app -d
DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE

dm EMC0_0       -            -        -        -        NODEVICE
dm EMC0_1       -            -        -        -        NODEVICE
dm EMC0_2       -            -        -        -        NODEVICE
dm EMC0_4       -            -        -        -        NODEVICE
dm EMC0_5       -            -        -        -        NODEVICE
dm EMC0_16      -            -        -        -        NODEVICE
dm EMC0_17      -            -        -        -        NODEVICE
dm EMC0_18      -            -        -        -        NODEVICE
dm EMC0_19      -            -        -        -        NODEVICE
dm EMC1_0       EMC1_0       auto     3583     70707840 -
6. The above command shows disks in failed state. This can be reattached using 
vxreattach command

(server1:/)# /etc/vx/bin/vxreattach -c EMC0_0 
MyDG-app EMC0_0
//-c option shows the status of which dg this disk is associated with
(server1:/)# /etc/vx/bin/vxreattach EMC0_1        //attach all disks similarly
7. Now check the status
(server1:/)# vxdisk list | grep MyDG-app
EMC0_0       auto:sliced     EMC0_0       MyDG-app    online
EMC0_1       auto:sliced     EMC0_1       MyDG-app    online
EMC0_2       auto:sliced     EMC0_2       MyDG-app    online
EMC0_4       auto:sliced     EMC0_4       MyDG-app    online
EMC0_5       auto:sliced     EMC0_5       MyDG-app    online
EMC0_16      auto:sliced     EMC0_16      MyDG-app    online
EMC0_17      auto:sliced     EMC0_17      MyDG-app    online
EMC0_18      auto:sliced     EMC0_18      MyDG-app    online
EMC0_19      auto:sliced     EMC0_19      MyDG-app    online
EMC1_0       auto:sliced     EMC1_0       MyDG-app    online

//vxdiskconfig - This command rescan all vxdisk and attach
8. Check the status again which shows plexes in RECOVER state
(server1:/)# vxprint -htg MyDG-app -v | grep pl
pl appPS4-01    appPS4       DISABLED RECOVER  12582912 CONCAT    -        RW
pl appPS4archreorg-01 appPS4archreorg DISABLED RECOVER 10485760 CONCAT -   RW
pl appPS4mlogA-01 appPS4mlogA DISABLED RECOVER 1048576  CONCAT    -        RW
pl appPS4mlogB-01 appPS4mlogB DISABLED RECOVER 1048576  CONCAT    -        RW
pl appPS4ologA-01 appPS4ologA DISABLED RECOVER 1048576  CONCAT    -        RW
pl appPS4ologB-01 appPS4ologB DISABLED RECOVER 1048576  CONCAT    -        RW
pl appPS4appdata1-01 appPS4appdata1 DISABLED RECOVER 241172480 CONCAT -    RW
pl appPS410264-01 appPS410264 DISABLED RECOVER 16777216 CONCAT    -        RW
pl appcle-01    appcle       DISABLED RECOVER  4194304  CONCAT    -        RW
pl appclient-01 appclient    DISABLED RECOVER  1048576  CONCAT    -        RW
pl appstage102-01 appstage102 DISABLED RECOVER 20971520 CONCAT    -        RW
pl apptemp-01   apptemp      DISABLED RECOVER  52428800 CONCAT    -        RW
pl appmntPS4-01 appmntPS4    DISABLED RECOVER  10485760 CONCAT    -        RW
pl apptemp-01   apptemp      DISABLED RECOVER  83886080 CONCAT    -        RW
pl usrapp-01    usrapp       DISABLED RECOVER  2097152  CONCAT    -        RW
pl usrappPS4-01 usrappPS4    DISABLED RECOVER  25165824 CONCAT    -        RW
pl usrappSMD-01 usrappSMD    DISABLED RECOVER  3145728  CONCAT    -        RW
pl usrappput-01 usrappput    DISABLED CLEAN    10485760 CONCAT    -        RW
pl MyDG-swap-01 MyDG-swap  ENABLED  ACTIVE   62914560 CONCAT    -        RW

9. Now recover the volume by fixing the plex first to STALE and next to CLEAN

->Get the second column and get it in a file
(server1:/)# vxprint -htg MyDG-app -v | grep pl | awk '{print $2}' > /var/tmp/vxmendlist

->Loop each plex and fix the plex state
(server1:/)# for i in `cat /var/tmp/vxmendlist`
> do
> vxmend -g MyDG-app fix stale $i
> vxmend -g MyDG-app fix clean $i
> done

10. Now the state would be DISABLE CLEAN. Use vxvol to activate the volumes in dg

(server1:/)# vxvol -g MyDG-app startall


==>> We have successfully recovered the volumes :) 




Some doc's on VxVM:
http://sfdoccentral.symantec.com/sf/5.0/solaris/pdf/vxvm_admin.pdf

1 comment:

  1. Thanks for sharing this important information. You may also refer http://www.s4techno.com/blog/2016/06/17/extend-vxvm-filesystem/

    ReplyDelete