
Veritas Cluster Filesystem Quick Setup Guide

Saturday, August 28, 2010

Once every 6 months or so I'm asked to install the Veritas Cluster Filesystem on a group of machines, and every time it's a complete nightmare. This is due in part to the fact that I don't do it often, so I forget all the little intricacies, but it's also due to the arcane and unintelligible commands and the error messages they spit out. Maybe I'm just an idiot, but this software seems to be needlessly complex and obfuscated. Worse, now that Veritas is part of Symantec it's tough to get through to the right support resources, although once you do get past the 5 layers of Symantec BS to a real Veritas person, the Veritas people are quite good. (Symantec: Making the world un-usable one acquisition at a time.)


I just got through doing a vxfs/cfs install on a cluster of 4 machines and finally took the time to write down each step, so hopefully it won't be so painful for me next time. Maybe you'll find this useful also. These instructions assume you have 3 or more machines all connected to a SAN. Before you start, pick one machine to run the installer on, create an ssh keypair with no passphrase, and install the public key in the authorized_keys file on each of the other machines in the cluster. You should be able to do "ssh machineB ls /" and get the directory listing without any password prompts. Test this for each machine before you start.
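For reference, here is one way to set that up from the machine you picked, assuming the other nodes are named machineB, machineC, and machineD (substitute your real hostnames). You'll be prompted for each machine's password once while the key is copied over:

  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa       # keypair with no passphrase
  for h in machineB machineC machineD; do
    cat ~/.ssh/id_rsa.pub | ssh $h 'mkdir -p ~/.ssh; chmod 700 ~/.ssh; cat >> ~/.ssh/authorized_keys; chmod 600 ~/.ssh/authorized_keys'
    ssh $h ls /                                  # should now list / with no password prompt
  done
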
The Veritas commands are all installed in /opt/VRTS/bin. Add that directory to your path, or just cd there and prefix all the commands below with './'.
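For example, in a bash shell (adjust for your shell of choice):

  export PATH=$PATH:/opt/VRTS/bin
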

VXFS/CFS Quick Setup Steps:

  1. On your SAN, create 3 luns for coordinator disks. They should be small; 120MB works. I'm not sure what the actual lower limit is, but I know that 15MB does not work.
  2. Create any data luns you want, and set any ACLs or whatever else is needed so that the hosts can see these luns. It seems like the luns have to be 1TB or less each; I tried with 4TB and 2TB luns and neither worked right, even with GPT partition tables. It looks like Veritas uses a Sun partition table format and it breaks beyond 1TB. For my 4TB system I ended up making four 1TB luns and then putting them all into the same disk group, which let me make the 4TB vxfs filesystem.
  3. Reboot the systems ('reboot -- -r' on Solaris to rebuild the device tree), and confirm all the disks are visible on all machines after the reboot (use 'fdisk -l' on Linux or 'format' on Solaris to list the disks).
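    For example, a quick sanity check from one node that every machine sees the new luns (hypothetical hostnames, Linux shown; on Solaris 'format </dev/null' prints a similar disk list):
      for h in machineA machineB machineC machineD; do
        echo "== $h =="
        ssh $h "fdisk -l 2>/dev/null | grep '^Disk /dev/'"
      done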
  4. Note: On Linux it appears that you have to run fdisk on each disk first to write out a blank partition table. If you don't, 'vxdisk list' shows all the disks in "error" state. After writing partition tables to the disks you can run 'vxdisk scandisks' to tell Veritas to re-read all the disk partition tables.
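    For example, something along these lines writes an empty DOS label to each new disk (hypothetical device names; triple-check them first, since this clobbers whatever partition table is already on the disk):
      for d in /dev/sdb /dev/sdc /dev/sdd; do
        printf 'o\nw\n' | fdisk $d    # 'o' = create empty partition table, 'w' = write it
      done
      vxdisk scandisks                # tell veritas to re-read the partition tables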
  5. Download the right installer tarball from the Symantec web site, put it into /var/tmp or somewhere appropriate, and unpack the tarball.
  6. Make sure you have license keys for the features you need to install.
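    If you need to check a box later, 'vxlicrep' (also in /opt/VRTS/bin) reports the license keys already installed, and 'vxlicinst' will prompt you to add one:
      vxlicrep    # list installed license keys and the features they enable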
  7. 'cd' into the unpacked installation tree and install using the "installer" script. Pick the stuff you want to install, maybe just "Storage Foundation Cluster File System", and walk through the steps; you can probably take the defaults in most cases. Do not reboot yet.
  8. Disable fencing:
    1. On all hosts in the cluster do:
      echo "vxfencoorddg" > /etc/vxfendg
      cp /etc/vxfen.d/vxfenmode_disabled /etc/vxfenmode
  9. Reboot all hosts simultaneously, then wait for them all to come back up and for the cluster to start; this takes a couple of minutes:
    /opt/VRTS/bin/hastatus -sum   # should show all systems online, none faulted
  10. Set up the fencing coordinator disk group
    1. vxdisk list  # pick out the names of the disks you want to use for coordinator disks
    2. vxdiskadm
      1. pick option (1) to init disks
      2. give the list of disks (leave off the slice# from the disk spec)
      3. use ‘vxfencoorddg’ as the disk group name when it asks
      4. all other options you can take the default response
      5. it should create the disk group and add the disks to it. If you get errors, good luck on google….
      6. This might work instead of the vxdiskadm stuff to do the same thing:
        vxdisk init c3t5d1  #for example, leave off the slice# part
        vxdisk init c3t5d2
        vxdisk init c3t5d3
        vxdg init vxfencoorddg c3t5d1 c3t5d2 c3t5d3
    3. vxdg list  # should show your vxfencoorddg disk group
    4. vxdisk list   # should show all your fence disks online
    5. Set the fence disk group to not import at boot:
      vxdg deport vxfencoorddg
      vxdg -t import vxfencoorddg
      vxdg deport vxfencoorddg
    6. Switch the fencing driver to use scsi3 fencing:
      1. On all nodes:
        cp /etc/vxfen.d/vxfenmode_scsi3_raw /etc/vxfenmode
  11. Reboot all hosts simultaneously, then wait for them all to come back up and for the cluster to start:
    /opt/VRTS/bin/hastatus -sum   # should show all systems online, none faulted
  12. Determine which is the master node and ssh to that node:
    vxdctl -c mode
  13. Set up data volumes/dgs/filesystems
    1. Figure out which disks to use
      vxdisk list
    2. Initialize the disks and create the default disk group (use the same name you told the installer to use)
      vxdiskadm  # pick (1), list disks and take defaults except maybe disk group name
    3. Create a volume on that new disk group:
      vxassist maxsize   # get the total avail size
      vxassist make vxvol1 1234567M    # value from maxsize or smaller
    4. Create a vxfs filesystem on the new volume:
      mkfs -t vxfs /dev/vx/dsk/vxdg1/vxvol1  #  Solaris:  mkfs -F vxfs /dev/vx/rdsk/vxdg1/vxvol1
    5. Deport the disk group then import it as shared and add shared-write flag:
      vxdg deport vxdg1
      vxdg -s import vxdg1  # makes it shared
      cfsdgadm add vxdg1 all=sw   # grant shared-write to all hosts
      vxvol start vxvol1   # starts the volume
    6. On each machine create the directory you want the vxfs filesystem to mount on:
      mkdir /vxspace
      ssh machineB mkdir /vxspace
      etc…
    7. Add the cluster mountpoint to the cluster config:
      cfsmntadm add vxdg1 vxvol1 /vxspace all=cluster
    8. Tell cfs to mount the filesystem on all the cluster machines
      cfsmount /vxspace
  14. Confirm that the cluster filesystem mounted everywhere, set perms as desired, etc.
That’s it, simple right?  Trust me, it never goes smoothly, good luck.

Troubleshooting

  1. Find out the status of your vxfs mount:
    hares -state cfsmount1
  2. Check out the state of all the drive paths, especially useful in a multipath setup:
    /sbin/vxdmpadm list dmpnode all
  3. Check filesystem superblock for the force-fsck flag if a filesystem refuses to mount:
    echo "8192B.p S" | ./fsdb -F vxfs /dev/vx/rdsk/sundg1/sunvol1
    #   if "flags" shows "1" then it needs a full fsck, so do:
    ./cfsumount /vxspace
    fsck -y -F vxfs -o full /dev/vx/rdsk/sundg1/sunvol1
    ./cfsmount /vxspace
    # BEWARE: THIS DOES TAKE THE ENTIRE CLUSTER OFFLINE!