Migrating an existing CentOS7 installation from ext4 to ZFS [ROOT on ZFS]

As a long-term ZFS user I have been running it on most of my production and home servers. But what about client machines? Bringing all its nice features (snapshots, clones, checksums etc.) to my laptop or desktop machine sounds like a great idea. Since ZFS is not included in the mainline Linux kernel, some Linux distributions have decided to integrate ZFS (ZoL) into their repositories; examples are Ubuntu, Arch and Gentoo Linux. CentOS also supports ZFS via the ZFS on Linux repository. This guide is about the last one (CentOS), since that’s what I mainly use at work. I decided to share my experience by writing this guide, hoping it will be interesting for you as well.

What you will need:

– Ubuntu 16.04 LTS Desktop Live CD.
– An existing CentOS7 installation (ext4, xfs or whatever).
– A spare hard drive to be used for the ZFS installation.

Initial preparation:

  1. Connect the spare hard drive (where ZFS is to be configured) to the machine. This can be done either internally using a SATA cable or externally via a USB case, for example. The spare hard drive’s capacity must be equal to or larger than that of the source drive.
  2. Use Ubuntu LiveCD to boot the machine and select “Try Ubuntu” option.
  3. Once in the Ubuntu live environment, open the Terminal.
  4. First thing we need to do is to download ZFS packages for Ubuntu:

    root@ubuntu:/# apt-add-repository universe
    ‘universe’ distribution component enabled for all sources.

    root@ubuntu:/# apt update && apt -y install zfs-initramfs

    Ign:1 cdrom://Ubuntu 16.04.3 LTS _Xenial Xerus_ - Release amd64 (20170801) xenial InRelease
    Hit:2 cdrom://Ubuntu 16.04.3 LTS _Xenial Xerus_ - Release amd64 (20170801) xenial Release
    Get:3 http://security.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
    Hit:5 http://archive.ubuntu.com/ubuntu xenial InRelease
    Get:6 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [396 kB]
    ...
    The following additional packages will be installed:
      libnvpair1linux libuutil1linux libzfs2linux libzpool2linux zfs-doc zfs-zed zfsutils-linux
    Suggested packages:
      default-mta | mail-transport-agent nfs-kernel-server
    The following NEW packages will be installed:
      libnvpair1linux libuutil1linux libzfs2linux libzpool2linux zfs-doc zfs-initramfs zfs-zed zfsutils-linux
    0 upgraded, 8 newly installed, 0 to remove and 284 not upgraded.
    Need to get 901 kB of archives.
    ...
    Setting up zfs-initramfs (0.6.5.6-0ubuntu18) ...
    Processing triggers for libc-bin (2.23-0ubuntu9) ...
    Processing triggers for systemd (229-4ubuntu19) ...
    Processing triggers for ureadahead (0.100.0-19) ...
    Processing triggers for initramfs-tools (0.122ubuntu8.8) ...
    update-initramfs is disabled since running on read-only media

  5. Now you should have all you need to create a ZFS pool, so let’s proceed with disk partitioning. Normally ZFS uses whole drives to store data, but in this case we need to reserve a small partition [~1MB] for GRUB to install the boot loader. So we’ll have to create 2 partitions: one for GRUB and a second for ZFS. Assuming that the destination hard disk is sdb, we use the following commands to create the partitions on it:

     5a. First clear any previous partitions on the destination disk.

    root@ubuntu:/#sgdisk --zap-all /dev/sdb

    5b. Then create a boot partition to be used by GRUB. Use this for legacy (BIOS) booting.

    root@ubuntu:/#sgdisk -a1 -n2:34:2047 -t2:EF02 /dev/sdb

    5c. Now create the 2nd partition to be used for ZFS data.

    root@ubuntu:/#sgdisk -n1:0:0 -t1:BF01 /dev/sdb

     5d. List partitions.

    root@ubuntu:/#gdisk -l /dev/sdb

    Number  Start (sector)  End (sector)  Size        Code  Name
    1       2048            488397134     232.9 GiB   BF01  -> ZFS Data
    2       34              2047          1007.0 KiB  EF02  -> GRUB boot

  6. Now that we have our hard drive partitioned, we need to create a ZFS pool on it. One very important thing to note here is that ZFS does not play well with the "/dev/sdb, /dev/sdc etc." naming for hard disks. What you will have to do is use the "/dev/disk/by-id/xxxxxx" naming scheme. Be very careful at this point to select the correct hard drive and partition (you can use "gdisk -l /dev/disk/by-id/<drive_name>" to verify the partitions you created previously). Since I’m doing these tests in a virtual machine, you will notice that the hard disk shows up as "ata-VBOX…". Make sure that you replace that with your own hard drive.

     ** Create the ZFS pool on the disk. Be sure to add "-part1" at the end, otherwise the zpool command will overwrite the boot partition. **

    root@ubuntu:/#zpool create -f -O atime=off -O canmount=off -O compression=lz4 -O mountpoint=/ -R /mnt rpool /dev/disk/by-id/ata-VBOX_HARDDISK_VB7d5f9023-2bfafc91-part1

    ** List pool **
     root@ubuntu:/#zpool list

     NAME    SIZE   ALLOC  FREE   EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
     rpool   79.5G  274K   79.5G  -         0%    0%   1.00x  ONLINE  /mnt

     Some important things to note about the above command and its results are the following:

    – The pool will be created with access time property disabled.
    – It should not mount itself.
    – The compression algorithm will be LZ4.
    – Default mountpoint will be (/). This is going to be used to properly mount CentOS ROOT fs later.
    – The alternate mount point will be (/mnt). This is the temporary mount point to be used during the live session (Ubuntu LiveCD) to mount the pool. It’s perfectly fine to select something else there, for example (/rpool). If you decide to do this, remember to replace (/mnt) with (/rpool) on the commands that will follow later in this guide.
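     If you want to double-check that these properties were applied as intended, you can query them directly (a quick check using the property names from the command above):

     root@ubuntu:/#zfs get atime,compression,mountpoint,canmount rpool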

  7. [Optional] Create the rest of the datasets as needed. This step is mainly used to separate the ROOT (/) filesystem from other system directories, like /var/log and /home for example. This ensures that the contents of those folders are preserved during rollbacks of the ROOT filesystem (e.g. after a failed system upgrade).

     ** Create ROOT datasets **

    root@ubuntu:/#zfs create -o canmount=off -o mountpoint=none rpool/ROOT
     root@ubuntu:/#zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/centos

     ** Mount ROOT filesystem **

    root@ubuntu:/#zfs mount rpool/ROOT/centos
    root@ubuntu:/#df -h /mnt

     Filesystem         Size  Used  Avail  Use%  Mounted on
     rpool/ROOT/centos  223G  9.2G  213G   5%    /mnt
  8. ** Create the rest of the datasets **

     It’s perfectly fine not to create some (or all) of these datasets, but they will make your life much easier when, in the future, you have to roll back the ROOT filesystem and need to preserve the contents of the (/home) or (/var/log) directories, for example. Another thing to note is that most of them are created with the "legacy" mountpoint property. This means that these datasets will not mount themselves automatically during boot but instead rely on the (/etc/fstab) file to mount them.

    root@ubuntu:/#zfs create -o mountpoint=legacy -o setuid=off rpool/home
    root@ubuntu:/#zfs create -o mountpoint=legacy -o setuid=off rpool/centos-test2
    root@ubuntu:/#zfs create -o mountpoint=legacy rpool/home/root
    root@ubuntu:/#zfs create -o canmount=off -o setuid=off -o exec=off rpool/var
    root@ubuntu:/#zfs create -o mountpoint=legacy -o com.sun:auto-snapshot=false rpool/var/cache
    root@ubuntu:/#zfs create -o mountpoint=legacy rpool/var/log
    root@ubuntu:/#zfs create -o mountpoint=legacy rpool/var/spool
     root@ubuntu:/#zfs create -o mountpoint=legacy -o com.sun:auto-snapshot=false -o exec=on rpool/var/tmp

     ** Create a ZVOL to be used for SWAP **

     This ZFS volume is going to be used as a SWAP partition. Make sure you adjust its size to your system’s needs.

    root@ubuntu:/#zfs create -V 4G -o compression=zle -o logbias=throughput -o sync=always -o primarycache=metadata -o secondarycache=none -o com.sun:auto-snapshot=false rpool/swap
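     Note: this ZVOL is added to (/etc/fstab) as swap later in this guide; depending on your setup you may also need to initialise it as swap space first. A minimal sketch, assuming the pool/ZVOL names used above:

     root@ubuntu:/#mkswap -f /dev/zvol/rpool/swap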

  9. Copy the content of the original CentOS7 installation [source drive] to the ZFS pool. In the example below, my CentOS7 installation is located in an LV (root) in a VG named "centos_test". The "/sda" directory is a temporary mount point I created to mount that LV.

     ** Create a mountpoint for mounting the original CentOS installation disk, in this example that is "/sda" **

    root@ubuntu:/#mkdir /sda
     root@ubuntu:/#mount /dev/centos_test/root /sda

     ** Mount the boot partition (sda1). This is where the CentOS7 kernel, initramfs and GRUB files are located. It is a separate ~500MB ext4 partition **

     root@ubuntu:/#mount /dev/sda1 /sda/boot

     ** rsync all content from the source disk to the destination pool. In this step we basically copy the whole CentOS7 installation from drive1 [LVM/ext4] to drive2 [ZFS] **

    root@ubuntu:/#rsync -avPX /sda/ /mnt/

    ** Unmount original (source) disk. From this point we don’t need our CentOS7 installation drive anymore, so it’s safe to unmount it **

    root@ubuntu:/#umount -R /sda

  10. Now we need to mount all the previously created datasets under (/mnt) and prepare the chroot environment (CentOS). At this point we leave the Ubuntu live environment and chroot into the CentOS7 ZFS environment, since there are still things to do, such as modifying the contents of the (/etc/fstab) file and configuring GRUB.

    root@ubuntu:/# mount -t zfs rpool/var/log /mnt/var/log
    root@ubuntu:/# mount -t zfs rpool/var/tmp /mnt/var/tmp
    root@ubuntu:/# mount -t zfs rpool/var/cache /mnt/var/cache
    root@ubuntu:/# mount -t zfs rpool/var/spool /mnt/var/spool
    root@ubuntu:/# mount -t zfs rpool/home /mnt/home
    root@ubuntu:/# mount -t zfs rpool/home/root /mnt/root

    root@ubuntu:/#mount -o bind /dev /mnt/dev
    root@ubuntu:/#mount -o bind /proc /mnt/proc
    root@ubuntu:/#mount -o bind /sys /mnt/sys
    root@ubuntu:/#chroot /mnt /bin/bash --login

  11. The first thing to do once inside CentOS7 (chrooted) is to check its version.

     root@centos-test:/# lsb_release -a

     Distributor ID: CentOS
     Description: CentOS Linux release 7.3.1611 (Core)
     Release: 7.3.1611
     Codename: Core
  12. Install the ZFS packages for CentOS as described here: https://github.com/zfsonlinux/zfs/wiki/RHEL-and-CentOS
     ** Take note of the CentOS7 version; you will need that info for downloading the
     ** proper ZFS release package as described in the URL above. In this case the installed CentOS version is 7.3, so I’m downloading the package for that version of the distro. **

    root@centos-test:/# lsb_release -a
    root@centos-test:/# yum install http://download.zfsonlinux.org/epel/zfs-release.el7_3.noarch.rpm
     root@centos-test:/# gpg --quiet --with-fingerprint /etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux

     ** Modify /etc/yum.repos.d/zfs.repo if needed. By default DKMS-type packages will be used; please modify it if you prefer kABI-type packages. **

     ** Install the CentOS ZFS packages **

     root@centos-test:/# yum install zfs

     ** Install the ZFS dracut package, to create a ZFS-aware initramfs later **

     root@centos-test:/# yum install zfs-dracut

     ** Modify the (/etc/fstab) file as follows (remove any existing entries, since they no longer apply to the ZFS CentOS7 setup). **

    rpool/var/cache /var/cache zfs defaults 0 0
    rpool/var/log /var/log zfs defaults 0 0
    rpool/var/spool /var/spool zfs defaults 0 0
    rpool/var/tmp /var/tmp zfs defaults 0 0
    rpool/home /home zfs defaults 0 0
    rpool/home/root /root zfs defaults 0 0
    /dev/zvol/rpool/swap none swap defaults 0 0

  13. Configuring GRUB and initramfs (dracut)

     ** Modify the dracut configuration as follows **

     root@centos-test:/# vi /etc/dracut.conf

     ** Uncomment and modify this line so that it reads: add_dracutmodules+="zfs" **

    ** Finally generate a new ZFS aware initramfs. Make sure you use your own kernel version in dracut parameters! **

     root@centos-test:/# dracut -f -M --kver 3.10.0-514.26.2.el7.x86_64

     Executing: /sbin/dracut -f -M --kver /boot/initramfs-3.10.0-514.26.2.el7.x86_64.img 3.10.0-514.26.2.el7.x86_64
    dracut module ‘busybox’ will not be installed, because command ‘busybox’ could not be found!
    dracut module ‘nbd’ will not be installed, because command ‘nbd-client’ could not be found!
    dracut module ‘biosdevname’ will not be installed, because command ‘biosdevname’ could not be found!
    *** Including module: bash ***
    *** Including module: fips ***
    *** Including module: modsign ***
    ….
    *** Hardlinking files ***
    *** Hardlinking files done ***
    *** Generating early-microcode cpio image contents ***
    *** Constructing AuthenticAMD.bin ****
    *** Constructing GenuineIntel.bin ****
    *** Store current command line parameters ***
    *** Creating image file ***
    *** Creating microcode section ***
    *** Created microcode section ***
    *** Creating image file done ***
    *** Creating initramfs image file ‘/boot/initramfs-3.10.0-514.26.2.el7.x86_64.img’
    done ***

     *** Important! If you get a message saying "unknown filesystem" when running the grub2-install command, you will have to compile the latest version of GRUB from its git repository and use that build of grub-install to install the boot loader ***

    root@centos-test:/# vi /etc/default/grub
    ** Modify GRUB_CMDLINE_LINUX as follows (leave only defaults):

    GRUB_CMDLINE_LINUX=”rhgb quiet”

    ** The following line is required to avoid an error when running grub2-mkconfig command later below **

    root@centos-test:/# export ZPOOL_VDEV_NAME_PATH=YES

    ** Generate a new GRUB config **

    root@centos-test:/# grub2-mkconfig -o /boot/grub2/grub.cfg

    Generating grub configuration file …
    Found linux image: /boot/vmlinuz-3.10.0-514.26.2.el7.x86_64
    Found initrd image: /boot/initramfs-3.10.0-514.26.2.el7.x86_64.img
    Found linux image: /boot/vmlinuz-0-rescue-71ce3f23d5324e69aba211b4405fbf4c
    Found initrd image: /boot/initramfs-0-rescue-71ce3f23d5324e69aba211b4405fbf4c.img
    Found linux image: /boot/vmlinuz-0-rescue-27a3968f98aa4670a8ce5e4c952d8f77
    Found initrd image: /boot/initramfs-0-rescue-27a3968f98aa4670a8ce5e4c952d8f77.img
    done

    ** Install bootloader on ZFS drive **

    root@centos-test:/# grub2-install /dev/sdb
    Installing for i386-pc platform.
    Installation finished. No error reported.
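     (Optional) Before rebooting, you can sanity-check that this GRUB build understands ZFS by asking grub2-probe which filesystem it detects for the root directory; it should report "zfs". A quick check from inside the chroot:

     root@centos-test:/# grub2-probe /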


  14. That’s all! If everything went well you can now exit the chroot environment, unmount the ZFS datasets, export the pool and finally shut down the machine and use the ZFS disk as the boot disk (the old hard drive can be unplugged).

    root@centos-test:/# exit
    root@ubuntu:/# umount -R /mnt
    root@ubuntu:/# zpool export rpool
    root@ubuntu:/# poweroff

  15. Boot from the ZFS disk and see if it works. Good luck!

Reinstalling GRUB on a non bootable UEFI Ubuntu 16.04 ZFS installation

You can use the steps below to reinstall GRUB on an Ubuntu 16.04 ROOT on ZFS installation.

Step 1: Prepare The Install Environment

1.1 Boot the Ubuntu Live CD, select Try Ubuntu Without Installing, and open a terminal (press Ctrl-Alt-T).

1.2 Optional: Install the OpenSSH server in the Live CD environment: If you have a second system, using SSH to access the target system can be convenient.

$ sudo apt-get --yes install openssh-server

Set a password on the “ubuntu” (Live CD user) account:

$ passwd

Hint: You can find your IP address with ip addr show scope global. Then, from your main machine, connect with ssh ubuntu@IP.

1.3 Become root:

# sudo -i

1.4 Install ZFS in the Live CD environment:

# apt-add-repository universe
# apt update

(ignore errors about moving an old database out of the way)

# apt install --yes debootstrap gdisk zfs-initramfs grub-efi-amd64

Step 2: Discover available ZFS pools

2.1 check if ZFS pools are already imported

# zpool list
# zfs list 

2.2 if so, we need to export the ZFS pool so that we can re-import it under a different directory and chroot into it

# zpool export rpool

Step 3: Chroot into ZFS pool

3.1 import the pool to a non-default location. The -N flag (don’t automatically mount) is necessary because otherwise the rpool root and the rpool/ROOT/ubuntu datasets would both try to mount on /mnt

# zpool import -a -N -R /mnt

3.2 mount the root system

# zfs mount rpool/ROOT/ubuntu

3.3 mount the remaining file systems

# zfs mount -a

3.4 Bind the virtual filesystems from the LiveCD environment to the new system and chroot into it:

# mount --rbind /dev  /mnt/dev
# mount --rbind /proc /mnt/proc
# mount --rbind /sys  /mnt/sys
# chroot /mnt /bin/bash --login

Note: This is using --rbind, not --bind.

Step 4: Re-initialise EFI partitions on all root pool components

4.1 Check the wildcard gets the correct root pool partitions:

# for i in /dev/disk/by-id/*ata*part3; do echo $i; done

4.2 Add an entry for /boot/efi for each disk to /etc/fstab for failover purposes in future:

# for i in /dev/disk/by-id/*ata*part3; \
      do mkdosfs -F 32 -n EFI ${i}; \
      echo PARTUUID=$(blkid -s PARTUUID -o value \
      ${i}) /boot/efi vfat defaults 0 1 >> /etc/fstab; done
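To double-check what the loop appended, you can simply inspect the end of the file:

# tail /etc/fstab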

4.3 mount the first disk

# mount /dev/disk/by-id/scsi-SATA_disk1-part3 /boot/efi

4.4 install grub

# grub-install --target=x86_64-efi --efi-directory=/boot/efi \
      --bootloader-id=ubuntu --recheck --no-floppy

4.5 unmount the first partition

# umount /boot/efi

4.6 mount the second disk

# mount /dev/disk/by-id/scsi-SATA_disk2-part3 /boot/efi

4.7 install grub

# grub-install --target=x86_64-efi --efi-directory=/boot/efi \
      --bootloader-id=ubuntu --recheck --no-floppy

4.8 repeat steps 4.5 to 4.7 for each additional disk

4.9 For added insurance, do an MBR installation to each disk too

# grub-install /dev/disk/by-id/scsi-SATA_disk1
# grub-install /dev/disk/by-id/scsi-SATA_disk2

Step 5: Reboot

5.1 Quit from the chroot

# exit

5.2 Reboot

# reboot

Using DRBD block level replication on Windows

WDRBD or Windows DRBD

DRBD is a well-known distributed replicated storage system for Linux. Recently a company has ported the DRBD kernel driver and userspace utilities to Windows, so it’s now possible to set up DRBD resources on a Windows machine. DRBD is a block-level storage replication system (similar to RAID-1) used in highly available storage setups. You can use both Desktop and Server Windows O/S, but it’s recommended to use a Server version if this is intended for production use.

What you will need:
– 2 x Windows Server machines (Win2012 R2 in my case)
– DRBD binaries from here
– A dedicated volume (disk) to be replicated by DRBD. You can also use an NTFS volume with existing data; for example, you can use this method to replicate an existing Windows file server to a second Windows server. However, in that case you will need to resize (shrink) the server’s partition in order to create the second, small partition needed for the DRBD metadata.
– Optionally a dedicated network for DRBD replication.

Configuration:

You must follow these steps on both nodes.

– Setup both Windows machines with static IP addresses. In my case I will use 10.10.10.1 for node1 and 10.10.10.2 for node2. Also provide a meaningful hostname on each server since you will need this during DRBD configuration. In my case node1: wdrbd1 and node2: wdrbd2 .
– Install DRBD binaries by double clicking on setup file and following the wizard. Finally reboot both servers.
– Navigate to the "Program Files\drbd\etc" and "Program Files\drbd\etc\drbd.d" folders and rename (or create copies of) the following files:

drbd.conf.sample -> drbd.conf
global_common.conf.sample -> global_common.conf

(Note: For this test we do not need to modify the content of the above files. However it may be needed to do so in different scenarios.)

– Create a resource config file in “Program Files\drbd\etc\drbd.d”

r0.res (you can copy the contents from the existing sample config file)

A simple resource config file should look like this:

resource r0 {
      on wdrbd1 {
            device      e minor 2;
            disk        e;
            meta-disk   f;
            address     10.10.10.1:7789;
      }

      on wdrbd2 {
            device      e minor 2;
            disk        e;
            meta-disk   f;
            address     10.10.10.2:7789;
      }
}

“minor 2” means volume index number. (c: volume is minor 0, d: volume is minor 1, and e: is minor 2).

– Partition the hard drive for DRBD use. In my case I have a dedicated 40GB disk to be used for DRBD replication. I will use Disk Management to partition/format the hard drive.
I will need 2 partitions: the 1st partition will be the data partition (device e above) and the 2nd partition will be the metadata partition (device f above). So let’s create partition 1 and format it as NTFS. The size of this partition (E:) in my case will be 39.68GB. The rest of the free space, 200MB, will be dedicated to the metadata partition (F:). To calculate the size of the metadata properly, please use the following link from the LINBIT DRBD documentation site.
The disk layout should look like this:
Please note that the data partition (E:) has a filesystem, NTFS, but the metadata partition (F:) does not, so it must be a RAW partition.
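If you prefer the command line over Disk Management, roughly the same layout could be created with diskpart. This is only a sketch: the disk number, partition size and drive letters are assumptions and must be adapted to your system (the metadata partition F: is intentionally left unformatted/RAW):

C:\Users\wdrbd>diskpart
DISKPART> select disk 1
DISKPART> create partition primary size=40630
DISKPART> format fs=ntfs quick label=DATA
DISKPART> assign letter=E
DISKPART> create partition primary
DISKPART> assign letter=F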

– Once finished with the above on both nodes, open a command prompt (as an Administrator)  and use the following commands to prepare DRBD:

  •  drbdadm create-md r0    (on each node)
Initial Sync
  • drbdadm up r0   (on node1)
  • drbdadm up r0   (on node2)
  • drbdadm status r0  (on node1)

You should see something like the following:

C:\Users\wdrbd>drbdadm status r0
  r0 role:Secondary
    disk:Inconsistent
    wdrbd2 role:Secondary
        peer-disk:Inconsistent

Initiate a full sync on node1:

  • drbdadm primary --force r0

After the sync is completed you should get the following:

C:\Users\wdrbd>drbdadm status r0
  r0 role:Primary
    disk:UpToDate
    wdrbd2 role:Secondary
          peer-disk:UpToDate

The disk state on both nodes should be UpToDate. As you can see, the Primary node in this case is node1. This means that node1 is the only node which can access the E: drive to read/write data. Remember that NTFS is not a clustered file system, meaning that it cannot be opened for read/write access concurrently on both nodes. The DRBD configuration in our scenario prevents dual-Primary mode in order to avoid corruption of the file system.

Switching the roles:

If you want to make node2 the Primary and node1 the Secondary, you can do so as follows (make sure there are no active read/write sessions on node1, since DRBD will have to force-close them):

  • On node1: drbdadm secondary r0
  • On node2: drbdadm primary r0

After issuing the above commands, you should be able to access the E: drive on node2 this time. Feel free to experiment and don’t forget to report any bugs to the project’s github web page!

DRBD9 | How to setup a basic three node cluster

This guide assumes that you already have DRBD9 kernel module, drbd-utils and drbdmanage installed on the servers.
In this setup, we will need 3 identical (as much as possible) servers with a dedicated network and storage backend for the DRBD replication.

HostA: drbd-host1
ip addr: 10.1.1.3
netmask: 255.255.255.0

HostB: drbd-host2
ip addr: 10.1.1.4
netmask: 255.255.255.0

HostC: drbd-host3
ip addr: 10.1.1.5
netmask: 255.255.255.0

* make sure you set up /etc/hosts on each node to reflect the IP addresses of the other nodes. This is a requirement for DRBD.

* all hosts must be able to connect to each other via SSH without a password. To do so, execute the following commands on each node:

ssh-keygen # Follow the wizard, make sure you don’t set a passphrase!
ssh-copy-id <node name> # where <node name> is the hostname of the opposite node, e.g if you are on drbd-host1 then the opposite hosts should be drbd-host2 and drbd-host3. Do the same on all 3 nodes.

* make sure you can connect to each node without being prompted for a password:

drbd-host1:~# ssh drbd-host2 && ssh drbd-host3

OK, now that you have access to each node without needing to enter a password, let’s configure DRBD.

First, we must select the underlying storage that we will use for DRBD. In this example each host has a /dev/sdb volume which we dedicate to DRBD, where /dev/sdb corresponds to a RAID10 disk array on each host.

Now let’s connect to the first host, drbd-host1, and create the needed LVM VG. We must name this VG ‘drbdpool’, since that is the name drbdmanage looks for when allocating space:

drbd-host1~# pvcreate /dev/sdb
drbd-host1~# vgcreate drbdpool /dev/sdb
drbd-host1~# lvcreate -L 4T --thinpool drbdthinpool drbdpool # We create a thin-provisioned LV inside the drbdpool VG. It is necessary to call it drbdthinpool, otherwise operations later will fail!

* repeat the steps above on the rest of the nodes (drbd-host2, drbd-host3).
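Optionally, verify on each node that the VG and the thin pool look as expected before continuing (standard LVM reporting commands):

drbd-host1~# vgs drbdpool
drbd-host1~# lvs drbdpool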

Now we have to use the drbdmanage utility to initialise the DRBD cluster. On drbd-host1, execute the following:

drbdmanage init 10.1.1.3 # where 10.1.1.3 is the IP address on drbd-host1 that is dedicated to DRBD replication (see the top of this guide).

If successful, proceed with adding the remaining 2 nodes to the cluster (again from drbd-host1!):

drbdmanage add-node drbd-host2 10.1.1.4 # note that here you need to specify the node’s hostname as well. You should be able to auto-complete each parameter (even the IP) by pressing the TAB key.

drbdmanage add-node drbd-host3 10.1.1.5

Now verify that everything is good:

drbdmanage list-nodes # all nodes should appear with OK status

Next, let’s create the first resource with a test volume within it:

drbdmanage add-resource res01

drbdmanage add-volume vol01 40G --deploy 3 # additionally, here we specify on how many nodes the volume should reside. In this case we set 3 nodes.

Verify that the above are created successfully:

ls /dev/drbd* # Here you should see /dev/drbd0 and /dev/drbd1, which belong to the control volumes that drbdmanage automatically creates during "drbdmanage init". Additionally, there should be /dev/drbd100, which corresponds to the vol01 volume we created above. You can handle this as a usual block device, e.g. partition it with fdisk, create a filesystem with mkfs and finally mount it and write data to it (see the sketch after the commands below). All writes will be automatically replicated to the rest of the nodes.

drbdmanage list-volumes

drbdmanage list-resources

drbd-overview # Shows the current status of the cluster. Please take a note of the node which is elected to be in Primary state. That is the only node which can mount the newly created drbd volume!

drbdadm status
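As mentioned above, the new DRBD device can be treated like any ordinary block device. A minimal sketch, assuming /dev/drbd100 is the volume, this node is currently Primary, and /mnt/drbdvol is just an arbitrary mount point:

drbd-host1~# mkfs.ext4 /dev/drbd100
drbd-host1~# mkdir /mnt/drbdvol
drbd-host1~# mount /dev/drbd100 /mnt/drbdvol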

Proxmox VE users: if you set up the DRBD9 cluster on PVE nodes, make sure you add the following entries to /etc/pve/storage.cfg. Note that you don’t need to create volumes manually as we did previously; the Proxmox storage plugin will create them automatically for each VM you create:

drbd: drbd9-stor1   # where drbd9-stor1 can be any arbitrary label to identify the storage
    content images
    redundancy 3    # this is the volume redundancy level, in this case 3
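After saving the file, you can check from any PVE node that the new storage shows up (pvesm is the standard Proxmox storage CLI):

pvesm status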

OK, so if everything went right you should now have a working 3-node DRBD9 cluster. Of course, you will need to spend some time familiarising yourself with the DRBD command-line utilities and DRBD in general. Have fun!

Proxmox cluster | Reverse proxy with noVNC and SPICE support

I have a 3-node Proxmox cluster in production and I was trying to find a way to centralize the web GUI management.

Currently the only way to access the Proxmox cluster web interface is by connecting to each cluster node individually, e.g. https://pve1:8006, https://pve2:8006 etc. from your web browser.

The disadvantage of this is that you either have to bookmark every single node in your web browser, or type the URL manually each time.

Obviously this can become pretty annoying, especially as you are adding more nodes into the cluster.

Below I will show how I managed to access any of my PVE cluster nodes web interface by using a single dns/host name (e.g https://pve in my case).

Note that you don’t even need to type the default proxmox port (8006) after the hostname since Nginx will listen to default https port (443) and forward the request to the backend proxmox cluster nodes on port 8006.

My first target was the web management console; the second was making noVNC and SPICE work too. The latter turned out to be trickier.

We will use Nginx to handle Proxmox web and noVNC console traffic (port 8006) and HAProxy to handle SPICE traffic (port 3128).

Note The configuration below has been tested with the following software versions:

  • Debian GNU/Linux 8.6 (jessie)
  • nginx version: nginx/1.6.2
  • HA-Proxy version 1.5.8
  • proxmox-ve: 4.3-66 (running kernel: 4.4.19-1-pve)

What you will need

1. A basic Linux vm. My preference for this tutorial was Debian Jessie.

2. Nginx + HAProxy for doing the magic.

3. OpenSSL packages to generate the self signed certificates.

4. Obviously a working proxmox cluster.

5. Since this will be a critical VM, it would be a good idea to configure it as an HA virtual machine in your Proxmox cluster.

The steps

– Download Debian Jessie net-install.

– Assign a static IP address and create the appropriate DNS record on your DNS server (if available, otherwise use just hostnames).
In my case, I created an A record named ‘pve’ pointing to 10.1.1.10. That means that once you complete this guide you will be able to access all Proxmox nodes by using https://pve (or https://pve.domain.local) in your browser! You will not even need to type the default port, which is 8006.

– Update package repositories by entering ‘apt-get update’

– Install Nginx and HAProxy:

apt-get install nginx && apt-get install haproxy

Nginx and OpenSSL setup

– Assuming that you are logged in as root, create backup copy of the default config file.

cp /etc/nginx/sites-enabled/default /root

– Remove /etc/nginx/sites-enabled/default:

rm /etc/nginx/sites-enabled/default

– Download OpenSSL packages:

apt-get install openssl

– Generate a private key (select a temp password when prompted):

openssl genrsa -des3 -out server.key 1024

– Generate a csr file (select the same temp password if prompted):

openssl req -new -key server.key -out server.csr

– Remove the password from the key:

openssl rsa -in server.key -out server_new.key

– Remove old private key and rename the new one:

rm server.key && mv server_new.key server.key

– Make sure only root has access to private key:

chown root server.key && chmod 600 server.key

– Generate a certificate:

openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt

– Create a directory called ssl in /etc/nginx folder and copy server.key and server.crt files:

mkdir /etc/nginx/ssl && cp server.key /etc/nginx/ssl && cp server.crt /etc/nginx/ssl
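If you want to double-check the generated certificate (subject and validity dates), you can inspect it with openssl:

openssl x509 -in /etc/nginx/ssl/server.crt -noout -subject -dates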

– Create an empty file:

vi /etc/nginx/sites-enabled/proxmox-gui

– Paste the code below and save the file. Make sure that you change the IP addresses to match your Proxmox nodes’ IP addresses:

Edit (11-11-2017)

upstream proxmox {
    ip_hash;    # added ip_hash algorithm for session persistency
    server 10.1.1.2:8006;
    server 10.1.1.3:8006;
    server 10.1.1.4:8006;
}

server {
    listen 80 default_server;
    rewrite ^(.*) https://$host$1 permanent;
}

server {
    listen 443;
    server_name _;
    ssl on;
    ssl_certificate /etc/nginx/ssl/server.crt;
    ssl_certificate_key /etc/nginx/ssl/server.key;
    proxy_redirect off;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $http_host;
    location / {
        proxy_pass https://proxmox;
    }
}
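Before starting Nginx, it is worth letting it validate the configuration syntax:

nginx -t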

– Create a symlink for /etc/nginx/sites-enabled/proxmox-gui in /etc/nginx/sites-available:

ln -s /etc/nginx/sites-enabled/proxmox-gui /etc/nginx/sites-available

– Verify that the symlink has been created and it’s working:

ls -ltr /etc/nginx/sites-available && cat /etc/nginx/sites-available/proxmox-gui (You should see the above contents after this)

– That’s it! You can now start Nginx service:

systemctl start nginx.service && systemctl status nginx.service (verify that it is active (running))

HAProxy Setup

– Create a backup copy of the default config file.

cp /etc/haproxy/haproxy.cfg /root

– Create an empty /etc/haproxy/haproxy.cfg file (or remove its contents):

vi /etc/haproxy/haproxy.cfg

– Paste the following code and save the file. Again, make sure that you change the IP addresses to match your Proxmox hosts. Also note that the hostnames must match your PVE hostnames, e.g. pve1, pve2, pve3:

global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    user haproxy
    group haproxy
    daemon
    stats socket /var/run/haproxy/haproxy.sock mode 0644 uid 0 gid 107

defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    timeout connect 5000
    timeout client 50000
    timeout server 50000

listen proxmox_spice *:3128
    mode tcp
    option tcpka
    balance roundrobin
    server pve1 10.1.1.2:3128 weight 1
    server pve2 10.1.1.3:3128 weight 1
    server pve3 10.1.1.4:3128 weight 1

– Note that the above configuration has been tested on HA-Proxy version 1.5.8.
If the HAProxy service fails to start, please troubleshoot by running:

haproxy -f /etc/haproxy/haproxy.cfg ...and check for errors.

– Start HAProxy service:

systemctl start haproxy.service && systemctl status haproxy.service (Must show active and running)
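To confirm that both services are listening on the expected ports (443 for Nginx, 3128 for HAProxy), a quick check could be:

ss -tlnp | grep -E ':443|:3128'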

Testing our setup…

Open a web browser and enter https://pve. You should be able to access the PVE web GUI. (Remember, in my case I have assigned ‘pve’ as the hostname of the Debian VM and I have also created a matching entry on my DNS server. Your client machine must be able to resolve that address properly, otherwise it will fail to load the Proxmox web GUI.)

You can now also test noVNC console and SPICE. Please note that you may need to refresh noVNC window in order to see the vm screen.

UPDATE: You can seamlessly add SSH to the proxied ports if you wish to SSH into any of the PVE hosts.

Just add the lines below to your /etc/haproxy/haproxy.cfg file. Note that I’m using port 222 instead of 22 in order to avoid a conflict with the Debian VM itself, which already listens on TCP port 22.


listen proxmox_ssh *:222
    mode tcp
    option tcpka
    balance roundrobin
    server pve1 10.1.1.2:22 weight 1
    server pve2 10.1.1.3:22 weight 1
    server pve3 10.1.1.4:22 weight 1

Now if you try to connect from your machine as root@pve on port 222 (ssh root@pve -p 222), the first time you will be asked to save the ECDSA key of the host to your .ssh/known_hosts file and you will then log in to the first Proxmox node, e.g. pve1.
If you attempt to connect a second time, your request will be rejected, since HAProxy will forward it to the second Proxmox node, e.g. pve2, which happens to have a different host key fingerprint from the first. This is of course good for security reasons, but in this case we need to disable the check for the proxied host, otherwise we will not be able to connect to it.

– On your client machine, modify /etc/ssh/ssh_config file (not sshd_config !).

– Remove the following entry:

Host *

– Add the following at the end of the file:

Host pve
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
ServerAliveInterval 5

This disables the ECDSA host key checks ONLY for the host pve and keeps them enabled for ALL other hostnames, so in short it is quite a restrictive setting. ServerAliveInterval is used in order to keep the SSH session alive during periods of inactivity; I’ve noticed that without this parameter the ssh client drops the session quite often.