How to build a two-node Proxmox cluster with native ZFS (ZFS on Linux) and DRBD support.

In this post I’m going to show you how to build a reliable, two-node cluster with Proxmox, ZFS and DRBD. Please note that all of these technologies will be hosted on just two computer nodes; no further hardware is required.

If you don’t understand the above terms, here is a quick explanation:
-Proxmox is a high quality type 1 hypervisor comparable to VMware ESXi, Xen, Hyper-V etc. It is based on Debian Linux.

-ZFS is a combined file system and logical volume manager designed by Sun Microsystems. Since we are on Linux we will use ZFS on Linux (ZoL), the port of ZFS for the Linux platform.

-DRBD is a distributed replicated storage system for the Linux platform. Think of it as Linux RAID-1 over the network instead of between local hard drives.

What are the benefits of the above?
-Proxmox is free, so there are no limitations on its capabilities. Optionally, if you need support, you can buy one of their subscription plans and support the project.
-With ZFS we can leverage its advanced capabilities: RAID-like disk management, volume management, snapshots, compression, protection against data corruption and, optionally, deduplication. ZFS must have direct access to the hard drives to function properly, so do not put your drives in a RAID configuration before exposing them to ZFS. If you already have a RAID controller, select JBOD mode to expose the disks.
-We are going to use DRBD to replicate ZFS ZVOLs (more on these later) between the two Proxmox hosts. The ZVOLs will be the backend storage for our virtual machines, so if we want features like live migration we need to replicate them in real time. ZFS has no native option for this, so we will use DRBD.

Ok, now that we have an idea of these excellent technologies we can proceed with our plan. What will we need?

Note: You can also simulate the scenario below with two virtual machines instead of physical machines. Get VMware Workstation and create two VMs; you will install Proxmox on both of them. Just remember to enable “Virtualize Intel VT-x/EPT or AMD-V/RVI” under VM properties -> Processors to be able to power on virtual machines inside Proxmox. This is called nested virtualization.
-Two computers (they don’t have to be high-end computers or servers) with CPUs that have virtualization extensions enabled, both on the CPU and in the BIOS. Of course, if you plan to host many virtual machines on the nodes, consider buying a Xeon or i7 class CPU.
-A minimum of 8GB of RAM, preferably 16GB+, depending also on the number of virtual machines you plan to create. Ideally you should have ECC RAM since we are going to use ZFS, which by design relies heavily on RAM; with ECC RAM you eliminate a whole class of in-memory data corruption. If you plan to use this setup in production you should definitely use ECC RAM. Make sure that the motherboard you choose supports ECC RAM.
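If you are unsure whether a candidate machine qualifies, the virtualization extensions can be checked from any Linux live CD. A minimal sketch (the helper name is my own; the cpuinfo path is a parameter only to make the check testable):

```shell
# Count CPU threads advertising hardware virtualization
# (vmx = Intel VT-x, svm = AMD-V).
has_virt_ext() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -Ec '(vmx|svm)' "$cpuinfo"
}

count=$(has_virt_ext)
echo "virt-capable CPU threads: ${count:-0}"
```

A result of 0 means either the CPU lacks the extensions or they are disabled in the BIOS, and KVM will not work.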

-A minimum of two SATA hard disk drives for the ZFS pool. As mentioned before, this ZFS pool will be the storage backend for the virtual machines, so the number of hard disks depends on your space needs. Besides capacity you will also need speed: the faster the disks, the better the VMs will perform. You can improve ZFS performance by adding SSDs to your ZFS configuration (L2ARC, ZIL); check the ZFS documentation for more about performance tuning.

-One small hard disk, preferably an SSD, for the Proxmox installation. This will hold the host operating system.

-Two gigabit network cards. One card will be used for virtual machine traffic to the Local Area Network (LAN), for the management network (Proxmox management) and for Proxmox cluster communication. The other NIC will be dedicated to DRBD storage replication between the nodes. For the second NIC we won’t need any network switch, just a straight Cat5e or Cat6 cable between the NICs.
Of course these are the absolute minimums. If you need better performance or failover you should consider four network cards bonded in two pairs, either with Linux bonding or LACP if your switch supports it.
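For reference, a four-NIC layout might look like the /etc/network/interfaces sketch below. All interface names, placeholder addresses and bonding modes here are assumptions to adapt, not values from this setup:

```
# bond0: eth0+eth1 for LAN / management / cluster traffic
# (802.3ad = LACP, needs switch support; otherwise use active-backup)
auto bond0
iface bond0 inet manual
        bond-slaves eth0 eth1
        bond-mode 802.3ad
        bond-miimon 100

# vmbr0: the Proxmox bridge on top of bond0
auto vmbr0
iface vmbr0 inet static
        address <management-ip>
        netmask 255.255.255.0
        gateway <gateway-ip>
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

# bond1: eth2+eth3 cabled back-to-back for DRBD replication
auto bond1
iface bond1 inet static
        address <drbd-ip>
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode balance-rr
        bond-miimon 100
```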

Ok, now that we have everything connected and ready we can proceed to Proxmox installation as the first step.
-Download a copy of the latest Proxmox ISO (currently version 3.3) from the official website and burn it to a CD:

-Put the CD in one of the two Proxmox nodes, make the appropriate boot order modifications in the BIOS and start the installation. The installation is very easy so we won’t cover it here; refer to the Proxmox website if you have problems. Just note that the installer will wipe everything on the hard disks. Also, some people report that you must have only your operating system hard drive connected during the installation (not the data disks), otherwise the installation fails. An important thing to note during installation is the name and IP address that you give to your Proxmox node. This will be very important later when we configure our cluster, so pick something that makes sense, for example proxmox1 or pve1 or similar. The IP address should be in the same range as your LAN. Note also that you will need some kind of DNS server/forwarder to create an FQDN for your Proxmox nodes, e.g. domain.local. In my case this is provided by the DNS forwarder service of a pfSense router/firewall.

-Now that you have successfully installed Proxmox on nodeA, let’s do the same on nodeB. Just remember to select a different hostname this time, like proxmox2 or pve2, and a different IP address in the same range.

Note: If for some reason you cannot install Proxmox on your machine, there is another option: you can install a minimal Debian wheezy instead, then add the Proxmox repositories and install Proxmox on top of it.

-Now we have two working Proxmox nodes which can work independently. That’s not what we need though; we need to cluster them. Let’s proceed to the cluster configuration.

-From a client computer, ssh to Proxmox nodeA. In my case its hostname is pve1.

Edit the /etc/apt/sources.list.d/pve-enterprise.list file and change it as follows. This is required only if you do not have a subscription plan.

 nano /etc/apt/sources.list.d/pve-enterprise.list
 #deb wheezy pve-enterprise

-Note the “#” in front of the deb https… line, which disables the enterprise repository. Save the file and do the same on nodeB.

-Edit /etc/apt/sources.list and add the Proxmox no-subscription repository.

 nano /etc/apt/sources.list
 deb wheezy pve-no-subscription

-Edit the /etc/hosts file and add the line below, so that each node can resolve the other via hostname (substitute the node’s actual LAN IP address for the placeholder). On nodeA:

 nano /etc/hosts
 <IP-of-pve2>  pve2.domain.local  pve2

-On nodeB:

 nano /etc/hosts
 <IP-of-pve1>  pve1.domain.local  pve1

-Upgrade nodeA and nodeB.

 apt-get update && apt-get upgrade && apt-get dist-upgrade

Reboot both nodes.

-SSH on nodeA
-Create the cluster:

 pvecm create pvecluster

where “pvecluster” is the name of the cluster.
-Let’s see the cluster status:

 root@pve1:~# pvecm status
Version: 6.2.0
Config Version: 1
Cluster Name: pvecluster
Cluster Id: 48308
Cluster Member: Yes
Cluster Generation: 4
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 5
Ports Bound: 0
Node name: pve1
Node ID: 1
Multicast addresses:
Node addresses:

-As we can see there is “Nodes: 1”, which means only one node is currently a member of the cluster. We need to join the other node to the cluster as well.

-This time log in to Proxmox nodeB via ssh to add it to the cluster:

 pvecm add <IP-of-nodeA>

where <IP-of-nodeA> stands for the LAN IP address of the first node.
Answer “yes” and enter the root password of the first node.

-Again, let’s look at our cluster status. Now we should see “Nodes: 2”:

 root@pve2:~# pvecm status
Version: 6.2.0
Config Version: 2
Cluster Name: pvecluster
Cluster Id: 48308
Cluster Member: Yes
Cluster Generation: 8
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 5
Ports Bound: 0
Node name: pve2
Node ID: 2
Multicast addresses:
Node addresses:
 root@pve2:~# pvecm nodes
Node  Sts   Inc   Joined               Name

1   M      8   2014-11-23 15:36:54  pve1
2   M      8   2014-11-23 15:36:54  pve2

-Ok, so far so good. Now let’s make a backup of current pve cluster configuration file before modifying it:

 root@pve2:~# cp /etc/pve/cluster.conf /etc/pve/cluster.conf.bak

Note: The contents of the /etc/pve directory live in a clustered filesystem (pmxcfs) which is mounted on both PVE nodes at the same time. So you can modify files in this directory on any node and the changes will be propagated to the other nodes of the cluster automatically.

-Now copy /etc/pve/cluster.conf to /etc/pve/cluster.conf.new to make the required changes for a two node cluster (cluster.conf.new is the file the web GUI reads when you activate a new configuration):

 cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new
 nano /etc/pve/cluster.conf.new

-Modify the contents so the file looks as follows. Remember to increment config_version each time you modify the cluster file (here it goes from 2 to 3).

<?xml version="1.0"?>
<cluster config_version="3" name="pvecluster">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <clusternodes>
    <clusternode name="pve1" nodeid="1" votes="1"/>
    <clusternode name="pve2" nodeid="2" votes="1"/>
  </clusternodes>
</cluster>

-The important parameter here is two_node="1". This tells the cluster manager that we have just two nodes, so the cluster stays quorate with a single vote. Remember that the general recommendation is to have at least 3 nodes in a cluster, otherwise you will have quorum issues when one of the two nodes fails.
-Save the contents of the file and exit nano.

Note: Please note that in this guide I will not cover high availability of virtual machines. That means that if, for example, one of the two cluster nodes fails, the virtual machines that were running on it will not be started automatically on the surviving node; you have to do this manually. HA with just two cluster nodes is a dangerous thing and should be avoided. If you want to experiment anyway, you can follow the dedicated two-node HA howto. You will need a third machine though, not necessarily a powerful one, just a simple PC will do the job; it hosts what that howto calls a qdisk (quorum disk). You will also need a proper fencing device.

-Now that you have saved the contents of /etc/pve/cluster.conf.new you have to activate them.
To do that, log in to the web interface of your PVE: open your favorite browser and point it to https://<pve1_ip>:8006 or https://<pve2_ip>:8006. Proxmox can be managed from any node; it doesn’t matter which one you select, as both are now members of the cluster. Use username root and the password you chose during the Proxmox installation.

-You should now see the management console of PVE.

-Go to Datacenter (upper left) -> HA (upper right) and click “Activate” to apply the changes you made in cluster.conf.new. If you don’t receive an error message then everything went fine. If you do, check the syntax of /etc/pve/cluster.conf.new again, save it, and retry the activation from the web management console as described above.

-Now that our cluster is ready we can proceed to the installation of ZFS (ZoL) on both nodes.
Note that installing ZoL directly on a Proxmox node is currently not an officially supported configuration by the Proxmox team. Use it at your own risk.

Login to nodeA via ssh and do the following:
#download pve kernel headers

 aptitude install pve-headers-$(uname -r)

#Get the ZoL release package for Debian and install it. Press “Yes” at any prompts. Note that it will take some time to build the ZFS modules.

 dpkg -i zfsonlinux_1~wheezy_all.deb
 apt-get update && apt-get install debian-zfs

#Check if the following files exist

 ls -l /lib/modules/$(uname -r)/updates/dkms/
 -rw-r--r-- 1 root root  343584 Nov 23 17:32 splat.ko
-rw-r--r-- 1 root root  312512 Nov 23 17:32 spl.ko
-rw-r--r-- 1 root root   13456 Nov 23 17:35 zavl.ko
-rw-r--r-- 1 root root   73264 Nov 23 17:35 zcommon.ko
-rw-r--r-- 1 root root 2024832 Nov 23 17:35 zfs.ko
-rw-r--r-- 1 root root  132208 Nov 23 17:35 znvpair.ko
-rw-r--r-- 1 root root   35088 Nov 23 17:35 zpios.ko
-rw-r--r-- 1 root root  330432 Nov 23 17:35 zunicode.ko

#Review /etc/default/zfs and adjust the startup options there if needed.

nano /etc/default/zfs

-Repeat the above steps on nodeB.

Note: Please note that after each kernel upgrade you should run these commands:

aptitude install pve-headers-$(uname -r)
aptitude reinstall spl-dkms zfs-dkms

in order to recompile the ZFS modules for the new kernel. This is very important! If you miss this step, your ZFS pool will not be mounted at the next reboot, so DRBD will fail as well. Remember to run these commands after a kernel upgrade and before restarting your server.

-For ZFS fine tuning refer to the ZFS on Linux documentation.

DRBD configuration and installation.

#Edit /etc/network/interfaces to configure the second network interface, which will be dedicated to DRBD replication.
On nodeA do the following:

 nano /etc/network/interfaces

# add the following lines at the bottom (replace the placeholder with
# the replication address you chose)

auto eth1
iface eth1 inet static
        address <nodeA-drbd-ip>
        netmask 255.255.255.0

-Where <nodeA-drbd-ip> is the IP address of the DRBD replication interface on nodeA; the netmask shown is an example, adjust it to your subnet.
-Repeat the above on nodeB, changing the address to nodeB’s replication IP. Save the file on both nodes.
-On both nodes do:

 service networking restart
 ifconfig | more

-Notice that eth1 should now have its replication IP address on each node.
-On both nodes again do the following to install drbd admin tools:

 apt-get install drbd8-utils

-Now we are finished with the installation of ZFS and DRBD on both Proxmox nodes and can proceed to configuring them. First we must prepare our VM datastore by creating the ZFS pool. I have two hard drives on each node for this purpose; let’s prepare them.

-ZFS Pool Preparation

On nodeA do:

 fdisk -l|grep /dev/sd

#Find your datastore disks there. In my case they are /dev/sdb and /dev/sdc

Disk /dev/sdb: 128.8 GB, 128849018880 bytes
Disk /dev/sdc: 128.8 GB, 128849018880 bytes

Note: At this point it is worth noting that it is recommended to use /dev/disk/by-id/<disk drive unique name> as the disk members of a ZFS pool instead of the /dev/sdX naming. This is because Linux sometimes changes the names of hard disks (for example when you add more disks to the server). If that happens, ZFS will not be able to assemble the pool, or it will need your assistance to locate the drives. By using the /dev/disk/by-id/ naming you can be confident that ZFS will always locate the drives to assemble the pool. In my example I cannot do this, since I am using virtual machines to simulate the scenario, so I will use the /dev/sdX naming.
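To find the by-id names that correspond to a /dev/sdX device, you can resolve the symlinks under /dev/disk/by-id. A small sketch (the function name is mine, and the directory parameter exists only to make the helper testable):

```shell
# Print every /dev/disk/by-id symlink that resolves to the given device.
byid_names() {
    dev=$(readlink -f "$1")
    dir="${2:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        [ -h "$link" ] && [ "$(readlink -f "$link")" = "$dev" ] && echo "$link"
    done
    return 0
}

# On a real node:  byid_names /dev/sdb
# then build the pool from the stable names it prints, e.g.:
#   zpool create -f -o ashift=12 zfspool mirror \
#       /dev/disk/by-id/<id-of-first-disk> /dev/disk/by-id/<id-of-second-disk>
```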
#Create the ZFS pool

 zpool create -f -o ashift=12 zfspool mirror /dev/sdb /dev/sdc

-Note that you should use the option ashift=12 if your hard drives are Advanced Format drives, i.e. 4K sector size instead of 512 bytes; most modern disks are 4K. In my case I created a mirrored pool, which is similar to RAID-1. If you have more than two disks for the ZFS pool, consider a raidz1, raidz2 or striped-mirror setup. Refer to the ZFS documentation for more about these.
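To check whether a disk really is 4K, you can read its physical sector size from sysfs. A minimal sketch (the helper name is mine; the path parameter exists only so the logic is testable):

```shell
# Map a disk's physical sector size to the matching ashift value.
suggest_ashift() {
    f="${1:-/sys/block/sda/queue/physical_block_size}"
    size=$(cat "$f")
    case "$size" in
        4096) echo "ashift=12" ;;
        512)  echo "ashift=9"  ;;
        # unusual size: 4K alignment is the safe default
        *)    echo "ashift=12" ;;
    esac
}

# On a real node:  suggest_ashift /sys/block/sdb/queue/physical_block_size
```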

#check the status of the pool

zpool status
  pool: zfspool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zfspool     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

errors: No known data errors

-Our pool looks OK so repeat the above steps on nodeB and create a pool there too with the same options.

-Ok, now we have one ZFS pool ready on each Proxmox node. What we should do now is create a ZVOL in each node’s pool. This ZVOL will be the backend storage for our first virtual machine.

-On nodeA do:

 zfs create -V 40G zfspool/zvol0-drbd0-win7

-What this command does: it creates a ZVOL (a block device) with a size of 40GB named zvol0-drbd0-win7. You can use any name you want, but for simplicity and manageability it is better to pick a meaningful one. You will thank yourself later when you have to create more virtual machines.

#Now let’s enable also compression on this ZVOL

 zfs set compression=lz4 zfspool/zvol0-drbd0-win7

#Check ZVOL status

zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
zfspool                   41.3G  75.9G   136K  /zfspool
zfspool/zvol0-drbd0-win7  41.3G   117G    72K  -

-Now repeat all the above steps (create the pool, create the ZVOL) on nodeB.

-At this point we have created our ZFS pool and the ZVOL which will host our virtual machine. Now we need to configure DRBD to replicate the ZVOL between the two Proxmox nodes; this is essential for live migration to work.

-On nodeA do the following.

#Backup existing DRBD global configuration file.

mv /etc/drbd.d/global_common.conf /etc/drbd.d/global_common.conf.bak

#Create a new global configuration file for DRBD.

nano /etc/drbd.d/global_common.conf

#Put the following code,save and exit:

global { usage-count no; }

common {
        syncer { rate 40M; verify-alg md5; }
        handlers { out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root"; }
}


#Create a new DRBD resource file. This will be the config file for our first resource, which is responsible for replicating the VM’s ZVOL.

nano /etc/drbd.d/r0.res

#put the following code

resource r0 {

        protocol C;

        startup {
                wfc-timeout  25;
                degr-wfc-timeout 60;
                become-primary-on both;
        }

        handlers {
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        }

        net {
                cram-hmac-alg sha1;
                shared-secret "my-secret";
                allow-two-primaries;   # required for dual-primary / live migration
                after-sb-0pri discard-least-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }

        on pve1 {
                device /dev/drbd0;
                disk /dev/zfspool/zvol0-drbd0-win7;
                address <nodeA-drbd-ip>:7788;   # replication IP of pve1
                meta-disk internal;
        }

        on pve2 {
                device /dev/drbd0;
                disk /dev/zfspool/zvol0-drbd0-win7;
                address <nodeB-drbd-ip>:7788;   # replication IP of pve2
                meta-disk internal;
        }

        disk {
                # no-disk-barrier and no-disk-flushes should be applied only to
                # systems with non-volatile (battery backed) controller caches.
                # See the DRBD documentation for more information.
        }
}

-Save the file and exit. This is our first DRBD resource file, responsible for replicating zvol0-drbd0-win7 between the two nodes. The parameter “disk /dev/zfspool/zvol0-drbd0-win7” refers to the backend device for the DRBD device /dev/drbd0.
The parameter shared-secret “my-secret” secures the connection between the nodes with a password; replace “my-secret” with a password of your choice.
For each virtual machine you create, you must create a separate ZVOL and a separate DRBD resource file. Why not just one ZVOL and one DRBD resource for all of them? Because then we could not roll back a VM individually in case of corruption, accidental deletion or whatever else. This was my choice; you can configure your setup as you wish. For further explanation of the DRBD configuration files please refer to the official DRBD documentation. Please note that Proxmox currently ships DRBD version 8.3.13.
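Since every additional VM needs its own ZVOL and resource file, it can help to script the boilerplate. Below is a sketch of such a generator; the function name, the port scheme (7788 + minor number) and the trimmed-down option set are my own conventions, so for real use extend it with the handlers and after-sb options shown above:

```shell
# Emit a per-VM DRBD resource file on stdout.
# Usage: gen_res <resource> <minor> <zvol-name> <nodeA-ip> <nodeB-ip>
gen_res() {
    res="$1"; minor="$2"; zvol="$3"; ip1="$4"; ip2="$5"
    port=$((7788 + minor))   # one TCP port per resource
    cat <<EOF
resource ${res} {
        protocol C;
        startup { wfc-timeout 25; degr-wfc-timeout 60; become-primary-on both; }
        net {
                cram-hmac-alg sha1;
                shared-secret "my-secret";
                allow-two-primaries;
        }
        on pve1 {
                device /dev/drbd${minor};
                disk /dev/zfspool/${zvol};
                address ${ip1}:${port};
                meta-disk internal;
        }
        on pve2 {
                device /dev/drbd${minor};
                disk /dev/zfspool/${zvol};
                address ${ip2}:${port};
                meta-disk internal;
        }
}
EOF
}

# Example: gen_res r1 1 zvol1-drbd1-debian <nodeA-drbd-ip> <nodeB-drbd-ip> > /etc/drbd.d/r1.res
```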
#Copy /etc/drbd.d/global_common.conf on NodeB

 scp /etc/drbd.d/global_common.conf pve2:/etc/drbd.d/

#Copy /etc/drbd.d/r0.res on nodeB

scp /etc/drbd.d/r0.res pve2:/etc/drbd.d/

#Start DRBD on both nodes and wait to complete

/etc/init.d/drbd start

#On both nodes initialize the DRBD metadata. You should see a success message.

drbdadm create-md r0

#On both nodes bring up DRBD resource.

drbdadm up r0

#Check DRBD resource status

 cat /proc/drbd

#It should be in Secondary/Secondary state.
#From nodeA do the following to promote the node as Primary:

 drbdadm -- --overwrite-data-of-peer primary r0

#Then watch the status as DRBD replicates the ZVOL to the other node, and wait until it is finished; it will take some time. The replication speed depends on the syncer { rate 40M; } parameter in /etc/drbd.d/global_common.conf. Be careful not to increase it too much, since the usable rate also depends on the network speed between the nodes, and VM performance may suffer if you pick a bad value. See the DRBD documentation on tuning DRBD.
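A common rule of thumb from the DRBD documentation is to set the syncer rate to roughly 30% of the slowest component in the replication path, network or storage. A quick calculator sketch (the function name is mine):

```shell
# Suggest a syncer rate: 30% of the slower of network and disk
# bandwidth, both given in MB/s.
syncer_rate() {
    net="$1"; disk="$2"
    min=$net
    [ "$disk" -lt "$min" ] && min=$disk
    echo "$((min * 30 / 100))M"
}

# Dedicated gigabit link (~120 MB/s usable) with faster disks:
syncer_rate 120 200   # prints 36M
```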

 watch cat /proc/drbd

#Once the replication has finished, ssh to nodeB and run:

drbdadm primary all

#This promotes nodeB to Primary as well. Remember, both nodes must be in Primary/Primary and UpToDate/UpToDate state, otherwise live migration will fail.
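You can script that sanity check by grepping the status line. A small sketch (the helper name is mine; the file parameter exists only so the logic can be tested against a saved status line):

```shell
# Verify a /proc/drbd status shows dual primary with both sides UpToDate.
drbd_ready() {
    status="${1:-/proc/drbd}"
    if grep -q 'ro:Primary/Primary' "$status" && \
       grep -q 'ds:UpToDate/UpToDate' "$status"; then
        echo "OK: dual primary, both UpToDate"
    else
        echo "NOT READY for live migration"
        return 1
    fi
}
```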


#Now we are almost done! The next step is to create LVM on top of DRBD.
#First we must edit /etc/lvm/lvm.conf and adjust the filter parameter: we will instruct LVM not to scan for LVM volumes on the zfspool devices, so LVM only sees the PV through /dev/drbd0 and not a duplicate through the ZVOL.

nano /etc/lvm/lvm.conf

#Find the following line:

# By default we accept every block device:
       filter = [ "a/.*/" ]

and change it to:

 filter =  [ "r|/dev/zfspool/|", "a/.*/" ]

#Save and exit. Copy the /etc/lvm/lvm.conf file to the second node.

scp /etc/lvm/lvm.conf pve2:/etc/lvm/lvm.conf

#Optionally restart LVM service on both nodes.

service lvm2 restart

#On NodeA – Create a PV on /dev/drbd0

pvcreate /dev/drbd0

#On nodeA – create a VG on /dev/drbd0. Again, give it a meaningful name.

 vgcreate drbd0-zvol0-win7vg /dev/drbd0

#That’s it! We are finished with the command line configuration. Now it’s time to use the Proxmox web GUI to attach the previously created VG as a datastore and create our first VM.

-Create your first VM and test Live Migration.

-Login to https://<pve1_ip>:8006 with username:root and your password.

-Go to Datastore -> Storage  -> Add -> LVM

#Give as ID again something meaningful. The Volume Group is the one we created before. Enable this storage and don’t forget to mark it as “Shared”.

-ID: win7-drbd0-zvol0  Volume Group: drbd0-zvol0-win7vg   Enable:yes  Share:yes
-Ok, now is time to create our first vm on this VG.
-Go to the web GUI -> select the first node (pve1) -> in the upper right corner select “Create VM” -> give it a name like “win7-drbd0-zvol0” -> choose the operating system -> select your ISO image for the OS installation (you must have uploaded it to your ISO storage beforehand) -> choose the hard disk options (Bus=IDE, Storage=win7-drbd0-zvol0, Format=RAW, Disk size=40G, Cache=DirectSync) -> CPU: 1 core, kvm64 -> fixed memory=1024MB -> network: bridged mode, vmbr0, Intel E1000.
-Install your O/S for example Win7 in my case. When it’s finished you can install Virtio Drivers to achieve better disk and network  i/o performance. You can also install Spice display driver. But all these are optional for the moment.
-Now we have a working VM and we need to Live Migrate it from nodeA to NodeB.
-Click to select the VM and start it if it is not already running. Then right-click the VM, select Migrate, set Target node=pve2 (nodeB in my case), tick the “Online” box and wait for the live migration to complete. If it completes successfully you should see “Task OK” at the bottom, and your VM has moved from nodeA to nodeB without downtime.

Destroy VM and bring it back again!
-Ok, now everything seems to work as expected. Let’s try to simulate a disaster: we will destroy our VM and bring it back. ZFS snapshots will help us do this magic.
-Stop your virtual machine.
-ssh to the Proxmox node where the virtual machine resides (pve1 in my case).
-First create a snapshot of ZVOL where vm resides.

 zfs snapshot zfspool/zvol0-drbd0-win7@snap1

-Verify that the snapshot is created.

zfs list -r -t snapshot

-It must show something like this.

 NAME                             USED  AVAIL  REFER  MOUNTPOINT

zfspool/zvol0-drbd0-win7@snap1     8K      -  14.4G  -

-Ok, our snapshot exists so we can proceed to vm destruction.

 dd if=/dev/zero of=/dev/zfspool/zvol0-drbd0-win7 bs=1M count=100

-We wrote zeros over the start of our ZVOL, so our VM is gone. An lvscan now shows:

 lvscan
 ACTIVE            '/dev/pve/swap' [2.50 GiB] inherit
 ACTIVE            '/dev/pve/root' [5.00 GiB] inherit
 ACTIVE            '/dev/pve/data' [9.50 GiB] inherit

-Only the Proxmox LVs exist; our VM’s LV is gone. It’s time to roll it back.

drbdadm down r0

-> Bring down DRBD resource first.

 zfs rollback zfspool/zvol0-drbd0-win7@snap1

->rollback previously created snapshot.

drbdadm up r0

-> Bring up DRBD resource again.

drbdadm attach r0

-> Attach ZVOL to DRBD.


 drbd-overview

-> Show current DRBD status.

 0:r0  StandAlone Secondary/Unknown UpToDate/DUnknown r-----

-> It must show this.

drbdadm primary r0

-> Make DRBD resource Primary.

 0:r0  StandAlone Primary/Unknown UpToDate/DUnknown r----- lvm-pv: drbd0-zvol0-win7vg 40.00g 30.00g

-> DRBD resource is Primary now.


 lvscan

-> Scan for LVM; the VM LV should appear again.

inactive  '/dev/drbd0-zvol0-win7vg/vm-100-disk-1' [30.00 GiB] inherit

-> There it is, but inactive.
-Go to the Proxmox web GUI and try to start the VM again. It should start now.
-Now we must instruct DRBD to replicate with the other node again, but first we must wipe the stale DRBD data on nodeB.
-SSH to nodeB this time and give the following commands:

drbdadm down r0
 dd if=/dev/zero of=/dev/zfspool/zvol0-drbd0-win7 bs=1M count=100
drbdadm create-md r0

->Create meta-data and type Yes.

drbdadm up r0

-> Bring up resource.

service drbd restart

-> Optionally restart DRBD service.
Now it is  time to initiate DRBD replication from nodeA.
-SSH on nodeA.

drbdadm connect r0
watch cat /proc/drbd
 version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:952548 nr:0 dw:60 dr:991032 al:2 bm:56 lo:160 pe:0 ua:160 ap:0 ep:1 wo:b oos:40989348
[>....................] sync'ed:  2.3% (40028/40956)M
finish: 1:42:34 speed: 6,656 (7,052) K/sec

-> Replication starts. Wait until it is finished; when it’s done you can live migrate the VM again.

 drbdadm primary r0

-> Promote it to Primary again, so both nodes are back in Primary/Primary.


We saw how we can combine the benefits of ZFS and DRBD to build a stable cluster based on Proxmox. Remember though that ZFS snapshots are not backups; they only help us roll a VM back to a previous state. Make sure you configure your Proxmox server to regularly back up your VMs to an NFS target or wherever else you prefer.
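Scheduled backups are normally configured from Datacenter -> Backup in the GUI, which writes a cron entry that calls vzdump. Purely as an illustration, such an entry looks roughly like the line below; “nfs-backup” is a hypothetical storage ID you would have defined first:

```
# /etc/cron.d/vzdump (generated by the Proxmox GUI)
# Back up VM 100 every Saturday at 02:00 to the "nfs-backup" storage.
0 2 * * 6  root  vzdump 100 --quiet 1 --mode snapshot --compress lzo --storage nfs-backup
```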
Hope this article was useful to you. Cheers!