Synchronise a GlusterFS volume to a remote site using geo replication

Synchronise a GlusterFS volume to a remote site using geo replication

Get Social!

gluster-orange-antGlusterFS can be used to synchronise a directory to a remote server on a local network for data redundancy or load balancing to provide a highly scalable and available file system.

The problem is when the storage you would like to replicate to is on a remote network, possibly in a different location, GlusterFS does not work very well. This is because GlusterFS is not designed to work when there is a high latency between replication nodes.

GlusterFS provides a feature called geo replication to perform batch based replication of a local volume to a remote machine over SSH.

The below example will use three servers:

  • gfs1.jamescoyle.net is one of the two running GlusterFS volume servers.
  • gfs2.jamescoyle.net is the second of the two running GlusterFS volume servers. gfs1 and gfs2 both server a single GlusterFS replicated volume called datastore.
  • remote.jamescoyle.net is the remote file server which the GlusterFS volume will be replicated to.

GlusterFS uses an SSH connection to the remote host using SSH keys instead of passwords. We’ll need to create an SSH key using ssh-keygen to use for our connection. Run the below command and press return when asked to enter the passphrase to create a key without a passphrase. 

ssh-keygen -f /var/lib/glusterd/geo-replication/secret.pem

The output will look like the below:

Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/glusterd/geo-replication/secret.pem.
Your public key has been saved in/var/lib/glusterd/geo-replication/secret.pem.pub.
The key fingerprint is:
46:ba:02:fd:2f:9c:b9:39:ec:6c:90:50:d8:ec:7b:00 root@gfs1
The key's randomart image is:
+--[ RSA 2048]----+
|   +             |
|  E +            |
|   +    .        |
|  ..o  o         |
|  ...+. S        |
|   .+..o         |
|    .=oo         |
|     oOo         |
|     o=+.        |
+-----------------+

Now you need to copy the public certificate to your remote server in the authorized_keys file. The remote user must be a super user (currently a limitation of GlusterFS) which is root in the below example. If you have multiple GlusterFS volumes in a cluster then you will need to copy the key to all GlusterFS servers.

cat /var/lib/glusterd/geo-replication/secret.pem.pub | ssh [email protected] "cat >> ~/.ssh/authorized_keys"

Make sure the remote server has glusterfs-server installed. Run the below command to install glusterfs-server on remote.jamescoyle.net. You may need to use yum instead of apt-get for Red Hat versions of Linux.

apt-get install glusterfs-server

Create a folder on remote.jamescoyle.net which will be used for the remote replication. All data which transferrs to this machine will be stored in this folder.

mkdir /gluster
mkdir /gluster/geo-replication

Create the geo-replication volume with Gluster and replace the below values with your own:

  • [SOURCE_DATASTORE] – is the local Gluster data volume which will be replicated to the remote server.
  • [REMOTE_SERVER] – is the remote server to receive all the replication data.
  • [REMOATE_PATH] – is the path on the remote server to store the files.
gluster volume geo-replication [SOURCE_DATASTORE] [REMOTE_SERVER]:[REMOTE_PATH] start

Example:

gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ start

Starting geo-replication session between datastore & remote.jamescoyle.net:/gluster/geo-replication/ has been successful

Sometimes on the remote machine, gsyncd (part of the GlusterFS package) may be installed in a different location to the local GlusterFS nodes.

Your log file may show a message similar to below:

Popen: ssh> bash: /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd: No such file or directory

In this scenario you can specify the config command the remote gsyncd location.

gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication config remote-gsyncd /usr/lib/glusterfs/glusterfs/gsyncd

You will then need to run the start command to start the volume synchronisation.

gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ start

You can view the status of your replication task by running the status command.

gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ status

You can stop your volume replication at any time by running the stop command.

gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ stop

Visit our advertisers

Quick Poll

How often do you change the password for the computer(s) you use?

Visit our advertisers