Synchronise a GlusterFS volume to a remote site using geo-replication


GlusterFS can be used to synchronise a directory to a remote server on a local network for data redundancy or load balancing, providing a highly scalable and available file system.

The problem is that when the storage you would like to replicate to is on a remote network, possibly in a different location, GlusterFS does not work very well. This is because GlusterFS is not designed to cope with high latency between replication nodes.

GlusterFS provides a feature called geo-replication to perform batch-based replication of a local volume to a remote machine over SSH.

The below example will use three servers:

  • gfs1.jamescoyle.net is one of the two running GlusterFS volume servers.
  • gfs2.jamescoyle.net is the second of the two running GlusterFS volume servers. gfs1 and gfs2 both serve a single GlusterFS replicated volume called datastore.
  • remote.jamescoyle.net is the remote file server which the GlusterFS volume will be replicated to.

GlusterFS uses an SSH connection to the remote host using SSH keys instead of passwords. We’ll need to create an SSH key using ssh-keygen to use for our connection. Run the below command and press return when asked for a passphrase to create a key without one.

ssh-keygen -f /var/lib/glusterd/geo-replication/secret.pem

The output will look like the below:

Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/glusterd/geo-replication/secret.pem.
Your public key has been saved in /var/lib/glusterd/geo-replication/secret.pem.pub.
The key fingerprint is:
46:ba:02:fd:2f:9c:b9:39:ec:6c:90:50:d8:ec:7b:00 root@gfs1
The key's randomart image is:
+--[ RSA 2048]----+
|   +             |
|  E +            |
|   +    .        |
|  ..o  o         |
|  ...+. S        |
|   .+..o         |
|    .=oo         |
|     oOo         |
|     o=+.        |
+-----------------+
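If you are scripting the setup, the same key can be generated without any prompts by passing an empty passphrase with -N. A small sketch, writing to a temporary directory for illustration; on a real node you would keep the /var/lib/glusterd/geo-replication/secret.pem path, which requires root:

```shell
# Generate the key non-interactively: -N '' sets an empty passphrase,
# -q suppresses the fingerprint/randomart output shown above.
KEYDIR=$(mktemp -d)           # stand-in for /var/lib/glusterd/geo-replication
ssh-keygen -q -t rsa -N '' -f "$KEYDIR/secret.pem"
ls "$KEYDIR"                  # secret.pem  secret.pem.pub
```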

Now you need to copy the public key to the authorized_keys file on your remote server. The remote user must be a super user (currently a limitation of GlusterFS), which is root in the below example. If you have multiple GlusterFS servers in a cluster then you will need to copy the key to all GlusterFS servers.

cat /var/lib/glusterd/geo-replication/secret.pem.pub | ssh [email protected] "cat >> ~/.ssh/authorized_keys"

Make sure the remote server has glusterfs-server installed. Run the below command to install glusterfs-server on remote.jamescoyle.net. You may need to use yum instead of apt-get for Red Hat versions of Linux.

apt-get install glusterfs-server

Create a folder on remote.jamescoyle.net which will be used for the remote replication. All data transferred to this machine will be stored in this folder.

mkdir /gluster
mkdir /gluster/geo-replication
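The two mkdir calls can also be combined into one; -p creates any missing parent directories in a single step:

```shell
# Equivalent to the two commands above; -p also creates /gluster if needed
mkdir -p /gluster/geo-replication
```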

Start the geo-replication session with Gluster, replacing the below values with your own:

  • [SOURCE_DATASTORE] – is the local Gluster data volume which will be replicated to the remote server.
  • [REMOTE_SERVER] – is the remote server to receive all the replication data.
  • [REMOTE_PATH] – is the path on the remote server to store the files.

gluster volume geo-replication [SOURCE_DATASTORE] [REMOTE_SERVER]:[REMOTE_PATH] start

Example:

gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ start

Starting geo-replication session between datastore & remote.jamescoyle.net:/gluster/geo-replication/ has been successful
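If you start sessions often, the placeholders above can be wired into a small shell sketch. The variable names here are illustrative, not part of Gluster; the command is echoed first so you can check it before running it:

```shell
#!/bin/sh
# Illustrative wrapper: substitute your own volume, server and path.
SOURCE_DATASTORE="datastore"
REMOTE_SERVER="remote.jamescoyle.net"
REMOTE_PATH="/gluster/geo-replication/"

CMD="gluster volume geo-replication $SOURCE_DATASTORE $REMOTE_SERVER:$REMOTE_PATH start"
echo "$CMD"   # prints the assembled command
# eval "$CMD" # uncomment to actually run it on a Gluster node
```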

Sometimes on the remote machine, gsyncd (part of the GlusterFS package) may be installed in a different location to the local GlusterFS nodes.

Your log file may show a message similar to below:

Popen: ssh> bash: /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd: No such file or directory

In this scenario you can use the config command to specify the remote gsyncd location.

gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication config remote-gsyncd /usr/lib/glusterfs/glusterfs/gsyncd

You will then need to run the start command to start the volume synchronisation.

gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ start

You can view the status of your replication task by running the status command.

gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ status
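For monitoring, the status output can be scanned for problem states from a script. A minimal sketch, assuming only that a broken session reports the word "Faulty" somewhere in the output (the exact columns vary between GlusterFS versions); the function reads the status text on stdin so it can be tried without a cluster:

```shell
# Exit non-zero if any geo-replication session reports Faulty.
# Usage: gluster volume geo-replication datastore \
#          remote.jamescoyle.net:/gluster/geo-replication/ status | geo_rep_healthy
geo_rep_healthy() {
    if grep -qi 'faulty'; then
        echo "geo-replication is faulty" >&2
        return 1
    fi
    echo "geo-replication looks healthy"
}
```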

You can stop your volume replication at any time by running the stop command.

gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ stop


3 Comments

Rameez

10-May-2014 at 10:25 am

Hello James.
This is really useful. I am trying to implement GFS spread across two datacenters. We will be having two or more servers in each datacenter. In the explanation you have treated only one server in the remote location. How can we avoid a single point of failure with two servers in each location with geo replication.

Thanks in advance.

Rameez

    DaveQB

    21-Oct-2015 at 1:40 am

    I’d say create a standard replication pair in the remote location and then set one of them up as a geo-replication server from the original location, like so:

    Site A
      Server1 \
               }-- replicated ----.
      Server2 /                   |
                               geo-rep
    Site B                        |
      Server3 \                   |
               }-- replicated ----'
      Server4 /

Krishna Verma

23-Aug-2018 at 8:15 pm

Hi,

I have setup geo-replication as per your instructions but after all done status is showing faulty.

[root@gluster-poc-noida ~]# gluster volume geo-replication status

MASTER NODE          MASTER VOL    MASTER BRICK         SLAVE USER    SLAVE                              SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------------
gluster-poc-noida    glusterep     /data/gluster/gv0    root          ssh://gluster-poc-sj::glusterep    N/A           Faulty    N/A             N/A
noi-poc-gluster      glusterep     /data/gluster/gv0    root          ssh://gluster-poc-sj::glusterep    N/A           Faulty    N/A             N/A
[root@gluster-poc-noida ~]#

In the logs it says:

[2018-08-23 18:43:14.989361] W [gsyncd(config-get):293:main] : Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glusterep_gluster-poc-sj_glusterep/gsyncd.conf

[2018-08-23 18:51:31.473921] E [syncdutils(worker /data/gluster/gv0):303:log_raise_exception] : connection to peer is broken
[2018-08-23 18:51:31.475351] E [syncdutils(worker /data/gluster/gv0):749:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-p_jGDC/439fec5c92816a77fe8f02cb94c17fec.sock gluster-poc-sj /nonexistent/gsyncd slave glusterep gluster-poc-sj::glusterep --master-node gluster-poc-noida --master-node-id 098c16c6-8dff-490a-a2e8-c8cb328fcbb3 --master-brick /data/gluster/gv0 --local-node gluster-poc-sj --local-node-id 45e1ba30-19fd-4379-860b-601eef2ef249 --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/local/sbin/ error=1
[2018-08-23 18:51:31.475692] E [syncdutils(worker /data/gluster/gv0):753:logerr] Popen: ssh> failure: execution of "/usr/local/sbin/glusterfs" failed with ENOENT (No such file or directory)

where i am wrong?
