Small Scale Ceph Replicated Storage

Small Scale Ceph Replicated Storage

Category : How-to

Get Social!

I’ve written a few posts about Ceph, how it works and how it’s set up and it mostly revolves around large scale storage for storing things like virtual machines. This post will focus on using Ceph  provide fault tolerant storage for a small amount of data in a low resource environment. Because of this, the main focus has been moved away from performance and switched to:

  • availability – the storage should always be available and recoverable in the event of disaster
  • portability – the storage isn’t tied to a machine and can be moved with relative ease.
  • scalability – more machines can use the storage as required.

This tutorial will focus on a small scale Ceph setup, fit for something like a Raspberry Pi or low resource VPS. We’ll use 3 machines but you could easily add more machines if your scenario requires it.

If you are looking for a larger setup, then see this blog post on installing Ceph.

ceph-local

The above diagram shows the topology of the layout. Each machine will have a file /ceph-file that will be mounted as a block device on /dev/loop0 and that’s the space that will be assigned to Ceph. Ceph will replicate any data stored to the file and ensure the data is available to all Ceph clients. The Ceph storage will be accessed from a mountpoint at /mnt/ha-pool.

Ceph block device

The first step in creating a Ceph storage pool is to set aside some storage that can be used by Ceph. Ceph stores everything twice, by default, so whatever storage you provision will be halved. For this example we’re going to use a file created with dd as the Ceph storage device, however you could use a drive mounted in /dev/ if you have one. A whole drive is by far the preferred solution, however as I’ve stated, the main goal of this post isn’t just performance.

If you’re going to use a file for storage, follow my post on creating a block device from a file and mount it on loop0. Otherwise you can continue to the next step.

OpenVZ: if you’re using Ceph inside of an OpenVZ container, make sure you pass the loop device through to the container.

Installing Ceph

At this point it’s worth noting that Ceph, in addition to the application requirements, will use approximately 1MB of RAM for each GB of storage provisioned. This means that 1TB of provisioned storage (which in today’s world is rather small) would take 1GB of RAM plus the requirements of running the Ceph daemons. For our low memory footprint, only provision the storage that you’ll need.

Before starting the install, you’ll need a couple of things in place:

  • SSH Keys are set up between all nodes in your cluster – see this post for information on how to set up SSH Keys. For security it’s good practice to set up a new user on all machines you’re going to install Ceph onto and use it to run Ceph. The key should also be copied to all machines using the ssh-copy-id command.
  • NTP is set up on all nodes in your cluster to keep the time in sync. You can install it with: apt-get install ntp

The following commands are for installing Ceph on Debian (wheezy) and should be executed on all machines that need to run Ceph. In our example, these commands will be executed on Server 1Server 2 and Server 3.

First let’s add the release key and repositories to the apt package manager. Run the following as root:

wget --no-check-certificate -q -O- 'https://git.ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -
echo deb http://download.ceph.com/debian-firefly/ $(lsb_release -sc) main | tee /etc/apt/sources.list.d/ceph.list

Next let’s update our apt cache and install Ceph and a few other bits.

apt-get update && apt-get install ceph-deploy ceph ceph-common

Setup and configuring for minimal resource requirements

The next step should be done on just one of your Ceph machines. This will create the monitor service and make each machine aware of the other machines running Ceph.

The command references each machine you’re going to be running Ceph on by hostname or DNS entry. Before running the command, make sure that all of your machines resolve via DNS or hosts file. Because I’m only running this in a lab, I’ve used the hosts file route and added an entry to each machine in the hosts file of all Ceph machines.

vi /etc/hosts

Add your Ceph machine IP and hostnames.

10.10.10.1 ceph1
10.10.10.2 ceph2
10.10.10.3 ceph3

You can test that each machine can see the others by using the ping command. If it works then you should be in business!

ping ceph2
ping ceph3

Once you’re happy that all machines can reference the other machines then run the ceph-deploy command:

ceph-deploy new ceph1 ceph2 ceph3

If you haven’t used your ssh keys since setting them up you may be presented with the following warning. Just type yes to continue.

The authenticity of host 'ceph1 (10.10.10.1)' can't be established.
ECDSA key fingerprint is 66:44:a8:90:e2:8e:12:0e:05:4a:c4:93:a1:43:d1:fd.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ceph1' (ECDSA) to the list of known hosts.

We now need to configure Ceph with our low resource settings. These settings are not performance driven, but instead set to minimise system resources.

See ceph.conf for the script and add the content to the ceph.conf file

vi ~/ceph.conf

Create the initial mds daemons, monitor daemons and set the proper permissions on the keyring file.

ceph-deploy mon create-initial
ceph-deploy admin ceph1 ceph2 ceph3
ceph-deploy mds create ceph1 ceph2 ceph3

ssh ceph1 "chmod 644 /etc/ceph/ceph.client.admin.keyring"
ssh ceph2 "chmod 644 /etc/ceph/ceph.client.admin.keyring"
ssh ceph3 "chmod 644 /etc/ceph/ceph.client.admin.keyring"

Test Ceph is deployed and monitors are running

At this point it’s good to take a step back and check everything is up and running. We’ve still not assigned any storage to our Ceph cluster so we can’t run it yet, but we should have the monitor daemons running and the cluster configuration be deployed on all servers.

Run the below command and take a look at the output.

ceph -s

The output should show

cluster 51e1ddff-ff28-4f58-af7e-e94448e5324b
   health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
   monmap e1: 3 mons at {ceph1=10.10.10.1:6789/0,ceph2=10.10.10.2:6789/0,ceph3=10.10.10.3:6789/0}, election epoch 6, quorum 0,1,2 ceph1,ceph2,ceph3
   osdmap e1: 0 osds: 0 up, 0 in
    pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
   mdsmap e8: 1/1/1 up {0=web1=up:active}, 2 up:standby

As you can see, three Ceph servers are referenced on port 6789 which is the monitor daemon port number.

Add storage to the Ceph cluster

We’ve got our Ceph cluster, and we’ve got our storage device that we created as the first step, it’s time to put the two together. Run the below commands on the same machine that you ran the above steps on. You’ll need to replace /dev/sda with the block device on each ceph machines that you’d  like to use. Note that the block device (sda) does not need to be the same on all machines.

ceph-deploy osd create --fs-type ext4 ceph1:/dev/sda
ceph-deploy osd create --fs-type ext4 ceph2:/dev/sda
ceph-deploy osd create --fs-type ext4 ceph3:/dev/sda

Or…

You can use a directory as storage for Ceph, rather than a block device.

If you’re following this tutorial and creating a loop device to use with Ceph then you’ll need to ensure there is a filesystem on the loop0 device and that it’s mounted. You can skip these next step if you are just using an existing directory.

Run the below commands (if you’re using a loop device) on each of the machines that has a loop device you’d like to use. We’re assuming that you’re loop device is loop0. For this example we’ll run it on each of the three machines; ceph1, ceph2 and ceph3.

mkfs.ext4 /dev/loop0
mkdir /mnt/ceph-backing0
echo "/dev/loop0 /mnt/ceph-backing0 ext4 defaults 1 1" >> /etc/fstab
mount /mnt/ceph-backing0

You can use a directory path on the Ceph machine as the OSD device. This may be an option if you’re in an OpenVZ or Docker container that doesn’t allow you to pass through block devices.

ceph-deploy osd prepare ceph1:/mnt/ceph-backing0
ceph-deploy osd prepare ceph2:/mnt/ceph-backing0
ceph-deploy osd prepare ceph3:/mnt/ceph-backing0

And then activate the storage:

ceph-deploy osd activate ceph1:/mnt/ceph-backing0
ceph-deploy osd activate ceph2:/mnt/ceph-backing0
ceph-deploy osd activate ceph3:/mnt/ceph-backing0

Mount a Ceph device as a folder

That’s the server side done! The last step to using our Ceph storage cluster is to mount the cluster to a mountpoint on the local filesystem. Here we’re going to use /mnt/ha-pool as the mount point but you can change that to whatever you’d like. Run these commands on any machines that you’d like to mount the Ceph volume on.

First create the mount point where the Ceph storage will be accessible from.

mkdir /mnt/ha-pool

Then we need to export the key so that the ceph-client can authenticate with the Ceph daemon. You could turn authentication off, or even create a non-admin user secret but for this tutorial we’ll just use the admin user.

ceph-authtool --name client.admin /etc/ceph/ceph.client.admin.keyring --print-key >> /etc/ceph/admin.secret

Then run the below command to add an entry to your fstab file so that the Ceph volume will be automatically mounted on machine start. This will mount the Ceph volume at /mnt/ha-pool.

echo "ceph1,ceph2,ceph3:/ /mnt/ha-pool/ ceph name=admin,secretfile=/etc/ceph/admin.secret,noatime 0 2" >> /etc/fstab

Finally run the mount command

mount /mnt/ha-pool

One last check to make sure you’re up and running:

df -h | grep ha-pool
10.10.10.1,10.10.10.2,10.10.10.3:/                    6G   3G   3G  54% /mnt/ha-pool

And that’s it! You have a working Ceph cluster up and running!


Ceph Storage on Proxmox

Get Social!

ceph-logoCeph is an open source storage platform which is designed for modern storage needs. Ceph is scalable to the exabyte level and designed to have no single points of failure making it ideal for applications which require highly available flexible storage.

Since Proxmox 3.2, Ceph is now supported as both a client and server, the client is for back end storage for VMs and the server for configuring storage devices. This means that a Ceph storage cluster can now be administered through the Proxmox web GUI and therefore can be centrally managed from a single location. In addition, as Proxmox now manages the Ceph server the config can be stored in Proxmox’ shared file system meaning that the configuration is immediately replicated throughout the entire cluster.

The below diagram shows the layout of an example Proxmox cluster with Ceph storage.

  • 2 nodes are used dedicated to running VMs and use the Ceph storage hosted by the other nodes.
  • Two networks are used, one for management and application traffic and one for Ceph traffic only. This helps to maintain sufficient bandwidth for storage requirements without affecting the applications which are hosted by the VMs.

ceph-infrastructure-proxmox

Before getting started with setting up the Ceph cluster, we need to do some preparation work. Make sure the following prerequisites are met before continuing the tutorial.

  • You have Proxmox cluster with the latest packages from the pvetest repository. You must have at least three nodes in your cluster. See How to set up a cluster.
  • SSH Keys are set up between all nodes in your cluster – Proxmox does this automatically as part of clustering but if you are using a different user, you may need to set them up manually.
  • NTP is set up on all nodes in your cluster to keep the time in sync. You can install it with: apt-get install ntp

promox-ceph-3-nodesThe rest of this tutorial will assume that you have three nodes which are all clustered into a single Proxmox cluster. I will refer to three host names which are all resolvable via my LAN DNS server; prox1, prox2 and prox3 which are all on the jamescoyle.net domain. The image to the left is what is displayed in the Proxmox web GUI and details all three nodes in a single Proxmox cluster. Each of these nodes has two disks configured; one which Proxmox is installed onto and provides a small ‘local’ storage device which is displayed in the image to the left and one which is going to be used for the Ceph storage. The below output shows the storage available, which is exactly the same on each host. /dev/vda is the root partition containing the Proxmox install and /dev/vdb is an untouched partition which will be used for Ceph.

root@prox1:~# fdisk -l | grep /dev/vd
Disk /dev/vdb doesn't contain a valid partition table
Disk /dev/mapper/pve-root doesn't contain a valid partition table
Disk /dev/mapper/pve-swap doesn't contain a valid partition table
Disk /dev/mapper/pve-data doesn't contain a valid partition table
Disk /dev/vda: 21.5 GB, 21474836480 bytes
/dev/vda1   *        2048     1048575      523264   83  Linux
/dev/vda2         1048576    41943039    20447232   8e  Linux LVM
Disk /dev/vdb: 107.4 GB, 107374182400 bytes

Now that I’ve set the scene, lets start to put together our Ceph cluster! Before using the new Proxmox web GUI you must run a few SSH commands to set up the initial Ceph instance.

Run the below command on all of the nodes which you will use as a Ceph server. This will download and set up the latest Ceph packages.

pveceph install

Create the Ceph config file by initialising pveceph. The Ceph config file will be created in your /etc/pve/ directory called ceph.conf. You should only run this on one node.

pveceph init --network 192.168.50.0/24

The next step is to set up the Ceph monitors for your cluster. So that you don’t have a single point of failure, you will need at least 3 monitors. You must also have an uneven number of monitors – 3, 5, 7, etc.

pveceph createmon

The rest of the configuration can be completed with the Proxmox web GUI. Log in to your Proxmox web GUI and click on one of your Proxmox nodes on the left hand side, then click the Ceph tab.

proxmox-ceph-status-tab

Next, you will add a disk to the Ceph cluster. Each disk creates to as an OSD in Ceph which is a storage object used later by the Ceph storage pool. Click on the Disks tab at the bottom of the screen and choose the disk you would like to add to the Ceph cluster. Click the Create: OSD button and click Create to create an OSD. Repeat these steps for each Proxmox server which will be used to host storage for the Ceph cluster.

ceph-create-osd

If the Create: OSD button is greyed out, it’s because the disk is not in a state where Ceph can use it. It’s likely because you have partitions on your disk. Run the fdisk command on the disk and press d to delete the partitions and w to save the changes. For example:

fdisk /dev/vdb

The last step in creating the Ceph storage cluster is to create a storage pool. Click the Pools tab and click Create. Enter the below values into the new Create Pool dialogue box:

  • Name: name to use for the storage pool.
  • Size: the number of replicas to use for a working cluster. A replica is the number of times the data is stored across nodes.
  • Min. Size: the minimum of replicas which can be used.
  • Crush RuleSet:
  • pg_num: this is the placement group count which you have to calculate based on the number os OSDs you have. To calculate your placement group count, multiply the amount of OSDs you have by 100 and divide it by the number of number of times each part of data is stored. The default is to store each part of data twice which means that if a disk fails, you won’t loose the data because it’s stored twice.For our example,3 OSDs * 100 = 300
    Divided by replicas, 300 / 2 = 150

ceph-create-pool

The Ceph storage pool is now set up and available to use for your KVM images. You can check the status of the Ceph storage pool by clicking on the Status tab.

ceph-status-screen

See my blog post on mounting Ceph storage on Proxmox.


Create a 3 Node Ceph Storage Cluster

Category : How-to

Get Social!

ceph-logoCeph is an open source storage platform which is designed for modern storage needs. Ceph is scalable to the exabyte level and designed to have no single points of failure making it ideal for applications which require highly available flexible storage.

The below diagram shows the layout of an example 3 node cluster with Ceph storage. Two network interfaces can be used to increase bandwidth and redundancy. This can help to maintain sufficient bandwidth for storage requirements without affecting client applications.

ceph-infrastructure

This example will create a 3 node Ceph cluster with no single point of failure to provide highly redundant storage. I will refer to three host names which are all resolvable via my LAN DNS server; ceph1ceph2 and ceph3 which are all on the jamescoyle.net domain. Each of these nodes has two disks configured; one which runs the Linux OS and one which is going to be used for the Ceph storage. The below output shows the storage available, which is exactly the same on each host. /dev/vda is the root partition containing the OS install and /dev/vdb is an untouched partition which will be used for Ceph.

root@ceph1:~# fdisk -l | grep /dev/vd
Disk /dev/vdb doesn't contain a valid partition table
Disk /dev/mapper/pve-root doesn't contain a valid partition table
Disk /dev/mapper/pve-swap doesn't contain a valid partition table
Disk /dev/mapper/pve-data doesn't contain a valid partition table
Disk /dev/vda: 21.5 GB, 21474836480 bytes
/dev/vda1   *        2048     1048575      523264   83  Linux
/dev/vda2         1048576    41943039    20447232   8e  Linux LVM
Disk /dev/vdb: 107.4 GB, 107374182400 bytes

Before getting started with setting up the Ceph cluster, you need to do some preparation work. Make sure the following prerequisites are met before continuing the tutorial.

You will need to perform the following steps on all nodes in your Ceph cluster. First you will add the Ceph repositories and download the key to make the latest Ceph packages available. Add the following line to a new /etc/apt/sources.list.d/ file.

vi /etc/apt/sources.list.d/ceph.list

Add the below entry, save and close the file.

deb http://ceph.com/debian-dumpling/ wheezy main

Download the key from Ceph’s git page and install it.

wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -

Update all local repository cache.

sudo apt-get update

Note: if you see the below code when running apt-get update then the above wget command has failed – it could be because the Ceph git URL has changed.

W: GPG error: http://ceph.com wheezy Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 7EBFDD5D17ED316D

Run the following commands on just one of your Ceph nodes. I’ll use ceph1 for this example. Update your local package cache and install the ceph-deploy command.

apt-get install ceph-deploy -y

Create the first Ceph storage cluster and specify the first server in your cluster by either hostname or IP address for [SERVER].

ceph-deploy new [SERVER]

For example

ceph-deploy new ceph1.jamescoyle.net

Now deploy Ceph onto all nodes which will be used for the Ceph storage.  Replace the below [SERVER] tags with the host name or IP address of your Ceph cluster including the host you are running the command on. See this post if you get a key error here.

ceph-deploy install [SERVER1] [SERVER2] [SERVER3] [SERVER...]

For example

ceph-deploy install ceph1.jamescoyle.net ceph2.jamescoyle.net ceph3.jamescoyle.net

Install the Ceph monitor and accept the key warning as keys are generated. So that you don’t have a single point of failure, you will need at least 3 monitors. You must also have an uneven number of monitors – 3, 5, 7, etc. Again, you will need to replace the [SERVER] tags with your server names or IP addresses.

ceph-deploy mon create [SERVER1] [SERVER2] [SERVER3] [SERVER...]

Example

ceph-deploy mon create ceph1.jamescoyle.net ceph2.jamescoyle.net ceph3.jamescoyle.net

Now gather the keys of the deployed installation, just on your primary server.

ceph-deploy gatherkeys [SERVER1]

Example

ceph-deploy gatherkeys ceph1.jamescoyle.net

It’s now time to start adding storage to the Ceph cluster. The fdisk output at the top of this page shows that the disk I’m going to use for Ceph is /dev/vdb, which is the same for all the nodes in my cluster. Using Ceph terminology, we will create an OSD based on each disk in the cluster. We could have used a file system location instead of a whole disk but, for this example, we will use a whole disk. Use the below command, changing [SERVER] to the name of the Ceph server which houses the disk and [DISK] to the disk representation in /dev/.

ceph-deploy osd --zap-disk create [SERVER]:[DISK]

For example

ceph-deploy osd --zap-disk create ceph1.jamescoyle.net:vdb

If the command fails, it’s likely because you have partitions on your disk. Run the fdisk command on the disk and press d to delete the partitions and w to save the changes. For example:

fdisk /dev/vdb

Run the osd command for all nodes in your Ceph cluster

ceph-deploy osd --zap-disk create ceph2.jamescoyle.net:vdb
ceph-deploy osd --zap-disk create ceph3.jamescoyle.net:vdb

We now have to calculate the number of placement groups (PG) for our storage pool. A storage pool is a collection of OSDs, 3 in our case, which should each contain around 100 placement groups. Each placement group will hold your client data and map it to an OSD whilst providing redundancy to protect against disk failure.

To calculate your placement group count, multiply the amount of OSDs you have by 100 and divide it by the number of number of times each part of data is stored. The default is to store each part of data twice which means that if a disk fails, you won’t loose the data because it’s stored twice.

For our example,

3 OSDs * 100 = 300
Divided by replicas, 300 / 2 = 150

Now lets create the storage pool! Use the below command and substitute [NAME] with the name to give this storage pool and [PG] with the number we just calculated.

ceph osd pool create [NAME] [PG]

For example

ceph osd pool create datastore 150
pool 'datastore' created

You have now completed the set up for the Ceph storage pool. See my blog post on mounting Ceph storage on Proxmox.


Visit our advertisers

Quick Poll

How many Proxmox servers do you work with?

Visit our advertisers