Ceph Storage on Proxmox

Ceph is an open source storage platform designed for modern storage needs. It scales to the exabyte level and is designed to have no single point of failure, making it ideal for applications which require highly available, flexible storage.

Since Proxmox 3.2, Ceph has been supported as both a client and a server: the client provides back-end storage for VMs, while the server configures the storage devices themselves. This means that a Ceph storage cluster can now be administered through the Proxmox web GUI and therefore managed centrally from a single location. In addition, because Proxmox now manages the Ceph server, the config can be stored in Proxmox's shared file system, meaning that the configuration is immediately replicated throughout the entire cluster.

The below diagram shows the layout of an example Proxmox cluster with Ceph storage.

  • Two nodes are dedicated to running VMs and use the Ceph storage hosted by the other nodes.
  • Two networks are used: one for management and application traffic, and one for Ceph traffic only. This helps maintain sufficient bandwidth for storage without affecting the applications hosted by the VMs.

[Image: ceph-infrastructure-proxmox]

Before getting started with setting up the Ceph cluster, we need to do some preparation work. Make sure the following prerequisites are met before continuing the tutorial.

  • You have a Proxmox cluster with the latest packages from the pvetest repository. You must have at least three nodes in your cluster. See How to set up a cluster.
  • SSH keys are set up between all nodes in your cluster – Proxmox does this automatically as part of clustering, but if you are using a different user you may need to set them up manually.
  • NTP is set up on all nodes in your cluster to keep the time in sync. You can install it with: apt-get install ntp. A quick check of both the cluster and the time sync is sketched below.
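
To sanity-check both prerequisites, you can run the following on any node (a minimal sketch; ntpq comes with the ntp package installed above):

pvecm status   # all cluster nodes should be listed and the cluster quorate
ntpq -p        # at least one NTP peer should be shown as being polled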

[Image: promox-ceph-3-nodes]

The rest of this tutorial will assume that you have three nodes which are all clustered into a single Proxmox cluster. I will refer to three host names which are all resolvable via my LAN DNS server: prox1, prox2 and prox3, all on the jamescoyle.net domain. The image to the left is what is displayed in the Proxmox web GUI and details all three nodes in a single Proxmox cluster. Each of these nodes has two disks configured: one which Proxmox is installed onto, providing the small 'local' storage device shown in the image, and one which is going to be used for the Ceph storage. The output below shows the storage available, which is exactly the same on each host. /dev/vda is the disk containing the Proxmox install and /dev/vdb is an untouched disk which will be used for Ceph.

root@prox1:~# fdisk -l | grep /dev/vd
Disk /dev/vdb doesn't contain a valid partition table
Disk /dev/mapper/pve-root doesn't contain a valid partition table
Disk /dev/mapper/pve-swap doesn't contain a valid partition table
Disk /dev/mapper/pve-data doesn't contain a valid partition table
Disk /dev/vda: 21.5 GB, 21474836480 bytes
/dev/vda1   *        2048     1048575      523264   83  Linux
/dev/vda2         1048576    41943039    20447232   8e  Linux LVM
Disk /dev/vdb: 107.4 GB, 107374182400 bytes

Now that I've set the scene, let's start to put together our Ceph cluster! Before using the new Proxmox web GUI you must run a few SSH commands to set up the initial Ceph instance.

Run the below command on all of the nodes which you will use as a Ceph server. This will download and set up the latest Ceph packages.

pveceph install
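
If you would rather drive this from a single shell, a minimal sketch (assuming the prox1, prox2 and prox3 host names used above and passwordless root SSH between the nodes) is:

for node in prox1 prox2 prox3; do
    ssh root@$node pveceph install
done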

Create the Ceph config file by initialising pveceph. The config file will be created as ceph.conf in your /etc/pve/ directory. You should only run this on one node.

pveceph init --network 192.168.50.0/24
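
Because /etc/pve/ is Proxmox's replicated file system, the generated file immediately appears on every node in the cluster. A quick way to confirm it was written, and that the network you passed was recorded, is simply:

cat /etc/pve/ceph.conf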

The next step is to set up the Ceph monitors for your cluster. So that you don't have a single point of failure, you will need at least three monitors. You must also have an odd number of monitors – 3, 5, 7, etc. Run the command below on each node that should act as a monitor.

pveceph createmon
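
As with the install step, this can be scripted from one shell. The sketch below assumes the same three host names, passwordless root SSH and that all three nodes should act as monitors; the final command checks that they have formed a quorum:

for node in prox1 prox2 prox3; do
    ssh root@$node pveceph createmon
done
ceph mon stat   # should report all three monitors in the quorum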

The rest of the configuration can be completed with the Proxmox web GUI. Log in to your Proxmox web GUI and click on one of your Proxmox nodes on the left hand side, then click the Ceph tab.

[Image: proxmox-ceph-status-tab]

Next, you will add a disk to the Ceph cluster. Each disk is created as an OSD in Ceph, which is a storage unit used later by the Ceph storage pool. Click on the Disks tab at the bottom of the screen and choose the disk you would like to add to the Ceph cluster. Click the Create: OSD button and click Create to create an OSD. Repeat these steps for each Proxmox server which will be used to host storage for the Ceph cluster.

[Image: ceph-create-osd]

If the Create: OSD button is greyed out, it's because the disk is not in a state where Ceph can use it. This is most likely because you have existing partitions on the disk. Run fdisk on the disk, press d to delete each partition and then w to write the changes. For example:

fdisk /dev/vdb
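
If the disk still refuses to become usable – typically because of leftover GPT data structures, as the first comment below illustrates – a more thorough wipe usually helps. This is only a sketch, assuming /dev/vdb is the disk you intend to give to Ceph and that the gdisk package is installed; it destroys everything on the disk:

sgdisk --zap-all /dev/vdb   # remove all GPT and MBR structures (irreversible)

The GUI step itself also has a CLI counterpart if you prefer: depending on your Proxmox version it is pveceph createosd /dev/vdb or pveceph osd create /dev/vdb.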

The last step in creating the Ceph storage cluster is to create a storage pool. Click the Pools tab and click Create. Enter the below values into the new Create Pool dialogue box:

  • Name: the name to use for the storage pool.
  • Size: the number of replicas to keep, i.e. how many times each piece of data is stored across the cluster.
  • Min. Size: the minimum number of replicas that must be available for the pool to continue serving I/O.
  • Crush RuleSet: the CRUSH ruleset to use; the default is fine unless you have defined custom CRUSH rules.
  • pg_num: the placement group count, which you have to calculate based on the number of OSDs you have. To calculate your placement group count, multiply the number of OSDs you have by 100 and divide by the number of replicas (the number of times each piece of data is stored). The default is to store each piece of data twice, which means that if a disk fails you won't lose the data because a second copy exists. For our example: 3 OSDs * 100 = 300, and divided by 2 replicas, 300 / 2 = 150. A CLI equivalent of this step is sketched below the list.
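
If you prefer the shell, the same pool can be created with pveceph. This is only a sketch using the example numbers above: the pool name ceph-storage is made up, and the exact sub-command and option names may differ between Proxmox versions (check pveceph help on yours).

# 3 OSDs * 100 = 300; 300 / 2 replicas = 150 placement groups
pveceph createpool ceph-storage --size 2 --min_size 1 --pg_num 150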

[Image: ceph-create-pool]

The Ceph storage pool is now set up and available to use for your KVM images. You can check the status of the Ceph storage pool by clicking on the Status tab.

[Image: ceph-status-screen]
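
The same health information is available from the shell on any of the Ceph nodes, which is handy for a quick scripted check:

ceph -s   # cluster health, monitor quorum, OSD count and placement group states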

See my blog post on mounting Ceph storage on Proxmox.


14 Comments

mike

4-May-2014 at 9:03 pm

Hi

I followed your article up to the disk creation stage. I have run fdisk and deleted the partitions, however I am still getting:

create OSD on /dev/sdb (xfs)
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
The operation has completed successfully.
INFO:ceph-disk:Will colocate journal with data on /dev/sdb
Information: Moved requested sector from 34 to 2048 in
order to align on 2048-sector boundaries.
Could not create partition 2 from 34 to 10485760
Unable to set partition 2's name to 'ceph journal'!
Could not change partition 2's type code to 45b0969e-9b03-4f30-b4c6-b4b80ceff106!
Error encountered; not saving changes.
ceph-disk: Error: Command '['sgdisk', '--new=2:0:5120M', '--change-name=2:ceph journal', '--partition-guid=2:9cc3668b-95f9-4614-90ee-bbcef97c72d1', '--typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106', '--', '/dev/sdb']' returned non-zero exit status 4
TASK ERROR: command 'ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid 71f3a3d5-3c25-4fbe-a9ea-af52f92ac492 /dev/sdb' failed: exit code 1

Roel Peeters

6-Aug-2014 at 3:16 pm

Hello James,

Thanks for your article about installing Ceph. The diagram shows how the configuration should finally look, so I have some questions:

Do you first need to build a normal PVE cluster over 5 machines and then make a Ceph cluster over 3 of them, so that you can use the other 2 machines for running VMs and those 3 just for storage?

Or should the configuration be done by first adding 2 machines to a PVE cluster, then putting the other 3 machines in a PVE cluster and a Ceph cluster, and after that joining them to the other 2 machines?

I hope you understand my questions, and sorry for the bad English.

Thanks for your help.

Kind Regards Roel

    james.coyle

    6-Aug-2014 at 4:38 pm

    Hi Roel,

    You start out by adding all 5 nodes into a single Proxmox cluster – you can then set up Ceph on the nodes you require and share the storage between all nodes on the cluster.

    Hope that helps.

      Roel Peeters

      11-Aug-2014 at 7:56 am

      Hello James,

      Thanks for your quick reply, that is all I needed to know. I'm going to test Proxmox on my HP servers – thanks for your help.

      Have a nice day.

Andy

29-Oct-2014 at 3:48 pm

Hi James,

Great article. The diagram shows Ceph on a separate network. How do you get Ceph to communicate on a separate network from the management network? Thank you for your help.

Regards,

Andrew

    james.coyle

    29-Oct-2014 at 9:04 pm

    Hi Andy,

    There is no silver bullet here, it really depends on your network and setup.

    The general principle is to have more than one physical NIC in the box, and to have your management traffic go through one NIC and your storage/heavy traffic through another. In simple terms, you'd communicate with Ceph on one IP, and use other services on another IP.

    Again, it’s tough to give any detail without knowledge of your setup or what you’re trying to do.
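
    As a rough sketch only (the addresses below are made up), each node gets one interface per network in /etc/network/interfaces, and the storage subnet is the one you pass to pveceph init:

    # /etc/network/interfaces (excerpt) – example addresses only
    auto eth0
    iface eth0 inet static
        address 192.168.1.10      # management and application traffic
        netmask 255.255.255.0
        gateway 192.168.1.1

    auto eth1
    iface eth1 inet static
        address 192.168.50.10     # dedicated Ceph storage network
        netmask 255.255.255.0

    # then, on one node only:
    # pveceph init --network 192.168.50.0/24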

Tony

25-Dec-2014 at 6:44 pm

Hi,
Thanks for this article!

I’ve got a few questions if possible –

From the diagram it appears that you have two data centers set up, one for Ceph and one for VMs – is this correct, or are you using one giant datacenter with all 5 nodes but only 3 for Ceph?

The three nodes that run Ceph – are they multi-functional in terms of being OSD, admin and management nodes?

Continuing on Andy's question – let's say each Proxmox node has 3 NICs for simplicity: 192.168.1.0, 192.168.2.0 and 192.168.3.0.

1.0 will be the storage net, 2.0 for management and 3.0 for the VMs. How would one force Ceph, configured through Proxmox, to use 1.0 for the storage net?

Last question – is it possible to use this type of config in an HA setup (but with 1 more VM Proxmox node)?

Thanks again!

Ladis

25-Aug-2016 at 1:19 am

Hi
It's a really nice article. Is it possible to create a similar one for Proxmox 4.2?
The new version creates local and local-lvm storage and I don't know how to create the Ceph storage.
Can you help?

Daniele Corsini

13-Jan-2017 at 7:47 am

For Ceph backup see https://github.com/EnterpriseVE/eve4pve-barc

Kevin Gish

27-Feb-2017 at 5:07 am

Are you aware of anyone using rbd-mirror with two or more proxmox/ceph clusters?

    james.coyle

    28-Feb-2017 at 6:34 pm

    No, I've not seen that.

Sameer Mohammed

12-Jun-2017 at 9:48 pm

Hi James,

I am trying to install Ceph. I followed the article on the Proxmox wiki properly and I am all good until I see the error message when I click on Ceph\content in the web UI. The error is "rbd error: rbd: couldn't connect to the cluster! (500)". Please help!

roger

19-Jun-2019 at 7:01 am

Could it be configured LAN to LAN, over a 100 Mb link? Regards.

Tom Zajac

16-Sep-2019 at 11:36 pm

Hi,

Interesting tutorial, however I have trouble understanding the concept of calculating pg_num for my cluster.
I will have 7 nodes and each node will have 8 SSD disks.
Do I calculate pg_num for one node and repeat that process on every node, or should I calculate the pg_num across all nodes (as they say here: https://ceph.com/pgcalc ) and type that pg_num on all of them?
For example, in the first scenario pg_num will be (8*100)/4 = 200 (where 4 is the size of replicas and 8 is the number of SSD drives in one node), so my pg_num will be 200; however in the second scenario the count will be (56*100)/4 = 1400... so the difference is significant.
Could you please help to clarify that?
Thank you!
