
    CSI Common Issues

    Issues when provisioning volumes with the Ceph CSI driver can happen for many reasons such as:

    • Network connectivity between CSI pods and ceph
    • Cluster health issues
    • Slow operations
    • Kubernetes issues
    • Ceph-CSI configuration or bugs

    The following troubleshooting steps can help identify a number of issues.

    Block (RBD)

    If you are mounting block volumes (usually RWO), these are referred to as RBD volumes in Ceph. See the sections below for RBD if you are having block volume issues.

    Shared Filesystem (CephFS)

    If you are mounting shared filesystem volumes (usually RWX), these are referred to as CephFS volumes in Ceph. See the sections below for CephFS if you are having filesystem volume issues.

    Network Connectivity

    The Ceph monitors are the most critical component of the cluster to check first. Retrieve the mon endpoints from the services:

    kubectl -n rook-ceph get svc -l app=rook-ceph-mon
    
    NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
    rook-ceph-mon-a   ClusterIP   10.104.165.31   <none>        6789/TCP,3300/TCP   18h
    rook-ceph-mon-b   ClusterIP   10.97.244.93    <none>        6789/TCP,3300/TCP   21s
    rook-ceph-mon-c   ClusterIP   10.99.248.163   <none>        6789/TCP,3300/TCP   8s
    

    If host networking is enabled in the CephCluster CR, you will instead need to find the node IPs for the hosts where the mons are running.
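
    For example, the mon pods and the nodes they are running on can be listed as follows (assuming the mon pods carry the same app=rook-ceph-mon label as the services):

    kubectl -n rook-ceph get pod -l app=rook-ceph-mon -o wide
    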

    The clusterIP is the mon IP and 3300 is the port that will be used by Ceph-CSI to connect to the ceph cluster. These endpoints must be accessible by all clients in the cluster, including the CSI driver.

    If you are seeing issues provisioning the PVC, then you need to check the network connectivity from the provisioner pods:

    • For CephFS PVCs, check network connectivity from the csi-cephfsplugin container of the csi-cephfsplugin-provisioner pods
    • For Block PVCs, check network connectivity from the csi-rbdplugin container of the csi-rbdplugin-provisioner pods

    For redundancy, there are two provisioner pods for each type. Make sure to test connectivity from all provisioner pods.

    Connect to the provisioner pods and verify the connection to the mon endpoints such as the following:

    # Connect to the csi-cephfsplugin container in the provisioner pod
    kubectl -n rook-ceph exec -ti deploy/csi-cephfsplugin-provisioner -c csi-cephfsplugin -- bash
    
    # Test the network connection to the mon endpoint
    curl 10.104.165.31:3300 2>/dev/null
    ceph v2
    

    If you see the response “ceph v2”, the connection succeeded. If there is no response then there is a network issue connecting to the ceph cluster.

    Check network connectivity for all monitor IPs and ports that are passed to ceph-csi.
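
    To test all of the mon endpoints in one pass, a small loop such as the following can be run from the provisioner container (a sketch; substitute the monitor IPs and port from your own cluster):

    for mon in 10.104.165.31 10.97.244.93 10.99.248.163; do
        echo "--- ${mon}:3300 ---"
        curl --max-time 5 ${mon}:3300 2>/dev/null
        echo ""
    done
    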

    Ceph Health

    Sometimes an unhealthy Ceph cluster can contribute to the issues in creating or mounting the PVC. Check that your Ceph cluster is healthy by connecting to the Toolbox and running the ceph commands:

    ceph health detail
    
    HEALTH_OK
    

    Slow Operations

    Slow ops in the Ceph cluster can also contribute to these issues. In the toolbox, make sure that no slow ops are present and that the Ceph cluster is healthy:

    ceph -s
    
    cluster:
     id:     ba41ac93-3b55-4f32-9e06-d3d8c6ff7334
     health: HEALTH_WARN
             30 slow ops, oldest one blocked for 10624 sec, mon.a has slow ops
    

    If Ceph is not healthy, check the following for more clues:

    • The Ceph monitor logs for errors
    • The OSD logs for errors
    • Disk Health
    • Network Health

    Ceph Troubleshooting

    Check if the RBD Pool exists

    Make sure the pool you have specified in the storageclass.yaml exists in the ceph cluster.

    Suppose the pool name mentioned in the storageclass.yaml is replicapool. It can be verified to exist in the toolbox:

    ceph osd lspools
    
    1 device_health_metrics
    2 replicapool
    

    If the pool is not in the list, create the CephBlockPool CR for the pool if you have not already. If you have already created the pool, check the Rook operator log for errors creating the pool.
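
    For example, the CephBlockPool CR and the operator log can be checked as follows (a sketch, assuming the default rook-ceph namespace and the rook-ceph-operator deployment name):

    # Verify the CephBlockPool CR exists
    kubectl -n rook-ceph get cephblockpool
    
    # Check the operator log for errors related to pool creation
    kubectl -n rook-ceph logs deploy/rook-ceph-operator | grep -i pool
    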

    Check if the Filesystem exists

    For the shared filesystem (CephFS), check that the filesystem and pools you have specified in the storageclass.yaml exist in the Ceph cluster.

    Suppose the fsName mentioned in the storageclass.yaml is myfs. It can be verified in the toolbox:

    ceph fs ls
    
    name: myfs, metadata pool: myfs-metadata, data pools: [myfs-data0 ]
    

    Now verify the pool mentioned in the storageclass.yaml exists, such as the example myfs-data0.

    ceph osd lspools
    
    1 device_health_metrics
    2 replicapool
    3 myfs-metadata
    4 myfs-data0
    

    The data pool created by the CephFilesystem CR will have the suffix -data0 appended to the filesystem name.
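
    The CephFilesystem CR itself can also be listed to confirm it exists (assuming the default rook-ceph namespace):

    kubectl -n rook-ceph get cephfilesystem
    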

    subvolumegroups

    If the subvolumegroup is not specified in the ceph-csi configmap (where you have passed the ceph monitor information), Ceph-CSI creates the default subvolumegroup with the name csi. Verify that the subvolumegroup exists:

    ceph fs subvolumegroup ls myfs
    
    [
       {
           "name": "csi"
       }
    ]
    

    If you don’t see any issues with your Ceph cluster, the following sections will start debugging the issue from the CSI side.

    Provisioning Volumes

    At times the issue can also exist in Ceph-CSI itself or in the sidecar containers used by Ceph-CSI.

    Ceph-CSI includes a number of sidecar containers in the provisioner pods, such as: csi-attacher, csi-resizer, csi-provisioner, csi-cephfsplugin, csi-snapshotter, and liveness-prometheus.

    In the CephFS provisioner pod, the core CSI driver container is named csi-cephfsplugin. In the RBD (Block) provisioner pod, the core driver container is named csi-rbdplugin.

    Here is a summary of the sidecar containers:

    csi-provisioner

    The external-provisioner is a sidecar container that dynamically provisions volumes by calling ControllerCreateVolume() and ControllerDeleteVolume() functions of CSI drivers. More details about external-provisioner can be found here.

    If there is an issue with PVC Create or Delete, check the logs of the csi-provisioner sidecar container.

    kubectl -n rook-ceph logs deploy/csi-rbdplugin-provisioner -c csi-provisioner
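
    # For CephFS PVCs, the equivalent command against the CephFS provisioner deployment is:
    kubectl -n rook-ceph logs deploy/csi-cephfsplugin-provisioner -c csi-provisioner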
    

    csi-resizer

    The CSI external-resizer is a sidecar container that watches the Kubernetes API server for PersistentVolumeClaim updates and triggers ControllerExpandVolume operations against a CSI endpoint if the user requested more storage on the PersistentVolumeClaim object. More details about the external-resizer can be found here.

    If any issue exists in PVC expansion you can check the logs of the csi-resizer sidecar container.

    kubectl -n rook-ceph logs deploy/csi-rbdplugin-provisioner -c csi-resizer
    

    csi-snapshotter

    The CSI external-snapshotter sidecar only watches for VolumeSnapshotContent create/update/delete events. It will talk to ceph-csi containers to create or delete snapshots. More details about external-snapshotter can be found here.

    In Kubernetes 1.17 the volume snapshot feature was promoted to beta. In Kubernetes 1.20, the feature gate is enabled by default on standard Kubernetes deployments and cannot be turned off.

    Make sure you have installed the correct snapshotter CRD version. If you have not installed the snapshotter controller, see the Snapshots guide.

    kubectl get crd | grep snapshot
    
    volumesnapshotclasses.snapshot.storage.k8s.io    2021-01-25T11:19:38Z
    volumesnapshotcontents.snapshot.storage.k8s.io   2021-01-25T11:19:39Z
    volumesnapshots.snapshot.storage.k8s.io          2021-01-25T11:19:40Z
    

    The above CRDs must have the matching version in your snapshotclass.yaml or snapshot.yaml. Otherwise, the VolumeSnapshot and VolumeSnapshotContent will not be created.
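
    One way to check which API versions the snapshot CRDs serve, so they can be compared against the snapshotclass.yaml, is a quick jsonpath query (a sketch):

    kubectl get crd volumesnapshots.snapshot.storage.k8s.io -o jsonpath='{.spec.versions[*].name}{"\n"}'
    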

    The snapshot controller is responsible for creating both the VolumeSnapshot and VolumeSnapshotContent objects. If the objects are not being created, check the logs of the snapshot-controller container.
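
    For example, the snapshot-controller logs can be checked with a command like the following. The namespace and deployment name depend on how the controller was installed; kube-system and snapshot-controller are common defaults and are assumed here:

    kubectl -n kube-system logs deploy/snapshot-controller
    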

    Rook only installs the snapshotter sidecar container, not the controller. It is recommended that Kubernetes distributors bundle and deploy the controller and CRDs as part of their Kubernetes cluster management process (independent of any CSI Driver).

    If your Kubernetes distribution does not bundle the snapshot controller, you may manually install these components.

    If any issue exists in the snapshot Create/Delete operation you can check the logs of the csi-snapshotter sidecar container.

    kubectl -n rook-ceph logs deploy/csi-rbdplugin-provisioner -c csi-snapshotter
    

    If you see an error such as:

    GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID
    0001-0009-rook-ceph-0000000000000001-8d0ba728-0e17-11eb-a680-ce6eecc894de already exists.
    

    The issue typically is in the Ceph cluster or network connectivity. If the issue is in provisioning the PVC, restarting the provisioner pods can help: for CephFS, restart the csi-cephfsplugin-provisioner-xxxxxx pod; for RBD, restart the csi-rbdplugin-provisioner-xxxxxx pod. If the issue is in mounting the PVC, restart the csi-rbdplugin-xxxxx pod (for RBD) or the csi-cephfsplugin-xxxxx pod (for CephFS).
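
    One way to restart these pods is a rollout restart of the corresponding deployments and daemonsets (a sketch, assuming the default resource names from the Rook install):

    # Provisioning issues: restart the provisioner deployments
    kubectl -n rook-ceph rollout restart deploy/csi-rbdplugin-provisioner
    kubectl -n rook-ceph rollout restart deploy/csi-cephfsplugin-provisioner
    
    # Mounting issues: restart the plugin daemonsets
    kubectl -n rook-ceph rollout restart ds/csi-rbdplugin
    kubectl -n rook-ceph rollout restart ds/csi-cephfsplugin
    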

    Mounting the volume to application pods

    When a user creates an application pod that uses a PVC, there is a three-step process:

    • CSI driver registration
    • Create volume attachment object
    • Stage and publish the volume

    csi-driver registration

    csi-cephfsplugin-xxxx or csi-rbdplugin-xxxx is a daemonset pod that runs on all the nodes where your application can be scheduled. If the plugin pods are not running on the node where your application is scheduled, that can cause the issue, so make sure the plugin pods are always running.
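
    For example, the plugin pods and the nodes they are running on can be listed as follows (assuming the default app labels on the daemonset pods):

    kubectl -n rook-ceph get pod -l app=csi-rbdplugin -o wide
    kubectl -n rook-ceph get pod -l app=csi-cephfsplugin -o wide
    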

    Each plugin pod has two important containers: driver-registrar and csi-rbdplugin (or csi-cephfsplugin). Some deployments also include a liveness-prometheus container.

    driver-registrar

    The node-driver-registrar is a sidecar container that registers the CSI driver with Kubelet. More details can be found here.

    If any issue exists in attaching the PVC to the application pod, check the logs from the driver-registrar sidecar container in the plugin pod on the node where your application pod is scheduled.

    kubectl -n rook-ceph logs ds/csi-rbdplugin -c driver-registrar
    
    I0120 12:28:34.231761  124018 main.go:112] Version: v2.0.1
    I0120 12:28:34.233910  124018 connection.go:151] Connecting to unix:///csi/csi.sock
    I0120 12:28:35.242469  124018 node_register.go:55] Starting Registration Server at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
    I0120 12:28:35.243364  124018 node_register.go:64] Registration Server started at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
    I0120 12:28:35.243673  124018 node_register.go:86] Skipping healthz server because port set to: 0
    I0120 12:28:36.318482  124018 main.go:79] Received GetInfo call: &InfoRequest{}
    I0120 12:28:37.455211  124018 main.go:89] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
    E0121 05:19:28.658390  124018 connection.go:129] Lost connection to unix:///csi/csi.sock.
    E0125 07:11:42.926133  124018 connection.go:129] Lost connection to unix:///csi/csi.sock.
    

    You should see the response RegistrationStatus{PluginRegistered:true,Error:,} in the logs to confirm that the plugin is registered with kubelet.

    If you see a "driver not found" error in the application pod describe output, restarting the csi-xxxxplugin-xxx pod on the node may help.
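
    For example, to find and restart the plugin pods on a specific node (the node name and pod name are placeholders):

    # Find the CSI plugin pods scheduled on the node
    kubectl -n rook-ceph get pod -o wide --field-selector spec.nodeName=<NODE_NAME> | grep csi
    
    # Delete the pod so the daemonset recreates it
    kubectl -n rook-ceph delete pod <CSI_PLUGIN_POD_NAME>
    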

    Volume Attachment

    Each provisioner pod also has a sidecar container called csi-attacher.

    csi-attacher

    The external-attacher is a sidecar container that attaches volumes to nodes by calling ControllerPublish and ControllerUnpublish functions of CSI drivers. It is necessary because the internal Attach/Detach controller running in Kubernetes controller-manager does not have any direct interfaces to CSI drivers. More details can be found here.

    If any issue exists in attaching the PVC to the application pod, first check the VolumeAttachment object that was created, and also check the logs from the csi-attacher sidecar container in the provisioner pod.

    kubectl get volumeattachment
    
    NAME                                                                   ATTACHER                        PV                                         NODE       ATTACHED   AGE
    csi-75903d8a902744853900d188f12137ea1cafb6c6f922ebc1c116fd58e950fc92   rook-ceph.cephfs.csi.ceph.com   pvc-5c547d2a-fdb8-4cb2-b7fe-e0f30b88d454   minikube   true       4m26s
    
    kubectl -n rook-ceph logs po/csi-rbdplugin-provisioner-d857bfb5f-ddctl -c csi-attacher
    

    CephFS Stale operations

    Check for any stale mount commands on the csi-cephfsplugin-xxxx pod on the node where your application pod is scheduled.

    You need to exec into the csi-cephfsplugin-xxxx pod and grep for stale mount operations.

    Identify the csi-cephfsplugin-xxxx pod running on the node where your application is scheduled with kubectl get po -o wide and match the node names.

    kubectl exec -it csi-cephfsplugin-tfk2g -c csi-cephfsplugin -- sh
    ps -ef |grep mount
    
    root          67      60  0 11:55 pts/0    00:00:00 grep mount
    
    ps -ef |grep ceph
    
    root           1       0  0 Jan20 ?        00:00:26 /usr/local/bin/cephcsi --nodeid=minikube --type=cephfs --endpoint=unix:///csi/csi.sock --v=0 --nodeserver=true --drivername=rook-ceph.cephfs.csi.ceph.com --pidlimit=-1 --metricsport=9091 --forcecephkernelclient=true --metricspath=/metrics --enablegrpcmetrics=true
    root          69      60  0 11:55 pts/0    00:00:00 grep ceph
    

    If any commands are stuck, check the dmesg logs from the node. Restarting the csi-cephfsplugin pod may also help.

    If you don’t see any stuck messages, confirm the network connectivity, Ceph health, and slow ops.

    RBD Stale operations

    Check for any stale map/mkfs/mount commands on the csi-rbdplugin-xxxx pod on the node where your application pod is scheduled.

    You need to exec into the csi-rbdplugin-xxxx pod and grep for stale operations such as rbd map, rbd unmap, mkfs, mount, and umount.

    Identify the csi-rbdplugin-xxxx pod running on the node where your application is scheduled with kubectl get po -o wide and match the node names.

    kubectl exec -it csi-rbdplugin-vh8d5 -c csi-rbdplugin -- sh
    
    ps -ef |grep map
    
    root     1297024 1296907  0 12:00 pts/0    00:00:00 grep map
    
    ps -ef |grep mount
    
    root        1824       1  0 Jan19 ?        00:00:00 /usr/sbin/rpc.mountd
    ceph     1041020 1040955  1 07:11 ?        00:03:43 ceph-mgr --fsid=ba41ac93-3b55-4f32-9e06-d3d8c6ff7334 --keyring=/etc/ceph/keyring-store/keyring --log-to-stderr=true --err-to-stderr=true --mon-cluster-log-to-stderr=true --log-stderr-prefix=debug  --default-log-to-file=false --default-mon-cluster-log-to-file=false --mon-host=[v2:10.111.136.166:3300,v1:10.111.136.166:6789] --mon-initial-members=a --id=a --setuser=ceph --setgroup=ceph --client-mount-uid=0 --client-mount-gid=0 --foreground --public-addr=172.17.0.6
    root     1297115 1296907  0 12:00 pts/0    00:00:00 grep mount
    
    ps -ef |grep mkfs
    
    root     1297291 1296907  0 12:00 pts/0    00:00:00 grep mkfs
    
    ps -ef |grep umount
    
    root     1298500 1296907  0 12:01 pts/0    00:00:00 grep umount
    
    ps -ef |grep unmap
    
    root     1298578 1296907  0 12:01 pts/0    00:00:00 grep unmap
    

    If any commands are stuck, check the dmesg logs from the node. Restarting the csi-rbdplugin pod may also help.

    If you don’t see any stuck messages, confirm the network connectivity, Ceph health, and slow ops.

    dmesg logs

    Check the dmesg logs on the node where PVC mounting is failing, or in the csi-rbdplugin container of the csi-rbdplugin-xxxx pod on that node.

    dmesg
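
    # Optionally filter for Ceph/RBD and filesystem related kernel messages (adjust the patterns as needed)
    dmesg -T | grep -i -E 'libceph|rbd|ceph|xfs|ext4'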
    

    RBD Commands

    If nothing else helps, get the last executed command from the ceph-csi pod logs and run it manually inside the provisioner or plugin pod to see if there are errors returned even if they couldn’t be seen in the logs.

    $ rbd ls --id=csi-rbd-node -m=10.111.136.166:6789 --key=AQDpIQhg+v83EhAAgLboWIbl+FL/nThJzoI3Fg==
    

    Where -m is one of the mon endpoints and the --key is the key used by the CSI driver for accessing the Ceph cluster.
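
    If needed, the key for the CSI user can be retrieved from the toolbox with the standard ceph auth command (the client name csi-rbd-node matches the --id shown above):

    ceph auth get-key client.csi-rbd-node
    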

    Node Loss

    When a node is lost, you will see application pods on the node stuck in the Terminating state while another pod is rescheduled and is in the ContainerCreating state.

    To allow the application pod to start on another node, force delete the pod.

    Force deleting the pod

    To force delete the pod stuck in the Terminating state:

    $ kubectl -n rook-ceph delete pod my-app-69cd495f9b-nl6hf --grace-period 0 --force
    

    After the force delete, wait for a timeout of about 8-10 minutes. If the pod is still not in the Running state, continue with the next section to blocklist the node.

    Blocklisting a node

    To shorten the timeout, you can mark the node as "blocklisted" from the Rook toolbox so Rook can safely fail over the pod sooner.

    If the Ceph version is at least Pacific (v16.2.x), run the following command:

    $ ceph osd blocklist add <NODE_IP> # get the node IP you want to blocklist
    blocklisting <NODE_IP>
    

    If the Ceph version is Octopus (v15.2.x) or older, run the following command:

    $ ceph osd blacklist add <NODE_IP> # get the node IP you want to blacklist
    blacklisting <NODE_IP>
    

    Within a few minutes of running the above command, the pod will be running.

    Removing a node blocklist

    After you are absolutely sure the node is permanently offline and that the node no longer needs to be blocklisted, remove the node from the blocklist.

    If the Ceph version is at least Pacific (v16.2.x), run:

    $ ceph osd blocklist rm <NODE_IP>
    un-blocklisting <NODE_IP>
    

    If the Ceph version is Octopus (v15.2.x) or older, run:

    $ ceph osd blacklist rm <NODE_IP>
    un-blacklisting <NODE_IP>