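Because I did not understand ceph-mon very well, I ended up doing the following:
- All OpenStack storage is backed by Ceph.
- The Ceph storage nodes are deployed separately.
- There were two ceph-mon nodes (mon-1, mon-2), and I wanted to reinstall mon-1. But without mon-1, mon-2 seemed to think there was a split brain and stopped serving requests, so I looked for a temporary mon to fill in.
- I easily got a machine from the OpenStack cluster and quickly installed mon-3 on it. Because of the security group (port 6789 was not open), mon-3 could find mon-2 and showed up in ceph -s, but mon-1 could not connect to mon-3, so mon-3 never fully joined. After I adjusted the security group, everything seemed normal. Then, once mon-1 was removed, ceph -s hung, and the yum command that was running on mon-3 hung as well.
- On inspection, mon-2 was in the probing state and mon-3 was in the reop state; there was no leader.
- mon-3 runs on an OpenStack machine whose storage depends on Ceph, and with no leader in Ceph, mon-3 could not persist its data.
- If mon-3 cannot persist its data, mon-2 and mon-3 cannot elect a leader.
- So things reached a stalemate: a deadlock.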
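- Analysis:
  - Every mon keeps a db that holds the monmap. On startup, the mon joins the cluster according to the monmap. If the monmap contains only the mon itself, it simply starts; if it lists several mon nodes and there is currently no leader, an election is required.
  - So if the monmap can be edited so that it lists only this mon, the mon will start normally.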
- Solution (reference: http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap ):
  - Stop mon-2; the monmap db cannot be modified while the mon is running.
  - Export the monmap:
    ceph-mon -i ID-FOO --extract-monmap /tmp/monmap
  - View the monmap:
    monmaptool --print -f /tmp/monmap
  - Remove mon-3:
    monmaptool --rm mon-3 -f /tmp/monmap
  - Inject the monmap:
    ceph-mon -i ID --inject-monmap /tmp/monmap
  - Start mon-2.
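The steps above can be strung together as a small script. This is only a sketch under assumptions that are not in the original notes: the monitor IDs are literally `mon-2` and `mon-3`, the monitor runs under a systemd unit named `ceph-mon@<id>`, and the `run` helper, the `DRY_RUN` switch and the extra backup copy are hypothetical additions for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. MON_ID, DEAD_MON, MONMAP and the systemd unit name
# "ceph-mon@<id>" are assumptions; adjust them to the real cluster.
MON_ID="${MON_ID:-mon-2}"        # surviving monitor whose monmap is edited
DEAD_MON="${DEAD_MON:-mon-3}"    # monitor to drop from the monmap
MONMAP="${MONMAP:-/tmp/monmap}"  # scratch file for the extracted map
DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded.
recover_monmap() {
    run systemctl stop "ceph-mon@${MON_ID}" &&                 # mon must be stopped first
    run ceph-mon -i "$MON_ID" --extract-monmap "$MONMAP" &&    # export the monmap
    run cp "$MONMAP" "${MONMAP}.orig" &&                       # keep an untouched copy
    run monmaptool --print "$MONMAP" &&                        # inspect it
    run monmaptool --rm "$DEAD_MON" "$MONMAP" &&               # drop the stuck monitor
    run ceph-mon -i "$MON_ID" --inject-monmap "$MONMAP" &&     # write the edited map back
    run systemctl start "ceph-mon@${MON_ID}"                   # start the monitor again
}

recover_monmap
```

Running it as-is prints the seven commands for review; setting `DRY_RUN=0` on the monitor host would execute them. Keeping the `.orig` copy is a precaution, since injecting overwrites the monmap the monitor currently holds.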
Inject a monmap into the monitor
Usually the safest path. You should grab the monmap from the remaining monitors and inject it into the monitor with the corrupted/lost monmap.
These are the basic steps:
- Is there a formed quorum? If so, grab the monmap from the quorum:
  $ ceph mon getmap -o /tmp/monmap
- No quorum? Grab the monmap directly from another monitor (this assumes the monitor you’re grabbing the monmap from has id ID-FOO and has been stopped):
  $ ceph-mon -i ID-FOO --extract-monmap /tmp/monmap
- Stop the monitor you’re going to inject the monmap into.
- Inject the monmap:
  $ ceph-mon -i ID --inject-monmap /tmp/monmap
- Start the monitor.
Please keep in mind that the ability to inject monmaps is a powerful feature that can cause havoc with your monitors if misused as it will overwrite the latest, existing monmap kept by the monitor.
What if the state is probing?
This means the monitor is still looking for the other monitors. Every time you start a monitor, the monitor will stay in this state for some time while trying to find the rest of the monitors specified in the monmap. The time a monitor will spend in this state can vary. For instance, when on a single-monitor cluster, the monitor will pass through the probing state almost instantaneously, since there are no other monitors around. On a multi-monitor cluster, the monitors will stay in this state until they find enough monitors to form a quorum – this means that if you have 2 out of 3 monitors down, the one remaining monitor will stay in this state indefinitely until you bring one of the other monitors up.
If you have a quorum, however, the monitor should be able to find the remaining monitors pretty fast, as long as they can be reached. If your monitor is stuck probing and you have gone through with all the communication troubleshooting, then there is a fair chance that the monitor is trying to reach the other monitors on a wrong address. mon_status outputs the monmap known to the monitor: check if the other monitor’s locations match reality. If they don’t, jump to Recovering a Monitor’s Broken monmap; if they do, then it may be related to severe clock skews amongst the monitor nodes and you should refer to Clock Skews first, but if that doesn’t solve your problem then it is the time to prepare some logs and reach out to the community (please refer to Preparing your logs on how to best prepare your logs).
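The quorum rule behind the probing behavior comes down to simple arithmetic: a quorum needs a strict majority of the monitors listed in the monmap. The sketch below only illustrates that rule. The function names `quorum_size` and `can_form_quorum` are made up for this example and are not Ceph commands.

```shell
#!/bin/sh
# Sketch: a quorum needs a strict majority of the monitors in the monmap.

# Smallest number of monitors that forms a majority of $1 monitors.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit status 0) when $2 reachable monitors, out of $1 listed
# in the monmap, are enough to form a quorum.
can_form_quorum() {
    [ "$2" -ge "$(quorum_size "$1")" ]
}

# Walk through a few "monitors in monmap / monitors up" combinations.
for pair in "1 1" "2 1" "2 2" "3 1" "3 2"; do
    set -- $pair
    if can_form_quorum "$1" "$2"; then
        verdict="quorum possible"
    else
        verdict="no quorum"
    fi
    echo "monmap=$1 up=$2 -> $verdict"
done
```

With two monitors in the monmap the majority is two, so losing either one leaves the survivor without a quorum. A monmap that lists only one monitor needs only that monitor, which is why trimming the monmap down to a single entry lets it start on its own.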