Ceph: Notes on Ceph Practice Exercises

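For every person there is only one true duty: to find oneself, and then to hold to it in one's heart for a lifetime, wholeheartedly and without ceasing. All other paths are incomplete; they are ways of escape, a cowardly return to the ideals of the masses, drifting with the current, a fear of one's own inner self. (Hermann Hesse, "Demian")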

Preface

  • My understanding is limited, so corrections are welcome.


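1. Deploy Ceph

Container images are stored at: registry.lab.example.com
Registry account (username/password): registry/redhat

  • Deploy the Ceph cluster on the serverc, serverd, servere, and clienta nodes
  • serverc.lab.example.com and clienta.lab.example.com are the Ceph admin nodes
  • The 3 storage nodes use /dev/vdb, /dev/vdc, and /dev/vdd as OSD disks
  • The Dashboard administrator password is redhat
  • Install and configure the services required by the other exercises

The cephadm package is already installed on the serverc node.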

[root@serverc ~]#

# DEPLOY
# cephadm bootstrap -h
# --allow-fqdn-hostname allow hostname that is fully-qualified (contains ".")
cephadm bootstrap \
--mon-ip 172.25.250.12 \
--initial-dashboard-password redhat \
--dashboard-password-noupdate \
--allow-fqdn-hostname \
--registry-url registry.lab.example.com \
--registry-username registry \
--registry-password redhat
# Enabling password-less SSH
# -f: force
ssh-copy-id -f -i /etc/ceph/ceph.pub root@serverd
ssh-copy-id -f -i /etc/ceph/ceph.pub root@servere
ssh-copy-id -f -i /etc/ceph/ceph.pub root@clienta
# install software
yum provides ceph

yum -y install ceph-common

# Enable <Tab> completion: source the file now, or log out and log back in
source /etc/bash_completion.d/ceph
# Add a host
ceph orch host add serverd.lab.example.com 172.25.250.13
ceph orch host add servere.lab.example.com 172.25.250.14
ceph orch host add clienta.lab.example.com 172.25.250.10

ceph orch host ls
# Add a host label
ceph orch host label add clienta.lab.example.com _admin
ceph orch host label add serverc.lab.example.com _admin

ceph orch host ls
# ceph orch apply -h
ceph orch apply mon \
serverc.lab.example.com,serverd.lab.example.com,servere.lab.example.com

ceph orch ls mon

# ceph orch apply -h
ceph orch apply mgr \
serverc.lab.example.com,serverd.lab.example.com,servere.lab.example.com

ceph orch ls mgr

# ceph orch daemon add osd -h
for i in server{c..e}; do
ceph orch daemon add osd $i.lab.example.com:/dev/vdb
ceph orch daemon add osd $i.lab.example.com:/dev/vdc
ceph orch daemon add osd $i.lab.example.com:/dev/vdd
done

ceph device ls
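
The three ceph orch daemon add osd lines per host can also be generated from two lists. The following is a minimal sketch, assuming the same host and device names as above; osd_add_commands is a hypothetical helper name, and the function only prints the commands so they can be reviewed before anything runs.

```shell
# Sketch: build the nine "ceph orch daemon add osd" commands from two lists.
# osd_add_commands is a hypothetical helper, not part of Ceph.
osd_add_commands() {
    for host in serverc serverd servere; do
        for dev in /dev/vdb /dev/vdc /dev/vdd; do
            echo "ceph orch daemon add osd ${host}.lab.example.com:${dev}"
        done
    done
}

# Dry run: print the commands for review; nothing is executed here.
osd_add_commands
```

Once the printed list looks right, it could be executed with osd_add_commands | sh.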

[root@clienta ~]#

# Install the ceph CLI and enable tab completion
yum -y install ceph-common
source /etc/bash_completion.d/ceph

scp root@serverc:/etc/ceph/*.{keyring,conf} /etc/ceph

ceph health

2. Ceph health status

  • The Ceph health status should be: HEALTH_OK

[root@serverc ~]#

ceph health
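
Right after deployment the cluster may need some time before it reports HEALTH_OK, so a single ceph health call can be misleading. Below is a minimal polling sketch; wait_for_health_ok is a hypothetical helper name, and the default retry count and interval are assumptions, not values from the exercise.

```shell
# Sketch: poll "ceph health" until it reports HEALTH_OK or retries run out.
# wait_for_health_ok is a hypothetical helper; the defaults are assumptions.
# Usage: wait_for_health_ok [retries] [interval_seconds]
wait_for_health_ok() {
    retries=${1:-30}
    interval=${2:-10}
    i=0
    while [ "$i" -lt "$retries" ]; do
        status=$(ceph health 2>/dev/null)
        case "$status" in
            HEALTH_OK*)
                echo "cluster is healthy"
                return 0
                ;;
        esac
        i=$((i + 1))
        echo "attempt ${i}: ${status:-no response}"
        sleep "$interval"
    done
    return 1
}
```

For example, wait_for_health_ok 60 5 would check every 5 seconds for up to 5 minutes and return a non-zero status if the cluster never becomes healthy.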

3. Configure Ceph

Pools in Ceph must be allowed to be deleted

[root@serverc ~]#

ceph config ls | grep 'allow.*del'

# ceph config set -h
ceph config set mon mon_allow_pool_delete true

ceph config get mon mon_allow_pool_delete
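
To make the verification step scriptable, the setting can be read back and compared. This is a small sketch, assuming that ceph config get mon mon_allow_pool_delete prints the bare value (true or false); check_pool_delete_allowed is a hypothetical helper name.

```shell
# Sketch: succeed only when the monitors allow pool deletion.
# check_pool_delete_allowed is a hypothetical helper, not a Ceph command.
check_pool_delete_allowed() {
    value=$(ceph config get mon mon_allow_pool_delete)
    if [ "$value" = "true" ]; then
        echo "pool deletion is allowed"
        return 0
    fi
    echo "pool deletion is NOT allowed (mon_allow_pool_delete=${value})" >&2
    return 1
}
```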

4. Configure the Ceph dashboard

Hint
This was already completed in exercise 1; only verification is needed here.
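
One way to verify is to ask the manager which service endpoints it publishes. The sketch below assumes that ceph mgr services prints JSON containing a "dashboard" key with the URL; dashboard_url is a hypothetical helper name.

```shell
# Sketch: extract the dashboard URL from "ceph mgr services".
# Assumes JSON output with a "dashboard" key; dashboard_url is hypothetical.
dashboard_url() {
    ceph mgr services | sed -n 's/.*"dashboard": *"\([^"]*\)".*/\1/p'
}
```

If the function prints a URL, the dashboard module is enabled, and logging in there as admin with the password redhat from exercise 1 completes the check.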

Troubleshooting problems encountered

[root@clienta ~]# ceph health
HEALTH_WARN 6 stray daemon(s) not managed by cephadm; 7 osds down; Reduced data availability: 64 pgs inactive, 1 pg peering, 1 pg stale; Degraded data redundancy: 15/201 objects degraded (7.463%), 5 pgs degraded, 25 pgs undersized; 77 slow ops, oldest one blocked for 36281 sec, daemons [osd.0,osd.1,osd.14,osd.15,osd.16,osd.17,osd.2,osd.4,osd.7] have slow ops.
[root@clienta ~]#

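In plain terms: 6 stray daemons are not managed by cephadm; 7 OSDs are down; data availability is reduced (64 PGs inactive, 1 PG peering, 1 PG stale); data redundancy is degraded (15 of 201 objects, or 7.463%, with 5 PGs degraded and 25 PGs undersized); and 77 operations are slow, the oldest blocked for 36281 seconds, on daemons osd.0, osd.1, osd.14, osd.15, osd.16, osd.17, osd.2, osd.4, and osd.7.

Details

  • HEALTH_WARN: the cluster is not fully healthy; there are problems to resolve.
  • 6 stray daemon(s) not managed by cephadm: 6 daemons are running outside cephadm's management and have to be handled manually.
  • 7 osds down: 7 OSDs are down or unreachable.
  • Reduced data availability: 64 pgs inactive, 1 pg peering, 1 pg stale: 64 PGs are inactive, 1 PG is peering (its OSDs are still agreeing on the PG's state), and 1 PG is stale. This reduces data availability.
  • Degraded data redundancy: 15/201 objects degraded (7.463%), 5 pgs degraded, 25 pgs undersized: 15 of 201 objects are degraded (for example because replicas are missing), 5 PGs are degraded, and 25 PGs are undersized, meaning they have fewer replicas than the pool's configured size. This reduces redundancy and reliability.
  • 77 slow ops, oldest one blocked for 36281 sec, daemons [osd.0,osd.1,osd.14,osd.15,osd.16,osd.17,osd.2,osd.4,osd.7] have slow ops: 77 operations are slow, and the oldest has been blocked for 36281 seconds. They are pending on osd.0, osd.1, osd.14, osd.15, osd.16, osd.17, osd.2, osd.4, and osd.7, which need to be diagnosed. The exact count and age vary from one run to the next.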
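
The one-line summary is easier to read when each problem sits on its own line. The following is a minimal sketch that drops the leading status word and splits on "; "; split_health_summary is a hypothetical helper name, and the demo input is a shortened copy of the summary line shown above.

```shell
# Sketch: print each problem from a "ceph health" summary on its own line.
# split_health_summary is a hypothetical helper; it reads from stdin.
split_health_summary() {
    awk '{
        sub(/^HEALTH_[A-Z]+ /, "")          # drop the leading status word
        n = split($0, parts, /; /)          # one array entry per problem
        for (i = 1; i <= n; i++) print parts[i]
    }'
}

# Demo on a shortened copy of the summary line.
printf '%s\n' 'HEALTH_WARN 6 stray daemon(s) not managed by cephadm; 7 osds down' |
    split_health_summary
```

On a live cluster the same function could be fed with ceph health | split_health_summary.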
[root@clienta ~]# ceph health detail
HEALTH_WARN 6 stray daemon(s) not managed by cephadm; 7 osds down; Reduced data availability: 64 pgs inactive, 1 pg peering, 1 pg stale; Degraded data redundancy: 15/201 objects degraded (7.463%), 5 pgs degraded, 25 pgs undersized; 62 slow ops, oldest one blocked for 36156 sec, daemons [osd.0,osd.1,osd.14,osd.15,osd.16,osd.17,osd.2,osd.4,osd.7] have slow ops.
[WRN] CEPHADM_STRAY_DAEMON: 6 stray daemon(s) not managed by cephadm
stray daemon osd.5 on host serverc.lab.example.com not managed by cephadm
stray daemon osd.9 on host serverc.lab.example.com not managed by cephadm
stray daemon osd.10 on host serverd.lab.example.com not managed by cephadm
stray daemon osd.8 on host serverd.lab.example.com not managed by cephadm
stray daemon osd.11 on host servere.lab.example.com not managed by cephadm
stray daemon osd.6 on host servere.lab.example.com not managed by cephadm
[WRN] OSD_DOWN: 7 osds down
osd.8 () is down
osd.9 () is down
osd.12 (root=default,host=clienta) is down
osd.13 (root=default,host=serverd) is down
osd.14 (root=default,host=servere) is down
osd.18 (root=default,host=clienta) is down
osd.19 (root=default,host=clienta) is down
[WRN] PG_AVAILABILITY: Reduced data availability: 64 pgs inactive, 1 pg peering, 1 pg stale
pg 2.0 is stuck inactive for 12h, current state unknown, last acting []
pg 2.1 is stuck inactive for 12h, current state unknown, last acting []
pg 2.2 is stuck inactive for 12h, current state unknown, last acting []
pg 2.3 is stuck inactive for 12h, current state unknown, last acting []
pg 2.6 is stuck inactive for 12h, current state unknown, last acting []
pg 2.7 is stuck inactive for 4m, current state undersized+peered, last acting [7]
pg 2.9 is stuck inactive for 12h, current state unknown, last acting []
pg 2.a is stuck inactive for 12h, current state unknown, last acting []
pg 2.b is stuck inactive for 4m, current state undersized+peered, last acting [4]
pg 2.d is stuck inactive for 12h, current state unknown, last acting []
pg 2.e is stuck inactive for 12h, current state unknown, last acting []
pg 2.13 is stuck inactive for 12h, current state unknown, last acting []
pg 2.17 is stuck inactive for 12h, current state unknown, last acting []
pg 3.0 is stuck inactive for 12h, current state unknown, last acting []
pg 3.1 is stuck inactive for 12h, current state unknown, last acting []
pg 3.3 is stuck inactive for 12h, current state unknown, last acting []
pg 3.5 is stuck inactive for 12h, current state unknown, last acting []
pg 3.7 is stuck inactive for 12h, current state unknown, last acting []
pg 3.9 is stuck inactive for 12h, current state unknown, last acting []
pg 3.a is stuck inactive for 12h, current state unknown, last acting []
pg 3.b is stuck inactive for 12h, current state unknown, last acting []
pg 3.d is stuck inactive for 12h, current state unknown, last acting []
pg 3.e is stuck inactive for 12h, current state unknown, last acting []
pg 3.f is stuck inactive for 12h, current state unknown, last acting []
pg 3.10 is stuck inactive for 12h, current state unknown, last acting []
pg 3.12 is stuck inactive for 12h, current state unknown, last acting []
pg 3.13 is stuck inactive for 12h, current state unknown, last acting []
pg 3.15 is stuck stale for 4m, current state stale+peering, last acting [14,13]
pg 3.16 is stuck inactive for 12h, current state unknown, last acting []
pg 3.17 is stuck inactive for 12h, current state unknown, last acting []
pg 3.19 is stuck inactive for 12h, current state unknown, last acting []
pg 4.0 is stuck inactive for 12h, current state unknown, last acting []
pg 4.1 is stuck inactive for 4m, current state undersized+peered, last acting [2]
pg 4.2 is stuck inactive for 12h, current state unknown, last acting []
pg 4.3 is stuck inactive for 12h, current state unknown, last acting []
pg 4.4 is stuck inactive for 12h, current state unknown, last acting []
pg 4.5 is stuck inactive for 4m, current state undersized+peered, last acting [0]
pg 4.7 is stuck inactive for 4m, current state undersized+peered, last acting [1]
pg 4.8 is stuck inactive for 12h, current state unknown, last acting []
pg 4.9 is stuck inactive for 12h, current state unknown, last acting []
pg 4.c is stuck inactive for 4m, current state undersized+peered, last acting [4]
pg 4.d is stuck inactive for 4m, current state undersized+degraded+peered, last acting [4]
pg 4.10 is stuck inactive for 12h, current state unknown, last acting []
pg 4.13 is stuck inactive for 12h, current state unknown, last acting []
pg 4.15 is stuck inactive for 12h, current state unknown, last acting []
pg 4.16 is stuck inactive for 12h, current state unknown, last acting []
pg 4.1e is stuck inactive for 4m, current state undersized+peered, last acting [1]
pg 5.0 is stuck inactive for 12h, current state unknown, last acting []
pg 5.3 is stuck inactive for 12h, current state unknown, last acting []
pg 5.5 is stuck inactive for 12h, current state unknown, last acting []
pg 5.7 is stuck inactive for 4m, current state undersized+peered, last acting [1]
[WRN] PG_DEGRADED: Degraded data redundancy: 15/201 objects degraded (7.463%), 5 pgs degraded, 25 pgs undersized
pg 1.0 is stuck undersized for 4m, current state active+undersized, last acting [3,2]
pg 2.4 is stuck undersized for 4m, current state active+undersized, last acting [1,2]
pg 2.5 is stuck undersized for 4m, current state active+undersized, last acting [2,7]
pg 2.7 is stuck undersized for 4m, current state undersized+peered, last acting [7]
pg 2.8 is stuck undersized for 4m, current state active+undersized+remapped, last acting [1,2]
pg 2.b is stuck undersized for 4m, current state undersized+peered, last acting [4]
pg 2.15 is stuck undersized for 4m, current state active+undersized, last acting [1,4]
pg 2.18 is stuck undersized for 4m, current state active+undersized, last acting [17,2]
pg 2.19 is stuck undersized for 4m, current state active+undersized, last acting [1,2]
pg 2.1b is stuck undersized for 4m, current state undersized+degraded+peered, last acting [0]
pg 3.14 is stuck undersized for 4m, current state active+undersized+degraded, last acting [0,7]
pg 3.18 is stuck undersized for 4m, current state active+undersized+degraded, last acting [4,7]
pg 4.1 is stuck undersized for 4m, current state undersized+peered, last acting [2]
pg 4.5 is stuck undersized for 4m, current state undersized+peered, last acting [0]
pg 4.6 is stuck undersized for 4m, current state active+undersized, last acting [1,17]
pg 4.7 is stuck undersized for 4m, current state undersized+peered, last acting [1]
pg 4.a is stuck undersized for 4m, current state active+undersized, last acting [15,0]
pg 4.c is stuck undersized for 4m, current state undersized+peered, last acting [4]
pg 4.d is stuck undersized for 4m, current state undersized+degraded+peered, last acting [4]
pg 4.11 is stuck undersized for 4m, current state active+undersized, last acting [7,0]
pg 4.1e is stuck undersized for 4m, current state undersized+peered, last acting [1]
pg 4.1f is stuck undersized for 4m, current state undersized+degraded+peered, last acting [15]
pg 5.2 is stuck undersized for 4m, current state active+undersized, last acting [0,1]
pg 5.4 is stuck undersized for 4m, current state active+undersized, last acting [4,15]
pg 5.7 is stuck undersized for 4m, current state undersized+peered, last acting [1]
[WRN] SLOW_OPS: 62 slow ops, oldest one blocked for 36156 sec, daemons [osd.0,osd.1,osd.14,osd.15,osd.16,osd.17,osd.2,osd.4,osd.7] have slow ops.
[root@clienta ~]#
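
With this many PG lines it helps to tally how many PGs are in each state. Below is a minimal sketch that parses the "pg ... current state ..." lines of the output above; count_pg_states is a hypothetical helper name. Each PG is counted once even when it appears under both PG_AVAILABILITY and PG_DEGRADED, and because the detail listing may not include every affected PG, the totals should be read as a rough picture only.

```shell
# Sketch: count PGs per state from "ceph health detail" text on stdin.
# count_pg_states is a hypothetical helper, not a Ceph command.
count_pg_states() {
    awk '
        $1 == "pg" && /current state / {
            if (seen[$2]++) next             # same PG listed under another check
            state = $0
            sub(/.*current state /, "", state)
            sub(/,.*/, "", state)            # keep only the state, drop "last acting"
            count[state]++
        }
        END { for (s in count) print count[s], s }
    ' | sort -rn
}
```

Usage on a live cluster would be ceph health detail | count_pg_states, which prints one "count state" pair per line with the most common state first.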

© 2018-present liruilonger@gmail.com, All rights reserved. Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0)

Published: 2023-04-11
Updated: 2024-11-22
