PHPor's Blog – Page 28

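Linux virtual network devices: VLAN

We use one bridge and two veth pairs to connect two network namespaces, and create two VLANs in each namespace.

Configure the VLANs with vconfig: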

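# create the bridge
brctl addbr br-test-vlan

# create the veth pairs
ip link add veth01 type veth peer name veth10
ip link add veth02 type veth peer name veth20

# attach one end of each veth pair to the bridge
brctl addif br-test-vlan veth01
brctl addif br-test-vlan veth02

# bring the devices up
ip link set dev br-test-vlan up
ip link set dev veth01 up
ip link set dev veth02 up
ip link set dev veth10 up
ip link set dev veth20 up

# create the network namespaces
ip netns add test-vlan-vm01
ip netns add test-vlan-vm02

# move the other end of each veth pair into its namespace (a single namespace would also work; two just make the picture clearer)
ip link set veth10 netns test-vlan-vm01
ip link set veth20 netns test-vlan-vm02

# enter each namespace in turn to create the VLANs and configure the IPs
# configure namespace test-vlan-vm01
ip netns exec test-vlan-vm01 bash
# a device is set down when it is moved into another namespace, so bring the parent up again
ip link set veth10 up
# create vlan 3001 and vlan 3002
vconfig add veth10 3001
vconfig add veth10 3002
# bring both VLAN devices up
ip link set veth10.3001 up
ip link set veth10.3002 up

# assign an IP to each VLAN device (for simplicity both IPs are in the same subnet; the drawback is that you need to know a little about routing)
ip a add 172.16.30.1/24 dev veth10.3001
ip a add 172.16.30.2/24 dev veth10.3002

# add host routes
route add 172.16.30.21 dev veth10.3001
route add 172.16.30.22 dev veth10.3002
# leave the namespace shell before configuring the next one
exit

# configure namespace test-vlan-vm02
ip netns exec test-vlan-vm02 bash
# bring the parent up again inside the namespace
ip link set veth20 up
# create vlan 3001 and vlan 3002
vconfig add veth20 3001
vconfig add veth20 3002
# bring both VLAN devices up
ip link set veth20.3001 up
ip link set veth20.3002 up

# assign an IP to each VLAN device (same subnet again, see the note above)
ip a add 172.16.30.21/24 dev veth20.3001
ip a add 172.16.30.22/24 dev veth20.3002

# add host routes
route add 172.16.30.1 dev veth20.3001
route add 172.16.30.2 dev veth20.3002
exit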

Check the VLAN configuration:

# cat /proc/net/vlan/config 
VLAN Dev name	 | VLAN ID
Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD
veth10.3001    | 3001  | veth10
veth10.3002    | 3002  | veth10
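
To check the same information from a script, the table can be parsed. The following is a minimal Python sketch that assumes the three-column layout shown above; `parse_vlan_config` and `SAMPLE` are names made up for this illustration, not part of any tool.

```python
def parse_vlan_config(text):
    """Parse the body of /proc/net/vlan/config into (device, vlan_id, parent) tuples.

    Assumes data rows look like "veth10.3001 | 3001 | veth10". Header lines
    do not have three fields with a numeric middle field, so they are skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 3 and fields[1].isdigit():
            entries.append((fields[0], int(fields[1]), fields[2]))
    return entries


# Sample copied from the output above.
SAMPLE = """VLAN Dev name\t | VLAN ID
Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD
veth10.3001    | 3001  | veth10
veth10.3002    | 3002  | veth10
"""

if __name__ == "__main__":
    for device, vlan_id, parent in parse_vlan_config(SAMPLE):
        print(device, vlan_id, parent)
```

On a real system the text would come from reading /proc/net/vlan/config inside the namespace in question.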

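Now, from each namespace we can ping the two IPs in the other namespace. Both IPs respond, but different source IPs are used and the traffic travels over different VLANs. Capturing on any of veth01, veth10, veth02, veth20 or br-test-vlan shows the VLAN information: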

# tcpdump -i veth10 -nn -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth10, link-type EN10MB (Ethernet), capture size 262144 bytes
15:38:18.381010 82:f7:0e:2d:3f:62 > 9e:58:72:fa:11:15, ethertype 802.1Q (0x8100), length 102: vlan 3001, p 0, ethertype IPv4, 172.16.30.1 > 172.16.30.21: ICMP echo request, id 19466, seq 1, length 64
15:38:18.381183 9e:58:72:fa:11:15 > 82:f7:0e:2d:3f:62, ethertype 802.1Q (0x8100), length 102: vlan 3001, p 0, ethertype IPv4, 172.16.30.21 > 172.16.30.1: ICMP echo reply, id 19466, seq 1, length 64
15:38:19.396796 82:f7:0e:2d:3f:62 > 9e:58:72:fa:11:15, ethertype 802.1Q (0x8100), length 102: vlan 3001, p 0, ethertype IPv4, 172.16.30.1 > 172.16.30.21: ICMP echo request, id 19466, seq 2, length 64
15:38:19.396859 9e:58:72:fa:11:15 > 82:f7:0e:2d:3f:62, ethertype 802.1Q (0x8100), length 102: vlan 3001, p 0, ethertype IPv4, 172.16.30.21 > 172.16.30.1: ICMP echo reply, id 19466, seq 2, length 64
15:38:23.162052 82:f7:0e:2d:3f:62 > 9e:58:72:fa:11:15, ethertype 802.1Q (0x8100), length 102: vlan 3002, p 0, ethertype IPv4, 172.16.30.2 > 172.16.30.22: ICMP echo request, id 19473, seq 1, length 64
15:38:23.162107 9e:58:72:fa:11:15 > 82:f7:0e:2d:3f:62, ethertype 802.1Q (0x8100), length 102: vlan 3002, p 0, ethertype IPv4, 172.16.30.22 > 172.16.30.2: ICMP echo reply, id 19473, seq 1, length 64
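
The "vlan 3001" that tcpdump prints comes from the 4-byte 802.1Q tag that follows the two MAC addresses in the frame. Below is a minimal Python sketch of how such a tag can be decoded from a raw frame. `parse_vlan_tag` and `build_tagged_header` are names made up for illustration; the MAC addresses in the usage note are copied from the capture.

```python
import struct

TPID_8021Q = 0x8100  # ethertype announcing an 802.1Q tag


def parse_vlan_tag(frame):
    """Return (vlan_id, priority, inner_ethertype), or None for an untagged frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (tpid,) = struct.unpack("!H", frame[12:14])
    if tpid != TPID_8021Q:
        return None
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q tag")
    tci, inner_ethertype = struct.unpack("!HH", frame[14:18])
    vlan_id = tci & 0x0FFF  # low 12 bits of the tag control field
    priority = tci >> 13    # top 3 bits (the "p 0" in the capture)
    return vlan_id, priority, inner_ethertype


def mac_bytes(mac):
    """Convert "aa:bb:cc:dd:ee:ff" into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_tagged_header(dst_mac, src_mac, vlan_id, inner_ethertype=0x0800):
    """Build dst MAC + src MAC + 802.1Q tag (priority 0) + inner ethertype."""
    return mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack(
        "!HHH", TPID_8021Q, vlan_id, inner_ethertype)
```

For example, `parse_vlan_tag(build_tagged_header("9e:58:72:fa:11:15", "82:f7:0e:2d:3f:62", 3001))` yields VLAN ID 3001, priority 0 and the IPv4 ethertype, matching the first captured packet.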

Trying to ping 172.16.30.22 from veth10.3001 fails, because the target is in a different VLAN:

# ping -I veth10.3001 172.16.30.22
PING 172.16.30.22 (172.16.30.22) from 172.16.30.1 veth10.3001: 56(84) bytes of data.
^C
--- 172.16.30.22 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8231ms

Without vconfig:

ip link add link veth10 name veth10.3001 type vlan id 3001

Also: a VLAN device is conventionally named <device>.<vlan id>, but this is not mandatory. Naming it vlan3003, as below, works just as well:

# ip link add link veth10 name vlan3003 type vlan id 3003

Note: a parent device can have at most one sub-device per VLAN ID:

# ip link add link veth10 name vlan3001 type vlan id 3001
RTNETLINK answers: File exists
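
When several VLANs have to be created, the ip commands can be generated instead of typed. The following Python sketch only builds the command strings and does not run them. `vlan_commands` is a hypothetical helper; the duplicate check mirrors the "File exists" rule above, and the 15-character limit is the kernel's interface name length limit.

```python
def vlan_commands(parent, vlan_ids, names=None):
    """Return the ip commands that create and enable VLAN sub-devices on `parent`.

    `names` optionally maps a VLAN ID to a custom device name; the default
    name follows the <device>.<vlan id> convention.
    """
    names = names or {}
    seen = set()
    commands = []
    for vid in vlan_ids:
        if not 1 <= vid <= 4094:
            raise ValueError("VLAN ID out of range: %d" % vid)
        if vid in seen:
            # the kernel would answer "File exists" for the second device
            raise ValueError("VLAN ID %d used twice on %s" % (vid, parent))
        seen.add(vid)
        name = names.get(vid, "%s.%d" % (parent, vid))
        if len(name) > 15:
            raise ValueError("interface name too long: %s" % name)
        commands.append(
            "ip link add link %s name %s type vlan id %d" % (parent, name, vid))
        commands.append("ip link set %s up" % name)
    return commands


if __name__ == "__main__":
    for command in vlan_commands("veth10", [3001, 3002, 3003], {3003: "vlan3003"}):
        print(command)
```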

So, in practice, the usual layout looks like this:

References: http://network.51cto.com/art/201504/473419.htm

http://www.mamicode.com/info-detail-2357921.html

Ceph: ceph-bluestore-tool

ceph-bluestore-tool can check a BlueStore file system:

#ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --command fsck
...
2017-12-11 12:06:22.464823 7f9bce47bd00 4 rocksdb: Compression algorithms supported:
2017-12-11 12:06:22.465071 7f9bce47bd00 4 rocksdb: Snappy supported: 0
2017-12-11 12:06:22.465078 7f9bce47bd00 4 rocksdb: Zlib supported: 0
2017-12-11 12:06:22.465080 7f9bce47bd00 4 rocksdb: Bzip supported: 0
2017-12-11 12:06:22.465083 7f9bce47bd00 4 rocksdb: LZ4 supported: 0
2017-12-11 12:06:22.465085 7f9bce47bd00 4 rocksdb: ZSTD supported: 0
2017-12-11 12:06:22.465087 7f9bce47bd00 4 rocksdb: Fast CRC32 supported: 0
...
2017-12-11 12:06:26.629854 7f9bce47bd00 1 bluestore(/var/lib/ceph/osd/ceph-0) fsck finish with 0 errors in 7.053573 seconds

Note: the corresponding OSD must be stopped first.

Ceph: mon quorum operations

Remove a mon from the quorum:

# ceph tell mon.ceph-test-3 quorum exit
stopped responding to quorum, initiated new election

Bring a mon back into the quorum:

# ceph tell mon.ceph-test-3 quorum enter

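Unfortunately, this command hangs. Even trying to tab-complete "enter" hangs. With 4 mons, making one exit and then telling it to come back hangs as well. So the tell command seems to be able only to take a mon out of the quorum, not to bring it back.

Method 2: run the following command directly on the host of the daemon that should rejoin the quorum (ceph daemon talks to the local admin socket):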

# ceph daemon mon.ceph-test-3 quorum enter
started responding to quorum, initiated new election

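When while read meets ssh

Why does the command not print the three lines a, b and c?

Improved version:

Here ssh's standard input is closed and the result is as expected, which shows that the missing lines were caused by ssh's standard input.

Verifying once more:

This shows quite clearly that b and c were read away by ssh.

If you would rather not bother with any of this, simply avoid the pipe:

Sometimes that is just how it is: when a simple, plain form works, don't show off.

Text version: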

[root@ceph-test-1 my-cluster]# echo -e "a\nb\nc" | while read c; do ssh localhost echo $c; done
a
[root@ceph-test-1 my-cluster]# echo -e "a\nb\nc" | while read c; do ssh localhost echo $c <&-; done
a
b
c
[root@ceph-test-1 my-cluster]# echo -e "a\nb\nc" | while read c; do ssh localhost 'while read c; do echo ">>$c";done'; done
>>b
>>c
[root@ceph-test-1 my-cluster]# echo -e "a\nb\nc" | while read c; do ssh localhost 'while read d; do echo ">>$d";done'; echo ">$c";done
>>b
>>c
>a
[root@ceph-test-1 my-cluster]# for c in  $(echo -e "a\nb\nc"); do ssh localhost echo $c; done
a
b
c
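
The effect does not depend on ssh itself: any child process that inherits the loop's standard input and reads from it will drain the pipe. Below is a minimal Python sketch of the mechanism. A small Python child that reads all of its stdin stands in for ssh (an assumption made so the example runs without a remote host); `read_line` and `run_loop` are illustrative names.

```python
import os
import subprocess
import sys

# Stand-in for ssh: a child that reads its whole standard input.
GREEDY_CHILD = [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"]


def read_line(fd):
    """Read one line byte by byte, the way the shell's `read` does on a pipe."""
    buf = bytearray()
    while True:
        byte = os.read(fd, 1)
        if not byte:  # EOF
            return buf.decode() if buf else None
        if byte == b"\n":
            return buf.decode()
        buf += byte


def run_loop(child_shares_stdin):
    """Feed a, b, c through a pipe and return the lines the loop itself saw."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"a\nb\nc\n")
    os.close(write_fd)
    seen = []
    while True:
        line = read_line(read_fd)
        if line is None:
            break
        seen.append(line)
        # Either hand the pipe to the child or give it an empty stdin.
        child_stdin = read_fd if child_shares_stdin else subprocess.DEVNULL
        subprocess.run(GREEDY_CHILD, stdin=child_stdin, check=True)
    os.close(read_fd)
    return seen


if __name__ == "__main__":
    print(run_loop(child_shares_stdin=True))
    print(run_loop(child_shares_stdin=False))
```

With ssh, the equivalent fixes are closing stdin as shown above (`<&-`), redirecting it from /dev/null, or using ssh's `-n` option.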

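VXLAN troubleshooting

Symptom:

In a VPC network created in OpenStack, virtual machines could not obtain an IP address via DHCP. Packet captures showed the following:

This is the network layout of the OpenStack controller node. The problem: a VM in vxlan-60 tries to get an IP address via DHCP. The DHCP packets do arrive at brqce4d2a54-44 through eth0, which a capture confirms: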

tcpdump -i brqce4d2a54-44 udp and port 8472  # packets are visible

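But no DHCP packets can be captured on vxlan-60:

tcpdump -i vxlan-60 -nn   # no packets

My reasoning was: once a VXLAN packet reaches brqce4d2a54-44, if its VXLAN ID is 60 it should simply be handed to vxlan-60. Could some rule be intercepting it? Having narrowed it down this far, the problem had to be on openstack-controller and could not be related to other machines, right? (I kept throwing bridge fdb, brctl, ip and other tools at it and wasted about half a day.)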
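
What identifies the VXLAN network a packet belongs to is the VNI carried in the 8-byte VXLAN header at the start of the UDP payload (layout as defined in RFC 7348). A small Python sketch of encoding and decoding that header follows. The function names are made up for illustration, and this is not how the kernel code is structured.

```python
import struct

FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # word 0: flags (8 bits) + reserved (24 bits)
    # word 1: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(udp_payload):
    """Extract the VNI from the UDP payload of a VXLAN packet."""
    if len(udp_payload) < 8:
        raise ValueError("payload shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", udp_payload[:8])
    if not (word0 >> 24) & FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word1 >> 8


if __name__ == "__main__":
    header = pack_vxlan_header(60)
    print(header.hex(), parse_vni(header + b"inner ethernet frame"))
```

Applied to the payload of the UDP port 8472 packets seen on brqce4d2a54-44, a decoder like this would show whether they really carry VNI 60.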

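Later, when I was about to try rebooting openstack-controller, I decided to create one VM on each of two compute nodes: if the two VMs could talk to each other, the problem was on openstack-controller; otherwise it was not. The test showed that the two VMs could not communicate, so I shifted my attention to the compute nodes.

Network architecture of the compute nodes:

That said, this network differs from both the official reference architecture and the ones described in books. Normally, adding bond0 to /etc/neutron/plugins/ml2/linuxbridge_agent.ini is enough and today's problem would not occur. That is how it was set up originally, and the VPC feature tested fine.

If I gave bond0 to neutron, neutron would move bond0's IP to brqce4d2a54-44. Because machines are limited, I need to run Ceph nodes on these compute nodes, and the Ceph nodes also provide storage from virtual machines (vnet0 in the figure above). Bridging vnet0 onto brqce4d2a54-44 did not feel right, hence the architecture above.

Since /etc/neutron/plugins/ml2/linuxbridge_agent.ini was now configured with veth_neutron, shouldn't local_ip in the same file also be changed to veth_neutron's IP? (It should not; that was exactly the mistake.) So that is what I changed.

Because the dev of vxlan-60 is brqce4d2a54-44, traffic has to reach brqce4d2a54-44 before it can enter vxlan-60.

I found that incoming VXLAN packets first hit bond0 and then br0. I expected them to go through veth_br0 to veth_neutron, then to brqce4d2a54-44, and finally to vxlan-60. In reality the VXLAN packets never entered veth_br0. My guess is that the kernel was already processing the VXLAN packets at that point: hand them to whoever claims them, and drop them if nobody does. But vxlan-60 is based on brqce4d2a54-44, not on br0. Would it work if vxlan-60 were based on br0?

Which device vxlan-60 is based on is determined by local_ip in /etc/neutron/plugins/ml2/linuxbridge_agent.ini; it does not have to match physical_interface_mappings.

After the change, I restarted neutron-linuxbridge-agent.service and recreated the VPC, and the problem was solved.

Open question:

Why is the destination IP of the VXLAN packets the address of brqce4d2a54-44, while the packets never actually land on brqce4d2a54-44?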

VXLAN in practice

VXLAN can also be demonstrated on a single machine:

Network topology:

Commands:

Create two network namespaces (each stands in for a virtual machine):

ip netns add 192.168.1.1
ip netns add 192.168.1.2

Create a veth pair:

ip link add veth10 type veth peer name veth01
# bring both ends up
ip link set dev veth10 up
ip link set dev veth01 up

Move the two ends of the pair into the two namespaces:

ip link set dev veth01 netns 192.168.1.1
ip link set dev veth10 netns 192.168.1.2

Enter the namespace and configure it:

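ip netns exec 192.168.1.1 bash
# configure the IP (the device was set down when it moved into the namespace, so bring it up again)
ip a add dev veth01 192.168.1.1/24
ip link set dev veth01 up

# add a bridge
brctl addbr br01
ip link set br01 up

# add a veth pair: one end gets an IP and stands in for a VM, the other end is attached to the bridge
ip link add veth-vm type veth peer name vm01
ip link set dev veth-vm up
ip link set dev vm01 up
brctl addif br01 veth-vm

ip a add dev vm01 172.16.10.1/24

# add a VXLAN device
ip link add vxlan01 type vxlan id 10 dev veth01
ip link set dev vxlan01 up

# attach the VXLAN device to the bridge br01
brctl addif br01 vxlan01

# leave the namespace shell
exit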

Enter the other namespace and configure it the same way:

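ip netns exec 192.168.1.2 bash
# configure the IP (bring the device up again inside the namespace)
ip a add dev veth10 192.168.1.2/24
ip link set dev veth10 up

# add a bridge
brctl addbr br01
ip link set br01 up

# add a veth pair: one end gets an IP and stands in for a VM, the other end is attached to the bridge
ip link add veth-vm type veth peer name vm02
ip link set dev veth-vm up
ip link set dev vm02 up
brctl addif br01 veth-vm

ip a add dev vm02 172.16.10.2/24

# add a VXLAN device (the underlay device in this namespace is veth10, not veth01)
ip link add vxlan01 type vxlan id 10 dev veth10
ip link set dev vxlan01 up

# attach the VXLAN device to the bridge br01
brctl addif br01 vxlan01

# leave the namespace shell
exit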

ip link show veth01 and ip link show veth10 report:
veth01 MAC: fe:ea:f2:aa:1f:fc
veth10 MAC: ee:b4:c4:3f:b3:2b

Add forwarding entries:

ip netns exec 192.168.1.1 bash
bridge fdb add ee:b4:c4:3f:b3:2b dev vxlan01 dst 192.168.1.2
bridge fdb add 00:00:00:00:00:00 dev vxlan01 dst 192.168.1.2

ip netns exec 192.168.1.2 bash
bridge fdb add fe:ea:f2:aa:1f:fc dev vxlan01 dst 192.168.1.1
bridge fdb add 00:00:00:00:00:00 dev vxlan01 dst 192.168.1.1
# to flood to several hosts, use bridge fdb append, e.g.:
# bridge fdb append 00:00:00:00:00:00 dev vxlan01 dst 192.168.1.3
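
The role of the all-zero entry is easier to see in a toy model: the table is looked up by the destination MAC of the frame being forwarded, and frames with no exact match go to every destination listed under 00:00:00:00:00:00. The Python sketch below is a deliberately simplified model of that lookup, not the kernel's implementation; `VxlanFdb` and its methods are invented names, and refusing duplicates in `add` is a choice of this sketch.

```python
ALL_ZERO_MAC = "00:00:00:00:00:00"


class VxlanFdb:
    """Toy VXLAN forwarding table: destination MAC -> list of remote VTEP IPs."""

    def __init__(self):
        self._entries = {}

    def add(self, mac, dst):
        """Create an entry; this sketch refuses to overwrite an existing one."""
        key = mac.lower()
        if key in self._entries:
            raise ValueError("entry for %s already exists, use append()" % key)
        self._entries[key] = [dst]

    def append(self, mac, dst):
        """Add a further destination to an entry, creating it if needed."""
        self._entries.setdefault(mac.lower(), []).append(dst)

    def remotes_for(self, dst_mac):
        """Remote VTEPs for a frame: exact MAC match first, else the default entry."""
        key = dst_mac.lower()
        if key in self._entries:
            return list(self._entries[key])
        return list(self._entries.get(ALL_ZERO_MAC, []))


if __name__ == "__main__":
    fdb = VxlanFdb()
    fdb.add("ee:b4:c4:3f:b3:2b", "192.168.1.2")
    fdb.add(ALL_ZERO_MAC, "192.168.1.2")
    fdb.append(ALL_ZERO_MAC, "192.168.1.3")
    print(fdb.remotes_for("ee:b4:c4:3f:b3:2b"))
    print(fdb.remotes_for("ff:ff:ff:ff:ff:ff"))
```

In this model a broadcast frame such as an ARP request has no exact match, so it is sent to every destination on the default entry, which is why the all-zero entries are enough for the ping test to work.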

Verify:

ip netns exec 192.168.1.1 bash
# ping 172.16.10.2
PING 172.16.10.2 (172.16.10.2) 56(84) bytes of data.
64 bytes from 172.16.10.2: icmp_seq=1 ttl=64 time=0.236 ms
...

CTRL+D

ip netns exec 192.168.1.2 bash
# ping 172.16.10.1
PING 172.16.10.1 (172.16.10.1) 56(84) bytes of data.
64 bytes from 172.16.10.1: icmp_seq=1 ttl=64 time=0.236 ms
...

Note: inside each namespace we cannot ping the namespace's own IP, because lo has not been brought up. Once lo is up, pinging yourself works.

Reference: http://tech.mytrix.me/2017/04/vxlan-overlay-in-linux-bridge/

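iproute: ifstat

iftop can only show the traffic of a single NIC, while ifstat can show several NICs at once:

Because ifstat does not refresh interactively on its own, watch can be used to simulate that.

ifstat always stores the result of each run in a history file; the next run reads that file and reports the difference.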
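
The history-file behaviour amounts to keeping the previous counter snapshot and printing differences. The following rough Python sketch shows that idea only. The snapshot dictionaries and the JSON history file are assumptions made for illustration and do not reflect ifstat's real file format.

```python
import json
import os


def counters_delta(previous, current):
    """Per-interface difference between two {iface: {counter: value}} snapshots."""
    delta = {}
    for iface, counters in current.items():
        old = previous.get(iface, {})
        delta[iface] = {name: value - old.get(name, 0)
                        for name, value in counters.items()}
    return delta


def report(current, history_path):
    """Return the delta against the stored snapshot, then store `current`."""
    previous = {}
    if os.path.exists(history_path):
        with open(history_path) as fh:
            previous = json.load(fh)
    with open(history_path, "w") as fh:
        json.dump(current, fh)
    return counters_delta(previous, current)
```

The first call has no history and therefore reports the absolute counters; every later call reports only what changed since the previous one.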

watch + ssh: remote top

Run top once over ssh for all processes of the ceph user:

ssh ceph-14 top -u ceph -b -n 1

To get several iterations (3):

ssh ceph-14 top -u ceph -b -n 3

To get the feel of a local top, use:

watch ssh ceph-14 top -u ceph -b -n 1

Actually, watch is not needed at all:

ssh -t ceph-14 top -u ceph

But what about watching top on 2 machines from one machine?

watch "ssh ceph-14 top -u ceph -b -n 1 ; ssh ceph-4 top -u ceph -b -n 1"

And what about 4 machines? Writing ssh four times is tedious, so loop over the host list instead (two hosts shown here):

watch 'for h in ceph-4 ceph-14; do echo -e "$h\n"; ssh $h top -u ceph -b -n 1; printf "=%.0s" {1..80}; echo;done '

For printing the separator line, see: https://stackoverflow.com/questions/5349718/how-can-i-repeat-a-character-in-bash

Reading the cloud-init source

Network initialization:

/usr/lib/python2.7/site-packages/cloudinit/stages.py:

    def apply_network_config(self, bring_up):
        netcfg, src = self._find_networking_config()
        if netcfg is None:
            LOG.info("network config is disabled by %s", src)
            return

        try:
            LOG.debug("applying net config names for %s" % netcfg)
            self.distro.apply_network_config_names(netcfg)
        except Exception as e:
            LOG.warn("Failed to rename devices: %s", e)

        if (self.datasource is not NULL_DATA_SOURCE and
                not self.is_new_instance()):
            LOG.debug("not a new instance. network config is not applied.")
            return

        LOG.info("Applying network configuration from %s bringup=%s: %s",
                 src, bring_up, netcfg)
        try:
            return self.distro.apply_network_config(netcfg, bring_up=bring_up)
        except NotImplementedError:
            LOG.warn("distro '%s' does not implement apply_network_config. "
                     "networking may not be configured properly." %
                     self.distro)
            return

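Analysis:

1. The network configuration is looked up on every run.
2. If a configuration is found, self.distro.apply_network_config_names(netcfg) is called.
3. For a new instance, self.distro.apply_network_config(netcfg, bring_up=bring_up) is called, which:
  1. writes the ifcfg-eth0 configuration file
  2. brings the network device up if bring_up is set

/usr/lib/python2.7/site-packages/cloudinit/stages.py: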

    def _find_networking_config(self):
        disable_file = os.path.join(
            self.paths.get_cpath('data'), 'upgraded-network')
        if os.path.exists(disable_file):
            return (None, disable_file)

        cmdline_cfg = ('cmdline', cmdline.read_kernel_cmdline_config())
        dscfg = ('ds', None)
        if self.datasource and hasattr(self.datasource, 'network_config'):
            dscfg = ('ds', self.datasource.network_config)
        sys_cfg = ('system_cfg', self.cfg.get('network'))

        for loc, ncfg in (cmdline_cfg, sys_cfg, dscfg):
            if net.is_disabled_cfg(ncfg):
                LOG.debug("network config disabled by %s", loc)
                return (None, loc)
            if ncfg:
                return (ncfg, loc)
        return (net.generate_fallback_config(), "fallback")

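Analysis:

The network configuration is looked up in three places, in priority order: cmdline, system_cfg, and the datasource's network_config. The search stops at the first source that either explicitly disables networking or provides a configuration.

If no configuration is found anywhere, the built-in fallback logic in net/__init__.py, net.generate_fallback_config(), is used: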

def generate_fallback_config():
    """Determine which attached net dev is most likely to have a connection and
       generate network state to run dhcp on that interface"""
    # get list of interfaces that could have connections
    invalid_interfaces = set(['lo'])
    potential_interfaces = set(get_devicelist())
    potential_interfaces = potential_interfaces.difference(invalid_interfaces)
    # sort into interfaces with carrier, interfaces which could have carrier,
    # and ignore interfaces that are definitely disconnected
    connected = []
    possibly_connected = []
    for interface in potential_interfaces:
        if interface.startswith("veth"):
            continue
        if os.path.exists(sys_dev_path(interface, "bridge")):
            # skip any bridges
            continue
        carrier = read_sys_net_int(interface, 'carrier')
        if carrier:
            connected.append(interface)
            continue
        # check if nic is dormant or down, as this may make a nick appear to
        # not have a carrier even though it could acquire one when brought
        # online by dhclient
        dormant = read_sys_net_int(interface, 'dormant')
        if dormant:
            possibly_connected.append(interface)
            continue
        operstate = read_sys_net_safe(interface, 'operstate')
        if operstate in ['dormant', 'down', 'lowerlayerdown', 'unknown']:
            possibly_connected.append(interface)
            continue

    # don't bother with interfaces that might not be connected if there are
    # some that definitely are
    if connected:
        potential_interfaces = connected
    else:
        potential_interfaces = possibly_connected

    # if eth0 exists use it above anything else, otherwise get the interface
    # that we can read 'first' (using the sorted defintion of first).
    names = list(sorted(potential_interfaces))
    if DEFAULT_PRIMARY_INTERFACE in names:
        names.remove(DEFAULT_PRIMARY_INTERFACE)
        names.insert(0, DEFAULT_PRIMARY_INTERFACE)
    target_name = None
    target_mac = None
    for name in names:
        mac = read_sys_net_safe(name, 'address')
        if mac:
            target_name = name
            target_mac = mac
            break
    if target_mac and target_name:
        nconf = {'config': [], 'version': 1}
        nconf['config'].append(
            {'type': 'physical', 'name': target_name,
             'mac_address': target_mac, 'subnets': [{'type': 'dhcp'}]})
        return nconf
    else:
        # can't read any interfaces addresses (or there are none); give up
        return None

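Analysis:

lo is excluded.
Listing the directory SYS_CLASS_NET = "/sys/class/net/" yields the network devices, for example:

The existing network devices are examined to see which one is most likely to have connectivity:
"""Determine which attached net dev is most likely to have a connection and
generate network state to run dhcp on that interface"""

If the name starts with veth, the device is assumed to be a veth pair and is skipped.
If /sys/class/net/{$name}/bridge exists, the device is (definitely) a bridge and is skipped as well.

If the device has a carrier (a cable is plugged in), it is added to the list of candidate interfaces. How is the carrier detected?
See: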

def read_sys_net_int(iface, field):
    val = read_sys_net_safe(iface, field)
    if val is False:
        return None
    try:
        return int(val)
    except TypeError:
        return None

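In short: cat /sys/class/net/{$name}/carrier. If the result is a nonzero integer (1), the link has a carrier; if it is 0 or the file cannot be read, it does not:

Notes:
Having a carrier means that both ends of the cable are powered network devices, i.e. the data link layer is UP.
For some network devices this file cannot be read with cat, for example:

If cat /sys/class/net/{$name}/dormant yields an integer greater than 0, the device is also considered.
The file /sys/class/net/{$name}/operstate records the device state; if the state is 'dormant', 'down', 'lowerlayerdown' or 'unknown', the device is also considered.
Interfaces with a carrier take priority: the "also considered" ones are used only when no interface has a carrier.
Finally, the candidate list is sorted and earlier entries take precedence. There is one exception: the code defines a default primary device, DEFAULT_PRIMARY_INTERFACE, which is eth0, and after sorting it is deliberately moved to the front of the list. Barring surprises, eth0 will almost always be the winner.

The list is then walked and each device's MAC address is looked up; the first device that has a MAC address wins and the rest are ignored.

Finally, the network configuration is returned: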

    if target_mac and target_name:
        nconf = {'config': [], 'version': 1}
        nconf['config'].append(
            {'type': 'physical', 'name': target_name,
             'mac_address': target_mac, 'subnets': [{'type': 'dhcp'}]})
        return nconf
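
The selection rules can be tried out against a fake /sys/class/net tree. The sketch below is a condensed re-implementation written for illustration, not cloud-init's code: `read_text`, `read_int` and `pick_fallback` are made-up names, and the `root` parameter exists only so that a temporary directory can stand in for sysfs.

```python
import os

CANDIDATE_OPERSTATES = ("dormant", "down", "lowerlayerdown", "unknown")


def read_text(root, iface, field):
    """Return the stripped content of <root>/<iface>/<field>, or None if unreadable."""
    try:
        with open(os.path.join(root, iface, field)) as fh:
            return fh.read().strip()
    except OSError:
        return None


def read_int(root, iface, field):
    """Like read_text, but return an int, or None if missing or not a number."""
    try:
        return int(read_text(root, iface, field))
    except (TypeError, ValueError):
        return None


def pick_fallback(root="/sys/class/net", primary="eth0"):
    """Return (name, mac) of the interface the fallback rules select, or None."""
    connected, possibly_connected = [], []
    for iface in os.listdir(root):
        if iface == "lo" or iface.startswith("veth"):
            continue
        if os.path.exists(os.path.join(root, iface, "bridge")):
            continue  # bridges are skipped
        if read_int(root, iface, "carrier"):
            connected.append(iface)
        elif (read_int(root, iface, "dormant")
              or read_text(root, iface, "operstate") in CANDIDATE_OPERSTATES):
            possibly_connected.append(iface)
    # interfaces with a carrier win over merely possible ones
    names = sorted(connected or possibly_connected)
    if primary in names:
        names.remove(primary)
        names.insert(0, primary)
    for name in names:
        mac = read_text(root, name, "address")
        if mac:
            return name, mac
    return None


if __name__ == "__main__":
    print(pick_fallback())
```

Run on a real machine, it prints the interface that these rules would pick, which makes it easy to see why a second NIC is left unconfigured.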

Summary:

The analysis shows that without any special network configuration, cloud-init configures only one NIC for us. In general that covers most needs.

Looking at the cloud-init log (/var/log/cloud-init.log) on a machine with several NICs, the other NICs are read as well, but in the end they do not get the same treatment as eth0. Now we know why.

Call stack:

How cmdline is obtained (util.py):

In a container: cat /proc/1/cmdline
Otherwise: cat /proc/cmdline

Result:

BOOT_IMAGE=/boot/vmlinuz-4.10.10-1.el7.elrepo.x86_64 root=UUID=0356e691-d6fb-4f8b-a905-4230dbe62a32 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8

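The changing MAC address

When a bridge is created it gets a MAC address. When veth2 is added to the bridge with addif, the bridge's MAC address follows veth2's MAC address. Can't br0 keep a fixed MAC address?

It can:

By default a bridge always takes the MAC address of the port with the lowest MAC address. To keep the MAC address from changing, set the bridge's preferred MAC address, which is done by explicitly assigning a MAC address to the bridge: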

ip link set dev br0 address fa:11:11:11:11:11
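
The rule described above can be written down in a few lines. This Python sketch only models the selection (lowest port MAC unless an address was set explicitly). `bridge_mac` and `mac_key` are invented names, and the port addresses in the usage note are sample values taken from elsewhere on this page.

```python
def mac_key(mac):
    """Turn "aa:bb:cc:dd:ee:ff" into bytes so that MACs compare numerically."""
    return bytes(int(part, 16) for part in mac.split(":"))


def bridge_mac(port_macs, pinned=None):
    """MAC a bridge ends up with: the pinned one, else the lowest port MAC."""
    if pinned is not None:
        return pinned
    if not port_macs:
        return None  # no ports yet: this model does not cover the initial address
    return min(port_macs, key=mac_key)


if __name__ == "__main__":
    ports = ["fe:ea:f2:aa:1f:fc", "ee:b4:c4:3f:b3:2b"]
    print(bridge_mac(ports))
    print(bridge_mac(ports, pinned="fa:11:11:11:11:11"))
```

Without a pinned address the result changes whenever a port with a lower MAC is added, which is exactly the behaviour observed with veth2.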

Reference:

http://blog.csdn.net/fanwenbo/article/details/2131193