Environment:
# docker info
Containers: 56
 Running: 33
 Paused: 0
 Stopped: 23
Images: 31
Server Version: 1.12.5
Storage Driver: devicemapper
 Pool Name: data-docker_thinpool
 Pool Blocksize: 524.3 kB
 Base Device Size: 32.21 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 516.2 GB
 Data Space Total: 644.2 GB
 Data Space Available: 128.1 GB
 Metadata Space Used: 45.47 MB
 Metadata Space Total: 16.98 GB
 Metadata Space Available: 16.93 GB
 Thin Pool Minimum Free Space: 64.42 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge null overlay
Swarm: inactive
Runtimes: runc docker-runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-514.6.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 40
Total Memory: 94.14 GiB
Name: VM-2-10-12
ID: NYBE:NZML:4KQQ:PF2J:RXCB:IPPI:Y3BI:CY7E:RVAC:WVWV:VDM2:3EEK
Docker Root Dir: /data1/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 docker-registry.i.bbtfax.com:5000
 127.0.0.0/8
Registries: docker.io (secure)
Symptom:
# docker exec -it c6176f37c4b6 bash
rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"process_linux.go:75: starting setns process caused \\\"fork/exec /proc/self/exe: no such file or directory\\\"\"\n"
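At first glance: the file /proc/self/exe cannot be found?
Normally a missing file is nothing unusual. What is odd here is that the missing file is /proc/self/exe, which should always be there.
Since docker exec is ultimately handled by the libcontainerd process, I traced it with strace and found that the failure happens on the chdir to /root/data1/docker/devicemapper/mnt/4723e8178992b32b7284aa48c1c62f4011a6b785aca0c54e18d7ce5cc23b22dc/rootfs: the target directory cannot be found. A quick look confirmed that the directory really does not exist. However, for a healthy container that can exec normally, the corresponding rootfs directory does not exist either.
Thinking...
Docker is all about namespaces and cgroups, so those have to come to mind. libcontainerd has its own (mnt) namespace. Once we enter the libcontainerd process's view of the filesystem, the directory above becomes visible. Moreover, healthy containers have their directory there, while the broken container does not.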
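The check described above (is a given path mounted in one process's mount namespace but not in another's?) can be sketched by reading /proc/<pid>/mountinfo, where the fifth field of each line is the mount point. This is only a minimal sketch: `is_mounted_in` is a hypothetical helper name, and `<pid>` and `<id>` are placeholders for the libcontainerd pid and the container's devicemapper id.

```shell
#!/bin/bash
# Sketch: report whether a path is a mount point according to a mountinfo file.
# In /proc/<pid>/mountinfo the 5th whitespace-separated field is the mount point.
# is_mounted_in is a hypothetical helper, not part of Docker.
is_mounted_in() {
    local mountinfo_file=$1 target=$2
    # Pass the path through the environment so awk does not interpret backslashes in it.
    TARGET="$target" awk '$5 == ENVIRON["TARGET"] { found = 1 } END { exit !found }' "$mountinfo_file"
}

# Example (run as root on the Docker host):
#   is_mounted_in /proc/<pid>/mountinfo /data1/docker/devicemapper/mnt/<id> && echo "mounted for libcontainerd"
#   is_mounted_in /proc/self/mountinfo /data1/docker/devicemapper/mnt/<id> || echo "not visible from this shell"
```

Comparing the exact mount-point field avoids false matches on longer paths that merely contain the target as a substring.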
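The output of the mount command shows the pattern the mounts follow, and the mount location for this container can be read from its config.json (/var/run/docker/libcontainerd/c6176f37c4b67b03d4187edef6d1131cd44ab80bd0f0c20b24a7a20056967652/config.json). Entering libcontainerd's mnt namespace with nsenter and mounting the device by hand fixes the problem: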
# nsenter -m -t 3639 bash
# mount /dev/mapper/docker-253\:3-3221225568-4723e8178992b32b7284aa48c1c62f4011a6b785aca0c54e18d7ce5cc23b22dc -o rw,relatime,nouuid,attr2,inode64,sunit=512,swidth=1024,noquota -t xfs /data1/docker/devicemapper/mnt/4723e8178992b32b7284aa48c1c62f4011a6b785aca0c54e18d7ce5cc23b22dc
A script to repair this automatically:
#!/bin/bash
# author: phpor
#
LIBCONTAINERD_DIR=/var/run/docker/libcontainerd

function main() {
    local pidOfContainerd=$(pidof docker-containerd-current)
    # Mount table as seen from inside libcontainerd's mount namespace
    local mountinfo=$(< "/proc/$pidOfContainerd/mountinfo")
    for config in "$LIBCONTAINERD_DIR"/*/config.json; do
        # The container id is the 6th "/"-separated field of the config path
        local cid=$(awk -F'/' '{print $6}' <<<"$config")
        # root.path ends in /rootfs; the mount point is its parent directory
        local rootpath=$(jq -r .root.path "$config" | sed 's/\/rootfs$//')
        if grep -qF -- "$rootpath" <<<"$mountinfo"; then
            echo "$cid $rootpath OK"
        else
            echo "$cid $rootpath Should repair"
            local device=/dev/mapper/$(docker inspect "$cid" | jq -r '.[0].GraphDriver.Data.DeviceName')
            # Re-mount the thin device inside libcontainerd's namespaces
            nsenter -m -p -t "$pidOfContainerd" mount -t xfs -o rw,nouuid,attr2,inode64,sunit=512,swidth=1024,noquota "$device" "$rootpath"
        fi
    done
}

main
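So how did this mount point get lost in the first place? And would restarting dockerd repair it automatically? (Restarting the container should probably be enough.) To be investigated later.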