May 29, 2017
 

Judging from the help text, using the -p option should be enough to enter the target process's PID namespace; in other words, to see only the processes of the namespace the target process belongs to. The usage is roughly:
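Something like this (a minimal sketch of my own; the PID 1234 and the choice of top as the command are just placeholders for a process inside the target container):

```
# enter only the PID namespace of the target process and run top there
nsenter -t 1234 -p top
```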

In practice, however, what top shows is all the processes of the namespace that nsenter itself is currently in (strictly speaking, even that description is not entirely accurate). Why?

Because top reads from the /proc filesystem, entering the corresponding mount namespace matters just as much. The correct invocation is therefore:
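For example (same placeholder PID as above; -m additionally enters the target's mount namespace so that top reads the target's /proc):

```
# enter both the mount and the PID namespace of the target process
nsenter -t 1234 -m -p top
```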

 Posted at 3:25 AM
May 11, 2017
 

About the cgroup driver:

Docker's default cgroup driver is cgroupfs; it can also be set to systemd explicitly (via an environment variable, a command-line flag, or daemon.json). Some docker rpm packages set the cgroup driver to systemd on the command line in /usr/lib/systemd/system/docker.service. The currently supported cgroup drivers are cgroupfs and systemd. Setting it on the command line:
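A sketch of the two common ways (the --exec-opt flag and the exec-opts daemon.json key are the standard dockerd options; the unit-file path is the one mentioned above):

```
# on the dockerd command line, e.g. in the ExecStart= line of
# /usr/lib/systemd/system/docker.service:
dockerd --exec-opt native.cgroupdriver=systemd

# or equivalently in /etc/docker/daemon.json:
#   { "exec-opts": ["native.cgroupdriver=systemd"] }
```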

Although there is no per-container option to choose a cgroup driver at creation time, by changing the cgroup driver and restarting the daemon you can end up with containers under the same daemon using different cgroup drivers (this is only interesting for understanding how it is implemented; in practice there seems to be no reason to do it).

Whether a container uses the cgroupfs or the systemd native.cgroupdriver is not configured on the container itself; it depends on the configuration dockerd was started with. If dockerd is configured with cgroupfs, containers started at that point use cgroupfs. If a few containers were started while dockerd was configured with cgroupfs and the configuration is then changed to systemd, containers started from then on use systemd, and at that moment containers of both cgroup driver types coexist. Switching a container to the other cgroup driver is just a matter of changing the dockerd configuration, restarting the daemon, and then restarting the container.

Some articles have puzzled over why some containers' cgroups all sit under the same docker directory while others show up as docker-xxxx.scope; the reason is simply that the former are created via cgroupfs and the latter via systemd. In addition, the docker daemon accepts a --cgroup-parent option to specify a parent cgroup (cgroups are hierarchical); the default is docker. How this cgroup-parent appears depends on the cgroup driver: with cgroupfs it is a parent directory, with systemd it is a name prefix (see: https://docs.docker.com/engine/reference/commandline/dockerd/#default-cgroup-parent). Each container can also have its own --cgroup-parent, which looks like a nice way to group containers.
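Roughly what the two layouts look like on a cgroup v1 host (the container id and the default parents are illustrative):

```
# cgroupfs driver: the cgroup-parent is a directory (default "docker")
ls /sys/fs/cgroup/memory/docker/<container-id>/

# systemd driver: the container becomes a scope with a "docker-" prefix
ls /sys/fs/cgroup/memory/system.slice/docker-<container-id>.scope/
```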

 

About runc:

The default runc is docker-runc. runc is really just an executable binary; what docker info shows is only a name chosen at configuration time, and which binary that name maps to is also specified in the configuration. In other words, you can change the runtime without touching the configuration, simply by replacing the corresponding binary; the change takes effect once the containers are restarted. You can also pick a runtime when creating a container, by the name it was given in the daemon configuration, so each container can use a different runc.
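A sketch of registering an extra runtime and selecting it per container (the runtime name my-runc and the binary path are made up; the "runtimes" key in daemon.json and the --runtime flag of docker run are the real options):

```
# /etc/docker/daemon.json:
#   {
#     "runtimes": {
#       "my-runc": { "path": "/usr/local/bin/my-runc" }
#     }
#   }

# after restarting dockerd, choose the runtime when creating a container:
docker run -d --runtime=my-runc nginx
```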

 

 

 Posted at 3:40 PM
May 11, 2017
 

参考: http://blog.csdn.net/theorytree/article/details/6259104

Note: this lists all supported I/O schedulers; the one in square brackets is the scheduler currently in use.

To see which I/O schedulers the current system supports:
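For example (sda is just an example device name):

```
cat /sys/block/sda/queue/scheduler
# typical output: noop [deadline] cfq   <- deadline is the active one
```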

Could the scheduler also depend on the device itself? Judging from the output below, the Alibaba Cloud disk does not support any scheduler:
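Something like this (vdb stands for the cloud disk here; the device name is illustrative, the output is what was observed):

```
cat /sys/block/vdb/queue/scheduler
# output: none
```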

However:

It is worth mentioning that the Anticipatory scheduler was removed as of Linux 2.6.33, because CFQ can be configured to achieve the same effect.

 

After some digging it turns out that the 'none' scheduler has nothing to do with the Alibaba Cloud VM or its cloud disks, and nothing (directly) to do with the OS release either; it depends only on the kernel version. The Linux kernel introduced the blk-mq queueing mechanism in 3.13 and completed it in 3.16. In the non-'none' cases above the kernels were all older than 3.10; the 'none' case is a machine whose kernel I had manually upgraded to 4.4.61.

How do you verify whether blk-mq is enabled? Check whether the mq directory exists, as follows:
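A sketch of the check (device names are illustrative):

```
# device not using blk-mq: the mq directory is absent
ls /sys/block/sda/mq      # -> No such file or directory

# device using blk-mq: the mq directory exists, one subdirectory per hardware queue
ls /sys/block/vdb/mq      # -> 0
```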

If the directory does not exist, blk-mq is not enabled.

If the mq directory exists, blk-mq is in use.

参考: https://www.thomas-krenn.com/en/wiki/Linux_Multi-Queue_Block_IO_Queueing_Mechanism_(blk-mq)

参考: http://www.cnblogs.com/cobbliu/p/5389556.html

 Posted at 10:29 AM
April 20, 2017
 

参考: https://github.com/moby/moby/blob/master/CHANGELOG.md

docker-1.13.1 was released on February 8, 2017; the next release is 17.03.0-ce, released on March 1, 2017, where ce stands for Community Edition and 17.03 stands for March 2017. From now on Docker versions are named using the YY.MM scheme:

17.03.0-ce (2017-03-01)

IMPORTANT: Starting with this release, Docker is on a monthly release cycle and uses a new YY.MM versioning scheme to reflect this. Two channels are available: monthly and quarterly. Any given monthly release will only receive security and bugfixes until the next monthly release is available. Quarterly releases receive security and bugfixes for 4 months after initial release. This release includes bugfixes for 1.13.1 but there are no major feature additions and the API version stays the same. Upgrading from Docker 1.13.1 to 17.03.0 is expected to be simple and low-risk.

Although the naming scheme changed, compared with the previous release there are not many changes; it is all minor fixes and tweaks.

 Posted at 12:31 PM
April 12, 2017
 

docker swarm, it turns out, is incompatible with --live-restore. How annoying.
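For reference, this is the option in question, which cannot be combined with swarm mode (a sketch; either form works for a standalone daemon):

```
# /etc/docker/daemon.json:
#   { "live-restore": true }
# or on the command line:
dockerd --live-restore
```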

 

 

参考:

http://www.jianshu.com/p/9eb9995884a5

 Posted at 6:12 PM
April 12, 2017
 

Excerpted from: https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-storage-driver-option

The Docker daemon has support for several different image layer storage drivers: aufs, devicemapper, btrfs, zfs, overlay and overlay2.

The aufs driver is the oldest, but is based on a Linux kernel patch-set that is unlikely to be merged into the main kernel. These are also known to cause some serious kernel crashes. However, aufs allows containers to share executable and shared library memory, so is a useful choice when running thousands of containers with the same program or libraries.

The devicemapper driver uses thin provisioning and Copy on Write (CoW) snapshots. For each devicemapper graph location – typically /var/lib/docker/devicemapper – a thin pool is created based on two block devices, one for data and one for metadata. By default, these block devices are created automatically by using loopback mounts of automatically created sparse files. Refer to Storage driver options below for a way how to customize this setup. ~jpetazzo/Resizing Docker containers with the Device Mapper plugin article explains how to tune your existing setup without the use of options.

The btrfs driver is very fast for docker build but like devicemapper does not share executable memory between devices. Use dockerd -s btrfs -g /mnt/btrfs_partition.

The zfs driver is probably not as fast as btrfs but has a longer track record on stability. Thanks to Single Copy ARC shared blocks between clones will be cached only once. Use dockerd -s zfs. To select a different zfs filesystem set zfs.fsname option as described in Storage driver options.

The overlay is a very fast union filesystem. It is now merged in the main Linux kernel as of 3.18.0. overlay also supports page cache sharing, this means multiple containers accessing the same file can share a single page cache entry (or entries), it makes overlay as efficient with memory as aufs driver. Call dockerd -s overlay to use it.

Note: As promising as overlay is, the feature is still quite young and should not be used in production. Most notably, using overlay can cause excessive inode consumption (especially as the number of images grows), as well as being incompatible with the use of RPMs.

The overlay2 uses the same fast union filesystem but takes advantage of additional features added in Linux kernel 4.0 to avoid excessive inode consumption. Call dockerd -s overlay2 to use it.

Note: Both overlay and overlay2 are currently unsupported on btrfs or any Copy on Write filesystem and should only be used over ext4 partitions

 

In its implementation, overlay only supports a two-layer filesystem, while overlay2 supports multiple layers (and docker images are all multi-layered).
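Picking the driver via daemon.json rather than the -s flag shown in the quote above (a sketch):

```
# /etc/docker/daemon.json:
#   { "storage-driver": "overlay2" }
# equivalent command line:
dockerd --storage-driver=overlay2
```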

 

更多参考:

https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/#overlayfs-and-docker-performance

 Posted at 5:42 PM
March 27, 2017
 

Background:

Monitoring (docker stats) showed that a container had used up its memory. I went into the container and poked around, but found no process using much memory; adding up the memory used by all of the container's processes with awk and the like, the total was nowhere near the limit. Why?

Analysis:

Could docker stats simply be computing it wrong?

Went into /sys/fs/cgroup/memory/docker/xxxxx/ and checked memory.usage_in_bytes: it matches what docker stats reports, so the calculation is not wrong.
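The check, roughly (this assumes the cgroupfs cgroup driver, hence the /sys/fs/cgroup/memory/docker/<id> path; mycontainer is a placeholder name):

```
CID=$(docker inspect -f '{{.Id}}' mycontainer)    # full container id
cat /sys/fs/cgroup/memory/docker/$CID/memory.usage_in_bytes
docker stats --no-stream mycontainer              # compare with the MEM USAGE column
```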

 

We know that part of system memory is taken up by buffers and cache, and Linux counts that as used memory. The same "problem" should exist for containers, and it is quite likely that Linux charges the page cache generated by a container to that container's memory usage. Verifying this is simple: dd a big file inside the container:
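For example (file name and size are arbitrary):

```
# inside the container: write 1 GiB, which lands in the page cache
dd if=/dev/zero of=/tmp/bigfile bs=1M count=1024
```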

After dd'ing the big file, docker stats shows more memory in use.

On the host, after echo 3 > /proc/sys/vm/drop_caches, docker stats shows less memory in use.

 

So the cause is identified.

Remaining question:

For the host, when computing memory usage you can subtract cache/buffers from used memory; but for a container, how do you subtract the container's share of cache/buffers? If you don't, you will get false alarms as well.

 

Testing shows that the page cache from the dd'ed file is charged to inactive_file.
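Which can be read from the container's memory.stat (same path assumption as above, $CID being the full container id):

```
grep -E '^(total_)?inactive_file' /sys/fs/cgroup/memory/docker/$CID/memory.stat
# inactive_file grows by roughly the size of the dd'ed file
```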

参考: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-memory.html

 

 Posted at 5:30 PM