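[Docker] Exploring Docker's bridge network from the perspective of namespaces and routing

2024/6/3 17:50:47 Tags: docker, networking, containers, namespaces, bridge

Bridge networking is Docker's default network mode. In a bridge network, Docker creates a virtual network interface for each container and assigns the container an IP address. Through the bridge, a container can communicate with the host and with other containers, and it can also publish ports for external access.

How containers communicate with each other

First, we create two containers: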

$ docker container run -d --rm --name box1 busybox /bin/sh -c "while true; do sleep 3600; done"
e6e89f95de12eeda726fed5f4f909d32be2ea13c3cecb350acd86bc13394b769

$ docker container run -d --rm --name box2 busybox /bin/sh -c "while true; do sleep 3600; done"
c0c1a152155bcf66bed71fdc51e558f4c3b1c3632866c61a69303a4da10c2f54

$ docker container ls
CONTAINER ID   IMAGE     COMMAND                  CREATED          STATUS          PORTS     NAMES
c0c1a152155b   busybox   "/bin/sh -c 'while t…"   31 seconds ago   Up 30 seconds             box2
e6e89f95de12   busybox   "/bin/sh -c 'while t…"   41 seconds ago   Up 40 seconds             box1

Next, we look up box2's IP address and then ping it from box1:

$ docker container exec -it box2 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
21: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

$ docker container exec -it box1 ping 172.17.0.3 -c 3
PING 172.17.0.3 (172.17.0.3): 56 data bytes
64 bytes from 172.17.0.3: seq=0 ttl=64 time=0.886 ms
64 bytes from 172.17.0.3: seq=1 ttl=64 time=0.049 ms
64 bytes from 172.17.0.3: seq=2 ttl=64 time=0.106 ms

--- 172.17.0.3 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.049/0.347/0.886 ms

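Why can box1 ping box2? How do the containers communicate?

Docker uses namespaces to isolate resources such as the network. So why does the ip netns command show no network namespaces on the host?

The reason is that ip netns only lists namespaces that have an entry under /var/run/netns, and Docker does not create entries there for its containers (it keeps its own handles elsewhere, typically under /var/run/docker/netns). The namespaces exist, but ip netns cannot see them, which makes it harder to analyze the network and troubleshoot problems.

The following steps make the container namespaces visible to ip netns.

First, get the PID of each container's main process: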

$ docker inspect box1 | grep Pid
            "Pid": 43568,
            "PidMode": "",
            "PidsLimit": null,

$ docker inspect box2 | grep Pid
            "Pid": 43640,
            "PidMode": "",
            "PidsLimit": null,

Then link each process's network namespace into /var/run/netns (run as root):

$ ln -s /proc/43568/ns/net /var/run/netns/box1

$ ln -s /proc/43640/ns/net /var/run/netns/box2

If the /var/run/netns directory does not exist, create it manually as root.
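The two steps above (look up the PID, then link its network namespace) can be wrapped in a small helper. This is only a sketch: the function name expose_netns and the DRY_RUN switch are made up for illustration and are not part of Docker or iproute2.

```shell
# expose_netns NAME PID
# Link the network namespace of process PID into /var/run/netns so that
# `ip netns` can see it under NAME. Needs root unless DRY_RUN is set.
# NOTE: expose_netns and DRY_RUN are illustrative names, not Docker features.
expose_netns() {
    name=$1
    pid=$2
    # With DRY_RUN set, the commands are printed instead of executed.
    run=${DRY_RUN:+echo}
    $run mkdir -p /var/run/netns
    $run ln -sfn "/proc/$pid/ns/net" "/var/run/netns/$name"
}
```

A possible call is expose_netns box1 "$(docker inspect -f '{{.State.Pid}}' box1)", where the -f template prints only the PID instead of the grep output shown earlier. The link goes stale once the container exits and should then be removed by hand.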

Now the ip netns command shows the containers' network namespaces:

$ ip netns list
box2 (id: 3)
box1 (id: 2)

Check the IP addresses in network namespaces box1 and box2:

$ ip netns exec box1 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
19: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

$ ip netns exec box2 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
21: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

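Network namespace box1 has the IP 172.17.0.2 and network namespace box2 has the IP 172.17.0.3. For two network namespaces on the same subnet to communicate, they need to be connected through a bridge.

By default, Docker creates a bridge named docker0: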

$ ip link show type bridge
9: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:53:1d:f7:5f brd ff:ff:ff:ff:ff:ff

Next, look at the veth interfaces attached to docker0:

$ brctl show docker0
bridge name     bridge id               STP enabled     interfaces
docker0         8000.0242531df75f       no              vetha7d1dd5
                                                        vethadaa66f

docker0 has two veth interfaces attached: vetha7d1dd5 and vethadaa66f.

Now look at the veth interfaces on the host:

$ ip link show type veth
20: vethadaa66f@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether 52:4c:41:8c:91:01 brd ff:ff:ff:ff:ff:ff link-netns box1
22: vetha7d1dd5@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether 8a:e9:19:ce:72:cb brd ff:ff:ff:ff:ff:ff link-netns box2

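We can see that network namespace box1 is connected to docker0 through the veth pair eth0 (if19) - vethadaa66f (if20), and network namespace box2 is connected to docker0 through the veth pair eth0 (if21) - vetha7d1dd5 (if22). The @ifN suffix on each interface gives the index of its peer, which is how the two ends are matched up. Because both host-side ends are ports of the same bridge, box1 and box2 can communicate with each other.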
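Matching the @ifN suffixes by eye gets tedious with many containers. The following sketch extracts the pairing from the ip link output; the helper name veth_peers is an illustrative choice.

```shell
# veth_peers: read `ip link show type veth` output on stdin and print, for each
# host-side veth, its own interface index and the index of its peer.
# (veth_peers is an illustrative helper name.)
veth_peers() {
    awk -F': ' '/^[0-9]+: [^ ]+@if[0-9]+:/ {
        # $1 is the interface index, $2 looks like "vethadaa66f@if19".
        split($2, part, "@if")
        printf "%s ifindex=%s peer_ifindex=%s\n", part[1], $1, part[2]
    }'
}
```

Running ip link show type veth | veth_peers on the host above would report vethadaa66f with peer index 19 (eth0 in box1) and vetha7d1dd5 with peer index 21 (eth0 in box2).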

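Network topology: box1 eth0 (172.17.0.2) <-> vethadaa66f <-> docker0 <-> vetha7d1dd5 <-> eth0 (172.17.0.3) box2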
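The same kind of topology can be rebuilt by hand, which makes each piece that Docker sets up visible. The sketch below is not Docker's actual procedure. Every name in it (build_bridge_demo, ns1, ns2, br-demo, veth-*, peer-*), the 10.10.0.0/24 subnet and the DRY_RUN switch are hypothetical values chosen for illustration.

```shell
# build_bridge_demo: connect two network namespaces through one bridge using
# two veth pairs. Run as root, or set DRY_RUN=1 to only print the commands.
# All names and the 10.10.0.0/24 subnet are hypothetical.
build_bridge_demo() {
    run=${DRY_RUN:+echo}
    $run ip link add br-demo type bridge
    $run ip link set br-demo up
    i=1
    for ns in ns1 ns2; do
        $run ip netns add "$ns"
        # Create a veth pair, then move one end into the namespace.
        $run ip link add "veth-$ns" type veth peer name "peer-$ns"
        $run ip link set "peer-$ns" netns "$ns"
        # Attach the host end to the bridge, as Docker does with docker0.
        $run ip link set "veth-$ns" master br-demo
        $run ip link set "veth-$ns" up
        # Configure the end that now lives inside the namespace.
        $run ip netns exec "$ns" ip addr add "10.10.0.$i/24" dev "peer-$ns"
        $run ip netns exec "$ns" ip link set "peer-$ns" up
        i=$((i + 1))
    done
}
```

After running it as root, ip netns exec ns1 ping -c 3 10.10.0.2 should behave like the box1 to box2 ping above, provided the host's firewall does not filter bridged traffic. To clean up, delete the namespaces with ip netns del ns1 and ip netns del ns2, and remove the bridge with ip link del br-demo.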

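How containers access external networks

Network namespaces plus a bridge only provide communication between network namespaces. For a container to reach external networks, SNAT implemented with iptables is also required.

Ping Baidu from box1: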

$ docker exec -it box1 ping www.baidu.com -c 3
PING www.baidu.com (14.119.104.189): 56 data bytes
64 bytes from 14.119.104.189: seq=0 ttl=51 time=9.908 ms
64 bytes from 14.119.104.189: seq=1 ttl=51 time=14.939 ms
64 bytes from 14.119.104.189: seq=2 ttl=51 time=11.023 ms

--- www.baidu.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 9.908/11.956/14.939 ms

Check the iptables rules:

$ iptables -nvxL -t nat
Chain PREROUTING (policy ACCEPT 20 packets, 3083 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       0        0 DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 1 packets, 229 bytes)
    pkts      bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 2 packets, 137 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       0        0 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 2 packets, 137 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       6      300 MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0

Chain DOCKER (2 references)
    pkts      bytes target     prot opt in     out     source               destination
       0        0 RETURN     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0

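The POSTROUTING chain of the nat table contains a rule that applies SNAT (MASQUERADE) to packets whose source address is in 172.17.0.0/16 and that leave through any interface other than docker0. The container's private source address is rewritten to the host's address on the outgoing interface, so replies from the external network can find their way back.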
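When troubleshooting, it helps to check for this rule mechanically instead of reading the whole table. The following is a minimal sketch that works on the rule-specification form printed by iptables -S; the helper name has_masquerade is an illustrative choice.

```shell
# has_masquerade SUBNET: read `iptables -t nat -S POSTROUTING` output on stdin
# and succeed if some rule masquerades traffic whose source is SUBNET.
# (has_masquerade is an illustrative helper name.)
has_masquerade() {
    grep -F -- "-s $1 " | grep -q -- '-j MASQUERADE'
}
```

For example, iptables -t nat -S POSTROUTING | has_masquerade 172.17.0.0/16 && echo "SNAT rule present". In that output format, the rule listed above would be expected to appear roughly as -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE.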

Now we clear all iptables rules in the filter and nat tables:

$ iptables -t filter -F
$ iptables -t filter -X
$ iptables -t filter -Z
$ iptables -t nat -F
$ iptables -t nat -X
$ iptables -t nat -Z

Listing the rules again shows that both the rules and the user-defined chains are gone:

$ iptables -t filter -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

$ iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

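Trying to reach Baidu again now fails. Note that the failure already occurs at name resolution (bad address), most likely because the container's DNS queries also depend on the SNAT rule to get their replies back: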

$ docker exec -it box1 ping www.baidu.com -c 3
ping: bad address 'www.baidu.com'

We manually add a NAT rule with iptables:

$ iptables -t nat -A POSTROUTING -s 172.17.0.0/16 -j MASQUERADE

Accessing Baidu again now works:

$ docker exec -it box1 ping www.baidu.com -c 3
PING www.baidu.com (14.119.104.189): 56 data bytes
64 bytes from 14.119.104.189: seq=0 ttl=51 time=16.015 ms
64 bytes from 14.119.104.189: seq=1 ttl=51 time=9.960 ms
64 bytes from 14.119.104.189: seq=2 ttl=51 time=9.247 ms

--- www.baidu.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 9.247/11.740/16.015 ms

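Sometimes the default policy of the FORWARD chain in the filter table is DROP. In that case the default policy has to be changed to ACCEPT manually before traffic can pass, using the following command: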

$ iptables -P FORWARD ACCEPT
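Before changing the policy, it is worth confirming what it currently is, since ACCEPT on FORWARD allows forwarding for the whole host and not just for containers. Here is a small sketch that reads the policy from iptables -S output; the helper name forward_policy is an illustrative choice.

```shell
# forward_policy: read `iptables -S FORWARD` output on stdin and print the
# chain's default policy (for example ACCEPT or DROP).
# (forward_policy is an illustrative helper name.)
forward_policy() {
    awk '$1 == "-P" && $2 == "FORWARD" { print $3; exit }'
}
```

Used as iptables -S FORWARD | forward_policy. Forwarding also requires the kernel setting net.ipv4.ip_forward to be 1, which can be checked with sysctl net.ipv4.ip_forward.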

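Because we bluntly flushed the tables with iptables -F, all of Docker's rules are gone. How do we get Docker's default rules back? Restarting Docker with the following command is enough: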

$ service docker restart

Of course, if you don't mind the effort, you can also add the rules back by hand one at a time.
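A third option, not used in the walkthrough above, is to dump the rules before experimenting and load the dump back afterwards. iptables-save and iptables-restore ship with iptables. The function names, the IPT_SAVE and IPT_RESTORE override variables, and the backup path are illustrative assumptions in this sketch.

```shell
# backup_rules FILE: dump the current rules of all tables into FILE.
# restore_rules FILE: load a dump created by backup_rules.
# Both need root. IPT_SAVE / IPT_RESTORE are illustrative override variables
# that allow another command to be substituted, e.g. for a dry run.
backup_rules() {
    ${IPT_SAVE:-iptables-save} > "$1"
}

restore_rules() {
    ${IPT_RESTORE:-iptables-restore} < "$1"
}
```

For example, run backup_rules /root/iptables.before ahead of the flush and restore_rules /root/iptables.before afterwards. The dump only reflects the containers that existed when it was taken, so restarting Docker remains the safer choice if containers were started or stopped in between.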

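How port forwarding works

When creating a container, the -p option maps a host port to a container port, so that requests to the host port are forwarded into the container.

First, create an nginx web container and map host port 8080 to container port 80: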

$ docker container run -d --rm --name web -p 8080:80 nginx
441c77091abfeb9498d4fd21d62594d75363fb42338c4ec51a42b6f01d80e418

Requesting port 8080 on the host successfully reaches the container:

$ curl localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

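How is this port forwarding implemented? Once again through our old friend iptables, this time using DNAT. (Strictly speaking, the DNAT rule does not match traffic addressed to 127.0.0.0/8, such as the localhost request above; with default settings that case is served by Docker's userland proxy, docker-proxy, while requests to the host's other addresses go through DNAT.)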

List the iptables rules:

$ iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0
    0     0 MASQUERADE  tcp  --  *      *       172.17.0.2           172.17.0.2           tcp dpt:80

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 RETURN     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0
    0     0 DNAT       tcp  --  !docker0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 to:172.17.0.2:80

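We can see that the following rule was added to the POSTROUTING chain of the nat table. It matches only traffic whose source and destination are both 172.17.0.2 on port 80, that is, the web container reaching itself through its own published port (hairpin NAT). Outbound access to external networks is already covered by the 172.17.0.0/16 MASQUERADE rule: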

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE  tcp  --  *      *       172.17.0.2           172.17.0.2           tcp dpt:80

A rule was also added to the DOCKER chain (referenced from PREROUTING and OUTPUT) that forwards requests for host port 8080 to 172.17.0.2:80:

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DNAT       tcp  --  !docker0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 to:172.17.0.2:80
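With several published ports the DOCKER chain grows quickly. The sketch below summarizes the DNAT rules as a short mapping list, working on the iptables -S form of the chain; the helper name published_ports is an illustrative choice.

```shell
# published_ports: read `iptables -t nat -S DOCKER` output on stdin and print
# one "HOST_PORT -> CONTAINER_IP:PORT" line per DNAT rule.
# (published_ports is an illustrative helper name.)
published_ports() {
    awk '/-j DNAT/ {
        port = ""; target = ""
        for (i = 1; i < NF; i++) {
            if ($i == "--dport") port = $(i + 1)
            if ($i == "--to-destination") target = $(i + 1)
        }
        if (port != "" && target != "") print port " -> " target
    }'
}
```

Used as iptables -t nat -S DOCKER | published_ports. The docker port web command reports the same mapping from Docker's side, which is a quick way to cross-check the two views.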

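Next, we start a container without the -p option and configure port forwarding by hand with iptables rules.

Start a web container from the nginx image without any port mapping (if the previous web container is still running, stop it first, since the name web would otherwise conflict):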

$ docker container run -d --rm --name web nginx

Checking the iptables rules now shows only Docker's base rules; no forwarding rules have been added:

$ iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 RETURN     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0

At this point, port 8080 on the host is not reachable either:

$ curl 172.19.85.122:8080
curl: (7) Failed to connect to 172.19.85.122 port 8080: Connection refused

Add a DNAT rule (this assumes the new web container was assigned 172.17.0.2):

$ iptables -t nat -I DOCKER ! -i docker0 -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:80

$ iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    2   120 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DNAT       tcp  --  !docker0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 to:172.17.0.2:80
    0     0 RETURN     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0

The web container can now be reached through port 8080 on the host:

$ curl 172.19.85.122:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
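The manual DNAT rule used above hard-codes the container address. The following sketch generates the same command from parameters, so the address can be looked up instead of assumed; the helper name dnat_rule is an illustrative choice, and the function only prints the command.

```shell
# dnat_rule HOST_PORT CONTAINER_IP CONTAINER_PORT [BRIDGE]
# Print the iptables command that forwards HOST_PORT to the container,
# mirroring the manual rule used above. BRIDGE defaults to docker0.
# (dnat_rule is an illustrative helper name.)
dnat_rule() {
    bridge=${4:-docker0}
    printf 'iptables -t nat -I DOCKER ! -i %s -p tcp --dport %s -j DNAT --to-destination %s:%s\n' \
        "$bridge" "$1" "$2" "$3"
}
```

One possible use is to read the address with ip=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' web) and then run dnat_rule 8080 "$ip" 80 | sh as root. Replacing -I with -D in the printed command removes the rule again.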

http://www.niftyadmin.cn/n/5139792.html
