I Pod Scheduling Policies
1 Overview
In Kubernetes, scheduling means assigning a Pod to a suitable compute node; the kubelet on that node then runs the Pod
kube-scheduler is the default scheduler and a core component of the cluster
How does the scheduler work?
The scheduler uses the Kubernetes watch mechanism to discover Pods that have not yet been assigned to a node, and places each of them on a suitable node according to its scheduling rules
2 Workflow
The scheduler picks a node for a Pod in two steps: filtering and scoring
Step 1: filtering (screening)
The scheduler first filters out the nodes that can satisfy all of the Pod's resource requests, covering compute, memory, storage, network, ports, and so on. If no node can meet the Pod's requirements, the Pod stays in the Pending state until the scheduler finds a suitable node for it
Step 2: scoring (ranking)
In the scoring phase, the scheduler scores every feasible node according to its scoring rules and selects the highest-scoring node to run the Pod. If several nodes tie for the highest score, the scheduler picks one of them at random
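The two-step selection above can be sketched in a few lines of Python (hypothetical node data and a simplified scoring rule, not the real kube-scheduler code):

```python
import random

# Free capacity per node (hypothetical numbers)
nodes = {
    "node-0001": {"cpu_free_m": 500,  "mem_free_mi": 2000},
    "node-0002": {"cpu_free_m": 2000, "mem_free_mi": 4000},
    "node-0003": {"cpu_free_m": 1500, "mem_free_mi": 4000},
}
pod_request = {"cpu_m": 1000, "mem_mi": 1000}

# Step 1: filtering -- drop every node that cannot satisfy the request
feasible = [
    name for name, free in nodes.items()
    if free["cpu_free_m"] >= pod_request["cpu_m"]
    and free["mem_free_mi"] >= pod_request["mem_mi"]
]

# Step 2: scoring -- here we simply prefer the node with the most free CPU
def score(name):
    return nodes[name]["cpu_free_m"]

best = max(score(n) for n in feasible)
winners = [n for n in feasible if score(n) == best]
chosen = random.choice(winners)  # ties are broken at random
print(feasible, chosen)
```

Here node-0001 is filtered out (only 500m free), and of the two feasible nodes the higher-scoring node-0002 wins.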
Binding
Once a node has been chosen for the Pod, the scheduler notifies kube-apiserver of the decision; this step is called binding
3 Targeted Pod Scheduling
Scheduling by node name
- When creating a Pod, we can configure scheduling rules so that it runs on a designated node
- The nodeName field pins the Pod to the named node
Case study
Write the Pod resource object file
Use the nodeName field to schedule this Pod onto node-0001
[root@master ~]# vim myhttp.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: myhttp
spec:
  nodeName: node-0001    # schedule by node name
  containers:
  - name: apache
    image: myos:httpd
Test and verify
Note: if the named node cannot run the Pod, the Pod is not moved to another node; it waits there indefinitely
[root@master ~]# kubectl apply -f myhttp.yaml
pod/myhttp created
[root@master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
myhttp 1/1 Running 0 3s 10.244.1.6 node-0001
II Labels and Pod Scheduling
1 Label Overview
Labels are key/value pairs attached to Kubernetes objects
2 Label Uses
When creating, deleting, or modifying resource objects, Kubernetes can use labels to identify which objects to operate on. In Pod scheduling, labels make scheduling tasks far more flexible
Labels can be attached to an object at creation time and added or modified at any time afterwards. Labels can be used to organize and select subsets of objects.
3 Label Management
- Query labels
- Syntax: kubectl get <resource-type> [name] --show-labels
[root@master ~]# kubectl get pods --show-labels
NAME READY STATUS RESTARTS AGE LABELS
myhttp 1/1 Running 0 2m34s <none>
[root@master ~]# kubectl get namespaces --show-labels
NAME STATUS AGE LABELS
default Active 3h44m kubernetes.io/metadata.name=default
kube-node-lease Active 3h44m kubernetes.io/metadata.name=kube-node-lease
kube-public Active 3h44m kubernetes.io/metadata.name=kube-public
kube-system Active 3h44m kubernetes.io/metadata.name=kube-system
[root@master ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master Ready control-plane 3h44m v1.26.0 kubernetes.io/hostname=master
node-0001 Ready <none> 3h38m v1.26.0 kubernetes.io/hostname=node-0001
node-0002 Ready <none> 3h38m v1.26.0 kubernetes.io/hostname=node-0002
node-0003 Ready <none> 3h38m v1.26.0 kubernetes.io/hostname=node-0003
node-0004 Ready <none> 3h38m v1.26.0 kubernetes.io/hostname=node-0004
node-0005 Ready <none> 3h38m v1.26.0 kubernetes.io/hostname=node-0005
- Filter by label
- Syntax: kubectl get <resource-type> [name] -l <key>=<value>
# Filter resource objects by label
[root@master ~]# kubectl get nodes -l kubernetes.io/hostname=master
NAME STATUS ROLES AGE VERSION
master Ready control-plane 3h38m v1.26.0
- Add a label
- Syntax: kubectl label <resource-type> [name] <key>=<value>
[root@master ~]# kubectl label pod myhttp app=apache
pod/myhttp labeled
[root@master ~]# kubectl get pods --show-labels
NAME READY STATUS RESTARTS AGE LABELS
myhttp 1/1 Running 0 14m app=apache
- Delete a label
- Syntax: kubectl label <resource-type> [name] <key>-
[root@master ~]# kubectl label pod myhttp app-
pod/myhttp labeled
[root@master ~]# kubectl get pods --show-labels
NAME READY STATUS RESTARTS AGE LABELS
myhttp 1/1 Running 0 14m <none>
Setting labels in a resource file
[root@master ~]# vim myhttp.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: myhttp
  labels:            # declare labels
    app: apache      # label key/value pair
spec:
  containers:
  - name: apache
    image: myos:httpd
# Set the Pod's labels through the resource file
[root@master ~]# kubectl delete pods myhttp
pod "myhttp" deleted
[root@master ~]# kubectl apply -f myhttp.yaml
pod/myhttp created
[root@master ~]# kubectl get pods --show-labels
NAME READY STATUS RESTARTS AGE LABELS
myhttp 1/1 Running 0 14m app=apache
III Label-Based Pod Scheduling
1 Label Selectors
Unlike names and UIDs, labels are not unique. In general, we expect many objects to carry the same labels.
Via a label selector, a client or user can identify a set of objects
A label selector can consist of multiple requirements. When there are multiple requirements, all of them must be satisfied, i.e. they are combined with a logical AND (&&).
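The AND semantics can be illustrated with a small Python sketch (hypothetical Pods and labels): an object matches only if every key=value requirement in the selector matches.

```python
# Hypothetical Pods and their labels
pods = {
    "web1": {"app": "apache", "tier": "frontend"},
    "web2": {"app": "apache"},
    "db1":  {"app": "mysql", "tier": "backend"},
}
# Equivalent of: kubectl get pods -l app=apache,tier=frontend
selector = {"app": "apache", "tier": "frontend"}

# Every requirement must match (logical AND)
matched = [
    name for name, labels in pods.items()
    if all(labels.get(k) == v for k, v in selector.items())
]
print(matched)
```

Only web1 carries both labels; web2 matches app=apache but fails the second requirement.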
[root@master ~]# kubectl get nodes node-0002 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node-0002 Ready <none> 3h38m v1.26.0 kubernetes.io/hostname=node-0002 ... ...
[root@master ~]# vim myhttp.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: myhttp
  labels:
    app: apache
spec:
  nodeSelector:                        # schedule by node label
    kubernetes.io/hostname: node-0002  # the label to match
  containers:
  - name: apache
    image: myos:httpd
[root@master ~]# kubectl delete pods myhttp
pod "myhttp" deleted
[root@master ~]# kubectl apply -f myhttp.yaml
pod/myhttp created
[root@master ~]# kubectl get pods -l app=apache -o wide
NAME READY STATUS RESTARTS AGE IP NODE
myhttp 1/1 Running 0 9s 10.244.2.11 node-0002
2 Container Scheduling (Case 2)
The nodes node-0002 and node-0003 are known to use SSD disks
Create 5 Pods: [ web1, web2, web3, web4, web5 ]
These Pods must run on the nodes with SSD disks
Clean up the configuration after the exercise
[root@master ~]# kubectl label nodes node-0002 node-0003 disktype=ssd
node/node-0002 labeled
node/node-0003 labeled
[root@master ~]# vim myhttp.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: myhttp
  labels:
    app: apache
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: apache
    image: myos:httpd
[root@master ~]# sed "s,myhttp,web1," myhttp.yaml |kubectl apply -f -
[root@master ~]# sed "s,myhttp,web2," myhttp.yaml |kubectl apply -f -
[root@master ~]# sed "s,myhttp,web3," myhttp.yaml |kubectl apply -f -
[root@master ~]# sed "s,myhttp,web4," myhttp.yaml |kubectl apply -f -
[root@master ~]# sed "s,myhttp,web5," myhttp.yaml |kubectl apply -f -
[root@master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
myhttp 1/1 Running 0 29m 10.244.2.30 node-0002
web1 1/1 Running 0 10s 10.244.2.31 node-0002
web2 1/1 Running 0 10s 10.244.2.32 node-0002
web3 1/1 Running 0 10s 10.244.3.45 node-0003
web4 1/1 Running 0 10s 10.244.3.46 node-0003
web5 1/1 Running 0 10s 10.244.3.47 node-0003
Clean up the exercise configuration
[root@master ~]# kubectl delete pod -l app=apache
pod "myhttp" deleted
pod "web1" deleted
pod "web2" deleted
pod "web3" deleted
pod "web4" deleted
pod "web5" deleted
[root@master ~]# kubectl label nodes node-0002 node-0003 disktype-
node/node-0002 labeled
node/node-0003 labeled
IV Pod Resource Requests
1 Overview
When multiple applications share a cluster with a fixed number of nodes, some applications may fail to get the resources they need to run properly. We therefore need rules that guarantee applications the resources required to run
2 Resource Types
- CPU
- CPU limits and requests are measured in millicores (m). In Kubernetes, 1m is the smallest schedulable unit, and one CPU core counts as 1000m
- If you have 2 CPUs with 4 cores each, your total CPU capacity is 8000m
- Memory
- Memory limits and requests are measured in bytes
- You can express memory with the decimal suffixes: E, P, T, G, M, k
- Or with the corresponding power-of-two suffixes: Ei, Pi, Ti, Gi, Mi, Ki
- For example, note the difference between a decimal and a binary suffix:
1k == 1000
1Ki == 1024
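The unit arithmetic above can be checked directly (plain Python, no Kubernetes involved): decimal suffixes are powers of 1000, binary suffixes are powers of 1024, and CPU is counted in millicores.

```python
DECIMAL = {"k": 1000, "M": 1000**2, "G": 1000**3}     # 1k  == 1000 bytes
BINARY  = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}  # 1Ki == 1024 bytes

one_k  = 1 * DECIMAL["k"]
one_ki = 1 * BINARY["Ki"]
mem_1100mi = 1100 * BINARY["Mi"]   # a 1100Mi request, expressed in bytes

# CPU: 2 CPUs x 4 cores each, at 1000m per core
total_millicores = 2 * 4 * 1000
print(one_k, one_ki, mem_1100mi, total_millicores)
```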
3 Creating the Resource Object File
[root@master ~]# vim minpod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: minpod
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: linux
    image: myos:8.5
    command: ["awk", "BEGIN{while(1){}}"]
4 Memory Requests
[root@master ~]# vim minpod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: minpod
spec:
  terminationGracePeriodSeconds: 0
  nodeSelector:                        # pin the Pod to one node
    kubernetes.io/hostname: node-0003  # create it on node-0003
  containers:
  - name: linux
    image: myos:8.5
    command: ["awk", "BEGIN{while(1){}}"]
    resources:         # resource policy
      requests:        # request policy
        memory: 1100Mi # memory request
# Verify the memory request policy
[root@master ~]# for i in app{1..5};do sed "s,minpod,${i}," minpod.yaml;done |kubectl apply -f -
pod/app1 created
pod/app2 created
pod/app3 created
pod/app4 created
pod/app5 created
[root@master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
app1 1/1 Running 0 4s
app2 1/1 Running 0 4s
app3 1/1 Running 0 4s
app4 0/1 Pending 0 4s
app5 0/1 Pending 0 4s
# Clean up the exercise configuration
[root@master ~]# kubectl delete pod --all
5 CPU Requests
[root@master ~]# vim minpod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: minpod
spec:
  terminationGracePeriodSeconds: 0
  nodeSelector:
    kubernetes.io/hostname: node-0003
  containers:
  - name: linux
    image: myos:8.5
    command: ["awk", "BEGIN{while(1){}}"]
    resources:
      requests:
        cpu: 800m      # CPU request
# Verify the request policy
[root@master ~]# for i in app{1..5};do sed "s,minpod,${i}," minpod.yaml;done |kubectl apply -f -
pod/app1 created
pod/app2 created
pod/app3 created
pod/app4 created
pod/app5 created
[root@master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
app1 1/1 Running 0 8s
app2 1/1 Running 0 8s
app3 0/1 Pending 0 8s
app4 0/1 Pending 0 8s
app5 0/1 Pending 0 8s
# Clean up the exercise configuration
[root@master ~]# kubectl delete pod --all
6 Combined Requests
When CPU and memory requests are set together, a node must be able to satisfy both
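A quick sketch of that rule (hypothetical free-capacity numbers): a node on which the CPU request fits but the memory request does not is still infeasible.

```python
node_free = {"cpu_m": 900, "mem_mi": 1000}   # hypothetical remaining capacity
request   = {"cpu_m": 800, "mem_mi": 1100}   # a combined CPU + memory request

# Both resources must fit; otherwise the Pod stays Pending
fits = (node_free["cpu_m"] >= request["cpu_m"]
        and node_free["mem_mi"] >= request["mem_mi"])
print(fits)
```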
[root@master ~]# vim minpod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: minpod
spec:
  terminationGracePeriodSeconds: 0
  nodeSelector:
    kubernetes.io/hostname: node-0003
  containers:
  - name: linux
    image: myos:8.5
    command: ["awk", "BEGIN{while(1){}}"]
    resources:
      requests:
        cpu: 800m      # CPU request
        memory: 1100Mi # memory request
V Pod Resource Limits
1 Overview
A limit policy is a restrictive policy that prevents applications from over-consuming node resources. Unlike a request, a limit does not check how much capacity a node has left; it only caps the maximum amount of resources an application may use
Resource limits are configured with the limits field
2 Limiting Memory and CPU
# Create the limit resource object file
[root@master ~]# vim maxpod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: maxpod
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: linux
    image: myos:8.5
    command: ["awk", "BEGIN{while(1){}}"]
    resources:
      limits:
        cpu: 800m
        memory: 2000Mi
[root@master ~]# kubectl apply -f maxpod.yaml
pod/maxpod created
3 Verifying the Memory Limit
[root@master ~]# kubectl cp memtest.py maxpod:/usr/bin/
[root@master ~]# kubectl exec -it maxpod -- /bin/bash
# More than 2000Mi: allocating the memory fails
[root@maxpod /]# memtest.py 2500
Killed
# Less than 2000Mi: allocating the memory succeeds
[root@maxpod /]# memtest.py 1500
use memory success
press any key to exit :
4 Verifying the CPU Limit
[root@master ~]# kubectl exec -it maxpod -- ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 79.9 0.0 9924 720 ? Rs 18:25 1:19 awk BEGIN{while(1){}}
root 8 0.5 0.0 12356 2444 pts/0 Ss 18:26 0:00 /bin/bash
[root@master ~]# kubectl top pods
NAME CPU(cores) MEMORY(bytes)
maxpod 834m 1Mi
# Clean up the exercise Pod
[root@master ~]# kubectl delete pod maxpod
pod "maxpod" deleted
VI Namespace-Wide Resource Management
1 Overview
If a large number of containers need resource constraints, setting a policy on every Pod is inconvenient and hard to manage. An administrator can instead impose limits per namespace; every container created in that namespace is subject to the rules.
Kubernetes supports two kinds of namespace-wide constraints:
- Constrain the CPU and memory of individual Pods and containers: LimitRange
- Constrain total resource consumption: ResourceQuota
2 LimitRange
Set default resource constraints for the namespace work
Pod-level rules cap the totals for a whole Pod
# Create the namespace
[root@master ~]# kubectl create namespace work
namespace/work created
# Set the default constraints
[root@master ~]# vim limit.yaml
---
apiVersion: v1
kind: LimitRange
metadata:
  name: mylimit        # policy name
  namespace: work      # namespace the rules apply to
spec:
  limits:              # namespace-wide rules
  - type: Container    # rule type
    default:           # limits applied to containers that declare none
      cpu: 300m        # CPU limit
      memory: 500Mi    # memory limit
    defaultRequest:    # requests applied to containers that declare none
      cpu: 8m          # CPU request
      memory: 8Mi      # memory request
[root@master ~]# kubectl -n work apply -f limit.yaml
limitrange/mylimit created
Verify the default policy
[root@master ~]# vim maxpod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: maxpod
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: linux
    image: myos:8.5
    command: ["awk", "BEGIN{while(1){}}"]
[root@master ~]# kubectl -n work apply -f maxpod.yaml
pod/maxpod created
[root@master ~]# kubectl -n work describe pod maxpod
... ...
Limits:
cpu: 300m
memory: 500Mi
Requests:
cpu: 10m
memory: 8Mi
... ...
[root@master ~]# kubectl -n work top pods
NAME CPU(cores) MEMORY(bytes)
maxpod 300m 0Mi
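The defaulting behaviour seen above can be sketched as follows (a simplified stand-in for the admission logic, not the real controller code): a container that declares no resources receives defaultRequest as its requests and default as its limits.

```python
# Values from the mylimit LimitRange
default_limit   = {"cpu": "300m", "memory": "500Mi"}
default_request = {"cpu": "8m", "memory": "8Mi"}

def apply_limitrange(container):
    # Fill in requests/limits only where the container declares none
    resources = container.setdefault("resources", {})
    resources.setdefault("requests", dict(default_request))
    resources.setdefault("limits", dict(default_limit))
    return container

c = apply_limitrange({"name": "linux", "image": "myos:8.5"})
print(c["resources"])
```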
Declaring resources explicitly
[root@master ~]# vim maxpod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: maxpod
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: linux
    image: myos:8.5
    command: ["awk", "BEGIN{while(1){}}"]
    resources:
      requests:
        cpu: 10m
        memory: 10Mi
      limits:
        cpu: 1100m
        memory: 2000Mi
[root@master ~]# kubectl -n work delete -f maxpod.yaml
pod "maxpod" deleted
[root@master ~]# kubectl -n work apply -f maxpod.yaml
pod/maxpod created
[root@master ~]# kubectl -n work describe pod maxpod
... ...
Limits:
cpu: 1100m
memory: 2000Mi
Requests:
cpu: 10m
memory: 10Mi
... ...
[root@master ~]# kubectl -n work top pods maxpod
NAME CPU(cores) MEMORY(bytes)
maxpod 1000m 0Mi
Constraining the allowed range
[root@master ~]# vim limit.yaml
---
apiVersion: v1
kind: LimitRange
metadata:
  name: mylimit
  namespace: work
spec:
  limits:
  - type: Container
    default:
      cpu: 300m
      memory: 500Mi
    defaultRequest:
      cpu: 8m
      memory: 8Mi
    max:
      cpu: 800m
      memory: 1000Mi
    min:
      cpu: 2m
      memory: 8Mi
[root@master ~]# kubectl -n work apply -f limit.yaml
limitrange/mylimit configured
[root@master ~]# kubectl -n work delete -f maxpod.yaml
pod "maxpod" deleted
[root@master ~]# kubectl -n work apply -f maxpod.yaml
Error from server (Forbidden): error when creating "maxpod.yaml": pods "maxpod" is forbidden: [maximum cpu usage per Container is 800m, but limit is 1, maximum memory usage per Container is 1000Mi, but limit is 2000Mi]
Multi-container resource constraints
[root@master ~]# vim maxpod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: maxpod
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: linux
    image: myos:8.5
    command: ["awk", "BEGIN{while(1){}}"]
    resources:
      requests:
        cpu: 10m
        memory: 10Mi
      limits:
        cpu: 800m
        memory: 1000Mi
  - name: linux1
    image: myos:8.5
    command: ["awk", "BEGIN{while(1){}}"]
    resources:
      requests:
        cpu: 10m
        memory: 10Mi
      limits:
        cpu: 800m
        memory: 1000Mi
[root@master ~]# kubectl -n work apply -f maxpod.yaml
pod/maxpod created
[root@master ~]# kubectl -n work get pods
NAME READY STATUS RESTARTS AGE
maxpod 2/2 Running 0 50s
[root@master ~]# kubectl -n work top pods maxpod
NAME CPU(cores) MEMORY(bytes)
maxpod 1610m 0Mi
Pod-level total constraints
[root@master ~]# vim limit.yaml
---
apiVersion: v1
kind: LimitRange
metadata:
  name: mylimit
  namespace: work
spec:
  limits:
  - type: Container
    default:
      cpu: 300m
      memory: 500Mi
    defaultRequest:
      cpu: 8m
      memory: 8Mi
    max:
      cpu: 800m
      memory: 1000Mi
    min:
      cpu: 2m
      memory: 8Mi
  - type: Pod
    max:
      cpu: 1200m
      memory: 1200Mi
    min:
      cpu: 2m
      memory: 8Mi
[root@master ~]# kubectl -n work apply -f limit.yaml
limitrange/mylimit configured
[root@master ~]# kubectl -n work delete -f maxpod.yaml
pod "maxpod" deleted
[root@master ~]# kubectl -n work apply -f maxpod.yaml
Error from server (Forbidden): error when creating "maxpod.yaml": pods "maxpod" is forbidden: [maximum cpu usage per Pod is 1200m, but limit is 1600m, maximum memory usage per Pod is 1200Mi, but limit is 2097152k]
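The rejection above follows from simple addition: with type: Pod rules, the bound applies to the sum of all container limits. A sketch in plain arithmetic:

```python
pod_max = {"cpu_m": 1200, "mem_mi": 1200}   # the type: Pod max values
containers = [
    {"cpu_m": 800, "mem_mi": 1000},         # linux
    {"cpu_m": 800, "mem_mi": 1000},         # linux1
]

# Pod-level bounds are checked against the container totals
total_cpu = sum(c["cpu_m"] for c in containers)
total_mem = sum(c["mem_mi"] for c in containers)
allowed = total_cpu <= pod_max["cpu_m"] and total_mem <= pod_max["mem_mi"]
print(total_cpu, total_mem, allowed)
```

1600m exceeds 1200m and 2000Mi exceeds 1200Mi, so the Pod is forbidden.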
Resource exhaustion from many Pods
[root@master ~]# vim maxpod.yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: maxpod
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: linux
    image: myos:8.5
    command: ["awk", "BEGIN{while(1){}}"]
    resources:
      requests:
        cpu: 10m
        memory: 10Mi
      limits:
        cpu: 800m
        memory: 1000Mi
# Creating too many Pods also exhausts resources
[root@master ~]# for i in app{1..9};do sed "s,maxpod,${i}," maxpod.yaml ;done |kubectl -n work apply -f -
# After the Pods are created, check node resource usage
[root@master ~]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master 81m 4% 1040Mi 27%
node-0001 1800m 90% 403Mi 10%
node-0002 1825m 86% 457Mi 11%
node-0003 1816m 85% 726Mi 19%
node-0004 1823m 86% 864Mi 21%
node-0005 1876m 88% 858Mi 21%
# Clean up the exercise configuration
[root@master ~]# kubectl -n work delete pods --all
3 ResourceQuota
A ResourceQuota caps the total amount of resources and objects a namespace may consume
[root@master ~]# vim quota.yaml
---
apiVersion: v1
kind: ResourceQuota    # namespace-wide quota object
metadata:
  name: myquota        # rule name
  namespace: work      # namespace the rule applies to
spec:
  hard:                      # hard limits
    requests.cpu: 1000m      # total CPU requests allowed
    requests.memory: 2000Mi  # total memory requests allowed
    limits.cpu: 5000m        # total CPU limits allowed
    limits.memory: 8Gi       # total memory limits allowed
    pods: 3                  # maximum number of Pods
[root@master ~]# kubectl -n work apply -f quota.yaml
resourcequota/myquota created
Verify the quota
[root@master ~]# for i in app{1..5};do sed "s,maxpod,${i}," maxpod.yaml ;done |kubectl -n work apply -f -
pod/app1 created
pod/app2 created
pod/app3 created
Error from server (Forbidden): error when creating "STDIN": pods "app4" is forbidden: exceeded quota: myquota, requested: pods=1, used: pods=3, limited: pods=3
Error from server (Forbidden): error when creating "STDIN": pods "app5" is forbidden: exceeded quota: myquota, requested: pods=1, used: pods=3, limited: pods=3
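The admission pattern behind those errors can be sketched as a running total (a simplification, not the real quota controller): a new object is admitted only while used + requested stays within hard.

```python
hard = {"pods": 3}   # from the myquota rule
used = {"pods": 0}

admitted, rejected = [], []
for name in ["app1", "app2", "app3", "app4", "app5"]:
    if used["pods"] + 1 <= hard["pods"]:
        used["pods"] += 1       # charge the quota on admission
        admitted.append(name)
    else:
        rejected.append(name)   # exceeded quota: pods=3
print(admitted, rejected)
```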
# Delete the exercise Pods and the constraint rules
[root@master ~]# kubectl -n work delete pods --all
pod "app1" deleted
pod "app2" deleted
pod "app3" deleted
[root@master ~]# kubectl -n work delete -f limit.yaml -f quota.yaml
limitrange "mylimit" deleted
resourcequota "myquota" deleted
[root@master ~]# kubectl delete namespace work
namespace "work" deleted