[toc]
Kubernetes Operations Notes, Part 6
The Scheduler, Predicates, and Priority Functions
Scheduling as a whole can be broken into three phases: Predicate (filtering) -> Priority (scoring) -> Select (binding)
Scheduler:
Predicates (filtering policies):
CheckNodeCondition: checks whether the node itself is in a healthy condition;
GeneralPredicates:
HostName: checks whether the Pod object defines pod.spec.hostname;
PodFitsHostPorts: checks pods.spec.containers.ports.hostPort, i.e. whether the host ports the Pod asks for are still available on the node;
MatchNodeSelector: matches pods.spec.nodeSelector against the node's labels;
PodFitsResources: checks whether the node's available resources can satisfy the Pod's resource requests;
NoDiskConflict: checks whether the storage volumes the Pod depends on can be satisfied without conflict;
PodToleratesNodeTaints: checks whether the taints tolerated by the Pod's spec.tolerations fully cover the taints on the node;
PodToleratesNodeNoExecuteTaints: the same check for NoExecute taints; this predicate is not enabled by default;
CheckNodeLabelPresence: checks for the presence of specific labels on the node;
CheckServiceAffinity: tends to place the Pod on nodes that already run other Pods belonging to the same Service;
MaxEBSVolumeCount:
MaxGCEPDVolumeCount:
MaxAzureDiskVolumeCount:
CheckVolumeBinding:
NoVolumeZoneConflict:
CheckNodeMemoryPressure: checks whether the node is under memory pressure;
CheckNodePidPressure: checks whether the node is running short on PIDs;
CheckNodeDiskPressure:
MatchInterPodAffinity:
Priority functions
LeastRequested:
score = (cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity)) / 2
(a worked example follows after this list)
BalancedResourceAllocation:
nodes whose CPU and memory utilization rates are closest to each other win;
NodePreferAvoidPods:
based on the node annotation "scheduler.alpha.kubernetes.io/preferAvoidPods"
TaintToleration: checks the Pod's spec.tolerations entries against the node's taints list; the more entries that match, the lower the score;
SelectorSpreading: spreading by label selector; for the label selectors the current Pod belongs to, nodes that already run more of the other Pods matched by those selectors score lower, otherwise they score higher;
InterPodAffinity: iterates over the Pod's affinity terms and sums up those that match the given node; the larger the sum, the higher the score;
NodeAffinity: checks the match against the Pod's nodeSelector / node affinity terms; the more successful matches, the higher the score;
MostRequested: the smaller the remaining idle capacity, the higher the score; it tries to use up one node's resources before moving on to the next;
NodeLabel: scores by node labels; the more labels, the higher the score;
ImageLocality: whether the node already holds the images the Pod needs; nodes holding more of them score higher, and the score is computed from the total size of the images already present;
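A quick worked example of the LeastRequested formula above, using made-up numbers: a node with 4000m CPU and 8192Mi memory capacity, where running Pods have already requested 1000m CPU and 2048Mi memory:

```
cpu score            = (4000 - 1000) * 10 / 4000 = 7.5
memory score         = (8192 - 2048) * 10 / 8192 = 7.5
LeastRequested score = (7.5 + 7.5) / 2           = 7.5
```

The emptier the node, the closer the score gets to 10.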
Advanced scheduling in Kubernetes
Node selectors: nodeSelector, nodeName
Node affinity scheduling: nodeAffinity
Note: if the selector refers to a label that does not exist on any node, the Pod stays in the Pending state.
nodeSelector:
  disktype: harddisk
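For context, a minimal Pod manifest using this selector could look like the sketch below; the Pod name is made up here, and the ikubernetes/myapp:v1 image is reused from the examples that follow:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeselector-demo   # hypothetical name, only for illustration
  labels:
    app: myapp
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  # only nodes labeled disktype=harddisk are eligible for this Pod
  nodeSelector:
    disktype: harddisk
```

If no node carries the disktype=harddisk label, the Pod stays Pending, as noted above.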
Node affinity scheduling:
Hard affinity (required):
```yaml
# vim pod-nodeaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
```
Soft affinity (preferred):
```yaml
# vim pod-nodeaffinity-demo-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo-2
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - foo
            - bar
        weight: 60
```
Pod affinity (required):
```yaml
# vim pod-required-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh", "-c", "sleep 3600"]
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        topologyKey: kubernetes.io/hostname
```
Note: podAntiAffinity works the same way but in reverse: Pods that match the rule are guaranteed never to be scheduled onto the same node (see the sketch below).
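A minimal sketch of that anti-affinity variant, reusing pod-second from the manifest above; the name pod-second-anti is made up, and only the affinity block differs from the podAffinity version:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-second-anti   # hypothetical name for this variant
  labels:
    app: backend
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh", "-c", "sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
        # never share a node with a Pod labeled app=myapp (i.e. pod-first)
        topologyKey: kubernetes.io/hostname
```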
A taint's effect defines how it repels Pods:
NoSchedule: only affects scheduling; existing Pod objects on the node are not affected;
NoExecute: affects both scheduling and existing Pod objects; Pods that do not tolerate the taint are evicted;
PreferNoSchedule: a soft version of NoSchedule; the scheduler tries to avoid the node but may still place Pods there;
```bash
# kubectl taint node node01 node-type=production:NoSchedule
# kubectl taint node node02 node-type=dev:NoExecute
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 5
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v2
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "production"
        effect: "NoSchedule"
```
```yaml
tolerations:
- key: "node-type"
  operator: "Equal"
  value: "production"
  effect: "NoExecute"
  tolerationSeconds: 3600
```
```yaml
tolerations:
- key: "node-type"
  operator: "Exists"
  value: ""
  effect: "NoExecute"
```
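A quick way to check whether the tolerations above had the intended effect is to look at which node each Pod actually landed on (the output will of course differ per cluster):

```bash
# Show each Pod together with the node it was scheduled to
# kubectl get pods -o wide
```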
Container resource requests, limits, and Heapster
requests: what the container asks for; the minimum guarantee;
limits: the upper bound; a hard limit;
limits is normally greater than or equal to requests
CPU: 1 logical CPU
1 logical core = 1000m (millicores)
500m = 0.5 CPU
Memory:
E, P, T, G, M, K (decimal suffixes)
Ei, Pi, etc. (binary suffixes such as Mi and Gi)
Removing a taint from a node
```bash
# kubectl taint node node01 node-type-
```
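To confirm the taint is gone, the node's Taints field can be checked directly:

```bash
# Should report "Taints: <none>" once node-type has been removed
# kubectl describe node node01 | grep Taints
```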
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: renjin
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/stress-ng
    command: ["/usr/bin/stress-ng", "-c 1", "--metrics-brief"]
    resources:
      requests:
        cpu: "500m"
        memory: "128Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
```
QoS classes:
Guaranteed: every container sets both CPU and memory requests and limits, with
cpu.limits = cpu.requests and memory.limits = memory.requests (example below)
Burstable: at least one container sets CPU or memory requests;
BestEffort: no container sets any requests or limits; the lowest priority class; assigned automatically;
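As a sketch of the Guaranteed class (the Pod name and values below are only illustrative): a Pod lands in Guaranteed only when every container's requests equal its limits for both CPU and memory:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo   # hypothetical name
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"       # equal to the request
        memory: "256Mi"   # equal to the request
```

kubectl describe pod qos-guaranteed-demo should then report "QoS Class: Guaranteed".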
Configure InfluxDB
```bash
# wget https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/influxdb.yaml
```
Change the following in influxdb.yaml (apiVersion, selector, and image):
```yaml
apiVersion: apps/v1
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: influxdb
  # ... image in the Pod template:
    spec:
      containers:
      - name: influxdb
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-influxdb-amd64:v1.5.2
```
```bash
# kubectl apply -f influxdb.yaml
```
Configure heapster-rbac
```bash
# wget https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml
# kubectl apply -f heapster-rbac.yaml
```
Configure Heapster
```bash
# wget https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/heapster.yaml
```
Change the following in heapster.yaml (apiVersion, selector, and image in the Deployment; port and type in the Service):
```yaml
apiVersion: apps/v1
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: heapster
  # ... image in the Pod template:
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-amd64:v1.5.4
---
# Service:
spec:
  ports:
  - port: 80
    targetPort: 8082
  type: NodePort
```
```bash
# kubectl apply -f heapster.yaml
```
```
# kubectl get svc -n kube-system
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
heapster               NodePort    10.101.89.141   <none>        80:31261/TCP             63s
kube-dns               ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   48d
kubernetes-dashboard   NodePort    10.103.25.80    <none>        443:30660/TCP            21d
monitoring-influxdb    ClusterIP   10.103.131.36   <none>        8086/TCP                 153m
```
As the screenshot below shows, heapster is reachable at this point.
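Another quick sanity check from the command line, assuming the Deployment created above keeps its upstream name heapster:

```bash
# Confirm the heapster pod is running and look for sink errors in its logs
# kubectl get pods -n kube-system | grep heapster
# kubectl logs -n kube-system deploy/heapster
```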
Configure Grafana
```bash
# wget https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/grafana.yaml
# vim grafana.yaml
```
Change the following in grafana.yaml (apiVersion, selector, and image in the Deployment; port and type in the Service):
```yaml
apiVersion: apps/v1
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  # ... image in the Pod template:
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-grafana-amd64:v5.0.4
---
# Service:
  ports:
  - port: 80
    targetPort: 3000
  type: NodePort
```
```bash
# kubectl apply -f grafana.yaml
```
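To reach the Grafana UI, look up the NodePort that was assigned; this assumes the Service keeps its upstream name monitoring-grafana:

```bash
# The PORT(S) column shows the assigned NodePort, e.g. 80:3xxxx/TCP
# kubectl get svc -n kube-system monitoring-grafana
# then open http://<any-node-ip>:<that-nodeport> in a browser
```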