It's ALWAYS DNS!

Today the frontend hit a problem: requests going through the frontend's [reverse proxy](/posts/nginx-proxy/) to the backend upstream kept pending.

timeout?

My first guess was that the backend service was under heavy load and too slow to respond, so I updated the upstream config and added timeouts. It worked like a charm, problem gone.

```nginx
location /api/ {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    rewrite ^/api/(.*)$ /$1 break;
    proxy_pass http://api_server/;
    proxy_connect_timeout 5s;
    proxy_send_timeout 10s;
    proxy_read_timeout 10s;
}
```

Some time later the backend stopped receiving frontend requests again, and nginx kept logging 499 errors.

DNS!

After more investigation, the root cause turned out to be nginx's DNS caching: every time the backend team updated their k8s deployment, the pipeline also recreated the Service, which changed the Service IP. Once the backend team changed their update pipeline script so the Service is no longer recreated, the problem was solved :)

debug

force resolve (bad performance)

How do you force nginx to resolve DNS (of a dynamic hostname) every time it does proxy_pass?

```nginx
server {
    #...
    resolver 127.0.0.1;
    set $backend "http://dynamic.example.com:80";
    proxy_pass $backend;
    #...
}
```

resolver ...
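As a side note, here is a minimal Go sketch (my illustration, not from the post) that polls a hostname and reports when its resolved IPs change, which is exactly the situation that leaves nginx's config-time DNS cache pointing at a Service IP that no longer exists. The hostname is the hypothetical one from the nginx example above.

```go
// dnswatch.go: poll a hostname and print whenever its resolved IPs change.
package main

import (
	"fmt"
	"net"
	"sort"
	"strings"
	"time"
)

func lookup(host string) string {
	ips, err := net.LookupHost(host)
	if err != nil {
		return "lookup error: " + err.Error()
	}
	sort.Strings(ips)
	return strings.Join(ips, ",")
}

func main() {
	const host = "dynamic.example.com" // hypothetical hostname from the nginx example
	last := lookup(host)
	fmt.Println("initial:", last)
	for range time.Tick(10 * time.Second) {
		if cur := lookup(host); cur != last {
			fmt.Printf("DNS changed: %s -> %s\n", last, cur)
			last = cur
		}
	}
}
```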

May 19, 2022 · datewu

kubeconfig in detail

Today, when I went to manage one of our Kubernetes clusters, I found that port 22 on the master host had been disabled for administrative reasons and I could no longer log in to the server. The ops team told me that, for security reasons, the company had decided to disable root SSH login on all servers.

Until then I would simply SSH into the master node and run kubectl on the server to inspect/deploy/debug deployments, services and other resources. Now I had to edit my local kubeconfig file and manage the cluster with my local kubectl instead.

After working this way for a while, the local kubectl experience turned out to be quite pleasant, especially because the server lacked a proper editor (vim) with syntax highlighting for `kubectl edit ...`.

Setting up kubeconfig boils down to two steps: add a context, then use the context.

```sh
vim .kube/config
kubectl config use-context dev-8-admin@kubernetes
```

Besides editing the .kube/config file with vim, simple setups can also be handled quickly with the kubectl config commands:

```sh
## create new cluster
kubectl config set-cluster NAME [--server=server] [--certificate-authority=path/to/certificate/authority] [--insecure-skip-tls-verify=true]

## create new user
kubectl config set-credentials NAME [--client-certificate=path/to/certfile] [--client-key=path/to/keyfile] [--token=bearer_token] [--username=basic_user] [--password=basic_password] [--auth-provider=provider_name] [--auth-provider-arg=key=value] [options]

## create new context
kubectl config set-context [NAME | --current] [--cluster=cluster_nickname] [--user=user_nickname] [--namespace=namespace] [options]

## use context
kubectl config use-context CONTEXT_NAME [options]
```

Note that kubectl config set cannot set the certificate-authority-data field directly; it only accepts a path to the data file, so editing the kubeconfig file with vim is still the recommended approach. ...
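For completeness, here is a small client-go sketch (my addition, not part of the post) showing what `use-context` amounts to programmatically: the kubeconfig is loaded from the default locations and the named context picks the cluster/user pair. The context name is the one from the example above; the pod listing is only there to prove the connection works.

```go
// kubeconfig_context.go: load ~/.kube/config, pin a context, list pods.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Honors ~/.kube/config and $KUBECONFIG, like kubectl does.
	rules := clientcmd.NewDefaultClientConfigLoadingRules()
	overrides := &clientcmd.ConfigOverrides{CurrentContext: "dev-8-admin@kubernetes"}
	cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(rules, overrides).ClientConfig()
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	pods, err := cs.CoreV1().Pods("kube-system").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("kube-system has %d pods\n", len(pods.Items))
}
```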

August 11, 2019 · datewu

Pod Lifecycle Event Generator

PLEG

If you are not familiar with PLEG (Pod Lifecycle Event Generator), start with the article What is PLEG?, which gives a detailed introduction to what PLEG is and to the common "unhealthy" problems.

cni

When the cluster's CNI plugin performs poorly and a node runs a large number of pods (more than 80), we often hit PLEG errors:

PLEG is not healthy: pleg was last seen active 6m55.488150776s ago; threshold is 3m0s

While debugging the kuryr CNI, I found that when the OpenStack Neutron service was under heavy load, the latency of allocating and releasing ports on the CNI side grew accordingly, stale netns piled up on the VM, and kubelet's "PLEG not healthy" then led to docker hanging.

docker

Restarting docker and kubelet temporarily clears the PLEG unhealthy condition:

```sh
systemctl restart docker
systemctl restart kubelet
# do NOT use `docker rm -vf`,
# which will kill running containers
docker rm -v `docker ps -qa`
```

It is also recommended to adjust the kubelet startup flag --housekeeping-interval=30s ...

February 11, 2019 · datewu

The difference between Pod lifecycle and liveness

Today a tester reported a pod problem to me. The test case they described was:

1. configure an nginx web service;
2. in the lifecycle section, choose the HTTP protocol, port 80, path /errorpath;
3. the pod in the service starts normally;
4. expected: the pod's events should contain a "FailedPostStartHook" error.

The tester found that expectation 4 was not met: the pod started normally, yet no FailedPostStartHook event appeared. After a quick look I realized the tester had mixed up the pod's lifecycle hooks with the pod's liveness/readiness probes, and had therefore written a wrong test case.

Lifecycle handlers

First, a recap of what the pod lifecycle hooks do:

```sh
kubectl explain pod.spec.containers.lifecycle.postStart

RESOURCE: postStart <Object>

DESCRIPTION:
     PostStart is called immediately after a container is created. If the
     handler fails, the container is terminated and restarted according to its
     restart policy. Other management of the container blocks until the hook
     completes. More info:
     https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks/

     Handler defines a specific action that should be taken
```

Roughly translated: Kubernetes provides lifecycle hooks around container start and exit so that developers can hook into ...
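To make the distinction concrete, here is a short Go sketch (my addition, not from the post) that puts a postStart hook and a liveness probe side by side on one container using the k8s.io/api/core/v1 types. Field names follow recent releases of that module (older releases use Handler where LifecycleHandler/ProbeHandler appear below), so treat it as an illustration rather than a drop-in snippet.

```go
// hook_vs_probe.go: the same /errorpath idea expressed as a lifecycle hook
// versus a liveness probe, to show why they produce different events.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	c := corev1.Container{
		Name:  "nginx",
		Image: "nginx",
		// Lifecycle hook: runs ONCE right after the container is created.
		// If it fails you get a FailedPostStartHook event and a restart.
		Lifecycle: &corev1.Lifecycle{
			PostStart: &corev1.LifecycleHandler{
				HTTPGet: &corev1.HTTPGetAction{Path: "/errorpath", Port: intstr.FromInt(80)},
			},
		},
		// Liveness probe: runs PERIODICALLY for the whole life of the container;
		// failures show up as probe failures, not as FailedPostStartHook events.
		LivenessProbe: &corev1.Probe{
			ProbeHandler: corev1.ProbeHandler{
				HTTPGet: &corev1.HTTPGetAction{Path: "/", Port: intstr.FromInt(80)},
			},
		},
	}
	fmt.Printf("postStart path: %s, liveness path: %s\n",
		c.Lifecycle.PostStart.HTTPGet.Path, c.LivenessProbe.HTTPGet.Path)
}
```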

September 18, 2018 · datewu

VPC mode

I previously wrote a post on adapting flannel to the Tencent Cloud backend, which recorded the implementation of the flannel vpc backend from a code point of view. This post is a supplement: a bird's-eye view of how network packets flow under the flannel vpc backend. Overall, vpc mode is very similar to host-gw mode, so understanding the host-gateway mode helps a lot in understanding vpc mode.

host gw

The host gateway mode:

host-gw adds route table entries on hosts, so that hosts know how to route container network packets.

This works on L2, because it only concerns hosts, switches and containers. Switches do not care about IPs and routes, hosts know the containers exist and how to reach them, and containers just send and receive data. ...
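As an illustration of what host-gw programs on each node (my sketch, not flannel's actual code), the following uses the github.com/vishvananda/netlink package to install the kind of route host-gw maintains: "node B's pod subnet is reachable via node B's host IP on the local L2 segment". The subnet and host IP are hypothetical.

```go
// hostgw_route.go: add one host-gw style route on the local node (needs root, Linux).
package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	// Hypothetical values: node B owns pod subnet 10.244.2.0/24 and has host IP 192.168.1.12.
	_, podSubnet, err := net.ParseCIDR("10.244.2.0/24")
	if err != nil {
		log.Fatal(err)
	}
	route := &netlink.Route{
		Dst: podSubnet,
		Gw:  net.ParseIP("192.168.1.12"),
	}
	// Equivalent to: ip route add 10.244.2.0/24 via 192.168.1.12
	if err := netlink.RouteAdd(route); err != nil {
		log.Fatalf("route add: %v", err)
	}
	log.Println("route installed:", route)
}
```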

September 11, 2018 · datewu

Collecting container syslog

An app runs inside a pod and, by default, writes its logs to syslogd. How can the syslogd log collector running on the host collect the logs this app produces?

/dev/log

Answer: mount the host's /dev/log (a unix socket) into the pod at /dev/log.

Some of these messages need to be brought to a system administrator's attention immediately. And it may not be just any system administrator – there may be a particular system administrator who deals with a particular kind of message. Other messages just need to be recorded for future reference if there is a problem. Still others may need to have information extracted from them by an automated process that generates monthly reports. ...
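For reference, here is a minimal Go sketch (my addition) of an app logging to the local syslogd. On Linux, the standard log/syslog package connects to the /dev/log unix socket, so once the host's /dev/log is mounted into the pod, these messages end up in the host's syslog collector. The "myapp" tag is a hypothetical name.

```go
// syslog_write.go: write one message to the local syslog daemon via /dev/log.
package main

import (
	"log"
	"log/syslog"
)

func main() {
	w, err := syslog.New(syslog.LOG_INFO|syslog.LOG_DAEMON, "myapp") // "myapp" is a hypothetical tag
	if err != nil {
		log.Fatalf("cannot reach syslogd: %v", err)
	}
	defer w.Close()

	if err := w.Info("hello from inside the pod"); err != nil {
		log.Fatalf("write: %v", err)
	}
}
```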

September 3, 2018 · datewu

Scheduling onto the master node

Generally, Kubernetes does not run ordinary pods on the master node. If a pod must be scheduled onto the master node, you can adjust the pod's toleration and affinity.

toleration and affinity: add the toleration and affinity configuration to the pod spec.

```yaml
spec:
  tolerations:
  - key: "node-role.kubernetes.io/master"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/master
            operator: Exists
```

```go
+				Operator: apiv1.TolerationOpExists,
+				Effect:   apiv1.TaintEffectNoSchedule,
+			},
+		},
+		Affinity: &apiv1.Affinity{
+			NodeAffinity: &apiv1.NodeAffinity{
+				RequiredDuringSchedulingIgnoredDuringExecution: &apiv1.NodeSelector{
+					NodeSelectorTerms: []apiv1.NodeSelectorTerm{
+						apiv1.NodeSelectorTerm{
+							MatchExpressions: []apiv1.NodeSelectorRequirement{
+								apiv1.NodeSelectorRequirement{
+									Key:      "node-role.kubernetes.io/master",
+									Operator: apiv1.NodeSelectorOpExists,
+								},
+							},
+						},
+					},
+				},
```
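Since the diff above starts and ends mid-struct, here is a self-contained sketch (my addition, not the post's original code) of the same toleration plus nodeAffinity pair as a complete PodSpec fragment, with k8s.io/api/core/v1 imported as apiv1:

```go
// master_toleration.go: a full PodSpec carrying the master toleration and affinity.
package main

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
)

func main() {
	spec := apiv1.PodSpec{
		// Tolerate the master taint so the scheduler is allowed to place the pod there.
		Tolerations: []apiv1.Toleration{
			{
				Key:      "node-role.kubernetes.io/master",
				Operator: apiv1.TolerationOpExists,
				Effect:   apiv1.TaintEffectNoSchedule,
			},
		},
		// Require a node that actually carries the master role label.
		Affinity: &apiv1.Affinity{
			NodeAffinity: &apiv1.NodeAffinity{
				RequiredDuringSchedulingIgnoredDuringExecution: &apiv1.NodeSelector{
					NodeSelectorTerms: []apiv1.NodeSelectorTerm{
						{
							MatchExpressions: []apiv1.NodeSelectorRequirement{
								{
									Key:      "node-role.kubernetes.io/master",
									Operator: apiv1.NodeSelectorOpExists,
								},
							},
						},
					},
				},
			},
		},
	}
	fmt.Println("tolerations:", spec.Tolerations)
}
```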

August 18, 2018 · datewu

Replacing all k8s certificates

A customer wanted the validity of all Kubernetes certificates (apiserver/etcd/kubelet/kubectl and so on) extended to 100 years. Obviously an unreasonable requirement, but the customer gets what the customer asks for. After a few days of fiddling I ended up with the Makefile below, which batch-generates all of the certificates (the FILES variable). If you are not familiar with Makefile syntax, see the Makefile intro post.

```makefile
FILES = ca.crt ca.key sa.key sa.pub front-proxy-ca.crt front-proxy-ca.key etcd_ca.crt etcd_ca.key
CONFS = admin.conf controller-manager.conf kubelet.conf scheduler.conf
SELFS = kubelet.crt.self kubelet.crt.key
#KEYs = ca.key front-proxy-ca.key etcd_ca.key sa.key
#CAs = ca.crt front-proxy-ca.crt etcd_ca.crt
#PUBs = sa.pub

## kubernetes will sign certificate
## automatically, so below
## csr/cert is for test purpose
#CSR = apiserver.csr apiserver-kubelet-client.csr
CERT_KEYS = apiserver.key apiserver-kubelet-client.key front-proxy-client.key
CERTS = apiserver.cert apiserver-kubelet-client.cert front-proxy-client.cert

# openssl genrsa -des3 -out rootCA.key 4096
CMD_CREATE_PRIVATE_KEY = openssl genrsa -out $@ 2048
CMD_CREATE_PUBLIC_KEY = openssl rsa -in $< -pubout -out $@
# openssl req -x509 -new -nodes -key rootCA.key -sha256 -days 1024 -out rootCA.crt
CMD_CREATE_CA = openssl req -x509 -new -nodes -key $< -sha256 -days 36500 -out $@ -subj '/CN=kubernetes'
# openssl req -new -key mydomain.com.key -out mydomain.com.csr
CMD_CREATE_CSR = openssl req -new -key $< -out $@ -config $(word 2,$^)
# openssl x509 -req -in mydomain.com.csr -CA rootCA.crt -CAkey rootCA.key -CAcreateserial -out mydomain.com.crt -days 500 -sha256
CMD_SIGN_CERT = openssl x509 -req -in $< -CA $(word 2,$^) -CAkey $(word 3,$^) -CAcreateserial -out $@ -days 36500 -sha256 -extfile $(word 4,$^) -extensions my_extensions
# generate self-signed certificate
CMD_CREATE_CERT = openssl req -x509 -new -nodes -key $< -sha256 -days 36500 -out $@ -subj '/CN=nodeXXX@timestamp1531732165'
CMD_MSG = @echo generating $@ ...

MASTER_IP := 192.168.1.200 ## REMEMBER CHANGE ME

.PHONY: all clean check self_sign rename

all: ${FILES} ${CONFS} ${CERT_KEYS} ${CERTS}

clean:
	-rm ${FILES} ${CONFS} ${CERT_KEYS} ${CERTS}

self_sign: ${SELFS}

check:
	for f in *.cert *.crt; do echo $$f; openssl x509 -noout -dates -in $$f; echo '==='; done

rename:
	for f in *.cert; do echo $$f; mv $$f $${f%.*}.crt; echo '====='; done

%.key:
	${CMD_MSG}
	${CMD_CREATE_PRIVATE_KEY}

%.pub: %.key
	${CMD_MSG}
	${CMD_CREATE_PUBLIC_KEY}

%.self: %.key
	${CMD_MSG}
	${CMD_CREATE_CERT}

%.crt: %.key
	${CMD_MSG}
	${CMD_CREATE_CA}

%.csr: %.key %.csr.cnf
	${CMD_MSG}
	${CMD_CREATE_CSR}

%.cert: %.csr ca.crt ca.key %.csr.cnf
#%.cert: %.csr front-proxy-ca.crt front-proxy-ca.key %.csr.cnf
	${CMD_MSG}
	${CMD_SIGN_CERT}

%.conf: %.cert %-conf.sh
	sh $(word 2,$^) ${MASTER_IP}
```

The Makefile also needs the corresponding csr and conf files. ...
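As a companion to the Makefile's `check` target, here is a small Go sketch (my addition) that does the same thing with the standard library instead of openssl: it prints the NotBefore/NotAfter dates of every *.crt / *.cert file in the current directory, which is a quick way to confirm the roughly 100-year validity.

```go
// certcheck.go: print the validity window of every certificate in the current directory.
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	for _, pattern := range []string{"*.crt", "*.cert"} {
		files, _ := filepath.Glob(pattern)
		for _, f := range files {
			data, err := os.ReadFile(f)
			if err != nil {
				fmt.Println(f, "read error:", err)
				continue
			}
			block, _ := pem.Decode(data)
			if block == nil {
				fmt.Println(f, "no PEM block found")
				continue
			}
			cert, err := x509.ParseCertificate(block.Bytes)
			if err != nil {
				fmt.Println(f, "parse error:", err)
				continue
			}
			fmt.Printf("%s: notBefore=%s notAfter=%s\n", f, cert.NotBefore, cert.NotAfter)
		}
	}
}
```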

August 10, 2018 · datewu

flannel vpc

update: since v0.14.0 (2021/05/27), flannel ships with a Tencent Cloud vpc backend.

A customer needed a Kubernetes cluster deployed on Tencent Cloud with flannel as the network plugin, so we had to add a Tencent Cloud vpc backend to flannel. I skimmed the Alibaba Cloud and AWS adapters on GitHub and found that it is not complicated: flannel has already wrapped all the dirty work behind its API. With a little knowledge of network devices or of Linux networking commands (route tables, for instance), you can write a flannel adapter fairly easily.

The whole adaptation breaks down into four steps:

1. define a TxVpcBackend struct, implement a New func, and register it in an init func;
2. implement the RegisterNetwork method by calling the Tencent Cloud SDK;
3. register the Tencent Cloud backend in main.go;
4. choose the tx-vpc backend when deploying.

Below is the implementation in more detail, with part of the code.

Development

Define the struct

This is just scaffolding so the backend can be registered with flannel; it contains none of the adapter-specific logic:

```go
type TxVpcBackend struct {
	sm       subnet.Manager
	extIface *backend.ExternalInterface
}

func New(sm subnet.Manager, extIface *backend.ExternalInterface) (backend.Backend, error) {
	be := TxVpcBackend{
		sm:       sm,
		extIface: extIface,
	}
	return &be, nil
}

func init() {
	backend.Register("tx-vpc", New)
}
```

Implement RegisterNetwork

```go
func (be *TxVpcBackend) RegisterNetwork(ctx context.Context, config *subnet.Config) (backend.Network, error) {
	// 1. Parse our configuration
	cfg := struct {
		AccessKeyID     string
		AccessKeySecret string
	}{}
	if len(config.Backend) > 0 {
		if err := json.Unmarshal(config.Backend, &cfg); err != nil {
			return nil, fmt.Errorf("error decoding VPC backend config: %v", err)
		}
	}
	log.Infof("Unmarshal Configure : %v\n", cfg)

	// 2. Acquire the lease from the subnet manager
	attrs := subnet.LeaseAttrs{
		PublicIP: ip.FromIP(be.extIface.ExtAddr),
	}
	l, err := be.sm.AcquireLease(ctx, &attrs)
	switch err {
	case nil:
	case context.Canceled, context.DeadlineExceeded:
		return nil, err
	default:
		return nil, fmt.Errorf("failed to acquire lease: %v", err)
	}

	if cfg.AccessKeyID == "" || cfg.AccessKeySecret == "" {
		cfg.AccessKeyID = os.Getenv("ACCESS_KEY_ID")
		cfg.AccessKeySecret = os.Getenv("ACCESS_KEY_SECRET")
		if cfg.AccessKeyID == "" || cfg.AccessKeySecret == "" {
			return nil, fmt.Errorf("ACCESS_KEY_ID and ACCESS_KEY_SECRET must be provided! ")
		}
	}

	err = createRoute(l.Subnet.String(), cfg.AccessKeyID, cfg.AccessKeySecret)
	if err != nil {
		log.Errorf("Error DescribeVRouters: %s .\n", err.Error())
	}

	return &backend.SimpleNetwork{
		SubnetLease: l,
		ExtIface:    be.extIface,
	}, nil
}
```

The main logic is to use the Tencent Cloud SDK to create a route in the vpc network, that is, the ...

August 8, 2018 · datewu