Notes on a CoreDNS resolution failure
The symptom: CoreDNS fails to connect to the apiserver:
```
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.68.0.1:443/version": dial tcp 10.68.0.1:443: i/o timeout
```
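These lines come from the CoreDNS pod's log; something along these lines pulls them up (the k8s-app=kube-dns label is the usual CoreDNS label and an assumption here):
```
# tail the CoreDNS pod logs in kube-system
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50
# and check where the pods are scheduled / whether they are Ready
kubectl -n kube-system get pods -o wide | grep coredns
```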
Troubleshooting
Check the status of the Kubernetes apiserver Service:
```
kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.68.0.1    <none>        443/TCP   172m
```
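An extra check that pairs with this (not in the original notes) is whether the kubernetes Service actually has an endpoint pointing at the apiserver:
```
# the ENDPOINTS column should list the apiserver's real address (e.g. <node-ip>:6443)
kubectl get endpoints kubernetes
```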
Then, from the host, try to reach the apiserver address:
```
root@industai-sxyq:/opt# curl 10.68.0.1:443
404 page not found
root@industai-sxyq:/opt# ping 10.68.0.1
PING 10.68.0.1 (10.68.0.1) 56(84) bytes of data.
64 bytes from 10.68.0.1: icmp_seq=1 ttl=64 time=0.107 ms
64 bytes from 10.68.0.1: icmp_seq=2 ttl=64 time=0.085 ms
```
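Note that curl 10.68.0.1:443 speaks plain HTTP to a TLS port; the request CoreDNS was actually failing on can be reproduced more faithfully with HTTPS (-k skips certificate verification):
```
# same path CoreDNS was requesting in the warning above
curl -k https://10.68.0.1:443/version
```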
I suspected an iptables forwarding problem and went to look at the rules, but the iptables command did not even exist on the machine, which was strange.
```
(base) root@industai-sxyq:/opt# ipt
iptables-apply    iptables-restore  iptables-save     iptables-xml
(base) root@industai-sxyq:/opt# ipt
```
Trying apt install iptables only reported that the package was already installed, so the only way was apt-get install --reinstall iptables to reinstall it.
```
(base) root@industai-sxyq:/opt# apt install iptables
Reading package lists... Done
Building dependency tree
Reading state information... Done
iptables is already the newest version (1.6.1-2ubuntu2.1).
The following packages were automatically installed and are no longer required:
  ieee-data python-certifi python-chardet python-jmespath python-kerberos python-libcloud python-lockfile python-netaddr python-openssl python-requests
  python-selinux python-simplejson python-urllib3 python-xmltodict
Use 'apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 10 not upgraded.
```
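Since apt install is a no-op for an already-installed package, the reinstall has to go through apt-get, roughly:
```
apt-get install --reinstall iptables
# confirm the binary is back and note its version
which iptables && iptables -V
```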
After the reinstall the iptables command was back, so the problem seemed to be right there. I flushed the iptables rules with iptables -F and restarted k3s; on restart, k3s rebuilt its iptables rules and CoreDNS started normally.
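For reference, the flush-and-restart step was essentially the following; the -t nat line is my addition (the post only mentions iptables -F, which flushes just the default filter table), and it assumes k3s runs as a systemd unit:
```
iptables -F            # flush the filter table (what was actually run)
iptables -t nat -F     # the nat table is where kube-proxy's Service rules live
systemctl restart k3s  # k3s rebuilds its iptables rules on startup
```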
A new problem
I thought that was the end of it, but it turned out other pods could not reach CoreDNS for DNS resolution at all.
I checked flannel's state: both the flannel and cni0 interfaces were present.
Capturing packets on the cni0 interface with tcpdump showed:
```
11:44:20.643974 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275158535 ecr 0,nop,wscale 7], length 0
11:44:21.665944 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275159557 ecr 0,nop,wscale 7], length 0
11:44:23.681946 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275161573 ecr 0,nop,wscale 7], length 0
11:44:25.825941 ARP, Request who-has industai-sxyq tell 10.58.0.172, length 28
11:44:27.873945 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275165765 ecr 0,nop,wscale 7], length 0
11:44:36.065944 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275173957 ecr 0,nop,wscale 7], length 0
11:44:52.193949 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275190085 ecr 0,nop,wscale 7], length 0
11:44:57.313946 ARP, Request who-has industai-sxyq tell 10.58.0.172, length 28
```
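For reference, a capture like the one above can be produced with something along these lines (the exact flags and filter are assumptions inferred from the output):
```
# watch DNS traffic and ARP on the flannel bridge
tcpdump -i cni0 'port 53 or arp'
```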
Checking k3s's network configuration showed kube-proxy running in ipvs mode:
```
ExecStart=/usr/local/bin/k3s \
server \
'--disable=traefik' \
'--disable=coredns' \
'--data-dir=/data/rancher/k3s' \
'--service-cidr=10.68.0.0/16' \
'--cluster-cidr=10.58.0.0/16' \
'--cluster-dns=10.68.0.2' \
'--cluster-domain=cluster.local.' \
'--flannel-backend=vxlan' \
'--kubelet-arg=topology-manager-policy=single-numa-node' \
'--kubelet-arg=cpu-manager-policy=static' \
'--kubelet-arg=kube-reserved=cpu=1' \
'--kube-apiserver-arg=service-node-port-range=20000-40000' \
'--kube-apiserver-arg=authorization-mode=Node,RBAC' \
'--kube-apiserver-arg=allow-privileged=true' \
'--kube-proxy-arg=proxy-mode=ipvs' \
'--kube-proxy-arg=masquerade-all=true' \
'--kube-proxy-arg=metrics-bind-address=0.0.0.0' \
'--kube-scheduler-arg=config=/etc/rancher/k3s/scheduler-policy-config.yaml'
```
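With kube-proxy in ipvs mode, the virtual-server table it programs can be inspected with ipvsadm (assuming the ipvsadm package is installed; this is an extra check, not part of the original log):
```
ipvsadm -Ln                  # all IPVS virtual services and their real servers
ipvsadm -Ln -u 10.68.0.2:53  # just the cluster DNS UDP service
```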
Searching online I found a post (`https://ask.kubesphere.io/forum/d/4699-k8s-pod-ping-kubesphere-devops/52`) whose symptoms looked very much like my capture: the SYN packets go out, but nothing ever comes back.
Suspecting something wrong with the host's IPv4 forwarding, I rewrote `/etc/sysctl.conf` with the settings below and re-applied it, but nothing changed: CoreDNS was still unreachable.
```
vm.swappiness=0
net.core.rmem_default=256960
net.core.rmem_max=16777216
net.core.wmem_default=256960
net.core.wmem_max=16777216
net.core.netdev_max_backlog=2000
net.core.somaxconn=65535
net.core.optmem_max=81920
net.ipv4.tcp_mem=8388608 12582912 16777216
net.ipv4.tcp_rmem=8192 87380 16777216
net.ipv4.tcp_wmem=8192 65536 16777216
net.ipv4.tcp_keepalive_time=180
net.ipv4.tcp_keepalive_intvl=30
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_sack=1
net.ipv4.tcp_fack=1
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_fin_timeout=10
net.ipv4.tcp_max_syn_backlog=100000
fs.file-max=1100000
fs.nr_open=1100000
fs.inotify.max_user_watches=524288
kernel.pid_max=655350
```
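Re-applying the file and spot-checking the forwarding-related knobs the suspicion was about looks roughly like this; the bridge-nf check is my addition and requires the br_netfilter module to be loaded:
```
sysctl -p /etc/sysctl.conf                 # re-apply the settings above
sysctl net.ipv4.ip_forward                 # must be 1 or pod traffic will not be routed
sysctl net.bridge.bridge-nf-call-iptables  # bridged CNI traffic only hits iptables when this is 1
```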
Later I noticed an even stranger detail: from inside a pod, the only unreachable addresses were the CoreDNS ClusterIP 10.68.0.2, pod IPs, and external IPs; every other Kubernetes Service was reachable. That made very little sense.
Other things I tried include, but are not limited to: rebuilding the routing rules, clearing iptables, switching k3s's kube-proxy from ipvs mode to iptables mode (see the sketch below), and upgrading k3s from 1.23 to 1.25. None of it helped.
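For reference, switching the proxy mode on this setup just means changing the kube-proxy-arg in the k3s unit shown earlier and restarting; a sketch, assuming the unit lives at the default /etc/systemd/system/k3s.service:
```
# ipvs -> iptables (edit the unit or the k3s config accordingly)
sed -i 's/proxy-mode=ipvs/proxy-mode=iptables/' /etc/systemd/system/k3s.service
systemctl daemon-reload
systemctl restart k3s
```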
Still, it was fairly clear the problem sat somewhere between ipvs and iptables; I simply did not know enough to work out the exact cause.
The fix
So I cleaned up the existing k3s deployment, rebooted the machine, and reinstalled k3s 1.25. The Ansible script used for the install makes the following changes to the system configuration:
```
- name: Disable daily security update
  remote_user: root
  become: yes
  shell: "{{ item }}"
  with_items:
    - "systemctl kill --kill-who=all apt-daily.service"
    - "systemctl stop apt-daily.timer"
    - "systemctl stop apt-daily-upgrade.timer"
    - "systemctl stop apt-daily.service"
    - "systemctl stop apt-daily-upgrade.service"
    - "systemctl disable apt-daily.timer"
    - "systemctl disable apt-daily-upgrade.timer"
    - "systemctl disable apt-daily.service"
    - "systemctl disable apt-daily-upgrade.service"
  when: ansible_os_family == "Debian"
  ignore_errors: True
  tags: init_env

#- name: create user
#  user: name={{ item }} shell=/bin/bash createhome=yes
#  with_items:
#    - "{{ username }}"
#    - "readonly"
#  tags: create_user

- name: Set limits
  pam_limits:
    dest: "{{ item.dest }}"
    domain: '*'
    limit_type: "{{ item.limit_type }}"
    limit_item: "{{ item.limit_item }}"
    value: "{{ item.value }}"
  with_items:
    - { dest: '/etc/security/limits.conf', limit_type: 'soft', limit_item: 'nofile', value: '655350' }
    - { dest: '/etc/security/limits.conf', limit_type: 'hard', limit_item: 'nofile', value: '655350' }
    #- { dest: '/etc/security/limits.conf', limit_type: 'soft', limit_item: 'nproc', value: '102400' }
    #- { dest: '/etc/security/limits.conf', limit_type: 'hard', limit_item: 'nproc', value: '102400' }
    - { dest: '/etc/security/limits.conf', limit_type: 'soft', limit_item: 'sigpending', value: '255377' }
    - { dest: '/etc/security/limits.conf', limit_type: 'hard', limit_item: 'sigpending', value: '255377' }
    - { dest: '/etc/security/limits.d/90-nproc.conf', limit_type: 'soft', limit_item: 'nproc', value: '262144' }
    - { dest: '/etc/security/limits.d/90-nproc.conf', limit_type: 'hard', limit_item: 'nproc', value: '262144' }
  tags: init_env

- sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    state: present
    reload: yes
  with_items:
    - { name: 'vm.swappiness', value: '0' }
    - { name: 'net.core.rmem_default', value: '256960' }
    - { name: 'net.core.rmem_max', value: '16777216' }
    - { name: 'net.core.wmem_default', value: '256960' }
    - { name: 'net.core.wmem_max', value: '16777216' }
    - { name: 'net.core.netdev_max_backlog', value: '2000' }
    - { name: 'net.core.somaxconn', value: '65535' }
    - { name: 'net.core.optmem_max', value: '81920' }
    - { name: 'net.ipv4.tcp_mem', value: '8388608 12582912 16777216' }
    - { name: 'net.ipv4.tcp_rmem', value: '8192 87380 16777216' }
    - { name: 'net.ipv4.tcp_wmem', value: '8192 65536 16777216' }
    - { name: 'net.ipv4.tcp_keepalive_time', value: '180' }
    - { name: 'net.ipv4.tcp_keepalive_intvl', value: '30' }
    - { name: 'net.ipv4.tcp_keepalive_probes', value: '3' }
    - { name: 'net.ipv4.tcp_sack', value: '1' }
    - { name: 'net.ipv4.tcp_fack', value: '1' }
    - { name: 'net.ipv4.tcp_window_scaling', value: '1' }
    - { name: 'net.ipv4.tcp_syncookies', value: '1' }
    - { name: 'net.ipv4.tcp_tw_reuse', value: '1' }
    - { name: 'net.ipv4.tcp_tw_recycle', value: '0' }
    - { name: 'net.ipv4.tcp_fin_timeout', value: '10' }
    #- { name: 'net.ipv4.ip_local_port_range', value: '1024 65000' }
    - { name: 'net.ipv4.tcp_max_syn_backlog', value: '100000' }
    - { name: 'fs.file-max', value: '1100000' }
    - { name: 'fs.nr_open', value: '1100000' }
    - { name: 'fs.inotify.max_user_watches', value: '524288' }
    - { name: 'kernel.pid_max', value: '655350' }
  ignore_errors: True
  tags: init_env
```
After the install finished, CoreDNS still refused to start. The k3s service log (journald entries for the k3s unit) showed:
```
11月 08 10:07:06 industai-sxyq k3s[1867]: E1108 10:07:06.376295 1867 proxier.go:1562] "Failed to execute iptables-restore" err=<
11月 08 10:07:06 industai-sxyq k3s[1867]: exit status 1: iptables-restore: invalid option -- 'w'
11月 08 10:07:06 industai-sxyq k3s[1867]: iptables-restore: invalid option -- 'W'
11月 08 10:07:06 industai-sxyq k3s[1867]: Unknown arguments found on commandline
11月 08 10:07:06 industai-sxyq k3s[1867]: > rules=<
11月 08 10:07:06 industai-sxyq k3s[1867]: *nat
11月 08 10:07:06 industai-sxyq k3s[1867]: :KUBE-SERVICES - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]: :KUBE-POSTROUTING - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]: :KUBE-NODE-PORT - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]: :KUBE-LOAD-BALANCER - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]: :KUBE-MARK-MASQ - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-POSTROUTING -m comment --comment "Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose" -m set --match-se
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-NODE-PORT -p tcp -m comment --comment "Kubernetes nodeport TCP port for masquerade purpose" -m set --match-set KUBE-NODE-POR
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-SERVICES -m comment --comment "Kubernetes service cluster ip + port for masquerade purpose" -m set --match-set KUBE-CLUSTER-
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-SERVICES -m addrtype --dst-type LOCAL -j KUBE-NODE-PORT
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-LOAD-BALANCER -j KUBE-MARK-MASQ
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-SERVICES -m set --match-set KUBE-CLUSTER-IP dst,dst -j ACCEPT
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-POSTROUTING -m mark ! --mark 0x00004000/0x00004000 -j RETURN
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-POSTROUTING -j MARK --xor-mark 0x00004000
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-MARK-MASQ -j MARK --or-mark 0x00004000
11月 08 10:07:06 industai-sxyq k3s[1867]: COMMIT
11月 08 10:07:06 industai-sxyq k3s[1867]: *filter
11月 08 10:07:06 industai-sxyq k3s[1867]: :KUBE-FORWARD - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]: :KUBE-NODE-PORT - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]: :KUBE-PROXY-FIREWALL - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]: :KUBE-SOURCE-RANGES-FIREWALL - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-SOURCE-RANGES-FIREWALL -j DROP
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x00004000/0x00004000 -j ACCEPT
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
11月 08 10:07:06 industai-sxyq k3s[1867]: -A KUBE-NODE-PORT -m comment --comment "Kubernetes health check node port" -m set --match-set KUBE-HEALTH-CHECK-NODE-PORT dst -j ACC
11月 08 10:07:06 industai-sxyq k3s[1867]: COMMIT
11月 08 10:07:06 industai-sxyq k3s[1867]: >
```
So iptables-restore did not recognize the -w option, even though kube-proxy was supposed to be running in ipvs mode. And the iptables command had gone missing again.
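A quick, hedged check at this point is which iptables-restore binary kube-proxy ends up calling and how old it is, since as far as I can tell -w/--wait support in iptables-restore depends on the iptables version:
```
which iptables-restore   # which binary is on PATH (k3s also ships a bundled copy)
iptables --version       # the apt output earlier showed v1.6.1 on this host
```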
Meanwhile the kernel log showed:
```
kernel: [569918.603973] IPVS: rr: UDP 10.68.0.2:53 - no destination available
```
Put together with the iptables-restore error above, this again pointed at iptables, so once more: apt-get install --reinstall iptables.
After reinstalling iptables and restarting k3s, CoreDNS started normally. DNS resolution from inside a pod worked again, and pod IPs were reachable. Neither the k3s log nor the kernel log showed any errors.
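Verification from inside the cluster can be done with a throwaway pod, something like this (the image and query name are only examples):
```
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- \
  nslookup kubernetes.default
```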
The developers confirmed their services were back to normal.
Remaining questions
The problem is solved, but a few things still do not add up for me; I clearly have more to learn.
- 1. Why did the iptables command disappear on this system after a reboot?
- 2. What made the CoreDNS Service and pod IPs unreachable?
- 3. Why does ipvs mode still invoke iptables commands, and what exactly is the difference between ipvs and iptables?