Debugging a CoreDNS resolution failure

Symptom: CoreDNS fails to connect to the apiserver:

[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.68.0.1:443/version": dial tcp 10.68.0.1:443: i/o timeout
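
These lines come from the CoreDNS pod logs; for reference, a sketch of how to pull them, assuming the standard `k8s-app=kube-dns` label that the CoreDNS manifest usually carries:

```
# hypothetical checks, not from the original session
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide   # pod status and node placement
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50     # recent CoreDNS log lines
```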

Troubleshooting

Check the status of the Kubernetes apiserver Service:
kubectl get svc

NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.68.0.1    <none>        443/TCP   172m
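
The ClusterIP exists; a hypothetical follow-up (not something run in the original session) is to confirm the Service actually has the apiserver endpoint behind it:

```
kubectl get endpoints kubernetes   # should list the apiserver's node IP and port
```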

Try reaching the apiserver address from the host:

root@industai-sxyq:/opt#curl  10.68.0.1:443
404 page not found

root@industai-sxyq:/opt#ping  10.68.0.1
PING 10.68.0.1 (10.68.0.1) 56(84) bytes of data.
64 bytes from 10.68.0.1: icmp_seq=1 ttl=64 time=0.107 ms
64 bytes from 10.68.0.1: icmp_seq=2 ttl=64 time=0.085 ms
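
The plain-HTTP curl above mainly proves that TCP to the ClusterIP works from the host; a closer reproduction of what CoreDNS does is to hit the exact URL from the error message over HTTPS (self-signed cert, hence -k). A hypothetical check:

```
curl -k https://10.68.0.1:443/version   # should return the apiserver version JSON
```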

I suspected an iptables forwarding problem and went to look at the rules, but strangely the iptables command did not even exist on the machine (a check of what is actually on disk is sketched after the listing below).

(base) root@industai-sxyq:/opt# ipt
iptables-apply    iptables-restore  iptables-save     iptables-xml   
(base) root@industai-sxyq:/opt# ipt
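
A sketch of how one might compare what the package should provide with what is actually on disk (these exact commands are an assumption, not from the original session):

```
dpkg -L iptables | grep sbin                      # binaries the iptables package ships
ls -l /sbin/iptables* /usr/sbin/iptables* 2>/dev/null
iptables --version || echo "iptables binary not found"
```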

I tried to install iptables with apt, but apt reported that it was already installed, so the only option was to reinstall it with `apt-get install --reinstall iptables`.

(base) root@industai-sxyq:/opt# apt install iptables
Reading package lists... Done
Building dependency tree
Reading state information... Done
iptables is already the newest version (1.6.1-2ubuntu2.1).
The following packages were automatically installed and are no longer required:
  ieee-data python-certifi python-chardet python-jmespath python-kerberos python-libcloud python-lockfile python-netaddr python-openssl python-requests
  python-selinux python-simplejson python-urllib3 python-xmltodict
Use 'apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 10 not upgraded.

After the reinstall the iptables command was back, so the problem seemed to lie here. I flushed the rules with `iptables -F` and restarted k3s; on restart k3s rebuilt its iptables rules and CoreDNS started normally (roughly the steps sketched below).
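
A sketch of the steps involved; the nat-table flush and chain deletion are my additions for completeness, not necessarily what was run at the time:

```
iptables -F             # flush filter-table rules (what was actually run)
iptables -t nat -F      # assumption: also flush NAT rules left over from kube-proxy
iptables -X             # assumption: delete now-empty user-defined chains
systemctl restart k3s   # kube-proxy rebuilds its rules on startup
```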

A new problem

I thought that was the end of it, but then other pods turned out to be completely unable to reach CoreDNS for DNS resolution.
I checked flannel and found that both the flannel and cni0 interfaces were present (the kind of checks sketched below).
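
A sketch of the interface checks behind that statement (interface names assume flannel's default vxlan backend):

```
ip addr show cni0        # bridge holding the local pod subnet (10.58.x.1/24 here)
ip addr show flannel.1   # VXLAN interface created by flannel
ip route | grep 10.58    # pod-CIDR routes via cni0 / flannel.1
```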

Capturing packets on the cni0 interface with tcpdump showed:

11:44:20.643974 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275158535 ecr 0,nop,wscale 7], length 0
11:44:21.665944 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275159557 ecr 0,nop,wscale 7], length 0
11:44:23.681946 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275161573 ecr 0,nop,wscale 7], length 0
11:44:25.825941 ARP, Request who-has industai-sxyq tell 10.58.0.172, length 28
11:44:27.873945 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275165765 ecr 0,nop,wscale 7], length 0
11:44:36.065944 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275173957 ecr 0,nop,wscale 7], length 0
11:44:52.193949 IP 10.58.0.172.57526 > 10.58.0.192.domain: Flags [S], seq 2885522300, win 64860, options [mss 1410,sackOK,TS val 4275190085 ecr 0,nop,wscale 7], length 0
11:44:57.313946 ARP, Request who-has industai-sxyq tell 10.58.0.172, length 28
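
The capture was taken roughly like this (the exact filter is an assumption):

```
tcpdump -i cni0 'port 53 or arp'   # DNS traffic and ARP on the pod bridge
```

So the pod 10.58.0.172 keeps retransmitting the same SYN to 10.58.0.192 on port 53 (presumably the CoreDNS pod IP after the ClusterIP was translated) and never gets a reply.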

Looking at how k3s was started, kube-proxy was running in ipvs mode:

```
ExecStart=/usr/local/bin/k3s \
    server \
	'--disable=traefik' \
	'--disable=coredns' \
	'--data-dir=/data/rancher/k3s' \
	'--service-cidr=10.68.0.0/16' \
	'--cluster-cidr=10.58.0.0/16' \
	'--cluster-dns=10.68.0.2' \
	'--cluster-domain=cluster.local.' \
	'--flannel-backend=vxlan' \
	'--kubelet-arg=topology-manager-policy=single-numa-node' \
	'--kubelet-arg=cpu-manager-policy=static' \
	'--kubelet-arg=kube-reserved=cpu=1' \
	'--kube-apiserver-arg=service-node-port-range=20000-40000' \
	'--kube-apiserver-arg=authorization-mode=Node,RBAC' \
	'--kube-apiserver-arg=allow-privileged=true' \
	'--kube-proxy-arg=proxy-mode=ipvs' \
	'--kube-proxy-arg=masquerade-all=true' \
	'--kube-proxy-arg=metrics-bind-address=0.0.0.0' \
	 '--kube-scheduler-arg=config=/etc/rancher/k3s/scheduler-policy-config.yaml'
```
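
In ipvs mode the Service-to-backend mapping lives in the kernel IPVS table rather than in iptables DNAT rules, so one plausible check (not run in the original session) is whether the DNS ClusterIP has real servers behind it:

```
ipvsadm -Ln -t 10.68.0.1:443        # apiserver Service and its backend
ipvsadm -Ln | grep -A2 '10.68.0.2'  # cluster DNS Service (TCP and UDP 53)
```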


Searching online I found a thread (`https://ask.kubesphere.io/forum/d/4699-k8s-pod-ping-kubesphere-devops/52`) whose symptoms looked a lot like my capture: packets go out but nothing ever comes back.
Suspecting that IPv4 forwarding on the host was somehow broken, I rewrote `/etc/sysctl.conf` as shown below and re-applied it, but nothing changed: CoreDNS was still unreachable (a quick check of the forwarding parameters, which this file does not actually touch, is sketched after it).
```
vm.swappiness=0
net.core.rmem_default=256960
net.core.rmem_max=16777216
net.core.wmem_default=256960
net.core.wmem_max=16777216
net.core.netdev_max_backlog=2000
net.core.somaxconn=65535
net.core.optmem_max=81920
net.ipv4.tcp_mem=8388608  12582912  16777216
net.ipv4.tcp_rmem=8192  87380  16777216
net.ipv4.tcp_wmem=8192  65536  16777216
net.ipv4.tcp_keepalive_time=180
net.ipv4.tcp_keepalive_intvl=30
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_sack=1
net.ipv4.tcp_fack=1
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_fin_timeout=10
net.ipv4.tcp_max_syn_backlog=100000
fs.file-max=1100000
fs.nr_open=1100000
fs.inotify.max_user_watches=524288
kernel.pid_max=655350
```
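
For reference, the parameters that most often break pod and Service traffic are the forwarding and bridge-netfilter ones, which the file above does not set at all; a hypothetical sanity check:

```
sysctl net.ipv4.ip_forward                  # must be 1 for routed pod traffic
sysctl net.bridge.bridge-nf-call-iptables   # must be 1 (requires br_netfilter loaded)
lsmod | grep br_netfilter
```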

Later I noticed something even stranger: from inside a pod, the only unreachable addresses were the CoreDNS ClusterIP 10.68.0.2, other pod IPs, and external IPs; every other Kubernetes Service was reachable.
Among other things I also tried rebuilding the routing rules, flushing iptables, switching kube-proxy from ipvs to iptables mode, and upgrading k3s from 1.23 to 1.25, none of which helped (one cleanup I did not think of at the time is sketched below).
It did seem clear that the problem lay somewhere between ipvs and iptables, but I simply did not know enough to pin down the exact cause.
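
One thing I did not try at the time, sketched here as an assumption rather than a verified fix, is clearing leftover ipvs and ipset state when flipping proxy-mode, since kube-proxy may leave state from the previous mode behind:

```
ipvsadm --clear                                   # drop all IPVS virtual servers
iptables -F && iptables -t nat -F && iptables -X  # remove rules referencing the KUBE-* ipsets
ipset list -n | grep '^KUBE-' | xargs -r -n1 ipset destroy   # remove kube-proxy's ipsets
systemctl restart k3s                             # rebuild state for the chosen proxy mode
```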

Resolution

So I wiped the existing k3s deployment, rebooted the machine, and reinstalled k3s 1.25. During installation the ansible playbook makes the following changes to the system configuration:

- name: Disable daily security update
  remote_user: root
  become: yes
  shell: "{{ item }}"
  with_items:
    - "systemctl kill --kill-who=all apt-daily.service"
    - "systemctl stop apt-daily.timer"
    - "systemctl stop apt-daily-upgrade.timer"
    - "systemctl stop apt-daily.service"
    - "systemctl stop apt-daily-upgrade.service"
    - "systemctl disable apt-daily.timer"
    - "systemctl disable apt-daily-upgrade.timer"
    - "systemctl disable apt-daily.service"
    - "systemctl disable apt-daily-upgrade.service"
  when: ansible_os_family == "Debian"
  ignore_errors: True
  tags: init_env

#- name: create user
#  user: name={{ item }} shell=/bin/bash createhome=yes
#  with_items:
#    - "{{ username }}"
#    - "readonly"
#  tags: create_user


- name: Set limits
  pam_limits:
      dest: "{{ item.dest }}"
      domain: '*'
      limit_type: "{{ item.limit_type }}"
      limit_item: "{{ item.limit_item }}"
      value: "{{ item.value }}"
  with_items:
      - { dest: '/etc/security/limits.conf',limit_type: 'soft',limit_item: 'nofile', value: '655350' }
      - { dest: '/etc/security/limits.conf',limit_type: 'hard',limit_item: 'nofile', value: '655350'}
      #- { dest: '/etc/security/limits.conf',limit_type: 'soft',limit_item: 'nproc', value: '102400' }
      #- { dest: '/etc/security/limits.conf',limit_type: 'hard',limit_item: 'nproc', value: '102400' }
      - { dest: '/etc/security/limits.conf',limit_type: 'soft',limit_item: 'sigpending', value: '255377' }
      - { dest: '/etc/security/limits.conf',limit_type: 'hard',limit_item: 'sigpending', value: '255377' }
      - { dest: '/etc/security/limits.d/90-nproc.conf', limit_type: 'soft',limit_item: 'nproc', value: '262144' }
      - { dest: '/etc/security/limits.d/90-nproc.conf', limit_type: 'hard',limit_item: 'nproc', value: '262144' }
  tags: init_env

- sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value}}"
    state: present
    reload: yes
  with_items:
    - { name: 'vm.swappiness', value: '0'}
    - { name: 'net.core.rmem_default',value: '256960'}
    - { name: 'net.core.rmem_max',value: '16777216'}
    - { name: 'net.core.wmem_default',value: '256960'}
    - { name: 'net.core.wmem_max',value: '16777216'}
    - { name: 'net.core.netdev_max_backlog',value: '2000'}
    - { name: 'net.core.somaxconn',value: '65535'}
    - { name: 'net.core.optmem_max',value: '81920'}
    - { name: 'net.ipv4.tcp_mem',value: '8388608  12582912  16777216'}
    - { name: 'net.ipv4.tcp_rmem',value: '8192  87380  16777216'}
    - { name: 'net.ipv4.tcp_wmem',value: '8192  65536  16777216'}
    - { name: 'net.ipv4.tcp_keepalive_time',value: '180'}
    - { name: 'net.ipv4.tcp_keepalive_intvl',value: '30'}
    - { name: 'net.ipv4.tcp_keepalive_probes',value: '3'}
    - { name: 'net.ipv4.tcp_sack',value: '1'}
    - { name: 'net.ipv4.tcp_fack',value: '1'}
    - { name: 'net.ipv4.tcp_window_scaling',value: '1'}
    - { name: 'net.ipv4.tcp_syncookies',value: '1'}
    - { name: 'net.ipv4.tcp_tw_reuse',value: '1'}
    - { name: 'net.ipv4.tcp_tw_recycle',value: '0'}
    - { name: 'net.ipv4.tcp_fin_timeout',value: '10'}
    #- { name: 'net.ipv4.ip_local_port_range',value: '1024  65000'}
    - { name: 'net.ipv4.tcp_max_syn_backlog',value: '100000'}
    - { name: 'fs.file-max',value: '1100000'}
    - { name: 'fs.nr_open',value: '1100000'}
    - { name: 'fs.inotify.max_user_watches', value: '524288' }
    - { name: 'kernel.pid_max', value: '655350' }
  ignore_errors: True
  tags: init_env

After the installation CoreDNS still failed to start, and the k3s logs showed:

11月 08 10:07:06 industai-sxyq k3s[1867]: E1108 10:07:06.376295    1867 proxier.go:1562] "Failed to execute iptables-restore" err=<
11月 08 10:07:06 industai-sxyq k3s[1867]:         exit status 1: iptables-restore: invalid option -- 'w'
11月 08 10:07:06 industai-sxyq k3s[1867]:         iptables-restore: invalid option -- 'W'
11月 08 10:07:06 industai-sxyq k3s[1867]:         Unknown arguments found on commandline
11月 08 10:07:06 industai-sxyq k3s[1867]:  > rules=<
11月 08 10:07:06 industai-sxyq k3s[1867]:         *nat
11月 08 10:07:06 industai-sxyq k3s[1867]:         :KUBE-SERVICES - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]:         :KUBE-POSTROUTING - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]:         :KUBE-NODE-PORT - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]:         :KUBE-LOAD-BALANCER - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]:         :KUBE-MARK-MASQ - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-POSTROUTING -m comment --comment "Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose" -m set --match-se
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-NODE-PORT -p tcp -m comment --comment "Kubernetes nodeport TCP port for masquerade purpose" -m set --match-set KUBE-NODE-POR
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-SERVICES -m comment --comment "Kubernetes service cluster ip + port for masquerade purpose" -m set --match-set KUBE-CLUSTER-
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-SERVICES -m addrtype --dst-type LOCAL -j KUBE-NODE-PORT
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-LOAD-BALANCER -j KUBE-MARK-MASQ
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-SERVICES -m set --match-set KUBE-CLUSTER-IP dst,dst -j ACCEPT
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-POSTROUTING -m mark ! --mark 0x00004000/0x00004000 -j RETURN
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-POSTROUTING -j MARK --xor-mark 0x00004000
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-MARK-MASQ -j MARK --or-mark 0x00004000
11月 08 10:07:06 industai-sxyq k3s[1867]:         COMMIT
11月 08 10:07:06 industai-sxyq k3s[1867]:         *filter
11月 08 10:07:06 industai-sxyq k3s[1867]:         :KUBE-FORWARD - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]:         :KUBE-NODE-PORT - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]:         :KUBE-PROXY-FIREWALL - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]:         :KUBE-SOURCE-RANGES-FIREWALL - [0:0]
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-SOURCE-RANGES-FIREWALL -j DROP
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x00004000/0x00004000 -j ACCEPT
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
11月 08 10:07:06 industai-sxyq k3s[1867]:         -A KUBE-NODE-PORT -m comment --comment "Kubernetes health check node port" -m set --match-set KUBE-HEALTH-CHECK-NODE-PORT dst -j ACC
11月 08 10:07:06 industai-sxyq k3s[1867]:         COMMIT
11月 08 10:07:06 industai-sxyq k3s[1867]:  >

So iptables-restore on this host does not understand the -w option, even though kube-proxy was supposedly running in ipvs mode; and the iptables command itself had disappeared again (a version check is sketched below).
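
A hypothetical check of the host binaries kube-proxy shells out to; as far as I can tell, iptables-restore only gained -w/--wait in iptables 1.6.2, which would explain the error on this 1.6.1 host:

```
iptables --version                               # host iptables package version
iptables-restore --help 2>&1 | grep -- '--wait' \
  || echo "iptables-restore has no --wait support"
```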
Checking the kernel log:

kernel: [569918.603973] IPVS: rr: UDP 10.68.0.2:53 - no destination available

Combined with the earlier iptables-restore error, this pointed back at iptables. I reinstalled it with `apt-get install --reinstall iptables` and restarted k3s once more. This time CoreDNS started normally, DNS resolution from inside pods worked, and pod IPs were reachable again; neither the k3s logs nor the kernel log showed any errors (a verification step is sketched below).
The developers confirmed that their services were back to normal.
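
A sketch of the kind of end-to-end check used, assuming a throwaway busybox pod (the image tag is arbitrary):

```
kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local 10.68.0.2
```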

Remaining questions

Even though the fault is fixed, a few things still puzzle me; clearly there are gaps in my knowledge.

  • 1. Why did the iptables command disappear after this system was rebooted?
  • 2. What caused the CoreDNS Service and pods to become unreachable?
  • 3. Why does ipvs mode still invoke iptables commands, and what is the difference between ipvs and iptables?