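This is the hands-on companion to Chapter 2, "Kubernetes Installation and Configuration Guide", of The Definitive Guide to Kubernetes: From Docker to Kubernetes in Practice, 4th edition (《kubernetes权威指南:从Docker到Kubernetes实践全接触(第4版)》). The book describes the installation fairly tersely; this walkthrough records every problem encountered along the way together with its solution.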
2.1.3 Hands-on
2.1.3.1 Kubernetes cluster planning
Cluster node allocation:
(1) k8s-master  private IP: 192.168.0.3  public IP: 114.67.107.240  data directory: /data/k8s
(2) k8s-node1  private IP: 192.168.0.3  public IP: 114.67.110.126  data directory: /root/k8s
(3) k8s-node2  private IP: 192.168.0.5  public IP: 114.67.107.226  data directory: /data/k8s
Hostname configuration:
$ vim /etc/hosts    # add the line below to get rid of the "sudo: unable to resolve host" warning
127.0.0.1 k8s-master
$ sudo hostnamectl set-hostname k8s-master    # run on k8s-master (IP: 114.67.107.240)
$ vim /etc/hosts    # add the line below to get rid of the "sudo: unable to resolve host" warning
127.0.0.1 k8s-node1
$ sudo hostnamectl set-hostname k8s-node1    # run on k8s-node1 (IP: 114.67.110.126)
$ vim /etc/hosts    # add the line below to get rid of the "sudo: unable to resolve host" warning
127.0.0.1 k8s-node2
$ sudo hostnamectl set-hostname k8s-node2    # run on k8s-node2 (IP: 114.67.107.226)
# Verify:
$ hostnamectl
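When the hosts file is edited on several machines, or the step is repeated, it is easy to end up with duplicate lines. Below is a minimal sketch that adds an "IP hostname" mapping only when it is not already present. The function name ensure_hosts_entry is hypothetical, and the file path is a parameter so the function can be tried on a copy before touching /etc/hosts.

```shell
#!/usr/bin/env bash
# ensure_hosts_entry FILE IP NAME  (hypothetical helper)
# Appends "IP NAME" to FILE unless some line already maps IP to NAME.
ensure_hosts_entry() {
  local file=$1 ip=$2 name=$3
  # awk exits 0 only if a line has IP as its first field and NAME in a later field;
  # commented-out lines never match because their first field starts with "#".
  if ! awk -v ip="$ip" -v name="$name" '
      $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
      END { exit !found }' "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Example (as root on k8s-master):
#   ensure_hosts_entry /etc/hosts 127.0.0.1 k8s-master
```

Running the same call twice leaves a single entry, so the command can safely be part of a provisioning script.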
Hardware and software environment:
Operating system: Ubuntu 18.04.1 LTS
Kernel: 4.15.0-36-generic
(2) Preparation
[1] Disable the firewall at boot
Turn off the ufw firewall. Ubuntu does not enable it by default, so normally nothing needs to be done here.
sudo ufw disable
sudo systemctl stop firewalld && sudo systemctl disable firewalld    # only if firewalld is installed
[2] Permanently disable SELinux
Ubuntu does not install SELinux by default. If it has been installed, disable it as follows.
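To disable it permanently, edit /etc/selinux/config (permissive mode only logs policy violations instead of blocking them):
$ sudo vi /etc/selinux/config
SELINUX=permissive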
Notes:
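kubelet's SELinux support is still weak, so SELinux should be disabled. With SELinux enabled you may also find that a container running as root gets "permission denied" when accessing a host directory mounted into it; in that case either grant the mount the right SELinux permissions or disable SELinux. In addition, when Docker's storage-driver is overlay2, SELinux on older kernels does not support the overlay2 driver, and starting Docker with --selinux-enabled fails with "Error starting daemon: SELinux is not supported with the overlay2 graph driver on this kernel". In that case set --selinux-enabled=false.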
[3] Enable packet forwarding
Edit /etc/sysctl.conf to enable IPv4 forwarding:
$ sudo vim /etc/sysctl.conf
net.ipv4.ip_forward = 1    # enable IPv4 forwarding so the host can route packets
Then apply the change:
$ sudo sysctl -p
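Editing the file by hand works; to script the same change on every node, here is a minimal sketch that sets a key idempotently. The helper name set_sysctl is hypothetical. An existing line for the key, including a commented-out one such as Ubuntu's default "#net.ipv4.ip_forward=1", is replaced; otherwise the setting is appended.

```shell
#!/usr/bin/env bash
# set_sysctl FILE KEY VALUE  (hypothetical helper)
# Replaces the first line that sets KEY (commented out or not) with "KEY = VALUE",
# drops any later lines for the same KEY, and appends the setting if KEY is absent.
set_sysctl() {
  local file=$1 key=$2 value=$3 tmp rc
  touch "$file" || return 1
  tmp=$(mktemp) || return 1
  awk -v key="$key" -v value="$value" '
    {
      line = $0
      sub(/^[# \t]+/, "", line)          # ignore leading comment marks and blanks
      split(line, kv, "=")
      k = kv[1]
      gsub(/[ \t]/, "", k)               # key with surrounding blanks removed
      if (k == key) {
        if (!done) { print key " = " value; done = 1 }
        next                             # drop later lines for the same key
      }
      print
    }
    END { if (!done) print key " = " value }' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return $rc
}

# Example (as root):
#   set_sysctl /etc/sysctl.conf net.ipv4.ip_forward 1 && sysctl -p
```

Writing back with cat instead of mv keeps the original file's owner and permissions.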
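Notes:
What is IPv4 forwarding? For security reasons, Linux disables packet forwarding by default. Forwarding means that when a host has more than one network interface and one of them receives a packet, the kernel sends the packet out through another interface according to its destination IP, and that interface continues sending it according to the routing table. This is normally a router's job. Both kube-proxy's IPVS mode and Calico involve routing and therefore require IPv4 forwarding on the host. Even without Kubernetes, plain Docker depends on IPv4 forwarding in two cases: <1> containers on the same host but attached to different bridges talk to each other (crossing bridges is like crossing subnets, which needs routing); <2> a container accesses the outside network.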
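[4] Change the default policy of the iptables FORWARD chain
After routing, a packet that is not addressed to the local host goes through the iptables FORWARD chain. Starting with version 1.13, Docker sets the default policy of the FORWARD chain to DROP, which causes problems such as two Pods on different hosts failing to reach each other by Pod IP. There are two solutions:
<1> run iptables -P FORWARD ACCEPT on all nodes at boot;
<2> stop Docker from managing iptables.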
Option 1, effective immediately but only until the next reboot:
$ sudo iptables -P FORWARD ACCEPT
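iptables settings are lost on reboot. To reapply the policy automatically at boot, put the command in /etc/rc.local (on Ubuntu 18.04 this file does not exist by default: create it with #!/bin/sh as its first line and make it executable):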
/usr/sbin/iptables -P FORWARD ACCEPT
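Option 2
Add the --iptables=false option to Docker's startup parameters so that Docker no longer touches iptables. For Docker 1.10 and later this can be set in the daemon's default configuration file, /etc/docker/daemon.json:
{
  ...
  "iptables": false,
  ...
}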
Note: Option 2 is recommended. The Kubernetes documentation recommends starting Docker with --iptables=false when it is used with Kubernetes, so that Docker no longer manages iptables and kube-proxy alone maintains the iptables rules. References:
<1> https://docs.docker.com/v17.09/engine/userguide/networking/default_network/container-communication/#container-communication-between-hosts
<2> https://github.com/kubernetes/kubernetes/issues/40182
<3> https://kubernetes.io/docs/setup/scratch/#docker
<4> https://github.com/moby/moby/pull/28257
[5] Disable swap
Turn off all swap partitions:
$ sudo swapoff -a
Also edit /etc/fstab and comment out the swap mount entry, so that swap is not re-enabled after a reboot.
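The fstab edit can also be scripted. A minimal sketch (the helper name comment_out_swap is hypothetical) that comments out every active entry whose filesystem type, the third fstab field, is swap, and leaves all other lines untouched:

```shell
#!/usr/bin/env bash
# comment_out_swap FILE  (hypothetical helper)
# Prefixes "#" to every uncommented fstab line whose third field (fs type) is "swap".
comment_out_swap() {
  local file=$1 tmp rc
  tmp=$(mktemp) || return 1
  awk '!/^[ \t]*#/ && $3 == "swap" { print "#" $0; next } { print }' "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return $rc
}

# Example (as root):
#   swapoff -a && comment_out_swap /etc/fstab
#   swapon --show    # prints nothing once no swap is active
```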
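Since Kubernetes 1.8, swap must be turned off; otherwise kubelet fails to start with its default configuration. The kubelet flag --fail-swap-on=false lifts this restriction, but that is not recommended, and it is better to keep swap disabled. Some discussions of why swap must be off:
<1> https://github.com/kubernetes/kubernetes/issues/7294
<2> https://github.com/kubernetes/kubernetes/issues/53533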
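[6] Configure kernel parameters so that bridged traffic also passes through the iptables/netfilter firewall
$ sudo tee /etc/sysctl.d/k8s.conf <<-'EOF'
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
$ sudo sysctl --system
Notes:
Network plugins must provide certain support for kube-proxy. For example, kube-proxy's iptables mode is built on iptables, so the network plugin must make sure container traffic flows through iptables. Some network plugins use a Linux bridge; a bridge works at the data link layer while iptables/netfilter works at the network layer, and the settings above make traffic crossing the bridge pass through iptables/netfilter as well, so that kube-proxy in iptables mode works correctly. If no kubelet network plugin is specified, the noop plugin is used, and it also sets net/bridge/bridge-nf-call-iptables=1 for the same reason. References:
<1> https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#network-plugin-requirements
<2> https://kubernetes.io/docs/setup/independent/install-kubeadm/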
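One detail the steps above leave implicit: the net.bridge.bridge-nf-call-* keys only exist while the br_netfilter kernel module is loaded, so sysctl --system may report them as missing on a fresh host. Below is a minimal sketch, assuming a systemd-based host such as Ubuntu 18.04, that lists modules in /etc/modules-load.d so systemd loads them at every boot. The file name k8s.conf and the helper name persist_modules are hypothetical choices; the same file could also hold the ip_vs modules that this walkthrough later loads from /etc/rc.local.

```shell
#!/usr/bin/env bash
# persist_modules DIR MODULE...  (hypothetical helper)
# Writes one module name per line to DIR/k8s.conf; systemd-modules-load.service
# loads every module listed in /etc/modules-load.d/*.conf at boot.
persist_modules() {
  local dir=$1
  shift
  mkdir -p "$dir" || return 1
  printf '%s\n' "$@" > "$dir/k8s.conf"
}

# Example (as root):
#   persist_modules /etc/modules-load.d br_netfilter
#   modprobe br_netfilter && sysctl --system
```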
(3) Install Docker
[1] Install and start Docker
$ apt-get update
$ apt install docker.io
$ systemctl enable docker && systemctl start docker
$ systemctl status docker
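[2] Configure a Docker registry mirror
Configure Docker as follows.
Docker Hub is unreliable to reach from mainland China, so pulls from Docker Hub are proxied through a registry mirror. The configuration below uses the official Docker China mirror; an Aliyun accelerator address can be used in the same way.
$ vim /etc/docker/daemon.json
{
  "registry-mirrors": ["https://registry.docker-cn.com"],
  "iptables": false,
  "ip-masq": false,
  "storage-driver": "overlay2",
  "graph": "/home/lk/docker"
}
$ sudo systemctl restart docker
"graph" sets Docker's data directory; newer Docker releases call this option "data-root" and deprecate "graph".
(4) Install kubeadm, kubelet and kubectl
[1] Create the Kubernetes apt repository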
Create the Kubernetes apt source file. If the Google address is blocked, use the Aliyun or USTC mirror instead:
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
sudo curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
sudo tee /etc/apt/sources.list.d/kubernetes.list <<-'EOF'
deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main
EOF
sudo apt-get update
[2] Install kubeadm, kubelet and kubectl
1. List the available versions:
$ apt-cache madison kubeadm
   kubeadm |  1.21.3-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
   kubeadm |  1.21.2-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
   kubeadm |  1.21.1-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
   kubeadm |  1.21.0-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
   kubeadm |  1.20.9-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
   kubeadm |  1.20.8-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
   kubeadm |  1.20.7-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
   kubeadm |  1.20.6-00 | https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 Packages
   ......
2. Install a specific version:
$ sudo apt-get install -y kubelet=1.20.5-00 kubeadm=1.20.5-00 kubectl=1.20.5-00
# Prevent automatic upgrades (apt upgrade skips held packages). To upgrade, unhold first, then hold again afterwards.
$ sudo apt-mark hold kubelet kubeadm kubectl
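To confirm the hold took effect, compare the output of apt-mark showhold with the package list. The helper below (all_held, a hypothetical name) takes that output as a string, so the check itself does not need root or apt:

```shell
#!/usr/bin/env bash
# all_held HELD_LIST PKG...  (hypothetical helper)
# HELD_LIST is the output of "apt-mark showhold" (one package per line).
# Returns 0 only if every PKG appears in it; names missing packages on stderr.
all_held() {
  local held=$1 pkg rc=0
  shift
  for pkg in "$@"; do
    if ! printf '%s\n' "$held" | grep -qxF -- "$pkg"; then
      echo "not held: $pkg" >&2
      rc=1
    fi
  done
  return $rc
}

# Example:
#   all_held "$(apt-mark showhold)" kubelet kubeadm kubectl && echo "all held"
```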
3. Enable kubelet at boot and start it:
sudo systemctl enable kubelet && sudo systemctl start kubelet
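Notes:
At this point kubelet is not running properly, because its main configuration files such as kubelet.conf are missing. This can be ignored for now: the file is only generated once the master node has been initialized.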
# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Thu 2021-07-29 15:51:07 CST; 55min ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 20437 (kubelet)
    Tasks: 27 (limit: 4915)
   CGroup: /system.slice/kubelet.service
           └─20437 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plu
Jul 29 16:47:00 k8s-master kubelet[20437]: E0729 16:47:00.491624 20437 remote_runtime.go:116] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sa
Jul 29 16:47:00 k8s-master kubelet[20437]: E0729 16:47:00.491680 20437 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "coredns-74ff55c5b-6rvx4_kube-system(bd28b427-0af4-434d-a170-416cf
Jul 29 16:47:00 k8s-master kubelet[20437]: E0729 16:47:00.491694 20437 kuberuntime_manager.go:755] createPodSandbox for pod "coredns-74ff55c5b-6rvx4_kube-system(bd28b427-0af4-434d-a170-416c
Jul 29 16:47:00 k8s-master kubelet[20437]: E0729 16:47:00.491742 20437 pod_workers.go:191] Error syncing pod bd28b427-0af4-434d-a170-416cf3ac871d ("coredns-74ff55c5b-6rvx4_kube-system(bd28b
Jul 29 16:47:00 k8s-master kubelet[20437]: W0729 16:47:00.636211 20437 pod_container_deletor.go:79] Container "c820e1b0c312b05ef5cdb247a5e0afdead74f026909df40a8a8782fc998b67b1" not found in
Jul 29 16:47:00 k8s-master kubelet[20437]: E0729 16:47:00.636368 20437 cni.go:366] Error adding kube-system_coredns-74ff55c5b-gcfjf/c820e1b0c312b05ef5cdb247a5e0afdead74f026909df40a8a8782fc9
Jul 29 16:47:00 k8s-master kubelet[20437]: W0729 16:47:00.638632 20437 docker_sandbox.go:402] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "
Jul 29 16:47:00 k8s-master kubelet[20437]: W0729 16:47:00.642714 20437 pod_container_deletor.go:79] Container "7a3b54ba7a6c63e57621c06b8fd784120befa2327e99d85e670d94d1f85620a8" not found in
Jul 29 16:47:00 k8s-master kubelet[20437]: W0729 16:47:00.643960 20437 cni.go:333] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container
Jul 29 16:47:00 k8s-master kubelet[20437]: W0729 16:47:00.644073 20437 cni.go:333] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container
Jul 29 16:47:00 k8s-master kubelet[20437]: E0729 16:47:00.869338 20437 remote_runtime.go:116] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sa
Jul 29 16:47:00 k8s-master kubelet[20437]: E0729 16:47:00.869386 20437 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "coredns-74ff55c5b-gcfjf_kube-system(50027ab9-037c-4a63-932a-a6bf3
Jul 29 16:47:00 k8s-master kubelet[20437]: E0729 16:47:00.869399 20437 kuberuntime_manager.go:755] createPodSandbox for pod "coredns-74ff55c5b-gcfjf_kube-system(50027ab9-037c-4a63-932a-a6bf
Jul 29 16:47:00 k8s-master kubelet[20437]: E0729 16:47:00.869440 20437 pod_workers.go:191] Error syncing pod 50027ab9-037c-4a63-932a-a6bf39a16e54 ("coredns-74ff55c5b-gcfjf_kube-system(50027
Problem 1:
# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Mon 2021-08-16 09:59:06 CST; 2s ago
     Docs: https://kubernetes.io/docs/home/
  Process: 22001 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
 Main PID: 22001 (code=exited, status=255)
Aug 16 09:59:06 k8s-node2 systemd[1]: kubelet.service: Unit entered failed state.
Aug 16 09:59:06 k8s-node2 systemd[1]: kubelet.service: Failed with result 'exit-code'.
First check kubelet's log, which contains the most detailed information:
# journalctl -xeu kubelet | less
The key error:
-- The start-up result is done.
Aug 16 09:58:52 k8s-node2 kubelet[21974]: F0816 09:58:52.200450 21974 server.go:198] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Aug 16 09:58:52 k8s-node2 kubelet[21974]: goroutine 1 [running]:
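This is usually caused by files left over from a previous installation, which make the join fail. Remove the packages and all related files and directories, then start over:
sudo systemctl stop kubelet
sudo systemctl disable kubelet
sudo apt-get purge kubeadm
sudo apt-get purge kubectl
sudo apt-get purge kubelet
# Purge leftover configuration data of removed packages
dpkg -l | grep ^rc | awk '{print $2}' | sudo xargs dpkg -P
2.1.3.3 Installing the Kubernetes cluster
2.1.3.3.1 Deploying the master node
[1] Downloading images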
1. Pull the required images in advance
Check which images Kubernetes v1.20.5 needs:
$ kubeadm config images list --kubernetes-version=v1.20.5
k8s.gcr.io/kube-apiserver:v1.20.5
k8s.gcr.io/kube-controller-manager:v1.20.5
k8s.gcr.io/kube-scheduler:v1.20.5
k8s.gcr.io/kube-proxy:v1.20.5
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.13-0
k8s.gcr.io/coredns:1.7.0
Because gcr.io is blocked in mainland China, pull the images from the v5cn account on Docker Hub (https://hub.docker.com/u/v5cn), which has a fairly complete set:
docker pull v5cn/kube-apiserver:v1.20.5
docker pull v5cn/kube-controller-manager:v1.20.5
docker pull v5cn/kube-scheduler:v1.20.5
docker pull v5cn/kube-proxy:v1.20.5
docker pull v5cn/pause:3.2
docker pull v5cn/etcd:3.4.13-0
docker pull v5cn/coredns:1.7.0
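Typing one pull and one tag command per image is error-prone. A minimal sketch that performs both steps (this pull and the re-tag below) in one loop, assuming, as in these commands, that the mirror publishes each image under the last path component of its k8s.gcr.io name with the same tag. The helper names are hypothetical, and DRY_RUN=1 only prints the docker commands.

```shell
#!/usr/bin/env bash
# mirror_ref IMAGE MIRROR  (hypothetical helper)
# "k8s.gcr.io/kube-proxy:v1.20.5" + "v5cn" -> "v5cn/kube-proxy:v1.20.5"
mirror_ref() {
  printf '%s/%s\n' "$2" "${1##*/}"   # keep only the last path component of IMAGE
}

# pull_and_retag MIRROR IMAGE...  (hypothetical helper)
# Pulls each IMAGE from MIRROR, then tags it back to its original name.
pull_and_retag() {
  local mirror=$1 img src
  shift
  for img in "$@"; do
    src=$(mirror_ref "$img" "$mirror")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "docker pull $src"
      echo "docker tag $src $img"
    else
      docker pull "$src" && docker tag "$src" "$img" || return 1
    fi
  done
}

# Example:
#   pull_and_retag v5cn $(kubeadm config images list --kubernetes-version=v1.20.5)
```

Feeding the loop from kubeadm config images list keeps the image versions in sync with the kubeadm version instead of a hand-maintained list.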
2. Re-tag the images with their original k8s.gcr.io names:
docker tag v5cn/kube-apiserver:v1.20.5 k8s.gcr.io/kube-apiserver:v1.20.5
docker tag v5cn/kube-controller-manager:v1.20.5 k8s.gcr.io/kube-controller-manager:v1.20.5
docker tag v5cn/kube-scheduler:v1.20.5 k8s.gcr.io/kube-scheduler:v1.20.5
docker tag v5cn/kube-proxy:v1.20.5 k8s.gcr.io/kube-proxy:v1.20.5
docker tag v5cn/pause:3.2 k8s.gcr.io/pause:3.2
docker tag v5cn/etcd:3.4.13-0 k8s.gcr.io/etcd:3.4.13-0
docker tag v5cn/coredns:1.7.0 k8s.gcr.io/coredns:1.7.0
[2] Initialize the cluster with kubeadm init
<1> Starting from a configuration file
Run kubeadm config print init-defaults to get the default initialization configuration:
kubeadm config print init-defaults > kubeadm.yml
Key settings in kubeadm.yml:
advertiseAddress: 114.67.107.240, bindPort: 6443
kubernetesVersion: v1.20.5
serviceSubnet: 10.96.0.0/12    // does not need to match the real network; the default is fine
podSubnet: 10.244.0.0/16       // does not need to match the real network, but must match the Network value in flannel's net-conf.json; it is not in the default output, so add it under networking
The full kubeadm.yml:
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 114.67.107.240
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-master
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.20.5
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
scheduler: {}
Initialize the cluster:
kubeadm init --config=kubeadm.yml
Output of a successful run:
$ kubeadm init --config=kubeadm.yml
[init] Using Kubernetes version: v1.20.5
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.2. Latest validated version: 19.03
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
...
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
  export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 114.67.107.240:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:99ea8c7fbd529eafe466f7b5ac403b896c700746a2d16178c5748927f04680db
Troubleshooting notes:
(1) If the host already has a cluster configuration that must be reset, reset the cluster as described in [7]; all images then have to be downloaded again.
(2) The following warnings appear:
[preflight] Running pre-flight checks
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.2. Latest validated version: 19.03
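Fix: switch Docker to the systemd cgroup driver. Note that cat > overwrites /etc/docker/daemon.json completely, so carry over any settings added earlier (such as those from (3) [2]) into the new file:
# Set up the daemon (I also added my private Docker registry here).
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "insecure-registries": ["192.168.0.3:8080"]
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
# Restart docker.
systemctl daemon-reload
systemctl restart docker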
Check the cgroup driver Docker uses:
# docker info | grep -i cgroup
 Cgroup Driver: systemd
WARNING: No swap limit support
 Cgroup Version: 1
Docker 20.10.2 now uses the systemd cgroup driver.
(3) If you get a "port 10251 and 10252 are in use" error, run netstat -lnp | grep 1025 and kill the processes holding those ports.
(4) On cloud hosts (ECS), kubeadm cannot use the public IP as the advertise address, and cluster initialization fails.
On a JD Cloud or Aliyun host, running kubeadm init on the master with the public IP 114.67.107.240 as advertiseAddress makes the cluster fail to start.
Failure output:
...
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
    Unfortunately, an error has occurred:
        timed out waiting for the condition
    This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'
    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.
    Here is one example how you may list all Kubernetes containers running in docker:
        - 'docker ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
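Fix: edit etcd.yaml
Once the command above is entered, kubeadm starts initializing the master node, but etcd cannot start because its configuration is wrong: on a cloud host the public IP is not bound to any of the VM's network interfaces (the provider maps it via NAT), so etcd cannot listen on it. While kubeadm init is still waiting for the control plane, edit the etcd manifest: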
$ vim /etc/kubernetes/manifests/etcd.yaml
Before:
    - --listen-client-urls=https://127.0.0.1:2379,https://114.67.107.240:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://114.67.107.240:2380
After:
    - --listen-client-urls=https://0.0.0.0:2379
    - --listen-metrics-urls=http://0.0.0.0:2381
    - --listen-peer-urls=https://0.0.0.0:2380
After a short wait, the master node finishes initializing.
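The same etcd edit can be done with sed. This is a minimal sketch (the helper name is hypothetical) that rewrites the three flags to exactly the 0.0.0.0 values listed above. kubelet watches /etc/kubernetes/manifests and recreates the etcd static Pod when the file changes; presumably the edit has to land before kubeadm init's 4m0s wait (shown in the failure output) runs out for init to complete.

```shell
#!/usr/bin/env bash
# etcd_listen_all FILE  (hypothetical helper)
# Rewrites etcd's three listen flags in a static Pod manifest to the 0.0.0.0 values above.
etcd_listen_all() {
  sed -i -E \
    -e 's#(--listen-client-urls=)[^[:space:]]*#\1https://0.0.0.0:2379#' \
    -e 's#(--listen-metrics-urls=)[^[:space:]]*#\1http://0.0.0.0:2381#' \
    -e 's#(--listen-peer-urls=)[^[:space:]]*#\1https://0.0.0.0:2380#' \
    "$1"
}

# Example (as root, on the master, while kubeadm init is waiting):
#   etcd_listen_all /etc/kubernetes/manifests/etcd.yaml
```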
<2> Starting from command-line flags (untested)
sudo kubeadm init --apiserver-advertise-address=192.168.0.3 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 --kubernetes-version=v1.20.5
[3] Check the cgroup driver used by kubelet
The cgroup driver kubelet is started with must match the one Docker uses.
1. Check the cgroup driver Docker uses:
$ docker info | grep -i cgroup
 Cgroup Driver: systemd
WARNING: No swap limit support
 Cgroup Version: 1
As shown, Docker 20.10.2 now uses the systemd cgroup driver, as configured earlier.
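Step 1 only covers Docker's side. A minimal sketch for reading both sides, assuming kubelet's driver is set by the cgroupDriver field of /var/lib/kubelet/config.yaml, the KubeletConfiguration file kubeadm writes. If the field is absent there, kubelet uses its default, cgroupfs, unless a --cgroup-driver flag is passed (kubeadm may put one in /var/lib/kubelet/kubeadm-flags.env). The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# docker_cgroup_driver  (hypothetical helper)
# Reads "docker info" output on stdin and prints the value of "Cgroup Driver".
docker_cgroup_driver() {
  awk -F': *' '/^[ \t]*Cgroup Driver:/ { print $2; exit }'
}

# kubelet_cgroup_driver FILE  (hypothetical helper)
# Prints the top-level cgroupDriver value of a KubeletConfiguration YAML file
# (empty output means the field is not set in that file).
kubelet_cgroup_driver() {
  awk -F': *' '/^cgroupDriver:/ { print $2; exit }' "$1"
}

# Example:
#   d=$(docker info 2>/dev/null | docker_cgroup_driver)
#   k=$(kubelet_cgroup_driver /var/lib/kubelet/config.yaml)
#   [ "$d" = "${k:-cgroupfs}" ] && echo "match: $d" || echo "mismatch: docker=$d kubelet=${k:-cgroupfs}"
```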
[4] Create the kubeconfig file used by kubectl
To let kubectl control the cluster, run the following as a non-root user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
As the root user:
export KUBECONFIG=/etc/kubernetes/admin.conf
[5] Let the master run workloads
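kubeadm also installs kubelet on the master, but by default the master does not run ordinary workloads. For a single-machine all-in-one Kubernetes environment, run the command below; it removes the node-role.kubernetes.io/master taint from the nodes so that the master also acts as a worker node:
$ kubectl taint nodes --all node-role.kubernetes.io/master-
Output: node/k8s-master untainted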
[Note] Querying node information fails:
# kubectl get nodes
Unable to connect to the server: x509: certificate is valid for 47.117.64.1, 172.19.23.53, not 47.117.67.43
It turned out that kubeadm reset had not removed the files under .kube. Delete them manually and repeat the steps above, and the problem goes away:
cd $HOME
ls -a
rm -rf .kube
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Check the pods and nodes
root@k8s-master:~# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS              RESTARTS   AGE
default       curl                                 0/1     Pending             0          168m
kube-system   coredns-74ff55c5b-6rvx4              0/1     ContainerCreating   0          3h55m
kube-system   coredns-74ff55c5b-gcfjf              0/1     ContainerCreating   0          3h55m
kube-system   etcd-k8s-master                      1/1     Running             0          3h55m
kube-system   kube-apiserver-k8s-master            1/1     Running             0          3h55m
kube-system   kube-controller-manager-k8s-master   1/1     Running             0          3h28m
kube-system   kube-flannel-ds-wq8k4                0/1     CrashLoopBackOff    39         155m
kube-system   kube-proxy-bf669                     1/1     Running             0          3h55m
kube-system   kube-scheduler-k8s-master            1/1     Running             0          170m
kube-system   kuboard-74c645f5df-bgctl             0/1     ContainerCreating   0          77m
root@k8s-master:~# kubectl get nodes
NAME         STATUS   ROLES                  AGE     VERSION
k8s-master   Ready    control-plane,master   3h39m   v1.20.5
If any Pod is in an error state, run kubectl --namespace=kube-system describe pod <pod_name> to see why. A common cause is an image that has not finished downloading.
$ kubectl describe pod kube-flannel-ds-wq8k4 --namespace=kube-system
...
Events:
  Type     Reason   Age                  From     Message
  ----     ------   ----                 ----     -------
  Warning  BackOff  79s (x432 over 96m)  kubelet  Back-off restarting failed container
Check the container's log:
$ docker ps -a
...
eab06e3325ec   8522d622299c   "/opt/bin/flanneld -…"   2 minutes ago   Exited (1) 2 minutes ago   k8s_kube-flannel_kube-flannel-ds-wq8k4_kube-system_68626880-c87a-4dbd-926d-f889aa9bbfb7_41
docker logs -t -f 051e74608f16
docker logs -t -f k8s_kube-flannel_kube-flannel-ds-wq8k4_kube-system_68626880-c87a-4dbd-926d-f889aa9bbfb7_41
[6] Check the cluster status
Make sure every component is Healthy:
$ kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true"}
$ kubectl get nodes
NAME         STATUS     ROLES                  AGE   VERSION
k8s-master   NotReady   control-plane,master   59m   v1.20.5
# A ConfigMap named kubeadm-config has been created
$ kubectl get -n kube-system configmap
NAME                                 DATA   AGE
coredns                              1      3h10m
extension-apiserver-authentication   6      3h10m
kube-flannel-cfg                     2      110m
kube-proxy                           2      3h10m
kube-root-ca.crt                     1      3h10m
kubeadm-config                       2      3h10m
kubelet-config-1.20                  1      3h10m
# Check Pod status
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS              RESTARTS   AGE
kube-system   coredns-74ff55c5b-pdrpb              0/1     ContainerCreating   0          12h
kube-system   coredns-74ff55c5b-tfcbn              0/1     ContainerCreating   0          12h
kube-system   etcd-k8s-master                      1/1     Running             0          12h
kube-system   kube-apiserver-k8s-master            1/1     Running             0          12h
kube-system   kube-controller-manager-k8s-master   1/1     Running             0          12h
kube-system   kube-flannel-ds-w5f88                0/1     CrashLoopBackOff    146        12h
kube-system   kube-proxy-p9sxm                     1/1     Running             0          12h
kube-system   kube-scheduler-k8s-master            1/1     Running             0          12h
Notes:
(1) Timeout
$ kubectl get cs
Unable to connect to the server: dial tcp 47.117.67.43:6443: i/o timeout
The request goes to the public IP, and port 6443 has not been opened in the cloud security group.
(2) Connection refused
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
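This happens because kube-controller-manager.yaml and kube-scheduler.yaml pass --port=0 by default, which turns off the insecure HTTP ports (10252 and 10251) that kubectl get cs probes. Comment that line out in both files (on every master node):
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
vim /etc/kubernetes/manifests/kube-scheduler.yaml
# Comment out the port=0 line:
# - --port=0
# Then restart kubelet on all nodes
systemctl restart kubelet.service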
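The same edit with sed, as a minimal sketch (the helper name is hypothetical). It only touches a line that consists of "- --port=0", keeps the indentation, and turns the line into a YAML comment, so kubelet recreates the two static Pods from the changed manifests.

```shell
#!/usr/bin/env bash
# comment_out_port_zero FILE...  (hypothetical helper)
# Turns a "<indent>- --port=0" line into "<indent># - --port=0" in each FILE.
comment_out_port_zero() {
  local f
  for f in "$@"; do
    sed -i -E 's/^([[:space:]]*)- --port=0[[:space:]]*$/\1# - --port=0/' "$f" || return 1
  done
}

# Example (as root, on each master):
#   comment_out_port_zero /etc/kubernetes/manifests/kube-controller-manager.yaml \
#                         /etc/kubernetes/manifests/kube-scheduler.yaml
```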
(3) The connection to the server localhost:8080 was refused
After closing the XShell session and logging in again, kubectl fails:
$ kubectl get cs
The connection to the server localhost:8080 was refused - did you specify the right host or port?
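kubectl has to run with the kubernetes-admin credentials in admin.conf; without a kubeconfig it falls back to localhost:8080. To fix it, copy /etc/kubernetes/admin.conf from the master node to the same directory on the worker nodes, then set the environment variable:
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
[7] Reinstalling after a failed installation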
If the installation fails, run kubeadm reset to restore the host to its original state, then run kubeadm init again.
$ kubeadm reset
Then delete the leftover files manually:
$ rm -rf $HOME/.kube
Then start again from [1].
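2.1.3.3.2 Deploying the flannel network
Deploying Calico installs both the CNI plugins and the Calico components, whereas deploying flannel only writes some CNI configuration files and does not install the CNI executables; those have to be installed by hand. Flannel deployment therefore has two steps:
<1> Deploy the CNI plugins
<2> Deploy the flannel components
Start by installing the flannel network configuration on the master node.
[1] Deploy the CNI plugins (all nodes)
1. Create the CNI plugin directory
sudo mkdir -p /opt/cni/bin
cd /opt/cni/bin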
2. Download the binaries from the release page:
$ sudo wget -c https://github.com.cnpmjs.org/containernetworking/plugins/releases/download/v0.9.1/cni-plugins-linux-amd64-v0.9.1.tgz
Notes:
github.com could not be reached, so a mirror inside China is used instead. Two commonly used mirrors:
https://github.com.cnpmjs.org
https://hub.fastgit.org
3. Extract the archive in /opt/cni/bin to install the plugins:
sudo tar -zxvf cni-plugins-linux-amd64-v0.9.1.tgz
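A quick way to confirm the plugins the flannel setup relies on actually landed in /opt/cni/bin. This is a sketch under an assumption: the default kube-flannel.yml CNI config chains the flannel and portmap plugins, and the flannel plugin delegates to bridge with host-local IPAM; loopback is checked as well because kubelet also invokes it for each Pod. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# missing_cni_plugins DIR PLUGIN...  (hypothetical helper)
# Prints every PLUGIN that is not an executable file in DIR; returns 1 if any is missing.
missing_cni_plugins() {
  local dir=$1 p rc=0
  shift
  for p in "$@"; do
    if [ ! -x "$dir/$p" ]; then
      echo "$p"
      rc=1
    fi
  done
  return $rc
}

# Example (on every node):
#   missing_cni_plugins /opt/cni/bin flannel portmap bridge host-local loopback \
#     || echo "some CNI plugins are missing"
```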
The following plugins were added:
$ ll /opt/cni/bin
total 111480
-rwxr-xr-x 1 root root  4151672 Feb  5 23:42 bandwidth*
-rwxr-xr-x 1 root root  4536104 Feb  5 23:42 bridge*
-rw-r--r-- 1 root root   363874 Jul 29 15:13 cni-plugins-linux-amd64-v0.9.1.tgz
-rw-r--r-- 1 root root 39771622 Feb  8 18:14 cni-plugins-linux-amd64-v0.9.1.tgz.1
-rwxr-xr-x 1 root root 10270090 Feb  5 23:42 dhcp*
-rwxr-xr-x 1 root root  4767801 Feb  5 23:42 firewall*
-rwxr-xr-x 1 root root  3357992 Feb  5 23:42 flannel*
-rwxr-xr-x 1 root root  4144106 Feb  5 23:42 host-device*
-rwxr-xr-x 1 root root  3565330 Feb  5 23:42 host-local*
-rwxr-xr-x 1 root root  4288339 Feb  5 23:42 ipvlan*
-rwxr-xr-x 1 root root  3530531 Feb  5 23:42 loopback*
-rwxr-xr-x 1 root root  4367216 Feb  5 23:42 macvlan*
-rwxr-xr-x 1 root root  3966455 Feb  5 23:42 portmap*
-rwxr-xr-x 1 root root  4467317 Feb  5 23:42 ptp*
-rwxr-xr-x 1 root root  3701138 Feb  5 23:42 sbr*
-rwxr-xr-x 1 root root  3153330 Feb  5 23:42 static*
-rwxr-xr-x 1 root root  3668289 Feb  5 23:42 tuning*
-rwxr-xr-x 1 root root  4287972 Feb  5 23:42 vlan*
-rwxr-xr-x 1 root root  3759977 Feb  5 23:42 vrf*
[2] Deploy flannel
1. Download the YAML file:
curl -O https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Note that the flannel image referenced in this kube-flannel.yml is v0.14.0: image: quay.io/coreos/flannel:v0.14.0
2. Edit the configuration file
<1> Set Network in net-conf.json to the same value as the --pod-network-cidr (podSubnet) given to kubeadm init:
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
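Because a mismatch here only shows up later as a failing flannel Pod, it can be worth comparing the two values before deploying. A minimal sketch, assuming kubeadm.yml and kube-flannel.yml are in the current directory and formatted as in this walkthrough; the helper names are hypothetical. On an already-initialized cluster, the podSubnet in use can be read from the kubeadm-config ConfigMap (kubectl -n kube-system get cm kubeadm-config -o yaml).

```shell
#!/usr/bin/env bash
# pod_subnet FILE  (hypothetical helper): prints podSubnet from a kubeadm config file.
pod_subnet() {
  awk '$1 == "podSubnet:" { print $2; exit }' "$1"
}

# flannel_network FILE  (hypothetical helper): prints the first "Network" value
# found in the net-conf.json section of a flannel manifest.
flannel_network() {
  sed -n -E 's/.*"Network":[[:space:]]*"([^"]+)".*/\1/p' "$1" | head -n 1
}

# Example:
#   a=$(pod_subnet kubeadm.yml); b=$(flannel_network kube-flannel.yml)
#   [ -n "$a" ] && [ "$a" = "$b" ] && echo "OK: $a" || echo "mismatch: podSubnet=$a flannel=$b"
```

An empty podSubnet is treated as a mismatch on purpose: without it, nodes are never assigned a Pod CIDR.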
3. Pull the images
Pull the images required by the YAML file:
docker pull quay.io/coreos/flannel:v0.14.0
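Rather than reading the image names off the manifest by eye, they can be listed with a small helper. This is a hypothetical name and a plain text scan that assumes one image: key per line, as in kube-flannel.yml:

```shell
#!/usr/bin/env bash
# yaml_images FILE  (hypothetical helper)
# Prints each distinct value of an "image:" key (also the "- image:" list form),
# with surrounding double quotes removed.
yaml_images() {
  awk '$1 == "image:" || ($1 == "-" && $2 == "image:") { v = $NF; gsub(/"/, "", v); print v }' "$1" \
    | sort -u
}

# Example:
#   for img in $(yaml_images kube-flannel.yml); do docker pull "$img"; done
```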
4. Deploy
$ kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
Once it is deployed, the cluster runs normally.
Troubleshooting notes:
<1> If the network deployment fails or has to be redone, remove the network interfaces it created:
sudo ifconfig cni0 down
sudo ip link delete cni0
sudo ifconfig flannel.1 down
sudo ip link delete flannel.1
sudo rm -rf /var/lib/cni/
<2> Problem on a worker node: cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Symptoms:
# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2021-08-16 12:46:35 CST; 1h 45min ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 27092 (kubelet)
    Tasks: 14
   Memory: 40.6M
      CPU: 1min 22.226s
   CGroup: /system.slice/kubelet.service
           └─27092 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra
Aug 16 14:31:33 k8s-node2 kubelet[27092]: E0816 14:31:33.642859 27092 kubelet.go:2188] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is no
Aug 16 14:31:36 k8s-node2 kubelet[27092]: W0816 14:31:36.584036 27092 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Aug 16 14:31:38 k8s-node2 kubelet[27092]: E0816 14:31:38.644359 27092 kubelet.go:2188] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is no
Aug 16 14:31:41 k8s-node2 kubelet[27092]: W0816 14:31:41.584177 27092 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Aug 16 14:31:43 k8s-node2 kubelet[27092]: E0816 14:31:43.135195 27092 remote_runtime.go:116] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:
Aug 16 14:31:43 k8s-node2 kubelet[27092]: E0816 14:31:43.135251 27092 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "kube-proxy-t2d2g_kube-system(3b90986c-f095-4e8b-ada0-03fd420aea3a)" failed: rpc error
Aug 16 14:31:43 k8s-node2 kubelet[27092]: E0816 14:31:43.135267 27092 kuberuntime_manager.go:755] createPodSandbox for pod "kube-proxy-t2d2g_kube-system(3b90986c-f095-4e8b-ada0-03fd420aea3a)" failed: rpc erro
Aug 16 14:31:43 k8s-node2 kubelet[27092]: E0816 14:31:43.135321 27092 pod_workers.go:191] Error syncing pod 3b90986c-f095-4e8b-ada0-03fd420aea3a ("kube-proxy-t2d2g_kube-system(3b90986c-f095-4e8b-ada0-03fd420a
Aug 16 14:31:43 k8s-node2 kubelet[27092]: E0816 14:31:43.645665 27092 kubelet.go:2188] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is no
Aug 16 14:31:46 k8s-node2 kubelet[27092]: W0816 14:31:46.584316 27092 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Running kubectl apply -f kube-flannel.yml on that node then reports:
root@k8s-node2:/data/k8s# kubectl apply -f kube-flannel.yml
W0816 14:31:55.914180 4255 loader.go:223] Config not found: /etc/kubernetes/admin.conf
The connection to the server localhost:8080 was refused - did you specify the right host or port?
There is also this error:
RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:
Fix:
1. Copy admin.conf. The error says admin.conf cannot be found, so copy the file from the master to the worker nodes:
scp /etc/kubernetes/admin.conf root@114.67.107.226:/etc/kubernetes/admin.conf
scp /etc/kubernetes/admin.conf root@114.67.110.126:/etc/kubernetes/admin.conf
Then add the environment variable:
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
2. The "failed pulling image k8s.gcr.io/pause" message means the image pull failed. Pull the images again and re-tag them as described in 2.1.3.3.1 [1] Downloading images.
Then run it again:
kubectl apply -f kube-flannel.yml
5. Run kubectl get pod -n kube-system and make sure every Pod is Running:
$ kubectl get pod -n kube-system
NAME                                 READY   STATUS              RESTARTS   AGE
coredns-74ff55c5b-6rvx4              0/1     ContainerCreating   0          88m
coredns-74ff55c5b-gcfjf              0/1     ContainerCreating   0          88m
etcd-k8s-master                      1/1     Running             0          88m
kube-apiserver-k8s-master            1/1     Running             0          88m
kube-controller-manager-k8s-master   1/1     Running             0          61m
kube-flannel-ds-wq8k4                0/1     CrashLoopBackOff    6          8m23s
kube-proxy-bf669                     1/1     Running             0          88m
kube-scheduler-k8s-master            1/1     Running             0          23m
Problem:
kube-flannel-ds-wq8k4 0/1 CrashLoopBackOff 6 8m23s
[1] Check the Docker log. docker ps -a shows that the kube-flannel container ID is 88bd276d0862; view its log:
# docker logs 88bd276d0862
I0730 03:40:23.510106 1 main.go:520] Determining IP address of default interface
I0730 03:40:23.510398 1 main.go:533] Using interface with name eth0 and address 172.19.23.53
I0730 03:40:23.510412 1 main.go:550] Defaulting external address to interface address (172.19.23.53)
W0730 03:40:23.511160 1 client_config.go:608] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0730 03:40:23.702527 1 kube.go:116] Waiting 10m0s for node controller to sync
I0730 03:40:23.702580 1 kube.go:299] Starting kube subnet manager
I0730 03:40:24.702664 1 kube.go:123] Node controller sync successful
I0730 03:40:24.702708 1 main.go:254] Created subnet manager: Kubernetes Subnet Manager - k8s-master
I0730 03:40:24.702713 1 main.go:257] Installing signal handlers
I0730 03:40:24.702814 1 main.go:392] Found network config - Backend type: vxlan
I0730 03:40:24.702883 1 vxlan.go:123] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0730 03:40:24.704013 1 main.go:293] Error registering network: failed to acquire lease: node "k8s-master" pod cidr not assigned
I0730 03:40:24.704073 1 main.go:372] Stopping shutdownHandler...
The decisive line is "pod cidr not assigned": the node has never been given a Pod CIDR. This happens when the cluster is initialized without a Pod network range (podSubnet in kubeadm.yml, or --pod-network-cidr), so make sure it is set before running kubeadm init, here to 10.244.0.0/16 to match flannel's Network.
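6. The network setup relies on kernel modules, so make sure they are loaded. (The ip_vs* modules below are the ones kube-proxy needs in IPVS mode.)
Load the IPVS kernel modules. They have to be loaded again after every reboot (the commands can be put in /etc/rc.local so they run at boot):
$ modprobe ip_vs
$ modprobe ip_vs_rr
$ modprobe ip_vs_wrr
$ modprobe ip_vs_sh
$ modprobe nf_conntrack_ipv4
Check that they are loaded:
$ lsmod | grep ip_vs
ip_vs_sh               12688  0
ip_vs_wrr              12697  0
ip_vs_rr               12600  0
ip_vs                 145497  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack          133095  9 ip_vs,nf_nat,nf_nat_ipv4,nf_nat_ipv6,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4,nf_conntrack_ipv6
libcrc32c              12644  4 xfs,ip_vs,nf_nat,nf_conntrack
2.1.3.3.3 Deploying the worker nodes
[1] Complete the environment setup as described in 2.1.3.3.1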
To get rid of the warning shown when joining, "[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/", run:
# Set up the daemon.
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
# Restart docker.
systemctl daemon-reload
systemctl restart docker
# Check the result
$ docker info | grep -i cgroup
 Cgroup Driver: systemd
 Cgroup Version: 1
WARNING: No swap limit support
[2] SSH trust between nodes
Goal: set up password-less SSH trust for the login user among the three Linux hosts.
1. Generate an RSA key pair with ssh-keygen on every node:
ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
2. Collect all public keys into a single authorized keys file. On 114.67.107.240, run:
ssh 114.67.107.240 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh 114.67.110.126 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh 114.67.107.226 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
For security, restrict the file to mode 600:
chmod 600 ~/.ssh/authorized_keys
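Because step 2 appends with >>, running it a second time leaves duplicate keys in authorized_keys. A minimal sketch (hypothetical helper name) that removes duplicate and empty lines while keeping the original order, then restores mode 600. ssh-copy-id, which skips keys that are already installed, is the standard alternative for distributing keys.

```shell
#!/usr/bin/env bash
# dedupe_keys FILE  (hypothetical helper)
# Keeps the first occurrence of each non-empty line, preserves order, sets mode 600.
dedupe_keys() {
  local file=$1 tmp rc
  tmp=$(mktemp) || return 1
  awk 'NF && !seen[$0]++' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  chmod 600 "$file" && return $rc
}

# Example (on 114.67.107.240, before distributing the file):
#   dedupe_keys ~/.ssh/authorized_keys
```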
3. Distribute this file, which now holds the keys of all trusted machines, to every host:
scp ~/.ssh/authorized_keys 114.67.110.126:~/.ssh/
scp ~/.ssh/authorized_keys 114.67.107.226:~/.ssh/
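4. Verify the trust: run the following on every node. If each host prints the date without asking for a password, the setup works.
ssh 114.67.107.240 date; ssh 114.67.110.126 date; ssh 114.67.107.226 date
[3] Hosts file resolution
Update /etc/hosts on all servers:
vim /etc/hosts
# Add the following:
114.67.107.240 k8s-master
114.67.110.126 k8s-node1
114.67.107.226 k8s-node2
[4] Deploy the flannel network
Follow section "2.1.3.3.2 Deploying the flannel network".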
[5] 节点加入集群在slave节点上执行以下命令即可加入集群:
kubeadm join 114.67.107.240:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:99ea8c7fbd529eafe466f7b5ac403b896c700746a2d16178c5748927f04680db
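The `sha256:` value passed to `--discovery-token-ca-cert-hash` is a SHA-256 hash of the cluster CA's public key (its DER-encoded Subject Public Key Info). If the original `kubeadm init` output is lost, it can be recomputed on the master. A minimal sketch wrapping the openssl pipeline given in the kubeadm documentation; the function name is hypothetical, and the pipeline assumes an RSA CA key, which is what kubeadm generates by default:

```shell
# Hypothetical helper: print the hex SHA-256 of the CA's public key, i.e. the
# part that follows "sha256:" in --discovery-token-ca-cert-hash.
# Usage: ca_cert_hash [path to CA certificate]
ca_cert_hash() {
  ca="${1:-/etc/kubernetes/pki/ca.crt}"
  # Extract the public key, re-encode it as DER, hash it, keep only the hex digest
  openssl x509 -pubkey -in "$ca" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}

# Usage on the master:
#   echo "sha256:$(ca_cert_hash)"
```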
A successful join looks like this:
# kubeadm join 114.67.107.240:6443 --token abcdef.0123456789abcdef \
> --discovery-token-ca-cert-hash sha256:99ea8c7fbd529eafe466f7b5ac403b896c700746a2d16178c5748927f04680db
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Check on the master:
# kubectl get nodes
NAME         STATUS     ROLES                  AGE    VERSION
k8s-master   Ready      control-plane,master   8h     v1.20.5
k8s-node     NotReady   <none>                 170m   v1.20.5
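A freshly joined node usually stays NotReady until the network plugin (flannel here) is running on it, as in problem <4> below. A minimal polling sketch; the function names, the default of 30 tries and the `INTERVAL` variable are all assumptions, and it relies only on the second column of `kubectl get nodes --no-headers` being STATUS:

```shell
# Hypothetical helpers: list nodes that are not plainly "Ready", and poll until none remain.
not_ready_nodes() {
  # Anything other than exactly "Ready" (NotReady, Ready,SchedulingDisabled, ...) is reported
  awk '$2 != "Ready" { print $1 }'
}

# Usage: wait_nodes_ready [tries]   (sleeps ${INTERVAL:-10} seconds between tries)
wait_nodes_ready() {
  tries="${1:-30}"
  pending=""
  while [ "$tries" -gt 0 ]; do
    # Only trust the answer if kubectl succeeded and returned at least one node
    if out=$(kubectl get nodes --no-headers) && [ -n "$out" ]; then
      pending=$(printf '%s\n' "$out" | not_ready_nodes)
      if [ -z "$pending" ]; then
        return 0
      fi
    fi
    tries=$((tries - 1))
    sleep "${INTERVAL:-10}"
  done
  echo "nodes not Ready: ${pending:-unknown}" >&2
  return 1
}

# Usage on the master: wait_nodes_ready 60 && echo "all nodes Ready"
```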
<1> If the following error appears:
# kubeadm join 172.19.23.53:6443 --token abcdef.0123456789abcdef \
> --discovery-token-ca-cert-hash \
> sha256:7482c9bc27c43e0247c4428c1f6b859716ef335270d1be7ef90352a4694a340d
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.2. Latest validated version: 19.03
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
    [ERROR FileAvailable--etc-kubernetes-bootstrap-kubelet.conf]: /etc/kubernetes/bootstrap-kubelet.conf already exists
    [ERROR Port-10250]: Port 10250 is in use
    [ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
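Cause: some configuration files and services already exist on the node (kubelet.conf, bootstrap-kubelet.conf, the CA certificate, and a kubelet already listening on port 10250).
Solution:
# Reset kubeadm
kubeadm reset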
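<2> By default a token is valid for 24 hours; once it expires it can no longer be used. In that case, generate a new token on the master:
kubeadm token generate
kubeadm token create <generated-token> --print-join-command --ttl=0
Setting --ttl=0 means the token never expires.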
<3> Problem: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
# kubeadm join 114.67.107.240:6443 --token abcdef.0123456789abcdef \
> --discovery-token-ca-cert-hash sha256:a4ba1051211c8cd99111d804fe41b1d3b564a8545b06e126145b506fc9a25b4f
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Solution:
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/ipv4/ip_forward
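The `echo 1 > /proc/...` writes above only last until the next reboot. To make them persistent, the kubeadm installation docs put `br_netfilter` in a modules-load file and the bridge settings in a file under `/etc/sysctl.d/`. A minimal sketch along those lines, also persisting `ip_forward` from the commands above; the function name, the file name `k8s.conf`, and the optional directory-prefix argument (there only so the function can be tried against a scratch directory) are assumptions:

```shell
# Hypothetical helper: persist br_netfilter and the bridge/forwarding sysctls.
# $1 is an optional directory prefix; leave it empty to write the real /etc.
persist_k8s_net_settings() {
  root="${1:-}"
  mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d"
  # Loaded at boot by systemd-modules-load; the bridge-nf-call-* keys only exist once it is loaded
  echo br_netfilter > "$root/etc/modules-load.d/k8s.conf"
  cat > "$root/etc/sysctl.d/k8s.conf" <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
}

# Usage (as root):
#   persist_k8s_net_settings
#   modprobe br_netfilter && sysctl --system   # apply now without rebooting
```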
<4> Problem: Unable to update cni config: no networks found in /etc/cni/net.d
Checking the kubelet status on the node shows the following:
root@k8s-node2:/data/k8s# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2021-08-16 12:46:35 CST; 1h 39min ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 27092 (kubelet)
    Tasks: 14
   Memory: 41.5M
      CPU: 1min 18.405s
   CGroup: /system.slice/kubelet.service
           └─27092 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra
Aug 16 14:26:21 k8s-node2 kubelet[27092]: W0816 14:26:21.567855 27092 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Aug 16 14:26:22 k8s-node2 kubelet[27092]: E0816 14:26:22.140015 27092 remote_runtime.go:116] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:
Aug 16 14:26:22 k8s-node2 kubelet[27092]: E0816 14:26:22.140073 27092 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "kube-proxy-t2d2g_kube-system(3b90986c-f095-4e8b-ada0-03fd420aea3a)" failed: rpc error
Aug 16 14:26:22 k8s-node2 kubelet[27092]: E0816 14:26:22.140089 27092 kuberuntime_manager.go:755] createPodSandbox for pod "kube-proxy-t2d2g_kube-system(3b90986c-f095-4e8b-ada0-03fd420aea3a)" failed: rpc erro
Aug 16 14:26:22 k8s-node2 kubelet[27092]: E0816 14:26:22.140141 27092 pod_workers.go:191] Error syncing pod 3b90986c-f095-4e8b-ada0-03fd420aea3a ("kube-proxy-t2d2g_kube-system(3b90986c-f095-4e8b-ada0-03fd420a
Aug 16 14:26:23 k8s-node2 kubelet[27092]: E0816 14:26:23.548016 27092 kubelet.go:2188] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is no
Aug 16 14:26:26 k8s-node2 kubelet[27092]: W0816 14:26:26.568025 27092 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Aug 16 14:26:28 k8s-node2 kubelet[27092]: E0816 14:26:28.548935 27092 kubelet.go:2188] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is no
Aug 16 14:26:31 k8s-node2 kubelet[27092]: W0816 14:26:31.568274 27092 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Aug 16 14:26:33 k8s-node2 kubelet[27092]: E0816 14:26:33.550208 27092 kubelet.go:2188] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is no
Running docker ps showed that flannel had not been set up on this node. Reconfiguring the flannel network as described in 2.1.3.3.2 fixes the problem.
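The kubelet error in problem <4> goes away once a CNI configuration file appears in `/etc/cni/net.d`; with the kube-flannel DaemonSet that file is copied in by the pod's init container (named `10-flannel.conflist` in recent manifests). A minimal sketch of the check, assuming the kubelet picks up files ending in `.conf`, `.conflist` or `.json`, which are the extensions the CNI library scans for; the function name is hypothetical:

```shell
# Hypothetical helper: print a CNI config file found in the directory and
# succeed, or fail if there is none (the state behind problem <4>).
# Usage: has_cni_config [directory]
has_cni_config() {
  dir="${1:-/etc/cni/net.d}"
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    # An unmatched glob stays literal, so check that the file really exists
    if [ -f "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

# Usage on a node: has_cni_config || echo "no CNI config yet; is the flannel pod running here?"
```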
2.1.3.3.4 Configuring node labels
All three nodes are now Ready, but the ROLES of the newly joined nodes is <none>.
# kubectl get nodes -o wide
NAME         STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
k8s-master   Ready    control-plane,master   8h    v1.20.5   192.168.0.3   <none>        Ubuntu 16.04.5 LTS   4.4.0-62-generic   docker://18.9.7
k8s-node1    Ready    <none>                 10m   v1.20.5   192.168.0.3   <none>        Ubuntu 16.04.5 LTS   4.4.0-62-generic   docker://18.9.7
k8s-node2    Ready    <none>                 8h    v1.20.5   192.168.0.5   <none>        Ubuntu 16.04.5 LTS   4.4.0-62-generic   docker://18.9.7
Give node1 and node2 the worker role:
# kubectl label nodes k8s-node1 node-role.kubernetes.io/worker=
# kubectl label nodes k8s-node2 node-role.kubernetes.io/worker=
Check again; the roles now appear:
# kubectl get nodes
NAME         STATUS   ROLES                  AGE   VERSION
k8s-master   Ready    control-plane,master   9h    v1.20.5
k8s-node1    Ready    worker                 45m   v1.20.5
k8s-node2    Ready    worker                 9h    v1.20.5
2.1.3.3.5 Installing Kuboard [unsuccessful]
If you already have a Kubernetes cluster, Kuboard can be installed with a single command:
kubectl apply -f https://kuboard.cn/install-script/kuboard.yaml
Then open port 32567 on any node of the cluster (http://any-of-your-node-ip:32567, e.g. http://47.117.67.43:32567) to reach the Kuboard UI.
$ kubectl apply -f https://kuboard.cn/install-script/kuboard.yaml
deployment.apps/kuboard created
service/kuboard created
serviceaccount/kuboard-user created
clusterrolebinding.rbac.authorization.k8s.io/kuboard-user created
serviceaccount/kuboard-viewer created
clusterrolebinding.rbac.authorization.k8s.io/kuboard-viewer created
Get the login token with the following commands:
kubectl create clusterrolebinding serviceaccounts-cluster-admin --clusterrole=cluster-admin --group=system:serviceaccounts
kubectl create serviceaccount dashboard -n default
kubectl create clusterrolebinding dashboard-admin -n default --clusterrole=cluster-admin --serviceaccount=default:dashboard
kubectl get secret $(kubectl get serviceaccount dashboard -o jsonpath="{.secrets[0].name}") -o jsonpath="{.data.token}" | base64 --decode
2.1.3.3.6 Tearing down the cluster
To undo what kubeadm did, first drain the node and make sure it is empty, then shut it down.
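On the master node, run:
kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
kubectl delete node <node name>
(Since kubectl v1.20, --delete-local-data is deprecated in favor of --delete-emptydir-data; the old flag still works but prints a deprecation warning.)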
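The two steps above can be wrapped so that the node object is deleted only when the drain succeeded. A minimal sketch; the function name is hypothetical and the flags are copied from the commands above:

```shell
# Hypothetical helper: drain a node, then delete it only if the drain succeeded.
# Usage: remove_node <node name>
remove_node() {
  node="$1"
  if [ -z "$node" ]; then
    echo "usage: remove_node <node name>" >&2
    return 2
  fi
  kubectl drain "$node" --delete-local-data --force --ignore-daemonsets || return 1
  kubectl delete node "$node"
}

# Usage on the master: remove_node k8s-node2
# then run "sudo kubeadm reset" on k8s-node2 itself.
```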
Then, on the node being removed, reset kubeadm's installation state:
sudo kubeadm reset
To reconfigure the cluster, simply run kubeadm init or kubeadm join again with the new parameters.
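2.1.4 References
(1) Deploying a Kubernetes v1.16.3 cluster with kubeadm: https://blog.csdn.net/networken/article/details/84991940
(2) Installing kubernetes-1.12.0 with kubeadm on Ubuntu-18.04: https://blog.csdn.net/liukuan73/article/details/83116271
(3) Installing Kubernetes 1.20.5 on Ubuntu 18: https://blog.csdn.net/ma_jiang/article/details/115431983
(4) Building a Kubernetes cluster with kubeadm: https://cloud.tencent.com/developer/article/1618126
(5) 《Kubernetes权威指南:从Docker到Kubernetes实践全接触(第4版)》 (The Definitive Guide to Kubernetes: From Docker to Kubernetes in Practice, 4th edition), Chapter 2, "Kubernetes Installation and Configuration Guide"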