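24 KubeSphere, a Chinese-Built Container Management Platform: Hands-On Troubleshooting

Overview: I have recently been using QingCloud's KubeSphere. The user experience is excellent: it supports private deployment, has no dependency on specific infrastructure or on an existing Kubernetes installation, can be deployed across physical machines, virtual machines, and cloud platforms, and can manage Kubernetes clusters of different versions and from different vendors. It wraps Kubernetes with role-based access control, DevOps pipelines for quickly setting up CI/CD, and built-in common tools such as harbor/gitlab/jenkins/sonarqube. Full application lifecycle management (development, testing, release, upgrade, and removal) is provided through OpenPitrix. In my own experience it works very well. As an open-source project it inevitably has some bugs; below are the troubleshooting approaches for the problems I ran into. Many thanks to the QingCloud community for their technical help. Anyone interested in k8s may want to try this Chinese-built platform for its smooth experience, and Rancher users can try it for comparison.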

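24.1 Cleaning up exited containers

After a cluster has been running for a while, some containers end up in the Exited state after abnormal termination. They should be cleaned up promptly to free disk space; the command can be run as a scheduled task.

docker rm $(docker ps -a | grep Exited | awk '{print $1}')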
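
The one-liner above fails with a usage error when no container is in the Exited state, because docker rm then receives no arguments. A minimal sketch of a variant that is safe to schedule, assuming the docker CLI is on the PATH; the function name remove_exited_containers and the crontab path in the comment are illustrative, not part of the original setup:

```shell
# Remove all containers in the "exited" state, doing nothing when there are none.
remove_exited_containers() {
  # -a: include stopped containers, -q: print IDs only,
  # -f status=exited: keep only containers whose state is "exited"
  ids=$(docker ps -aq -f status=exited)
  if [ -z "${ids}" ]; then
    echo "no exited containers"
    return 0
  fi
  # Word splitting is intended here: each ID becomes one argument.
  docker rm ${ids}
}

# Hypothetical crontab entry (the script path is an assumption):
# 0 3 * * * /usr/local/bin/clean_exited.sh
```

Calling remove_exited_containers on the last line of the script makes the file usable from cron.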

24.2 Cleaning up abnormal or evicted pods

  • Clean up pods in the kubesphere-devops-system namespace

    kubectl delete pods -n kubesphere-devops-system $(kubectl get pods -n kubesphere-devops-system | grep Evicted | awk '{print $1}')
    kubectl delete pods -n kubesphere-devops-system $(kubectl get pods -n kubesphere-devops-system | grep CrashLoopBackOff | awk '{print $1}')

  • A helper script that cleans up evicted/CrashLoopBackOff pods in a given namespace, or exited containers

   #!/bin/bash
   clear_evicted_pod() {
     ns=$1
     kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} | grep Evicted |awk '{print $1}')
   }
   clear_crash_pod() {
     ns=$1
     kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} | grep CrashLoopBackOff |awk '{print $1}')
   }
   clear_exited_container() {
     docker rm `docker ps -a | grep Exited |awk '{print $1}'`
   }


   echo "1.clear evicted pod"
   echo "2.clear crash pod"
   echo "3.clear exited container"
   read -p "Please input num:" num


   case ${num} in 
   "1")
     read -p "Please input oper namespace:" ns
     clear_evicted_pod ${ns}
     ;;


   "2")
     read -p "Please input oper namespace:" ns
     clear_crash_pod ${ns}
     ;;
   "3")
     clear_exited_container
     ;;
   *)
     echo "input error"
     ;;
   esac
  • Clean up evicted/CrashLoopBackOff pods in all namespaces

   # List all namespaces
   kubectl get ns | grep -v "NAME" |awk '{print $1}'

   # Clean up evicted pods
   for ns in `kubectl get ns | grep -v "NAME" | awk '{print $1}'`;do kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} | grep "Evicted" |awk '{print $1}');done
   # Clean up pods in CrashLoopBackOff
   for ns in `kubectl get ns | grep -v "NAME" | awk '{print $1}'`;do kubectl delete pods -n ${ns} $(kubectl get pods -n ${ns} | grep "CrashLoopBackOff" |awk '{print $1}');done
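
The loops above call kubectl delete even for namespaces that have no matching pod, which prints an error for each of them. A sketch of a single-pass alternative, assuming a kubectl that supports --all-namespaces and --no-headers and the usual column order (NAMESPACE, NAME, READY, STATUS, ...); delete_pods_by_status is a hypothetical helper name:

```shell
# Delete every pod whose STATUS column equals the given value, in all namespaces.
delete_pods_by_status() {
  status=$1
  # With --all-namespaces the columns are: NAMESPACE NAME READY STATUS ...
  kubectl get pods --all-namespaces --no-headers |
    awk -v s="${status}" '$4 == s {print $1, $2}' |
    while read -r ns pod; do
      # stdin is redirected so the delete call cannot consume the loop input
      kubectl delete pod -n "${ns}" "${pod}" < /dev/null
    done
}
```

For example, delete_pods_by_status Evicted or delete_pods_by_status CrashLoopBackOff. Nothing is deleted when no pod matches.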

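24.3 Migrating Docker data

The Docker data directory was not specified at install time, and the system disk is only 50G. Over time the disk ran out of space, so the Docker data had to be migrated; the approach used here is a symlink. First mount the new disk at /data, then run: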

systemctl stop docker

mkdir -p /data/docker/

rsync -avz /var/lib/docker/ /data/docker/

mv /var/lib/docker /data/docker_bak

ln -s /data/docker /var/lib/

systemctl daemon-reload

systemctl start docker
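
Before deleting /data/docker_bak it is worth confirming that /var/lib/docker really points at the new disk. A small check, assuming readlink is available; verify_symlink is an illustrative name, not an existing tool:

```shell
# Check that a path is a symlink pointing at the expected target.
verify_symlink() {
  link=$1
  expected=$2
  if [ -L "${link}" ] && [ "$(readlink "${link}")" = "${expected}" ]; then
    echo "ok: ${link} -> ${expected}"
  else
    echo "mismatch: ${link} is not a symlink to ${expected}"
    return 1
  fi
}
```

For the migration above the call would be verify_symlink /var/lib/docker /data/docker.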

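24.4 KubeSphere network troubleshooting

  • Problem description:

On a KubeSphere node or master, a container started manually (outside Kubernetes) cannot reach the public network. Was something in my configuration wrong? The cluster originally used the default Calico, and switching to Flannel did not help. Containers in the pods of a Deployment created through KubeSphere can reach the public network, but containers started by hand on a node or master cannot.

The manually started container's network goes through docker0: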

   root@fd1b8101475d:/# ip a
   1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
       inet 127.0.0.1/8 scope host lo
          valid_lft forever preferred_lft forever
   2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1
       link/ipip 0.0.0.0 brd 0.0.0.0
   105: eth0@if106: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
       link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
       inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
          valid_lft forever preferred_lft forever

Containers in pods use kube-ipvs0:

   1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
       inet 127.0.0.1/8 scope host lo
          valid_lft forever preferred_lft forever
   2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1
       link/ipip 0.0.0.0 brd 0.0.0.0
   4: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
       link/ether c2:27:44:13:df:5d brd ff:ff:ff:ff:ff:ff
       inet 10.233.97.175/32 scope global eth0
          valid_lft forever preferred_lft forever
  • Solution:

Check the Docker startup configuration:

[image: _images/1.jpeg]

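In /etc/systemd/system/docker.service.d/docker-options.conf, remove the --iptables=false flag. When this flag is false, Docker does not write any iptables rules, so containers on docker0 get no NAT (masquerade) rule for outbound traffic. The file after the change:

[Service]
Environment="DOCKER_OPTS= --registry-mirror=https://registry.docker-cn.com --data-root=/var/lib/docker --log-opt max-size=10m --log-opt max-file=3 --insecure-registry=harbor.devops.kubesphere.local:30280"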
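
The edit can also be scripted. A sketch, assuming the flag appears in the drop-in file as the literal string --iptables=false; strip_iptables_flag is a hypothetical helper and keeps a .bak copy of the original file:

```shell
# Remove the --iptables=false flag from a Docker systemd drop-in file.
strip_iptables_flag() {
  conf=$1
  # Keep the original next to the edited file.
  cp "${conf}" "${conf}.bak"
  # Delete the flag together with the spaces in front of it.
  sed 's/ *--iptables=false//' "${conf}.bak" > "${conf}"
}
```

After running strip_iptables_flag /etc/systemd/system/docker.service.d/docker-options.conf, apply the change with systemctl daemon-reload followed by systemctl restart docker.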

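24.5 KubeSphere application route (Ingress) anomaly

Application routes (Ingress) in KubeSphere are served by nginx. Configuring them through the web UI results in two hosts sharing the same CA certificate; this can be worked around by configuring the Ingress through its YAML file with annotations.

⚠️ Note: the Deployment of the ingress controller is located in:

[image: _images/2.jpeg]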

   kind: Ingress
   apiVersion: extensions/v1beta1
   metadata:
     name: prod-app-ingress
     namespace: prod-net-route
     resourceVersion: '8631859'
     labels:
       app: prod-app-ingress
     annotations:
       desc: 生产环境应用路由
       nginx.ingress.kubernetes.io/client-body-buffer-size: 1024m
       nginx.ingress.kubernetes.io/proxy-body-size: 2048m
       nginx.ingress.kubernetes.io/proxy-read-timeout: '3600'
       nginx.ingress.kubernetes.io/proxy-send-timeout: '1800'
       nginx.ingress.kubernetes.io/service-upstream: 'true'
   spec:
     tls:
       - hosts:
           - smartms.tools.anchnet.com
         secretName: smartms-ca
       - hosts:
           - smartsds.tools.anchnet.com
         secretName: smartsds-ca
     rules:
       - host: smartms.tools.anchnet.com
         http:
           paths:
             - path: /
               backend:
                 serviceName: smartms-frontend-svc
                 servicePort: 80
       - host: smartsds.tools.anchnet.com
         http:
           paths:
             - path: /
               backend:
                 serviceName: smartsds-frontend-svc
                 servicePort: 80
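
To confirm that each host now serves its own certificate, the subject of the certificate presented for a given SNI name can be printed with openssl. A sketch, assuming the ingress is reachable over TLS on port 443 from where the command runs; cert_subject is an illustrative helper name:

```shell
# Print the subject of the certificate a server presents for a given host name.
cert_subject() {
  host=$1
  port=${2:-443}
  # -servername sets the SNI name, so nginx selects the certificate for this host.
  # stdin is /dev/null so s_client exits right after the handshake.
  openssl s_client -connect "${host}:${port}" -servername "${host}" < /dev/null 2> /dev/null |
    openssl x509 -noout -subject
}
```

Running cert_subject smartms.tools.anchnet.com and cert_subject smartsds.tools.anchnet.com should print different subjects if the two secrets hold different certificates.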

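24.6 Jenkins agents

In their own scenarios, users may need different language versions or different tool versions. This section describes how to replace the built-in agent.

The default base-build image does not include the sonar-scanner tool. Every agent of KubeSphere Jenkins is a Pod, so replacing a built-in agent means replacing that agent's image.

Build the agent image on top of the latest kubesphere/builder-base:advanced-1.0.0.

Update to the specified custom image: ccr.ccs.tencentyun.com/testns/base:v1

Reference: https://kubesphere.io/docs/advanced-v2.0/zh-CN/devops/devops-admin-faq/#%E5%8D%87%E7%BA%A7-jenkins-agent-%E7%9A%84%E5%8C%85%E7%89%88%E6%9C%AC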

[image: _images/3.jpeg]

[image: _images/4.jpeg]

After modifying jenkins-casc-config in KubeSphere, you need to reload the updated system configuration on the configuration-as-code page under Manage Jenkins in the Jenkins Dashboard.

Reference:

https://kubesphere.io/docs/advanced-v2.0/zh-CN/devops/jenkins-setting/#%E7%99%BB%E9%99%86-jenkins-%E9%87%8D%E6%96%B0%E5%8A%A0%E8%BD%BD

[image: _images/5.jpeg]

Update the base image in Jenkins:

[image: _images/6.jpeg]

⚠️ First modify the Jenkins configuration in KubeSphere: jenkins-casc-config <http://xxxxxxxxx:30800/system-workspace/projects/kubesphere-devops-system/configmaps/jenkins-casc-config>

24.7 Sending mail in DevOps

Reference: https://www.cloudbees.com/blog/mail-step-jenkins-workflow

Built-in variables:

+--------------------+--------------------------------------------------------------+
| Variable           | Description                                                  |
+====================+==============================================================+
| BUILD_NUMBER       | The current build number, such as "153"                      |
+--------------------+--------------------------------------------------------------+
| BUILD_ID           | The current build ID, identical to BUILD_NUMBER for builds   |
|                    | created in 1.597+, but a YYYY-MM-DD_hh-mm-ss timestamp for   |
|                    | older builds                                                 |
+--------------------+--------------------------------------------------------------+
| BUILD_DISPLAY_NAME | The display name of the current build, which is something    |
|                    | like "#153" by default.                                      |
+--------------------+--------------------------------------------------------------+
| JOB_NAME           | Name of the project of this build, such as "foo" or          |
|                    | "foo/bar". (To strip off folder paths from a Bourne shell    |
|                    | script, try: ${JOB_NAME##*/})                                |
+--------------------+--------------------------------------------------------------+
| BUILD_TAG          | String of "jenkins-${JOB_NAME}-${BUILD_NUMBER}". Convenient  |
|                    | to put into a resource file, a jar file, etc for easier      |
|                    | identification.                                              |
+--------------------+--------------------------------------------------------------+
| EXECUTOR_NUMBER    | The unique number that identifies the current executor       |
|                    | (among executors of the same machine) that's carrying out    |
|                    | this build. This is the number you see in the "build         |
|                    | executor status", except that the number starts from 0, not  |
|                    | 1.                                                           |
+--------------------+--------------------------------------------------------------+
| NODE_NAME          | Name of the slave if the build is on a slave, or "master"    |
|                    | if run on master                                             |
+--------------------+--------------------------------------------------------------+
| NODE_LABELS        | Whitespace-separated list of labels that the node is         |
|                    | assigned.                                                    |
+--------------------+--------------------------------------------------------------+
| WORKSPACE          | The absolute path of the directory assigned to the build as  |
|                    | a workspace.                                                 |
+--------------------+--------------------------------------------------------------+
| JENKINS_HOME       | The absolute path of the directory assigned on the master    |
|                    | node for Jenkins to store data.                              |
+--------------------+--------------------------------------------------------------+
| JENKINS_URL        | Full URL of Jenkins, like http://server:port/jenkins/        |
|                    | (note: only available if Jenkins URL set in system           |
|                    | configuration)                                               |
+--------------------+--------------------------------------------------------------+
| BUILD_URL          | Full URL of this build, like                                 |
|                    | http://server:port/jenkins/job/foo/15/ (Jenkins URL must be  |
|                    | set)                                                         |
+--------------------+--------------------------------------------------------------+
| SVN_REVISION       | Subversion revision number that's currently checked out to   |
|                    | the workspace, such as "12345"                               |
+--------------------+--------------------------------------------------------------+
| SVN_URL            | Subversion URL that's currently checked out to the           |
|                    | workspace.                                                   |
+--------------------+--------------------------------------------------------------+
| JOB_URL            | Full URL of this job, like                                   |
|                    | http://server:port/jenkins/job/foo/ (Jenkins URL must be     |
|                    | set)                                                         |
+--------------------+--------------------------------------------------------------+

In the end I wrote a template that fits my own workflow; it can be used as-is:

   mail to: 'xuel@net.com',
        charset: 'UTF-8', // or GBK/GB18030
        mimeType: 'text/plain', // or text/html
        subject: "Kubesphere ${env.JOB_NAME} [${env.BUILD_NUMBER}] release OK, Running Pipeline: ${currentBuild.fullDisplayName}",
        body: """
         ---------Anchnet Devops Kubesphere Pipeline job--------------------

         Project name  : ${env.JOB_NAME}
         Build number  : ${env.BUILD_NUMBER}
         Scan info     : URL: ${SONAR_HOST}
         Image         : ${REGISTRY}/${QHUB_NAMESPACE}/${APP_NAME}:${IMAGE_TAG}
         Build details : SUCCESSFUL: Job ${env.JOB_NAME} [${env.BUILD_NUMBER}]
         Build status  : ${env.JOB_NAME} jenkins release is running normally
         Build URL     : ${env.BUILD_URL}"""

[image: _images/7.jpeg]

[image: _images/8.jpeg]