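As we know, with the current dockerd architecture, running a container involves three processes: containerd, containerd-shim, and the container process (the container's main process). How do these three processes depend on each other? This analysis covers that question.
Note that the commands in the different shell sessions below were not run as one continuous sequence, so PIDs may differ between listings.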
Overall relationship
First, let's look at how containerd, containerd-shim, and the container process relate to each other:
```
root 2156 1733 0 13:17 pts/0 00:00:00 ./bin/containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim /home/fankang/docker/containerd-0.2.4/src/github.com/docker/containerd/bin/containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime docker-runc
root 2198 2156 0 13:45 pts/0 00:00:00 /home/fankang/docker/containerd-0.2.4/src/github.com/docker/containerd/bin/containerd-shim nginx /home/fankang/mycontainer runc
root 2214 2198 0 13:45 ? 00:00:00 /usr/bin/python /usr/bin/supervisord
```
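As shown, containerd is the parent of containerd-shim, and containerd-shim is the parent of the container process.
If containerd is killed, however, containerd-shim and the container process keep running. The orphaned containerd-shim is simply adopted by PID 1: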
```
root 2301 1 0 13:50 pts/0 00:00:00 /home/fankang/docker/containerd-0.2.4/src/github.com/docker/containerd/bin/containerd-shim nginx /home/fankang/mycontainer runc
root 2317 2301 1 13:50 ? 00:00:00 /usr/bin/python /usr/bin/supervisord
```
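So, to keep the relationships between the three processes manageable, we analyze the following four cases:
- with containerd running, kill containerd-shim;
- with containerd running, kill the container process;
- with containerd not running, kill containerd-shim, then start containerd;
- with containerd not running, kill the container process, then start containerd.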
Case 1
Case 1: killing containerd-shim while containerd is running
With containerd running, containerd-shim and the container process look like this:
```
root 2414 2383 0 14:02 pts/0 00:00:00 /home/fankang/docker/containerd-0.2.4/src/github.com/docker/containerd/bin/containerd-shim nginx /home/fankang/mycontainer runc
root 2429 2414 1 14:02 ? 00:00:00 /usr/bin/python /usr/bin/supervisord
```
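Now kill the containerd-shim process with kill -9 2414.
The result: the container process exits. In other words, if containerd-shim is killed while containerd is running, the container process exits as well.
Now let's see why the container process exits.
As analyzed earlier, creating a container calls the container's Start() method, defined in containerd/runtime/container.go: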
```go
func (c *container) Start(checkpointPath string, s Stdio) (Process, error) {
	processRoot := filepath.Join(c.root, c.id, InitProcessID)
	if err := os.Mkdir(processRoot, 0755); err != nil {
		return nil, err
	}
	cmd := exec.Command(c.shim,
		c.id, c.bundle, c.runtime,
	)
	cmd.Dir = processRoot
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Setpgid: true,
	}
	spec, err := c.readSpec()
	if err != nil {
		return nil, err
	}
	config := &processConfig{
		checkpoint:  checkpointPath,
		root:        processRoot,
		id:          InitProcessID,
		c:           c,
		stdio:       s,
		spec:        spec,
		processSpec: specs.ProcessSpec(spec.Process),
	}
	p, err := newProcess(config)
	if err != nil {
		return nil, err
	}
	if err := c.createCmd(InitProcessID, cmd, p); err != nil {
		return nil, err
	}
	return p, nil
}
```
Start() in turn calls createCmd() to run the command:
```go
func (c *container) createCmd(pid string, cmd *exec.Cmd, p *process) error {
	p.cmd = cmd
	if err := cmd.Start(); err != nil {
		close(p.cmdDoneCh)
		if exErr, ok := err.(*exec.Error); ok {
			if exErr.Err == exec.ErrNotFound || exErr.Err == os.ErrNotExist {
				return fmt.Errorf("%s not installed on system", c.shim)
			}
		}
		return err
	}
	defer func() {
		go func() {
			err := p.cmd.Wait()
			if err == nil {
				p.cmdSuccess = true
			}
			if same, err := p.isSameProcess(); same && p.pid > 0 {
				logrus.Infof("containerd: %s:%s (pid %v) has become an orphan, killing it", p.container.id, p.id, p.pid)
				err = unix.Kill(p.pid, syscall.SIGKILL)
				if err != nil && err != syscall.ESRCH {
					logrus.Errorf("containerd: unable to SIGKILL %s:%s (pid %v): %v", p.container.id, p.id, p.pid, err)
				} else {
					for {
						err = unix.Kill(p.pid, 0)
						if err != nil {
							break
						}
						time.Sleep(5 * time.Millisecond)
					}
				}
			}
			close(p.cmdDoneCh)
		}()
	}()
	if err := c.waitForCreate(p, cmd); err != nil {
		return err
	}
	c.processes[pid] = p
	return nil
}
```
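After starting the shim, createCmd() uses a defer to launch a goroutine that blocks in p.cmd.Wait() until containerd-shim exits. Once the shim exits abnormally (for example, because it was SIGKILLed), Wait() returns. If p.isSameProcess() reports that the container process is still the same live process, the goroutine kills it with unix.Kill(p.pid, syscall.SIGKILL) and then polls it with signal 0 until it is gone.
So, when containerd is running and containerd-shim is killed by hand, the container process is killed by the goroutine that containerd left behind when it created the container.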
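Case 2
Case 2: killing the container process while containerd is running
On one hand, when the container process exits, containerd-shim catches the resulting signal and exits as well; case 4 examines this in more detail.
On the other hand, containerd's monitor detects the container process's exit and triggers the exit flow. That flow is what this section examines.
As analyzed earlier, the monitor puts the exit event into its exits channel, in containerd/supervisor/monitor_linux.go: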
```go
func (m *Monitor) start() {
	var events [128]syscall.EpollEvent
	for {
		n, err := archutils.EpollWait(m.epollFd, events[:], -1)
		if err != nil {
			if err == syscall.EINTR {
				continue
			}
			logrus.WithField("error", err).Fatal("containerd: epoll wait")
		}
		for i := 0; i < n; i++ {
			fd := int(events[i].Fd)
			m.m.Lock()
			r := m.receivers[fd]
			switch t := r.(type) {
			case runtime.Process:
				if events[i].Events == syscall.EPOLLHUP {
					delete(m.receivers, fd)
					if err = syscall.EpollCtl(m.epollFd, syscall.EPOLL_CTL_DEL, fd, &syscall.EpollEvent{
						Events: syscall.EPOLLHUP,
						Fd:     int32(fd),
					}); err != nil {
						logrus.WithField("error", err).Error("containerd: epoll remove fd")
					}
					if err := t.Close(); err != nil {
						logrus.WithField("error", err).Error("containerd: close process IO")
					}
					EpollFdCounter.Dec(1)
					m.exits <- t
				}
			case runtime.OOM:
				t.Flush()
				if t.Removed() {
					delete(m.receivers, fd)
					t.Close()
					EpollFdCounter.Dec(1)
				} else {
					m.ooms <- t.ContainerID()
				}
			}
			m.m.Unlock()
		}
	}
}
```
When containerd's supervisor starts, it launches exitHandler(), in containerd/supervisor/supervisor.go:
```go
func New(stateDir string, runtimeName, shimName string, runtimeArgs []string, timeout time.Duration, retainCount int) (*Supervisor, error) {
	startTasks := make(chan *startTask, 10)
	if err := os.MkdirAll(stateDir, 0755); err != nil {
		return nil, err
	}
	machine, err := CollectMachineInformation()
	if err != nil {
		return nil, err
	}
	monitor, err := NewMonitor()
	if err != nil {
		return nil, err
	}
	s := &Supervisor{
		stateDir:    stateDir,
		containers:  make(map[string]*containerInfo),
		startTasks:  startTasks,
		machine:     machine,
		subscribers: make(map[chan Event]struct{}),
		tasks:       make(chan Task, defaultBufferSize),
		monitor:     monitor,
		runtime:     runtimeName,
		runtimeArgs: runtimeArgs,
		shim:        shimName,
		timeout:     timeout,
	}
	if err := setupEventLog(s, retainCount); err != nil {
		return nil, err
	}
	go s.exitHandler()
	go s.oomHandler()
	if err := s.restore(); err != nil {
		return nil, err
	}
	return s, nil
}

func (s *Supervisor) exitHandler() {
	for p := range s.monitor.Exits() {
		e := &ExitTask{
			Process: p,
		}
		s.SendTask(e)
	}
}
```
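exitHandler() consumes events from the monitor's exits channel, wraps each one in an ExitTask, and sends it to the supervisor's tasks channel for further processing.
So the container process exiting makes containerd run its exit handling for the container. The exit handling in turn triggers the delete handling; we won't go into those details here.
So, when the container process is killed while containerd is running, containerd-shim exits on its own, and containerd handles the exit event to clean up the container.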
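The monitor-to-supervisor hand-off described above is a plain channel pipeline. Below is a reduced sketch. Process, Task, and ExitTask are simplified, hypothetical stand-ins for containerd's types (only an ID is kept). The exits and tasks channels are also passed in explicitly rather than read from Supervisor fields.

```go
package main

import "fmt"

// Process, Task and ExitTask are reduced, hypothetical stand-ins for
// containerd's runtime.Process and supervisor task types.
type Process struct{ ID string }

type Task interface{ isTask() }

type ExitTask struct{ Process Process }

func (*ExitTask) isTask() {}

// exitHandler mirrors Supervisor.exitHandler(): drain the monitor's exits
// channel, wrap each process in an ExitTask and forward it to tasks.
// It returns once exits is closed.
func exitHandler(exits <-chan Process, tasks chan<- Task) {
	for p := range exits {
		tasks <- &ExitTask{Process: p}
	}
}

func main() {
	exits := make(chan Process)
	tasks := make(chan Task, 1)
	done := make(chan struct{})
	go func() {
		exitHandler(exits, tasks)
		close(done)
	}()

	exits <- Process{ID: "init"} // what Monitor.start() does on EPOLLHUP
	close(exits)
	<-done

	t := (<-tasks).(*ExitTask)
	fmt.Println(t.Process.ID) // init
}
```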
Case 3
Case 3: killing containerd-shim while containerd is not running, then starting containerd
The container is running and containerd has been stopped. The processes look like this:
```
root 2522 1 0 15:33 pts/0 00:00:00 /home/fankang/docker/containerd-0.2.4/src/github.com/docker/containerd/bin/containerd-shim nginx /home/fankang/mycontainer runc
root 2537 2522 0 15:33 ? 00:00:00 /usr/bin/python /usr/bin/supervisord
```
Now run kill -9 2522 to kill the shim. The container process is still there: it has become an orphan and been adopted by PID 1.
```
root 2537 1 0 15:33 ? 00:00:00 /usr/bin/python /usr/bin/supervisord
root 2571 2537 0 15:33 ? 00:00:00 /usr/sbin/sshd -D
```
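Start containerd, and the container process disappears.
So, at startup, containerd cleans up leftover container processes whose containerd-shim no longer exists.
What does this cleanup look like? At startup the supervisor calls its restore() method, defined in containerd/supervisor/supervisor.go: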
```go
func (s *Supervisor) restore() error {
	dirs, err := ioutil.ReadDir(s.stateDir)
	if err != nil {
		return err
	}
	for _, d := range dirs {
		if !d.IsDir() {
			continue
		}
		id := d.Name()
		container, err := runtime.Load(s.stateDir, id, s.shim, s.timeout)
		if err != nil {
			return err
		}
		processes, err := container.Processes()
		if err != nil {
			return err
		}
		ContainersCounter.Inc(1)
		s.containers[id] = &containerInfo{
			container: container,
		}
		if err := s.monitor.MonitorOOM(container); err != nil && err != runtime.ErrContainerExited {
			logrus.WithField("error", err).Error("containerd: notify OOM events")
		}
		logrus.WithField("id", id).Debug("containerd: container restored")
		var exitedProcesses []runtime.Process
		for _, p := range processes {
			if p.State() == runtime.Running {
				if err := s.monitorProcess(p); err != nil {
					return err
				}
			} else {
				exitedProcesses = append(exitedProcesses, p)
			}
		}
		if len(exitedProcesses) > 0 {
			sortProcesses(exitedProcesses)
			for _, p := range exitedProcesses {
				e := &ExitTask{
					Process: p,
				}
				s.SendTask(e)
			}
		}
	}
	return nil
}
```
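restore() reads each container directory under containerd's state directory and loads the container with runtime.Load(). Running processes are handed back to the monitor; every process that is not running triggers an exit event.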
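The classification step in restore() boils down to a partition over process states. In this sketch, Proc and State are hypothetical simplifications. The real code also calls sortProcesses() on the exited list before sending ExitTasks; that ordering is not modeled here.

```go
package main

import "fmt"

// State and Proc are simplified, hypothetical stand-ins for runtime.State
// and runtime.Process.
type State string

const (
	Running State = "running"
	Stopped State = "stopped"
)

type Proc struct {
	ID    string
	State State
}

// partition mirrors the loop in Supervisor.restore(): processes still
// running are handed back to the monitor, all others get an ExitTask.
func partition(procs []Proc) (monitor, exited []Proc) {
	for _, p := range procs {
		if p.State == Running {
			monitor = append(monitor, p)
		} else {
			exited = append(exited, p)
		}
	}
	return monitor, exited
}

func main() {
	// Case 3: the shim was killed, so loadProcess() left init as Stopped.
	monitor, exited := partition([]Proc{{ID: "init", State: Stopped}})
	fmt.Println(len(monitor), len(exited)) // 0 1
}
```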
So the key question is how a container is loaded. runtime.Load() is defined in containerd/runtime/container.go:
```go
func Load(root, id, shimName string, timeout time.Duration) (Container, error) {
	var s state
	f, err := os.Open(filepath.Join(root, id, StateFile))
	if err != nil {
		return nil, err
	}
	defer f.Close()
	if err := json.NewDecoder(f).Decode(&s); err != nil {
		return nil, err
	}
	c := &container{
		root:        root,
		id:          id,
		bundle:      s.Bundle,
		labels:      s.Labels,
		runtime:     s.Runtime,
		runtimeArgs: s.RuntimeArgs,
		shim:        s.Shim,
		noPivotRoot: s.NoPivotRoot,
		processes:   make(map[string]*process),
		timeout:     timeout,
	}
	if c.shim == "" {
		c.shim = shimName
	}
	dirs, err := ioutil.ReadDir(filepath.Join(root, id))
	if err != nil {
		return nil, err
	}
	for _, d := range dirs {
		if !d.IsDir() {
			continue
		}
		pid := d.Name()
		s, err := readProcessState(filepath.Join(root, id, pid))
		if err != nil {
			return nil, err
		}
		p, err := loadProcess(filepath.Join(root, id, pid), pid, c, s)
		if err != nil {
			logrus.WithField("id", id).WithField("pid", pid).Debug("containerd: error loading process %s", err)
			continue
		}
		c.processes[pid] = p
	}
	return c, nil
}
```
Load() loads each process under the container directory through loadProcess(), defined in containerd/runtime/process.go:
```go
func loadProcess(root, id string, c *container, s *ProcessState) (*process, error) {
	p := &process{
		root:      root,
		id:        id,
		container: c,
		spec:      s.ProcessSpec,
		stdio: Stdio{
			Stdin:  s.Stdin,
			Stdout: s.Stdout,
			Stderr: s.Stderr,
		},
		state: Stopped,
	}
	startTime, err := ioutil.ReadFile(filepath.Join(p.root, StartTimeFile))
	if err != nil && !os.IsNotExist(err) {
		return nil, err
	}
	p.startTime = string(startTime)
	if _, err := p.getPidFromFile(); err != nil {
		return nil, err
	}
	if _, err := p.ExitStatus(); err != nil {
		if err == ErrProcessNotExited {
			exit, err := getExitPipe(filepath.Join(root, ExitFile))
			if err != nil {
				return nil, err
			}
			p.exitPipe = exit
			control, err := getControlPipe(filepath.Join(root, ControlFile))
			if err != nil {
				return nil, err
			}
			p.controlPipe = control
			p.state = Running
			return p, nil
		}
		return nil, err
	}
	return p, nil
}
```
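The most important call in loadProcess() is p.ExitStatus(). If it returns ErrProcessNotExited, the process state is set to Running. If it returns no error, the state stays Stopped. So let's look at ExitStatus() next: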
```go
func (p *process) ExitStatus() (rst uint32, rerr error) {
	data, err := ioutil.ReadFile(filepath.Join(p.root, ExitStatusFile))
	defer func() {
		if rerr != nil {
			rst, rerr = p.handleSigkilledShim(rst, rerr)
		}
	}()
	if err != nil {
		if os.IsNotExist(err) {
			return UnknownStatus, ErrProcessNotExited
		}
		return UnknownStatus, err
	}
	if len(data) == 0 {
		return UnknownStatus, ErrProcessNotExited
	}
	p.stateLock.Lock()
	p.state = Stopped
	p.stateLock.Unlock()
	i, err := strconv.ParseUint(string(data), 10, 32)
	return uint32(i), err
}
```
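ExitStatus() reads the exit status file (ExitStatusFile) in the process directory. Nothing has written an exit status there yet, so it returns ErrProcessNotExited. ExitStatus() also uses its named results in an unusual way. rerr first holds the error from the main path. The deferred function then passes it to handleSigkilledShim(), and whatever handleSigkilledShim() returns becomes the final rst and rerr. So the flow now moves to handleSigkilledShim():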
```go
func (p *process) handleSigkilledShim(rst uint32, rerr error) (uint32, error) {
	if p.cmd == nil || p.cmd.Process == nil {
		e := unix.Kill(p.pid, 0)
		if e == syscall.ESRCH {
			logrus.Warnf("containerd: %s:%s (pid %d) does not exist", p.container.id, p.id, p.pid)
			return p.updateExitStatusFile(UnknownStatus)
		}
		if same, err := p.isSameProcess(); !same {
			logrus.Warnf("containerd: %s:%s (pid %d) is not the same process anymore (%v)", p.container.id, p.id, p.pid, err)
			return p.updateExitStatusFile(UnknownStatus)
		}
		ppid, err := readProcStatField(p.pid, 4)
		if err != nil {
			return rst, fmt.Errorf("could not check process ppid: %v (%v)", err, rerr)
		}
		if ppid == "1" {
			logrus.Warnf("containerd: %s:%s shim died, killing associated process", p.container.id, p.id)
			// The result of Kill is discarded here, so the check below only
			// ever sees the (nil) err returned by readProcStatField.
			unix.Kill(p.pid, syscall.SIGKILL)
			if err != nil && err != syscall.ESRCH {
				return UnknownStatus, fmt.Errorf("containerd: unable to SIGKILL %s:%s (pid %v): %v", p.container.id, p.id, p.pid, err)
			}
			for {
				e := unix.Kill(p.pid, 0)
				if e == syscall.ESRCH {
					break
				}
				time.Sleep(5 * time.Millisecond)
			}
			return p.updateExitStatusFile(128 + uint32(syscall.SIGKILL))
		}
		return rst, rerr
	}
	e := unix.Kill(p.cmd.Process.Pid, 0)
	if e != syscall.ESRCH {
		return rst, rerr
	}
	<-p.cmdDoneCh
	shimStatus := p.cmd.ProcessState.Sys().(syscall.WaitStatus)
	if shimStatus.Signaled() && shimStatus.Signal() == syscall.SIGKILL {
		logrus.Debugf("containerd: ExitStatus(container: %s, process: %s): shim was SIGKILL'ed reaping its child with pid %d", p.container.id, p.id, p.pid)
		rerr = nil
		rst = 128 + uint32(shimStatus.Signal())
		p.stateLock.Lock()
		p.state = Stopped
		p.stateLock.Unlock()
	}
	return rst, rerr
}
```
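When p.cmd == nil || p.cmd.Process == nil (the case for a process loaded from disk after containerd restarts, since loadProcess() never sets p.cmd), handleSigkilledShim() proceeds as follows:
- Step 1: if the container process no longer exists, write UnknownStatus to the exit status file and return;
- Step 2: if the PID now belongs to a different process, likewise write UnknownStatus to the exit status file, so the exit is picked up later, and return;
- Step 3: if the container process's parent is PID 1, containerd-shim has died. Kill the container process, write 128 + SIGKILL to the exit status file with updateExitStatusFile(), and return;
- Step 4: otherwise, return the original result and error unchanged.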
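The four steps can be written as a pure decision function, which makes it easy to see which branch each case in this article hits. This is a model, not the real method. procInfo and decide are hypothetical. The facts containerd gathers with unix.Kill(pid, 0), isSameProcess(), and readProcStatField() are passed in as fields, and unknownStatus is assumed to be 255.

```go
package main

import (
	"fmt"
	"syscall"
)

// procInfo holds the facts handleSigkilledShim() checks, in order, for a
// process loaded from disk (p.cmd == nil). Hypothetical type.
type procInfo struct {
	Exists bool   // kill(pid, 0) did not return ESRCH
	Same   bool   // isSameProcess(): the PID has not been reused
	PPid   string // field 4 of /proc/<pid>/stat
}

// unknownStatus is an assumed value for containerd's UnknownStatus.
const unknownStatus uint32 = 255

// decide returns the step that fires, the status that step would write to
// the exit status file, and whether the original error is replaced
// (replaced == false means step 4: rst and rerr are returned unchanged).
func decide(p procInfo) (step int, status uint32, replaced bool) {
	switch {
	case !p.Exists:
		return 1, unknownStatus, true
	case !p.Same:
		return 2, unknownStatus, true
	case p.PPid == "1":
		// The shim is gone: the real code SIGKILLs the process and waits.
		return 3, 128 + uint32(syscall.SIGKILL), true
	default:
		return 4, 0, false
	}
}

func main() {
	// Case 3: shim killed, container process adopted by init.
	fmt.Println(decide(procInfo{Exists: true, Same: true, PPid: "1"})) // 3 137 true
	// Shim still alive: the container process's parent is the shim.
	fmt.Println(decide(procInfo{Exists: true, Same: true, PPid: "2522"})) // 4 0 false
}
```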
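Following the flow we are analyzing, handleSigkilledShim() reaches step 3. ExitStatus()'s named results are overwritten with handleSigkilledShim()'s return values, so rerr becomes nil (assuming the exit status file is written successfully). loadProcess() therefore leaves the process state as Stopped rather than Running.
As a result, the supervisor's restore() performs exit handling for this container.
The exit handling calls ExitStatus() again, but this time the exit status file has content. ExitStatus() returns the recorded status without an error, so the deferred call to handleSigkilledShim() does not run. Even if it did, it would return at step 1, because the container process was already killed in the earlier pass.
If both containerd-shim and the container process still exist, the container process's parent is the shim rather than PID 1. In that case handleSigkilledShim() returns at step 4 with ErrProcessNotExited unchanged, and the process is restored as Running.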
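Case 4
Case 4: killing the container process while containerd is not running, then starting containerd
When the container process is killed, containerd-shim exits on its own. When containerd starts, it performs exit handling for the container in restore().
Here is a demo that starts child processes with Go's exec package and shows how the parent reacts when one of them exits: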
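```go
package main

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	signals := make(chan os.Signal, 2048)
	// No signal list: every incoming signal, including SIGCHLD, is relayed.
	signal.Notify(signals)

	// Each child runs in its own process group (Setpgid), as Start() does for the shim.
	cmd1 := exec.Command("/bin/sh", "-c", "sleep 50")
	cmd1.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd1.Start(); err != nil {
		panic(err)
	}
	cmd2 := exec.Command("/bin/sh", "-c", "sleep 50")
	cmd2.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd2.Start(); err != nil {
		panic(err)
	}

	// Block until the first signal arrives, then kill both process groups
	// (a negative pid addresses the whole group) and return.
	select {
	case <-signals:
		syscall.Kill(-cmd1.Process.Pid, syscall.SIGKILL)
		syscall.Kill(-cmd2.Process.Pid, syscall.SIGKILL)
	}
}
```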
Compiling and running it gives:
```
root 5838 1733 0 17:16 pts/0 00:00:00 ./test
root 5843 5838 0 17:16 pts/0 00:00:00 /bin/sh -c sleep 50
root 5844 5838 0 17:16 pts/0 00:00:00 /bin/sh -c sleep 50
```
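After running kill 5843, all of the processes are gone.
This is not because Go makes a parent exit whenever a child exits. When child 5843 dies, the parent receives SIGCHLD. Because signal.Notify(signals) was called with no signal list, SIGCHLD is relayed to the channel as well. The select then wakes up, the program kills both child process groups (which is why 5844 and the sleep processes disappear too), and main returns. containerd-shim likewise notices the container process's exit through a signal and then exits, as described in case 2.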
That concludes the analysis.