Go 调度器跟踪

sl6124 8年前
   <p> </p>    <p>这是William Kennedy写的第二篇文章: <a href="/misc/goto?guid=4959671239021255406" rel="nofollow,noindex">Scheduler Tracing In Go</a> , 第一篇为Stack Traces In Go。</p>    <p>本文主要介绍如何跟踪Go的调度器Scheduler的活动。</p>    <h2>介绍</h2>    <p>我喜欢Go的原因之一就是能够产生概要(profiling)和调试信息。当你的程序运行时, <strong>GODEBUG</strong> 环境变量可以产生运行时的调试信息。你可以请求垃圾回收器和调度器(scheduler)的摘要信息和细节。关键是你不需要额外创建单独的编译程序就可以实现。</p>    <p>在本文中,我将演示如何理解一个简单的并发Go程序的调度器跟踪信息。如果你已经有Go调度器的基本概念,对阅读本文会很有帮助, 在开始下面的介绍前我建议你阅读下面的两篇文章:</p>    <ol>     <li><a href="/misc/goto?guid=4959671263909487292" rel="nofollow,noindex">Concurrency, Goroutines and GOMAXPROCS</a></li>     <li><a href="/misc/goto?guid=4959671264042907927" rel="nofollow,noindex">Go Scheduler</a></li>    </ol>    <h2>代码</h2>    <p>下面的代码就是用来演示用的例子。</p>    <p>列表1</p>    <table>     <tbody>      <tr>       <td> </td>       <td> <pre>  package main    import (   "sync"   "time"  )    func main() {   var wg sync.WaitGroup   wg.Add(10)     for i :=0; i <10; i++ {   go work(&wg)   }     wg.Wait()     // Wait to see the global run queue deplete.   time.Sleep(3 * time.Second)  }    func work(wg *sync.WaitGroup) {   time.Sleep(time.Second)     var counter int   for i :=0; i <1e10; i++ {   counter++   }     wg.Done()  }  </pre> </td>      </tr>     </tbody>    </table>    <p>列表1的第12行使用一个for循环创建10个goroutine,第16行main函数中等待所有的goroutine完成任务。第22行的work函数先sleep 1秒,然后计数 10亿次。一旦计数完成,调用waitGroup的Done方法然后返回。</p>    <p>首先 go build 程序,然后运行时设置GODEBUG环境变量。这个变量会被运行时读取,所以Go命令也会产生跟踪信息。如果在运行 go run 命令的时候设置了GODEBUG变量,在程序运行之前就会看到跟踪信息。</p>    <p>所以还是让我们先编译程序:</p>    <pre>  gobuild example.go  </pre>    <h2>调度器摘要信息</h2>    <p>schedtrace参数告诉运行时打印一行调度器的摘要信息到标准err输出中,时间间隔可以指定,单位毫秒,如下所示:</p>    <pre>  GOMAXPROCS=1GODEBUG=schedtrace=1000./example  </pre>    <p>译者注: 如果在windows下可以运行 set GOMAXPROCS=1 && set GODEBUG=schedtrace=1000 && example</p>    <p>程序开始后每个一秒就会打印一行调度器的概要信息,程序本身没有任何输出,所以我们只关注输出的跟踪信息。让我们先看前两行:</p>    <pre>  SCHED 0ms:gomaxprocs=1idleprocs=0threads=2spinningthreads=0idlethreads=0  runqueue=0[1]    SCHED 1009ms:gomaxprocs=1idleprocs=0threads=3spinningthreads=0idlethreads=1  runqueue=0[9]  </pre>    <p>让我们分解每一个项,看看它们分别代表什么含义。</p>    <table>     <thead>      <tr>       <th>输出项</th>       <th>意义</th>      </tr>     </thead>     <tbody>      <tr>       <td>1009ms</td>       <td>自从程序开始的毫秒数</td>      </tr>      <tr>       <td>gomaxprocs=1</td>       <td>配置的处理器数(逻辑的processor,也就是Go模型中的P,会通过操作系统的线程绑定到一个物理处理器上)</td>      </tr>      <tr>       <td>threads=3</td>       <td>运行期管理的线程数,目前三个线程</td>      </tr>      <tr>       <td>idlethreads=1</td>       <td>空闲的线程数,当前一个线程空闲,两个忙</td>      </tr>      <tr>       <td>idleprocs=0</td>       <td>空闲的处理器数,当前0个空闲</td>      </tr>      <tr>       <td>runqueue=0</td>       <td>在全局的run队列中的goroutine数,目前所有的goroutine都被移动到本地run队列</td>      </tr>      <tr>       <td>[9]</td>       <td>本地run队列中的goroutine数,目前9个goroutine在本地run队列中等待</td>      </tr>     </tbody>    </table>    <p>Go运行时给了我们很多有用的摘要信息。当我们看第一秒的跟踪数据时,我们看到一个goroutine正在运行,而其它9个都在等待:</p>    <p><img src="https://simg.open-open.com/show/551897fe90528e6e01bc477a1467585b.png"></p>    <p>图1中处理器由字符P代表,线程由M代表,goroutine由G代表。我们看到全局run队列为空,处理器正在执行一个goroutine。其余9在本地队列中等待。</p>    <p>如果我们配置多个处理器会怎样呢?让我们增加GOMAXPROCS 看看输出结果:</p>    <pre>  GOMAXPROCS=2GODEBUG=schedtrace=1000./example    SCHED 0ms:gomaxprocs=2idleprocs=1threads=2spinningthreads=0  idlethreads=0runqueue=0[00]    SCHED 1002ms:gomaxprocs=2idleprocs=0threads=4spinningthreads=1  idlethreads=1runqueue=0[04]    SCHED 2006ms:gomaxprocs=2idleprocs=0threads=4spinningthreads=0  idlethreads=1runqueue=0[44]    …    SCHED 6024ms:gomaxprocs=2idleprocs=0threads=4spinningthreads=0  idlethreads=1runqueue=2[33]    …    SCHED 10049ms:gomaxprocs=2idleprocs=0threads=4spinningthreads=0  idlethreads=1runqueue=4[22]  …    SCHED 13067ms:gomaxprocs=2idleprocs=0threads=4spinningthreads=0  idlethreads=1runqueue=6[11]    …    SCHED 17084ms:gomaxprocs=2idleprocs=0threads=4spinningthreads=0  idlethreads=1runqueue=8[00]    …    SCHED 21100ms:gomaxprocs=2idleprocs=2threads=4spinningthreads=0  idlethreads=2runqueue=0[00]  </pre>    <p>让我们将视线放在第二秒:</p>    <pre>  SCHED 2002ms: gomaxprocs=2idleprocs=0threads=4spinningthreads=0  idlethreads=1runqueue=0[44]    2002ms : Thisisthetraceforthe2secondmark.  gomaxprocs=2:2processors are configuredforthis program.  threads=4:4threads exist.2forprocessorsand2fortheruntime.  idlethreads=1:1idle thread (3threadsrunning).  idleprocs=0:0processors are idle (2processors busy).  runqueue=0: All runnable goroutines have been movedtoalocalrunqueue.  [44] :4goroutines are waiting inside eachlocalrunqueue.  </pre>    <p><img src="https://simg.open-open.com/show/3a615ed7b70e5f675a2d725806175899.png"></p>    <p>让我们看第二秒的信息,可以看到goroutine是如何被处理器运行的。我们也可以看到有8个goroutine在本地run队列中等待,每个本地run队列包含4个等待的goroutine。</p>    <p>让我们再来看第6秒的信息:</p>    <pre>  SCHED 6024ms:gomaxprocs=2idleprocs=0threads=4spinningthreads=0  idlethreads=1runqueue=2[33]    idleprocs=0:0processors are idle (2processors busy).  runqueue=2:2goroutines returnedandare waiting to be terminated.  [33] :3goroutines are waiting inside each local run queue.  </pre>    <p><img src="https://simg.open-open.com/show/d60d4eb4d1cf2486463e8ec672752862.png"></p>    <p>第6秒数据有些变化,如图3所示,两个goroutine完成了它们的任务,放回到全局run队列中。当然我们还是有两个goruntine在运行(G6,G7),每个P运行一个。每个本地run队列中还有3个goroutine在等待。</p>    <p>注意:</p>    <p>在很多情况下goroutine在终止前并没有被放回到全局run队列。本文中的例子比较特殊,它的方法体是一个for循环,并没有调用其它函数,而且运行时间超过10ms。10毫秒是调度器的调度标杆,过了10毫秒的执行时间,调度器就会尝试占用(preempt)这个goroutine。本例中的goroutine没有被占用是因为它们没有调用其它函数。本例中一旦goroutine执行到wg.Done调用,它们立即被占用,然后移动到全局run队列等待终止。</p>    <p>在17秒我们看到只有两个goroutine还在运行:</p>    <pre>  SCHED 17084ms:gomaxprocs=2idleprocs=0threads=4spinningthreads=0  idlethreads=1runqueue=8[00]    idleprocs=0:0processors are idle (2processors busy).  runqueue=8:8goroutines returnedandare waiting to be terminated.  [00] : No goroutines are waiting inside any local run queue.  </pre>    <p><img src="https://simg.open-open.com/show/46f617bb295711ecb891d12b20813f6c.png"></p>    <p>从图4可以看出,8个goroutine被移动到全局run队列等待终止。两个goroutine(G12和G13)还在运行,本地run队列都为空。</p>    <p>最后在第21秒:</p>    <pre>  SCHED 21100ms:gomaxprocs=2idleprocs=2threads=4spinningthreads=0  idlethreads=2runqueue=0[00]    idleprocs=2:2processors are idle (0processors busy).  runqueue=0: All the goroutines that wereinthe queue have been terminated.  [00] : No goroutines are waiting inside any local run queue.  </pre>    <p><img src="https://simg.open-open.com/show/64ef45c6308f2f4d046cfc926622e266.png"> 如图5所示,这时那10个goroutine都已经完成任务被终止。</p>    <h2>调度器跟踪信息的细节</h2>    <p>调度器的摘要信息很有帮助,但是有时候你需要更深入的了解它的细节。在这种情况下,我们可以使用scheddetail 参数,可以提供处理器P,线程M和goroutine G的细节。让我们再运行一下程序,增加scheddetail 参数:</p>    <pre>  GOMAXPROCS=2GODEBUG=schedtrace=1000,scheddetail=1./example  </pre>    <p>下面的日志是第4秒的输出:</p>    <pre>  SCHED 4028ms:gomaxprocs=2idleprocs=0threads=4spinningthreads=0  idlethreads=1runqueue=2gcwaiting=0nmidlelocked=0stopwait=0sysmonwait=0  P0: status=1schedtick=10syscalltick=0m=3runqsize=3gfreecnt=0  P1: status=1schedtick=10syscalltick=1m=2runqsize=3gfreecnt=0  M3: p=0curg=4mallocing=0throwing=0gcing=0locks=0dying=0helpgc=0spinning=0blocked=0lockedg=-1  M2: p=1curg=10mallocing=0throwing=0gcing=0locks=0dying=0helpgc=0spinning=0blocked=0lockedg=-1  M1: p=-1curg=-1mallocing=0throwing=0gcing=0locks=1dying=0helpgc=0spinning=0blocked=0lockedg=-1  M0: p=-1curg=-1mallocing=0throwing=0gcing=0locks=0dying=0helpgc=0spinning=0blocked=0lockedg=-1  G1: status=4(semacquire)m=-1lockedm=-1  G2: status=4(force gc (idle))m=-1lockedm=-1  G3: status=4(GC sweep wait)m=-1lockedm=-1  G4: status=2(sleep)m=3lockedm=-1  G5: status=1(sleep)m=-1lockedm=-1  G6: status=1(stack growth)m=-1lockedm=-1  G7: status=1(sleep)m=-1lockedm=-1  G8: status=1(sleep)m=-1lockedm=-1  G9: status=1(stack growth)m=-1lockedm=-1  G10: status=2(sleep)m=2lockedm=-1  G11: status=1(sleep)m=-1lockedm=-1  G12: status=1(sleep)m=-1lockedm=-1  G13: status=1(sleep)m=-1lockedm=-1  G17: status=4(timer goroutine (idle))m=-1lockedm=-1  </pre>    <p>摘要信息是类似的,但是包含了处理器、线程和goroutine的信息。首先看一下处理器P的信息:</p>    <pre>  P0: status=1schedtick=10syscalltick=0m=3runqsize=3gfreecnt=0    P1: status=1schedtick=10syscalltick=1m=2runqsize=3gfreecnt=0  </pre>    <p>我们设置了两个处理器,因此有两行数据。下面是线程的信息:</p>    <pre>  M3: p=0curg=4mallocing=0throwing=0gcing=0locks=0dying=0helpgc=0  spinning=0blocked=0lockedg=-1    M2: p=1curg=10mallocing=0throwing=0gcing=0locks=0dying=0helpgc=0  spinning=0blocked=0lockedg=-1    M1: p=-1curg=-1mallocing=0throwing=0gcing=0locks=1dying=0helpgc=0  spinning=0blocked=0lockedg=-1    M0: p=-1curg=-1mallocing=0throwing=0gcing=0locks=0dying=0helpgc=0  spinning=0blocked=0lockedg=-1  </pre>    <p>在摘要信息中我们看到线程设置为4(threads=4),所以下面会有4行关于线程M的信息。而且日志细节显示了哪个线程隶属哪个处理器 <strong>M3: p=0</strong> :</p>    <pre>  P0: status=1schedtick=10syscalltick=0m=3runqsize=3gfreecnt=0    M3: p=0curg=4mallocing=0throwing=0gcing=0locks=0dying=0helpgc=0  spinning=0blocked=0lockedg=-1  </pre>    <p>我们看到线程M3隶属P0。</p>    <p>G代表goroutine。第4秒我们看到有14个goroutine, 但是自程序启动已经创建了17个线程。我们怎么知道的17个线程呢,这是从最后的goroutine的标志 <strong>G17</strong> 得到:</p>    <pre>  G17: status=4(timer goroutine (idle))m=-1lockedm=-1  </pre>    <p>如果程序持续创建goroutine,这个数会线性的增大。如果使用标准库net/http处理http请求,因为每个连接都会创建一个goroutine,我们可以根据这个数据得到多少request被处理,当然得在程序不会创建其它的goroutine情况下这个数才能作为预估的数量。</p>    <p>让我们看看运行main函数的goroutine:</p>    <pre>  G1: status=4(semacquire)m=-1lockedm=-1  </pre>    <p>可以看到这个goroutine的status是4, 阻塞在semacquire上(WaitGroup的Wait方法)。</p>    <p>为了更好的理解其它goroutine的状态,我们需要了解每个status值代表的含义。下面是一个列表:</p>    <pre>  status: http://golang.org/src/runtime/runtime.h  Gidle, // 0  Grunnable, // 1 runnable and on a run queue  Grunning, // 2 running  Gsyscall, // 3 performing a syscall  Gwaiting, // 4 waiting for the runtime  Gmoribund_unused, // 5 currently unused, but hardcoded in gdb scripts  Gdead, // 6 goroutine is dead  Genqueue, // 7 only the Gscanenqueue is used  Gcopystack, // 8 in this state when newstack is moving the stack  </pre>    <p>看看我们创建的10个goroutine,现在我们能更好的理解它们的状态:</p>    <pre>  // Goroutines running ina processor. (idleprocs=0)  G4: status=2(sleep)m=3lockedm=-1– Thread M3 / Processor P0  G10: status=2(sleep)m=2lockedm=-1– Thread M2 / Processor P1    // Goroutines waiting to be run on a particular processor. (runqsize=3)  G5: status=1(sleep)m=-1lockedm=-1  G7: status=1(sleep)m=-1lockedm=-1  G8: status=1(sleep)m=-1lockedm=-1    // Goroutines waiting to be run on a particular processor. (runqsize=3)  G11: status=1(sleep)m=-1lockedm=-1  G12: status=1(sleep)m=-1lockedm=-1  G13: status=1(sleep)m=-1lockedm=-1    // Goroutines waiting on the global run queue. (runqueue=2)  G6: status=1(stack growth)m=-1lockedm=-1  G9: status=1(stack growth)m=-1lockedm=-1  </pre>    <p>了解了调度器的基本概念和我们的程序的行为,我们可以深入了解Go是如何进行调度的,每个P,M,G的状态。</p>    <h2>结论</h2>    <p>GODEBUG变量非常好,可以深入了解调度器的行为。它也可以告诉你你的程序的行为。如果你想了解更多,动手写一些简单的程序,跟踪调度器的运行,然后就可以跟踪更复杂的程序了。</p>    <p>来自: <a href="/misc/goto?guid=4959671264150475101" rel="nofollow">http://colobu.com/2016/04/19/Scheduler-Tracing-In-Go/</a></p>