线程堆栈分析

hopechart 6年前
   <p>本文为读书笔记,关于 Java 线程堆栈分析,在阅读并进行实验的基础上进行整理,如果有问题欢迎反馈</p>    <p>Java 虚拟机提供了线程转储(Thread dump)的后门,通过这个后门,可以将线程堆栈打印出来。这个后门就是通过向 Java 进程发送一个 QUIT 信号,Java 虚拟机收到该信号之后,将系统当前的 Java 线程调用堆栈打印出来。</p>    <p>kill -3 <java_pid> /jstack <java_pid></p>    <h2>1. Demo</h2>    <pre>  <code class="language-java">public class MyTest {       Object obj1 = new Object();    Object obj2 = new Object();    public void fun1() {     synchronized(obj1) {      fun2();     }    }    public void fun2() {     synchronized(obj2) {      while(true) {       System.out.print("");      }     }    }    public static void main(String[] args) {     MyTest aa = new MyTest();     aa.fun1();    }  }  </code></pre>    <h2>1.1 使用 kill -3 得到的结果</h2>    <pre>  <code class="language-java">2017-12-26 10:59:58  Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode):   "Attach Listener" daemon prio=10 tid=0x00007fd9d4001000 nid=0x6993 runnable [0x0000000000000000]    java.lang.Thread.State: RUNNABLE   "Service Thread" daemon prio=10 tid=0x00007fda3c0a8800 nid=0x695f runnable [0x0000000000000000]    java.lang.Thread.State: RUNNABLE   "C2 CompilerThread1" daemon prio=10 tid=0x00007fda3c0a6000 nid=0x695e waiting on condition [0x0000000000000000]    java.lang.Thread.State: RUNNABLE   "C2 CompilerThread0" daemon prio=10 tid=0x00007fda3c0a3000 nid=0x695d waiting on condition [0x0000000000000000]    java.lang.Thread.State: RUNNABLE   "Signal Dispatcher" daemon prio=10 tid=0x00007fda3c0a0800 nid=0x695c waiting on condition [0x0000000000000000]    java.lang.Thread.State: RUNNABLE   "Finalizer" daemon prio=10 tid=0x00007fda3c078000 nid=0x695b in Object.wait() [0x00007fda32e06000]    java.lang.Thread.State: WAITING (on object monitor)    at java.lang.Object.wait(Native Method)    - waiting on <0x0000000758c04858> (a java.lang.ref.ReferenceQueue$Lock)    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)    - locked <0x0000000758c04858> (a java.lang.ref.ReferenceQueue$Lock)    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)   "Reference Handler" daemon prio=10 tid=0x00007fda3c076000 nid=0x695a in Object.wait() [0x00007fd9ec28e000]    java.lang.Thread.State: WAITING (on object monitor)    at java.lang.Object.wait(Native Method)    - waiting on <0x0000000758c04470> (a java.lang.ref.Reference$Lock)    at java.lang.Object.wait(Object.java:503)    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)    - locked <0x0000000758c04470> (a java.lang.ref.Reference$Lock)    "main" prio=10 tid=0x00007fda3c008800 nid=0x6950 runnable [0x00007fda4588a000]    java.lang.Thread.State: RUNNABLE    at MyTest.fun2(MyTest.java:12)    - locked <0x0000000758c4a7f8> (a java.lang.Object)    at MyTest.fun1(MyTest.java:6)    - locked <0x0000000758c4a7e8> (a java.lang.Object)    at MyTest.main(MyTest.java:18)   "VM Thread" prio=10 tid=0x00007fda3c071800 nid=0x6959 runnable   "GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fda3c01e000 nid=0x6951 runnable   "GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fda3c020000 nid=0x6952 runnable   "GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fda3c022000 nid=0x6953 runnable   "GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fda3c024000 nid=0x6954 runnable   "GC task thread#4 (ParallelGC)" prio=10 tid=0x00007fda3c026000 nid=0x6955 runnable   "GC task thread#5 (ParallelGC)" prio=10 tid=0x00007fda3c027800 nid=0x6956 runnable   "GC task thread#6 (ParallelGC)" prio=10 tid=0x00007fda3c029800 nid=0x6957 runnable   "GC task thread#7 (ParallelGC)" prio=10 tid=0x00007fda3c02b800 nid=0x6958 runnable   "VM Periodic Task Thread" prio=10 tid=0x00007fda3c0b3000 nid=0x6960 waiting on condition   JNI global references: 123   Heap    PSYoungGen total 150528K, used 7772K [0x0000000758c00000, 0x0000000763380000, 0x0000000800000000)    eden space 129536K, 6% used [0x0000000758c00000,0x0000000759397200,0x0000000760a80000)    from space 20992K, 0% used [0x0000000761f00000,0x0000000761f00000,0x0000000763380000)    to space 20992K, 0% used [0x0000000760a80000,0x0000000760a80000,0x0000000761f00000)    ParOldGen total 342528K, used 0K [0x000000060a400000, 0x000000061f280000, 0x0000000758c00000)    object space 342528K, 0% used [0x000000060a400000,0x000000060a400000,0x000000061f280000)    PSPermGen total 21504K, used 2429K [0x0000000605200000, 0x0000000606700000, 0x000000060a400000)    object space 21504K, 11% used [0x0000000605200000,0x000000060545f738,0x0000000606700000)  </code></pre>    <h2>1.2 jstack 的结果</h2>    <pre>  <code class="language-java">2017-12-26 11:04:54  Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode):   "Attach Listener" daemon prio=10 tid=0x00007fd9d4001000 nid=0x6993 waiting on condition [0x0000000000000000]    java.lang.Thread.State: RUNNABLE   "Service Thread" daemon prio=10 tid=0x00007fda3c0a8800 nid=0x695f runnable [0x0000000000000000]    java.lang.Thread.State: RUNNABLE   "C2 CompilerThread1" daemon prio=10 tid=0x00007fda3c0a6000 nid=0x695e waiting on condition [0x0000000000000000]    java.lang.Thread.State: RUNNABLE   "C2 CompilerThread0" daemon prio=10 tid=0x00007fda3c0a3000 nid=0x695d waiting on condition [0x0000000000000000]    java.lang.Thread.State: RUNNABLE   "Signal Dispatcher" daemon prio=10 tid=0x00007fda3c0a0800 nid=0x695c runnable [0x0000000000000000]    java.lang.Thread.State: RUNNABLE   "Finalizer" daemon prio=10 tid=0x00007fda3c078000 nid=0x695b in Object.wait() [0x00007fda32e06000]    java.lang.Thread.State: WAITING (on object monitor)    at java.lang.Object.wait(Native Method)    - waiting on <0x0000000758c04858> (a java.lang.ref.ReferenceQueue$Lock)    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)    - locked <0x0000000758c04858> (a java.lang.ref.ReferenceQueue$Lock)    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)   "Reference Handler" daemon prio=10 tid=0x00007fda3c076000 nid=0x695a in Object.wait() [0x00007fd9ec28e000]    java.lang.Thread.State: WAITING (on object monitor)    at java.lang.Object.wait(Native Method)    - waiting on <0x0000000758c04470> (a java.lang.ref.Reference$Lock)    at java.lang.Object.wait(Object.java:503)    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)    - locked <0x0000000758c04470> (a java.lang.ref.Reference$Lock)   "main" prio=10 tid=0x00007fda3c008800 nid=0x6950 runnable [0x00007fda4588a000]    java.lang.Thread.State: RUNNABLE    at MyTest.fun2(MyTest.java:12)    - locked <0x0000000758c4a7f8> (a java.lang.Object)    at MyTest.fun1(MyTest.java:6)    - locked <0x0000000758c4a7e8> (a java.lang.Object)    at MyTest.main(MyTest.java:18)   "VM Thread" prio=10 tid=0x00007fda3c071800 nid=0x6959 runnable   "GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fda3c01e000 nid=0x6951 runnable   "GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fda3c020000 nid=0x6952 runnable   "GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fda3c022000 nid=0x6953 runnable   "GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fda3c024000 nid=0x6954 runnable   "GC task thread#4 (ParallelGC)" prio=10 tid=0x00007fda3c026000 nid=0x6955 runnable   "GC task thread#5 (ParallelGC)" prio=10 tid=0x00007fda3c027800 nid=0x6956 runnable   "GC task thread#6 (ParallelGC)" prio=10 tid=0x00007fda3c029800 nid=0x6957 runnable   "GC task thread#7 (ParallelGC)" prio=10 tid=0x00007fda3c02b800 nid=0x6958 runnable   "VM Periodic Task Thread" prio=10 tid=0x00007fda3c0b3000 nid=0x6960 waiting on condition   JNI global references: 123  </code></pre>    <h2>1.3 分析</h2>    <p>用户线程为</p>    <pre>  <code class="language-java">"main" prio=10 tid=0x00007fda3c008800 nid=0x6950 runnable [0x00007fda4588a000] //main 表示线程名称,proi=1 表示线程优先级,tid=0x00007fda3c008800 表示线程 id,nid=0x6950 表示本地线程 id(这个查问题的时候会用到),runnable 表示线程状态,0x00007fda4588a000 表示内存地址   java.lang.Thread.State: RUNNABLE    at MyTest.fun2(MyTest.java:12) //第一个MyTest 表示类名,fun2 表示方法名,第二个 MyTest.java 表示正在调用的函数所在的源代码文件,12 表示行号    - locked <0x0000000758c4a7f8> (a java.lang.Object) //locked 表示持有锁,后面的 0x0000000758c4a7f8 表示锁的 ID    at MyTest.fun1(MyTest.java:6)    - locked <0x0000000758c4a7e8> (a java.lang.Object)    at MyTest.main(MyTest.java:18)  </code></pre>    <p>如何获得 Java 虚拟机对应的本地线程</p>    <pre>  <code class="language-java">1. jps //获取 Java 进程 ID  2. pstack <java_pid> 获得 Java 虚拟机的本地线程的堆栈  </code></pre>    <p>获得的本地线程栈如下所示</p>    <pre>  <code class="language-java">Thread 19 (Thread 0x7fda4588b700 (LWP 26960)): // Thread 19 表示线程名称,0x7fda4588b700 表示本地线程 id,LWP 26960 本地线程的另一种表示,LWP-light weight process(和上面的 nid 会一一对应,这里是十进制,上面的是十六进制)  #0 0x00007fda401b2c47 in ?? ()  #1 0x00007fda401ab8fc in ?? ()  #2 0x0000000758c4a7f8 in ?? ()  #3 0x00007fda4588a8a0 in ?? ()  #4 0x0000000000000001 in ?? ()  #5 0x00007fda4014f058 in ?? ()  #6 0x00007fda4588a8f8 in ?? ()  #7 0x00007fda4014f058 in ?? ()  #8 0x0000000758c4a7d0 in ?? ()  #9 0x0000000000000001 in ?? ()  #10 0x0000000758c4a7e8 in ?? ()  #11 0x00007fda4588a8a8 in ?? ()  #12 0x000000060545d7b0 in ?? ()  #13 0x00007fda4588a918 in ?? ()  #14 0x000000060545dc80 in ?? ()  #15 0x0000000000000000 in ?? ()   Thread 18 (Thread 0x7fda40148700 (LWP 26961)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #8 0x00007fda44d99aad in clone () from /lib64/libc.so.6    Thread 17 (Thread 0x7fda3a5b3700 (LWP 26962)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #8 0x00007fda44d99aad in clone () from /lib64/libc.so.6     Thread 16 (Thread 0x7fda3a4b2700 (LWP 26963)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #8 0x00007fda44d99aad in clone () from /lib64/libc.so.6      Thread 15 (Thread 0x7fda3a3b1700 (LWP 26964)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #8 0x00007fda44d99aad in clone () from /lib64/libc.so.6       Thread 14 (Thread 0x7fda3a2b0700 (LWP 26965)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #8 0x00007fda44d99aad in clone () from /lib64/libc.so.6        Thread 13 (Thread 0x7fda3a1af700 (LWP 26966)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #8 0x00007fda44d99aad in clone () from /lib64/libc.so.6         Thread 12 (Thread 0x7fda3a0ae700 (LWP 26967)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #8 0x00007fda44d99aad in clone () from /lib64/libc.so.6          Thread 11 (Thread 0x7fda39fad700 (LWP 26968)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda4438dad3 in GCTaskManager::get_task(unsigned int) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda4438f198 in GCTaskThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #8 0x00007fda44d99aad in clone () from /lib64/libc.so.6           Thread 10 (Thread 0x7fda32f08700 (LWP 26969)):  #0 0x00007fda4546ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44657c0f in os::PlatformEvent::park(long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda44619b2e in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda447e1899 in VMThread::loop() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda447e1ba0 in VMThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #8 0x00007fda44d99aad in clone () from /lib64/libc.so.6            Thread 9 (Thread 0x7fd9ec28f700 (LWP 26970)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda44646ddd in ObjectMonitor::wait(long, bool, Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda444b6298 in JVM_MonitorWait () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda4015bd98 in ?? ()  #5 0x00007fda3c0768d8 in ?? ()  #6 0x00007fd900000000 in ?? ()  #7 0x00007fda3c076000 in ?? ()  #8 0x00007fd9ec28e5d8 in ?? ()  #9 0x0000000605202798 in ?? ()  #10 0x00007fd9ec28e648 in ?? ()  #11 0x00000006052a9e68 in ?? ()  #12 0x0000000000000000 in ?? ()             Thread 8 (Thread 0x7fda32e07700 (LWP 26971)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda44646ddd in ObjectMonitor::wait(long, bool, Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda444b6298 in JVM_MonitorWait () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda4015bd98 in ?? ()  #5 0x00007fda3c078a08 in ?? ()  #6 0x00007fda00000000 in ?? ()  #7 0x00007fda3c078000 in ?? ()  #8 0x00007fda32e065e8 in ?? ()  #9 0x0000000000000000 in ?? ()              Thread 7 (Thread 0x7fda32d06700 (LWP 26972)):  #0 0x00007fda4546da00 in sem_wait () from /lib64/libpthread.so.0  #1 0x00007fda446578ea in check_pending_signals(bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda44650b35 in signal_thread_entry(JavaThread*, Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda447930ff in JavaThread::thread_main_inner() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda44793205 in JavaThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #7 0x00007fda44d99aad in clone () from /lib64/libc.so.6               Thread 6 (Thread 0x7fda32c05700 (LWP 26973)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a086 in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda44282648 in CompileQueue::get() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda4428674a in CompileBroker::compiler_thread_loop() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda447930ff in JavaThread::thread_main_inner() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda44793205 in JavaThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #8 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #9 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #10 0x00007fda44d99aad in clone () from /lib64/libc.so.6                Thread 5 (Thread 0x7fda32b04700 (LWP 26974)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a086 in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda44282648 in CompileQueue::get() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda4428674a in CompileBroker::compiler_thread_loop() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda447930ff in JavaThread::thread_main_inner() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda44793205 in JavaThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #8 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #9 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #10 0x00007fda44d99aad in clone () from /lib64/libc.so.6                 Thread 4 (Thread 0x7fda32a03700 (LWP 26975)):  #0 0x00007fda4546b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44652543 in os::PlatformEvent::park() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda4461987f in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda446e95a8 in ServiceThread::service_thread_entry(JavaThread*, Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda447930ff in JavaThread::thread_main_inner() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44793205 in JavaThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #8 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #9 0x00007fda44d99aad in clone () from /lib64/libc.so.6                  Thread 3 (Thread 0x7fda32902700 (LWP 26976)):  #0 0x00007fda4546ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0  #1 0x00007fda44657c0f in os::PlatformEvent::park(long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda44619b2e in Monitor::IWait(Thread*, long) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda4461a00e in Monitor::wait(bool, long, bool) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda4478f121 in WatcherThread::sleep() const () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda4478f63e in WatcherThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #8 0x00007fda44d99aad in clone () from /lib64/libc.so.6                   Thread 2 (Thread 0x7fda32801700 (LWP 27027)):  #0 0x00007fda4546eb2d in accept () from /lib64/libpthread.so.0  #1 0x00007fda44136af1 in LinuxAttachListener::dequeue() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #2 0x00007fda44136c8b in AttachListener::dequeue() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #3 0x00007fda441357da in attach_listener_thread_entry(JavaThread*, Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #4 0x00007fda447930ff in JavaThread::thread_main_inner() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #5 0x00007fda44793205 in JavaThread::run() () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #6 0x00007fda44658ca8 in java_start(Thread*) () from /usr/local/jdk1.7.0_76/jre/lib/amd64/server/libjvm.so  #7 0x00007fda45467aa1 in start_thread () from /lib64/libpthread.so.0  #8 0x00007fda44d99aad in clone () from /lib64/libc.so.6                    Thread 1 (Thread 0x7fda4588d700 (LWP 26959)):  #0 0x00007fda454682fd in pthread_join () from /lib64/libpthread.so.0  #1 0x00007fda45256505 in ContinueInNewThread0 () from /usr/local/jdk1.7.0_76/bin/../lib/amd64/jli/libjli.so  #2 0x00007fda4524b58a in ContinueInNewThread () from /usr/local/jdk1.7.0_76/bin/../lib/amd64/jli/libjli.so  #3 0x00007fda4524e0e0 in JLI_Launch () from /usr/local/jdk1.7.0_76/bin/../lib/amd64/jli/libjli.so  #4 0x0000000000400686 in main ()  </code></pre>    <p>线程状态中的 runnable 从虚拟机的角度来看,表示这个线程正在运行。但是出于 runnable 的线程不一定真正消耗 CPU。出于 runnable 的线程只能说明该线程没有阻塞在 java 的 wait 或者 sleep 方法上,同时也没有等待在锁上面。但是如果该线程调用了本地方法(比如 java.net.SocketInputStream.socketRead0) ,而本地方法出于等待状态,这个时候虚拟机是不知道本地代码中发生了什么,此时尽管当前线程实际上也是阻塞状态,但实际上显示出来的还是 runnable 状态,这种情况下是不消耗 CPU 的。如下线程堆栈所示</p>    <pre>  <code class="language-java">"Thread-243" prio=1 tid=0xa58f2048 nid=0x7ac2 runnable [0xaeedb000..0xaeedc480]          at java.net.SocketInputStream.socketRead0(Native Method)           at java.net.SocketInputStream.read(SocketInputStream.java:129)           at oracle.net.ns.Packet.receive(Unknown Source)           at oracle.net.ns.DataPacket.receive(Unknown Source)           at oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)           at oracle.net.ns.NetInputStream.read(Unknown Source)           at oracle.jdbc.driver.T4CMAREngine.getNBytes(T4CMAREngine.java:1520)           at oracle.jdbc.driver.T4CMAREngine.unmarshalNBytes()           at oracle.jdbc.driver.T4CLongRawAccessor.readStreamFromWire()           at oracle.jdbc.driver.T4CLongRawAccessor.readStream()           at oracle.jdbc.driver.T4CInputStream.getBytes(T4CInputStream.java:70)           - locked <0x934f4258> (a oracle.jdbc.driver.T4CInputStream)           - locked <0x6b0dd600> (a oracle.jdbc.driver.T4CConnection)           at oracle.jdbc.driver.OracleInputStream.needBytes()           ... ...             at org.hibernate.loader.Loader.list(Loader.java:1577)           at org.hibernate.loader.hql.QueryLoader.list()           at com.wes.timer.TimerTaskImpl.execute(TimerTaskImpl.java:627)           - locked <0x80df8ce8> (a com.wes.timer.TimerTaskImpl)           at com.wes.threadpool.RunnableWrapper.run(RunnableWrapper.java:209)           at com.wes.threadpool.PooledExecutorEx$Worker.run()           at java.lang.Thread.run(Thread.java:595)  </code></pre>    <p>那么 “. “ 和 “. “ 各表示什么含义呢? “. “ 表示当前正在执行类的初始化,”. “ 正在执行对象的构造函数。</p>    <p>除接口外,初始化一个类之前必须保证其直接父类已被初始化,并且该初始化过程是由 JVM 保证线程安全的。并非所有的类都拥有一个 方法,在以下条件中该类不会拥有 方法</p>    <ul>     <li>该类既没有声明任何类变量,也没有静态初始化语句</li>     <li>该类声明了类变量,但没有明确使用类变量初始化语句或静态初始化语句初始化</li>     <li>该类仅包含静态 final 变量的类变量初始化语句,并且类变量初始化语句是编译时常量表达式</li>    </ul>    <h2>2 锁的解读</h2>    <p>多线程中,wait() 和 sleep() 的区别。wait() 和 sleep() 有一个共同点,就是二者都会把当前的线程阻塞住,我们称之为睡眠或者等待。但二者实际上是完全不同的两个函数,有着本质的区别:</p>    <ul>     <li>当线程执行到 wait() 方法上,当前线程会释放监视锁,此时其他线程可以占有该锁,一旦 wait() 方法执行完成,当前线程又继续持有该锁,直到执行完该锁的作用域。<br> wait() 方法退出的原因:      <ul>       <li>达到了等待时间,自动退出。如 wati(5000), 5秒后 wait 方法退出</li>       <li>其他的线程调用了该锁的 notify() 方法。当如果多个线程在等待同一个锁,只有一个线程会被通知到</li>      </ul> </li>     <li>sleep() 与锁操作无关,如果该方法恰好在一个锁的保护范围之内,当前线程及时在执行 sleep() 的时候,仍然继续保持监视锁。该方法实际上仅仅完成等待或者睡眠的语义</li>    </ul>    <p><img src="https://simg.open-open.com/show/46bb32cb2df2ace475a8b95786e73aa4.png"></p>    <p>与锁相关的三个重要信息如下:</p>    <ul>     <li>当一个线程占有一个锁的时候,线程堆栈中会打印 – locked<0x22bffb60></li>     <li>当一个线程正在等待其他线程释放该锁,线程堆栈中会打印 – waiting to lock<0x22bffb60></li>     <li>当一个线程占有一个锁,但有执行到该锁的 wait() 上,线程堆栈中首先打印 locked, 然后又会打印 – waiting on<0x22c03c60></li>    </ul>    <h2>3 线程状态的解读</h2>    <h2>3.1 RUNNABLE</h2>    <p>RUNNABLE 从虚拟机的角度看,线程处于正在运行的状态,正在运行不一定消耗 CPU,比如下面的线程栈,在等待 IO</p>    <pre>  <code class="language-java">Thread-39" daemon prio=1 tid=0x08646590 nid=0x666d runnable [5beb7000..5beb88b8]    java.lang.Thread.State: RUNNABLE        at java.net.SocketInputStream.socketRead0(Native Method)       at java.net.SocketInputStream.read(SocketInputStream.java:129)       at java.io.BufferedInputStream.fill(BufferedInputStream.java:183)       at java.io.BufferedInputStream.read(BufferedInputStream.java:201)       - locked <0x47bfb940> (a java.io.BufferedInputStream)       at org.postgresql.PG_Stream.ReceiveChar(PG_Stream.java:141)       at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:68)       - locked <0x47bfb758> (a org.postgresql.PG_Stream)       at org.postgresql.Connection.ExecSQL(Connection.java:398)  </code></pre>    <p>下面的线程栈则消耗 CPU</p>    <pre>  <code class="language-java">"Thread-444" prio=1 tid=0xa4853568 nid=0x7ade runnable [0xafcf7000..0xafcf8680]    java.lang.Thread.State: RUNNABLE        //实实在在对应 CPU 指令       at org.apache.commons.collections.ReferenceMap.getEntry(Unknown Source)      at org.apache.commons.collections.ReferenceMap.get(Unknown Source)      at org.hibernate.util.SoftLimitMRUCache.get(SoftLimitMRUCache.java:51)      at org.hibernate.engine.query.QueryPlanCache.getNativeSQLQueryPlan()      at org.hibernate.impl.AbstractSessionImpl.getNativeSQLQueryPlan()      at org.hibernate.impl.AbstractSessionImpl.list()      at org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:164)      at com.mogoko.struts.logic.user.LeaveMesManager.getCommentByShopId()      at com.mogoko.struts.action.shop.ShopIndexBaseInfoAction.execute()      ......  </code></pre>    <p>下面的代码正在进行 JNI 本地方法调用,具体是否消耗 CPU,要看 TcpRecvExt 的实现,如果 TcpRecvExt 是纯运算代码,那么实实在在的消耗 CPU,如果 TcpRecvExt() 中存在挂起的代码,那么该线程尽管显示 RUNNABLE,实际也不消耗 CPU 的</p>    <pre>  <code class="language-java">"ClientReceiveThread" daemon prio=1 tid=0x99dbacf8 nid=0x7988 runnable [...]    java.lang.Thread.State: RUNNABLE        at com.pangu.network.icdcomm.htcpapijni.TcpRecvExt(Native Method)       at com.pangu.network.icdcomm.IcdComm.receive(IcdComm.java:60)       at com.msp.client.MspFactory$ClientReceiveThread.task(MspFactory.java:333)       at com.msp.system.TaskThread.run(TaskThread.java:94)  </code></pre>    <h2>3.2 TIMED_WAITING(on object monitor)</h2>    <p>TIMED_WAITING(on object monitor) 表示当前线程被挂起一段时间,说明该线程正在执行 object.wait(int time) 方法,不消耗 CPU</p>    <pre>  <code class="language-java">"JMX server" daemon prio=6 tid=0x0ad2c800 nid=0xdec in Object.wait() [...]    java.lang.Thread.State: TIMED_WAITING (on  object  monitor)      at java.lang.Object.wait(Native Method)      - waiting on <0x03129da0> (a [I)      at com.sun.jmx.remote.internal.ServerComm$Timeout.run(ServerComm.java:150)      - locked <0x03129da0> (a [I)      at java.lang.Thread.run(Thread.java:620)  </code></pre>    <h2>3.3 TIME_WAITING(sleeping)</h2>    <p>TIME_WAITING(sleeping) 表示当前线程被挂起一段时间,正在执行 Thread.sleep(int time) 方法,不消耗 CPU</p>    <pre>  <code class="language-java">"Comm thread" daemon prio=10 tid=0x00002aaad4107400 nid=0x649f waiting on condition      [0x000000004133b000..0x000000004133ba00]           java.lang.Thread.State: TIMED_WAITING (sleeping)               at java.lang.Thread.sleep(Native Method)               at org.apache.hadoop.mapred.Task$1.run(Task.java:282)               at java.lang.Thread.run(Thread.java:619)  </code></pre>    <h2>3.4 TIME_WAITING(parking)</h2>    <p>TIME_WAITING(parking) 表示线程被挂起,正在执行 Thread.sleep(int time) 方法,不消耗 CPU</p>    <pre>  <code class="language-java">"RMI TCP" daemon prio=6 tid=0x0ae3b800 nid=0x958 waiting on condition [0x17eff000..0x17effa94]      java.lang.Thread.State: TIMED_WAITING (parking)          at sun.misc.Unsafe.park(Native Method)          - parking to wait for  <0x02f49f58> (a java.util.concurrent.SynchronousQueue$TransferStack)          at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:179)          at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)          at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)          at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:871)          at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:495)          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:693)          at java.lang.Thread.run(Thread.java:620)  </code></pre>    <p>3.5 WAITING(on object monitor)</p>    <p>WAITING(on object monitor) 表示当前线程被挂起,正在执行 obj.wait() 只能通过 notify() 唤醒,不消耗 CPU</p>    <pre>  <code class="language-java">"IPC Client" daemon prio=10 tid=0x00002aaad4129800 nid=0x649d in Object.wait() [0x039000..0x039d00]         java.lang.Thread.State: WAITING (on object monitor)               at java.lang.Object.wait(Native Method)               - waiting on <0x00002aaab3acad18>; (aorg.apache.hadoop.ipc.Client$Connection)               at java.lang.Object.wait(Object.java:485)               at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:234)               - locked <0x00002aaab3acad18> (aorg.apache.hadoop.ipc.Client$Connection)               at org.apache.hadoop.ipc.Client$Connection.run(Client.java:273)  </code></pre>    <h2>3.6 小结</h2>    <p>处于 TIME_WAITING、WAITING 状态的线程不消耗 CPU,处于 RUNNABLE 状态的线程要结合线程代码性质判断,是否消耗 CPU</p>    <ul>     <li>如果是纯 Java 运算代码,消耗 CPU</li>     <li>如果是网络 IO,很少消耗 CPU</li>     <li>如果是本地代码,结合本地代码性质进行判断(通过 pstack/gstack 获取本地线程栈),如果是纯运算代码,消耗 CPU,如果被挂起,不消耗 CPU,如果是 IO,则不怎么消耗 CPU</li>    </ul>    <h2>4 接触线程堆栈进行问题分析</h2>    <p>线程堆栈翻译系统当前时间正在执行什么代码。根据这些信息就可以知道系统当前到底在做什么。看堆栈一般从三个视角来分析:堆栈的局部信息、一次堆栈的统计信息(全局信息)、多个堆栈的对比信息</p>    <ol>     <li>视角一 从一次的堆栈信息中,我们能直接获取以下直接的信息:<br> a. 当前每一个线程的调用层次关系(调用上下文),即每个线程当前正在调用哪些函数<br> b. 当前每个线程当前的状态:持有了哪些锁?在等待哪些锁?</li>     <li>视角二 从一次的堆栈信息中,我们还可以获取下面的统计方面信息:<br> a. 当前锁的争用情况: <pre>  <code class="language-java">i. 是不是很多线程在等待同一个锁,如果很多线程等待同一个锁,说明这个系统已经出现了性能瓶颈,并导致了锁竞争。还可能是某个线程长时间持有一个锁不释放(比如这个线程正陷入了死循环的代码或者正在请求一个资源,很长时间得不到唤醒)  ii. 是否有死锁,哪些线程形成了死锁</code></pre> b. 当前大多数线程在干什么,即正在执行什么代码<br> c. 当前线程总数量</li>     <li>视角三 从多次(即前后打印多次堆栈进行对比)的堆栈信息中,我们还可以获得下面的统计对比方面的信息:<br> a. 一个线程是否在长期执行。如果每次打印的堆栈,某一个线程一直处于同样的调用上下文中,那么说明这个线程一直在执行这段代码,此时就要根据代码逻辑检查,这种长期执行是否合理?<br> b. 某个线程是否存在长期获取不到锁的情况?线程是不是永远得不到唤醒?如果每次打印的堆栈,某一个线程一直在等待一个锁,那么就需要检查占有这个锁的线程为什么不释放锁?</li>    </ol>    <p>通过多个视角进行观察,线程堆栈在定位如下类型的问题上非常有帮助:</p>    <ul>     <li>线程死锁分析(视角一)</li>     <li>Java 代码导致的 CPU 过高分析(视角三)</li>     <li>死循环分析(视角三)</li>     <li>资源不足分析(视角二)</li>     <li>性能瓶颈分析(视角二和视角三)</li>    </ul>    <h2>4.1 线程死锁分析</h2>    <p>死锁的原因:当两个或多个线程正在等待被对方占有的锁,死锁就会发生。死锁会导致两个线程无法继续运行,被永远挂起。下图中在时间点0,线程 0 占有了 lock0,线程1占有了 lock1,然后时间点 2 线程0 企图获取 lock 1,线程 1 企图获取 lock 0.由于这两个线程互相要等待被对方占有的锁,自己才能继续,因此造成死锁,二者永远没有机会继续运行下去。</p>    <p><img src="https://simg.open-open.com/show/a0dde9f0d1645a89cb1d649aaf2b01b2.png"></p>    <p>一个简单的例子</p>    <pre>  <code class="language-java">package MyPackage;  public class Main {   public static void main(String[] args) {    Object lockobj1 = new Object();    Object lockobj2 = new Object();    TestThread1 thread1 = new TestThread1(lockobj1,lockobj2);    thread1.start();    TestThread2 thread2 = new TestThread2(lockobj1,lockobj2);    thread2.start();   }  }                        package MyPackage;  public class TestThread1  extends Thread{   Object lock1 = null;   Object lock2 = null;   public TestThread1(Object lock1_,Object lock2_)   {    lock1 = lock1_;    lock2 = lock2_;    this.setName(this.getClass().getName());   }   public void run()   {    fun();   }   public void fun(){    synchronized(lock1){     try{      Thread.sleep(2);     }     catch(Exception e){      e.printStackTrace();     }     synchronized(lock2){     }       }   }  }                        package MyPackage;  public class TestThread2  extends Thread{   Object lock1 = null;   Object lock2 = null;   public TestThread2(Object lock1_,Object lock2_)   {    lock1 = lock1_;    lock2 = lock2_;    this.setName(this.getClass().getName());   }   public void run()   {    fun();   }   public void fun(){    synchronized(lock2){     try{      Thread.sleep(2);     }     catch(Exception e){      e.printStackTrace();     }     synchronized(lock1){     }    }   }  }  </code></pre>    <p>jstack 得到结果如下(截取后的堆栈信息)</p>    <pre>  <code class="language-java">Found one Java-level deadlock:  =============================  "MyPackage.TestThread2":    waiting to lock monitor 0x00007fa1b00062c8 (object 0x0000000758c4aa78, a java.lang.Object),    which is held by "MyPackage.TestThread1"  "MyPackage.TestThread1":    waiting to lock monitor 0x00007fa1b0004ed8 (object 0x0000000758c4aa88, a java.lang.Object),    which is held by "MyPackage.TestThread2"       Java stack information for the threads listed above:  ===================================================  "MyPackage.TestThread2":    at MyPackage.TestThread2.fun(TestThread2.java:24)   - waiting to lock <0x0000000758c4aa78> (a java.lang.Object)   - locked <0x0000000758c4aa88> (a java.lang.Object)   at MyPackage.TestThread2.run(TestThread2.java:13)  "MyPackage.TestThread1":     at MyPackage.TestThread1.fun(TestThread1.java:24)   - waiting to lock <0x0000000758c4aa88> (a java.lang.Object)   - locked <0x0000000758c4aa78> (a java.lang.Object)     at MyPackage.TestThread1.run(TestThread1.java:13)      Found 1 deadlock.  </code></pre>    <p>要避免死锁的问题,唯一的办法就是修改代码。一个可靠的并发系统可以说是设计出来的,而不是通过改 Bug 改出来的,这一点与其他类型的 Bug 有很大的不同。另外,死锁的两个或多个线程是不消耗 CPU 的,有的人认为 CPU 100% 的使用率是线程死锁造成的,这个说法是不对的。无限循环(死循环),并且循环中代码都是 CPU 密集型,才有可能导致 CPU 的 100% 使用率,像 socket 或者数据库等 IO 操作是不怎么消耗 CPU 的。</p>    <h2>4.2 Java 代码死循环等导致的 CPU 过高分析</h2>    <p>当系统负载大的时候,CPU 的使用率会较高,但是不正确的代码也会导致 CPU 过高,比如死循环。当发生 CPU 过高的问题,我们需要能够分析 CPU 高的真正原因。通过多次打印堆栈,前后堆栈对比找到一直在运行的线程,这些线程都是可疑的线程,具体的步骤如下:</p>    <ul>     <li>通过 jstack 等获取第一次的堆栈信息</li>     <li>等待一段时间,再获取第二次堆栈信息</li>     <li>预处理两次堆栈信息,首先去掉处于 sleeping 或者 waiting 状态的线程,因为这种线程是不消耗 CPU 的</li>     <li>比较第一次堆栈和第二次堆栈预处理后的线程,找出这段时间一直活跃的线程,如果两次堆栈中同一个线程处于同样的调用上下文,那么就应该列为重点怀疑对象。结合代码进行检查</li>    </ul>    <p>如果通过堆栈定位,没有发现热点代码段,那么 CPU 过高可能是不恰当的内存设置导致的频繁 GC,而从导致 CPU 过高</p>    <p>第一次 jstack 的输出如下:</p>    <pre>  <code class="language-java">"Thread-444" prio=1 tid=0xa4853568 nid=0x7ade runnable [0xafcf7000..0xafcf8680]          at org.apache.commons.collections.ReferenceMap.getEntry(Unknown Source)           at org.apache.commons.collections.ReferenceMap.get(Unknown Source)           at org.hibernate.util.SoftLimitMRUCache.get(SoftLimitMRUCache.java:51)           at org.hibernate.engine.query.QueryPlanCache.getNativeSQLQueryPlan()           at org.hibernate.impl.AbstractSessionImpl.getNativeSQLQueryPlan()           at org.hibernate.impl.AbstractSessionImpl.list()           at org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:164)           at com.mogoko.struts.logic.user.LeaveMesManager.getCommentByShopId()           at com.mogoko.struts.action.shop.ShopIndexBaseInfoAction.execute()           ......  </code></pre>    <p>第二次 jstack 的输出如下:</p>    <pre>  <code class="language-java">"Thread-444" prio=1 tid=0xa4853568 nid=0x7ade runnable [0xafcf7000..0xafcf8680]          at org.apache.commons.collections.ReferenceMap.getEntry(Unknown Source)           at org.apache.commons.collections.ReferenceMap.get(Unknown Source)           at org.hibernate.util.SoftLimitMRUCache.get(SoftLimitMRUCache.java:51)           at org.hibernate.engine.query.QueryPlanCache.getNativeSQLQueryPlan()           at org.hibernate.impl.AbstractSessionImpl.getNativeSQLQueryPlan()           at org.hibernate.impl.AbstractSessionImpl.list()           at org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:164)           at com.mogoko.struts.logic.user.LeaveMesManager.getCommentByShopId()           at com.mogoko.struts.action.shop.ShopIndexBaseInfoAction.execute()           ... ...  </code></pre>    <p>在长达 5 分钟的时间里,一直在执行 org.apache.commons.collections.ReferenceMap.getEntry() 方法,说明这个方法一直没有结束。具体属于正常还是 Bug,需要结合源码进行判断。</p>    <p>导致死循环的代码属于代码的 Bug,这种类型的问题,重现比较难,但一旦重现,就比较容易解决。一般通过分析代码可以发现问题。导致死循环的原因大致有如下几个:</p>    <ul>     <li>HashMap 等线程不安全的容器,用在多线程读/写的场合,导致 HashMap 的方法调用形成死循环(可以参考2015-08-19 Hadoop升级过程出现异常问题casestudy)</li>     <li>多线程场合,对共享变量没有进行保护,导致数据混乱,从而使循环退出条件永远不满足,导致死循环的发生,比如:      <ul>       <li>for, while 循环中的退出条件永远不满足导致的死循环</li>       <li>链表等数据结构首尾相连,导致遍历永远无法停止</li>      </ul> </li>     <li>其他编码错误</li>    </ul>    <h2>4.3 高消耗 CPU 代码的常用分析方法</h2>    <p>借助操作系统提供的性能分析工具进行 CPU 消耗分析。上面介绍了针对死循环导致 CPU 遍告的分析方法,但是有些场景没有死循环也会导致 CPU 变高,我们需要借助系统提供的一些性能分析工具进行分析。在 Linux 中使用 top -p $pid 进行,步骤如下</p>    <ol>     <li>top -p $pid</li>     <li>输入 “H” 查看该进程所有线程的统计情况(CPU 等)如下所示(对于某些操作系统版本不支持 top 命令进行线程统计的</li>    </ol>    <pre>  <code class="language-java">下面 PID 表示线程 ID,即 LWPID    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND    25059 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25060 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.04 java    25061 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25062 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25063 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25064 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25065 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25066 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25067 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25068 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25069 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:01.19 java    25070 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25071 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25072 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25073 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25074 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25075 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java    25076 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:24.31 java    25077 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:01.00 java    25078 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:01.05 java    25141 sankuai   20   0 9918m  55m 8264 S  0.0  0.2   0:00.00 java  </code></pre>    <p>具体导致问题的代码可能是:</p>    <ol>     <li>纯 Java 代码导致的 CPU 过高</li>     <li>Java 代码中调用的 JNI 代码导致的 CPU 过高</li>     <li>虚拟机自身的代码导致的 CPU 过高,比如 GC 的 bug 等</li>    </ol>    <p>可以如下进行分析:</p>    <ol>     <li>top -p $pid 获取最消耗 CPU 的本地线程 ID</li>     <li>jstack 打印出 Java 堆栈</li>     <li>pstack 打印本地线程堆栈</li>     <li>在 Java 线程堆栈中查找 nid=<第一步中最消耗 CPU 时间的线程 id><br> a. 如果找到的线程是纯 Java 代码,则该 Java 代码导致 CPU 过高<br> b. 如果找到的是 Native code,说明导致 CPU 过高的问题在 JNI 调用中<br> c. 如果找不到对于的线程,则有如下两种可能: <pre>  <code class="language-java">i. JNI 调用中重新创建的线程来执行,在 Java 线程堆栈中不存在该线程的信息  ii. 虚拟机自身的代码导致的 CPU 过高,如堆内存不够导致的频繁 fullgc,或虚拟机的 bug 等</code></pre> </li>    </ol>    <h2>4.4 资源不足等导致的性能下降分析</h2>    <p>大多数时候资源不足和性能瓶颈是统一类问题。</p>    <ul>     <li>大量的线程停在同样的调用上下文上</li>     <li>资源数量配置太少,系统当前压力比较大,资源不足导致某些线程不能及时获得资源而等待</li>     <li>获得资源的线程把持资源时间太久(比如获取数据库连接),导致资源不足</li>     <li>设计不合理导致资源占用时间过久,如 SQL 语句设计不恰当,或者没有索引导致的数据库访问太慢等</li>     <li>资源用完后,某些异常情况下,没有关闭或回收,导致可用资源泄漏或减少</li>    </ul>    <h2>4.5 线程不退出导致的系统挂死分析</h2>    <p>线程挂死,表现为每次打印线程堆栈,该线程都在同一个调用上下文,该类问题可以通过多次打印堆栈进行分析,导致线程挂死的原因有很多,如:</p>    <ul>     <li>线程正在执行死循环代码</li>     <li>资源不足或资源泄漏,造成当前线程阻塞在锁对象上,长期得不到唤醒</li>     <li>如果当前程序和外部通信,当外部程序挂起无返回时,也会导致当前线程挂起</li>    </ul>    <h2>4.6 其他问题分析</h2>    <ul>     <li>多个锁导致的锁链分析 – 多个线程等待同一个锁</li>     <li>进行性能瓶颈分析</li>    </ul>    <h2>4.7 线程堆栈不能分析什么问题</h2>    <p>线程堆栈定位问题,只能定位在当前线程上留下痕迹的问题,如线程死锁,线程挂死等,另外,定位由于锁的设计不恰当导致的性能问题,线程堆栈也是最有效的工具,因为性能问题时时刻刻反映在当前的线程统计状况上。但线程堆栈对于不留痕迹的问题,就无能为力了。例如:</p>    <ul>     <li>线程为什么跑飞</li>     <li>并发的 Bug 导致的数据混乱</li>     <li>数据库梭镖的问题</li>     <li>在实际的系统中,系统的问题分为几种类型:</li>    </ul>    <p>堆栈中能够表现出问题的,使用线程堆栈进行定位</p>    <ul>     <li>无法在线程中留下痕迹的问题定位,需要依赖一个好的日志设计</li>     <li>非常隐蔽的问题,只能依赖于丰富的代码经验,如多线程导致的数据混乱以及幽灵代码等</li>    </ul>    <p> </p>    <p>来自:http://www.klion26.com/2018/01/06/线程堆栈分析/</p>    <p> </p>