使用dotnet-dump 查找 .net core 3.0 占用CPU 100%的原因解析
公司的产品一直紧跟.netcore3.0preview不断升级,部署到Linux服务器后,偶尔会出现某个进程CPU占用100%.
由于服务部署在云上,不能使用远程调试;在局域网内的Linux服务器或Windows开发机上又不能重现这个问题,联想到Java的jstack,很是羡慕啊.想到.netcore已经出来这么久了,还是试着找找看吧,结果还真找到一篇博客Introducingdiagnosticsimprovementsin.NETCore3.0
这篇文章介绍了3个工具
•dotnet-counters:实时统计runtime的状况,包括CPU、内存、GC、异常等
•dotnet-trace:类似性能探测器
•dotnet-dump:程序崩溃时使用该工具
这次使用的是dotnet-dump,即使程序没有崩溃,也可以dump程序快照,用于分析
实验环境
ubuntu-16.04.5-desktop-amd64
SDK3.0.100-preview6-012264
1.新建一个简单Console程序(只能是.netcore3.0的程序,不支持.netcore2.2),模拟CPU占用100%的情况
mkdirNetCoreDumpTest&&cdNetCoreDumpTest dotnetnewconsole
编辑Program.cs
namespaceNetCoreDumpTest { usingSystem; usingSystem.Threading.Tasks; classProgram { staticvoidMain(string[]args) { Task.Factory.StartNew(()=>PrintNumber("Print",5)); Console.WriteLine("Pressanykeytoexit."); Console.ReadKey(); } staticvoidPrintNumber(stringmessage,intstartNumber) { varnumber=startNumber; while(true) Console.WriteLine($"{message}{number++}"); } } }
2.安装dotnet-dump
dotnettoolinstall--globaldotnet-dump--version1.0.4-preview6.19311.1
提示
Ifyouareusingbash,youcanaddittoyourprofilebyrunningthefollowingcommand:
cat<<\EOF>>~/.bash_profile
#Add.NETCoreSDKtools
exportPATH="$PATH:/home/****/.dotnet/tools"
EOF
Youcanaddittothecurrentsessionbyrunningthefollowingcommand:
exportPATH="$PATH:/home/****/.dotnet/tools"
Youcaninvokethetoolusingthefollowingcommand:dotnet-dump
Tool'dotnet-dump'(version'1.0.4-preview6.19311.1')wassuccessfullyinstalled.
建议将$HOME/.dotnet/tools加入到PATH,好吧,照着做吧,记得使用下面的命令使设置立即生效
source~/.bash_profile
3.使用dotnetNetCoreDumpTest.dll启动我们的问题程序,然后使用 ps-ef|grepdotnet 查看程序的进程ID,可以看到进程ID是3411
ps-ef|grepdotnet z*****e341114642207:51pts/800:00:59dotnetNetCoreDumpTest.dll z*****e34312935007:55pts/900:00:00grep--color=autodotnet
针对进程3411,我们还需要知道是哪个线程占CPU,使用top-Hp3411可以列出所有线程,由于top每隔3秒刷新一次,所以可能需要多观察几秒才能看到具体是哪个线程占用CPU比较高,这里我们可以看到是PID=3418的线程(Linux的进程ID和线程ID请自行了解一下).
top-Hp3411 PIDUSERPRNIVIRTRESSHRS%CPU%MEMTIME+COMMAND 3418z*****e20029977002906022400R10.31.40:20.68dotnet 3411z*****e20029977002906022400S0.01.40:00.11dotnet 3412z*****e20029977002906022400S0.01.40:00.02dotnet 3413z*****e20029977002906022400S0.01.40:00.00dotnet 3414z*****e20029977002906022400S0.01.40:00.00dotnet 3415z*****e20029977002906022400S0.01.40:00.01dotnet 3416z*****e20029977002906022400S0.01.40:00.00dotnet 3417z*****e20029977002906022400S0.01.40:00.00dotnet 3421z*****e20029977002906022400S0.01.40:00.00dotnet
获取dump,只能正对进程进行dump,所以我们输入的是3411
dotnet-dumpcollect-p3411 Writingminidumpwithheapto/tmp/core_20190623_075649 Complete
4.分析
dotnet-dumpanalyzecore_20190623_075649
使用clrthreads查看所有线程
>clrthreads
ThreadCount: 4
UnstartedThread: 0
BackgroundThread:3
PendingThread: 0
DeadThread: 0
HostedRuntime: no
Lock
DBG IDOSIDThreadOBJ StateGCMode GCAllocContext Domain CountAptException
0 1 d530000000001307D80 20020Preemptive 0000000000000000:000000000000000000000000013064501 Ukn
4 2 d57000000000135BBD0 21220Preemptive 0000000000000000:000000000000000000000000013064500 Ukn(Finalizer)
6 3 d5900007F666C0009F0 1020220Preemptive 0000000000000000:000000000000000000000000013064500 Ukn(ThreadpoolWorker)
7 4 d5a000000000130DA40 1021220Preemptive 00007F6678106860:00007F6678106F2000000000013064501 Ukn(ThreadpoolWorker)
我们关心的线程3418的16进制是d5a,也就是最后一行,它的DBG是7,我们需要使用setthread7,将其设置为 当前操作的线程
然后使用clrstack获取线程调用信息
>setthread7 >clrstack OSThreadId:0xd5a(7) ChildSPIPCallSite 00007F671556155800007f671a2bd4bd[InlinedCallFrame:00007f6715561558]Interop+Sys.Write(System.Runtime.InteropServices.SafeHandle,Byte*,Int32) 00007F671556155800007f669f669a9e[InlinedCallFrame:00007f6715561558]Interop+Sys.Write(System.Runtime.InteropServices.SafeHandle,Byte*,Int32) 00007F671556154000007F669F669A9EILStubClass.IL_STUB_PInvoke 00007F67155615E000007F669F67333ESystem.ConsolePal.Write(Microsoft.Win32.SafeHandles.SafeFileHandle,Byte*,Int32,Boolean) 00007F67155616A000007F669F67360CSystem.ConsolePal.Write(Microsoft.Win32.SafeHandles.SafeFileHandle,Byte[],Int32,Int32,Boolean)[/_/src/System.Console/src/System/ConsolePal.Unix.cs@1236] 00007F67155616C000007F669F672B2ASystem.IO.StreamWriter.Flush(Boolean,Boolean)[/_/src/System.Private.CoreLib/shared/System/IO/StreamWriter.cs@261] 00007F671556171000007F669F6729F3System.IO.StreamWriter.WriteLine(System.String)[/_/src/System.Private.CoreLib/shared/System/IO/StreamWriter.cs@474] 00007F671556176000007F669F6727D3System.IO.TextWriter+SyncTextWriter.WriteLine(System.String)[/_/src/System.Private.CoreLib/shared/System/IO/TextWriter.cs@891] 00007F67155617A000007F669F672770System.Console.WriteLine(System.String)[/_/src/System.Console/src/System/Console.cs@550] 00007F67155617C000007F669F663791NetCoreDumpTest.Program.PrintNumber(System.String,Int32)[/home/zhouke/NetCoreDumpTest/Program.cs@18] 00007F671556180000007F669F6636D9NetCoreDumpTest.Program+<>c.b__0_0() 00007F671556182000007F669F1872A1System.Threading.Tasks.Task.InnerInvoke()[/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs@2466] 00007F671556184000007F669F18CBC2System.Threading.Tasks.Task+<>c.<.cctor>b__274_0(System.Object)[/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs@2445] 00007F671556185000007F669F171AF2System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread,System.Threading.ExecutionContext,System.Threading.ContextCallback,System.Object)[/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs@289] 00007F671556189000007F669F187111System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.TaskByRef,System.Threading.Thread)[/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs@2406] 00007F671556191000007F669F186F28System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread)[/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs@2344] 00007F671556193000007F669F186EBBSystem.Threading.Tasks.Task.ExecuteFromThreadPool(System.Threading.Thread) 00007F671556194000007F669F17B754System.Threading.ThreadPoolWorkQueue.Dispatch()[/_/src/System.Private.CoreLib/shared/System/Threading/ThreadPool.cs@663] 00007F67155619C000007F669F169A5BSystem.Threading._ThreadPoolWaitCallback.PerformWaitCallback()[/_/src/System.Private.CoreLib/src/System/Threading/ThreadPool.CoreCLR.cs@29] 00007F6715561D5000007f6718a1ccaf[DebuggerU2MCatchHandlerFrame:00007f6715561d50]
哗啦啦一大片,有点Java调用堆栈的味道,不过我们还是找到了我们的问题代码
NetCoreDumpTest.Program.PrintNumber(System.String,Int32)
有时候我们想知道传入的什么参数导致CPU占用高,可以给clrstack加上参数-a
>clrstack-a .............. 00007F0DD6FFC7C000007F0D6EEF3791NetCoreDumpTest.Program.PrintNumber(System.String,Int32)[/home/zhouke/NetCoreDumpTest/Program.cs@18] PARAMETERS: message(0x00007F0DD6FFC7E8)=0x00007f0d4800b8b0 startNumber(0x00007F0DD6FFC7E4)=0x0000000000000005 LOCALS: 0x00007F0DD6FFC7E0=0x000000000014e42b 0x00007F0DD6FFC7DC=0x0000000000000001 ...............
可以看到PARAMETERS里,startNumber作为值类型,可以直接看到数值为5,而message是引用类型,指向0x00007f0d4800b8b0,这时候需要用到dumpobj命令
>dumpobj0x00007f0d4800b8b0 Name:System.String MethodTable:00007f0d6ef70f90 EEClass:00007f0d6eede1c0 Size:32(0x20)bytes File:/home/zhouke/dotnet/shared/Microsoft.NETCore.App/3.0.0-preview6-27804-01/System.Private.CoreLib.dll String:Print Fields: MTFieldOffsetTypeVTAttrValueName 00007f0d6ef6a138400022b8System.Int321instance5_stringLength 00007f0d6ef66f38400022ccSystem.Char1instance50_firstChar 00007f0d6ef70f90400022d108System.String0static00007f0d47fff360Empty
好了,可以看到它是一个字符串,内容为"Print"
假如message是一个复杂类型,可以查看Fields下面的信息进一步查看
clrstack还有一个实验性质的参数-i,协助查看各种变量信息,需要用到lldb,按照官方教程,我暂时没有实验成功.
查看进程ID和线程ID,更方便的方法是htop(需要安装),然后按F4进行过滤,输入dotnet即可
这张图是重新运行问题程序的结果,进程ID和线程ID与前面不一样
第二行白色的是进程ID=1650,第一行CPU占用高,是问题线程ID=1658
总结
以上所述是小编给大家介绍的使用dotnet-dump查找.netcore3.0占用CPU100%的原因解析,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对毛票票网站的支持!
如果你觉得本文对你有帮助,欢迎转载,烦请注明出处,谢谢!