Symptoms

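  • The Java web server suddenly died with no JVM-related log output; after a restart it quickly died again.
  • After repeated restarts, the machine itself soon went down (the machine is a virtual machine).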

 

Related logs

  • After starting the machine again from the host, dmesg showed the following related messages:

      [46019.223344] 3065881 pages non-shared
      [46019.223348] Out of memory: kill process 16211 (java) score 1135790 or a child
      [46019.225305] Killed process 16211 (java)
      [46019.293729] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
      [46019.293734] java cpuset=/ mems_allowed=0
      [46019.293750] Pid: 2187, comm: java Not tainted 2.6.32-5-amd64 #1
      [46019.293752] Call Trace:
      [46019.293761]  [<ffffffff810b643c>] ? oom_kill_process+0x7f/0x23f
      [46019.293765]  [<ffffffff8106bb5e>] ? timekeeping_get_ns+0xe/0x2e
      [46019.293768]  [<ffffffff810b6960>] ? __out_of_memory+0x12a/0x141
      [46019.293771]  [<ffffffff810b6ab7>] ? out_of_memory+0x140/0x172
      [46019.293775]  [<ffffffff810ba81c>] ? __alloc_pages_nodemask+0x4ec/0x5fc
      [46019.293780]  [<ffffffff810bbd85>] ? __do_page_cache_readahead+0x9b/0x1b4
      [46019.293784]  [<ffffffff810bbeba>] ? ra_submit+0x1c/0x20
      [46019.293787]  [<ffffffff810b4b87>] ? filemap_fault+0x17d/0x2f6
      [46019.293793]  [<ffffffff810cab26>] ? __do_fault+0x54/0x3c3
      [46019.293796]  [<ffffffff810cce7a>] ? handle_mm_fault+0x3b8/0x80f
      [46019.293801]  [<ffffffff812ff306>] ? do_page_fault+0x2e0/0x2fc
      [46019.293805]  [<ffffffff812fd1a5>] ? page_fault+0x25/0x30
    
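  • This shows that the process was using too much memory and was forcibly killed by the kernel's OOM killer.

  • Our guess as to why the machine itself went down: the process's memory usage rose so quickly that the system was dragged down before the OOM killer could act.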
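To check a host for earlier OOM kills without reading all of dmesg by eye, a small filter over the kernel log is enough. The sketch below is a minimal example, assuming the 2.6.32-era message format shown above (`Killed process <pid> (<name>)`); newer kernels prefix "Out of memory: " and append fields such as `total-vm:`, which the pattern tolerates. The function name `find_oom_kills` is hypothetical.

```python
import re

# Matches kill lines such as "[46019.225305] Killed process 16211 (java)".
# Newer kernels prefix "Out of memory: " and append fields like "total-vm:...";
# the lazy ".*?" and the unanchored end tolerate both. The earlier
# "Out of memory: kill process ... score ..." decision line (lowercase "kill")
# is deliberately not matched.
KILLED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*?Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
)


def find_oom_kills(lines):
    """Return (seconds_since_boot, pid, name) for each OOM-killer kill in dmesg lines."""
    kills = []
    for line in lines:
        m = KILLED.search(line)
        if m:
            kills.append((float(m.group("ts")), int(m.group("pid")), m.group("name")))
    return kills
```

On the host, something like `find_oom_kills(subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines())` would list the kills; reading dmesg may require root where `kernel.dmesg_restrict` is set.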

3. POS software system: front-end cashier management (C/S)

Initial analysis and plan

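  • We tried to reproduce the problem, attaching jdb to the Java process for remote debugging. After much trial and error we found that a paginated data-fetching endpoint would intermittently trigger the problem. Watching the machine's memory with vmstat, we saw that when the problem occurred, free memory dropped very fast, about 100 MB per second, so all of the machine's memory was gone within tens of seconds.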

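  • The Java process had -Xmx set to 6G and had the machine to itself. The machine has 11 GB of memory.

  • From past experience, when a process's memory usage far exceeds -Xmx, JNI is a likely culprit. We reviewed every place in the project that uses JNI but found no clear lead.

  • Neither a heap dump nor jstack output taken from the Java process while memory was falling sharply showed anything abnormal.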

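  • Given how fast memory was being consumed, we decided to use strace to capture the system calls made during the failure, with this command: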

      strace -f -t -T -e trace=all -p 20390 2>&1 | tee -a 20390.strace.log
    
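  • When the failure occurs, the process has to be killed quickly so that it does not take the whole system down with it. gdb, however, can suspend the process and let us inspect thread stacks, which helps the analysis.

  • After trying this, we realized that strace and gdb cannot attach to the same process at the same time (a process can have only one ptrace tracer), so the analysis plan became:

    • When the failure occurs, first use strace to log system calls for about ten seconds.
    • Stop strace, then attach gdb to the process to suspend it, which both stops the memory consumption and makes analysis possible.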
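The two-step plan above can be scripted so that the hand-off from strace to gdb happens quickly once memory starts to drop. The Python sketch below is one way to do it, under these assumptions: strace and gdb are on PATH, the script runs with enough privileges to ptrace the JVM, and the names `strace_cmd`, `gdb_cmd` and `capture_then_suspend` are hypothetical. Because a process can have only one ptrace tracer at a time, strace must detach before gdb can attach; strace detaches from its tracees when it receives SIGTERM.

```python
import subprocess
import time


def strace_cmd(pid, log_path):
    """strace arguments matching the command above, but writing to a file via -o."""
    # -f: follow all threads, -t: wall-clock timestamps, -T: time spent in each call
    return ["strace", "-f", "-t", "-T", "-e", "trace=all", "-o", log_path, "-p", str(pid)]


def gdb_cmd(pid):
    """Attaching gdb stops every thread of the target, which halts the memory growth."""
    return ["gdb", "-p", str(pid)]


def capture_then_suspend(pid, seconds=10.0):
    log_path = f"{pid}.strace.log"
    tracer = subprocess.Popen(strace_cmd(pid, log_path))
    try:
        time.sleep(seconds)  # roughly ten seconds of syscalls, as in the plan
    finally:
        tracer.terminate()  # SIGTERM makes strace detach from the JVM
        tracer.wait()
    # Only one ptrace tracer is allowed, so gdb attaches after strace has exited.
    # This opens an interactive gdb session; the JVM stays stopped until gdb
    # continues or detaches.
    subprocess.run(gdb_cmd(pid))
    return log_path
```

For example, `capture_then_suspend(20390)` would target the PID used in the strace command above.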

      1. Front-end cashier functions: see the figure.

Reproduction and analysis

  • Following this plan, we captured the abnormal system calls made during the failure:

      [pid 21832] 17:15:26 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
      [pid 21751] 17:15:26 futex(0x42564324, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1419585326, 152497000}, ffffffff <unfinished ...>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1e5d000, 32768, PROT_READ|PROT_WRITE <unfinished ...>
      [pid 21832] 17:15:26 <... clock_gettime resumed> {19389, 378140695}) = 0 <0.000111>
      [pid 21747] 17:15:26 <... mprotect resumed> ) = 0 <0.000096>
      [pid 21832] 17:15:26 clock_gettime(CLOCK_MONOTONIC, {19389, 378329426}) = 0 <0.000045>
      [pid 21832] 17:15:26 clock_gettime(CLOCK_MONOTONIC, {19389, 378442545}) = 0 <0.000044>
      [pid 21832] 17:15:26 gettimeofday({1419585326, 103206}, NULL) = 0 <0.000046>
      [pid 21832] 17:15:26 futex(0x7ff9b878d1e4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1419585326, 153206000}, ffffffff <unfinished ...>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1e65000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000060>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1e6d000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000045>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1e75000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000062>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1e7d000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000043>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1e85000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000056>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1e8d000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000104>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1e95000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000044>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1e9d000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000062>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1ea5000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000044>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1ead000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000055>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1eb5000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000057>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1ebd000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000045>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1ec5000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000045>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1ecd000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000043>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1ed5000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000044>
      [pid 21747] 17:15:26 mprotect(0x7ff8e1edd000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000055>
    
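  • mprotect is called very frequently (compared with the system calls seen outside the failure), and all of these calls come from thread 21747. According to the documentation, each call here makes another 32 KB of memory available to malloc: the length argument is 32768, and consecutive addresses are 0x8000 (32 KB) apart.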

  • So after suspending the process with gdb, we looked at the thread corresponding to 21747 and ran bt to get its stack:

      Breakpoint 1, 0x00007ff9e12b14e0 in mprotect () from /lib/x86_64-linux-gnu/libc.so.6
      (gdb) bt
      #0  0x00007ff9e12b14e0 in mprotect () from /lib/x86_64-linux-gnu/libc.so.6
      #1  0x00007ff9e1254671 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
      #2  0x00007ff9e1255b90 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
      #3  0x00007ff9e0cd83f8 in os::malloc(unsigned long) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #4  0x00007ff9e07f5f8c in ChunkPool::allocate(unsigned long) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #5  0x00007ff9e07f572a in Chunk::operator new(unsigned long, unsigned long) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #6  0x00007ff9e07f5d11 in Arena::grow(unsigned long) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #7  0x00007ff9e0cbffa8 in Node::out_grow(unsigned int) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #8  0x00007ff9e07c63e1 in Node::add_out(Node*) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #9  0x00007ff9e0c4b785 in PhaseIdealLoop::clone_loop(IdealLoopTree*, Node_List&, int, Node*) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #10 0x00007ff9e0c4fa59 in PhaseIdealLoop::partial_peel(IdealLoopTree*, Node_List&) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #11 0x00007ff9e0c32dac in IdealLoopTree::iteration_split_impl(PhaseIdealLoop*, Node_List&) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #12 0x00007ff9e0c33000 in IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #13 0x00007ff9e0c32f68 in IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #14 0x00007ff9e0c41095 in PhaseIdealLoop::build_and_optimize(bool, bool) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #15 0x00007ff9e097134f in Compile::Optimize() () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #16 0x00007ff9e096de84 in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #17 0x00007ff9e08f0d3e in C2Compiler::compile_method(ciEnv*, ciMethod*, int) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #18 0x00007ff9e09786aa in CompileBroker::invoke_compiler_on_method(CompileTask*) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #19 0x00007ff9e0977f95 in CompileBroker::compiler_thread_loop() () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #20 0x00007ff9e0df0539 in compiler_thread_entry(JavaThread*, Thread*) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #21 0x00007ff9e0de9a41 in JavaThread::run() () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #22 0x00007ff9e0ce0d1f in java_start(Thread*) () from /global/install/jdk1.6.0_35/jre/lib/amd64/server/libjvm.so
      #23 0x00007ff9e176eb50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      #24 0x00007ff9e12b4a7d in clone () from /lib/x86_64-linux-gnu/libc.so.6
      #25 0x0000000000000000 in ?? ()
      (gdb) cont
      Continuing.
    
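  • From the function names in the frames, such as CompileBroker::compiler_thread_loop() and Compile::Optimize(), we can infer that this is related to the JVM's JIT compilation and optimization logic (the frames belong to C2Compiler, HotSpot's server compiler). From prior knowledge, we know that whether the JVM compiles a method depends on factors such as how many times it has been invoked.

  • After setting a breakpoint on mprotect and continuing (cont), execution kept hitting the same breakpoint, no matter how many times we continued.

  • At this point we were fairly sure it was a JVM bug, so we checked the JDK version on the machine: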

      java -version
      java version "1.6.0_35"
      Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
      Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)
    
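  • The latest 1.6 release is 1.6.0_45, so rather than first tracking down exactly where the JVM bug is, we upgraded.

  • Whether the upgrade fixes the problem: stay tuned for the next installment :)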
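The mprotect pattern in the strace excerpt can also be confirmed mechanically instead of by eye: count mprotect calls per thread, add up the bytes they make writable, and convert the busiest thread ID to hex. Under `strace -f`, the `[pid N]` prefix is the kernel thread ID (LWP); jstack prints the same ID in hex as `nid=0x...`, and gdb's `info threads` lists it as `LWP N`, which is how thread 21747 can be matched across tools. This is a minimal sketch assuming the `-f -t -T` line format shown above; `summarize_mprotect` and `jstack_nid` are hypothetical helpers.

```python
import re
from collections import Counter

# "[pid 21747] 17:15:26 mprotect(0x7ff8e1e65000, 32768, PROT_READ|PROT_WRITE) = 0 <0.000060>"
# The timestamp group is optional so logs captured without -t also match.
MPROTECT = re.compile(
    r"^\[pid\s+(\d+)\]\s+(?:\S+\s+)?mprotect\(0x[0-9a-f]+,\s*(\d+),\s*([A-Z_|]+)"
)


def summarize_mprotect(lines):
    """Return {tid: (calls, bytes_made_writable)} for mprotect calls in strace -f output."""
    calls = Counter()
    writable = Counter()
    for line in lines:
        m = MPROTECT.match(line)
        if not m:
            # Other syscalls, "<... mprotect resumed>" halves (the call was already
            # counted on its "<unfinished ...>" line), and lines without [pid].
            continue
        tid, length, prot = int(m.group(1)), int(m.group(2)), m.group(3)
        calls[tid] += 1
        if "PROT_WRITE" in prot:
            writable[tid] += length
    return {tid: (calls[tid], writable[tid]) for tid in calls}


def jstack_nid(tid):
    """jstack shows the native thread id in hex, e.g. nid=0x54f3 for LWP 21747."""
    return hex(tid)
```

For the full log, `max(summary, key=lambda t: summary[t][0])` picks the busiest thread, and `jstack_nid` gives the value to search for in a jstack dump.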

[Image 1]

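Follow-up

  • Going back to analyze what triggers the bug, we found that a tester, to make testing easier, had changed the paginated data-fetching endpoint from fetching 20 records per call to 200 records per call.
  • After the web server starts, if the endpoint is first called with a page size of 20, the JVM's compilation of this code path is "warmed up" and the bug is not triggered. But if the endpoint is called with a page size of 200 right after startup, the bug is triggered every time.
  • It would be worth comparing the JDK source across versions to confirm whether this logic has since been fixed; of course, the JDK may still contain similar hidden bugs.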

 

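Next installment

  • No such luck!!! After upgrading the JDK (still 1.6), the bug is still there, so it has not been fixed.

  • We decided to try JIT-related flags to work around the bug, so we set -XX:CompileThreshold=1, and the bug could no longer be reproduced. However, web service startup became slower, presumably because of the large amount of compilation work, and after startup CPU usage fluctuated considerably for quite a while before settling down, again because compilation was still going on.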

  • In addition, according to the documentation:

          -Xint, -Xcomp, and -Xmixed
      The two flags -Xint and -Xcomp are not too relevant for our everyday work, but highly interesting in order to learn something about the JVM. 
      The -Xint flag forces the JVM to execute all bytecode in interpreted mode, which comes along with a considerable slowdown, usually factor 10 or higher. 
      On the contrary, the flag -Xcomp forces exactly the opposite behavior, that is, the JVM compiles all bytecode into native code on first use, thereby applying maximum optimization level. 
      This sounds nice, because it completely avoids the slow interpreter. 
      However, many applications will also suffer at least a bit from the use of -Xcomp, even if the drop in performance is not comparable with the one resulting from -Xint. 
      The reason is that by setting -Xcomp we prevent the JVM from making use of its JIT compiler to full effect.
      The JIT compiler creates method usage profiles at run time and then optimizes single methods (or parts of them) step by step, and sometimes speculatively, to the actual application behavior. 
      Some of these optimization techniques, e.g., optimistic branch prediction, cannot be applied effectively without first profiling the application. 
      Another aspect is that methods are only getting compiled at all when they prove themselves relevant, i.e., constitute some kind of hot spot in the application. 
      Methods that are called rarely (or even only once) are continued to be executed in interpreted mode, thus saving the compilation and optimization cost.
    
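  • So although the configuration above, or simply using -Xcomp, would probably avoid triggering the bug, it can hurt performance considerably: the JIT optimizes bytecode using statistics such as invocation counts and runtime profiles, and compiling everything up front means those statistics are not available, so the resulting optimizations are not as good.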
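The trade-off described above can be illustrated with a deliberately simplified model of a counter-driven JIT. It is not HotSpot's actual policy (HotSpot also counts loop back-edges and supports on-stack replacement, and the JDK 6 server compiler's default CompileThreshold is 10000); it only shows why compiling on the first call, as CompileThreshold=1 or -Xcomp effectively does, leaves the compiler with little or no profile to specialize on. `ToyMethod` and its `MIN_SAMPLES` cutoff are hypothetical.

```python
class ToyMethod:
    """Toy model of one method under a counter-driven JIT; not HotSpot's real policy."""

    MIN_SAMPLES = 100  # hypothetical: fewer profiled calls than this means "don't speculate"

    def __init__(self, compile_threshold):
        self.compile_threshold = compile_threshold
        self.invocations = 0
        self.branch_profile = {True: 0, False: 0}  # outcomes of the `x > 0` test
        self.compiled_for = None  # None while still interpreted

    def call(self, x):
        if self.compiled_for is None:
            # Interpreted: slow, but every call updates the profile.
            self.invocations += 1
            self.branch_profile[x > 0] += 1
            if self.invocations >= self.compile_threshold:
                self._compile()
        # Both modes compute the same result; only the recorded decision differs.
        # (A real JIT would deoptimize if a speculation later turned out wrong.)
        return x if x > 0 else -x

    def _compile(self):
        seen = self.branch_profile[True] + self.branch_profile[False]
        if seen < self.MIN_SAMPLES:
            self.compiled_for = "generic"  # too little data to speculate on
        elif self.branch_profile[False] == 0:
            self.compiled_for = "only x>0"  # assume the else-branch never runs
        elif self.branch_profile[True] == 0:
            self.compiled_for = "only x<=0"
        else:
            self.compiled_for = "generic"
```

With `compile_threshold=1` the first call triggers compilation with a single sample, so the toy compiler can only emit generic code; after 10000 all-positive calls it has enough profile to specialize the hot branch.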

 

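Resolution

  • Do nothing for now: the production service has been stable and has never triggered this bug, and the bug either shows up shortly after startup or not at all. So we are not tuning anything for now; instead, after each deployment we watch for two or three minutes and only consider the deployment successful if the bug has not been triggered.
  • Report the bug to Oracle.
  • Consider upgrading the JDK to 1.7 or 1.8.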
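The "watch for two or three minutes after deployment" step can be automated. A minimal sketch, assuming Linux /proc, a hypothetical RSS limit chosen by the operator (the 8 GiB default below is only an example for an 11 GB host running a 6 GB heap), and hypothetical function names: it samples the JVM's resident set size from /proc/<pid>/status and reports failure if the process disappears or its RSS crosses the limit.

```python
import time


def parse_vmrss_kib(status_text):
    """Return VmRSS in KiB from the contents of /proc/<pid>/status, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # e.g. "VmRSS:   123456 kB"
    return None


def watch_after_deploy(pid, limit_kib=8 * 1024 * 1024, duration_s=180, interval_s=1.0):
    """True if the process survives `duration_s` seconds without exceeding `limit_kib`."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        try:
            with open(f"/proc/{pid}/status") as f:
                rss = parse_vmrss_kib(f.read())
        except FileNotFoundError:
            return False  # process is gone (for example, killed by the OOM killer)
        if rss is not None and rss > limit_kib:
            return False  # memory is running away, possibly the JIT issue above
        time.sleep(interval_s)
    return True
```

The default `duration_s=180` matches the two-to-three-minute window; a deployment script could roll back when this returns False.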

  2. Function description:

Related reading

  • JVM Performance Optimization 1-Introduction to the JVM
  • JVM Performance Optimization 2-Compilers
  • JVM Performance Optimization 3-Garbage Collection
  • JVM Performance Optimization 4-C4 Garbage Collection
  • JVM Performance Optimization 5-Java Scalability
  • Useful JVM Flags 1-JVM Types and Compiler Modes
  • Useful JVM Flags 2-Flag Categories and JIT Compiler Diagnostics
  • Useful JVM Flags 3-Printing all XX Flags and their Values
  • Useful JVM Flags 4-Heap Tuning
  • Useful JVM Flags 5-Young Generation Garbage Collection
  • Useful JVM Flags 6-Throughput Collector
  • Useful JVM Flags 7-CMS Collector
  • Useful JVM Flags 8-GC Logging

     1) Logging in

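Acknowledgments

  • Thanks to Li Sining (李斯宁), CEO of Beijing Xinguannian Technology Service Co., Ltd. (北京新观念技术服务有限公司), for technical support and for discussing the analysis.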

         
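Start the POS client software, then enter the username and password; if the password is correct, you are logged in to the POS system, as shown below.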

 [Image 2]

2) Daily sales

         
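The most important function of POS software is sales. Only with sales data can the POS system query, aggregate, analyze, and forecast product sales information.

   Place the cursor in the product barcode input box. There are two ways to enter a product:

         1) Scan the barcode with a barcode scanner. The system reads the barcode automatically, looks up the product's current stock quantity and price in the system, and then creates a record with a quantity of 1.

         2) Enter the barcode manually; the rest of the flow is the same as in 1).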

 

       Deleting items


To delete an item record from the sales ticket, select the record and press the DELETE key.

       The sales ticket screen is shown in the figure above.
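The scan/manual-entry and delete flow described above can be sketched as a small in-memory model. This is only an illustration under assumptions not in the original: the inventory is a plain dict standing in for the system's product database, prices are integer cents to avoid floating-point rounding, and `SalesTicket`, `Product` and the sample barcode are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    price_cents: int
    stock: int


@dataclass
class SalesTicket:
    inventory: dict  # barcode -> Product (stand-in for the POS product database)
    lines: list = field(default_factory=list)

    def enter_barcode(self, barcode):
        """Scanner and manual entry share this path: look up stock and price, add a qty-1 line."""
        product = self.inventory.get(barcode)
        if product is None:
            raise KeyError(f"unknown barcode {barcode}")
        self.lines.append({"barcode": barcode, "name": product.name, "qty": 1,
                           "price_cents": product.price_cents, "stock": product.stock})
        return len(self.lines) - 1  # index of the new record, i.e. the one now selected

    def delete_line(self, index):
        """What pressing DELETE on the selected record does."""
        del self.lines[index]

    def total_cents(self):
        return sum(line["qty"] * line["price_cents"] for line in self.lines)
```

Scanning the same barcode twice produces two quantity-1 records, which matches the description; merging them into one line would be a design choice the original does not specify.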

 

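     3) Front-desk shift handover


Handles shift changes between salespeople. The "handed over by" field shows the salesperson ending the shift, and both the login time and the handover time are recorded in the system for later review.


The handover screen should show the total amount this salesperson took in during the shift, broken down into cash, credit card, check, stored-value card, and coupons, along with the discount amount, the amount actually received, the total merchandise amount, and similar information.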
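The handover totals described above amount to grouping the shift's payments by tender type. A minimal sketch under stated assumptions: each sale is a dict with hypothetical keys, amounts are integer cents, the tender names and `handover_summary` are hypothetical, and "amount received" is taken to be the sum over all tender types (a real system might, for example, exclude coupons).

```python
from collections import defaultdict

TENDERS = ("cash", "credit_card", "check", "stored_value_card", "coupon")  # hypothetical names


def handover_summary(sales):
    """Summarize one salesperson's shift.

    `sales` is a list of dicts with hypothetical keys:
      "merchandise_cents": list-price total of the items,
      "discount_cents":    discount given on the sale,
      "payments":          list of (tender, amount_cents) pairs.
    """
    by_tender = defaultdict(int)
    merchandise = discount = 0
    for sale in sales:
        merchandise += sale["merchandise_cents"]
        discount += sale["discount_cents"]
        for tender, amount in sale["payments"]:
            if tender not in TENDERS:
                raise ValueError(f"unknown tender {tender!r}")
            by_tender[tender] += amount
    return {
        "by_tender": {t: by_tender[t] for t in TENDERS},  # every tender, zero if unused
        "discount_cents": discount,
        "received_cents": sum(by_tender.values()),
        "merchandise_cents": merchandise,
    }
```

Listing every tender type, including unused ones, keeps the handover screen layout fixed from shift to shift.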

 

    4) Returns management

        a. Customers return goods to the store.

        b. The store returns goods to the company.

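    5) Data upload and download


Uploads and downloads goods quantities, including receiving orders (goods into the store), sales tickets, and return orders (goods returned to the company).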

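   6) Report management

      Member spending report: member card number, name, purchase date, amount, and similar information.

      Sales ranking report:

       Inventory report:

       Salesperson sales report:

       Purchase, sales, and inventory report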