
Framework reboot due to native crash in zygote with SIGSEGV

Discussion in 'Android Development' started by testframeworks, Jul 31, 2019.

  1. testframeworks

    Thread Starter

    Description: Framework reboot due to a native crash in zygote with SIGSEGV

    Reproduction: Long-duration test with multiple apps; reproduction rate is 1/100.

    Description:

    Below is the tombstone for zygote:

    Line 56603: 07-08 10:19:39.605 26565 26565 F DEBUG : Build fingerprint: ------------------------------------------------------------------
    Line 56604: 07-08 10:19:39.605 26565 26565 F DEBUG : Revision: '0'
    Line 56605: 07-08 10:19:39.605 26565 26565 F DEBUG : ABI: 'arm'
    Line 56608: 07-08 10:19:39.606 26565 26565 F DEBUG : pid: 652, tid: 26546, name: HeapTaskDaemon >>> zygote <<<
    Line 56609: 07-08 10:19:39.606 26565 26565 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x130
    Line 56610: 07-08 10:19:39.606 26565 26565 F DEBUG : Cause: null pointer dereference
    Line 56611: 07-08 10:19:39.606 26565 26565 F DEBUG : r0 dec0b000 r1 5616eb84 r2 00000007 r3 00000000
    Line 56612: 07-08 10:19:39.606 26565 26565 F DEBUG : r4 00000130 r5 cd7929c8 r6 00000000 r7 e4fdc380
    Line 56613: 07-08 10:19:39.606 26565 26565 F DEBUG : r8 cd7924bc r9 00000000 r10 00000002 r11 00008a0c
    Line 56614: 07-08 10:19:39.607 26565 26565 F DEBUG : ip e49b8974 sp cd7924b0 lr e45e1507 pc e45ec316
    Line 56712: 07-08 10:19:39.730 26565 26565 F DEBUG :
    Line 56713: 07-08 10:19:39.730 26565 26565 F DEBUG : backtrace:
    Line 56715: 07-08 10:19:39.730 26565 26565 F DEBUG : #00 pc 000aa316 /system/lib/libart.so (art::TimingLogger::Reset()+106)
    Line 56716: 07-08 10:19:39.730 26565 26565 F DEBUG : #01 pc 0016663b /system/lib/libart.so (art::gc::collector::GarbageCollector::Run(art::gc::GcCause, bool)+178)
    Line 56717: 07-08 10:19:39.730 26565 26565 F DEBUG : #02 pc 0018035d /system/lib/libart.so (art::gc::Heap::CollectGarbageInternal(art::gc::collector::GcType, art::gc::GcCause, bool)+2420)
    Line 56718: 07-08 10:19:39.730 26565 26565 F DEBUG : #03 pc 0018dbeb /system/lib/libart.so (art::gc::Heap::ConcurrentGC(art::Thread*, art::gc::GcCause, bool)+182)
    Line 56719: 07-08 10:19:39.730 26565 26565 F DEBUG : #04 pc 00191b11 /system/lib/libart.so (art::gc::Heap::ConcurrentGCTask::Run(art::Thread*)+20)
    Line 56720: 07-08 10:19:39.730 26565 26565 F DEBUG : #05 pc 001aa957 /system/lib/libart.so (art::gc::TaskProcessor::RunAllTasks(art::Thread*)+34)
    Line 56721: 07-08 10:19:39.731 26565 26565 F DEBUG : #06 pc 0007463b /system/framework/arm/boot-core-libart.oat (offset 0x73000) (dalvik.system.VMRuntime.clampGrowthLimit [DEDUPED]+74)
    Line 56722: 07-08 10:19:39.731 26565 26565 F DEBUG : #07 pc 0014a85d /system/framework/arm/boot-core-libart.oat (offset 0x73000) (java.lang.Daemons$HeapTaskDaemon.runInternal+172)
    Line 56723: 07-08 10:19:39.731 26565 26565 F DEBUG : #08 pc 000ec963 /system/framework/arm/boot-core-libart.oat (offset 0x73000) (java.lang.Daemons$Daemon.run+66)
    Line 56724: 07-08 10:19:39.731 26565 26565 F DEBUG : #09 pc 002151b1 /system/framework/arm/boot-core-oj.oat (offset 0x106000) (java.lang.Thread.run+64)
    Line 56725: 07-08 10:19:39.731 26565 26565 F DEBUG : #10 pc 00411575 /system/lib/libart.so (art_quick_invoke_stub_internal+68)
    Line 56726: 07-08 10:19:39.731 26565 26565 F DEBUG : #11 pc 003eb045 /system/lib/libart.so (art_quick_invoke_stub+224)
    Line 56727: 07-08 10:19:39.731 26565 26565 F DEBUG : #12 pc 000a183d /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+136)
    Line 56728: 07-08 10:19:39.731 26565 26565 F DEBUG : #13 pc 003498d5 /system/lib/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+52)
    Line 56729: 07-08 10:19:39.731 26565 26565 F DEBUG : #14 pc 0034a62d /system/lib/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue*)+320)
    Line 56730: 07-08 10:19:39.731 26565 26565 F DEBUG : #15 pc 0036d0a3 /system/lib/libart.so (art::Thread::CreateCallback(void*)+866)
    Line 56731: 07-08 10:19:39.731 26565 26565 F DEBUG : #16 pc 00072dcd /system/lib/libc.so (__pthread_start(void*)+22)
    Line 56732: 07-08 10:19:39.731 26565 26565 F DEBUG : #17 pc 0001e3b1 /system/lib/libc.so (__start_thread+24)



    One more observation: we saw a few app crashes before the zygote crash, in the path where zygote forks these apps.


    pid: 17395, tid: 17395, name: o.android.imoi >>> zygote <<<
    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x4
    Cause: null pointer dereference
    r0 00000000 r1 8b148311 r2 00000000 r3 00000000
    r4 e4bdc424 r5 e4bdc420 r6 e4b97a64 r7 e4bdc3c8
    r8 e4bc8000 r9 e4bdc448 r10 00003000 r11 00000003
    ip 000000ff sp ffbe1230 lr e41e7dd5 pc e41e7de8

    backtrace:
    #00 pc 000a8de8 /system/lib/libart.so (art::CumulativeLogger::Reset()+68)
    #01 pc 00166901 /system/lib/libart.so (art::gc::collector::GarbageCollector:: ()+192)
    #02 pc 0017d853 /system/lib/libart.so (art::gc::Heap::ResetGcPerformanceInfo()+34)
    #03 pc 003570db /system/lib/libart.so (art::Runtime::InitNonZygoteOrPostFork(_JNIEnv*, bool, art::Runtime::NativeBridgeAction, char const*, bool)+74)
    #04 pc 002e8fb7 /system/lib/libart.so (art::ZygoteHooks_nativePostForkChild(_JNIEnv*, _jclass*, long long, int, unsigned char, unsigned char, _jstring*)+3146)
    #05 pc 00074c63 /system/framework/arm/boot-core-libart.oat (offset 0x73000) (dalvik.system.ZygoteHooks.nativePostForkChild+154)
    #06 pc 000eba15 /system/framework/arm/boot-core-libart.oat (offset 0x73000) (dalvik.system.ZygoteHooks.postForkChild+68)
    #07 pc 00ba0ab9 /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.Zygote.callPostForkChildHooks+80)
    #08 pc 00412975 /system/lib/libart.so (art_quick_invoke_stub_internal+68)
    #09 pc 003eaec7 /system/lib/libart.so (art_quick_invoke_static_stub+222)
    #10 pc 000a184f /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+154)
    #11 pc 00349655 /system/lib/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+52)
    #12 pc 0034947f /system/lib/libart.so (art::InvokeWithVarArgs(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, std::__va_list)+310)
    #13 pc 00290219 /system/lib/libart.so (art::JNI::CallStaticVoidMethodV(_JNIEnv*, _jclass*, _jmethodID*, std::__va_list)+444)
    #14 pc 0006e579 /system/lib/libandroid_runtime.so (_JNIEnv::CallStaticVoidMethod(_jclass*, _jmethodID*, ...)+28)
    #15 pc 0011c2ed /system/lib/libandroid_runtime.so ((anonymous namespace)::ForkAndSpecializeCommon(_JNIEnv*, unsigned int, unsigned int, _jintArray*, int, _jobjectArray*, long long, long long, int, _jstring*, _jstring*, bool, _jintArray*, _jintArray*, bool, _jstring*, _jstring*)+4052)
    #16 pc 0011ab37 /system/lib/libandroid_runtime.so (android::com_android_internal_os_Zygote_nativeForkAndSpecialize(_JNIEnv*, _jclass*, int, int, _jintArray*, int, _jobjectArray*, int, _jstring*, _jstring*, _jintArray*, _jintArray*, unsigned char, _jstring*, _jstring*)+470)
    #17 pc 003b8ba3 /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.Zygote.nativeForkAndSpecialize+338)
    #18 pc 00ba3a8b /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.ZygoteConnection.processOneCommand+1450)
    #19 pc 00ba7a5b /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.ZygoteServer.runSelectLoop+770)
    #20 pc 00ba5269 /system/framework/arm/boot-framework.oat (offset 0x393000) (com.android.internal.os.ZygoteInit.main+1696)
    #21 pc 00412975 /system/lib/libart.so (art_quick_invoke_stub_internal+68)
    #22 pc 003eaec7 /system/lib/libart.so (art_quick_invoke_static_stub+222)
    #23 pc 000a184f /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+154)
    #24 pc 00349655 /system/lib/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+52)
    #25 pc 0034947f /system/lib/libart.so (art::InvokeWithVarArgs(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, std::__va_list)+310)
    #26 pc 00290219 /system/lib/libart.so (art::JNI::CallStaticVoidMethodV(_JNIEnv*, _jclass*, _jmethodID*, std::__va_list)+444)
    #27 pc 0006e579 /system/lib/libandroid_runtime.so (_JNIEnv::CallStaticVoidMethod(_jclass*, _jmethodID*, ...)+28)
    #28 pc 0007073b /system/lib/libandroid_runtime.so (android::AndroidRuntime::start(char const*, android::Vector<android::String8> const&, bool)+462)
    #29 pc 00001c8f /system/bin/app_process32 (main+1122)
    #30 pc 000a2245 /system/lib/libc.so (__libc_init+48)
    #31 pc 000017eb /system/bin/app_process32 (_start_main+38)
    #32 pc 000000c4 <unknown>




    Analysis:


    Loaded coredump in GDB:


    #0 art::TimingLogger::Reset (this=0x130) at art/runtime/base/timing_logger.cc:148
    No locals.
    #1 0xe46a863e in Reset (this=0x120, gc_cause=<optimized out>, clear_soft_references=<optimized out>) at art/runtime/gc/collector/garbage_collector.cc:49
    No locals.
    #2 art::gc::collector::GarbageCollector::Run (this=0xe4fdc380, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=true) at art/runtime/gc/collector/garbage_collector.cc:92
    start_time = 151785656375586
    current_iteration = 0x120
    end_time = <optimized out>
    self = <optimized out>
    #3 0xe46c2360 in art::gc::Heap::CollectGarbageInternal (this=0xe4f3c800, gc_type=art::gc::collector::kGcTypeFull, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=<optimized out>)
    at art/runtime/gc/heap.cc:2648
    runtime = 0xe4f3c400
    self = 0xe42acc00
    collector = 0xe4fdc380
    #4 0xe46cfbee in art::gc::Heap::ConcurrentGC (this=0xe4f3c800, self=0xe42acc00, cause=art::gc::kGcCauseBackground, force_full=<optimized out>) at art/runtime/gc/heap.cc:3675
    next_gc_type = art::gc::collector::kGcTypeSticky
    tid = 26546
    #5 0xe46d3b14 in art::gc::Heap::ConcurrentGCTask::Run (this=<optimized out>, self=0x5616eb84) at art/runtime/gc/heap.cc:3620
    heap = 0xe4f3c800
    #6 0xe46ec958 in art::gc::TaskProcessor::RunAllTasks (this=0xe4f30200, self=0xe42acc00) at art/runtime/gc/task_processor.cc:129
    task = 0xdec08000
    #7 0x720bc63c in ?? ()


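    From the GDB output, frame 2 shows current_iteration = 0x120, which is an invalid address. This pointer is obtained through the garbage collector object (see the code below for reference). TimingLogger::Reset in frame 0 is then called with this=0x130, which matches the fault address 0x130 in the tombstone.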


    In file -- art/runtime/gc/collector/garbage_collector.cc


    91 Iteration* current_iteration = GetCurrentIteration();
    92 current_iteration->Reset(gc_cause, clear_soft_references);



    429 const collector::Iteration* GetCurrentGcIteration() const {
    430 return &current_gc_iteration_;
    431 }


    1254 collector::Iteration current_gc_iteration_;


    Dumping the collector object in frame 2 shows that its memory has been zeroed out, which seems to be the reason for the crash.


    (gdb) f 2
    #2 art::gc::collector::GarbageCollector::Run (this=0xe4fdc380, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=true) at art/runtime/gc/collector/garbage_collector.cc:92
    92 in art/runtime/gc/collector/garbage_collector.cc
    (gdb) x/100 this
    0xe4fdc380: 0 0 0 0
    0xe4fdc390: 0 0 0 0
    0xe4fdc3a0: 0 0 0 0
    0xe4fdc3b0: 0 0 0 0
    0xe4fdc3c0: 0 0 0 0
    0xe4fdc3d0: 0 0 0 0
    0xe4fdc3e0: 0 0 0 0
    0xe4fdc3f0: 0 0 0 0
    0xe4fdc400: 0 0 0 0
    0xe4fdc410: 0 0 0 0
    0xe4fdc420: 0 0 0 0
    0xe4fdc430: 0 0 0 0
    0xe4fdc440: 0 0 0 0
    0xe4fdc450: 0 0 0 0
    0xe4fdc460: 0 0 0 0
    0xe4fdc470: 0 0 0 0
    0xe4fdc480: 0 0 0 0
    0xe4fdc490: 0 0 0 0
    0xe4fdc4a0: 0 0 0 0
    0xe4fdc4b0: 0 0 0 0
    0xe4fdc4c0: 0 0 0 0
    0xe4fdc4d0: 0 0 0 0
    0xe4fdc4e0: 0 0 0 0
    0xe4fdc4f0: 0 0 0 0

    The app crashes seen before this zygote crash also seem to be due to a similar cause: the collector object being NULL.

    Debug approaches:
    We have internally tried ASAN and malloc_debug to check whether such corruption can be caught.
    Unfortunately, with malloc_debug enabled, the issue was not reproducible.
    With ASAN enabled, the device runs slowly, which results in other unrelated issues.


    Can you please suggest any debugging approaches, or share any similar instances of this issue?


    Regards,
    Deepika
     

