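Abstract: A while ago I ran into a rather interesting problem. Crashes clustered, with high probability, in one particular call stack, or surfaced during GC when verification failed on a bad root. After several days of debugging, the cause was traced to a compile-time optimization: with the method interpreted by Nterp, the cached bytecode resolution produced a "ghost" method call.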
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0000000000000018
Cause: null pointer dereference
x0 0000000000000000 x1 0000000002e62ec8 x2 0000000000000000 x3 0000000072223650
x4 0000007c3ec13000 x5 3b7463656a624f2f x6 3b7463656a624f2f x7 0000007bbdababac
x8 0000000000000002 x9 2542ebd30d0dfceb x10 0000000000000000 x11 0000000000000002
x12 00000000af950a08 x13 b400007d15e5fa50 x14 0000007f1598f880 x15 0000007bbd9446e8
x16 0000007fea726e40 x17 0000000000000020 x18 0000007f15ca0000 x19 b400007d55e10be0
x20 0000000000000000 x21 b400007d55e10ca0 x22 0000000002d51610 x23 0000000002e61b08
x24 0000000000000005 x25 0000000000000002 x26 0000000002e62ec8 x27 0000000000000002
x28 00000000031b1f38 x29 00000000ffffffff
lr 00000000721b63c8 sp 0000007fea728e90 pc 00000000721b63d4 pst 0000000080001000
101 total frames
backtrace:
...
verification.cc:124] GC tried to mark invalid reference 0x20052a8
verification.cc:124] ref=0x20052a8 klass=0x0 space=main space (region space) 0x2000000-0x42000000 card=0 adjacent_ram=0000000000000000 0000000000000000 0000000000000000 0000000000000000 |0000000000000000 0000000000000000 0000000000000000 0000000000000000
verification.cc:124] holder=0x3ac8698 klass=0x20052a8 space=main space (region space) 0x2000000-0x42000000 card=0 adjacent_ram=9d73f26e02aea3e0 0000000002a33738 1000000002a38c48 0000000002a33738 |bd4f9b6d020052a8 0000000000000000 0000000000000000 0000000000000000
verification.cc:124] reference addr adjacent_ram=9d73f26e02aea3e0 0000000002a33738 1000000002a38c48 0000000002a33738 |bd4f9b6d020052a8 0000000000000000 0000000000000000 0000000000000000 0xb400007a5760fdb0 main space (region space) 0x2000000-0x42000000

I reproduced the problem and captured the corresponding core file for analysis. For this type of problem, core-parser is the best tool for the job.
core-parser> bt
Switch oat version(259) env.
"main" sysTid=7943 Runnable
| group="main" daemon=0 prio=5 target=0x0 uncaught_exception=0x0
| tid=1 sCount=0 flags=0 obj=0x7370c3e0 self=0xb400007a09207010 env=0xb400007a4920dd50
| stack=0x7fc3918000-0x7fc391a000 stackSize=0x7ff000 handle=0x7c428fa098
| mutexes=0xb400007a092077b0 held="mutator lock"(shared held)
x0 0x0000000000000000 x1 0x00000000027e6d40 x2 0x0000000000000000 x3 0x0000000072fc3650
x4 0x0000007963213000 x5 0x000000006576696c x6 0x000000006576696c x7 0x000000794915fad4
x8 0x0000000000000002 x9 0x2e157e9e8ba7cb87 x10 0x0000000000000000 x11 0x0000000000000001
x12 0x00000000b06119e0 x13 0x000000000000005a x14 0x0000007c4156f880 x15 0x0000007948fa193c
x16 0x0000007fc410ff90 x17 0x0000000000000020 x18 0x0000007c41ae8000 x19 0xb400007a09207010
x20 0x0000000000000000 x21 0xb400007a092070d0 x22 0x00000000026cefa0 x23 0x00000000027e5980
x24 0x0000000000000005 x25 0x0000000000000002 x26 0x00000000027e6d40 x27 0x00000000027e2478
x28 0x00000000026cc620 fp 0x00000000027df8f0
lr 0x0000000072f563c8 sp 0x0000007fc4111fe0 pc 0x0000000072f563d4 pst 0x0000000080001000
Native: ...
ManagedStack* 0xb400007a092070b8 maybe invalid.
JavaKt: ...

The backtrace is incomplete because the fault happened inside OAT code, before this frame had been saved to the ManagedStack. A fake frame therefore has to be constructed first.
core-parser> fake stack --sp 0x0000007fc4111fe0 --pc 0x0000000072f563d4
core-parser> bt
"main" sysTid=7943 Runnable
| group="main" daemon=0 prio=5 target=0x0 uncaught_exception=0x0
| tid=1 sCount=0 flags=0 obj=0x7370c3e0 self=0xb400007a09207010 env=0xb400007a4920dd50
| stack=0x7fc3918000-0x7fc391a000 stackSize=0x7ff000 handle=0x7c428fa098
| mutexes=0xb400007a092077b0 held="mutator lock"(shared held)
x0 0x0000000000000000 x1 0x00000000027e6d40 x2 0x0000000000000000 x3 0x0000000072fc3650
x4 0x0000007963213000 x5 0x000000006576696c x6 0x000000006576696c x7 0x000000794915fad4
x8 0x0000000000000002 x9 0x2e157e9e8ba7cb87 x10 0x0000000000000000 x11 0x0000000000000001
x12 0x00000000b06119e0 x13 0x000000000000005a x14 0x0000007c4156f880 x15 0x0000007948fa193c
x16 0x0000007fc410ff90 x17 0x0000000000000020 x18 0x0000007c41ae8000 x19 0xb400007a09207010
x20 0x0000000000000000 x21 0xb400007a092070d0 x22 0x00000000026cefa0 x23 0x00000000027e5980
x24 0x0000000000000005 x25 0x0000000000000002 x26 0x00000000027e6d40 x27 0x00000000027e2478
x28 0x00000000026cc620 fp 0x00000000027df8f0
lr 0x0000000072f563c8 sp 0x0000007fc4111fe0 pc 0x0000000072f563d4 pst 0x0000000080001000
Native: ...
JavaKt: ...
core-parser>
core-parser> f 0
JavaKt: {
Location: /system/framework/framework.jar!classes4.dex
art::ArtMethod: 0x71a2c858
dex_pc_ptr: 0x795fe114ac
quick_frame: 0x7fc4111fe0
frame_pc: 0x72f563d4
method_header: 0x72f5634c
DEX CODE:
0x795fe114a2: 0212 | const/4 v2, #+0
0x795fe114a4: 1235 000a | if-ge v2, v1, 0x795fe114b8 //+10
0x795fe114a8: 0346 0200 | aget-object v3, v0, v2
0x795fe114ac: 106e 80d3 0003 | invoke-virtual {v3}, void android.view.View.jumpDrawablesToCurrentState // method@32979
OAT CODE:
0x72f563b0: 6b18033f | cmp w25, w24
0x72f563b4: 540001aa | b.ge 0x72f563e8
0x72f563b8: 110032e0 | add w0, w23, #0xc
0x72f563bc: 1000007e | adr x30, 0x72f563c8
0x72f563c0: b59da314 | cbnz x20, 0x72e91820
0x72f563c4: b8797801 | ldr w1, [x0, x25, lsl #2]
0x72f563c8: aa0103fa | mov x26, x1
0x72f563cc: b9400020 | ldr w0, [x1]
0x72f563d0: f949c000 | ldr x0, [x0, #0x1380]
0x72f563d4: f9400c1e | ldr x30, [x0, #0x18]
0x72f563d8: d63f03c0 | blr x30
0x72f563dc: 11000739 | add w25, w25, #1
}

The fault occurs while dispatching invoke-virtual jumpDrawablesToCurrentState.
0x72f563cc: b9400020 | ldr w0, [x1]  // load the klass_ address from the View object
0x72f563d0: f949c000 | ldr x0, [x0, #0x1380]
0x72f563d4: f9400c1e | ldr x30, [x0, #0x18]
0x72f563d8: d63f03c0 | blr x30  // branch to jumpDrawablesToCurrentState
core-parser> vtor 0x00000000027e6d40
* VIRTUAL: 0x27e6d40
* OFFSET: 0x7e6d40
* OR: 0x78f506c6bd40
* MMAP: 0x0
*[2000000, 42000000) rw- 0040000000 0040000000 [anon:dalvik-main space (region space)] [*]

The address falls inside the Java heap, so the next step is to print the object with the p command.
core-parser> p 0x00000000027e6d40
ERROR: Size: 0x0
core-parser> rd 0x00000000027e6d40 -e 0x00000000027e6e40
27e6d40: 8adbc7e1b0000888 0000000000000000 ................
27e6d50: 0000000000000000 0000000000000000 ................
27e6d60: 0000000000000000 0000000000000000 ................
27e6d70: 0000000000000000 0000000000000000 ................
27e6d80: 0000000000000000 0000000000000000 ................
27e6d90: 028c86b000000000 0000000000000000 ................
27e6da0: 00000000026ccfd8 0000000000000000 ..l.............
27e6db0: 0000000071199090 028c86c800000000 ...q............
27e6dc0: 0000000000000000 0000000000000000 ................
27e6dd0: 028c8718028c86e8 028c8760028c8730 ........0...`...
27e6de0: 028c8798028c8778 0000000000000000 x...............
27e6df0: 0000000000000000 028c87e8028c87b0 ................
27e6e00: 00000000028c8860 028c887800000000 `...........x...
27e6e10: 0000000000000000 711a35f800000000 .............5.q
27e6e20: 026cefa000000000 0000000000000000 ......l.........
27e6e30: 0000000000000000 028c889000000000 ................

So this address is a bad root, not a valid Java object. Its klass_ value, b0000888, nevertheless falls inside a valid mapped segment:
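core-parser> vtor b0000888
* VIRTUAL: 0xb0000888
* OFFSET: 0x1ef4888
* OR: 0x75b21f265888
* MMAP: 0x0
*[ae10c000, b010c000) r-x 0002000000 0002000000 /memfd:jit-cache (deleted) [*]
core-parser> rd 0xb0001c08
b0001c08: 0000000000000000 ........

Because the bogus klass_ is readable, the first two loads succeed and the slot at 0xb0001c08 reads as 0. The fault therefore only surfaces at 0x72f563d4: f9400c1e | ldr x30, [x0, #0x18], which dereferences 0x0 + 0x18 and matches the reported fault address 0x18.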
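The three dependent loads behind the crash can be replayed on paper. The following is a minimal sketch, not ART code: `Memory`, `SnapshotFromArticle`, and the function names are hypothetical, the offset 0x1380 is the one used in the article's own arithmetic (0xb01f0888 + 0x1380), 0x18 comes from the faulting instruction, and the memory words are the values that `rd` prints in this article.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical read-only snapshot: address -> 8-byte word, as printed by `rd`.
using Memory = std::unordered_map<uint64_t, uint64_t>;

constexpr uint64_t kVtableSlotOffset = 0x1380;  // second load: [klass + 0x1380]
constexpr uint64_t kEntryPointOffset = 0x18;    // third load:  [method + 0x18]

// Address that the second load reads for a given klass_ value.
constexpr uint64_t VtableSlotAddress(uint32_t klass) {
    return static_cast<uint64_t>(klass) + kVtableSlotOffset;
}

// Computes the address the third load dereferences.
// Returns false if the vtable slot is not part of the snapshot.
bool EntryPointLoadAddress(const Memory& mem, uint32_t klass, uint64_t* out) {
    auto it = mem.find(VtableSlotAddress(klass));
    if (it == mem.end()) return false;
    *out = it->second + kEntryPointOffset;  // ArtMethod pointer + 0x18
    return true;
}

// Words taken from the core-parser output quoted in this article.
Memory SnapshotFromArticle() {
    return {
        {0xb0001c08ull, 0x0ull},         // slot reached through the corrupted klass_
        {0xb01f1c08ull, 0x71a76ea8ull},  // slot reached through the repaired klass_
        {0x71028828ull, 0x71a15d60ull},  // slot of android.view.View itself
    };
}
```

With the corrupted klass_ 0xb0000888 the third load targets 0x18, the reported fault address. With the repaired value 0xb01f0888 it targets 0x71a76ea8 + 0x18, inside the ArtMethod of ImageView.jumpDrawablesToCurrentState.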
core-parser> class android.view.View -m
[0x710274a8]
public class android.view.View extends java.lang.Object {
...
[0x71a15d60] public void android.view.View.jumpDrawablesToCurrentState
...
0x710274a8 + 0x1380 = 0x71028828
core-parser> rd 0x71028828
71028828: 0000000071a15d60 `].q....

When the View's klass address is read correctly, the dispatch resolves to the right method.
core-parser> space -c
ERROR: Region:[0x27e6d40, 0x27e7100) main space (region space) has bad object!!
core-parser> rd 0x27e6c40 -e 0x27e6d40
27e6c40: 1800001000000000 0000000000000001 ................
27e6c50: 4000000000000000 0000000000000000 .......@........
27e6c60: 40a00000bf800000 bf8000003f800000 .......@...?....
27e6c70: 0000000000000000 0000000000000000 ................
27e6c80: 7fc000007fc00000 7fc0000000000000 ................
27e6c90: 000000003f800000 0000010100000001 ...?............
27e6ca0: 0000000000000000 0000000000000000 ................
27e6cb0: 0000000100000000 0100000000000000 ................
27e6cc0: 0000000000000000 028c8de0027e69a8 .........i~.....
27e6cd0: 028c8e00028c8df0 0000000000000000 ................
27e6ce0: 028c8eb000000000 0000000005aefcf5 ................
27e6cf0: 0000000000000011 0000000000000000 ................
27e6d00: 0000000100000000 028c8ec000000000 ................
27e6d10: 028c8f5000000000 70c4b320028c8f90 ....P..........p
27e6d20: 800a3035b02fc458 00000000026cf448 X./.50..H.l.....
27e6d30: 00000000b02fbed0 00000000026cf448 ../.....H.l.....
core-parser> p 27e6d30 -b
Size: 0x10
Padding: 0x4
Object Name: a.b.c.d.e.f.g$h
[0x8] final a.b.c.d.e.f.g k = 0x26cf448
// extends java.lang.Object
[0x4] private transient int shadow$_monitor_ = 0
[0x0] private transient java.lang.Class shadow$_klass_ = 0xb02fbed0
Binary:
27e6d30: 00000000b02fbed0 00000000026cf448 ../.....H.l.....
core-parser> p 0x27e7100 -b
Size: 0x10
Object Name: a.b.c.d.u.s.t
[0x0c] private java.io.File toq = 0x0
[0x08] private final android.content.Context k = 0x26dd860
// extends java.lang.Object
[0x04] private transient int shadow$_monitor_ = 0
[0x00] private transient java.lang.Class shadow$_klass_ = 0xb0208330
Binary:
27e7100: 00000000b0208330 00000000026dd860 0.......`.m.....

| a.b.c.d.e.f.g$h | Bad root | a.b.c.d.u.s.t |
| --- | --- | --- |
| 0x27e6d30 | 0x27e6d40 | 0x27e7100 |
At least one bad root lies between 0x27e6d40 and 0x27e7100, with a total size of 0x3c0.
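The size of the unparseable span follows from the two neighbouring objects that do parse. A small sketch of that arithmetic, assuming 8-byte object alignment (an assumption about the heap layout, not something the dumps state); the helper names are hypothetical:

```cpp
#include <cstdint>

constexpr uint64_t kObjectAlignment = 8;  // assumed object alignment

// Rounds value up to the next multiple of alignment (a power of two).
constexpr uint64_t AlignUp(uint64_t value, uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Where the object after (addr, size) is expected to start.
constexpr uint64_t NextObject(uint64_t addr, uint64_t size) {
    return AlignUp(addr + size, kObjectAlignment);
}

// Bytes between the start of the bad object and the next object that parses.
constexpr uint64_t GapSize(uint64_t bad_start, uint64_t next_valid) {
    return next_valid - bad_start;
}

// a.b.c.d.e.f.g$h at 0x27e6d30 has Size 0x10, so the bad object starts at 0x27e6d40.
static_assert(NextObject(0x27e6d30, 0x10) == 0x27e6d40, "bad object start");
// a.b.c.d.u.s.t at 0x27e7100 is the next object that parses.
static_assert(GapSize(0x27e6d40, 0x27e7100) == 0x3c0, "gap size");
```

Any candidate class for the bad object has to account for exactly 0x3c0 bytes, either as one instance or as several adjacent ones.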
Bad root memory analysis

core-parser> rd 0x27e6d40 -e 0x27e7100
27e6d40: 8adbc7e1b0000888 0000000000000000 ................
27e6d50: 0000000000000000 0000000000000000 ................
27e6d60: 0000000000000000 0000000000000000 ................
27e6d70: 0000000000000000 0000000000000000 ................
27e6d80: 0000000000000000 0000000000000000 ................
27e6d90: 028c86b000000000 0000000000000000 ................
27e6da0: 00000000026ccfd8 0000000000000000 ..l.............
27e6db0: 0000000071199090 028c86c800000000 ...q............
27e6dc0: 0000000000000000 0000000000000000 ................
27e6dd0: 028c8718028c86e8 028c8760028c8730 ........0...`...
27e6de0: 028c8798028c8778 0000000000000000 x...............
27e6df0: 0000000000000000 028c87e8028c87b0 ................
27e6e00: 00000000028c8860 028c887800000000 `...........x...
27e6e10: 0000000000000000 711a35f800000000 .............5.q
27e6e20: 026cefa000000000 0000000000000000 ......l.........
27e6e30: 0000000000000000 028c889000000000 ................
27e6e40: 00000000027def48 027767f800000000 H.}..........gw.
27e6e50: 00000000027dc398 028c88b000000000 ..}.............
27e6e60: 0000000000000000 0000000000000000 ................
27e6e70: 0000000000000000 0000000000000000 ................
27e6e80: 0000000000000000 0000000000000000 ................
27e6e90: 0000000000000000 0000000000000000 ................
27e6ea0: 0000000000000000 ffffffff00000000 ................
27e6eb0: ffffffffffffffff ffffffff00000090 ................
27e6ec0: 00000000ffffffff 0000000000000000 ................
27e6ed0: 0000000000000000 2dffffff00000a60 ........`......-
27e6ee0: 0000000000000000 0000000000000000 ................
27e6ef0: 000000012d000000 ffffffff00000000 ...-............
27e6f00: ffffffff00000002 0300000300000000 ................
27e6f10: 0000000000000000 00000a6000000000 ............`...
27e6f20: 00000000000004c4 0000000000000000 ................
27e6f30: 0000000000000000 ffffffff00000000 ................
27e6f40: ffffffffffffffff ffffffffffffffff ................
27e6f50: 40000a60ffffffff 00000001400004c4 ....`..@...@....
27e6f60: 0000000000000000 0000000000000000 ................
27e6f70: 81280810fffff448 0010001060032628 H.....(.(&.`....
27e6f80: 000004c428000001 0000000000000000 ...(............
27e6f90: 0000000000000000 0000000303000003 ................
27e6fa0: 0000000000000000 0000001900000000 ................
27e6fb0: 0000000000000000 8000000000000000 ................
27e6fc0: 0000000000000000 0000000000000000 ................
27e6fd0: 0000000080000000 1800021000000000 ................
27e6fe0: 0000000000000001 4000000000000000 ...............@
27e6ff0: 0000000000000000 40a00000bf800000 ...............@
27e7000: bf8000003f800000 0000000000000000 ...?............
27e7010: 0000000000000000 7fc000007fc00000 ................
27e7020: 7fc0000000000000 000000003f800000 ...........?....
27e7030: 0000010100000001 0000000000000000 ................
27e7040: 0000000000000000 0000000100000000 ................
27e7050: 0000000000000000 028c88c800000000 ................
27e7060: 00000000028c88d8 0000000000000000 ................
27e7070: 0000000000000000 0000000000000000 ................
27e7080: 028c8958028c8928 028c8980028c8970 (...X...p.......
27e7090: 028c88c8028c89a0 7114440000000000 .............D.q
27e70a0: 028c89c000000000 00000000028c89d8 ................
27e70b0: 000000ff00000000 000697bbffffffff ................
27e70c0: 0000000000030890 7fffffff7fffffff ................
27e70d0: 0000010000000000 0000000000000000 ................
27e70e0: 0000000000010000 028c8a10028c89f0 ................
27e70f0: 0000000000000000 0000000000000000 ................

Judging from the data pattern, this memory holds the fields of a single Java object. We will call it a class A object.
core-parser> class A -f
[0xb01f0888]
public final class A extends androidx.appcompat.widget.AppCompatImageView {
[0x03b8] private boolean n
[0x03b4] private a.b.c.d.u.o q
[0x03b0] private volatile a.b.c.d.u.H k
[0x03ac] private final androidx.appcompat.widget.ld6 mImageHelper
[0x03a8] private final androidx.appcompat.widget.q mBackgroundTintHelper
[0x03a6] private boolean mHasLevel
...
}

An instance of this class is exactly 0x3c0 bytes, which fits the gap between its neighbours on the heap.
core-parser> wd 0x27e6d40 -v 8adbc7e1b01f0888
core-parser> p 0x27e6d40
Size: 0x3c0
Padding: 0x7
Object Name: A
[0x3b8] private boolean n = false
[0x3b4] private a.b.c.d.u.o q = 0x0
[0x3b0] private volatile a.b.c.d.u.H k = 0x0
// extends androidx.appcompat.widget.AppCompatImageView
[0x3ac] private final androidx.appcompat.widget.ld6 mImageHelper = 0x28c8a10
[0x3a8] private final androidx.appcompat.widget.q mBackgroundTintHelper = 0x28c89f0
[0x3a6] private boolean mHasLevel = false
// extends android.widget.ImageView
[0x3a5] private boolean mMergeState = false
...
[0x004] private transient int shadow$_monitor_ = -1965307935
[0x000] private transient java.lang.Class shadow$_klass_ = 0xb01f0888
core-parser>

Spot checks

[0x3ac] private final androidx.appcompat.widget.ld6 mImageHelper = 0x28c8a10
core-parser> p 0x28c8a10
Size: 0x20
Padding: 0x4
Object Name: androidx.appcompat.widget.ld6
[0x18] private int n = 0
[0x14] private androidx.appcompat.widget.d3 zy = 0x0
[0x10] private androidx.appcompat.widget.d3 toq = 0x0
[0x0c] private androidx.appcompat.widget.d3 q = 0x0
[0x08] private final android.widget.ImageView k = 0x27e6d40
// extends java.lang.Object
[0x04] private transient int shadow$_monitor_ = 0
[0x00] private transient java.lang.Class shadow$_klass_ = 0xb0299708

[0x3a8] private final androidx.appcompat.widget.q mBackgroundTintHelper = 0x28c89f0
core-parser> p 0x28c89f0
Size: 0x20
Object Name: androidx.appcompat.widget.q
[0x1c] private int zy = -1
[0x18] private final androidx.appcompat.widget.f7l8 toq = 0x2766330
[0x14] private androidx.appcompat.widget.d3 q = 0x0
[0x10] private androidx.appcompat.widget.d3 n = 0x0
[0x0c] private final android.view.View k = 0x27e6d40
[0x08] private androidx.appcompat.widget.d3 g = 0x0
// extends java.lang.Object
[0x04] private transient int shadow$_monitor_ = 0
[0x00] private transient java.lang.Class shadow$_klass_ = 0xb0299bf0
core-parser>

[0x0dc] android.view.ViewOutlineProvider mOutlineProvider = 0x711a35f8
core-parser> p 0x711a35f8
Size: 0x8
Object Name: android.view.ViewOutlineProvider$1
// extends android.view.ViewOutlineProvider
// extends java.lang.Object
[0x4] private transient int shadow$_monitor_ = 536870912
[0x0] private transient java.lang.Class shadow$_klass_ = 0x71392bc8

The other member objects in memory are all valid. In other words, only the 1f byte of klass_ was wiped.
core-parser> rd 0x27e6d40 --ori
27e6d40: 8adbc7e1b0000888 ........
core-parser> rd 0x27e6d40
27e6d40: 8adbc7e1b01f0888 ........
0xb01f0888 + 0x1380 = 0xb01f1c08
core-parser> rd 0xb01f1c08
b01f1c08: 0000000071a76ea8 .n.q....
core-parser> method 0000000071a76ea8
public void android.widget.ImageView.jumpDrawablesToCurrentState [dex_method_idx=51663]

Corrupted memory

| Address | Correct value | Wrong value | Object |
| --- | --- | --- | --- |
| 27e6d40 | 8adbc7e1b01f0888 | 8adbc7e1b0000888 | A |
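Local reproduction produced crash stacks in a variety of scenarios, and almost all of them were bad roots caused by a corrupted klass_. The error pattern is summarized below: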
| Wrong value | Correct value | Class |
| --- | --- | --- |
| 0x70004a00 | 0x70504a00 | java.lang.String |
| 0xaf005a30 | 0xafa55a30 | A |
| 0xaf004db0 | 0xafb54db0 | a.b.c.u.m$t |
| 0xaf0000a8 | 0xafb600a8 | a.b.c.d.e.b.M |
| 0xaf00d4d8 | 0xafb2d4d8 | a.a.t.b.i.T |
| 0x7000b170 | 0x708fb170 | android.view.View$ScrollabilityCache |
| 0x7000ff90 | 0x704eff90 | java.lang.Object |
| 0x70008970 | 0x70bf8970 | android.view.View$ListenerInfo |
| 0x70003418 | 0x704f3418 | java.util.HashMap |
| Classes in main space | | |
| 0x030026f8 | 0x032326f8 | android.graphics.RenderNodeStubImpl |
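Every pair in the table differs in the same way: one byte of the 32-bit klass_ value has been zeroed. The sketch below checks that mechanically. `KlassSample`, `ClearedByteIndex`, and `SamplesFromTable` are hypothetical names for illustration, and the values are copied from the table.

```cpp
#include <cstdint>
#include <vector>

struct KlassSample {
    uint32_t wrong;    // value found in the corrupted object
    uint32_t correct;  // real klass_ address of the class
};

// Returns the index (0 = least significant) of the single byte that was
// zeroed, or -1 if the pair does not fit the "one byte cleared" pattern.
int ClearedByteIndex(KlassSample s) {
    const uint32_t diff = s.wrong ^ s.correct;
    for (int i = 0; i < 4; ++i) {
        const uint32_t mask = 0xffu << (8 * i);
        const bool only_this_byte_differs = diff != 0 && (diff & ~mask) == 0;
        const bool byte_is_zero_now = (s.wrong & mask) == 0;
        if (only_this_byte_differs && byte_is_zero_now) return i;
    }
    return -1;
}

// The wrong/correct pairs listed in the table.
std::vector<KlassSample> SamplesFromTable() {
    return {
        {0x70004a00, 0x70504a00}, {0xaf005a30, 0xafa55a30},
        {0xaf004db0, 0xafb54db0}, {0xaf0000a8, 0xafb600a8},
        {0xaf00d4d8, 0xafb2d4d8}, {0x7000b170, 0x708fb170},
        {0x7000ff90, 0x704eff90}, {0x70008970, 0x70bf8970},
        {0x70003418, 0x704f3418}, {0x030026f8, 0x032326f8},
    };
}
```

For all ten samples the cleared byte is index 2, which is what a one-byte store of zero at object address + 2 would produce on a little-endian target.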
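Hardware watchpoints were not an option here, and watching a single Java object address is impractical because it changes too often. Instead, a watcher thread was injected through core-parser to asynchronously capture the stack closest to the moment of corruption.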
void Monitor::CheckVirtualMemory(void *vaddr, uint64_t size) {
    while (1) {
        uint64_t *current = (uint64_t *)vaddr;
        uint64_t *end = (uint64_t *)((uint64_t)vaddr + size);
        while (current

The watched range was set to 0x2000000 ~ 0x3000000, chosen from the addresses of the bad-root objects collected during the earlier local reproductions.
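The body of the watcher is cut off in the listing above, so the following is only a guess at the general idea, not the author's implementation: poll a range, compare each word with a previous snapshot, and report what changed. `Change` and `DiffAndRefresh` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One detected modification inside the watched range.
struct Change {
    size_t index;     // word index relative to the start of the range
    uint64_t before;  // value in the previous snapshot
    uint64_t after;   // value observed now
};

// Compares the live range against the snapshot, returns every word that
// differs, and updates the snapshot so the next pass only sees new changes.
std::vector<Change> DiffAndRefresh(const volatile uint64_t* current,
                                   std::vector<uint64_t>& snapshot) {
    std::vector<Change> changes;
    for (size_t i = 0; i < snapshot.size(); ++i) {
        const uint64_t now = current[i];  // volatile read of live memory
        if (now != snapshot[i]) {
            changes.push_back({i, snapshot[i], now});
            snapshot[i] = now;
        }
    }
    return changes;
}
```

A watcher thread would call this in a loop and dump the other threads' stacks whenever a change matches the corruption signature. On a live Java heap most changes are legitimate (allocation, GC moves), so the filtering step matters more than the diff itself.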
core-parser> space
TYPE REGION ADDRESS NAME
5 [0x2000000, 0x42000000) 0xb400007a6920aef0 main space (region space)
...
"main" sysTid=28886 Waiting
| group="main" daemon=0 prio=5 target=0x0 uncaught_exception=0x0
| tid=1 sCount=0 flags=0 obj=0x72f744c0 self=0xb40000726060b380 env=0xb4000072f0615050
| stack=0x7fe9d3d000-0x7fe9d3f000 stackSize=0x7ff000 handle=0x7478bee098
| mutexes=0xb40000726060bb20 held=
x0 0xb40000735061ad80 x1 0x0000000000000080 x2 0x00000000000005c3 x3 0x0000000000000000
x4 0x0000000000000000 x5 0x0000000000000000 x6 0x0000000000000000 x7 0x000000717e9b28af
x8 0x0000000000000062 x9 0xa1a11ca7b7440108 x10 0x0000000000000000 x11 0x0000000000000044
x12 0x00000000afda7ad0 x13 0x0000000000000056 x14 0x0000007477865880 x15 0x000000717e7e3226
x16 0x00000071994340e8 x17 0x000000745f360a80 x18 0x000000747885c000 x19 0xb40000735061ad70
x20 0xb40000726060b380 x21 0x00000000000005c3 x22 0x0000000000000000 x23 0x0000007477865880
x24 0x0000000000000001 x25 0xb40000722084c770 x26 0x0000000000000047 x27 0x000000719ab449c8
x28 0x0000000000000000 fp 0x0000007fea536ed0
lr 0x0000007198e137ec sp 0x0000007fea536ec0 pc 0x000000745f360aa0 pst 0x0000000060001000
Native: ...
JavaKt:
- waiting on (a a.b.c.e.e.w.G$x2)
JavaKt: ...
core-parser> rd 0x00000000026e6f98
26e6f98: 0000000070004a00 .J.p....
core-parser> space -c
ERROR: Region:[0x26e6f98, 0x26e6fc0) main space (region space) has bad object!!

This shows that the value changes during View processing, inside android.view.View.dispatchDetachedFromWindow.
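Out-of-bounds memory stomp

The next step was to pin down why Java heap memory was being stomped on. The corrupted memory is unusual: Java heap addresses are 32-bit, so conventional memory-error detectors such as HWASAN and MTE cannot be applied, and special cases such as objects being moved by GC make debugging considerably harder. The investigation combined BPF tracing with a shared library injected through core-parser that watched the memory under mprotect, which finally pinpointed the exact machine instruction doing the stomp.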
Cause of the out-of-bounds write

core-parser> bt
"main" sysTid=28703 Runnable
| group="main" daemon=0 prio=5 target=0x0 uncaught_exception=0x0
| tid=1 sCount=0 flags=0 obj=0x735c4d08 self=0xb400006ffd004010 env=0xb400006f3d00dd50
| stack=0x7ff182b000-0x7ff182d000 stackSize=0x7ff000 handle=0x70d0225098
| mutexes=0xb400006ffd0047b0 held="mutator lock"(shared held)
x0 0x0000000000000000 x1 0x0000000070af83b0 x2 0x00000000008f8b89 x3 0x00000000022f7008
x4 0x0000000000000000 x5 0x0000000000000000 x6 0x0000000000000001 x7 0x0000000027d57d8b
x8 0x0000000073d3317c x9 0x0000000000000001 x10 0x0000000000000000 x11 0x0000000020000000
x12 0x0000000000000004 x13 0x000000000000005a x14 0x0000007ff20255a0 x15 0x000000007361499c
x16 0x0000007ff20234e0 x17 0x0000000070dc60f0 x18 0x00000070cf728000 x19 0xb400006ffd004010
x20 0x0000000000000000 x21 0xb400006ffd0040d0 x22 0x00000000021c07f0 x23 0x00000000022f7008
x24 0x0000000002b33b40 x25 0x0000007ff20255c0 x26 0x0000000018300004 x27 0x0000000000000004
x28 0x0000007ff20255d0 fp 0x0000007ff20255c8
lr 0x0000000072ef6cf4 sp 0x0000007ff2025560 pc 0x0000000072ef6cf4 pst 0x0000000080001000
Native: ...
JavaKt: ...
core-parser>
core-parser> search A -o
[1] 0x22f73a0 A
core-parser> search B -o
[1] 0x22f7008 B
core-parser> f 0
JavaKt: {
Location: /system/framework/framework.jar!classes4.dex
art::ArtMethod: 0x7192ef08
dex_pc_ptr: 0x6dfec2ae3e
quick_frame: 0x7ff2025560
frame_pc: 0x72ef6cf4
method_header: 0x72ef6c3c
DEX CODE:
0x6dfec2ae38: 000c | move-result-object v0
0x6dfec2ae3a: 0038 0007 | if-eqz v0, 0x6dfec2ae48 //+7
0x6dfec2ae3e: 206e 8ada 0020 | invoke-virtual {v0, v2}, void android.view.ViewRootImpl.xxxxx(android.view.View) // method@35546
OAT CODE:
0x72ef6cd0: b5224794 | cbnz x20, 0x72f3b5c0
0x72ef6cd4: b940a401 | ldr w1, [x0, #0xa4]
0x72ef6cd8: aa1703e2 | mov x2, x23
0x72ef6cdc: aa0003f6 | mov x22, x0
0x72ef6ce0: aa0103f8 | mov x24, x1
0x72ef6ce4: b9400020 | ldr w0, [x1]
0x72ef6ce8: f9408800 | ldr x0, [x0, #0x110]
0x72ef6cec: f9400c1e | ldr x30, [x0, #0x18]
0x72ef6cf0: d63f03c0 | blr x30
0x72ef6cf4: 390e6aff | strb wzr, [x23, #0x39a]
0x72ef6cf8: a9425ff6 | ldp x22, x23, [sp, #0x20]
0x72ef6cfc: a9437bf8 | ldp x24, x30, [sp, #0x30]
}
core-parser>

x23 = 0x00000000022f7008
0x00000000022f7008 + 0x39a = 0x22f73a2

Once 0x72ef6cf4: 390e6aff | strb wzr, [x23, #0x39a] executes, it wipes exactly one byte of A's klass_ address.

core-parser> rd 0x22f73a0
22f73a0: 89103a34b02b4e20 .N+.4:..

That is, 89103a34b02b4e20 >>> 89103a34b0004e20

core-parser> class B -f
[0xb00f0dd8]
public class B extends android.view.TextureView {
[0x0374] private java.lang.Boolean y
[0x0370] java.util.HashSet s
...
core-parser> class android.widget.ImageView -f
[0x70ed9e18]
public class android.widget.ImageView extends android.view.View {
[0x03a5] private boolean mX1
...
[0x039a] private boolean mXxx

The class B object was mistakenly treated as an ImageView, and the memory layout happened to let the wrong calls along the way complete without faulting. The newly added code then wrote to an ImageView field offset that lies beyond the end of the B object, stomping on its neighbour.
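The effect of that single `strb` can be reproduced with plain integer arithmetic. This sketch assumes a little-endian target; `StoreAddress` and `ZeroByteInWord` are hypothetical helpers, while the addresses and words are the ones shown in the dump above.

```cpp
#include <cstdint>

// Effective address of `strb wzr, [base, #offset]`.
constexpr uint64_t StoreAddress(uint64_t base, uint64_t offset) {
    return base + offset;
}

// Emulates a one-byte zero store at store_addr on the 8-byte little-endian
// word that begins at word_addr. Returns the word unchanged if the store
// does not land inside it.
uint64_t ZeroByteInWord(uint64_t word, uint64_t word_addr, uint64_t store_addr) {
    const uint64_t byte = store_addr - word_addr;  // wraps if store is below the word
    if (byte >= 8) return word;
    return word & ~(0xffull << (8 * byte));
}
```

With x23 = 0x22f7008 the store lands at 0x22f73a2, two bytes into the A object at 0x22f73a0, and 89103a34b02b4e20 becomes 89103a34b0004e20. The offset 0x39a is a valid boolean field for an ImageView, but B's listed fields end at 0x0374, so for a B object the write falls into whatever object follows it.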
Stomp analysis

core-parser> method 0xb0476378 --dex --oat
protected void B.onDetachedFromWindow [dex_method_idx=15974]
DEX CODE:
0x6d7d08ec04: 106f 0525 0001 | invoke-super {v1}, void android.view.View.onDetachedFromWindow // method@1317
...
0x6d7d08ec18: 000e | return-void
core-parser>
core-parser> method 0xb0404368 --dex --oat
protected void C.onDetachedFromWindow [dex_method_idx=16079]
DEX CODE:
0x6d7d08ec04: 106f 0525 0001 | invoke-super {v1}, void android.view.View.onDetachedFromWindow
...
0x6d7d08ec18: 000e | return-void
core-parser>

| Method | Note |
| --- | --- |
| B.onDetachedFromWindow | |
| C.onDetachedFromWindow | These two methods share a single block of bytecode memory. |
core-parser> class C -i -f
[0xb0017e60]
public final class C extends androidx.appcompat.widget.AppCompatImageView {
...

Class C, in contrast, is a subclass of android.widget.ImageView.
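The risky condition here is two methods whose bytecode starts at the same address while their declaring classes sit in different parts of the hierarchy. A rough sketch of such a scan follows; it is a heuristic for illustration, not core-parser's actual check. `MethodEntry` and `FindRiskySharedCode` are hypothetical, and a real check would also need to confirm that the shared code contains an `invoke-super`.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct MethodEntry {
    std::string name;          // e.g. "B.onDetachedFromWindow"
    std::string direct_super;  // direct superclass of the declaring class
    uint64_t first_insn;       // address of the method's first dex instruction
};

// Groups methods by the address of their first instruction and returns the
// names in every group whose declaring classes have different superclasses.
std::vector<std::vector<std::string>> FindRiskySharedCode(
        const std::vector<MethodEntry>& methods) {
    std::map<uint64_t, std::vector<const MethodEntry*>> by_code;
    for (const MethodEntry& m : methods) by_code[m.first_insn].push_back(&m);

    std::vector<std::vector<std::string>> risky;
    for (const auto& kv : by_code) {
        const std::vector<const MethodEntry*>& group = kv.second;
        bool mixed = false;
        for (const MethodEntry* m : group) {
            if (m->direct_super != group.front()->direct_super) mixed = true;
        }
        if (!mixed) continue;  // same superclass: super calls resolve alike
        std::vector<std::string> names;
        for (const MethodEntry* m : group) names.push_back(m->name);
        risky.push_back(names);
    }
    return risky;
}
```

Fed with B.onDetachedFromWindow (superclass android.view.TextureView) and C.onDetachedFromWindow (superclass androidx.appcompat.widget.AppCompatImageView), both starting at 0x6d7d08ec04, the scan reports the pair as one risky group.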
core-parser> method 0xb0476378 --dex --oat -v --pc 0x6e01aa71a0
protected void B.onDetachedFromWindow [dex_method_idx=15974]
Location : xxx.apk!classes3.dex
CodeItem : 0x6d7d08ebf4
Registers : 2
Ins : 1
Outs : 2
Insns size : 0xb
DEX CODE:
0x6d7d08ec04: 106f 0525 0001 | invoke-super {v1}, void android.view.View.onDetachedFromWindow // method@1317
...
0x6d7d08ec18: 000e | return-void
OatQuickMethodHeader(0x6e01a9cf3c)
code_offset: 0x6e01a9cf40
code_size: 0xef00
NterpFrameInfo
frame_size_in_bytes: 0xd0
core_spill_mask: 0x7ff80000 (x19, x20, x21, x22, x23, x24, x25, x26, x27, x28, fp, lr)
fp_spill_mask: 0xff00 (x8, x9, x10, x11, x12, x13, x14, x15)
OAT CODE:
[0x6e01a9cf40, 0x6e01aabe40]
core-parser>

B.onDetachedFromWindow is currently being interpreted by Nterp.
In this flow C.onDetachedFromWindow runs earlier than B.onDetachedFromWindow, so UpdateCache(self, dex_pc_ptr, resolved_method); caches the resolved_method for that bytecode address.
core-parser> cs 6d7d08ec04 -w
6ffd004a80: 0000006d7d08ec04 000000007192ef08 ...}m......q....
7ff2025580: 0000006d7d08ec04 000000000000106f ...}m...o.......
7ff20255b0: 0000006d7d08ec04 0000007ff20255d0 ...}m....U......
// self=0xb400006ffd004010
core-parser> vtor 6ffd004a80
* VIRTUAL: 0x6ffd004a80
* PHYSICAL: 0xb78f4a80
* OFFSET: 0xa80
* OR: 0x79c77faf4a80
* MMAP: 0x0
*[6ffd004000, 6ffd044000) rw- 0000040000 0000040000 [anon:scudo:primary] [*]
core-parser> method 000000007192ef08
protected void android.widget.ImageView.onDetachedFromWindow [dex_method_idx=51670]
core-parser>

The class C object's call to onDetachedFromWindow also ran under Nterp. As shown, the entry for this bytecode address has already been cached as ImageView.onDetachedFromWindow, so when the class B object later executes the same bytecode under Nterp, it enters ImageView's method.
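The ghost call can be reproduced with a toy model of a cache keyed only by the instruction address. This is a simplified illustration, not ART's implementation: the class table is hand-written, whether TextureView and AppCompatImageView declare `onDetachedFromWindow` is assumed for the example, and `ResolveSuper` and `InvokeSuper` are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

struct ClassInfo {
    std::string super;     // direct superclass, empty for the root
    bool declares_method;  // declares onDetachedFromWindow itself?
};
using ClassTable = std::unordered_map<std::string, ClassInfo>;
using Cache = std::unordered_map<uint64_t, std::string>;  // dex_pc_ptr -> target class

// Toy hierarchy; the two "false" entries for framework classes are assumptions.
ClassTable ToyClassTable() {
    return {
        {"android.view.View", {"", true}},
        {"android.widget.ImageView", {"android.view.View", true}},
        {"android.view.TextureView", {"android.view.View", false}},
        {"androidx.appcompat.widget.AppCompatImageView", {"android.widget.ImageView", false}},
        {"B", {"android.view.TextureView", true}},
        {"C", {"androidx.appcompat.widget.AppCompatImageView", true}},
    };
}

// Correct resolution of invoke-super: the nearest ancestor declaring the method.
std::string ResolveSuper(const ClassTable& table, const std::string& caller) {
    std::string cls = table.at(caller).super;
    while (!cls.empty() && !table.at(cls).declares_method) cls = table.at(cls).super;
    return cls;
}

// Resolution through a cache keyed by instruction address only: the first
// caller decides the target for everyone who shares that address.
std::string InvokeSuper(Cache& cache, const ClassTable& table,
                        uint64_t dex_pc_ptr, const std::string& caller) {
    auto hit = cache.find(dex_pc_ptr);
    if (hit != cache.end()) return hit->second;  // caller's class is never consulted
    std::string target = ResolveSuper(table, caller);
    cache[dex_pc_ptr] = target;
    return target;
}
```

When C runs first at 0x6d7d08ec04, the cache stores ImageView as the target. B then hits the same key and is sent to ImageView as well, even though resolving from B's own hierarchy would give View in this model.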
core-parser> method 0xb0404368 --dex --oat -b -v --pc 0000006e01a9cf40
protected void C.onDetachedFromWindow [dex_method_idx=16079]
Location : xxx.apk!classes3.dex
CodeItem : 0x6d7d08ebf4
Registers : 2
Ins : 1
Outs : 2
Insns size : 0xb
DEX CODE:
0x6d7d08ec04: 106f 0525 0001 | invoke-super {v1}, void android.view.View.onDetachedFromWindow // method@1317
...
0x6d7d08ec18: 000e | return-void
OatQuickMethodHeader(0x6e01a9cf3c)
code_offset: 0x6e01a9cf40
code_size: 0xef00
NterpFrameInfo
frame_size_in_bytes: 0xd0
core_spill_mask: 0x7ff80000 (x19, x20, x21, x22, x23, x24, x25, x26, x27, x28, fp, lr)
fp_spill_mask: 0xff00 (x8, x9, x10, x11, x12, x13, x14, x15)
OAT CODE:
[0x6e01a9cf40, 0x6e01aabe40]
Binary:
b0404368: 10300004b0017e60 ffa3027f00003ecf `~....0..>......
b0404378: 0000006d7d08ebf4 0000006e01a9cf40 ...}m...@...n...
core-parser>

Summary

A compile-time optimization made B.onDetachedFromWindow and C.onDetachedFromWindow share the same block of bytecode. That interacted with the bytecode cache used by the VM's Nterp interpreter and produced a ghost call, which in turn stomped on memory and wrote dirty data.

BTW: when a bad root is caused by software, the culprit is usually a related Java method. Java heap addresses are small and sit near the very start of the virtual address space, so it is unlikely that some other native library stomped on them. More often the cause is an optimization or obfuscation issue, or an unsupported feature.
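A generic risk check is being added to the open-source project core-parser (implementation in progress), so that everyone can later check whether the following risks are present.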
core-parser> space --full-check
ERROR: verify array: [0x21a4888]
Array Name: java.lang.Object
[0] 0x27e6d40
ERROR: verify array: [0x21a5b80]
Array Name: java.lang.Object
[0] 0x27e6d40
ERROR: verify array: [0x21a6880]
Array Name: java.lang.Object
[0] 0x27e6d40
ERROR: verify instance: [0x21aff30]
Object Name: java.util.LinkedHashMap$Entry
[0x08] final java.lang.Object key = 0x27e6d40
...
ERROR: Region:[0x27e6d40, 0x27e7100) main space (region space) has bad object!!
...
ERROR: verify class reuse dex_pc_ptr method
[0xb06179f0] protected void B.onDetachedFromWindow
[0xb054e978] protected void C.onDetachedFromWindow

Source: 墨码行者