Summary: Besides interpreting and compiling JS, V8 can also compile WebAssembly. It is effectively a dual-engine component, powerful by design, backed by Google with help from Microsoft, so its pedigree is solid.
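In the previous post we walked through V8's JS interpreter; this post looks at WebAssembly.
WebAssembly (Wasm from here on) has been about as hot as AI over the last few years. With its small, compact binary format it outperforms JS, whose design is comparatively heavy and crude, in nearly every compute-heavy area: games, audio and video, cloud, blockchain. Even some battle-tested, well-known desktop applications such as AutoCAD and Photoshop have been partially ported to the web through Wasm. Node.js also uses the V8 component to run JS.
Rust, C++, and Go code can all be compiled to Wasm and then run in the browser. Because Wasm bytecode is low-level, loosely C-like but much closer to assembly, it also makes reverse engineering noticeably harder.
This post looks at the core of how it works.
Can it replace JS? Let's look at an example.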
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>WASM Demo</title>
</head>
<body>
  <h1>WASM</h1>
  <script>
    (async () => {
      const Wasm = "AGFzbQEAAAABBwFgAn9/AX8DAgEABwcBA2FkZAAACgkBBwAgACABags=";
      // Decode the Base64 string into the raw module bytes.
      const bytes = Uint8Array.from(atob(Wasm), c => c.charCodeAt(0));
      const { instance } = await WebAssembly.instantiate(bytes.buffer);
      const result = instance.exports.add(60, 60);
      const p = document.createElement('p');
      p.textContent = `60 + 60 = ${result}`;
      document.body.appendChild(p);
    })();
  </script>
</body>
</html>

Here the script hands the computation to Wasm: the addition itself is not done in JS, which only decodes and instantiates the module. Next, the same thing as a standalone script with no HTML.
const Wasm = "AGFzbQEAAAABBwFgAn9/AX8DAgEABwcBA2FkZAAACgkBBwAgACABags=";

// Manual Base64 decoding, so the script does not depend on atob().
function base64ToArrayBuffer(base64) {
  const binary_string = (function () {
    const b = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
    let s = "";
    let i = 0;
    while (i < base64.length) {
      // Four Base64 characters carry three bytes; '=' (index 64) marks padding.
      const e1 = b.indexOf(base64.charAt(i++));
      const e2 = b.indexOf(base64.charAt(i++));
      const e3 = b.indexOf(base64.charAt(i++));
      const e4 = b.indexOf(base64.charAt(i++));
      const c1 = (e1 << 2) | (e2 >> 4);
      const c2 = ((e2 & 15) << 4) | (e3 >> 2);
      const c3 = ((e3 & 3) << 6) | e4;
      s += String.fromCharCode(c1);
      if (e3 != 64) s += String.fromCharCode(c2);
      if (e4 != 64) s += String.fromCharCode(c3);
    }
    return s;
  })();
  const len = binary_string.length;
  const bytes = new Uint8Array(len);
  for (let i = 0; i < len; i++) {
    bytes[i] = binary_string.charCodeAt(i);
  }
  return bytes.buffer;
}

(async () => {
  const buffer = base64ToArrayBuffer(Wasm);
  const { instance } = await WebAssembly.instantiate(buffer);
  const result = instance.exports.add(60, 60);
  print(`60 + 60 = ${result}`);
})();

Now look at the Wasm variable.
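The Wasm module encoded in that string is shown below. All it does is implement an addition:

(module
  (func (param i32 i32) (result i32)
    local.get 0
    local.get 1
    i32.add)
  (export "add" (func 0)))

Later on, await WebAssembly.instantiate(buffer) instantiates the module and gives us an instance handle, and through that handle we add two integers: instance.exports.add(60, 60). This surface-level code is not the point, though. What we want to know is how the code of the add function sits in memory and how it gets executed inside the V8 (Chromium) engine.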
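To make the link between the Base64 string and the text-format module concrete, here is a sketch that writes out the 41 decoded bytes by hand, grouped by section. The variable names (moduleBytes, addInstance, and so on) are illustrative and not from the original script. It uses the synchronous WebAssembly.Module and WebAssembly.Instance constructors only to keep the example short, which is fine for a module this small.

```javascript
// The bytes behind the Base64 string, annotated section by section.
const moduleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,             // magic: "\0asm"
  0x01, 0x00, 0x00, 0x00,             // binary format version 1
  // Type section (id 1, 7 bytes): one signature (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Function section (id 3, 2 bytes): function 0 uses type 0
  0x03, 0x02, 0x01, 0x00,
  // Export section (id 7, 7 bytes): export "add" as function 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // Code section (id 10, 9 bytes): one 7-byte body with no extra locals
  0x0a, 0x09, 0x01, 0x07, 0x00,
  0x20, 0x00,                         // local.get 0
  0x20, 0x01,                         // local.get 1
  0x6a,                               // i32.add
  0x0b,                               // end
]);

// Validate, compile, and instantiate synchronously, then call the export.
const isValid = WebAssembly.validate(moduleBytes);
const addModule = new WebAssembly.Module(moduleBytes);
const addInstance = new WebAssembly.Instance(addModule);
const sum = addInstance.exports.add(60, 60);
console.log(isValid, sum); // true 120
```

The last six bytes are the whole function body, which is why the generated machine code for add is dominated by prologue and checks rather than by the addition itself.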
First, the assembly generated on the first run:

D:\chromium\src\out\Debug>d8 --allow-natives-syntax --print-wasm-code "C:\Users\Administrator\Desktop\wasm.js"
--- WebAssembly code ---
name: wasm-function[0]
index: 0
kind: wasm function
compiler: Liftoff
Body (size = 256 = 196 + 60 padding)
Instructions (size = 180)
0xa448f51900     0  4531e4                xorl r12,r12
0xa448f51903     3  e868f8ffff            call 000000A448F51170  (jump table)
0xa448f51908     8  4881ec08000000        REX.W subq rsp,0x8
0xa448f5190f     f  8bc0                  movl rax,rax
0xa448f51911    11  8bd2                  movl rdx,rdx
0xa448f51913    13  8b4eff                movl rcx,[rsi-0x1]
0xa448f51916    16  4903ce                REX.W addq rcx,r14
0xa448f51919    19  0fb74907              movzxwl rcx,[rcx+0x7]
0xa448f5191d    1d  81f9be000000          cmpl rcx,0xbe
0xa448f51923    23  0f8421000000          jz 000000A448F5194A  <+0x4a>
0xa448f51929    29  b94c000000            movl rcx,000000000000004C
0xa448f5192e    2e  4989e2                REX.W movq r10,rsp
0xa448f51931    31  4883ec28              REX.W subq rsp,0x28
0xa448f51935    35  4883e4f0              REX.W andq rsp,0xf0
0xa448f51939    39  4c89542420            REX.W movq [rsp+0x20],r10
0xa448f5193e    3e  48b86022bcc3fc7f0000  REX.W movq rax,00007FFCC3BC2260
0xa448f51948    48  ffd0                  call rax
0xa448f5194a    4a  493b65a0              REX.W cmpq rsp,[r13-0x60]
0xa448f5194e    4e  0f8640000000          jna 000000A448F51994  <+0x94>
// this is where the addition happens
0xa448f51954    54  8d0c10                leal rcx,[rax+rdx*1]
// middle part omitted for readability
0xa448f519b2    b2  ebb4                  jmp 000000A448F51968  <+0x68>

Source positions:
 pc offset  position
        96         0  statement
        a6         6  statement

Safepoints (stack slots = 9, entries = 1, byte size = 15)
0xa448f5199b    9b  slots (sp->fp): 000000000

RelocInfo (size = 0)
--- End code ---
60 + 60 = 120

For the add function, the generated code does not use an add instruction or any other plain addition. It uses lea:
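0xa448f51954    54  8d0c10                leal rcx,[rax+rdx*1]

The advantage of lea is that it computes register + register as an address calculation, so the status flags are left untouched and the result can go into a third register. The code that emits this instruction is: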
// src\v8\src\wasm\baseline\x64\liftoff-assembler-x64-inl.h
void LiftoffAssembler::emit_i32_add(Register dst, Register lhs, Register rhs) {
  if (lhs != dst) {
    leal(dst, Operand(lhs, rhs, times_1, 0));
  } else {
    addl(dst, rhs);
  }
}

To reach emit_i32_add in the debugger, we need to set breakpoints in the following functions.
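The branch in emit_i32_add can be restated as a tiny model of the instruction choice. This is only a sketch: selectI32Add is a hypothetical helper that returns the assembly text instead of emitting machine code, and register names are passed as plain strings.

```javascript
// Hypothetical model of the choice made in emit_i32_add.
// When the destination differs from the left operand, a three-operand
// lea avoids an extra mov; otherwise a plain two-operand add is enough.
function selectI32Add(dst, lhs, rhs) {
  if (lhs !== dst) {
    return `leal ${dst},[${lhs}+${rhs}*1]`;
  }
  return `addl ${dst},${rhs}`;
}

// Same register assignment as in the listing: result in rcx, inputs in rax/rdx.
const chosenForListing = selectI32Add("rcx", "rax", "rdx");
// Destination equal to the left operand: falls back to add.
const chosenInPlace = selectI32Add("rax", "rax", "rdx");

console.log(chosenForListing); // leal rcx,[rax+rdx*1]
console.log(chosenInPlace);    // addl rax,rdx
```

With the registers from the d8 output (dst = rcx, lhs = rax, rhs = rdx) the model picks the lea form, which matches the instruction at offset 0x54.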
void Decode() {
  // omitted for readability
  DecodeFunctionBody();
}

and:
// src\v8\src\wasm\function-body-decoder-impl.h
void DecodeFunctionBody() {
  TRACE("wasm-decode %p...%p (module+%u, %d bytes)\n", this->start(),
        this->end(), this->pc_offset(),
        static_cast<int>(this->end() - this->start()));
  // ...
  if (V8_LIKELY(this->current_inst_trace_->first == 0)) {
    // Decode the function body.
    while (this->pc_ < this->end_) {
      // Most operations only grow the stack by at least one element (unary
      // and binary operations, local.get, constants, ...). Thus check that
      // there is enough space for those operations centrally, and avoid any
      // bounds checks in those operations.
      stack_.EnsureMoreCapacity(1, this->zone_);
      uint8_t first_byte = *this->pc_;
      WasmOpcode opcode = static_cast<WasmOpcode>(first_byte);
      CALL_INTERFACE_IF_OK_AND_REACHABLE(NextInstruction, opcode);
      int len;
      // Allowing two of the most common decoding functions to get inlined
      // appears to be the sweet spot.
      // Handling _all_ opcodes via a giant switch-statement has been tried
      // and found to be slower than calling through the handler table.
      if (opcode == kExprLocalGet) {
        len = WasmFullDecoder::DecodeLocalGet(this, opcode);
      } else if (opcode == kExprI32Const) {
        len = WasmFullDecoder::DecodeI32Const(this, opcode);
      } else {
        // set the breakpoint here
        OpcodeHandler handler = GetOpcodeHandler(first_byte);
        len = (*handler)(this, opcode);
      }
      this->pc_ += len;
    }
  }
  // ...
}

In the end, the generated machine code is copied to the address held by the jit_allocation variable shown below and executed from there. .NET, by contrast, jumps straight to the address where the machine code was generated, without assigning or copying it to another address first.
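The dispatch structure of DecodeFunctionBody (two inlined fast paths, everything else through a handler table) can be sketched in a few lines. This is a simplified model under stated assumptions, not V8's decoder: it only records instruction names, supports four opcodes, and every helper name here (readLeb, handlerTable, decodeFunctionBody) is made up for illustration. The opcode values are the standard Wasm encodings.

```javascript
const kExprEnd = 0x0b, kExprLocalGet = 0x20, kExprI32Const = 0x41, kExprI32Add = 0x6a;

// Read a LEB128 immediate at `pos`; returns [value, byteLength].
// With signed = true the value is sign-extended (as i32.const requires).
function readLeb(bytes, pos, signed) {
  let value = 0, shift = 0, len = 0, byte;
  do {
    byte = bytes[pos + len++];
    value |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  if (signed && shift < 32 && (byte & 0x40)) value |= -1 << shift;
  return [signed ? value : value >>> 0, len];
}

// Fast-path decoders: each returns the instruction length in bytes.
function decodeLocalGet(decoder) {
  const [index, immLen] = readLeb(decoder.bytes, decoder.pc + 1, false);
  decoder.trace.push(`local.get ${index}`);
  return 1 + immLen;
}
function decodeI32Const(decoder) {
  const [value, immLen] = readLeb(decoder.bytes, decoder.pc + 1, true);
  decoder.trace.push(`i32.const ${value}`);
  return 1 + immLen;
}

// Handler table keyed by the first opcode byte (only two entries in this sketch).
const handlerTable = new Map([
  [kExprI32Add, (decoder) => { decoder.trace.push("i32.add"); return 1; }],
  [kExprEnd, (decoder) => { decoder.trace.push("end"); return 1; }],
]);

function decodeFunctionBody(bytes) {
  const decoder = { bytes, pc: 0, trace: [] };
  while (decoder.pc < bytes.length) {
    const opcode = bytes[decoder.pc];
    let len;
    if (opcode === kExprLocalGet) {
      len = decodeLocalGet(decoder);          // inlined fast path
    } else if (opcode === kExprI32Const) {
      len = decodeI32Const(decoder);          // inlined fast path
    } else {
      const handler = handlerTable.get(opcode); // everything else: table lookup
      if (!handler) throw new Error(`unsupported opcode 0x${opcode.toString(16)}`);
      len = handler(decoder);
    }
    decoder.pc += len;
  }
  return decoder.trace;
}

// The body of the add function from the module above.
const decoded = decodeFunctionBody(Uint8Array.of(0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b));
console.log(decoded.join("; ")); // local.get 0; local.get 1; i32.add; end
```

For the add function, both local.get instructions take the fast path, and only i32.add and end go through the table, which is why the breakpoint in the else branch is the one that leads to emit_i32_add.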
// src\v8\src\wasm\wasm-code-manager.cc
WritableJitAllocation jit_allocation = ThreadIsolation::LookupJitAllocation(
    reinterpret_cast<Address>(dst_code_bytes.begin()), dst_code_bytes.size(),
    ThreadIsolation::JitAllocationType::kWasmCode, true);
jit_allocation.CopyCode(0, desc.buffer, desc.instr_size);

All of the code above runs underneath a RUNTIME_FUNCTION macro:

RUNTIME_FUNCTION(Runtime_WasmCompileLazy) {
  DCHECK_EQ(2, args.length());
  Tagged<WasmTrustedInstanceData> trusted_instance_data =
      TrustedCast<WasmTrustedInstanceData>(args[0]);
  int func_index = args.smi_value_at(1);
  TRACE_EVENT1("v8.wasm", "wasm.CompileLazy", "func_index", func_index);
  DisallowHeapAllocation no_gc;
  SealHandleScope scope(isolate);
  DCHECK(isolate->context().is_null());
  if (trusted_instance_data->has_native_context()) {
    isolate->set_context(trusted_instance_data->native_context());
  }
  bool success = wasm::CompileLazy(isolate, trusted_instance_data, func_index);
  if (!success) {
    DCHECK(v8_flags.wasm_lazy_validation);
    AllowHeapAllocation throwing_unwinds_the_stack;
    wasm::ThrowLazyCompilationError(
        isolate, trusted_instance_data->native_module(), func_index);
    DCHECK(isolate->has_exception());
    return ReadOnlyRoots{isolate}.exception();
  }
  return Smi::FromInt(
      wasm::JumpTableOffset(trusted_instance_data->module(), func_index));
}

so it is ultimately invoked from one of the v8.dll!Builtins_XXXX functions, for example Builtins_WasmCEntry at the top of the stack below:
> v8.dll!Builtins_WasmCEntry  C++
  v8.dll!Builtins_WasmCompileLazy  C++
  v8.dll!Builtins_JSToWasmWrapperAsm  C++
  v8.dll!Builtins_JSToWasmWrapper  C++
  v8.dll!Builtins_InterpreterEntryTrampoline  C++
  v8.dll!Builtins_AsyncFunctionAwaitResolveClosure  C++
  v8.dll!Builtins_PromiseFulfillReactionJob  C++
  v8.dll!Builtins_RunMicrotasks  C++
  v8.dll!Builtins_JSRunMicrotasksEntry  C++
  [Inline Frame] v8.dll!v8::internal::GeneratedCode<unsigned long long,unsigned long long,v8::internal::MicrotaskQueue *>::Call(unsigned __int64 args, v8::internal::MicrotaskQueue * args) Line 212  C++
  v8.dll!v8::internal::`anonymous namespace'::Invoke(v8::internal::Isolate * isolate, const v8::internal::`anonymous namespace'::Invokeparams & params) Line 556  C++

Inside v8.dll!Builtins_WasmCEntry, this is where the RUNTIME_FUNCTION is called and where it returns to:
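00007FFCAC47D6DC  FF D3                 call rbx   // the call site
// where execution resumes after the call
00007FFCAC47D6DE  49 3B 85 C8 03 00 00  cmp rax,qword ptr [r13+3C8h]

The jump that finally leads to the Wasm code is in Builtins_WasmCompileLazy:

00007FFCAC4449A0  41 FF E7              jmp r15

After two jmp hops we finally arrive at the assembly of the Wasm code: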
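0000078B48441900  45 31 E4              xor r12d,r12d
0000078B48441903  E8 68 F8 FF FF        call 0000078B48441170
0000078B48441908  48 81 EC 08 00 00 00  sub rsp,8
0000078B4844190F  8B C0                 mov eax,eax
0000078B48441911  8B D2                 mov edx,edx
// rest omitted

To sum up: V8 goes through a stub entry into the matching Builtins routine, a predefined execution function, and runs from there. The decoding, compilation, and execution steps along the way are no different from those of other languages such as Rust or Java.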
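The lazy path traced above (a stub that compiles on first call, then a jump into the real code) can be modeled conceptually with a table of function slots. This is a sketch of the idea only, not V8's jump table: createLazyModule and its fields are hypothetical names, and "compilation" here is just calling a factory that returns a JS function.

```javascript
// Conceptual model: every slot starts as a lazy stub. On the first call the
// stub "compiles" the function, patches its own slot, and forwards the call.
function createLazyModule(factories) {
  const compileLog = [];
  const jumpTable = factories.map((factory, index) => (...args) => {
    compileLog.push(index);          // record that compilation happened
    const compiled = factory();      // stand-in for the real compiler
    jumpTable[index] = compiled;     // patch the slot with the compiled code
    return compiled(...args);        // continue into the freshly compiled code
  });
  // Callers always go through the table, so later calls skip the stub.
  const call = (index, ...args) => jumpTable[index](...args);
  return { call, compileLog };
}

const lazyModule = createLazyModule([
  () => (a, b) => (a + b) | 0,       // function 0: 32-bit integer add
]);

const firstCall = lazyModule.call(0, 60, 60);  // triggers compilation
const secondCall = lazyModule.call(0, 1, 2);   // uses the patched slot
console.log(firstCall, secondCall, lazyModule.compileLog.length); // 120 3 1
```

The compile log has a single entry after two calls, which mirrors what the stack trace shows: Builtins_WasmCompileLazy only appears on the first invocation of add.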
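Both V8's JS engine and its Wasm engine are, at bottom, compiler work, and that part is not especially hard. The hard part is the enormous set of surrounding components and technologies.
For example (the following is taken from online references):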
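Browser process (the main browser process): the "scheduler" for all other processes. It handles window management, the address bar, bookmarks, download management, the network stack (most HTTP requests), the disk cache, permission control, and the lifecycle of plugins and child processes. The UI (tab strip, toolbar) is also rendered here.

Renderer process: one per tab (or per iframe group / site isolation unit). It parses HTML/CSS/JS, performs layout, executes JavaScript, builds the DOM tree and the Render Tree, and composites layers. A crash affects only that tab.

GPU process: takes the layers from each Renderer and hands them to the GPU for accelerated compositing, plus WebGL, Canvas, and video decoding. This lowers CPU usage and improves smoothness.

Network Service process (newer versions): the network stack that used to live in the Browser process is split out into its own process, for security isolation and better performance in network requests, DNS resolution, and cache handling.

These components are what make Chromium so unusually large: a build at symbol level 1 or 2 produces more than 200 GB of output.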
Video for this post:
https://www.bilibili.com/video/BV1hwbAzSEss?spm_id_from=333.788.videopod.episodes&vd_source=6502a137795fa3ed92afa244920d9e5a
Source: opendotnet