Runtime - Inline Cache


Let's start from compilation. Taking C1 as the example, suppose the Java code contains an invoke instruction:

void foo() {
    base.vcall();
}

When C1 compiles foo and reaches base.vcall(), it generates an HIR instruction, Invoke, which is later lowered to LIR_OpJavaCall:

void LIRGenerator::do_Invoke(Invoke* x) {
  ...
  switch (x->code()) {
    case Bytecodes::_invokestatic:
      __ call_static(target, result_register,
                     SharedRuntime::get_resolve_static_call_stub(),
                     arg_list, info);
      break;
    case Bytecodes::_invokespecial:
    case Bytecodes::_invokevirtual:
    case Bytecodes::_invokeinterface:
      // for loaded and final (method or class) target we still produce an inline cache,
      // in order to be able to call mixed mode
      if (x->code() == Bytecodes::_invokespecial || x->target_is_final()) {
        __ call_opt_virtual(target, receiver, result_register,
                            SharedRuntime::get_resolve_opt_virtual_call_stub(),
                            arg_list, info);
      } else {
        __ call_icvirtual(target, receiver, result_register,
                          SharedRuntime::get_resolve_virtual_call_stub(),
                          arg_list, info);
      }
      break;
    case Bytecodes::_invokedynamic: {
      __ call_dynamic(target, receiver, result_register,
                      SharedRuntime::get_resolve_static_call_stub(),
                      arg_list, info);
      break;
    }
    default:
      fatal("unexpected bytecode: %s", Bytecodes::name(x->code()));
      break;
  }

}

Note call_icvirtual: its ciMethod is target, but the address it finally jumps to, dest, is SharedRuntime::get_resolve_virtual_call_stub. At the next lowering step:

void LIR_Assembler::ic_call(LIR_OpJavaCall* op) {
  __ ic_call(op->addr());
  add_call_info(code_offset(), op->info());
  assert((__ offset() - NativeCall::instruction_size + NativeCall::displacement_offset) % BytesPerWord == 0,
         "must be aligned");
}
void MacroAssembler::ic_call(address entry, jint method_index) {
  RelocationHolder rh = virtual_call_Relocation::spec(pc(), method_index);
  movptr(rax, (intptr_t)Universe::non_oop_word());
  call(AddressLiteral(entry, rh));
}

ic_call expands to two instructions:

foo:
  mov rax, non_oop
  call addr(SharedRuntime::get_resolve_virtual_call_stub)

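So the code finally generated for this vcall() is the two instructions above. It does not call vcall directly; it first goes through the stub returned by get_resolve_virtual_call_stub. The job of that stub is to check whether base.vcall can be treated as a monomorphic call. If so, it finds the real address of vcall (call it real_vcall), tells the call site that it is monomorphic (set_to_monomorphic), and returns that address. It also patches the non_oop operand of the mov, replacing it with the cached value for the method it just found (judging from the callee-side check shown later, this is the value compared against the receiver's klass). In the end the instructions above become: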

foo:
  mov rax, real_call_receiver
  call real_vcall
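
The patching step can be modelled in a few lines of C++. This is only a sketch of the idea under simplifying assumptions, not HotSpot code: Klass, CallSite, Entry, resolve_stub, real_vcall and invoke are hypothetical names, the two struct fields stand in for the patchable mov operand and call target, and a null pointer stands in for non_oop.

```cpp
#include <cassert>
#include <string>

// Hypothetical model types; none of these are HotSpot classes.
struct Klass { std::string name; };

struct CallSite;
using Entry = int (*)(CallSite&, const Klass*);

// The two patchable slots of the call site: the `mov rax, imm` operand
// and the destination of the `call`.
struct CallSite {
    const Klass* cached;        // nullptr plays the role of non_oop
    Entry        target;        // initially the resolve stub
    int          resolve_count; // how often the resolve stub ran
};

// Stand-in for the real method body.
int real_vcall(CallSite&, const Klass*) { return 42; }

// Stand-in for the resolve stub: find the real target, patch both slots
// of the call site, then continue into the target.
int resolve_stub(CallSite& site, const Klass* receiver_klass) {
    ++site.resolve_count;
    site.cached = receiver_klass;  // patch the mov operand
    site.target = real_vcall;      // patch the call destination
    return site.target(site, receiver_klass);
}

// What the compiled caller does: jump to whatever the site points at.
int invoke(CallSite& site, const Klass* receiver_klass) {
    return site.target(site, receiver_klass);
}
```

In this model the first invoke goes through resolve_stub and every later invoke on the same site lands directly in real_vcall, which is the effect the patched mov/call pair above is meant to have.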

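That is roughly all there is on the caller side. A check is still needed, though: once the caller has been patched, a later receiver may have a different type, so the callee has to verify on entry. Again, let's start from compilation.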

void LIR_Assembler::emit_op0(LIR_Op0* op) {
  switch (op->code()) {
    ...
    case lir_std_entry:
      // init offsets
      offsets()->set_value(CodeOffsets::OSR_Entry, _masm->offset());
      _masm->align(CodeEntryAlignment);
      if (needs_icache(compilation()->method())) {
        check_icache();
      }
      offsets()->set_value(CodeOffsets::Verified_Entry, _masm->offset());
      _masm->verified_entry(compilation()->directive()->BreakAtExecuteOption);
      if (needs_clinit_barrier_on_entry(compilation()->method())) {
        clinit_barrier(compilation()->method());
      }
      build_frame();
      offsets()->set_value(CodeOffsets::Frame_Complete, _masm->offset());
      break;
    ...
}

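When the vcall method is compiled, C1 first emits the ic check instructions at the start of the method, then records vcall's Verified entry, and then emits the clinit barrier. The rest of the method follows, so the code now looks roughly like this: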

foo:
  mov rax, non_oop
  call addr(SharedRuntime::get_resolve_virtual_call_stub) ; assume resolve has not run yet

vcall:
  // start of method
  mov rscratch1, addr(receiver, klass_offset)
  cmp rscratch1, rax
  jne get_ic_miss_stub
  ... // clinit_barrier
  // method body
  ...
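
The layout above, a check followed by the body, can be read as two entry points into the same method. The following is a minimal sketch under that reading; Klass, Object, Outcome and the three functions are hypothetical stand-ins, and ic_klass models the value the caller left in rax.

```cpp
#include <cassert>

// Hypothetical model types, not HotSpot classes.
struct Klass  { int id; };
struct Object { const Klass* klass; };

enum class Outcome { Hit, Miss };

// Entry that skips the check: stands in for the code after the ic check.
Outcome verified_entry(const Object&) { return Outcome::Hit; }

// Stand-in for the miss stub reached by the `jne`.
Outcome ic_miss_stub(const Object&) { return Outcome::Miss; }

// Entry that includes the check: compare the receiver's klass with the
// value supplied by the caller, then fall through or take the miss path.
Outcome unverified_entry(const Object& receiver, const Klass* ic_klass) {
    if (receiver.klass != ic_klass) {
        return ic_miss_stub(receiver);
    }
    return verified_entry(receiver);
}
```

A receiver whose klass matches the cached value falls through to the body; any other receiver is diverted to the miss path before the body runs.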

check_icache is where the inline cache (ic for short from here on) check is inserted.

int LIR_Assembler::check_icache() {
  Register receiver = FrameMap::receiver_opr->as_register();
  Register ic_klass = IC_Klass;
  ...
  int offset = __ offset();
  __ inline_cache_check(receiver, IC_Klass);
  ...
  return offset;
}
void C1_MacroAssembler::inline_cache_check(Register receiver, Register iCache) {
  ...
  if (UseCompressedClassPointers) {
    load_klass(rscratch1, receiver, tmp_load_klass);
    cmpptr(rscratch1, iCache);
  } else {
    cmpptr(iCache, Address(receiver, oopDesc::klass_offset_in_bytes()));
  }
  jump_cc(Assembler::notEqual,
          RuntimeAddress(SharedRuntime::get_ic_miss_stub()));
  ...
}

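IC_Klass is rax, the same register the caller's call site loads before the call. inline_cache_check fetches the klass of the current receiver object and compares it with rax. If they are equal, the ic hits and nothing else happens. If not, control goes to get_ic_miss_stub, which may patch the code at the caller depending on the situation (whether vcall is an interface call or a virtual call). Suppose this is a virtual call: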

foo:
  mov rax, non_oop
  call vtable_stub

vcall:
  // start of method: unverified_entry_point
  mov rscratch1, addr(receiver, klass_offset)
  cmp rscratch1, rax
  jne get_ic_miss_stub
  // verified_entry_point
  ... // clinit_barrier
  // method body
  ...

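In that case the caller is patched to call a vtable_stub. The vtable_stub works much like a C++ virtual call: look up the virtual table, find the target method, and call it, which is considerably slower than the direct call used before. You may have noticed that once the caller has been changed this way, the callee's ic check instructions are no longer needed at all, so the vtable_stub points straight at verified_entry_point, the code address that skips the ic check. The matching concept, unverified_entry_point, is the code address that includes the ic check.

There is also a supplementary article worth reading, https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall, which I only found after writing this; it is more detailed and more precise.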
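
To make the vtable_stub path concrete, here is a simplified model of it. It is a sketch only: the Klass layout, the vtable_index parameter and the function names are assumptions made for illustration, and the real stub is generated machine code rather than a C++ function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Object;
// Each slot holds a verified entry, i.e. an entry without an ic check.
using VerifiedEntry = int (*)(const Object&);

// Hypothetical model types, not HotSpot classes.
struct Klass  { std::vector<VerifiedEntry> vtable; };
struct Object { const Klass* klass; };

int base_vcall(const Object&)    { return 1; }
int derived_vcall(const Object&) { return 2; }

// Stand-in for the vtable stub: load the receiver's klass, index its
// virtual table, and jump to the entry found there. No klass comparison
// is made, because the lookup already depends on the receiver's type.
int vtable_stub(const Object& receiver, std::size_t vtable_index) {
    VerifiedEntry entry = receiver.klass->vtable.at(vtable_index);
    return entry(receiver);
}
```

Every call through vtable_stub pays for the extra loads and the indirect jump, which is where the cost relative to the patched direct call comes from in this model.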