<div class="Box">
This article explains the internal architecture of the Java Virtual Machine (JVM). The following diagram shows the key internal components of a typical JVM that conforms to the Java Virtual Machine Specification.
The components shown in this diagram are explained below in two sections. The first covers the components that are created for each thread and the second covers the components that are created independently of threads.
A thread is a thread of execution in a program. The JVM allows an application to have multiple threads of execution running concurrently. In the HotSpot JVM there is a direct mapping between a Java thread and a native operating system thread. After all of the state for a Java thread has been prepared, such as thread-local storage, allocation buffers, synchronization objects, stacks and the program counter, the native thread is created. The native thread is reclaimed once the Java thread terminates. The operating system is therefore responsible for scheduling all threads and dispatching them to any available CPU. Once the native thread has initialized, it invokes the run() method in the Java thread. When run() returns, uncaught exceptions are handled, then the native thread checks whether the JVM needs to be terminated as a result of the thread terminating (i.e. whether it is the last non-daemon thread). When the thread terminates, all resources for both the native and Java thread are released.
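The lifecycle described above can be observed from Java code. The following is a minimal sketch, not a description of HotSpot internals: the class name ThreadLifecycleSketch, the method runWorker and the thread name worker-1 are made up for illustration. It starts a non-daemon thread, lets an exception escape from run(), and captures it with an uncaught exception handler before the thread terminates.

```java
import java.util.concurrent.atomic.AtomicReference;

class ThreadLifecycleSketch {
    // Runs a worker thread to completion and returns the message of the
    // exception that escaped its run() method, or null if none escaped.
    static String runWorker(boolean fail) {
        AtomicReference<String> uncaught = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            if (fail) {
                throw new IllegalStateException("worker failed");
            }
        }, "worker-1");

        // Non-daemon threads keep the JVM alive until they terminate.
        worker.setDaemon(false);
        // Invoked on the dying thread when an exception escapes run().
        worker.setUncaughtExceptionHandler((t, e) -> uncaught.set(e.getMessage()));

        worker.start(); // run() is invoked on the newly started thread
        try {
            worker.join(); // wait until the thread has terminated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return uncaught.get();
    }

    public static void main(String[] args) {
        System.out.println("uncaught: " + runWorker(true));  // uncaught: worker failed
        System.out.println("uncaught: " + runWorker(false)); // uncaught: null
    }
}
```

Because join() returns only after the worker has terminated, the handler has already run by the time the result is read.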
If you use jconsole or any debugger, you can see that numerous threads are running in the background. These background threads run in addition to the main thread, which is created as part of invoking main(String[]), and any threads created by the main thread. The main background system threads in the HotSpot JVM are the VM thread, the periodic task thread, the GC threads, the compiler threads and the signal dispatcher thread.
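Some of these background threads can also be listed from inside a running program. The sketch below uses the standard Thread.getAllStackTraces() call; the class and method names are illustrative only. Note that it reports only threads that are visible as Java threads, so purely internal VM threads may not appear, and the exact names vary between JVM versions.

```java
import java.util.Map;
import java.util.TreeMap;

class ListThreadsSketch {
    // Returns the name and daemon flag of every live thread visible to Java code.
    static Map<String, Boolean> liveThreads() {
        Map<String, Boolean> result = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            result.put(t.getName(), t.isDaemon());
        }
        return result;
    }

    public static void main(String[] args) {
        liveThreads().forEach((name, daemon) ->
            System.out.println(name + (daemon ? " (daemon)" : " (non-daemon)")));
    }
}
```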
Each thread of execution has the following components:
The program counter (PC) holds the address of the current instruction (or opcode), unless the current method is native, in which case the PC is undefined. All CPUs have a PC; typically the PC is incremented after each instruction and therefore holds the address of the next instruction to be executed. The JVM uses the PC to keep track of where it is executing instructions; the PC will in fact be pointing at a memory address in the Method Area.
Each thread has its own stack that holds a frame for each method executing on that thread. The stack is a Last In First Out (LIFO) data structure, so the currently executing method is at the top of the stack. A new frame is created and added (pushed) to the top of the stack for every method invocation. The frame is removed (popped) when the method returns normally or if an uncaught exception is thrown during the method invocation. The stack is not directly manipulated, except to push and pop frame objects, and therefore the frame objects may be allocated in the Heap and the memory does not need to be contiguous.
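The LIFO ordering of frames can be made visible with a stack trace. In this small sketch (the class and the methods a, b and c are hypothetical), a Throwable created in the innermost method records the frames of the current thread, most recent invocation first.

```java
class FrameOrderSketch {
    static StackTraceElement[] a() { return b(); }

    static StackTraceElement[] b() { return c(); }

    static StackTraceElement[] c() {
        // A Throwable captures the frames on the current thread's stack,
        // starting with the method in which it was created.
        return new Throwable().getStackTrace();
    }

    public static void main(String[] args) {
        StackTraceElement[] frames = a();
        // The three most recent frames, top of the stack first: c, b, a
        for (int i = 0; i < 3; i++) {
            System.out.println(frames[i].getMethodName());
        }
    }
}
```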
Not all JVMs support native methods; however, those that do typically create a per-thread native method stack. If a JVM has been implemented using a C-linkage model for the Java Native Interface (JNI), then the native stack will be a C stack. In this case the order of arguments and the return value in the native stack will be identical to those of a typical C program. A native method can typically (depending on the JVM implementation) call back into the JVM and invoke a Java method. Such a native-to-Java invocation will occur on the normal Java stack; the thread will leave the native stack and create a new frame on the Java stack.
A stack can be of a dynamic or fixed size. If a thread requires a larger stack than allowed, a StackOverflowError is thrown. If a thread requires a new frame and there isn’t enough memory to allocate it, then an OutOfMemoryError is thrown.
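The stack limit is easy to reach with unbounded recursion. The following sketch (class and method names are illustrative) counts how many frames can be pushed before a StackOverflowError is thrown. The number depends on the JVM, the frame size and the configured thread stack size (for example the HotSpot -Xss option), so no particular value should be expected.

```java
class StackDepthSketch {
    static int depth;

    // Each call pushes a new frame; there is no base case on purpose.
    static void recurse() {
        depth++;
        recurse();
    }

    // Returns the number of nested invocations reached before the stack overflowed.
    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames above this one have been popped while unwinding.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames pushed before overflow: " + measure());
    }
}
```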
A new frame is created and added (pushed) to the top of the stack for every method invocation. The frame is removed (popped) when the method returns normally or if an uncaught exception is thrown during the method invocation. For more detail on exception handling, see the exception table described under the method area below.
Each frame contains:
- Local variable array
- Return value
- Operand stack
- Reference to runtime constant pool for class of the current method
The array of local variables contains all the variables used during the execution of the method, including a reference to this, all method parameters and other locally defined variables. For class methods (i.e. static methods) the method parameters start from zero; for instance methods the zero slot is reserved for this.
A local variable can be:
- boolean
- byte
- char
- long
- short
- int
- float
- double
- reference
- returnAddress
All types take a single slot in the local variable array except long and double, which both take two consecutive slots because these types are double width (64-bit instead of 32-bit).
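As an illustration of slot numbering, the sketch below annotates two hypothetical methods with the slots a compiler such as javac would typically assign. The slot numbers in the comments follow the rules above (parameters first, long and double taking two slots); the actual layout can be checked by disassembling the compiled class with javap -c.

```java
class LocalSlotsSketch {
    // Static method: parameters start at slot 0.
    //   slot 0     a   (int, one slot)
    //   slots 1-2  b   (long, two slots)
    //   slots 3-4  sum (long, two slots)
    static long addStatic(int a, long b) {
        long sum = a + b;
        return sum;
    }

    // Instance method: slot 0 is reserved for this.
    //   slot 0     this
    //   slot 1     a (int, one slot)
    //   slots 2-3  d (double, two slots)
    double addInstance(int a, double d) {
        return a + d;
    }

    public static void main(String[] args) {
        System.out.println(addStatic(2, 3L));                          // 5
        System.out.println(new LocalSlotsSketch().addInstance(1, 0.5)); // 1.5
    }
}
```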
The operand stack is used during the execution of byte code instructions in a similar way to how general-purpose registers are used in a native CPU. Most JVM byte code spends its time manipulating the operand stack by pushing, popping, duplicating, swapping, or executing operations that produce or consume values. Therefore, instructions that move values between the array of local variables and the operand stack are very frequent in byte code. For example, a simple variable initialization results in two byte codes that interact with the operand stack.
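int i = 0;
This gets compiled to the following byte code:
0: iconst_0    // Push 0 to the top of the operand stack
1: istore_1    // Store the value from the top of the operand stack in local variable 1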
For more detail explaining the interactions between the local variables array, operand stack and run time constant pool, see the description of the class file structure below.
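To make the interplay between the operand stack and the local variable array concrete, here is a toy interpreter. It is only a sketch: the textual instruction format and the class OperandStackSketch are invented for this example, and the four operations are modelled loosely on the iconst, istore, iload and iadd byte codes rather than being a real JVM implementation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class OperandStackSketch {
    // Executes a tiny, hypothetical instruction set and returns the
    // resulting local variable array.
    static int[] run(String[] program, int maxLocals) {
        int[] locals = new int[maxLocals];
        Deque<Integer> operands = new ArrayDeque<>();
        for (String instruction : program) {
            String[] parts = instruction.split(" ");
            int arg = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
            switch (parts[0]) {
                case "iconst": operands.push(arg); break;          // push a constant
                case "istore": locals[arg] = operands.pop(); break; // pop into a local
                case "iload":  operands.push(locals[arg]); break;  // push a local
                case "iadd":   operands.push(operands.pop() + operands.pop()); break;
                default: throw new IllegalArgumentException("unknown: " + parts[0]);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        // Roughly: int a = 2; int b = 3; int c = a + b;
        String[] program = {
            "iconst 2", "istore 1",
            "iconst 3", "istore 2",
            "iload 1", "iload 2", "iadd", "istore 3"
        };
        System.out.println(Arrays.toString(run(program, 4))); // [0, 2, 3, 5]
    }
}
```

Every arithmetic step goes through the operand stack: values are loaded from locals, combined on the stack, and the result is stored back into a local.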
Each frame contains a reference to the runtime constant pool. The reference points to the constant pool for the class of the method being executed for that frame. This reference helps to support dynamic linking.
C/C++ code is typically compiled to an object file, then multiple object files are linked together to produce a usable artifact such as an executable or DLL. During the linking phase, symbolic references in each object file are replaced with an actual memory address relative to the final executable. In Java this linking phase is done dynamically at runtime.
When a Java class is compiled, all references to variables and methods are stored in the class's constant pool as symbolic references. A symbolic reference is a logical reference, not a reference that actually points to a physical memory location. The JVM implementation can choose when to resolve symbolic references: this can happen when the class file is verified, after being loaded, which is called eager or static resolution, or it can happen when the symbolic reference is used for the first time, which is called lazy or late resolution. However, the JVM has to behave as if the resolution occurred when each reference is first used and throw any resolution errors at this point. Binding is the process of the field, method or class identified by the symbolic reference being replaced by a direct reference; this only happens once because the symbolic reference is completely replaced. If the symbolic reference refers to a class that has not yet been resolved, then this class will be loaded. Each direct reference is stored as an offset against the storage structure associated with the runtime location of the variable or method.
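Resolution itself is not directly observable from Java code, but class initialization, which is likewise tied to first use, is. The sketch below (all names are hypothetical) records when a nested class's static initializer runs. It demonstrates the "as if on first use" behaviour for initialization only and does not show how any particular JVM resolves constant pool entries.

```java
import java.util.ArrayList;
import java.util.List;

class LazyLinkingSketch {
    static final List<String> events = new ArrayList<>();

    static class Helper {
        // Runs once, immediately before the first use of Helper.
        static { events.add("Helper initialized"); }

        static int answer() { return 42; }
    }

    static List<String> demo() {
        events.add("before first use");
        int value = Helper.answer(); // first use triggers initialization
        events.add("after first use: " + value);
        Helper.answer();             // already initialized, nothing is recorded
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // [before first use, Helper initialized, after first use: 42]
    }
}
```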
The Heap is used to allocate class instances and arrays at runtime. Arrays and objects can never be stored on the stack because a frame is not designed to change in size after it has been created. The frame only stores references that point to objects or arrays on the heap. Unlike primitive variables and references in the local variable array (in each frame), objects are always stored on the heap, so they are not removed when a method ends. Instead, objects are only removed by the garbage collector.
To support garbage collection the heap is divided into three sections:
- Young Generation
  - Often split between Eden and Survivor
- Old Generation (also called Tenured Generation)
- Permanent Generation
Objects and arrays are never explicitly de-allocated; instead the garbage collector automatically reclaims them.
Typically this works as follows:
- New objects and arrays are created in the young generation.
- Minor garbage collection will operate in the young generation. Objects that are still alive will be moved from the eden space to the survivor space.
- Major garbage collection, which typically causes the application threads to pause, will move objects between generations. Objects that are still alive will be moved from the young generation to the old (tenured) generation.
- The permanent generation is collected every time the old generation is collected. They are both collected when either becomes full.
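The collectors responsible for these generations can be inspected through the standard java.lang.management API. The sketch below (class and method names are illustrative) lists each collector with its collection count and the memory pools it manages. Collector and pool names depend on the JVM version and the garbage collector selected, so the output will differ between installations.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcInfoSketch {
    // Returns one summary line per garbage collector reported by the JVM.
    static List<String> collectorSummaries() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                + ": collections=" + gc.getCollectionCount()
                + ", pools=" + String.join(",", gc.getMemoryPoolNames()));
        }
        return lines;
    }

    public static void main(String[] args) {
        collectorSummaries().forEach(System.out::println);
    }
}
```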
Objects that are logically considered as part of the JVM mechanics are not created on the Heap.
The non-heap memory includes:
- Permanent Generation that contains
  - the method area
  - interned strings
- Code Cache used for compilation and storage of methods that have been compiled to native code by the JIT compiler
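The split between heap and non-heap memory can be seen at runtime through the memory pool MXBeans. This is a minimal sketch with illustrative names; the pools reported depend on the JVM in use. For example, HotSpot versions from Java 8 onwards report a Metaspace pool where earlier versions reported a permanent generation pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class MemoryPoolsSketch {
    // Groups the names of the JVM's memory pools by HEAP and NON_HEAP.
    static Map<MemoryType, List<String>> poolsByType() {
        Map<MemoryType, List<String>> pools = new EnumMap<>(MemoryType.class);
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            pools.computeIfAbsent(pool.getType(), k -> new ArrayList<>())
                 .add(pool.getName());
        }
        return pools;
    }

    public static void main(String[] args) {
        poolsByType().forEach((type, names) -> System.out.println(type + ": " + names));
    }
}
```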
Java byte code is interpreted; however, this is not as fast as directly executing native code on the JVM’s host CPU. To improve performance, the Oracle HotSpot VM looks for “hot” areas of byte code that are executed regularly and compiles these to native code. The native code is then stored in the code cache in non-heap memory. In this way the HotSpot VM tries to choose the most appropriate way to trade off the extra time it takes to compile code versus the extra time it takes to execute interpreted code.
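Whether a JIT compiler is present, and how much time it has spent compiling, can be queried through the standard CompilationMXBean. The sketch below is illustrative only: the class name and the hotLoop method are made up, and the loop is simply a candidate for compilation, with no guarantee about when or whether the JVM compiles it. On HotSpot, the -XX:+PrintCompilation option can be used to watch compilation events.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitInfoSketch {
    // A small loop that becomes "hot" when called with many iterations.
    static long hotLoop(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Describes the JIT compiler, if the JVM reports one.
    static String describeCompiler() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler reported";
        }
        String time = jit.isCompilationTimeMonitoringSupported()
            ? jit.getTotalCompilationTime() + " ms"
            : "n/a";
        return jit.getName() + ", total compilation time: " + time;
    }

    public static void main(String[] args) {
        System.out.println(describeCompiler());
        hotLoop(50_000_000);
        System.out.println(describeCompiler());
    }
}
```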
The method area stores per-class information such as:
- Classloader Reference
- Run Time Constant Pool
  - Numeric constants
  - Field references
  - Method references
  - Attributes
- Field data
  - Per field
    - Name
    - Type
    - Modifiers
    - Attributes
- Method data
  - Per method
    - Name
    - Return Type
    - Parameter Types (in order)
    - Modifiers
    - Attributes
- Method code
  - Per method
    - Bytecodes
    - Operand stack size
    - Local variable size
    - Local variable table
    - Exception table
      - Per exception handler
        - Start point
        - End point
        - PC offset for handler code
        - Constant pool index for exception class being caught
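Much of this per-class information is exposed to programs through reflection. The following sketch (ClassMetadataSketch and its nested Account class are hypothetical) prints the class loader, field data and method data for a small class. It reads the metadata through the reflection API and says nothing about how a given JVM lays out the method area internally.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClassMetadataSketch {
    // Hypothetical sample class whose metadata is inspected below.
    static class Account {
        private long balance;

        public void deposit(long amount) { balance += amount; }
    }

    // Lists the class loader, then modifiers, type and name of each
    // declared field and method of the given class.
    static List<String> describe(Class<?> type) {
        List<String> lines = new ArrayList<>();
        lines.add("loader: " + type.getClassLoader());
        for (Field f : type.getDeclaredFields()) {
            lines.add("field: " + Modifier.toString(f.getModifiers())
                + " " + f.getType().getName() + " " + f.getName());
        }
        for (Method m : type.getDeclaredMethods()) {
            lines.add("method: " + Modifier.toString(m.getModifiers())
                + " " + m.getReturnType().getName() + " " + m.getName()
                + Arrays.toString(m.getParameterTypes()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(Account.class).forEach(System.out::println);
    }
}
```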
All threads share the same method area, so access to the method area data and the process of dynamic linking must be thread safe. If two threads attempt to access a field or method on a class that has not yet been loaded, it must only be loaded once, and neither thread may continue execution until it has been loaded.
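The once-only guarantee can be checked with two threads racing to use the same class for the first time. In this sketch (all names are illustrative), a counter is incremented by the static initializer of a nested class. It observes initialization rather than loading directly, but the same rule applies: the work is done once and both threads wait for it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class InitOnceSketch {
    static final AtomicInteger initCount = new AtomicInteger();

    static class Shared {
        // Counts how many times this class is initialized.
        static { initCount.incrementAndGet(); }

        static void touch() { }
    }

    // Starts two threads that use Shared at the same moment and returns
    // the number of times its static initializer ran.
    static int raceToInitialize() {
        CountDownLatch start = new CountDownLatch(1);
        Runnable task = () -> {
            try {
                start.await(); // line both threads up before the first use
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            Shared.touch();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        start.countDown();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return initCount.get();
    }

    public static void main(String[] args) {
        System.out.println("initializations: " + raceToInitialize()); // initializations: 1
    }
}
```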
A compiled class file consists of the following structure: