Today we embark on a journey to answer one question: How is the 64-bit ntdll loaded into a 32-bit process under WoW64?
The journey will take us into the process-creation logic of the Windows kernel, where we'll discover how the memory address space of a 32-bit process is initialized.
What is WoW64?
From MSDN:
| WOW64 is the x86 emulator that allows 32-bit Windows-based applications to run seamlessly on 64-bit Windows.
In other words, with the introduction of the 64-bit version of Windows, Microsoft needed a solution that allowed applications written during the 32-bit era of Windows to interact seamlessly with the new underlying components of 64-bit Windows, specifically 64-bit memory addressing and the components that speak directly to the new kernel.
Two NT layers, One Kernel
In 32-bit Windows, applications that call into Windows APIs are routed through a series of Dynamic Link Libraries (DLLs). However, all system calls eventually route through ntdll.dll, which is the last layer in user mode before execution of a user-mode API passes to the kernel.
An example of this is a call to CreateFileW. This API call originates from kernel32.dll in Usermode; it is then transferred to ntdll as NtCreateFile, and NtCreateFile then transfers control to the kernel via a System Call Dispatcher.
Under 32-bit Windows this is pretty straightforward. Under WoW64, however, extra steps must be taken. The 32-bit ntdll cannot directly transfer control to the kernel because the kernel is now a 64-bit executable and only accepts types that follow the 64-bit ABI. Because of this, a translation layer was added to 64-bit Windows in the form of several DLLs canonically named wow64.dll, wow64cpu.dll, and wow64win.dll. These DLLs are responsible for transitioning calls that originate in 32-bit code into 64-bit calls.
These calls are eventually routed through the 64-bit ntdll mapped into every 32-bit process. Lots of information is available about this magical transition of 32-bit system calls to 64-bit calls (1), so we won't get into it here.
What we're mostly focused on is how and when the kernel maps the 64-bit version of ntdll into a 32-bit process. This looks something like:
In particular, we're interested in the second to last entry. We can observe that the address ntdll is mapped to is a 64-bit address range (7FFFFED40000-7FFFFEF1FFFF) and its location is within the System32\ Directory where Windows' 64-bit system files reside. However, we know that 32-bit processes cannot access or work in 64-bit memory space.
In order to understand the output above, we must first discuss what the VAD (Virtual Address Descriptor) is and how it will help us understand the mechanism of loading the 64-bit ntdll into a 32-bit process.
What is the Virtual Address Descriptor?
The VAD is one of the ways the Windows Memory Manager keeps track of virtual memory. Specifically, the VAD keeps track of the reserved and committed address ranges in the user-mode range of each process. Any time a process requests some memory, a new VAD instance is created to track it.
The VAD is structured as a self-balancing tree, where each node (a VAD instance) describes a memory range. Each node can contain up to two children; to the left for lower addresses and right for higher addresses. Each Process is assigned a VadRoot which can then be traversed to identify additional nodes that describe reserved or committed Virtual Address Ranges.
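To make the tree lookup concrete, here is a minimal sketch of how a search over such a tree could work. The `VadNode` class and `find_vad` function are hypothetical names invented for illustration and do not mirror the kernel's real structures; the tree shape is also made up, and balancing is left out entirely. The ranges are page numbers taken from the regions discussed later in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VadNode:
    """Hypothetical stand-in for a VAD instance: one range of page numbers."""
    start_vpn: int
    end_vpn: int
    left: Optional["VadNode"] = None   # subtree holding lower addresses
    right: Optional["VadNode"] = None  # subtree holding higher addresses


def find_vad(root: Optional[VadNode], vpn: int) -> Optional[VadNode]:
    """Walk down from the root until a node whose range contains vpn is found."""
    node = root
    while node is not None:
        if vpn < node.start_vpn:
            node = node.left
        elif vpn > node.end_vpn:
            node = node.right
        else:
            return node
    return None  # no VAD covers this page, so the address is free


# Illustrative tree only; the real tree layout will differ.
root = VadNode(
    0x7FFE0, 0x7FFE0,
    left=VadNode(0xC40, 0xCF2),
    right=VadNode(0x7FFE1, 0x7FFEF),
)

hit = find_vad(root, 0xC41)
print(hex(hit.start_vpn), hex(hit.end_vpn))  # 0xc40 0xcf2
print(find_vad(root, 0x100))                 # None
```

Because the ranges never overlap, each comparison discards one whole subtree, which is what keeps lookups fast in a balanced tree.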
The output of the !vad command from WinDBG deserves some attention, as this is the output we will mostly use to track the mapping of a 32-bit process in 64-bit Windows. Not all of the fields are particularly interesting to us for this exercise.
Let's consider the output of our test application HelloWorld.exe. We begin by identifying the VadRoot of our process via the output from the !process ProcessObject command.
Once we've identified the VadRoot, we then input that address to the !vad command. (Output has been truncated for easier analysis).
We see five column headers: "VAD", "Level", "Start", "End", and "Commit". The !vad command itself accepts the address of a VAD instance; in our case we have provided it the VadRoot that is obtained when using the !process command for this process.
- The VAD Address is the address of the current VAD structure or instance.
- The Level describes the level in the tree at which this VAD instance (node) is located. Level 0 is the VadRoot that is obtained from the !process output above.
- Starting and Ending address values are expressed as VPNs (Virtual Page Numbers). They can be converted to virtual addresses by multiplying by the page size (4 KB), which is the same as shifting left by 12 bits, or three hex digits. The ending address additionally has 0xFFF added to extend to the end of the page. In our example above, D20 -> D20000 and DD2 -> DD2FFF.
- Commit is the number of committed pages in the range described by this VAD instance.
- The type of allocation is next, and this tells us whether the particular range has been mapped or is private to the process.
- Type of access describes the allowed access for the range, and last is any name associated with the mapped region.
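The VPN arithmetic is easy to check by hand, but a small sketch makes it explicit. The function name `vpn_range_to_addresses` is made up for this example, and a 4 KB page size is assumed, as in the text.

```python
PAGE_SHIFT = 12               # 4 KB pages: 2**12 bytes, i.e. three hex digits
PAGE_SIZE = 1 << PAGE_SHIFT


def vpn_range_to_addresses(start_vpn: int, end_vpn: int) -> tuple[int, int]:
    """Convert the Start/End page numbers from !vad into byte addresses."""
    start = start_vpn << PAGE_SHIFT
    # The range runs to the last byte of the final page, hence the 0xFFF.
    end = (end_vpn << PAGE_SHIFT) | (PAGE_SIZE - 1)
    return start, end


start, end = vpn_range_to_addresses(0xD20, 0xDD2)
print(f"{start:X} - {end:X}")  # D20000 - DD2FFF
```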
A VAD instance can be created in multiple ways: through the mapping APIs (CreateFileMapping/MapViewOfFile) or through memory allocation functions such as VirtualAlloc. Memory can be reserved, committed, freed, or reserved and only partially committed. Whichever the case, a VAD entry is inserted into the process' VAD tree to let the Memory Manager know about the current memory commitment for this process. Our look into the VAD will reveal the initial setup for a 32-bit process running under WoW64.
Mapping the NT Subsystem DLLs
Early during process initialization, Windows determines and reserves certain address ranges for special areas before the main executable is mapped in and initialized. These include the initial process address space, shared system space (_KUSER_SHARED_DATA), the Control Flow Guard Bitmap region, and the NT native subsystems (ntdll).
Due to the overall complex nature of Process initialization we're only interested in the last piece which contains the logic of loading both the 32-bit ntdll and the 64-bit ntdll into a 32-bit process address space. We'll make our observations by following a series of API calls and paying close attention to the Virtual Address Descriptors (VADs) that describe the memory regions at each point.
In order for the kernel to differentiate how to map a new process, it needs to know whether or not this is a WoW64 process. It does this by reading a value within the undocumented _EPROCESS structure named _EPROCESS.Wow64Process when the Process Object is initially created. If this value is true it proceeds to Map memory accordingly.
PspAllocateProcess is where our journey begins, but more specifically we'll start at MmInitializeProcessAddressSpace().
MmInitializeProcessAddressSpace() is responsible for many of the initializations related to a new process' address space. It calls MiMapProcessExecutable, which creates the VAD entries that define the initial process' addressable memory space and subsequently maps the newly created process into its base virtual address.
One particularly interesting function here is PspMapSystemDlls. Let's have a look at what a process' address space looks like before a call to PspMapSystemDlls. In WinDBG we ensure we're currently in the context of our test application (.process) and look up the current Vad Root (!vad output).
We can observe that so far our process has been mapped and assigned a base address (1200) in the 32-bit address space. The kernel shared memory (0x7FFE0000-0x7FFE0FFF) and the 60 KB of reserved memory above it (0x7FFE1000-0x7FFEFFFF) have also been mapped to their respective virtual addresses.
PspMapSystemDlls iterates over a global structure that contains information about the subsystem modules for several platforms. For x86 and x64 Windows these are the copies of ntdll.dll located in the C:\Windows\SysWow64 and C:\Windows\System32 directories respectively.
Once PspMapSystemDlls finds the relevant DLLs to load, it calls PspMapSystemDll to map them into the Process' Address Space. The function is fairly short and straightforward, and a snippet is shown below. In order for it to map the correct Native subsystem DLLs certain conditions need to be met.
PspMapSystemDll performs the actual mapping of the native DLLs through a call to MmMapViewOfSection and saves the captured base address. After these two DLLs are mapped and their VAD entries have been initialized, our 32-bit process's address space looks like this:
So now our address space contains the process image (0xc40000-0xcf2fff), the shared kernel memory (0x7ffe0000-0x7ffe0fff), the region that ends the valid 32-bit address space (0x7ffe1000-0x7ffeffff), and our two NT subsystem DLLs.
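The selection logic can be pictured with a small model. This is not the kernel's code or data layout: `SystemDll`, `SYSTEM_DLLS`, and `dlls_to_map` are hypothetical names, and the rule used here (the native ntdll is mapped into every process, the 32-bit one only into WoW64 processes) is an assumption chosen to match the result observed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemDll:
    """Hypothetical table entry; the real kernel structure is not shown here."""
    path: str
    is_native: bool  # True for the 64-bit ntdll on x64 Windows


# Hypothetical stand-in for the global table that PspMapSystemDlls walks.
SYSTEM_DLLS = [
    SystemDll(r"C:\Windows\System32\ntdll.dll", is_native=True),
    SystemDll(r"C:\Windows\SysWOW64\ntdll.dll", is_native=False),
]


def dlls_to_map(is_wow64_process: bool) -> list[str]:
    """Assumed rule: native ntdll always, 32-bit ntdll only under WoW64."""
    return [d.path for d in SYSTEM_DLLS if d.is_native or is_wow64_process]


for path in dlls_to_map(is_wow64_process=True):
    print(path)
```

For a WoW64 process this model yields both paths, which lines up with the two ntdll entries that appear in the VAD tree.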
Locking down the Address Space
There is one last step that must be taken to complete the mapping of a 32-bit process. We know that a 32-bit process can only address up to 2GB of virtual memory*, so Windows needs to mask off the remaining address space for this process. For a 32-bit process the addressable range ends with 0x7FFF0000 - 0x7FFFFFFF; however, nothing can be mapped after 0x7FFEFFFF. Because of this, the memory regions adjacent to the 64-bit ntdll need to be reserved (or masked out).
To accomplish this, the kernel marks the remainder of the 64-bit address space as Private. It creates these VAD entries by walking the VAD tree for the current process, locating the last available virtual address, and then appending and prepending a new VAD entry.
The API that accomplishes this task is MiInitializeUserNoAccess. This function receives the current Process handle and a Virtual Address. The Virtual Address passed is 0x7FFF0000 which is the beginning of the last addressable range for a 32-bit process. It then proceeds to walk the current VAD entries and performs an insert of a new range that covers the remaining address space for a 32-bit process. After this call, our Process' Address Space now looks like:
We can observe now that our 32-bit process has been mapped and its valid memory address ranges have been reserved by the kernel. The VAD instances that cover the page ranges 0x7FFF0 - 0x7FFFFED3F and 0x7FFFFEF20 - 0x7FFFFFFEF have been reserved as Private by the kernel. Any subsequent memory allocations will land only in the allowable 32-bit address space. Once the process has been fully loaded, we can see all the additional committed memory appear around the base address of the process itself (0xC40000).
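The two reserved ranges are simply the gaps on either side of the 64-bit ntdll. The sketch below reproduces that arithmetic; it is not the algorithm MiInitializeUserNoAccess uses, and `no_access_ranges` is a hypothetical helper. The inputs are the page numbers quoted in this post: the reservation starts at page 0x7FFF0 and the last page in the observed output is 0x7FFFFFFEF.

```python
def no_access_ranges(first_vpn, last_vpn, mapped):
    """Return the page ranges in [first_vpn, last_vpn] not covered by `mapped`."""
    gaps = []
    cursor = first_vpn
    for start, end in sorted(mapped):
        if start > cursor:
            gaps.append((cursor, start - 1))  # gap below this mapping
        cursor = max(cursor, end + 1)
    if cursor <= last_vpn:
        gaps.append((cursor, last_vpn))       # gap above the last mapping
    return gaps


# 64-bit ntdll at 7FFFFED40000-7FFFFEF1FFFF, expressed as page numbers.
ntdll64 = (0x7FFFFED40, 0x7FFFFEF1F)

for start, end in no_access_ranges(0x7FFF0, 0x7FFFFFFEF, [ntdll64]):
    print(f"{start:X} - {end:X}")
# 7FFF0 - 7FFFFED3F
# 7FFFFEF20 - 7FFFFFFEF
```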
Wrapping it up
We have now observed the initial mapping of a 32-bit process under 64-bit Windows, how the 64-bit ntdll is mapped into the 64-bit region, and how the rest of the 64-bit address space is subsequently locked down from user access. What did we learn?
1. Early initialization logic to determine if we are going to map a WoW64 Process.
2. Allocate the initial 32-bit address space regions; this includes the Highest Available 32-bit Address Range, and the Preferred Virtual Address for the base of the Process.
3. The NT Subsystem DLLs are loaded into their respective address ranges. 32-bit ntdll in 32-bit space, and 64-bit ntdll in 64-bit address space.
4. MiInitializeUserNoAccess is used to create private ranges adjacent to the 64-bit ntdll's range. This has the effect of locking down the 64-bit addressable space from a 32-bit process.
Hopefully this post has provided some clarity as to how Windows allows the seamless integration of 32-bit processes into the 64-bit Windows Operating System. With the addition of the WoW64 emulation layer some additional considerations on Address Space availability were made and this process reflects some of these considerations and their implementation.