Posted: January 14, 2008 at 1:24 pm | Tags: blog, boot, cache, class, flash, html, java, linux, loader, mtk, ror, Spreadtrum, windows, 串口, 内幕, 开发, 技术, 测试, 类
级别: 初级
詹荣开 (zhanrk@sohu.com), Linux爱好者
2003 年 12 月 01 日
本文详细地介绍了基于嵌入式系统中的 OS 启动加载程序 ―― Boot Loader 的概念、软件设计的主要任务以及结构框架等内容。
1. 引言
在专用的嵌入式板子运行 GNU/Linux 系统已经变得越来越流行。一个嵌入式 Linux 系统从软件的角度看通常可以分为四个层次:
1. 引导加载程序。包括固化在固件(firmware)中的 boot 代码(可选),和 Boot Loader 两大部分。
2. Linux 内核。特定于嵌入式板子的定制内核以及内核的启动参数。
3. 文件系统。包括根文件系统和建立于 Flash 内存设备之上文件系统。通常用 ram disk 来作为 root fs。
4. 用户应用程序。特定于用户的应用程序。有时在用户应用程序和内核层之间可能还会包括一个嵌入式图形用户界面。常用的嵌入式 GUI 有:MicroWindows 和 MiniGUI 懂。
引导加载程序是系统加电后运行的第一段软件代码。回忆一下 PC 的体系结构我们可以知道,PC 机中的引导加载程序由 BIOS(其本质就是一段固件程序)和位于硬盘 MBR 中的 OS Boot Loader(比如,LILO 和 GRUB 等)一起组成。BIOS 在完成硬件检测和资源分配后,将硬盘 MBR 中的 Boot Loader 读到系统的 RAM 中,然后将控制权交给 OS Boot Loader。Boot Loader 的主要运行任务就是将内核映象从硬盘上读到 RAM 中,然后跳转到内核的入口点去运行,也即开始启动操作系统。
而在嵌入式系统中,通常并没有像 BIOS 那样的固件程序(注,有的嵌入式 CPU 也会内嵌一段短小的启动程序),因此整个系统的加载启动任务就完全由 Boot Loader 来完成。比如在一个基于 ARM7TDMI core 的嵌入式系统中,系统在上电或复位时通常都从地址 0×00000000 处开始执行,而在这个地址处安排的通常就是系统的 Boot Loader 程序。
本文将从 Boot Loader 的概念、Boot Loader 的主要任务、Boot Loader 的框架结构以及 Boot Loader 的安装等四个方面来讨论嵌入式系统的 Boot Loader。
2. Boot Loader 的概念
简单地说,Boot Loader 就是在操作系统内核运行之前运行的一段小程序。通过这段小程序,我们可以初始化硬件设备、建立内存空间的映射图,从而将系统的软硬件环境带到一个合适的状态,以便为最终调用操作系统内核准备好正确的环境。
通常,Boot Loader 是严重地依赖于硬件而实现的,特别是在嵌入式世界。因此,在嵌入式世界里建立一个通用的 Boot Loader 几乎是不可能的。尽管如此,我们仍然可以对 Boot Loader 归纳出一些通用的概念来,以指导用户特定的 Boot Loader 设计与实现。
1. Boot Loader 所支持的 CPU 和嵌入式板
每种不同的 CPU 体系结构都有不同的 Boot Loader。有些 Boot Loader 也支持多种体系结构的 CPU,比如 U-Boot 就同时支持 ARM 体系结构和MIPS 体系结构。除了依赖于 CPU 的体系结构外,Boot Loader 实际上也依赖于具体的嵌入式板级设备的配置。这也就是说,对于两块不同的嵌入式板而言,即使它们是基于同一种 CPU 而构建的,要想让运行在一块板子上的 Boot Loader 程序也能运行在另一块板子上,通常也都需要修改 Boot Loader 的源程序。
2. Boot Loader 的安装媒介(Installation Medium)
系统加电或复位后,所有的 CPU 通常都从某个由 CPU 制造商预先安排的地址上取指令。比如,基于 ARM7TDMI core 的 CPU 在复位时通常都从地址 0×00000000 取它的第一条指令。而基于 CPU 构建的嵌入式系统通常都有某种类型的固态存储设备(比如:ROM、EEPROM 或 FLASH 等)被映射到这个预先安排的地址上。因此在系统加电后,CPU 将首先执行 Boot Loader 程序。
下图1就是一个同时装有 Boot Loader、内核的启动参数、内核映像和根文件系统映像的固态存储设备的典型空间分配结构图。
图1 固态存储设备的典型空间分配结构

3. 用来控制 Boot Loader 的设备或机制
主机和目标机之间一般通过串口建立连接,Boot Loader 软件在执行时通常会通过串口来进行 I/O,比如:输出打印信息到串口,从串口读取用户控制字符等。
4. Boot Loader 的启动过程是单阶段(Single Stage)还是多阶段(Multi-Stage)
通常多阶段的 Boot Loader 能提供更为复杂的功能,以及更好的可移植性。从固态存储设备上启动的 Boot Loader 大多都是 2 阶段的启动过程,也即启动过程可以分为 stage 1 和 stage 2 两部分。而至于在 stage 1 和 stage 2 具体完成哪些任务将在下面讨论。
5. Boot Loader 的操作模式 (Operation Mode)
大多数 Boot Loader 都包含两种不同的操作模式:”启动加载”模式和”下载”模式,这种区别仅对于开发人员才有意义。但从最终用户的角度看,Boot Loader 的作用就是用来加载操作系统,而并不存在所谓的启动加载模式与下载工作模式的区别。
启动加载(Boot loading)模式:这种模式也称为”自主”(Autonomous)模式。也即 Boot Loader 从目标机上的某个固态存储设备上将操作系统加载到 RAM 中运行,整个过程并没有用户的介入。这种模式是 Boot Loader 的正常工作模式,因此在嵌入式产品发布的时侯,Boot Loader 显然必须工作在这种模式下。
下载(Downloading)模式:在这种模式下,目标机上的 Boot Loader 将通过串口连接或网络连接等通信手段从主机(Host)下载文件,比如:下载内核映像和根文件系统映像等。从主机下载的文件通常首先被 Boot Loader 保存到目标机的 RAM 中,然后再被 Boot Loader 写到目标机上的FLASH 类固态存储设备中。Boot Loader 的这种模式通常在第一次安装内核与根文件系统时被使用;此外,以后的系统更新也会使用 Boot Loader 的这种工作模式。工作于这种模式下的 Boot Loader 通常都会向它的终端用户提供一个简单的命令行接口。
像 Blob 或 U-Boot 等这样功能强大的 Boot Loader 通常同时支持这两种工作模式,而且允许用户在这两种工作模式之间进行切换。比如,Blob 在启动时处于正常的启动加载模式,但是它会延时 10 秒等待终端用户按下任意键而将 blob 切换到下载模式。如果在 10 秒内没有用户按键,则 blob 继续启动 Linux 内核。
6. BootLoader 与主机之间进行文件传输所用的通信设备及协议
最常见的情况就是,目标机上的 Boot Loader 通过串口与主机之间进行文件传输,传输协议通常是 xmodem/ymodem/zmodem 协议中的一种。但是,串口传输的速度是有限的,因此通过以太网连接并借助 TFTP 协议来下载文件是个更好的选择。
此外,在论及这个话题时,主机方所用的软件也要考虑。比如,在通过以太网连接和 TFTP 协议来下载文件时,主机方必须有一个软件用来的提供 TFTP 服务。
在讨论了 BootLoader 的上述概念后,下面我们来具体看看 BootLoader 的应该完成哪些任务。
3. Boot Loader 的主要任务与典型结构框架
在继续本节的讨论之前,首先我们做一个假定,那就是:假定内核映像与根文件系统映像都被加载到 RAM 中运行。之所以提出这样一个假设前提是因为,在嵌入式系统中内核映像与根文件系统映像也可以直接在 ROM 或 Flash 这样的固态存储设备中直接运行。但这种做法无疑是以运行速度的牺牲为代价的。
从操作系统的角度看,Boot Loader 的总目标就是正确地调用内核来执行。
另外,由于 Boot Loader 的实现依赖于 CPU 的体系结构,因此大多数 Boot Loader 都分为 stage1 和 stage2 两大部分。依赖于 CPU 体系结构的代码,比如设备初始化代码等,通常都放在 stage1 中,而且通常都用汇编语言来实现,以达到短小精悍的目的。而 stage2 则通常用C语言来实现,这样可以实现给复杂的功能,而且代码会具有更好的可读性和可移植性。
Boot Loader 的 stage1 通常包括以下步骤(以执行的先后顺序):
- 硬件设备初始化。
- 为加载 Boot Loader 的 stage2 准备 RAM 空间。
- 拷贝 Boot Loader 的 stage2 到 RAM 空间中。
- 设置好堆栈。
- 跳转到 stage2 的 C 入口点。
Boot Loader 的 stage2 通常包括以下步骤(以执行的先后顺序):
- 初始化本阶段要使用到的硬件设备。
- 检测系统内存映射(memory map)。
- 将 kernel 映像和根文件系统映像从 flash 上读到 RAM 空间中。
- 为内核设置启动参数。
- 调用内核。
3.1 Boot Loader 的 stage1
3.1.1 基本的硬件初始化
这是 Boot Loader 一开始就执行的操作,其目的是为 stage2 的执行以及随后的 kernel 的执行准备好一些基本的硬件环境。它通常包括以下步骤(以执行的先后顺序):
1. 屏蔽所有的中断。为中断提供服务通常是 OS 设备驱动程序的责任,因此在 Boot Loader 的执行全过程中可以不必响应任何中断。中断屏蔽可以通过写 CPU 的中断屏蔽寄存器或状态寄存器(比如 ARM 的 CPSR 寄存器)来完成。
2. 设置 CPU 的速度和时钟频率。
3. RAM 初始化。包括正确地设置系统的内存控制器的功能寄存器以及各内存库控制寄存器等。
4. 初始化 LED。典型地,通过 GPIO 来驱动 LED,其目的是表明系统的状态是 OK 还是 Error。如果板子上没有 LED,那么也可以通过初始化 UART 向串口打印 Boot Loader 的 Logo 字符信息来完成这一点。
5. 关闭 CPU 内部指令/数据 cache。
3.1.2 为加载 stage2 准备 RAM 空间
为了获得更快的执行速度,通常把 stage2 加载到 RAM 空间中来执行,因此必须为加载 Boot Loader 的 stage2 准备好一段可用的 RAM 空间范围。
由于 stage2 通常是 C 语言执行代码,因此在考虑空间大小时,除了 stage2 可执行映象的大小外,还必须把堆栈空间也考虑进来。此外,空间大小最好是 memory page 大小(通常是 4KB)的倍数。一般而言,1M 的 RAM 空间已经足够了。具体的地址范围可以任意安排,比如 blob 就将它的 stage2 可执行映像安排到从系统 RAM 起始地址 0xc0200000 开始的 1M 空间内执行。但是,将 stage2 安排到整个 RAM 空间的最顶 1MB(也即(RamEnd-1MB) – RamEnd)是一种值得推荐的方法。
为了后面的叙述方便,这里把所安排的 RAM 空间范围的大小记为:stage2_size(字节),把起始地址和终止地址分别记为:stage2_start 和 stage2_end(这两个地址均以 4 字节边界对齐)。因此:
stage2_end=stage2_start+stage2_size
|
另外,还必须确保所安排的地址范围的的确确是可读写的 RAM 空间,因此,必须对你所安排的地址范围进行测试。具体的测试方法可以采用类似于 blob 的方法,也即:以 memory page 为被测试单位,测试每个 memory page 开始的两个字是否是可读写的。为了后面叙述的方便,我们记这个检测算法为:test_mempage,其具体步骤如下:
1. 先保存 memory page 一开始两个字的内容。
2. 向这两个字中写入任意的数字。比如:向第一个字写入 0×55,第 2 个字写入 0xaa。
3. 然后,立即将这两个字的内容读回。显然,我们读到的内容应该分别是 0×55 和 0xaa。如果不是,则说明这个 memory page 所占据的地址范围不是一段有效的 RAM 空间。
4. 再向这两个字中写入任意的数字。比如:向第一个字写入 0xaa,第 2 个字中写入 0×55。
5. 然后,立即将这两个字的内容立即读回。显然,我们读到的内容应该分别是 0xaa 和 0×55。如果不是,则说明这个 memory page 所占据的地址范围不是一段有效的 RAM 空间。
6. 恢复这两个字的原始内容。测试完毕。
为了得到一段干净的 RAM 空间范围,我们也可以将所安排的 RAM 空间范围进行清零操作。
3.1.3 拷贝 stage2 到 RAM 中
拷贝时要确定两点:(1) stage2 的可执行映象在固态存储设备的存放起始地址和终止地址;(2) RAM 空间的起始地址。
3.1.4 设置堆栈指针 sp
堆栈指针的设置是为了执行 C 语言代码作好准备。通常我们可以把 sp 的值设置为(stage2_end-4),也即在 3.1.2 节所安排的那个 1MB 的 RAM 空间的最顶端(堆栈向下生长)。
此外,在设置堆栈指针 sp 之前,也可以关闭 led 灯,以提示用户我们准备跳转到 stage2。
经过上述这些执行步骤后,系统的物理内存布局应该如下图2所示。
3.1.5 跳转到 stage2 的 C 入口点
在上述一切都就绪后,就可以跳转到 Boot Loader 的 stage2 去执行了。比如,在 ARM 系统中,这可以通过修改 PC 寄存器为合适的地址来实现。
图2 bootloader 的 stage2 可执行映象刚被拷贝到 RAM 空间时的系统内存布局

3.2 Boot Loader 的 stage2
正如前面所说,stage2 的代码通常用 C 语言来实现,以便于实现更复杂的功能和取得更好的代码可读性和可移植性。但是与普通 C 语言应用程序不同的是,在编译和链接 boot loader 这样的程序时,我们不能使用 glibc 库中的任何支持函数。其原因是显而易见的。这就给我们带来一个问题,那就是从那里跳转进 main() 函数呢?直接把 main() 函数的起始地址作为整个 stage2 执行映像的入口点或许是最直接的想法。但是这样做有两个缺点:1)无法通过main() 函数传递函数参数;2)无法处理 main() 函数返回的情况。一种更为巧妙的方法是利用 trampoline(弹簧床)的概念。也即,用汇编语言写一段trampoline 小程序,并将这段 trampoline 小程序来作为 stage2 可执行映象的执行入口点。然后我们可以在 trampoline 汇编小程序中用 CPU 跳转指令跳入 main() 函数中去执行;而当 main() 函数返回时,CPU 执行路径显然再次回到我们的 trampoline 程序。简而言之,这种方法的思想就是:用这段 trampoline 小程序来作为 main() 函数的外部包裹(external wrapper)。
下面给出一个简单的 trampoline 程序示例(来自blob):
.text .globl _trampoline _trampoline: bl main /* if main ever returns we just call it again */ b _trampoline
|
可以看出,当 main() 函数返回后,我们又用一条跳转指令重新执行 trampoline 程序――当然也就重新执行 main() 函数,这也就是 trampoline(弹簧床)一词的意思所在。
3.2.1初始化本阶段要使用到的硬件设备
这通常包括:(1)初始化至少一个串口,以便和终端用户进行 I/O 输出信息;(2)初始化计时器等。
在初始化这些设备之前,也可以重新把 LED 灯点亮,以表明我们已经进入 main() 函数执行。
设备初始化完成后,可以输出一些打印信息,程序名字字符串、版本号等。
3.2.2 检测系统的内存映射(memory map)
所谓内存映射就是指在整个 4GB 物理地址空间中有哪些地址范围被分配用来寻址系统的 RAM 单元。比如,在 SA-1100 CPU 中,从 0xC000,0000 开始的 512M 地址空间被用作系统的 RAM 地址空间,而在 Samsung S3C44B0X CPU 中,从 0x0c00,0000 到 0×1000,0000 之间的 64M 地址空间被用作系统的 RAM 地址空间。虽然 CPU 通常预留出一大段足够的地址空间给系统 RAM,但是在搭建具体的嵌入式系统时却不一定会实现 CPU 预留的全部 RAM 地址空间。也就是说,具体的嵌入式系统往往只把 CPU 预留的全部 RAM 地址空间中的一部分映射到 RAM 单元上,而让剩下的那部分预留 RAM 地址空间处于未使用状态。 由于上述这个事实,因此 Boot Loader 的 stage2 必须在它想干点什么 (比如,将存储在 flash 上的内核映像读到 RAM 空间中) 之前检测整个系统的内存映射情况,也即它必须知道 CPU 预留的全部 RAM 地址空间中的哪些被真正映射到 RAM 地址单元,哪些是处于 “unused” 状态的。
(1) 内存映射的描述
可以用如下数据结构来描述 RAM 地址空间中的一段连续(continuous)的地址范围:
typedef struct memory_area_struct { u32 start; /* the base address of the memory region */ u32 size; /* the byte number of the memory region */ int used; } memory_area_t;
|
这段 RAM 地址空间中的连续地址范围可以处于两种状态之一:(1)used=1,则说明这段连续的地址范围已被实现,也即真正地被映射到 RAM 单元上。(2)used=0,则说明这段连续的地址范围并未被系统所实现,而是处于未使用状态。
基于上述 memory_area_t 数据结构,整个 CPU 预留的 RAM 地址空间可以用一个 memory_area_t 类型的数组来表示,如下所示:
memory_area_t memory_map[NUM_MEM_AREAS] = { [0 ... (NUM_MEM_AREAS - 1)] = { .start = 0, .size = 0, .used = 0 }, };
|
(2) 内存映射的检测
下面我们给出一个可用来检测整个 RAM 地址空间内存映射情况的简单而有效的算法:
/* 数组初始化 */ for(i = 0; i < NUM_MEM_AREAS; i++) memory_map[i].used = 0; /* first write a 0 to all memory locations */ for(addr = MEM_START; addr < MEM_END; addr += PAGE_SIZE) * (u32 *)addr = 0; for(i = 0, addr = MEM_START; addr < MEM_END; addr += PAGE_SIZE) { /* * 检测从基地址 MEM_START+i*PAGE_SIZE 开始,大小为 * PAGE_SIZE 的地址空间是否是有效的RAM地址空间。 */ 调用3.1.2节中的算法test_mempage(); if ( current memory page isnot a valid ram page) { /* no RAM here */ if(memory_map[i].used ) i++; continue; } /* * 当前页已经是一个被映射到 RAM 的有效地址范围 * 但是还要看看当前页是否只是 4GB 地址空间中某个地址页的别名? */ if(* (u32 *)addr != 0) { /* alias? */ /* 这个内存页是 4GB 地址空间中某个地址页的别名 */ if ( memory_map[i].used ) i++; continue; } /* * 当前页已经是一个被映射到 RAM 的有效地址范围 * 而且它也不是 4GB 地址空间中某个地址页的别名。 */ if (memory_map[i].used == 0) { memory_map[i].start = addr; memory_map[i].size = PAGE_SIZE; memory_map[i].used = 1; } else { memory_map[i].size += PAGE_SIZE; } } /* end of for (…) */
|
在用上述算法检测完系统的内存映射情况后,Boot Loader 也可以将内存映射的详细信息打印到串口。
3.2.3 加载内核映像和根文件系统映像
(1) 规划内存占用的布局
这里包括两个方面:(1)内核映像所占用的内存范围;(2)根文件系统所占用的内存范围。在规划内存占用的布局时,主要考虑基地址和映像的大小两个方面。
对于内核映像,一般将其拷贝到从(MEM_START+0×8000) 这个基地址开始的大约1MB大小的内存范围内(嵌入式 Linux 的内核一般都不操过 1MB)。为什么要把从 MEM_START 到 MEM_START+0×8000 这段 32KB 大小的内存空出来呢?这是因为 Linux 内核要在这段内存中放置一些全局数据结构,如:启动参数和内核页表等信息。
而对于根文件系统映像,则一般将其拷贝到 MEM_START+0×0010,0000 开始的地方。如果用 Ramdisk 作为根文件系统映像,则其解压后的大小一般是1MB。
(2)从 Flash 上拷贝
由于像 ARM 这样的嵌入式 CPU 通常都是在统一的内存地址空间中寻址 Flash 等固态存储设备的,因此从 Flash 上读取数据与从 RAM 单元中读取数据并没有什么不同。用一个简单的循环就可以完成从 Flash 设备上拷贝映像的工作:
while(count) { *dest++ = *src++; /* they are all aligned with word boundary */ count -= 4; /* byte number */ };
|
3.2.4 设置内核的启动参数
应该说,在将内核映像和根文件系统映像拷贝到 RAM 空间中后,就可以准备启动 Linux 内核了。但是在调用内核之前,应该作一步准备工作,即:设置 Linux 内核的启动参数。
Linux 2.4.x 以后的内核都期望以标记列表(tagged list)的形式来传递启动参数。启动参数标记列表以标记 ATAG_CORE 开始,以标记 ATAG_NONE 结束。每个标记由标识被传递参数的 tag_header 结构以及随后的参数值数据结构来组成。数据结构 tag 和 tag_header 定义在 Linux 内核源码的include/asm/setup.h 头文件中:
/* The list ends with an ATAG_NONE node. */ #define ATAG_NONE 0x00000000 struct tag_header { u32 size; /* 注意,这里size是字数为单位的 */ u32 tag; }; …… struct tag { struct tag_header hdr; union { struct tag_core core; struct tag_mem32 mem; struct tag_videotext videotext; struct tag_ramdisk ramdisk; struct tag_initrd initrd; struct tag_serialnr serialnr; struct tag_revision revision; struct tag_videolfb videolfb; struct tag_cmdline cmdline; /* * Acorn specific */ struct tag_acorn acorn; /* * DC21285 specific */ struct tag_memclk memclk; } u; };
|
在嵌入式 Linux 系统中,通常需要由 Boot Loader 设置的常见启动参数有:ATAG_CORE、ATAG_MEM、ATAG_CMDLINE、ATAG_RAMDISK、ATAG_INITRD等。
比如,设置 ATAG_CORE 的代码如下:
params = (struct tag *)BOOT_PARAMS; params->hdr.tag = ATAG_CORE; params->hdr.size = tag_size(tag_core); params->u.core.flags = 0; params->u.core.pagesize = 0; params->u.core.rootdev = 0; params = tag_next(params);
|
其中,BOOT_PARAMS 表示内核启动参数在内存中的起始基地址,指针 params 是一个 struct tag 类型的指针。宏 tag_next() 将以指向当前标记的指针为参数,计算紧临当前标记的下一个标记的起始地址。注意,内核的根文件系统所在的设备ID就是在这里设置的。
下面是设置内存映射情况的示例代码:
for(i = 0; i < NUM_MEM_AREAS; i++) { if(memory_map[i].used) { params->hdr.tag = ATAG_MEM; params->hdr.size = tag_size(tag_mem32); params->u.mem.start = memory_map[i].start; params->u.mem.size = memory_map[i].size; params = tag_next(params); } }
|
可以看出,在 memory_map[]数组中,每一个有效的内存段都对应一个 ATAG_MEM 参数标记。
Linux 内核在启动时可以以命令行参数的形式来接收信息,利用这一点我们可以向内核提供那些内核不能自己检测的硬件参数信息,或者重载(override)内核自己检测到的信息。比如,我们用这样一个命令行参数字符串”console=ttyS0,115200n8″来通知内核以 ttyS0 作为控制台,且串口采用 “115200bps、无奇偶校验、8位数据位”这样的设置。下面是一段设置调用内核命令行参数字符串的示例代码:
char *p; /* eat leading white space */ for(p = commandline; *p == ' '; p++) ; /* skip non-existent command lines so the kernel will still * use its default command line. */ if(*p == '\0') return; params->hdr.tag = ATAG_CMDLINE; params->hdr.size = (sizeof(struct tag_header) + strlen(p) + 1 + 4) >> 2; strcpy(params->u.cmdline.cmdline, p); params = tag_next(params);
|
请注意在上述代码中,设置 tag_header 的大小时,必须包括字符串的终止符’\0′,此外还要将字节数向上圆整4个字节,因为 tag_header 结构中的size 成员表示的是字数。
下面是设置 ATAG_INITRD 的示例代码,它告诉内核在 RAM 中的什么地方可以找到 initrd 映象(压缩格式)以及它的大小:
params->hdr.tag = ATAG_INITRD2; params->hdr.size = tag_size(tag_initrd); params->u.initrd.start = RAMDISK_RAM_BASE; params->u.initrd.size = INITRD_LEN; params = tag_next(params);
|
下面是设置 ATAG_RAMDISK 的示例代码,它告诉内核解压后的 Ramdisk 有多大(单位是KB):
params->hdr.tag = ATAG_RAMDISK; params->hdr.size = tag_size(tag_ramdisk); params->u.ramdisk.start = 0; params->u.ramdisk.size = RAMDISK_SIZE; /* 请注意,单位是KB */ params->u.ramdisk.flags = 1; /* automatically load ramdisk */ params = tag_next(params);
|
最后,设置 ATAG_NONE 标记,结束整个启动参数列表:
static void setup_end_tag(void) { params->hdr.tag = ATAG_NONE; params->hdr.size = 0; }
|
3.2.5 调用内核
Boot Loader 调用 Linux 内核的方法是直接跳转到内核的第一条指令处,也即直接跳转到 MEM_START+0×8000 地址处。在跳转时,下列条件要满足:
1. CPU 寄存器的设置:
- R0=0;
- R1=机器类型 ID;关于 Machine Type Number,可以参见 linux/arch/arm/tools/mach-types。
- R2=启动参数标记列表在 RAM 中起始基地址;
2. CPU 模式:
- 必须禁止中断(IRQs和FIQs);
- CPU 必须 SVC 模式;
3. Cache 和 MMU 的设置:
- MMU 必须关闭;
- 指令 Cache 可以打开也可以关闭;
- 数据 Cache 必须关闭;
如果用 C 语言,可以像下列示例代码这样来调用内核:
void (*theKernel)(int zero, int arch, u32 params_addr) = (void (*)(int, int, u32))KERNEL_RAM_BASE; …… theKernel(0, ARCH_NUMBER, (u32) kernel_params_start);
|
注意,theKernel()函数调用应该永远不返回的。如果这个调用返回,则说明出错。
4. 关于串口终端
在 boot loader 程序的设计与实现中,没有什么能够比从串口终端正确地收到打印信息能更令人激动了。此外,向串口终端打印信息也是一个非常重要而又有效的调试手段。但是,我们经常会碰到串口终端显示乱码或根本没有显示的问题。造成这个问题主要有两种原因:(1) boot loader 对串口的初始化设置不正确。(2) 运行在 host 端的终端仿真程序对串口的设置不正确,这包括:波特率、奇偶校验、数据位和停止位等方面的设置。
此外,有时也会碰到这样的问题,那就是:在 boot loader 的运行过程中我们可以正确地向串口终端输出信息,但当 boot loader 启动内核后却无法看到内核的启动输出信息。对这一问题的原因可以从以下几个方面来考虑:
(1) 首先请确认你的内核在编译时配置了对串口终端的支持,并配置了正确的串口驱动程序。
(2) 你的 boot loader 对串口的初始化设置可能会和内核对串口的初始化设置不一致。此外,对于诸如 s3c44b0x 这样的 CPU,CPU 时钟频率的设置也会影响串口,因此如果 boot loader 和内核对其 CPU 时钟频率的设置不一致,也会使串口终端无法正确显示信息。
(3) 最后,还要确认 boot loader 所用的内核基地址必须和内核映像在编译时所用的运行基地址一致,尤其是对于 uClinux 而言。假设你的内核映像在编译时用的基地址是 0xc0008000,但你的 boot loader 却将它加载到 0xc0010000 处去执行,那么内核映像当然不能正确地执行了。
5. 结束语
Boot Loader 的设计与实现是一个非常复杂的过程。如果不能从串口收到那激动人心的”uncompressing linux……………… done, booting the kernel……”内核启动信息,恐怕谁也不能说:”嗨,我的 boot loader 已经成功地转起来了!”。
关于作者
Posted: September 21, 2007 at 6:47 pm | Tags: blog, cache, html, linux, openssl, redhat, server, unix, web
转至: http://blog.csdn.net/baitianhai/archive/2004/10/27/155461.aspx
最近在鼓捣redhat linux,想自己以源代码方式安装软件,不想用rpm方式安装。
首先从httpd开始,先卸载在安装倒是比较容易,不过后来像添加ssl功能,发现编译的时候需要用openssl的安装目录,本人比较愚笨,一顿好找也没有找到,于是就想把openssl也以源代码方式安装。先卸载,此时出现问题,系统好多东西依赖于openssl的库,我查了好多资料也没找到什么办法,于是我最后一狠心,用rpm -e –nodeps给卸载了,然后手动安装了openssl,然后重新启动,这下坏了,好多服务都起不来了,smb,ssh等等,图形模式也起不来了,我欲哭无泪。
因为我是在虚拟机上安装的,smb起不来了,我只能重新安装系统了。这次安装我大多数东西都没选择,一路安装完毕,结果在文本方式发现vi编辑没有颜色了,哎,也不知道是少装了那个东西弄得(各位谁知道麻烦告诉告诉我一下),只能按照猜测重新安装了又添加了一些东西。不过幸运的vi高亮显示功能又有了,遗憾的是具体是那个软件我还是不清楚。有了上次的教训我不敢轻易卸掉系统原来的openssl了,我从网上搜索到了一篇安装openssl的英文文章,地址在 http://www.devside.net/web/server/linux/openssl 我按照上面说的安装了zlib,openssl。步骤简介如下(怕以后忘了)
安装zlib
Home : http://www.gzip.org/zlib/
Package(linux source) : http://www.gzip.org/zlib/
Our Configuration
Install to : /usr/local
Module types : dynamically and staticly loaded modules, *.so and *.a
Build Instructions
zlib library files are placed into /usr/local/lib and zlib header files are placed into /usr/local/include, by default.
Build static libraries
…/zlib-1.2.1]# ./configure
…/zlib-1.2.1]# make test
…/zlib-1.2.1]# make install
Build shared libraries
…/zlib-1.2.1]# make clean
…/zlib-1.2.1]# ./configure –shared
…/zlib-1.2.1]# make test
…/zlib-1.2.1]# make install
…/zlib-1.2.1]# cp zutil.h /usr/local/include
…/zlib-1.2.1]# cp zutil.c /usr/local/include
/usr/local/lib should now contain…
libz.a
libz.so -> libz.so.1.2.1
libz.so.1 -> libz.so.1.2.1
libz.so.1.2.1
/usr/local/include should now contain…
zconf.h
zlib.h
zutil.h
[Optional] Instructions for non-standard placement of zlib
Create the directory that will contain zlib
…/zlib-1.2.1]# mkdir /usr/local/zlib
Follow the given procedure above, except
…/zlib-1.2.1]# ./configure –prefix=/usr/local/zlib
Update the Run-Time Linker
/etc/ld.so.cache will need to be updated with the new zlib shared lib: libz.so.1.2.1
For standard zlib installation…
Add /usr/local/lib to /etc/ld.so.conf, if specified path is not present
/etc]# ldconfig
If zlib was installed with a prefix…
Add /usr/local/zlib/lib to /etc/ld.so.conf
/etc]# ldconfig
安装openssl
Download
Home : http://www.openssl.org/
Package(source) : openssl-0.9.7d.tar.gz
Our Configuration
install to : /usr/local/ssl
module types : dynamically and staticly loaded modules, *.so *.a
Build Instructions
…/openssl-0.9.7d]# ./config
–prefix=/usr/local/ssl
[default location]
shared
[in addition to the usual static libraries, create shared libraries]
zlib-dynamic
[like "zlib", but has OpenSSL load the zlib library dynamically when needed]
…/openssl-0.9.7d]# ./config -t
[display guess on system made by ./config]
…/openssl-0.9.7d]# make
…/openssl-0.9.7d]# make test
…/openssl-0.9.7d]# make install
Update the Run-time Linker
ld.so.cache will need to be updated with the location of the new OpenSSL shared libs: libcrypto.so.0.9.7 and libssl.so.0.9.7
Sometimes it is sufficient to just add these two files to /lib, but we recommend you follow these instructions instead.
Edit /etc/ld.so.conf
Add /usr/local/ssl/lib to the bottom.
…]# ldconfig
Update the PATH
Edit /root/.bash_profile
Add /usr/local/ssl/bin to the PATH variable.
Re-login
Testing
…]# openssl version
Should display OpenSSL 0.9.7d 17 Mar 2004
If an older version is shown, your system contains a previously installed OpenSSL.
Repeate the steps in Update the PATH, except place the specified location at the start of the PATH variable.
[the older openssl, on most systems, is located under /usr/bin]
[the command 'which openssl' should display the path of the openssl that your system is using]
/usr/local/ssl/bin]# ./openssl version should display the correct version.
但是我最后没有得到想要的结果,系统原来的openssl还是没能卸载掉,我该怎么做那?我继续搜索资料,哈,幸运的我找了,在一个国内论坛上是这么说的
cd /usr/local/ssl/lib
ln -s libcrypto.so.0.9.7 libcrypto.so.2
ln -s libssl.so.0.9.7 libssl.so.2
//最后要刷新系统的动态连接库配置
echo /usr/local/ssl/lib >> /etc/ld.so.conf
ldconfig -v
这下子我豁然开朗,原来依赖的那2个文件是个软链接啊,我把它修改为我现在真正的openssl库文件不是就行了吗?于是一顿忙碌后,我终于执行了 rpm -e -nodeps ,然后重新启动系统,一路运行下去,全是绿灯。一时间感觉自己好幸福啊
为了这个问题我查了国内的几个比较大的unix/linux网站都没找到资料,不过从这里http://bbs.netbuddy.org/unix/737.html还是找到了(国外的E文大概意思能看懂,但是查找起来还是没找到,也不知道这方面好点的网站),
Posted: September 21, 2007 at 4:19 pm | Tags: blog, linux, server, ssl
转至: http://blog.ixpub.net/8400463
加密算法:
对称加密算法:
DES、IDEA、RC2、RC4、AES、Skipjack ……
非对称加密算法:
RSA、DSA、DiffieHellman、PKCS、PGP ……
单向的HASH算法属于报文摘要算法,虽然有些也出自OpenSSL库。
命令操作:
1、生成普通私钥:
[weigw@TEST src]$ openssl genrsa -out privatekey.key 1024
Generating RSA private key, 1024 bit long modulus ….++++++ …….++++++ e is 65537 (0×10001)
2、生成带加密口令的密钥:
[weigw@TEST src]$ openssl genrsa -des3 -out privatekey.key 1024
Generating RSA private key, 1024 bit long modulus …………++++++ …………………++++++ e is 65537 (0×10001) Enter pass phrase for privatekey.key: Verifying – Enter pass phrase for privatekey.key:
在生成带加密口令的密钥时需要自己去输入密码。对于为密钥加密现在提供了一下几种算法:
-des encrypt the generated key with DES in cbc mode
-des3 encrypt the generated key with DES in ede cbc mode (168 bit key)
-aes128, -aes192, -aes256 encrypt PEM output with cbc aes
去除密钥的口令:
[weigw@TEST src]$ openssl rsa -in privatekey.key -out
privatekey.key Enter pass phrase for privatekey.key: writing RSA key
通过生成的私钥去生成证书:
[weigw@TEST src]$ openssl req -new -x509 -key privatekey.key -out cacert.crt -days 1095
You are about to be asked to enter information that will be incorporated into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter ‘.’, the field will be left blank.
—–
Country Name (2 letter code) [GB]:CN
State or Province Name (full name) [Berkshire]:beijing
Locality Name (eg, city) [Newbury]:beijing
Organization Name (eg, company) [My Company Ltd]:wondersoft
Organizational Unit Name (eg, section) []:develop
Common Name (eg, your name or your server’s hostname) []:WeiGW
Email Address []:weigongwan@sina.com
在生成证书的时候需要按照提示输入一些个人信息。
通过私钥生成公钥:
[weigw@TEST src]$ openssl rsa -in privatekey.key -pubout -out pubkey.key writing RSA key
格式转换:(证书、私钥、公钥)(PEM <—–>DER)
[weigw@TEST src]$ openssl x509 -in cacert.crt -inform. PEM -out cacert.der -outform. DER
[weigw@TEST src]$
[weigw@TEST src]$ openssl rsa -in privatekey.key -inform. PEM -out privatekey.der -outform. DER
writing RSA key
[weigw@TEST src]$ openssl rsa -pubin -in pubkey.key -inform. PEM -pubout -out pubkey.der -outform. DER
writing RSA key
从DER格式转换成PEM格式一样,就是把inform的格式改成DERoutform的格式改成PEM即可。
下面是一个服务器和客户端认证的证书、私钥生成方法:(server.crt、client.crt、ca.crt)
第一步: 生成私钥
[weigw@TEST bin]$ openssl genrsa -out server.key 1024
Generating RSA private key, 1024 bit long modulus .++++++ ..
………++++++ e is 65537 (0×10001)
[weigw@TEST bin]$ openssl genrsa -out client.key 1024
Generating RSA private key, 1024 bit long modulus …++++++ ……
……….++++++ e is 65537 (0×10001)
[weigw@TEST bin]$ openssl genrsa -out ca.key 1024
Generating RSA private key, 1024 bit long modulus …….
..++++++ ………++++++ e is 65537 (0×10001)
[weigw@TEST bin]$
第三步: 申请证书(为请求文件签名)
[weigw@TEST bin]$ openssl ca -in server.csr -out server.crt -cert ca.crt -keyfile ca.key
[weigw@TEST bin]$ openssl ca -in client.csr -out client.crt -cert ca.crt -keyfile ca.key
如果在这步出现错误信息:
[weigw@TEST bin]$ openssl ca -in client.csr -out client.crt -cert ca.crt -keyfile ca.key
Using configuration from /usr/share/ssl/openssl.cnf I am unable to access the ./demoCA/newcerts directory ./demoCA/newcerts: No such file or directory
[weigw@TEST bin]$
自己手动创建一个CA目录结构:
[weigw@TEST bin]$ mkdir ./demoCA
[weigw@TEST bin]$ mkdir demoCA/newcerts
创建个空文件:
[weigw@TEST bin]$ vi demoCA/index.txt
向文件中写入01:
[weigw@TEST bin]$ vi demoCA/serial
合并证书文件(crt)和私钥文件(key):
[weigw@TEST bin]$ cat client.crt client.key > client.pem [weigw@TEST bin]$ cat server.crt server.key > server.pem
合并成pfx证书:
[weigw@TEST bin]$ openssl pkcs12 -export -clcerts -in client.crt -inkey client.key -out client.p12
Enter Export Password:
Verifying – Enter Export Password:
[weigw@TEST bin]$openssl pkcs12 -export -clcerts -in server.crt -inkey server.key -out server.p12
Enter Export Password:
Verifying – Enter Export Password:
文本化证书:
[weigw@TEST bin]$ openssl pkcs12 -in client.p12 -out client.txt Enter Import Password:
MAC verified OK
Enter PEM pass phrase: Verifying – Enter PEM pass phrase:
[weigw@TEST bin]$openssl pkcs12 -in server.p12 -out server.txt
Enter Import Password:
MAC verified OK
Enter PEM pass phrase: Verifying – Enter PEM pass phrase:
屏幕模式显式:(证书、私钥、公钥)
[weigw@TEST bin]$ openssl x509 -in client.crt -noout -text -modulus
[weigw@TEST bin]$ openssl rsa -in server.key -noout -text -modulus
[weigw@TEST bin]$ openssl rsa -in server.pub -noout -text -modulus
得到DH:
[weigw@TEST bin]$ openssl dhparam -out dh1024.pem 1024
Posted: May 24, 2007 at 3:50 pm | Tags: class, debug, html, linux, lua, php, ror, server, windows
转至: http://www.codeguru.com/cpp/w-p/system/misc/article.php/c11393
Downloads
pemaker1.zip –
pemaker2.zip –
pemaker3.zip –
pemaker4.zip –
pemaker5.zip –
peviewer.zip –
test1.zip –
Windows NT 3.51 (I mean, Win3.1, Win95, Win98 were not perfect OSs). The MS-DOS data causes that your executable file to have the performance inside MS-DOS and the MS-DOS Stub program lets it display: "This program can not be run in MS-DOS mode" or "This program can be run only in Windows mode", or some things like these comments when you try to run a Windows EXE file inside MS-DOS 6.0, where there is no footstep of Windows. Thus, this data is reserved for the code to indicate these comments in the MS-DOS operating system. The most interesting part of the MS-DOS data is "MZ"! Can you believe, it refers to the name of "Mark Zbikowski", one of the first Microsoft programmers?

0 Preface
You might demand to comprehend the ways a virus program injects its procedure into the interior of a portable executable file and corrupts it, or you are interested in implementing a packer or a protector to encrypt the data of your portable executable (PE) file. This article is committed to represent a brief discussion to realize the performance that is accomplished by EXE tools or some kinds of mal-ware.
You can employ this article’s source code to create your custom EXE builder. It could be used to make an EXE protector in the right way, or with the wrong intention, to spread a virus. However, my purpose of writing this article has been the first application, so I will not be responsible for the immoral usage of these methods.
1 Prerequisites
There are no specific mandatory prerequisites to follow the topics in this article. If you are familiar with a debugger and also the portable file format, I suggest you to drop to Sections 2 and 3; the whole of these sections has been made for people who don’t have any knowledge regarding the EXE file format or debuggers.
2 Portable Executable File Format
The Portable Executable file format was defined to provide the best way for the Windows Operating System to execute code and also to store the essential data that is needed to run a program—for example constant data, variable data, import library links, and resource data. It consists of MS-DOS file information, Windows NT file information, Section Headers, and Section images, as shown in Table 1.
2.1 The MS-DOS data
These data let you remember the first days of developing the Windows Operating System. You were at the beginning of a way to achieve a complete Operating System such as
To me, only the offset of the PE signature in the MS-DOS data is important, so I can use it to find the position of the Windows NT data. I just recommend that you take a look at Table 1, and then observe the structure of IMAGE_DOS_HEADER in the <winnt.h> header in the <Microsoft Visual Studio .net path>\VC7\PlatformSDK\include\ folder or the <Microsoft Visual Studio 6.0 path>\VC98\include\ folder. I do not know why the Microsoft team has forgotten to provide some comment about this structure in the MSDN library!
typedef struct _IMAGE_DOS_HEADER { WORD e_magic; WORD e_cblp; WORD e_cp; WORD e_crlc; WORD e_cparhdr; WORD e_minalloc; WORD e_maxalloc; WORD e_ss; WORD e_sp; WORD e_csum; WORD e_ip; WORD e_cs; WORD e_lfarlc; WORD e_ovno; WORD e_res[4]; WORD e_oemid; WORD e_oeminfo; WORD e_res2[10]; LONG e_lfanew; } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
e_lfanew is the offset that refers to the position of the Windows NT data. I have provided a program to obtain the header information from an EXE file and to display it to you. To use the program, just try:
PE Viewer


(Full Size Image)

(Full Size Image)
This sample is useful for the whole of this article.
Table 1: Portable Executable file format structure
MS-DOS information |
IMAGE_DOS_ HEADER |
DOS EXE Signature |
00000000 ASCII "MZ"00000002 DW 009000000004 DW 000300000006 DW 000000000008 DW 00040000000A DW 00000000000C DW FFFF0000000E DW 000000000010 DW 00B800000012 DW 000000000014 DW 000000000016 DW 000000000018 DW 00400000001A DW 00000000001C DB 00b&b&0000003B DB 000000003C DD 000000F0
|
| DOS_PartPag |
| DOS_PageCnt |
| DOS_ReloCnt |
| DOS_HdrSize |
| DOS_MinMem |
| DOS_MaxMem |
| DOS_ReloSS |
| DOS_ExeSP |
| DOS_ChkSum |
| DOS_ExeIPP |
| DOS_ReloCS |
| DOS_TablOff |
| DOS_Overlay |
b& Reserved words b& |
| Offset to PE signature |
MS-DOS Stub Program |
00000040 ..B:..B4.C!B8\LC!This program canno00000060 t be run in DOS mode....$.......
|
Windows NT information
IMAGE_ NT_HEADERS
|
Signature |
PE signature (PE) |
000000F0 ASCII "PE"
|
IMAGE_ FILE_HEADER |
Machine |
000000F4 DW 014C000000F6 DW 0003000000F8 DD 3B7D8410000000FC DD 0000000000000100 DD 0000000000000104 DW 00E000000106 DW 010F
|
| NumberOfSections |
| TimeDateStamp |
| PointerToSymbolTable |
| NumberOfSymbols |
| SizeOfOptionalHeader |
| Characteristics |
IMAGE_ OPTIONAL_ HEADER32 |
MagicNumber |
00000108 DW 010B0000010A DB 070000010B DB 000000010C DD 0001280000000110 DD 00009C0000000114 DD 0000000000000118 DD 000124750000011C DD 0000100000000120 DD 0001400000000124 DD 0100000000000128 DD 000010000000012C DD 0000020000000130 DW 000500000132 DW 000100000134 DW 000500000136 DW 000100000138 DW 00040000013A DW 00000000013C DD 0000000000000140 DD 0001F00000000144 DD 0000040000000148 DD 0001D7FC0000014C DW 00020000014E DW 800000000150 DD 0004000000000154 DD 0000100000000158 DD 001000000000015C DD 0000100000000160 DD 0000000000000164 DD 00000010
|
| MajorLinkerVersion |
| MinorLinkerVersion |
| SizeOfCode |
| SizeOfInitializedData |
| SizeOfUninitializedData |
| AddressOfEntryPoint |
| BaseOfCode |
| BaseOfData |
| ImageBase |
| SectionAlignment |
| FileAlignment |
| MajorOSVersion |
| MinorOSVersion |
| MajorImageVersion |
| MinorImageVersion |
| MajorSubsystemVersion |
| MinorSubsystemVersion |
| Reserved |
| SizeOfImage |
| SizeOfHeaders |
| CheckSum |
| Subsystem |
| DLLCharacteristics |
| SizeOfStackReserve |
| SizeOfStackCommit |
| SizeOfHeapReserve |
| SizeOfHeapCommit |
| LoaderFlags |
| NumberOfRvaAndSizes |
IMAGE_ DATA_DIRECTORY[16] |
Export Table |
| Import Table |
| Resource Table |
| Exception Table |
| Certificate File |
| Relocation Table |
| Debug Data |
| Architecture Data |
| Global Ptr |
| TLS Table |
| Load Config Table |
| Bound Import Table |
| Import Address Table |
| Delay Import Descriptor |
| COM+ Runtime Header |
| Reserved |
Sections information |
IMAGE_ SECTION_ HEADER[0] |
Name[8] |
000001E8 ASCII".text"000001F0 DD 000126B0000001F4 DD 00001000000001F8 DD 00012800000001FC DD 0000040000000200 DD 0000000000000204 DD 0000000000000208 DW 00000000020A DW 00000000020C DD 60000020 CODE|EXECUTE|READ
|
| VirtualSize |
| VirtualAddress |
| SizeOfRawData |
| PointerToRawData |
| PointerToRelocations |
| PointerToLineNumbers |
| NumberOfRelocations |
| NumberOfLineNumbers |
| Characteristics |
b& b& b& IMAGE_ SECTION_ HEADER[n] |
00000210 ASCII".data"; SECTION00000218 DD 0000101C ; VirtualSize = 0x101C0000021C DD 00014000 ; VirtualAddress = 0x1400000000220 DD 00000A00 ; SizeOfRawData = 0xA0000000224 DD 00012C00 ; PointerToRawData = 0x12C0000000228 DD 00000000 ; PointerToRelocations = 0x00000022C DD 00000000 ; PointerToLineNumbers = 0x000000230 DW 0000 ; NumberOfRelocations = 0x000000232 DW 0000 ; NumberOfLineNumbers = 0x000000234 DD C0000040 ; Characteristics = INITIALIZED_DATA|READ|WRITE00000238 ASCII".rsrc"; SECTION00000240 DD 00008960 ; VirtualSize = 0x896000000244 DD 00016000 ; VirtualAddress = 0x1600000000248 DD 00008A00 ; SizeOfRawData = 0x8A000000024C DD 00013600 ; PointerToRawData = 0x1360000000250 DD 00000000 ; PointerToRelocations = 0x000000254 DD 00000000 ; PointerToLineNumbers = 0x000000258 DW 0000 ; NumberOfRelocations = 0x00000025A DW 0000 ; NumberOfLineNumbers = 0x00000025C DD 40000040 ; Characteristics = INITIALIZED_DATA|READ
|
| SECTION[0] |
00000400 EA 22 DD 77 D7 23 DD 77 C*"C.wC.#C.w00000408 9A 18 DD 77 00 00 00 00 E!.C.w....00000410 2E 1E C7 77 83 1D C7 77 ..C.wF..C.w00000418 FF 1E C7 77 00 00 00 00 C?.C.w....00000420 93 9F E7 77 D8 05 E8 77 b.E8C'wC..C(w00000428 FD A5 E7 77 AD A9 E9 77 C=B%C'w­B)C)w00000430 A3 36 E7 77 03 38 E7 77 B#6C'w.8C'w00000438 41 E3 E6 77 60 8D E7 77 AC#C&w`BC'w00000440 E6 1B E6 77 2B 2A E7 77 C&.C&w+*C'w00000448 7A 17 E6 77 79 C8 E6 77 z.C&wyC.C&w00000450 14 1B E7 77 C1 30 E7 77 ..C'wC.0C'wb&
|
b& b& b& SECTION[n] |
b&0001BF00 63 00 2E 00 63 00 68 00 c...c.h.0001BF08 6D 00 0A 00 43 00 61 00 m...C.a.0001BF10 6C 00 63 00 75 00 6C 00 l.c.u.l.0001BF18 61 00 74 00 6F 00 72 00 a.t.o.r.0001BF20 11 00 4E 00 6F 00 74 00 ..N.o.t.0001BF28 20 00 45 00 6E 00 6F 00 .E.n.o.0001BF30 75 00 67 00 68 00 20 00 u.g.h. .0001BF38 4D 00 65 00 6D 00 6F 00 M.e.m.o.0001BF40 72 00 79 00 00 00 00 00 r.y.....0001BF48 00 00 00 00 00 00 00 00 ........0001BF50 00 00 00 00 00 00 00 00 ........0001BF58 00 00 00 00 00 00 00 00 ........0001BF60 00 00 00 00 00 00 00 00 ........0001BF68 00 00 00 00 00 00 00 00 ........0001BF70 00 00 00 00 00 00 00 00 ........0001BF78 00 00 00 00 00 00 00 00 ........
|
2.2 The Windows NT data
As mentioned in the preceding section, e_lfanew storage in the MS-DOS data structure refers to the location of the Windows NT information. Hence, if you assume that the pMem pointer relates the start point of the memory space for a selected portable executable file, you can retrieve the MS-DOS header and also the Windows NT headers by the following lines, which you also can perceive in the PE viewer sample (pelib.cpp, PEStructure::OpenFileName()):
IMAGE_DOS_HEADER image_dos_header;IMAGE_NT_HEADERS image_nt_headers;PCHAR pMem;b&memcpy(&image_dos_header, pMem, sizeof(IMAGE_DOS_HEADER));memcpy(&image_nt_headers, pMem+image_dos_header.e_lfanew, sizeof(IMAGE_NT_HEADERS));
IMAGE_NT_HEADERS structure definition. It makes it possible to grasp what the image NT header maintains to execute a code inside the Windows NT OS. Now, you are conversant with the Windows NT structure; it consists of the "PE" Signature, the File Header, and the Optional Header. Do not forget to take a glimpse at their comments in the MSDN Library and in Table 1.
It seems to be very simple, the retrieval of the headers information. I recommend inspecting the MSDN library regarding the
One the whole, I consider merely, in most circumstances, the following cells of the IMAGE_NT_HEADERS structure:
FileHeader->NumberOfSectionsOptionalHeader->AddressOfEntryPointOptionalHeader->ImageBaseOptionalHeader->SectionAlignmentOptionalHeader->FileAlignmentOptionalHeader->SizeOfImageOptionalHeader->DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT] ->VirtualAddressOptionalHeader->DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT] ->Size
You can observe the main purpose of these values clearly, and their role when the internal virtual memory space allocated for an EXE file by the Windows task manager if you pay attention to their explanations in MSDN library, so I am not going to repeat the MSDN annotations here.
I should make a brief comment regarding the PE data directories, or OptionalHeader-> DataDirectory[], because I think there are a few aspects of interest concerning them. When you come to survey the Optional header through the Windows NT information, you will find that there are 16 directories at the end of the Optional Header, where you can find the consecutive directories, including their Relative Virtual Address and Size. I just mention here the notes from <winnt.h> to clarify these information:
#define IMAGE_DIRECTORY_ENTRY_EXPORT 0#define IMAGE_DIRECTORY_ENTRY_IMPORT 1#define IMAGE_DIRECTORY_ENTRY_RESOURCE 2#define IMAGE_DIRECTORY_ENTRY_EXCEPTION 3#define IMAGE_DIRECTORY_ENTRY_SECURITY 4#define IMAGE_DIRECTORY_ENTRY_BASERELOC 5#define IMAGE_DIRECTORY_ENTRY_DEBUG 6#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE 7#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR 8#define IMAGE_DIRECTORY_ENTRY_TLS 9#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG 10#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT 11#define IMAGE_DIRECTORY_ENTRY_IAT 12#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT 13#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14
The last one (15) was reserved for use in the future; I have not yet seen any purpose for it, even in PE64.
For instance, if you want to perceive the relative virtual address (RVA) and the size of the resource data, it is enough to retrieve them by:
DWORD dwRVA = image_nt_headers.OptionalHeader-> DataDirectory[IMAGE_DIRECTORY_ENTRY_RESOURCE]->VirtualAddress;DWORD dwSize = image_nt_headers.OptionalHeader-> DataDirectory[IMAGE_DIRECTORY_ENTRY_RESOURCE]->Size;
To comprehend more regarding the significance of data directories, I forward you to Section 3.4.3 of the Microsoft Portable Executable and the Common Object File Format Specification document by Microsoft, and furthermore Section 6 of this document, where you discern the various types of sections and their applications. You will see the section’s advantage subsequently.
2.3 The Section Headers and Sections
You currently observe how the portable executable files declare the location and the size of a section on a disk storage file and inside the virtual memory space allocated for the program with IMAGE_NT_HEADERS-> OptionalHeader->SizeOfImage by the Windows task manager, as well the characteristics to demonstrate the type of the section. To better understand the Section header as my previous declaration, I suggest having a brief look at the IMAGE_SECTION_HEADER structure definition in the MSDN library. For an EXE packer developer, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData, and Characteristics cells have significant rules. When developing an EXE packer, you should be clever enough to play with them. There are somet hings to note when you modify them; you should take care to align the VirtualSize and VirtualAddress according to OptionalHeader->SectionAlignment, as well as SizeOfRawData and PointerToRawData in line with OptionalHeader->FileAlignment. Otherwise, you will corrupt your target EXE file and it will never run. Regarding Characteristics, I pay attention mostly to establish a section by IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | IMAGE_SCN_CNT_INITIALIZED_DATA, I prefer that my new section has the ability to initialize such data during the running process, such as import table; besides, I need it to be able to modify itself by the loader with my settings in the section characteristics to read- and writeable.
Moreover, you should pay attention to the section names; you can know the purpose of each section by its name. I will just forward you to Section 6 of the Microsoft Portable Executable and the Common Object File Format Specification documents. I believe it represents the totality of sections by their names; this is also included in Table 2.
Table 2: Section names
| ".text" |
Code Section |
| "CODE" |
Code Section of file linked by Borland Delphi or Borland Pascal |
| ".data" |
Data Section |
| "DATA" |
Data Section of file linked by Borland Delphi or Borland Pascal |
| ".rdata" |
Section for Constant Data |
| ".idata" |
Import Table |
| ".edata" |
Export Table |
| ".tls" |
TLS Table |
| ".reloc" |
Relocation Information |
| ".rsrc" |
Resource Information |
To comprehend the section headers and also the sections, you can run the sample PE viewer. With this PE viewer, you can realize only the application of the section headers in a file image, so to observe the main significance in the Virtual Memory, you should try to load a PE file by a debugger. The next section represents the main idea of using the virtual address and size in the virtual memory by using a debugger. The last note is about IMAGE_NT_HEADERS-> FileHeader->NumberOfSections, that provides a number of sections in a PE file. Do not forget to adjust it whenever you remove or add some sections to a PE file. I am talking about section injection!
3 Debugger, Disassembler and some Useful Tools
In this part, you will become familiar with the necessary and essential equipment to develop your PE tools.
3.1 Debuggers
The first essential prerequisite to become a PE tools developer is to have enough experience with bug tracer tools. Furthermore, you should know most of the assembly instructions. To me, the Intel documents are the best references. You can obtain them from the Intel site for IA-32, and on top of that IA-64; the future belongs to IA-64 CPUs, Windows XP 64-bit, and also PE64!
To trace a PE file, SoftICE by Compuware Corporation, I knew it also as named NuMega when I was at high school, is the best debugger in the world. It implements process tracing by using the kernel mode method debugging without applying Windows debugging application programming interface (API) functions. In addition, I will introduce one perfect debugger in user mode level. It utilizes the Windows debugging API to trace a PE file and also attaches itself to an active process. These API functions have been provided by Microsoft teams, inside the Windows Kernel32 library, to trace a specific process, by using Microsoft tools, or perhaps, to make your own debugger! Some of those API functions inlude:
3.1.1 SoftICE
It was in 1987; Frank Grossman and Jim Moskun decided to establish a company called NuMega Technologies in Nashua, NH, to develop some equipment to trace and test the reliability of Microsoft Windows software programs. Now, it is a part of Compuware Corporation and its product has participated to accelerate the reliability in Windows software, and additionally in Windows driver developments. Currently, everyone knows the Compuware DriverStudio that is used to establish an environment for implementing the elaboration of a kernel driver or a system file by aiding the Windows Driver Development Kit (DDK). It bypasses the involvement of DDK to implement a portable executable file of kernel level for a Windows system software developer. For us, only one instrument of DriverStudio is important, SoftICE; this debugger can be used to trace every portable executable file, a PE file for user mode level or a PE file for kernel mode level.
Figure 1: SoftICE Window
EAX=00000000EBX=7FFDD000 ECX=0007FFB0 EDX=7C90EB94 ESI=FFFFFFFF EDI=7C919738 EBP=0007FFF0 ESP=0007FFC4 EIP=010119E0 o d i s z a p c CS=0008 DS=0023 SS=0010 ES=0023 FS=0030 GS=0000 SS:0007FFC4=87C816D4F |
| 0023:01013000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ……………. 0023:01013010 01 00 00 00 20 00 00 00-0A 00 00 00 0A 00 00 00 ……………. 0023:01013020 20 00 00 00 00 00 00 00-53 63 69 43 61 6C 63 00 ……..SciCalc. 0023:01013030 00 00 00 00 00 00 00 00-62 61 63 6B 67 72 6F 75 ……..backgrou 0023:01013040 6E 64 00 00 00 00 00 00-2E 00 00 00 00 00 00 00 nd………….. |
| 0010:0007FFC4 4F 6D 81 7C 38 07 91 7C-FF FF FF FF 00 90 FD 7F Om |8 b.| . 0010:0007FFD4 ED A6 54 80 C8 FF 07 00-E8 B4 F5 81 FF FF FF FF T . 0010:0007FFE4 F3 99 83 7C 58 6D 81 7C-00 00 00 00 00 00 00 00 Xm |…….. 0010:0007FFF4 00 00 00 00 E0 19 01 01-00 00 00 00 00 00 00 00 …. …. |
| 010119E0 PUSH EBP 010119E1 MOV EBP,ESP 010119E3 PUSH -1 010119E5 PUSH 01001570 010119EA PUSH 01011D60 010119EF MOV EAX,DWORD PTR FS:[0] 010119F5 PUSH EAX 010119F6 MOV DWORD PTR FS:[0],ESP 010119FD ADD ESP,-68 01011A00 PUSH EBX 01011A01 PUSH ESI 01011A02 PUSH EDI 01011A03 MOV DWORD PTR SS:[EBP-18],ESP 01011A06 MOV DWORD PTR SS:[EBP-4],0 |
| :_
|
3.1.2 OllyDbg
It was about four years ago that I first saw this debugger by chance. For me, it was the best choice; I was not wealthy enough to purchase SoftICE, and at that time, SoftICE only had good functions for DOS, Windows 98, and Windows 2000. I found that this debugger supported all kinds of Windows versions. Therefore, I started to learn it very fast, and now it is my favorite debugger for the Windows OS. It is a debugger that can be used to trace all kinds of portable executable files except a Common Language Infrastructure (CLI) file format in user mode level, by using the Windows debugging API. Oleh Yuschuk, the author, is one of worthiest software developers I have seen in my life. He is a Ukrainian who now lives in Germany. I should mention here that his debugger is the best choice for hacker and cracker parties around the world! It is freeware! You can try it from the OllyDbg Homepage.
Figure 2: OllyDbg CPU Window

(
3.1.3 Which parts are important in a debugger interface?
I have introduced two debuggers without talking about how you can employ them, and also which parts you should pay attention to. Regarding using debuggers, I refer you to their instructions in help documents. However, I want to explain briefly the important parts of a debugger; of course, I am talking about low-level debuggers, or in other words, machine-language debuggers of the x86 CPU families.
All of low-level debuggers consist of the following subdivisions:
- Registers viewer.
| EAX |
| ECX |
| EDX |
| EBX |
| ESP |
| EBP |
| ESI |
| EDI |
| EIP |
|
o d t s z a p c
|
- Disassembler or Code viewer.
010119E0 PUSH EBP010119E1 MOV EBP,ESP010119E3 PUSH -1010119E5 PUSH 01001570010119EA PUSH 01011D60010119EF MOV EAX,DWORD PTR FS:[0]010119F5 PUSH EAX010119F6 MOV DWORD PTR FS:[0],ESP010119FD ADD ESP,-6801011A00 PUSH EBX01011A01 PUSH ESI01011A02 PUSH EDI01011A03 MOV DWORD PTR SS:[EBP-18],ESP01011A06 MOV DWORD PTR SS:[EBP-4],0
|
- Memory watcher.
| 0023:01013000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ……………. 0023:01013010 01 00 00 00 20 00 00 00-0A 00 00 00 0A 00 00 00 ……………. 0023:01013020 20 00 00 00 00 00 00 00-53 63 69 43 61 6C 63 00 ……..SciCalc. 0023:01013030 00 00 00 00 00 00 00 00-62 61 63 6B 67 72 6F 75 ……..backgrou 0023:01013040 6E 64 00 00 00 00 00 00-2E 00 00 00 00 00 00 00 nd………….. |
- Stack viewer.
| 0010:0007FFC4 4F 6D 81 7C 38 07 91 7C-FF FF FF FF 00 90 FD 7F Om |8 b.| . 0010:0007FFD4 ED A6 54 80 C8 FF 07 00-E8 B4 F5 81 FF FF FF FF T . 0010:0007FFE4 F3 99 83 7C 58 6D 81 7C-00 00 00 00 00 00 00 00 Xm |…….. 0010:0007FFF4 00 00 00 00 E0 19 01 01-00 00 00 00 00 00 00 00 …. …. |
- Command line, command buttons, or shortcut keys to follow the debugging process.
| Command |
SoftICE |
OllyDbg |
| Run |
F5 |
F9 |
| Step Into |
F11 |
F7 |
| Step Over |
F10 |
F8 |
| Set Break Point |
F8 |
F2 |
You can compare Figures 1 and 2 to distinguish the difference between SoftICE and OllyDbg. When you want to trace a PE file, you should mostly consider these five subdivisions. Furthermore, every debugger comprises of some other useful parts; you should discover them by yourself.
3.2 Disassembler
You can consider OllyDbg and SoftICE to be excellent disassemblers, but I also want to introduce another disassembler tool that is famous in the reverse engineering world.
3.2.1 Proview disassembler
Proview or PVDasm is an admirable disassembler by the Reverse-Engineering-Community; it is still under development and bug fixing. You can find its disassmbler source engine and employ it to create your own disassembler.
3.2.2 W32Dasm
W32DASM can disassemble both 16- and 32-bit executable file formats. In addition to its disassembling ability, you can employ it to analyze import, export, and resource data directories data.
3.2.3 IDA Pro
All reverse-engineering experts know that IDA Pro can be used to investigate, not only x86 instructions, but that of various kinds of CPU types like AVR, PIC, and so forth. It can illustrate the assembly source of a portable executable file by using colored graphics and tables, and is very useful for any newbie in this area. Furthermore, it has the capability to trace an executable file inside the user mode level in the same way as OllyDbg.
3.3 Some Useful Tools
A good PE tools developer is conversant with the tools that save his time, so I recommend that you select some appropriate instruments to investigate the base information under a portable executable file.
3.3.1 LordPE
LordPE by y0da is still the first choice to retrieve PE file information with the possibility to modify them.

3.3.2 PEiD
PE iDentifier is valuable to identify the type of compilers, packers, and cryptors of PE files. As of now, it can detect more than 500 different signature types of PE files.

3.3.3 Resource Hacker
Resource Hacker can be employed to modify resource directory information; icon, menu, version info, string table, and so on.

3.3.4 WinHex
WinHex, it is clear what you can do with this tool.

3.3.5 CFF Explorer
Eventually, CFF Explorer by Ntoskrnl is what you want to have as a PE Utility tool in your arsenal; it supports PE32/64, PE rebuild included Common Language Infrastructure (CLI) file. In other words, the .NET file, a resource modifier, and much more facilities which can not be found in others. Just try to discover every unimaginable option by hand.

4 Add a New Section and Change the OEP
You are ready to do the first step of making your project. I have provided a library to add a new section and rebuild the portable executable file. Before starting, I wnat you to get familiar with the headers of a PE file, by using OllyDbg. You should first open a PE file; that pops up a menu, View->Executable file. Again, you get a popup menu: Special->PE header. You will observe a scene similar to Figure 3. Now, come to the Main Menu View->Memory, and try to distinguish the sections inside the Memory map window.
Figure 3
00000000000000020000000400000006000000080000000A0000000C0000000E00000010000000120000001400000016000000180000001A0000001C0000001D0000001E0000001F000000200000002100000022000000230000002400000025000000260000002700000028000000290000002A0000002B0000002C0000002D0000002E0000002F000000300000003100000032000000330000003400000035000000360000003700000038000000390000003A0000003B0000003C
|
4D 5A 9000 0300 0000 0400 0000 FFFF 0000 B800 0000 0000 0000 4000 0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 F0000000
|
ASCII "MZ" DW 0090 DW 0003 DW 0000 DW 0004 DW 0000 DW FFFF DW 0000 DW 00B8 DW 0000 DW 0000 DW 0000 DW 0040 DW 0000 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DB 00 DD 000000F0
|
DOS EXE Signature DOS_PartPag = 90 (144.) DOS_PageCnt = 3 DOS_ReloCnt = 0 DOS_HdrSize = 4 DOS_MinMem = 0 DOS_MaxMem = FFFF (65535.) DOS_ReloSS = 0 DOS_ExeSP = B8 DOS_ChkSum = 0 DOS_ExeIP = 0 DOS_ReloCS = 0 DOS_TablOff = 40 DOS_Overlay = 0 Offset to PE signature
|
I want to explain how you can plainly change the Offset of Entry Point (OEP) in your sample file, CALC.EXE of Windows XP. First, by using a PE Tool, and also using your PE Viewer, you find OEP, 0x00012475, and Image Base, 0x01000000. This value of OEP is the Relative Virtual Address, so the Image Base value is used to convert it to the Virtual Address.
|
Virtual_Address = Image_Base + Relative_Virtual_Address
|
DWORD OEP_RVA = image_nt_headers-> OptionalHeader.AddressOfEntryPoint ;DWORD OEP_VA = image_nt_headers-> OptionalHeader.ImageBase + OEP_RVA ;
PE Maker: Step 1
Download pemaker1.zip and test1.zip from the files at the end of this article.
DynLoader(), in loader.cpp, is reserved for the data of the new section—in other words, the Loader.
DynLoader Step 1
__stdcall void DynLoader(){_asm{ DWORD_TYPE(DYN_LOADER_START_MAGIC) MOV EAX,01012475h JMP EAX DWORD_TYPE(DYN_LOADER_END_MAGIC)}}
Unfortunately, this source can only be applied for the sample test file. You should complete it by saving the value of the original OEP in the new section, and use it to reach the real OEP. I have accomplished it in Step 2 (Section 5).
4.1 Retrieve and Rebuild PE file
I have made a simple class library to recover PE information and to use it in a new PE file.
CPELibrary Class Step 1
class CPELibrary{private: PCHAR pMem; DWORD dwFileSize; protected: PIMAGE_DOS_HEADER image_dos_header; PCHAR pDosStub; DWORD dwDosStubSize, dwDosStubOffset; PIMAGE_NT_HEADERS image_nt_headers; PIMAGE_SECTION_HEADER image_section_header[MAX_SECTION_NUM]; PCHAR image_section[MAX_SECTION_NUM]; protected: DWORD PEAlign(DWORD dwTarNum,DWORD dwAlignTo); void AlignmentSections(); DWORD Offset2RVA(DWORD dwRO); DWORD RVA2Offset(DWORD dwRVA); PIMAGE_SECTION_HEADER ImageRVA2Section(DWORD dwRVA); PIMAGE_SECTION_HEADER ImageOffset2Section(DWORD dwRO); DWORD ImageOffset2SectionNum(DWORD dwRVA); PIMAGE_SECTION_HEADER AddNewSection(char* szName,DWORD dwSize); public: CPELibrary(); ~CPELibrary(); void OpenFile(char* FileName); void SaveFile(char* FileName); };
In Table 1, the usage of image_dos_header, pDosStub, image_nt_headers, image_section_header [MAX_SECTION_NUM], and image_section[MAX_SECTION_NUM] is clear. You use OpenFile() and SaveFile() to retrieve and rebuild a PE file. Furthermore, AddNewSection() is employed to create the new section, the important step.
4.2 Create data for the new section
Full Size Image)
You can comprehend the difference between incremental link and no-incremental link by looking at the following picture:
To acquire the virtual address of DynLoader(), you obtain the virtual address of JMP pemaker.DynLoader in the incremental link, but by no-incremental link, the real virtual address is gained by the following code:
DWORD dwVA= (DWORD) DynLoader;
This setting is more critical in the incremental link when you try to find the beginning and ending of the Loader, DynLoader(), by CPECryptor::ReturnToBytePtr():
void* CPECryptor::ReturnToBytePtr(void* FuncName, DWORD findstr){ void* tmpd; __asm { mov eax, FuncName jmp dfhjg: inc eaxdf: mov ebx, [eax] cmp ebx, findstr jnz hjg mov tmpd, eax } return tmpd;}
In pecrypt.cpp, I have represented another class, CPECryptor, to comprise the data of the new section. Nevertheless, the data of the new section is created by DynLoader() in loader.cpp, DynLoader Step 1. You use the CPECryptor class to enter this data in to the new section, and also some other stuff.
CPECryptor Class Step 1
class CPECryptor: public CPELibrary{private: PCHAR pNewSection; DWORD GetFunctionVA(void* FuncName); void* ReturnToBytePtr(void* FuncName, DWORD findstr); protected: public: void CryptFile(int(__cdecl *callback) (unsigned int, unsigned int)); };
4.3 Some notes regarding creating a new PE file
- Align the VirtualAddress and the VirtualSize of each section by SectionAlignment:
image_section_header[i]->VirtualAddress= PEAlign(image_section_header[i]->VirtualAddress, image_nt_headers->OptionalHeader.SectionAlignment);image_section_header[i]->Misc.VirtualSize= PEAlign(image_section_header[i]->Misc.VirtualSize, image_nt_headers->OptionalHeader.SectionAlignment);
- Align the PointerToRawData and the SizeOfRawData of each section by FileAlignment:
image_section_header[i]->PointerToRawData = PEAlign(image_section_header[i]->PointerToRawData, image_nt_headers->OptionalHeader.FileAlignment);image_section_header[i]->SizeOfRawData = PEAlign(image_section_header[i]->SizeOfRawData, image_nt_headers->OptionalHeader.FileAlignment);
- Correct the SizeofImage by the virtual size and the virtual address of the last section:
image_nt_headers->OptionalHeader.SizeOfImage = image_section_header[LastSection]->VirtualAddress + image_section_header[LastSection]->Misc.VirtualSize;
- Set the Bound Import Directory header to zero because this directory is not very important to execute a PE file:
image_nt_headers-> OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT]. VirtualAddress = 0;image_nt_headers-> OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BOUND_ IMPORT].Size = 0;
4.4 Some notes regarding linking this VC Project
- Set Linker->General->Enable Incremental Linking to No (/INCREMENTAL:NO).

(
5 Store Important Data and Reach the Original OEP
Right now, we save the Original OEP and also the Image Base in order to reach to the virtual address of OEP. I have reserved a free space at the end of DynLoader() to store them, DynLoader Step 2.
PE Maker – Step 2
Download the pemaker2.zip source files from the end of the article.
DynLoader Step 2
__stdcall void DynLoader(){_asm{ DWORD_TYPE(DYN_LOADER_START_MAGIC)Main_0: PUSHAD CALL Main_1Main_1: POP EBP SUB EBP,OFFSET Main_1 MOV EAX,DWORD PTR [EBP+_RO_dwImageBase] ADD EAX,DWORD PTR [EBP+_RO_dwOrgEntryPoint] PUSH EAX RETN DWORD_TYPE(DYN_LOADER_START_DATA1)_RO_dwImageBase: DWORD_TYPE(0xCCCCCCCC)_RO_dwOrgEntryPoint: DWORD_TYPE(0xCCCCCCCC)
The new function, CPECryptor::CopyData1(), will implement the copy of the Image Base value and the Offset of Entry Point value into 8 bytes of free space in the loader.
5.1 Restore the first register’s context
It is important to recover the Original Context of the thread. You have not yet done it in the DynLoader Step 2 source code. You can modify the source of DynLoader() to repossess the first Context.
__stdcall void DynLoader(){_asm{ DWORD_TYPE(DYN_LOADER_START_MAGIC)Main_0: PUSHAD CALL Main_1Main_1: POP EBP SUB EBP,OFFSET Main_1 MOV EAX,DWORD PTR [EBP+_RO_dwImageBase] ADD EAX,DWORD PTR [EBP+_RO_dwOrgEntryPoint] MOV DWORD PTR [ESP+1Ch],EAX POPAD PUSH EAX XOR EAX, EAX RETN DWORD_TYPE(DYN_LOADER_START_DATA1)_RO_dwImageBase: DWORD_TYPE(0xCCCCCCCC)_RO_dwOrgEntryPoint: DWORD_TYPE(0xCCCCCCCC) DWORD_TYPE(DYN_LOADER_END_MAGIC)}}
5.2 Restore the original stack
You also can recover the original stack by setting the value of the beginning stack + 0x34 to the Original OEP, but it is not very important. Nevertheless, in the following code, I have accomplished the loader code by a simple trick to reach the OEP in addition to redecorating the stack. You can observe the implementation by tracing using OllyDbg or SoftICE.
__stdcall void DynLoader(){_asm{ DWORD_TYPE(DYN_LOADER_START_MAGIC)Main_0: PUSHAD CALL Main_1Main_1: POP EBP SUB EBP,OFFSET Main_1 MOV EAX,DWORD PTR [EBP+_RO_dwImageBase] ADD EAX,DWORD PTR [EBP+_RO_dwOrgEntryPoint] MOV DWORD PTR [ESP+54h],EAX POPAD CALL _OEP_Jump DWORD_TYPE(0xCCCCCCCC)_OEP_Jump: PUSH EBP MOV EBP,ESP MOV EAX,DWORD PTR [ESP+3Ch] MOV DWORD PTR [ESP+4h],EAX XOR EAX,EAX LEAVE RETN DWORD_TYPE(DYN_LOADER_START_DATA1)_RO_dwImageBase: DWORD_TYPE(0xCCCCCCCC)_RO_dwOrgEntryPoint: DWORD_TYPE(0xCCCCCCCC) DWORD_TYPE(DYN_LOADER_END_MAGIC)}}
5.3 Approach OEP by structured exception handling
try-except statement in C++ clarifies the operation of structured exception handling. Besides the assembly code of this code, it elucidates the structured exception handler installation, the raise of an exception, and the exception handler function.
An exception is generated when a program falls into a fault code execution and an error happens, so in such a special condition, the program immediately jumps to a function called the exception handler from exception handler list of the Thread Information Block.
The next example of a
#include "stdafx.h"#include "windows.h"void RAISE_AN_EXCEPTION(){_asm{ INT 3 INT 3 INT 3 INT 3}}int _tmain(int argc, _TCHAR* argv[]){ __try { __try{ printf("1: Raise an Exception\n"); RAISE_AN_EXCEPTION(); } __finally { printf("2: In Finally\n"); } } __except( printf("3: In Filter\n"), EXCEPTION_EXECUTE_HANDLER ) { printf("4: In Exception Handler\n"); } return 0;}
; main()00401000: PUSH EBP00401001: MOV EBP,ESP00401003: PUSH -100401005: PUSH 00407160; __try {; the structured exception handler (SEH) installation 0040100A: PUSH _except_handler30040100F: MOV EAX,DWORD PTR FS:[0]00401015: PUSH EAX00401016: MOV DWORD PTR FS:[0],ESP0040101D: SUB ESP,800401020: PUSH EBX00401021: PUSH ESI00401022: PUSH EDI00401023: MOV DWORD PTR SS:[EBP-18],ESP; __try {00401026: XOR ESI,ESI00401028: MOV DWORD PTR SS:[EBP-4],ESI0040102B: MOV DWORD PTR SS:[EBP-4],100401032: PUSH OFFSET "1: Raise an Exception"00401037: CALL printf0040103C: ADD ESP,4; the raise a exception, INT 3 exception; RAISE_AN_EXCEPTION()0040103F: INT300401040: INT300401041: INT300401042: INT3; } __finally {00401043: MOV DWORD PTR SS:[EBP-4],ESI00401046: CALL 0040104D0040104B: JMP 004010800040104D: PUSH OFFSET "2: In Finally"00401052: CALL printf00401057: ADD ESP,40040105A: RETN; }; }; __except( 0040105B: JMP 004010800040105D: PUSH OFFSET "3: In Filter"00401062: CALL printf00401067: ADD ESP,40040106A: MOV EAX,1 ; EXCEPTION_EXECUTE_HANDLER = 10040106F: RETN; , EXCEPTION_EXECUTE_HANDLER ); {; the exception handler funtion00401070: MOV ESP,DWORD PTR SS:[EBP-18]00401073: PUSH OFFSET "4: In Exception Handler"00401078: CALL printf0040107D: ADD ESP,4; }00401080: MOV DWORD PTR SS:[EBP-4],-10040108C: XOR EAX,EAX; restore previous SEH0040108E: MOV ECX,DWORD PTR SS:[EBP-10]00401091: MOV DWORD PTR FS:[0],ECX00401098: POP EDI00401099: POP ESI0040109A: POP EBX0040109B: MOV ESP,EBP0040109D: POP EBP0040109E: RETN
Make a Win32 console project, and link and run the preceding C++ code, to perceive the result:
1: Raise an Exception 3: In Filter 2: In Finally 4: In Exception Handler _
|
This program runs the exception expression, printf("3: In Filter\n");, when an exception happens—in this example, the INT 3 exception. You can employ other kinds of exception too. In OllyDbg, Debugging options->Exceptions, you can see a short list of different types of exceptions.

5.3.1 Implement Exception Handler
You want to construct a structured exception handler to reach OEP. Now, I think you have distinguished the SEH installation, the exception raise, and the exception expression filter, by foregoing the assembly code. To establish your exception handler approach, you need to comprise the following codes:
- SEH installation:
LEA EAX,[EBP+_except_handler1_OEP_Jump]PUSH EAXPUSH DWORD PTR FS:[0]MOV DWORD PTR FS:[0],ESP
- An Exception Raise:
INT 3
- Exception handler expression filter:
_except_handler1_OEP_Jump: PUSH EBP MOV EBP,ESP ... MOV EAX, EXCEPTION_CONTINUE_SEARCH LEAVE RETN
So, you yearn to make the ensuing C++ code in assembly language to inaugurate your engine to approach the Offset of the Entry Point by SEH.
__try { __asm { INT 3 }}__except( ..., EXCEPTION_CONTINUE_SEARCH ){}
In assembly code…
; ---------------------------------------------------- ; the structured exception handler (SEH) installation ; __try { LEA EAX,[EBP+_except_handler1_OEP_Jump] PUSH EAX PUSH DWORD PTR FS:[0] MOV DWORD PTR FS:[0],ESP ; ---------------------------------------------------- ; the raise a INT 3 exception INT 3 INT 3 INT 3 INT 3 ; } ; __except( ... ; ---------------------------------------------------- ; exception handler expression filter_except_handler1_OEP_Jump: PUSH EBP MOV EBP,ESP ... MOV EAX, EXCEPTION_CONTINUE_SEARCH ; EXCEPTION_CONTINUE_SEARCH = 0 LEAVE RETN ; , EXCEPTION_CONTINUE_SEARCH ) { }
The exception value, __except(..., Value), determines how the exception is handled. It can have three values: 1, 0, -1. To understand them, refer to the try-except statement description in the MSDN library. You set it to EXCEPTION_CONTINUE_SEARCH (0), not to run the exception handler function; therefore, by this value, the exception is not recognized. It is simply ignored, and the thread continues its code execution.
How the SEH installation is implemented
As you perceived from the illustrated code, the SEH installation is done by the FS segment register. Microsoft Windows 32 bit uses the FS segment register as a pointer to the data block of the main thread. The first 0x1C bytes comprise the information of the Thread Information Block (TIB). Therefore, FS:[00h] refers to ExceptionList of the main thread, Table 3. In your code, you have pushed the pointer to _except_handler1_OEP_Jump in the stack and changed the value of ExceptionList, FS:[00h], to the beginning of the stack, ESP.
Thread Information Block (TIB)
typedef struct _NT_TIB32 { DWORD ExceptionList; DWORD StackBase; DWORD StackLimit; DWORD SubSystemTib; union { DWORD FiberData; DWORD Version; }; DWORD ArbitraryUserPointer; DWORD Self;} NT_TIB32, *PNT_TIB32;
Table 3: FS segment register and Thread Information Block
| DWORD PTR FS:[00h] |
ExceptionList |
| DWORD PTR FS:[04h] |
StackBase |
| DWORD PTR FS:[08h] |
StackLimit |
| DWORD PTR FS:[0Ch] |
SubSystemTib |
| DWORD PTR FS:[10h] |
FiberData / Version |
| DWORD PTR FS:[14h] |
ArbitraryUserPointer |
| DWORD PTR FS:[18h] |
Self |
5.3.2 Attain OEP by adjusting the Thread Context
In this part, you effectuate your performance by accomplishing the OEP approach. You change the Context of the thread and ignore every simple exception handling, and let the thread continue the execution, but in the original OEP!
When an exception happens, the context of the processor during the time of the exception is saved in the stack. Through
MOV EAX, ContextRecordMOV EDI, dwOEP ; EAX <- dwOEPMOV DWORD PTR DS:[EAX+0B8h], EDI ; pContext.Eip <- EAX
Win32 Thread Context structure
#define MAXIMUM_SUPPORTED_EXTENSION 512typedef struct _CONTEXT { DWORD ContextFlags; DWORD Dr0; DWORD Dr1; DWORD Dr2; DWORD Dr3; DWORD Dr6; DWORD Dr7; FLOATING_SAVE_AREA FloatSave; DWORD SegGs; DWORD SegFs; DWORD SegEs; DWORD SegDs; DWORD Edi; DWORD Esi; DWORD Ebx; DWORD Edx; DWORD Ecx; DWORD Eax; DWORD Ebp; DWORD Eip; DWORD SegCs; DWORD EFlags; DWORD Esp; DWORD SegSs; BYTE ExtendedRegisters[MAXIMUM_SUPPORTED_EXTENSION]; } CONTEXT,*LPCONTEXT;
Table 4: CONTEXT
| Context Flags |
0×00000000 |
ContextFlags |
|
Context Debug Registers
|
0×00000004 |
Dr0 |
| 0×00000008 |
Dr1 |
| 0x0000000C |
Dr2 |
| 0×00000010 |
Dr3 |
| 0×00000014 |
Dr6 |
| 0×00000018 |
Dr7 |
|
Context Floating Point
|
0x0000001C |
FloatSave |
StatusWord |
| 0×00000020 |
StatusWord |
| 0×00000024 |
TagWord |
| 0×00000028 |
ErrorOffset |
| 0x0000002C |
ErrorSelector |
| 0×00000030 |
DataOffset |
| 0×00000034 |
DataSelector |
0×00000038 … 0×00000087 |
RegisterArea [0x50] |
| 0×00000088 |
Cr0NpxState |
| Context Segments |
0x0000008C |
SegGs |
| 0×00000090 |
SegFs |
| 0×00000094 |
SegEs |
| 0×00000098 |
SegDs |
| Context Integer |
0x0000009C |
Edi |
| 0x000000A0 |
Esi |
| 0x000000A4 |
Ebx |
| 0x000000A8 |
Edx |
| 0x000000AC |
Ecx |
| 0x000000B0 |
Eax |
| Context Control |
0x000000B4 |
Ebp |
| 0x000000B8 |
Eip |
| 0x000000BC |
SegCs |
| 0x000000C0 |
EFlags |
| 0x000000C4 |
Esp |
| 0x000000C8 |
SegSs |
| Context Extended Registers |
0x000000CC … 0x000002CB
|
ExtendedRegisters[0x200] |
By the following code, you have accomplished the main purpose of coming to OEP by the structured exception handler:
__stdcall void DynLoader(){_asm{ DWORD_TYPE(DYN_LOADER_START_MAGIC)Main_0: PUSHAD CALL Main_1Main_1: POP EBP SUB EBP,OFFSET Main_1 MOV EAX,DWORD PTR [EBP+_RO_dwImageBase] ADD EAX,DWORD PTR [EBP+_RO_dwOrgEntryPoint] MOV DWORD PTR [ESP+10h],EAX LEA EAX,[EBP+_except_handler1_OEP_Jump] MOV DWORD PTR [ESP+1Ch],EAX POPAD PUSH EAX XOR EAX, EAX PUSH DWORD PTR FS:[0] MOV DWORD PTR FS:[0],ESP DWORD_TYPE(0xCCCCCCCC) _except_handler1_OEP_Jump: PUSH EBP MOV EBP,ESP MOV EAX,DWORD PTR SS:[EBP+010h] PUSH EDI MOV EDI,DWORD PTR DS:[EAX+0C4h] PUSH DWORD PTR DS:[EDI] POP DWORD PTR FS:[0] ADD DWORD PTR DS:[EAX+0C4h],8 MOV EDI,DWORD PTR DS:[EAX+0A4h] MOV DWORD PTR DS:[EAX+0B8h],EDI POP EDI MOV EAX, EXCEPTION_CONTINUE_SEARCH LEAVE RETN DWORD_TYPE(DYN_LOADER_START_DATA1)_RO_dwImageBase: DWORD_TYPE(0xCCCCCCCC)_RO_dwOrgEntryPoint: DWORD_TYPE(0xCCCCCCCC) DWORD_TYPE(DYN_LOADER_END_MAGIC)}}
6 Build an Import Table and Reconstruct the Original Import Table
There are two ways to use the Windows dynamic link library (DLL) in Windows application programming:
- Using Windows libraries by additional dependencies:

(
Full Size Image)
- Using Windows dynamic link libraries in run-time:
typedef HGLOBAL (*importFunction_GlobalAlloc)(UINT, SIZE_T);...importFunction_GlobalAlloc __GlobalAlloc;HINSTANCE hinstLib = LoadLibrary("Kernel32.dll");if (hinstLib == NULL){ }__GlobalAlloc = (importFunction_GlobalAlloc)GetProcAddress(hinstLib, "GlobalAlloc");if (addNumbers == NULL){ }FreeLibrary(hinstLib);
When you make a Windows application project, the linker includes at least kernel32.dll in the base dependencies of your project. Without LoadLibrary() and GetProcAddress() of Kernel32.dll, you cannot load a DLL at run time. The dependencies information is stored in the import table section. By using Dependency Walker, it is not so difficult to observe the DLL module and the functions that are imported into a PE file.

You attempt to establish your custom import table to conduct your project. Furthermore, you have to fix up the original import table at the end to run the real code of the program.
PE Maker: Step 3
Download the pemaker3.zip source files from the end of the article.
6.1 Construct the Client Import Table
I strongly advise that you to read Section 6.4 of the Microsoft Portable Executable and the Common Object File Format Specification document. This section contains the principal information to comprehend the import table performance. The import table data is accessible by a second data directory of the optional header from PE headers, so you can access it by using the following code:
DWORD dwVirtualAddress = image_nt_headers-> OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT]. VirtualAddress;DWORD dwSize = image_nt_headers-> OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT]. Size;
The VirtualAddress refers to structures by IMAGE_IMPORT_DESCRIPTOR. This structure contains the pointer to the imported DLL name and the relative virtual address of the first thunk.
typedef struct _IMAGE_IMPORT_DESCRIPTOR { union { DWORD Characteristics; DWORD OriginalFirstThunk; }; DWORD TimeDateStamp; DWORD ForwarderChain; DWORD Name; DWORD FirstThunk; } IMAGE_IMPORT_DESCRIPTOR, *PIMAGE_IMPORT_DESCRIPTOR;
When a program is running, the Windows Task Manager sets the thunks by the virtual address of the function. The virtual address is found by the name of the function. At first, the thunks hold the relative virtual address of the function name, as shown in Table 5; during execution, they are fixed up by the virtual address of the functions (see Table 6).
Table 5: The Import Table in a file image
IMAGE_IMPORT_ DESCRIPTOR[0] |
OriginalFirstThunk |
|
|
| TimeDateStamp |
| ForwarderChain |
| Name_RVA |
——> |
"kernel32.dll",0 |
| FirstThunk_RVA |
——> |
proc_1_name_RVA |
——> |
0,0,"LoadLibraryA",0 |
| |
proc_2_name_RVA |
——> |
0,0,"GetProcAddress",0 |
| proc_3_name_RVA |
——> |
0,0,"GetModuleHandleA",0 |
| … |
|
|
IMAGE_IMPORT_ DESCRIPTOR[1] |
|
| ... |
|
IMAGE_IMPORT_ DESCRIPTOR[n] |
|
Table 6: The Import Table in virtual memory
| IMAGE_IMPORT_DESCRIPTOR[0] |
OriginalFirstThunk |
|
| TimeDateStamp |
| ForwarderChain |
| Name_RVA |
------> |
"kernel32.dll",0 |
| FirstThunk_RVA |
------> |
proc_1_VA |
| |
proc_2_VA |
| proc_3_VA |
| ... |
| IMAGE_IMPORT_DESCRIPTOR[1] |
|
| ... |
|
| IMAGE_IMPORT_DESCRIPTOR[n] |
|
You want to make a simple import table to import LoadLibrary(), and GetProcAddress() from Kernel32.dll. You need these two essential API functions to cover other API functions in run-time. The following assembly code shows how easily you can reach your solution:
0101F000: 00000000 ; OriginalFirstThunk0101F004: 00000000 ; TimeDateStamp0101F008: 00000000 ; ForwarderChain0101F00C: 0001F034 ; Name; ImageBase + 0001F034 -> 0101F034 -> "Kernel32.dll",00101F010: 0001F028 ; FirstThunk; ImageBase + 0001F028 -> 0101F0280101F014: 000000000101F018: 000000000101F01C: 000000000101F020: 000000000101F024: 000000000101F028: 0001F041 ; ImageBase + 0001F041 -> 0101F041 -> 0,0,"LoadLibraryA",00101F02C: 0001F050 ; ImageBase + 0001F050 -> 0101F050 -> 0,0,"GetProcAddress",00101F030: 000000000101F034: 0001F041: 00 00 0001F050: 00 00 00 0000
After running…
0101F000: 00000000 ; OriginalFirstThunk0101F004: 00000000 ; TimeDateStamp0101F008: 00000000 ; ForwarderChain0101F00C: 0001F034 ; Name; ImageBase + 0001F034 -> 0101F034 -> "Kernel32.dll",00101F010: 0001F028 ; FirstThunk; ImageBase + 0001F028 -> 0101F0280101F014: 000000000101F018: 000000000101F01C: 000000000101F020: 000000000101F024: 000000000101F028: 7C801D77 ; -> Kernel32.LoadLibrary()0101F02C: 7C80AC28 ; -> Kernel32.GetProcAddress()0101F030: 000000000101F034: 0001F041: 00 00 0001F050: 00 00 00 0000
I have prepared a class library to make every import table by using a client string table. The CITMaker class library in itmaker.h; it will build an import table by sz_IT_EXE_strings and also the relative virtual address of the import table.
static const char *sz_IT_EXE_strings[]={ "Kernel32.dll", "LoadLibraryA", "GetProcAddress", 0,, 0,};
You subsequently employ this class library to establish an import table to support DLLs and OCXs, so this is a general library to present all possible import tables easily. The next step is clarified in the following code.
CITMaker *ImportTableMaker = new CITMaker( IMPORT_TABLE_EXE );...pimage_section_header=AddNewSection( ".xxx", dwNewSectionSize );ImportTableMaker->Build( pimage_section_header->VirtualAddress );memcpy( pNewSection, ImportTableMaker->pMem,ImportTableMaker->dwSize );...memcpy( image_section[image_nt_headers->FileHeader.NumberOfSections-1], pNewSection, dwNewSectionSize );...image_nt_headers->OptionalHeader. DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress = pimage_section_header->VirtualAddress;image_nt_headers->OptionalHeader. DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].Size = ImportTableMaker->dwSize;...delete ImportTableMaker;
The import table is copied at the beginning of the new section, and the relevant data directory is adjusted to the relative virtual address of the new section and the size of the new import table.
6.2 Using other API functions at run time
At this time, you can load other DLLs and find the process address of other functions by using LoadLibrary() and GetProcAddress():
lea edi, @"Kernel32.dll"push edimov eax,offset _p_LoadLibrarycall [ebp+eax] mov esi,eax lea edi, @"GetModuleHandleA"push edipush esimov eax,offset _p_GetProcAddresscall [ebp+eax]
LoadLibrary() and GetProcAddress() aid you in your effort to reach your intention.
I want to have a complete imported function table similar in performance done in a real EXE file. If you look inside a PE file, you will discover that an API call is done by an indirection jump through the virtual address of the API function:
JMP DWORD PTR [XXXXXXXX]
...0101F028: 7C801D77 ; Virtual Address of kernel32.LoadLibrary()...0101F120: JMP DWORD PTR [0101F028]...0101F230: CALL 0101F120 ; JMP to kernel32.LoadLibrary...
It makes it easy to expand the other part of your project by this performance, so you construct two data tables: the first for API virtual addresses, and the second for the JMP [XXXXXXXX].
#define __jmp_api byte_type(0xFF) byte_type(0x25)__asm{..._p_GetModuleHandle: dword_type(0xCCCCCCCC)_p_VirtualProtect: dword_type(0xCCCCCCCC)_p_GetModuleFileName: dword_type(0xCCCCCCCC)_p_CreateFile: dword_type(0xCCCCCCCC)_p_GlobalAlloc: dword_type(0xCCCCCCCC)_jmp_GetModuleHandle: __jmp_api dword_type(0xCCCCCCCC)_jmp_VirtualProtect: __jmp_api dword_type(0xCCCCCCCC)_jmp_GetModuleFileName: __jmp_api dword_type(0xCCCCCCCC)_jmp_CreateFile: __jmp_api dword_type(0xCCCCCCCC)_jmp_GlobalAlloc: __jmp_api dword_type(0xCCCCCCCC)...}
In the succeeding code, you have concluded your ambition to install a custom internal import table! (You cannot call it import table.)
... lea edi,[ebp+_p_szKernel32] lea ebx,[ebp+_p_GetModuleHandle] lea ecx,[ebp+_jmp_GetModuleHandle] add ecx,02h_api_get_lib_address_loop: push ecx push edi mov eax,offset _p_LoadLibrary call [ebp+eax] pop ecx mov esi,eax push edi call __strlen add esp,04h add edi,eax_api_get_proc_address_loop: push ecx push edi push esi mov eax,offset _p_GetProcAddress call [ebp+eax] pop ecx mov [ebx],eax mov [ecx],ebx add ebx,04h add ecx,06h push edi call __strlen add esp,04h add edi,eax mov al,byte ptr [edi] test al,al jnz _api_get_proc_address_loop inc edi mov al,byte ptr [edi] test al,al jnz _api_get_lib_address_loop ...
6.3 Fix up the Original Import Table
To run the program again, you should fix up the thunks of the actual import table; otherwise, you have a corrupted target PE file. Your code must correct all of the thunks the same as Table 5 to Table 6. Once more,
... mov ebx,[ebp+_p_dwImportVirtualAddress] test ebx,ebx jz _it_fixup_end mov esi,[ebp+_p_dwImageBase] add ebx,esi _it_fixup_get_lib_address_loop: mov eax,[ebx+00Ch] test eax,eax jz _it_fixup_end mov ecx,[ebx+010h] add ecx,esi mov [ebp+_p_dwThunk],ecx mov ecx,[ebx] test ecx,ecx jnz _it_fixup_table mov ecx,[ebx+010h]_it_fixup_table: add ecx,esi mov [ebp+_p_dwHintName],ecx add eax,esi push eax mov eax,offset _p_LoadLibrary call [ebp+eax] test eax,eax jz _it_fixup_end mov edi,eax_it_fixup_get_proc_address_loop: mov ecx,[ebp+_p_dwHintName] mov edx,[ecx] test edx,edx jz _it_fixup_next_module test edx,080000000h jz _it_fixup_by_name and edx,07FFFFFFFh jmp _it_fixup_get_addr_it_fixup_by_name: add edx,esi inc edx inc edx _it_fixup_get_addr: push edx push edi mov eax,offset _p_GetProcAddress call [ebp+eax] mov ecx,[ebp+_p_dwThunk] mov [ecx],eax add dword ptr [ebp+_p_dwThunk], 004h add dword ptr [ebp+_p_dwHintName],004h jmp _it_fixup_get_proc_address_loop_it_fixup_next_module: add ebx,014h jmp _it_fixup_get_lib_address_loop_it_fixup_end: ...
7 Support DLL and OCX
Now, you intend to include the dynamic link library (DLL) and OLE-ActiveX Control in your PE builder project. Supporting them is very easy if you pay attention to the two-time arrival into the Offset of Entry Point, the relocation table implementation, and the client import table.
PE Maker: Step 4
LoadLibrary(), or an OCX is registered by using LoadLibrary() and GetProcAddress() through calling DllRegisterServer(), the first of the OEP arrival is done.
hinstDLL = LoadLibrary( "test1.dll" );hinstOCX = LoadLibrary( "test1.ocx" );_DllRegisterServer = GetProcAddress( hinstOCX, "DllRegisterServer" );_DllRegisterServer();
Download the pemaker4.zip source files from the end of the article.
7.1 Twice OEP approach
The Offset of Entry Point of a DLL file or an OCX file is touched by the main program atleast twice:
To perform this, I have employed a trick that causes in the second time again, the instruction pointer (EIP) traveling towards the original OEP by the structured exception handler.
_main_0: pushad call _main_1_main_1: pop ebp sub ebp,offset _main_1 _support_dll_0: jmp _support_dll_1 jmp _support_dll_2_support_dll_1: ... mov edi,[ebp+_p_dwImageBase] add edi,[edi+03Ch] mov ax,word ptr [edi+016h] test ax,IMAGE_FILE_DLL jz _support_dll_2 mov ax, 9090h mov word ptr [ebp+_support_dll_0],ax_support_dll_2: ... into OEP by SEH ...
I hope you caught the trick in the preceding code, but this is not all of it. You have a problem in ImageBase, when the library has been loaded in different image bases by the main program. You should write some code to find the real image base and store it to use forward.
mov eax,[esp+24h] mov ebx,[esp+30h] cmp eax,ebx ja _no_dll_pe_file_0 cmp word ptr [eax],IMAGE_DOS_SIGNATURE jne _no_dll_pe_file_0 mov [ebp+_p_dwImageBase],eax_no_dll_pe_file_0:
This code finds the real image base by investigating the stack information. By using the real image base and the formal image base, you should correct all memory calls inside the image program!! Don't be afraid; it will be done simply by the relocating the table information.
7.2 Implement relocation table
To understand the relocation table better, you can take a look at Section 6.6 of the Microsoft Portable Executable and Common Object File Format Specification document. The relocation table contains many packages to relocate the information related to the virtual address inside the virtual memory image. Each package is comprised of an 8-byte header to exhibit the base virtual address and the number of data, demonstrated by the IMAGE_BASE_RELOCATION data structure.
typedef struct _IMAGE_BASE_RELOCATION { DWORD VirtualAddress; DWORD SizeOfBlock;} IMAGE_BASE_RELOCATION, *PIMAGE_BASE_RELOCATION;
Table 7 - The Relocation Table
| Block[1] |
VirtualAddress |
| SizeOfBlock |
| type:4 |
offset:12 |
type:4 |
offset:12 |
| type:4 |
offset:12 |
type:4 |
offset:12 |
| type:4 |
offset:12 |
type:4 |
offset:12 |
| ... |
... |
... |
... |
| type:4 |
offset:12 |
00 |
00 |
| Block[2] |
VirtualAddress |
| SizeOfBlock |
| type:4 |
offset:12 |
type:4 |
offset:12 |
| type:4 |
offset:12 |
type:4 |
offset:12 |
| type:4 |
offset:12 |
type:4 |
offset:12 |
| ... |
... |
... |
... |
| type:4 |
offset:12 |
00 |
00 |
| ... |
...
|
| Block[n] |
VirtualAddress |
| SizeOfBlock |
| type:4 |
offset:12 |
type:4 |
offset:12 |
| type:4 |
offset:12 |
type:4 |
offset:12 |
| type:4 |
offset:12 |
type:4 |
offset:12 |
| ... |
... |
... |
... |
| type:4 |
offset:12 |
00 |
00 |
Table 7 illustrates the main idea of the relocation table. Furthermore, you can upload a DLL or an OCX file in OllyDbg to observe the relocation table, the ".reloc" section through Memory map window. By the way, you find the position of the relocation table by using the following code in your project:
DWORD dwVirtualAddress = image_nt_headers-> OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC]. VirtualAddress;DWORD dwSize = image_nt_headers-> OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].Size;
By OllyDbg, you have the same as the following for the ".reloc" section, by using the Long Hex viewer mode. In this example, the base virtual address is 0x1000 and the size of the block is 0x184.
008E1000 : 00001000 00000184 30163000 30403028008E1010 : 30683054 308C3080 30AC309C 30D830CC008E1020 : 30E030DC 30E830E4 30F030EC 310030F4008E1030 : 3120310D 315F3150 31A431A0 31C031A8008E1040 : 31D031CC 31F431EC 31FC31F8 32043200008E1050 : 320C3208 32143210 324C322C 32583254008E1060 : 3260325C 32683264 3270326C 32B03274
It relocates the data in the subsequent virtual addresses:
0x1000 + 0x0000 = 0x10000x1000 + 0x0016 = 0x10160x1000 + 0x0028 = 0x10280x1000 + 0x0040 = 0x10400x1000 + 0x0054 = 0x1054...
Each package performs the relocation by using consecutive 4 bytes form its internal information. The first byte refers to the type of relocation and the next three bytes are the offset that must be used with the base virtual address and the image base to correct the image information.
What is the type?
The type can be one of the following values:
- IMAGE_REL_BASED_ABSOLUTE (0): No effect
- IMAGE_REL_BASED_HIGH (1): Relocate by the high 16 bytes of the base virtual address and the offset
- IMAGE_REL_BASED_LOW (2): Relocate by the low 16 bytes of the base virtual address and the offset
- IMAGE_REL_BASED_HIGHLOW (3): Relocate by the base virtual address and the offset
What is done in the relocation?
By relocation, some values inside the virtual memory are corrected according to the current image base by the ".reloc" section packages.
| delta_ImageBase = current_ImageBase - image_nt_headers->OptionalHeader.ImageBase |
mem[ current_ImageBase + 0x1000 ] = mem[ current_ImageBase + 0x1000 ] + delta_ImageBase ;mem[ current_ImageBase + 0x1016 ] = mem[ current_ImageBase + 0x1016 ] + delta_ImageBase ;mem[ current_ImageBase + 0x1028 ] = mem[ current_ImageBase + 0x1028 ] + delta_ImageBase ;mem[ current_ImageBase + 0x1040 ] = mem[ current_ImageBase + 0x1040 ] + delta_ImageBase ;mem[ current_ImageBase + 0x1054 ] = mem[ current_ImageBase + 0x1054 ] + delta_ImageBase ;...
I have employed the following code from Morphine packer to implement the relocation.
..._reloc_fixup: mov eax,[ebp+_p_dwImageBase] mov edx,eax mov ebx,eax add ebx,[ebx+3Ch] mov ebx,[ebx+034h] sub edx,ebx je _reloc_fixup_end mov ebx,[ebp+_p_dwRelocationVirtualAddress] test ebx,ebx jz _reloc_fixup_end add ebx,eax_reloc_fixup_block: mov eax,[ebx+004h] test eax,eax jz _reloc_fixup_end lea ecx,[eax-008h] shr ecx,001h lea edi,[ebx+008h]_reloc_fixup_do_entry: movzx eax,word ptr [edi] push edx mov edx,eax shr eax,00Ch mov esi,[ebp+_p_dwImageBase] and dx,00FFFh add esi,[ebx] add esi,edx pop edx_reloc_fixup_HIGH: dec eax jnz _reloc_fixup_LOW mov eax,edx shr eax,010h jmp _reloc_fixup_LOW_fixup_reloc_fixup_LOW: dec eax jnz _reloc_fixup_HIGHLOW movzx eax,dx _reloc_fixup_LOW_fixup: add word ptr [esi],ax jmp _reloc_fixup_next_entry_reloc_fixup_HIGHLOW: dec eax jnz _reloc_fixup_next_entry add [esi],edx _reloc_fixup_next_entry: inc edi inc edi loop _reloc_fixup_do_entry_reloc_fixup_next_base: add ebx,[ebx+004h] jmp _reloc_fixup_block_reloc_fixup_end: ...
7.3 Build a special import table
To support the OLE-ActiveX Control registration, you should present an appropriate import table to your target OCX and DLL file. Therefore, I have established an import table by the following string:
const char *sz_IT_OCX_strings[]={ "Kernel32.dll", "LoadLibraryA", "GetProcAddress", "GetModuleHandleA", 0, "User32.dll", "GetKeyboardType", "WindowFromPoint", 0, "AdvApi32.dll", "RegQueryValueExA", "RegSetValueExA", "StartServiceA", 0, "Oleaut32.dll", "SysFreeString", "CreateErrorInfo", "SafeArrayPtrOfIndex", 0, "Gdi32.dll", "UnrealizeObject", 0, "Ole32.dll", "CreateStreamOnHGlobal", "IsEqualGUID", 0, "ComCtl32.dll", "ImageList_SetIconSize", 0, 0,};
Without these API functions, the library can not be loaded, and moreover the DllregisterServer() and DllUregisterServer() will not operate. In CPECryptor::CryptFile, I have distinguished between EXE files and DLL files in the initialization of the new import table object during creation:
if(( image_nt_headers->FileHeader.Characteristics & IMAGE_FILE_DLL ) == IMAGE_FILE_DLL ){ ImportTableMaker = new CITMaker( IMPORT_TABLE_OCX );}else{ ImportTableMaker = new CITMaker( IMPORT_TABLE_EXE );}
8 Preserve the Thread Local Storage
By using Thread Local Storage (TLS), a program is able to execute a multithreaded process, This performance mostly is used by Borland linkers: Delphi and C++ Builder. When you pack a PE file, you should take care to keep the TLS clean; otherwise, your packer will not support Borland Delphi and C++ Builder linked EXE files. To comprehend TLS, I refer you to Section 6.7 of the Microsoft Portable Executable and Common Object File Format Specification document. You can observe the TLS structure by IMAGE_TLS_DIRECTORY32 in winnt.h.
typedef struct _IMAGE_TLS_DIRECTORY32 { DWORD StartAddressOfRawData; DWORD EndAddressOfRawData; DWORD AddressOfIndex; DWORD AddressOfCallBacks; DWORD SizeOfZeroFill; DWORD Characteristics;} IMAGE_TLS_DIRECTORY32, * PIMAGE_TLS_DIRECTORY32;
MessageBox() from user32.dll.
To keep the TLS directory safe, I have copied it in a special place inside the loader:
..._tls_dwStartAddressOfRawData: dword_type(0xCCCCCCCC)_tls_dwEndAddressOfRawData: dword_type(0xCCCCCCCC)_tls_dwAddressOfIndex: dword_type(0xCCCCCCCC)_tls_dwAddressOfCallBacks: dword_type(0xCCCCCCCC)_tls_dwSizeOfZeroFill: dword_type(0xCCCCCCCC)_tls_dwCharacteristics: dword_type(0xCCCCCCCC)...
It is necessary to correct the TLS directory entry in the Optional Header:
if(image_nt_headers-> OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS]. VirtualAddress!=0){ memcpy(&pDataTable->image_tls_directory, image_tls_directory, sizeof(IMAGE_TLS_DIRECTORY32)); dwOffset=DWORD(pData1)-DWORD(pNewSection); dwOffset+=sizeof(t_DATA_1)-sizeof(IMAGE_TLS_DIRECTORY32); image_nt_headers-> OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS]. VirtualAddress=dwVirtualAddress + dwOffset;}
9 Inject Your Code
You are ready to place your code inside the new section. Your code is a "Hello World!" message by
...push MB_OK | MB_ICONINFORMATIONlea eax,[ebp+_p_szCaption]push eaxlea eax,[ebp+_p_szText]push eaxpush NULLcall _jmp_MessageBox...
PE Maker: Step 5
Download the pemaker5.zip source files from the end of the article.

10 Conclusion
By reading this article, you have perceived how easily you can inject code to a portable executable file. You can complete the code by using the source of other packers, create a packer in the same way as Yoda's Protector, and make your packer undetectable by mixing up with Morphine source code. I hope that you have enjoyed this brief discussion of one part of the reverse engineering field. See you again in the next discussion!
EXCEPTION_POINTERS, you have access to the pointer of ContextRecord. The ContextRecord has the CONTEXT data structure, as seen in Table 4. This is the thread context during the exception time. When you ignore the exception by EXCEPTION_CONTINUE_SEARCH (0), the instruction pointer, as well as the context, will be set to ContextRecord to return to the previous condition. Therefore, if you change the Eip of the Win32 Thread Context to the Original Offset of Entry Point, it will come clearly into OEP.Full Size Image)
Posted: November 22, 2006 at 3:37 pm | Tags: blog, cache, flash, linux, ror, windows, 开发, 测试, 类
因工作需要, 在BLOG中记录一篇BOOTLOADER文章. 做为对BOOTLOADER概念的DEEPLY了解.
转至http://www.dz863.com
一、引言
在专用的嵌入式板子运行 GNU/Linux 系统已经变得越来越流行。一个嵌入式 Linux 系统从软件的角度看通常可以分为四个层次:
1. 引导加载程序。包括固化在固件(firmware)中的 boot 代码(可选),和 Boot Loader 两大部分。
2. Linux 内核。特定于嵌入式板子的定制内核以及内核的启动参数。
3. 文件系统。包括根文件系统和建立于 Flash 内存设备之上文件系统。通常用ram disk 来作为 root fs。
4. 用户应用程序。特定于用户的应用程序。有时在用户应用程序和内核层之间可能还会包括一个嵌入式图形用户界面。常用的嵌入式 GUI 有:MicroWindows 和 MiniGUI 等。
引导加载程序是系统加电后运行的第一段软件代码。回忆一下 PC 的体系结构我们可以知道,PC 机中的引导加载程序由 BIOS(其本质就是一段固件程序)和位于硬盘MBR中的OS Boot Loader(比如,LILO 和 GRUB 等)一起组成。BIOS 在完成硬件检测和资源分配后,将硬盘MBR中的 Boot Loader 读到系统的RAM 中,然后将控制权交给 OS Boot Loader。Boot Loader 的主要运行任务就是将内核映象从硬盘上读到RAM 中,然后跳转到内核的入口点去运行,也即开始启动操作系统。
而在嵌入式系统中,通常并没有像BIOS 那样的固件程序(注,有的嵌入式 CPU 也会内嵌一段短小的启动程序),因此整个系统的加载启动任务就完全由 Boot Loader 来完成。比如在一个基于 ARM7TDMI core 的嵌入式系统中,系统在上电或复位时通常都从地址 0×00000000 处开始执行,而在这个地址处安排的通常就是系统的Boot Loader程序。
本文将从 Boot Loader 的概念、Boot Loader 的主要任务、Boot Loader 的框架结构以及Boot Loader 的安装等四个方面来讨论嵌入式系统的 Boot Loader。
二、 Boot Loader 的概念
简单地说,Boot Loader 就是在操作系统内核运行之前运行的一段小程序。通过这段小程序,我们可以初始化硬件设备、建立内存空间的映射图,从而将系统的软硬件环境带到一个合适的状态,以便为最终调用操作系统内核准备好正确的环境。
通常,Boot Loader 是严重地依赖于硬件而实现的,特别是在嵌入式世界。因此,在嵌入式世界里建立一个通用的 Boot Loader 几乎是不可能的。尽管如此,我们仍然可以对 Boot Loader 归纳出一些通用的概念来,以指导用户特定的 Boot Loader 设计与实现。
1. Boot Loader 所支持的 CPU 和嵌入式板
每种不同的 CPU 体系结构都有不同的Boot Loader。有些 Boot Loader 也支持多种体系结构的 CPU,比如 U-Boot 就同时支持 ARM 体系结构和MIPS 体系结构。除了依赖于 CPU的体系结构外,Boot Loader 实际上也依赖于具体的嵌入式板级设备的配置。这也就是说,对于两块不同的嵌入式板而言,即使它们是基于同一种 CPU 而构建的,要想让运行在一块板子上的 Boot Loader 程序也能运行在另一块板子上,通常也都需要修改 Boot Loader 的源程序。
2. Boot Loader 的安装媒介(Installation Medium)
系统加电或复位后,所有的CPU 通常都从某个由 CPU 制造商预先安排的地址上取指令。比如,基于 ARM7TDMI core 的 CPU 在复位时通常都从地址 0×00000000 取它的第一条指令。而基于CPU 构建的嵌入式系统通常都有某种类型的固态存储设备(比如:ROM、EEPROM 或 FLASH 等)被映射到这个预先安排的地址上。因此在系统加电后,CPU 将首先执行Boot Loader 程序。
下图1就是一个同时装有 Boot Loader、内核的启动参数、内核映像和根文件系统映像的固态存储设备的典型空间分配结构图。

图1 固态存储设备的典型空间分配结构
3. 用来控制 Boot Loader 的设备或机制
主机和目标机之间一般通过串口建立连接,Boot Loader 软件在执行时通常会通过串口来进行 I/O,比如:输出打印信息到串口,从串口读取用户控制字符等。
4. Boot Loader 的启动过程是单阶段(Single Stage)还是多阶段(Multi-Stage)
通常多阶段的 Boot Loader 能提供更为复杂的功能,以及更好的可移植性。从固态存储设备上启动的 Boot Loader 大多都是 2 阶段的启动过程,也即启动过程可以分为 stage 1和 stage 2 两部分。而至于在 stage 1 和 stage 2 具体完成哪些任务将在下面几篇讨论。
5. Boot Loader 的操作模式 (Operation Mode)
大多数 Boot Loader 都包含两种不同的操作模式:"启动加载"模式和"下载"模式,这种区别仅对于开发人员才有意义。但从最终用户的角度看,Boot Loader 的作用就是用来加载操作系统,而并不存在所谓的启动加载模式与下载工作模式的区别。
启动加载(Boot loading)模式:这种模式也称为"自主"(Autonomous)模式。也即 Boot Loader 从目标机上的某个固态存储设备上将操作系统加载到 RAM 中运行,整个过程并没有用户的介入。这种模式是 Boot Loader 的正常工作模式,因此在嵌入式产品发布的时侯,Boot Loader 显然必须工作在这种模式下。
下载(Downloading)模式:在这种模式下,目标机上的 Boot Loader 将通过串口连接或网络连接等通信手段从主机(Host)下载文件,比如:下载内核映像和根文件系统映像等。从主机下载的文件通常首先被 Boot Loader 保存到目标机的 RAM 中,然后再被 Boot Loader 写到目标机上的FLASH 类固态存储设备中。Boot Loader 的这种模式通常在第一次安装内核与根文件系统时被使用;此外,以后的系统更新也会使用 Boot Loader 的这种工作模式。工作于这种模式下的 Boot Loader 通常都会向它的终端用户提供一个简单的命令行接口。
像 Blob 或 U-Boot 等这样功能强大的 Boot Loader 通常同时支持这两种工作模式,而且允许用户在这两种工作模式之间进行切换。比如,Blob 在启动时处于正常的启动加载模式,但是它会延时 10 秒等待终端用户按下任意键而将 blob 切换到下载模式。如果在 10 秒内没有用户按键,则 blob 继续启动 Linux 内核。
6. BootLoader 与主机之间进行文件传输所用的通信设备及协议
最常见的情况就是,目标机上的 Boot Loader 通过串口与主机之间进行文件传输,传输协议通常是 xmodem/ymodem/zmodem 协议中的一种。但是,串口传输的速度是有限的,因此通过以太网连接并借助 TFTP 协议来下载文件是个更好的选择。
此外,在论及这个话题时,主机方所用的软件也要考虑。比如,在通过以太网连接和 TFTP 协议来下载文件时,主机方必须有一个软件用来的提供 TFTP 服务。在讨论了 BootLoader 的上述概念后,下面我们来具体看看 BootLoader 的应该完成哪些任务。
三、Boot Loader 的主要任务与典型结构框架
在继续本节的讨论之前,首先我们做一个假定,那就是:假定内核映像与根文件系统映像都被加载到 RAM 中运行。之所以提出这样一个假设前提是因为,在嵌入式系统中内核映像与根文件系统映像也可以直接在 ROM 或 Flash 这样的固态存储设备中直接运行。但这种做法无疑是以运行速度的牺牲为代价的。从操作系统的角度看,Boot Loader 的总目标就是正确地调用内核来执行。
另外,由于 Boot Loader 的实现依赖于 CPU 的体系结构,因此大多数 Boot Loader 都分为 stage1 和 stage2 两大部分。依赖于 CPU 体系结构的代码,比如设备初始化代码等,通常都放在 stage1 中,而且通常都用汇编语言来实现,以达到短小精悍的目的。而 stage2 则通常用C语言来实现,这样可以实现给复杂的功能,而且代码会具有更好的可读性和可移植性。
Boot Loader 的 stage1 通常包括以下步骤(以执行的先后顺序):
·硬件设备初始化。
·为加载 Boot Loader 的 stage2 准备 RAM 空间。
·拷贝 Boot Loader 的 stage2 到 RAM 空间中。
·设置好堆栈。
·跳转到 stage2 的 C 入口点。
Boot Loader 的 stage2 通常包括以下步骤(以执行的先后顺序):
·初始化本阶段要使用到的硬件设备。
·检测系统内存映射(memory map)。
·将 kernel 映像和根文件系统映像从 flash 上读到 RAM 空间中。
·为内核设置启动参数。
·调用内核。
3.1 Boot Loader 的 stage1
3.1.1 基本的硬件初始化
这是 Boot Loader 一开始就执行的操作,其目的是为 stage2 的执行以及随后的 kernel 的执行准备好一些基本的硬件环境。它通常包括以下步骤(以执行的先后顺序):
1.屏蔽所有的中断。为中断提供服务通常是 OS 设备驱动程序的责任,因此在 Boot Loader 的执行全过程中可以不必响应任何中断。中断屏蔽可以通过写 CPU 的中断屏蔽寄存器或状态寄存器(比如 ARM 的 CPSR 寄存器)来完成。
2.设置 CPU 的速度和时钟频率。
3.RAM 初始化。包括正确地设置系统的内存控制器的功能寄存器以及各内存库控制寄存器等。
4.初始化 LED。典型地,通过 GPIO 来驱动 LED,其目的是表明系统的状态是 OK 还是 Error。如果板子上没有 LED,那么也可以通过初始化 UART 向串口打印 Boot Loader 的 Logo 字符信息来完成这一点。
5. 关闭 CPU 内部指令/数据 cache。
3.1.2 为加载 stage2 准备 RAM 空间
为了获得更快的执行速度,通常把 stage2 加载到 RAM 空间中来执行,因此必须为加载 Boot Loader 的 stage2 准备好一段可用的 RAM 空间范围。
由于 stage2 通常是 C 语言执行代码,因此在考虑空间大小时,除了 stage2 可执行映象的大小外,还必须把堆栈空间也考虑进来。此外,空间大小最好是 memory page 大小(通常是 4KB)的倍数。一般而言,1M 的 RAM 空间已经足够了。具体的地址范围可以任意安排,比如 blob 就将它的 stage2 可执行映像安排到从系统 RAM 起始地址 0xc0200000 开始的 1M 空间内执行。但是,将 stage2 安排到整个 RAM 空间的最顶 1MB(也即(RamEnd-1MB) – RamEnd)是一种值得推荐的方法。
为了后面的叙述方便,这里把所安排的 RAM 空间范围的大小记为:stage2_size(字节),把起始地址和终止地址分别记为:stage2_start 和 stage2_end(这两个地址均以 4 字节边界对齐)。因此:
stage2_end=stage2_start+stage2_size
另外,还必须确保所安排的地址范围的的确确是可读写的 RAM 空间,因此,必须对你所安排的地址范围进行测试。具体的测试方法可以采用类似于 blob 的方法,也即:以 memory page 为被测试单位,测试每个 memory page 开始的两个字是否是可读写的。为了后面叙述的方便,我们记这个检测算法为:test_mempage,其具体步骤如下:
1.先保存 memory page 一开始两个字的内容。
2.向这两个字中写入任意的数字。比如:向第一个字写入 0×55,第 2 个字写入 0xaa。
3.然后,立即将这两个字的内容读回。显然,我们读到的内容应该分别是 0×55 和 0xaa。如果不是,则说明这个 memory page 所占据的地址范围不是一段有效的 RAM 空间。
4.再向这两个字中写入任意的数字。比如:向第一个字写入 0xaa,第 2 个字中写入 0×55。
5.然后,立即将这两个字的内容立即读回。显然,我们读到的内容应该分别是 0xaa 和 0×55。如果不是,则说明这个 memory page 所占据的地址范围不是一段有效的 RAM 空间。
6.恢复这两个字的原始内容。测试完毕。
为了得到一段干净的 RAM 空间范围,我们也可以将所安排的 RAM 空间范围进行清零操作。
3.1.3 拷贝 stage2 到 RAM 中
拷贝时要确定两点:(1) stage2 的可执行映象在固态存储设备的存放起始地址和终止地址;(2) RAM 空间的起始地址。
3.1.4 设置堆栈指针 sp
堆栈指针的设置是为了执行 C 语言代码作好准备。通常我们可以把 sp 的值设置为(stage2_end-4),也即在 3.1.2 节所安排的那个 1MB 的 RAM 空间的最顶端(堆栈向下生长)。此外,在设置堆栈指针 sp 之前,也可以关闭 led 灯,以提示用户我们准备跳转到 stage2。经过上述这些执行步骤后,系统的物理内存布局应该如下图2所示。
3.1.5 跳转到 stage2 的 C 入口点
在上述一切都就绪后,就可以跳转到 Boot Loader 的 stage2 去执行了。比如,在 ARM 系统中,这可以通过修改 PC 寄存器为合适的地址来实现。
图2 bootloader 的 stage2 可执行映象刚被拷贝到 RAM 空间时的系统内存布局
3.2 Boot Loader 的 stage2
正如前面所说,stage2 的代码通常用 C 语言来实现,以便于实现更复杂的功能和取得更好的代码可读性和可移植性。但是与普通 C 语言应用程序不同的是,在编译和链接 boot loader 这样的程序时,我们不能使用 glibc 库中的任何支持函数。其原因是显而易见的。这就给我们带来一个问题,那就是从那里跳转进 main() 函数呢?直接把 main() 函数的起始地址作为整个 stage2 执行映像的入口点或许是最直接的想法。但是这样做有两个缺点:1)无法通过main() 函数传递函数参数;2)无法处理 main() 函数返回的情况。一种更为巧妙的方法是利用 trampoline(弹簧床)的概念。也即,用汇编语言写一段trampoline 小程序,并将这段 trampoline 小程序来作为 stage2 可执行映象的执行入口点。然后我们可以在 trampoline 汇编小程序中用 CPU 跳转指令跳入 main() 函数中去执行;而当 main() 函数返回时,CPU 执行路径显然再次回到我们的 trampoline 程序。简而言之,这种方法的思想就是:用这段 trampoline 小程序来作为 main() 函数的外部包裹(external wrapper)。
下面给出一个简单的 trampoline 程序示例(来自blob):
.text
.globl _trampoline
_trampoline:
bl main
/* if main ever returns we just call it again */
b _trampoline
可以看出,当 main() 函数返回后,我们又用一条跳转指令重新执行 trampoline 程序――当然也就重新执行 main() 函数,这也就是 trampoline(弹簧床)一词的意思所在。
3.2.1初始化本阶段要使用到的硬件设备
这通常包括:(1)初始化至少一个串口,以便和终端用户进行 I/O 输出信息;(2)初始化计时器等。在初始化这些设备之前,也可以重新把 LED 灯点亮,以表明我们已经进入 main() 函数执行。
设备初始化完成后,可以输出一些打印信息,程序名字字符串、版本号等。
3.2.2 检测系统的内存映射(memory map)
所谓内存映射就是指在整个 4GB 物理地址空间中有哪些地址范围被分配用来寻址系统的 RAM 单元。比如,在 SA-1100 CPU 中,从 0xC000,0000 开始的 512M 地址空间被用作系统的 RAM 地址空间,而在 Samsung S3C44B0X CPU 中,从 0x0c00,0000 到 0×1000,0000 之间的 64M 地址空间被用作系统的 RAM 地址空间。虽然 CPU 通常预留出一大段足够的地址空间给系统 RAM,但是在搭建具体的嵌入式系统时却不一定会实现 CPU 预留的全部 RAM 地址空间。也就是说,具体的嵌入式系统往往只把 CPU 预留的全部 RAM 地址空间中的一部分映射到 RAM 单元上,而让剩下的那部分预留 RAM 地址空间处于未使用状态。由于上述这个事实,因此 Boot Loader 的 stage2 必须在它想干点什么 (比如,将存储在 flash 上的内核映像读到 RAM 空间中) 之前检测整个系统的内存映射情况,也即它必须知道 CPU 预留的全部 RAM 地址空间中的哪些被真正映射到 RAM 地址单元,哪些是处于 "unused" 状态的。
(1) 内存映射的描述
可以用如下数据结构来描述 RAM 地址空间中的一段连续(continuous)的地址范围:
typedef struct memory_area_struct {
u32 start; /* the base address of the memory region */
u32 size; /* the byte number of the memory region */
int used;
} memory_area_t;
这段 RAM 地址空间中的连续地址范围可以处于两种状态之一:(1)used=1,则说明这段连续的地址范围已被实现,也即真正地被映射到 RAM 单元上。(2)used=0,则说明这段连续的地址范围并未被系统所实现,而是处于未使用状态。
基于上述 memory_area_t 数据结构,整个 CPU 预留的 RAM 地址空间可以用一个 memory_area_t 类型的数组来表示,如下所示:
memory_area_t memory_map[NUM_MEM_AREAS] = {
[0 ... (NUM_MEM_AREAS - 1)] = {
.start = 0,
.size = 0,
.used = 0
},
};
(2) 内存映射的检测
下面我们给出一个可用来检测整个 RAM 地址空间内存映射情况的简单而有效的算法:
/* 数组初始化 */
for(i = 0; i < NUM_MEM_AREAS; i++)
memory_map[i].used = 0;
/* first write a 0 to all memory locations */
for(addr = MEM_START; addr < MEM_END; addr += PAGE_SIZE)
* (u32 *)addr = 0;
for(i = 0, addr = MEM_START; addr < MEM_END; addr += PAGE_SIZE) {
/*
* 检测从基地址 MEM_START+i*PAGE_SIZE 开始,大小为
* PAGE_SIZE 的地址空间是否是有效的RAM地址空间。
*/
调用3.1.2节中的算法test_mempage();
if ( current memory page isnot a valid ram page) {
/* no RAM here */
if(memory_map[i].used )
i++;
continue;
}
/*
* 当前页已经是一个被映射到 RAM 的有效地址范围
* 但是还要看看当前页是否只是 4GB 地址空间中某个地址页的别名?
*/
if(* (u32 *)addr != 0) { /* alias? */
/* 这个内存页是 4GB 地址空间中某个地址页的别名 */
if ( memory_map[i].used )
i++;
continue;
}
/*
* 当前页已经是一个被映射到 RAM 的有效地址范围
* 而且它也不是 4GB 地址空间中某个地址页的别名。
*/
if (memory_map[i].used == 0) {
memory_map[i].start = addr;
memory_map[i].size = PAGE_SIZE;
memory_map[i].used = 1;
} else {
memory_map[i].size += PAGE_SIZE;
}
} /* end of for (…) */
在用上述算法检测完系统的内存映射情况后,Boot Loader 也可以将内存映射的详细信息打印到串口。
3.2.3 加载内核映像和根文件系统映像
(1) 规划内存占用的布局
这里包括两个方面:(1)内核映像所占用的内存范围;(2)根文件系统所占用的内存范围。在规划内存占用的布局时,主要考虑基地址和映像的大小两个方面。
对于内核映像,一般将其拷贝到从(MEM_START+0×8000) 这个基地址开始的大约1MB大小的内存范围内(嵌入式 Linux 的内核一般都不操过 1MB)。为什么要把从 MEM_START 到 MEM_START+0×8000 这段 32KB 大小的内存空出来呢?这是因为 Linux 内核要在这段内存中放置一些全局数据结构,如:启动参数和内核页表等信息。
而对于根文件系统映像,则一般将其拷贝到 MEM_START+0×0010,0000 开始的地方。如果用 Ramdisk 作为根文件系统映像,则其解压后的大小一般是1MB。
(2)从 Flash 上拷贝
由于像 ARM 这样的嵌入式 CPU 通常都是在统一的内存地址空间中寻址 Flash 等固态存储设备的,因此从 Flash 上读取数据与从 RAM 单元中读取数据并没有什么不同。用一个简单的循环就可以完成从 Flash 设备上拷贝映像的工作:
while(count) {
*dest++ = *src++; /* they are all aligned with word boundary */
count -= 4; /* byte number */
};
3.2.4 设置内核的启动参数
应该说,在将内核映像和根文件系统映像拷贝到 RAM 空间中后,就可以准备启动 Linux 内核了。但是在调用内核之前,应该作一步准备工作,即:设置 Linux 内核的启动参数。
Linux 2.4.x 以后的内核都期望以标记列表(tagged list)的形式来传递启动参数。启动参数标记列表以标记 ATAG_CORE 开始,以标记 ATAG_NONE 结束。每个标记由标识被传递参数的 tag_header 结构以及随后的参数值数据结构来组成。数据结构 tag 和 tag_header 定义在 Linux 内核源码的include/asm/setup.h 头文件中:
/* The list ends with an ATAG_NONE node. */
#define ATAG_NONE 0×00000000
struct tag_header {
u32 size; /* 注意,这里size是字数为单位的 */
u32 tag;
};
……
struct tag {
struct tag_header hdr;
union {
struct tag_core core;
struct tag_mem32 mem;
struct tag_videotext videotext;
struct tag_ramdisk ramdisk;
struct tag_initrd initrd;
struct tag_serialnr serialnr;
struct tag_revision revision;
struct tag_videolfb videolfb;
struct tag_cmdline cmdline;
/*
* Acorn specific
*/
struct tag_acorn acorn;
/*
* DC21285 specific
*/
struct tag_memclk memclk;
} u;
};
在嵌入式 Linux 系统中,通常需要由 Boot Loader 设置的常见启动参数有:ATAG_CORE、ATAG_MEM、ATAG_CMDLINE、ATAG_RAMDISK、ATAG_INITRD等。比如,设置 ATAG_CORE 的代码如下:
params = (struct tag *)BOOT_PARAMS;
params->hdr.tag = ATAG_CORE;
params->hdr.size = tag_size(tag_core);
params->u.core.flags = 0;
params->u.core.pagesize = 0;
params->u.core.rootdev = 0;
params = tag_next(params);
其中,BOOT_PARAMS 表示内核启动参数在内存中的起始基地址,指针 params 是一个 struct tag 类型的指针。宏 tag_next() 将以指向当前标记的指针为参数,计算紧临当前标记的下一个标记的起始地址。注意,内核的根文件系统所在的设备ID就是在这里设置的。
下面是设置内存映射情况的示例代码:
for(i = 0; i < NUM_MEM_AREAS; i++) {
if(memory_map[i].used) {
params->hdr.tag = ATAG_MEM;
params->hdr.size = tag_size(tag_mem32);
params->u.mem.start = memory_map[i].start;
params->u.mem.size = memory_map[i].size;
params = tag_next(params);
}
}
可以看出,在 memory_map[]数组中,每一个有效的内存段都对应一个 ATAG_MEM 参数标记。
Linux 内核在启动时可以以命令行参数的形式来接收信息,利用这一点我们可以向内核提供那些内核不能自己检测的硬件参数信息,或者重载(override)内核自己检测到的信息。比如,我们用这样一个命令行参数字符串"console=ttyS0,115200n8"来通知内核以 ttyS0 作为控制台,且串口采用 "115200bps、无奇偶校验、8位数据位"这样的设置。下面是一段设置调用内核命令行参数字符串的示例代码:
char *p;
/* eat leading white space */
for(p = commandline; *p == ‘ ‘; p++)
;
/* skip non-existent command lines so the kernel will still
* use its default command line.
*/
if(*p == ‘\0′)
return;
params->hdr.tag = ATAG_CMDLINE;
params->hdr.size = (sizeof(struct tag_header) + strlen(p) + 1 + 4) >> 2;
strcpy(params->u.cmdline.cmdline, p);
params = tag_next(params);
请注意在上述代码中,设置 tag_header 的大小时,必须包括字符串的终止符’\0′,此外还要将字节数向上圆整4个字节,因为 tag_header 结构中的size 成员表示的是字数。
下面是设置 ATAG_INITRD 的示例代码,它告诉内核在 RAM 中的什么地方可以找到 initrd 映象(压缩格式)以及它的大小:
params->hdr.tag = ATAG_INITRD2;
params->hdr.size = tag_size(tag_initrd);
params->u.initrd.start = RAMDISK_RAM_BASE;
params->u.initrd.size = INITRD_LEN;
params = tag_next(params);
下面是设置 ATAG_RAMDISK 的示例代码,它告诉内核解压后的 Ramdisk 有多大(单位是KB):
params->hdr.tag = ATAG_RAMDISK;
params->hdr.size = tag_size(tag_ramdisk);
params->u.ramdisk.start = 0;
params->u.ramdisk.size = RAMDISK_SIZE; /* 请注意,单位是KB */
params->u.ramdisk.flags = 1; /* automatically load ramdisk */
params = tag_next(params);
最后,设置 ATAG_NONE 标记,结束整个启动参数列表:
static void setup_end_tag(void)
{
params->hdr.tag = ATAG_NONE;
params->hdr.size = 0;
}
3.2.5 调用内核
Boot Loader 调用 Linux 内核的方法是直接跳转到内核的第一条指令处,也即直接跳转到 MEM_START+0×8000 地址处。在跳转时,下列条件要满足:
1. CPU 寄存器的设置:
·R0=0;
@R1=机器类型 ID;关于 Machine Type Number,可以参见 linux/arch/arm/tools/mach-types。
@R2=启动参数标记列表在 RAM 中起始基地址;
2. CPU 模式:
·必须禁止中断(IRQs和FIQs);
·CPU 必须 SVC 模式;
3. Cache 和 MMU 的设置:
·MMU 必须关闭;
·指令 Cache 可以打开也可以关闭;
·数据 Cache 必须关闭;
如果用 C 语言,可以像下列示例代码这样来调用内核:
void (*theKernel)(int zero, int arch, u32 params_addr)
= (void (*)(int, int, u32))KERNEL_RAM_BASE;
……
theKernel(0, ARCH_NUMBER, (u32) kernel_params_start);
注意,theKernel()函数调用应该永远不返回的。如果这个调用返回,则说明出错。
四、 关于串口终端
在 boot loader 程序的设计与实现中,没有什么能够比从串口终端正确地收到打印信息能更令人激动了。此外,向串口终端打印信息也是一个非常重要而又有效的调试手段。但是,我们经常会碰到串口终端显示乱码或根本没有显示的问题。造成这个问题主要有两种原因:(1) boot loader 对串口的初始化设置不正确。(2) 运行在 host 端的终端仿真程序对串口的设置不正确,这包括:波特率、奇偶校验、数据位和停止位等方面的设置。
此外,有时也会碰到这样的问题,那就是:在 boot loader 的运行过程中我们可以正确地向串口终端输出信息,但当 boot loader 启动内核后却无法看到内核的启动输出信息。对这一问题的原因可以从以下几个方面来考虑:
(1) 首先请确认你的内核在编译时配置了对串口终端的支持,并配置了正确的串口驱动程序。
(2) 你的 boot loader 对串口的初始化设置可能会和内核对串口的初始化设置不一致。此外,对于诸如 s3c44b0x 这样的 CPU,CPU 时钟频率的设置也会影响串口,因此如果 boot loader 和内核对其 CPU 时钟频率的设置不一致,也会使串口终端无法正确显示信息。
(3) 最后,还要确认 boot loader 所用的内核基地址必须和内核映像在编译时所用的运行基地址一致,尤其是对于 uClinux 而言。假设你的内核映像在编译时用的基地址是 0xc0008000,但你的 boot loader 却将它加载到 0xc0010000 处去执行,那么内核映像当然不能正确地执行了。
五、 结束语
Boot Loader 的设计与实现是一个非常复杂的过程。如果不能从串口收到那激动人心的内核启动信息,恐怕谁也不能说:"嗨,我的 boot loader 已经成功地转起来了!"。本文详细的介绍了bootloader的原理,回答了什么是bootloader
"uncompressing linux
……………… done,
booting the kernel……"