Instruction set compiled simulation A technique for fast and(2)

2025-06-17

We propose an instruction set compiled simulation (IS-CS)technique where the program is compiled prior to run time and executed interpretively as shown in Figure 3.The simulator recognizes if the program code of a previously ex-ecuted address has changed and initiates a re-decoding.We achieve both the performance of compiled simulation and ?exibility of interpretive simulation.The simulation perfor-mance of the IS-CS technique is upto 40%better than the best known result [1]in this category.There are two rea-sons for its superior performance.First,the time consum-ing instruction decoding process is moved to compile time while maintaining the ?exibility of interpretive simulation.Second,we use a novel instruction abstraction technique to generate aggressively optimized decoded instructions that further improve simulation performance.

3.INSTRUCTION SET COMPILED SIMU-LATION

We developed the instruction set compiled simulation (IS-CS)technique with the intention of combining the full ?ex-ibility of interpretive simulation with the speed of the com-piled principle.The basic idea is to move the time-consuming instruction decoding to compile time as shown in Figure 3.The application program,written in C/C++,is compiled using the gcc compiler con?gured to generate binary for the target machine.The instruction decoder decodes one bi-nary instruction at a time to generate the decoded program for the input application.The decoded program is com-piled by C++compiler and linked with the simulation li-brary to generate the simulator.The simulator recognizes if the previously decoded instruction has changed and initiates re-decoding of the modi?ed instruction.If any instruction

Figure3:Instruction Set Compiled Simulation Flow is modi?ed during execution and subsequently re-decoded, the location in instruction memory is updated with the re-decoded instruction.To improve the simulation speed we use a novel instruction abstraction technique that generates optimized decoded instructions as described in Section3.1. As a result the computation during run-time is minimized. This technique achieves the speed of compiled simulation due to compile-time decoding of application as described in Section3.2.Section3.3describes the simulation engine that o?ers the full?exibility of interpretive simulation.

3.1Instruction Abstraction

In traditional interpretive simulation(e.g.,Simplescalar [3])the decoding and execution of binary instructions are done using a single monolithic function.This function has many if-then-else and switch/case statements that perform certain activities based on bit patterns of opcode,operands, addressing modes etc.In advanced interpretive simulation (e.g.,LISA[1])the binary instruction is decoded and the decoded instruction contains pointers to speci?c functions. There are many variations of these two methods based on e?ciency of decode,complexity of implementation,and per-formance of execution.However,none of these techniques exploit the fact that a certain class of instructions may have a constant value for a particular?eld of the instruction.For example,a majority of the ARM instructions execute uncon-ditionally(condition?eld has value always).It is a waste of time to check the condition for such instructions every time they are executed.

Clearly,when certain input values are known for a class of instructions,the partial evaluation[13]technique can be applied.The e?ect of partial evaluation is to specialize a program with part of its input to get a faster version of the same program.To take advantage of such situations we need to have separate functions for each and every possible for-mat of instructions so that the function could be optimized by the compiler at compile time and produce the best per-formance at run time.Unfortunately,this is not feasible in practice.For example,consider the ARM data process-ing instructions.It can have16conditions,16operations, an update?ag(true/false),and one destination followed by two source operands.The second source operand,called shifter operand,has three?elds:operand type(reg/imm), shift options(5types)and shift value(reg/imm).In total, the ARM data processing instructions have16x16x2x2 x5x2=10240possible formats.

Our solution to this problem is to de?ne instruction classes, where each class contains instructions with similar formats. Most of the time this information is readily available from the instruction set architecture manual.For example,we de-?ned six instruction classes for the ARM processor viz.,Data Processing,Branch,LoadStore,Multiply,Multiple Load-Store,Software Interrupt,and Swap.Next,we de?ne a set of masks for each instruction class.The mask consists of’0’,’1’and’x’symbols.A’0’(’1’)symbol in the mask matches with a’0’(’1’)in binary pattern of the instruction at the same bit position.An’x’symbol matches with both’0’and ’1’.For example,the masks for the data processing instruc-tions are shown below:

"xxxx-001x xxxx-xxxx xxxx-xxxx xxxx-xxxx"

"xxxx-000x xxxx-xxxx xxxx-xxxx xxx0-xxxx"

"xxxx-000x xxxx-xxxx xxxx-xxxx0xx1-xxxx"

We use C++templates to implement the functionality for each class of instructions.For example,the pseudo code for the data processing template is shown below.The template has four parameters viz.,condition,operation,update?ag, and shifter operand.The shifter operand is a template hav-ing three parameters viz.,operand type,shift options and shift value.

Example1:Template for Data Processing Instructions

template class DataProcessing:

{

SftOper_sftOperand;

Reg_dest,_src1;

public:

.......

virtual void execute()

{

if(Cond::execute())

{

_dest=Op::execute(_src1,_sftOperand.getValue());

if(Flag::execute())

{

//Update Flags

.......

}

}

}

};

We also use a Mask Table for the mapping between mask patterns and templates.It also maintains a mapping be-tween mask patterns and functions corresponding to those templates.

This instruction abstraction technique is used to generate aggressively optimized decoded instructions as described in Section3.2.

3.2Instruction Decoder

Algorithm1decodes one binary instruction at a time to generate the decoded program for the input application. For each instruction in the application program it selects the appropriate template using Algorithm2.It generates a customized template for the instruction using the appro-priate parameter values.Algorithm3brie?y describes the

customized template generation process.Finally,the cus-tomized template is instantiated and appended in the tem-porary program TempProgram.The TempProgram is fed to a C++compiler that performs necessary optimizations to take advantage of the partial evaluation technique,de-scribed in Section3.1,to produce the DecodedProgram.The DecodedProgram is loaded into instruction memory which is a separate data structure than main memory.While the main memory holds the original program data and instruc-tion binaries,each cell of instruction memory holds a pointer to the optimized functionality as well as the instruction bi-nary.The instruction binary is used to check the validity of the decoded instruction during run-time.

Algorithm3describes the template customization pro-cess.The algorithm’s basic idea is to extract the values from speci?c?elds of the binary instruction(e.g.,opcode, operand etc.)and assign those values to the template.We maintain the information for each class of instructions,tem-plates,?eld formats,and mask patterns.These information can be derived from the processor speci?cation described us-ing an Architecture Description Language such as LISA[1], EXPRESSION[7]and nML[12].

Algorithm1:Instruction Decoding

Inputs:Application Program Appl(Binary),MaskTable maskTable. Output:Decoded Program DecodedProgram.

Begin

TempProgram={}

foreach binary instruction inst with address addr in Appl

template=DetermineTemplate(inst,maskTable)

Instruction set compiled simulation A technique for fast and(2).doc 将本文的Word文档下载到电脑 下载失败或者文档不完整,请联系客服人员解决!

下一篇:常用英语礼貌服务用语

相关阅读
本类排行
× 游客快捷下载通道(下载后可以自由复制和排版)

下载本文档需要支付 7

支付方式:

开通VIP包月会员 特价:29元/月

注:下载文档有可能“只有目录或者内容不全”等情况,请下载之前注意辨别,如果您已付费且无法下载或内容有问题,请联系我们协助你处理。
微信:xuecool-com QQ:370150219