| || ||RO||LONG||Frequency in Hz of the free running counter|
| || ||RO||LONG||Free running counter value, lower 32 bits|
| || ||RO||LONG||Free running counter value, higher 32 bits|
| || ||RO||LONG||Number of executed M68k instructions, lower 32 bits|
| || ||RO||LONG||Number of executed M68k instructions, higher 32 bits|
| || ||RO||LONG||Number of executed ARM instructions, lower 32 bits|
| || ||RO||LONG||Number of executed ARM instructions, higher 32 bits|
| || ||RO||LONG||Total cache size in bytes|
| || ||RO||LONG||Number of bytes free in the JIT cache|
| || ||RO||LONG||Number of JIT units in the cache|
| || ||RW||LONG||JIT threshold for soft cache flushes|
| || ||RW||LONG||JIT control register|
| || ||RO||LONG||Number of JIT cache misses|
| || ||RW||LONG||Debug control register|
| || ||RW||LONG||Lowest debug address|
| || ||RW||LONG||Highest debug address|
| || ||RW||LONG||JIT control register 2|
AArch64 features a free running 64-bit counter which can be used for timing purposes. This counter is exposed to the M68k and can be freely used by the software. The frequency of the counter is available through this register.
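As a rough illustration of what the frequency value is for, the sketch below converts a difference between two counter readings into microseconds. The function name `ticks_to_us` is hypothetical, and `freq_hz` is assumed to be the value read from this register.

```c
#include <stdint.h>

/* Convert a tick difference of the free-running counter to microseconds.
 * freq_hz is assumed to be the value read from the frequency register.
 * Whole seconds and the remainder are converted separately so that the
 * multiplication by 1000000 does not overflow for realistic inputs. */
uint64_t ticks_to_us(uint64_t ticks, uint32_t freq_hz)
{
    if (freq_hz == 0)
        return 0;                          /* avoid division by zero */
    uint64_t whole = ticks / freq_hz;      /* full seconds */
    uint64_t rest  = ticks % freq_hz;      /* leftover ticks, < freq_hz */
    return whole * 1000000u + (rest * 1000000u) / freq_hz;
}
```

For example, with a counter running at 54 MHz, a difference of 27000000 ticks corresponds to half a second.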
The value of the free-running counter is available through two registers. CNTVALLO contains the lower 32 bits of the counter, whereas CNTVALHI contains the upper 32 bits. To make sure that the counter is read consistently, i.e. that the lower 32 bits did not wrap between reading the lower and the upper longword, it is advisable to read CNTVALHI twice. If the value has changed on the second read, the lower 32 bits have wrapped and the read procedure should be repeated.
    # Read CNTVAL register into d0:d1 pair.
    ReadCNT:
        move.l  d2, -(a7)
    1:  movec.l #0xe2, d2
        movec.l #0xe1, d1
        movec.l #0xe2, d0
        cmp.l   d0, d2
        bne.b   1b
        move.l  (a7)+, d2
        rts
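The same high/low/high read pattern can be sketched in C. Since `movec` cannot be expressed portably, the register accessors are passed in as function pointers; `reg_reader`, `read_counter64` and the small simulated counter (`sim_*`, `demo_wrap_read`) are all hypothetical names used only to make the example runnable on any host.

```c
#include <stdint.h>

/* Hypothetical accessor type: returns the current value of one 32-bit
 * control register. On real hardware it would wrap a MOVEC instruction. */
typedef uint32_t (*reg_reader)(void);

/* Read the 64-bit counter with the high/low/high pattern: retry whenever
 * the upper half changed while the lower half was being read. */
uint64_t read_counter64(reg_reader read_hi, reg_reader read_lo)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_hi();
        lo  = read_lo();
        hi2 = read_hi();
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}

/* Simulated counter that advances by one tick on every register read. */
static uint64_t sim_now;
static uint32_t sim_read_hi(void) { return (uint32_t)(sim_now++ >> 32); }
static uint32_t sim_read_lo(void) { return (uint32_t)(sim_now++); }

/* Start just before the lower half wraps, so the first attempt is torn. */
uint64_t demo_wrap_read(void)
{
    sim_now = 0x00000001FFFFFFFFull;
    return read_counter64(sim_read_hi, sim_read_lo);
}
```

In the simulated run the first attempt sees the upper half change from 1 to 2 and is discarded; without the retry the result would combine the old upper half with the already wrapped lower half.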
Emu68 provides a real-time counter of executed M68k instructions. The value of this 64-bit counter, stored in two read-only control registers, gives a measure of the current performance of Emu68.
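One possible use of this counter is to estimate the average M68k MIPS between two samples. The sketch below assumes that both the instruction counter and the free-running counter have already been assembled into 64-bit values; `m68k_mips` is a hypothetical helper, not part of Emu68.

```c
#include <stdint.h>

/* Average M68k MIPS between two samples.
 * insn0/insn1: 64-bit instruction counter at start and end.
 * cnt0/cnt1:   64-bit free-running counter at start and end.
 * freq_hz:     frequency of the free-running counter in Hz.
 * Returns 0.0 when no time has elapsed or freq_hz is 0. */
double m68k_mips(uint64_t insn0, uint64_t insn1,
                 uint64_t cnt0, uint64_t cnt1, uint32_t freq_hz)
{
    uint64_t dticks = cnt1 - cnt0;   /* unsigned subtraction tolerates wrap */
    uint64_t dinsn  = insn1 - insn0;
    if (dticks == 0 || freq_hz == 0)
        return 0.0;
    double seconds = (double)dticks / (double)freq_hz;
    return (double)dinsn / seconds / 1e6;
}
```

With a 54 MHz counter, 50000000 instructions executed over 54000000 ticks correspond to 50 MIPS.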
Current count of executed ARM instructions, including translated JIT code as well as exceptions, translator and main JIT loop.
Total size of JIT cache in bytes. This value is defined once during compilation.
Number of free bytes in JIT cache.
This register contains the number of JIT units currently available in the cache.
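The three cache registers can be combined into simple occupancy figures. The helpers below are a sketch with hypothetical names; the average unit size additionally assumes that all used bytes belong to JIT units, which may not hold exactly if the cache keeps bookkeeping data of its own.

```c
#include <stdint.h>

/* Percentage of the JIT cache currently in use (0..100).
 * total and free_bytes are assumed to come from the total-size
 * and free-bytes registers. */
uint32_t jit_cache_used_percent(uint32_t total, uint32_t free_bytes)
{
    if (total == 0 || free_bytes > total)
        return 0;                       /* inconsistent input */
    return (uint32_t)(((uint64_t)(total - free_bytes) * 100u) / total);
}

/* Rough average size of a JIT unit in bytes, or 0 if there are no units. */
uint32_t jit_avg_unit_size(uint32_t total, uint32_t free_bytes, uint32_t units)
{
    if (units == 0 || free_bytes > total)
        return 0;
    return (total - free_bytes) / units;
}
```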
The soft flush of the JIT cache, controlled by the JITCTRL register, is time-consuming, since the entire cache has to be walked through. If the JIT cache contains fewer units than the threshold value, a soft flush will eventually be applied. If the number of units exceeds the threshold, a regular cache flush will be applied regardless of the JCC_SOFT bit value.
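The threshold rule can be written down as a small decision function. This is only a model of the rule as described here, not Emu68's actual code; `flush_kind` and `choose_flush` are hypothetical names, and treating a unit count equal to the threshold as a regular flush is an assumption, since the boundary case is not specified.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { FLUSH_REGULAR, FLUSH_SOFT } flush_kind;

/* Model of the threshold rule.
 * Assumption: unit_count == threshold is handled like "exceeds". */
flush_kind choose_flush(bool jcc_soft, uint32_t unit_count, uint32_t threshold)
{
    if (jcc_soft && unit_count < threshold)
        return FLUSH_SOFT;      /* few units: walking the cache is affordable */
    return FLUSH_REGULAR;       /* too many units, or soft flush disabled */
}
```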
Configures behaviour of JIT translator.
| ||0||1||Use “soft flush” of JIT cache.|
| ||4||4||Inline loop count|
| ||8||16||Maximal distance for inline|
| ||24||8||Maximal JIT unit size|
If this bit is set, an instruction cache flush does not remove units from the JIT cache. Instead, they are marked as not verified. On the next execution of the code, the CRC32 checksum of the unit is verified and, if it is unchanged, the unit is marked as valid again, skipping the compilation phase.
If the JIT translator finds a way to unroll a loop in the code, it will attempt to fit up to JCC_LOOP_COUNT iterations, provided there is enough room to fit the resulting number of m68k instructions into the cache.
When the JIT translator finds a branch (conditional or unconditional) whose target address can be computed at compile time, the branch is inlined into the current JIT translation unit if the branch distance is within JCC_INLINE_RANGE bytes. A value of 0 disables branch inlining.
The translator will put no more than JCC_INSN_DEPTH m68k instructions into a single JIT compilation unit. A value of 0 sets the maximum number of instructions to 256. Note that a JIT unit can contain fewer m68k instructions than the value set here, since every branch whose target cannot be computed at compile time, as well as many context-synchronising instructions, will end the translation.
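The field layout of JITCTRL can be captured in a few shifts and masks. The sketch below assumes that the two numeric columns of the table are the start bit and the field width; the `*_SHIFT` macros and the function names are illustrative, not taken from Emu68 headers.

```c
#include <stdint.h>

/* Field positions from the JITCTRL table (start bit, width). */
#define JCC_SOFT_SHIFT          0u   /* 1 bit   */
#define JCC_LOOP_COUNT_SHIFT    4u   /* 4 bits  */
#define JCC_INLINE_RANGE_SHIFT  8u   /* 16 bits */
#define JCC_INSN_DEPTH_SHIFT   24u   /* 8 bits  */

/* Build a JITCTRL value; arguments are masked to their field width. */
uint32_t jitctrl_pack(uint32_t soft, uint32_t loop_count,
                      uint32_t inline_range, uint32_t insn_depth)
{
    return ((soft         & 0x1u)    << JCC_SOFT_SHIFT)
         | ((loop_count   & 0xFu)    << JCC_LOOP_COUNT_SHIFT)
         | ((inline_range & 0xFFFFu) << JCC_INLINE_RANGE_SHIFT)
         | ((insn_depth   & 0xFFu)   << JCC_INSN_DEPTH_SHIFT);
}

/* Effective instruction limit per unit: a field value of 0 means 256. */
uint32_t jitctrl_insn_depth(uint32_t jitctrl)
{
    uint32_t depth = (jitctrl >> JCC_INSN_DEPTH_SHIFT) & 0xFFu;
    return depth ? depth : 256u;
}
```

For instance, soft flush enabled, a loop count of 4, an inline range of 8192 bytes and an instruction depth field of 0 (meaning 256) pack into `0x00200041`.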
This 32-bit counter is incremented every time a JIT cache miss occurs and the JIT compiler is started.
Configures the behaviour of debug messages. It can be changed on the fly to adjust the verbosity of debug messages and to switch disassembly of translated code on or off. A change affects only newly compiled units, so it is advisable to flush the entire code cache after changing anything here.
| ||0||2||Set verbosity level of debug|
| ||2||1||Enable/disable disassembler|
Debug information about JIT units is normally shown for all blocks of memory going into the translator. Since such output can be huge (above 200 megabytes on a regular system boot), the range in which the verbosity of JIT units is elevated through the DBGCTRL register may be limited. If an M68k address is not within the range between DBGADDRLO and DBGADDRHI, no information about that JIT unit is written to the console.
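A minimal sketch of how the debug control value and the address window fit together is given below. The layout follows the DBGCTRL table (verbosity in bits 0 to 1, disassembler enable in bit 2); the function names are hypothetical, and treating both window bounds as inclusive is an assumption, since the text does not say.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a DBGCTRL value: verbosity in bits 0-1, disassembler in bit 2. */
uint32_t dbgctrl_pack(uint32_t verbosity, bool disasm)
{
    return (verbosity & 0x3u) | ((disasm ? 1u : 0u) << 2);
}

/* True when addr lies inside the debug window given by the lowest and
 * highest debug address registers. Inclusive bounds are an assumption. */
bool dbg_addr_in_range(uint32_t addr, uint32_t lo, uint32_t hi)
{
    return addr >= lo && addr <= hi;
}
```

For example, limiting the window to `0x00F80000`..`0x00FFFFFF` would restrict elevated output to units translated from that address range.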
Second control register influencing the behaviour of Emu68.
| ||0||1||Slow down code executing from CHIP memory|
| ||1||1||Slow down special case of DBF busy loops|
| ||3||5||Controls forward scan depth of CCR optimizer|
If this bit is set, Emu68 inserts a word read from the current PC location before every translated m68k instruction. This makes code executed from CHIP memory significantly slower. It may be useful for some ancient software designed for much slower CPUs.
This bit slows down a special case of the DBF instruction, often used e.g. in old MOD replayers as a busy-loop delay:

          move.w  #xxx, Dn
    loop: dbf     Dn, loop

Due to the nature of Emu68, such busy loops run much faster than expected. When this bit is set, each DBF executed from CHIP memory and branching to itself will take the same amount of time as three subsequent byte reads from CHIP.
When Emu68 translates m68k code to AArch64 code, it scans the following m68k instructions to determine whether, and if so which, bits of the CCR need to be updated. This greatly reduces the amount of generated AArch64 code, but may be prone to errors, e.g. in the case of self-modifying code. The JC2_CCR_SCAN_DEPTH field sets how many opcodes Emu68 scans ahead. Valid values range from 0 (CCR optimization completely disabled) to 31. The default value on startup of Emu68 is 20.
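Because the scan depth shares its register with other control bits, changing it is a read-modify-write operation. The sketch below shows the bit manipulation only, assuming the field starts at bit 3 and is 5 bits wide as listed in the table; the macro and function names are illustrative, and reading or writing the register itself is left out.

```c
#include <stdint.h>

#define JC2_CCR_SCAN_DEPTH_SHIFT 3u
#define JC2_CCR_SCAN_DEPTH_MASK  (0x1Fu << JC2_CCR_SCAN_DEPTH_SHIFT)

/* Return a copy of a JITCTRL2 value with the CCR scan depth replaced.
 * All other bits are preserved; depths above 31 are clamped to 31. */
uint32_t jc2_set_ccr_scan_depth(uint32_t jitctrl2, uint32_t depth)
{
    if (depth > 31u)
        depth = 31u;
    return (jitctrl2 & ~JC2_CCR_SCAN_DEPTH_MASK)
         | (depth << JC2_CCR_SCAN_DEPTH_SHIFT);
}

/* Extract the CCR scan depth from a JITCTRL2 value. */
uint32_t jc2_get_ccr_scan_depth(uint32_t jitctrl2)
{
    return (jitctrl2 & JC2_CCR_SCAN_DEPTH_MASK) >> JC2_CCR_SCAN_DEPTH_SHIFT;
}
```

Setting the depth to 0 with this helper disables the CCR optimization while leaving the two slow-down bits untouched.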