CISC : The Intel 80486 vs. The Motorola MC68040 --------------------------------------------------- Source : Advanced Microprocessors by Daniel Tabak Scribe : X-> Mike <-X - July '92 --------------------------------------------------- System Comparison Most of the space in this text is dedicated to the most recent advanced CISC microprocessors, the top current products within their families; the Intel 80486 and the Motorola MC68040. They both belong to the latest 1.2 million transistors per chip generation. It therefore makes sense to compare the two. It would be unfair to compare the NS32532 with them, since the NS32532 belongs to an earlier generation and it is not in the same class as the 80486 and MC68040. A selection of points of comparison between the 80486 and the MC68040 is listed in Table 1.1. Looking carefully at the table, one can perceive only a single line indentically marked in both columns: both chips have an on-chip FPU, conforming to the IEEE 754-1985 standard. All other data are different, although quite close in some instances. The points of difference between the 80486 and the MC68040 will be discussed next in some detail. Table 1.1 Comparison of Intel 80486 and Motorola MC68040 ------------------------------------------------------------------------------- Feature Intel 80486 Motorola MC68040 ------------------------------------------------------------------------------- FPU on Chip Yes (IEEE) Yes (IEEE) CPU General-Purpose 32-bit Registers 8 16; 8 Data/8 Address FPU 80-bit Registers 8 (stack) 8 MMU on Chip Yes Yes; Dual: Data, Code Cache on Chip 8k Mixed 4k Data + 4k Code Segmentation Yes No Paging Yes; 4k/page Yes; 4k or 8k/page TLB (or ATC) size 32 entries 64 entries in each: Data, Code ATC Levels of protection 4 2 Instruction pipeline stages 5 6 Pins 168 179 ------------------------------------------------------------------------------- CPU General-Purpose Registers Both systems have 32-bit general-purpose registers; the 80486 has 8, while the 68040 has double that number, namely 16. There are advantages (and disadvantages) to having a large register file. The register file of the 80486 is definitely too small to avail itself to the advantages. This is particularly exacerbated by the fact that the CPU registers of the 80486 are not really quite as general purpose as one might wish. In fact, all of them are dedicated to certain special tasks, such as: EAX, EDX Dedicated to multiplication/division operations EDX Dedicated to some I/O operations EBX, EBP Dedicated to serve as base registers for some addressing modes ECX Dedicated to serve as a counter in LOOP instructions ESP Dedicated to serve as a stack pointer ESI, EDI Dedicated to serve as pointers in string instructions and as index registers in some addressing modes On the other hand, on the MC68040 the eight 32-bit data registers D0 to D7 are genuinely general purpose without any restrictions or specific tasks imposed on them. Of the eight 32-bit address registers A0 to A7, only A7 is dedicated as a stack pointer. The user is free to use the other seven resgisters A0 to A6 in any possible way. From the point of view of the CPU register file, the MC68040 has a very clear advantage. It is much better equipped to retain intermediate results during a program run, thus reducing CPU-memory traffic. From this standpoint, the MC68040 even has a slight edge over the VAX architecture. The VAX (any VAX model) also has sixteen 32-bit general-purpose registers. However, only 12 of those (as opposed to the 68040's 15) can be used freely by the programmer. Of the four VAX dedicated registers, one is used as a program counter and another as a stack pointer. The program counter is completely separate on both the MC68040 and the 80486 and is not included in the general-purpose registers. FPU General-Purpose Registers Both systems have eight 80-bit registers, providing a large range for floating-point number representation and a high level of precision. The only differnce between the two is that the 80486 FPU registers are organized as a stack, while those of the MC68040 are accessed directly, as its integer CPU registers. Because of the stack organization the 80486 might have a slight edge from the standpoint of compiler generation (for that part of the compiler dealing with floating-point operations). MMU on Chip The 80486 has a regular MMU on chip for the control and management of its memory. The MC68040 has two MMUs: one for code and one for data. This duality, supported by a separate operand data bus, allows the control unit to handle instruction and operand fetching simultaneously in parallel and enhances the handling of the instruction pipeline. Of course, the external bus leading to the off-chip main memory is single (32-bit data, 32-bit address), and it is shared by instructions and data operands. With a reasonable on-chip cache hit ratio, the off-chip bus would be used less often. Cache on Chip The total on-chip cache of both systems is 8 kbytes. Interestingly enough, they have the same parameters: both are four-way set-associative with 16 bytes per line. The difference is that while the 80486 on-chip 8k cache is mixed, storing both code and data the MC68040 cache is subdivided into two equal parts: a 4-kbyte data cache and a 4-kbyte code cache. Each cache is controlled by the respective MMU, mentioned above. The advantage, as in the MMU case, is the provision of two parallel paths for code and data, resulting in an overall speedup of operation. Segmentation The Intel 80x86 family implements segmentation, while the M68000 family does not. The earlier Intel systems (8086, 80286) were plagued with the upper 64-kbyte segment size limit, starting with the 80386 and so on, the segment sizecan be made as high as 4 Gbytes (maximum size of the physical memory), effectively removing the segmentation feature by the decision of the user. Therefore, as far as segmentation is concerned, the 80486 and MC68040 are comparable. The 80486 has some edge, since it allows the user to implement segmentation if needed and avail oneself to its advantages. Paging The MMUs of both systems feature paged virtual memory management. The 80486 offers a single standard page size of 4 kbytes. This page size is implemented in many other systems. With a 4-kbyte page size, one can arrange an address mapping where the page directory and the page tables also have the standard page size of 4 kbytes (1024 = 2^10 entries, 4 bytes each). Thus, the page directory and the page tables can be treated as entire pages and placed within page frames in the memory. This results in reduced complexity in the MMU hardware and in the OS software, one of whose tasks is to support the management of virtual memory. The MC68040 offers two page sizes, selectable by the user: 4 kbytes and 8 kbytes. This tends to complicate the MMU logic and the OS. It is a good thing that Motorola got rid of the other page size options available with its MC68851 paged MMU: 8 sizes ranging from 256 bytes to 32 kbytes, stepped by a factor of 2. On the other hand, the 8-kbyte per page option could be useful to a programmer dealing with large modules of code exceeding 4 kbytes. TLB (or ATC) Size The 80486 MMU has a 32-entry TLB. With a 4-kbyte page it covers 32 x 4 kbytes = 128 kbytes of memory. The MC68040 offers much more. The TLB is called address translation cache (ATC) by Motorola, but it does the same: it translates virtual into physical addresses. The name given by Motorola is simpler to perceive, although the TLB term is predominately used in the computer literature. Each of the two MC68040 MMUs has a 64-entry ATC, for a total of 128 entries on the chip. For a 4-kbyte page, a total of 128 x 4 kbytes = 512 kbytes of memory is covered (4 times that of the 80486), and for an 8-kbyte page, 1 Mbyte (8 times that of 80486). In this case, a strong advantage of the MC68040 is obvious. Since the ATCs encompass much more memory, the ATC miss probability is considerably smaller. Thus, less time will be wasted in accessing page tables in memory, resulting in faster overall operation. Levels of Protection The 80486 offers four levels of protection, while the MC68040 has only two - the supervisor and user, as does the whole M68000 family. While the protection mechanism of the 80486 is much more sophisticated and, with the segmentation encapsulation of information, offers more reliable protection, it also results in more complicated on-chip logic. More time is taken up with protection checks on the 80486. Instruction Pipeline Stages The 80486 instruction pipeline has five stages, while that of the MC68040 has six. This means that the 80486 pipeline can handle five instructions simultaneously and the MC68040 can handle six. This certainly gives an edge in favor of the MC68040, although its MMU-cache-internal buses duality is a much stronger contributor to its enhanced speed of operation. The above comments are valid if the instructions are executed sequentially, without any taken branches. In the case of the taken branch, the subsequent prefetched instructions are flushed from the pipeline hardware. Neither the 80486 nor the MC68040 employ the delayed branch feature, as do most of the RISC-type systems. The MC68040 designers have investigated the possibilityof featuring a delayed branch or other techniques to alleviate the problem of lost cycles in case of a flushed pipeline. After a number of simulations, they came to the conclusion that the gain in performance was not worth the extra hardware expenditure incurred in the implementation of any of the methods considered. In RISC-type systems, on the other hand, due to reduced control circuitry there is extra space for features such as the delayed branch which alleviates the pipeline management problem in case of a taken branch. Indeed, Intel's RISC 80860 and Motorola's RISC M88000 both implement the delayed branch technique as an option, selectable by the user. Performance Benchmarks Dhrystone Benchmark Version 2.1 (Integer Performance Test -- ALU) ----------------------------------------------------------------------------- System Results - Kdhrystones/s Relative ----------------------------------------------------------------------------- VAX 11/780 1.6 1.0 Motorola MC68030 (50 Mhz,1ws) 20.0 12.5 Intel 80486 (25 Mhz) 24.0 15.0 SPARC (25 Mhz) 27.0 16.8 Motorola M88000 (20 Mhz) 33.3 20.1 MIPS M/2000, R3000 (25 Mhz) 39.4 23.8 Motorola MC68040 (25 Mhz) 40.0 24.3 Intel 80860 (33.3 Mhz) 67.3 40.6 ----------------------------------------------------------------------------- As one can see, the MC68040 Dhrystone integer performance considerably exceeds that of the 80486. It should also be noted that the MC68040 outperforms its predecessor MC68030 by a factor of 2, while the MC68030 operates at a double frequency. Linpack Benchmark (Double-Precision, 100x100) (F-P Performance Test -- FPU) ----------------------------------------------------------------------------- System Results - MFLOPS ----------------------------------------------------------------------------- VAX 11/780 0.14 NS32532 + NS32381 0.17 Intel 80386 + 80387 (20 Mhz) 0.20 VAX 8600 0.49 Intel 80486 (25 Mhz) 1.0 Motorola M88000 (20 Mhz) 1.2 Sun SPARCstation 1 1.3 Decstation 3100 (MIPS R2000) 1.6 Sun 4/200 (SPARC) 1.6 Am29000 (25 Mhz) 1.71 IBM 3081 2.1 Motorola MC68040 (25 Mhz) 3.0 R3000/R3010 (25 Mhz) 3.9 Intel 80860 4.5 RS/6000 (25 Mhz) 10.9 Cray 1S 12.0 Cray X-MP 56.0 ----------------------------------------------------------------------------- Here, the MC68040 outperforms the 80486 by a factor of 3. This performance ratio is well supported by the discussion given for the data in Table 1.1. The fact that more RISC-type processors, tested above, outperform the 80486 CISC should not escape notice either. This is particularly significant for floating-point performance where the 80486 has an on-chip FPU, while the R3000 and the SPARC use off-chip coprocessors. A comparison of memory access clock cycles needed for the execution of ADD instructions is reported in the following: Memory Access Clock Counts ----------------------------------------------------------------------------- Source Destination MC68040 80486 M88000 ----------------------------------------------------------------------------- ADD reg reg 1 1 1 ADD mem reg (cache hit) 1 2 3* ADD reg mem (cache hit) 1 1 3* ADD mem reg (cache miss) 3 4 15* ----------------------------------------------------------------------------- --"reg" represents a CPU register and "mem" represents a location in memory. *Includes time to load register plus one clock for the ADD operation. The superior performance of the MC68040 fits the discussion given earlier in this text. It should also be noted that both the MC68040 and 80486 have an on-chip cache, while the M88000 cache is on a separate CMMU chip (MC88200). It should be noted that all of the above comparisons were conducted with artificial benchmark programs such as Dhrystone. It is quite possible that for some "real-life" programs the performance ordering might be quite different. It is no accident that when company A conducts benchmark experiments, its products come out ahead of others. It is quite possible that when another company, say B, publishes its own benchmark results, the performance ordering may look different. Therefore, the sample of benchmark comparison results presented should be regarded as a tentative indication. They are certainly not conclusive. <*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*> <*> <*> <*> >>> THIS TEXTFILE PASSED THROUGH IMAGINE BBS ++46-42-135505 <<< <*> <*> """"""""""""""""""""""""""""""""""""""""""""""""""""""" <*> <*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*><*> <úúSúCúOúOúPúEúXúúúSúWúEúDúEúNúúúHúEúAúDúQúUúAúRúTúEúRúú> .