From Wikipedia, the free encyclopedia
- Processor: 93.75 MHz, based on the MIPS R4300i-series 64-bit RISC CPU
- L1 cache: 24 KiB (split: 16 KiB instruction, 8 KiB data). No L2 cache.
- Buses: 32-bit address and data.
- CPU to RCP bandwidth: 250 MiB/s (non-DMA). The CPU cannot access RAM directly.
- Instruction Set: MIPS R4000 64-bit. Addressable Memory Space: 4 GiB (Virtual 1 TiB).
- 5-stage scalar pipeline. Integrated FPU. 93 million operations per second.
- 4.6 million transistors
- Manufactured by NEC using 0.35 µm process.
- RAM: 4 MiB RDRAM (upgradeable to 8 MiB with 4 MiB Expansion Pak)
- Data path: 9-bit width at 500 MHz
- Potential memory bandwidth: 562.5 MB/s
- ~640 ns RAM latency
- Graphics: SGI 62.5 MHz 64-bit RCP (Reality Coprocessor), which contains two processors:
- RSP (Reality Signal Processor) controls 3D graphics and sound functions
- MIPS R4000-based 8-bit integer vector processor
- Programmable through microcode (µcode). Allows functions to be modified or added.
- Transformation, clipping, lighting, triangle setup, and audio decoding (audio could be done on main CPU as well)
- Geometry throughput: initially ~100,000 polygons per second with full quality. Some later games go higher with highly optimized microcode.
- RDP (Reality Drawing Processor) rasterizer handles all pixel drawing operations in hardware, such as:
- Z-buffering (maintains 3D spatial relationships: is Mario in front of the tree, or vice versa?)
- Anti-aliasing (smooths jagged lines and edges)
- Texture mapping (placing images over shapes; for example, mapping a face image onto a sphere creates a head)
- Bilinear filtering (prevents texture blockiness by blurring when resizing)
- Mip-mapping (uses pre-scaled textures of varying degrees of fidelity depending on distance)
- Trilinear mip-map interpolation (filters between mip-maps and textures smoothly without blockiness). The Nintendo 64's filtering is not entirely accurate; precision was reduced to lower the mathematical demands.
- Perspective-correct texture mapping (keeps textures from "warping" when viewed at different angles)
- Environment mapping (best seen with metal Mario in Super Mario 64)
- Gouraud shading, Level of Detail (LOD)
- Fillrate: ~30 megapixels/sec with Z-buffering enabled
- 128-bit internal data bus between RSP and RDP. ~1.0 GiB/s bandwidth.
- Resolution: 256 × 224 to 640 × 480 pixels, with flicker-free interlaced mode support
- Color depth: 16.7 million colors (32,768 on-screen)
- Sound: 16-bit stereo, ADPCM support. Some games used MP3 audio (software-driven).
- Channels: up to 100 PCM channels (16-24 on average). Each channel consumes about 1% of CPU time.
- Sampling: 48.0 kHz (max, 44.1 kHz is CD quality)
- Media: 32 to 512 Mbit (4 to 64 MiB) cartridges
- Dimensions: 10.23 × 7.48 × 2.87 inches (260 × 190 × 73 mm) W×D×H
- Weight: 2.4 lbs (1.1 kg)
- Controller: one analog stick, two shoulder buttons, one digital cross pad, six face buttons, a Start button, and one digital trigger.
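Bilinear filtering, listed above, blends the four texels nearest a sample point, weighting each by how close the point is to it. The sketch below is textbook bilinear interpolation, assuming a grayscale texture stored as a list of rows with texel centers at integer coordinates; it is not the RDP's reduced-precision hardware method, and the function name `bilinear_sample` is hypothetical.

```python
import math

def bilinear_sample(texture, u, v):
    """Sample a grayscale texture at fractional texel coordinates (u, v).

    Hypothetical helper: textbook bilinear interpolation, not the RDP's
    exact arithmetic. Coordinates are clamped to the texture edges.
    """
    height = len(texture)
    width = len(texture[0])
    # Clamp so the four neighbouring texels always exist.
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0
    # Blend horizontally along the two rows, then vertically between them.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

checker = [[0, 255], [255, 0]]
print(bilinear_sample(checker, 0.5, 0.5))  # 127.5
```

Sampling midway between a black and a white texel yields a grey value; this averaging is what hides blockiness, and also what makes magnified textures look blurry.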
Architecture and Development
The CPU was primarily used for game logic, such as input management, some audio, and AI, while the RCP did everything else. The RDP component essentially just read commands from a FIFO buffer and rasterized polygons. The RSP was the transform portion of the RCP, although it was really just a DSP, similar to a MIPS R4000 core, designed for 8-bit integer vector operations.
In a typical N64 game, the RSP handled transforms, lighting, clipping, triangle setup, and some of the audio decoding. The Nintendo 64 was one of the few consoles without a dedicated audio chip, so these tasks fell to the RSP and/or the CPU. It was relatively common to run audio on the main CPU to increase graphics performance. The workload could be arranged in almost any way the programmer saw fit. This made the system very flexible and adaptable to each game's needs, but it also assumed that programmers could profile their code well enough to make good use of each part of the machine.
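To make "transforms on an integer vector processor" concrete, here is a minimal sketch of a 4×4 matrix applied to a vertex using fixed-point integers instead of floating point. The 16.16 format (16 integer bits, 16 fractional bits) and the helper names `to_fixed`, `from_fixed`, and `transform_vertex` are illustrative assumptions, not SGI's microcode or a documented RSP routine.

```python
FRAC_BITS = 16            # assumed 16.16 fixed-point format (illustrative)
ONE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a float to 16.16 fixed point, rounding to nearest."""
    return int(round(x * ONE))

def from_fixed(x):
    """Convert a 16.16 fixed-point integer back to a float."""
    return x / ONE

def transform_vertex(matrix, vertex):
    """Hypothetical helper: multiply a row-major 4x4 fixed-point matrix by a
    fixed-point column vector [x, y, z, w], returning 16.16 results."""
    result = []
    for row in matrix:
        acc = sum(m * v for m, v in zip(row, vertex))  # products carry 32 fractional bits
        result.append(acc >> FRAC_BITS)                # shift back to 16 fractional bits
    return result

# Scale x by 2, then translate by (10, 0, -5).
m = [[to_fixed(2), 0, 0, to_fixed(10)],
     [0, ONE, 0, 0],
     [0, 0, ONE, to_fixed(-5)],
     [0, 0, 0, ONE]]
v = [to_fixed(c) for c in (1.5, 2.0, 3.0, 1.0)]
print([from_fixed(c) for c in transform_vertex(m, v)])  # [13.0, 2.0, -2.0, 1.0]
```

Integer multiply-accumulate plus a shift replaces floating-point math, which is the kind of work a narrow integer vector unit can do quickly across several lanes at once.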
The RSP is fully programmable through microcode (µcode). By changing the microcode it runs, the RSP can perform different operations, create new effects, or be tuned for speed or quality, among other possibilities. However, Nintendo was quite unwilling to share the microcode tools with developers; only toward the end of the Nintendo 64's lifecycle did it share this information, and then only with a select number of companies. Programming RSP microcode was said to be quite difficult because the Nintendo 64 µcode tools were very basic, with no debugger and poor documentation. As a result, it was extremely easy to make mistakes that were very hard to track down, and these could cause seemingly random bugs or glitches. Some developers noted that the default SGI microcode ("Fast3D") was poorly tuned for games (it was too accurate), and performance suffered as a result. Several companies, e.g. Factor 5, Boss Game Studios, and Rare, were able to create custom microcode that ran their software far better than SGI's generic code.
Two of the SGI microcodes:
- Fast3D microcode: < ~100,000 polygons per second
- Turbo3D microcode: 500,000-600,000 polygons per second at PlayStation-level quality. Nintendo never allowed this microcode to be used in shipping games. (Nintendo even banned it outright, to prevent extremely high prices.)
The Nintendo 64 had some glaring weaknesses, caused by a combination of oversights by the hardware designers, the limits of 3D technology at the time, and manufacturing capabilities. One major flaw was the small 4 KB texture cache. This made it extremely difficult to load large textures into the rendering engine, especially textures with high color depth. It was the primary cause of the Nintendo 64's blurry texturing, with the blurring from bilinear filtering and limited ROM storage as secondary causes. To make matters worse, because of how the renderer was designed, using mip-mapping effectively halved the texture cache to 2 KB. To put this in perspective, even small textures could quickly fill the cache: a 64x64 texture at 4 bits per pixel is 2 KB, and a 128x64 texture at 4 bits per pixel is 4 KB. Toward the end of the Nintendo 64's lifetime, creative developers used tricks such as multi-layered texturing and heavily clamped small texture pieces to simulate larger textures. Conker's Bad Fur Day is possibly the best example of this ingenuity.
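The cache arithmetic above can be checked with a few lines. This sketch assumes uncompressed textures and models mip-mapping simply as halving the usable space, as the text describes; it ignores palette storage and any other layout details, and the helper names are hypothetical.

```python
TMEM_BYTES = 4096  # the 4 KB texture cache described above

def texture_bytes(width, height, bits_per_texel):
    """Size of an uncompressed texture in bytes."""
    return width * height * bits_per_texel // 8

def fits_in_tmem(width, height, bits_per_texel, mipmapped=False):
    """Hypothetical check: True if the texture fits the cache. Mip-mapping is
    modeled, per the text, as halving the usable space to 2 KB."""
    budget = TMEM_BYTES // 2 if mipmapped else TMEM_BYTES
    return texture_bytes(width, height, bits_per_texel) <= budget

print(texture_bytes(64, 64, 4))                  # 2048
print(texture_bytes(128, 64, 4))                 # 4096
print(fits_in_tmem(128, 64, 4, mipmapped=True))  # False
print(fits_in_tmem(32, 32, 16))                  # True (2048 bytes)
```

Raising the color depth from 4 to 16 bits per texel quadruples the size, so a 16-bit texture has to be four times smaller in area to fit the same space.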
There were other challenges for developers to work around. Z-buffering significantly reduced the RDP's fillrate, so to get maximum speed, managing the depth order of objects (so they would appear in the right order rather than drawn over each other) was often left to the programmer instead of the hardware. Most Nintendo 64 games were actually fillrate-limited, not geometry-limited, which is ironic considering the great concern at the time over the Nintendo 64's low rating of ~100,000 polygons per second. In fact, World Driver Championship was one of the most polygon-heavy Nintendo 64 games and frequently pushed past the Sony PlayStation's typical in-game polygon counts. This game also used custom microcode to improve the RSP's capabilities.
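One common software alternative to a Z-buffer is the painter's algorithm: sort objects by depth and draw the farthest first, so nearer objects overwrite them. The sketch contrasts it with a per-pixel depth test on a one-row "screen". It assumes each object has a single representative depth, which is exactly where the painter's approach breaks down for intersecting geometry. The class and function names and the scene objects are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass
class DrawItem:
    name: str
    depth: float  # representative distance from the camera; larger = farther

def painter_order(items: Sequence[DrawItem]) -> List[DrawItem]:
    """Painter's algorithm: sort back-to-front so nearer objects draw last."""
    return sorted(items, key=lambda item: item.depth, reverse=True)

def zbuffer_draw(fragments: Sequence[Tuple[int, float, str]], width: int) -> List[Optional[str]]:
    """Per-pixel depth test: keep the nearest fragment at each x, whatever
    order the fragments arrive in."""
    depth = [float("inf")] * width
    color: List[Optional[str]] = [None] * width
    for x, z, c in fragments:
        # Each test reads, and may write, the depth buffer; that extra
        # memory traffic per pixel is what costs fillrate.
        if z < depth[x]:
            depth[x] = z
            color[x] = c
    return color

scene = [DrawItem("sky", 100.0), DrawItem("mario", 5.0), DrawItem("tree", 20.0)]
print([item.name for item in painter_order(scene)])  # ['sky', 'tree', 'mario']
print(zbuffer_draw([(0, 5.0, "mario"), (0, 20.0, "tree"), (1, 20.0, "tree")], 2))  # ['mario', 'tree']
```

The painter's approach moves the cost to a per-object sort on the CPU or RSP, whereas the Z-buffer pays per pixel in the rasterizer.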
The Nintendo 64's unified memory subsystem was another critical weakness. The RDRAM had very high latency (640 ns per read), which mostly cancelled out its high-bandwidth advantage. High latency delays how quickly the processors can get the data they need and how quickly they can modify it. Game developers also said that the Nintendo 64's memory controller setup was fairly poor, which made the situation somewhat worse. The R4300 CPU fared worst: it had to go through the RCP to access main memory and could not use DMA to do so (only the RCP could), so its RAM access performance was quite poor. There was no memory prefetch or read-under-write functionality either.
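Why latency can cancel out bandwidth is easy to see with a deliberately crude model: assume every request first waits the full ~640 ns, then streams its data at the 562.5 MB/s peak quoted in the spec list, with no overlap or prefetch. Real memory controllers are more complex; the model and the helper name `effective_bandwidth` are illustrative assumptions.

```python
PEAK_BYTES_PER_S = 562.5e6  # 9 bits x ~500 MHz, from the spec list
LATENCY_S = 640e-9          # ~640 ns read latency quoted above

def effective_bandwidth(request_bytes, latency_s=LATENCY_S, peak=PEAK_BYTES_PER_S):
    """Hypothetical model: bytes/s achieved when each request pays the full
    latency before its data streams at the peak rate (no overlap, no prefetch)."""
    transfer_s = request_bytes / peak
    return request_bytes / (latency_s + transfer_s)

for size in (8, 64, 4096):
    print(size, "bytes per request:", round(effective_bandwidth(size) / 1e6, 1), "MB/s")
```

Small, scattered reads, such as a CPU cache-line fill, achieve only a small fraction of the peak, while large DMA-style bursts approach it. This is one reason the CPU's lack of DMA access hurt.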
Despite these drawbacks, the Nintendo 64 hardware was architecturally superior to the PlayStation. It was, however, more challenging to program for and harder to push to peak performance and quality.
One of the best examples of rewritten µcode on the Nintendo 64 was Factor 5's Indiana Jones and the Infernal Machine. The Factor 5 team wanted the game to run in high-resolution mode (640x480) because they liked the crispness it added. Running at 640x480 taxed the machine to the limit, though, so they needed to squeeze every last bit of performance out of the Nintendo 64. First, the Z-buffer could not be used, because it alone consumed a huge amount of the console's texture fillrate. To work around the 4 KB texture cache, the programmers created custom texture formats and tools to help the artists make the best possible textures. The tool analyzed each texture and tried to choose the format that worked best with the machine while looking as good as possible. They used the cartridge as a texture streaming source to squeeze as much detail as possible into each environment and to work around RAM limitations. They also wrote microcode for realtime lighting, because the SGI code was poor at this task and they wanted even more lighting than the PC version had used. Factor 5's microcode allowed almost unlimited realtime lighting and significantly boosted the polygon count. In the end, the game was more feature-filled than the PC version (quite a feat) and, unsurprisingly, was one of the most advanced games for the Nintendo 64.
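A back-of-the-envelope fillrate budget shows why 640x480 was so demanding. This sketch divides the ~30 Mpixel/s Z-buffered fillrate from the spec list by the pixels needed per second, giving how many times each screen pixel can be drawn per frame. The 30 fps target is an assumption for illustration, and the helper name is hypothetical.

```python
FILLRATE_PIXELS_PER_S = 30e6  # ~30 Mpixel/s with Z-buffering, from the spec list

def overdraw_budget(width, height, fps, fillrate=FILLRATE_PIXELS_PER_S):
    """Hypothetical helper: average number of times each screen pixel can be
    drawn per frame before the fillrate is exhausted."""
    return fillrate / (width * height * fps)

print(round(overdraw_budget(320, 240, 30), 1))  # 13.0
print(round(overdraw_budget(640, 480, 30), 1))  # 3.3
```

Quadrupling the pixel count divides the budget by four. A scene that needs more than roughly three layers of overdraw per pixel no longer fits, which is why dropping the Z-buffer's cost mattered so much at high resolution.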
Factor 5 also showed ingenuity with their Star Wars games, Star Wars: Rogue Squadron and Star Wars: Battle for Naboo, where the team again used custom microcode. In Rogue Squadron, the team tweaked the microcode for a landscape engine to create the alien worlds. For Battle for Naboo, they built on what they had learned from Rogue Squadron and pushed the machine even further, running the game at 640x480 and enhancing both the particles and the landscape engine. Thanks to their efforts, Battle for Naboo had an impressive draw distance and large amounts of snow and rain even at the high resolution.
In essence, the Nintendo 64 is a dedicated 3D machine, similar in concept to the 2D-optimised 16bit Super Nintendo - a machine dedicated to doing one task very well. While the machine is capable of 2D graphics and audio processing, its special hardware - the custom chip - has been designed for 3D processing.
The specifications of the machine are widely available. This page aims to be different, so we won't be printing the same material as everyone else. For the benefit of new readers, the N64 has the following capabilities (actual hardware performance is detailed below):
From what has been seen so far, the N64 does not process many more polygons per second than the PlayStation or Saturn. The difference is that its graphics hardware can add smoothing and lighting that the other systems can manage only with difficulty, if at all.
The Nintendo 64 has three main components. Edge magazine provided an excellent photo of the N64 motherboard and information about the various chips:
The heart of the console is the NEC MIPS R4300i processor. This is a low-cost chip based on the workstation-class MIPS R4400. The processor can operate in either a 32bit or 64bit mode and includes both a 64bit integer execution unit and a 64bit floating point unit.
The chip has a single-issue, five-stage instruction pipeline that handles both integer and floating point instructions. It also has two 64bit-wide on-chip caches: a 16k instruction cache and an 8k data cache. The caches provide a 20% performance increase. Memory management is provided off-chip on the RCP.
The R4300i used in the Nintendo 64 is internally clocked at 93.75MHz, which is slower than the original 100MHz design specification but still faster than any other currently available console. Externally, the R4300i uses a 32bit system interface. It can execute 125 Dhrystone MIPS and is rated at 60 SPECint92 and 45 SPECfp92.
The second component of interest is the custom chip, the Reality Co-Processor (RCP). This is a 62.5MHz chip that interfaces directly to the CPU. The RCP is designed to handle most of the audio and graphics processing. In addition, the chip contains DMA logic, audio and video outputs, and a joystick input. It also supports the timing and signals for the game cartridges.
There are two processors inside the RCP:
- RSP: Short for Reality Signal Processor. The RSP performs all 3D manipulations and audio functions. A special feature is that this processor is configurable via microcode, allowing the system to be optimised over time.
- RDP: Reality Drawing Processor. A pixel drawing processor. This unit performs all pixel-level operations, including texture mapping, anti-aliasing, tri-linear interpolation, MIP mapping and z-buffering. The RDP works from a display list to produce its graphics output, meaning the RDP is essentially an object list processor.
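Calling the RDP an object list processor means it walks a list of commands in order, and state set by earlier commands (such as the current texture) applies to later draws. This sketch illustrates only that idea. The opcodes `SET_TEXTURE`, `TRIANGLE`, and `END` are hypothetical stand-ins, not real RDP command names, and Python tuples replace the real binary command encoding.

```python
from typing import List, Optional, Tuple

Command = Tuple[str, object]  # (opcode, argument): hypothetical encoding

def run_display_list(commands: List[Command]) -> List[Tuple[Optional[str], object]]:
    """Walk a display list in order, tracking state and recording draws.

    SET_TEXTURE, TRIANGLE and END are hypothetical opcodes for illustration.
    """
    current_texture: Optional[str] = None
    drawn = []
    for opcode, arg in commands:
        if opcode == "SET_TEXTURE":
            current_texture = arg                # state persists for later draws
        elif opcode == "TRIANGLE":
            drawn.append((current_texture, arg))
        elif opcode == "END":
            break                                # anything after END is ignored
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return drawn

dl = [("SET_TEXTURE", "brick"), ("TRIANGLE", (0, 1, 2)),
      ("SET_TEXTURE", "grass"), ("TRIANGLE", (2, 3, 4)),
      ("END", None), ("TRIANGLE", (9, 9, 9))]
print(run_display_list(dl))  # [('brick', (0, 1, 2)), ('grass', (2, 3, 4))]
```

Because the producer (CPU or RSP) only has to append commands to a list, it can build the next frame's list while the rasterizer consumes the current one.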
SGI claim that the RCP contains a vector processor (probably the RSP) that can perform over half a billion arithmetic operations per second, approximately 10 times the raw compute power of a low-end Pentium. Internally, the RCP performs 128bit processing, although this could be "split" between the RSP and RDP.
The audio is generated by both the R4300i and the RCP, presumably with the CPU providing the music data while the RCP performs the actual sound generation. It is possible to produce 16bit stereo sound at up to 48kHz (higher than CD's 44.1kHz). The number of sound channels is not fixed in hardware; the total number of simultaneous voices depends on the software, although 64 channels is apparently possible.
Designing the RCP required simulations running on six supercomputers, 24 hours a day, for seven days. The result is a processor smaller than a fingernail.
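Because voices are not fixed in hardware, the channel count is simply however many voices the software sums into each output sample. The mixer below is a generic sketch, not Nintendo's audio library. It assumes equal-length lists of signed 16-bit samples and a per-voice volume from 0.0 to 1.0, and `mix_voices` is a hypothetical name.

```python
def mix_voices(voices, volumes):
    """Hypothetical software mixer: scale each voice by its volume, sum them,
    and clamp to the signed 16-bit range to avoid wrap-around distortion.

    Any number of voices works; the cost grows linearly with the count.
    """
    length = len(voices[0])
    out = []
    for i in range(length):
        acc = sum(voice[i] * vol for voice, vol in zip(voices, volumes))
        out.append(max(-32768, min(32767, int(acc))))
    return out

print(mix_voices([[1000, 30000], [2000, 30000]], [1.0, 1.0]))  # [3000, 32767]
```

Every extra voice adds one multiply-add per output sample, which is why the practical channel count depended on how much processor time a game could spare.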
RAMBUS Unified Memory Architecture
There is a trend forming to use a Unified Memory Architecture (UMA) in forthcoming systems. The abortive M2 console also features a UMA. Although the Atari Jaguar was the first console to use a UMA, the Nintendo 64 is the most successful. A UMA uses a single chunk of RAM. All processors access and share the same memory.
The advantage of using a UMA is that the programmer can decide how much memory to allocate to program code, graphics data, the screen buffer, audio data, etc. In older systems (including the Sony PlayStation and Sega Saturn), memory was assigned to specific tasks (1 megabyte of VRAM, 2 megabytes of main RAM, etc.). If a game does not use all of the sound memory, the remainder cannot be used for bitmaps. The UMA is a more flexible architecture.
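With a UMA, the split of memory is just a layout the game chooses. This sketch carves the 4 MiB pool into regions in order. The region names and sizes (16-bit 320x240 frame and depth buffers, 256 KiB of audio, 1 MiB of code and data) are illustrative assumptions, not figures from any real game, and `plan_memory` is hypothetical.

```python
RAM_BYTES = 4 * 1024 * 1024  # 4 MiB of RDRAM in the base console

def plan_memory(requests, total=RAM_BYTES):
    """Hypothetical planner: carve one shared pool into named regions in order.
    Returns {name: (offset, size)}; raises MemoryError if the pool runs out."""
    layout = {}
    offset = 0
    for name, size in requests:
        if offset + size > total:
            raise MemoryError(f"{name} needs {size} bytes, only {total - offset} left")
        layout[name] = (offset, size)
        offset += size
    return layout

fb = 320 * 240 * 2  # one 16-bit 320x240 buffer = 153,600 bytes (assumed format)
plan = plan_memory([("framebuffer0", fb), ("framebuffer1", fb), ("zbuffer", fb),
                    ("audio", 256 * 1024), ("code_and_data", 1024 * 1024)])
used = sum(size for _, size in plan.values())
print("left for textures and models:", RAM_BYTES - used, "bytes")
```

A game that skips the Z-buffer, as described for Factor 5's titles, simply omits that region and gives the space to something else. A fixed VRAM/main RAM split cannot be rebalanced this way.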
The Nintendo 64 uses 36Mbits of RAMBUS 9bit DRAM. In more common terms, this equates to 4 megabytes. The extra bit is presumably used for parity. The chips use a fast 9bit channel running at approximately 500MHz. This yields a bandwidth of 4,500Mbit/sec, or 562.5MB/sec (exact benchmarks vary). The high speed of the memory, coupled with a 256k cache for both the R4300i and RCP, allows the system to make full use of the UMA.
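The bandwidth figure follows directly from bus width times transfer rate. This sketch treats the quoted ~500MHz as one 9-bit transfer per cycle; `bus_bandwidth` is a hypothetical helper.

```python
def bus_bandwidth(bus_bits, transfers_per_s):
    """Hypothetical helper: peak rate of a bus moving bus_bits per transfer.
    Returns (bits per second, bytes per second)."""
    bits_per_s = bus_bits * transfers_per_s
    return bits_per_s, bits_per_s / 8

bits, bytes_ = bus_bandwidth(9, 500e6)
print(bits / 1e6, "Mbit/s")                 # 4500.0 Mbit/s
print(bytes_ / 1e6, "MB/s")                 # 562.5 MB/s
print(round(bytes_ / 2**20, 1), "MiB/s")    # 536.4 MiB/s
```

If the ninth bit is not counted as data, the same calculation with 8 bits gives 500 MB/sec. Quoting the figure in binary MiB instead of decimal MB also lowers the number.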
It is rumoured that the 64DD will have an extra 4 megabytes of RDRAM which will plug into the front expansion connector on the top of the N64 taking the total amount of RAM in the system to 8 megabytes.
How it works:
Much of the following is courtesy of BYTE magazine, an excellent read about the serious side of the computer industry - recommended!
The Nintendo 64 partitions audio and graphics into separate tasks. The R4300i works as the central controller and interrupt handler. It also handles all high-level audio processing functions (the number of channels depends on what else the CPU is doing).
For example, the R4300i uses the FPU to synthesise high-precision audio waveforms. The RCP handles the jobs where software algorithms alone can't meet the bandwidth requirements. To generate sounds, the R4300i processes a list of musical events (for example, MIDI notes) to determine the resource and timing requirements. It then builds a digital signal processing command list, starts a DMA transfer of data from mass storage to main memory, and moves on to the next task. The RCP parses the command stream and processes the data in main memory. The DMA controller then sends the processed data to a digital-to-analog converter (DAC) for sound generation.
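The CPU-side step described above, turning musical events into a command list for the coprocessor, might look roughly like this. The opcodes (`LOAD_SAMPLE`, `RESAMPLE`, `MIX`, `OUTPUT`), the `NoteEvent` type, and the assumption that every sample was recorded at A4 (MIDI note 69) are all hypothetical. The sketch shows only the shape of the work: the CPU decides what happens and when, and the heavy sample processing is deferred to whoever executes the list.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class NoteEvent:
    time_s: float  # when the note starts
    note: int      # MIDI note number
    sample: str    # name of the instrument sample (hypothetical)

def build_audio_commands(events: List[NoteEvent], frame_start: float,
                         frame_len: float) -> List[Tuple[str, object]]:
    """Hypothetical CPU-side step: turn events due in this audio frame into a
    command list for the coprocessor to execute later."""
    commands: List[Tuple[str, object]] = []
    frame_end = frame_start + frame_len
    for ev in sorted(events, key=lambda e: e.time_s):
        if frame_start <= ev.time_s < frame_end:
            commands.append(("LOAD_SAMPLE", ev.sample))  # fetch sample data into memory
            pitch = 2 ** ((ev.note - 69) / 12)           # ratio vs. an assumed A4 recording
            commands.append(("RESAMPLE", pitch))
            commands.append(("MIX", ev.time_s - frame_start))  # offset within the frame
    commands.append(("OUTPUT", frame_len))               # hand the mixed buffer onward
    return commands

events = [NoteEvent(0.01, 69, "piano"), NoteEvent(0.5, 60, "flute")]
print(build_audio_commands(events, 0.0, 1 / 60))
```

Only the piano note falls inside the first 1/60-second frame. The flute note is picked up in a later call, so the CPU's work per frame stays small and predictable.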
To generate graphics, the R4300i can readily create and manipulate models (3-D objects described as a mesh of polygons) for use in game scenes. When the game code needs to update the position and attributes of the models, the R4300i can handle these updates in real time. The models are then passed to the graphics coprocessor, which performs the matrix manipulation and renders the image. The R4300i's 64-bit mode gives game developers extra precision for models and other calculations without having to write high-precision algorithms or incur a performance penalty.
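One concrete form of the extra precision is the difference between 32-bit and 64-bit IEEE 754 floating point. The sketch below assumes that comparison is the relevant one. It rounds a Python float (which is 64-bit) to 32-bit precision with the standard `struct` module. Above 2**24, single precision can no longer represent every integer, so a small offset applied to a large coordinate can silently vanish.

```python
import struct

def as_float32(x):
    """Round a Python float (IEEE 754 double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

big = 16_777_216.0          # 2**24, the limit of exact integers in float32
print(as_float32(big + 1))  # 16777216.0 - the +1 is lost in single precision
print(big + 1)              # 16777217.0 - preserved in double precision
```

The same effect appears with fractional values at smaller magnitudes, which is where wider formats help with large worlds or accumulated transforms.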