ECE 411 is one of the options for UIUC’s computer engineering senior design course. I chose to take it because I wanted to learn more about how CPUs work and the design decisions that go into them. It was especially intriguing because I had just finished a summer internship at Samsung, where I had learned a lot about GPU hardware design, and I was curious how the differences in workload would manifest in the physical design.
The Final Project
The final project for ECE 411 is to build your own RISC-V processor in SystemVerilog with a group. By that point we had already assembled some components like a cache and state machine-based processor, but for the final project we had to implement pipelining, which drastically changes the main CPU as well as how it connects to external components like the cache.
Our processor would be judged on how fast it ran benchmark code that we would not see in advance, though we were given some other programs to test with ahead of time.
The Stages of Development
Checkpoint 1: A Pipelined Processor
Our first goal was simply to create the pipelined processor. We began by defining the stages of the pipeline and identifying the data that would have to pass between stages. We created a large diagram to organize our design and prepare for future changes.
Checkpoint 2: Forwarding and Hazard Detection
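The next step was to implement forwarding, which lets an instruction use a result from an earlier instruction that is still in the pipeline instead of waiting for it to be written back to the register file. On its own, however, forwarding can produce errors, so it has to be paired with hazard detection, which checks when forwarding will produce valid results and stalls the pipeline when it will not.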
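To make the idea concrete, here is a minimal Python sketch of the forwarding and hazard checks. It is a behavioral model, not the project's SystemVerilog, and it assumes a classic five-stage pipeline (IF, ID, EX, MEM, WB). The names StageRegs, forward_select, and load_use_stall are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StageRegs:
    """Subset of a pipeline register (hypothetical field names)."""
    rd: int = 0              # destination register index
    reg_write: bool = False  # instruction writes the register file
    mem_read: bool = False   # instruction is a load


def forward_select(rs, ex_mem, mem_wb):
    """Pick where the EX stage should read source register rs from.

    The newest producer (EX/MEM) wins over the older one (MEM/WB).
    Register x0 is hardwired to zero in RISC-V, so it is never forwarded.
    """
    if rs != 0 and ex_mem.reg_write and ex_mem.rd == rs:
        return "EX_MEM"
    if rs != 0 and mem_wb.reg_write and mem_wb.rd == rs:
        return "MEM_WB"
    return "REGFILE"


def load_use_stall(id_rs1, id_rs2, id_ex):
    """Hazard detection: a load in EX has no data to forward yet,
    so an instruction in ID that needs its result must stall a cycle."""
    return id_ex.mem_read and id_ex.rd != 0 and id_ex.rd in (id_rs1, id_rs2)
```

After the one-cycle stall, the load's data sits in the MEM/WB register, so the MEM_WB forwarding path can supply it to the dependent instruction.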
At this point we also connected our cache to the system with a simple arbiter module to control signals out to memory.
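A simple arbiter of this kind can be modeled in a few lines. The Python sketch below is only an assumption about the general shape of such a module: it supposes two requesters (an instruction side "I" and a data side "D") sharing one memory port, gives the data side priority, and holds the grant until memory reports completion. The class name Arbiter and its signals are hypothetical.

```python
class Arbiter:
    """Cycle-level model of a fixed-priority arbiter for one memory port."""

    def __init__(self):
        self.owner = None  # "I", "D", or None when the port is idle

    def step(self, i_req, d_req, mem_done):
        """Advance one cycle and return which requester holds the port.

        A new grant is made only when the port is idle, so a transaction
        in flight is never interrupted. Data requests win ties here, which
        is one possible policy and not necessarily the right one.
        """
        if self.owner is None:
            if d_req:
                self.owner = "D"
            elif i_req:
                self.owner = "I"
        grant = self.owner
        if grant is not None and mem_done:
            self.owner = None  # transaction finishes at the end of this cycle
        return grant
```

Holding the grant for the whole transaction keeps the memory interface simple at the cost of making the other requester wait.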
Checkpoint 3: Advanced Design Features
Each group could choose its own set of advanced design features to implement to satisfy the last requirement of the project. Our group chose branch prediction, a parameterized cache, an L2 cache, an eviction write buffer, and the RISC-V M extension.
I personally worked the most on branch prediction, the parameterized cache, and multiplication/division. Branch prediction was something we all worked on integrating into our design very early. Parameterizing the cache was especially interesting to me because I love generalizing code and had been thinking about how to do so for a while at that point. When I was finished with those, the last feature left was the M extension. We used a basic division algorithm but a more advanced Wallace tree method for multiplication.
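For readers unfamiliar with Wallace trees, the Python sketch below models the core idea: generate one partial product per set multiplier bit, then repeatedly compress groups of three rows into two using carry-save adders until only two rows remain, and finish with a single ordinary addition. It is a word-level illustration for unsigned operands, not the hardware design; the function names csa and wallace_multiply and the width parameter are hypothetical.

```python
def csa(a, b, c):
    """3:2 carry-save compressor on whole words.

    Returns (s, carry) such that a + b + c == s + carry, with no carry
    propagating between bit positions inside the compressor.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_multiply(x, y, width=32):
    """Multiply two unsigned width-bit integers, Wallace-tree style."""
    mask = (1 << (2 * width)) - 1
    # One shifted copy of x for every set bit of y.
    rows = [(x << i) & mask for i in range(width) if (y >> i) & 1]
    # Each pass is one layer of the tree: every full group of three rows
    # becomes two, and any leftover rows pass through unchanged.
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            s, c = csa(rows[i], rows[i + 1], rows[i + 2])
            nxt.extend([s, c & mask])
        nxt.extend(rows[len(rows) - len(rows) % 3:])
        rows = nxt
    # Final carry-propagate addition of the last two rows.
    return sum(rows) & mask
```

Because each layer shrinks the row count by roughly a third, the number of layers grows logarithmically with the operand width, which is where the speed advantage over adding partial products one at a time comes from.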
Details on these features and their performance implications can be found in the accompanying report.