Memory Interfacing and Pipelined Functions
- Dec 12, 2018
Project Details
Title: Memory Interfacing and Pipelined Functions
Class: Computer Design
Semester: Fall 2018
Programming Language: Verilog
Software: Xilinx Vivado
Hardware: Digilent’s ZYBO FPGA board
Introduction
As table 1 shows, we were provided with some of the functions. They ensure that user_logic.v can access the memory and that the memory contents can be displayed on the screen. This lab has three parts. In the first part, we drive a memory bus to access the memory. In the second part, we fill the memory using a function with a rotation, and the screen displays the corresponding memory contents. The third part builds on part two: after part two's values are written into the buffer, we read each value back from the buffer, rewrite it, and display the result on the screen.

Part one: Memory Bus Protocol
We are creating an initiator that connects to a target, as shown in figure 1. The initiator outputs px_request, px_write, px_col_add, px_row_add and px_write_data, and the target takes these as inputs. We place the column and row address values into px_write_data and wait for the target. While px_request is high, px_write selects the mode: high means a write and low means a read. When the target raises px_ready, the data is written to the target. In our case, the target is the graphics frame buffer.
When button[1] is pressed, nothing happens until it is released. Once we detect that button[1] has been released, the screen displays the memory.
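The press-then-release behaviour can be sketched as a small behavioral model. It is written in Python rather than Verilog so that it runs on its own. The state names A and B follow the text. The function name and the 0/1 sample encoding are assumptions, and the button input is assumed to be already debounced. In the real design, the release moves the FSM on to state C. Here the model simply records the cycle and returns to A.

```python
STATE_A, STATE_B = "A", "B"  # A: wait for press, B: wait for release


def release_events(samples):
    """Return the cycles on which a press-then-release of the button completes.

    samples: per-cycle button level (1 = pressed, 0 = released), assumed debounced.
    Holding the button down does nothing; only the release triggers an event.
    """
    state = STATE_A
    events = []
    for cycle, pressed in enumerate(samples):
        if state == STATE_A and pressed:
            state = STATE_B  # button went down: wait for it to come back up
        elif state == STATE_B and not pressed:
            events.append(cycle)  # released: the display pass would start here
            state = STATE_A  # simplification: the real FSM continues to state C
    return events
```

For example, `release_events([0, 1, 1, 1, 0, 0, 1, 0])` reports the two releases at cycles 4 and 7. The three cycles in which the button is held produce nothing.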
The design of the state machine is shown in figure 2. States A and B are straightforward. In state C, I step through every pixel displayed on the screen, and the pixel grid is 255 x 255. Once the state machine has gone through all the pixels, we set px_request and px_write low. Otherwise, it keeps looping through the screen. I also implemented the sequential block: on every clock cycle, if reset is high, everything is reset. Otherwise, the present state takes the value of the next state. Since the other functions were provided, I can assign the output data and display it directly on the screen.
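The pixel scan in state C can be sketched as a cycle-level Python model. This is not the actual Verilog. The model assumes that a transfer completes on any cycle where px_request and px_ready are both high. The target's px_ready behaviour is supplied as a hypothetical per-cycle sequence, which defaults to an always-ready target.

```python
import itertools


def scan_frame(width, height, pixel_value, ready_trace=None):
    """Cycle-level model of state C: write pixel_value(col, row) to every pixel.

    The initiator keeps px_request and px_write high and holds the current
    address until the target answers with px_ready; only then does the
    column/row counter advance. After the last pixel is accepted, px_request
    and px_write go low (back to state A).

    Returns (frame, cycles): the written pixels and the cycles spent in state C.
    """
    if ready_trace is None:
        ready_trace = itertools.repeat(True)  # hypothetical always-ready target
    frame = {}
    col = row = 0
    px_request = px_write = True
    cycles = 0
    for px_ready in ready_trace:
        if not px_request:
            break  # frame finished: request is low, nothing left to do
        cycles += 1
        if px_ready:  # transfer happens: request and ready both high
            frame[(col, row)] = pixel_value(col, row)
            if col == width - 1 and row == height - 1:
                px_request = px_write = False  # whole frame written
            elif col == width - 1:
                col, row = 0, row + 1  # wrap to the start of the next row
            else:
                col += 1
    return frame, cycles
```

With an always-ready target, a width x height frame takes exactly width x height cycles in state C. Each cycle in which px_ready is low adds one cycle while the address is held.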
As shown in figure 5, the design uses 154 registers and 235 LUTs as logic.
The WNS is 2.605ns (figure 3), and the critical path is 7.661ns (figure 6).

Figure 1: The relationship between the initiator and the target

Figure 2: The state machine of part one

Figure 3: The worst negative slack

Figure 4: the ASM of part one’s design

Figure 5: the on-FPGA resource from part one

Figure 6: the positive arrival time is the critical path based on my design
Part two: The Design of the Pipelined Functions
The memory is filled according to the equation $P_{x,y} = \mathrm{RotateLeft}\big(((x - 128)^2 + (y - 128)^2)^2,\ k\big)$. Each time button[1] is pressed, a local variable k is incremented, and the result of the equation is rotated left by k bits. Since k is a 4-bit number, there are $2^4 = 16$ different results. If I feed the column and row addresses directly into the function as x and y, the design violates timing. Therefore, the function needs to be pipelined, and to do that we split it into several parts:
$$
\begin{aligned}
U_{x,y} &= (x - 128)^2 \\
V_{x,y} &= (y - 128)^2 \\
S_{x,y} &= U_{x,y} + V_{x,y} \\
W_{x,y} &= S_{x,y}^2 \\
P_{x,y} &= \mathrm{RotateLeft}(W_{x,y},\ k)
\end{aligned}
$$
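A software reference model of the equation is useful for checking the hardware output pixel by pixel. The sketch below is written in Python. The 32-bit rotation width is an assumption, because the write-up only states that k is 4 bits. That width was chosen because W never exceeds 2^30 for coordinates between 0 and 255. A hardware version also has to handle the sign of x − 128 correctly. The value is negative for x < 128, so it must be computed as a signed value (or as |x − 128|) before squaring.

```python
def rotate_left(value, k, width=32):
    """Rotate a width-bit value left by k bits; bits shifted out re-enter at bit 0."""
    mask = (1 << width) - 1
    value &= mask
    k %= width
    return ((value << k) | (value >> (width - k))) & mask


def pixel(x, y, k, width=32):
    """Reference value of P(x, y) = RotateLeft(((x-128)^2 + (y-128)^2)^2, k)."""
    u = (x - 128) ** 2  # U(x, y)
    v = (y - 128) ** 2  # V(x, y)
    s = u + v  # S(x, y), at most 2 * 128^2 = 2^15
    w = s * s  # W(x, y), at most 2^30, so it fits in 31 bits
    return rotate_left(w, k & 0xF, width)  # k is a 4-bit register
```

At the centre (128, 128) the value is 0 for every k. At (0, 0), W = 2^30, so a rotation by 2 moves that bit around the top of the assumed 32-bit word and back to bit 0.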
Since $U_{x,y}$ and $V_{x,y}$ are fast to compute, they do not need to be pipelined. Instead, I pipeline the function by inserting a register after $S_{x,y}$. This way, while $W_{x,y}$ and $P_{x,y}$ are being computed for one pixel, the next value of $S_{x,y}$ is already being computed, so it is ready as soon as $P_{x,y}$ is done. After pipelining, the design met timing. The pipeline design is shown in figure 6.
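The effect of the register after S can be seen in a small cycle-by-cycle Python model. This sketch assumes one pipeline register and a 32-bit rotation width. In each cycle, stage 1 produces S for the incoming pixel. At the same time, stage 2 squares and rotates the S that was captured on the previous clock edge. The function name is hypothetical.

```python
def pipelined_pixels(coords, k, width=32):
    """Cycle-by-cycle model of a two-stage pipeline with one register after S.

    Stage 1 (combinational): U = (x-128)^2, V = (y-128)^2, S = U + V.
    Pipeline register:       s_reg <= S on each clock edge.
    Stage 2 (combinational): W = s_reg^2, P = RotateLeft(W, k).
    Returns (cycle, P) pairs; each P appears one cycle after its (x, y).
    """
    mask = (1 << width) - 1
    k %= width
    outputs = []
    s_reg = None  # pipeline register; None models a bubble (no valid data)
    stream = list(coords) + [None]  # one extra cycle to drain the pipeline
    for cycle, xy in enumerate(stream):
        # Stage 2 works on the value captured at the previous clock edge.
        if s_reg is not None:
            w = (s_reg * s_reg) & mask
            p = ((w << k) | (w >> (width - k))) & mask
            outputs.append((cycle, p))
        # Stage 1 computes the next S during the same cycle, in parallel.
        if xy is None:
            s_next = None
        else:
            x, y = xy
            s_next = (x - 128) ** 2 + (y - 128) ** 2
        s_reg = s_next  # clock edge: the register captures S
    return outputs
```

Each result comes out one cycle after its coordinates go in, and a new result still appears every cycle. This is why the register shortens the longest combinational path without lowering the pixel rate.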
In the state machine, k is incremented whenever button[1] is released. The value of k drives a combinational rotation block, which rotates each pixel's result by k bits. This state machine is built on top of the one from part one.
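Because k is a 4-bit register, incrementing it on every release wraps from 15 back to 0. This wrap is where the 16 distinct results come from. Below is a minimal Python sketch. It assumes that k resets to 0 and that the button samples are already debounced, and the function name is hypothetical.

```python
def k_sequence(button_samples, bits=4):
    """Return the value of k after each release of the button.

    k starts at 0 (assumed reset value) and is incremented on every
    pressed-to-released transition, wrapping modulo 2**bits like a
    hardware register of that width.
    """
    k = 0
    ks = []
    prev = 0
    for now in button_samples:
        if prev == 1 and now == 0:  # falling edge: button released
            k = (k + 1) % (1 << bits)  # 4-bit register wraps 15 -> 0
            ks.append(k)
        prev = now
    return ks
```

After 17 presses, k runs through 1 to 15, wraps to 0 on the 16th release, and is 1 again on the 17th.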
As shown in figure 8, part two uses only 129 registers and 198 LUTs as logic. That is fewer registers and LUTs than part one, even though part two implements more functionality, which was really surprising.
As shown in figures 7 and 9, the WNS is 2.386ns, and the critical path according to my timing report is 7.797ns.

Figure 6: The block diagram of the pipelined function

Figure 7: the worst negative slack

Figure 8: the on-FPGA resource from part 2

Figure 9: the positive arrival time is the critical path based on my design
Part three: Post-processing the Frame Buffer
For part three, we manipulate the frame buffer further by reading values from it and writing them back. Since part two only has a write state, I needed to add several new states to the state machine. The design is shown in figure 10.

Figure 10: The ASM design of part three
In state A, we check whether button[1] is pressed. If it is, we go to state B. Otherwise, we check whether button[2] is pressed, and if it is, we go to state D.
In state B, we check whether button[1] has been released. If not, we stay in state B. Otherwise, we increment k and go to state H.
State H is an extra state that makes the result wait one more clock cycle, so that pxy does not try to overwrite px_write_data_nxt in the next clock cycle. In state H, we select write mode, request the data, and go to state C.
In state C, when the pixel is ready to be written, we update the counter and check whether it has reached the end of the frame buffer. If it has, we set request and write low and return to state A.
In state D, we check whether button[2] has been released. Once it has, we select read mode, start requesting data, and go to state E.
In state E, when the pixel is ready to be manipulated, the function requests the data. If nothing has changed, the next write data is set to the present write data. The transformation that makes the screen darker or lighter, or swaps red and blue, is also done here: we take the data from px_read_data, modify it according to the user input, and place the result on px_write_data. Next, we go to state F.
In state F, we wait for px_ready, then update the counter and check whether we have reached the last pixel. If we have, we go to state A. Otherwise, we go back to state D and keep looping until the last pixel is reached.
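The read-modify-write loop of states D to F, together with the three effects, can be sketched as a behavioral model in Python. The pixel encoding (24-bit 0xRRGGBB) is an assumption made for illustration. The exact arithmetic for "darker" and "lighter" and the mode names are also assumptions, because the write-up does not specify them. The handshake is abstracted away, so each read and write completes immediately.

```python
def transform(rgb, mode):
    """Apply one post-processing effect to a packed 0xRRGGBB pixel.

    Assumed effects (hypothetical, for illustration only):
      "darker":  halve each channel
      "lighter": move each channel halfway toward 255
      "swap_rb": exchange the red and blue channels
      anything else: unchanged, i.e. write back what was read
    """
    r, g, b = (rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF
    if mode == "darker":
        r, g, b = r >> 1, g >> 1, b >> 1
    elif mode == "lighter":
        r, g, b = r + (255 - r) // 2, g + (255 - g) // 2, b + (255 - b) // 2
    elif mode == "swap_rb":
        r, b = b, r
    return (r << 16) | (g << 8) | b


def post_process(frame, mode):
    """Read-modify-write every pixel once, in address order.

    Each iteration mirrors one trip around the D -> E -> F loop: a read
    (px_write low) returns px_read_data, the new value is computed, and a
    write (px_write high) stores it at the same address before the counter
    moves on to the next pixel.
    """
    for addr in sorted(frame):
        px_read_data = frame[addr]  # read transaction completes
        px_write_data = transform(px_read_data, mode)
        frame[addr] = px_write_data  # write transaction completes
    return frame
```

Because every pixel is read before it is rewritten, each effect is applied to the values that part two left in the buffer. In this model, nothing else writes to the frame while the loop runs.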
According to the Vivado resource report (figure 11), part three uses 288 LUTs, with 185 registers used as flip-flops and 17 used as latches. As shown in figures 12 and 13, the WNS is 3.241ns.
Unfortunately, part three does not work together with part two, although each works fine on its own. I suspect the cause is that the two parts both drive px_write_data and overwrite each other's values.

Figure 11: the on-FPGA resource from part 3

Figure 12: The worst negative slack

Figure 13: the positive arrival time is the critical path based on my design
Conclusion
This was a very hard project, but I found it very interesting to explore higher-level design. It is fascinating that introducing a pipeline lets the calculation meet timing, and that part two, although built on top of part one, uses fewer registers and LUTs than part one. I also enjoyed manipulating the frame buffer and using VGA to connect it to the screen, because we could visualise the output and its effects directly.
However, designing and coding were very hard. The three parts build on each other, so the design becomes more and more complex. I needed help on each part from the professor and the TA, who gave me many ideas on how to approach the project and how to design the finite state machines. Before debugging part two, I checked that my k value was incremented and that my function's output changed correspondingly when I rotated the bits manually. However, when I connected k to the rotation function, it only worked for one cycle. I spent a lot of time debugging it and could not find the problem. With the TA’s help, I finally realised that I did not reset the counter to zero when it finished counting. The rotation only happened on the last bit, and the change was so subtle that I did not notice it. Once we realised this, we fixed it quickly, and it worked perfectly.
Part three was the hardest part of the entire project for me. Since we need to read pixels from and write pixels to the frame buffer, we had to add more states to the state machine for it to work properly. With the professor's and TA’s help, the design became very clear to me. I tried to follow the design as closely as possible, but the combined design still does not work properly, even though each part works on its own. I spent a lot of time debugging this problem. I believe the problem is that part two and part three overwrite each other's px_write_data. I tried many approaches, but none of them fixed it.
Overall, this project taught me a lot about FSMs, pipelined functions, and the background of frame buffers. I really enjoyed it.
Video













![[Embedded Low Power] Temperature Display](https://img.youtube.com/vi/rQbG31gI3uw/mqdefault.jpg)









