FIFOs - Architecture and Design
Introduction
A designer encounters FIFOs in speed matching or data width matching applications. An example of speed matching is when data is being transferred in bursts from a faster clock domain to a slower clock domain that is sampling the data. An example of data width matching is where the sampling clock domain is faster but the data width is narrower than the write side. FIFOs can be synchronous or asynchronous, i.e. the read and write clocks can be synchronous or asynchronous to each other.
Full and Empty flags
The FIFO full and empty status conditions are derived from the write and read pointers of the FIFO. The write pointer always points to the next word to be written and is incremented on a write to the FIFO. The read pointer always points to the current word to be read, so the valid data is already driven onto the output port when a read is requested, which keeps the design efficient.
When the read and write pointers are equal, the FIFO can be either full or empty because of wraparound. To distinguish the two cases, an extra bit is added to each pointer. If the MSBs of the pointers differ while the remaining bits are equal, the write pointer has wrapped around once more than the read pointer and the FIFO is full. If all bits, including the MSBs, are equal, the FIFO is empty.
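The extra-bit comparison can be sketched as a small behavioural model. This is only an illustration for binary pointers; the 8-entry depth and the names ADDR_BITS, is_empty, is_full and advance are assumptions made for the example, not part of any particular design.

```python
ADDR_BITS = 3                          # hypothetical: 8-entry FIFO
DEPTH = 1 << ADDR_BITS
PTR_MASK = (1 << (ADDR_BITS + 1)) - 1  # pointers carry one extra wrap bit


def is_empty(wr_ptr, rd_ptr):
    """Empty: every bit matches, including the extra wrap bit."""
    return wr_ptr == rd_ptr


def is_full(wr_ptr, rd_ptr):
    """Full: address bits match but the extra wrap bits differ."""
    return (wr_ptr ^ rd_ptr) == DEPTH


def advance(ptr):
    """Increment a pointer modulo 2**(ADDR_BITS + 1)."""
    return (ptr + 1) & PTR_MASK


wr_ptr = rd_ptr = 0
print("after reset :", is_empty(wr_ptr, rd_ptr), is_full(wr_ptr, rd_ptr))

for _ in range(DEPTH):                 # write DEPTH words with no reads
    wr_ptr = advance(wr_ptr)
print("after writes:", is_empty(wr_ptr, rd_ptr), is_full(wr_ptr, rd_ptr))
```

Note that this comparison applies to binary pointers; Gray-coded pointers need a slightly different full test.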
Pointer Synchronization
Synchronizing the read and write pointers in an asynchronous FIFO is necessary since the write pointer is generated in the write clock domain and the read pointer is generated in the read clock domain. To generate the empty and full status flags, these pointers must be transferred from one domain to the other.
Several techniques exist for synchronizing the pointers. One method is to synchronize the read and write strobes and use counters in the read and write domains. The read counter tracks the number of valid data entries while the write counter tracks the number of entries available to store data. The read counter is decremented on each read strobe, and the read strobe is synchronized to the write clock before it increments the write counter. Similarly, the write strobe decrements the write counter and is synchronized to the read clock before it increments the read counter.
The strobes are synchronized using toggle synchronizers and indicate pessimistic empty/full status, as there is latency in synchronization. The disadvantages of this method are that large counters are required for large FIFOs and that, since strobes must be spaced at least two cycles apart in the slow clock domain (see toggle synchronizer), the achievable data rate is low.
Another method is to synchronize the read and write pointers themselves. This is problematic with binary pointers, since more than one bit can change at a time and the synchronized value is then unpredictable. The solution is to use Gray code counters, which change only one bit at a time, synchronize the Gray-coded pointers, and generate the empty and full flags from them.
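To see why binary pointers are risky, the following sketch enumerates the values a synchronizer might capture if it samples a pointer in the middle of an increment. It assumes a simplified model in which each changing bit is independently seen as either its old or its new value; the helper names possible_samples and to_gray are made up for the example.

```python
from itertools import product


def possible_samples(old, new, width):
    """Values that might be captured while `old` changes to `new`,
    assuming each changing bit is seen as either old or new."""
    changing = [b for b in range(width) if ((old ^ new) >> b) & 1]
    results = set()
    for choice in product((0, 1), repeat=len(changing)):
        value = old
        for bit, take_new in zip(changing, choice):
            if take_new:
                value ^= 1 << bit      # this bit already shows its new value
        results.add(value)
    return results


def to_gray(n):
    """Binary to Gray: XOR the value with itself shifted right by one."""
    return n ^ (n >> 1)


# Pointer increments from 7 to 8 (4-bit pointer).
binary_seen = possible_samples(7, 8, 4)
gray_seen = possible_samples(to_gray(7), to_gray(8), 4)

print("binary:", sorted(binary_seen))
print("gray  :", sorted(gray_seen))
```

With the binary pointer all four bits change, so any 4-bit value can be captured. With the Gray pointer only one bit changes, so the sampled value is either the old or the new pointer.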
FIFO Depth
Calculating the depth of the FIFO requires the write and read clock frequency relation, burst rate on the write clock domain, synchronization latency and any idle cycles in the read domain.
Scenario 1:
Consider the case of a FIFO where the write clock frequency is 100 MHz and 50 words are written into the FIFO in 100 clocks while the read clock frequency is 50 MHz and one word is read out every clock.
In the worst case scenario, the 50 words are written into the FIFO as a burst in 500 ns. In the same time duration, the read side can read only 25 words out of the FIFO. The remaining 25 words are read out of the FIFO during the 50 idle write clocks. So the depth of the FIFO should be at least 25 (+ synchronizer latency) = ~28.
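The FIFO depth is calculated as
Depth = Burst_size * { 1 - (Frd/(Fwr * Idle_cycles)) }
where Idle_cycles is taken as the number of read clocks per word read (1 in this scenario, since a word is read every clock), giving 50 * { 1 - (50/(100 * 1)) } = 25 before synchronizer latency is added.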
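The depth formula can be checked numerically with a short sketch. It reads Idle_cycles as the number of read clocks per word, which is the interpretation that reproduces the Scenario 1 result; the function name fifo_depth, the parameter names and the sync_margin value of 3 are assumptions chosen to match the ~28 quoted above.

```python
import math


def fifo_depth(burst_size, f_wr_hz, f_rd_hz, rd_clocks_per_word=1, sync_margin=0):
    """Depth = Burst_size * (1 - Frd / (Fwr * rd_clocks_per_word)) + margin.

    Assumes the burst is written at one word per write clock.
    """
    drained_fraction = f_rd_hz / (f_wr_hz * rd_clocks_per_word)
    backlog = burst_size * (1 - drained_fraction)
    return max(0, math.ceil(backlog)) + sync_margin


# Scenario 1: 50-word burst, 100 MHz write, 50 MHz read, one word per read clock.
print(fifo_depth(50, 100e6, 50e6))                 # words left at end of burst
print(fifo_depth(50, 100e6, 50e6, sync_margin=3))  # with an assumed 3-word margin
```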
Scenario 2:
Consider the case of a FIFO where the write clock frequency is 100 MHz and 80 words are written into the FIFO in 100 clocks while the read clock frequency is 80 MHz and 8 words are read out every 10 clocks. There is no feedback mechanism to throttle the writes to the FIFO.
In the worst case, the write side writes 80 words into the FIFO as a burst in 800 ns. In that time, the read side can read only ~51 words ( (800/125) * 8 ), since it reads 8 words every 125 ns (10 clocks at 80 MHz). In the remaining 200 ns, only ~13 more words ( (200/125) * 8 ) can be read out of the FIFO, leaving 16 words on the floor in every 100 write clocks. Because this backlog grows with every burst, the FIFO would need to be of infinite depth to make this design work!
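The same conclusion follows from comparing the sustained write and read rates, as in this small check. The helper name words_per_second is made up for the example; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def words_per_second(f_clk_hz, words, clocks):
    """Average throughput when `words` are moved every `clocks` cycles."""
    return Fraction(f_clk_hz) * words / clocks


write_rate = words_per_second(100_000_000, 80, 100)  # 80 words per 100 clocks
read_rate = words_per_second(80_000_000, 8, 10)      # 8 words per 10 clocks

# Duration of one 100-clock write window at 100 MHz, in seconds.
window = Fraction(100, 100_000_000)
backlog_per_window = (write_rate - read_rate) * window

print("write rate (words/s):", write_rate)
print("read rate  (words/s):", read_rate)
print("words left over per window:", backlog_per_window)
```

Whenever the average write rate exceeds the average read rate, the leftover words accumulate window after window and no finite depth is enough.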
For more details on FIFO design and Verilog code, the reader is referred to Cliff Cummings' paper on asynchronous FIFOs.
Gray code counters
While designing modules with asynchronous clock transfers, one may encounter the problem of transferring a multi-bit data bus from one clock domain to another. Double-synchronizing each bit and hoping that all the bits are latched on the same clock is unreliable. To eliminate this problem, we use Gray code counters, where only one bit changes on each clock transition.
The most common Gray code is the one where the second half of the sequence is exactly the mirror image of the first half with only the MSB inverted. We illustrate the 3-bit binary Gray code as an example.
Gray code counter schematic (from Cliff Cummings' paper)
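The mirror-image property gives a simple way to list the code. The sketch below builds the sequence by repeatedly reflecting it and setting the new MSB in the mirrored half; the function name reflected_gray is an assumption for the example.

```python
def reflected_gray(width):
    """Build the reflected Gray code sequence for `width` bits."""
    codes = [0]
    for bit in range(width):
        # Second half = mirror image of the first half with the new MSB set.
        codes += [code | (1 << bit) for code in reversed(codes)]
    return codes


sequence = reflected_gray(3)
for index, code in enumerate(sequence):
    print(f"{index}  binary {index:03b}  gray {code:03b}")
```

Each entry differs from its neighbour in exactly one bit, including the step from the last entry back to the first.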
Gray code to equivalent binary conversion is simple and is as shown below (^ is the XOR function)
bin[2] = gray[2];
bin[1] = gray[2] ^ gray[1];
bin[0] = gray[2] ^ gray[1] ^ gray[0];
Verilog module is as below
module gray2binary_converter (binary, gray);

  parameter NUM_BITS = 3;

  output [NUM_BITS-1:0] binary;
  input  [NUM_BITS-1:0] gray;

  reg [NUM_BITS-1:0] binary;
  integer i;

  // Combinational: binary[i] is the XOR of gray[NUM_BITS-1:i]
  always @(gray) begin
    for (i=0; i<NUM_BITS; i=i+1)
      binary[i] = ^(gray >> i); // Right shift pads 0's into the upper bits
  end

endmodule
Similarly, the binary to Gray conversion is achieved by
gray[2] = binary[2];
gray[1] = binary[2] ^ binary[1];
gray[0] = binary[1] ^ binary[0];
Verilog code is
module binary2gray_converter (gray, binary);

  parameter NUM_BITS = 3;

  output [NUM_BITS-1:0] gray;
  input  [NUM_BITS-1:0] binary;

  assign gray = (binary >> 1) ^ binary; // Right shift binary vector and XOR

endmodule
The Gray code counter can be implemented using these functions; please refer to Cliff Cummings' excellent paper on asynchronous clock domains.
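As a quick sanity check of the two conversions, the following sketch mirrors the same shift-and-XOR logic in software and confirms that they are inverses of each other. The function names binary_to_gray and gray_to_binary are chosen for the example.

```python
def binary_to_gray(value):
    """Right shift the binary value by one and XOR with itself."""
    return value ^ (value >> 1)


def gray_to_binary(gray, width):
    """Bit i of the result is the XOR reduction of (gray >> i)."""
    binary = 0
    for i in range(width):
        parity = bin(gray >> i).count("1") & 1  # XOR of all remaining bits
        binary |= parity << i
    return binary


WIDTH = 3
for n in range(1 << WIDTH):
    g = binary_to_gray(n)
    print(f"binary {n:03b} -> gray {g:03b} -> binary {gray_to_binary(g, WIDTH):03b}")
```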
Asynchronous and Synchronous Resets
Designing the power-up reset sequence and reset structures in a chip is a critical task and there are many issues one needs to be aware of. Incorrect reset generation can cause intermittent failures that are hard to debug and in some cases can also make a chip DOA (Dead on Arrival).
In this post, we will look at asynchronous/synchronous resets, reset synchronizers and also factors that may affect the reset sequence in a chip. Most of the information presented here is derived from the excellent paper by Cliff Cummings et al., and the reader is strongly recommended to read the paper at leisure.
Resets can be either synchronous or asynchronous. For an asynchronous reset, each flip-flop has a timing window around the active clock edge during which the reset must not transition. Recovery time is the minimum time the reset should be stable BEFORE the active clock edge (analogous to setup time), while removal time is the minimum time the reset should be stable AFTER the active clock edge (analogous to hold time).
Advantages of using synchronous resets are:
- The reset takes effect only on the active clock edge.
- Fewer gates are needed (although the saving is negligible).
- Timing analysis is easier as the reset is synchronous to the clock.
A major disadvantage of synchronous reset design is that the clock should be running at the time of reset. In some chips, this may not be feasible due to gated clocks or due to requirements of the design.
Advantages of using asynchronous resets are:
- No extra logic on the datapath, making timing closure easier.
- No clock is required at the time of reset.
The problem with asynchronous resets is that they can cause flops to go metastable, hence care must be taken at the time of assertion or deassertion of reset. Another issue is that timing analysis should include checks for recovery and removal times.
Reset Synchronizer
A novel technique to overcome the issues with asynchronous resets is to use reset synchronizers. A reset synchronizer ensures that the reset removal does not cause any metastability problems: it asserts the reset asynchronously (i.e. without a running clock) while the deassertion is synchronous!
Reset synchronizer
A reset synchronizer circuit is shown above. The two flops form a dual-stage synchronizer that synchronizes the reset deassertion to the clock. On assertion of the chip reset, the synchronizer output immediately drives the internal reset to the flops in the design. Deassertion can only happen on an active clock edge. An important point to note is that the second flop in the synchronizer cannot go metastable, as both its input and its output are low when the reset is removed.
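The assert-asynchronously, deassert-synchronously behaviour can be illustrated with a small cycle-level model. It is only a sketch under stated assumptions: an active-low reset, a first flop whose data input is tied high, and no modelling of metastability. The class and attribute names are made up for the example.

```python
class ResetSynchronizer:
    """Two-flop reset synchronizer model (assumed active-low reset)."""

    def __init__(self):
        self.rst_n = 0   # external chip reset, starts asserted
        self.ff1 = 0     # first flop, data input assumed tied high
        self.ff2 = 0     # second flop, drives the internal reset

    def set_reset(self, rst_n):
        """Change the external reset; assertion clears both flops at once."""
        self.rst_n = rst_n
        if rst_n == 0:
            self.ff1 = self.ff2 = 0   # asynchronous clear, no clock needed

    def clock_edge(self):
        """Active clock edge: shift the constant 1 through the two flops."""
        if self.rst_n == 1:
            self.ff1, self.ff2 = 1, self.ff1

    @property
    def internal_rst_n(self):
        return self.ff2


sync = ResetSynchronizer()
sync.set_reset(1)                     # external reset released
print("released, no clock yet:", sync.internal_rst_n)
sync.clock_edge()
print("after 1st clock edge  :", sync.internal_rst_n)
sync.clock_edge()
print("after 2nd clock edge  :", sync.internal_rst_n)
sync.set_reset(0)                     # external reset asserted again
print("asserted, no clock    :", sync.internal_rst_n)
```

In this model the internal reset is released only after two active clock edges, while assertion takes effect immediately without any clock.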
The two flops in the reset synchronizer should not be made scannable for DFT and a bypass mux is added at the output of the reset synchronizer to control the reset in test modes. Also note that a separate reset synchronizer will be required for each clock domain.
Another important requirement in many multi-clock-domain designs is the sequencing of resets, i.e. the reset in one clock domain must be deasserted prior to the reset in another clock domain. The author has come across designs where this requirement was neglected or overlooked, causing critical issues in silicon. The circuit below, built from reset synchronizers, illustrates this.
Reset Sequencer (resetb is deasserted only later than reseta)