THE VLSI HOMEPAGE

A Practical Guide to VLSI Design and Verification

Statistical Static Timing Analysis

Posted in Static Timing Analysis by Nigam on the October 14th, 2007

Various types of timing variation make accurate modeling of interconnect and circuits very hard as we move to nanometer processes. In traditional static timing analysis (STA), we characterize the cells from process parameter files, calculate the cell delays and crosstalk effects, and run timing analysis. This, however, hides the silicon process variations from the designer.

These variations are broadly classified into process variations (fabrication effects such as variations in oxide thickness, transistor width and effective channel length) and environmental variations (temperature and voltage).

Process variations are further classified into inter-die variations, which vary from die to die and affect all the devices on a given die in the same way, and intra-die variations, which cause differences between devices within a single die.

To account for the inter-die variations, timing analysis is run on the chip at multiple corners. But as we move towards nanometer processes, intra-die variations become significant and are too complex to capture in traditional STA, since each region within a chip can effectively sit at a different process corner.

To overcome this limitation, an alternative approach known as Statistical Static Timing Analysis (SSTA) has been proposed, in which delays are represented not as fixed numbers but as probability density functions (PDFs) that take the statistical distribution of the variations into account.

In traditional timing analysis, the arrival time at a gate output is computed as the “sum” of the gate delay and the arrival time at the gate input. When a gate has several inputs, a “max” operation over the candidate arrival times gives the latest arrival time at the output. SSTA is similar to STA except that the interconnect delays and cell delays are probability density functions (for example, Gaussian) instead of numbers.
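
The deterministic “sum” and “max” operations can be sketched in a few lines of Python. This is only an illustration: the three-gate netlist, the gate names, the delays and the input arrival times are all made up, and real tools also handle rise/fall, slew and interconnect.

```python
# Toy netlist (hypothetical): gate -> (gate delay in ns, list of fan-in nodes).
# Gates are listed in topological order, so every fan-in is computed first.
GATES = {
    "g1": (1.0, ["a", "b"]),
    "g2": (2.0, ["b", "c"]),
    "g3": (1.5, ["g1", "g2"]),
}
# Arrival times at the primary inputs, in ns (also made up).
INPUT_ARRIVALS = {"a": 0.0, "b": 0.5, "c": 0.2}


def arrival_times(gates, input_arrivals):
    """Propagate latest arrival times through the netlist."""
    at = dict(input_arrivals)
    for name, (delay, fanin) in gates.items():
        # "max" over the input arrival times, then "sum" with the gate delay.
        at[name] = max(at[node] for node in fanin) + delay
    return at


AT = arrival_times(GATES, INPUT_ARRIVALS)
print(f"latest arrival at g3 = {AT['g3']} ns")
```

In SSTA the same traversal is used, but each delay and arrival time becomes a distribution, so “sum” and “max” have to be replaced by their statistical counterparts.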

SSTA can again be broadly classified into path-based methods (find the PDFs on a path-by-path basis and then perform a statistical add operation to find the delay distribution of each path) and block-based methods (traverse the circuit while processing each gate only once, which is much faster than path-based methods when the number of paths is relatively large).

Monte Carlo simulation is the simplest method for SSTA: given an arbitrary distribution, the tool generates sample points, runs the analysis at each sample point and aggregates the results to find the delay distribution. The major disadvantage is the long runtime required.
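
A minimal Monte Carlo sketch in Python is shown below. It assumes a single hypothetical path of three stages whose delays are independent Gaussians; the means, sigmas, clock period and sample count are invented for illustration, and a real flow would re-run full timing analysis at each sample point rather than add up three numbers.

```python
import random
import statistics

# Hypothetical path: one (mean delay, sigma) pair per stage, in ns.
STAGES = [(1.0, 0.05), (2.0, 0.10), (1.5, 0.08)]
CLOCK_PERIOD = 4.8  # ns, assumed timing target


def sample_path_delay(rng, stages):
    """One sample point: draw a delay for every stage and add them up."""
    return sum(rng.gauss(mean, sigma) for mean, sigma in stages)


def monte_carlo_delays(stages, n_samples=20000, seed=1):
    """Generate the path delay at n_samples independent sample points."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sample_path_delay(rng, stages) for _ in range(n_samples)]


samples = monte_carlo_delays(STAGES)
mean = statistics.fmean(samples)
sigma = statistics.stdev(samples)
# Fraction of sample points where the path meets the clock period.
timing_yield = sum(d <= CLOCK_PERIOD for d in samples) / len(samples)
print(f"mean = {mean:.3f} ns, sigma = {sigma:.3f} ns, yield = {timing_yield:.3f}")
```

The aggregated samples approximate the delay distribution of the path. The runtime cost is visible even here, because every extra digit of accuracy needs many more sample points.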

Another complication in SSTA is accounting for spatial correlations within a single chip. One approach is to partition the chip into an x by y grid, with each grid cell modeled using correlated variables.

Several SSTA techniques have been proposed, but the field is still at a nascent stage. IBM recently announced a set of statistical timing analysis tools for 45 nm and below.


On Chip Variation and CRPR

Posted in Static Timing Analysis by Nigam on the September 27th, 2007

Static timing analysis of a chip depends heavily on process, voltage and temperature (PVT) variations, because the cell delays and interconnect delays vary widely with these factors. Hence it is necessary to run timing analysis under both worst-case and best-case operating conditions and ensure that we meet the setup/hold requirements for the chip.

For the worst-case corner, we specify the chip running at high temperature and low voltage with a slow process (high cap). For the best-case corner, we specify high voltage and low temperature with a fast process (low cap). Setup is more problematic in the slow corner because of the larger cell/interconnect delays, and hold is more problematic in the fast corner.

Another factor that needs to be considered during timing analysis is on-chip variation (OCV). On a single chip, two identical gates can have different delays because of local variations during the manufacturing process. This variation can be anywhere between 8% and 12% and needs to be included in timing analysis for a more accurate and foolproof picture.

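To add OCV analysis in Synopsys PrimeTime, we apply timing derate factors to the min and max delays, as shown below. Note that the example uses a derate of 20% in each direction rather than 8-12%, so it specifies that the min paths can be faster than the max paths by 40% of the nominal delay!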

set_timing_derate -min 0.8 -max 1.2

Next, we use the “on_chip_variation” switch as shown below to enable OCV:

set_operating_conditions -analysis_type on_chip_variation
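
To see what the derates do to a setup check, here is a small Python sketch. The path delays, clock period and setup time are hypothetical, and the model is simplified: launch clock and data are scaled by the max factor, the capture clock by the min factor, and the flop setup time is left underated.

```python
EARLY, LATE = 0.8, 1.2  # min/max derate factors from the command above


def setup_slack(period, launch_clk, data, capture_clk, setup,
                early=EARLY, late=LATE):
    """Setup slack in ns: late launch clock + data versus early capture clock."""
    arrival = (launch_clk + data) * late
    required = period + capture_clk * early - setup
    return required - arrival


# Hypothetical nominal delays, all in ns.
PATH = dict(period=10.0, launch_clk=2.0, data=5.0, capture_clk=2.0, setup=0.3)

slack_nominal = setup_slack(**PATH, early=1.0, late=1.0)  # no derating
slack_ocv = setup_slack(**PATH)                           # with 0.8 / 1.2
print(f"nominal slack = {slack_nominal:.2f} ns, OCV slack = {slack_ocv:.2f} ns")
```

With these made-up numbers the derates alone consume 1.8 ns of slack, which is why the derate values should reflect the real variation of the process rather than a blanket guess.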

However, if you look at the reports carefully, you will notice that PrimeTime is overly pessimistic: if the launch and capture flops share a common branch of the clock tree, PrimeTime still varies the delay of this common branch according to OCV. For example, for setup analysis it slows down the common clock tree branch for the launch flop and speeds up the same branch for the capture flop, even though a single physical branch cannot be slow and fast at the same time.

To counter this, the Clock Reconvergence Pessimism Removal (CRPR) feature was added to PrimeTime. CRPR is enabled by setting the variable below:

set timing_remove_clock_reconvergence_pessimism true

With this feature enabled, PrimeTime looks at the segment of the clock network that is common to the launch and capture clock paths and removes the difference between its max and min delays from the slack calculation, thus projecting a more realistic picture.
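
The size of the correction can be sketched in Python as follows. This is a simplified model with hypothetical delays, not PrimeTime's actual algorithm: the clock path is split into a common segment plus separate launch and capture branches, and the CRPR credit is taken to be the common delay times the gap between the max and min derates.

```python
EARLY, LATE = 0.8, 1.2  # min/max derate factors, as in the OCV example


def setup_slack(period, common, launch_branch, capture_branch, data, setup,
                crpr=False):
    """Setup slack in ns under OCV, optionally with the CRPR credit added back."""
    arrival = (common + launch_branch + data) * LATE
    required = period + (common + capture_branch) * EARLY - setup
    slack = required - arrival
    if crpr:
        # The common segment was counted late for launch and early for capture;
        # give back the difference between those two views of the same delay.
        slack += common * (LATE - EARLY)
    return slack


# Hypothetical nominal delays, all in ns.
ARGS = dict(period=10.0, common=1.5, launch_branch=0.5, capture_branch=0.5,
            data=5.0, setup=0.3)

slack_pessimistic = setup_slack(**ARGS)
slack_crpr = setup_slack(**ARGS, crpr=True)
print(f"without CRPR = {slack_pessimistic:.2f} ns, with CRPR = {slack_crpr:.2f} ns")
```

The longer the shared part of the clock tree, the larger the credit, so paths between flops that sit deep in the same clock branch benefit the most.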

For more details on OCV and CRPR, please refer to the paper at the link below.

On Chip Variation Analysis


Input and Output Delays in Primetime

Posted in Static Timing Analysis by Nigam on the September 15th, 2007

Following the review of setup and hold time analysis in the previous post, we will now cover timing analysis at the primary pins of the chip. The IO pins are constrained using input and output delays. We will look at an example of constraining the IOs in a design using Synopsys PrimeTime, the widely used static timing analysis tool.

[Figure: Input and Output constraints for a chip]

In the figure above, we have RXD (the data) being clocked by RXC (period = 8 ns), both being primary inputs to the chip. The input specification for this interface in our chip is 3 ns setup and 1 ns hold. The RXC input clock is buffered through a clock tree and clocks an output flop. TXD and TXC are primary outputs of the chip, and the output specification is a max delay of 5 ns and a min delay of 1 ns.

In PrimeTime, the set_input_delay command specifies the data arrival time at the port with respect to a clock. The set_output_delay command specifies the data required time at the port with respect to a clock.

We need to define the clock at the input using create_clock and make it a propagated clock so that PrimeTime can propagate the clock through all the buffers to the external TXC output.

create_clock -name rxc -period 8 -waveform {0 4} [get_ports RXC]
set_propagated_clock [get_clocks rxc]

To constrain the RXD input with respect to the RXC clock for setup, we define the maximum external delay allowable (i.e. period - setup requirement = 8 - 3 = 5 ns):

set_input_delay 5 -max -clock [get_clocks rxc] [get_ports RXD]

For the hold constraint, we specify the hold requirement as is.

set_input_delay 1 -min -clock [get_clocks rxc] [get_ports RXD]
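
The arithmetic behind the two input delay values can be checked with a short Python sketch. It takes the 8 ns period, the 3 ns setup figure used in the calculation above and the 1 ns hold figure; the helper function name is made up for illustration.

```python
def input_delays(period, setup, hold):
    """Convert an interface setup/hold spec (ns) into max/min input delays (ns)."""
    return {
        "max": period - setup,  # latest the external data can arrive
        "min": hold,            # earliest the external data can change
    }


RXD = input_delays(period=8.0, setup=3.0, hold=1.0)

# Print the values in the form of the constraints used in this post.
for kind in ("max", "min"):
    print(f"set_input_delay {RXD[kind]:g} -{kind} "
          "-clock [get_clocks rxc] [get_ports RXD]")
```

Keeping the conversion in one place like this makes it harder for the interface specification and the constraint values to drift apart.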

For output constraints, we need to create a generated clock at the output pin TXC. This is necessary because, without this generated clock, the latency of the clock path to TXC is discarded, which makes the output timing incorrect.

create_generated_clock -name txc -source [get_ports RXC] -divide_by 1 [get_ports TXC]

The output port TXD is constrained with the delays external to the chip, i.e.

set_output_delay 3 -max -clock [get_clocks txc] [get_ports TXD]
set_output_delay -1 -min -clock [get_clocks txc] [get_ports TXD]

Note the negative sign for the min output delay. This is because a larger hold requirement means that you must specify a more negative output delay. If you apply the hold requirement as a positive amount, the constraint will be incorrectly relaxed by double that amount. This could cause the hold check to pass when there is really a violation.
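
The sign convention is easy to verify with a small Python sketch. It assumes the output specification means that TXD must arrive between 1 ns (min) and 5 ns (max) after the TXC edge, and it models the hold required time simply as the clock edge minus the min output delay; the function names are made up.

```python
def output_delays(period, max_out_delay, min_out_delay):
    """Convert an output max/min delay spec (ns) into set_output_delay values."""
    return {
        "max": period - max_out_delay,  # time left for the external side
        "min": -min_out_delay,          # hold requirement, entered as a negative
    }


def hold_required_time(output_delay_min, clock_edge=0.0):
    """Earliest allowed data arrival at the port, relative to the clock edge."""
    return clock_edge - output_delay_min


TXD = output_delays(period=8.0, max_out_delay=5.0, min_out_delay=1.0)

correct = hold_required_time(TXD["min"])  # min delay entered as -1
wrong = hold_required_time(1.0)           # min delay entered as +1 by mistake
relaxed_by = correct - wrong
print(f"max = {TXD['max']:g}, min = {TXD['min']:g}, relaxed by {relaxed_by:g} ns")
```

Entering +1 instead of -1 moves the required time from 1 ns after the clock edge to 1 ns before it, which is the “double that amount” relaxation described above.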

Always be careful while defining input/output constraints and check the reports to ensure that the IOs are constrained correctly.

