Clock Tree Optimization Methodologies for Power and Latency Reduction - Semiconductor Digest (2024)

ZALA AJAYSINH, RAKESH GOSAI, SYAMALARAO GORUPUTI and DEVARAJAN DORESWAMY, eInfochips, an Arrow company.

In physical design, the goal is to achieve better area, power, and timing. To do so, we focus on placement and routing of the design, with equal importance given to the clock structure. Different clock structures are used depending on the design requirements.

This paper gives an overview of some widely used clock structures, with emphasis on the practical implementation of a conventional clock tree structure combined with an H-Tree. The example provided here was implemented on a real project database with an operating frequency of 537 MHz and an instance count of 1.4 million at the 16nm technology node.

This paper also describes the types of cells and routing used to implement the H-Tree clock structure, customization of the H-Tree based on sink distribution, and various scenarios to watch for when choosing this approach. A comparison of clock QoR between this approach and the conventional clock tree structure is given, and it shows promising improvement.

Types of CTS clock structures

The main requirements for a clock tree structure are:

  1. Minimum Insertion Delay: A clock tree with minimum insertion delay needs fewer clock tree buffers, which reduces clock tree power dissipation and uses fewer routing resources.
  2. Minimum Skew: Minimum skew helps with hold timing closure. However, a tight skew requirement increases clock insertion delay, which in turn increases clock network power.
  3. More Common Paths: Having more common paths between the launch and capture flops reduces the impact of OCV effects. When the sinks share the same clock path to the root, any process variation in that path affects both flops equally, so the variations cancel each other.
  4. Low Power Dissipation: A good clock tree structure should support implementation of clock gating to save power.
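
The first two requirements can be made concrete with a small calculation. The sketch below is illustrative only: it takes insertion delay as the largest root-to-sink latency and global skew as the difference between the largest and smallest latency. The sink names and latency values are hypothetical and not taken from the design discussed in this paper.

```python
def clock_tree_metrics(sink_latencies_ns):
    """Return (insertion_delay, skew) in ns for a dict of sink name -> latency."""
    if not sink_latencies_ns:
        raise ValueError("at least one sink latency is required")
    latencies = list(sink_latencies_ns.values())
    insertion_delay = max(latencies)        # worst root-to-sink latency
    skew = max(latencies) - min(latencies)  # global skew across all sinks
    return insertion_delay, skew


# Hypothetical root-to-sink latencies in ns (assumed values for illustration).
sinks = {"ff_a/CK": 0.42, "ff_b/CK": 0.45, "ff_c/CK": 0.40}
insertion_delay, skew = clock_tree_metrics(sinks)
print(f"insertion delay = {insertion_delay:.3f} ns, skew = {skew:.3f} ns")
```

Tightening the skew target usually means padding the fastest branches, which is why the second requirement tends to pull against the first.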

Conventional CTS (Figure 1) has been the default choice for most designs with lower frequency and fewer sinks, due to its ease of implementation.

[Figure 1: Conventional CTS]

Conventional CTS may not be the optimal choice for:

  • Designs with a high clock frequency
  • Designs with a high number of sinks
  • Designs with sinks spread all over the core area

Advantages

  • Simplicity of implementation
  • Better clock gating; allows gating at the root level
  • Low power consumption

Disadvantages

  • Sensitive to on-chip variations, as sinks share fewer common paths
  • Higher insertion delays
  • Difficult to achieve low skew due to asymmetric distribution of sinks
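
The link between common paths and OCV sensitivity can be sketched numerically. The model below is a simplification chosen for illustration, not the analysis used by any timing tool: launch and capture clock paths are assumed to have the same total latency, the shared segment is assumed to vary identically for both flops, and only the non-common branches are derated (one late, one early) by a hypothetical derate factor.

```python
def ocv_skew_uncertainty(latency_ns, common_ns, derate):
    """Extra launch-to-capture skew caused by OCV on the non-common segments.

    latency_ns : total root-to-sink latency of each path (assumed equal)
    common_ns  : portion of that latency shared by both paths
    derate     : fractional variation, e.g. 0.05 for +/-5% (assumed value)
    """
    if not 0 <= common_ns <= latency_ns:
        raise ValueError("common_ns must lie between 0 and latency_ns")
    branch_ns = latency_ns - common_ns
    # One branch is slowed by +derate and the other sped up by -derate,
    # so the two branches diverge by 2 * derate * branch_ns.
    return 2 * derate * branch_ns


# Same 1.0 ns latency, different amounts of sharing (hypothetical numbers).
little_sharing = ocv_skew_uncertainty(1.0, 0.2, 0.05)
much_sharing = ocv_skew_uncertainty(1.0, 0.8, 0.05)
print(f"little sharing: {little_sharing:.3f} ns, much sharing: {much_sharing:.3f} ns")
```

Under these assumptions, the uncertainty scales with the non-common delay, which is why structures with longer shared trunks are less sensitive to on-chip variation.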

Mesh Structure

The clock mesh structure (Figure 2) produces lower clock skew and is more tolerant of on-chip variations than conventional CTS.

[Figure 2: Clock mesh structure]

In the mesh structure, a network of pre-mesh drivers carries the clock signal from the clock port to the inputs of the mesh drivers. The outputs of all the mesh drivers are shorted together by a metal mesh, which carries the clock signal across the block on horizontal and vertical metal stripes. The clock for each sink is routed from its nearest tap point on the mesh.
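
The nearest-tap-point idea can be sketched as follows. This is a simplified illustration, not a description of any tool's behavior: it treats only the stripe intersections as candidate tap points and measures distance as Manhattan wire length. The function names, block size, and stripe pitch are assumptions.

```python
def mesh_taps(width, height, pitch):
    """Stripe intersections of a mesh with the given pitch (integer units)."""
    return [(x, y)
            for x in range(0, width + 1, pitch)
            for y in range(0, height + 1, pitch)]


def nearest_tap(sink, taps):
    """Return the tap closest to the sink in Manhattan distance."""
    return min(taps, key=lambda t: abs(t[0] - sink[0]) + abs(t[1] - sink[1]))


# Hypothetical 100 x 100 block with stripes every 50 units.
taps = mesh_taps(100, 100, 50)
for sink in [(60, 20), (10, 90)]:
    print(sink, "->", nearest_tap(sink, taps))
```

A real mesh can usually be tapped anywhere along a stripe, so the intersection-only model overestimates the stub length to each sink.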

Advantages

  • Low clock skew
  • Highly tolerant of on-chip variations because of more common paths from the clock root to the sink pins
  • Possible to achieve lower insertion delays

Disadvantages

  • High (dynamic) power consumption due to the parallel drivers driving the high capacitive load created by the mesh
  • Requires more routing resources to create the mesh
  • Inability to use clock gating at different levels of the structure – the gating has to be performed at the local level only
  • Difficult to implement
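
The power penalty of the mesh follows from the standard dynamic power relation P = C · V² · f for a net that switches every cycle, as a clock net does. The sketch below applies that relation with the 537 MHz frequency quoted in this paper; the capacitance and supply voltage values are hypothetical and chosen only to show how power scales with load.

```python
def clock_dynamic_power_mw(cap_pf, vdd_v, freq_mhz):
    """Dynamic power in mW of a clock net: P = C * V^2 * f (activity of 1)."""
    cap_f = cap_pf * 1e-12
    freq_hz = freq_mhz * 1e6
    return cap_f * vdd_v ** 2 * freq_hz * 1e3  # W -> mW


FREQ_MHZ = 537  # operating frequency from this paper
VDD_V = 0.8     # assumed supply voltage

tree_mw = clock_dynamic_power_mw(500, VDD_V, FREQ_MHZ)   # assumed 500 pF tree
mesh_mw = clock_dynamic_power_mw(1500, VDD_V, FREQ_MHZ)  # assumed 1500 pF mesh
print(f"tree: {tree_mw:.1f} mW, mesh: {mesh_mw:.1f} mW")
```

Because power is linear in capacitance, any extra metal added by the mesh shows up directly in clock power at a fixed voltage and frequency.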

H-Tree

The H-Tree structure is based on equalization of wire lengths. An ideal H-Tree (Figure 3) has wire segments of the same length and the same kind of drivers from its clock root pin to the outputs of the last-level drivers. The outputs of the last-level buffers act as tap points, and each sink has its clock routed from the nearest tap point. It provides good OCV tolerance because of more common paths.

[Figure 3: H-Tree]
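
The equal-wire-length property can be seen in a small geometric sketch. The code below generates the tap points of an ideal, obstruction-free H-Tree by recursion and computes the root-to-tap wire length, which is identical for every tap by construction. The function names, coordinates, and dimensions are illustrative assumptions, not values from the design described here.

```python
def h_tree_taps(cx, cy, half_w, half_h, levels):
    """Tap points of an ideal H-Tree centred on (cx, cy).

    Each level draws one 'H': a horizontal trunk of half-length half_w,
    then vertical arms of half-length half_h. The four arm tips become
    the centres of the next level, drawn at half the size.
    """
    if levels == 0:
        return [(cx, cy)]
    taps = []
    for dx in (-half_w, half_w):
        for dy in (-half_h, half_h):
            taps.extend(
                h_tree_taps(cx + dx, cy + dy, half_w / 2, half_h / 2, levels - 1)
            )
    return taps


def root_to_tap_wirelength(half_w, half_h, levels):
    """Wire length from the root to any tap (the same for every tap)."""
    return sum((half_w + half_h) / 2 ** k for k in range(levels))


# Hypothetical two-level H-Tree on a 1000 x 1000 block.
taps = h_tree_taps(500, 500, 250, 250, levels=2)
print(len(taps), "taps, each", root_to_tap_wirelength(250, 250, 2), "units from the root")
```

Each additional level multiplies the tap count by four while adding only half as much wire per path as the level before it.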

An ideal design for an H-Tree is one with no RAMs, ROMs, or other hard blocks. Designs with blocks (macros) may create obstructions that prevent the H-Tree super drivers from being placed at their ideal locations. The lack of obstructions also enables the H-Tree routes to be perfectly straight, making it easier to ensure an ideal balance.

Advantages

  • Possible to achieve low skew due to its symmetry
  • Good OCV tolerance because of more common paths
  • Less power dissipation compared to clock mesh
  • Uses fewer routing resources compared to clock mesh
  • Somewhat higher insertion delay than mesh, but lower than conventional CTS

Disadvantages

  • Difficult to implement and requires manual assistance to deal with blockages caused by macros and power connections
  • Needs drivers with high drive strength, and these drivers should be surrounded by de-cap cells to avoid IR drop violations
  • H-Tree wire segments should be routed with extra care to avoid signal integrity issues (EM)

Multisource CTS

Multisource CTS (Figure 4) is a hybrid approach that offers the best of the conventional clock tree and the clock mesh. It has a coarser mesh structure than a clock mesh, driven by mesh drivers. In general, there are three to nine levels of buffers or clock gates after the mesh.

[Figure 4: Multisource CTS]

Advantages

  • More common paths than a conventional clock tree, which makes it more immune to OCV
  • Less power consumption than clock mesh due to the coarse mesh fabric
  • Requires fewer routing resources compared to clock mesh
  • High flexibility of implementation for macro-dominated designs in comparison to clock mesh
  • Lower skew in comparison with a conventional clock tree
  • Lower insertion delay
  • Ease of power gating

Building an H-Tree: A practical approach

Building an H-Tree is divided into three major steps. It starts from the defined clock root pin for which we want to create the conventional clock tree structure and H-Tree.

Placement of Clock Tree: In this step, high drive strength (X96/X128) clock cells are placed at given predefined locations. These locations are mainly governed by the type of H-Tree customization needed to meet the block requirements.

Routing of Clock Tree Net: The clock tree nets are routed with the points below in mind. These nets are also assigned a specific NDR and marked ‘don’t touch’.

  • All routes should be as straight as possible to minimize skew.
  • All routes should be in the top metal layers.

CCOPT from Anchor Point: Each H-Tree endpoint is treated as an anchor point, and the tool runs CCOPT from the anchor points. We set an attribute for CCOPT so that it balances the sinks of all anchor points. CCOPT also distributes and, if needed, swaps sinks among the anchor points to achieve the target latency and skew.
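
The idea of distributing sinks among anchor points can be illustrated with a toy model. The sketch below is not CCOPT's algorithm, which this paper does not describe. It is a simple greedy scheme, assumed here for illustration, that sends each sink to the nearest anchor point that still has room, with an equal per-anchor capacity. All names and coordinates are hypothetical.

```python
import math


def distribute_sinks(sinks, anchors):
    """Assign each sink to the nearest anchor that is not yet full.

    Capacity per anchor is ceil(len(sinks) / len(anchors)), so the sink
    counts stay balanced even when the sinks are clustered.
    """
    if not anchors:
        raise ValueError("at least one anchor point is required")
    capacity = math.ceil(len(sinks) / len(anchors))
    load = {anchor: [] for anchor in anchors}
    for sink in sinks:
        # Rank anchors by Manhattan distance from this sink.
        ranked = sorted(
            anchors,
            key=lambda a: abs(a[0] - sink[0]) + abs(a[1] - sink[1]),
        )
        target = next(a for a in ranked if len(load[a]) < capacity)
        load[target].append(sink)
    return load


# Three sinks cluster near the first anchor; one is pushed to the second.
anchors = [(25, 25), (75, 75)]
sinks = [(10, 10), (20, 30), (30, 20), (80, 80)]
for anchor, assigned in distribute_sinks(sinks, anchors).items():
    print(anchor, "->", assigned)
```

A production flow balances on delay rather than sink count, but the sketch shows why a sink is sometimes moved to an anchor point that is not its closest one.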

Data comparison: Conventional CTS vs Conventional CTS + H-Tree

The following table shows a clock QoR comparison between conventional CTS and conventional CTS + H-Tree.

Sr. No. | Parameter             | Conventional CTS | Conventional CTS + H-Tree
1.      | Max / Min Clock Level | 32/47            | 12/20
2.      | Latency               | 1.177ns          | 0.449ns
3.      | Skew                  | 0.143ns          | 0.110ns
4.      | Power                 | 1007mW           | 994mW
5.      | Total Clock BUF/INV   | 23000            | 21323
6.      | CTS Routing Layer     | M2 to M9         | M10 & M11 (H-Tree); M4-M9 (conventional CTS from the anchor point)
7.      | CTS Cell              | P8U20A_CKINVX16  | P8U20A_CKINVX16, P8PU20A_CKINVX96

Table 1

The comparison was done with the following clock specification: Frequency: 537 MHz, Sinks: 404896, Technology: 16nm, Instances: 1451840
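
The relative improvements implied by Table 1 can be worked out directly. The short script below uses only the figures from the table and the 537 MHz clock specification; the helper name is illustrative.

```python
def percent_reduction(before, after):
    """Reduction from 'before' to 'after' as a percentage of 'before'."""
    return (before - after) / before * 100.0


# (Conventional CTS, Conventional CTS + H-Tree), values from Table 1.
table_1 = {
    "latency (ns)": (1.177, 0.449),
    "skew (ns)": (0.143, 0.110),
    "power (mW)": (1007, 994),
    "clock BUF/INV count": (23000, 21323),
}

for name, (conventional, with_h_tree) in table_1.items():
    print(f"{name}: {percent_reduction(conventional, with_h_tree):.1f}% lower")

clock_period_ns = 1e3 / 537  # period of a 537 MHz clock in ns
print(f"latency is {0.449 / clock_period_ns:.0%} of the clock period with H-Tree")
```

With the H-Tree, latency drops by roughly 62%, skew by about 23%, clock buffer and inverter count by about 7%, and power by a little over 1%.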

Examples from actual projects using H-Tree

Figure 5a shows the default H-Tree structure, and Figures 5b and 5c show its variations according to the distribution of sinks in the design.

[Figures 5a, 5b, 5c: default H-Tree structure and two customized variations]

Figures 5b and 5c show cases of H-Tree customization in which sinks are distributed locally. As highlighted, we need not generate a default H-Tree structure; the customized H-Tree structure (highlighted in red) satisfies the block latency and skew requirements.

H-Tree Checklist

H-Tree overlap: This issue is generally found in macro-dominated designs.

  • Macro to super buffer overlap – To fix this, we can refine the H-Tree cell locations.
  • Multiple H-Tree super buffer overlap – A different offset must be specified for each H-Tree to avoid overlap between the super buffers of different H-Trees.
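
Both overlap checks reduce to testing axis-aligned rectangles against each other. The sketch below is a minimal illustration of such a check, not a script from the flow described here. Rectangles are (x1, y1, x2, y2) tuples, and all instance names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if rectangles a and b, each (x1, y1, x2, y2), share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_overlaps(super_buffers, macros):
    """Return (buffer, blocker) name pairs for every overlap found.

    Checks each super buffer against every macro and against every
    other super buffer (for example, buffers of a second H-Tree).
    """
    hits = []
    names = list(super_buffers)
    for i, name in enumerate(names):
        for macro_name, macro_rect in macros.items():
            if overlaps(super_buffers[name], macro_rect):
                hits.append((name, macro_name))
        for other in names[i + 1:]:
            if overlaps(super_buffers[name], super_buffers[other]):
                hits.append((name, other))
    return hits


# Hypothetical placements: one buffer sits on a RAM and on another buffer.
super_buffers = {
    "htree0_buf": (100, 100, 110, 105),
    "htree1_buf": (105, 102, 115, 107),
    "htree0_buf2": (300, 300, 310, 305),
}
macros = {"ram0": (90, 90, 104, 120)}
print(find_overlaps(super_buffers, macros))
```

Any pair reported by such a check would then be resolved by moving the super buffer or by applying a per-tree offset, as described above.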

H-Tree Routing: Should be in the top layers (M10-M11).

H-Tree Via: To create a robust H-Tree via structure, via stacks should be dropped on the output pin of each super buffer. Figure 6a shows a case where the stack via was not dropped on the output pin of the super buffer because of a VSS net over the super buffer.

The H-Tree stack via issue was resolved by refining the cell placement of the super buffer. As shown in Figure 6b, the stack via is dropped over the super buffer output pin.

[Figures 6a, 6b: stack via on the super buffer output pin before and after refining cell placement]

Future Scope

Rectilinear block: This clock structure works well in a rectangular block; however, it will need structural modifications for a rectilinear block.

Multiple H-Tree: When we have multiple H-Trees, depending on the congestion and density of the block, there could be H-Tree shorts in the CTS structure due to overlapping of the trees. We can further modify the script to address this issue at the clock stage.

Conclusion

These H-Tree clock networks may be an alternative to the more traditional clock distribution networks.

The proposed clock tree optimization methodologies reduce the power dissipation without any impact on signal characteristics. The inductive behavior of the interconnects is reduced, decreasing inductive noise.

In conclusion, when there is a tight skew requirement of 80~100 ps, a latency requirement of <500 ps, and more than 10,000 sinks, using the H-Tree structure can achieve better power, latency, and skew.

Suggested additional reading

1. https://www.design-reuse.com/articles/21019/clock-mesh-benefits-analysis.html

2. https://www.electronicdesign.com/products/what-s-difference-between-cts-multisource-cts-and-clock-mesh

3. https://www.electronicdesign.com/products/introduction-multisource-clock-tree-systems
