MARCO GSRC Calibrating Achievable Design : Bookshelf

FBI : Fast Buffer Insertion for Interconnect Optimization

Work in progress: last updated Mar 14 2007.

Zhuo Li: I received many requests to distribute the source code of FBI. Now FBI source code(O(nlog2n) algorithm) and O(bn2) algorithm [5] for b buffer types have been added here. I did not distribute the source code before since I planned to distribute a new version rewriting with C++ and with more general I/O and buffer library formats. And I also want to distribute fast algorithms for minimizing buffer cost reported in [6] [7] based on framework of [2]. However, due to recent busy schedule, the plan is postponed and I decided to distribute the original C code for the acedemic use.

The buffer insertion with cost version based on [6] has been uploaded. The buffer cost need to be integer and only the binary versions for SPARC (32 bit) and LINUX (both 32 and 64 bit) are available now. It supports more realistic topology (via, multiple metal layers) and technology format (inverters). Also, O(mn) algorithm [8] for nets with m sinks has been uploaded and included in FBI 2.2 package.

NOTE

(see other slots)

Zhuo Li and Weiping Shi (Department of Electrical Engineering, Texas A&M University, College Station, Texas 77840, USA)

Contents

I. Introduction
II. The Algorithm
III.I/O Data Formats
IV. Sample Instances
V. FBI Executable
VI. Parser
VII. References


I. Introduction

This entry is about the problem of single interconnection steiner tree optimization with buffer insertion. We focus on fast algorithm for buffer insertion. For more than a decade, Van Ginneken's classic buffer insertion algorithm [1] has been the foundation of buffer insertion. However, it is becoming bottleneck due to its O(n2) time and space complexity for large number of buffer positions, large number of buffer types or inner loop of simultaneous tree construction and buffer insertion. We developed a new algorithm, FBI, that computes the same optimal buffer insertion but runs much faster. Our algorithm has time complexity O(nlogn) for two-pin nets and O(nlog2n) algorithm for multi-pin nets. We also extend the basic algorithm to buffer sizing by allow b buffer types in time O(b2nlogn) for two-pin nets and O(b2nlog2n) for multi-pin nets, an improvement of O(b2n2) time algorithm of Lillis, Cheng and Lin [2].

In this page, you can find the outline of our FBI algorithm, the executables, our data format for a single interconnection tree description and a set of test-case files which is based on some large nets.

COPYRIGHT


II. The Algorithm

The FBI algorithm is briefly introduced here. If you need more information about the algorithm, please see [3][4].
FBI algorithm is achieved by four novel techniques:
Predictive Pruning
In traditional van Ginneken's algorithm, each candidate is characterized by its slack Q and load capacitance C. We say A1 dominates A2, if Q(A1)>= Q(A2) and C(A1) <= C(A2). If the upstream resistance is at least R, then we can take into account R in pruning. Define P = Q - R.C. If P(A1) >= P(A2) and C(A1) <= C(A2), then prune A2 . Predictive pruning is more efficient than using traditional (Q, C) pruning. Both methods produce the same final result and predictive pruning can prune redundant candidates early and avoid generating more redundant candidates. In our implementation, we choose R(b) as R, where R(b) is driver resistance or buffer resistance since a buffer/driver will be added to every candidate eventually. Also, it allows us to find
max{Q(Ai) - R(b).C(Ai) - t(b)} in O(1) time.

Candidate Tree
We use a binary search tree to store (Q, C) pairs. Each new candidate can be inserted in O(logn) time and each redundant candidate can be deleted in O(logn) time. Also the Q and C value are implicitly stored in candidate tree by using extra fields ra, ca and qa. This allows O(logn) time insertion of wire and buffers. Intuitively, ra is the resistance added to each candidate, ca is the extra capacitance added to each candidate and qa is the delay added to each candidate. The implicitly representation delays the repeated updates of Q and C and saves a great amount of time.

Fast Merge
Using linked list in van Ginneken's algorithm, merge takes O(nl+nr) time. Using candidate tree in our algorithm, merge can be done in O(nrlognl) time, where nr<=nl.

Redundancy Check
When a wire with resistance R is added, some non-redundant candidates may become redundant. For each candidate, we precompute the value of R that will make it redundant and store them in a balanced binary tree. When we add a wire e with resistance R(e), we check if any candidate is redundant.


III. I/O Data Formats

Input file format

FBI implementation takes these types of files as input:
  • Net information (.net)
contains all the information of nets including physical location and properties of the source, sinks, and internal nodes as well as the connection information. In this implementation, all internal nodes are considered to be possible buffer positions.
  • Buffer Library (.lib)
contains all the information about the buffer library.
We generate our testcases by constructing routing trees with an algorithm similar to A-tree algorithm on clock net benchmarks. The details can be found at http://vlsicad.ucsd.edu/GSRC/bookshelf/Slots/BST/. Here, we explain the fields in the testcase files.

  • wire_res_per_unit_length
Wire resistance per unit length (unit: Ohm/10^(-6)m)
  • wire_cap_per_unit_length
Wire capacitance per unit length (unit: fF/10^(-6)m)
  • driver x y r
the location and the driver resistance of source node; x and y can be any integer which shows the x- and y- coordinates of the physical location (unit: 10^-6 m). r is the driver resistance (unit: Ohm).
  • number_of_sinks
Total number of sink pins for the net
  • sink i x y a b
the location of the i-th sink node; x and y can be any integer which shows the x- and y- coordinates of the physical location. (unit: 10^-6 m);
a : the Load capacitance at the sink pin, unit is fF
b : Required arrival time at the sink pin, unit is ps
  • number_of_candidate_nodes
Total number of candidate nodes for buffer insertion. They are also the steiner nodes including segmenting nodes.
  • candidate i x y
the location of the candidate node with the node id i; x and y can be any integer which shows the x- and y- coordinates of the physical location. (unit: 10^-6 m). Note that the degree of each node should be equal to or less than 3.
  • edge i j
there exists a connection from node i to node j

Sample input file

Buffer library format


  • buffer c d r
each line shows all the information of one buffer type. "c" is the buffer load capacitance (in fF); "d" is the intrinsic delay (in ps); "r" is the buffer resistance (in Ohm). Each buffer is implicitly numbered as 1...number of buffers. Buffers should be listed in nonincreasing driving resistance order.

Sample buffer library file

Input Mode

There are two modes for our implementation algorithm. 1 is O(n2) and O(B2n2) algorithm. 2 is our FBI algorithm. Another parameter wire segmenting coef defines the number of segments for each original edge in .net inputfile. More segmenting implies more slack improvement. However, it may result overflow of stack. In a SUN Spac machine of 450 MHZ and 2G memory, segment coef under 50 for all test cases are tested. Use "ulimit -s unlimited" to get bigger stack size for very large nets and segmenting.

Output format

The output of our FBI implementation consists of the buffer location information for the solution which gives the best slack. It also write another output file .finalnet with the same format as input file which shows the topology file after segmenting.

  • start a end b buffertype c
Each line describes the buffer is added just after the node a on the connection from a to b with the buffer type c

Sample output file

Output mode

You can use "-d" to turn on the Verbose mode. The following information would be useful to understand the operations of FBI algorithm.

  • qc list at [id]:
followed by q and c information at node [id].
  • Slack
minimum slack of the whole net achieved by buffer insertion (You can show the slack before buffer insertion by giving a big buffer intrinsic delay)
  • Algorithm time
CPU time used for buffer insertion algorithm
  • Maximum memory
Memory usage for buffer insertion algorithm



IV. Sample Instances

Sample data files in tar.gz format
Sample data files for our FBI implementation, which follows the data format described in last section.



V. FBI Package

README of the FBI package (Click)
It shows the usage of FBI implementation.
NOTE, BUG and FIX of the FBI package (Click)
It shows the notes, reported bugs and fix of FBI software.
FBI Package (Version 2.3) FBI_package2.3.tar.gz
The full version FBI package: it contains the Solaris executable, source files, readme, note and the parser.
FBI with cost Package (Version 1.1) FBI_cost_binary_Solaris32_LINUX32.1.1.tar.gz (Solaris 2.7, gcc 2.95 staticly linked, Linux 32 bit versions are also included with gcc 3.3)
The binary version FBI with cost package: the Solaris binary code, readme and note.
Please read the readme and note in the package, since it is slightly different from FBI original package. It supports more realistic topology and technology file.



VI. Parser

MATLAB Parser to see the tree and buffer location. You need have .finalnet and .buf file. .m file



VII. References

[1] L. P. P. P. van Ginneken, "Buffer placement in distributed RC-tree network for minimal Elmore delay," in Proc. IEEE Int. Symp. Circuits Syst. 1990, pp. 865-868.
[2] J. Lillis, C. K. Cheng and T.-T. Y. Lin, "Optimal wire sizing and buffer insertion for low power and a generalized delay model," IEEE J. Solid-State Circuits, vol. 31(3), pp. 437-447, 1996.
[3] Weiping Shi and Zhuo Li, "A Fast Algorithm for Optimal Buffer Insertion," IEEE Trans. Computer-Aidede Design, vol. 24, no. 6, June 2005, pp. 879-891.
[4] Weiping Shi and Zhuo Li, "An O(nlogn) Time Algorithm for Optimal Buffer Insertion," 40th Design Automation Conference (DAC), pp. 580-585, 2003.
[5] Zhuo Li and Weiping Shi, "An O(bn2) Time Algorithm for Optimal Buffer Insertion with b Buffer Types", Conference on Design, Automation and Test in Europe (DATE), Munich, Germany, March 7-11 2005, pp. 1324-1329.
[6] Weiping Shi, Zhuo Li and Charles J. Alpert, "Complexity Analysis and Speedup Techniques for Optimal Buffer Insertion with Minimum Cost," 9th Asia and South Pacific Design Automation Conference (ASP-DAC), Yokohama, Japan, Jan. 27-30, 2004, pp. 609-614.
[7] Zhuo Li, C. N. Sze, Charles J. Alpert, Jiang Hu and Weiping Shi, "Making Fast Buffer Insertion even Faster via Approximation Techniques," 10th Asia and South Pacific Design Automation Conference (ASP-DAC), Shanghai, China, Jan. 2005, pp. 13-18.
[8] Zhuo Li and Weiping Shi, "An O(mn) Time Algorithm for Optimal Buffer Insertion of Nets with m Sinks", 11st Asia and South Pacific Design Automation Conference (ASP-DAC), Yokohama, Japan, Jan. 2006, pp. 320-325.


zhybear@gmail.com,wshi@ee.tamu.edu