Pass Transistor Logic Synthesizer (PTLS)

Rupesh S. Shelar and Sachin S. Sapatnekar

1. Introduction

Pass transistor logic (PTL) offers a good area/power-delay trade-off as an alternative to static CMOS circuits in today's technologies. It may continue to do so even when leakage power becomes dominant in the sub-100 nanometer era, because PTL implementations occupy less area than the corresponding static CMOS implementations. However, design automation tools targeting pass transistor logic, specifically synthesis and layout tools, are not available, and the potential of pass transistor logic remains unexplored. The Pass Transistor Logic Synthesizer (PTLS) tool is being developed to address the design automation needs of pass transistor logic. The current executable of PTLS addresses the performance-driven synthesis problem for PTL: it uses the recursive bipartitioning of BDD's reported in reference 1 to minimize the delays in PTL circuits under an Elmore-like delay model.

2. Executable & Source Code

PTLS version 1.0 is available as gzipped tar archives, as executables for the following platforms and as source code:
  • Executable (i686/Linux 2.4.9): PTLS-Linux-v10.tar.gz
  • Executable (Sparc/SunOS 5.8): PTLS-SunOS-v10.tar.gz
  • Source code: PTLS-Source-v10.tar.gz

3. I/O Formats

The PTLS program accepts a Boolean network in Berkeley Logic Interchange Format (BLIF) and generates a transistor netlist in SPICE format, which can be simulated using standard circuit simulators such as HSPICE. The program also reports the number of pass transistors, inverters, and buffers, and the delay of the final circuit under the Elmore-like delay model. The user must specify the kind of BDD's to be used, monolithic or multi-level; the results differ depending on this choice. We advise using monolithic BDD's for MCNC benchmarks and multi-level BDD's for ISCAS benchmarks (except for C17). For ISCAS benchmarks, one should first process the netlist in SIS with script.rugged (or a similar script). In either case, the logical netlist should not contain "constant" nodes; these can be eliminated using the "sweep" command in SIS. The two flows are shown in Figure 1.

[Figure: Flow]

Figure 1. Overall Flow for the Performance Driven PTL Synthesis

If file1.blif is a file describing combinational logic in BLIF format, the commands corresponding to the flows in Figure 1(a) and 1(b) are shown in Table 1. The generated output file contains the transistor netlist and is named file1Sweep.blif.sp; it can be simulated using HSPICE.

Flow in Figure 1(a)                 Flow in Figure 1(b)
sis> read_blif file1.blif           sis> read_blif file1.blif
sis> sweep                          sis> source script
sis> write_blif file1Sweep.blif     sis> sweep
sis> quit                           sis> write_blif file1Sweep.blif
$ ptls -global file1Sweep.blif      sis> quit
                                    $ ptls -local file1Sweep.blif

Table 1. Actual commands to be run for the flows in 1(a) and 1(b)

The PTLS package uses the Colorado University Decision Diagram (CUDD) package and the parser from the BDD based Synthesis (BDS) package to build the BDD's. The PTLS program does not use any BDD optimization algorithms other than variable ordering algorithms. The "-global" option specifies that monolithic BDD's are to be built, while the "-local" option specifies that a separate BDD is to be built for each node in the Boolean network. Typically, one should use "-global" for MCNC benchmarks (and C17) and "-local" for ISCAS benchmarks. In other words, use monolithic BDD's whenever they can be built and are of reasonable size; otherwise, use local BDD's.
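
The two command sequences of Table 1 can also be assembled programmatically. The following Python sketch only builds the command strings and runs neither SIS nor PTLS. The helper name ptls_flow and its parameters are hypothetical, and script.rugged is substituted for the generic "script" of Table 1 as an assumption based on the advice in Section 3.

```python
def ptls_flow(blif_file, use_local, sis_script="script.rugged"):
    """Return (sis_commands, ptls_command) for one of the two flows.

    Hypothetical helper: it only builds strings and executes nothing.
    use_local=False selects monolithic BDD's ("-global").
    use_local=True selects per-node BDD's ("-local").
    """
    suffix = ".blif"
    stem = blif_file[:-len(suffix)] if blif_file.endswith(suffix) else blif_file
    swept = stem + "Sweep.blif"

    sis_commands = ["read_blif " + blif_file]
    if use_local:
        # Local flow: process the network with a SIS script first
        # (the script name is an assumption, see the lead-in).
        sis_commands.append("source " + sis_script)
    # Both flows remove "constant" nodes before writing the netlist.
    sis_commands += ["sweep", "write_blif " + swept, "quit"]

    option = "-local" if use_local else "-global"
    return sis_commands, "ptls " + option + " " + swept


if __name__ == "__main__":
    for local in (False, True):
        commands, ptls_command = ptls_flow("file1.blif", use_local=local)
        print("\n".join("sis> " + c for c in commands))
        print("$ " + ptls_command)
        print()
```

Keeping the flow in one function makes the single difference between the two columns of Table 1 explicit: the local flow adds one "source" step and changes the PTLS option.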

4. Example

Figure 2 shows the BDD for the carry output of a 3-bit adder and the corresponding PTL implementation obtained by directly mapping BDD nodes onto pass transistors.

[Figure: Carry for 3-bit adder]
Figure 2. (a) BDD for carry output for 3-bit adder, (b) Corresponding PTL implementation
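
As a rough illustration of direct mapping, the Python sketch below writes out a reduced BDD for the carry output by hand and evaluates it the way a network of 2:1 multiplexers would, one node per multiplexer. The variable order a0, b0, a1, b1, a2, b2, the node names, and the count of two NMOS pass transistors per BDD node are assumptions made for this sketch; the BDD in Figure 2(a) and the netlist in Figure 2(b) may differ in detail.

```python
from itertools import product

# Hand-built reduced BDD for the carry-out of a 3-bit adder, assuming the
# variable order a0, b0, a1, b1, a2, b2.
# Entry: node name -> (variable, child when variable is 0, child when it is 1).
# A child is either another node name or one of the terminals 0 and 1.
BDD = {
    "a0":    ("a0", "a1_c0", "b0"),
    "b0":    ("b0", "a1_c0", "a1_c1"),
    "a1_c0": ("a1", "a2_c0", "b1"),   # carry into bit 1 is 0
    "a1_c1": ("a1", "b1", "a2_c1"),   # carry into bit 1 is 1
    "b1":    ("b1", "a2_c0", "a2_c1"),
    "a2_c0": ("a2", 0, "b2"),         # carry into bit 2 is 0
    "a2_c1": ("a2", "b2", 1),         # carry into bit 2 is 1
    "b2":    ("b2", 0, 1),
}


def evaluate(node, inputs):
    """Follow the branch each variable selects until a terminal is reached."""
    while node not in (0, 1):
        variable, low, high = BDD[node]
        node = high if inputs[variable] else low
    return node


def carry_reference(v):
    """Carry-out of a 3-bit addition computed arithmetically."""
    a = v["a0"] + 2 * v["a1"] + 4 * v["a2"]
    b = v["b0"] + 2 * v["b1"] + 4 * v["b2"]
    return (a + b) >> 3


def check_bdd():
    """Compare the BDD with the arithmetic carry on all 64 input assignments."""
    names = ["a0", "b0", "a1", "b1", "a2", "b2"]
    for bits in product((0, 1), repeat=6):
        v = dict(zip(names, bits))
        if evaluate("a0", v) != carry_reference(v):
            return False
    return True


if __name__ == "__main__":
    print("BDD matches carry:", check_bdd())
    # Assumption: each BDD node maps to a 2:1 mux of two NMOS pass transistors.
    print("nodes:", len(BDD), "pass transistors (assumed):", 2 * len(BDD))
```

Under this variable order the longest path from the root to a terminal passes through all six variables, which is the kind of critical path that the decomposition described next is meant to shorten.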
The PTLS tool recursively bipartitions the BDD, halving the critical paths. The resulting BDD's and one-hot multiplexers are used to implement the given function. Figure 3 shows the PTL implementation generated by the PTLS program for the function shown in Figure 2(a). The functions O0, O1, and O2 are the select functions of the one-hot multiplexer: exactly one of {O0, O1, O2} evaluates to 1 for any assignment of the inputs a0, b0, and a1. The output of the multiplexer implements the carry function, while the data functions are simply the PTL implementations of the BDD's rooted at the b1 node and the a2 nodes in the BDD for the carry function shown in Figure 2(a).


[Figure: Decomposed Implementation]
Figure 3. PTL implementation of the carry function using BDD decomposition and one-hot multiplexer
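
The one-hot property and the correctness of such a decomposition can be checked exhaustively. The Python sketch below is a functional check only, not the PTLS bipartitioning algorithm. It assumes the variable order a0, b0, a1, b1, a2, b2, and the pairing of O0, O1, O2 with particular sub-BDD's is an assumption, since the source does not state which select function drives which data function.

```python
from itertools import product


def selects(a0, b0, a1):
    """Select functions over the top part of the BDD (inputs a0, b0, a1)."""
    c1 = a0 & b0                 # carry out of bit 0
    o0 = (1 - c1) & (1 - a1)     # carry into bit 2 is 0 whatever b1 is
    o1 = c1 ^ a1                 # carry into bit 2 equals b1
    o2 = c1 & a1                 # carry into bit 2 is 1 whatever b1 is
    return o0, o1, o2


def data(b1, a2, b2):
    """Data functions: the sub-BDD's rooted at the a2 nodes and the b1 node."""
    d0 = a2 & b2                         # rooted at a2, carry-in 0
    d1 = (a2 & b2) | ((a2 | b2) & b1)    # rooted at b1
    d2 = a2 | b2                         # rooted at a2, carry-in 1
    return d0, d1, d2


def carry_reference(a0, b0, a1, b1, a2, b2):
    """Carry-out of a 3-bit addition computed arithmetically."""
    a = a0 + 2 * a1 + 4 * a2
    b = b0 + 2 * b1 + 4 * b2
    return (a + b) >> 3


def check_decomposition():
    """Verify one-hot selects and mux output on all 64 input assignments."""
    for a0, b0, a1, b1, a2, b2 in product((0, 1), repeat=6):
        sel = selects(a0, b0, a1)
        if sum(sel) != 1:        # exactly one select must be 1
            return False
        d = data(b1, a2, b2)
        out = (sel[0] & d[0]) | (sel[1] & d[1]) | (sel[2] & d[2])
        if out != carry_reference(a0, b0, a1, b1, a2, b2):
            return False
    return True


if __name__ == "__main__":
    print("decomposition is correct:", check_decomposition())
```

Because the select functions depend only on a0, b0, a1 and the data functions only on b1, a2, b2, the two halves can be evaluated in parallel, which is how the decomposition shortens the series path.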

The netlists generated by the PTLS program tend to have smaller delays than the netlists generated by direct mapping of BDD's, albeit at the cost of area. PTLS minimizes this area penalty using a max-flow min-cut algorithm (with some inaccuracy due to area estimation). The static CMOS implementation of the same carry function, obtained by running script.delay in SIS, is shown in Figure 4.

[Figure: Static CMOS Implementation]
Figure 4. Static CMOS implementation of the carry function

The PTL implementation shown in Figure 3 compares favorably with the static CMOS implementation in area, power, and delay. This trend holds not only for arithmetic circuits (which can usually be implemented efficiently in PTL) but for AND-intensive random logic circuits as well. The efficiency of the PTL implementation is due to the following factors:

    1. Exploration of a larger design space, since the design is libraryless

    2. Optimality of the decomposition algorithm, up to the accuracy of the estimation

    3. Merits and demerits of NMOS transistors in PTL

    Remark:

    Normally, designers use cell characterization to obtain the area, power, and delay numbers for the cells; these numbers are then used while designing a circuit. Libraryless design may create problems here: every time designers need to come up with a circuit, they would have to consider the whole design space and characterize all the cells that could possibly be used. Cell characterization can be avoided by using transistor-level delay models, in which case characterizing the transistor alone suffices. In the case of PTL, the designer has to characterize the pass transistor (and inverters with weak pull-up) and then use C-R-C PI models to estimate the delay. NMOS pass transistors pass rising transitions more slowly than falling transitions. Since inverters are inserted after every few (typically 3) transistors in series, a rising transition in the current segment becomes a falling transition in the next segment, and the effect of the slow rising transition averages out.
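
To make the delay estimation in the Remark concrete, the Python sketch below computes the standard Elmore delay at the far end of an RC ladder, which is a common way to model a chain of series pass transistors. It is not the delay model implemented in PTLS. The function name, the unit resistance and capacitance values, and the omission of the inverter's own delay are all assumptions made for illustration.

```python
def elmore_ladder_delay(resistances, capacitances):
    """Elmore delay at the last node of an RC ladder.

    resistances[i] is the series resistance of stage i and capacitances[i]
    is the capacitance at the node after stage i. Each capacitance is
    charged through all the resistance between it and the driver.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0
    upstream_resistance = 0
    for r, c in zip(resistances, capacitances):
        upstream_resistance += r
        delay += upstream_resistance * c
    return delay


if __name__ == "__main__":
    # Hypothetical unit values: R = 1 and C = 1 per pass transistor.
    three = elmore_ladder_delay([1] * 3, [1] * 3)
    six = elmore_ladder_delay([1] * 6, [1] * 6)
    print("3 transistors in series:", three)
    print("6 transistors in series:", six)
    # Two buffered segments of 3, ignoring the inverter's own delay.
    print("two segments of 3:", 2 * three)
```

With identical stages the delay of an unbuffered chain grows quadratically with its length, so under these assumptions splitting a long chain into short segments separated by inverters reduces the total, provided the inverter delay is small enough.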

    5. Test cases

    The following table lists the ISCAS benchmarks for which the "Local" option for BDD's ("-local") is recommended. For all other benchmarks, one should use the "Global" option ("-global"), keeping in mind the guideline at the end of Section 3.


    Example    Option for PTLS
    C7552      Local
    C6288      Local
    C3540      Local
    C1908      Local
    C432       Local
    C499       Local
    C1355      Local
    C2670      Local
    C5315      Local

    Table 5. The options for ISCAS benchmarks

    References

    1. R. S. Shelar and S. S. Sapatnekar, Recursive Bipartitioning of BDD's for Performance Driven Synthesis of Pass Transistor Logic, Proceedings of IEEE/ACM ICCAD, Nov. 2001, pp. 449-452.
    2. R. S. Shelar and S. S. Sapatnekar, BDD Decomposition for the Synthesis of High Performance PTL Circuits, Workshop Notes of IEEE IWLS, June 2001, pp. 298-303.


    This page is maintained by  Rupesh S. Shelar.
    Last modified:  December 3, 2002.