MARCO GSRC T2 Fabrics: Bookshelf
New Data Formats For VLSI CAD
Contents
Appendix A. Note To Developers
I. Introduction
The MARCO/GSRC bookshelf
aims at collecting leading-edge implementations
for VLSI CAD algorithms as well as providing efficient mechanisms
of evaluating such implementations and comparing them against each other.
Standard benchmarks play an important role in such comparisons.
Several research groups participating in the bookshelf effort are now
converging on common representations for a number of fundamental
optimization problems in VLSI CAD.
Such representations include semantics (i.e., what kind of
information is given), abstract syntax (i.e., how the
information is organized to facilitate common use models) and
concrete syntax (i.e., specific serial representations).
At this point, we are not standardizing in-memory representations,
but implementation options may implicitly influence the choice of common
representations.
The new data formats are going to fill gaps where no public data formats
existed before and also improve on existing, but deficient formats.
They will be followed by publicly available benchmarks and simple utilities
(e.g., converters, statistics browsers, evaluators and constraint verifiers).
Bookshelf maintainers (in the future, the steering committee) actively
interact with the authors of the new formats in order to ensure
relevance, domain coverage, user convenience
and uniform look of all new formats. This document is designed to
summarize our motivations and general guidelines in addition to listing
the formats that are already available and can be used as examples.
II. Why And When To Use New Data Formats (see
separate page)
III. Motivation and Main Goals
- Provide high-quality data formats to capture
leading-edge optimization problems in different areas of VLSI CAD
and make them easy to use both in academic and industrial settings.
- Ensure that all data formats are ASCII-based and maximally
human-readable.
- Put independent groups of attributes into separate files and
connect such files with several-line "glue files" to ensure
maximal lifespan of format components even when other components
are not used.
- Attempt genericity and reuse by design, i.e., construct
single- and multi-file formats so that they can be used for multiple,
possibly unexpected, purposes.
- Avoid arbitrary restrictions.
- Attempt to simplify the task of writing parsers through
common data format practices and through offering solutions to
standard problems (e.g., implementation of hash tables).
- Pay attention to detail, including careful choice of
equivalent syntactic constructs (e.g., one-liners versus begin/end),
identifiers and default values. We will try to prevent confusion and
misuse.
- Describe the desired parser behavior, especially error
diagnostics, when this appears critical.
- Consider use scenarios for proposed file formats and ensure
that the general cases result in only minimal overhead in degenerate
cases (e.g., careful defaults versus specifying nonexistent details
as "None").
- Pursue more fundamental issues first, ensuring modularity,
reuse and extensibility. We attempt to follow the analogy to the
C and C++ programming languages where the main functionalities
reside in standard libraries and do not affect the complexity of
the [core of the] language.
IV. Gotchas
- Post your plans for writing new data formats to
bookshelf-devel as well as early drafts of your format
descriptions. This is to avoid overlapping with the work of
others and detect possible misfeatures in your formats as early
as possible. Think of Nicolas Bourbaki (i.e., defining concepts in
clear and context-independent terms, preferably relying only on
numbers, shapes and sets as fundamental concepts).
- Put some thought into modeling your particular domain with
generic mathematical constructs. Not only does this result in clearer
semantics and syntax, but it also enables unexpected reuse.
- Make sure you look at available format
descriptions and try to reuse as much as reasonable.
- The hardest issue in writing a parser is good error diagnostics.
To help the parser, modularize your data formats and annotate types.
- Explain what characters can be used in names used by your format
and whether particular names are prohibited (e.g., you cannot have
a variable called "for" in C because this is a reserved word).
Clearly state your case-[in]sensitivity requirements.
- Avoid redundant numerical information that is not useful for parsing
and error diagnostics (e.g., the shape of a hard block and the area).
- Avoid cryptic and confusing identifiers/declarators. In our
environment, a good data format will be clear to a specialist without
a manual.
- Carefully balance between verbose and laconic identifiers/declarators.
It is typically a good idea to use words instead of numbers when
choosing one of several options (e.g., relative/absolute rather
than 0/1).
- When saving files in a particular format, generously pad
variable-length fields with whitespace so that your files look
like tables. This dramatically improves human-readability.
- When you require that several pieces of information be on the same
line, make sure that everything fits, including possible generous
whitespace padding in someone else's code.
- Post sample instances as early as possible, but when doing so,
clearly say whether the format is subject to change.
- Publish a reference parser, either in source code or in binary for
major platforms, to prevent others from producing non-standard
instances. Ensure good error diagnostics in the reference parser
(having good error diagnostics in any parser is justified,
e.g., for debugging the parser on standard benchmarks ;-).
- Grammar-based parsers, e.g., the ones that use lex/yacc
(or flex/bison), often have very poor error diagnostics abilities.
Do not be afraid of writing a parser in C++ --- it is very far
from anathema --- some good parsers are written in C++ from ground
up (e.g., the SGML parser SP).
- When writing your parser in C++, do not forget about numerous
string-processing functions in the C standard library, such as
strstr (type 'man strstr' on any Unix system, or better see the
2nd edition of the Kernighan & Ritchie C book for a tour of stdlibc).
User-defined C++ I/O manipulators appear very useful
(see Stroustrup's 3rd edition or, better, "Ruminations on C++" by
Koenig and Moo), especially for lookaheads.
- When processing arbitrary names, you will most likely need an
implementation of hash tables. While the standard C and C++
libraries mysteriously avoid hashing functions for character strings,
the 3rd edition of Stroustrup's C++ book proposes an interface
to a hash-based container hash_map. This container is
implemented in the SGI STL and included with g++ 2.95 and higher.
- PERL is a very strong candidate for converters that do not populate
in-memory representations. It will be interesting to see a
PERL-based parser that populates in-memory representations in C++
(contact imarkov@cs.ucla.edu if you are thinking of writing
such a parser).
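
Two of the suggestions above, hash-based name lookup and whitespace-padded
output, can be sketched together as follows. This is only an illustration
under stated assumptions: it uses std::unordered_map from later C++
standards in place of the hash_map container mentioned above, and the
names NameTable, intern and formatTable are hypothetical, not part of any
bookshelf utility.

```cpp
#include <algorithm>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps arbitrary names to dense integer ids.
// Assumption: std::unordered_map stands in for the hash_map container.
class NameTable {
public:
    // Returns the id of `name`, assigning the next free id on first sight.
    std::size_t intern(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const std::size_t id = names_.size();
        ids_.emplace(name, id);
        names_.push_back(name);
        return id;
    }
    const std::string& name(std::size_t id) const { return names_.at(id); }
    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, std::size_t> ids_;  // name -> id
    std::vector<std::string> names_;                    // id -> name
};

// Writes one "name id" row per entry. Names are left-aligned and padded
// to the width of the longest name so that the output reads like a table.
std::string formatTable(const NameTable& table) {
    std::size_t width = 0;
    for (std::size_t i = 0; i < table.size(); ++i)
        width = std::max(width, table.name(i).size());

    std::ostringstream out;
    for (std::size_t i = 0; i < table.size(); ++i) {
        out << std::left << std::setw(static_cast<int>(width) + 2)
            << table.name(i)
            << std::right << std::setw(6) << i << '\n';
    }
    return out.str();
}
```

Interning the same name twice returns the same id, and every row produced
by formatTable has the same length regardless of how long each name is.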
V. General Guidelines/Standards
This is a list of general guidelines and standards that apply to all
bookshelf formats, unless otherwise explicitly noted. Included here
are definitions of legal characters in names, format for numbers,
definitions of "whitespace", etc.
This page can be referred to in descriptions of file formats
via
http://vlsicad.cs.ucla.edu/GSRC/bookshelf/formats/#V
and does not have to be duplicated.
Conventions
- blank characters (whitespace) are spaces and tabs.
- multiple blank characters are equivalent to one
- a colon (:) must always be preceded and followed by a space
- the pound sign (#) denotes commented out lines
and is only guaranteed to be processed correctly
if all characters earlier on the line are blank characters
- names may contain upper- and lower-case characters, as well
as {"_", ":", "-" and "|"}.
Names are case-sensitive! Keywords are not.
- Every file has a "standard" header containing
format name and revision (e.g., UCLA nets 1.0) in free format,
the date and time of creation, the user who created the file
(on OSes that support users, such as Windows 98/NT and Unix) and/or
the software that created the file. This information can
be embedded in comments at the beginning of the file and
is ignored by the parser.
- when the tokens expected to be found on the line are
successfully parsed, all characters until the end of
the line should be ignored (this allows for easy extensions);
a one-time warning must be issued by the parser if any
non-blank characters were ignored, "Non-blank characters
are ignored until the end of line XXX and, possibly, later".
- if the line-end character is encountered before
all tokens are parsed, a fatal error message "Unexpected
line end on line XXX" should be generated.
- all quantities with dimensions, such as locations, offsets,
sizes, weights, etc., can be specified as doubles. It is
possible to use integers, but we felt that restricting
to integers may be too risky as, e.g., LEFDEF specifies
doubles. We believe that resolving 3.000000 vs 2.99999
(that may arise if a particular program expects an integer)
is not difficult via rounding to integers and checking with
a reasonably small round-off tolerance. This should be done
by the programs that save doubles (i.e., test if a number
to be saved is epsilon-close to an integer, and if it is,
round it to that integer before saving). Physical design tools
that use integers internally should read doubles and
check for overflow.
- The format for glue files and a discussion of
platform-[in]dependence are given to encourage everyone to use
the same .aux file format.
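
Several of the conventions above (comment lines, blank-separated tokens,
ignored trailing characters, the "Unexpected line end" error, and reading
doubles into integers with a tolerance and an overflow check) can be
sketched as follows. This is a minimal illustration, not a reference
parser: the record layout "name x y", the names Record, parseRecord,
toInteger and isBlankOrComment, and the default tolerance of 1e-4 are all
assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>

// True if the line is empty, blank, or a comment, i.e. '#' is preceded
// only by blank characters (spaces and tabs).
bool isBlankOrComment(const std::string& line) {
    const std::size_t pos = line.find_first_not_of(" \t");
    return pos == std::string::npos || line[pos] == '#';
}

// Converts a double to long if it is within `eps` of an integer and fits.
// Assumption: eps = 1e-4 is an arbitrary "reasonably small" tolerance.
long toInteger(double value, double eps = 1e-4) {
    const double nearest = std::round(value);
    if (!(std::fabs(value - nearest) <= eps))
        throw std::runtime_error("value is not close to an integer");
    // min() is a negated power of two, so this conversion is exact.
    const double lo = static_cast<double>(std::numeric_limits<long>::min());
    if (nearest < lo || nearest >= -lo)
        throw std::overflow_error("integer overflow");
    return static_cast<long>(nearest);
}

// Hypothetical record: a name followed by two coordinates.
struct Record {
    std::string name;
    long x = 0;
    long y = 0;
    bool trailingIgnored = false;  // caller may issue the one-time warning
};

// Parses the whole token as a double; reports the line number on failure.
double parseNumber(const std::string& token, int lineNo) {
    try {
        std::size_t used = 0;
        const double v = std::stod(token, &used);
        if (used == token.size()) return v;
    } catch (const std::exception&) {
        // fall through to the error below
    }
    throw std::runtime_error("Bad number '" + token + "' on line " +
                             std::to_string(lineNo));
}

// Parses "name x y"; any run of blanks separates tokens.
Record parseRecord(const std::string& line, int lineNo) {
    std::istringstream in(line);
    std::string xTok, yTok, rest;
    Record r;
    if (!(in >> r.name >> xTok >> yTok))
        throw std::runtime_error("Unexpected line end on line " +
                                 std::to_string(lineNo));
    r.x = toInteger(parseNumber(xTok, lineNo));
    r.y = toInteger(parseNumber(yTok, lineNo));
    // Anything after the expected tokens is ignored but flagged.
    r.trailingIgnored = static_cast<bool>(in >> rest);
    return r;
}
```

A caller would skip lines for which isBlankOrComment returns true, call
parseRecord on the rest, and print the "Non-blank characters are ignored"
warning the first time trailingIgnored is set.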
VI. Open Issues
VII. Availability Status of New Data Formats (see
separate page)
VIII. Resources
Appendix A. Note To Developers
Active bookshelf developers are strongly advised to
request membership in the
bookshelf group at GSRC with developer
privileges and use the bookshelf-devel mailing list
to post their early format drafts and announce implementation plans.
Please do not post questions before browsing archives.
For implementation, porting, installation and configuration issues,
consider
requesting membership in the
softdevel group at GSRC and using their mailing list.
© 1999
abk@cs.ucla.edu,
imarkov@cs.ucla.edu