
pp(3ast)

ksh-devel

Korn Shell development environment

NAME

pp - ANSI C preprocessor library

SYNOPSIS

:PACKAGE: ast
#include <pp.h>
%include "pptokens.yacc"
-lpp

DESCRIPTION

The pp library provides a tokenizing implementation of the C language preprocessor and supports K&R (Reiser), ANSI and C++ dialects. The preprocessor consists of 12 public functions, a global character class table accessed by macros, and a single global struct with 10 public elements.
pp operates in two modes. Standalone mode is used to implement the traditional standalone C preprocessor. Tokenizing mode provides a function interface to a stream of preprocessed tokens. pp is by default ANSI; the only default predefined symbols are and Dialects (K&R, C++) and local conventions are determined by compiler-specific probe(1) information that is included at runtime. The probe information can be overridden by providing a file with pragmas and definitions for each compiler implementation. This file is usually located in the compiler-specific default include directory.
Directive, command line argument, option and pragma syntax is described in cpp(1). pp specific semantics are described below. Most semantic differences with standard or classic implementations are in the form of optimizations.
Options and pragmas map to function calls described below. For the remaining descriptions, "setting ppop(PP_operation)" is a shorthand for calling with the arguments appropriate for PP_operation.
The library interface describes only the public functions and struct elements. Static structs and pointers to structs are provided by the library. The user should not attempt to allocate structs. In particular, is meaningless for pp supplied structs.
The global struct provides readonly information. Any changes to must be done using the functions described below. has the following public elements:
The pp implementation version string.
The current line sync directive name. Used for standalone line sync output. The default value is the empty string. See the function below.
The current output file name.
The pragma pass name for pp. The default value is
The string representation for the current input token.
The inclusive or of:
Set if was set.
Set if was set.
Set if standalone line syncs require a file argument.
Set if standalone line syncs require a third argument. The third argument is for include file push, for include file pop and null otherwise.
Set if was set.
Set if was set.
The list of directories to be searched for "..." include files. If the first directory name is "" then it is replaced by the directory of the including file at include time. The public elements of are:
The directory pathname.
The next directory, if it is the last in the list.
is the list of directories to be searched for <...> include files. This list may be
If was set then points to the symbol table entry for the current identifier token. is undefined for non-identifier tokens. Once defined, an identifier will always have the same pointer. If was also set then is defined for macro and keyword tokens and for all other identifiers. The elements of are:
The identifier name.
The inclusive or of the following flags:
Currently being expanded.
Builtin macro.
Macro expansion currently disabled.
Function-like macro.
Initialization macro.
Keyword identifier.
Loaded checkpoint macro.
macro.
No identifiers in macro body.
Predefined macro.
Also a predicate.
Readonly macro.
Ok to redefine.
Variadic function-like macro.
First unused symbol flag bit index. The bits from on are initially unset and may be set by the user.
Non-zero if the identifier is a macro. is the number of formal arguments for function-like macros and is the macro definition value, a terminated string that may contain internal mark sequences.
Initially set to and never modified by pp. This field may be set by the user.
The macro and identifier hash table. The hash(3) routines may be used to examine the table, with the exception that the following macros must be used for individual symbol lookup:
Return the pointer for 0 if not defined.
Return the pointer for If is not defined then allocate and return a new for it.
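The two lookup macros above differ only in what happens when the name is missing. As a rough illustration, assuming a plain chained hash table rather than the hash(3) library that pp actually uses, the sketch below pairs a lookup-only function with a lookup-or-create function and reserves an application flag bit above the library's bits. All names (struct symbol, sym_get, sym_put, SYM_*) and all bit values are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits standing in for the symbol flags listed above;
   the real names and values are defined in <pp.h>, not here. */
#define SYM_MACRO        (1UL << 0)  /* identifier is a macro */
#define SYM_PREDEFINED   (1UL << 1)  /* predefined macro */
#define SYM_FIRST_UNUSED 2           /* first bit index left to the user */
#define SYM_USER_SEEN    (1UL << SYM_FIRST_UNUSED) /* application-defined */

#define NBUCKETS 64

struct symbol {
    char *name;             /* identifier name */
    unsigned long flags;    /* inclusive or of SYM_* bits */
    struct symbol *next;    /* next entry in the same hash bucket */
};

static struct symbol *table[NBUCKETS];

static unsigned hash_name(const char *s)
{
    unsigned h = 0;

    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Lookup only: the entry for name, or NULL if it is not defined. */
struct symbol *sym_get(const char *name)
{
    struct symbol *p;

    for (p = table[hash_name(name)]; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Lookup or create: a new entry starts with all flags unset, and later
   calls with the same name return the same pointer. NULL if out of memory. */
struct symbol *sym_put(const char *name)
{
    struct symbol *p = sym_get(name);
    unsigned h;

    if (p)
        return p;
    p = calloc(1, sizeof *p);
    if (!p)
        return NULL;
    p->name = malloc(strlen(name) + 1);
    if (!p->name) {
        free(p);
        return NULL;
    }
    strcpy(p->name, name);
    h = hash_name(name);
    p->next = table[h];
    table[h] = p;
    return p;
}
```

For example, after checking that sym_put("FOO") did not return NULL, setting SYM_USER_SEEN in its flags marks the identifier with a bit the library never touches, and every later sym_get("FOO") returns that same entry.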
Error messages are reported using error(3) and the following globals relate to pp:
The level 2 error count. Error levels above 2 cause immediate exit. If is non-zero then the user program exit status should also be non-zero.
The current input file name.
The current input line number.
The debug trace level, by default. Larger negative numbers produce more trace information. Enabled when the user program is linked with the -g cc(1) option.
The level 1 error count. Warnings do not affect the exit status.
The functions are:
Passed to optjoin(3) to parse cpp(1) style options and arguments. The user may also supply application specific option parsers. Also handles non-standard sun and GNU options. Hello in there, ever hear of getopt(3)?
This is the standalone cpp(1) entry point. consumes all of the input and writes the preprocessed text to the output. A single call to is equivalent to, but more efficient than:
    ppop(PP_SPACEOUT, 1);
    while (pplex())
          ppprintf(" %s", pp.token);

The default comment handler that passes comments to the output. May be used as an argument to or the user may supply an application specific handler. is the comment head text, for C and for C++, is the comment body, is the comment tail text, for C and newline for C++, and is the comment starting line number.
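An application specific comment handler receives each comment in three pieces plus its starting line. The sketch below assumes the argument order head, body, tail, line as described above; the handler name keep_comment and its two globals are hypothetical. For a C comment the head and tail are /* and */; for a C++ comment they are // and the newline.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handler state: the most recent comment and its line. */
char last_comment[256];
int last_line;

/* Instead of passing the comment to the output, keep a reassembled copy.
   The (head, body, tail, line) argument order is an assumption. */
void keep_comment(const char *head, const char *body, const char *tail, int line)
{
    /* snprintf truncates a comment that does not fit in the buffer. */
    snprintf(last_comment, sizeof last_comment, "%s%s%s", head, body, tail);
    last_line = line;
}
```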
Equivalent to error(3). All pp error and warning messages pass through The user may link with an application specific to override the library default.
The default include reference handler that outputs to the standard error. May be used as an argument to the or the user may supply an application specific handler. is the including file name, is the current include file name, is the current line number in and is non-zero if is being pushed or if file is being popped.
Pushes the terminated on the pp input stack. is the pseudo file name used in line syncs for and is the starting line number.
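The input stack behaves like a stack of string readers: the most recently pushed string is read first, and reading presumably resumes in the previous input once it is exhausted. The following sketch models that behavior under those assumptions; it is not pp's internal implementation, and input_push, input_getc, input_file and input_line are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical input-stack frame: one pushed string plus the pseudo file
   name and current line number that line syncs would report. */
struct frame {
    const char *text;   /* 0 terminated string still to be read */
    const char *file;   /* pseudo file name */
    int line;           /* current line number within text */
};

#define MAXDEPTH 16

static struct frame stack[MAXDEPTH];
static int depth;

/* Push a string; returns 0 on success, -1 if the stack is full. */
int input_push(const char *text, const char *file, int line)
{
    if (depth >= MAXDEPTH)
        return -1;
    stack[depth].text = text;
    stack[depth].file = file;
    stack[depth].line = line;
    depth++;
    return 0;
}

/* Next character from the innermost frame, popping exhausted frames;
   -1 when all input is consumed. */
int input_getc(void)
{
    while (depth > 0) {
        struct frame *f = &stack[depth - 1];

        if (*f->text) {
            int c = (unsigned char)*f->text++;

            if (c == '\n')
                f->line++;
            return c;
        }
        depth--;        /* frame exhausted: resume the one below */
    }
    return -1;
}

/* Name and line of the innermost frame, for messages and line syncs. */
const char *input_file(void) { return depth ? stack[depth - 1].file : NULL; }
int input_line(void) { return depth ? stack[depth - 1].line : 0; }
```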
Returns the token type of the next input token. and where applicable are updated to refer to the new token. The token type constants are defined in for and for yacc(1) The token constant names match some are encoded by oring with tokens.
The numeric constant tokens and encodings are:
    T_DOUBLE          (N_NUMBER|N_REAL)
    T_DOUBLE_L        (N_NUMBER|N_REAL|N_LONG)
    T_FLOAT           (N_NUMBER|N_REAL|N_FLOAT)
    T_DECIMAL         (N_NUMBER)
    T_DECIMAL_L       (N_NUMBER|N_LONG)
    T_DECIMAL_U       (N_NUMBER|N_UNSIGNED)
    T_DECIMAL_UL      (N_NUMBER|N_UNSIGNED|N_LONG)
    T_OCTAL           (N_NUMBER|N_OCTAL)
    T_OCTAL_L         (N_NUMBER|N_OCTAL|N_LONG)
    T_OCTAL_U         (N_NUMBER|N_OCTAL|N_UNSIGNED)
    T_OCTAL_UL        (N_NUMBER|N_OCTAL|N_UNSIGNED|N_LONG)
    T_HEXADECIMAL     (N_NUMBER|N_HEXADECIMAL)
    T_HEXADECIMAL_L   (N_NUMBER|N_HEXADECIMAL|N_LONG)
    T_HEXADECIMAL_U   (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED)
    T_HEXADECIMAL_UL  (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED|N_LONG)

The normal C tokens are:
    T_ID              C identifier
    T_INVALID         invalid token
    T_HEADER          <..>
    T_CHARCONST       '..'
    T_WCHARCONST      L'..'
    T_STRING          ".."
    T_WSTRING         L".."
    T_PTRMEM          ->
    T_ADDADD          ++
    T_SUBSUB          --
    T_LSHIFT          <<
    T_RSHIFT          >>
    T_LE              <=
    T_GE              >=
    T_EQ              ==
    T_NE              !=
    T_ANDAND          &&
    T_OROR            ||
    T_MPYEQ           *=
    T_DIVEQ           /=
    T_MODEQ           %=
    T_ADDEQ           +=
    T_SUBEQ           -=
    T_LSHIFTEQ        <<=
    T_RSHIFTEQ        >>=
    T_ANDEQ           &=
    T_XOREQ           ^=
    T_OREQ            |=
    T_TOKCAT          ##
    T_VARIADIC        ...
    T_DOTREF          .*    [if PP_PLUSPLUS]
    T_PTRMEMREF       ->*   [if PP_PLUSPLUS]
    T_SCOPE           ::    [if PP_PLUSPLUS]
    T_UMINUS          unary minus
If was set then the keyword tokens are also defined. Compiler differences and dialects are detected by the pp probe(1) information, and only the appropriate keywords are enabled. The ANSI keyword tokens are:
    T_AUTO          T_BREAK          T_CASE           T_CHAR
    T_CONTINUE      T_DEFAULT        T_DO             T_DOUBLE_T
    T_ELSE          T_EXTERN         T_FLOAT_T        T_FOR
    T_GOTO          T_IF             T_INT            T_LONG
    T_REGISTER      T_RETURN         T_SHORT          T_SIZEOF
    T_STATIC        T_STRUCT         T_SWITCH         T_TYPEDEF
    T_UNION         T_UNSIGNED       T_WHILE          T_CONST
    T_ENUM          T_SIGNED         T_VOID           T_VOLATILE

and the C++ keyword tokens are:
    T_CATCH         T_CLASS          T_DELETE         T_FRIEND
    T_INLINE        T_NEW            T_OPERATOR       T_OVERLOAD
    T_PRIVATE       T_PROTECTED      T_PUBLIC         T_TEMPLATE
    T_THIS          T_THROW          T_TRY            T_VIRTUAL

In addition, is recognized where appropriate. Additional keyword tokens may be added using
Many C implementations show no restraint in adding new keywords; some PC compilers have tripled the number of keywords. For the most part these new keywords introduce noise constructs that can be ignored for standard (reasonable) analysis and compilation. The noise keywords fall in four syntactic categories that map into the two noise keyword tokens and For points to the entire noise construct, including the offending noise keyword. The basic noise keyword categories are:
The simplest noise: a single keyword that is noise in any context and maps to
A noise keyword that precedes an optional grouping construct, either or and maps to
A noise keyword that consumes the remaining tokens in the line and maps to
A noise keyword that consumes the tokens up to the next and maps to
If is then implementation specific noise constructs are mapped to either or otherwise if is then noise constructs are completely ignored, otherwise the unmapped grouping noise tokens are returned.
Token encodings may be tested by the following macros:
Non-zero if is an integral or floating point numeric constant.
Non-zero if is an integral numeric constant.
Non-zero if is a floating point numeric constant.
Non-zero if is a C assignment operator.
Non-zero if must be separated from other tokens by space.
Non-zero if is a noise keyword.
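Because numeric token types are built by or-ing N_* bits, classifying a token reduces to masking. The sketch below uses made-up bit values (the real ones are in <pp.h>) and hypothetical macro names IS_NUMBER, IS_INTEGER and IS_REAL; it also assumes that no non-numeric token type has the N_NUMBER bit set.

```c
/* Hypothetical bit values; the real N_* constants come from <pp.h>. */
#define N_NUMBER      0x100
#define N_REAL        0x080
#define N_LONG        0x040
#define N_UNSIGNED    0x020
#define N_OCTAL       0x010
#define N_HEXADECIMAL 0x008
#define N_FLOAT       0x004

/* A few encodings copied from the numeric constant table above. */
#define T_DECIMAL       (N_NUMBER)
#define T_DECIMAL_UL    (N_NUMBER|N_UNSIGNED|N_LONG)
#define T_OCTAL         (N_NUMBER|N_OCTAL)
#define T_HEXADECIMAL_U (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED)
#define T_DOUBLE        (N_NUMBER|N_REAL)
#define T_FLOAT         (N_NUMBER|N_REAL|N_FLOAT)
#define T_ID            1       /* hypothetical non-numeric token value */

/* Hypothetical tests in the spirit of the macros described above. */
#define IS_NUMBER(t)  (((t) & N_NUMBER) != 0)
#define IS_INTEGER(t) (((t) & (N_NUMBER|N_REAL)) == N_NUMBER)
#define IS_REAL(t)    (((t) & (N_NUMBER|N_REAL)) == (N_NUMBER|N_REAL))

/* Radix of an integral constant token: 8, 10 or 16; 0 otherwise. */
int token_radix(int t)
{
    if (!IS_INTEGER(t))
        return 0;
    if (t & N_OCTAL)
        return 8;
    if (t & N_HEXADECIMAL)
        return 16;
    return 10;
}
```

Testing a combination of bits with a single mask keeps each classification to one comparison, which is the point of encoding the attributes as bits rather than enumerating every token type.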
The default line sync handler that outputs line sync pragmas for the C compiler front end. May be used as an argument to or the user may supply an application specific handler. is the line number and is the file name. If was set then the directive # lineid line "file" is output.
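A replacement line sync handler would typically format the same directive itself. The sketch below writes # lineid line "file" text into a caller buffer; the function name format_linesync and the choices to drop the extra space when lineid is empty and to omit the file when it is NULL are assumptions, not pp's documented behavior.

```c
#include <stdio.h>
#include <string.h>

/* Format a line sync into buf. lineid may be the empty string (its
   documented default); file may be NULL when no file argument is
   required. Returns the snprintf(3) result. */
int format_linesync(char *buf, size_t size, const char *lineid,
                    int line, const char *file)
{
    /* Drop the separating space when lineid is empty: # 42 "f.c" */
    const char *sep = strlen(lineid) ? " " : "";

    if (file)
        return snprintf(buf, size, "#%s%s %d \"%s\"\n", sep, lineid, line, file);
    return snprintf(buf, size, "#%s%s %d\n", sep, lineid, line);
}
```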
The default macro reference handler that outputs macro reference pragmas. May be used as an argument to or the user may supply an application specific handler. is the macro pointer, is the reference file, is the reference line, and if is non-zero a macro value checksum is also output. The pragma syntax is #pragma pp:macref "symbol->name" line checksum.
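The macref pragma can be produced with snprintf(3). In the sketch below the checksum is a stand-in multiply-and-add hash, since the algorithm pp uses is not described here, and leaving the checksum field out when it is not requested is also an assumption; format_macref and value_checksum are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical checksum of a macro value: multiply-and-add, kept to 32 bits. */
static unsigned long value_checksum(const char *value)
{
    unsigned long sum = 0;
    size_t i, n = strlen(value);

    for (i = 0; i < n; i++)
        sum = (sum * 33 + (unsigned char)value[i]) & 0xffffffffUL;
    return sum;
}

/* Format: #pragma pp:macref "name" line [checksum]
   The checksum is appended only when want_sum is non-zero. */
int format_macref(char *buf, size_t size, const char *name, int line,
                  const char *value, int want_sum)
{
    if (want_sum)
        return snprintf(buf, size, "#pragma pp:macref \"%s\" %d %lu\n",
                        name, line, value_checksum(value));
    return snprintf(buf, size, "#pragma pp:macref \"%s\" %d\n", name, line);
}
```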
is the option control interface. determines the type(s) of the remaining argument(s). Options marked by must be done before
is asserted as if by
Installs as the unknown builtin macro handler. Builtin macros are of the form is called with set to the unknown builtin macro name and set to the arguments. is a buffer that can be used for the return value. should be returned on error.
is defined as if by
The directive #string is executed.
The directive #pragma pp:string is executed.
The default handler that copies unknown directives and pragmas to the output. May be used as an argument to or the user may supply an application specific handler. This function is most often called after directive and pragma mapping. Any of the arguments may be is the directive name, is the pragma pass name, is the pragma option name, is the pragma option value, and is non-zero if a trailing newline is required if the pragma is copied to the output.
A printf(3) interface to the standalone pp output buffer. Macros provide limited control over output buffering: flushes the output buffer, flushes the output buffer if over characters are buffered, returns the number of pending characters in the output buffer, and places the character in the output buffer.
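The four buffering operations can be modeled with a fixed buffer and a pending count. The functions below are hypothetical stand-ins (pp provides macros whose names are not reproduced here), with out_printf playing the role of the printf(3) interface; the buffer size and the 511-character limit per out_printf call are arbitrary choices of the sketch.

```c
#include <stdarg.h>
#include <stdio.h>

#define OUTSIZE 4096

static char outbuf[OUTSIZE];
static size_t outlen;           /* characters pending in outbuf */
FILE *outfp;                    /* destination; stdout when NULL */

/* Write every pending character and empty the buffer. */
void out_flush(void)
{
    if (outlen) {
        fwrite(outbuf, 1, outlen, outfp ? outfp : stdout);
        outlen = 0;
    }
}

/* Flush only when more than n characters are pending. */
void out_check(size_t n)
{
    if (outlen > n)
        out_flush();
}

/* Number of characters waiting in the buffer. */
size_t out_pending(void)
{
    return outlen;
}

/* Append one character, flushing first if the buffer is full. */
void out_putc(int c)
{
    if (outlen == OUTSIZE)
        out_flush();
    outbuf[outlen++] = (char)c;
}

/* printf(3)-style append; longer output is truncated in this sketch. */
int out_printf(const char *fmt, ...)
{
    char tmp[512];
    va_list ap;
    int i, n;

    va_start(ap, fmt);
    n = vsnprintf(tmp, sizeof tmp, fmt, ap);
    va_end(ap);
    if (n < 0)
        return n;
    if ((size_t)n >= sizeof tmp)
        n = (int)sizeof tmp - 1;
    for (i = 0; i < n; i++)
        out_putc(tmp[i]);
    return n;
}
```

A caller emitting many small tokens can call out_check with a threshold after each one, so output is written in large chunks without lagging far behind.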

CAVEATS

The ANSI mode is intended to be true to the standard. The compatibility mode has been proven in practice, but there are surely dark corners of some implementations that may have been omitted.

SEE ALSO

cc(1), cpp(1), nmake(1), probe(1), yacc(1),
ast(3), error(3), hash(3), optjoin(3)

AUTHOR

Glenn Fowler
(Dennis Ritchie provided the original table driven lexer.)
AT&T Bell Laboratories