[//]: # ($NetBSD: README.md,v 1.6 2022/06/17 18:54:53 rillig Exp $)

# Introduction

Lint1 analyzes a single translation unit of C code.

* It reads the output of the C preprocessor, in which comments are retained.
* The lexer in `scan.l` and `lex.c` splits the input into tokens.
* The parser in `cgram.y` creates types and expressions from the tokens.
* It checks declarations in `decl.c`.
* It checks initializations in `init.c`.
* It checks types and expressions in `tree.c`.

To see how a specific lint message is triggered, read the corresponding unit
test in `tests/usr.bin/xlint/lint1/msg_???.c`.

# Features

## Type checking

Lint has stricter type checking than most C compilers.
It warns about type conversions that may result in alignment problems;
see the test `msg_135.c` for examples.

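As a rough illustration of the kind of conversion meant here, the following
sketch contrasts a pointer cast that raises the required alignment with a
copy-based alternative.
The function names are made up for this example, and the exact diagnostic
that lint prints is not reproduced; `msg_135.c` remains the reference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical example: converting 'const unsigned char *' to
 * 'const uint32_t *' raises the required alignment, which is the kind
 * of conversion lint is expected to warn about.  Dereferencing the
 * result is only valid if 'buf' really points to a suitably aligned
 * uint32_t object.
 */
uint32_t
read_u32_by_cast(const unsigned char *buf)
{
	assert(buf != NULL);
	return *(const uint32_t *)buf;
}

/* Copying the bytes avoids the pointer conversion altogether. */
uint32_t
read_u32_by_copy(const unsigned char *buf)
{
	uint32_t value;

	assert(buf != NULL);
	memcpy(&value, buf, sizeof(value));
	return value;
}
```

The copy-based variant works for any byte offset, so it is the one to prefer
when the source buffer has no alignment guarantee.
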
## Control flow analysis

Lint roughly tracks the control flow inside a single function.
It doesn't follow `goto` statements precisely, though;
instead, it assumes that each label is reachable.
See the test `msg_193.c` for examples.

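The difference can be sketched with two small functions.
Both are hypothetical and not taken from the test suite; the comments
describe the behavior one would expect from the description above, not
verified lint output.

```c
/* Hypothetical example, not taken from the lint test suite. */
int
after_return(int x)
{
	return x;
	x++;		/* no label: this statement is plainly not reached */
}

int
unused_label(int x)
{
	return x;
unused:
	/*
	 * No 'goto' ever jumps to this label, so the statement below is
	 * unreachable as well.  Since lint assumes that each label is
	 * reachable, it is not expected to complain here.
	 */
	return x + 1;
}
```
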
## Error handling

Lint tries to continue parsing and checking even after seeing errors.
This part of lint is not robust, though, so expect some crashes here,
as variables may not be properly initialized or may be null pointers.
The cleanup after handling a parse error is often incomplete.

# Fundamental types

Lint mainly analyzes expressions (`tnode_t`), which are formed from operators
(`op_t`) and their operands (`tnode_t`).
Each node has a type (`type_t`) and a few other properties.

## type_t

The elementary types are `int`, `_Bool`, `unsigned long`, `pointer` and so on,
as defined in `tspec_t`.

Actual types like `int` or `const char *` are created by `gettyp(INT)`,
or by deriving new types from existing types, using `block_derive_pointer`,
`block_derive_array` and `block_derive_function`.
(See [below](#memory-management) for the meaning of the prefix `block_`.)

After a type has been created, it should not be modified anymore.
Ideally all references to types would be `const`, but that's a lot of work.
Before modifying a type,
it needs to be copied using `block_dup_type` or `expr_dup_type`.

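The copy-before-modify rule can be sketched with a heavily simplified model.
Everything prefixed with `toy_` is invented for this example and is not part
of lint1; the real `type_t` has more members, and the real functions allocate
from lint's own memory pools rather than with `malloc`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-ins, not the real lint1 definitions. */
typedef enum { TOY_INT, TOY_CHAR, TOY_PTR } toy_tspec;

typedef struct toy_type {
	toy_tspec		 tspec;		/* elementary kind */
	bool			 is_const;	/* type qualifier */
	const struct toy_type	*subt;		/* pointee, for TOY_PTR */
} toy_type;

/* Derive 'pointer to stp' without touching 'stp' itself. */
toy_type *
toy_derive_pointer(const toy_type *stp)
{
	toy_type *tp;

	assert(stp != NULL);
	tp = calloc(1, sizeof(*tp));
	if (tp == NULL)
		abort();
	tp->tspec = TOY_PTR;
	tp->subt = stp;
	return tp;
}

/* Copy a type so that the copy may be modified safely. */
toy_type *
toy_dup_type(const toy_type *tp)
{
	toy_type *ntp;

	assert(tp != NULL);
	ntp = malloc(sizeof(*ntp));
	if (ntp == NULL)
		abort();
	*ntp = *tp;
	return ntp;
}

/* Build 'const T' from 'T': copy first, then modify only the copy. */
toy_type *
toy_add_const(const toy_type *tp)
{
	toy_type *ntp = toy_dup_type(tp);

	ntp->is_const = true;
	return ntp;
}
```

Building `const char *` in this model means calling `toy_add_const` on the
`char` type and then `toy_derive_pointer` on the result; the original `char`
type stays unqualified, so other expressions that share it are unaffected.
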
## tnode_t

When lint parses an expression,
it builds a tree of nodes representing the AST.
Each node has an operator that defines which other members may be accessed.
The operators and their properties are defined in `ops.def`.
Some examples of operators:

| Operator | Meaning                                                 |
|----------|---------------------------------------------------------|
| CON      | compile-time constant in `tn_val`                       |
| NAME     | references the identifier in `tn_sym`                   |
| UPLUS    | the unary operator `+tn_left`                           |
| PLUS     | the binary operator `tn_left + tn_right`                |
| CALL     | a function call, typically CALL(LOAD(NAME("function"))) |
| ICALL    | an indirect function call                               |
| CVT      | an implicit conversion or an explicit cast              |

See `debug_node` for how to interpret the members of `tnode_t`.

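The idea that the operator decides which members are valid can be sketched
with a small tagged union.
The `toy_` names and the exact layout are assumptions made for this example;
consult the definition of `tnode_t` and `debug_node` for the real structure.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins, not the real lint1 definitions. */
typedef enum { TOY_CON, TOY_UPLUS, TOY_PLUS } toy_op;

typedef struct toy_node {
	toy_op op;
	union {
		long val;				/* TOY_CON only */
		struct {
			const struct toy_node *left;	/* UPLUS, PLUS */
			const struct toy_node *right;	/* PLUS only */
		} ops;
	} u;
} toy_node;

/* Evaluate a constant expression, reading only the members that the
 * operator of each node makes valid. */
long
toy_eval(const toy_node *tn)
{
	assert(tn != NULL);
	switch (tn->op) {
	case TOY_CON:
		return tn->u.val;
	case TOY_UPLUS:
		return +toy_eval(tn->u.ops.left);
	case TOY_PLUS:
		return toy_eval(tn->u.ops.left) + toy_eval(tn->u.ops.right);
	}
	assert(0 && "unknown operator");
	return 0;
}
```

In this model, the expression `1 + +2` becomes a `TOY_PLUS` node whose left
operand is a `TOY_CON` node and whose right operand is a `TOY_UPLUS` node
wrapping another `TOY_CON` node.
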
## sym_t

There is a single symbol table (`symtab`) for the whole translation unit.
This means that the same identifier may appear multiple times.
To distinguish the identifiers, each symbol has a block level.
Symbols from inner scopes are added to the beginning of the table,
so they are found first when looking for the identifier.

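The lookup order can be sketched with a single linked list.
All `toy_` names are invented for this example, the real `symtab` is
organized differently, and the way symbols are removed when a block ends is
purely an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins, not the real lint1 definitions. */
typedef struct toy_sym {
	const char	*name;
	int		 block_level;	/* 0 = file scope, 1+ = nested */
	struct toy_sym	*next;
} toy_sym;

typedef struct {
	toy_sym *first;
} toy_symtab;

/* Insert at the beginning so that inner declarations shadow outer ones. */
void
toy_symtab_add(toy_symtab *tab, toy_sym *sym)
{
	assert(tab != NULL && sym != NULL);
	sym->next = tab->first;
	tab->first = sym;
}

/* Return the innermost symbol with the given name, or NULL. */
toy_sym *
toy_symtab_lookup(const toy_symtab *tab, const char *name)
{
	assert(tab != NULL && name != NULL);
	for (toy_sym *sym = tab->first; sym != NULL; sym = sym->next)
		if (strcmp(sym->name, name) == 0)
			return sym;
	return NULL;
}

/* Assumption of this sketch: drop the symbols of 'block_level' and of
 * all deeper levels, which sit at the front of the list. */
void
toy_symtab_leave_block(toy_symtab *tab, int block_level)
{
	assert(tab != NULL);
	while (tab->first != NULL && tab->first->block_level >= block_level)
		tab->first = tab->first->next;
}
```

With an `x` at block level 0 and another `x` at block level 1, a lookup
returns the inner symbol until the block is left, after which the outer
symbol becomes visible again.
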
# Memory management

## Block scope

The memory that is allocated by the `block_*_alloc` functions is freed at the
end of analyzing the block, that is, after the closing `}`.
See `compound_statement_rbrace:` in `cgram.y`.

## Expression scope

The memory that is allocated by the `expr_*_alloc` functions is freed at the
end of analyzing the expression.
See `expr_free_all`.

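Both scopes follow the same pattern: individual objects are never freed one
by one, instead everything that belongs to a scope is released together.
The following sketch shows that pattern with a made-up `toy_pool`; it is not
how lint1 implements its allocators, only a minimal model of the lifetime
rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header in front of each allocation; the union keeps the payload that
 * follows it suitably aligned for any object type. */
typedef union toy_chunk {
	union toy_chunk	*next;
	max_align_t	 align;
} toy_chunk;

typedef struct {
	toy_chunk	*chunks;	/* all live allocations */
	size_t		 count;		/* number of live allocations */
} toy_pool;

/* Allocate zeroed memory that is owned by the pool. */
void *
toy_pool_alloc(toy_pool *pool, size_t size)
{
	toy_chunk *chunk;

	assert(pool != NULL);
	chunk = calloc(1, sizeof(*chunk) + size);
	if (chunk == NULL)
		abort();
	chunk->next = pool->chunks;
	pool->chunks = chunk;
	pool->count++;
	return chunk + 1;
}

/* Release everything at once, for example at the end of an expression. */
void
toy_pool_free_all(toy_pool *pool)
{
	assert(pool != NULL);
	while (pool->chunks != NULL) {
		toy_chunk *next = pool->chunks->next;

		free(pool->chunks);
		pool->chunks = next;
	}
	pool->count = 0;
}
```

A consequence of this design is that pointers into a pool must not outlive
the scope of that pool: an object that has to survive the current expression
needs to be allocated from the block-scoped pool instead.
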
# Null pointers

* Expressions can be null.
    * This typically happens in case of syntax errors or other errors.
* The subtype of a pointer, array or function is never null.

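In code, these two rules translate into a null check for the expression and
an assertion for the subtype.
The following sketch uses invented `toy_` types and assumes, as stated
earlier, that each node has a type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins, not the real lint1 definitions. */
typedef struct toy_type {
	bool			 is_pointer;
	const struct toy_type	*subt;	/* set whenever is_pointer is true */
} toy_type;

typedef struct toy_node {
	const toy_type *type;
} toy_node;

/* Return whether the expression has the type 'pointer to pointer'. */
bool
toy_is_pointer_to_pointer(const toy_node *tn)
{
	if (tn == NULL)
		return false;	/* for example after a syntax error */
	if (!tn->type->is_pointer)
		return false;
	/* The subtype of a pointer is never null, so assert, don't test. */
	assert(tn->type->subt != NULL);
	return tn->type->subt->is_pointer;
}
```
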
# Common variable names

| Name | Type      | Meaning                                              |
|------|-----------|------------------------------------------------------|
| t    | `tspec_t` | a simple type such as `INT`, `FUNC`, `PTR`           |
| tp   | `type_t`  | a complete type such as `pointer to array[3] of int` |
| stp  | `type_t`  | the subtype of a pointer, array or function          |
| tn   | `tnode_t` | a tree node, mostly used for expressions             |
| op   | `op_t`    | an operator used in an expression                    |
| ln   | `tnode_t` | the left-hand operand of a binary operator           |
| rn   | `tnode_t` | the right-hand operand of a binary operator          |
| sym  | `sym_t`   | a symbol from the symbol table                       |

# Abbreviations in variable names

| Abbr | Expanded                                    |
|------|---------------------------------------------|
| l    | left                                        |
| r    | right                                       |
| o    | old (during type conversions)               |
| n    | new (during type conversions)               |
| op   | operator                                    |
| arg  | the number of the argument, for diagnostics |

# Debugging

Useful breakpoints are:

| Location                      | Remarks                                              |
|-------------------------------|------------------------------------------------------|
| build_binary in tree.c        | Creates an expression for a unary or binary operator |
| initialization_expr in init.c | Checks a single initializer                          |
| expr in tree.c                | Checks a full expression                             |
| typeok in tree.c              | Checks two types for compatibility                   |
| vwarning_at in err.c          | Prints a warning                                     |
| verror_at in err.c            | Prints an error                                      |
| assert_failed in err.c        | Prints the location of a failed assertion            |

# Tests

The tests are in `tests/usr.bin/xlint`.
By default, each test is run with the lint flags `-g` for GNU mode,
`-S` for C99 mode and `-w` to report warnings as errors.

Each test can override the lint flags using comments of the following forms:

* `/* lint1-flags: -tw */` replaces the default flags.
* `/* lint1-extra-flags: -p */` adds to the default flags.

Most tests check the diagnostics that lint generates.
They do this by placing `expect` comments near the location of the diagnostic.
The comment `/* expect+1: ... */` expects a diagnostic to be generated for the
code 1 line below, `/* expect-5: ... */` expects a diagnostic to be generated
for the code 5 lines above.
Each `expect` comment must be on a single line.
At the start and the end of the comment, the placeholder `...` stands for an
arbitrary sequence of characters.
There may be other code or comments on the same line of the `.c` file.

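A test file using both forms of the `expect` comment might look like the
following sketch.
The file content is hypothetical, and the diagnostic text in the comments is
an assumption; when writing a real test, copy the text from lint's actual
output.

```c
/* Hypothetical test file, for illustration only. */

int
expect_below(int x)
{
	return x;
	/* expect+1: warning: statement not reached [193] */
	x++;
}

int
expect_above(int x)
{
	return x;
	x++;
	/* expect-1: warning: statement not reached [193] */
}
```

The relative form keeps each expectation next to the code it describes, so
inserting unrelated lines elsewhere in the file does not invalidate it.
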
Each diagnostic has its own test `msg_???.c` that triggers the corresponding
diagnostic.
Most other tests focus on a single feature.

## Adding a new test

1. Run `make add-test NAME=test_name`.
2. Sort the `FILES` lines in `../../tests/usr.bin/xlint/lint1/Makefile`.
3. Make the test generate the desired diagnostics.
4. Run `cd ../../tests/usr.bin/xlint/lint1 && sh ./accept.sh test_name`.
5. Run `cd ../.. && cvs commit distrib/sets/lists/tests/mi tests/usr.bin/xlint`.