[//]: # ($NetBSD: README.md,v 1.6 2022/06/17 18:54:53 rillig Exp $)

# Introduction

Lint1 analyzes a single translation unit of C code.

* It reads the output of the C preprocessor, with comments retained.
* The lexer in `scan.l` and `lex.c` splits the input into tokens.
* The parser in `cgram.y` creates types and expressions from the tokens.
* It checks declarations in `decl.c`.
* It checks initializations in `init.c`.
* It checks types and expressions in `tree.c`.

To see how a specific lint message is triggered, read the corresponding unit
test in `tests/usr.bin/xlint/lint1/msg_???.c`.

# Features

## Type checking

Lint has stricter type checking than most C compilers.
It warns about type conversions that may result in alignment problems;
see the test `msg_135.c` for examples.

## Control flow analysis

Lint roughly tracks the control flow inside a single function.
It doesn't follow `goto` statements precisely, though;
instead, it assumes that each label is reachable.
See the test `msg_193.c` for examples.
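
As a rough illustration of that rule, the following sketch contrasts a
statement that no path reaches with a statement that follows a label.
The function names are made up for this example, and the comments describe
the behavior outlined above rather than lint's exact output;
`msg_193.c` remains the authoritative reference.

```c
/* Hypothetical example functions; not taken from the lint test suite. */

int
after_return(int x)
{
	return x + 1;
	x++;		/* follows a 'return' without a label: not reachable */
	return x;
}

int
after_label(int x)
{
	if (x > 0)
		goto done;
	return -1;
done:			/* a label: the code below is assumed to be reachable */
	return x;
}
```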

## Error handling

Lint tries to continue parsing and checking even after seeing errors.
This part of lint is not robust, though, so expect some crashes here,
as variables may not be properly initialized or may be null pointers.
The cleanup after handling a parse error is often incomplete.

# Fundamental types

Lint mainly analyzes expressions (`tnode_t`), which are formed from operators
(`op_t`) and their operands (`tnode_t`).
Each node has a type (`type_t`) and a few other properties.

## type_t

The elementary types are `int`, `_Bool`, `unsigned long`, `pointer` and so on,
as defined in `tspec_t`.

Actual types like `int`, `const char *` are created by `gettyp(INT)`,
or by deriving new types from existing types, using `block_derive_pointer`,
`block_derive_array` and `block_derive_function`.
(See [below](#memory-management) for the meaning of the prefix `block_`.)

After a type has been created, it should not be modified anymore.
Ideally all references to types would be `const`, but that's a lot of work.
Before modifying a type,
it needs to be copied using `block_dup_type` or `expr_dup_type`.
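
The "derive, and copy before modifying" discipline can be sketched with a
deliberately simplified model.
All `sk_` names below are hypothetical stand-ins, not lint's real
definitions, and plain `malloc` replaces the block and expression
allocators, so nothing is freed here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-ins for tspec_t and type_t; the real layout differs. */
typedef enum { SK_INT, SK_CHAR, SK_PTR } sk_tspec;

typedef struct sk_type {
	sk_tspec		 t;		/* the elementary type */
	bool			 is_const;
	const struct sk_type	*subt;		/* pointee, only for SK_PTR */
} sk_type;

/* Derive "pointer to stp" without touching stp itself. */
sk_type *
sk_derive_pointer(const sk_type *stp)
{
	sk_type *tp = calloc(1, sizeof(*tp));
	if (tp == NULL)
		abort();
	tp->t = SK_PTR;
	tp->subt = stp;
	return tp;
}

/* Copy a type so that the copy can be modified safely. */
sk_type *
sk_dup_type(const sk_type *tp)
{
	sk_type *ntp = malloc(sizeof(*ntp));
	if (ntp == NULL)
		abort();
	*ntp = *tp;
	return ntp;
}

/* Build "const char *" from a shared "char" that stays unmodified. */
sk_type *
sk_const_char_pointer(const sk_type *char_type)
{
	sk_type *cc = sk_dup_type(char_type);	/* copy before modifying */
	cc->is_const = true;
	return sk_derive_pointer(cc);
}
```

In this model, the shared `char` type keeps `is_const == false`, since only
the copy is qualified.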

## tnode_t

When lint parses an expression,
it builds a tree of nodes representing the AST.
Each node has an operator that defines which other members may be accessed.
The operators and their properties are defined in `ops.def`.
Some examples for operators:

| Operator | Meaning                                                 |
|----------|---------------------------------------------------------|
| CON      | compile-time constant in `tn_val`                       |
| NAME     | references the identifier in `tn_sym`                   |
| UPLUS    | the unary operator `+tn_left`                           |
| PLUS     | the binary operator `tn_left + tn_right`                |
| CALL     | a function call, typically CALL(LOAD(NAME("function"))) |
| ICALL    | an indirect function call                               |
| CVT      | an implicit conversion or an explicit cast              |

See `debug_node` for how to interpret the members of `tnode_t`.
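
To make the idea of "the operator decides which members are valid" more
concrete, here is a minimal sketch of such a tree with an evaluator.
The `sk_` types are hypothetical and much smaller than the real `tnode_t`;
only three of the operators from the table are modeled.

```c
#include <stddef.h>

/* Simplified stand-ins for op_t and tnode_t; the real definitions differ. */
typedef enum { SK_CON, SK_UPLUS, SK_PLUS } sk_op;

typedef struct sk_node {
	sk_op			 op;
	long			 val;	/* valid only for SK_CON */
	const struct sk_node	*left;	/* valid for SK_UPLUS and SK_PLUS */
	const struct sk_node	*right;	/* valid only for SK_PLUS */
} sk_node;

/* Evaluate a tree; the operator selects the members that are read. */
long
sk_eval(const sk_node *tn)
{
	switch (tn->op) {
	case SK_CON:
		return tn->val;
	case SK_UPLUS:
		return +sk_eval(tn->left);
	case SK_PLUS:
		return sk_eval(tn->left) + sk_eval(tn->right);
	}
	return 0;	/* only for operator values outside the enum */
}
```

For the expression `+1 + 2`, the tree in this model is
`PLUS(UPLUS(CON 1), CON 2)`, and `sk_eval` returns 3 for it.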

## sym_t

There is a single symbol table (`symtab`) for the whole translation unit.
This means that the same identifier may appear multiple times.
To distinguish the identifiers, each symbol has a block level.
Symbols from inner scopes are added to the beginning of the table,
so they are found first when looking for the identifier.
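
The lookup order can be sketched with a single linked list.
This is an assumption-laden simplification: the `sk_` names are
hypothetical, and the real table in lint is organized differently.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for sym_t; a plain list replaces the real table. */
typedef struct sk_sym {
	const char	*name;
	int		 block_level;	/* 0 = file scope, larger = deeper */
	struct sk_sym	*next;
} sk_sym;

/* Add a symbol at the front, so that it shadows older entries. */
void
sk_symtab_add(sk_sym **symtab, sk_sym *sym)
{
	sym->next = *symtab;
	*symtab = sym;
}

/* Return the first match, which is the most recently added symbol. */
sk_sym *
sk_symtab_lookup(sk_sym *symtab, const char *name)
{
	for (sk_sym *sym = symtab; sym != NULL; sym = sym->next)
		if (strcmp(sym->name, name) == 0)
			return sym;
	return NULL;
}
```

If `x` is first added at block level 0 and then again at block level 1,
a lookup of `x` in this model yields the entry with block level 1.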

# Memory management

## Block scope

The memory that is allocated by the `block_*_alloc` functions is freed at the
end of analyzing the block, that is, after the closing `}`.
See `compound_statement_rbrace:` in `cgram.y`.

## Expression scope

The memory that is allocated by the `expr_*_alloc` functions is freed at the
end of analyzing the expression.
See `expr_free_all`.
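
Both scopes follow the same pattern: allocate freely, then release
everything at one well-defined point.
The sketch below shows that pattern with a hypothetical pool; the `sk_`
names and the list-based bookkeeping are assumptions for illustration and
do not mirror lint's actual allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header in front of each allocation; the union keeps the payload aligned. */
typedef union sk_chunk {
	union sk_chunk	*next;
	max_align_t	 align;
} sk_chunk;

/* Hypothetical pool, one per block or per expression. */
typedef struct sk_pool {
	sk_chunk	*chunks;
	size_t		 count;
} sk_pool;

/* Allocate zeroed memory that lives until the pool is released. */
void *
sk_pool_alloc(sk_pool *pool, size_t size)
{
	sk_chunk *c = calloc(1, sizeof(*c) + size);
	if (c == NULL)
		abort();
	c->next = pool->chunks;
	pool->chunks = c;
	pool->count++;
	return c + 1;		/* the payload starts right after the header */
}

/* Release all allocations at once, for example at the closing '}'. */
void
sk_pool_free_all(sk_pool *pool)
{
	while (pool->chunks != NULL) {
		sk_chunk *c = pool->chunks;
		pool->chunks = c->next;
		free(c);
	}
	pool->count = 0;
}
```

With this design, callers never free individual objects, which matches the
rule above that memory stays valid until the end of its block or expression.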

# Null pointers

* Expressions can be null.
    * This typically happens in case of syntax errors or other errors.
* The subtype of a pointer, array or function is never null.

# Common variable names

| Name | Type      | Meaning                                              |
|------|-----------|------------------------------------------------------|
| t    | `tspec_t` | a simple type such as `INT`, `FUNC`, `PTR`           |
| tp   | `type_t`  | a complete type such as `pointer to array[3] of int` |
| stp  | `type_t`  | the subtype of a pointer, array or function          |
| tn   | `tnode_t` | a tree node, mostly used for expressions             |
| op   | `op_t`    | an operator used in an expression                    |
| ln   | `tnode_t` | the left-hand operand of a binary operator           |
| rn   | `tnode_t` | the right-hand operand of a binary operator          |
| sym  | `sym_t`   | a symbol from the symbol table                       |

# Abbreviations in variable names

| Abbr | Expanded                                    |
|------|---------------------------------------------|
| l    | left                                        |
| r    | right                                       |
| o    | old (during type conversions)               |
| n    | new (during type conversions)               |
| op   | operator                                    |
| arg  | the number of the argument, for diagnostics |

# Debugging

Useful breakpoints are:

| Location                      | Remarks                                              |
|-------------------------------|------------------------------------------------------|
| build_binary in tree.c        | Creates an expression for a unary or binary operator |
| initialization_expr in init.c | Checks a single initializer                          |
| expr in tree.c                | Checks a full expression                             |
| typeok in tree.c              | Checks two types for compatibility                   |
| vwarning_at in err.c          | Prints a warning                                     |
| verror_at in err.c            | Prints an error                                      |
| assert_failed in err.c        | Prints the location of a failed assertion            |

# Tests

The tests are in `tests/usr.bin/xlint`.
By default, each test is run with the lint flags `-g` for GNU mode,
`-S` for C99 mode and `-w` to report warnings as errors.

Each test can override the lint flags using comments of the following forms:

* `/* lint1-flags: -tw */` replaces the default flags.
* `/* lint1-extra-flags: -p */` adds to the default flags.

Most tests check the diagnostics that lint generates.
They do this by placing `expect` comments near the location of the diagnostic.
The comment `/* expect+1: ... */` expects a diagnostic to be generated for the
code 1 line below, `/* expect-5: ... */` expects a diagnostic to be generated
for the code 5 lines above.
Each `expect` comment must be on a single line.
At the start and the end of the comment, the placeholder `...` stands for an
arbitrary sequence of characters.
There may be other code or comments on the same line of the `.c` file.
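
A test file using both offset directions might look roughly like the
following sketch.
The function names are hypothetical, and the message fragment
`not reached` is an assumption about the diagnostic text; check the
existing `msg_???.c` tests for the exact wording before relying on it.

```c
/* Hypothetical test code; the message fragments are assumptions. */

int
expect_below(int x)
{
	return x;
	/* expect+1: ... not reached ... */
	x++;
}

int
expect_above(int x)
{
	return x;
	x++;
	/* expect-1: ... not reached ... */
}
```

In both functions the `expect` comment refers to the line containing `x++`,
once counted downwards and once counted upwards.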

Each diagnostic has its own test `msg_???.c` that triggers the corresponding
diagnostic.
Most other tests focus on a single feature.

## Adding a new test

1. Run `make add-test NAME=test_name`.
2. Sort the `FILES` lines in `../../tests/usr.bin/xlint/lint1/Makefile`.
3. Make the test generate the desired diagnostics.
4. Run `cd ../../tests/usr.bin/xlint/lint1 && sh ./accept.sh test_name`.
5. Run `cd ../.. && cvs commit distrib/sets/lists/tests/mi tests/usr.bin/xlint`.