[//]: # ($NetBSD: README.md,v 1.7 2022/07/03 19:47:34 rillig Exp $)

# Introduction

Lint1 analyzes a single translation unit of C code.

* It reads the output of the C preprocessor, retaining the comments.
* The lexer in `scan.l` and `lex.c` splits the input into tokens.
* The parser in `cgram.y` creates types and expressions from the tokens.
* It checks declarations in `decl.c`.
* It checks initializations in `init.c`.
* It checks types and expressions in `tree.c`.

To see how a specific lint message is triggered, read the corresponding unit
test in `tests/usr.bin/xlint/lint1/msg_???.c`.

# Features

## Type checking

Lint has stricter type checking than most C compilers.

In _strict bool mode_, lint treats `bool` as a type that is incompatible with
other scalar types, as in C#, Go and Java.
See the test `d_c99_bool_strict.c` for details.
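
As a rough illustration of what strict bool mode is about, the sketch below
contrasts two versions of the same function. Both are valid C. The function
names are made up for this example, and which exact messages lint prints for
the first version is not shown here; the authoritative examples are in
`d_c99_bool_strict.c`.

```c
#include <stdbool.h>

/*
 * Mixes truth values and integers freely: a comparison result is added
 * to an int, and an int is used directly as a condition.  This is the
 * kind of code that strict bool mode is meant to reject.
 */
int
count_set_loose(int flags)
{
	int n = 0;

	n += (flags & 1) != 0;
	if (flags & 2)
		n++;
	return n;
}

/*
 * Same logic, but truth values are kept in bool variables and are only
 * used as conditions.  That lint accepts this form unchanged is an
 * assumption of this sketch.
 */
int
count_set_strict(int flags)
{
	int n = 0;
	bool low = (flags & 1) != 0;
	bool high = (flags & 2) != 0;

	if (low)
		n++;
	if (high)
		n++;
	return n;
}
```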

Lint warns about type conversions that may result in alignment problems.
See the test `msg_135.c` for examples.
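
The following sketch shows the kind of pointer conversion that can raise the
required alignment, next to a variant that avoids the conversion. The function
names are hypothetical, and whether lint reports this exact cast depends on
the platform and the lint flags; `msg_135.c` has the authoritative cases.

```c
#include <stdint.h>
#include <string.h>

/*
 * Converts a pointer to a 16-bit type into a pointer to a 32-bit type.
 * The target type usually has a stricter alignment requirement, so the
 * access may fault on strict-alignment machines.  Shown for comparison
 * only, do not call this with misaligned pointers.
 */
uint32_t
read_u32_cast(const uint16_t *p)
{
	return *(const uint32_t *)p;
}

/* Copies the bytes instead, which works for any alignment of p. */
uint32_t
read_u32_copy(const uint16_t *p)
{
	uint32_t value;

	memcpy(&value, p, sizeof(value));
	return value;
}
```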

## Control flow analysis

Lint roughly tracks the control flow inside a single function.
It doesn't follow `goto` statements precisely, though;
instead, it assumes that each label is reachable.
See the test `msg_193.c` for examples.
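
To make the remark about labels concrete, here is a small function whose loop
body directly follows an unconditional `goto` and is only entered through a
label. Under the assumption stated above (each label counts as reachable),
the code after `body:` would not be treated as unreachable. The function name
is made up for this sketch, and lint's actual output for it has not been
verified.

```c
/* Counts how many times the loop body runs for a given start value. */
int
count_iterations(int n)
{
	int count = 0;

	goto check;		/* skip the body on first entry */
body:				/* only reachable through the label */
	count++;
	n--;
check:
	if (n > 0)
		goto body;
	return count;
}
```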

## Error handling

Lint tries to continue parsing and checking even after seeing errors.
This part of lint is not robust, though, so expect some crashes here,
as variables may not be properly initialized or may be null pointers.
The cleanup after handling a parse error is often incomplete.

## Configurable diagnostic messages

Whether lint prints a message and whether each message is an error or a
warning depends on several things:

* The language level, with its possible values:
    * traditional C (`-t`)
    * migration from traditional C to C90 (default)
    * C90 (`-s`)
    * C99 (`-S`)
    * C11 (`-Ac11`)
* In GCC mode (`-g`), lint allows several GNU extensions,
  reducing the number of printed messages.
* In strict bool mode (`-T`), lint issues errors when `bool` is mixed with
  other scalar types, reusing the existing messages 107 and 211, while also
  defining new messages that are specific to strict bool mode.
* The option `-a` enables the check for lossy conversions from large integer
  types; the option `-aa` extends this check to small integer types as well,
  reusing the same message ID.
* The option `-X` suppresses arbitrary messages by their message ID.
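
As an illustration of the `-a` and `-aa` items above, the two functions below
contain implicit narrowing conversions. The function names are hypothetical,
and the split between "large" and "small" integer types is only sketched here
as `long` versus `int`; the exact rules are in lint's sources and tests.

```c
/*
 * Implicit conversion from long to int.  On platforms where long is
 * wider than int, this may lose information; it is the kind of
 * conversion that the -a check is described as covering.
 */
int
to_int(long value)
{
	int result = value;

	return result;
}

/*
 * Implicit conversion from int to unsigned char, a smaller integer
 * type; assumed here to fall under the extended -aa check.
 */
unsigned char
to_byte(int value)
{
	unsigned char result = value;

	return result;
}
```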

# Fundamental types

Lint mainly analyzes expressions (`tnode_t`), which are formed from operators
(`op_t`) and their operands (`tnode_t`).
Each node has a type (`type_t`) and a few other properties.

## type_t

The elementary types are `int`, `_Bool`, `unsigned long`, `pointer` and so on,
as defined in `tspec_t`.

Actual types like `int`, `const char *` are created by `gettyp(INT)`,
or by deriving new types from existing types, using `block_derive_pointer`,
`block_derive_array` and `block_derive_function`.
(See [below](#memory-management) for the meaning of the prefix `block_`.)

After a type has been created, it should not be modified anymore.
Ideally all references to types would be `const`, but that's a lot of work.
Before modifying a type,
it needs to be copied using `block_dup_type` or `expr_dup_type`.
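
The idea of deriving types from existing ones, and of copying a type before
modifying it, can be sketched with a toy model. All `toy_*` names below are
invented for this example; they are not lint's `type_t`, `tspec_t` or
`block_*` functions, whose real definitions have more members and use lint's
own allocators. The sketch never frees its memory.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for tspec_t and type_t. */
typedef enum { TOY_INT, TOY_PTR, TOY_ARRAY } toy_tspec;

typedef struct toy_type {
	toy_tspec t;			/* elementary type */
	const struct toy_type *stp;	/* subtype, NULL only for TOY_INT */
	int dim;			/* element count, arrays only */
	int is_const;			/* qualifier flag */
} toy_type;

static toy_type *
toy_new(toy_tspec t, const toy_type *stp, int dim)
{
	toy_type *tp = calloc(1, sizeof(*tp));

	if (tp == NULL)
		abort();
	tp->t = t;
	tp->stp = stp;
	tp->dim = dim;
	return tp;
}

toy_type *
toy_int(void)
{
	return toy_new(TOY_INT, NULL, 0);
}

/* Derived types refer to their subtype instead of copying it. */
toy_type *
toy_derive_pointer(const toy_type *stp)
{
	return toy_new(TOY_PTR, stp, 0);
}

toy_type *
toy_derive_array(const toy_type *stp, int dim)
{
	return toy_new(TOY_ARRAY, stp, dim);
}

/*
 * Copy before modifying, so that other users of *tp still see the
 * unqualified original.
 */
toy_type *
toy_dup_const(const toy_type *tp)
{
	toy_type *copy = toy_new(tp->t, tp->stp, tp->dim);

	copy->is_const = 1;
	return copy;
}
```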

## tnode_t

When lint parses an expression,
it builds a tree of nodes representing the AST.
Each node has an operator that defines which other members may be accessed.
The operators and their properties are defined in `ops.def`.
Some examples for operators:

| Operator | Meaning                                                 |
|----------|---------------------------------------------------------|
| CON      | compile-time constant in `tn_val`                       |
| NAME     | references the identifier in `tn_sym`                   |
| UPLUS    | the unary operator `+tn_left`                           |
| PLUS     | the binary operator `tn_left + tn_right`                |
| CALL     | a function call, typically CALL(LOAD(NAME("function"))) |
| ICALL    | an indirect function call                               |
| CVT      | an implicit conversion or an explicit cast              |

See `debug_node` for how to interpret the members of `tnode_t`.
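
The statement that the operator decides which members may be accessed can be
illustrated with a tagged union. This is a toy model with invented `toy_*`
names and only three operators, not the layout of lint's `tnode_t`.

```c
/* Hypothetical subset of operators, loosely modeled on the table above. */
typedef enum { TOY_CON, TOY_UPLUS, TOY_PLUS } toy_op;

typedef struct toy_node {
	toy_op op;
	union {
		long val;		/* valid only for TOY_CON */
		struct {
			const struct toy_node *left;
			const struct toy_node *right;	/* NULL for unary */
		} ops;			/* valid only for operators */
	} u;
} toy_node;

/* Evaluates a tree; the switch on op selects the valid union member. */
long
toy_eval(const toy_node *tn)
{
	switch (tn->op) {
	case TOY_CON:
		return tn->u.val;
	case TOY_UPLUS:
		return +toy_eval(tn->u.ops.left);
	case TOY_PLUS:
		return toy_eval(tn->u.ops.left) + toy_eval(tn->u.ops.right);
	}
	return 0;
}
```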

## sym_t

There is a single symbol table (`symtab`) for the whole translation unit.
This means that the same identifier may appear multiple times.
To distinguish the identifiers, each symbol has a block level.
Symbols from inner scopes are added to the beginning of the table,
so they are found first when looking for the identifier.
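
The lookup order described above can be sketched with a singly linked list in
which new symbols are pushed to the front. The `toy_*` names and the list
representation are assumptions made for this example; lint's real `symtab`
and `sym_t` are organized differently in detail.

```c
#include <stddef.h>
#include <string.h>

typedef struct toy_sym {
	const char *name;
	int block_level;	/* 0 = file scope, higher = more deeply nested */
	struct toy_sym *next;
} toy_sym;

/* Adds sym at the beginning of the table and returns the new head. */
toy_sym *
toy_push(toy_sym *head, toy_sym *sym)
{
	sym->next = head;
	return sym;
}

/* Returns the first match, which is the one from the innermost scope. */
const toy_sym *
toy_lookup(const toy_sym *head, const char *name)
{
	for (; head != NULL; head = head->next)
		if (strcmp(head->name, name) == 0)
			return head;
	return NULL;
}

/* Drops all symbols at or above the given level when a block ends. */
toy_sym *
toy_pop_level(toy_sym *head, int level)
{
	while (head != NULL && head->block_level >= level)
		head = head->next;
	return head;
}
```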

# Memory management

## Block scope

The memory that is allocated by the `block_*_alloc` functions is freed at the
end of analyzing the block, that is, after the closing `}`.
See `compound_statement_rbrace:` in `cgram.y`.

## Expression scope

The memory that is allocated by the `expr_*_alloc` functions is freed at the
end of analyzing the expression.
See `expr_free_all`.
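
Both scopes follow the same pattern: allocations are tied to a scope and are
released together when that scope ends, instead of being freed one by one.
The sketch below shows that pattern with a minimal pool. The `toy_*` names
are invented for this example and do not correspond to lint's allocator
functions, which may manage memory in larger blocks.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header in front of each allocation; the union keeps the payload aligned. */
typedef union toy_chunk {
	union toy_chunk *next;
	max_align_t align;
} toy_chunk;

typedef struct {
	toy_chunk *chunks;	/* all allocations made in this scope */
	int live;		/* number of allocations not yet freed */
} toy_pool;

/* Allocates zeroed memory that belongs to the pool. */
void *
toy_pool_alloc(toy_pool *pool, size_t size)
{
	toy_chunk *c = calloc(1, sizeof(*c) + size);

	if (c == NULL)
		abort();
	c->next = pool->chunks;
	pool->chunks = c;
	pool->live++;
	return c + 1;		/* payload starts right after the header */
}

/* Frees everything at once, for example at the end of an expression. */
void
toy_pool_free_all(toy_pool *pool)
{
	while (pool->chunks != NULL) {
		toy_chunk *next = pool->chunks->next;

		free(pool->chunks);
		pool->chunks = next;
		pool->live--;
	}
}
```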

# Null pointers

* Expressions can be null.
    * This typically happens in case of syntax errors or other errors.
* The subtype of a pointer, array or function is never null.

# Common variable names

| Name | Type      | Meaning                                              |
|------|-----------|------------------------------------------------------|
| t    | `tspec_t` | a simple type such as `INT`, `FUNC`, `PTR`           |
| tp   | `type_t`  | a complete type such as `pointer to array[3] of int` |
| stp  | `type_t`  | the subtype of a pointer, array or function          |
| tn   | `tnode_t` | a tree node, mostly used for expressions             |
| op   | `op_t`    | an operator used in an expression                    |
| ln   | `tnode_t` | the left-hand operand of a binary operator           |
| rn   | `tnode_t` | the right-hand operand of a binary operator          |
| sym  | `sym_t`   | a symbol from the symbol table                       |

# Abbreviations in variable names

| Abbr | Expanded                                    |
|------|---------------------------------------------|
| l    | left                                        |
| r    | right                                       |
| o    | old (during type conversions)               |
| n    | new (during type conversions)               |
| op   | operator                                    |
| arg  | the number of the argument, for diagnostics |

# Debugging

Useful breakpoints are:

| Function            | File   | Remarks                                              |
|---------------------|--------|------------------------------------------------------|
| build_binary        | tree.c | Creates an expression for a unary or binary operator |
| initialization_expr | init.c | Checks a single initializer                          |
| expr                | tree.c | Checks a full expression                             |
| typeok              | tree.c | Checks two types for compatibility                   |
| vwarning_at         | err.c  | Prints a warning                                     |
| verror_at           | err.c  | Prints an error                                      |
| assert_failed       | err.c  | Prints the location of a failed assertion            |

# Tests

The tests are in `tests/usr.bin/xlint`.
By default, each test is run with the lint flags `-g` for GNU mode,
`-S` for C99 mode and `-w` to report warnings as errors.

Each test can override the lint flags using comments of the following forms:

* `/* lint1-flags: -tw */` replaces the default flags.
* `/* lint1-extra-flags: -p */` adds to the default flags.

Most tests check the diagnostics that lint generates.
They do this by placing `expect` comments near the location of the diagnostic.
The comment `/* expect+1: ... */` expects a diagnostic to be generated for the
code 1 line below, `/* expect-5: ... */` expects a diagnostic to be generated
for the code 5 lines above.
Each `expect` comment must be on a single line.
At the start and the end of the comment, the placeholder `...` stands for an
arbitrary sequence of characters.
There may be other code or comments on the same line of the `.c` file.
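
Put together, a test file using these comments might look like the sketch
below. The function name is made up, the flag comment is copied from the
list above only to show its placement, and the message text in the `expect`
comment is an assumption; take the real text from lint's output or from
`msg_193.c`. Whether lint reports exactly this one diagnostic for this file
has not been verified.

```c
/* lint1-extra-flags: -p */

/*
 * Hypothetical test body: the statement after the return can never be
 * executed, so a diagnostic is expected on the line after the comment.
 */
int
twice(int x)
{
	return 2 * x;
	/* expect+1: warning: statement not reached [193] */
	x = 0;
}
```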

Each diagnostic has its own test `msg_???.c` that triggers the corresponding
diagnostic.
Most other tests focus on a single feature.

## Adding a new test

1. Run `make add-test NAME=test_name`.
2. Run `cd ../../../tests/usr.bin/xlint/lint1`.
3. Sort the `FILES` lines in `Makefile`.
4. Make the test generate the desired diagnostics.
5. Run `./accept.sh test_name` until it no longer complains.
6. Run `cd ../../..`.
7. Run `cvs commit distrib/sets/lists/tests/mi tests/usr.bin/xlint`.