1 1.18 rillig [//]: # ($NetBSD: README.md,v 1.18 2024/03/31 20:28:45 rillig Exp $) 2 1.1 rillig 3 1.1 rillig # Introduction 4 1.1 rillig 5 1.3 rillig Lint1 analyzes a single translation unit of C code. 6 1.3 rillig 7 1.7 rillig * It reads the output of the C preprocessor, retaining the comments. 8 1.3 rillig * The lexer in `scan.l` and `lex.c` splits the input into tokens. 9 1.3 rillig * The parser in `cgram.y` creates types and expressions from the tokens. 10 1.17 rillig * The checks for declarations are in `decl.c`. 11 1.17 rillig * The checks for initializations are in `init.c`. 12 1.17 rillig * The checks for types and expressions are in `tree.c`. 13 1.3 rillig 14 1.3 rillig To see how a specific lint message is triggered, read the corresponding unit 15 1.1 rillig test in `tests/usr.bin/xlint/lint1/msg_???.c`. 16 1.1 rillig 17 1.1 rillig # Features 18 1.1 rillig 19 1.1 rillig ## Type checking 20 1.1 rillig 21 1.1 rillig Lint has stricter type checking than most C compilers. 22 1.7 rillig 23 1.7 rillig In _strict bool mode_, lint treats `bool` as a type that is incompatible with 24 1.7 rillig other scalar types, like in C#, Go, Java. 25 1.7 rillig See the test `d_c99_bool_strict.c` for details. 26 1.7 rillig 27 1.7 rillig Lint warns about type conversions that may result in alignment problems. 28 1.7 rillig See the test `msg_135.c` for examples. 29 1.1 rillig 30 1.1 rillig ## Control flow analysis 31 1.1 rillig 32 1.1 rillig Lint roughly tracks the control flow inside a single function. 33 1.3 rillig It doesn't follow `goto` statements precisely though, 34 1.3 rillig it rather assumes that each label is reachable. 35 1.1 rillig See the test `msg_193.c` for examples. 36 1.1 rillig 37 1.1 rillig ## Error handling 38 1.1 rillig 39 1.1 rillig Lint tries to continue parsing and checking even after seeing errors. 40 1.1 rillig This part of lint is not robust though, so expect some crashes here, 41 1.1 rillig as variables may not be properly initialized or be null pointers. 42 1.3 rillig The cleanup after handling a parse error is often incomplete. 43 1.1 rillig 44 1.7 rillig ## Configurable diagnostic messages 45 1.7 rillig 46 1.8 rillig Whether lint prints a message and whether each message is an error, a warning 47 1.8 rillig or just informational depends on several things: 48 1.7 rillig 49 1.7 rillig * The language level, with its possible values: 50 1.7 rillig * traditional C (`-t`) 51 1.17 rillig * migration from traditional C to C90 (default) 52 1.7 rillig * C90 (`-s`) 53 1.7 rillig * C99 (`-S`) 54 1.7 rillig * C11 (`-Ac11`) 55 1.17 rillig * C23 (`-Ac23`) 56 1.7 rillig * In GCC mode (`-g`), lint allows several GNU extensions, 57 1.7 rillig reducing the amount of printed messages. 58 1.7 rillig * In strict bool mode (`-T`), lint issues errors when `bool` is mixed with 59 1.7 rillig other scalar types, reusing the existing messages 107 and 211, while also 60 1.7 rillig defining new messages that are specific to strict bool mode. 61 1.7 rillig * The option `-a` performs the check for lossy conversions from large integer 62 1.7 rillig types, the option `-aa` extends this check to small integer types as well, 63 1.7 rillig reusing the same message ID. 64 1.7 rillig * The option `-X` suppresses arbitrary messages by their message ID. 65 1.8 rillig * The option `-q` enables additional queries that are not suitable as regular 66 1.8 rillig warnings but may be interesting to look at on a case-by-case basis. 67 1.7 rillig 68 1.12 rillig # Limitations 69 1.12 rillig 70 1.12 rillig Lint operates on the level of individual expressions. 71 1.12 rillig 72 1.12 rillig * It does not build an AST of the statements of a function, therefore it 73 1.12 rillig cannot reliably analyze the control flow in a single function. 74 1.12 rillig * It does not store the control flow properties of functions, therefore it 75 1.13 rillig cannot relate parameter nullability with the return value. 76 1.12 rillig * It does not have information about functions, except for their prototypes, 77 1.12 rillig therefore it cannot relate them across translation units. 78 1.12 rillig * It does not store detailed information about complex data types, therefore 79 1.12 rillig it cannot cross-check them across translation units. 80 1.12 rillig 81 1.1 rillig # Fundamental types 82 1.1 rillig 83 1.1 rillig Lint mainly analyzes expressions (`tnode_t`), which are formed from operators 84 1.1 rillig (`op_t`) and their operands (`tnode_t`). 85 1.10 rillig Each node has a data type (`type_t`) and a few other properties that depend on 86 1.10 rillig the operator. 87 1.1 rillig 88 1.1 rillig ## type_t 89 1.1 rillig 90 1.10 rillig The basic types are `int`, `_Bool`, `unsigned long`, `pointer` and so on, 91 1.3 rillig as defined in `tspec_t`. 92 1.3 rillig 93 1.10 rillig Concrete types like `int` or `const char *` are created by `gettyp(INT)`, 94 1.3 rillig or by deriving new types from existing types, using `block_derive_pointer`, 95 1.2 rillig `block_derive_array` and `block_derive_function`. 96 1.1 rillig (See [below](#memory-management) for the meaning of the prefix `block_`.) 97 1.1 rillig 98 1.1 rillig After a type has been created, it should not be modified anymore. 99 1.10 rillig Ideally all references to types would be `const`, but that's still on the 100 1.10 rillig to-do list and not trivial. 101 1.10 rillig In the meantime, before modifying a type, 102 1.1 rillig it needs to be copied using `block_dup_type` or `expr_dup_type`. 103 1.1 rillig 104 1.1 rillig ## tnode_t 105 1.1 rillig 106 1.5 rillig When lint parses an expression, 107 1.1 rillig it builds a tree of nodes representing the AST. 108 1.5 rillig Each node has an operator that defines which other members may be accessed. 109 1.14 rillig The operators and their properties are defined in `oper.c`. 110 1.1 rillig Some examples for operators: 111 1.1 rillig 112 1.16 rillig | Operator | Meaning | 113 1.16 rillig |----------|------------------------------------------------| 114 1.16 rillig | CON | compile-time constant in `u.value` | 115 1.16 rillig | NAME | references the identifier in `u.sym` | 116 1.16 rillig | UPLUS | the unary operator `+u.ops.left` | 117 1.16 rillig | PLUS | the binary operator `u.ops.left + u.ops.right` | 118 1.18 rillig | CALL | a function call | 119 1.16 rillig | CVT | an implicit conversion or an explicit cast | 120 1.10 rillig 121 1.10 rillig As an example, the expression `strcmp(names[i], "name")` has this internal 122 1.10 rillig structure: 123 1.10 rillig 124 1.10 rillig ~~~text 125 1.10 rillig 1: 'call' type 'int' 126 1.15 rillig 2: '&' type 'pointer to function(pointer to const char, pointer to const char) returning int' 127 1.15 rillig 3: 'name' 'strcmp' with extern 'function(pointer to const char, pointer to const char) returning int' 128 1.15 rillig 4: 'load' type 'pointer to const char' 129 1.15 rillig 5: '*' type 'pointer to const char', lvalue 130 1.15 rillig 6: '+' type 'pointer to pointer to const char' 131 1.15 rillig 7: 'load' type 'pointer to pointer to const char' 132 1.15 rillig 8: 'name' 'names' with auto 'pointer to pointer to const char', lvalue 133 1.15 rillig 9: '*' type 'long' 134 1.15 rillig 10: 'convert' type 'long' 135 1.15 rillig 11: 'load' type 'int' 136 1.15 rillig 12: 'name' 'i' with auto 'int', lvalue 137 1.15 rillig 13: 'constant' type 'long', value 8 138 1.15 rillig 14: 'convert' type 'pointer to const char' 139 1.15 rillig 15: '&' type 'pointer to char' 140 1.15 rillig 16: 'string' type 'array[5] of char', lvalue, "name" 141 1.10 rillig ~~~ 142 1.10 rillig 143 1.15 rillig | Lines | Notes | 144 1.15 rillig |------------|-------------------------------------------------------------| 145 1.15 rillig | 1, 2, 4, 7 | A function call consists of the function and its arguments. | 146 1.15 rillig | 4, 14 | The arguments of a call are ordered from left to right. | 147 1.15 rillig | 5, 6 | Array access is represented as `*(left + right)`. | 148 1.15 rillig | 9, 13 | Array and struct offsets are in premultiplied form. | 149 1.15 rillig | 9 | The type `ptrdiff_t` on this platform is `long`, not `int`. | 150 1.15 rillig | 13 | The size of a pointer on this platform is 8 bytes. | 151 1.1 rillig 152 1.3 rillig See `debug_node` for how to interpret the members of `tnode_t`. 153 1.3 rillig 154 1.1 rillig ## sym_t 155 1.1 rillig 156 1.1 rillig There is a single symbol table (`symtab`) for the whole translation unit. 157 1.1 rillig This means that the same identifier may appear multiple times. 158 1.1 rillig To distinguish the identifiers, each symbol has a block level. 159 1.1 rillig Symbols from inner scopes are added to the beginning of the table, 160 1.1 rillig so they are found first when looking for the identifier. 161 1.1 rillig 162 1.1 rillig # Memory management 163 1.1 rillig 164 1.1 rillig ## Block scope 165 1.1 rillig 166 1.1 rillig The memory that is allocated by the `block_*_alloc` functions is freed at the 167 1.1 rillig end of analyzing the block, that is, after the closing `}`. 168 1.1 rillig See `compound_statement_rbrace:` in `cgram.y`. 169 1.1 rillig 170 1.1 rillig ## Expression scope 171 1.1 rillig 172 1.1 rillig The memory that is allocated by the `expr_*_alloc` functions is freed at the 173 1.1 rillig end of analyzing the expression. 174 1.1 rillig See `expr_free_all`. 175 1.1 rillig 176 1.1 rillig # Null pointers 177 1.1 rillig 178 1.1 rillig * Expressions can be null. 179 1.2 rillig * This typically happens in case of syntax errors or other errors. 180 1.1 rillig * The subtype of a pointer, array or function is never null. 181 1.1 rillig 182 1.1 rillig # Common variable names 183 1.1 rillig 184 1.1 rillig | Name | Type | Meaning | 185 1.1 rillig |------|-----------|------------------------------------------------------| 186 1.1 rillig | t | `tspec_t` | a simple type such as `INT`, `FUNC`, `PTR` | 187 1.1 rillig | tp | `type_t` | a complete type such as `pointer to array[3] of int` | 188 1.1 rillig | stp | `type_t` | the subtype of a pointer, array or function | 189 1.1 rillig | tn | `tnode_t` | a tree node, mostly used for expressions | 190 1.1 rillig | op | `op_t` | an operator used in an expression | 191 1.3 rillig | ln | `tnode_t` | the left-hand operand of a binary operator | 192 1.3 rillig | rn | `tnode_t` | the right-hand operand of a binary operator | 193 1.1 rillig | sym | `sym_t` | a symbol from the symbol table | 194 1.1 rillig 195 1.3 rillig # Abbreviations in variable names 196 1.1 rillig 197 1.13 rillig | Abbr | Expanded | 198 1.13 rillig |------|----------------------------------------------| 199 1.13 rillig | l | left | 200 1.13 rillig | r | right | 201 1.13 rillig | o | old (during type conversions) | 202 1.13 rillig | n | new (during type conversions) | 203 1.13 rillig | op | operator | 204 1.13 rillig | arg | the number of the parameter, for diagnostics | 205 1.1 rillig 206 1.2 rillig # Debugging 207 1.2 rillig 208 1.2 rillig Useful breakpoints are: 209 1.2 rillig 210 1.9 rillig | Function/Code | File | Remarks | 211 1.9 rillig |---------------------|---------|------------------------------------------------------| 212 1.9 rillig | build_binary | tree.c | Creates an expression for a unary or binary operator | 213 1.9 rillig | initialization_expr | init.c | Checks a single initializer | 214 1.9 rillig | expr | tree.c | Checks a full expression | 215 1.9 rillig | typeok | tree.c | Checks two types for compatibility | 216 1.9 rillig | vwarning_at | err.c | Prints a warning | 217 1.9 rillig | verror_at | err.c | Prints an error | 218 1.9 rillig | assert_failed | err.c | Prints the location of a failed assertion | 219 1.9 rillig | `switch (yyn)` | cgram.c | Reduction of a grammar rule | 220 1.2 rillig 221 1.1 rillig # Tests 222 1.1 rillig 223 1.1 rillig The tests are in `tests/usr.bin/xlint`. 224 1.2 rillig By default, each test is run with the lint flags `-g` for GNU mode, 225 1.1 rillig `-S` for C99 mode and `-w` to report warnings as errors. 226 1.1 rillig 227 1.1 rillig Each test can override the lint flags using comments of the following forms: 228 1.2 rillig 229 1.1 rillig * `/* lint1-flags: -tw */` replaces the default flags. 230 1.1 rillig * `/* lint1-extra-flags: -p */` adds to the default flags. 231 1.1 rillig 232 1.1 rillig Most tests check the diagnostics that lint generates. 233 1.1 rillig They do this by placing `expect` comments near the location of the diagnostic. 234 1.1 rillig The comment `/* expect+1: ... */` expects a diagnostic to be generated for the 235 1.1 rillig code 1 line below, `/* expect-5: ... */` expects a diagnostic to be generated 236 1.1 rillig for the code 5 lines above. 237 1.11 rillig An `expect` comment cannot span multiple lines. 238 1.6 rillig At the start and the end of the comment, the placeholder `...` stands for an 239 1.6 rillig arbitrary sequence of characters. 240 1.6 rillig There may be other code or comments in the same line of the `.c` file. 241 1.1 rillig 242 1.1 rillig Each diagnostic has its own test `msg_???.c` that triggers the corresponding 243 1.1 rillig diagnostic. 244 1.1 rillig Most other tests focus on a single feature. 245 1.1 rillig 246 1.1 rillig ## Adding a new test 247 1.1 rillig 248 1.4 rillig 1. Run `make add-test NAME=test_name`. 249 1.7 rillig 2. Run `cd ../../../tests/usr.bin/xlint/lint1`. 250 1.11 rillig 3. Make the test generate the desired diagnostics. 251 1.11 rillig 4. Run `./accept.sh test_name` until it no longer complains. 252 1.11 rillig 5. Run `cd ../../..`. 253 1.11 rillig 6. Run `cvs commit distrib/sets/lists/tests/mi tests/usr.bin/xlint`. 254