README.md revision 1.10 1 1.10 rillig [//]: # ($NetBSD: README.md,v 1.10 2023/01/21 21:14:38 rillig Exp $)
2 1.1 rillig
3 1.1 rillig # Introduction
4 1.1 rillig
5 1.3 rillig Lint1 analyzes a single translation unit of C code.
6 1.3 rillig
7 1.7 rillig * It reads the output of the C preprocessor, retaining the comments.
8 1.3 rillig * The lexer in `scan.l` and `lex.c` splits the input into tokens.
9 1.3 rillig * The parser in `cgram.y` creates types and expressions from the tokens.
10 1.3 rillig * It checks declarations in `decl.c`.
11 1.3 rillig * It checks initializations in `init.c`.
12 1.3 rillig * It checks types and expressions in `tree.c`.
13 1.3 rillig
14 1.3 rillig To see how a specific lint message is triggered, read the corresponding unit
15 1.1 rillig test in `tests/usr.bin/xlint/lint1/msg_???.c`.
16 1.1 rillig
17 1.1 rillig # Features
18 1.1 rillig
19 1.1 rillig ## Type checking
20 1.1 rillig
21 1.1 rillig Lint has stricter type checking than most C compilers.
22 1.7 rillig
23 1.7 rillig In _strict bool mode_, lint treats `bool` as a type that is incompatible with
24 1.7 rillig other scalar types, like in C#, Go, Java.
25 1.7 rillig See the test `d_c99_bool_strict.c` for details.
26 1.7 rillig
27 1.7 rillig Lint warns about type conversions that may result in alignment problems.
28 1.7 rillig See the test `msg_135.c` for examples.
29 1.1 rillig
30 1.1 rillig ## Control flow analysis
31 1.1 rillig
32 1.1 rillig Lint roughly tracks the control flow inside a single function.
33 1.3 rillig It doesn't follow `goto` statements precisely though,
34 1.3 rillig it rather assumes that each label is reachable.
35 1.1 rillig See the test `msg_193.c` for examples.
36 1.1 rillig
37 1.1 rillig ## Error handling
38 1.1 rillig
39 1.1 rillig Lint tries to continue parsing and checking even after seeing errors.
40 1.1 rillig This part of lint is not robust though, so expect some crashes here,
41 1.1 rillig as variables may not be properly initialized or be null pointers.
42 1.3 rillig The cleanup after handling a parse error is often incomplete.
43 1.1 rillig
44 1.7 rillig ## Configurable diagnostic messages
45 1.7 rillig
46 1.8 rillig Whether lint prints a message and whether each message is an error, a warning
47 1.8 rillig or just informational depends on several things:
48 1.7 rillig
49 1.7 rillig * The language level, with its possible values:
50 1.7 rillig * traditional C (`-t`)
51 1.7 rillig * migration from traditional C and C90 (default)
52 1.7 rillig * C90 (`-s`)
53 1.7 rillig * C99 (`-S`)
54 1.7 rillig * C11 (`-Ac11`)
55 1.7 rillig * In GCC mode (`-g`), lint allows several GNU extensions,
56 1.7 rillig reducing the amount of printed messages.
57 1.7 rillig * In strict bool mode (`-T`), lint issues errors when `bool` is mixed with
58 1.7 rillig other scalar types, reusing the existing messages 107 and 211, while also
59 1.7 rillig defining new messages that are specific to strict bool mode.
60 1.7 rillig * The option `-a` performs the check for lossy conversions from large integer
61 1.7 rillig types, the option `-aa` extends this check to small integer types as well,
62 1.7 rillig reusing the same message ID.
63 1.7 rillig * The option `-X` suppresses arbitrary messages by their message ID.
64 1.8 rillig * The option `-q` enables additional queries that are not suitable as regular
65 1.8 rillig warnings but may be interesting to look at on a case-by-case basis.
66 1.7 rillig
67 1.1 rillig # Fundamental types
68 1.1 rillig
69 1.1 rillig Lint mainly analyzes expressions (`tnode_t`), which are formed from operators
70 1.1 rillig (`op_t`) and their operands (`tnode_t`).
71 1.10 rillig Each node has a data type (`type_t`) and a few other properties that depend on
72 1.10 rillig the operator.
73 1.1 rillig
74 1.1 rillig ## type_t
75 1.1 rillig
76 1.10 rillig The basic types are `int`, `_Bool`, `unsigned long`, `pointer` and so on,
77 1.3 rillig as defined in `tspec_t`.
78 1.3 rillig
79 1.10 rillig Concrete types like `int` or `const char *` are created by `gettyp(INT)`,
80 1.3 rillig or by deriving new types from existing types, using `block_derive_pointer`,
81 1.2 rillig `block_derive_array` and `block_derive_function`.
82 1.1 rillig (See [below](#memory-management) for the meaning of the prefix `block_`.)
83 1.1 rillig
84 1.1 rillig After a type has been created, it should not be modified anymore.
85 1.10 rillig Ideally all references to types would be `const`, but that's still on the
86 1.10 rillig to-do list and not trivial.
87 1.10 rillig In the meantime, before modifying a type,
88 1.1 rillig it needs to be copied using `block_dup_type` or `expr_dup_type`.
89 1.1 rillig
90 1.1 rillig ## tnode_t
91 1.1 rillig
92 1.5 rillig When lint parses an expression,
93 1.1 rillig it builds a tree of nodes representing the AST.
94 1.5 rillig Each node has an operator that defines which other members may be accessed.
95 1.1 rillig The operators and their properties are defined in `ops.def`.
96 1.1 rillig Some examples for operators:
97 1.1 rillig
98 1.10 rillig | Operator | Meaning |
99 1.10 rillig |----------|--------------------------------------------|
100 1.10 rillig | CON | compile-time constant in `tn_val` |
101 1.10 rillig | NAME | references the identifier in `tn_sym` |
102 1.10 rillig | UPLUS | the unary operator `+tn_left` |
103 1.10 rillig | PLUS | the binary operator `tn_left + tn_right` |
104 1.10 rillig | CALL | a direct function call |
105 1.10 rillig | ICALL | an indirect function call |
106 1.10 rillig | CVT | an implicit conversion or an explicit cast |
107 1.10 rillig
108 1.10 rillig As an example, the expression `strcmp(names[i], "name")` has this internal
109 1.10 rillig structure:
110 1.10 rillig
111 1.10 rillig ~~~text
112 1.10 rillig 1: 'call' type 'int'
113 1.10 rillig 2: '&' type 'pointer to function(pointer to const char, pointer to const char) returning int'
114 1.10 rillig 3: 'name' 'strcmp' with extern 'function(pointer to const char, pointer to const char) returning int'
115 1.10 rillig 4: 'push' type 'pointer to const char'
116 1.10 rillig 5: 'convert' type 'pointer to const char'
117 1.10 rillig 6: '&' type 'pointer to char'
118 1.10 rillig 7: 'string' type 'array[5] of char', lvalue, length 4, "name"
119 1.10 rillig 8: 'push' type 'pointer to const char'
120 1.10 rillig 9: 'load' type 'pointer to const char'
121 1.10 rillig 10: '*' type 'pointer to const char', lvalue
122 1.10 rillig 11: '+' type 'pointer to pointer to const char'
123 1.10 rillig 12: 'load' type 'pointer to pointer to const char'
124 1.10 rillig 13: 'name' 'names' with auto 'pointer to pointer to const char', lvalue
125 1.10 rillig 14: '*' type 'long'
126 1.10 rillig 15: 'convert' type 'long'
127 1.10 rillig 16: 'load' type 'int'
128 1.10 rillig 17: 'name' 'i' with auto 'int', lvalue
129 1.10 rillig 18: 'constant' type 'long', value 8
130 1.10 rillig ~~~
131 1.10 rillig
132 1.10 rillig | Lines | Notes |
133 1.10 rillig |--------|------------------------------------------------------------------|
134 1.10 rillig | 4, 8 | Each argument of the function call corresponds to a `PUSH` node. |
135 1.10 rillig | 5, 9 | The left operand of a `PUSH` node is the actual argument. |
136 1.10 rillig | 8 | The right operand is the `PUSH` node of the previous argument. |
137 1.10 rillig | 5, 9 | The arguments of a call are ordered from right to left. |
138 1.10 rillig | 10, 11 | Array access is represented as `*(left + right)`. |
139 1.10 rillig | 14, 18 | Array and struct offsets are in premultiplied form. |
140 1.10 rillig | 18 | The size of a pointer on this platform is 8 bytes. |
141 1.1 rillig
142 1.3 rillig See `debug_node` for how to interpret the members of `tnode_t`.
143 1.3 rillig
144 1.1 rillig ## sym_t
145 1.1 rillig
146 1.1 rillig There is a single symbol table (`symtab`) for the whole translation unit.
147 1.1 rillig This means that the same identifier may appear multiple times.
148 1.1 rillig To distinguish the identifiers, each symbol has a block level.
149 1.1 rillig Symbols from inner scopes are added to the beginning of the table,
150 1.1 rillig so they are found first when looking for the identifier.
151 1.1 rillig
152 1.1 rillig # Memory management
153 1.1 rillig
154 1.1 rillig ## Block scope
155 1.1 rillig
156 1.1 rillig The memory that is allocated by the `block_*_alloc` functions is freed at the
157 1.1 rillig end of analyzing the block, that is, after the closing `}`.
158 1.1 rillig See `compound_statement_rbrace:` in `cgram.y`.
159 1.1 rillig
160 1.1 rillig ## Expression scope
161 1.1 rillig
162 1.1 rillig The memory that is allocated by the `expr_*_alloc` functions is freed at the
163 1.1 rillig end of analyzing the expression.
164 1.1 rillig See `expr_free_all`.
165 1.1 rillig
166 1.1 rillig # Null pointers
167 1.1 rillig
168 1.1 rillig * Expressions can be null.
169 1.2 rillig * This typically happens in case of syntax errors or other errors.
170 1.1 rillig * The subtype of a pointer, array or function is never null.
171 1.1 rillig
172 1.1 rillig # Common variable names
173 1.1 rillig
174 1.1 rillig | Name | Type | Meaning |
175 1.1 rillig |------|-----------|------------------------------------------------------|
176 1.1 rillig | t | `tspec_t` | a simple type such as `INT`, `FUNC`, `PTR` |
177 1.1 rillig | tp | `type_t` | a complete type such as `pointer to array[3] of int` |
178 1.1 rillig | stp | `type_t` | the subtype of a pointer, array or function |
179 1.1 rillig | tn | `tnode_t` | a tree node, mostly used for expressions |
180 1.1 rillig | op | `op_t` | an operator used in an expression |
181 1.3 rillig | ln | `tnode_t` | the left-hand operand of a binary operator |
182 1.3 rillig | rn | `tnode_t` | the right-hand operand of a binary operator |
183 1.1 rillig | sym | `sym_t` | a symbol from the symbol table |
184 1.1 rillig
185 1.3 rillig # Abbreviations in variable names
186 1.1 rillig
187 1.3 rillig | Abbr | Expanded |
188 1.3 rillig |------|---------------------------------------------|
189 1.3 rillig | l | left |
190 1.3 rillig | r | right |
191 1.3 rillig | o | old (during type conversions) |
192 1.3 rillig | n | new (during type conversions) |
193 1.3 rillig | op | operator |
194 1.3 rillig | arg | the number of the argument, for diagnostics |
195 1.1 rillig
196 1.2 rillig # Debugging
197 1.2 rillig
198 1.2 rillig Useful breakpoints are:
199 1.2 rillig
200 1.9 rillig | Function/Code | File | Remarks |
201 1.9 rillig |---------------------|---------|------------------------------------------------------|
202 1.9 rillig | build_binary | tree.c | Creates an expression for a unary or binary operator |
203 1.9 rillig | initialization_expr | init.c | Checks a single initializer |
204 1.9 rillig | expr | tree.c | Checks a full expression |
205 1.9 rillig | typeok | tree.c | Checks two types for compatibility |
206 1.9 rillig | vwarning_at | err.c | Prints a warning |
207 1.9 rillig | verror_at | err.c | Prints an error |
208 1.9 rillig | assert_failed | err.c | Prints the location of a failed assertion |
209 1.9 rillig | `switch (yyn)` | cgram.c | Reduction of a grammar rule |
210 1.2 rillig
211 1.1 rillig # Tests
212 1.1 rillig
213 1.1 rillig The tests are in `tests/usr.bin/xlint`.
214 1.2 rillig By default, each test is run with the lint flags `-g` for GNU mode,
215 1.1 rillig `-S` for C99 mode and `-w` to report warnings as errors.
216 1.1 rillig
217 1.1 rillig Each test can override the lint flags using comments of the following forms:
218 1.2 rillig
219 1.1 rillig * `/* lint1-flags: -tw */` replaces the default flags.
220 1.1 rillig * `/* lint1-extra-flags: -p */` adds to the default flags.
221 1.1 rillig
222 1.1 rillig Most tests check the diagnostics that lint generates.
223 1.1 rillig They do this by placing `expect` comments near the location of the diagnostic.
224 1.1 rillig The comment `/* expect+1: ... */` expects a diagnostic to be generated for the
225 1.1 rillig code 1 line below, `/* expect-5: ... */` expects a diagnostic to be generated
226 1.1 rillig for the code 5 lines above.
227 1.1 rillig Each `expect` comment must be in a single line.
228 1.6 rillig At the start and the end of the comment, the placeholder `...` stands for an
229 1.6 rillig arbitrary sequence of characters.
230 1.6 rillig There may be other code or comments in the same line of the `.c` file.
231 1.1 rillig
232 1.1 rillig Each diagnostic has its own test `msg_???.c` that triggers the corresponding
233 1.1 rillig diagnostic.
234 1.1 rillig Most other tests focus on a single feature.
235 1.1 rillig
236 1.1 rillig ## Adding a new test
237 1.1 rillig
238 1.4 rillig 1. Run `make add-test NAME=test_name`.
239 1.7 rillig 2. Run `cd ../../../tests/usr.bin/xlint/lint1`.
240 1.7 rillig 3. Sort the `FILES` lines in `Makefile`.
241 1.7 rillig 4. Make the test generate the desired diagnostics.
242 1.7 rillig 5. Run `./accept.sh test_name` until it no longer complains.
243 1.7 rillig 6. Run `cd ../../..`.
244 1.7 rillig 7. Run `cvs commit distrib/sets/lists/tests/mi tests/usr.bin/xlint`.
245