History log of /src/usr.bin/indent/lexi.c |
Revision | | Date | Author | Comments |
1.242 |
| 03-Dec-2023 |
rillig | indent: inline input-related macros
No binary change.
|
1.241 |
| 03-Dec-2023 |
rillig | indent: use line number of the token start in diagnostics
Previously, the line number of the end of the token was used, which was confusing in debug mode.
|
1.240 |
| 03-Dec-2023 |
rillig | indent: fix line number counting in function definition
In a function definition that is split on two lines, if the first line ends with a '*', the following line break didn't include the line number.
|
1.239 |
| 26-Jun-2023 |
rillig | indent: improve heuristics for '*' as pointer in for loops
|
1.238 |
| 26-Jun-2023 |
rillig | indent: improve heuristics for '*' as a pointer type
|
1.237 |
| 26-Jun-2023 |
rillig | indent: clean up indentation
|
1.236 |
| 25-Jun-2023 |
rillig | indent: move cast detection from the lexer to the main processor
It is not the job of the lexer to modify the parser state.
|
1.235 |
| 25-Jun-2023 |
rillig | indent: treat 'complex' and 'imaginary' as type modifiers, not as types
|
1.234 |
| 25-Jun-2023 |
rillig | indent: fix formatting of parenthesized name in function definition
|
1.233 |
| 25-Jun-2023 |
rillig | indent: don't use strspn on inp_p, as it is not null-terminated
No functional change.
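A minimal sketch of the idea, assuming the input is given as a (start, end) pair of pointers rather than a C string. The function name span_bounded and its signature are hypothetical and not taken from indent's sources; only 'accept' is expected to be null-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Like strspn, but never reads at or beyond 'end', so it is safe for
 * buffers that are not null-terminated.  Hypothetical helper.
 */
static size_t
span_bounded(const char *p, const char *end, const char *accept)
{
	const char *start = p;

	assert(p <= end);
	/* A '\0' in the input never matches, as strchr would find it. */
	while (p < end && *p != '\0' && strchr(accept, *p) != NULL)
		p++;
	return (size_t)(p - start);
}
```

The explicit 'p < end' test replaces the implicit stop at '\0' that strspn relies on.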
|
1.232 |
| 17-Jun-2023 |
rillig | indent: clean up
Extract duplicate code for handling line continuations.
Prevent theoretical undefined behavior in strspn, as inp.s is not null-terminated.
Remove adding extra space characters when processing comments, as these are not necessary to force a line of output.
No functional change.
|
1.231 |
| 17-Jun-2023 |
rillig | indent: miscellaneous cleanups
No binary change.
|
1.230 |
| 16-Jun-2023 |
rillig | indent: merge lexer symbols for type in/outside parentheses
|
1.229 |
| 14-Jun-2023 |
rillig | indent: clean up array indexing for parser symbols
With 'top' pointing to the actual top element, the array was indexed in the closed range from 0 to top. All other arrays are indexed by the usual half-open interval from 0 to len.
No functional change.
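The half-open convention can be sketched as follows. The names sym_stack, stack_push, stack_top and stack_sum are hypothetical and only illustrate the indexing scheme; they are not indent's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-capacity stack, for illustration only. */
enum { STACK_CAP = 8 };

struct sym_stack {
	int item[STACK_CAP];
	size_t len;		/* used elements; valid indexes are [0, len) */
};

static void
stack_push(struct sym_stack *st, int sym)
{
	assert(st->len < STACK_CAP);
	st->item[st->len++] = sym;
}

/* The top element is at index len - 1, so the stack must not be empty. */
static int
stack_top(const struct sym_stack *st)
{
	assert(st->len > 0);
	return st->item[st->len - 1];
}

/* Iteration uses the usual 'i < len' instead of 'i <= top'. */
static int
stack_sum(const struct sym_stack *st)
{
	int sum = 0;

	for (size_t i = 0; i < st->len; i++)
		sum += st->item[i];
	return sum;
}
```

With 'len', an empty stack is simply len == 0, which a closed range from 0 to top cannot express without a sentinel value.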
|
1.228 |
| 14-Jun-2023 |
rillig | indent: allow more than 20 nested parentheses or brackets
|
1.227 |
| 14-Jun-2023 |
rillig | indent: remove another flag from parser state
When processing a comment, the flag ps.next_col_1 was not used for the next token, but for a line within a comment. As its scope was limited to a single comment, there is no need to store it any longer than that.
No functional change.
|
1.226 |
| 14-Jun-2023 |
rillig | indent: remove a redundant flag from the parser state
No functional change.
|
1.225 |
| 10-Jun-2023 |
rillig | indent: miscellaneous cleanups
|
1.224 |
| 10-Jun-2023 |
rillig | indent: clean up function names, fix blank lines in debug output
|
1.223 |
| 10-Jun-2023 |
rillig | indent: in debug mode, null-terminate buffers
|
1.222 |
| 10-Jun-2023 |
rillig | indent: clean up function and variable names
|
1.221 |
| 10-Jun-2023 |
rillig | indent: rename and sort variables in parser state
No functional change.
|
1.220 |
| 09-Jun-2023 |
rillig | indent: clean up lexer
No functional change.
|
1.219 |
| 09-Jun-2023 |
rillig | indent: improve heuristics for function declaration vs. definition
|
1.218 |
| 09-Jun-2023 |
rillig | indent: format its own code
|
1.217 |
| 08-Jun-2023 |
rillig | indent: remove fragile heuristic for detecting cast expressions
The assumption that in an expression of the form '(a * anything)', the '*' marks a pointer type was too simple-minded.
For now, fix the obvious cases and leave the others for later. If needed, they can be worked around using the '-T' option.
|
1.216 |
| 07-Jun-2023 |
rillig | indent: extract the stack of parser symbols to a separate struct
No functional change.
|
1.215 |
| 06-Jun-2023 |
rillig | indent: sort functions in call order
No functional change.
|
1.214 |
| 04-Jun-2023 |
rillig | indent: do not parse '&&&&&&&' as a single binary operator
|
1.213 |
| 04-Jun-2023 |
rillig | indent: fix '*=' to be a binary operator, not a unary one
|
1.212 |
| 04-Jun-2023 |
rillig | indent: remove read pointer from buffers that don't need it
The only buffer that needs a read pointer is the current input line in 'inp'.
No functional change.
|
1.211 |
| 04-Jun-2023 |
rillig | indent: rename struct field, for better symmetry
No binary change outside debug mode.
|
1.210 |
| 04-Jun-2023 |
rillig | lint: use separate lexer symbols for 'case' and 'default'
It's not strictly necessary since these tokens behave in the same way. Still, the code is more straightforward when there are separate tokens.
|
1.209 |
| 04-Jun-2023 |
rillig | indent: classify 'inline' as a modifier rather than a word
|
1.208 |
| 04-Jun-2023 |
rillig | indent: use separate lexer symbols for the different kinds of ':'
|
1.207 |
| 04-Jun-2023 |
rillig | indent: separate code for handling parentheses and brackets
Handling parentheses is more complicated than for brackets.
|
1.206 |
| 23-May-2023 |
rillig | indent: separate code for handling enums from the lexer
The lexer's responsibility is to generate tokens, it's not supposed to update the parser state. Centralize the state transitions that control indentation of enum constants to keep the lexer code clean.
Skip comments, newlines and preprocessing lines when updating the parser state for enum constants and for '*' in declarations.
|
1.205 |
| 23-May-2023 |
rillig | indent: split debug output into paragraphs
The paragraphs separate the different processing steps: getting a token from the lexer, processing the token, updating the parser state, sending a finished line to the output.
|
1.204 |
| 23-May-2023 |
rillig | indent: fix spacing in declarations in for loops
|
1.203 |
| 22-May-2023 |
rillig | indent: adjust indentation in lexer
No binary change.
|
1.202 |
| 20-May-2023 |
rillig | indent: extract the output state from the parser state
The parser state depends on the preprocessing lines, the output state shouldn't.
|
1.201 |
| 20-May-2023 |
rillig | indent: clean up lexing of word tokens
No functional change.
|
1.200 |
| 20-May-2023 |
rillig | indent: separate detection of function definitions from lexing '*'
No functional change.
|
1.199 |
| 18-May-2023 |
rillig | indent: manually wrap overly long lines
No functional change.
|
1.198 |
| 18-May-2023 |
rillig | indent: switch to standard code style
Taken from share/misc/indent.pro.
Indent does not wrap code to fit into the line width, it only does so for comments. The 'INDENT OFF' sections and too long lines will be addressed in a follow-up commit.
No functional change.
|
1.197 |
| 16-May-2023 |
rillig | indent: directly access the input buffer
No functional change.
|
1.196 |
| 16-May-2023 |
rillig | indent: allow comments in column 1 to be formatted
|
1.195 |
| 16-May-2023 |
rillig | indent: remove support for form feed characters inside a line
Form feeds are occasionally used to split code into pages, and this use is still supported. Having a form feed in the middle of a line is exotic.
|
1.194 |
| 16-May-2023 |
rillig | indent: fix handling of INDENT OFF/ON comments
Previously, the 'INDENT OFF' comments were interpreted when the newline token from the line above the comment was processed, which was earlier than could be reasonably expected.
The 'INDENT ON' comments were interpreted equally early, which led to the situation that the 'INDENT OFF' comments were preserved literally but the 'INDENT ON' comments weren't.
|
1.193 |
| 16-May-2023 |
rillig | indent: move parsing of 'INDENT OFF/ON' comments to the lexer
No functional change.
|
1.192 |
| 15-May-2023 |
rillig | indent: clean up detection of whether parentheses form a cast
No functional change.
|
1.191 |
| 15-May-2023 |
rillig | indent: improve type guessing, fix formatting of declarations
|
1.190 |
| 15-May-2023 |
rillig | indent: remove backslash line continuation outside preprocessing
The indenter did not handle these backslashes well, interpreting them as unary operators, and they are an edge case anyway. Line continuations in string literals and character constants are kept.
|
1.189 |
| 15-May-2023 |
rillig | indent: indent multi-line conditions
No functional change.
|
1.188 |
| 15-May-2023 |
rillig | indent: let indent format its own code
With manual corrections, as indent does not properly indent multi-line '?:' expressions nor multi-line controlling expressions.
|
1.187 |
| 15-May-2023 |
rillig | indent: clean up memory allocation
No functional change.
|
1.186 |
| 15-May-2023 |
rillig | indent: move debugging code to separate file
No functional change.
|
1.185 |
| 15-May-2023 |
rillig | indent: clean up memory and buffer management
Remove the need to explicitly initialize the buffers. To avoid subtracting null pointers or comparing them using '<', migrate the buffers from the (start, end) form to the (start, len) form. This form also avoids inconsistencies in whether 'buf.e == buf.s' or 'buf.s == buf.e' is used.
Make buffer.st const, to avoid accidental modification of the buffer's content.
Replace '*buf.e++ = ch' with buf_add_char, to avoid having to keep track of how much unwritten space is left in the buffer. Remove all safety margins, that is, no more unchecked access to buf.st[-1] or appending using '*buf.e++'.
Fix line number counting in lex_word for words that contain line breaks.
No functional change.
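A minimal sketch of a buffer in (start, len) form, assuming a growable heap allocation. Only the name buf_add_char comes from the log entry above; the field names, the growth policy and buf_free are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout; a zero-initialized struct is a valid empty buffer. */
struct buffer {
	char *s;	/* start of the allocated memory, may be NULL */
	size_t len;	/* number of used bytes */
	size_t cap;	/* number of allocated bytes */
};

/* Append a single character, growing the buffer as needed. */
static void
buf_add_char(struct buffer *buf, char ch)
{
	assert(buf->len <= buf->cap);
	if (buf->len == buf->cap) {
		size_t new_cap = buf->cap == 0 ? 16 : 2 * buf->cap;
		char *new_s = realloc(buf->s, new_cap);

		if (new_s == NULL)
			abort();
		buf->s = new_s;
		buf->cap = new_cap;
	}
	buf->s[buf->len++] = ch;
}

/* Release the memory and return to the empty state. */
static void
buf_free(struct buffer *buf)
{
	free(buf->s);
	buf->s = NULL;
	buf->len = 0;
	buf->cap = 0;
}
```

Since the fill level is a count rather than an end pointer, an empty buffer never requires subtracting or comparing null pointers.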
|
1.184 |
| 14-May-2023 |
rillig | indent: only null-terminate the buffers if necessary
The only case where a buffer is used as a C-style string is when looking up a keyword.
No functional change.
|
1.183 |
| 14-May-2023 |
rillig | indent: reduce code for scanning tokens
The input line is guaranteed to end with '\n', so there's no need to carry another pointer around.
No functional change.
|
1.182 |
| 14-May-2023 |
rillig | indent: remove foreign RCS IDs
|
1.181 |
| 14-May-2023 |
rillig | indent: miscellaneous cleanups
|
1.180 |
| 14-May-2023 |
rillig | indent: reduce binary size
No functional change.
|
1.179 |
| 13-May-2023 |
rillig | indent: fix lexing of numbers that are spread over multiple lines
|
1.178 |
| 13-May-2023 |
rillig | indent: rename struct fields for buffers
No binary change except for assertion line numbers.
|
1.177 |
| 13-May-2023 |
rillig | indent: move debugging code to separate file
No functional change.
|
1.176 |
| 12-May-2023 |
rillig | indent: condense code for handling spaced expressions
No functional change outside debug mode.
|
1.175 |
| 11-May-2023 |
rillig | indent: move parser state variables to the parser_state struct
Include the variables in the debug output.
|
1.174 |
| 11-May-2023 |
rillig | indent: move force_nl into the parser state
This way, it is included in the debug output.
No functional change.
|
1.173 |
| 11-May-2023 |
rillig | indent: remove buggy code for swapping tokens
It is not the job of an indenter to swap tokens, even if it's only about placing comments elsewhere. The code that swapped the tokens was complicated, buggy and impossible to understand.
In -br (brace right) mode, indent no longer moves a '{' from the beginning of a line to the end of the previous line, as that was handled by the token swapping code as well. This change is unintended, but it will be easier to re-add that now that the code is simpler.
|
1.172 |
| 13-Feb-2022 |
rillig | indent: rename parser_state.p_l_follow and paren_level
The previous variable names were misleading.
Paren_level is not the current level of parentheses but the one from the beginning of the current output line. For better accuracy, rename it to line_start_paren_level.
P_l_follow is not the level of parentheses that will be active at some point in the future, as the previous name suggested. Instead, it is the level of parentheses right now. For better accuracy, rename it to nparen. This nicely matches its main usage, which is as index to the parser_state.paren array.
No binary change.
|
1.171 |
| 13-Feb-2022 |
rillig | indent: replace bitmasking code with struct
The struct directly represents the properties of a pair of parentheses, without forcing the human reader to decode any bitset. This makes it easier to find the remaining bugs in the heuristic for determining the kind of parentheses.
No functional change outside debug mode.
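The difference in readability can be sketched as follows. The struct paren_level, its fields and both helper functions are hypothetical; the actual members in indent may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical properties of one pair of parentheses. */
struct paren_level {
	int indent;	/* indentation for continuation lines */
	bool is_cast;	/* the parentheses form a cast, such as '(int)' */
};

/* Before: one bit per nesting level, which the reader has to decode. */
static bool
is_cast_bitmask(unsigned cast_mask, int level)
{
	return (cast_mask & (1u << level)) != 0;
}

/* After: the property is read directly from the struct. */
static bool
is_cast_struct(const struct paren_level *paren, int level)
{
	return paren[level].is_cast;
}
```

Both variants answer the same question; the struct form additionally shows up with named fields in a debugger or in debug output.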
|
1.170 |
| 13-Feb-2022 |
rillig | indent: change parser_state.cast_mask to 0-based indexing
Having 1-based indexing was completely unexpected, and it didn't match the 0-based indexing of parser_state.paren_indents.
No functional change.
|
1.169 |
| 12-Feb-2022 |
rillig | indent: fix indentation of enum constants in typedef (since 2019-04-04)
The solution is not elegant since it adds a small state machine inside the parser state, but at least these states only depend on the sequence of token types and not on any other part of the parser state.
Reported in PR#55453.
|
1.168 |
| 12-Feb-2022 |
rillig | indent: extend debug logging for the parser state
The member names in struct parser_state are not trustworthy, for example in_decl does not correspond to the intuitive definition of "inside a declaration". To cope with this uncertainty, output the full state of the parser state to the debug log, not only the changes. This helps to track the inner state for small differences in the input, such as between 'typedef enum { TA, TB } TT' and 'enum { EA, EB } ET'.
This hopefully helps in fixing PR#55453.
No functional change outside debug mode.
|
1.167 |
| 28-Nov-2021 |
rillig | indent: treat L"string" as a single token
There is never whitespace between the 'L' and the string literal or the character constant. There might be a backslash-newline between them, but that case was not handled before either.
No functional change.
|
1.166 |
| 27-Nov-2021 |
rillig | indent: illustrate probably_looking_at_definition with examples
No functional change.
|
1.165 |
| 27-Nov-2021 |
rillig | indent: fix out of bounds memory access (since 2021-11-25)
|
1.164 |
| 25-Nov-2021 |
rillig | indent: rename ps.in_function_parameters to match reality
This flag is only set while parsing the parameters of a function definition, but not for a function declaration. See buffer_add in the test fmt_decl.
No functional change.
|
1.163 |
| 25-Nov-2021 |
rillig | indent: improve heuristic for spaces around '*' in declarations
|
1.162 |
| 25-Nov-2021 |
rillig | indent: eliminate 3 negations in tokenizer
No functional change.
|
1.161 |
| 25-Nov-2021 |
rillig | indent: extract lex_asterisk_unary into separate function
No functional change.
|
1.160 |
| 25-Nov-2021 |
rillig | indent: condense code for building tokens from characters
No functional change.
|
1.159 |
| 25-Nov-2021 |
rillig | indent: in lexi, assign lsym and next_unary in consistent order
No functional change.
|
1.158 |
| 25-Nov-2021 |
rillig | indent: fix heuristic for declaration/definition to post-1990 reality
|
1.157 |
| 25-Nov-2021 |
rillig | indent: fix space after function name for option '-pcs'
|
1.156 |
| 25-Nov-2021 |
rillig | indent: fix spacing for unknown type names in declarations
|
1.155 |
| 25-Nov-2021 |
rillig | indent: extract probably_looking_at_definition to separate function
This heuristic guesses wrong in many cases and will need some cleanups.
No functional change.
|
1.154 |
| 25-Nov-2021 |
rillig | indent: merge duplicate code for parsing 'struct s *'
No functional change.
|
1.153 |
| 25-Nov-2021 |
rillig | indent: fix formatting of a few declarations involving unknown types
|
1.152 |
| 25-Nov-2021 |
rillig | indent: rename ps.in_stmt to in_stmt_or_decl
The previous name didn't match reality.
No functional change.
|
1.151 |
| 25-Nov-2021 |
rillig | indent: rename ps.ind_stmt to in_stmt_cont
This makes a comment redundant.
No functional change.
|
1.150 |
| 20-Nov-2021 |
rillig | indent: clean up lint annotation and tests
|
1.149 |
| 20-Nov-2021 |
rillig | indent: fix tokenizing of word-like tokens (since 2019-04-04)
After a backslash-newline, the first character of the next line is only part of the identifier if it is an identifier character.
indent-2000.10.11.14.46.04 | int var \
                           | +name = 4;
indent-2012.11.20.03.02.57
indent-2014.09.04.04.06.07 | int var \
                           | +name = 4;
indent-2019.02.03.03.19.29
indent-2019.04.04.15.27.35 | int var+name = 4;
indent-2021.11.19.20.23.17
indent                     | int var + name = 4;
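The rule stated above can be sketched like this, assuming a null-terminated input for simplicity. The function word_length is hypothetical, and the character set accepted by is_identifier_part is an assumption rather than a copy of indent's code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed definition: letters, digits, '_' and '$'. */
static bool
is_identifier_part(char ch)
{
	return isalnum((unsigned char)ch) || ch == '_' || ch == '$';
}

/*
 * Return the number of input characters that belong to the word at the
 * start of 's'.  A backslash-newline continues the word only if the
 * character after it is an identifier character.
 */
static size_t
word_length(const char *s)
{
	size_t i = 0;

	for (;;) {
		if (is_identifier_part(s[i]))
			i++;
		else if (s[i] == '\\' && s[i + 1] == '\n'
		    && is_identifier_part(s[i + 2]))
			i += 3;
		else
			break;
	}
	return i;
}
```

For the input 'var', backslash-newline, '+name', the word ends after 'var', so the '+' starts a new token instead of being glued to the identifier.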
|
1.148 |
| 19-Nov-2021 |
rillig | indent: reduce casts to unsigned char for character classification
No functional change.
|
1.147 |
| 19-Nov-2021 |
rillig | indent: replace ps.procname with ps.is_function_definition
Only the first character of ps.procname was ever read, and it was only compared to '\0'. Using a bool for this means simpler code, less memory and fewer wasted CPU cycles due to the removed strncpy.
No functional change.
|
1.146 |
| 19-Nov-2021 |
rillig | indent: fix formatting of function definitions (since 2019-04-04)
In the definition of a function with a pointer return type, the formatting depended on the name of the function. Function names matching [A-Za-z]+ were formatted correctly, those containing [$0-9_] weren't.
|
1.145 |
| 19-Nov-2021 |
rillig | indent: merge duplicate code into is_identifier_part
No functional change.
|
1.144 |
| 19-Nov-2021 |
rillig | indent: fix lost function name (since 2019-04-04)
When indent searched for an identifier followed by a '(', to see whether the identifier is a function name, it didn't care that the input buffer could be resized due to a long line, which had made the pointer 'tp' invalid. Fix this by stopping the search at the end of the line. A better approach would be to have an unlimited lookahead buffer for situations like these. The code that deals with character input has already been extracted to io.c, so it's possible to implement that now.
While here, fix another access to undefined memory, after the loop.
There is still the issue of overwriting procname[0] with a blank, which results in inconsistent formatting depending on the function name, probably another case of accessing undefined memory, although the results have been reproducible, but that may have been pure luck.
The formatted code looks clearly broken, but that's still better than losing a token and destroying the whole file.
|
1.143 |
| 19-Nov-2021 |
rillig | indent: use character input API from the tokenizer
No functional change.
|
1.142 |
| 19-Nov-2021 |
rillig | indent: move character input handling from lexi.c to io.c
No functional change.
|
1.141 |
| 19-Nov-2021 |
rillig | indent: replace direct access to the input buffer
This is a preparation for abstracting away all the low-level details of handling the input. The goal is to fix the current bugs regarding line number counting, out of bounds memory access, and generally unreadable code.
No functional change.
|
1.140 |
| 19-Nov-2021 |
rillig | indent: group variables for input handling
No functional change.
|
1.139 |
| 18-Nov-2021 |
rillig | indent: prevent use-after-free bug
Triggered by the following artificial program:
---- snip ---- int * f ( void) { } ---- snap ----
|
1.138 |
| 07-Nov-2021 |
rillig | indent: various cleanups
Make several comments more precise.
Rename process_end_of_file to process_eof to match the token name.
Change the order of assignments in analyze_comment to keep the com_ind computations closer together.
In copy_comment_wrap, use pointer difference instead of pointer addition to stay away from undefined behavior.
No functional change.
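The pointer-difference idiom mentioned for copy_comment_wrap can be sketched as follows. The helper has_room is hypothetical; it only illustrates why 'end - p >= n' is preferable to 'p + n <= end', as the latter may form a pointer beyond the end of the array, which is undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Check whether at least 'n' bytes are available in [p, end).
 * Both pointers must point into the same array, or one past its end.
 */
static bool
has_room(const char *p, const char *end, size_t n)
{
	assert(p <= end);
	/* 'end - p' is always in range, unlike 'p + n' for large n. */
	return (size_t)(end - p) >= n;
}
```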
|
1.137 |
| 07-Nov-2021 |
rillig | indent: rename ps.decl_nest to decl_level
This better matches the comment.
No functional change.
|
1.136 |
| 07-Nov-2021 |
rillig | indent: move ps.p_l_follow closer to lsym_type_outside_parentheses
This makes it easier to see the relation between these two.
No functional change.
|
1.135 |
| 07-Nov-2021 |
rillig | indent: rename type_at_paren_level_0 to type_outside_parentheses
For symmetry with type_in_parentheses.
No functional change.
|
1.134 |
| 07-Nov-2021 |
rillig | indent: distinguish between typename in parentheses and other words
This gets rid of two members of parser_state. No functional change for well-formed programs. The sequence of '++int' or '--size_t' may be formatted differently than before, but no program is expected to contain that sequence.
Rename lsym_ident to lsym_word since 'ident' was too specific. This token type is used for constants and string literals as well. Strictly speaking, a string literal is not a word, but at least it's better than before.
|
1.133 |
| 07-Nov-2021 |
rillig | indent: rename 'inbuf' functions to 'inp'
The variable 'inp' used to be named 'inbuf'. Make the function names correspond to the variable name again.
No functional change.
|
1.132 |
| 05-Nov-2021 |
rillig | indent: consistently use token.e[-1] for the last added character
No functional change.
|
1.131 |
| 05-Nov-2021 |
rillig | indent: add debug output for remaining members of parser_state
|
1.130 |
| 05-Nov-2021 |
rillig | indent: rename ps.curr_newline to next_col_1
For symmetry with ps.curr_col_1.
No functional change.
|
1.129 |
| 01-Nov-2021 |
rillig | indent: fix missing blank after 'return' (since 2021-10-31)
In indent.c 1.200 from 2021-10-31, the subtypes of identifier tokens were removed since they were redundant. An unintended side effect was that a parenthesized expression after 'return' was no longer separated by a blank.
Before that change, 'return' was tokenized as an lsym_ident with subtype kw_other, and want_space_before_lparen handled this case in the last line. After the change, 'return' was treated as an ordinary identifier, and unless the option '-pcs' (blank after function call) was given, the blank was removed.
The other keywords that had kw_other are not affected since they do not expect a '(' afterwards. These keywords are 'break', 'continue', 'goto', 'inline' and 'restrict'.
Curiously, there was not a single test case that covered 'return(expr)'.
While here, remove the trailing ',' from the enum lexer_symbol, which is not allowed in standard C, it is a GNU extension. Lint doesn't complain about this since the default LINTFLAGS include '-g' for GCC mode.
|
1.128 |
| 31-Oct-2021 |
rillig | indent: clean up
Initialize buffers in reading order, make comments more expressive, rename add_typename to register_typename, remove unused macro.
No functional change.
|
1.127 |
| 31-Oct-2021 |
rillig | indent: remove redundant keyword.is_type
It is still confusing that not all type keywords end up as lsym_type. Those that occur inside parentheses end up as identifiers instead. To see whether an identifier is a typename, query ps.curr_is_type and ps.prev_is_type.
No functional change.
|
1.126 |
| 31-Oct-2021 |
rillig | indent: replace kw_tag with lsym_tag
This leaves only one special type of token, which is lsym_ident, which in some cases represents a type name and in other cases an identifier, constant or string literal.
No functional change.
|
1.125 |
| 31-Oct-2021 |
rillig | indent: replace simple cases of keyword_kind with lexer_symbol
The remaining keyword kinds 'tag' and 'type' require a bit more thought, so do them in a separate step.
No functional change.
|
1.124 |
| 31-Oct-2021 |
rillig | indent: rename lsym_type to better reflect reality
Type names that occur in parentheses are parsed as lsym_ident having the subtype kw_type instead.
No functional change.
|
1.123 |
| 31-Oct-2021 |
rillig | indent: remove support for pre-1978 variable initialization
|
1.122 |
| 31-Oct-2021 |
rillig | indent: in debug log, print token subtype in same line
The keyword 'void' is parsed as lsym_type in some cases and lsym_ident in others. Its corresponding keyword is always kw_type though. Put the subtype into the same line as the other token information.
|
1.121 |
| 31-Oct-2021 |
rillig | indent: add separate lexer symbol for offsetof
No functional change.
|
1.120 |
| 31-Oct-2021 |
rillig | indent: add separate lexer symbol for sizeof
The plan is to get rid of the type keyword_kind, which largely overlaps with lexer_symbol.
No functional change.
|
1.119 |
| 31-Oct-2021 |
rillig | indent: clean up definition of keywords
Rename kw_struct_or_union_or_enum to the shorter kw_tag.
Merge kw_jump with kw_inline_or_restrict since they are handled in the same way.
No functional change.
|
1.118 |
| 31-Oct-2021 |
rillig | indent: condense lexi_alnum
No functional change.
|
1.117 |
| 30-Oct-2021 |
rillig | indent: rename prev_newline and prev_col_1 to curr
These two flags describe the token that is currently processed.
In process_binary_op, curr_newline can never be true since newline is not a binary operator, so remove that condition.
No functional change.
|
1.116 |
| 30-Oct-2021 |
rillig | indent: in debug output, list the new token first
|
1.115 |
| 30-Oct-2021 |
rillig | indent: clean up lexical analyzer
Use traditional type for small unsigned numbers instead of uint8_t; the required header was not included.
Remove assertion for debug mode; lint takes care of ensuring that the enum constants match the length of the names array.
Constify a name array.
Move the comparison function for bsearch closer to its caller.
No functional change.
|
1.114 |
| 29-Oct-2021 |
rillig | indent: remove redundant comments, remove punctuation from debug log
The comment about 'null stmt' between braces probably meant 'no statements between braces'.
The comments at psym_switch_expr only repeated what the code says or had been outdated 29 years ago already since opt.case_indent does not have to be 'one level down'.
In the debug log, the quotes around the symbol names are not necessary after a ':'. The parse stack also does not need this much punctuation.
Reducing a do-while loop to nothing instead of a statement saves a few CPU cycles. It works because after each lbrace, a stmt is pushed to the parser stack. This stmt can only ever be reduced to a stmt_list but never be removed.
|
1.113 |
| 29-Oct-2021 |
rillig | indent: in debug mode, log only differences for most ps members
|
1.112 |
| 29-Oct-2021 |
rillig | indent: add detailed debug logging for the parser state
|
1.111 |
| 29-Oct-2021 |
rillig | indent: merge isblank and is_hspace into ch_isblank
No functional change.
|
1.110 |
| 29-Oct-2021 |
rillig | indent: use prev/curr/next to refer to the current token
The word 'last' just didn't match with 'next'.
No functional change.
|
1.109 |
| 29-Oct-2021 |
rillig | indent: keep p_l_follow nonnegative, use consistent comparison
No functional change.
|
1.108 |
| 29-Oct-2021 |
rillig | indent: spell 'parentheses' properly in messages and comments
|
1.107 |
| 28-Oct-2021 |
rillig | indent: remove unused local variable in lexi
Since the previous commit, lexi is always called with the same argument, so remove that parameter.
The previous commit broke the debug logging by not printing "transient state" anymore. Replace this with "rolled back parser state" at the caller's site.
No functional change.
|
1.106 |
| 28-Oct-2021 |
rillig | indent: reduce negations in search_stmt_lookahead
No functional change.
|
1.105 |
| 26-Oct-2021 |
rillig | indent: make ps.keyword easier to understand
Previously, ps.keyword did not have any documentation and was not straight-forward. In some cases it was reset to kw_0, in others it was set to an interesting value. The idea behind it was to remember the kind of word of the previous token, to decide whether to have a space between sizeof or offsetof and a following '('.
No functional change.
|
1.104 |
| 26-Oct-2021 |
rillig | indent: fix debug logging
The parser state is not always 'ps', so the debug logging must use the correct state as well.
|
1.103 |
| 26-Oct-2021 |
rillig | indent: run indent on its own source code
With manual corrections afterwards, to compensate for the remaining bugs in indent.
Without the type definitions in .indent.pro, the opening braces of the functions kw_name and lexi_alnum would not be at the beginning of the line.
|
1.102 |
| 26-Oct-2021 |
rillig | indent: merge duplicate code in lexi_alnum
|
1.101 |
| 25-Oct-2021 |
rillig | indent: improve debug logging
Output the various details in chronological order.
|
1.100 |
| 25-Oct-2021 |
rillig | indent: split type token_type into 3 separate types
Previously, token_type was used for 3 different purposes:
1. symbol types from the lexer
2. symbol types on the parser stack
3. kind of control statement for 'if (expr)' and similar statements
Splitting the 41 constants into separate types makes it immediately clear that the parser stack never handles comments, preprocessing lines, newlines, form feeds, the inner structure of expressions.
Previously, the constant switch_expr was especially confusing since it was used for 3 different purposes: when returned from lexi, it represented the keyword 'switch', in the parser stack it represented 'switch (expr)', and it was used for a statement head as well.
The only overlap between the lexer symbols and the parser symbols are '{' and '}', and the keywords 'do' and 'else'. To increase confusion, the constants of the previous token_type were in apparently random order and before 2021, they had cryptic, highly abbreviated names.
No functional change.
|
1.99 |
| 24-Oct-2021 |
rillig | indent: rename form_feed to tt_lex_form_feed
No functional change.
|
1.98 |
| 24-Oct-2021 |
rillig | indent: split kw_for_or_if_or_while into separate constants
No functional change.
|
1.97 |
| 24-Oct-2021 |
rillig | indent: split kw_do_or_else into separate constants
It was unnecessarily confusing to have the token types keyword_do_else, keyword_do and keyword_else at the same time, without any hint in what they differed.
Some of the token types seem to be used by the lexer while others are used in the parse stack. Maybe all token types can be partitioned into these groups, which would suggest using two different types for them. And if not, it's still clearer to have this distinction in the names of the constants.
No functional change.
|
1.96 |
| 24-Oct-2021 |
rillig | indent: define lexi_end as function instead of macro
|
1.95 |
| 24-Oct-2021 |
rillig | indent: run indent on its own source code
With manual corrections afterwards. Indent still does not get extra_expr_indent correctly, it also indents global variables after tagged declarations too deep.
No functional change.
|
1.94 |
| 24-Oct-2021 |
rillig | indent: rename nitems to array_length
|
1.93 |
| 24-Oct-2021 |
rillig | indent: sort includes
|
1.92 |
| 20-Oct-2021 |
rillig | indent: rename ps.last_u_d to match its comment
No functional change.
|
1.91 |
| 11-Oct-2021 |
rillig | indent: use separate variables for lexi_alnum and lexi
These two uses of the variable are independent of each other.
No functional change.
|
1.90 |
| 11-Oct-2021 |
rillig | indent: clean up comments in lexi and lexi_alnum
No functional change.
|
1.89 |
| 11-Oct-2021 |
rillig | indent: extract lexi_alnum from lexi
No functional change.
|
1.88 |
| 09-Oct-2021 |
rillig | indent: fix lint warning about bsearch discarding 'const'
lexi.c(433): warning: call to 'bsearch' effectively discards 'const' from argument [346]
|
1.87 |
| 08-Oct-2021 |
rillig | indent: merge duplicate code in lexer
No functional change.
|
1.86 |
| 08-Oct-2021 |
rillig | indent: rename in_or_st to init_or_struct
This makes a few comments redundant.
No functional change.
|
1.85 |
| 08-Oct-2021 |
rillig | indent: remove 'global' from the list of keywords
Since 1978, 'global' has not been a keyword in C. Moreover, it was declared as a type while its name would rather suggest a storage class.
Removing the keyword fixes the formatting of variables named 'global'.
|
1.84 |
| 08-Oct-2021 |
rillig | indent: clean up typename handling
Unexport typenames list.
Replace standard binary search with custom binary search that returns the inserting position.
In is_typename, take advantage of the buffer type instead of using the standard C recipe for str_ends_with.
No functional change.
|
1.83 |
| 08-Oct-2021 |
rillig | indent: enhance comments for lex_number state machine
No functional change.
|
1.82 |
| 08-Oct-2021 |
rillig | indent: improve local variable names
No functional change.
|
1.81 |
| 08-Oct-2021 |
rillig | indent: rename fill_buffer to inbuf_read_line
No functional change.
|
1.80 |
| 08-Oct-2021 |
rillig | indent: constify detection of function names
No functional change.
|
1.79 |
| 08-Oct-2021 |
rillig | indent: rename tokens lparen and rparen to be more precise
No functional change.
|
1.78 |
| 07-Oct-2021 |
rillig | indent: group variables for the input buffer
The input buffer follows the same concept as the intermediate buffers for label, code, comment and token, so use the same type for it.
No functional change.
|
1.77 |
| 07-Oct-2021 |
rillig | indent: clean up code, remove outdated wrong comments
No functional change.
|
1.76 |
| 07-Oct-2021 |
rillig | indent: use braces around multi-line statements
No functional change.
|
1.75 |
| 07-Oct-2021 |
rillig | indent: let the code breathe a bit by inserting empty lines
No functional change.
|
1.74 |
| 07-Oct-2021 |
rillig | indent: fix wrong or outdated comments
No functional change.
|
1.73 |
| 07-Oct-2021 |
rillig | indent: raise WARNS from the default 5 up to 6
|
1.72 |
| 05-Oct-2021 |
rillig | indent: use buffer type in debug_print_buf
That function had been created before 'struct buffer' was invented, therefore it used two pointers as parameters. Remove this redundancy.
No functional change.
|
1.71 |
| 05-Oct-2021 |
rillig | indent: run indent on lexi.c, with manual corrections
The variables 'keywords' and 'typenames' were indented using 8 spaces, even though -di0 was in effect, which should result in a single space, and -ut was in effect, which should result in a single tab instead of 8 spaces.
The option -eei does not work as advertised: the controlling expressions are only indented by the normal amount, which easily leads to confusion as to whether the code belongs to the condition or the following statement.
|
1.70 |
| 05-Oct-2021 |
rillig | indent: untangle complicated condition in probably_typedef
No functional change.
|
1.69 |
| 05-Oct-2021 |
rillig | indent: use proper escape sequence for form feed
This escape sequence has been available since at least 1978.
|
1.68 |
| 05-Oct-2021 |
rillig | indent: merge duplicate code into is_hspace
No functional change.
|
1.67 |
| 05-Oct-2021 |
rillig | indent: clean up code for appending to buffers
Use *e++ for appending and e[-1] for testing the previously appended character, like in other places in the code.
No functional change.
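The `*e++` and `e[-1]` idioms mentioned above might look roughly like this. The struct layout and function names are assumptions for illustration only, not indent's actual buffer type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a text buffer; the field names are assumed. */
struct buf {
	char mem[64];
	char *e;		/* points just past the last appended char */
};

static void
buf_init(struct buf *b)
{
	b->e = b->mem;
}

/* Append one character using the '*e++' idiom. */
static void
buf_add_char(struct buf *b, char ch)
{
	assert(b->e < b->mem + sizeof(b->mem));
	*b->e++ = ch;
}

/*
 * Append a space unless the buffer is empty or already ends with one,
 * using 'e[-1]' to inspect the previously appended character.
 */
static void
buf_add_separator(struct buf *b)
{
	if (b->e > b->mem && b->e[-1] != ' ')
		buf_add_char(b, ' ');
}

static size_t
buf_len(const struct buf *b)
{
	return (size_t)(b->e - b->mem);
}
```

Checking `b->e > b->mem` before reading `e[-1]` keeps the access inside the array when the buffer is still empty.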
|
1.66 |
| 05-Oct-2021 |
rillig | indent: merge duplicate code for reading from input buffer
No functional change.
|
1.65 |
| 03-Oct-2021 |
rillig | indent: fix lint warning about signed '>>'
Lint couldn't infer that indent's list of type names will practically never contain more than 2 billion entries and that the result of '>>' would be the same in all cases.
|
1.64 |
| 27-Sep-2021 |
rillig | indent: use binary instead of linear search when adding types
No functional change.
|
1.63 |
| 27-Sep-2021 |
rillig | indent: extract is_typename from lexi
No functional change.
|
1.62 |
| 27-Sep-2021 |
rillig | indent: rename rwcode to keyword_kind, various cleanup
No idea what the 'rw' in 'rwcode' meant, it had been imported that way 28 years ago. Since rwcode specifies the kind of a keyword, the prefix 'kw_' makes sense.
No functional change.
|
1.61 |
| 26-Sep-2021 |
rillig | indent: unexport global variables
The variable match_state was write-only and was thus removed.
No functional change.
|
1.60 |
| 26-Sep-2021 |
rillig | indent: unexport keyword table, clean up
No functional change.
|
1.59 |
| 26-Sep-2021 |
rillig | indent: let indent format its own code -- in supervised mode
After running indent on the code, I manually selected each change that now looks better than before. The remaining changes are left for later. All in all, indent did a pretty good job, except for syntactic additions from after 1990, but that was to be expected. Examples for such additions are GCC's __attribute__ and C99 designated initializers.
Indent has only few knobs to tune the indentation. The knob for the continuation indentation applies to function declarations as well as to expressions. The knob for indentation of local variable declarations applies to struct members as well, even if these are members of a top-level struct.
Several code comments crossed the right margin in column 78. Several other code comments were correctly broken though. The cause for this difference was not obvious.
No functional change.
|
1.58 |
| 25-Sep-2021 |
rillig | indent: merge duplicate code for token buffers
No functional change.
|
1.57 |
| 25-Sep-2021 |
rillig | indent: extract probably_typedef into separate function
This condition is complicated enough that it warrants being split into several clauses, maybe even with an explanation.
No functional change.
|
1.56 |
| 25-Sep-2021 |
rillig | indent: reduce code and data size for lexing of numbers
Instead of having a table of strings (121 pointers + 121 data relocations), reduce that table to the actual character data and use a secondary table for looking up the correct row in the main table.
No functional change.
|
1.55 |
| 25-Sep-2021 |
rillig | indent: convert remaining ibool to bool
No functional change intended.
|
1.54 |
| 25-Sep-2021 |
rillig | indent: prepare for lint's strict bool mode
Before C99, C had no boolean type. Instead, indent used int for that, just like many other programs. Even with C99, bool and int can be used interchangeably in many situations, such as querying '!i' or '!ptr' or 'cond == 0'.
Since January 2021, lint provides the strict bool mode, which makes bool a non-arithmetic type that is incompatible with any other type. Having clearly separate types helps in understanding the code.
To migrate indent to strict bool mode, the first step is to apply all changes that keep the resulting binary the same. Since sizeof(bool) is 1 and sizeof(int) is 4, the type ibool serves as an intermediate type. For now it is defined to int, later it will become bool.
The current code compiles cleanly in C99 and C11 mode, as well as in lint's strict bool mode. There are a few tricky places:
In args.c in 'struct pro', there are two types of options: boolean and integer. Boolean options point to a bool variable, integer options point to an int variable. To keep the current structure of the code, the pointer has been changed to 'void *'. To ensure type safety, the definition of the options is done via preprocessor magic, which in C11 mode ensures the correct pointer types. (Add CFLAGS+=-std=gnu11 at the very bottom of the Makefile.)
In indent.c in process_preprocessing, a boolean variable is post-incremented. That variable is only assigned to another variable, and that variable is only used in a boolean context. To provoke a different behavior between the '++' and the '= true', the source code to be indented would need 1 << 32 preprocessing directives, which is unlikely to happen in practice.
In io.c in dump_line, the variables ps.in_stmt and ps.in_decl only ever get the values 0 and 1. For these values, the expressions 'a & ~b' and 'a && !b' are equivalent, in all versions of C. The compiler may generate different code for them, though.
In io.c in parse_indent_comment, the assignment to inhibit_formatting takes place in integer context. If the compiler is smart enough to detect the possible values of on_off, it may generate the same code before and after the change, but that is rather unlikely.
The second step of the migration will be to replace ibool with bool, step by step, just in case there are any hidden gotchas in the code, such as sizeof or pointer casts.
No change to the resulting binary.
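The claim about io.c, that 'a & ~b' and 'a && !b' agree as long as both operands are 0 or 1, can be checked exhaustively. The function names below are made up for this sketch and do not appear in indent.

```c
#include <stdbool.h>

/* Compare the bitwise form with the logical form for one pair of values. */
static bool
forms_agree(int a, int b)
{
	return (a & ~b) == (a && !b);
}

/* Check all four combinations of a and b taken from {0, 1}. */
static bool
forms_agree_for_all_flags(void)
{
	for (int a = 0; a <= 1; a++)
		for (int b = 0; b <= 1; b++)
			if (!forms_agree(a, b))
				return false;
	return true;
}
```

The equivalence depends on the restricted value range: for a = 2 and b = 0, the bitwise form yields 2 while the logical form yields 1.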
|
1.53 |
| 25-Sep-2021 |
rillig | indent: remove ifdef for lint
NetBSD lint does not need them anymore, FreeBSD does not have lint.
|
1.52 |
| 25-Sep-2021 |
rillig | indent: make lex_char_or_string simpler
The previous code was so tricky that every second line needed a comment that explains what's going on. Replace the complicated code with the usual straight-forward string-copying patterns.
No functional change.
|
1.51 |
| 25-Sep-2021 |
rillig | indent: add nonnull memory allocation functions
The only functional change is a single error message.
|
1.50 |
| 25-Sep-2021 |
rillig | indent: group global variables for token buffer
No functional change.
|
1.49 |
| 25-Sep-2021 |
rillig | indent: inline macro 'token'
No functional change.
|
1.48 |
| 25-Sep-2021 |
rillig | indent: group global variables for code buffer
No functional change.
|
1.47 |
| 25-Sep-2021 |
rillig | indent: rename variables of type token_type
The previous variable name 'code' conflicts with the buffer of the same name.
No functional change.
|
1.46 |
| 24-Sep-2021 |
rillig | indent: group global variables for label buffer into struct
No functional change.
|
1.45 |
| 24-Sep-2021 |
rillig | indent: group global variables for the comment buffer
No functional change.
|
1.44 |
| 24-Sep-2021 |
rillig | indent: fix space-tab in indentation
|
1.43 |
| 26-Aug-2021 |
rillig | indent: extract lex_number, lex_word, lex_char_or_string
No functional change.
|
1.42 |
| 25-Aug-2021 |
rillig | indent: fix lint warnings about type conversions on ilp32
No functional change.
|
1.41 |
| 14-Mar-2021 |
rillig | indent: fix lint warnings
No functional change.
|
1.40 |
| 13-Mar-2021 |
rillig | indent: remove redundant parentheses
No functional change.
|
1.39 |
| 13-Mar-2021 |
rillig | indent: add debug logging for actually writing to the output file
Together with the results of the tokenizer and the 4 buffers for token, label, code and comment, the debug log now provides a good high-level view on how the indentation happens and where to look for the many remaining bugs.
|
1.38 |
| 12-Mar-2021 |
rillig | indent: use consistent indentation for 'else'
Half of the code used -ce, the other half the opposite -nce.
No functional change.
|
1.37 |
| 12-Mar-2021 |
rillig | indent: fix misleading indentation in indent's own code
No functional change.
|
1.36 |
| 12-Mar-2021 |
rillig | indent: move code for tokenizing numbers further up
Having it directly below the table makes it easier to understand.
I also tried to omit this function entirely by moving the code into the initializer itself, but that made the code redundant and furthermore increased the size of the resulting binary, probably because of the new relocation records.
No functional change.
|
1.35 |
| 11-Mar-2021 |
rillig | indent: reduce indentation of check_size functions
No functional change.
|
1.34 |
| 11-Mar-2021 |
rillig | indent: remove redundant cast after allocation functions
No functional change.
|
1.33 |
| 11-Mar-2021 |
rillig | indent: use consistent array indexing
No functional change.
|
1.32 |
| 11-Mar-2021 |
rillig | indent: merge duplicate code for reading from the input buffer
No functional change.
|
1.31 |
| 09-Mar-2021 |
rillig | indent: rename a few more token types
The previous names were either too short or ambiguous.
No functional change.
|
1.30 |
| 09-Mar-2021 |
rillig | indent: make token names more precise
The previous 'casestmt' was wrong since a case label is not a statement at all.
The previous 'swstmt' was overly short, and wrong as well, since it represents only the 'switch (expr)' part, which is not a complete switch statement. Same for 'ifstmt', 'whilestmt', 'forstmt'.
The previous word 'head' was not precise enough since it didn't specify exactly where the head ends and the body starts. Especially for handling the dangling else, this distinction is important.
No functional change.
|
1.29 |
| 09-Mar-2021 |
rillig | indent: rename a few tokens to be more obvious
For casual readers it is not obvious whether the 'sp' meant 'special' or 'space' or something entirely different.
|
1.28 |
| 09-Mar-2021 |
rillig | indent: manually indent comments
It's strange that indent's own code is not formatted by indent itself, which would be a good demonstration of its capabilities.
In its current state, I don't trust indent to get even the tokenization correct, therefore the only safe way is to format the code manually.
|
1.27 |
| 08-Mar-2021 |
rillig | indent: split bsearch comparison function
It may have been a clever trick to use the same memory layout for struct templ and a string pointer, but it's not worth the extra comment and difficulty in understanding the code.
No functional change.
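One possible shape of a dedicated comparison function for a keyword table is sketched below. It relies only on the standard guarantee that bsearch passes the key as the first argument and an array element as the second. The struct, table contents, and function names are assumptions for illustration; they are not the definitions from lexi.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical keyword table entry. */
struct keyword {
	const char *name;
	int kind;
};

/* Must be sorted by name for bsearch to work. */
static const struct keyword keywords[] = {
	{ "break", 1 },
	{ "if", 2 },
	{ "while", 3 },
};

/*
 * The key is a plain string, the element is a struct keyword.  Naming the
 * struct member explicitly avoids depending on the memory layout of the
 * struct.
 */
static int
cmp_keyword_by_name(const void *key, const void *elem)
{
	const struct keyword *kw = elem;

	return strcmp(key, kw->name);
}

static const struct keyword *
find_keyword(const char *name)
{
	return bsearch(name, keywords,
	    sizeof(keywords) / sizeof(keywords[0]), sizeof(keywords[0]),
	    cmp_keyword_by_name);
}
```

A second array with a different element type, such as a list of plain strings, would then get its own comparison function instead of sharing this one.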
|
1.26 |
| 08-Mar-2021 |
rillig | indent: inline macro for backslash
No functional change.
|
1.25 |
| 08-Mar-2021 |
rillig | indent: convert big macros to functions
Each of these buffers is only modified in a single file. This makes it unnecessary to declare the macros in the global header.
|
1.24 |
| 07-Mar-2021 |
rillig | indent: fix handling of '//' end-of-line comments
|
1.23 |
| 07-Mar-2021 |
rillig | indent: remove redundant parentheses around return value
No functional change.
|
1.22 |
| 07-Mar-2021 |
rillig | lint: move keyword 'continue' over to the other control flow keywords
No functional change since neither rw_jump nor rw_inline_or_restrict is mentioned in any switch statement, and lint didn't find any other suspicious enum operations.
|
1.21 |
| 07-Mar-2021 |
rillig | indent: use named constants for the different types of keywords
This reduces the magic numbers in the code. Most of these had their designated constant name written in a nearby comment anyway.
The one instance where arithmetic was performed on this new enum type (in indent.c) was a bit tricky to understand.
The combination rw_continue_or_inline_or_restrict looks strange, the 'continue' should intuitively belong to the other control flow keywords in rw_break_or_goto_or_return.
No functional change.
|
1.20 |
| 07-Mar-2021 |
rillig | indent: in debug mode, output detailed token information
The main ingredient for understanding how indent works is the tokenizer and the 4 buffers in which the text is collected.
Inspecting this debug log for the test comment-line-end makes it obvious why indent messes up code that contains '//' comments. The cause is that indent interprets '//' as an operator, just like '&&' or '||'. The sequence '/////' is interpreted as a single operator as well, by the way.
Since '//' is interpreted as an ordinary operator, any words following it are plain identifiers, usually several of them in a row, which is a syntax error. Depending on the context, the operator '//' is either a unary operator (no space around) or a binary operator (space around). This explains why the word 'line-end' is expanded to 'line - end'.
No functional change outside of debug mode.
|
1.19 |
| 07-Mar-2021 |
rillig | indent: for the token types, use enum instead of #define
This makes it easier to step through the code in a debugger.
No functional change.
|
1.18 |
| 07-Mar-2021 |
rillig | indent: use all headers in all files
This is a prerequisite for converting the token types to an enum instead of a preprocessor define, since the return type of lexi will become token_type. Having the enum will make debugging easier.
There was a single naming collision, which forced the variable in scan_profile to be renamed. All other token names are used nowhere else.
No change to the resulting binary.
|
1.17 |
| 19-Oct-2019 |
christos | use stdarg, annotate function as __printflike and fix broken formats.
|
1.16 |
| 04-Apr-2019 |
kamil | Upgrade indent(1)
Merge all the changes from the recent FreeBSD HEAD snapshot into our local copy.
FreeBSD actively maintains this program in their sources and their repository contains over 100 commits with changes.
Keep the delta between the FreeBSD and NetBSD versions to an absolute minimum, mostly RCS Id and compatibility fixes.
Major changes in this import:
- Added an option -ldi<N> to control indentation of local variable names.
- Added option -P for loading user-provided files as profiles.
- Added -tsn for setting tabsize.
- Renamed -nsac/-sac ("space after cast") to -ncs/-cs.
- Added option -fbs, which enables (disables) splitting the function declaration and opening brace across two lines.
- Respect SIMPLE_BACKUP_SUFFIX environment variable in indent(1).
- Group global option variables into an options structure.
- Use bsearch() for looking up type keywords.
- Don't produce unneeded space character in function declarators.
- Don't unnecessarily add a blank before a comment ends.
- Don't ignore newlines after comments that follow braces.
Merge the FreeBSD indent(1) tests with our ATF framework. All tests pass.
Upgrade prepared by Manikishan Ghantasala. Final polishing by myself.
|
1.15 |
| 03-Feb-2019 |
mrg | - add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in this case, and thus can't be marked __dead easily
|
1.14 |
| 05-Jun-2016 |
dholland | branches: 1.14.16; Fix CSRG-era typo: typedef, not typdef. Spotted by Piotr Stefaniak.
|
1.13 |
| 12-Apr-2009 |
lukem | Fix WARNS=4 issues (-Wshadow -Wcast-qual -Wsign-compare)
|
1.12 |
| 07-Aug-2003 |
agc | branches: 1.12.42; Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22365, verified by myself.
|
1.11 |
| 26-May-2002 |
wiz | Remove #ifndef'd __STDC__ code. ANSIfy.
|
1.10 |
| 22-Mar-2002 |
kristerw | Recognize all C9x integer constants (ISO/IEC 9899:1999 section 6.4.4.1). Patch taken from FreeBSD.
Fixes PR bin/9219.
|
1.9 |
| 15-Mar-1999 |
kristerw | Made indent recognize the [fF], [uU], [lL], [uU][lL], [lL][lL], and [uU][lL][lL] constant suffixes. (PR bin/6516 by Brian Ginsbach)
|
1.8 |
| 19-Dec-1998 |
christos | char -> unsigned char, braces for gcc-2.8.1
|
1.7 |
| 25-Aug-1998 |
ross | Add { and } to shut up egcs. Reformat the more questionable code.
|
1.6 |
| 19-Oct-1997 |
lukem | WARNSify, fix .Nm usage, deprecate register, use <err.h>, KNFify (with indent!;)
|
1.5 |
| 18-Oct-1997 |
mrg | merge lite-2.
|
1.4 |
| 09-Sep-1997 |
agc | Bump number of elements in specials array from 100 to 1000. Typedefs are added to this array, and it silently ignores any attempts to enter more elements when the array is full.
|
1.3 |
| 09-Jan-1997 |
tls | RCS ID police
|
1.2 |
| 01-Aug-1993 |
mycroft | Add RCS identifiers.
|
1.1 |
| 09-Apr-1993 |
cgd | branches: 1.1.1; added, from net/2 (patch 124).
|
1.1.1.2 |
| 04-Apr-2019 |
kamil | FreeBSD indent r340138
|
1.1.1.1 |
| 06-Jun-1993 |
mrg | 4.4BSD-Lite2
|
1.12.42.1 |
| 13-May-2009 |
jym | Sync with HEAD.
Third (and last) commit. See http://mail-index.netbsd.org/source-changes/2009/05/13/msg221222.html
|
1.14.16.2 |
| 13-Apr-2020 |
martin | Mostly merge changes from HEAD up to 20200411
|
1.14.16.1 |
| 10-Jun-2019 |
christos | Sync with HEAD
|