History log of /src/usr.bin/indent/indent.c
Revision  Date  Author  Comments
 1.396  07-Jan-2025  rillig indent: condense and simplify parsing code
 1.395  04-Jan-2025  rillig indent: fix indentation of adjacent multi-line initializers

The main topic of this change is parse.c:66, which makes the indentation
of statements uniform with the indentation of other parser symbols.

That change had the side effect of messing up the indentation of files
whose first line does not start in column 1, such as in ps_ind_level.c.
To fix this side effect, the initial indentation must be determined
before pushing the placeholder token psym_stmt during initialization.
 1.394  04-Jan-2025  rillig indent: make debug log more uniform
 1.393  04-Jan-2025  rillig indent: make debug output easier to read

The previous format had the values of the parser state on the left side
and the corresponding names on the right side. While it looked nicely
aligned, it was not suitable for focusing on the actual data. Replace
this format with the more common "key: value" format.

Use the names of the enum constants in the debug log, instead of the
previous "nice" names that needed one more level of mental translation
and in some cases contained unbalanced punctuation such as '{'.
 1.392  03-Jan-2025  rillig indent: fix line breaks in else-if sequences

The flag ps.want_newline did not adequately model the conditions under
which a line break should be inserted, thus the redesign.

A welcome side effect is that in statements like 'if (cond);', the
semicolon is now placed on a separate line, thus becoming more visible.
 1.391  12-Dec-2024  rillig indent: add error handling for I/O errors

Suggested by lint2.
 1.390  03-Dec-2023  rillig branches: 1.390.2;
indent: inline input-related macros

No binary change.
 1.389  03-Dec-2023  rillig indent: group input-related variables into a struct

No functional change.
 1.388  03-Dec-2023  rillig indent: use line number of the token start in diagnostics

Previously, the line number of the end of the token was used, which was
confusing in debug mode.
 1.387  27-Jun-2023  rillig indent: fix 'blank line above first statement in function body'
 1.386  26-Jun-2023  rillig indent: implement 'blank line above first statement in function body'
 1.385  26-Jun-2023  rillig indent: in -bad mode, don't add a blank line above a comment or '}'
 1.384  25-Jun-2023  rillig indent: move cast detection from the lexer to the main processor

It is not the job of the lexer to modify the parser state.
 1.383  25-Jun-2023  rillig indent: fix formatting of parenthesized name in function definition
 1.382  23-Jun-2023  rillig indent: properly store parser state in debug mode

The stacks in the parser state are allocated now and need to be copied
individually.

The test whether two paren stacks are equal was broken since 2023-06-14
14:11:28.
 1.381  18-Jun-2023  rillig indent: remove support for backspace in code and comments

The C code in the whole tree does not contain a single literal
backspace.
 1.380  17-Jun-2023  rillig indent: miscellaneous cleanups

No binary change.
 1.379  16-Jun-2023  rillig indent: merge lexer symbols for type in/outside parentheses
 1.378  16-Jun-2023  rillig indent: fix spacing between postfix operator and left parenthesis
 1.377  16-Jun-2023  rillig indent: improve heuristics for cast expressions
 1.376  16-Jun-2023  rillig indent: improve heuristics for cast expressions
 1.375  16-Jun-2023  rillig indent: improve heuristics for casts
 1.374  16-Jun-2023  rillig indent: fix indentation and linebreaks in typedef declarations
 1.373  16-Jun-2023  rillig indent: don't force a blank line between '}' and preprocessing line
 1.372  15-Jun-2023  rillig indent: consolidate handling of statement continuations
 1.371  15-Jun-2023  rillig indent: rename state variable to be more accurate

No binary change.
 1.370  15-Jun-2023  rillig indent: fix indentation of multi-line enum constant initializers
 1.369  15-Jun-2023  rillig indent: miscellaneous cleanups, more tests for edge cases
 1.368  15-Jun-2023  rillig indent: fix alignment of multi-line declarations
 1.367  14-Jun-2023  rillig indent: clean up the code, add a few tests
 1.366  14-Jun-2023  rillig indent: allow more than 128 brace levels
 1.365  14-Jun-2023  rillig indent: clean up array indexing for parser symbols

With 'top' pointing to the actual top element, the array was indexed in
the closed range from 0 to top. All other arrays are indexed by the
usual half-open interval from 0 to len.

No functional change.
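
The half-open convention described above can be sketched as follows. This is a minimal illustration, not the code from indent; the names sym_stack, stack_push, stack_top and stack_sum as well as the fixed capacity are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stack of parser symbols; all names are illustrative only. */
enum { STACK_CAP = 16 };

struct sym_stack {
	int item[STACK_CAP];
	size_t len;	/* number of used elements; valid indexes are [0, len) */
};

/* Push a symbol: the new element lands at index len, then len grows. */
static void
stack_push(struct sym_stack *st, int sym)
{
	assert(st->len < STACK_CAP);
	st->item[st->len++] = sym;
}

/* The top element is the last used one, at index len - 1. */
static int
stack_top(const struct sym_stack *st)
{
	assert(st->len > 0);
	return st->item[st->len - 1];
}

/* Visit all elements using the usual half-open loop from 0 to len. */
static int
stack_sum(const struct sym_stack *st)
{
	int sum = 0;
	for (size_t i = 0; i < st->len; i++)
		sum += st->item[i];
	return sum;
}
```

With 'len' instead of 'top', an empty stack is simply len == 0, and loops over the stack look like loops over any other array.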
 1.364  14-Jun-2023  rillig indent: allow more than 20 nested parentheses or brackets
 1.363  14-Jun-2023  rillig indent: merge duplicate code
 1.362  14-Jun-2023  rillig indent: fix formatting of comment after 'switch (expr)'
 1.361  14-Jun-2023  rillig indent: use correct preprocessing directive in error message
 1.360  14-Jun-2023  rillig indent: allow more than 5 levels of #if/#endif
 1.359  14-Jun-2023  rillig indent: remove another flag from parser state

When processing a comment, the flag ps.next_col_1 was not used for the
next token, but for a line within a comment. As its scope was limited
to a single comment, there is no need to store it any longer than that.

No functional change.
 1.358  14-Jun-2023  rillig indent: merge parser symbols for stmt and stmt_list

They were handled in exactly the same way.
 1.357  10-Jun-2023  rillig indent: rename misleading variable

The name started with 'line_start', but the value is not always the
value from the beginning of the line.

No functional change.
 1.356  10-Jun-2023  rillig indent: fix debug output

When the parser state was first printed, there were unintended diff
markers. Treat the previous lexer symbol like the other parts of the
parser state, as omitting it from the diff output is confusing.
 1.355  10-Jun-2023  rillig indent: fix line break between semicolon and brace
 1.354  10-Jun-2023  rillig indent: miscellaneous cleanups
 1.353  10-Jun-2023  rillig indent: in debug mode, null-terminate buffers
 1.352  10-Jun-2023  rillig indent: fix indentation of continuation lines in initializers
 1.351  10-Jun-2023  rillig indent: clean up function and variable names
 1.350  10-Jun-2023  rillig indent: fix token classification in declarations

As a side effect, indent handles _Generic from C11 properly now, at
least in -nlp mode.
 1.349  10-Jun-2023  rillig indent: rename and sort variables in parser state

No functional change.
 1.348  09-Jun-2023  rillig indent: trim trailing blank lines
 1.347  09-Jun-2023  rillig indent: group lexer symbols by topic, sort processing functions

No functional change.
 1.346  09-Jun-2023  rillig indent: support C99 compound literals
 1.345  09-Jun-2023  rillig indent: don't treat function call expressions as cast expressions
 1.344  09-Jun-2023  rillig indent: eliminate unused variable

No functional change.
 1.343  09-Jun-2023  rillig indent: when an indentation is ambiguous, indent one level further

The '-eei' mode now applies whenever the indentation from a multi-line
expression could be confused with a following statement.
 1.342  09-Jun-2023  rillig indent: format its own code
 1.341  08-Jun-2023  rillig indent: remove fragile heuristic for detecting cast expressions

The assumption that in an expression of the form '(a * anything)', the
'*' marks a pointer type was too simple-minded.

For now, fix the obvious cases and leave the others for later. If
needed, they can be worked around using the '-T' option.
 1.340  08-Jun-2023  rillig indent: fix indentation of initializer lists with designators
 1.339  08-Jun-2023  rillig indent: clean up and condense code

No functional change.
 1.338  07-Jun-2023  rillig indent: extract the stack of parser symbols to a separate struct

No functional change.
 1.337  06-Jun-2023  rillig indent: compute indentation of 'case' labels on-demand

One less moving part to keep track of.

No functional change.
 1.336  05-Jun-2023  rillig indent: in 'if (expr)', the parentheses do not form a cast expression

No functional change. When stepping through the code in debug mode, it
was just too confusing that indent would log an 'unknown cast' in this
situation.
 1.335  05-Jun-2023  rillig indent: format own source code
 1.334  05-Jun-2023  rillig indent: don't remove blank line after 'if (expr) {'
 1.333  05-Jun-2023  rillig indent: do not report broken lines, report configuration on stderr
 1.332  05-Jun-2023  rillig indent: fix formatting of 'do' statements
 1.331  05-Jun-2023  rillig indent: make heuristics for '*' pointer types simpler

Previously, a '}' token did not reset the state machine, but it should.
 1.330  05-Jun-2023  rillig indent: fix trailing whitespace after comment
 1.329  05-Jun-2023  rillig indent: rename variables, clean up comments

No binary change.
 1.328  05-Jun-2023  rillig indent: clean up handling of whitespace

No functional change.
 1.327  04-Jun-2023  rillig indent: remove read pointer from buffers that don't need it

The only buffer that needs a read pointer is the current input line in
'inp'.

No functional change.
 1.326  04-Jun-2023  rillig indent: track the kind of '{' on the parser stack
 1.325  04-Jun-2023  rillig indent: ensure that the 'block init level' never goes negative

No functional change.
 1.324  04-Jun-2023  rillig indent: rename struct field, for better symmetry

No binary change outside debug mode.
 1.323  04-Jun-2023  rillig indent: fix formatting of compound expressions, at least partially
 1.322  04-Jun-2023  rillig indent: use separate lexer symbols for 'case' and 'default'

It's not strictly necessary since these tokens behave in the same way;
still, the code is more straightforward when there are separate tokens.
 1.321  04-Jun-2023  rillig indent: classify 'inline' as a modifier rather than a word
 1.320  04-Jun-2023  rillig indent: use separate lexer symbols for the different kinds of ':'
 1.319  04-Jun-2023  rillig indent: handle the indentation of 'case' in a simpler way
 1.318  04-Jun-2023  rillig indent: separate code for handling parentheses and brackets

Handling parentheses is more complicated than for brackets.
 1.317  03-Jun-2023  rillig indent: fix indentation of adjacent '{'
 1.316  03-Jun-2023  rillig indent: clean up handling of brace indentation

No functional change.
 1.315  02-Jun-2023  rillig indent: force each statement on a new line

Previously, '{} while (cond)' was kept on a single line, even though the
'while' was independent of the '{}'.
 1.314  02-Jun-2023  rillig indent: remove newline between 'switch' and '{'
 1.313  02-Jun-2023  rillig indent: improve heuristics of classifying '*' as pointer or operator
 1.312  02-Jun-2023  rillig indent: clean up

Only print the 'token' buffer in debug mode if it is interesting, group
the blocks in handling of '(' tokens by topic, remove obsolete comment
from test.
 1.311  02-Jun-2023  rillig indent: fix formatting of declarations with preprocessing lines
 1.310  23-May-2023  rillig indent: separate code for handling enums from the lexer

The lexer's responsibility is to generate tokens; it's not supposed to
update the parser state. Centralize the state transitions that control
indentation of enum constants to keep the lexer code clean.

Skip comments, newlines and preprocessing lines when updating the parser
state for enum constants and for '*' in declarations.
 1.309  23-May-2023  rillig indent: fix indentation of struct declarations
 1.308  23-May-2023  rillig indent: split debug output into paragraphs

The paragraphs separate the different processing steps: getting a token
from the lexer, processing the token, updating the parser state, sending
a finished line to the output.
 1.307  23-May-2023  rillig indent: extract processing of a single token to separate function

No functional change.
 1.306  23-May-2023  rillig indent: fix spacing around '*' in declarations
 1.305  23-May-2023  rillig indent: fix spacing in declarations in for loops
 1.304  22-May-2023  rillig indent: fix spacing between block braces
 1.303  22-May-2023  rillig indent: implement suppressing optional blank lines
 1.302  21-May-2023  rillig indent: don't read out-of-bounds memory in preprocessing lines

(Since a few minutes.)

If a line '#if 0' was followed by an unlikely line '#', the second line
was interpreted as '#if' as well.

To detect this bug automatically, a dynamic analysis tool would need to
know that only the memory between lab.mem and lab.mem + lab.len has
defined content. This constraint, in turn, would throw up at the bottom
of copy_comment_wrap, which for a brief moment intentionally violates
this constraint.
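
The class of bug described above, where stale bytes beyond the used part of a buffer are read as if they were part of the current line, can be avoided by bounding every comparison by the buffer length. The following is only a sketch under that assumption; the struct layout and the function directive_is are hypothetical and do not reproduce the code from indent.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer in (mem, len) form; only mem[0..len) is defined. */
struct buffer {
	const char *mem;
	size_t len;
};

/*
 * Return whether the preprocessing line in 'lab' starts with the
 * directive 'name'.  Never reads at or beyond lab->mem + lab->len.
 */
static bool
directive_is(const struct buffer *lab, const char *name)
{
	if (lab->len == 0 || lab->mem[0] != '#')
		return false;

	size_t i = 1;		/* skip the '#' and any blanks after it */
	while (i < lab->len && (lab->mem[i] == ' ' || lab->mem[i] == '\t'))
		i++;

	size_t name_len = strlen(name);
	if (lab->len - i < name_len)
		return false;	/* the line is too short for this name */
	if (memcmp(lab->mem + i, name, name_len) != 0)
		return false;

	/* The name must not continue, so that '#ifdef' is not '#if'. */
	size_t end = i + name_len;
	if (end == lab->len)
		return true;
	unsigned char next = (unsigned char)lab->mem[end];
	return !(isalnum(next) || next == '_');
}
```

In this sketch, a buffer whose memory still holds '#if 0' from the previous line but whose length is 1 is treated as a lone '#', not as '#if'.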
 1.301  21-May-2023  rillig indent: don't error out on unrecognized preprocessor directives

This allows indent to be used on the GCC preprocessor output.
 1.300  20-May-2023  rillig indent: remove redundant checks in processing of '}'

No functional change.
 1.299  20-May-2023  rillig indent: extract the output state from the parser state

The parser state depends on the preprocessing lines, the output state
shouldn't.
 1.298  20-May-2023  rillig indent: implement blank line after function body
 1.297  20-May-2023  rillig indent: implement blank lines around conditional compilation
 1.296  20-May-2023  rillig indent: add debug logging for brace indentation

No functional change outside debug mode, as the initialization of
di_stack[0] was redundant.
 1.295  18-May-2023  rillig indent: remove detailed rules for blank before comment
 1.294  18-May-2023  rillig indent: rename a few functions

No functional change.
 1.293  18-May-2023  rillig indent: manually wrap overly long lines

No functional change.
 1.292  18-May-2023  rillig indent: switch to standard code style

Taken from share/misc/indent.pro.

Indent does not wrap code to fit into the line width, it only does so
for comments. The 'INDENT OFF' sections and too long lines will be
addressed in a follow-up commit.

No functional change.
 1.291  18-May-2023  rillig indent: remove unnecessary variable size optimization

Due to the enum that follows in the struct, the short variable was
padded to 4 bytes anyway.

No functional change.
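
The padding effect mentioned above can be observed with offsetof. This is a sketch with made-up struct and member names; the exact sizes depend on the ABI, so the code only computes the padding instead of assuming a particular value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum and structs; the names are illustrative only. */
enum sym_kind { KIND_A, KIND_B };

struct with_short {
	short level;		/* "optimized" narrow member */
	enum sym_kind kind;	/* must start at a properly aligned offset */
};

struct with_int {
	int level;		/* plain int, no size optimization */
	enum sym_kind kind;
};

/* Number of padding bytes the compiler inserts after the short member. */
static size_t
padding_after_short(void)
{
	return offsetof(struct with_short, kind) - sizeof(short);
}
```

On ABIs where the enum has the size and alignment of a 4-byte int and short has 2 bytes, padding_after_short() yields 2 and both structs end up with the same size, so the narrow member saves nothing.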
 1.290  16-May-2023  rillig indent: directly access the input buffer

No functional change.
 1.289  16-May-2023  rillig indent: remove support for form feed characters inside a line

Form feeds are occasionally used to split code into pages, and this use
is still supported. Having a form feed in the middle of a line is
exotic.
 1.288  16-May-2023  rillig indent: remove blank between comment and parentheses or brackets

Finally, indent formats its own source code without messing up the
layout.
 1.287  16-May-2023  rillig indent: fix handling of INDENT OFF/ON comments

Previously, the 'INDENT OFF' comments were interpreted when the newline
token from the line above the comment was processed, which was earlier
than could be reasonably expected.

The 'INDENT ON' comments were interpreted equally early, which led to
the situation that the 'INDENT OFF' comments were preserved literally
but the 'INDENT ON' comments weren't.
 1.286  15-May-2023  rillig indent: clean up detection of whether parentheses form a cast

No functional change.
 1.285  15-May-2023  rillig indent: fix cast detection

In process_lparen_or_lbracket, ps.paren[...].maybe_cast was not
initialized, which may have been the cause for seemingly random spacing
around binary operators.

While here, clean up the code by reducing the number of accesses to the
parser state.
 1.284  15-May-2023  rillig indent: fix detection of casts

A word followed by a '(' does not start a cast expression.
 1.283  15-May-2023  rillig indent: fix type cast in function definition
 1.282  15-May-2023  rillig indent: fix duplicate space between comment and binary operator
 1.281  15-May-2023  rillig indent: format its own code, extend some comments

With manual corrections, as there are still some bugs left.

No functional change.
 1.280  15-May-2023  rillig indent: improve type guessing, fix formatting of declarations
 1.279  15-May-2023  rillig indent: fix spacing between function prototype and attributes
 1.278  15-May-2023  rillig indent: fix indentation of struct member names
 1.277  15-May-2023  rillig indent: indent multi-line conditions

No functional change.
 1.276  15-May-2023  rillig indent: fix indentation of statements after controlling expression
 1.275  15-May-2023  rillig indent: fix indentation of expressions in -nlp -eei mode
 1.274  15-May-2023  rillig indent: fix indentation of multi-line '?:' expressions in functions
 1.273  15-May-2023  rillig indent: let indent format its own code

With manual corrections, as indent does not properly indent multi-line
'?:' expressions nor multi-line controlling expressions.
 1.272  15-May-2023  rillig indent: fix spacing in for loop with declaration (since 2022-02-13)
 1.271  15-May-2023  rillig indent: remove redundant include lines
 1.270  15-May-2023  rillig indent: clean up memory allocation

No functional change.
 1.269  15-May-2023  rillig indent: move debugging code to separate file

No functional change.
 1.268  15-May-2023  rillig indent: clean up memory and buffer management

Remove the need to explicitly initialize the buffers. To avoid
subtracting null pointers or comparing them using '<', migrate the
buffers from the (start, end) form to the (start, len) form. This form
also avoids inconsistencies in whether 'buf.e == buf.s' or 'buf.s ==
buf.e' is used.

Make buffer.st const, to avoid accidental modification of the buffer's
content.

Replace '*buf.e++ = ch' with buf_add_char, to avoid having to keep track
of how much unwritten space is left in the buffer. Remove all safety
margins, that is, no more unchecked access to buf.st[-1] or appending
using '*buf.e++'.

Fix line number counting in lex_word for words that contain line breaks.

No functional change.
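
A buffer in (start, len) form with a checked append function might look like the following. This is a minimal sketch, not the implementation from indent: the commit names buf_add_char, but the field names s, len and cap, the growth policy and the error handling are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer; an all-zero struct is a valid empty buffer. */
struct buffer {
	char *s;	/* start of the allocated memory, NULL while empty */
	size_t len;	/* number of used bytes */
	size_t cap;	/* number of allocated bytes */
};

/* Append a single character, growing the memory when it is full. */
static void
buf_add_char(struct buffer *buf, char ch)
{
	if (buf->len == buf->cap) {
		size_t new_cap = buf->cap == 0 ? 16 : 2 * buf->cap;
		char *new_s = realloc(buf->s, new_cap);
		if (new_s == NULL)
			abort();	/* sketch only: no error recovery */
		buf->s = new_s;
		buf->cap = new_cap;
	}
	buf->s[buf->len++] = ch;
}

/* Forget the content but keep the memory for reuse. */
static void
buf_clear(struct buffer *buf)
{
	buf->len = 0;
}
```

Since emptiness is expressed as len == 0, no null pointers are ever subtracted or compared, and callers never need to know how much unwritten space is left.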
 1.267  14-May-2023  rillig indent: only null-terminate the buffers if necessary

The only case where a buffer is used as a C-style string is when looking
up a keyword.

No functional change.
 1.266  14-May-2023  rillig indent: remove foreign RCS IDs
 1.265  14-May-2023  rillig indent: miscellaneous cleanups
 1.264  13-May-2023  rillig indent: prevent undefined behavior on unbalanced parentheses
 1.263  13-May-2023  rillig indent: do not add a blank at the beginning of a line

Most calls to output_line did already reset the variable. There may be
some untested edge cases in or after comments, but these should be fine
as well.
 1.262  13-May-2023  rillig indent: do not add a space before a comment that starts a line
 1.261  13-May-2023  rillig indent: replace __dead functions with return statements

No functional change.
 1.260  13-May-2023  rillig indent: use enum instead of magic numbers for tracking declarations

No functional change.
 1.259  13-May-2023  rillig indent: rename struct fields for buffers

No binary change except for assertion line numbers.
 1.258  13-May-2023  rillig indent: clean up a condition, add comments

No functional change.
 1.257  13-May-2023  rillig indent: preserve indentation of preprocessor directives
 1.256  12-May-2023  rillig indent: rename placeholder symbol for parser stack

No functional change outside debug mode.
 1.255  12-May-2023  rillig indent: remove code for parsing declarations without semicolon

The statement from the comment that declarations do not need semicolons
is wrong. A possible input that matched this rule is 'void f(void) { int
a }'.
 1.254  12-May-2023  rillig indent: remove statistics

The numbers from the statistics were wrong.
 1.253  12-May-2023  rillig indent: condense code for handling spaced expressions

No functional change outside debug mode.
 1.252  11-May-2023  rillig indent: don't touch comments in preprocessing lines

The indentation of multi-line comments was wrong, and the code for
handling them was too complicated.
 1.251  11-May-2023  rillig indent: remove broken code for handling blank lines

This fixes several bugs where blank lines were erroneously added or
removed, trading these old bugs for new bugs in different places.
These new bugs are expected to be easier to fix, as the old bugs will
not interfere anymore.
 1.250  11-May-2023  rillig indent: move parser state variables to the parser_state struct

Include the variables in the debug output.
 1.249  11-May-2023  rillig indent: eliminate a local variable for else-if handling

No functional change intended.
 1.248  11-May-2023  rillig indent: move force_nl into the parser state

This way, it is included in the debug output.

No functional change.
 1.247  11-May-2023  rillig indent: remove unnecessary assignments to last_else

No functional change intended.
 1.246  11-May-2023  rillig indent: remove buggy code for swapping tokens

It is not the job of an indenter to swap tokens, even if it's only about
placing comments elsewhere. The code that swapped the tokens was
complicated, buggy and impossible to understand.

In -br (brace right) mode, indent no longer moves a '{' from the
beginning of a line to the end of the previous line, as that was handled
by the token swapping code as well. This change is unintended, but it
will be easier to re-add that now that the code is simpler.
 1.245  09-May-2022  rillig indent: clean up control flow, remove Capsicum

No functional change.
 1.244  23-Apr-2022  rillig indent: group global variables related to output control

No functional change.
 1.243  23-Apr-2022  rillig indent: remove Capsicum support

NetBSD doesn't have Capsicum.
 1.242  13-Feb-2022  rillig indent: rename parser_state.p_l_follow and paren_level

The previous variable names were misleading.

Paren_level is not the current level of parentheses but the one from the
beginning of the current output line. For better accuracy, rename it to
line_start_paren_level.

P_l_follow is not the level of parentheses that will be active at some
point in the future, as the previous name suggested. Instead, it is the
level of parentheses right now. For better accuracy, rename it to
nparen. This nicely matches its main usage, which is as index to the
parser_state.paren array.

No binary change.
 1.241  13-Feb-2022  rillig indent: replace bitmasking code with struct

The struct directly represents the properties of a pair of parentheses,
without forcing the human reader to decode any bitset. This makes it
easier to find the remaining bugs in the heuristic for determining the
kind of parentheses.

No functional change outside debug mode.
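
The difference between the two representations can be sketched like this. The names cast_mask, nparen, paren and maybe_cast appear in the commit messages of this log; everything else, including the struct names, the helper functions and the limit of 20 levels, is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_PAREN = 20 };	/* assumed limit, for the sketch only */

/* Before: one bit per parenthesis level, decoded by hand. */
struct state_bits {
	int nparen;
	unsigned cast_mask;	/* bit i set: level i may be a cast */
};

static void
bits_mark_cast(struct state_bits *ps, int level)
{
	ps->cast_mask |= 1u << level;
}

static bool
bits_maybe_cast(const struct state_bits *ps, int level)
{
	return (ps->cast_mask & (1u << level)) != 0;
}

/* After: each pair of parentheses has its own record. */
struct paren_level {
	int indent;
	bool maybe_cast;
};

struct state_struct {
	int nparen;
	struct paren_level paren[MAX_PAREN];
};
```

With the struct form, a debugger or debug log shows 'paren[1].maybe_cast = true' directly instead of a number such as 'cast_mask = 2' that has to be decoded first.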
 1.240  13-Feb-2022  rillig indent: change parser_state.cast_mask to 0-based indexing

Having 1-based indexing was completely unexpected, and it didn't match
the 0-based indexing of parser_state.paren_indents.

No functional change.
 1.239  28-Nov-2021  rillig indent: treat L"string" as a single token

There is never whitespace between the 'L' and the string literal or the
character constant. There might be a backslash-newline between them, but
that case was not handled before either.

No functional change.
 1.238  28-Nov-2021  rillig indent: clean up and document input handling

The transformation of moving comments from after an 'if (expr)' to after
the following brace has a large implementation cost (about 300 lines of
code) and makes input handling quite complicated. Document the overall
idea to save future readers some time.

No functional change.
 1.237  27-Nov-2021  rillig indent: accept a few formatting suggestions from indent

The remaining issues are still that the conditions look ambiguous even
with -eei, and that __attribute__ is broken into a separate line.

No functional change.
 1.236  27-Nov-2021  rillig indent: rename dump functions to output

No functional change.
 1.235  27-Nov-2021  rillig indent: inline switch_buffer

The function name was not accurate all the time. Now that
inp_from_comment is a separate function, it doesn't make sense anymore
to offload the 3 simple statements to a separate function.

No functional change.
 1.234  26-Nov-2021  rillig indent: add buf_add_range for adding characters to a buffer

No functional change.
 1.233  26-Nov-2021  rillig indent: move ind_add from io.c to indent.c

It's a general-purpose function that is not directly related to input or
output.
 1.232  25-Nov-2021  rillig indent: rename ps.in_function_parameters to match reality

This flag is only set while parsing the parameters of a function
definition, but not for a function declaration. See buffer_add in the
test fmt_decl.

No functional change.
 1.231  25-Nov-2021  rillig indent: rename ps.in_stmt to in_stmt_or_decl

The previous name didn't match reality.

No functional change.
 1.230  25-Nov-2021  rillig indent: rename ps.ind_stmt to in_stmt_cont

This makes a comment redundant.

No functional change.
 1.229  25-Nov-2021  rillig indent: clean up style

No functional change.
 1.228  19-Nov-2021  rillig indent: reduce casts to unsigned char for character classification

No functional change.
 1.227  19-Nov-2021  rillig indent: fix included headers
 1.226  19-Nov-2021  rillig indent: replace ps.procname with ps.is_function_definition

Only the first character of ps.procname was ever read, and it was only
compared to '\0'. Using a bool for this means simpler code, less
memory and fewer wasted CPU cycles due to the removed strncpy.

No functional change.
 1.225  19-Nov-2021  rillig indent: remove all references to inbuf from indent.c

No functional change.
 1.224  19-Nov-2021  rillig indent: move character input handling from indent.c to io.c

No functional change.
 1.223  19-Nov-2021  rillig indent: move character input from indent.c to io.c

No functional change.
 1.222  19-Nov-2021  rillig indent: replace direct access to the input buffer

This is a preparation for abstracting away all the low-level details of
handling the input. The goal is to fix the current bugs regarding line
number counting, out of bounds memory access, and generally unreadable
code.

No functional change.
 1.221  19-Nov-2021  rillig indent: add debug logging for input buffer handling
 1.220  19-Nov-2021  rillig indent: rename input buffer variables

From reading the names 'save_com' and 'sc_end', it was not obvious
enough that these two variables are the limits of the same buffer, the
names were just too unrelated.

No functional change.
 1.219  19-Nov-2021  rillig indent: group variables for input handling

No functional change.
 1.218  07-Nov-2021  rillig indent: fix handling of C99 comments after 'if (expr)'
 1.217  07-Nov-2021  rillig indent: demonstrate disappearing form feed
 1.216  07-Nov-2021  rillig indent: various cleanups

Make several comments more precise.

Rename process_end_of_file to process_eof to match the token name.

Change the order of assignments in analyze_comment to keep the com_ind
computations closer together.

In copy_comment_wrap, use pointer difference instead of pointer addition
to stay away from undefined behavior.

No functional change.
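
The general idea behind preferring pointer difference over pointer addition can be sketched as follows. The function name fits and its parameters are hypothetical; the actual code in copy_comment_wrap is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return whether 'n' more bytes fit between 'p' and 'end', where both
 * point into (or one past the end of) the same array and p <= end.
 *
 * Forming 'p + n' is already undefined if the result would point
 * further than one past the end of the array, even when the pointer is
 * never dereferenced.  The difference 'end - p' is always well-defined
 * under the preconditions above, so compare distances instead.
 */
static bool
fits(const char *p, const char *end, size_t n)
{
	return (size_t)(end - p) >= n;
}
```

The tempting alternative 'p + n <= end' computes the same answer on most platforms, but only by relying on behavior that the C standard does not define.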
 1.215  07-Nov-2021  rillig indent: rename ps.decl_nest to decl_level

This better matches the comment.

No functional change.
 1.214  07-Nov-2021  rillig indent: reduce negations in process_else, clean up comments

No functional change.
 1.213  07-Nov-2021  rillig indent: only access buffer data in the range [buf.s, buf.e)

No functional change.
 1.212  07-Nov-2021  rillig indent: rename type_at_paren_level_0 to type_outside_parentheses

For symmetry with type_in_parentheses.

No functional change.
 1.211  07-Nov-2021  rillig indent: distinguish between typename in parentheses and other words

This gets rid of two members of parser_state. No functional change for
well-formed programs. The sequence of '++int' or '--size_t' may be
formatted differently than before, but no program is expected to contain
that sequence.

Rename lsym_ident to lsym_word since 'ident' was too specific. This
token type is used for constants and string literals as well. Strictly
speaking, a string literal is not a word, but at least it's better than
before.
 1.210  07-Nov-2021  rillig indent: rename 'inbuf' functions to 'inp'

The variable 'inp' used to be named 'inbuf'. Make the function names
correspond to the variable name again.

No functional change.
 1.209  05-Nov-2021  rillig indent: rename process_keyword_do to process_do, same for 'else'

Before the symbols from the tokenizer had the prefix 'lsym', the symbols
could not be simply called 'else' and 'do'. The functions for processing
the tokens followed that naming scheme.

When the prefix 'lsym' was introduced, the word 'keyword' was no longer
needed, neither in the constants nor in the function names.

No functional change.
 1.208  05-Nov-2021  rillig indent: rename ps.curr_newline to next_col_1

For symmetry with ps.curr_col_1.

No functional change.
 1.207  04-Nov-2021  rillig indent: split process_comment_in_code into separate functions

No functional change.
 1.206  04-Nov-2021  rillig indent: fix joining of adjacent unary '+' operators
 1.205  03-Nov-2021  rillig indent: inline indentation_after, shorten function name to ind_add

There were only a few calls to indentation_after, so inlining it spares
the need to look at yet another function definition. Another effect is
that code.s and code.e appear in the code as a pair now, instead of a
single code.s, making the scope of the function call obvious.

In ind_add, there is no need to check for '\0' anymore since none of the
buffers can ever contain a null character, these are filtered out by
inbuf_read_line.

No functional change.
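
A function that computes the indentation after appending a range of characters, with the range given as a (start, end) pair, could be sketched like this. The name ind_add_sketch, the fixed tab width of 8 and the treatment of newlines are assumptions of this example, not the code from indent.

```c
#include <assert.h>
#include <stddef.h>

enum { TABSIZE = 8 };	/* assumed tab width */

/* Return the 0-based indentation of the next tab stop after 'ind'. */
static int
next_tab(int ind)
{
	return ind - ind % TABSIZE + TABSIZE;
}

/*
 * Return the indentation that results from starting at 'ind' and
 * appending the characters in the half-open range [s, e).
 *
 * There is no check for '\0' since the range is delimited by 'e'.
 */
static int
ind_add_sketch(int ind, const char *s, const char *e)
{
	for (const char *p = s; p < e; p++) {
		if (*p == '\n')
			ind = 0;		/* a new line starts at 0 */
		else if (*p == '\t')
			ind = next_tab(ind);
		else
			ind++;
	}
	return ind;
}
```

Passing both ends of the range at the call site makes it obvious which part of a buffer contributes to the computed indentation.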
 1.204  01-Nov-2021  rillig indent: fix missing blank after 'return' (since 2021-10-31)

In indent.c 1.200 from 2021-10-31, the subtypes of identifier tokens
were removed since they were redundant. An unintended side effect was
that a parenthesized expression after 'return' was no longer separated
by a blank.

Before that change, 'return' was tokenized as an lsym_ident with subtype
kw_other, and want_space_before_lparen handled this case in the last
line. After the change, 'return' was treated as an ordinary identifier,
and unless the option '-pcs' (blank after function call) was given, the
blank was removed.

The other keywords that had kw_other are not affected since they do not
expect a '(' afterwards. These keywords are 'break', 'continue', 'goto',
'inline' and 'restrict'.

Curiously, there was not a single test case that covered 'return(expr)'.

While here, remove the trailing ',' from the enum lexer_symbol, which is
not allowed in C90, where it is a GNU extension. Lint doesn't complain
about this since the default LINTFLAGS include '-g' for GCC mode.
 1.203  31-Oct-2021  rillig indent: clean up

Initialize buffers in reading order, make comments more expressive,
rename add_typename to register_typename, remove unused macro.

No functional change.
 1.202  31-Oct-2021  rillig indent: for '-pcs', add blank between function and '('

Before indent-2021.09.30.21.48.12, the blank had always been added, even
in '-npcs' mode. Since then, the blank had never been added.

Now, add the blank in '-pcs' mode and omit it in '-npcs' mode.
 1.201  31-Oct-2021  rillig indent: replace kw_tag with lsym_tag

This leaves only one special type of token, which is lsym_ident, which
in some cases represents a type name and in other cases an identifier,
constant or string literal.

No functional change.
 1.200  31-Oct-2021  rillig indent: replace simple cases of keyword_kind with lexer_symbol

The remaining keyword kinds 'tag' and 'type' require a bit more thought,
so do them in a separate step.

No functional change.
 1.199  31-Oct-2021  rillig indent: rename lsym_type to better reflect reality

Type names that occur in parentheses are parsed as lsym_ident having the
subtype kw_type instead.

No functional change.
 1.198  31-Oct-2021  rillig indent: add separate lexer symbol for offsetof

No functional change.
 1.197  31-Oct-2021  rillig indent: add separate lexer symbol for sizeof

The plan is to get rid of the type keyword_kind, which largely overlaps
with lexer_symbol.

No functional change.
 1.196  30-Oct-2021  rillig indent: push down variable comment_buffered

No functional change.
 1.195  30-Oct-2021  rillig indent: rename prev_newline and prev_col_1 to curr

These two flags describe the token that is currently processed.

In process_binary_op, curr_newline can never be true since newline is
not a binary operator, so remove that condition.

No functional change.
 1.194  30-Oct-2021  rillig indent: reorder assignments in switch_buffer

No functional change.
 1.193  30-Oct-2021  rillig indent: move buffer functions further up

No functional change.
 1.192  30-Oct-2021  rillig indent: group variables by topic

No functional change.
 1.191  30-Oct-2021  rillig indent: prevent buffer overflow in search_stmt_comment

printf '{ if (%010000d) /*comment*/ ; }' '0' | indent
 1.190  30-Oct-2021  rillig indent: add debug logging for save_com

This will help in finding the proper fix for the assertion failure in
search_stmt_comment.

Add an assertion in search_stmt_lbrace to prevent the previous,
incomplete fix from being applied again.
 1.189  30-Oct-2021  rillig indent: prevent buffer overflows in 'if (expr) ... stmt'
 1.188  30-Oct-2021  rillig indent: revert previous fix of assertion failure

The strange code with the out of bounds memory access is needed to
transform 'if (expr) /* comment */ {' to 'if (expr) { /* comment */',
that is, to move the comment to the right.

Add a test that prevents "repairing" this code again.
 1.187  30-Oct-2021  rillig indent: fix assertion failure in search_stmt_comment

I have no idea why the code was written in such a convoluted way before.
By removing all the code that didn't make sense, everything just works
as expected, and the existing tests all pass, especially those in
token_comment.c that mention search_stmt_comment.
 1.186  30-Oct-2021  rillig indent: replace tabsize with hardcoded 8 in process_comma

On 2018-07-25, FreeBSD added the option '-ts' to make the tabulator size
configurable, replacing several constants 7, 8, 9 with tabsize. The 8 in
the expression 'max_col - 8' was not related to the tabulator size but
instead represents the typical width of a variable name. Subtracting a
tab from the right margin doesn't make sense since the right margin need
not be aligned on a tabstop.

See the test fmt_decl.c, where the declaration 'struct s0 a,b;' is split
into several lines because the estimate for the variable name following
the comma is too high. There would have been plenty of space to the
right to keep the whole declaration in a single line.

No functional change.
 1.185  30-Oct-2021  rillig indent: don't risk a buffer overflow in code_add_decl_indent

The buffers have a safety margin of 5 characters, so the bounds check is
not strictly necessary. It makes the code more uniform though.

No functional change.
 1.184  30-Oct-2021  rillig indent: clean up code_add_decl_indent

In layout computations, it is helpful for human readers to list the
summands in logical order. In this case, the expression 'code_len +
base_ind' was rather confusing, so replace it with 'base_ind +
code_len'. This makes the code straight-forward enough that it doesn't
need any comments anymore.

No functional change.
 1.183  30-Oct-2021  rillig indent: remove confusing modulo from code_add_decl_indent

The only effects of the modulo operation were to make indent slower and
to confuse human readers.

During the computation of the indentation, the main focus is on the
difference between the current indentation, as computed from the base
indentation and the current code, and the target indentation. All these
computations take opt.tabsize into account. When looking only at the
difference, whether or not a multiple of opt.tabsize is added does not
matter.

No functional change.
 1.182  30-Oct-2021  rillig indent: inline bloated call to 'parse' during initialization

No functional change.
 1.181  30-Oct-2021  rillig indent: condense code for parsing command line arguments

Previously, the cascade of 'if' statements suggested that there were 6
different cases to be handled when in reality there are only 3: no
arguments, 1 argument, 2 arguments. Let the code express this directly.

No functional change.
 1.180  30-Oct-2021  rillig indent: extract main_load_profiles from main_parse_command_line

No functional change.
 1.179  29-Oct-2021  rillig indent: remove redundant comments, remove punctuation from debug log

The comment about 'null stmt' between braces probably meant 'no
statements between braces'.

The comments at psym_switch_expr only repeated what the code says or had
been outdated 29 years ago already since opt.case_indent does not have
to be 'one level down'.

In the debug log, the quotes around the symbol names are not necessary
after a ':'. The parse stack also does not need this much punctuation.

Reducing a do-while loop to nothing instead of a statement saves a few
CPU cycles. It works because after each lbrace, a stmt is pushed to the
parser stack. This stmt can only ever be reduced to a stmt_list but
never be removed.
 1.178  29-Oct-2021  rillig indent: fix missing blank before binary operator
 1.177  29-Oct-2021  rillig indent: merge isblank and is_hspace into ch_isblank

No functional change.
 1.176  29-Oct-2021  rillig indent: replace segmentation fault with assertion
 1.175  29-Oct-2021  rillig indent: initialize 'ps' via code

This saves 3 kB of binary size since the parser state is rather large
and only very few members are initialized to non-zero values.

No functional change.
 1.174  29-Oct-2021  rillig indent: clean up main_init_globals

No functional change.
 1.173  29-Oct-2021  rillig indent: fix undefined behavior in buffer handling

Adding an arbitrary integer to a pointer may result in an out of bounds
pointer, so replace the addition with a pointer subtraction.

In the buffer handling functions, handle 'buf' and 'l' before 's' and
'e', since they are pairs.

In inbuf_read_line, use 's' instead of 'buf' to make the code easier to
understand for human readers.

No functional change.
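
A minimal sketch of the idea behind the fix above, not the code from
indent.c: when checking whether 'n' more characters fit into a buffer,
comparing against a pointer difference avoids forming a pointer that
may lie far outside the array. The function name has_room and its
parameters are hypothetical; 'e' and 'l' are assumed to point into the
same array, with e <= l.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: 'e' is the end of the used part of a buffer,
 * 'l' is its limit.  Both point into the same array and e <= l.
 */
static bool
has_room(const char *e, const char *l, size_t n)
{
	/*
	 * Risky variant: 'e + n < l' computes 'e + n' first, which is
	 * undefined behavior if it ends up past the end of the array.
	 * The pointer subtraction below stays within the array.
	 */
	return n < (size_t)(l - e);
}
```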
 1.172  29-Oct-2021  rillig indent: mark obviously broken code
 1.171  29-Oct-2021  rillig indent: use prev/curr/next to refer to the current token

The word 'last' just didn't match with 'next'.

No functional change.
 1.170  29-Oct-2021  rillig indent: rename ps.dumped_decl_indent and indent_declaration

The word 'dump' in 'ps.dumped_decl_indent' was too close to dump_line,
which led to confusion since the variable controls whether the
indentation has been added to the code buffer, which happens way before
actually dumping the current line to the output file.

The function name 'indent_declaration' was too unspecific, it did not
reveal where the indentation of the declaration actually happened.

No functional change.
 1.169  29-Oct-2021  rillig indent: keep p_l_follow nonnegative, use consistent comparison

No functional change.
 1.168  29-Oct-2021  rillig indent: spell 'parentheses' properly in messages and comments
 1.167  28-Oct-2021  rillig indent: clean up indentation, comments, reduce

No functional change.
 1.166  28-Oct-2021  rillig indent: remove unused local variable in lexi

Since the previous commit, lexi is always called with the same argument,
so remove that parameter.

The previous commit broke the debug logging by not printing "transient
state" anymore. Replace this with "rolled back parser state" at the
caller's site.

No functional change.
 1.165  28-Oct-2021  rillig indent: reduce negations in search_stmt_lookahead

No functional change.
 1.164  28-Oct-2021  rillig indent: clean up comments and function names

Having accurate names for the lexer symbols and the parser symbols makes
most of the comments redundant. Remove these.

Rename process_decl to process_type, to match the name of the
corresponding lexer symbol. In this phase, it's just a single type
token, not a whole declaration.

No functional change.
 1.163  28-Oct-2021  rillig indent: make error messages for option parsing more precise
 1.162  26-Oct-2021  rillig indent: clean up process_comment

There is no undefined behavior since the compared characters are always
from the basic execution character set. All other cases are covered by
the condition above for now_len.

Fix debug logging for non-ASCII characters, previously a character was
output as \xffffffc3.
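
The commit message does not say what caused the \xffffffc3 output. A
common cause of this pattern is passing a plain char with a negative
value to a '%x' conversion, where it is sign-extended. The sketch below
assumes that cause; the function name format_byte is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a single byte as \xNN.  Converting to
 * unsigned char first prevents the sign extension that would turn the
 * byte 0xc3 into ffffffc3 on platforms where plain char is signed.
 */
static void
format_byte(char *dst, size_t size, char ch)
{
	snprintf(dst, size, "\\x%02x", (unsigned int)(unsigned char)ch);
}
```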
 1.161  26-Oct-2021  rillig indent: make ps.keyword easier to understand

Previously, ps.keyword did not have any documentation and was not
straight-forward. In some cases it was reset to kw_0, in others it was
set to an interesting value. The idea behind it was to remember the kind
of word of the previous token, to decide whether to have a space between
sizeof or offsetof and a following '('.

No functional change.
 1.160  26-Oct-2021  rillig indent: run indent on its own source code

With manual corrections afterwards, to compensate for the remaining bugs
in indent.

Without the type definitions in .indent.pro, the opening braces of the
functions kw_name and lexi_alnum would not be at the beginning of the
line.
 1.159  25-Oct-2021  rillig indent: improve debug logging

Output the various details in chronological order.
 1.158  25-Oct-2021  rillig indent: rename search_brace to search_stmt

No functional change.
 1.157  25-Oct-2021  rillig indent: rename local variable sp_sw to spaced_expr

The 'sp' probably meant 'space-enclosed'; no idea what 'sw' was meant to
mean. Maybe 'switch', but that would have been rather ambiguous when
talking about control flow statements.

No functional change.
 1.156  25-Oct-2021  rillig indent: split type token_type into 3 separate types

Previously, token_type was used for 3 different purposes:

1. symbol types from the lexer
2. symbol types on the parser stack
3. kind of control statement for 'if (expr)' and similar statements

Splitting the 41 constants into separate types makes it immediately
clear that the parser stack never handles comments, preprocessing lines,
newlines, form feeds, or the inner structure of expressions.

Previously, the constant switch_expr was especially confusing since it
was used for 3 different purposes: when returned from lexi, it
represented the keyword 'switch', in the parser stack it represented
'switch (expr)', and it was used for a statement head as well.

The only overlap between the lexer symbols and the parser symbols are
'{' and '}', and the keywords 'do' and 'else'. To increase confusion,
the constants of the previous token_type were in apparently random
order and before 2021, they had cryptic, highly abbreviated names.

No functional change.
 1.155  24-Oct-2021  rillig indent: rename form_feed to tt_lex_form_feed

No functional change.
 1.154  24-Oct-2021  rillig indent: split kw_for_or_if_or_while into separate constants

No functional change.
 1.153  24-Oct-2021  rillig indent: split kw_do_or_else into separate constants

It was unnecessarily confusing to have the token types keyword_do_else,
keyword_do and keyword_else at the same time, without any hint in what
they differed.

Some of the token types seem to be used by the lexer while others are
used in the parse stack. Maybe all token types can be partitioned into
these groups, which would suggest to use two different types for them.
And if not, it's still clearer to have this distinction in the names of
the constants.

No functional change.
 1.152  24-Oct-2021  rillig indent: rename seen_quest to quest_level

The new name aligns with other similar variables like ind_level,
case_ind_level and ifdef_level. The old name 'seen' is mainly used for
bool variables.

No functional change.
 1.151  24-Oct-2021  rillig indent: fix indentation of ad-hoc tagged variables

Seen among others in usr.bin/indent/lexi.c, variable 'keywords'.
 1.150  24-Oct-2021  rillig indent: initialize variables in main_loop in declaration

No functional change.
 1.149  24-Oct-2021  rillig indent: run indent on its own source code

With manual corrections afterwards. Indent still does not get
extra_expr_indent correctly, it also indents global variables after
tagged declarations too deep.

No functional change.
 1.148  24-Oct-2021  rillig indent: clean up format of warnings and errors

Previously, warnings and errors had the form of C block comments. Before
NetBSD io.c 1.20 from 2019-10-19, this format made sense because the
diagnostics could end up in the same output stream as the formatted
output.

Since NetBSD io.c 1.20 from 2019-10-19, all diagnostics are redirected
to stderr. This change was not mentioned in the commit message back
then, it makes sense nevertheless. Since stdout and stderr now are
properly separated, there is no need anymore to keep the weird format
for warnings and errors. Switch to the standard 'error: file:line'
format.

Move the function 'diag' to indent.c to have access to the name of the
current input file.
 1.147  24-Oct-2021  rillig indent: fix line number counting at beginning of function body
 1.146  24-Oct-2021  rillig indent: rename nitems to array_length
 1.145  24-Oct-2021  rillig indent: replace global variable use_ff with function parameter
 1.144  20-Oct-2021  rillig indent: rename ps.last_u_d to match its comment

No functional change.
 1.143  20-Oct-2021  rillig indent: rename parser stack variables

No functional change.
 1.142  20-Oct-2021  rillig indent: rename blankline_requested variables

The words 'prefix' and 'postfix' sounded too much like horizontal
concepts, like in operators. The actual purpose of these variables is to
add blank lines before and after the current line, so use the same
wording as in the command line options.

No functional change.
 1.141  20-Oct-2021  rillig indent: invert condition in process_newline

It's hard to follow a condition that combines many negated terms with
'||'. Group the conditions by their origin.

The condition '!opt.break_after_comma && break_comma' still sounds like
a contradiction, more investigations to follow.

No functional change.
 1.140  20-Oct-2021  rillig indent: rename next_blank_lines to blank_lines_to_output

The previous name was already an improvement over the name before that
(n_real_blanklines), but didn't express the intended purpose clearly
enough, so try another name.

No functional change.
 1.139  19-Oct-2021  rillig indent: if a file ends with indent off, don't add space-newline
 1.138  17-Oct-2021  rillig indent: parse int command line options strictly

On i386 and other platforms where LONG_MAX == INT_MAX, the test
t_errors/option_tabsize_very_large failed since the behavior on integer
overflow differs between ILP32 and LP64 platforms. Noticed by gson@.

Avoid this unintended difference by adding reasonable limits for each of
the integer options and by replacing atoi with strtol.
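
As a rough illustration of strict parsing with strtol and per-option
limits, here is a sketch that is not the code from args.c. The function
name parse_int_option and its parameters are hypothetical. Since the
limits fit in an int, the result is the same on ILP32 and LP64
platforms.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal option argument and accept it
 * only if it is a complete number in the range [min, max].
 */
static bool
parse_int_option(const char *arg, int min, int max, int *out)
{
	char *end;
	long num;

	errno = 0;
	num = strtol(arg, &end, 10);
	if (end == arg || *end != '\0')
		return false;	/* no digits or trailing garbage */
	if (errno == ERANGE || num < min || num > max)
		return false;	/* out of range, regardless of sizeof(long) */
	*out = (int)num;
	return true;
}
```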
 1.137  09-Oct-2021  rillig indent: condense code for calculating indentations

No functional change.
 1.136  09-Oct-2021  rillig indent: extract common code for advancing a single tab

No functional change.
 1.135  08-Oct-2021  rillig indent: clean up comments, parentheses, debug messages, boolean operator

No functional change.
 1.134  08-Oct-2021  rillig indent: rename in_or_st to init_or_struct

This makes a few comments redundant.

No functional change.
 1.133  08-Oct-2021  rillig indent: rename fill_buffer to inbuf_read_line

No functional change.
 1.132  08-Oct-2021  rillig indent: clean up process_decl, replace unnecessary strlen

No functional change.
 1.131  08-Oct-2021  rillig indent: remove unnecessary forward declarations

No functional change.
 1.130  08-Oct-2021  rillig indent: reduce negations in main_loop

No functional change.
 1.129  08-Oct-2021  rillig indent: fix parsing of preprocessor lines with comments and strings
 1.128  08-Oct-2021  rillig indent: run indent on indent.h

The formatting looks mostly OK.

Some struct members had excessively long names, leaving no space for
their corresponding comments. Renamed some of them using well-known
abbreviations.

The formatting for debug_vis_range is messed up, no idea why. It is
clearly a function declaration, not a function definition, so there is
no need to place the function name in column 1.

No functional change.
 1.127  08-Oct-2021  rillig indent: split process_keyword_do_else into separate functions

No functional change.
 1.126  08-Oct-2021  rillig indent: rename tokens lparen and rparen to be more precise

No functional change.
 1.125  07-Oct-2021  rillig indent: rename bp_save to saved_inp_s, be_save to saved_inp_e

Using the same naming convention makes it easier to relate the
variables.

No functional change.
 1.124  07-Oct-2021  rillig indent: group variables for the input buffer

The input buffer follows the same concept as the intermediate buffers
for label, code, comment and token, so use the same type for it.

No functional change.
 1.123  07-Oct-2021  rillig indent: move definition of bufsize from header to implementation

No functional change.
 1.122  07-Oct-2021  rillig indent: rename opt.btype_2 to brace_same_line

No functional change.
 1.121  07-Oct-2021  rillig indent: clean up code, remove outdated wrong comments

No functional change.
 1.120  07-Oct-2021  rillig indent: use braces around multi-line statements

No functional change.
 1.119  07-Oct-2021  rillig indent: let the code breathe a bit by inserting empty lines

No functional change.
 1.118  07-Oct-2021  rillig indent: clean up comments

No functional change.
 1.117  07-Oct-2021  rillig indent: fix wrong or outdated comments

No functional change.
 1.116  07-Oct-2021  rillig indent: remove redundant comments

No functional change.
 1.115  07-Oct-2021  rillig indent: reduce indentation

No functional change.
 1.114  07-Oct-2021  rillig indent: remove global variable option_source

It is only needed at startup, while parsing the options. The string "?"
was not needed at all.

No functional change.
 1.113  07-Oct-2021  rillig indent: clean up colon handling

No functional change.
 1.112  07-Oct-2021  rillig indent: add high-level API for working with buffers

This makes the code more boring to read, which is actually good. Less
fiddling with memcpy and pointer arithmetic.

Since indent is not a high-performance tool used for bulk operations on
terabytes of source code, there is no need to squeeze out every possible
CPU cycle.

No functional change.
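
To give an impression of what such a high-level buffer API can look
like, here is a minimal sketch. It is not the API from indent.c; the
struct layout and the names buf_reserve, buf_add_char and buf_add_range
are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; initialize with all members zero. */
struct buffer {
	char *mem;
	size_t len;
	size_t cap;
};

/* Ensure that at least 'add' more bytes fit into the buffer. */
static void
buf_reserve(struct buffer *buf, size_t add)
{
	size_t cap;
	char *mem;

	if (add <= buf->cap - buf->len)
		return;
	cap = buf->cap == 0 ? 16 : buf->cap;
	while (cap - buf->len < add)
		cap *= 2;
	mem = realloc(buf->mem, cap);
	if (mem == NULL)
		abort();
	buf->mem = mem;
	buf->cap = cap;
}

static void
buf_add_char(struct buffer *buf, char ch)
{
	buf_reserve(buf, 1);
	buf->mem[buf->len++] = ch;
}

/* Append the characters from 's' up to, but excluding, 'e'. */
static void
buf_add_range(struct buffer *buf, const char *s, const char *e)
{
	size_t n = (size_t)(e - s);

	if (n == 0)
		return;
	buf_reserve(buf, n);
	memcpy(buf->mem + buf->len, s, n);
	buf->len += n;
}
```

Callers then only state what to append, while the bounds checks and the
memcpy live in one place.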
 1.111  07-Oct-2021  rillig indent: rename copy_id to copy_token

No functional change.
 1.110  07-Oct-2021  rillig indent: raise WARNS from the default 5 up to 6
 1.109  07-Oct-2021  rillig indent: prevent division by zero
 1.108  05-Oct-2021  rillig indent: rename n_real_blanklines

The word 'n' was not as helpful as possible, the word 'real' did not
give any clue at all about the variable's purpose.

No functional change.
 1.107  05-Oct-2021  rillig indent: fix off-by-one error for indented first line
 1.106  05-Oct-2021  rillig indent: make off-by-one error in main_prepare_parsing more visible

No functional change.
 1.105  05-Oct-2021  rillig indent: make variable names more expressive

The abbreviation 'dec' looked too much like 'decimal' instead of the
intended 'declaration'.

No functional change.
 1.104  05-Oct-2021  rillig indent: remove variable name prefix 'inout_'

This makes the variable names more readable. The prefix is not actually
needed to understand the code, it is rather distracting.

The compiler and lint will guard against any accidental mismatch between
pointer, integer and bool.

No functional change.
 1.103  05-Oct-2021  rillig indent: fix Clang-Tidy warnings, clean up bakcopy

The comment above and inside bakcopy had been outdated for at least the
last 28 years, the backup file is named "%s.BAK", not ".B%s".

Prevent buffer overflow for very long filenames (sprintf -> snprintf).
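
A sketch of the sprintf to snprintf change, assuming a fixed-size
destination buffer. The function name make_backup_name is hypothetical;
only the "%s.BAK" pattern is taken from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the backup file name.  Unlike sprintf,
 * snprintf never writes more than 'size' bytes, and its return value
 * tells whether the name had to be truncated.
 */
static bool
make_backup_name(char *dst, size_t size, const char *name)
{
	int n = snprintf(dst, size, "%s.BAK", name);

	return n >= 0 && (size_t)n < size;
}
```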
 1.102  05-Oct-2021  rillig indent: fix spelling in comments
 1.101  05-Oct-2021  rillig indent: merge duplicate code into is_hspace

No functional change.
 1.100  05-Oct-2021  rillig indent: clean up code for appending to buffers

Use *e++ for appending and e[-1] for testing the previously appended
character, like in other places in the code.

No functional change.
 1.99  05-Oct-2021  rillig indent: merge duplicate code for reading from input buffer

No functional change.
 1.98  03-Oct-2021  rillig indent: rename functions

There was no good reason for using the different verbs 'scan' and 'set'
for two functions that essentially do the same.

No functional change.
 1.97  03-Oct-2021  rillig indent: fix content of profile_name

Previously, profile_name included the leading "-P", which was confusing.
 1.96  30-Sep-2021  rillig indent: remove space between ')' and '(' in declarations
 1.95  30-Sep-2021  rillig indent: untangle want_blank_before_lparen

No functional change.
 1.94  30-Sep-2021  rillig indent: extract want_blank_before_lparen

No functional change.
 1.93  30-Sep-2021  rillig indent: add space between ',' and '[' in C99 initializations
 1.92  27-Sep-2021  rillig indent: let indent format the comments after previous refactoring

Before this refactoring, I had skipped this section of the code from
formatting since the 'default:' branch was enclosed in a block of its
own, and that block would have been indented one more level to the
right. Extracting that code into a separate function got rid of the
extra braces.

No functional change.
 1.91  27-Sep-2021  rillig indent: split search_brace into smaller functions

No functional change.
 1.90  27-Sep-2021  rillig indent: use binary instead of linear search when adding types

No functional change.
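
The following sketch shows the general technique of keeping a sorted
array and finding the insertion position by binary search. It is not
the code from lexi.c; the fixed-size array and the names find_slot and
register_name are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const char *names[64];	/* sorted by strcmp */
static size_t names_len;

/* Return the index of 'name', or the index where it belongs. */
static size_t
find_slot(const char *name, bool *found)
{
	size_t lo = 0, hi = names_len;

	*found = false;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(names[mid], name);

		if (cmp == 0) {
			*found = true;
			return mid;
		}
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Insert 'name' unless it is already present or the array is full. */
static bool
register_name(const char *name)
{
	bool found;
	size_t pos = find_slot(name, &found);

	if (found || names_len == sizeof names / sizeof names[0])
		return false;
	memmove(&names[pos + 1], &names[pos],
	    (names_len - pos) * sizeof names[0]);
	names[pos] = name;
	names_len++;
	return true;
}
```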
 1.89  27-Sep-2021  rillig indent: rename rwcode to keyword_kind, various cleanup

No idea what the 'rw' in 'rwcode' meant, it had been imported that way
28 years ago. Since rwcode specifies the kind of a keyword, the prefix
'kw_' makes sense.

No functional change.
 1.88  26-Sep-2021  rillig indent: unexport global variables

The variable match_state was write-only and was thus removed.

No functional change.
 1.87  26-Sep-2021  rillig indent: negate and rename option.leave_comma

The old name did not mirror the description in the manual page, and it
was the only option that is negated. Inverting it allows the options
table to be compressed.
 1.86  26-Sep-2021  rillig indent: let indent format its own code -- in supervised mode

After running indent on the code, I manually selected each change that
now looks better than before. The remaining changes are left for later.
All in all, indent did a pretty good job, except for syntactic additions
from after 1990, but that was to be expected. Examples for such
additions are GCC's __attribute__ and C99 designated initializers.

Indent has only few knobs to tune the indentation. The knob for the
continuation indentation applies to function declarations as well as to
expressions. The knob for indentation of local variable declarations
applies to struct members as well, even if these are members of a
top-level struct.

Several code comments crossed the right margin in column 78. Several
other code comments were correctly broken though. The cause for this
difference was not obvious.

No functional change.
 1.85  26-Sep-2021  rillig indent: fix missing space between comma and ellipsis

According to lint's C grammar, in standard C an ellipsis only occurs
after a comma. There are GCC extensions that allow an ellipsis as the
only function parameter, as well as in 'case a ... b', but these are
rare.
 1.84  25-Sep-2021  rillig indent: misc cleanup

No functional change.
 1.83  25-Sep-2021  rillig indent: convert found_err to bool

That variable had slipped through the migration since it consistently
used int for the declaration, the definition and all assignments.

No functional change.
 1.82  25-Sep-2021  rillig indent: use strlen instead of own implementation

The two loops look similar but differ in a crucial detail that makes up
for a '+ 1'.

No functional change.
 1.81  25-Sep-2021  rillig indent: merge duplicate code for token buffers

No functional change.
 1.80  25-Sep-2021  rillig indent: clean up argument handling

No functional change.
 1.79  25-Sep-2021  rillig indent: un-abbreviate a few parser_state members, clean up comments

No functional change.
 1.78  25-Sep-2021  rillig indent: remove dead code for printing comments after empty lines

This code has been commented out for at least 29 years.

No functional change.
 1.77  25-Sep-2021  rillig indent: reduce code and data size for lexing of numbers

Instead of having a table of strings (121 pointers + 121 data
relocations), reduce that table to the actual character data and use a
secondary table for looking up the correct row in the main table.

No functional change.
 1.76  25-Sep-2021  rillig indent: rename option variable to be more expressive

No functional change.
 1.75  25-Sep-2021  rillig indent: convert remaining ibool to bool

No functional change intended.
 1.74  25-Sep-2021  rillig indent: convert parser_state from ibool to bool

indent.c:400:5: error: suggest parentheses around assignment used as
truth value
io.c:271:32: error: ‘~’ on a boolean expression

No functional change intended.
 1.73  25-Sep-2021  rillig indent: prepare for lint's strict bool mode

Before C99, C had no boolean type. Instead, indent used int for that,
just like many other programs. Even with C99, bool and int can be used
interchangeably in many situations, such as querying '!i' or '!ptr' or
'cond == 0'.

Since January 2021, lint provides the strict bool mode, which makes bool
a non-arithmetic type that is incompatible with any other type. Having
clearly separate types helps in understanding the code.

To migrate indent to strict bool mode, the first step is to apply all
changes that keep the resulting binary the same. Since sizeof(bool) is
1 and sizeof(int) is 4, the type ibool serves as an intermediate type.
For now it is defined to int, later it will become bool.

The current code compiles cleanly in C99 and C11 mode, as well as in
lint's strict bool mode. There are a few tricky places:

In args.c in 'struct pro', there are two types of options: boolean and
integer. Boolean options point to a bool variable, integer options
point to an int variable. To keep the current structure of the code,
the pointer has been changed to 'void *'. To ensure type safety, the
definition of the options is done via preprocessor magic, which in C11
mode ensures the correct pointer types. (Add CFLAGS+=-std=gnu11 at the
very bottom of the Makefile.)

In indent.c in process_preprocessing, a boolean variable is
post-incremented. That variable is only assigned to another variable,
and that variable is only used in a boolean context. To provoke a
different behavior between the '++' and the '= true', the source code
to be indented would need 1 << 32 preprocessing directives, which is
unlikely to happen in practice.

In io.c in dump_line, the variables ps.in_stmt and ps.in_decl only ever
get the values 0 and 1. For these values, the expressions 'a & ~b' and
'a && !b' are equivalent, in all versions of C. The compiler may
generate different code for them, though.

In io.c in parse_indent_comment, the assignment to inhibit_formatting
takes place in integer context. If the compiler is smart enough to
detect the possible values of on_off, it may generate the same code
before and after the change, but that is rather unlikely.

The second step of the migration will be to replace ibool with bool,
step by step, just in case there are any hidden gotchas in the code,
such as sizeof or pointer casts.

No change to the resulting binary.
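
The remark about dump_line can be checked with a small sketch. The
function names old_style and new_style are hypothetical; they only
model the two expressions, not the actual code in io.c.

```c
#include <stdbool.h>

/* The expression as written for int flags. */
static int
old_style(int a, int b)
{
	return a & ~b;
}

/* The equivalent expression for bool flags. */
static bool
new_style(bool a, bool b)
{
	return a && !b;
}
```

For the values 0 and 1, both functions agree in all four combinations.
They differ for other int values: with a = 2 and b = 1, 'a & ~b' is 2,
which is nonzero, while 'a && !b' is false. That is why the equivalence
depends on the flags only ever being 0 or 1.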
 1.72  25-Sep-2021  rillig indent: merge duplicate code for initializing buffers

No functional change.
 1.71  25-Sep-2021  rillig indent: clean up initialization of options

The default values in 'struct pro' were redundant but all consistent,
even with the commented defaults in main_parse_command_line.

No functional change.
 1.70  25-Sep-2021  rillig indent: remove ifdef for lint

NetBSD lint does not need them anymore, FreeBSD does not have lint.
 1.69  25-Sep-2021  rillig indent: move statistical values into a separate struct

No functional change.
 1.68  25-Sep-2021  rillig indent: add nonnull memory allocation functions

The only functional change is a single error message.
 1.67  25-Sep-2021  rillig indent: group global variables for token buffer

No functional change.
 1.66  25-Sep-2021  rillig indent: inline macro 'token'

No functional change.
 1.65  25-Sep-2021  rillig indent: group global variables for code buffer

No functional change.
 1.64  25-Sep-2021  rillig indent: rename variables of type token_type

The previous variable name 'code' conflicts with the buffer of the same
name.

No functional change.
 1.63  24-Sep-2021  rillig indent: group global variables for label buffer into struct

No functional change.
 1.62  24-Sep-2021  rillig indent: group global variables for the comment buffer

No functional change.
 1.61  25-Aug-2021  rillig indent: fix lint warnings about type conversions on ilp32

No functional change.
 1.60  26-Mar-2021  rillig indent: fix Clang build everywhere but on amd64

No idea why Clang didn't complain about this on amd64, only on all other
platforms.
 1.59  14-Mar-2021  rillig indent: fix lint warnings

No functional change.
 1.58  13-Mar-2021  rillig indent: add debug logging for switching the input buffer

No functional change outside debug mode.
 1.57  13-Mar-2021  rillig indent: distinguish between 'column' and 'indentation'

column == 1 + indentation.

In addition, indentation is a relative distance while column is an
absolute position. Therefore, don't confuse these two concepts, to
prevent off-by-one errors.

No functional change.
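
To illustrate why mixing the two concepts invites off-by-one errors,
here is a sketch that computes the next tab stop once in terms of the
0-based indentation and once in terms of the 1-based column. The
function names are hypothetical and not taken from indent.c.

```c
/* Next tab stop, expressed as a 0-based indentation. */
static int
next_tab_ind(int ind, int tabsize)
{
	return (ind / tabsize + 1) * tabsize;
}

/*
 * The same computation on 1-based columns needs a '- 1' on the way in
 * and a '+ 1' on the way out; forgetting either one shifts the result.
 */
static int
next_tab_col(int col, int tabsize)
{
	return 1 + ((col - 1) / tabsize + 1) * tabsize;
}
```

With a tab size of 8, indentation 5 advances to indentation 8, and the
corresponding column 6 advances to column 9, which again satisfies
column == 1 + indentation.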
 1.56  13-Mar-2021  rillig indent: rename pr_comment to process_comment, clean up documentation

No functional change.
 1.55  13-Mar-2021  rillig indent: fix handling of '/*' in string literal in preprocessing line

Previously, the '/*' in the string literal had been interpreted as the
beginning of a comment, which was wrong. Because of that, the variable
declaration in the following line was still interpreted as part of the
comment. The comment even continued until the end of the file.

Due to indent's forgiving nature, it neither complained nor even
mentioned that anything had gone wrong. The decision of rather
producing wrong output than failing early is a dangerous one.

At least, there should have been an error message that at the end of the
file, the parser was still in a comment, expecting the closing '*/'.
 1.54  13-Mar-2021  rillig indent: split 'main_loop' into several functions

No functional change.
 1.53  13-Mar-2021  rillig indent: split 'main' into manageable parts

For several years (maybe even decades), compilers have known how to inline
static functions that are only used once. Therefore there is no need to
have overly long functions anymore, especially not 'main', which is only
called a single time and thus does not add any noticeable performance
degradation.

No functional change.
 1.52  13-Mar-2021  rillig indent: remove redundant parentheses

No functional change.
 1.51  13-Mar-2021  rillig indent: fix confusing variable names

The word 'col' should only be used for the 1-based column number. This
name is completely inappropriate for a line length since that provokes
off-by-one errors. The name 'cols' would be acceptable although
confusing since it sounds so similar to 'col'.

Therefore, rename variables that are related to the maximum line length
to 'line_length' since that makes for obvious code and nicely relates to
the description of the option in the manual page.

No functional change.
 1.50  13-Mar-2021  rillig indent: inline calls to count_spaces and count_spaces_until

These two functions operated on column numbers instead of indentation,
which required adjustments of '+ 1' and '- 1'. Their names were
completely wrong since these functions did not count anything, instead
they computed the column.

No functional change.
 1.49  13-Mar-2021  rillig indent: replace compute_code_column with compute_code_indent

The goal is to only ever be concerned about the _indentation_ of a
token, never the _column_ it appears in. Having only one of these
avoids off-by-one errors.

No functional change.
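
As an illustration of the distinction, here is a minimal sketch that works
only with the 0-based indentation and derives the 1-based column in a
single place. The function names and the fixed tab stops are assumptions
made for this example, not indent's actual code.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: 0-based indentation reached after the first 'len'
 * characters of 's', with tab stops every 'tabsize' positions.
 */
static int
indent_after(const char *s, size_t len, int tabsize)
{
	int ind = 0;

	for (size_t i = 0; i < len; i++) {
		if (s[i] == '\t')
			ind = (ind / tabsize + 1) * tabsize;
		else
			ind++;
	}
	return ind;
}

/* The 1-based column is derived from the indentation in one place only. */
static int
column_from_indent(int ind)
{
	return ind + 1;
}
```

Because the tab-stop arithmetic is done on the 0-based value, no '+ 1' or
'- 1' adjustment is needed inside the loop.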
 1.48  13-Mar-2021  rillig indent: add debug logging for actually writing to the output file

Together with the results of the tokenizer and the 4 buffers for token,
label, code and comment, the debug log now provides a good high-level
view of how the indentation happens and where to look for the many
remaining bugs.
 1.47  13-Mar-2021  rillig indent: replace pad_output with output_indent

Calculating the indentation is simpler than calculating the column,
since that saves the constant addition and subtraction of the 1.

No functional change.
 1.46  12-Mar-2021  rillig indent: replace 'target' with 'indent' in function names

The word 'target' was not as specific as possible.

No functional change.
 1.45  12-Mar-2021  rillig indent: use consistent indentation for 'else'

Half of the code used -ce, the other half the opposite -nce.

No functional change.
 1.44  12-Mar-2021  rillig indent: manually fix indentation

No functional change.
 1.43  11-Mar-2021  rillig indent: reduce indentation of check_size functions

No functional change.
 1.42  11-Mar-2021  rillig indent: remove redundant cast after allocation functions

No functional change.
 1.41  09-Mar-2021  rillig indent: extract search_brace from main

No functional change.
 1.40  09-Mar-2021  rillig indent: extract capsicum code out of the main function

No functional change.
 1.39  09-Mar-2021  rillig indent: rename a few more token types

The previous names were either too short or ambiguous.

No functional change.
 1.38  09-Mar-2021  rillig indent: make token names more precise

The previous 'casestmt' was wrong since a case label is not a statement
at all.

The previous 'swstmt' was overly short, and wrong as well, since it
represents only the 'switch (expr)' part, which is not a complete switch
statement. Same for 'ifstmt', 'whilestmt', 'forstmt'.

The previous word 'head' was not precise enough since it didn't specify
exactly where the head ends and the body starts. Especially for
handling the dangling else, this distinction is important.

No functional change.
 1.37  09-Mar-2021  rillig indent: rename a few tokens to be more obvious

For casual readers it is not obvious whether the 'sp' meant 'special' or
'space' or something entirely different.
 1.36  09-Mar-2021  rillig indent: manually indent comments

It's strange that indent's own code is not formatted by indent itself,
which would be a good demonstration of its capabilities.

In its current state, I don't trust indent to get even the tokenization
correct, therefore the only safe way is to format the code manually.
 1.35  08-Mar-2021  rillig indent: inline macro for backslash

No functional change.
 1.34  08-Mar-2021  rillig indent: convert big macros to functions

Each of these buffers is only modified in a single file. This makes it
unnecessary to declare the macros in the global header.
 1.33  08-Mar-2021  rillig indent: fix printing of uninitialized 'token' in debug output
 1.32  07-Mar-2021  rillig indent: sprinkle a few const

No functional change.
 1.31  07-Mar-2021  rillig indent: use named constants for the different types of keywords

This reduces the magic numbers in the code. Most of these had their
designated constant name written in a nearby comment anyway.

The one instance where arithmetic was performed on this new enum type
(in indent.c) was a bit tricky to understand.

The combination rw_continue_or_inline_or_restrict looks strange, the
'continue' should intuitively belong to the other control flow keywords
in rw_break_or_goto_or_return.

No functional change.
 1.30  07-Mar-2021  rillig indent: for the token types, use enum instead of #define

This makes it easier to step through the code in a debugger.

No functional change.
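
A minimal sketch of the idea, using made-up token names rather than the
ones in indent: with an enum, a debugger can show the constant's name
instead of a bare integer, and a parallel name table is enough for debug
logging.

```c
/* Hypothetical token types; the real names in indent differ. */
typedef enum token_type {
	tok_end_of_file,
	tok_newline,
	tok_lparen,
	tok_rparen,
	tok_comment
} token_type;

/* Names in the same order as the enum constants above. */
static const char *const token_type_names[] = {
	"end_of_file", "newline", "lparen", "rparen", "comment"
};

/* Return a printable name for the debug log. */
static const char *
token_type_name(token_type t)
{
	return token_type_names[t];
}
```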
 1.29  07-Mar-2021  rillig indent: use all headers in all files

This is a prerequisite for converting the token types to an enum instead
of a preprocessor define, since the return type of lexi will become
token_type. Having the enum will make debugging easier.

There was a single naming collision, which forced the variable in
scan_profile to be renamed. All other token names are used nowhere
else.

No change to the resulting binary.
 1.28  06-Mar-2021  rillig indent: fix space-tab alignment in indent's own code

These parts are not fixed automatically by indent since they are in box
comments.

No functional change.
 1.27  23-Apr-2020  joerg Avoid common symbol declarations
 1.26  19-Oct-2019  christos use stdarg, annotate function as __printflike and fix broken formats.
 1.25  04-Apr-2019  kamil Upgrade indent(1)

Merge all the changes from the recent FreeBSD HEAD snapshot
into our local copy.

FreeBSD actively maintains this program in their sources and their
repository contains over 100 commits with changes.

Keep the delta between the FreeBSD and NetBSD versions to an absolute
minimum, mostly RCS Id and compatibility fixes.

Major changes in this import:

- Added an option -ldi<N> to control indentation of local variable names.
- Added option -P for loading user-provided files as profiles
- Added -tsn for setting tabsize
- Renamed -nsac/-sac ("space after cast") to -ncs/-cs
- Added option -fbs, which enables (disables) splitting the function declaration and opening brace across two lines.
- Respect SIMPLE_BACKUP_SUFFIX environment variable in indent(1)
- Group global option variables into an options structure
- Use bsearch() for looking up type keywords.
- Don't produce unneeded space character in function declarators
- Don't unnecessarily add a blank before a comment ends.
- Don't ignore newlines after comments that follow braces.
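
The bsearch() lookup mentioned in the list can be sketched as follows. The
keyword table and function names are assumptions for illustration and do
not reproduce the imported code; the only hard requirement is that the
table stays sorted in strcmp() order.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical keyword table; it must stay sorted for bsearch(). */
static const char *const type_keywords[] = {
	"char", "double", "float", "int", "long", "short", "unsigned", "void"
};

/* Compare the search key (a string) with one table element. */
static int
compare_keyword(const void *key, const void *elem)
{
	return strcmp((const char *)key, *(const char *const *)elem);
}

/* Return 1 if 'word' is in the table, 0 otherwise. */
static int
is_type_keyword(const char *word)
{
	return bsearch(word, type_keywords,
	    sizeof(type_keywords) / sizeof(type_keywords[0]),
	    sizeof(type_keywords[0]), compare_keyword) != NULL;
}
```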

Merge the FreeBSD indent(1) tests with our ATF framework.
All tests pass.

Upgrade prepared by Manikishan Ghantasala.
Final polishing by myself.
 1.24  03-Feb-2019  mrg - add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily
 1.23  05-Sep-2016  sevan branches: 1.23.14;
Drop main() prototype.
 1.22  25-Feb-2016  ginsbach Fix obvious contraction spelling mistakes by adding missing apostrophes.
 1.21  22-Feb-2016  ginsbach Use warnx(3).
 1.20  22-Feb-2016  ginsbach Use errx(3).
 1.19  04-Sep-2014  mrg port the -ut / -nut options from freebsd. -ut (default) enables tabs
in output, the -nut uses spaces.
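
The difference between the two modes can be pictured with the following
sketch, which builds the whitespace for a given indentation either with
tabs followed by spaces or with spaces only. The function name, its
parameters and the buffer handling are hypothetical and are not the ported
code.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: fill 'buf' (capacity 'cap', including the
 * terminating NUL) with whitespace that advances from indentation 0 to
 * 'ind'. Returns the number of characters written, or -1 if 'buf' is too
 * small.
 */
static int
make_indent(char *buf, size_t cap, int ind, int use_tabs, int tabsize)
{
	size_t n = 0;
	int pos = 0;

	if (cap == 0)
		return -1;
	if (use_tabs) {
		/* Emit a tab for every full tab stop that fits. */
		while (pos + tabsize <= ind) {
			if (n + 1 >= cap)
				return -1;
			buf[n++] = '\t';
			pos += tabsize;
		}
	}
	/* Fill the remainder (or everything, without tabs) with spaces. */
	while (pos < ind) {
		if (n + 1 >= cap)
			return -1;
		buf[n++] = ' ';
		pos++;
	}
	buf[n] = '\0';
	return (int)n;
}
```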
 1.18  12-Apr-2009  lukem branches: 1.18.24;
Fix WARNS=4 issues (-Wshadow -Wcast-qual -Wsign-compare)
 1.17  21-Jul-2008  lukem branches: 1.17.6;
Remove the \n and tabs from the __COPYRIGHT() strings.
Tweak to use a consistent format.
 1.16  30-Oct-2004  dsl branches: 1.16.28;
Add (unsigned char) cast to ctype functions
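
The reason for such a cast is that the <ctype.h> functions take an int
whose value must be representable as unsigned char or equal EOF; passing a
negative plain char is undefined behavior. A minimal sketch with a
hypothetical wrapper name:

```c
#include <ctype.h>

/* Hypothetical wrapper: safe for any char value, including negative ones. */
static int
is_ident_char(char ch)
{
	return isalnum((unsigned char)ch) || ch == '_';
}
```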
 1.15  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.14  19-Jun-2003  christos PR/21645: Mishka: Localized comments don't work with indent.
 1.13  26-May-2002  wiz Remove #ifndef'd __STDC__ code. ANSIfy.
 1.12  20-Aug-2001  wiz precede, not preceed.
 1.11  16-Jun-2001  kleink Handle a labeled statement at the beginning of a function correctly;
from Nagae Hidetake <nagae@tk.airnet.ne.jp> in PR bin/12781.
 1.10  19-Dec-1998  christos char -> unsigned char, braces for gcc-2.8.1
 1.9  08-Oct-1998  wsanchez Get rid of multiply defined common symbols
 1.8  06-Sep-1998  mellon Support indenting standard input. When indenting standard input, write output to standard output.
 1.7  25-Aug-1998  ross Add { and } to shut up egcs. Reformat the more questionable code.
 1.6  19-Oct-1997  lukem WARNSify, fix .Nm usage, deprecate register, use <err.h>, KNFify (with indent!;)
 1.5  18-Oct-1997  mrg merge lite-2.
 1.4  09-Jan-1997  tls RCS ID police
 1.3  07-May-1996  jtc Include appropriate header files to bring prototypes into scope.
Removed explicit errno declarations.
 1.2  01-Aug-1993  mycroft Add RCS identifiers.
 1.1  09-Apr-1993  cgd branches: 1.1.1;
added, from net/2 (patch 124).
 1.1.1.2  04-Apr-2019  kamil FreeBSD indent r340138
 1.1.1.1  08-Jun-1993  mrg 4.4BSD-Lite2
 1.16.28.1  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.17.6.1  13-May-2009  jym Sync with HEAD.

Third (and last) commit. See http://mail-index.netbsd.org/source-changes/2009/05/13/msg221222.html
 1.18.24.1  21-Sep-2014  snj Pull up following revision(s) (requested by mrg in ticket #110):
usr.bin/indent/io.c: revision 1.15
usr.bin/indent/indent_globs.h: revision 1.10
usr.bin/indent/args.c: revision 1.11
usr.bin/indent/indent.1: revision 1.23
usr.bin/indent/indent.c: revision 1.19
port the -ut / -nut options from freebsd. -ut (default) enables tabs
in output, the -nut uses spaces.
 1.23.14.2  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.23.14.1  10-Jun-2019  christos Sync with HEAD
 1.390.2.1  02-Aug-2025  perseant Sync with HEAD
