History log of /src/usr.bin/indent/lexi.c
Revision  Date  Author  Comments
 1.242  03-Dec-2023  rillig indent: inline input-related macros

No binary change.
 1.241  03-Dec-2023  rillig indent: use line number of the token start in diagnostics

Previously, the line number of the end of the token was used, which was
confusing in debug mode.
 1.240  03-Dec-2023  rillig indent: fix line number counting in function definition

In a function definition that is split across two lines, if the first
line ends with a '*', the following line break did not increment the
line number.
 1.239  26-Jun-2023  rillig indent: improve heuristics for '*' as pointer in for loops
 1.238  26-Jun-2023  rillig indent: improve heuristics for '*' as a pointer type
 1.237  26-Jun-2023  rillig indent: clean up indentation
 1.236  25-Jun-2023  rillig indent: move cast detection from the lexer to the main processor

It is not the job of the lexer to modify the parser state.
 1.235  25-Jun-2023  rillig indent: treat 'complex' and 'imaginary' as type modifiers, not as types
 1.234  25-Jun-2023  rillig indent: fix formatting of parenthesized name in function definition
 1.233  25-Jun-2023  rillig indent: don't use strspn on inp_p, as it is not null-terminated

No functional change.
 1.232  17-Jun-2023  rillig indent: clean up

Extract duplicate code for handling line continuations.

Prevent theoretical undefined behavior in strspn, as inp.s is not
null-terminated.

Remove adding extra space characters when processing comments, as these
are not necessary to force a line of output.

No functional change.
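strspn keeps reading until it finds a character outside the accepted set or a '\0', so if every character of a buffer that is not null-terminated matches, it reads past the end. A minimal sketch of a bounded replacement, assuming the line is given as a pointer plus a length; span_blanks is a hypothetical name, not a function from lexi.c:

```c
#include <stddef.h>

/* Hypothetical helper: count the leading blanks (space or tab) in
 * s[0..len). It never reads s[len] or beyond, so unlike strspn it does
 * not need a terminating '\0'. */
static size_t
span_blanks(const char *s, size_t len)
{
	size_t i = 0;

	while (i < len && (s[i] == ' ' || s[i] == '\t'))
		i++;
	return i;
}
```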
 1.231  17-Jun-2023  rillig indent: miscellaneous cleanups

No binary change.
 1.230  16-Jun-2023  rillig indent: merge lexer symbols for type in/outside parentheses
 1.229  14-Jun-2023  rillig indent: clean up array indexing for parser symbols

With 'top' pointing to the actual top element, the array was indexed in
the closed range from 0 to top. All other arrays are indexed by the
usual half-open interval from 0 to len.

No functional change.
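With the half-open form, an element count len replaces top, the valid indexes are 0 through len - 1, and an empty stack simply has len == 0 instead of top == -1. A minimal sketch, assuming a fixed-capacity array of int symbols; struct psym_stack, STACK_CAP and the functions are hypothetical and not the layout used in indent:

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAP 256		/* hypothetical fixed capacity */

/* Hypothetical parser stack; the valid elements are sym[0..len). */
struct psym_stack {
	int sym[STACK_CAP];
	size_t len;
};

static void
stack_push(struct psym_stack *st, int sym)
{
	assert(st->len < STACK_CAP);
	st->sym[st->len++] = sym;
}

/* The top element is the last one in the half-open range. */
static int
stack_top(const struct psym_stack *st)
{
	assert(st->len > 0);
	return st->sym[st->len - 1];
}

static int
stack_pop(struct psym_stack *st)
{
	assert(st->len > 0);
	return st->sym[--st->len];
}
```

Iterating then uses the same loop shape as every other array, 'for (i = 0; i < st->len; i++)'.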
 1.228  14-Jun-2023  rillig indent: allow more than 20 nested parentheses or brackets
 1.227  14-Jun-2023  rillig indent: remove another flag from parser state

When processing a comment, the flag ps.next_col_1 was not used for the
next token, but for a line within a comment. As its scope was limited
to a single comment, there is no need to store it any longer than that.

No functional change.
 1.226  14-Jun-2023  rillig indent: remove a redundant flag from the parser state

No functional change.
 1.225  10-Jun-2023  rillig indent: miscellaneous cleanups
 1.224  10-Jun-2023  rillig indent: clean up function names, fix blank lines in debug output
 1.223  10-Jun-2023  rillig indent: in debug mode, null-terminate buffers
 1.222  10-Jun-2023  rillig indent: clean up function and variable names
 1.221  10-Jun-2023  rillig indent: rename and sort variables in parser state

No functional change.
 1.220  09-Jun-2023  rillig indent: clean up lexer

No functional change.
 1.219  09-Jun-2023  rillig indent: improve heuristics for function declaration vs. definition
 1.218  09-Jun-2023  rillig indent: format its own code
 1.217  08-Jun-2023  rillig indent: remove fragile heuristic for detecting cast expressions

The assumption that in an expression of the form '(a * anything)', the
'*' marks a pointer type was too simple-minded.

For now, fix the obvious cases and leave the others for later. If
needed, they can be worked around using the '-T' option.
 1.216  07-Jun-2023  rillig indent: extract the stack of parser symbols to a separate struct

No functional change.
 1.215  06-Jun-2023  rillig indent: sort functions in call order

No functional change.
 1.214  04-Jun-2023  rillig indent: do not parse '&&&&&&&' as a single binary operator
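In C, operators are tokenized by the longest-match rule, so '&&&&&&&' is the sequence '&&', '&&', '&&', '&' rather than a single operator. A minimal sketch of that rule for operators starting with '&'; amp_operator_len and split_amps are hypothetical names, not part of indent's lexer:

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of the operator token at p, which must point to
 * '&'. Only "&&" and "&=" are two-character tokens; no C operator
 * starting with '&' is longer. */
static size_t
amp_operator_len(const char *p)
{
	assert(p[0] == '&');
	return (p[1] == '&' || p[1] == '=') ? 2 : 1;
}

/* Split the '&' operators at the start of p into tokens, store their
 * lengths in lens (at most max entries) and return the token count. */
static size_t
split_amps(const char *p, size_t *lens, size_t max)
{
	size_t n = 0;

	while (*p == '&' && n < max) {
		size_t len = amp_operator_len(p);

		lens[n++] = len;
		p += len;
	}
	return n;
}
```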
 1.213  04-Jun-2023  rillig indent: fix '*=' to be a binary operator, not a unary one
 1.212  04-Jun-2023  rillig indent: remove read pointer from buffers that don't need it

The only buffer that needs a read pointer is the current input line in
'inp'.

No functional change.
 1.211  04-Jun-2023  rillig indent: rename struct field, for better symmetry

No binary change outside debug mode.
 1.210  04-Jun-2023  rillig lint: use separate lexer symbols for 'case' and 'default'

It's not strictly necessary, since these tokens behave in the same way;
still, the code is more straightforward when there are separate tokens.
 1.209  04-Jun-2023  rillig indent: classify 'inline' as a modifier rather than a word
 1.208  04-Jun-2023  rillig indent: use separate lexer symbols for the different kinds of ':'
 1.207  04-Jun-2023  rillig indent: separate code for handling parentheses and brackets

Handling parentheses is more complicated than for brackets.
 1.206  23-May-2023  rillig indent: separate code for handling enums from the lexer

The lexer's responsibility is to generate tokens; it's not supposed to
update the parser state. Centralize the state transitions that control
indentation of enum constants to keep the lexer code clean.

Skip comments, newlines and preprocessing lines when updating the parser
state for enum constants and for '*' in declarations.
 1.205  23-May-2023  rillig indent: split debug output into paragraphs

The paragraphs separate the different processing steps: getting a token
from the lexer, processing the token, updating the parser state, sending
a finished line to the output.
 1.204  23-May-2023  rillig indent: fix spacing in declarations in for loops
 1.203  22-May-2023  rillig indent: adjust indentation in lexer

No binary change.
 1.202  20-May-2023  rillig indent: extract the output state from the parser state

The parser state depends on the preprocessing lines, the output state
shouldn't.
 1.201  20-May-2023  rillig indent: clean up lexing of word tokens

No functional change.
 1.200  20-May-2023  rillig indent: separate detection of function definitions from lexing '*'

No functional change.
 1.199  18-May-2023  rillig indent: manually wrap overly long lines

No functional change.
 1.198  18-May-2023  rillig indent: switch to standard code style

Taken from share/misc/indent.pro.

Indent does not wrap code to fit into the line width; it only does so
for comments. The 'INDENT OFF' sections and overly long lines will be
addressed in a follow-up commit.

No functional change.
 1.197  16-May-2023  rillig indent: directly access the input buffer

No functional change.
 1.196  16-May-2023  rillig indent: allow comments in column 1 to be formatted
 1.195  16-May-2023  rillig indent: remove support for form feed characters inside a line

Form feeds are occasionally used to split code into pages, and this use
is still supported. Having a form feed in the middle of a line is
exotic.
 1.194  16-May-2023  rillig indent: fix handling of INDENT OFF/ON comments

Previously, the 'INDENT OFF' comments were interpreted when the newline
token from the line above the comment was processed, which was earlier
than could be reasonably expected.

The 'INDENT ON' comments were interpreted equally early, which led to
the situation that the 'INDENT OFF' comments were preserved literally
but the 'INDENT ON' comments weren't.
 1.193  16-May-2023  rillig indent: move parsing of 'INDENT OFF/ON' comments to the lexer

No functional change.
 1.192  15-May-2023  rillig indent: clean up detection of whether parentheses form a cast

No functional change.
 1.191  15-May-2023  rillig indent: improve type guessing, fix formatting of declarations
 1.190  15-May-2023  rillig indent: remove backslash line continuation outside preprocessing

The indenter did not handle these backslashes well, interpreting them as
unary operators, and they are an edge case anyway. Line continuations
in string literals and character constants are kept.
 1.189  15-May-2023  rillig indent: indent multi-line conditions

No functional change.
 1.188  15-May-2023  rillig indent: let indent format its own code

With manual corrections, as indent properly indents neither multi-line
'?:' expressions nor multi-line controlling expressions.
 1.187  15-May-2023  rillig indent: clean up memory allocation

No functional change.
 1.186  15-May-2023  rillig indent: move debugging code to separate file

No functional change.
 1.185  15-May-2023  rillig indent: clean up memory and buffer management

Remove the need to explicitly initialize the buffers. To avoid
subtracting null pointers or comparing them using '<', migrate the
buffers from the (start, end) form to the (start, len) form. This form
also avoids inconsistencies in whether 'buf.e == buf.s' or 'buf.s ==
buf.e' is used.

Make buffer.st const, to avoid accidental modification of the buffer's
content.

Replace '*buf.e++ = ch' with buf_add_char, to avoid having to keep track
of how much unwritten space is left in the buffer. Remove all safety
margins, that is, no more unchecked access to buf.st[-1] or appending
using '*buf.e++'.

Fix line number counting in lex_word for words that contain line breaks.

No functional change.
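A minimal sketch of the (start, len) form with an appending function in the spirit of buf_add_char. The struct layout, the field names s, len and cap, the growth policy and the abort-on-failure handling are assumptions for illustration, not indent's actual code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer in (start, len) form: s[0..len) is the content,
 * cap is the allocated size. An empty buffer may still have s == NULL. */
struct buffer {
	char *s;
	size_t len;
	size_t cap;
};

/* Make room for at least n more bytes; abort on allocation failure. */
static void
buf_reserve(struct buffer *buf, size_t n)
{
	if (buf->len + n <= buf->cap)
		return;
	size_t cap = buf->cap > 0 ? buf->cap : 16;
	while (cap < buf->len + n)
		cap *= 2;
	char *s = realloc(buf->s, cap);	/* realloc(NULL, ...) acts as malloc */
	if (s == NULL) {
		perror("realloc");
		exit(EXIT_FAILURE);
	}
	buf->s = s;
	buf->cap = cap;
}

static void
buf_add_char(struct buffer *buf, char ch)
{
	buf_reserve(buf, 1);
	buf->s[buf->len++] = ch;
}

static void
buf_add_chars(struct buffer *buf, const char *s, size_t len)
{
	if (len == 0)
		return;
	buf_reserve(buf, len);
	memcpy(buf->s + buf->len, s, len);
	buf->len += len;
}
```

In this form, an empty buffer is always 'buf.len == 0', even while s is still NULL, so no code needs to subtract or compare null pointers.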
 1.184  14-May-2023  rillig indent: only null-terminate the buffers if necessary

The only case where a buffer is used as a C-style string is when looking
up a keyword.

No functional change.
 1.183  14-May-2023  rillig indent: reduce code for scanning tokens

The input line is guaranteed to end with '\n', so there's no need to
carry another pointer around.

No functional change.
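Because every input line ends with '\n', and '\n' is never part of an identifier, the newline works as a sentinel: a scanning loop stops there at the latest and needs no end pointer. A minimal sketch, assuming identifiers consist of letters, digits, '_' and '$'; this definition of is_identifier_part is an assumption, and scan_identifier is a hypothetical name:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed set of identifier characters. */
static bool
is_identifier_part(char ch)
{
	return isalnum((unsigned char)ch) || ch == '_' || ch == '$';
}

/* Return the length of the identifier at p. The caller guarantees that
 * the line ends with '\n', which is not an identifier character, so the
 * loop stops there at the latest. */
static size_t
scan_identifier(const char *p)
{
	const char *start = p;

	while (is_identifier_part(*p))
		p++;
	return (size_t)(p - start);
}
```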
 1.182  14-May-2023  rillig indent: remove foreign RCS IDs
 1.181  14-May-2023  rillig indent: miscellaneous cleanups
 1.180  14-May-2023  rillig indent: reduce binary size

No functional change.
 1.179  13-May-2023  rillig indent: fix lexing of numbers that are spread over multiple lines
 1.178  13-May-2023  rillig indent: rename struct fields for buffers

No binary change except for assertion line numbers.
 1.177  13-May-2023  rillig indent: move debugging code to separate file

No functional change.
 1.176  12-May-2023  rillig indent: condense code for handling spaced expressions

No functional change outside debug mode.
 1.175  11-May-2023  rillig indent: move parser state variables to the parser_state struct

Include the variables in the debug output.
 1.174  11-May-2023  rillig indent: move force_nl into the parser state

This way, it is included in the debug output.

No functional change.
 1.173  11-May-2023  rillig indent: remove buggy code for swapping tokens

It is not the job of an indenter to swap tokens, even if it's only about
placing comments elsewhere. The code that swapped the tokens was
complicated, buggy and impossible to understand.

In -br (brace right) mode, indent no longer moves a '{' from the
beginning of a line to the end of the previous line, as that was handled
by the token swapping code as well. This change is unintended, but it
will be easier to re-add that now that the code is simpler.
 1.172  13-Feb-2022  rillig indent: rename parser_state.p_l_follow and paren_level

The previous variable names were misleading.

Paren_level is not the current level of parentheses but the one from the
beginning of the current output line. For better accuracy, rename it to
line_start_paren_level.

P_l_follow is not the level of parentheses that will be active at some
point in the future, as the previous name suggested. Instead, it is the
level of parentheses right now. For better accuracy, rename it to
nparen. This nicely matches its main usage, which is as an index into the
parser_state.paren array.

No binary change.
 1.171  13-Feb-2022  rillig indent: replace bitmasking code with struct

The struct directly represents the properties of a pair of parentheses,
without forcing the human reader to decode any bitset. This makes it
easier to find the remaining bugs in the heuristic for determining the
kind of parentheses.

No functional change outside debug mode.
 1.170  13-Feb-2022  rillig indent: change parser_state.cast_mask to 0-based indexing

Having 1-based indexing was completely unexpected, and it didn't match
the 0-based indexing of parser_state.paren_indents.

No functional change.
 1.169  12-Feb-2022  rillig indent: fix indentation of enum constants in typedef (since 2019-04-04)

The solution is not elegant since it adds a small state machine inside
the parser state, but at least these states only depend on the sequence
of token types and not on any other part of the parser state.

Reported in PR#55453.
 1.168  12-Feb-2022  rillig indent: extend debug logging for the parser state

The member names in struct parser_state are not trustworthy, for example
in_decl does not correspond to the intuitive definition of "inside a
declaration". To cope with this uncertainty, output the full parser
state to the debug log, not only the changes. This helps to
track the inner state for small differences in the input, such as
between 'typedef enum { TA, TB } TT' and 'enum { EA, EB } ET'.

This hopefully helps in fixing PR#55453.

No functional change outside debug mode.
 1.167  28-Nov-2021  rillig indent: treat L"string" as a single token

There is never whitespace between the 'L' and the string literal or the
character constant. There might be a backslash-newline between them, but
that case was not handled before either.

No functional change.
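Since the 'L' is always directly followed by the quote, one character of lookahead is enough to decide that L"..." or L'...' forms a single token. A minimal sketch, assuming the lexer calls it at the start of a token, so that the 'L' is not the end of a longer identifier; starts_wide_literal is a hypothetical name:

```c
#include <stdbool.h>

/* Does the text at p start a wide string literal or a wide character
 * constant, such as L"abc" or L'x'? Reads at most p[1], which exists
 * because p points into a null-terminated string. */
static bool
starts_wide_literal(const char *p)
{
	return p[0] == 'L' && (p[1] == '"' || p[1] == '\'');
}
```

The sketch ignores the prefixes u8, u and U that C11 added, as well as a backslash-newline between the 'L' and the quote.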
 1.166  27-Nov-2021  rillig indent: illustrate probably_looking_at_definition with examples

No functional change.
 1.165  27-Nov-2021  rillig indent: fix out of bounds memory access (since 2021-11-25)
 1.164  25-Nov-2021  rillig indent: rename ps.in_function_parameters to match reality

This flag is only set while parsing the parameters of a function
definition, but not for a function declaration. See buffer_add in the
test fmt_decl.

No functional change.
 1.163  25-Nov-2021  rillig indent: improve heuristic for spaces around '*' in declarations
 1.162  25-Nov-2021  rillig indent: eliminate 3 negations in tokenizer

No functional change.
 1.161  25-Nov-2021  rillig indent: extract lex_asterisk_unary into separate function

No functional change.
 1.160  25-Nov-2021  rillig indent: condense code for building tokens from characters

No functional change.
 1.159  25-Nov-2021  rillig indent: in lexi, assign lsym and next_unary in consistent order

No functional change.
 1.158  25-Nov-2021  rillig indent: fix heuristic for declaration/definition to post-1990 reality
 1.157  25-Nov-2021  rillig indent: fix space after function name for option '-pcs'
 1.156  25-Nov-2021  rillig indent: fix spacing for unknown type names in declarations
 1.155  25-Nov-2021  rillig indent: extract probably_looking_at_definition to separate function

This heuristic guesses wrong in many cases and will need some cleanup.

No functional change.
 1.154  25-Nov-2021  rillig indent: merge duplicate code for parsing 'struct s *'

No functional change.
 1.153  25-Nov-2021  rillig indent: fix formatting of a few declarations involving unknown types
 1.152  25-Nov-2021  rillig indent: rename ps.in_stmt to in_stmt_or_decl

The previous name didn't match reality.

No functional change.
 1.151  25-Nov-2021  rillig indent: rename ps.ind_stmt to in_stmt_cont

This makes a comment redundant.

No functional change.
 1.150  20-Nov-2021  rillig indent: clean up lint annotation and tests
 1.149  20-Nov-2021  rillig indent: fix tokenizing of word-like tokens (since 2019-04-04)

After a backslash-newline, the first character of the next line is only
part of the identifier if it is an identifier character.

indent-2000.10.11.14.46.04
| int var \
| +name = 4;
indent-2012.11.20.03.02.57

indent-2014.09.04.04.06.07
| int var \
| +name = 4;
indent-2019.02.03.03.19.29

indent-2019.04.04.15.27.35
| int var+name = 4;
indent-2021.11.19.20.23.17

indent
| int var + name = 4;
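The rule can be sketched as a scanner that consumes a backslash-newline inside a word only when the first character of the next line is an identifier character; otherwise the word ends before the backslash. A minimal sketch, assuming the input is one null-terminated string and that identifier characters are letters, digits, '_' and '$'; scan_word and is_identifier_char are hypothetical names:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

static bool
is_identifier_char(char ch)
{
	return isalnum((unsigned char)ch) || ch == '_' || ch == '$';
}

/* Return the number of input characters that make up the word starting
 * at p, including any backslash-newline inside it. A backslash-newline
 * is only consumed if the word continues on the next line; otherwise it
 * is left for the following token. */
static size_t
scan_word(const char *p)
{
	size_t i = 0;

	for (;;) {
		if (is_identifier_char(p[i]))
			i++;
		else if (p[i] == '\\' && p[i + 1] == '\n'
		    && is_identifier_char(p[i + 2]))
			i += 2;
		else
			return i;
	}
}
```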
 1.148  19-Nov-2021  rillig indent: reduce casts to unsigned char for character classification

No functional change.
 1.147  19-Nov-2021  rillig indent: replace ps.procname with ps.is_function_definition

Only the first character of ps.procname was ever read, and it was only
compared to '\0'. Using a bool for this means simpler code, less
memory and fewer wasted CPU cycles due to the removed strncpy.

No functional change.
 1.146  19-Nov-2021  rillig indent: fix formatting of function definitions (since 2019-04-04)

In the definition of a function with a pointer return type, the
formatting depended on the name of the function. Function names
matching [A-Za-z]+ were formatted correctly, those containing [$0-9_]
weren't.
 1.145  19-Nov-2021  rillig indent: merge duplicate code into is_identifier_part

No functional change.
 1.144  19-Nov-2021  rillig indent: fix lost function name (since 2019-04-04)

When indent searched for an identifier followed by a '(', to see whether
the identifier is a function name, it didn't care that the input buffer
could be resized due to a long line, which had made the pointer 'tp'
invalid. Fix this by stopping the search at the end of the line. A
better approach would be to have an unlimited lookahead buffer for
situations like these. The code that deals with character input has
already been extracted to io.c, so it's possible to implement that now.

While here, fix another access to undefined memory, after the loop.

There is still the issue of overwriting procname[0] with a blank, which
results in inconsistent formatting depending on the function name. This
is probably another case of accessing undefined memory; the results have
been reproducible, but that may have been pure luck.

The formatted code looks clearly broken, but that's still better than
losing a token and destroying the whole file.
 1.143  19-Nov-2021  rillig indent: use character input API from the tokenizer

No functional change.
 1.142  19-Nov-2021  rillig indent: move character input handling from lexi.c to io.c

No functional change.
 1.141  19-Nov-2021  rillig indent: replace direct access to the input buffer

This is a preparation for abstracting away all the low-level details of
handling the input. The goal is to fix the current bugs regarding line
number counting, out of bounds memory access, and generally unreadable
code.

No functional change.
 1.140  19-Nov-2021  rillig indent: group variables for input handling

No functional change.
 1.139  18-Nov-2021  rillig indent: prevent use-after-free bug

Triggered by the following artificial program:

---- snip ----
int *
f
( void)
{
}
---- snap ----
 1.138  07-Nov-2021  rillig indent: various cleanups

Make several comments more precise.

Rename process_end_of_file to process_eof to match the token name.

Change the order of assignments in analyze_comment to keep the com_ind
computations closer together.

In copy_comment_wrap, use pointer difference instead of pointer addition
to stay away from undefined behavior.

No functional change.
 1.137  07-Nov-2021  rillig indent: rename ps.decl_nest to decl_level

This better matches the comment.

No functional change.
 1.136  07-Nov-2021  rillig indent: move ps.p_l_follow closer to lsym_type_outside_parentheses

This makes it easier to see the relation between these two.

No functional change.
 1.135  07-Nov-2021  rillig indent: rename type_at_paren_level_0 to type_outside_parentheses

For symmetry with type_in_parentheses.

No functional change.
 1.134  07-Nov-2021  rillig indent: distinguish between typename in parentheses and other words

This gets rid of two members of parser_state. No functional change for
well-formed programs. The sequence of '++int' or '--size_t' may be
formatted differently than before, but no program is expected to contain
that sequence.

Rename lsym_ident to lsym_word since 'ident' was too specific. This
token type is used for constants and string literals as well. Strictly
speaking, a string literal is not a word, but at least it's better than
before.
 1.133  07-Nov-2021  rillig indent: rename 'inbuf' functions to 'inp'

The variable 'inp' used to be named 'inbuf'. Make the function names
correspond to the variable name again.

No functional change.
 1.132  05-Nov-2021  rillig indent: consistently use token.e[-1] for the last added character

No functional change.
 1.131  05-Nov-2021  rillig indent: add debug output for remaining members of parser_status
 1.130  05-Nov-2021  rillig indent: rename ps.curr_newline to next_col_1

For symmetry with ps.curr_col_1.

No functional change.
 1.129  01-Nov-2021  rillig indent: fix missing blank after 'return' (since 2021-10-31)

In indent.c 1.200 from 2021-10-31, the subtypes of identifier tokens
were removed since they were redundant. An unintended side effect was
that a parenthesized expression after 'return' was no longer separated
by a blank.

Before that change, 'return' was tokenized as an lsym_ident with subtype
kw_other, and want_space_before_lparen handled this case in the last
line. After the change, 'return' was treated as an ordinary identifier,
and unless the option '-pcs' (blank after function call) was given, the
blank was removed.

The other keywords that had kw_other are not affected since they do not
expect a '(' afterwards. These keywords are 'break', 'continue', 'goto',
'inline' and 'restrict'.

Curiously, there was not a single test case that covered 'return(expr)'.

While here, remove the trailing ',' from the enum lexer_symbol, which is
not allowed in C90 (C99 allows it); GCC accepts it as an extension. Lint
doesn't complain about this since the default LINTFLAGS include '-g' for
GCC mode.
 1.128  31-Oct-2021  rillig indent: clean up

Initialize buffers in reading order, make comments more expressive,
rename add_typename to register_typename, remove unused macro.

No functional change.
 1.127  31-Oct-2021  rillig indent: remove redundant keyword.is_type

It is still confusing that not all type keywords end up as lsym_type.
Those that occur inside parentheses end up as identifiers instead. To
see whether an identifier is a typename, query ps.curr_is_type and
ps.prev_is_type.

No functional change.
 1.126  31-Oct-2021  rillig indent: replace kw_tag with lsym_tag

This leaves only one special type of token, which is lsym_ident, which
in some cases represents a type name and in other cases an identifier,
constant or string literal.

No functional change.
 1.125  31-Oct-2021  rillig indent: replace simple cases of keyword_kind with lexer_symbol

The remaining keyword kinds 'tag' and 'type' require a bit more thought,
so do them in a separate step.

No functional change.
 1.124  31-Oct-2021  rillig indent: rename lsym_type to better reflect reality

Type names that occur in parentheses are parsed as lsym_ident having the
subtype kw_type instead.

No functional change.
 1.123  31-Oct-2021  rillig indent: remove support for pre-1978 variable initialization
 1.122  31-Oct-2021  rillig indent: in debug log, print token subtype in same line

The keyword 'void' is parsed as lsym_type in some cases and lsym_ident
in others. Its corresponding keyword is always kw_type though. Put the
subtype into the same line as the other token information.
 1.121  31-Oct-2021  rillig indent: add separate lexer symbol for offsetof

No functional change.
 1.120  31-Oct-2021  rillig indent: add separate lexer symbol for sizeof

The plan is to get rid of the type keyword_kind, which largely overlaps
with lexer_symbol.

No functional change.
 1.119  31-Oct-2021  rillig indent: clean up definition of keywords

Rename kw_struct_or_union_or_enum to the shorter kw_tag.

Merge kw_jump with kw_inline_or_restrict since they are handled in the
same way.

No functional change.
 1.118  31-Oct-2021  rillig indent: condense lexi_alnum

No functional change.
 1.117  30-Oct-2021  rillig indent: rename prev_newline and prev_col_1 to curr

These two flags describe the token that is currently processed.

In process_binary_op, curr_newline can never be true since newline is
not a binary operator, so remove that condition.

No functional change.
 1.116  30-Oct-2021  rillig indent: in debug output, list the new token first
 1.115  30-Oct-2021  rillig indent: clean up lexical analyzer

Use traditional type for small unsigned numbers instead of uint8_t; the
required header was not included.

Remove assertion for debug mode; lint takes care of ensuring that the
enum constants match the length of the names array.

Constify a name array.

Move the comparison function for bsearch closer to its caller.

No functional change.
 1.114  29-Oct-2021  rillig indent: remove redundant comments, remove punctuation from debug log

The comment about 'null stmt' between braces probably meant 'no
statements between braces'.

The comments at psym_switch_expr only repeated what the code says or had
already been outdated for 29 years, since opt.case_indent does not have
to be 'one level down'.

In the debug log, the quotes around the symbol names are not necessary
after a ':'. The parse stack also does not need this much punctuation.

Reducing a do-while loop to nothing instead of a statement saves a few
CPU cycles. It works because after each lbrace, a stmt is pushed to the
parser stack. This stmt can only ever be reduced to a stmt_list but
never be removed.
 1.113  29-Oct-2021  rillig indent: in debug mode, log only differences for most ps members
 1.112  29-Oct-2021  rillig indent: add detailed debug logging for the parser state
 1.111  29-Oct-2021  rillig indent: merge isblank and is_hspace into ch_isblank

No functional change.
 1.110  29-Oct-2021  rillig indent: use prev/curr/next to refer to the current token

The word 'last' just didn't match with 'next'.

No functional change.
 1.109  29-Oct-2021  rillig indent: keep p_l_follow nonnegative, use consistent comparison

No functional change.
 1.108  29-Oct-2021  rillig indent: spell 'parentheses' properly in messages and comments
 1.107  28-Oct-2021  rillig indent: remove unused local variable in lexi

Since the previous commit, lexi is always called with the same argument,
so remove that parameter.

The previous commit broke the debug logging by not printing "transient
state" anymore. Replace this with "rolled back parser state" at the
caller's site.

No functional change.
 1.106  28-Oct-2021  rillig indent: reduce negations in search_stmt_lookahead

No functional change.
 1.105  26-Oct-2021  rillig indent: make ps.keyword easier to understand

Previously, ps.keyword did not have any documentation and was not
straightforward. In some cases it was reset to kw_0, in others it was
set to an interesting value. The idea behind it was to remember the kind
of word of the previous token, to decide whether to have a space between
sizeof or offsetof and a following '('.

No functional change.
 1.104  26-Oct-2021  rillig indent: fix debug logging

The parser state is not always 'ps', so the debug logging must use the
correct state as well.
 1.103  26-Oct-2021  rillig indent: run indent on its own source code

With manual corrections afterwards, to compensate for the remaining bugs
in indent.

Without the type definitions in .indent.pro, the opening braces of the
functions kw_name and lexi_alnum would not be at the beginning of the
line.
 1.102  26-Oct-2021  rillig indent: merge duplicate code in lexi_alnum
 1.101  25-Oct-2021  rillig indent: improve debug logging

Output the various details in chronological order.
 1.100  25-Oct-2021  rillig indent: split type token_type into 3 separate types

Previously, token_type was used for 3 different purposes:

1. symbol types from the lexer
2. symbol types on the parser stack
3. kind of control statement for 'if (expr)' and similar statements

Splitting the 41 constants into separate types makes it immediately
clear that the parser stack never handles comments, preprocessing lines,
newlines, form feeds, or the inner structure of expressions.

Previously, the constant switch_expr was especially confusing since it
was used for 3 different purposes: when returned from lexi, it
represented the keyword 'switch', in the parser stack it represented
'switch (expr)', and it was used for a statement head as well.

The only overlap between the lexer symbols and the parser symbols are
'{' and '}', and the keywords 'do' and 'else'. To increase confusion,
the constants of the previous token_type were in apparently random
order and before 2021, they had cryptic, highly abbreviated names.

No functional change.
 1.99  24-Oct-2021  rillig indent: rename form_feed to tt_lex_form_feed

No functional change.
 1.98  24-Oct-2021  rillig indent: split kw_for_or_if_or_while into separate constants

No functional change.
 1.97  24-Oct-2021  rillig indent: split kw_do_or_else into separate constants

It was unnecessarily confusing to have the token types keyword_do_else,
keyword_do and keyword_else at the same time, without any hint in what
they differed.

Some of the token types seem to be used by the lexer while others are
used in the parse stack. Maybe all token types can be partitioned into
these groups, which would suggest using two different types for them.
And if not, it's still clearer to have this distinction in the names of
the constants.

No functional change.
 1.96  24-Oct-2021  rillig indent: define lexi_end as function instead of macro
 1.95  24-Oct-2021  rillig indent: run indent on its own source code

With manual corrections afterwards. Indent still does not get
extra_expr_indent correctly; it also indents global variables after
tagged declarations too deeply.

No functional change.
 1.94  24-Oct-2021  rillig indent: rename nitems to array_length
 1.93  24-Oct-2021  rillig indent: sort includes
 1.92  20-Oct-2021  rillig indent: rename ps.last_u_d to match its comment

No functional change.
 1.91  11-Oct-2021  rillig indent: use separate variables for lexi_alnum and lexi

These two uses of the variable are independent of each other.

No functional change.
 1.90  11-Oct-2021  rillig indent: clean up comments in lexi and lexi_alnum

No functional change.
 1.89  11-Oct-2021  rillig indent: extract lexi_alnum from lexi

No functional change.
 1.88  09-Oct-2021  rillig indent: fix lint warning about bsearch discarding 'const'

lexi.c(433): warning: call to 'bsearch' effectively discards 'const'
from argument [346]
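bsearch accepts const void * arguments but returns a plain void *, so a lookup in a const table silently yields a non-const pointer. A minimal sketch of a keyword lookup that keeps the qualifier by storing the result in a const pointer; the keyword table, its kind values and the function names are hypothetical, not indent's keyword list:

```c
#include <stdlib.h>
#include <string.h>

struct keyword {
	const char *name;
	int kind;		/* hypothetical token kind */
};

/* Must stay sorted by name, as bsearch requires. */
static const struct keyword keywords[] = {
	{ "break", 1 },
	{ "case", 2 },
	{ "return", 3 },
	{ "sizeof", 4 },
	{ "while", 5 },
};

static int
cmp_keyword_by_name(const void *key, const void *elem)
{
	return strcmp(key, ((const struct keyword *)elem)->name);
}

/* Look up a null-terminated word; return NULL if it is not a keyword. */
static const struct keyword *
keyword_lookup(const char *word)
{
	const void *p = bsearch(word, keywords,
	    sizeof(keywords) / sizeof(keywords[0]), sizeof(keywords[0]),
	    cmp_keyword_by_name);

	return p;
}
```

Since the comparison uses strcmp, the word must be null-terminated, which matches revision 1.184: looking up a keyword is the only place where a buffer is used as a C-style string.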
 1.87  08-Oct-2021  rillig indent: merge duplicate code in lexer

No functional change.
 1.86  08-Oct-2021  rillig indent: rename in_or_st to init_or_struct

This makes a few comments redundant.

No functional change.
 1.85  08-Oct-2021  rillig indent: remove 'global' from the list of keywords

Since 1978, 'global' has not been a keyword in C. Moreover, it was
declared as a type while its name would rather suggest a storage class.

Removing the keyword fixes the formatting of variables named 'global'.
 1.84  08-Oct-2021  rillig indent: clean up typename handling

Unexport typenames list.

Replace standard binary search with custom binary search that returns
the insertion position.

In is_typename, take advantage of the buffer type instead of using
the standard C recipe for str_ends_with.

No functional change.
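Unlike bsearch, which only reports whether an element is present, a search that returns the insertion position also tells the caller where to insert a new name so that the list stays sorted. A minimal sketch over an array of strings; lower_bound, contains and insert_sorted are hypothetical names, and indent's typename list may be organized differently:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the first element of the sorted names[0..n) that
 * is not less than key: either the position of key or the position
 * where key has to be inserted to keep the array sorted. */
static size_t
lower_bound(const char *const *names, size_t n, const char *key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(names[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

static bool
contains(const char *const *names, size_t n, const char *key)
{
	size_t i = lower_bound(names, n, key);

	return i < n && strcmp(names[i], key) == 0;
}

/* Insert key into names[0..*n) unless it is already there. The array
 * must have room for one more element. */
static void
insert_sorted(const char **names, size_t *n, const char *key)
{
	size_t i = lower_bound(names, *n, key);

	if (i < *n && strcmp(names[i], key) == 0)
		return;
	memmove(names + i + 1, names + i, (*n - i) * sizeof(names[0]));
	names[i] = key;
	(*n)++;
}
```

Computing the midpoint as 'lo + (hi - lo) / 2' on size_t avoids both overflow and the signed '>>' that lint complained about in revision 1.65.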
 1.83  08-Oct-2021  rillig indent: enhance comments for lex_number state machine

No functional change.
 1.82  08-Oct-2021  rillig indent: improve local variable names

No functional change.
 1.81  08-Oct-2021  rillig indent: rename fill_buffer to inbuf_read_line

No functional change.
 1.80  08-Oct-2021  rillig indent: constify detection of function names

No functional change.
 1.79  08-Oct-2021  rillig indent: rename tokens lparen and rparen to be more precise

No functional change.
 1.78  07-Oct-2021  rillig indent: group variables for the input buffer

The input buffer follows the same concept as the intermediate buffers
for label, code, comment and token, so use the same type for it.

No functional change.
 1.77  07-Oct-2021  rillig indent: clean up code, remove outdated wrong comments

No functional change.
 1.76  07-Oct-2021  rillig indent: use braces around multi-line statements

No functional change.
 1.75  07-Oct-2021  rillig indent: let the code breathe a bit by inserting empty lines

No functional change.
 1.74  07-Oct-2021  rillig indent: fix wrong or outdated comments

No functional change.
 1.73  07-Oct-2021  rillig indent: raise WARNS from the default 5 up to 6
 1.72  05-Oct-2021  rillig indent: use buffer type in debug_print_buf

That function had been created before 'struct buffer' was invented;
therefore it used two pointers as parameters. Remove this redundancy.

No functional change.
 1.71  05-Oct-2021  rillig indent: run indent on lexi.c, with manual corrections

The variables 'keywords' and 'typenames' were indented using 8 spaces,
even though -di0 was in effect, which should result in a single space,
and -ut was in effect, which should result in a single tab instead of 8
spaces.

The option -eei does not work as advertised: the controlling expressions
are only indented by the normal amount, which easily leads to confusion
as to whether the code belongs to the condition or the following
statement.
 1.70  05-Oct-2021  rillig indent: untangle complicated condition in probably_typedef

No functional change.
 1.69  05-Oct-2021  rillig indent: use proper escape sequence for form feed

This escape sequence has been available since at least 1978.
 1.68  05-Oct-2021  rillig indent: merge duplicate code into is_hspace

No functional change.
 1.67  05-Oct-2021  rillig indent: clean up code for appending to buffers

Use *e++ for appending and e[-1] for testing the previously appended
character, like in other places in the code.

No functional change.
 1.66  05-Oct-2021  rillig indent: merge duplicate code for reading from input buffer

No functional change.
 1.65  03-Oct-2021  rillig indent: fix lint warning about signed '>>'

Lint couldn't infer that indent's list of type names will practically
never contain more than 2 billion entries and that the result of '>>'
would be the same in all cases.
 1.64  27-Sep-2021  rillig indent: use binary instead of linear search when adding types

No functional change.
 1.63  27-Sep-2021  rillig indent: extract is_typename from lexi

No functional change.
 1.62  27-Sep-2021  rillig indent: rename rwcode to keyword_kind, various cleanup

No idea what the 'rw' in 'rwcode' meant; it had been imported that way
28 years ago. Since rwcode specifies the kind of a keyword, the prefix
'kw_' makes sense.

No functional change.
 1.61  26-Sep-2021  rillig indent: unexport global variables

The variable match_state was write-only and was thus removed.

No functional change.
 1.60  26-Sep-2021  rillig indent: unexport keyword table, clean up

No functional change.
 1.59  26-Sep-2021  rillig indent: let indent format its own code -- in supervised mode

After running indent on the code, I manually selected each change that
now looks better than before. The remaining changes are left for later.
All in all, indent did a pretty good job, except for syntactic additions
from after 1990, but that was to be expected. Examples of such
additions are GCC's __attribute__ and C99 designated initializers.

Indent has only a few knobs to tune the indentation. The knob for the
continuation indentation applies to function declarations as well as to
expressions. The knob for indentation of local variable declarations
applies to struct members as well, even if these are members of a
top-level struct.

Several code comments crossed the right margin in column 78. Several
other code comments were correctly broken though. The cause for this
difference was not obvious.

No functional change.
 1.58  25-Sep-2021  rillig indent: merge duplicate code for token buffers

No functional change.
 1.57  25-Sep-2021  rillig indent: extract probably_typedef into separate function

This condition is complicated enough that it warrants being split into
several clauses, maybe even with an explanation.

No functional change.
 1.56  25-Sep-2021  rillig indent: reduce code and data size for lexing of numbers

Instead of having a table of strings (121 pointers + 121 data
relocations), reduce that table to the actual character data and use a
secondary table for looking up the correct row in the main table.

No functional change.
 1.55  25-Sep-2021  rillig indent: convert remaining ibool to bool

No functional change intended.
 1.54  25-Sep-2021  rillig indent: prepare for lint's strict bool mode

Before C99, C had no boolean type. Instead, indent used int for that,
just like many other programs. Even with C99, bool and int can be used
interchangeably in many situations, such as querying '!i' or '!ptr' or
'cond == 0'.

Since January 2021, lint provides the strict bool mode, which makes bool
a non-arithmetic type that is incompatible with any other type. Having
clearly separate types helps in understanding the code.

To migrate indent to strict bool mode, the first step is to apply all
changes that keep the resulting binary the same. Since sizeof(bool) is
1 and sizeof(int) is 4, the type ibool serves as an intermediate type.
For now it is defined to int, later it will become bool.

The current code compiles cleanly in C99 and C11 mode, as well as in
lint's strict bool mode. There are a few tricky places:

In args.c in 'struct pro', there are two types of options: boolean and
integer. Boolean options point to a bool variable, integer options
point to an int variable. To keep the current structure of the code,
the pointer has been changed to 'void *'. To ensure type safety, the
definition of the options is done via preprocessor magic, which in C11
mode ensures the correct pointer types. (Add CFLAGS+=-std=gnu11 at the
very bottom of the Makefile.)

In indent.c in process_preprocessing, a boolean variable is
post-incremented. That variable is only assigned to another variable,
and that variable is only used in a boolean context. To provoke a
different behavior between the '++' and the '= true', the source code
to be indented would need 1 << 32 preprocessing directives, which is
unlikely to happen in practice.

In io.c in dump_line, the variables ps.in_stmt and ps.in_decl only ever
get the values 0 and 1. For these values, the expressions 'a & ~b' and
'a && !b' are equivalent, in all versions of C. The compiler may
generate different code for them, though.

In io.c in parse_indent_comment, the assignment to inhibit_formatting
takes place in integer context. If the compiler is smart enough to
detect the possible values of on_off, it may generate the same code
before and after the change, but that is rather unlikely.

The second step of the migration will be to replace ibool with bool,
step by step, just in case there are any hidden gotchas in the code,
such as sizeof or pointer casts.

No change to the resulting binary.
 1.53  25-Sep-2021  rillig indent: remove ifdef for lint

NetBSD lint does not need them anymore, FreeBSD does not have lint.
 1.52  25-Sep-2021  rillig indent: make lex_char_or_string simpler

The previous code was so tricky that every second line needed a comment
that explains what's going on. Replace the complicated code with the
usual straightforward string-copying patterns.

No functional change.
 1.51  25-Sep-2021  rillig indent: add nonnull memory allocation functions

The only functional change is a single error message.
 1.50  25-Sep-2021  rillig indent: group global variables for token buffer

No functional change.
 1.49  25-Sep-2021  rillig indent: inline macro 'token'

No functional change.
 1.48  25-Sep-2021  rillig indent: group global variables for code buffer

No functional change.
 1.47  25-Sep-2021  rillig indent: rename variables of type token_type

The previous variable name 'code' conflicts with the buffer of the same
name.

No functional change.
 1.46  24-Sep-2021  rillig indent: group global variables for label buffer into struct

No functional change.
 1.45  24-Sep-2021  rillig indent: group global variables for the comment buffer

No functional change.
 1.44  24-Sep-2021  rillig indent: fix space-tab in indentation
 1.43  26-Aug-2021  rillig indent: extract lex_number, lex_word, lex_char_or_string

No functional change.
 1.42  25-Aug-2021  rillig indent: fix lint warnings about type conversions on ilp32

No functional change.
 1.41  14-Mar-2021  rillig indent: fix lint warnings

No functional change.
 1.40  13-Mar-2021  rillig indent: remove redundant parentheses

No functional change.
 1.39  13-Mar-2021  rillig indent: add debug logging for actually writing to the output file

Together with the results of the tokenizer and the 4 buffers for token,
label, code and comment, the debug log now provides a good high-level
view of how the indentation happens and where to look for the many
remaining bugs.
 1.38  12-Mar-2021  rillig indent: use consistent indentation for 'else'

Half of the code used -ce, the other half the opposite -nce.

No functional change.
 1.37  12-Mar-2021  rillig indent: fix misleading indentation in indent's own code

No functional change.
 1.36  12-Mar-2021  rillig indent: move code for tokenizing numbers further up

Having it directly below the table makes it easier to understand.

I also tried to omit this function entirely by moving the code into the
initializer itself, but that made the code redundant and furthermore
increased the size of the resulting binary, probably because of the new
relocation records.

No functional change.
 1.35  11-Mar-2021  rillig indent: reduce indentation of check_size functions

No functional change.
 1.34  11-Mar-2021  rillig indent: remove redundant cast after allocation functions

No functional change.
 1.33  11-Mar-2021  rillig indent: use consistent array indexing

No functional change.
 1.32  11-Mar-2021  rillig indent: merge duplicate code for reading from the input buffer

No functional change.
 1.31  09-Mar-2021  rillig indent: rename a few more token types

The previous names were either too short or ambiguous.

No functional change.
 1.30  09-Mar-2021  rillig indent: make token names more precise

The previous 'casestmt' was wrong since a case label is not a statement
at all.

The previous 'swstmt' was overly short, and wrong as well, since it
represents only the 'switch (expr)' part, which is not a complete switch
statement. Same for 'ifstmt', 'whilestmt', 'forstmt'.

The previous word 'head' was not precise enough since it didn't specify
exactly where the head ends and the body starts. Especially for
handling the dangling else, this distinction is important.

No functional change.
 1.29  09-Mar-2021  rillig indent: rename a few tokens to be more obvious

For casual readers it is not obvious whether the 'sp' meant 'special' or
'space' or something entirely different.
 1.28  09-Mar-2021  rillig indent: manually indent comments

It's strange that indent's own code is not formatted by indent itself,
which would be a good demonstration of its capabilities.

In its current state, I don't trust indent to get even the tokenization
correct, therefore the only safe way is to format the code manually.
 1.27  08-Mar-2021  rillig indent: split bsearch comparison function

It may have been a clever trick to use the same memory layout for struct
templ and a string pointer, but it's not worth the extra comment and
difficulty in understanding the code.

No functional change.
 1.26  08-Mar-2021  rillig indent: inline macro for backslash

No functional change.
 1.25  08-Mar-2021  rillig indent: convert big macros to functions

Each of these buffers is only modified in a single file. This makes it
unnecessary to declare the macros in the global header.
 1.24  07-Mar-2021  rillig indent: fix handling of '//' end-of-line comments
 1.23  07-Mar-2021  rillig indent: remove redundant parentheses around return value

No functional change.
 1.22  07-Mar-2021  rillig lint: move keyword 'continue' over to the other control flow keywords

No functional change since neither rw_jump nor rw_inline_or_restrict is
mentioned in any switch statement, and lint didn't find any other
suspicious enum operations.
 1.21  07-Mar-2021  rillig indent: use named constants for the different types of keywords

This reduces the magic numbers in the code. Most of these had their
designated constant name written in a nearby comment anyway.

The one instance where arithmetic was performed on this new enum type
(in indent.c) was a bit tricky to understand.

The combination rw_continue_or_inline_or_restrict looks strange, the
'continue' should intuitively belong to the other control flow keywords
in rw_break_or_goto_or_return.

No functional change.
 1.20  07-Mar-2021  rillig indent: in debug mode, output detailed token information

The main ingredient for understanding how indent works is the tokenizer
and the 4 buffers in which the text is collected.

Inspecting this debug log for the test comment-line-end makes it obvious
why indent messes up code that contains '//' comments. The cause is
that indent interprets '//' as an operator, just like '&&' or '||'. The
sequence '/////' is interpreted as a single operator as well, by the
way.

Since '//' is interpreted as an ordinary operator, any words following
it are plain identifiers, usually several of them in a row, which is a
syntax error. Depending on the context, the operator '//' is either a
unary operator (no space around) or a binary operator (space around).
This explains why the word 'line-end' is expanded to 'line - end'.

No functional change outside of debug mode.
 1.19  07-Mar-2021  rillig indent: for the token types, use enum instead of #define

This makes it easier to step through the code in a debugger.

No functional change.
 1.18  07-Mar-2021  rillig indent: use all headers in all files

This is a prerequisite for converting the token types to an enum instead
of a preprocessor define, since the return type of lexi will become
token_type. Having the enum will make debugging easier.

There was a single naming collision, which forced the variable in
scan_profile to be renamed. All other token names are used nowhere
else.

No change to the resulting binary.
 1.17  19-Oct-2019  christos use stdarg, annotate function as __printflike and fix broken formats.
 1.16  04-Apr-2019  kamil Upgrade indent(1)

Merge all the changes from the recent FreeBSD HEAD snapshot
into our local copy.

FreeBSD actively maintains this program in their sources and their
repository contains over 100 commits with changes.

Keep the delta between the FreeBSD and NetBSD versions to an absolute
minimum, mostly RCS Id and compatibility fixes.

Major changes in this import:

- Added an option -ldi<N> to control indentation of local variable names.
- Added option -P for loading user-provided files as profiles
- Added -tsn for setting tabsize
- Rename -nsac/-sac ("space after cast") to -ncs/-cs
- Added option -fbs, which enables (-nfbs disables) splitting the function declaration and opening brace across two lines.
- Respect SIMPLE_BACKUP_SUFFIX environment variable in indent(1)
- Group global option variables into an options structure
- Use bsearch() for looking up type keywords.
- Don't produce unneeded space character in function declarators
- Don't unnecessarily add a blank before a comment ends.
- Don't ignore newlines after comments that follow braces.

Merge the FreeBSD indent(1) tests with our ATF framework.
All tests pass.

Upgrade prepared by Manikishan Ghantasala.
Final polishing by myself.
 1.15  03-Feb-2019  mrg - add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily
 1.14  05-Jun-2016  dholland branches: 1.14.16;
Fix CSRG-era typo: typedef, not typdef. Spotted by Piotr Stefaniak.
 1.13  12-Apr-2009  lukem Fix WARNS=4 issues (-Wshadow -Wcast-qual -Wsign-compare)
 1.12  07-Aug-2003  agc branches: 1.12.42;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.11  26-May-2002  wiz Remove #ifndef'd __STDC__ code. ANSIfy.
 1.10  22-Mar-2002  kristerw Recognize all C9x integer constants (ISO/IEC 9899:1999 section 6.4.4.1)
Patch taken from FreeBSD.

Fixes PR bin/9219.
 1.9  15-Mar-1999  kristerw Made indent recognize the [fF], [uU], [lL], [uU][lL], [lL][lL], and
[uU][lL][lL] constant suffixes. (PR bin/6516 by Brian Ginsbach)
 1.8  19-Dec-1998  christos char -> unsigned char, braces for gcc-2.8.1
 1.7  25-Aug-1998  ross Add { and } to shut up egcs. Reformat the more questionable code.
 1.6  19-Oct-1997  lukem WARNSify, fix .Nm usage, deprecate register, use <err.h>, KNFify (with indent!;)
 1.5  18-Oct-1997  mrg merge lite-2.
 1.4  09-Sep-1997  agc Bump number of elements in specials array from 100 to 1000.
Typedefs are added to this array, and it silently ignores
any attempts to enter more elements when the array is full.
 1.3  09-Jan-1997  tls RCS ID police
 1.2  01-Aug-1993  mycroft Add RCS identifiers.
 1.1  09-Apr-1993  cgd branches: 1.1.1;
added, from net/2 (patch 124).
 1.1.1.2  04-Apr-2019  kamil FreeBSD indent r340138
 1.1.1.1  06-Jun-1993  mrg 4.4BSD-Lite2
 1.12.42.1  13-May-2009  jym Sync with HEAD.

Third (and last) commit. See http://mail-index.netbsd.org/source-changes/2009/05/13/msg221222.html
 1.14.16.2  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.14.16.1  10-Jun-2019  christos Sync with HEAD
