History log of /src/usr.bin/indent/parse.c
Revision  Date  Author  Comments
 1.85  07-Jan-2025  rillig indent: condense and simplify parsing code
 1.84  07-Jan-2025  rillig indent: fix indentation of statement after deeply nested 'if'
 1.83  07-Jan-2025  rillig indent: fix indentation of comment above 'else'

Previously, indent assumed that no 'else' would follow.
 1.82  04-Jan-2025  rillig indent: fix indentation of adjacent multi-line initializers

The main topic of this change is parse.c:66, which makes the indentation
of statements uniform with the indentation of other parser symbols.

That change had the side effect of messing up the indentation of files
whose first line does not start in column 1, such as in ps_ind_level.c.
To fix this side effect, the initial indentation must be determined
before pushing the placeholder token psym_stmt during initialization.
 1.81  04-Jan-2025  rillig indent: make debug log more uniform
 1.80  04-Jan-2025  rillig indent: make debug output easier to read

The previous format had the values of the parser state on the left side
and the corresponding names on the right side. While it looked nicely
aligned, it was not suitable for focusing on the actual data. Replace
this format with the more common "key: value" format.

Use the names of the enum constants in the debug log, instead of the
previous "nice" names that needed one more level of mental translation
and in some cases contained unbalanced punctuation such as '{'.
 1.79  18-Jun-2023  rillig branches: 1.79.2;
indent: untangle code for handling the statement indentation

The expression 'psyms.level-- - 2' did too much in a single line, so
extract the '--' to a separate statement, to highlight the symmetry
between the 'sym' and 'ind_level' code.

No functional change.
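
The refactoring described in this entry can be illustrated with a minimal C sketch. The struct and function names below are hypothetical and are not taken from parse.c; the sketch only shows that moving the '--' into its own statement selects the same element and leaves the same stack size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser symbol stack; not the real struct. */
struct stack {
	int ind_level[8];
	size_t len;
};

/* Before: the decrement is buried inside the index expression. */
static int
pop_combined(struct stack *st)
{
	assert(st->len >= 2);
	return st->ind_level[st->len-- - 2];
}

/* After: decrement first, then index the new top element. */
static int
pop_separate(struct stack *st)
{
	assert(st->len >= 2);
	st->len--;
	return st->ind_level[st->len - 1];
}
```

Both functions return the element below the old top and shrink the stack by one, so the second form is a pure readability change.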
 1.78  17-Jun-2023  rillig indent: miscellaneous cleanups

No binary change.
 1.77  14-Jun-2023  rillig indent: clean up the code, add a few tests
 1.76  14-Jun-2023  rillig indent: allow more than 128 brace levels
 1.75  14-Jun-2023  rillig indent: fix out-of-bounds read when reducing a statement

Since parse.c 1.73 from today. The parser symbol psym_stmt_list that was
removed in that commit acted as a stop symbol, so that psyms_reduce_stmt
would save a memory access.
 1.74  14-Jun-2023  rillig indent: clean up array indexing for parser symbols

With 'top' pointing to the actual top element, the array was indexed in
the closed range from 0 to top. All other arrays are indexed by the
usual half-open interval from 0 to len.

No functional change.
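
The difference between the two indexing schemes can be sketched as follows. The types, the capacity and the function names are assumptions made for illustration and do not reproduce the code in parse.c.

```c
#include <assert.h>
#include <stddef.h>

enum { CAP = 4 };	/* arbitrary capacity for the sketch */

/* Closed range: valid elements are items[0..top]; empty means top == -1. */
struct stack_top {
	int items[CAP];
	int top;
};

/* Half-open range: valid elements are items[0..len); empty means len == 0. */
struct stack_len {
	int items[CAP];
	size_t len;
};

static void
push_top(struct stack_top *st, int item)
{
	assert(st->top + 1 < CAP);
	st->items[++st->top] = item;
}

static void
push_len(struct stack_len *st, int item)
{
	assert(st->len < CAP);
	st->items[st->len++] = item;
}
```

With the half-open form, the bounds check and the element count use the same variable, which matches how the other arrays are handled.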
 1.73  14-Jun-2023  rillig indent: merge parser symbols for stmt and stmt_list

They were handled in exactly the same way.
 1.72  10-Jun-2023  rillig indent: fix stack overflow, add more tests

For several parser symbols, 2 symbols are pushed in a row, which led to
an out-of-bounds write.
 1.71  10-Jun-2023  rillig indent: miscellaneous cleanups
 1.70  09-Jun-2023  rillig indent: format its own code
 1.69  07-Jun-2023  rillig indent: extract the stack of parser symbols to a separate struct

No functional change.
 1.68  06-Jun-2023  rillig indent: sort functions in call order

No functional change.
 1.67  06-Jun-2023  rillig indent: compute indentation of 'case' labels on-demand

One less moving part to keep track of.

No functional change.
 1.66  05-Jun-2023  rillig indent: rename variables, clean up comments

No binary change.
 1.65  04-Jun-2023  rillig indent: track the kind of '{' on the parser stack
 1.64  03-Jun-2023  rillig indent: clean up handling of brace indentation

No functional change.
 1.63  02-Jun-2023  rillig indent: fix formatting of declarations with preprocessing lines
 1.62  23-May-2023  rillig indent: split debug output into paragraphs

The paragraphs separate the different processing steps: getting a token
from the lexer, processing the token, updating the parser state, sending
a finished line to the output.
 1.61  18-May-2023  rillig indent: manually wrap overly long lines

No functional change.
 1.60  18-May-2023  rillig indent: switch to standard code style

Taken from share/misc/indent.pro.

Indent does not wrap code to fit into the line width, it only does so
for comments. The 'INDENT OFF' sections and too long lines will be
addressed in a follow-up commit.

No functional change.
 1.59  16-May-2023  rillig indent: allow comments in column 1 to be formatted
 1.58  15-May-2023  rillig indent: format its own code, extend some comments

With manual corrections, as there are still some bugs left.

No functional change.
 1.57  15-May-2023  rillig indent: remove redundant include lines
 1.56  15-May-2023  rillig indent: clean up memory and buffer management

Remove the need to explicitly initialize the buffers. To avoid
subtracting null pointers or comparing them using '<', migrate the
buffers from the (start, end) form to the (start, len) form. This form
also avoids inconsistencies in whether 'buf.e == buf.s' or 'buf.s ==
buf.e' is used.

Make buffer.st const, to avoid accidental modification of the buffer's
content.

Replace '*buf.e++ = ch' with buf_add_char, to avoid having to keep track
of how much unwritten space is left in the buffer. Remove all safety
margins, that is, no more unchecked access to buf.st[-1] or appending
using '*buf.e++'.

Fix line number counting in lex_word for words that contain line breaks.

No functional change.
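
A buffer in the (start, len) form might look like the following sketch. Only the name buf_add_char comes from the entry above; the field names, the signature and the growth policy are assumptions, not the actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of a (start, len) buffer; the field names are assumptions. */
struct buffer {
	char *s;	/* start of the allocation, NULL while empty */
	size_t len;	/* number of bytes in use */
	size_t cap;	/* number of bytes allocated */
};

/* Append one byte, growing as needed; aborts if the allocation fails. */
static void
buf_add_char(struct buffer *buf, char ch)
{
	if (buf->len == buf->cap) {
		size_t cap = buf->cap == 0 ? 16 : 2 * buf->cap;
		char *s = realloc(buf->s, cap);
		if (s == NULL)
			abort();
		buf->s = s;
		buf->cap = cap;
	}
	buf->s[buf->len++] = ch;
}
```

A zero-initialized struct is already a valid empty buffer, so no explicit initialization is needed, and no pointer arithmetic is ever done on a null pointer.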
 1.55  14-May-2023  rillig indent: remove foreign RCS IDs
 1.54  13-May-2023  rillig indent: move debugging code to separate file

No functional change.
 1.53  12-May-2023  rillig indent: rename placeholder symbol for parser stack

No functional change outside debug mode.
 1.52  12-May-2023  rillig tests/indent: test pushing the placeholder symbol to the parser stack
 1.51  12-May-2023  rillig indent: condense code for handling spaced expressions

No functional change outside debug mode.
 1.50  11-May-2023  rillig indent: remove buggy code for swapping tokens

It is not the job of an indenter to swap tokens, even if it's only about
placing comments elsewhere. The code that swapped the tokens was
complicated, buggy and impossible to understand.

In -br (brace right) mode, indent no longer moves a '{' from the
beginning of a line to the end of the previous line, as that was handled
by the token swapping code as well. This change is unintended, but it
will be easier to re-add that now that the code is simpler.
 1.49  22-Apr-2022  rillig indent: remove FreeBSD IDs

Most of the IDs were empty anyway.
 1.48  07-Nov-2021  rillig indent: various cleanups

Make several comments more precise.

Rename process_end_of_file to process_eof to match the token name.

Change the order of assignments in analyze_comment to keep the com_ind
computations closer together.

In copy_comment_wrap, use pointer difference instead of pointer addition
to stay away from undefined behavior.

No functional change.
 1.47  29-Oct-2021  rillig indent: remove redundant comments, remove punctuation from debug log

The comment about 'null stmt' between braces probably meant 'no
statements between braces'.

The comments at psym_switch_expr only repeated what the code says or had
been outdated 29 years ago already since opt.case_indent does not have
to be 'one level down'.

In the debug log, the quotes around the symbol names are not necessary
after a ':'. The parse stack also does not need this much punctuation.

Reducing a do-while loop to nothing instead of a statement saves a few
CPU cycles. It works because after each lbrace, a stmt is pushed to the
parser stack. This stmt can only ever be reduced to a stmt_list but
never be removed.
 1.46  29-Oct-2021  rillig indent: remove redundant comments

The comments only repeated what the constants for the parser symbols
already express in their names. In the past, the names of these
constants were inconsistent and misleading; back then, it made sense to
make the comments express the actual meaning of the constants.
 1.45  29-Oct-2021  rillig indent: reduce indentation in parse, extract decl_level

No functional change.
 1.44  28-Oct-2021  rillig indent: clean up indentation, comments, reduce

No functional change.
 1.43  28-Oct-2021  rillig indent: clean up comments and function names

Having accurate names for the lexer symbols and the parser symbols makes
most of the comments redundant. Remove these.

Rename process_decl to process_type, to match the name of the
corresponding lexer symbol. In this phase, it's just a single type
token, not a whole declaration.

No functional change.
 1.42  26-Oct-2021  rillig indent: run indent on its own source code

With manual corrections afterwards, to compensate for the remaining bugs
in indent.

Without the type definitions in .indent.pro, the opening braces of the
functions kw_name and lexi_alnum would not be at the beginning of the
line.
 1.41  25-Oct-2021  rillig indent: do not output token in debug mode

When the parse stack is manipulated, the text of the token is not
relevant anymore and may even be confusing, for example when parsing
if_expr, the token may contain "}".
 1.40  25-Oct-2021  rillig indent: rename search_brace to search_stmt

No functional change.
 1.39  25-Oct-2021  rillig indent: split type token_type into 3 separate types

Previously, token_type was used for 3 different purposes:

1. symbol types from the lexer
2. symbol types on the parser stack
3. kind of control statement for 'if (expr)' and similar statements

Splitting the 41 constants into separate types makes it immediately
clear that the parser stack never handles comments, preprocessing lines,
newlines, form feeds, the inner structure of expressions.

Previously, the constant switch_expr was especially confusing since it
was used for 3 different purposes: when returned from lexi, it
represented the keyword 'switch', in the parser stack it represented
'switch (expr)', and it was used for a statement head as well.

The only overlap between the lexer symbols and the parser symbols are
'{' and '}', and the keywords 'do' and 'else'. To increase confusion,
the constants of the previous token_type were in apparently random
order and before 2021, they had cryptic, highly abbreviated names.

No functional change.
 1.38  24-Oct-2021  rillig indent: split kw_do_or_else into separate constants

It was unnecessarily confusing to have the token types keyword_do_else,
keyword_do and keyword_else at the same time, without any hint in what
they differed.

Some of the token types seem to be used by the lexer while others are
used in the parse stack. Maybe all token types can be partitioned into
these groups, which would suggest using two different types for them.
And if not, it's still clearer to have this distinction in the names of
the constants.

No functional change.
 1.37  24-Oct-2021  rillig indent: run indent on its own source code

With manual corrections afterwards. Indent still does not get
extra_expr_indent correctly, it also indents global variables after
tagged declarations too deep.

No functional change.
 1.36  20-Oct-2021  rillig indent: rename parser stack variables

No functional change.
 1.35  08-Oct-2021  rillig indent: clean up comments, parentheses, debug messages, boolean operator

No functional change.
 1.34  08-Oct-2021  rillig indent: clean up 'parse', add test for dangling else

No functional change.
 1.33  07-Oct-2021  rillig indent: rename opt.btype_2 to brace_same_line

No functional change.
 1.32  07-Oct-2021  rillig indent: let the code breathe a bit by inserting empty lines

No functional change.
 1.31  07-Oct-2021  rillig indent: clean up comments

No functional change.
 1.30  07-Oct-2021  rillig indent: remove redundant comments

No functional change.
 1.29  05-Oct-2021  rillig indent: fix Clang-Tidy warnings, clean up bakcopy

The comment above and inside bakcopy had been outdated for at least the
last 28 years, the backup file is named "%s.BAK", not ".B%s".

Prevent buffer overflow for very long filenames (sprintf -> snprintf).
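
The kind of fix mentioned here can be sketched as below. The helper backup_name is hypothetical and is not the code from bakcopy; only the "%s.BAK" pattern and the move from sprintf to snprintf come from this entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Write "<name>.BAK" to dst; return false if the result was truncated. */
static bool
backup_name(char *dst, size_t dst_size, const char *name)
{
	int n = snprintf(dst, dst_size, "%s.BAK", name);
	return n >= 0 && (size_t)n < dst_size;
}
```

Unlike sprintf, snprintf never writes more than dst_size bytes, and its return value lets the caller detect a name that did not fit.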
 1.28  05-Oct-2021  rillig indent: rename local char variable, reduce scope of counters

No functional change.
 1.27  26-Sep-2021  rillig indent: let indent format its own code -- in supervised mode

After running indent on the code, I manually selected each change that
now looks better than before. The remaining changes are left for later.
All in all, indent did a pretty good job, except for syntactic additions
from after 1990, but that was to be expected. Examples for such
additions are GCC's __attribute__ and C99 designated initializers.

Indent has only few knobs to tune the indentation. The knob for the
continuation indentation applies to function declarations as well as to
expressions. The knob for indentation of local variable declarations
applies to struct members as well, even if these are members of a
top-level struct.

Several code comments crossed the right margin in column 78. Several
other code comments were correctly broken though. The cause for this
difference was not obvious.

No functional change.
 1.26  25-Sep-2021  rillig indent: un-abbreviate a few parser_state members, clean up comments

No functional change.
 1.25  25-Sep-2021  rillig indent: convert remaining ibool to bool

No functional change intended.
 1.24  25-Sep-2021  rillig indent: prepare for lint's strict bool mode

Before C99, C had no boolean type. Instead, indent used int for that,
just like many other programs. Even with C99, bool and int can be used
interchangeably in many situations, such as querying '!i' or '!ptr' or
'cond == 0'.

Since January 2021, lint provides the strict bool mode, which makes bool
a non-arithmetic type that is incompatible with any other type. Having
clearly separate types helps in understanding the code.

To migrate indent to strict bool mode, the first step is to apply all
changes that keep the resulting binary the same. Since sizeof(bool) is
1 and sizeof(int) is 4, the type ibool serves as an intermediate type.
For now it is defined to int, later it will become bool.

The current code compiles cleanly in C99 and C11 mode, as well as in
lint's strict bool mode. There are a few tricky places:

In args.c in 'struct pro', there are two types of options: boolean and
integer. Boolean options point to a bool variable, integer options
point to an int variable. To keep the current structure of the code,
the pointer has been changed to 'void *'. To ensure type safety, the
definition of the options is done via preprocessor magic, which in C11
mode ensures the correct pointer types. (Add CFLAGS+=-std=gnu11 at the
very bottom of the Makefile.)

In indent.c in process_preprocessing, a boolean variable is
post-incremented. That variable is only assigned to another variable,
and that variable is only used in a boolean context. To provoke a
different behavior between the '++' and the '= true', the source code
to be indented would need 1 << 32 preprocessing directives, which is
unlikely to happen in practice.

In io.c in dump_line, the variables ps.in_stmt and ps.in_decl only ever
get the values 0 and 1. For these values, the expressions 'a & ~b' and
'a && !b' are equivalent, in all versions of C. The compiler may
generate different code for them, though.

In io.c in parse_indent_comment, the assignment to inhibit_formatting
takes place in integer context. If the compiler is smart enough to
detect the possible values of on_off, it may generate the same code
before and after the change, but that is rather unlikely.

The second step of the migration will be to replace ibool with bool,
step by step, just in case there are any hidden gotchas in the code,
such as sizeof or pointer casts.

No change to the resulting binary.
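
The claim about dump_line in this entry, that 'a & ~b' and 'a && !b' agree when both operands are 0 or 1, can be checked with a small sketch. The function name forms_agree is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Return whether the bitwise and the logical form yield the same value. */
static bool
forms_agree(int a, int b)
{
	return (a & ~b) == (a && !b);
}
```

The two forms agree for all four combinations of 0 and 1, but not in general: for a = 2 and b = 1 the bitwise form yields 2 while the logical form yields 0.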
 1.23  25-Sep-2021  rillig indent: remove ifdef for lint

NetBSD lint does not need them anymore, FreeBSD does not have lint.
 1.22  25-Sep-2021  rillig indent: group global variables for token buffer

No functional change.
 1.21  25-Sep-2021  rillig indent: inline macro 'token'

No functional change.
 1.20  25-Sep-2021  rillig indent: group global variables for code buffer

No functional change.
 1.19  25-Sep-2021  rillig indent: rename variables of type token_type

The previous variable name 'code' conflicts with the buffer of the same
name.

No functional change.
 1.18  12-Mar-2021  rillig indent: use consistent indentation for 'else'

Half of the code used -ce, the other half the opposite -nce.

No functional change.
 1.17  09-Mar-2021  rillig indent: make token names more precise

The previous 'casestmt' was wrong since a case label is not a statement
at all.

The previous 'swstmt' was overly short, and wrong as well, since it
represents only the 'switch (expr)' part, which is not a complete switch
statement. Same for 'ifstmt', 'whilestmt', 'forstmt'.

The previous word 'head' was not precise enough since it didn't specify
exactly where the head ends and the body starts. Especially for
handling the dangling else, this distinction is important.

No functional change.
 1.16  09-Mar-2021  rillig indent: extract reduce_stmt from reduce

This refactoring reduces the indentation of the code, as well as
removing any ambiguity as to which 'switch' statement a 'break' belongs,
as there are no more nested 'switch' statements.

No functional change.
 1.15  09-Mar-2021  rillig indent: manually indent comments

It's strange that indent's own code is not formatted by indent itself,
which would be a good demonstration of its capabilities.

In its current state, I don't trust indent to get even the tokenization
correct, therefore the only safe way is to format the code manually.
 1.14  07-Mar-2021  rillig indent: in debug mode, output detailed token information

The main ingredient for understanding how indent works is the tokenizer
and the 4 buffers in which the text is collected.

Inspecting this debug log for the test comment-line-end makes it obvious
why indent messes up code that contains '//' comments. The cause is
that indent interprets '//' as an operator, just like '&&' or '||'. The
sequence '/////' is interpreted as a single operator as well, by the
way.

Since '//' is interpreted as an ordinary operator, any words following
it are plain identifiers, usually several of them in a row, which is a
syntax error. Depending on the context, the operator '//' is either a
unary operator (no space around) or a binary operator (space around).
This explains why the word 'line-end' is expanded to 'line - end'.

No functional change outside of debug mode.
 1.13  07-Mar-2021  rillig indent: for the token types, use enum instead of #define

This makes it easier to step through the code in a debugger.

No functional change.
 1.12  07-Mar-2021  rillig indent: use all headers in all files

This is a prerequisite for converting the token types to an enum instead
of a preprocessor define, since the return type of lexi will become
token_type. Having the enum will make debugging easier.

There was a single naming collision, which forced the variable in
scan_profile to be renamed. All other token names are used nowhere
else.

No change to the resulting binary.
 1.11  06-Mar-2021  rillig indent: fix space-tab alignment in indent's own code

These parts are not fixed automatically by indent since they are in box
comments.

No functional change.
 1.10  19-Oct-2019  christos use stdarg, annotate function as __printflike and fix broken formats.
 1.9  04-Apr-2019  kamil Upgrade indent(1)

Merge all the changes from the recent FreeBSD HEAD snapshot
into our local copy.

FreeBSD actively maintains this program in their sources and their
repository contains over 100 commits with changes.

Keep the delta between the FreeBSD and NetBSD versions to an absolute
minimum, mostly RCS Id and compatibility fixes.

Major changes in this import:

- Added an option -ldi<N> to control indentation of local variable names.
- Added option -P for loading user-provided files as profiles
- Added -tsn for setting tabsize
- Rename -nsac/-sac ("space after cast") to -ncs/-cs
- Added option -fbs, which enables (disables) splitting the function declaration and opening brace across two lines.
- Respect SIMPLE_BACKUP_SUFFIX environment variable in indent(1)
- Group global option variables into an options structure
- Use bsearch() for looking up type keywords.
- Don't produce unneeded space character in function declarators
- Don't unnecessarily add a blank before a comment ends.
- Don't ignore newlines after comments that follow braces.

Merge the FreeBSD indent(1) tests with our ATF framework.
All tests pass.

Upgrade prepared by Manikishan Ghantasala.
Final polishing by myself.
 1.8  03-Feb-2019  mrg - add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily
 1.7  07-Aug-2003  agc branches: 1.7.98;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22365, verified by myself.
 1.6  26-May-2002  wiz Remove #ifndef'd __STDC__ code. ANSIfy.
 1.5  19-Oct-1997  lukem WARNSify, fix .Nm usage, deprecate register, use <err.h>, KNFify (with indent!;)
 1.4  18-Oct-1997  mrg merge lite-2.
 1.3  09-Jan-1997  tls RCS ID police
 1.2  01-Aug-1993  mycroft Add RCS identifiers.
 1.1  09-Apr-1993  cgd branches: 1.1.1;
added, from net/2 (patch 124).
 1.1.1.2  04-Apr-2019  kamil FreeBSD indent r340138
 1.1.1.1  06-Jun-1993  mrg 4.4BSD-Lite2
 1.7.98.2  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.7.98.1  10-Jun-2019  christos Sync with HEAD
 1.79.2.1  02-Aug-2025  perseant Sync with HEAD
