History log of /src/usr.bin/indent/parse.c |
Revision | | Date | Author | Comments |
1.85 |
| 07-Jan-2025 |
rillig | indent: condense and simplify parsing code
|
1.84 |
| 07-Jan-2025 |
rillig | indent: fix indentation of statement after deeply nested 'if'
|
1.83 |
| 07-Jan-2025 |
rillig | indent: fix indentation of comment above 'else'
Previously, indent assumed that no 'else' would follow.
|
1.82 |
| 04-Jan-2025 |
rillig | indent: fix indentation of adjacent multi-line initializers
The main topic of this change is parse.c:66, which makes the indentation of statements uniform with the indentation of other parser symbols.
That change had the side effect of messing up the indentation of files whose first line does not start in column 1, such as in ps_ind_level.c. To fix this side effect, the initial indentation must be determined before pushing the placeholder token psym_stmt during initialization.
|
1.81 |
| 04-Jan-2025 |
rillig | indent: make debug log more uniform
|
1.80 |
| 04-Jan-2025 |
rillig | indent: make debug output easier to read
The previous format had the values of the parser state on the left side and the corresponding names on the right side. While it looked nicely aligned, it was not suitable for focusing on the actual data. Replace this format with the more common "key: value" format.
Use the names of the enum constants in the debug log, instead of the previous "nice" names that needed one more level of mental translation and in some cases contained unbalanced punctuation such as '{'.
|
1.79 |
| 18-Jun-2023 |
rillig | branches: 1.79.2; indent: untangle code for handling the statement indentation
The expression 'psyms.level-- - 2' did too much in a single line, so extract the '--' to a separate statement, to highlight the symmetry between the 'sym' and 'ind_level' code.
No functional change.
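The readability point can be illustrated with a minimal sketch. The struct, its fields and both function names are hypothetical and are not taken from parse.c; only the shape of the expression with '-- - 2' follows the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a parser symbol stack, not the real struct. */
struct sym_stack {
	int sym[8];
	int ind_level[8];
	size_t len;
};

/* Condensed form: the decrement hides inside the index expression. */
static int
pop_condensed(struct sym_stack *st)
{
	assert(st->len >= 2);
	return st->ind_level[st->len-- - 2];
}

/* Untangled form: shrink the stack first, then read the new top. */
static int
pop_untangled(struct sym_stack *st)
{
	assert(st->len >= 2);
	st->len--;
	return st->ind_level[st->len - 1];
}
```

Both functions read the same element and leave the same length, which is why such a rewrite can be a purely cosmetic change.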
|
1.78 |
| 17-Jun-2023 |
rillig | indent: miscellaneous cleanups
No binary change.
|
1.77 |
| 14-Jun-2023 |
rillig | indent: clean up the code, add a few tests
|
1.76 |
| 14-Jun-2023 |
rillig | indent: allow more than 128 brace levels
|
1.75 |
| 14-Jun-2023 |
rillig | indent: fix out-of-bounds read when reducing a statement
Since parse.c 1.73 from today. The parser symbol psym_stmt_list that was removed in that commit acted as a stop symbol, so that psyms_reduce_stmt would save a memory access.
|
1.74 |
| 14-Jun-2023 |
rillig | indent: clean up array indexing for parser symbols
With 'top' pointing to the actual top element, the array was indexed in the closed range from 0 to top. All other arrays are indexed by the usual half-open interval from 0 to len.
No functional change.
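A small sketch of the two indexing conventions, assuming a fixed-size array; the type and function names are made up for illustration and do not come from indent.

```c
#include <assert.h>
#include <stddef.h>

enum { CAP = 4 };

/* Closed range: 'top' is the index of the last valid element. */
struct stack_top {
	int item[CAP];
	int top;		/* valid indexes are 0..top, inclusive */
};

/* Half-open range: 'len' is the number of valid elements. */
struct stack_len {
	int item[CAP];
	size_t len;		/* valid indexes are 0..len-1 */
};

static int
sum_top(const struct stack_top *s)
{
	int sum = 0;
	for (int i = 0; i <= s->top; i++)	/* note the '<=' */
		sum += s->item[i];
	return sum;
}

static int
sum_len(const struct stack_len *s)
{
	int sum = 0;
	for (size_t i = 0; i < s->len; i++)	/* the usual '<' */
		sum += s->item[i];
	return sum;
}
```

With the half-open form, an empty stack is simply 'len == 0', and the loop has the same shape as loops over any other array.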
|
1.73 |
| 14-Jun-2023 |
rillig | indent: merge parser symbols for stmt and stmt_list
They were handled in exactly the same way.
|
1.72 |
| 10-Jun-2023 |
rillig | indent: fix stack overflow, add more tests
For several parser symbols, 2 symbols are pushed in a row, which led to an out-of-bounds write.
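As an illustration of the failure mode, and not necessarily of the fix that was committed, the sketch below assumes a fixed-capacity stack. All names are hypothetical. The point is that a check for one free slot is not enough when two symbols are pushed back to back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { CAP = 4 };

struct sym_stack {
	int sym[CAP];
	size_t len;
};

/* Push one symbol; refuse instead of writing past the array. */
static bool
push(struct sym_stack *st, int sym)
{
	if (st->len >= CAP)
		return false;
	st->sym[st->len++] = sym;
	return true;
}

/* Push two symbols in a row, all or nothing. */
static bool
push_pair(struct sym_stack *st, int first, int second)
{
	if (CAP - st->len < 2)	/* need room for both, not just one */
		return false;
	st->sym[st->len++] = first;
	st->sym[st->len++] = second;
	return true;
}
```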
|
1.71 |
| 10-Jun-2023 |
rillig | indent: miscellaneous cleanups
|
1.70 |
| 09-Jun-2023 |
rillig | indent: format its own code
|
1.69 |
| 07-Jun-2023 |
rillig | indent: extract the stack of parser symbols to a separate struct
No functional change.
|
1.68 |
| 06-Jun-2023 |
rillig | indent: sort functions in call order
No functional change.
|
1.67 |
| 06-Jun-2023 |
rillig | indent: compute indentation of 'case' labels on-demand
One less moving part to keep track of.
No functional change.
|
1.66 |
| 05-Jun-2023 |
rillig | indent: rename variables, clean up comments
No binary change.
|
1.65 |
| 04-Jun-2023 |
rillig | indent: track the kind of '{' on the parser stack
|
1.64 |
| 03-Jun-2023 |
rillig | indent: clean up handling of brace indentation
No functional change.
|
1.63 |
| 02-Jun-2023 |
rillig | indent: fix formatting of declarations with preprocessing lines
|
1.62 |
| 23-May-2023 |
rillig | indent: split debug output into paragraphs
The paragraphs separate the different processing steps: getting a token from the lexer, processing the token, updating the parser state, sending a finished line to the output.
|
1.61 |
| 18-May-2023 |
rillig | indent: manually wrap overly long lines
No functional change.
|
1.60 |
| 18-May-2023 |
rillig | indent: switch to standard code style
Taken from share/misc/indent.pro.
Indent does not wrap code to fit into the line width, it only does so for comments. The 'INDENT OFF' sections and too long lines will be addressed in a follow-up commit.
No functional change.
|
1.59 |
| 16-May-2023 |
rillig | indent: allow comments in column 1 to be formatted
|
1.58 |
| 15-May-2023 |
rillig | indent: format its own code, extend some comments
With manual corrections, as there are still some bugs left.
No functional change.
|
1.57 |
| 15-May-2023 |
rillig | indent: remove redundant include lines
|
1.56 |
| 15-May-2023 |
rillig | indent: clean up memory and buffer management
Remove the need to explicitly initialize the buffers. To avoid subtracting null pointers or comparing them using '<', migrate the buffers from the (start, end) form to the (start, len) form. This form also avoids inconsistencies in whether 'buf.e == buf.s' or 'buf.s == buf.e' is used.
Make buffer.st const, to avoid accidental modification of the buffer's content.
Replace '*buf.e++ = ch' with buf_add_char, to avoid having to keep track how much unwritten space is left in the buffer. Remove all safety margins, that is, no more unchecked access to buf.st[-1] or appending using '*buf.e++'.
Fix line number counting in lex_word for words that contain line breaks.
No functional change.
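A minimal sketch of a (start, len) buffer with a checked append, under stated assumptions: the name buf_add_char comes from the commit message, but the field names, the growth policy and the error handling below are guesses and may differ from the real code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout; a zero-initialized buffer is a valid empty one. */
struct buffer {
	char *s;	/* start of the allocated memory, or NULL */
	size_t len;	/* number of bytes in use */
	size_t cap;	/* number of bytes allocated */
};

/* Append one character, growing the buffer when it is full. */
static void
buf_add_char(struct buffer *buf, char ch)
{
	if (buf->len == buf->cap) {
		size_t new_cap = buf->cap == 0 ? 16 : 2 * buf->cap;
		char *p = realloc(buf->s, new_cap);
		if (p == NULL)
			abort();
		buf->s = p;
		buf->cap = new_cap;
	}
	buf->s[buf->len++] = ch;
}
```

Because the emptiness test is 'len == 0' and the fullness test is 'len == cap', no pointer is ever subtracted from or compared with a null pointer, and callers do not need to track the remaining space themselves.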
|
1.55 |
| 14-May-2023 |
rillig | indent: remove foreign RCS IDs
|
1.54 |
| 13-May-2023 |
rillig | indent: move debugging code to separate file
No functional change.
|
1.53 |
| 12-May-2023 |
rillig | indent: rename placeholder symbol for parser stack
No functional change outside debug mode.
|
1.52 |
| 12-May-2023 |
rillig | tests/indent: test pushing the placeholder symbol to the parser stack
|
1.51 |
| 12-May-2023 |
rillig | indent: condense code for handling spaced expressions
No functional change outside debug mode.
|
1.50 |
| 11-May-2023 |
rillig | indent: remove buggy code for swapping tokens
It is not the job of an indenter to swap tokens, even if it's only about placing comments elsewhere. The code that swapped the tokens was complicated, buggy and impossible to understand.
In -br (brace right) mode, indent no longer moves a '{' from the beginning of a line to the end of the previous line, as that was handled by the token swapping code as well. This change is unintended, but it will be easier to re-add that now that the code is simpler.
|
1.49 |
| 22-Apr-2022 |
rillig | indent: remove FreeBSD IDs
Most of the IDs were empty anyway.
|
1.48 |
| 07-Nov-2021 |
rillig | indent: various cleanups
Make several comments more precise.
Rename process_end_of_file to process_eof to match the token name.
Change the order of assignments in analyze_comment to keep the com_ind computations closer together.
In copy_comment_wrap, use pointer difference instead of pointer addition to stay away from undefined behavior.
No functional change.
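The pointer remark can be sketched as follows. The function name is hypothetical and the code is not taken from copy_comment_wrap; it only shows the general technique. Forming 'p + n' is undefined once the result lies more than one element past the end of the array, while 'end - p' is always defined for two pointers into the same array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return whether n more bytes fit between p and end.
 * The risky variant would be 'p + n <= end', which may compute an
 * out-of-range pointer before the comparison even takes place.
 */
static bool
fits(const char *p, const char *end, size_t n)
{
	assert(p <= end);
	return (size_t)(end - p) >= n;
}
```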
|
1.47 |
| 29-Oct-2021 |
rillig | indent: remove redundant comments, remove punctuation from debug log
The comment about 'null stmt' between braces probably meant 'no statements between braces'.
The comments at psym_switch_expr only repeated what the code says or had been outdated 29 years ago already since opt.case_indent does not have to be 'one level down'.
In the debug log, the quotes around the symbol names are not necessary after a ':'. The parse stack also does not need this much punctuation.
Reducing a do-while loop to nothing instead of a statement saves a few CPU cycles. It works because after each lbrace, a stmt is pushed to the parser stack. This stmt can only ever be reduced to a stmt_list but never be removed.
|
1.46 |
| 29-Oct-2021 |
rillig | indent: remove redundant comments
The comments only repeated what the constants for the parser symbols already express in their names. In the past, the names of these constants were inconsistent and misleading; back then, it made sense to make the comments express the actual meaning of the constants.
|
1.45 |
| 29-Oct-2021 |
rillig | indent: reduce indentation in parse, extract decl_level
No functional change.
|
1.44 |
| 28-Oct-2021 |
rillig | indent: clean up indentation, comments, reduce
No functional change.
|
1.43 |
| 28-Oct-2021 |
rillig | indent: clean up comments and function names
Having accurate names for the lexer symbols and the parser symbols makes most of the comments redundant. Remove these.
Rename process_decl to process_type, to match the name of the corresponding lexer symbol. In this phase, it's just a single type token, not a whole declaration.
No functional change.
|
1.42 |
| 26-Oct-2021 |
rillig | indent: run indent on its own source code
With manual corrections afterwards, to compensate for the remaining bugs in indent.
Without the type definitions in .indent.pro, the opening braces of the functions kw_name and lexi_alnum would not be at the beginning of the line.
|
1.41 |
| 25-Oct-2021 |
rillig | indent: do not output token in debug mode
When the parse stack is manipulated, the text of the token is not relevant anymore and may even be confusing, for example when parsing if_expr, the token may contain "}".
|
1.40 |
| 25-Oct-2021 |
rillig | indent: rename search_brace to search_stmt
No functional change.
|
1.39 |
| 25-Oct-2021 |
rillig | indent: split type token_type into 3 separate types
Previously, token_type was used for 3 different purposes:
1. symbol types from the lexer
2. symbol types on the parser stack
3. kind of control statement for 'if (expr)' and similar statements
Splitting the 41 constants into separate types makes it immediately clear that the parser stack never handles comments, preprocessing lines, newlines, form feeds, or the inner structure of expressions.
Previously, the constant switch_expr was especially confusing since it was used for 3 different purposes: when returned from lexi, it represented the keyword 'switch', in the parser stack it represented 'switch (expr)', and it was used for a statement head as well.
The only overlap between the lexer symbols and the parser symbols are '{' and '}', and the keywords 'do' and 'else'. To increase confusion, the constants of the previous token_type were in apparently random order and before 2021, they had cryptic, highly abbreviated names.
No functional change.
|
1.38 |
| 24-Oct-2021 |
rillig | indent: split kw_do_or_else into separate constants
It was unnecessarily confusing to have the token types keyword_do_else, keyword_do and keyword_else at the same time, without any hint in what they differed.
Some of the token types seem to be used by the lexer while others are used in the parse stack. Maybe all token types can be partitioned into these groups, which would suggest to use two different types for them. And if not, it's still clearer to have this distinction in the names of the constants.
No functional change.
|
1.37 |
| 24-Oct-2021 |
rillig | indent: run indent on its own source code
With manual corrections afterwards. Indent still does not get extra_expr_indent correctly, it also indents global variables after tagged declarations too deep.
No functional change.
|
1.36 |
| 20-Oct-2021 |
rillig | indent: rename parser stack variables
No functional change.
|
1.35 |
| 08-Oct-2021 |
rillig | indent: clean up comments, parentheses, debug messages, boolean operator
No functional change.
|
1.34 |
| 08-Oct-2021 |
rillig | indent: clean up 'parse', add test for dangling else
No functional change.
|
1.33 |
| 07-Oct-2021 |
rillig | indent: rename opt.btype_2 to brace_same_line
No functional change.
|
1.32 |
| 07-Oct-2021 |
rillig | indent: let the code breathe a bit by inserting empty lines
No functional change.
|
1.31 |
| 07-Oct-2021 |
rillig | indent: clean up comments
No functional change.
|
1.30 |
| 07-Oct-2021 |
rillig | indent: remove redundant comments
No functional change.
|
1.29 |
| 05-Oct-2021 |
rillig | indent: fix Clang-Tidy warnings, clean up bakcopy
The comment above and inside bakcopy had been outdated for at least the last 28 years, the backup file is named "%s.BAK", not ".B%s".
Prevent buffer overflow for very long filenames (sprintf -> snprintf).
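A sketch of the sprintf to snprintf pattern, assuming a caller-provided destination array. The helper name is hypothetical; only the "%s.BAK" format comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the backup file name in dst.  Return false if the name had to be
 * truncated; dst is never written beyond dst_size bytes.
 */
static bool
make_backup_name(char *dst, size_t dst_size, const char *name)
{
	int n = snprintf(dst, dst_size, "%s.BAK", name);
	return n >= 0 && (size_t)n < dst_size;
}
```

Unlike sprintf, snprintf stops at the given size and reports the length the full string would have had, so the caller can detect an overly long filename instead of corrupting memory.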
|
1.28 |
| 05-Oct-2021 |
rillig | indent: rename local char variable, reduce scope of counters
No functional change.
|
1.27 |
| 26-Sep-2021 |
rillig | indent: let indent format its own code -- in supervised mode
After running indent on the code, I manually selected each change that now looks better than before. The remaining changes are left for later. All in all, indent did a pretty good job, except for syntactic additions from after 1990, but that was to be expected. Examples for such additions are GCC's __attribute__ and C99 designated initializers.
Indent has only few knobs to tune the indentation. The knob for the continuation indentation applies to function declarations as well as to expressions. The knob for indentation of local variable declarations applies to struct members as well, even if these are members of a top-level struct.
Several code comments crossed the right margin in column 78. Several other code comments were correctly broken though. The cause for this difference was not obvious.
No functional change.
|
1.26 |
| 25-Sep-2021 |
rillig | indent: un-abbreviate a few parser_state members, clean up comments
No functional change.
|
1.25 |
| 25-Sep-2021 |
rillig | indent: convert remaining ibool to bool
No functional change intended.
|
1.24 |
| 25-Sep-2021 |
rillig | indent: prepare for lint's strict bool mode
Before C99, C had no boolean type. Instead, indent used int for that, just like many other programs. Even with C99, bool and int can be used interchangeably in many situations, such as querying '!i' or '!ptr' or 'cond == 0'.
Since January 2021, lint provides the strict bool mode, which makes bool a non-arithmetic type that is incompatible with any other type. Having clearly separate types helps in understanding the code.
To migrate indent to strict bool mode, the first step is to apply all changes that keep the resulting binary the same. Since sizeof(bool) is 1 and sizeof(int) is 4, the type ibool serves as an intermediate type. For now it is defined to int, later it will become bool.
The current code compiles cleanly in C99 and C11 mode, as well as in lint's strict bool mode. There are a few tricky places:
In args.c in 'struct pro', there are two types of options: boolean and integer. Boolean options point to a bool variable, integer options point to an int variable. To keep the current structure of the code, the pointer has been changed to 'void *'. To ensure type safety, the definition of the options is done via preprocessor magic, which in C11 mode ensures the correct pointer types. (Add CFLAGS+=-std=gnu11 at the very bottom of the Makefile.)
In indent.c in process_preprocessing, a boolean variable is post-incremented. That variable is only assigned to another variable, and that variable is only used in a boolean context. To provoke a different behavior between the '++' and the '= true', the source code to be indented would need 1 << 32 preprocessing directives, which is unlikely to happen in practice.
In io.c in dump_line, the variables ps.in_stmt and ps.in_decl only ever get the values 0 and 1. For these values, the expressions 'a & ~b' and 'a && !b' are equivalent, in all versions of C. The compiler may generate different code for them, though.
In io.c in parse_indent_comment, the assignment to inhibit_formatting takes place in integer context. If the compiler is smart enough to detect the possible values of on_off, it may generate the same code before and after the change, but that is rather unlikely.
The second step of the migration will be to replace ibool with bool, step by step, just in case there are any hidden gotchas in the code, such as sizeof or pointer casts.
No change to the resulting binary.
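The equivalence claimed for dump_line can be checked with a small sketch. The function names are made up; the expressions 'a & ~b' and 'a && !b' are the ones quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Integer style, as written before the migration. */
static int
bitwise_form(int a, int b)
{
	return a & ~b;
}

/* Boolean style, as required by lint's strict bool mode. */
static bool
logical_form(bool a, bool b)
{
	return a && !b;
}

/* Compare both forms for every combination of the values 0 and 1. */
static bool
forms_agree_on_0_and_1(void)
{
	for (int a = 0; a <= 1; a++)
		for (int b = 0; b <= 1; b++)
			if ((bitwise_form(a, b) != 0) !=
			    logical_form(a != 0, b != 0))
				return false;
	return true;
}
```

The restriction to 0 and 1 matters: for a = 2 and b = 1, the bitwise form yields 2, which is nonzero, while the logical form yields false.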
|
1.23 |
| 25-Sep-2021 |
rillig | indent: remove ifdef for lint
NetBSD lint does not need them anymore, FreeBSD does not have lint.
|
1.22 |
| 25-Sep-2021 |
rillig | indent: group global variables for token buffer
No functional change.
|
1.21 |
| 25-Sep-2021 |
rillig | indent: inline macro 'token'
No functional change.
|
1.20 |
| 25-Sep-2021 |
rillig | indent: group global variables for code buffer
No functional change.
|
1.19 |
| 25-Sep-2021 |
rillig | indent: rename variables of type token_type
The previous variable name 'code' conflicts with the buffer of the same name.
No functional change.
|
1.18 |
| 12-Mar-2021 |
rillig | indent: use consistent indentation for 'else'
Half of the code used -ce, the other half the opposite -nce.
No functional change.
|
1.17 |
| 09-Mar-2021 |
rillig | indent: make token names more precise
The previous 'casestmt' was wrong since a case label is not a statement at all.
The previous 'swstmt' was overly short, and wrong as well, since it represents only the 'switch (expr)' part, which is not a complete switch statement. Same for 'ifstmt', 'whilestmt', 'forstmt'.
The previous word 'head' was not precise enough since it didn't specify exactly where the head ends and the body starts. Especially for handling the dangling else, this distinction is important.
No functional change.
|
1.16 |
| 09-Mar-2021 |
rillig | indent: extract reduce_stmt from reduce
This refactoring reduces the indentation of the code, as well as removing any ambiguity as to which 'switch' statement a 'break' belongs, as there are no more nested 'switch' statements.
No functional change.
|
1.15 |
| 09-Mar-2021 |
rillig | indent: manually indent comments
It's strange that indent's own code is not formatted by indent itself, which would be a good demonstration of its capabilities.
In its current state, I don't trust indent to get even the tokenization correct, therefore the only safe way is to format the code manually.
|
1.14 |
| 07-Mar-2021 |
rillig | indent: in debug mode, output detailed token information
The main ingredient for understanding how indent works is the tokenizer and the 4 buffers in which the text is collected.
Inspecting this debug log for the test comment-line-end makes it obvious why indent messes up code that contains '//' comments. The cause is that indent interprets '//' as an operator, just like '&&' or '||'. The sequence '/////' is interpreted as a single operator as well, by the way.
Since '//' is interpreted as an ordinary operator, any words following it are plain identifiers, usually several of them in a row, which is a syntax error. Depending on the context, the operator '//' is either a unary operator (no space around) or a binary operator (space around). This explains why the word 'line-end' is expanded to 'line - end'.
No functional change outside of debug mode.
|
1.13 |
| 07-Mar-2021 |
rillig | indent: for the token types, use enum instead of #define
This makes it easier to step through the code in a debugger.
No functional change.
|
1.12 |
| 07-Mar-2021 |
rillig | indent: use all headers in all files
This is a prerequisite for converting the token types to an enum instead of a preprocessor define, since the return type of lexi will become token_type. Having the enum will make debugging easier.
There was a single naming collision, which forced the variable in scan_profile to be renamed. All other token names are used nowhere else.
No change to the resulting binary.
|
1.11 |
| 06-Mar-2021 |
rillig | indent: fix space-tab alignment in indent's own code
These parts are not fixed automatically by indent since they are in box comments.
No functional change.
|
1.10 |
| 19-Oct-2019 |
christos | use stdarg, annotate function as __printflike and fix broken formats.
|
1.9 |
| 04-Apr-2019 |
kamil | Upgrade indent(1)
Merge all the changes from the recent FreeBSD HEAD snapshot into our local copy.
FreeBSD actively maintains this program in their sources and their repository contains over 100 commits with changes.
Keep the delta between the FreeBSD and NetBSD versions to an absolute minimum, mostly RCS Id and compatibility fixes.
Major changes in this import:
- Added an option -ldi<N> to control indentation of local variable names.
- Added option -P for loading user-provided files as profiles
- Added -tsn for setting tabsize
- Rename -nsac/-sac ("space after cast") to -ncs/-cs
- Added option -fbs, which enables (disables) splitting the function declaration and opening brace across two lines.
- Respect SIMPLE_BACKUP_SUFFIX environment variable in indent(1)
- Group global option variables into an options structure
- Use bsearch() for looking up type keywords.
- Don't produce unneeded space character in function declarators
- Don't unnecessarily add a blank before a comment ends.
- Don't ignore newlines after comments that follow braces.
Merge the FreeBSD indent(1) tests with our ATF framework. All tests pass.
Upgrade prepared by Manikishan Ghantasala. Final polishing by myself.
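The bsearch() item from the list above can be sketched like this. The keyword table and the function names are illustrative assumptions, not the table used by indent; bsearch() itself is the standard C library function and requires the array to be sorted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Must stay sorted in strcmp order, otherwise bsearch misses entries. */
static const char *const type_keywords[] = {
	"char", "double", "float", "int", "long", "short", "unsigned", "void",
};

/* The key is the word itself, the element is a pointer into the table. */
static int
cmp_keyword(const void *key, const void *elem)
{
	return strcmp((const char *)key, *(const char *const *)elem);
}

static bool
is_type_keyword(const char *word)
{
	size_t n = sizeof type_keywords / sizeof type_keywords[0];
	return bsearch(word, type_keywords, n, sizeof type_keywords[0],
	    cmp_keyword) != NULL;
}
```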
|
1.8 |
| 03-Feb-2019 |
mrg | - add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in this case, and thus can't be marked __dead easily
|
1.7 |
| 07-Aug-2003 |
agc | branches: 1.7.98; Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22365, verified by myself.
|
1.6 |
| 26-May-2002 |
wiz | Remove #ifndef'd __STDC__ code. ANSIfy.
|
1.5 |
| 19-Oct-1997 |
lukem | WARNSify, fix .Nm usage, deprecate register, use <err.h>, KNFify (with indent!;)
|
1.4 |
| 18-Oct-1997 |
mrg | merge lite-2.
|
1.3 |
| 09-Jan-1997 |
tls | RCS ID police
|
1.2 |
| 01-Aug-1993 |
mycroft | Add RCS identifiers.
|
1.1 |
| 09-Apr-1993 |
cgd | branches: 1.1.1; added, from net/2 (patch 124).
|
1.1.1.2 |
| 04-Apr-2019 |
kamil | FreeBSD indent r340138
|
1.1.1.1 |
| 06-Jun-1993 |
mrg | 4.4BSD-Lite2
|
1.7.98.2 |
| 13-Apr-2020 |
martin | Mostly merge changes from HEAD up to 20200411
|
1.7.98.1 |
| 10-Jun-2019 |
christos | Sync with HEAD
|
1.79.2.1 |
| 02-Aug-2025 |
perseant | Sync with HEAD
|