History log of /src/usr.bin/indent/parse.c
Revision	Date	Author	Comments
1.85	07-Jan-2025	rillig	indent: condense and simplify parsing code
1.84	07-Jan-2025	rillig	indent: fix indentation of statement after deeply nested 'if'
1.83	07-Jan-2025	rillig	indent: fix indentation of comment above 'else' Previously, indent assumed that no 'else' would follow.
1.82	04-Jan-2025	rillig	indent: fix indentation of adjacent multi-line initializers The main topic of this change is parse.c:66, which makes the indentation of statements uniform with the indentation of other parser symbols. That change had the side effect of messing up the indentation of files whose first line does not start in column 1, such as in ps_ind_level.c. To fix this side effect, the initial indentation must be determined before pushing the placeholder token psym_stmt during initialization.
1.81	04-Jan-2025	rillig	indent: make debug log more uniform
1.80	04-Jan-2025	rillig	indent: make debug output easier to read The previous format had the values of the parser state on the left side and the corresponding names on the right side. While it looked nicely aligned, it was not suitable for focusing on the actual data. Replace this format with the more common "key: value" format. Use the names of the enum constants in the debug log, instead of the previous "nice" names that needed one more level of mental translation and in some cases contained unbalanced punctuation such as '{'.
1.79	18-Jun-2023	rillig	branches: 1.79.2; indent: untangle code for handling the statement indentation The expression 'psyms.level-- - 2' did too much in a single line, so extract the '--' to a separate statement, to highlight the symmetry between the 'sym' and 'ind_level' code. No functional change.
1.78	17-Jun-2023	rillig	indent: miscellaneous cleanups No binary change.
1.77	14-Jun-2023	rillig	indent: clean up the code, add a few tests
1.76	14-Jun-2023	rillig	indent: allow more than 128 brace levels
1.75	14-Jun-2023	rillig	indent: fix out-of-bounds read when reducing a statement Since parse.c 1.73 from today. The parser symbol psym_stmt_list that was removed in that commit acted as a stop symbol, so that psyms_reduce_stmt would save a memory access.
1.74	14-Jun-2023	rillig	indent: clean up array indexing for parser symbols With 'top' pointing to the actual top element, the array was indexed in the closed range from 0 to top. All other arrays are indexed by the usual half-open interval from 0 to len. No functional change.
1.73	14-Jun-2023	rillig	indent: merge parser symbols for stmt and stmt_list They were handled in exactly the same way.
1.72	10-Jun-2023	rillig	indent: fix stack overflow, add more tests For several parser symbols, 2 symbols are pushed in a row, which led to an out-of-bounds write.
1.71	10-Jun-2023	rillig	indent: miscellaneous cleanups
1.70	09-Jun-2023	rillig	indent: format its own code
1.69	07-Jun-2023	rillig	indent: extract the stack of parser symbols to a separate struct No functional change.
1.68	06-Jun-2023	rillig	indent: sort functions in call order No functional change.
1.67	06-Jun-2023	rillig	indent: compute indentation of 'case' labels on-demand One less moving part to keep track of. No functional change.
1.66	05-Jun-2023	rillig	indent: rename variables, clean up comments No binary change.
1.65	04-Jun-2023	rillig	indent: track the kind of '{' on the parser stack
1.64	03-Jun-2023	rillig	indent: clean up handling of brace indentation No functional change.
1.63	02-Jun-2023	rillig	indent: fix formatting of declarations with preprocessing lines
1.62	23-May-2023	rillig	indent: split debug output into paragraphs The paragraphs separate the different processing steps: getting a token from the lexer, processing the token, updating the parser state, sending a finished line to the output.
1.61	18-May-2023	rillig	indent: manually wrap overly long lines No functional change.
1.60	18-May-2023	rillig	indent: switch to standard code style Taken from share/misc/indent.pro. Indent does not wrap code to fit into the line width, it only does so for comments. The 'INDENT OFF' sections and too long lines will be addressed in a follow-up commit. No functional change.
1.59	16-May-2023	rillig	indent: allow comments in column 1 to be formatted
1.58	15-May-2023	rillig	indent: format its own code, extend some comments With manual corrections, as there are still some bugs left. No functional change.
1.57	15-May-2023	rillig	indent: remove redundant include lines
1.56	15-May-2023	rillig	indent: clean up memory and buffer management Remove the need to explicitly initialize the buffers. To avoid subtracting null pointers or comparing them using '<', migrate the buffers from the (start, end) form to the (start, len) form. This form also avoids inconsistencies in whether 'buf.e == buf.s' or 'buf.s == buf.e' is used. Make buffer.st const, to avoid accidental modification of the buffer's content. Replace 'buf.e++ = ch' with buf_add_char, to avoid having to keep track how much unwritten space is left in the buffer. Remove all safety margins, that is, no more unchecked access to buf.st[-1] or appending using 'buf.e++'. Fix line number counting in lex_word for words that contain line breaks. No functional change.
1.55	14-May-2023	rillig	indent: remove foreign RCS IDs
1.54	13-May-2023	rillig	indent: move debugging code to separate file No functional change.
1.53	12-May-2023	rillig	indent: rename placeholder symbol for parser stack No functional change outside debug mode.
1.52	12-May-2023	rillig	tests/indent: test pushing the placeholder symbol to the parser stack
1.51	12-May-2023	rillig	indent: condense code for handling spaced expressions No functional change outside debug mode.
1.50	11-May-2023	rillig	indent: remove buggy code for swapping tokens It is not the job of an indenter to swap tokens, even if it's only about placing comments elsewhere. The code that swapped the tokens was complicated, buggy and impossible to understand. In -br (brace right) mode, indent no longer moves a '{' from the beginning of a line to the end of the previous line, as that was handled by the token swapping code as well. This change is unintended, but it will be easier to re-add that now that the code is simpler.
1.49	22-Apr-2022	rillig	indent: remove FreeBSD IDs Most of the IDs were empty anyway.
1.48	07-Nov-2021	rillig	indent: various cleanups Make several comments more precise. Rename process_end_of_file to process_eof to match the token name. Change the order of assignments in analyze_comment to keep the com_ind computations closer together. In copy_comment_wrap, use pointer difference instead of pointer addition to stay away from undefined behavior. No functional change.
1.47	29-Oct-2021	rillig	indent: remove redundant comments, remove punctuation from debug log The comment about 'null stmt' between braces probably meant 'no statements between braces'. The comments at psym_switch_expr only repeated what the code says or had been outdated 29 years ago already since opt.case_indent does not have to be 'one level down'. In the debug log, the quotes around the symbol names are not necessary after a ':'. The parse stack also does not need this much punctuation. Reducing a do-while loop to nothing instead of a statement saves a few CPU cycles. It works because after each lbrace, a stmt is pushed to the parser stack. This stmt can only ever be reduced to a stmt_list but never be removed.
1.46	29-Oct-2021	rillig	indent: remove redundant comments The comments only repeated what the constants for the parser symbols already express in their names. In the past, the names of these constants were inconsistent and misleading; back then, it made sense to make the comments express the actual meaning of the constants.
1.45	29-Oct-2021	rillig	indent: reduce indentation in parse, extract decl_level No functional change.
1.44	28-Oct-2021	rillig	indent: clean up indentation, comments, reduce No functional change.
1.43	28-Oct-2021	rillig	indent: clean up comments and function names Having accurate names for the lexer symbols and the parser symbols makes most of the comments redundant. Remove these. Rename process_decl to process_type, to match the name of the corresponding lexer symbol. In this phase, it's just a single type token, not a whole declaration. No functional change.
1.42	26-Oct-2021	rillig	indent: run indent on its own source code With manual corrections afterwards, to compensate for the remaining bugs in indent. Without the type definitions in .indent.pro, the opening braces of the functions kw_name and lexi_alnum would not be at the beginning of the line.
1.41	25-Oct-2021	rillig	indent: do not output token in debug mode When the parse stack is manipulated, the text of the token is not relevant anymore and may even be confusing, for example when parsing if_expr, the token may contain "}".
1.40	25-Oct-2021	rillig	indent: rename search_brace to search_stmt No functional change.
1.39	25-Oct-2021	rillig	indent: split type token_type into 3 separate types Previously, token_type was used for 3 different purposes: 1. symbol types from the lexer 2. symbol types on the parser stack 3. kind of control statement for 'if (expr)' and similar statements Splitting the 41 constants into separate types makes it immediately clear that the parser stack never handles comments, preprocessing lines, newlines, form feeds, the inner structure of expressions. Previously, the constant switch_expr was especially confusing since it was used for 3 different purposes: when returned from lexi, it represented the keyword 'switch', in the parser stack it represented 'switch (expr)', and it was used for a statement head as well. The only overlap between the lexer symbols and the parser symbols are '{' and '}', and the keywords 'do' and 'else'. To increase confusion, the constants of the previous token_type were in apparently random order and before 2021, they had cryptic, highly abbreviated names. No functional change.
1.38	24-Oct-2021	rillig	indent: split kw_do_or_else into separate constants It was unnecessarily confusing to have the token types keyword_do_else, keyword_do and keyword_else at the same time, without any hint in what they differed. Some of the token types seem to be used by the lexer while others are used in the parse stack. Maybe all token types can be partitioned into these groups, which would suggest to use two different types for them. And if not, it's still clearer to have this distinction in the names of the constants. No functional change.
1.37	24-Oct-2021	rillig	indent: run indent on its own source code With manual corrections afterwards. Indent still does not get extra_expr_indent correctly, it also indents global variables after tagged declarations too deep. No functional change.
1.36	20-Oct-2021	rillig	indent: rename parser stack variables No functional change.
1.35	08-Oct-2021	rillig	indent: clean up comments, parentheses, debug messages, boolean operator No functional change.
1.34	08-Oct-2021	rillig	indent: clean up 'parse', add test for dangling else No functional change.
1.33	07-Oct-2021	rillig	indent: rename opt.btype_2 to brace_same_line No functional change.
1.32	07-Oct-2021	rillig	indent: let the code breathe a bit by inserting empty lines No functional change.
1.31	07-Oct-2021	rillig	indent: clean up comments No functional change.
1.30	07-Oct-2021	rillig	indent: remove redundant comments No functional change.
1.29	05-Oct-2021	rillig	indent: fix Clang-Tidy warnings, clean up bakcopy The comment above and inside bakcopy had been outdated for at least the last 28 years, the backup file is named "%s.BAK", not ".B%s". Prevent buffer overflow for very long filenames (sprintf -> snprintf).
1.28	05-Oct-2021	rillig	indent: rename local char variable, reduce scope of counters No functional change.
1.27	26-Sep-2021	rillig	indent: let indent format its own code -- in supervised mode After running indent on the code, I manually selected each change that now looks better than before. The remaining changes are left for later. All in all, indent did a pretty good job, except for syntactic additions from after 1990, but that was to be expected. Examples for such additions are GCC's __attribute__ and C99 designated initializers. Indent has only few knobs to tune the indentation. The knob for the continuation indentation applies to function declarations as well as to expressions. The knob for indentation of local variable declarations applies to struct members as well, even if these are members of a top-level struct. Several code comments crossed the right margin in column 78. Several other code comments were correctly broken though. The cause for this difference was not obvious. No functional change.
1.26	25-Sep-2021	rillig	indent: un-abbreviate a few parser_state members, clean up comments No functional change.
1.25	25-Sep-2021	rillig	indent: convert remaining ibool to bool No functional change intended.
1.24	25-Sep-2021	rillig	indent: prepare for lint's strict bool mode Before C99, C had no boolean type. Instead, indent used int for that, just like many other programs. Even with C99, bool and int can be used interchangeably in many situations, such as querying '!i' or '!ptr' or 'cond == 0'. Since January 2021, lint provides the strict bool mode, which makes bool a non-arithmetic type that is incompatible with any other type. Having clearly separate types helps in understanding the code. To migrate indent to strict bool mode, the first step is to apply all changes that keep the resulting binary the same. Since sizeof(bool) is 1 and sizeof(int) is 4, the type ibool serves as an intermediate type. For now it is defined to int, later it will become bool. The current code compiles cleanly in C99 and C11 mode, as well as in lint's strict bool mode. There are a few tricky places: In args.c in 'struct pro', there are two types of options: boolean and integer. Boolean options point to a bool variable, integer options point to an int variable. To keep the current structure of the code, the pointer has been changed to 'void *'. To ensure type safety, the definition of the options is done via preprocessor magic, which in C11 mode ensures the correct pointer types. (Add CFLAGS+=-std=gnu11 at the very bottom of the Makefile.) In indent.c in process_preprocessing, a boolean variable is post-incremented. That variable is only assigned to another variable, and that variable is only used in a boolean context. To provoke a different behavior between the '++' and the '= true', the source code to be indented would need 1 << 32 preprocessing directives, which is unlikely to happen in practice. In io.c in dump_line, the variables ps.in_stmt and ps.in_decl only ever get the values 0 and 1. For these values, the expressions 'a & ~b' and 'a && !b' are equivalent, in all versions of C. The compiler may generate different code for them, though. In io.c in parse_indent_comment, the assignment to inhibit_formatting takes place in integer context. If the compiler is smart enough to detect the possible values of on_off, it may generate the same code before and after the change, but that is rather unlikely. The second step of the migration will be to replace ibool with bool, step by step, just in case there are any hidden gotchas in the code, such as sizeof or pointer casts. No change to the resulting binary.
1.23	25-Sep-2021	rillig	indent: remove ifdef for lint NetBSD lint does not need them anymore, FreeBSD does not have lint.
1.22	25-Sep-2021	rillig	indent: group global variables for token buffer No functional change.
1.21	25-Sep-2021	rillig	indent: inline macro 'token' No functional change.
1.20	25-Sep-2021	rillig	indent: group global variables for code buffer No functional change.
1.19	25-Sep-2021	rillig	indent: rename variables of type token_type The previous variable name 'code' conflicts with the buffer of the same name. No functional change.
1.18	12-Mar-2021	rillig	indent: use consistent indentation for 'else' Half of the code used -ce, the other half the opposite -nce. No functional change.
1.17	09-Mar-2021	rillig	indent: make token names more precise The previous 'casestmt' was wrong since a case label is not a statement at all. The previous 'swstmt' was overly short, and wrong as well, since it represents only the 'switch (expr)' part, which is not a complete switch statement. Same for 'ifstmt', 'whilestmt', 'forstmt'. The previous word 'head' was not precise enough since it didn't specify exactly where the head ends and the body starts. Especially for handling the dangling else, this distinction is important. No functional change.
1.16	09-Mar-2021	rillig	indent: extract reduce_stmt from reduce This refactoring reduces the indentation of the code, as well as removing any ambiguity as to which 'switch' statement a 'break' belongs, as there are no more nested 'switch' statements. No functional change.
1.15	09-Mar-2021	rillig	indent: manually indent comments It's strange that indent's own code is not formatted by indent itself, which would be a good demonstration of its capabilities. In its current state, I don't trust indent to get even the tokenization correct, therefore the only safe way is to format the code manually.
1.14	07-Mar-2021	rillig	indent: in debug mode, output detailed token information The main ingredient for understanding how indent works is the tokenizer and the 4 buffers in which the text is collected. Inspecting this debug log for the test comment-line-end makes it obvious why indent messes up code that contains '//' comments. The cause is that indent interprets '//' as an operator, just like '&&' or '||'. The sequence '/////' is interpreted as a single operator as well, by the way. Since '//' is interpreted as an ordinary operator, any words following it are plain identifiers, usually several of them in a row, which is a syntax error. Depending on the context, the operator '//' is either a unary operator (no space around) or a binary operator (space around). This explains why the word 'line-end' is expanded to 'line - end'. No functional change outside of debug mode.
1.13	07-Mar-2021	rillig	indent: for the token types, use enum instead of #define This makes it easier to step through the code in a debugger. No functional change.
1.12	07-Mar-2021	rillig	indent: use all headers in all files This is a prerequisite for converting the token types to an enum instead of a preprocessor define, since the return type of lexi will become token_type. Having the enum will make debugging easier. There was a single naming collision, which forced the variable in scan_profile to be renamed. All other token names are used nowhere else. No change to the resulting binary.
1.11	06-Mar-2021	rillig	indent: fix space-tab alignment in indent's own code These parts are not fixed automatically by indent since they are in box comments. No functional change.
1.10	19-Oct-2019	christos	use stdarg, annotate function as __printflike and fix broken formats.
1.9	04-Apr-2019	kamil	Upgrade indent(1) Merge all the changes from the recent FreeBSD HEAD snapshot into our local copy. FreeBSD actively maintains this program in their sources and their repository contains over 100 commits with changes. Keep the delta between the FreeBSD and NetBSD versions to an absolute minimum, mostly RCS Id and compatibility fixes. Major changes in this import: - Added an option -ldi<N> to control indentation of local variable names. - Added option -P for loading user-provided files as profiles - Added -tsn for setting tabsize - Rename -nsac/-sac ("space after cast") to -ncs/-cs - Added option -fbs: enables (disables) splitting the function declaration and opening brace across two lines. - Respect SIMPLE_BACKUP_SUFFIX environment variable in indent(1) - Group global option variables into an options structure - Use bsearch() for looking up type keywords. - Don't produce unneeded space character in function declarators - Don't unnecessarily add a blank before a comment ends. - Don't ignore newlines after comments that follow braces. Merge the FreeBSD indent(1) tests with our ATF framework. All tests pass. Upgrade prepared by Manikishan Ghantasala. Final polishing by myself.
1.8	03-Feb-2019	mrg	- add or adjust /* FALLTHROUGH */ where appropriate - add __unreachable() after functions that can return but won't in this case, and thus can't be marked __dead easily
1.7	07-Aug-2003	agc	branches: 1.7.98; Move UCB-licensed code from 4-clause to 3-clause licence. Patches provided by Joel Baker in PR 22365, verified by myself.
1.6	26-May-2002	wiz	Remove #ifndef'd __STDC__ code. ANSIfy.
1.5	19-Oct-1997	lukem	WARNSify, fix .Nm usage, deprecate register, use <err.h>, KNFify (with indent!;)
1.4	18-Oct-1997	mrg	merge lite-2.
1.3	09-Jan-1997	tls	RCS ID police
1.2	01-Aug-1993	mycroft	Add RCS identifiers.
1.1	09-Apr-1993	cgd	branches: 1.1.1; added, from net/2 (patch 124).
1.1.1.2	04-Apr-2019	kamil	FreeBSD indent r340138
1.1.1.1	06-Jun-1993	mrg	4.4BSD-Lite2
1.7.98.2	13-Apr-2020	martin	Mostly merge changes from HEAD up to 20200411
1.7.98.1	10-Jun-2019	christos	Sync with HEAD
1.79.2.1	02-Aug-2025	perseant	Sync with HEAD
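
Revisions 1.72 and 1.74 above concern the stack of parser symbols: it is indexed by the usual half-open interval from 0 to len, and pushing 2 symbols in a row once led to an out-of-bounds write. The following is only a minimal sketch of that idea, not the code from parse.c. The struct layout, the small fixed capacity PSYMS_CAP and the functions psyms_push, psyms_push2 and psyms_top are assumptions made for illustration. Revision 1.76 mentions allowing more than 128 brace levels, so the real stack is evidently not capped like this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of parser symbols; names modeled on the log above. */
enum psym { psym_stmt, psym_lbrace, psym_if_expr };

#define PSYMS_CAP 8	/* assumed tiny fixed capacity, for the sketch only */

struct psyms {
	enum psym sym[PSYMS_CAP];
	size_t len;	/* valid elements are sym[0] .. sym[len - 1] */
};

/* Push one symbol; refuse instead of writing past the end of the array. */
static bool
psyms_push(struct psyms *ps, enum psym sym)
{
	if (ps->len >= PSYMS_CAP)
		return false;
	ps->sym[ps->len++] = sym;
	return true;
}

/* Push two symbols in a row, checking the space for both up front. */
static bool
psyms_push2(struct psyms *ps, enum psym a, enum psym b)
{
	if (PSYMS_CAP - ps->len < 2)
		return false;
	ps->sym[ps->len++] = a;
	ps->sym[ps->len++] = b;
	return true;
}

/* With half-open indexing, the top element lives at index len - 1. */
static enum psym
psyms_top(const struct psyms *ps)
{
	assert(ps->len > 0);
	return ps->sym[ps->len - 1];
}
```

A caller would start from a zero-initialized struct psyms and check the return value of each push. Checking the room for both symbols before writing either one is the point of psyms_push2: a check for a single free slot is not enough when two symbols follow each other.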

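
Revision 1.56 above describes migrating the buffers from the (start, end) form to the (start, len) form and replacing 'buf.e++ = ch' with buf_add_char. A rough sketch of what such a buffer could look like is given here. Only the name buf_add_char comes from the log; its signature, the field names s, len and cap, the growth policy and the helper buf_clear are assumptions, and the real implementation may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout: a zero-initialized struct is a valid empty buffer. */
struct buffer {
	char *s;	/* start of the allocated memory, or NULL */
	size_t len;	/* number of bytes in use */
	size_t cap;	/* number of bytes allocated */
};

/* Append one character, growing the memory when it is full. */
static void
buf_add_char(struct buffer *buf, char ch)
{
	assert(buf->len <= buf->cap);
	if (buf->len == buf->cap) {
		size_t new_cap = buf->cap > 0 ? 2 * buf->cap : 16;
		char *new_s = realloc(buf->s, new_cap);
		if (new_s == NULL)
			abort();
		buf->s = new_s;
		buf->cap = new_cap;
	}
	buf->s[buf->len++] = ch;
}

/* Forget the content but keep the memory for reuse. */
static void
buf_clear(struct buffer *buf)
{
	buf->len = 0;
}
```

In this form the emptiness test is simply 'buf.len == 0', and no code path has to subtract or compare pointers that may be null. The caller no longer tracks how much unwritten space is left, since buf_add_char does that itself.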
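
Revision 1.24 above states that for variables that only ever hold the values 0 and 1, the expressions 'a & ~b' and 'a && !b' are equivalent. The claim can be checked exhaustively with a short sketch. The function names are hypothetical and not taken from indent.

```c
#include <assert.h>
#include <stdbool.h>

/* Old style: bitwise operators on int flags that only hold 0 or 1. */
static int
flag_and_not_int(int a, int b)
{
	assert((a == 0 || a == 1) && (b == 0 || b == 1));
	return a & ~b;
}

/* Strict bool style: logical operators on bool operands. */
static bool
flag_and_not_bool(bool a, bool b)
{
	return a && !b;
}

/* Compare both forms over all inputs from {0, 1}. */
static bool
forms_agree_on_01(void)
{
	for (int a = 0; a <= 1; a++)
		for (int b = 0; b <= 1; b++)
			if (flag_and_not_int(a, b) !=
			    (int)flag_and_not_bool(a != 0, b != 0))
				return false;
	return true;
}
```

The equivalence depends on the restriction to 0 and 1. For a = 2 and b = 0, for example, 'a & ~b' yields 2 while 'a && !b' yields 1, which is why the log entry spells out the value range.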