Skip to content

Support plugin-style scripts for source format enforcement#531

Closed
yadij wants to merge 13 commits intosquid-cache:masterfrom
yadij:v5-maintenance-update
Closed

Support plugin-style scripts for source format enforcement#531
yadij wants to merge 13 commits intosquid-cache:masterfrom
yadij:v5-maintenance-update

Conversation

@yadij
Copy link
Contributor

@yadij yadij commented Dec 31, 2019

Allow the source-maintenance script to run arbitrary code or sub-scripts
to perform enforcement of Squid code style and content.

Code placed in the scripts/maintenance/ sub-folder MUST meet the
following criteria:

  • be self-executable,
  • receive filename of the code file to be touched as one and only
    command-line parameter,
  • always dump the file contents to stdout (with or without edits),
  • not depend on any other code in this sub-folder being run first.

Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a major question and a minor change request.

Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generalizing the application of the source code formatting/adjusting scripts is probably a good idea. I left specific change requests in that direction.

I am not yet convinced that HERE/NULL-removal scripts belong to the official Squid repository. By committing them, we essentially promise that they work reliably for all custom Squid code. IIRC, none of the known removal scripts are that good and, more importantly, making them that good is a waste of resources. After the official code is stripped from HERE and NULLs: Relative to the long-term nature of the official repository files, few people will need to apply conversion scripts to few branches for a short period of time.

Instead of committing those temporary scripts to the official Squid repository, I suspect that we should automatically check that the new code does not contain HEREs, NULLs, and such. That check can mishandle a few esoteric cases without causing much harm. And it does not need to be in the official Squid repository.

@kinkie
Copy link
Contributor

kinkie commented Jan 2, 2020 via email

@rousskov
Copy link
Contributor

rousskov commented Jan 2, 2020

@rousskov: I am not yet convinced that HERE/NULL-removal scripts belong to the official Squid repository. By committing them, we essentially promise that they work reliably for all custom Squid code.

@kinkie: I see no such promise. It’s up to the users of the script to ensure they work for the specific use case.

I see two problems with this logic:

  1. It sounds like you are saying that Squid Project distributes two sets of programs: One set consists of programs where users may have a reasonable expectation that the program does what it is supposed to do, and another set where the user should be super careful with running the program from the set because it may mangle user's uncommitted source files, publish private keys, shut down the server, etc. I wonder where you see the line between those two sets? In other words, how can a user tell that a Project-distributed program is not reliable and requires case-by-case review/analysis?
  2. The interface here is not an execution of an individual source code adjusting script. It is an execution of scripts/source-maintenance.sh or, eventually, something like make pretty. It is unreasonable, IMO, to expect the user to carefully examine that target beyond a brief description of what it does. We want (well, I cannot speak for others, so we should want) developers to run make pretty (or equivalent) often and without fear or substantial time investment into safety analysis. That desire/recommendation implies, IMO, a certain level of confidence in the scripts general applicability and safety. The NULL/HERE removal scripts I have seen were meant for a different use case (i.e., careful "one-time" execution by folks who know what they are doing and can be warned), which allowed us to make those scripts much simpler.

It would be a manual process,

I am not sure which process you are referring to, but Amos proposes adding a script to remove HEREs and NULLs. That script will be executed by scripts/source-maintenance.sh. No manual process beyond the execution of that script itself (which should be further automated behind a make pretty or a similar target).

and we will anyway be stuck in the middle of the river for years,

Why?! The idea here, AFAICT, is to remove all NULLs and all HEREs from the supported branches in a few commits. If done right, this would take less than a week, even with all the special cases that should be done by hand.

A developer can run the same script against their custom branches if they want to reduce future merge conflicts (and most developers either should not care about special cases or can fix them by cherry-picking official commits designated to those special cases). From a developer point of view, that will also take a few days.

And if some developer sits on their custom branch for a few years without applying these changes, then they are clearly not integrating with recent official code and hence are not a concern.

In summary, I see no "stuck for years" situation if we manage this correctly. This, BTW, is one of the reasons why I suggested removing NULLs and HEREs now, before branching v5. It would effectively remove the pain from two branches (v5 and master) at once, with no synchronization effort.

as shown by the last few years.

During the last few years we were manually removing NULLs and HEREs in new/modified code lines. That painful process is the opposite of what the script-driven process should do, so it is not a valid comparison. To end these pains, various developers, including Amos and I, proposed scripts that would auto-remove most NULLs and HEREs. This specific PR adds a placeholder to integrate those scripts with scripts/source-maintenance.sh. I question the need for that integration, but I (still) support removing NULLs and HEREs using scripts (and possibly a little bit of manual effort for a few special cases in the official repositories).

@rousskov
Copy link
Contributor

rousskov commented Jan 2, 2020

Perhaps we can attack the same problem from another direction: Let's remove NULLs and HEREs from v4 and master using a pair of stand-alone scripts and then see what happens. If there is sufficient demand for integrating those scripts with scripts/source-maintenance.sh, then we will integrate them.

@yadij
Copy link
Contributor Author

yadij commented Jan 4, 2020

@rousskov: I am not yet convinced that HERE/NULL-removal scripts belong to the official Squid repository. By committing them, we essentially promise that they work reliably for all custom Squid code.

@kinkie: I see no such promise. It’s up to the users of the script to ensure they work for the specific use case.

I see two problems with this logic:

1. It sounds like you are saying that Squid Project distributes two sets of programs: One set consists of programs where users may have a reasonable expectation that the program does what it is supposed to do, and another set where the user should be super careful with running the 

We promise that the scripts/source-maintenance.sh does what we intend it to, with the source open for review if anyone has doubts about what that is. We do not promise that nothing gets mangled in the sources (custom or official). So even we have to read the patch it creates before commit. Right now that is done by me manually before triggering the Format Enforcement PRs to official.

2. The interface here is not an execution of an _individual_ source code adjusting script.

We already expect people running the script to review its results. Nothing changes in that regard.

Outside of large PRs being committed with lots of formatting changes that output should be small or nil and restricted to the devs own touched code. There is a small change here in that these large PR events may be increased slightly by addition of new policy rules to the AWK script. We control that with the PR review process though, and the rule addition occurs first so we can use that as an easily identifiable rebase point for old code with generic how-to to reduce the pain.

It would be a manual process,

I am not sure which process you are referring to, but Amos proposes adding a script to remove HEREs and NULLs.

Not exactly. I am proposing a script to do that kind of painful code edit less painfully. They are just the obvious examples.

That script will be executed by scripts/source-maintenance.sh. No manual process beyond the execution of that script itself (which should be further automated behind a make pretty or a similar target).

That is not true. There is always manual review of its output needed (as mentioned above).

and we will anyway be stuck in the middle of the river for years,

Why?! The idea here, AFAICT, is to remove all NULLs and all HEREs from the supported branches in a few commits. If done right, this would take less than a week, even with all the special cases that should be done by hand.

A developer can run the same script against their custom branches if they want to reduce future merge conflicts (and most developers either should not care about special cases or can fix them by cherry-picking official commits designated to those special cases). From a developer point of view, that will also take a few days.

And if some developer sits on their custom branch for a few years without applying these changes, then they are clearly not integrating with recent official code and hence are not a concern.

In summary, I see no "stuck for years" situation if we manage this correctly.

That was exactly the argument you used 9 years ago to argue for the painful manual way forward. Yet here we are, less than 1/7th of necessary removals completed. Somewhere between 56 and infinity years to go.

This, BTW, is one of the reasons why I suggested removing NULLs and HEREs now, before branching v5. It would effectively remove the pain from two branches (v5 and master) at once, with no synchronization effort.

This is the argument I used at the v3.4, v3.5 and v4 branching time. Which you countered with how much work it would be for people with custom code forked off our repository. Irony.

Anyhow, this script PR is not for HERE/NULL removal. It is about reducing pain for that type of change. Even if we block again on those particular macros there are other code snippets that can use it for much smaller updates.

@rousskov
Copy link
Contributor

rousskov commented Jan 4, 2020

That was exactly the argument you used 9 years ago to argue for the painful manual way forward.

You are misrepresenting what happened 9 years ago.

@rousskov
Copy link
Contributor

rousskov commented Jan 4, 2020

This is the argument I used at the v3.4, v3.5 and v4 branching time. Which you countered with how much work it would be for people with custom code forked off our repository. Irony.

You are misrepresenting what your NULL/HERE removal suggestion was and, hence, why it was rejected.

@rousskov
Copy link
Contributor

rousskov commented Jan 6, 2020

We promise that the scripts/source-maintenance.sh does what we intend it to.

I agree. AFAICT, kinkie did not. My two points were meant to convince him to change (or clarify) his opinion. He has not done it yet, so we are still blocked on that key disagreement.

We do not promise that nothing gets mangled in the sources.

I disagree. The first promise is meaningless without this second component. "We promise that this guillotine cures your headache" is not a meaningful promise in my world (thought I admit that a person overly focused on literal/dictionary interpretations may find such a promise meaningful). By requiring developers to run a script, we implicitly promise that it will not do serious irreversible harm. The proposed addition is incompatible with that promise because it may remove or mangle a source file (which may have no copies).

So even we have to read the patch it creates before commit.

Sure, but reversing a patch cannot recover lost modifications because the patch is generated against committed/staged sources rather than sources that existed before the recommended script invocation. We should promise that the last step in the following sequence is possible:

  1. git clone (state A)
  2. edit src/main.cc (to arrive at state B)
  3. run "make pretty" or equivalent (creating state C)
  4. easily reverse "make pretty" changes, if any (back to state B)

The above sketch is not a complete/final specification, but a good starting point (more on that below).

Kinkie: It’s up to the users of the script to ensure they work for the specific case

Alex: 2. The interface here is not an execution of an individual source code adjusting script. It is an execution of scripts/source-maintenance.sh or, eventually, something like make pretty. It is unreasonable, IMO, to expect the user to carefully examine that target beyond a brief description of what it does....

Amos: We already expect people running the script to review its results. Nothing changes in that regard.

You are missing the context/point of my second bullet. Kinkie implied that users are responsible for figuring out whether the script is safe to run in their use case -- we make no promises, especially about their custom code. My point was that it would be unreasonable to task the user with such an analysis because we are talking not just about (currently non-existent) small awk script but the entire source code mangling behemoth. I am talking about impracticality of the pre-run analysis (implicitly required by the no-promise approach). You are talking about post-run review of damages.

The related "manual process" discussion is similarly centered around the implication that users are going to apply some small manual action and, hence, can be tasked with making sure that the action script will do what they want. Your followup there appears to be missing that context/point -- we are talking past each other. I am skipping that part...

As for "Nothing changes in that regard", we do (or at least should) expect a change: Developers should eventually be required (or at least strongly encouraged) to run make pretty or equivalent. Today, there is no such requirement (and some of us actually discourage running the source code formatting script and resist attempts to make it more portable or user-friendly).

This specific PR does not change the policy, of course, but one of the arguments attempting to justify this PR existence, AFAICT, is that other developers would be able to remove NULLs/HEREs (and alike) from their custom sources. That argument would be dead on arrival without the implication that we will start encouraging (or even requiring) developers to format their sources.

Please do not get me wrong though: I think we should require proper source code formatting, and I welcome changes that would get us closer to that goal. The infrastructure currently sketched by this PR is a small step in the opposite direction. However, before fixing that problem, we have to agree that the formatting script should either avoid making irreversible changes to sources or should refuse to run on uncommitted/unstaged sources (i.e. relying on git to preserve any custom code before it is formatted). This intent needs further formalization/detalization than the 4-step sketch above, but until there is consensus that we promise no harm (for some reasonable definition of harm), there is no point in detailing this further!

@yadij yadij force-pushed the v5-maintenance-update branch from c6c3315 to b22ac01 Compare January 12, 2020 07:13
@yadij yadij changed the title Support AWK rules for source maintenance WIP: Support AWK rules for source maintenance Feb 10, 2020
@yadij yadij requested a review from kinkie March 9, 2020 02:35
@kinkie
Copy link
Contributor

kinkie commented Mar 9, 2020

There's a number of topics that emerged in the conversation, I'll try to touch them all with my viewpoint in LIFO order:

  • project-supporting tools are mostly for our own benefit; as such they offer very little promise. As an example, their interface is often ugly but not worth improving. People who are uncomfortable with their output should generally just not use them
  • IMO these do belong to the source repository. Anything that we use to build and improve the project, and that has the slimmest chance of being reuseable, belongs there.
  • this is especially true now where the main hurdle is that a large scale change would have to be replicated through some long-running branches
  • I generally hope that anything we cook will not cause any major damage; let's not optimize for the unlikely case. I am confident that the time spent in this conversation already exceeds any time needed to fix breakages
  • I like the 4-step process, especially if automated

@rousskov
Copy link
Contributor

rousskov commented Mar 9, 2020

There's a number of topics that emerged in the conversation, I'll try to touch them all with my viewpoint

FWIW, I probably disagree with your assertions in the first three bullets, unsure what the unstated implications of the fourth bullet are, and glad we may have an agreement on the basic principles behind the 4-step sketch (i.e. your last bullet). BTW, the same basic principles went into the recent #565.

I believe that further discussion should be moved to IRC or a voice meeting because we are not making (enough) progress here and the disagreements appear to be rather vast. It would be good to narrow down the scope for that meeting agenda though, but we can try without it, especially if it is a voice meeting.

@yadij yadij mentioned this pull request Mar 24, 2020
@yadij yadij force-pushed the v5-maintenance-update branch from b22ac01 to c5f9d9b Compare March 24, 2020 08:55
@yadij yadij changed the title WIP: Support AWK rules for source maintenance Support plugin-style scripts for source format enforcement Mar 24, 2020
@yadij
Copy link
Contributor Author

yadij commented Mar 24, 2020

To make some progress on this and the other PRs waiting on it I am switching to Alex design.

@yadij yadij requested a review from rousskov March 24, 2020 09:17
@yadij yadij added the S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box label Mar 24, 2020
Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for moving this PR forward!

I hope it is possible to adjust the proposed plugin interface so that it supports existing formater.pl and sort-includes.pl plugins while remaining simple. Please let me know if you need help with that adjustment.

I also left a few specific change requests, mostly high-level ones. As an experiment, I am prefixing low-level polishing change requests with Nit: to emphasize that I consider the problem worth fixing but not worth arguing about. These markings also help elevate the status of other (high-level and/or otherwise important) change requests (that have no commonly-accepted marking AFAIK). You may want to start with those because they may invalidate/obsolete "nitpicking" ones.

@yadij yadij force-pushed the v5-maintenance-update branch from c5f9d9b to 77e9dd6 Compare March 29, 2020 14:19
@rousskov
Copy link
Contributor

FYI: I have updated my earlier review, resolving all the addressed change requests. It took a lot of time so I cannot promise a quick return, but please request my review when it is time for another round. Thank you.

@rousskov rousskov added the S-waiting-for-author author action is expected (and usually required) label Mar 30, 2020
yadij and others added 6 commits April 8, 2020 01:06
Alex insists that this style of placing each change in separate scripts is better
than having one scritp do all changes to each source file despite the increase in
file I/O open+read+write+close sequences from N to N*M when N is number of source
code files and M the number of changes to be made.
exit propigates out to halt the entire script immediately.
Preventing other file formatting following one issue.
@yadij
Copy link
Contributor Author

yadij commented Apr 9, 2020

As for #2 the reason is that you insisted we add arbitrary code support instead of just declarative AWK rules (#531 (comment)).

There are at least two PRs coming to add scripts to this plugin directory. One that moves the sort-includes script there, and one that adds nullptr enforcement.

Co-Authored-By: Amos Jeffries <yadij@users.noreply.github.com>
@rousskov
Copy link
Contributor

rousskov commented Apr 10, 2020

the reason is that you insisted we add arbitrary code support instead of just declarative AWK rules

That was the reason you have implemented the general plugin support, yes, but that is not a valid justification for committing that support: Just because B is better than A does not mean we should commit B. There needs to be a good reason for committing either. I do not think we need to agree on this point though -- if you have a plan to add at least one or, better, two specific plugins, then the principle agreement on that plan is sufficient to move forward with merging this PR. I am just clarifying the logic behind my request for the principal agreement on plugins here; I am not trying to continue this part of the argument!

There are at least two PRs coming to add scripts to this plugin directory. One that moves the sort-includes script there, and one that adds nullptr enforcement.

If you manage to move the existing "includes sorting" script into plugins directory, it would be sufficient to justify adding this general mechanism. When I looked at that script, it was impossible to move it without changing its functionality:

  1. The current plugin support cannot warn about changing includes order like the existing sorting script does.
  2. The current plugin support will warn about whitespace-only changes unlike the existing sorting script (that does not warn about whitespace-only changes).

Are you going to drop and add those warnings during the move? Or are you going to enhance the plugin support so that it can differentiate between whitespace-only and non-whitespace changes?

As for nullptr enforcement, are you going to exclude C headers and sources from being subject to these plugins? There are currently 600+ NULLs in C sources alone. AFAICT, C does not have nullptr.

@yadij
Copy link
Contributor Author

yadij commented Apr 11, 2020

the reason is that you insisted we add arbitrary code support instead of just declarative AWK rules

That was the reason you have implemented the general plugin support, yes, but that is not a valid justification for committing that support: Just because B is better than A does not mean we should commit B. There needs to be a good reason for committing either. I do not think we need to agree on this point though -- if you have a plan to add at least one or, better, two specific plugins, then the principle agreement on that plan is sufficient to move forward with merging this PR.

I am not sure I understand what you were asking for. If there is something you want to be added to the description, please state clearly what you are wanting.

There are at least two PRs coming to add scripts to this plugin directory. One that moves the sort-includes script there, and one that adds nullptr enforcement.

If you manage to move the existing "includes sorting" script into plugins directory, it would be sufficient to justify adding this general mechanism. When I looked at that script, it was impossible to move it without changing its functionality:

The sort-includes.pl script and how it is called is what this PR was modeled on. It can be moved as-is into the scripts/maintenance/ directory. All the audit requested changes to this PR apply equally to how that script was being called - including the arguments for warning on whitespace change, diff replacement of MD5, and not touching unstaged files.

As for nullptr enforcement, are you going to exclude C headers and sources from being subject to these plugins? There are currently 600+ NULLs in C sources alone. AFAICT, C does not have nullptr.

Of course. Only C++ files will have the nullptr enforcement applied. I will leave more details on that to the relevant followup PRs.

@yadij yadij removed the S-waiting-for-author author action is expected (and usually required) label Apr 11, 2020
@rousskov
Copy link
Contributor

Alex: When I looked at that script, it was impossible to move it without changing its functionality: ...

Amos: All the audit requested changes to this PR apply equally to how that script was being called

AFAICT, you imply that you expect that a future PR will change how includes are sorted in order to migrate sorting to the new plugin interface. That change is OK with me; I wanted to make sure that you realize that the existing sorting code is not compatible with the new plugin interface.

I would prefer to have an agreement on multiple anticipated plugin examples, but one example is sufficient to justify merging this PR, so I am resolving my last change request.

Please note that there is currently no agreement regarding NULL fixing, the current plugin code applies to C files, and the current plugin API makes it difficult for a plugin to distinguish C from C++ sources. I agree that we can discuss these problems separately. Hopefully, we will find a way to reach an agreement there.

@rousskov rousskov dismissed their stale review April 11, 2020 15:16

Thank you for addressing my change requests.

exit 1
fi

# On squid-cache.org we have to use the python scripted md5sum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole squid-cache.org block is rotten and is by now unnecessary. In fact, it hasn't worked for years.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. It is out of scope though for this feature change.

Copy link
Contributor

@kinkie kinkie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one nit and one comment, go ahead please!

# Code Style formatting maintenance
#
if test "${ASVER}"; then
for SCRIPT in `git ls-files scripts/maintenance/`; do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: why git ls-files and not just listing the files?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be a complete or even valid answer to your question, but:

  • git ls-files ignores files ignored by git. This is very helpful when editors and such create helper/temporary files inside our directory.
  • AFAICT, git ls-files has better (than ls -R) support for listing files in sub-directories. We might group scripts in sub-directories of the scripts/maintenance directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much what Alex said plus the benefit that it ignores objects which are not in the repository. So unintentional files are not going to be arbitrarily executed.

@rousskov rousskov added S-waiting-for-author author action is expected (and usually required) and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Apr 13, 2020
@yadij
Copy link
Contributor Author

yadij commented Apr 13, 2020

Was the author tag just supposed to indicated this was waiting on my reply to kinkie? or something else?

@rousskov
Copy link
Contributor

Was the author tag just supposed to indicated this was waiting on my reply to kinkie? or something else?

The flag indicates that the ball is on your side. IIRC, when I set this flag, this PR was awaiting your response to Francesco and a clearance for merging. I will reset it to S-waiting-for-committer now.

@rousskov rousskov added S-waiting-for-committer privileged action is expected (and usually required) and removed S-waiting-for-author author action is expected (and usually required) labels Apr 13, 2020
@yadij yadij added the M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels label Apr 13, 2020
squid-anubis pushed a commit that referenced this pull request Apr 13, 2020
Allow the source-maintenance script to run arbitrary code or sub-scripts
to perform enforcement of Squid code style and content.

Code placed in the scripts/maintenance/ sub-folder MUST meet the
following criteria:

 * be self-executable,
 * receive filename of the code file to be touched as one and only
   command-line parameter,
 * always dump the file contents to stdout (with or without edits),
 * not depend on any other code in this sub-folder being run first.
@squid-anubis squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label Apr 13, 2020
@squid-anubis squid-anubis added M-merged https://github.com/measurement-factory/anubis#pull-request-labels and removed M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels labels Apr 13, 2020
@yadij yadij deleted the v5-maintenance-update branch August 16, 2020 08:50
squidadm pushed a commit to squidadm/squid that referenced this pull request Mar 31, 2021
…he#531)

Allow the source-maintenance script to run arbitrary code or sub-scripts
to perform enforcement of Squid code style and content.

Code placed in the scripts/maintenance/ sub-folder MUST meet the
following criteria:

 * be self-executable,
 * receive filename of the code file to be touched as one and only
   command-line parameter,
 * always dump the file contents to stdout (with or without edits),
 * not depend on any other code in this sub-folder being run first.
yadij added a commit that referenced this pull request Apr 2, 2021
Allow the source-maintenance script to run arbitrary code or sub-scripts
to perform enforcement of Squid code style and content.

Code placed in the scripts/maintenance/ sub-folder MUST meet the
following criteria:

 * be self-executable,
 * receive filename of the code file to be touched as one and only
   command-line parameter,
 * always dump the file contents to stdout (with or without edits),
 * not depend on any other code in this sub-folder being run first.
@rousskov rousskov removed the S-waiting-for-committer privileged action is expected (and usually required) label Jul 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

M-merged https://github.com/measurement-factory/anubis#pull-request-labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants