Support plugin-style scripts for source format enforcement #531
yadij wants to merge 13 commits into squid-cache:master
rousskov left a comment
I have a major question and a minor change request.
rousskov left a comment
Generalizing the application of the source code formatting/adjusting scripts is probably a good idea. I left specific change requests in that direction.
I am not yet convinced that HERE/NULL-removal scripts belong to the official Squid repository. By committing them, we essentially promise that they work reliably for all custom Squid code. IIRC, none of the known removal scripts are that good and, more importantly, making them that good is a waste of resources. After the official code is stripped from HERE and NULLs: Relative to the long-term nature of the official repository files, few people will need to apply conversion scripts to few branches for a short period of time.
Instead of committing those temporary scripts to the official Squid repository, I suspect that we should automatically check that the new code does not contain HEREs, NULLs, and such. That check can mishandle a few esoteric cases without causing much harm. And it does not need to be in the official Squid repository.
|
On Wed, 1 Jan 2020 at 17:59, Alex Rousskov ***@***.***> wrote:
> ***@***.**** requested changes on this pull request.
>
> Generalizing the application of the source code formatting/adjusting
> scripts is probably a good idea. I left specific change requests in that
> direction.
>
> I am not yet convinced that HERE/NULL-removal scripts belong to the
> official Squid repository. By committing them, we essentially promise that
> they work reliably for all custom Squid code. IIRC, none of the known
> removal scripts are that good and, more importantly, making them that good
> is a waste of resources. After the official code is stripped from HERE and
> NULLs: Relative to the long-term nature of the official repository files,
> few people will need to apply conversion scripts to few branches for a
> short period of time.

I disagree here, I see no such promise. It’s up to the users of the script
to ensure they work for the specific use case.

> Instead of committing those temporary scripts to the official Squid
> repository, I suspect that we should automatically *check* that the new
> code does not contain HEREs, NULLs, and such. That check can mishandle a
> few esoteric cases without causing much harm. And it does not need to be in
> the official Squid repository.

It would be a manual process, and we will anyway be stuck in the middle of
the river for years, as shown by the last few years.
I'm in favour of biting the bullet and just going.

> ------------------------------
> In scripts/source-maintenance.sh
> <#531 (comment)>:
> > @@ -66,7 +66,8 @@ for FILENAME in `git ls-files`; do
> > #
> > # Code Style formatting maintenance
> > #
> > - if test "${ASVER}"; then
> > + awk -f ./scripts/maintenance.awk "${FILENAME}" >"${FILENAME}.awkbak" && mv "${FILENAME}.awkbak" "${FILENAME}"
>
> We should iterate through individual scripts named using a hard-coded
> pattern (e.g. ./scripts/maintenance-cpp-* or ./scripts/maintenance/cpp-*)
> rather than dumping all adjustments into one script.
> The script execution command should be made interpreter-neutral (rather
> than forcing AWK usage for every adjustment script). The script itself
> should supply its interpreter name (and invocation parameters).
> Finally, existing script(s) should be adjusted to use the new approach.
> ------------------------------
> In scripts/source-maintenance.sh
> <#531 (comment)>:
> > @@ -66,7 +66,8 @@ for FILENAME in `git ls-files`; do
> > #
> > # Code Style formatting maintenance
> > #
> > - if test "${ASVER}"; then
> > + awk -f ./scripts/maintenance.awk "${FILENAME}" >"${FILENAME}.awkbak" && mv "${FILENAME}.awkbak" "${FILENAME}"
>
> Please also avoid using the backup-associated bak extension for newly
> created files. Something like .next or .future would be a lot more
> clear/natural.
> ------------------------------
> In scripts/maintenance.awk
> <#531 (comment)>:
> > @@ -0,0 +1,15 @@
> > +## Copyright (C) 1996-2019 The Squid Software Foundation and contributors
>
> After the "apply all scripts that are named using a hard-coded pattern"
> change request is implemented, this do-nothing file will not be needed in
> *this* PR.
|
I see two problems with this logic:
I am not sure which process you are referring to, but Amos proposes adding a script to remove HEREs and NULLs. That script will be executed by
Why?! The idea here, AFAICT, is to remove all NULLs and all HEREs from the supported branches in a few commits. If done right, this would take less than a week, even with all the special cases that should be done by hand. A developer can run the same script against their custom branches if they want to reduce future merge conflicts (and most developers either should not care about special cases or can fix them by cherry-picking official commits designated to those special cases). From a developer point of view, that will also take a few days. And if some developer sits on their custom branch for a few years without applying these changes, then they are clearly not integrating with recent official code and hence are not a concern. In summary, I see no "stuck for years" situation if we manage this correctly. This, BTW, is one of the reasons why I suggested removing NULLs and HEREs now, before branching v5. It would effectively remove the pain from two branches (v5 and master) at once, with no synchronization effort.
During the last few years we were manually removing NULLs and HEREs in new/modified code lines. That painful process is the opposite of what the script-driven process should do, so it is not a valid comparison. To end these pains, various developers, including Amos and I, proposed scripts that would auto-remove most NULLs and HEREs. This specific PR adds a placeholder to integrate those scripts with |
|
Perhaps we can attack the same problem from another direction: Let's remove NULLs and HEREs from v4 and master using a pair of stand-alone scripts and then see what happens. If there is sufficient demand for integrating those scripts with |
We promise that the scripts/source-maintenance.sh does what we intend it to, with the source open for review if anyone has doubts about what that is. We do not promise that nothing gets mangled in the sources (custom or official). So even we have to read the patch it creates before commit. Right now that is done by me manually before triggering the Format Enforcement PRs to official.
We already expect people running the script to review its results. Nothing changes in that regard. Outside of large PRs being committed with lots of formatting changes, that output should be small or nil and restricted to the dev's own touched code. There is a small change here in that these large PR events may be increased slightly by the addition of new policy rules to the AWK script. We control that with the PR review process though, and the rule addition occurs first, so we can use that as an easily identifiable rebase point for old code, with a generic how-to to reduce the pain.
Not exactly. I am proposing a script to do that kind of painful code edit less painfully. They are just the obvious examples.
That is not true. There is always manual review of its output needed (as mentioned above).
That was exactly the argument you used 9 years ago to argue for the painful manual way forward. Yet here we are, less than 1/7th of necessary removals completed. Somewhere between 56 and infinity years to go.
This is the argument I used at the v3.4, v3.5 and v4 branching time. Which you countered with how much work it would be for people with custom code forked off our repository. Irony. Anyhow, this script PR is not for HERE/NULL removal. It is about reducing pain for that type of change. Even if we block again on those particular macros there are other code snippets that can use it for much smaller updates. |
You are misrepresenting what happened 9 years ago.
You are misrepresenting what your NULL/HERE removal suggestion was and, hence, why it was rejected.
I agree. AFAICT, kinkie did not. My two points were meant to convince him to change (or clarify) his opinion. He has not done it yet, so we are still blocked on that key disagreement.
I disagree. The first promise is meaningless without this second component. "We promise that this guillotine cures your headache" is not a meaningful promise in my world (though I admit that a person overly focused on literal/dictionary interpretations may find such a promise meaningful). By requiring developers to run a script, we implicitly promise that it will not do serious irreversible harm. The proposed addition is incompatible with that promise because it may remove or mangle a source file (which may have no copies).
Sure, but reversing a patch cannot recover lost modifications because the patch is generated against committed/staged sources rather than sources that existed before the recommended script invocation. We should promise that the last step in the following sequence is possible:
The above sketch is not a complete/final specification, but a good starting point (more on that below).
You are missing the context/point of my second bullet. Kinkie implied that users are responsible for figuring out whether the script is safe to run in their use case -- we make no promises, especially about their custom code. My point was that it would be unreasonable to task the user with such an analysis because we are talking not just about a (currently non-existent) small awk script but about the entire source code mangling behemoth. I am talking about the impracticality of the pre-run analysis (implicitly required by the no-promise approach). You are talking about post-run review of damages.

The related "manual process" discussion is similarly centered around the implication that users are going to apply some small manual action and, hence, can be tasked with making sure that the action script will do what they want. Your followup there appears to be missing that context/point -- we are talking past each other. I am skipping that part...

As for "Nothing changes in that regard", we do (or at least should) expect a change: Developers should eventually be required (or at least strongly encouraged) to run This specific PR does not change the policy, of course, but one of the arguments attempting to justify this PR's existence, AFAICT, is that other developers would be able to remove NULLs/HEREs (and the like) from their custom sources. That argument would be dead on arrival without the implication that we will start encouraging (or even requiring) developers to format their sources.

Please do not get me wrong though: I think we should require proper source code formatting, and I welcome changes that would get us closer to that goal. The infrastructure currently sketched by this PR is a small step in the opposite direction. However, before fixing that problem, we have to agree that the formatting script should either avoid making irreversible changes to sources or should refuse to run on uncommitted/unstaged sources (i.e. relying on git to preserve any custom code before it is formatted). This intent needs further formalization and detail than the 4-step sketch above, but until there is consensus that we promise no harm (for some reasonable definition of harm), there is no point in detailing this further!
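The "refuse to run on uncommitted/unstaged sources" option could be approximated with a guard at the top of the formatting script. The following is only a minimal sketch and not code from this PR: the function names `tree_is_clean` and `require_clean_tree` are hypothetical, and the sketch assumes a stricter reading than the comment above requires, namely that every tracked file must be fully committed before formatting starts.

```shell
#!/bin/sh
# Hypothetical guard (not part of this PR): refuse to format sources that git
# has not preserved yet, so that a plugin bug cannot destroy the only copy.

# Succeeds only when the given `git status --porcelain` output is empty.
tree_is_clean() {
    test -z "$1"
}

# Prints an error and fails when tracked files have uncommitted changes.
# Untracked files are not considered (assumption: the formatter only ever
# touches files that git already tracks).
require_clean_tree() {
    status=$(git status --porcelain --untracked-files=no) || return 1
    if ! tree_is_clean "$status"; then
        echo "ERROR: commit or stash these changes before formatting:" >&2
        echo "$status" >&2
        return 1
    fi
}

# Assumed usage near the top of the formatting script:
#   require_clean_tree || exit 1
```

With such a guard in place, any damage done by a formatting step can be undone with ordinary git commands, because the pre-formatting state of every touched file is already stored in the repository.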
c6c3315 to b22ac01
|
There's a number of topics that emerged in the conversation, I'll try to touch them all with my viewpoint in LIFO order:
|
FWIW, I probably disagree with your assertions in the first three bullets, unsure what the unstated implications of the fourth bullet are, and glad we may have an agreement on the basic principles behind the 4-step sketch (i.e. your last bullet). BTW, the same basic principles went into the recent #565. I believe that further discussion should be moved to IRC or a voice meeting because we are not making (enough) progress here and the disagreements appear to be rather vast. It would be good to narrow down the scope for that meeting agenda though, but we can try without it, especially if it is a voice meeting. |
b22ac01 to c5f9d9b
|
To make some progress on this and the other PRs waiting on it, I am switching to Alex's design.
rousskov left a comment
Thank you for moving this PR forward!
I hope it is possible to adjust the proposed plugin interface so that it supports existing formater.pl and sort-includes.pl plugins while remaining simple. Please let me know if you need help with that adjustment.
I also left a few specific change requests, mostly high-level ones. As an experiment, I am prefixing low-level polishing change requests with Nit: to emphasize that I consider the problem worth fixing but not worth arguing about. These markings also help elevate the status of other (high-level and/or otherwise important) change requests (that have no commonly-accepted marking AFAIK). You may want to start with those because they may invalidate/obsolete "nitpicking" ones.
c5f9d9b to 77e9dd6
|
FYI: I have updated my earlier review, resolving all the addressed change requests. It took a lot of time so I cannot promise a quick return, but please request my review when it is time for another round. Thank you.
Alex insists that placing each change in a separate script is better than having one script do all changes to each source file, despite the increase in file I/O open+read+write+close sequences from N to N*M, where N is the number of source code files and M is the number of changes to be made.
exit propagates out and halts the entire script immediately, preventing the formatting of other files after one issue.
|
As for #2 the reason is that you insisted we add arbitrary code support instead of just declarative AWK rules (#531 (comment)). There are at least two PRs coming to add scripts to this plugin directory. One that moves the sort-includes script there, and one that adds nullptr enforcement. |
Co-Authored-By: Amos Jeffries <yadij@users.noreply.github.com>
That was the reason you have implemented the general plugin support, yes, but that is not a valid justification for committing that support: Just because B is better than A does not mean we should commit B. There needs to be a good reason for committing either. I do not think we need to agree on this point though -- if you have a plan to add at least one or, better, two specific plugins, then an agreement in principle on that plan is sufficient to move forward with merging this PR. I am just clarifying the logic behind my request for an agreement in principle on plugins here; I am not trying to continue this part of the argument!
If you manage to move the existing "includes sorting" script into plugins directory, it would be sufficient to justify adding this general mechanism. When I looked at that script, it was impossible to move it without changing its functionality:
Are you going to drop and add those warnings during the move? Or are you going to enhance the plugin support so that it can differentiate between whitespace-only and non-whitespace changes? As for nullptr enforcement, are you going to exclude C headers and sources from being subject to these plugins? There are currently 600+ NULLs in C sources alone. AFAICT, C does not have |
I am not sure I understand what you were asking for. If there is something you want to be added to the description, please state clearly what you are wanting.
The sort-includes.pl script and how it is called is what this PR was modeled on. It can be moved as-is into the scripts/maintenance/ directory. All the audit requested changes to this PR apply equally to how that script was being called - including the arguments for warning on whitespace change, diff replacement of MD5, and not touching unstaged files.
Of course. Only C++ files will have the nullptr enforcement applied. I will leave more details on that to the relevant followup PRs.
AFAICT, you imply that you expect that a future PR will change how includes are sorted in order to migrate sorting to the new plugin interface. That change is OK with me; I wanted to make sure that you realize that the existing sorting code is not compatible with the new plugin interface. I would prefer to have an agreement on multiple anticipated plugin examples, but one example is sufficient to justify merging this PR, so I am resolving my last change request. Please note that there is currently no agreement regarding NULL fixing, the current plugin code applies to C files, and the current plugin API makes it difficult for a plugin to distinguish C from C++ sources. I agree that we can discuss these problems separately. Hopefully, we will find a way to reach an agreement there. |
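Because a plugin receives nothing but a filename, one conceivable way to limit an edit to C++ sources is to guess from the file extension and pass everything else through untouched. The following is a hypothetical sketch, not a plugin from this or any followup PR: the names `is_cxx_source` and `filter_file` are invented, the assumption that only `*.cc` files count as C++ is a deliberate simplification (a `.h` header may be C or C++, which is exactly the difficulty noted above), and the `sed` expression is a naive placeholder edit rather than a reliable NULL converter.

```shell
#!/bin/sh
# Hypothetical plugin skeleton (not part of this PR).

# Guess whether a file is C++ from its name alone. Assumption: only *.cc is
# treated as C++; *.h is left out because a header may be C or C++.
is_cxx_source() {
    case "$1" in
        *.cc) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the file to stdout, applying the edit only to C++ sources.
# The substitution is a naive placeholder: it would also mangle identifiers,
# strings, and comments that merely contain the text "NULL".
filter_file() {
    if is_cxx_source "$1"; then
        sed -e 's/NULL/nullptr/g' "$1"
    else
        cat "$1"
    fi
}

# As a self-executable plugin, the file would end with:
#   filter_file "$1"
```

Non-C++ files are still printed in full, so the sketch stays within the "always dump the file contents to stdout" rule while leaving C sources unchanged.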
Thank you for addressing my change requests.
exit 1
fi

# On squid-cache.org we have to use the python scripted md5sum
This whole squid-cache.org block is rotten and is by now unnecessary. In fact, it hasn't worked for years.
I know. It is out of scope though for this feature change.
kinkie left a comment
one nit and one comment, go ahead please!
# Code Style formatting maintenance
#
if test "${ASVER}"; then
for SCRIPT in `git ls-files scripts/maintenance/`; do
Nit: why git ls-files and not just listing the files?
This may not be a complete or even valid answer to your question, but:
- git ls-files ignores files ignored by git. This is very helpful when editors and such create helper/temporary files inside our directory.
- AFAICT, git ls-files has better (than ls -R) support for listing files in sub-directories. We might group scripts in sub-directories of the scripts/maintenance directory.
Pretty much what Alex said plus the benefit that it ignores objects which are not in the repository. So unintentional files are not going to be arbitrarily executed.
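The difference can be illustrated with a small wrapper around the listing command used in the loop under review. This is an illustrative sketch only; the function name `list_plugins` is hypothetical.

```shell
#!/bin/sh
# Illustrative sketch: list only the plugin files that git knows about.

# Prints tracked files under the given directory (including sub-directories),
# one path per line. Untracked editor backups and git-ignored files are not
# listed, so they are never executed by accident.
list_plugins() {
    git ls-files -- "$1"
}

# Assumed usage, mirroring the loop under review:
#   for SCRIPT in `list_plugins scripts/maintenance/`; do ... done
```

A plain directory listing would also return any stray file that happens to sit in scripts/maintenance/, whether or not it was ever added to the repository.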
|
Was the author tag just supposed to indicate that this was waiting on my reply to kinkie? Or something else?
The flag indicates that the ball is on your side. IIRC, when I set this flag, this PR was awaiting your response to Francesco and a clearance for merging. I will reset it to |
Allow the source-maintenance script to run arbitrary code or sub-scripts to perform enforcement of Squid code style and content.

Code placed in the scripts/maintenance/ sub-folder MUST meet the following criteria:

* be self-executable,
* receive filename of the code file to be touched as one and only command-line parameter,
* always dump the file contents to stdout (with or without edits),
* not depend on any other code in this sub-folder being run first.
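The criteria above imply a simple calling convention for whatever drives the plugins. The following is a minimal sketch of such a driver under stated assumptions, not the code merged by this PR: the function names `apply_plugin` and `apply_plugins` are hypothetical, and the `.next` suffix for the intermediate file follows the naming suggested earlier in the review.

```shell
#!/bin/sh
# Minimal sketch of a driver for the plugin contract (not the PR's code).

# Runs one plugin on one source file. The plugin receives the filename as its
# only argument and prints the (possibly edited) contents to stdout. The
# output goes to a ".next" sibling and replaces the original only on success.
apply_plugin() {
    plugin="$1"
    filename="$2"
    if "$plugin" "$filename" > "$filename.next"; then
        mv "$filename.next" "$filename"
    else
        rm -f "$filename.next"
        echo "ERROR: $plugin failed on $filename; file left unchanged" >&2
        return 1
    fi
}

# Applies every given plugin to one file, in the order given, stopping at the
# first failure. Usage: apply_plugins FILENAME PLUGIN...
apply_plugins() {
    filename="$1"
    shift
    for plugin in "$@"; do
        apply_plugin "$plugin" "$filename" || return 1
    done
}
```

Because each plugin is executed directly, its shebang line selects the interpreter, which keeps the driver interpreter-neutral. One caveat of this sketch: replacing the file with `mv` gives the result the permissions of the newly created `.next` file rather than those of the original.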