-
-
Notifications
You must be signed in to change notification settings - Fork 479
[WFCORE-7335] Block deployments if the security manager subsyste… #6498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
| maximumPermissionsNode = MAXIMUM_PERMISSIONS.resolveModelAttribute(context, deploymentPermissionsModel); | ||
| } catch (ExpressionResolver.ExpressionResolutionUserException ex) { | ||
| // TODO: better exception choice ? | ||
| throw (System.getSecurityManager() != null) ? new RuntimeException(ex) : ex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just throw the original exception.
Don't use an exception type to try and influence the behavior of the operation; use OperationContext.setRollbackOnly().
I also don't think throwing a RuntimeException would abort the boot anyway. :) It would roll back a change after boot, but would not abort boot. OperationContext.setRollbackOnly() aborts the boot.
Please include a quick comment explaining the rationale for doing that; i.e. don't make people use git blame and research the JIRA.
Also, the current code at L84 throws an exception in a similar situation (detection of invalid config), so if the goal is to abort the boot, the same SM check + OperationContext.setRollbackOnly() seems applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the info, I was not aware of the OperationContext.setRollbackOnly(), which seems to solve the issue.
With that said, yesterday I was trying to come up with an Integration test, something similar to https://github.com/wildfly/wildfly/blob/68bc534d9ef0adf349c10b278d7a310d429763ab/testsuite/integration/vdx/src/test/java/org/wildfly/test/integration/vdx/standalone/SmokeStandaloneTestCase.java. (groovy file for modifying the standalone.xml with invalid config, checking logs...)
But it seems that I had some issues regarding setting up -secmgr and pretty much checking halting/non-halting asserts. Do you think I am on the right track (vdx/standalone), or is there a better test solution that you can direct me to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A test wouldn't belong in full testsuite/vdx. That testsuite module is only about testing handling of reporting of XML errors. This isn't really a matter of an XML error.
This seems like something that can be tested in WildFly Core's testsuite/manualmode. It needs to be in manualmode because that module is the place for tests that need to control the entire lifecycle of the server. I assume you need to do that as the test will cause the server VM to exit.
Have a look at the org.jboss.as.test.shared.AssumeTestGroupUtil class, which allows a test to see if it is running with the security manager enabled. You don't need to worry about turning on the SM. WF Core runs different test jobs, and PRs get tested with a job that has the SM one. (See the status block below.) There are also nightly jobs with the SM enabled. So just write the test to always run and use AssumeTestGroupUtil to control the behavior of the test based on whether the SM is enabled. If there is nothing useful to test if the SM is not enabled, then throw an AssumptionViolatedException if AssumeTestGroupUtil.isSecurityManagerDisabled() == true.
It's possible you don't need a broken config in the initial boot of the server. If not, that may give you more flexibility. The test can reload an existing server into admin-only mode, add a config that will fail in a normal boot, and then reload into normal mode. And then check what happened (plus clean up). What you are adding here are checks that will not execute in an admin-only server. The performBoottime method should not execute in admin-only.
See also the org.jboss.as.test.shared.logging package in testsuite/shared for utilities that can be used to capture logging data. I assume what the test would need to do is look at logging data. The reload will fail but I'm pretty sure you can't get useful data from any failure response. It would be in the log.
4e3494a to
30d873e
Compare
30d873e to
be19eb9
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
be19eb9 to
256dbcf
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
yersan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Managing the life cycle of the server in the test suite is a bit tricky sometimes, and it could influence on other tests, so I've added some comments about what I think could mitigate it
...anager/src/main/java/org/wildfly/extension/security/manager/SecurityManagerSubsystemAdd.java
Outdated
Show resolved
Hide resolved
...st/java/org/jboss/as/test/manualmode/secman/UnresolvedExpressionSecurityManagerTestCase.java
Outdated
Show resolved
Hide resolved
...st/java/org/jboss/as/test/manualmode/secman/UnresolvedExpressionSecurityManagerTestCase.java
Outdated
Show resolved
Hide resolved
256dbcf to
fa42e22
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
It seems it did not help too much, but avoiding the thread.sleep is good. I have not looked closely, but I wonder if reading the logs to assert the error IDs gets confused because of traces added by other tests ... |
This comment was marked as outdated.
This comment was marked as outdated.
Yes, that is very likely; the logs from the failed test do not contain |
fa42e22 to
39d7a16
Compare
|
Ok, updated the tests to use a custom log file instead of a shared Initially, I wanted to search for the expression error and the fatal shutdown log. But for some odd reason, I could not get a So I had to try a different solution, and substitute the
With that said, I am not 100% sure if the Edit: it seems like one of the jobs failed due to concurrent file read between jobs (?) |
|
@yersan, since there seems to be an issue regarding Windows FD handling (which I am not entirely sure how to fix). I am considering changing the test approach slightly by not testing the logs at all, but rather by testing if the reload failed, such as using WDYT? |
yersan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mskacelik The testsuite is using the log facility in other tests, so the approach by looking at the logs should work.
...st/java/org/jboss/as/test/manualmode/secman/UnresolvedExpressionSecurityManagerTestCase.java
Show resolved
Hide resolved
| if (client != null) { | ||
| logHandlerSetup.tearDown(client); | ||
| client.close(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the problems on Windows, see this note in the TestLogHandlerSetupTask
Lines 95 to 97 in 035d412
| // remove file, note this needs to be done after the operations have been executed as we need to ensure that | |
| // no FD's are open. This can be an issue on Windows. | |
| Files.deleteIfExists(logPath); |
So, it looks like the test could not be correctly triggering down the TestLogHandlerSetupTask. So, maybe the test is arriving here with an invalid client variable, which drives us to the code at L167:
client = TestSuiteEnvironment.getModelControllerClient();
You are modifying the local client reference, but not the one defined as class field instance.
Not sure if that would resolve the issue you currently have, but it seems it could be related.
I suggest not to pass a client variable to the cleanupSecurityManagerSubsystem method; always use the class instance client variable instead to ensure that the client variable that arrives at UnresolvedExpressionSecurityManagerTestCase#tearDown() has the expected value so you can tear down the TestLogHandlerSetupTask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, that parameter should not be there; that is from a previous implementation using try-with-resources.
I also changed a test a bit (checking a WFLYCTL0193 instead of checking number of occurrences of start log)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, it did not resolve the Windows issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been taking a look, but I have not found anything at first glance that can give me a hint about why in your case, when the log is torn down, Windows complains about the log file being in use.
The InstallationManagerBootTestCase is using the same setup to configure the logs, and there are no issues there. This needs a closer look, maybe configuring and removing the log on each test instead of at the end of the all tests ... but I'm not sure why it is complaining on Windows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe configuring and removing the log on each test instead of at the end of the all tests ...
Is that not what the current tests are doing? (i.e., JUnit4's @Before and @After).
But it seems that outside of the Windows tests even the Linux tests failed, due to java.io.IOException: WFLYPRT0054: Channel closed.
The InstallationManagerBootTestCase is using the same setup to configure the logs, and there are no issues there
Maybe since the runtime is halting, reloading, and changing from admin to non-admin mode, it has some weird testing behaviors. The same way I find weird that there is a need for the setUpLogDirPath and tearLogDirPath... (unlike the other tests).
But there is the * new PropertyPermission("jboss.server.log.dir", "read") in the note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworked the logging handling mechanism a bit + change the step order:
from: wrong expression config -> trigger boot -> verify logs -> fix config
to: wrong expression config -> trigger boot -> fix config -> verify logs
So lets see what it will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, at least all the other jobs passed, but the Windows job failure is rather unusual. Not really sure why it's happening and why it's not happening to other tests that are similar to this. To the best of my understanding, Windows tests are executed without a security manager, so there should be no unusual behavior in the test results...
eba33e4 to
3267bbb
Compare
3267bbb to
6f47df4
Compare
| // remove file, note this needs to be done after the operations have been executed as we need to ensure that | ||
| // no FD's are open. This can be an issue on Windows. | ||
| Files.deleteIfExists(logPath); | ||
| // small retry/backoff loop to handle lingering OS file locks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good try, but this is not going to work. It seems the main issue here is that the file is still in use by the running server. You are invoking logHandlerSetup.tearDown(client) before the container gets stopped, so delaying the deletion here is not going to take any effect. Calling it before the container gets stopped is expected, indeed you need to connect to the server and rollback the changes. We need to understand why in other tests, the file is deleted without issues even if the server is running, and in this test, it is not.
If we try to workaround it, there are chances where we could be putting problems under the carpet that in the future will arise again, so I would avoid it.
I understand this is a test issue, and it is not related to the fix itself, so if we do not find the root cause, we could assume the error by catching the exception in your test case and creating a follow up Jira with the intention of being fixed sooner than later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just an attempt, but yes I agree with you. Going to investigate further.
6f47df4 to
3e056e6
Compare
… invalid config and the security manager is enabled.
df01095 to
9307336
Compare
|
Core -> Full Integration Build 14797 outcome was FAILURE using a merge of 9307336 Failed tests |
|
Just to check this looks still blocked by the testsuite? |
Yes, beacuse of the failed Windows job. I had a chat with @.yersan about it that we will try to investigate the test manually in VM, but because of PTOs (among other things) we didn't get a chance for it. |
/cc @darranl
issue: https://issues.redhat.com/browse/WFCORE-7335
This is only a draft PR to showcase a possible solution.
When
standalone.xmlsecurity manager subsystem configuration is invalid, f.g:Upon WF instance startup, the
ExpressionResolverImpl.javawill log and throw an exception of typeOperationClientExceptionwildfly-core/controller/src/main/java/org/jboss/as/controller/ExpressionResolverImpl.java
Line 282 in 70bb2a1
OperationClientExceptionmeans that:This exception is then handled in the
AbstractOperationContext:wildfly-core/controller/src/main/java/org/jboss/as/controller/AbstractOperationContext.java
Lines 1074 to 1084 in 70bb2a1
So here is the fundamental problem.
Possible Solution
I have come up with a solution by wrapping the
OperationClientExceptionof the expression resolver in theSecurityManagerSubsystemAdd, which, from my understanding, is only executed during the start-up of WildFly (boot). This wrapped exception is not handled in theAbstractOperationContext, making the WildFly startup fail due to the exception.So, in the current implementation with invalid configuration:
-secmgr)-secmgr=> logs the error, but the boot won't fail-secmgr=> logs the error, but boot will fail--admin-onlymode => both CLI and booting (invalidXML) won't fail (with or without-secmgr)Note
--admin-onlymode behavior valid in this case?wildfly-corerepository or in thewildfly(repository) integration tests.RuntimeException, but maybe other exceptions would be suited better.