Conversation
…ragment. (#7804) (may still rearrange the code before finalizing)
query that becomes unreasonably expensive for an installation with a lot of download/access activity. This solution is very efficient, but very PostgresQL-specific, hence the use of a stored function. (#7804)
…(but still work in progress; #7804)
…d users" in the comments, for some reason #7804)
…xternal curation labels PR. (#7804)
qqmyers
left a comment
There was a problem hiding this comment.
Looks good - lot's of changes but straight forward - adding/moving caching code.
pdurbin
left a comment
There was a problem hiding this comment.
Overall, this is looking good. I left some comments.
| -- This creates a function that ESTIMATES the size of the | ||
| -- GuestbookResponse table (for the metrics display), instead | ||
| -- of relying on straight "SELECT COUNT(*) ..." | ||
| -- Significant potential savings for an active installation. |
There was a problem hiding this comment.
Right now we see a precise number like "31,523,665 Downloads". Since the number will be estimated, does it mean it will be rounded? "32,000,000 Downloads"?
There was a problem hiding this comment.
This is a very good StackOverflow thread about this:
https://stackoverflow.com/questions/7943233/fast-way-to-discover-the-row-count-of-a-table-in-postgresql
There was a problem hiding this comment.
"COUNT(*)" is notoriously expensive in PostgresQL for large tables (brute force is the only counting option; so the larger the table, the longer it takes). So I'm operating on the assumption that nobody looking at the home page needs that number to be precise down to the last 3 or 4 digits.
An alternative solution would be to have some application-scope caching service calculate the exact count every hour or so (again, under the assumption that nobody should care whether the number is up-to-date in real time).
| if (checkDvoCacheForCommandAuthorization(dvo.getId(), CreateDataverseCommand.class, commandMap) == null) { | ||
| boolean canIssueCommand = false; | ||
| canIssueCommand = permissionService.requestOn(dvRequestService.getDataverseRequest(), dvo).canIssue(command); | ||
| logger.info("rerieved authorization for " + command.toString() + " on dvo " + dvo.getId()); |
There was a problem hiding this comment.
We should probably reduce to logger.fine.
| Map<Class<? extends Command<?>>, Boolean> newDvoCommandMap = new HashMap<>(); | ||
| commandMap.put(dvo.getId(), newDvoCommandMap); | ||
| return addCommandtoDvoCommandMap(dvo, command, newDvoCommandMap); | ||
| logger.info("using cached authorization for " + command.toString() + " on dvo " + dvo.getId()); |
|
|
||
| // This map stores looked up permission results (boolean), for multiple DvObjects | ||
| // (referenced by the Long ids), for multiple Commands. The current session | ||
| // user is assumed when lookups are performed. |
There was a problem hiding this comment.
If we're operating on the session does that mean that a user will now need to log out and log in again if they've been given a new role? (To be honest, I don't know if this was already required or not.) Or if a role was removed?
There was a problem hiding this comment.
When a user logs out and logs in as another user - I understand that the ViewScope of the wrapper ensures that no stale permissions are cached. But it's one of the things that need to be tested carefully.
If a role is revoked in real time, or something else changes on the permission side, the wrapper does continue caching the stale authorization status. This is a known possibility that has always been there. In most cases, the user will NOT be able to perform any tasks they are no longer authorized to. E.g., a link to edit the dataset may still be on the page, but an attempt to actually save any changes will fail, when the authorization is re-checked on execution of the UpdateDatasetVersionCommand, etc. But, there are also some exceptions.
There was a problem hiding this comment.
@pdurbin I'm not sure I understand where you ask "If we're operating on the session". Why do you say we're operating on a session? (this bean being view scoped)
| boolean canIssueCommand = false; | ||
| try { | ||
| canIssueCommand = permissionService.userOn(AuthenticatedUsers.get(),dataverse).canIssueCommand("CreateNewDatasetCommand"); | ||
| logger.info("rerieved auth users can create datasets"); |
There was a problem hiding this comment.
Typo ("rerieved") and this should be logger.fine.
| if (checkDvoCacheForCommandAuthorization(dataverse.getId(), CreateNewDatasetCommand.class, authUsersCommandMap) == null) { | ||
| boolean canIssueCommand = false; | ||
| try { | ||
| canIssueCommand = permissionService.userOn(AuthenticatedUsers.get(),dataverse).canIssueCommand("CreateNewDatasetCommand"); |
There was a problem hiding this comment.
It looks like this canIssueCommand method is deprecated and the only usages are this line and the one below. Should we consider deleting this deprecated method (since the only usages are in this pull request) and figuring out how to use non-deprecated methods? I hope this means no longer passing a string like "CreateNewDatasetCommand" which seems suboptimal if we ever want to rename that command.
There was a problem hiding this comment.
I'm not seeing an obvious best way to do this even on a second look...
Wondering if that was the reason these instances of this deprecated method were left in the code, when used in this context. (Again, this is not a typical permission lookup, in that we are checking authorization for an entire Group - of AuthenticatedUsers as a whole).
The @deprecated comment - "use DynamicPermissionQuery instead" may not apply to this case... I'm assuming by "Dynamic" it means "RequestPermissionQuery"... but those work entirely on DataverseRequests, as the name suggests. And a DataverseRequest in turn needs a User, while AuthenticatedUsers is a Group.
However, if the hard-coded command name strings are the main concern, the above can be replaced with permissionService.isUserAllowedOn(AuthenticatedUsers.get(), CreateDataverseCommand.class, dataverse); etc.
...the only catch is that .isUserAllowedOn(...)is also marked as deprecated.
I'm beginning to think that we should not worry much about the "deprecated" part on its own... The reason it's deprecated is because it's not going to catch any request-level group permissions - i.e. ip groups - but it should still be entirely safe to call it on AuthenticatedUsers.get()... I'll run it by @scolapasta too.
There was a problem hiding this comment.
OK, I'm going to go ahead and implement the change described in the last comment - switch to a method that does not require passing the command name as a literal string; that would allow me to further streamline the code in the wrapper, and make the handling of these AuthenticatedUsers group lookups more similar to how "normal" user authorizations are handled in there.
We'll discuss the "deprecated" part of it separately, but I'm feeling more confident that it's not an issue, under the circumstances.
There was a problem hiding this comment.
@scolapasta I'm moving the PR back into CR. I feel it should be ready for QA; as I have addressed everything that was brought up. But could you please review this particular thread. I addressed some of the above - I got rid of what @pdurbin pointed out was the biggest problem - the hard-coded command names in the code, old-style. But I am still relying on (another) deprecated method.
Please see if my understanding and justification of why it should be ok in this particular situation makes sense. And/or if I overlooked a non-deprecated simple way of achieving the check in question.
| canIssueCommand = permissionService.userOn(AuthenticatedUsers.get(),dataverse).canIssueCommand("CreateDataverseCommand"); | ||
| logger.info("rerieved auth users can create dataverses"); | ||
| } catch (ClassNotFoundException ex) { | ||
| logger.info("ClassNotFoundException checking if authenticated users can create dataverses in dataverse."); | ||
| } | ||
| addCommandAuthorizationToDvoCache(dataverse.getId(), CreateDataverseCommand.class, authUsersCommandMap, canIssueCommand); | ||
| } else { | ||
| logger.info("using cached authUsersCanCreateDataversesInDataverse result"); |
There was a problem hiding this comment.
See comments above about canIssueCommand being deprecated.
Also, "rerieved" typo and logger.fine would be better, I think.
| } | ||
| return rsyncDownload; | ||
| } | ||
|
|
There was a problem hiding this comment.
Maybe add a comment here and in the corresponding isRsyncOnly method in SystemConfig.java that the logic appears in both places. I could see these two methods drifting apart over time, getting out of, well, sync, if you will.
Perhaps this also applies to other methods. It looks like the wrapper used to call into SystemConfig and now the logic is duplicated. This one just stood out to me with its split and regex.
There was a problem hiding this comment.
OK, I just realized that this hasn't been addressed yet. Yes, quite a few of these methods are now duplicated between the 2 places. I had a reason to choose to have these methods implemented inside the SettingsWrapper (as opposed to just calling SystemConfig and then caching the result). I can explain... but I'll take a whack at reducing the duplication first.
There was a problem hiding this comment.
OK, so my original rationale for duplicating these methods was pure penny-, or rather, query-pinching. Performing a check like this (that relies on looking up a single Setting) in the wrapper - as opposed to calling SystemConfig and caching the result - saves 1 (One) query total. (Because the wrapper already has all the settings in memory, looked up all at once; and SystemConfig will in fact have to look up that setting in the database.
After way more deliberation than this issue deserved, I am undoing the duplication of logic in this, and a couple of other methods. These few query lookups - from a small, heavily cached table - are not worth the drawbacks of duplicating the code, and potentially ending up with the pages and the API getting different configuration results.
Saving these queries may be achieved in the future via some application-scoped caching mechanism, as mentioned elsewhere in the PR.
| <o:resourceInclude path="/CustomizationFilesServlet?customFileType=header" rendered="#{!widgetWrapper.widgetView}"/> | ||
| <o:importFunctions type="edu.harvard.iq.dataverse.authorization.groups.impl.builtin.AuthenticatedUsers" /> | ||
| <ui:param name="showAddDataverseLink" value="#{permissionServiceBean.userOn(AuthenticatedUsers:get(),dataverseServiceBean.findRootDataverse()).canIssueCommand('CreateDataverseCommand') }"/> | ||
| <ui:param name="showAddDatasetLink" value="#{permissionServiceBean.userOn(AuthenticatedUsers:get(),dataverseServiceBean.findRootDataverse()).canIssueCommand('AbstractCreateDatasetCommand') }"/> |
There was a problem hiding this comment.
Here are a couple places where we seem to be getting rid of the deprecated canIssueCommand method. Great.
Probably what's happening here is that they were moved to the backing bean. Please see my comment above.
There was a problem hiding this comment.
So no, we are not getting rid of the deprecated canIssueCommand - it has simply been moved into the wrapper, and is now only executed once, instead of repeatedly.
I'll look into a new, non-deprecated method of doing this.
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>
…tedUsers, that does not require passing the command name as a literal string; the next step is to streamline the code duplicated between the 2 methods, similarly to how canIssueCommand() for regular users is organized in the wrapper - now that we can. (#7804)
…/dataverse into 7804-dataverse-page-speedup
| public boolean isRootDataverse() { | ||
| return dataverse.getOwner() == null; | ||
| public boolean isRootDataverse() { | ||
| return dataverse == null ? false : dataverse.getOwner() == null; |
There was a problem hiding this comment.
why was this changed? would dataverse ever be null here? (I'm not a big fan of checking for null when something should never be null and it only hides an error elsewhere in the code).
There was a problem hiding this comment.
I believe I did it for no specific reason, out of habit. I somehow still had that assumption, that it's generally a good idea always. But you are right of course; reversing.
... there is that "create" mode, where we don't have a dataverse... but I guess it is safe to say, that if the page tries to evaluate isRootDataverse() in create mode, it should be considered an error/problem that should not be hidden.
| return dataverse == null ? false : dataverse.getOwner() == null; | ||
| } | ||
|
|
||
| // Wondering what this method is for - what would be the situation, where |
There was a problem hiding this comment.
@landreev at quick glance, we do have the ownerId as a parameter when you are in createMode (since that parameter tells you where to create the new dataverse)
|
|
||
| // This map stores looked up permission results (boolean), for multiple DvObjects | ||
| // (referenced by the Long ids), for multiple Commands. The current session | ||
| // user is assumed when lookups are performed. |
There was a problem hiding this comment.
@pdurbin I'm not sure I understand where you ask "If we're operating on the session". Why do you say we're operating on a session? (this bean being view scoped)
| private String guidesBaseUrl = null; | ||
|
|
||
|
|
||
| private String siteUrl = null; |
There was a problem hiding this comment.
I wonder if instead of having a field for each of these, it could make sense to store them in a map. Of course the code calling into the map would need to know what type of object to expect and cast appropriately (since the map would be of <String,Object>
There was a problem hiding this comment.
@scolapasta the bean is VIewScoped, but when the permission lookups are performed, the current session is used to identify the user... - so it is "operating on the session" in that sense, maybe?
There was a problem hiding this comment.
So, just to be clear, a user should not have to log out/in for any permissions changes - view scoped beans will be new when you first go to a page; though it is kept while performing any ajax calls on the page (as opposed to request scoped). So if there is a call to the permissions or settings wrapper that happens during an ajax call (e.g. switching tabs),. then user will not see any role changes. But if they reload the page, they would.
There was a problem hiding this comment.
I'm not sure what's going on here; my comment above was supposed to be in another thread - in response to where you were asking
@pdurbin I'm not sure I understand where you ask "If we're operating on the session"
Sorry for any potential confusion. We are in agreement otherwise. (but, based on your reply, you knew which discussion it belonged to; it looks like github mixes up comment threads occasionally).
As for you original question, above:
I wonder if instead of having a field for each of these, it could make sense to store them in a map.
That was how I started implementing it. But then realized that it would not end up saving much in terms of code logic; type handling, that you mention above, and/or having to define special map keys for values that are not settings, etc.
|
Issues found:
Done with functional testing, will do a brief stress test. |

What this PR does / why we need it:
This PR should make the home/dataverse page snappier.
This is not by any means an exhaustive optimization of everything that could be improved. More like picking the most low-hanging fruit/fixing the more glaring inefficiencies. But this should be a good start.
Quantifiable results (testing the dataverse page on the root collection using a copy of the IQSS prod. database on the perf. cluster, w/ the 10 "official benchmark datasets", comparing to 5.6):
The number of database queries issued in order to load the page via a browser is reduced 3X (from 900+ to below 300);
The average time it takes to load the page, when accessing the page repeatedly from the command line w/ curl is reduced 2X.
It is harder to predict how much this will improve the performance of the page in actual IQSS production. The average load time will always be higher, or much higher, in prod. when it's under regular load; in absolute number of seconds. The improvement ratio should be of the same order (I know that some queries that are dropped in this PR take much longer on the prod. RDS than on the test cluster db). On top of that, it's harder still to predict how much better this new build will behave when the application is bogged down by something happening outside of the dataverse page itself... The only way to find out for sure will be to deploy this in our production.
As a side effect of this PR - mostly on account of the fixes in the dataverse_header.xhtml fragment - all the other pages that are loading the fragment will be issuing fewer queries too. There's a possibility that other improvements made here could be applied to other pages and components, but that should be done under separate issues.
Certain potential improvements were discussed over the course of working on this PR, but are NOT being addressed in it. For example, various application-wide caching solutions (as opposed to/in addition to the viewscoped caching wrapper services we already have in place). We will continue investigating these ideas under separate dedicated issues.
Which issue(s) this PR closes:
Closes #7804
Special notes for your reviewer:
Suggestions on how to test this:
Mostly we need to carefully retest the dataverse page in all its various modes. Since most of the changes were connected to the rendering logic in the various jsf components of the page, we need to ensure that all the links and buttons are still there, that users still can do what they are authorized to do, and the other way around. I know this sounds like "re-test everything" - sorry.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: