Skip to content

Conversation

@wgtmac
Copy link
Member

@wgtmac wgtmac commented Sep 27, 2016

What changes were proposed in this pull request?

Added a overloaded method getApplicationInfoList(limit: Int) in UIRoot so that it can truncate list of applications before doing the transformation (ApplicationsListResource.appHistoryInfoToPublicAppInfo).

How was this patch tested?

manual tests

* @param limit the number of applications to return
* @return List of known applications with a limit.
*/
def getApplicationInfoList(limit: Int): Iterator[ApplicationInfo] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use Integer.MaxValue to mean unlimited, then just the call to limit() works as intended.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think it is better to add a default value here indicating unlimited so that the following method can be totally removed
def getApplicationInfoList: Iterator[ApplicationInfo]

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Page was out of date when I commented below so I missed this, but this is what I was suggesting below

@QueryParam("limit") limit: Integer)
: Iterator[ApplicationInfo] = {
val allApps = uiRoot.getApplicationInfoList
val allApps = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something you can do with a one-liner, by adding .view here? uiRoot.getApplicationInfoList.view?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

view is good point.
However, it doesn't apply here because uiRoot.getApplicationInfoList.view is too late and the transformation of the whole list within getApplicationInfoList() still happens before the .view method. Changing getApplicationInfoList to return a view is not a good idea since it breaks its original intention.

To make it one line, I can change the method signature to getApplicationInfoList(limit: java.lang.Integer) which is a little bit ugly.

Copy link
Member

@ajbozarth ajbozarth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a nice catch after I missed it. I've added a handful of comments, this fix can be minimized a bit.

*/
def getApplicationInfoList(limit: Int): Iterator[ApplicationInfo] = {
val list =
if (limit <= 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for this if/else, the take function has this behavior built in.

getApplicationList.take(limit)
}

list.iterator.map(ApplicationsListResource.appHistoryInfoToPublicAppInfo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take can be inlined here due to the reason above

private[spark] trait UIRoot {
def getSparkUI(appKey: String): Option[SparkUI]
def getApplicationInfoList: Iterator[ApplicationInfo]
def getApplicationInfoList(limit: Int): Iterator[ApplicationInfo]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the limited uses of getApplicationInfoList it would make more sense to me to just add the limit param it with a default value (limit=-1 would return the whole list)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value doesn't help because

  1. The parameter limit that you have added in /applications endpoint is an Integer. So it can be null. When it is null, the default value won't help. Still need to differentiate null with not null.
  2. When limit is a negative number, take(-1) is same as take(0) which returns an empty list. So I still need to judge whether it is less than or equals to 0.

Copy link
Member

@ajbozarth ajbozarth Sep 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The null check occurs at the endpoint though so it doesn't affect this function.
  2. take(-1) and take(0) don't return the same thing, any negative input to take returns the whole list, this was checked in my previous pr and it is the default value in historypage.js

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok somewhere in my testing I made a misassumption, but having the default be MaxInt would work fine too

Copy link
Member Author

@wgtmac wgtmac Sep 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking!
BTW, I've made the changes according to your suggestions.

@ajbozarth
Copy link
Member

So some of my previous comments seemed to have been misinformed but overall I think replacing the current getApplicationInfoList with getApplicationInfoList(limit: Int) with a set default value (such as MaxInt) would be a preferable solution.

val includeCompleted = adjStatus.contains(ApplicationStatus.COMPLETED)
val includeRunning = adjStatus.contains(ApplicationStatus.RUNNING)
val appList = allApps.filter { app =>
allApps.filter { app =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've just realized a fundamental issue with this code change. This filter is being done after the limit now. This mean if you limit to say 100 app but set the date to a year ago you'd expect 100 apps from a year ago, but this instead will return apps from a year ago from the last 100 apps, which for a busy server would be 0.
This means the limit must be run after this filter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I missed that part. Very good suggestion.

Now I've made changes to return a view of the list so that we can filter at first and then take the limit. The previous getApplicationInfoList remains unchanged.

Thoughts?

@ajbozarth
Copy link
Member

If this passes tests, LGTM

@vanzin
Copy link
Contributor

vanzin commented Sep 28, 2016

ok to test

@SparkQA
Copy link

SparkQA commented Sep 28, 2016

Test build #66003 has finished for PR 15248 at commit 6c84507.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@QueryParam("limit") limit: Integer)
: Iterator[ApplicationInfo] = {
val allApps = uiRoot.getApplicationInfoList
val numApps: Int = if (limit != null) limit else Integer.MAX_VALUE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option(limit).getOrElse(Integer.MaxValue) is probably a little more idiomatic. At least, using the Scala constant is.

*/
def getApplicationList(): Iterable[ApplicationHistoryInfo] = {
provider.getListing()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, should getApplicationList() even exist? it sounds like you'd never really want to get the whole materialized list of apps. It looks like it's either called and filtered, or called for its iterator. Is a deeper improvement here to just return an iterator from the provider?

Looking deeper, I suppose one issue here is that the underlying data structure this comes from is mutable and changing. But then I think that means any view-based or iterator-based result runs the risk of traversing it while it's modified. Hm.

Maybe it's still no worse to do so, to return an iterator and immediately filter to get the right subset copy, and proceed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree that getApplicationList() can be removed because SparkUI class is still using this API though deleting it is just more pieces of code.

Yes, both original approach and view-based approach are suffering the risking of reading data from a mutable data source. However, I took a look at the code. The iterator (or the view) returned is from the mutable LinkedHashMap applications of FsHistoryProvider. It is only changed via FsHistoryProvider.mergeApplicationListing() by merging the original applications map with newly-updated apps to a new LinkedHashMap and change the applications reference to it. So the returned iterator or view is still binding to the old LinkedHashMap applications which is safe. I may be wrong, can you confirm?

I'll think about using the iterator returned from getApplicationList() as you said in the last comment.

Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen I have deleted getApplicationListView() and now it directly uses iterator to do the filter and map. Let me know if you have any concern. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally I found this approach also have some issues. Both live UI and history UI depend on /api/v1/applications endpoint which uses UIRoot class to get the app list. UIRoot has two subclasses: SparkUI and HistoryServer. SparkUI's getApplicationList() returns iterator of ApplicationInfo directly, and HistoryServer's getApplicationList() needs to map ApplicationHistoryInfo to ApplicationInfo for the whole list which is the most time-consuming part. So it is unavoidable to do the transformation for the whole list if we want to return Iterator[ApplicationInfo] from getApplicationList(). I can add a method to return Iterator[ApplicationHistoryInfo] in UIRoot class, but it makes no sense for SparkUI to implement this irrelevant method. Finally I decided to add back the getApplicationListView() method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let me ask a different question -- when would you ever want to call getApplicationInfoList, given this problem? the other usages seem to just pick out a single element, so it's wasteful for the same reason. It just seems weird to have two methods for this.

The existing method already only promises an Iterator which seems like the right thing to do. The problem is that it gets there by materializing a whole collection. Why is that unavoidable -- you can map() an iterator's elements without any evaluation of its elements or needing to first materialize the collection.

That's why I think getApplicationInfo is really what needs to change to provide something else like an Iterator, and that ends up being pretty easy -- FsHistoryProvider just has to give an iterator over applications.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note parallel to what @srowen is saying here, if you look at this change alongside your other pr #15247 getApplicationInfoList is actually no longer used. So either change how it works to avoid adding the new view function or deleting it completely would be good options

@SparkQA
Copy link

SparkQA commented Sep 28, 2016

Test build #66063 has finished for PR 15248 at commit d81d9d5.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 28, 2016

Test build #66062 has finished for PR 15248 at commit a07ff90.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 29, 2016

Test build #66064 has finished for PR 15248 at commit 07b78a3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 29, 2016

Test build #66073 has finished for PR 15248 at commit 4ea20b5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 29, 2016

Test build #66079 has finished for PR 15248 at commit 5a29919.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants