Skip to content

Comments

Vector tile 2.x, sort during#42

Merged
mapsam merged 22 commits intomasterfrom
vector-tile-2x
Dec 1, 2017
Merged

Vector tile 2.x, sort during#42
mapsam merged 22 commits intomasterfrom
vector-tile-2x

Conversation

@mapsam
Copy link
Contributor

@mapsam mapsam commented Nov 22, 2017

Integrating vector tile 2.x and a "sort during"

What changed?

  • using a std::priority_queue for the ResultObject references, which is sorted upon emplacing into the queue, rather than sorting after the entire loop finishes
  • integrate vector-tile 2.x branch to decode geometry and properties for us (removes the geometry_processors.cpp file) - includes a new visitor function for decoding variants into v8 values
  • adds a separate code path for radius 0.0, resolves radius=0 leads to zero results with PIP #36 (closing Check for "within" results from closest_point #41)

benchmarks

this branch

UV_THREADPOOL_SIZE=1 node bench/vtquery.bench.js --iterations=1000 --concurrency=1

1: pip: many building polygons ... 956 runs/s (1046ms)
2: pip: many building polygons, single layer ... 1053 runs/s (950ms)
3: query: many building polygons, single layer ... 928 runs/s (1078ms)
4: query: linestrings, mapbox streets roads ... 1709 runs/s (585ms)
5: query: polygons, mapbox streets buildings ... 935 runs/s (1069ms)
6: query: all things - dense single tile ... 510 runs/s (1962ms)
7: query: all things - dense nine tiles ... 65 runs/s (15418ms)
8: elevation: terrain tile nepal ... 849 runs/s (1178ms)
9: geometry: 2000 points in a single tile, no properties ... 2053 runs/s (487ms)
10: geometry: 2000 points in a single tile, with properties ... 1645 runs/s (608ms)
11: geometry: 2000 linestrings in a single tile, no properties ... 1086 runs/s (921ms)
12: geometry: 2000 linestrings in a single tile, with properties ... 952 runs/s (1050ms)
13: geometry: 2000 polygons in a single tile, no properties ... 690 runs/s (1450ms)
14: geometry: 2000 polygons in a single tile, with properties ... 643 runs/s (1556ms)

master

UV_THREADPOOL_SIZE=1 node bench/vtquery.bench.js --iterations=1000 --concurrency=1

1: pip: many building polygons ... 1100 runs/s (909ms)
2: pip: many building polygons, single layer ... 1248 runs/s (801ms)
3: query: many building polygons, single layer ... 356 runs/s (2806ms)
4: query: linestrings, mapbox streets roads ... 685 runs/s (1459ms)
5: query: polygons, mapbox streets buildings ... 362 runs/s (2766ms)
6: query: all things - dense single tile ... 242 runs/s (4127ms)
7: query: all things - dense nine tiles ... 39 runs/s (25482ms)
8: elevation: terrain tile nepal ... 947 runs/s (1056ms)
9: geometry: 2000 points in a single tile, no properties ... 1410 runs/s (709ms)
10: geometry: 2000 points in a single tile, with properties ... 511 runs/s (1956ms)
11: geometry: 2000 linestrings in a single tile, no properties ... 875 runs/s (1143ms)
12: geometry: 2000 linestrings in a single tile, with properties ... 423 runs/s (2366ms)
13: geometry: 2000 polygons in a single tile, no properties ... 715 runs/s (1398ms)
14: geometry: 2000 polygons in a single tile, with properties ... 378 runs/s (2646ms)

cc @flippmoke

v8::Local<v8::Object> tilequery_properties_obj = Nan::New<v8::Object>();
for (auto const& prop : feature->properties) {
properties_obj->Set(Nan::New<v8::String>(prop.first).ToLocalChecked(), get_property_value(prop.second));
set_property(prop, properties_obj);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flippmoke this loop never ... loops. Am I referencing feature->properties correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

v8::Local<v8::Object> tilequery_properties_obj = Nan::New<v8::Object>();
tilequery_properties_obj->Set(Nan::New("distance").ToLocalChecked(), Nan::New<v8::Number>(feature->distance));
std::string og_geom = getGeomTypeString(feature->original_geometry_type);
tilequery_properties_obj->Set(Nan::New("geometry").ToLocalChecked(), Nan::New<v8::String>(og_geom).ToLocalChecked());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flippmoke The original geometry type string shows up in the final GeoJSON, but layer_name doesn't (one line lower). Not sure why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been fixed

@mapsam
Copy link
Contributor Author

mapsam commented Nov 22, 2017

This is currently blocked by mapbox/spatial-algorithms#21

@mapsam
Copy link
Contributor Author

mapsam commented Nov 27, 2017

Spatial algorithms ☝️ branch has been merged and has a dev release on Mason. Onward.

This was referenced Nov 27, 2017
@mapsam
Copy link
Contributor Author

mapsam commented Nov 27, 2017

Looks like -weffc++ errors are throwing:

../src/vtquery.cpp:266:49: error: comparing floating point with == or != is unsafe [-Werror=float-equal]
                         if (cp_info.distance == 0.0) {
                                                 ^~~

but it looks like this isn't a big deal according to SO. I imagine I can bypass this by reversing the conditional to something like if (cp_info.distance >= 0.0) { ... }. Is this what you would do @flippmoke?

@mapsam mapsam mentioned this pull request Nov 27, 2017
12 tasks
@mapsam mapsam changed the title [WIP] Vector tile 2.x, sort during Vector tile 2.x, sort during Nov 27, 2017
@mapsam mapsam requested a review from springmeyer November 27, 2017 22:51
@springmeyer
Copy link
Contributor

springmeyer commented Nov 28, 2017

@mapsam - awesome work on this. Did a quick review and here are my thoughts:

  • do you have test coverage of sorted results? This PR looks like it changed the method of sorting and looks good to me - but do you have any tests that assert that the sorting of features is the same before and after? In particular do you have any tests that cover what happens when two or more features are the same exact distance? In this case what order do they end up in? Is it stable?
  • I noticed you can remove some repeated code in the geometry handling. How about applying this? https://gist.github.com/springmeyer/c3f88c31dc58e536f8b5dd5b8b015aaa
  • I noticed (unrelated to this PR) you have a string called uknown - is that misspelling intended?
  • The Nan::New<v8::Array> constructor takes a size, which will optimize its creation when you know the final size. In your case I think you do: you can pass Nan::New<v8::Array>(results_queue_.size()); in one case (when creating the features_array) and Nan::New<v8::Array>(2) in another case (when creating coordinates). This will not likely improve performance in any measurable way but will nevertheless be more efficient, so it is worth applying I think.

Copy link
Contributor

@springmeyer springmeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made minor comments above. This looks great otherwise, so approving.

@mapsam
Copy link
Contributor Author

mapsam commented Nov 28, 2017

do you have test coverage of sorted results?

I don't - that feels like a really good test to add. I'll work on updating one of them to check if the values are the same in the resulting feature collection.

I noticed you can remove some repeated code in the geometry handling. How about applying this? /springmeyer/c3f88c31dc58e536f8b5dd5b8b015aaa

Mind updating this link @springmeyer?

string called uknown - is that misspelling intended?

Indeed it is. I will update it 👌

My todos:

  • Nan::New<v8::Array> constructor size
  • update uknown spelling
  • implement @springmeyer's suggestion once I see it!
  • test: cover what happens when two or more features are the same exact distance
  • test: cover when results include a 0.0 point in polygon (ensure it is the first result)

@springmeyer
Copy link
Contributor

@mapsam - okay, if the difference is in spatial-algos, then I'm comfortable with you applying a fuzzy comparison in vtquery rather than exact. It looks like this is what spatial-algos is doing by using BOOST_CHECK_CLOSE with an epsilon: https://github.com/mapbox/spatial-algorithms/blob/master/test/unit/closest_point/test-closest-point.cpp

@mapsam
Copy link
Contributor Author

mapsam commented Nov 29, 2017

@springmeyer how does this look as a viable checkClose function?

function checkClose(a, b, epsilon) {
  return 1*(a-b) < epsilon;
}

And usage in tests with same epsilon as spatial-algorithms

assert.ok(checkClose(number_a, number_b, 1e-6));

@springmeyer
Copy link
Contributor

springmeyer commented Nov 29, 2017 via email

@mapsam
Copy link
Contributor Author

mapsam commented Nov 29, 2017

Great! Okay, only one more thing I'm seeing - linux seems to return results with the same distance in a different order, despite the distance being the same number. Must they be in the same order? I would expect them to be in parsing order based on the operations of the priority queue, but this is hitting my expertise in how the priority queue resorts itself.

@springmeyer
Copy link
Contributor

@mapsam - I'm not sure why, but I have a hunch and if I'm right @flippmoke can explain the details tomorrow. My hunch is that the sort is not "stable". In the past when using std::sort we've found we needed to move to http://en.cppreference.com/w/cpp/algorithm/stable_sort to get tests passing on osx and linux for fixtures. Since you are using priority_queue I wonder if https://lemire.me/blog/2017/03/13/stable-priority-queues/ is relevant. Again @flippmoke will know better than me, but this is a starting hunch.

@mapsam
Copy link
Contributor Author

mapsam commented Nov 29, 2017

@springmeyer I lied so badly! 🙈

We're never using distance from closest_point as a return, and are only using it in the conditional comparison. What we're returning is a value from cheap-ruler based on latitude and longitude inputs. So basically we convert vector tile coordinates into latitude and longitude coordinates (cast to doubles), then use cheap-ruler to calculate the distance between them in meters with this function.

Numbers, as they move through vtquery (logged with std::clog):

# x,y
tilecoords:  601, 258
lnglat:      -87.7963, 41.8675
meters:      9.43636
ResObjDistance: 9.43636
js distance: 9.436356889343624

So, logging these numbers, it seems that closest_point and cheap-ruler work as expected and keep our numbers nice and rounded. It's when we pass from the v8 thread into a javascript value when we start getting sloppy. I think all javascript floating numbers are 64 bit, so not sure if there's anything we can do here other than continue asserting on the checkClose function?

@mapsam
Copy link
Contributor Author

mapsam commented Nov 29, 2017

Just a confirmation that these tests are passing on macosx and not linux. Appears we cannot rely on order - @flippmoke have you seen priority_queue's react differently on different operating systems?

screen shot 2017-11-28 at 5 47 05 pm

@springmeyer
Copy link
Contributor

@mapsam, nice debugging. As an extra assurance that the numbers are changing at the JS conversion I recommend adding a std::setprecision call to your std::clog call. Set the precision higher so that you can see more digits after the decimal (equal to the JS). What I’d look for is that even with higher display precision the ‘ResObjDistance’ output is still different.

@mapsam
Copy link
Contributor Author

mapsam commented Nov 29, 2017

Nice @springmeyer - here's the output with std::setprecision(20):

tilecoords:     601, 258
lnglat:         -87.796286344528198242, 41.867499528053855329
meters:         9.4363568893436244878
ResObjDistance: 9.4363568893436244878

Not sure how to read this, though.

@springmeyer
Copy link
Contributor

@mapsam - thanks for posting that. So that helps show that the precision output was misleading because it was rounding when the C++ number was printed. So:

ResObjDistance: 9.43636

Is actually:

ResObjDistance: 9.4363568893436244878

Which look identical when printed (other than the precision):

9.4363568893436244878 # C++
9.436356889343624 # JS

Am I following along right?

@mapsam
Copy link
Contributor Author

mapsam commented Nov 29, 2017

@springmeyer yep, that's right. I think it'll be best to just make sure our tests are checking a rounded number, considering anything beyond 4 decimal points is in the micrometers of distance.

I tried to recreate the priority queue sorting differences in a docker container (ubuntu:16.04) without any luck. The sort results end up the same. Perhaps I'm not using the proper versions?

~/mug/priority_queue_diffs$ clang++ -c -std=c++14 main.cpp && clang++ -o main main.o && ./main
d_a: 13.7984
d_c: 13.7984
d_b: 4.5493
d_d: 1.3325
~/mug/priority_queue_diffs$ docker run -it pqd
root@a01a48cfd288:/usr/local/src# g++ -c -std=c++14 main.cpp && g++ -o main main.o && ./main
d_a: 13.7984
d_c: 13.7984
d_b: 4.5493
d_d: 1.3325

@springmeyer
Copy link
Contributor

springmeyer commented Nov 29, 2017 via email

@mapsam
Copy link
Contributor Author

mapsam commented Nov 30, 2017

@springmeyer I've tried a few more attempts, and was able to recreate the error once, but I think I got lucky as I'm not able to get it compiled any longer. Now this branch includes a dockerfile you can build with docker build -t vtquery . and connect to it via docker run -it vtquery and then run make test inside, but I'm seeing some issues:

Error: /usr/local/src/lib/binding/vtquery.node: undefined symbol: __asan_report_store4

Not sure what's happening here, but perhaps I've set up the dockerfile incorrectly?

@springmeyer
Copy link
Contributor

springmeyer commented Nov 30, 2017 via email

@mapsam
Copy link
Contributor Author

mapsam commented Nov 30, 2017

Lovely, that did it @springmeyer. Thanks for the 👀! Now back to the investigation.

@mapsam
Copy link
Contributor Author

mapsam commented Nov 30, 2017

I've removed the test for the time being and opened a ticket #44 to remind us of this in the future.

@mapsam
Copy link
Contributor Author

mapsam commented Dec 1, 2017

Failing tests are due to travis regressions which have been ticketed in mapbox/node-cpp-skel#93

@mapsam
Copy link
Contributor Author

mapsam commented Dec 1, 2017

Thanks @flippmoke and @springmeyer for the help here!

@mapsam mapsam merged commit 0a5423a into master Dec 1, 2017
@mapsam mapsam deleted the vector-tile-2x branch December 1, 2017 18:38
@springmeyer
Copy link
Contributor

epic PR @mapsam 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

radius=0 leads to zero results with PIP

2 participants