
Use new coarsen_graph primitive in Louvain#1362

Merged
rapids-bot[bot] merged 21 commits into rapidsai:branch-0.18 from ChuckHastings:fea_louvain_use_renumber_primitives
Feb 10, 2021
Conversation

Collaborator

@ChuckHastings ChuckHastings commented Jan 28, 2021

Modify experimental::louvain to use the new coarsen_graph primitive. This replaces the original implementation of shrink_graph.


codecov-io commented Jan 28, 2021

Codecov Report

Merging #1362 (adcd593) into branch-0.18 (2fb0725) will increase coverage by 0.19%.
The diff coverage is 59.25%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.18    #1362      +/-   ##
===============================================
+ Coverage        60.38%   60.57%   +0.19%     
===============================================
  Files               67       69       +2     
  Lines             3029     3120      +91     
===============================================
+ Hits              1829     1890      +61     
- Misses            1200     1230      +30     
Impacted Files Coverage Δ
python/cugraph/centrality/__init__.py 100.00% <ø> (ø)
python/cugraph/dask/structure/renumber.py 0.00% <0.00%> (ø)
python/cugraph/link_analysis/pagerank.py 100.00% <ø> (ø)
python/cugraph/comms/comms.py 34.52% <25.00%> (ø)
python/cugraph/dask/common/input_utils.py 23.07% <28.57%> (+1.14%) ⬆️
python/cugraph/dask/common/mg_utils.py 37.50% <38.09%> (-2.50%) ⬇️
python/cugraph/community/spectral_clustering.py 72.54% <38.46%> (-11.67%) ⬇️
python/cugraph/structure/number_map.py 59.20% <50.00%> (+3.24%) ⬆️
python/cugraph/utilities/utils.py 70.07% <61.53%> (-1.48%) ⬇️
python/cugraph/structure/graph.py 66.99% <76.47%> (+0.19%) ⬆️
... and 14 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 039b857...adcd593.

@BradReesWork BradReesWork added this to the 0.18 milestone Jan 29, 2021
@BradReesWork BradReesWork added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Jan 29, 2021
@ChuckHastings ChuckHastings marked this pull request as ready for review February 5, 2021 22:45
@ChuckHastings ChuckHastings requested a review from a team as a code owner February 5, 2021 22:45
@ChuckHastings
Collaborator Author

rerun tests

1 similar comment

Contributor

@seunghwak seunghwak left a comment


This looks good to me except for a few minor complaints.

Comment thread cpp/src/community/louvain.cu Outdated
thrust::copy(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()),
thrust::make_counting_iterator<vertex_t>(0),
thrust::make_counting_iterator<vertex_t>(graph_view.get_number_of_vertices()),
vertex_ids_v.begin());
Contributor


Should we use thrust::sequence here instead? Both do the same thing, but thrust::sequence is more specific, so it's easier to figure out what's happening even before reading the entire code.

Collaborator Author


Of course. Included in the next push.

clustering,
runner.get_dendrogram().num_levels());

// FIXME: Consider returning the Dendrogram at some point
Contributor


Is this FIXME still relevant? It seems like this code is returning a dendrogram, unless get_dendrogram() below is a misnomer.

Collaborator Author


I could consider moving the FIXME, but I think it is still relevant.

The current python API (and every version since we integrated) expects the C++ code to flatten the result and return the final clustering. The networkx python implementation of this returns a Dendrogram and the caller can choose at what level to flatten the Dendrogram later. This seems better... but I didn't want to break the python API at this point.

Comment thread cpp/src/experimental/louvain.cuh Outdated
current_graph_view_.get_local_adj_matrix_partition_row_last(0) -
current_graph_view_.get_local_adj_matrix_partition_row_first(0),
1,
stream_);
Contributor


Better avoid directly accessing offsets() here.

Currently we can do this with

local_num_edges_ = edge_t{0};
for (size_t i = 0; i < current_graph_view_.get_number_of_local_adj_matrix_partitions(); ++i) {
  local_num_edges_ += current_graph_view_.get_number_of_local_adj_matrix_partition_edges(i);
}

Ideally, I think we need to add get_number_of_local_edges() (which basically does what the code above does) to graph_view_t (I thought I did this, but I cannot find it...).

Collaborator Author


I'm hoping that local_num_edges_ will become obsolete once I modify update_by_delta_modularity to use the new primitives. But that won't be until the next release.

I will make this change (occurs in at least one other place in louvain.cuh) and push an update.

int32_t const *labels,
bool do_expensive_check);

template std::tuple<std::unique_ptr<graph_t<int32_t, int64_t, float, true, true>>,
Collaborator


Since we keep an eye on compilation time, please consider using extern template declarations (EIDecl's) in the corresponding header to suppress automatic template instantiation for these template parameter combinations.

In the *.cu file they get un-suppressed by the explicit instantiation (EIDir) directives below.

For readability, you can adopt the model of placing all the EIDecl's (extern template) in an eidecl_<filename>.hpp file that gets included at the end of the header file for the function below.

This is good practice that will pay off compile-time benefits over time (especially when there are many combinations of tparams).

Collaborator Author


This should be done for all of the graph_functions.hpp functions. The renumber_edgelist ones that you modified last month, these coarsen_graph ones, the relabel function and the extract_induced_subgraphs. I'd suggest we do everything in that file at once. Perhaps we should create an issue in 0.19 to address this.

Comment thread cpp/src/experimental/louvain.cuh Outdated
1,
stream_);
local_num_edges_ = edge_t{0};
for (size_t i = 0; i < graph_view.get_number_of_local_adj_matrix_partitions(); ++i) {
Collaborator


Consider replacing custom loop by algorithm (in this case, probably std::accumulate()).

Collaborator Author


This is a temporary solution until @seunghwak exposes a get_number_of_local_edges() method on the graph view.

I cannot see a clean way - given the current graph_view API - to simplify this into a std algorithm. The only good option I saw was std::transform_reduce, which isn't available in C++14 (our current baseline requirement). I could use thrust::transform_reduce and give it a host function, I suppose. I'm inclined to let Seunghwa add the method (probably in version 0.19) and use it here, rather than create a sophisticated temporary solution.

Collaborator

@aschaffer aschaffer Feb 9, 2021


auto sz = graph_view.get_number_of_local_adj_matrix_partitions();
local_num_edges_ = thrust::reduce(thrust::host, 
                      thrust::make_counting_iterator(0), thrust::make_counting_iterator(sz),
                      [graph_view] (auto indx1, auto indx2){
                         return graph_view.get_number_of_local_adj_matrix_partition_edges(indx1)+
                                    graph_view.get_number_of_local_adj_matrix_partition_edges(indx2);
                      });

Wouldn't that work?

Collaborator Author


I don't think that works. The reduction function has to be able to be applied to the output of the reduction function, it can't do a transformation from indices to sums. That's why transform_reduce exists.

I just pushed a new version using thrust::transform_reduce. It's a bit ugly because the graph_view object is not trivially copyable. Ultimately this will go away once the new method is provided.

Comment thread cpp/src/experimental/louvain.cuh Outdated
base_vertex_id_ = current_graph_view_.get_local_vertex_first();

local_num_edges_ = edge_t{0};
for (size_t i = 0; i < current_graph_view_.get_number_of_local_adj_matrix_partitions(); ++i) {
Collaborator


Consider replacing custom loop by algorithm (in this case, probably std::accumulate()).

src_v.end(),
thrust::make_zip_iterator(thrust::make_tuple(dst_v.begin(), weight_v.begin())));
rmm::device_uvector<vertex_t> numbering_indices(numbering_map.size(), stream_);
thrust::sequence(rmm::exec_policy(stream_)->on(stream_),
Collaborator


Why create O(n) sequences rather than use an O(1) counting_iterator? (above and in a few other places)

Collaborator Author


cugraph::experimental::relabel expects std::tuple<vertex_t const *, vertex_t const *> as the second parameter (where this is used). I don't know how to make a counting iterator that provides a vertex_t const *.

The original code (before converting to using relabel) did in place sorts so the sequence needed to be realized. If there is a way to convert it to a counting iterator, I'd be happy to try it.

Collaborator


True, thrust::sort() and thrust::sort_by_key() expect a modifiable range (hence they cannot take counting iterators).

Comment thread cpp/tests/community/louvain_test.cu Outdated
// this is the behavior while we still support Pascal (device_prop.major < 7)
//
cudaDeviceProp device_prop;
CUDA_CHECK(cudaGetDeviceProperties(&device_prop, 0));
Contributor


You can use const cudaDeviceProp& get_device_properties() const (https://github.com/rapidsai/raft/blob/branch-0.18/cpp/include/raft/handle.hpp#L181) instead.

Collaborator Author


Updated in next push

Comment thread cpp/tests/community/louvain_test.cu Outdated
cudaDeviceProp device_prop;
CUDA_CHECK(cudaGetDeviceProperties(&device_prop, 0));

if (device_prop.major < 7) {
Contributor

@seunghwak seunghwak Feb 9, 2021


Should this test file be .cu?

I think we should allow C++ users building with g++ to use libcugraph, and to test that, our C++ test files are better off as .cpp than .cu.

Contributor


And better to use rmm::device_uvector instead of rmm::device_vector: rmm::device_uvector compiles with g++, while rmm::device_vector is a thrust::device_vector and requires nvcc.

Collaborator Author


Renamed all of the Louvain derived tests to .cpp. Modified tests to use rmm::device_uvector.

Comment thread cpp/tests/community/louvain_test.cu Outdated
// this is the behavior while we still support Pascal (device_prop.major < 7)
//
cudaDeviceProp device_prop;
CUDA_CHECK(cudaGetDeviceProperties(&device_prop, 0));
Contributor


Again, RAFT has a utility function to get cudaDeviceProp.

Collaborator Author


Updated

Comment thread cpp/tests/community/louvain_test.cu Outdated
cudaMemcpy((void*)&(cluster_id[0]),
result_v.data().get(),
sizeof(int) * num_verts,
cudaMemcpyDeviceToHost);
Contributor


Better use raft::update_host followed by stream sync.

Collaborator Author


Modified all Louvain-derived tests to use raft::update_host followed by cudaDeviceSynchronize.

Contributor

@rlratzel rlratzel left a comment


Mostly reviewed the updates for the Pascal checks, and those look good, but I agree with Seunghwa's suggestions for the C++ tests.

@ChuckHastings ChuckHastings requested a review from a team as a code owner February 9, 2021 18:56
Contributor

@seunghwak seunghwak left a comment


Thanks!!!

@rlratzel
Contributor

rerun tests

@BradReesWork
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 7e58591 into rapidsai:branch-0.18 Feb 10, 2021
@ChuckHastings ChuckHastings deleted the fea_louvain_use_renumber_primitives branch February 10, 2021 16:09

Labels

improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)


6 participants