Use new coarsen_graph primitive in Louvain #1362
rapids-bot[bot] merged 21 commits into rapidsai:branch-0.18 from
Conversation
1) Separate flatten_dendrogram from the Dendrogram class. 2) Add an initialize_dendrogram_level function. 3) Create an ECG variation of Louvain that initializes the dendrogram with a random ordering of vertex ids rather than creating a new graph.
Codecov Report
@@ Coverage Diff @@
## branch-0.18 #1362 +/- ##
===============================================
+ Coverage 60.38% 60.57% +0.19%
===============================================
Files 67 69 +2
Lines 3029 3120 +91
===============================================
+ Hits 1829 1890 +61
- Misses 1200 1230 +30
Continue to review full report at Codecov.
rerun tests
1 similar comment
rerun tests
seunghwak
left a comment
This looks good to me except for a few minor complaints.
thrust::copy(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()),
             thrust::make_counting_iterator<vertex_t>(0),
             thrust::make_counting_iterator<vertex_t>(graph_view.get_number_of_vertices()),
             vertex_ids_v.begin());
Would thrust::sequence be better here? Both do the same thing, but thrust::sequence is more specific, so it is easier to figure out what's happening even before reading the entire code.
Of course. Included in the next push.
clustering,
runner.get_dendrogram().num_levels());
// FIXME: Consider returning the Dendrogram at some point
Is this FIXME still relevant? It seems like this code is returning a dendrogram, unless get_dendrogram() below is a misnomer.
I could consider moving the FIXME, but I think it is still relevant.
The current python API (and every version since we integrated) expects the C++ code to flatten the result and return the final clustering. The networkx python implementation of this returns a Dendrogram and the caller can choose at what level to flatten the Dendrogram later. This seems better... but I didn't want to break the python API at this point.
current_graph_view_.get_local_adj_matrix_partition_row_last(0) -
    current_graph_view_.get_local_adj_matrix_partition_row_first(0),
1,
stream_);
Better to avoid directly accessing offsets() here.
Currently we can do this with:
local_num_edges_ = edge_t{0};
for (size_t i = 0; i < current_graph_view_.get_number_of_local_adj_matrix_partitions(); ++i) {
  local_num_edges_ += current_graph_view_.get_number_of_local_adj_matrix_partition_edges(i);
}
Ideally, I think we need to add get_number_of_local_edges() (which basically does what the code above does) to graph_view_t (I thought I did this, but I cannot find it...).
I'm hoping that local_num_edges_ will become obsolete once I modify update_by_delta_modularity to use the new primitives. But that won't be until the next release.
I will make this change (occurs in at least one other place in louvain.cuh) and push an update.
int32_t const *labels,
bool do_expensive_check);

template std::tuple<std::unique_ptr<graph_t<int32_t, int64_t, float, true, true>>,
Since we keep an eye on compilation time, please consider using extern template declarations (EIDecl's) in the corresponding header to suppress automatic template instantiation for these template parameter combinations.
In the *.cu file they get un-suppressed by the explicit instantiation directives (EIDir's) below.
For readability, you can adopt the model of placing all the EIDecl's (extern template) in an eidecl_<filename>.hpp file that gets included at the end of the header file for the function below.
This is good practice that will pay off with compile-time benefits over time (especially when there are many combinations of tparams).
This should be done for all of the graph_functions.hpp functions. The renumber_edgelist ones that you modified last month, these coarsen_graph ones, the relabel function and the extract_induced_subgraphs. I'd suggest we do everything in that file at once. Perhaps we should create an issue in 0.19 to address this.
1,
stream_);
local_num_edges_ = edge_t{0};
for (size_t i = 0; i < graph_view.get_number_of_local_adj_matrix_partitions(); ++i) {
Consider replacing the custom loop with an algorithm (in this case, probably std::accumulate()).
This is a temporary solution until @seunghwak exposes a get_number_of_local_edges() method on the graph view.
I cannot see a clean way, given the current graph_view API, to simplify this into a std algorithm. The only good option I saw was std::transform_reduce, which isn't available in C++14 (our current baseline requirement). I could use thrust::transform_reduce with a host function, I suppose. I'm inclined to let Seunghwa add the method (probably in version 0.19) and use it here, rather than create a sophisticated temporary solution.
auto sz = graph_view.get_number_of_local_adj_matrix_partitions();
local_num_edges_ = thrust::reduce(
  thrust::host,
  thrust::make_counting_iterator(0),
  thrust::make_counting_iterator(sz),
  [graph_view](auto indx1, auto indx2) {
    return graph_view.get_number_of_local_adj_matrix_partition_edges(indx1) +
           graph_view.get_number_of_local_adj_matrix_partition_edges(indx2);
  });
Wouldn't that work?
I don't think that works. The binary operator has to be applicable to its own output; it cannot transform indices into sums. That's why transform_reduce exists.
I just pushed a new version using thrust::transform_reduce. It's a bit ugly because the graph_view object is not trivially copyable. Ultimately this will go away once the new method is provided.
base_vertex_id_ = current_graph_view_.get_local_vertex_first();

local_num_edges_ = edge_t{0};
for (size_t i = 0; i < current_graph_view_.get_number_of_local_adj_matrix_partitions(); ++i) {
Consider replacing the custom loop with an algorithm (in this case, probably std::accumulate()).
src_v.end(),
thrust::make_zip_iterator(thrust::make_tuple(dst_v.begin(), weight_v.begin())));
rmm::device_uvector<vertex_t> numbering_indices(numbering_map.size(), stream_);
thrust::sequence(rmm::exec_policy(stream_)->on(stream_),
Why create an O(n) sequence rather than use an O(1) counting_iterator? (above and in a few other places)
cugraph::experimental::relabel expects std::tuple<vertex_t const *, vertex_t const *> as the second parameter (where this is used). I don't know how to make a counting iterator that provides a vertex_t const *.
The original code (before converting to using relabel) did in place sorts so the sequence needed to be realized. If there is a way to convert it to a counting iterator, I'd be happy to try it.
True, thrust::sort() and thrust::sort_by_key() expect a modifiable range (hence counting iterators cannot be used).
// this is the behavior while we still support Pascal (device_prop.major < 7)
//
cudaDeviceProp device_prop;
CUDA_CHECK(cudaGetDeviceProperties(&device_prop, 0));
You can use const cudaDeviceProp& get_device_properties() const from the RAFT handle (https://github.com/rapidsai/raft/blob/branch-0.18/cpp/include/raft/handle.hpp#L181) instead.
Updated in the next push.
cudaDeviceProp device_prop;
CUDA_CHECK(cudaGetDeviceProperties(&device_prop, 0));

if (device_prop.major < 7) {
Should this test file be .cu?
I think we should allow C++ users building with g++ to use libcugraph, and to test that, our C++ test files are better off as .cpp than .cu.
And better to use rmm::device_uvector instead of rmm::device_vector. rmm::device_uvector compiles with g++; rmm::device_vector is a thrust::device_vector and requires nvcc.
Renamed all of the Louvain-derived tests to .cpp. Modified tests to use rmm::device_uvector.
// this is the behavior while we still support Pascal (device_prop.major < 7)
//
cudaDeviceProp device_prop;
CUDA_CHECK(cudaGetDeviceProperties(&device_prop, 0));
Again, RAFT has a utility function to get cudaDeviceProp.
cudaMemcpy((void*)&(cluster_id[0]),
           result_v.data().get(),
           sizeof(int) * num_verts,
           cudaMemcpyDeviceToHost);
Better to use raft::update_host followed by a stream sync.
Modified all Louvain-derived tests to use raft::update_host followed by cudaDeviceSynchronize.
rlratzel
left a comment
Mostly reviewed the updates that check for Pascal, and those look good, but I agree with Seunghwa's suggestions for the C++ tests.
rerun tests

@gpucibot merge
Modify experimental::louvain to use the new coarsen_graph primitive. This replaces the original implementation of shrink_graph.