Build a Dendrogram class, adapt Louvain/Leiden/ECG to use it by ChuckHastings · Pull Request #1359 · rapidsai/cugraph

ChuckHastings · 2021-01-27T18:48:55Z

Preparing for MNMG Leiden and ECG identified an area for code cleanup.

The original cuGraph implementation of Louvain would flatten the hierarchical clustering as it was computed, filling (and returning) the final clustering. This adds an awkward step in the middle of the Louvain computation. Additionally, since Louvain (and Leiden and ECG, which derive from it) is actually a hierarchical clustering algorithm, it would be nice to generate the actual Dendrogram.

This PR implements a Dendrogram class, a function for flattening the Dendrogram, and modifies Louvain, Leiden and ECG to use the Dendrogram class.
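As a rough illustration of the idea (not the code in this PR), a dendrogram can be held as one array of cluster ids per level and flattened afterwards by composing the levels. The sketch below is host-only: `DendrogramSketch`, `flatten_sketch`, and the `std::vector` storage are hypothetical stand-ins for the device-side class, and the level convention (level 0 maps vertices to clusters, level k maps clusters of level k-1 to clusters of level k) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in: the PR keeps each level in device memory; std::vector
// is used here only so the sketch builds without CUDA.
template <typename vertex_t>
class DendrogramSketch {
 public:
  // Append a level with room for num_verts cluster assignments.
  void add_level(vertex_t num_verts)
  {
    levels_.emplace_back(static_cast<std::size_t>(num_verts));
  }

  std::size_t num_levels() const { return levels_.size(); }

  std::vector<vertex_t>& level(std::size_t i) { return levels_[i]; }
  std::vector<vertex_t> const& level(std::size_t i) const { return levels_[i]; }

 private:
  std::vector<std::vector<vertex_t>> levels_;
};

// Assumed convention: level 0 maps each original vertex to a cluster id, and
// level k maps each cluster id of level k-1 to a cluster id at level k.
template <typename vertex_t>
std::vector<vertex_t> flatten_sketch(DendrogramSketch<vertex_t> const& d)
{
  if (d.num_levels() == 0) return {};
  std::vector<vertex_t> clustering = d.level(0);
  for (std::size_t k = 1; k < d.num_levels(); ++k) {
    auto const& next = d.level(k);
    for (auto& c : clustering) {
      assert(static_cast<std::size_t>(c) < next.size());
      c = next[static_cast<std::size_t>(c)];  // follow the vertex one level up
    }
  }
  return clustering;
}
```

Keeping the flatten step outside the class means the clustering loop only appends levels, and the final per-vertex clustering is computed once at the end.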

It was suggested that the Dendrogram class could be moved to RAFT. I decided to defer that until later; it's easy enough to move.

1) Separate flatten_dendogram from dendogram class
2) Add initialize_dendogram_level function
3) Created an ECG variation of Louvain that initializes the dendogram with a random ordering of vertex ids rather than creating a new graph.

codecov-io · 2021-01-27T20:52:45Z

Codecov Report

Merging #1359 (9532b13) into branch-0.18 (2fb0725) will increase coverage by 0.25%.
The diff coverage is 52.38%.

@@               Coverage Diff               @@
##           branch-0.18    #1359      +/-   ##
===============================================
+ Coverage        60.38%   60.64%   +0.25%     
===============================================
  Files               67       69       +2     
  Lines             3029     3120      +91     
===============================================
+ Hits              1829     1892      +63     
- Misses            1200     1228      +28

Impacted Files	Coverage Δ
python/cugraph/centrality/__init__.py	`100.00% <ø> (ø)`
python/cugraph/dask/structure/renumber.py	`0.00% <0.00%> (ø)`
python/cugraph/link_analysis/pagerank.py	`100.00% <ø> (ø)`
python/cugraph/comms/comms.py	`34.52% <25.00%> (ø)`
python/cugraph/dask/common/input_utils.py	`23.07% <28.57%> (+1.14%)`	⬆️
python/cugraph/dask/common/mg_utils.py	`37.50% <38.09%> (-2.50%)`	⬇️
python/cugraph/community/spectral_clustering.py	`72.54% <38.46%> (-11.67%)`	⬇️
python/cugraph/structure/number_map.py	`59.20% <50.00%> (+3.24%)`	⬆️
python/cugraph/structure/graph.py	`66.99% <76.47%> (+0.19%)`	⬆️
python/cugraph/utilities/utils.py	`72.44% <85.71%> (+0.88%)`	⬆️
... and 16 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7266cdb...9532b13.

ChuckHastings · 2021-01-28T18:24:14Z

rerun tests

seunghwak · 2021-02-02T19:43:52Z

+
+#include <memory>
+#include <rmm/device_buffer.hpp>
+#include <vector>


Do we have any guidance on ordering include statements?

What I do is include the more local headers first (rmm headers are more local than standard C++ headers), like the following.

#include <rmm/device_buffer.hpp>
#include <memory>
#include <vector>

AFAIK, this is quite widely used in RAPIDS, so I started to follow this practice; we should probably stick to it unless we have other guidelines.

I had forgotten to leave the blank line and clang-format reordered them. This will be updated in my next push

seunghwak · 2021-02-02T19:46:04Z

+template <typename vertex_t>
+class Dendrogram {
+ public:
+  Dendrogram() : level_size_(), level_ptr_() {}


Do we need this? This does not do anything special, so can't we just rely on the default constructor (automatically generated by the compiler)?

Residual. An earlier version (pre-PR) had something that was actually initialized here. When I separated the code for flattening the Dendrogram out of the class (in preparation for moving to RAFT), the constructor became simpler.

Removed in next push.

seunghwak · 2021-02-02T19:47:26Z

+namespace cugraph {
+
+template <typename vertex_t>
+class Dendrogram {


What's our guideline on class naming? IIRC, we're asked to use dendrogram_t instead of Dendrogram.

I like class names to start with a capital letter. dendrogram_t should be a type (which technically could be a class).

We have a mix in our code base. I prefer the capital letter starting a class name, and using the _t suffix on type parameters in the templates.

We should codify our coding standards better.

Yes, I thought the same: the naming convention is dendrogram_t. If not, then what is the rule for naming types (a class is a type...)?

seunghwak · 2021-02-02T19:51:43Z

+
+  void add_level(vertex_t num_verts)
+  {
+    cudaStream_t stream{0};


This is not stream ready. Better follow rmm's approach?

https://github.com/rapidsai/rmm/blob/branch-0.18/include/rmm/device_buffer.hpp#L289

That might be more than we need, especially since RAFT (where this class will be moved in 0.19) doesn't use RMM the same way. I'll push a change to partially address this. Let me know if you think I should go further than the next push or if that's sufficient for now (since we'll have to refactor this method for RAFT integration).

We at least need a mechanism to specify the stream to execute this on. The RMM way is one example (and I think following the RMM style is good for enforcing some consistency across RAPIDS projects), but if that is too much and there is a simpler way, I am OK with that as well.

seunghwak · 2021-02-02T19:55:11Z

+
+  size_t num_levels() const { return level_size_.size(); }
+
+  vertex_t *get_level_ptr_unsafe(size_t level) const


Just FYI,

I have used nocheck instead of unsafe, but I think unsafe is a better name.

If no one opposes, I will submit a PR replacing nocheck with unsafe to be more consistent.

A little more on this.

I just realized that I used nocheck following cudf.

https://github.com/rapidsai/cudf/blob/branch-0.18/cpp/include/cudf/column/column_device_view.cuh#L167

cudf is using unsafe as well (e.g. set_bit_unsafe) but nocheck is used in cudf::column_device_view which is more widely used than bit manipulation utilities.

This being said, any preference between nocheck vs unsafe?

I should probably switch to nocheck. I recalled using the unsafe suffix in some previous work and it seemed to be appropriate here. However, after your comment I looked back at some of that. Generally what I have done in the past is:

nocheck suffix indicating don't check bounds

unsafe suffix indicating the method is not thread-safe

That seems like a better and more consistent naming approach.

Changed to nocheck in next push
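The naming rule above can be sketched with a pair of accessors. This is a host-only illustration, not the class from this PR: `LevelSizes` and the checked `get_level_size` are hypothetical names, and the debug-only `assert` in the `_nocheck` variant is an added assumption about how the precondition might be documented.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

template <typename vertex_t>
class LevelSizes {
 public:
  void add_level(vertex_t num_verts) { level_size_.push_back(num_verts); }

  std::size_t num_levels() const { return level_size_.size(); }

  // Checked accessor: validates the index and throws on misuse.
  vertex_t get_level_size(std::size_t level) const
  {
    if (level >= level_size_.size()) { throw std::out_of_range("level out of range"); }
    return level_size_[level];
  }

  // _nocheck suffix: the caller guarantees level < num_levels().
  // The assert only documents the precondition and vanishes under NDEBUG.
  vertex_t get_level_size_nocheck(std::size_t level) const
  {
    assert(level < level_size_.size());
    return level_size_[level];
  }

 private:
  std::vector<vertex_t> level_size_;
};
```

Under this convention an `_unsafe` suffix stays available for a different warning, namely that a method is not thread-safe.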

seunghwak · 2021-02-02T19:56:02Z

 #include <ctime>
 #include <utilities/error.hpp>
-#include "utilities/graph_utils.cuh"
+#include <utilities/graph_utils.cuh>


Better order include statements?

I can do that in the next push.

seunghwak · 2021-02-02T20:01:56Z

-                                   IdxT *parts,
-                                   ValT *weights)
+__global__ void match_check_kernel(
+  IdxT size, IdxT num_verts, IdxT *offsets, IdxT *indices, IdxT *parts, ValT *weights)


Would it be better to use thrust?

thrust::transform(
  ...,
  [...] __device__(auto ...) {
    auto source = thrust::upper_bound(thrust::seq, ...);
    ...
  });

What do we get by writing a custom kernel?
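For readers unfamiliar with the pattern, the `upper_bound` step in that suggestion recovers the source vertex of an edge from the CSR offsets array. The sketch below is a host-side analogue using `std::upper_bound` and `std::transform` in place of the thrust calls; `source_of_edge` and `sources_for_all_edges` are hypothetical names, and whatever the kernel does after finding the source is elided in the comment above and not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// offsets has num_verts + 1 entries; edge e belongs to source s when
// offsets[s] <= e < offsets[s + 1].
template <typename idx_t>
idx_t source_of_edge(std::vector<idx_t> const& offsets, idx_t edge)
{
  assert(offsets.size() >= 2 && edge >= offsets.front() && edge < offsets.back());
  // First offset strictly greater than edge; the source is the entry before it.
  auto it = std::upper_bound(offsets.begin(), offsets.end(), edge);
  return static_cast<idx_t>(std::distance(offsets.begin(), it) - 1);
}

// One output per edge index, the shape a transform over edge ids would have.
template <typename idx_t>
std::vector<idx_t> sources_for_all_edges(std::vector<idx_t> const& offsets)
{
  assert(!offsets.empty());
  std::vector<idx_t> edge_ids(static_cast<std::size_t>(offsets.back()));
  std::iota(edge_ids.begin(), edge_ids.end(), idx_t{0});
  std::vector<idx_t> sources(edge_ids.size());
  std::transform(edge_ids.begin(), edge_ids.end(), sources.begin(),
                 [&offsets](idx_t e) { return source_of_edge(offsets, e); });
  return sources;
}
```

Vertices with no outgoing edges are skipped naturally, because repeated offsets never compare strictly greater than an edge index that belongs to a later vertex.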

Don't know. This custom kernel was already there. My clang-format restructured this line.

I'd rather address this issue in the MNMG ECG implementation when we refactor this than to do this now. I'm adding a FIXME to remind me to do that.

I agree w/ @seunghwak . The long-held policy has been to avoid custom-written kernels as much as possible. And use thrust / CUB instead.

Let me restate my response, perhaps I was too terse.

I agree, this seems like a place where we should just use thrust/cub. There's no reason that I can think of that this custom kernel would be better than a thrust/cub implementation. There's nothing in this kernel that takes advantage of features that would make a custom kernel perform better, and generally it increases the maintenance cost of the code.

I modified this file (ecg.cu) solely for the purpose of using the Dendrogram class, since the ECG algorithm calls Louvain and I modified Louvain. I'm reluctant, in general, to rewrite things in the code that I tangentially touch. It tends to make the work continue without getting anything finished and merged into the baseline.

The only reason this was even in the list of changes is because I ran clang-format and it changed the format of these lines of code. I made no changes relevant to this kernel or the calls to it.

I have a branch where I have started the MNMG ECG work. In that branch I will completely refactor how the ECG code works. My intention is to remove this kernel in that branch which hopefully will be merged sometime during release 0.19.

If we think it's critical that I do this for release 0.18, I can certainly take a few hours and refactor this once I'm done addressing the device_uvector request.

seunghwak · 2021-02-02T20:05:11Z

+#include <rmm/thrust_rmm_allocator.h>
+#include <community/dendrogram.cuh>
+#include <experimental/graph_functions.hpp>
+#include <raft/handle.hpp>


Better order include statements?

Fixed in next push.

@seunghwak, include statements order is dictated by clang-format

If you put blank lines in between blocks of statements, clang-format will honor the blocks.

seunghwak · 2021-02-02T20:06:52Z

+  weight_t wt = runner(max_level, resolution);

-  return runner(clustering, max_level, resolution);
+  thrust::device_vector<vertex_t> vertex_ids_v(graph.number_of_vertices);


If you're only dealing with arithmetic types, rmm::device_uvector will be more efficient and more stream-ready.

Note that rmm::device_uvector does not invoke default constructor to initialize vector elements, so if your code expects initialization, you need to call thrust::fill().

Changed in the next push.

BradReesWork · 2021-02-02T20:08:10Z

-                                   IdxT *parts,
-                                   ValT *weights)
+__global__ void match_check_kernel(
+  IdxT size, IdxT num_verts, IdxT *offsets, IdxT *indices, IdxT *parts, ValT *weights)


Why are you using "IdxT" rather than "index_t"? It seems like we are not using capitalized types anywhere else.

This is an existing kernel that has existed for a while. All I did was reformat.

I'd like to defer this until I finish implementing MNMG ECG when I will be reworking all of this code. (@seunghwak suggested I delete the kernel entirely).

BradReesWork · 2021-02-02T20:21:42Z

+  //    a distributed implementation of get_permutation_vector, preferably without
+  //    comms...
+  //
+


Is there an issue to address this TODO? If so, add reference. If not, should there be? I would prefer not to lose sight that these TODOs need to be addressed

Deleted the TODO (and the others in this file). These were notes to myself related to refactoring ECG for MNMG.

The work described in this note is already in an MNMG ECG branch that stalled when @seunghwak and I discussed a new graph primitive that is required.

BradReesWork · 2021-02-02T20:48:10Z

@ChuckHastings any thought on moving Dendrogram into RAFT? I believe @cjnolet has a need

ChuckHastings · 2021-02-02T20:54:22Z

@ChuckHastings any thought on moving Dendrogram into RAFT? I believe @cjnolet has a need

My plan was to get everything working here first and move to raft in 0.19. Fewer moving parts that way. I think @cjnolet was not in a big hurry.

cjnolet · 2021-02-02T20:56:56Z

My plan was to get everything working here first and move to raft in 0.19. Fewer moving parts that way. I think @cjnolet was not in a big hurry.

Sounds good. I'll be moving a bunch of sparse prims into RAFT in 0.19 as well, including the single-linkage clustering.

aschaffer · 2021-02-02T21:29:38Z

+
+  size_t num_levels() const { return level_size_.size(); }
+
+  vertex_t *get_level_ptr_unsafe(size_t level) const


It is dangerous to have a const getter return a non-const pointer to a member. Something like this, for example, can happen:

template <typename T>
void foo(Dendrogram<T> const& d)
{
  d.get_level_ptr_unsafe(0)[0] = T{3};
}

int main(void)
{
  Dendrogram<int> d;
  foo(d);
  return 0;
}

foo() should not be permitted to modify the const& argument, d. But it can. This breaks the contract that the getter is supposed to fulfill (namely being an immutable getter).
(to compile this simple example, I changed device_buffer to std::vector<vertex_t>, but the idea is the same).

At the very least, get_level_ptr_unsafe() should not be a const member.

Sorry, that was careless. Next push will address this.

There should be a const version that returns a const pointer and a non-const version that returns a non-const pointer as it is used in both contexts.
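A minimal sketch of that overload pair, assuming storage whose constness does not propagate to the pointee (here `std::unique_ptr<vertex_t[]>` as a host stand-in for the device buffers). `ConstCorrectLevels` and `read_first` are hypothetical names, and the `_nocheck` spelling follows the rename discussed elsewhere in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

template <typename vertex_t>
class ConstCorrectLevels {
 public:
  // Allocate a zero-initialized level of num_verts entries.
  void add_level(std::size_t num_verts)
  {
    levels_.push_back(std::make_unique<vertex_t[]>(num_verts));
  }

  // Non-const overload: a mutable object hands out a writable pointer.
  vertex_t* get_level_ptr_nocheck(std::size_t level) { return levels_[level].get(); }

  // Const overload: unique_ptr::get() still yields vertex_t* here, so the
  // return type is what enforces read-only access.
  vertex_t const* get_level_ptr_nocheck(std::size_t level) const
  {
    return levels_[level].get();
  }

 private:
  std::vector<std::unique_ptr<vertex_t[]>> levels_;
};

// A function taking a const reference can read but no longer write.
template <typename T>
T read_first(ConstCorrectLevels<T> const& d)
{
  static_assert(
    std::is_const<std::remove_pointer_t<decltype(d.get_level_ptr_nocheck(0))>>::value,
    "const access must yield a pointer to const");
  return d.get_level_ptr_nocheck(0)[0];
}
```

With these overloads, the `foo()` example above fails to compile, because the assignment would go through a pointer to const.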

aschaffer · 2021-02-02T21:30:49Z

+
+  vertex_t get_level_size_unsafe(size_t level) const { return level_size_[level]; }
+
+  vertex_t *current_level_begin() const { return get_level_ptr_unsafe(current_level()); }


Same comment about making this a non-const getter.

aschaffer · 2021-02-02T21:30:58Z

+
+  vertex_t *current_level_begin() const { return get_level_ptr_unsafe(current_level()); }
+
+  vertex_t *current_level_end() const { return current_level_begin() + current_level_size(); }


Same comment about making this a non-const getter.

aschaffer · 2021-02-02T21:32:32Z

+namespace cugraph {
+
+template <typename vertex_t>
+class Dendrogram {


Yes, I thought the same: the naming convention is dendrogram_t. If not, then what is the rule for naming types (a class is a type...)?

aschaffer · 2021-02-02T21:34:20Z

-                                   IdxT *parts,
-                                   ValT *weights)
+__global__ void match_check_kernel(
+  IdxT size, IdxT num_verts, IdxT *offsets, IdxT *indices, IdxT *parts, ValT *weights)


I agree w/ @seunghwak . The long-held policy has been to avoid custom-written kernels as much as possible. And use thrust / CUB instead.

aschaffer · 2021-02-02T21:39:54Z

+  using graph_type = GraphCSRView<vertex_t, edge_t, weight_t>;
+
  CUGRAPH_EXPECTS(graph.edge_data != nullptr,
                  "Invalid input argument: louvain expects a weighted graph");


Why public inheritance? If no method is overridden (virtual) then public inheritance is too much of an exposure (hence creating unnecessary coupling). Why not use private inheritance?

Moreover, if Louvain class is meant to be publicly derived (and it seems it is since it exposes at least one virtual method) then it should have a virtual destructor.
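The virtual destructor point can be shown with a small stand-alone sketch. `AlgorithmBase` and `DerivedAlgorithm` are hypothetical names and are not the Louvain or ECG classes from this PR; the flag passed to the derived constructor exists only to make the destructor call observable.

```cpp
#include <cassert>
#include <memory>
#include <string>

class AlgorithmBase {
 public:
  // Virtual destructor: deleting a derived object through a base pointer
  // is only well defined when the base destructor is virtual.
  virtual ~AlgorithmBase() = default;

  virtual std::string name() const { return "base"; }
};

class DerivedAlgorithm : public AlgorithmBase {
 public:
  explicit DerivedAlgorithm(bool* destroyed) : destroyed_(destroyed) {}

  ~DerivedAlgorithm() override { *destroyed_ = true; }

  std::string name() const override { return "derived"; }

 private:
  bool* destroyed_;  // set to true when this destructor runs
};
```

If nothing is ever overridden or destroyed through the base type, private inheritance or composition avoids the question entirely, which is the alternative raised in the comment above.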

aschaffer · 2021-02-02T21:40:59Z

+  using graph_type = GraphCSRView<vertex_t, edge_t, weight_t>;
+
  CUGRAPH_EXPECTS(graph.edge_data != nullptr,
                  "Invalid input argument: louvain expects a weighted graph");


Again, confusion about the class naming convention: shouldn't it be <small_caps>_t? I understand that some are older (legacy) classes, but perhaps new classes might follow the rule?

aschaffer · 2021-02-02T21:42:15Z

+#include <rmm/thrust_rmm_allocator.h>
+#include <community/dendrogram.cuh>
+#include <experimental/graph_functions.hpp>
+#include <raft/handle.hpp>


@seunghwak, include statements order is dictated by clang-format

afender

Looks good.

afender · 2021-02-03T23:07:23Z

         vertex_t ensemble_size,
         vertex_t *clustering)
 {
+  using graph_type = GraphCSRView<vertex_t, edge_t, weight_t>;


We should try and get rid of legacy classes in SG Louvain and ECG to use graph_t. Or the MG path could be used to support the 1 GPU case.

There's work currently scheduled for 0.19 that will adapt all of these to use the graph primitives. We should be able to consider getting rid of the legacy versions at that point.

ChuckHastings · 2021-02-04T04:29:16Z

rerun tests

BradReesWork · 2021-02-04T15:40:16Z

rerun tests

ChuckHastings · 2021-02-04T17:18:21Z

rerun tests

This is now failing on Pascal (using a method that only works on > Pascal). Working on an update to the python tests to disable these tests on a Pascal system.

BradReesWork · 2021-02-05T15:35:14Z

@gpucibot merge

rlratzel

LGTM, also happy to see the new notebook test skipping mechanism. One suggestion which need not hold up my approval (and I can file an issue if you agree and want to defer this):

I'm thinking it might be better to generalize the keywords a bit more with the side effect of making them more self documenting to NB users who run across them outside the context of our test infra. Maybe for example:

# AUTOMATED TESTING: skip

skips that NB when run from our test scripts

# AUTOMATED TESTING: skip on Pascal

skips that NB when run from our test scripts on a Pascal system

I'm just thinking the AUTOMATED TESTING tag makes it obvious this isn't a comment an unknowing user can just reword for clarity at some point later.

ChuckHastings added 10 commits January 7, 2021 17:26

create a dendogram in the C++ code

ab335b7

fix clang format issues

4d984c6

update copyright date

9c276ae

Merge branch 'branch-0.18' into fea_louvain_dendogram

0ded26d

New idea for ECG

637aee1

1) Separate flatten_dendogram from dendogram class 2) Add initialize_dendogram_level function 3) Created an ECG variation of Louvain that initializes the dendogram with a random ordering of vertex ids rather than creating a new graph.

Merge branch 'branch-0.18' into fea_louvain_dendogram

7187bdb

rename ECG to make it consistent

347051c

missed renaming ECG in CMakeLists.txt

755c298

Merge branch 'branch-0.18' into fea_louvain_dendogram

142024c

fix spelling of dendrogram, fix clang formatting issues

07ad090

ChuckHastings requested review from a team as code owners January 27, 2021 18:48

BradReesWork added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 29, 2021

BradReesWork added this to the 0.18 milestone Jan 29, 2021

BradReesWork requested review from afender, aschaffer and seunghwak January 29, 2021 16:26

seunghwak reviewed Feb 2, 2021

BradReesWork reviewed Feb 2, 2021

Merge branch 'branch-0.18' into fea_louvain_dendogram

becbc63

aschaffer requested changes Feb 2, 2021

ChuckHastings added 2 commits February 3, 2021 17:56

address comments from PR

bc3f26d

Merge branch 'branch-0.18' into fea_louvain_dendogram

a3dd26d

afender approved these changes Feb 3, 2021

add checks for Pascal architecture in Louvain derived tests

84d3bf1

ChuckHastings requested a review from a team as a code owner February 4, 2021 17:30

ChuckHastings added 4 commits February 4, 2021 12:42

reformat and update copyright dates

2e89e2c

a few more flake8 errors

825d6af

Merge branch 'branch-0.18' into fea_louvain_dendogram

4f0453e

refactor notebook tests to make it easier to filter tests; filter out Louvain tests on Pascal hardware

9afc8de

ChuckHastings requested review from a team as code owners February 5, 2021 00:20

ChuckHastings added 3 commits February 4, 2021 16:25

delete some unused code in script

3db5697

misspelled update from Rick

d40f5a0

add early breaks, fix copyright dates

9532b13

BradReesWork approved these changes Feb 5, 2021

aschaffer approved these changes Feb 5, 2021

rlratzel approved these changes Feb 5, 2021

ajschmidt8 approved these changes Feb 5, 2021

rapids-bot Bot merged commit 039b857 into rapidsai:branch-0.18 Feb 5, 2021

ChuckHastings deleted the fea_louvain_dendogram branch February 10, 2021 16:09

