From 0cc951f709a0907d356883018978221274712f9d Mon Sep 17 00:00:00 2001
From: Raymond Douglass
Date: Wed, 24 Feb 2021 11:04:01 -0500
Subject: [PATCH 1/5] update changelog

---
 CHANGELOG.md | 56 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2957a22a68d..3740c4227a1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,56 @@
-# 0.18.0
-
-Please see https://github.com/rapidsai/cugraph/releases/tag/branch-0.18-latest for the latest changes to this development branch.
+# cuGraph 0.18.0 (24 Feb 2021)
+
+## Bug Fixes 🐛
+
+- Fixed TSP returned routes (#1412) @hlinsen
+- Updated CI scripts to use a different error handling convention, updated LD_LIBRARY_PATH for project flash runs (#1386) @rlratzel
+- Bug fixes for MNMG coarsen_graph, renumber_edgelist, relabel (#1364) @seunghwak
+- Set a specific known working commit hash for gunrock instead of "dev" (#1336) @rlratzel
+- Updated git utils used by copyright.py for compatibility with current CI env (#1325) @rlratzel
+- Fix MNMG Louvain tests on Pascal architecture (#1322) @ChuckHastings
+- FIX Set bash trap after PATH is updated (#1321) @dillon-cullinan
+- Fix graph nodes function and renumbering from series (#1319) @Iroy30
+- Fix Branch 0.18 merge 0.17 (#1314) @BradReesWork
+- Fix EXPERIMENTAL_LOUVAIN_TEST on Pascal (#1312) @ChuckHastings
+- Updated cuxfilter to 0.18, removed datashader indirect dependency in conda dev .yml files (#1311) @rlratzel
+- Update SG PageRank C++ tests (#1307) @seunghwak
+
+## Documentation 📖
+
+- Enabled MultiGraph class and tests, updated SOURCEBUILD.md to include the latest build.sh options (#1351) @rlratzel
+
+## New Features 🚀
+
+- New EgoNet extractor (#1365) @afender
+- Implement induced subgraph extraction primitive (SG C++) (#1354) @seunghwak
+
+## Improvements 🛠️
+
+- Update stale GHA with exemptions & new labels (#1413) @mike-wendt
+- Add GHA to mark issues/prs as stale/rotten (#1408) @Ethyling
+- update subgraph tests and remove legacy pagerank (#1378) @Iroy30
+- Update the conda environments and README file (#1369) @BradReesWork
+- Prepare Changelog for Automation (#1368) @ajschmidt8
+- Update CMakeLists.txt files for consistency with RAPIDS and to support cugraph as an external project and other tech debt removal (#1367) @rlratzel
+- Use new coarsen_graph primitive in Louvain (#1362) @ChuckHastings
+- Added initial infrastructure for MG C++ testing and a Pagerank MG test using it (#1361) @rlratzel
+- Add SG TSP (#1360) @hlinsen
+- Build a Dendrogram class, adapt Louvain/Leiden/ECG to use it (#1359) @ChuckHastings
+- Auto-label PRs based on their content (#1358) @jolorunyomi
+- Implement MNMG Renumber (#1355) @aschaffer
+- Enabling pytest code coverage output by default (#1352) @jnke2016
+- Added configuration for new cugraph-doc-codeowners review group (#1344) @rlratzel
+- API update to match RAFT PR #120 (#1343) @drobison00
+- Pin gunrock to v1.2 for version 0.18 (#1342) @ChuckHastings
+- Fix #1340 - Use generic from_edgelist() methods (#1341) @miguelusque
+- Using RAPIDS_DATASET_ROOT_DIR env var in place of absolute path to datasets in tests (#1337) @jnke2016
+- Expose dense implementation of Hungarian algorithm (#1333) @ChuckHastings
+- SG Pagerank transition (#1332) @Iroy30
+- improving error checking and docs (#1327) @BradReesWork
+- Fix MNMG cleanup exceptions (#1326) @Iroy30
+- Create labeler.yml (#1318) @jolorunyomi
+- Updates to support nightly MG test automation (#1308) @rlratzel
+- Add C++ graph functions (coarsen_graph, renumber_edgelist, relabel) and primitives (transform_reduce_by_adj_matrix_row_key, transform_reduce_by_adj_matrix_col_key, copy_v_transform_reduce_key_aggregated_out_nbr) (#1257) @seunghwak
 # cuGraph 0.17.0 (10 Dec 2020)
 
 ## New Features

From 06ac713c5e5700185abe28fbc261c84e2b7165a8 Mon Sep 17 00:00:00 2001
From: Seunghwa Kang <45857425+seunghwak@users.noreply.github.com>
Date: Thu, 25 Feb 2021 17:08:15 -0500
Subject: [PATCH 2/5] Matching updates for RAFT comms updates (device_sendrecv, device_multicast_sendrecv, gather, gatherv) (#1391) - [x] Update cuGraph to use RAFT::comms_t's newly added device_sendrecv & device_multicast_sendrecv) - [x] Update cuGraph to use RAFT::comms_t's newly added gather & gatherv - [x] Update RAFT git tag once https://github.com/rapidsai/raft/pull/114 (currently merged in 0.18 but is not merged to 0.19) and https://github.com/rapidsai/raft/pull/144 are merged to 0.19 Ready for review but cannot be merged till RAFT PR 114 and 144 are merged to RAFT branch-0.19. Authors: - Seunghwa Kang (@seunghwak) Approvers: - Alex Fender (@afender) URL: https://github.com/rapidsai/cugraph/pull/1391 --- cpp/CMakeLists.txt | 2 +- cpp/include/utilities/device_comm.cuh | 55 +++++++--------------- cpp/include/utilities/host_scalar_comm.cuh | 10 ---- 3 files changed, 17 insertions(+), 50 deletions(-) diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt index b2d537edaa2..d211fe9ed5a 100644 --- a/cpp/CMakeLists.txt +++ b/cpp/CMakeLists.txt @@ -298,7 +298,7 @@ else(DEFINED ENV{RAFT_PATH}) FetchContent_Declare( raft GIT_REPOSITORY https://github.com/rapidsai/raft.git - GIT_TAG 4a79adcb0c0e87964dcdc9b9122f242b5235b702 + GIT_TAG a3461b201ea1c9f61571f1927274f739e775d2d2 SOURCE_SUBDIR raft ) diff --git a/cpp/include/utilities/device_comm.cuh b/cpp/include/utilities/device_comm.cuh index 8c3b0f86a47..24b9147ce3c 100644 --- a/cpp/include/utilities/device_comm.cuh +++ b/cpp/include/utilities/device_comm.cuh @@ -196,21 +196,13 @@ device_sendrecv_impl(raft::comms::comms_t const& comm, using value_type = typename std::iterator_traits::value_type; static_assert( std::is_same::value_type, value_type>::value); - // ncclSend/ncclRecv pair needs to be located inside ncclGroupStart/ncclGroupEnd to avoid deadlock - ncclGroupStart(); - ncclSend(iter_to_raw_ptr(input_first), - tx_count * sizeof(value_type), - ncclUint8, - dst, - comm.get_nccl_comm(), - stream); - 
ncclRecv(iter_to_raw_ptr(output_first), - rx_count * sizeof(value_type), - ncclUint8, - src, - comm.get_nccl_comm(), - stream); - ncclGroupEnd(); + comm.device_sendrecv(iter_to_raw_ptr(input_first), + tx_count, + dst, + iter_to_raw_ptr(output_first), + rx_count, + src, + stream); } template @@ -288,25 +280,15 @@ device_multicast_sendrecv_impl(raft::comms::comms_t const& comm, using value_type = typename std::iterator_traits::value_type; static_assert( std::is_same::value_type, value_type>::value); - // ncclSend/ncclRecv pair needs to be located inside ncclGroupStart/ncclGroupEnd to avoid deadlock - ncclGroupStart(); - for (size_t i = 0; i < tx_counts.size(); ++i) { - ncclSend(iter_to_raw_ptr(input_first + tx_offsets[i]), - tx_counts[i] * sizeof(value_type), - ncclUint8, - tx_dst_ranks[i], - comm.get_nccl_comm(), - stream); - } - for (size_t i = 0; i < rx_counts.size(); ++i) { - ncclRecv(iter_to_raw_ptr(output_first + rx_offsets[i]), - rx_counts[i] * sizeof(value_type), - ncclUint8, - rx_src_ranks[i], - comm.get_nccl_comm(), - stream); - } - ncclGroupEnd(); + comm.device_multicast_sendrecv(iter_to_raw_ptr(input_first), + tx_counts, + tx_offsets, + tx_dst_ranks, + iter_to_raw_ptr(output_first), + rx_counts, + rx_offsets, + rx_src_ranks, + stream); } template @@ -589,10 +571,6 @@ device_gatherv_impl(raft::comms::comms_t const& comm, { static_assert(std::is_same::value_type, typename std::iterator_traits::value_type>::value); - // FIXME: should be enabled once the RAFT gather & gatherv PR is merged -#if 1 - CUGRAPH_FAIL("Unimplemented."); -#else comm.gatherv(iter_to_raw_ptr(input_first), iter_to_raw_ptr(output_first), sendcount, @@ -600,7 +578,6 @@ device_gatherv_impl(raft::comms::comms_t const& comm, displacements.data(), root, stream); -#endif } template diff --git a/cpp/include/utilities/host_scalar_comm.cuh b/cpp/include/utilities/host_scalar_comm.cuh index dda0ce1f091..2ecfd913813 100644 --- a/cpp/include/utilities/host_scalar_comm.cuh +++ 
b/cpp/include/utilities/host_scalar_comm.cuh @@ -321,16 +321,11 @@ std::enable_if_t::value, std::vector> host_scalar_gathe &input, 1, stream); - // FIXME: should be enabled once the RAFT gather & gatherv PR is merged -#if 1 - CUGRAPH_FAIL("Unimplemented."); -#else comm.gather(comm.get_rank() == root ? d_outputs.data() + comm.get_rank() : d_outputs.data(), d_outputs.data(), size_t{1}, root, stream); -#endif std::vector h_outputs(comm.get_rank() == root ? comm.get_size() : 0); if (comm.get_rank() == root) { raft::update_host(h_outputs.data(), d_outputs.data(), comm.get_size(), stream); @@ -358,10 +353,6 @@ host_scalar_gather(raft::comms::comms_t const& comm, T input, int root, cudaStre h_tuple_scalar_elements.data(), tuple_size, stream); - // FIXME: should be enabled once the RAFT gather & gatherv PR is merged -#if 1 - CUGRAPH_FAIL("Unimplemented."); -#else comm.gather(comm.get_rank() == root ? d_gathered_tuple_scalar_elements.data() + comm.get_rank() * tuple_size : d_gathered_tuple_scalar_elements.data(), @@ -369,7 +360,6 @@ host_scalar_gather(raft::comms::comms_t const& comm, T input, int root, cudaStre tuple_size, root, stream); -#endif std::vector h_gathered_tuple_scalar_elements( comm.get_rank() == root ? 
comm.get_size() * tuple_size : size_t{0}); if (comm.get_rank() == root) { From 99d1328922b03a96734cf7b520263af66347e55c Mon Sep 17 00:00:00 2001 From: Seunghwa Kang <45857425+seunghwak@users.noreply.github.com> Date: Thu, 25 Feb 2021 17:21:27 -0500 Subject: [PATCH 3/5] Adding new primitives: copy_v_transform_reduce_key_aggregated_out_nbr & transform_reduce_by_adj_matrix_row|col_key_e bug fixes (#1399) bug fixes Authors: - Seunghwa Kang (@seunghwak) Approvers: - Chuck Hastings (@ChuckHastings) - Brad Rees (@BradReesWork) URL: https://github.com/rapidsai/cugraph/pull/1399 --- ...ransform_reduce_key_aggregated_out_nbr.cuh | 20 +-- ...orm_reduce_by_adj_matrix_row_col_key_e.cuh | 128 ++++++++---------- cpp/include/utilities/device_comm.cuh | 8 +- 3 files changed, 75 insertions(+), 81 deletions(-) diff --git a/cpp/include/patterns/copy_v_transform_reduce_key_aggregated_out_nbr.cuh b/cpp/include/patterns/copy_v_transform_reduce_key_aggregated_out_nbr.cuh index 785f8197aff..8490df1d17d 100644 --- a/cpp/include/patterns/copy_v_transform_reduce_key_aggregated_out_nbr.cuh +++ b/cpp/include/patterns/copy_v_transform_reduce_key_aggregated_out_nbr.cuh @@ -18,8 +18,10 @@ #include #include #include +#include #include #include +#include #include #include @@ -100,10 +102,10 @@ __global__ void for_all_major_for_all_nbr_low_degree( } thrust::fill(thrust::seq, major_vertices + local_offset, - major_vertices + local_offset + key_idx, + major_vertices + local_offset + key_idx + 1, matrix_partition.get_major_from_major_offset_nocheck(major_offset)); thrust::fill(thrust::seq, - major_vertices + local_offset + key_idx, + major_vertices + local_offset + key_idx + 1, major_vertices + local_offset + local_degree, invalid_vertex); } @@ -159,8 +161,7 @@ __global__ void for_all_major_for_all_nbr_low_degree( * pairs provided by @p map_key_first, @p map_key_last, and @p map_value_first (aggregated over the * entire set of processes in multi-GPU). 
* @param reduce_op Binary operator that takes two input arguments and reduces them to one. - * @param init Initial value to be added to the reduced @p key_aggregated_e_op return values for - * each vertex. + * @param init Initial value to be added to the reduced @p reduce_op return values for each vertex. * @param vertex_value_output_first Iterator pointing to the vertex property variables for the * first (inclusive) vertex (assigned to this process in multi-GPU). `vertex_value_output_last` * (exclusive) is deduced as @p vertex_value_output_first + @p @@ -191,6 +192,7 @@ void copy_v_transform_reduce_key_aggregated_out_nbr( "GraphViewType should support the push model."); static_assert(std::is_same::value_type, typename GraphViewType::vertex_type>::value); + static_assert(is_arithmetic_or_thrust_tuple_of_arithmetic::value); using vertex_t = typename GraphViewType::vertex_type; using edge_t = typename GraphViewType::edge_type; @@ -393,7 +395,7 @@ void copy_v_transform_reduce_key_aggregated_out_nbr( tmp_major_vertices.begin(), tmp_minor_keys.begin(), tmp_key_aggregated_edge_weights.begin())); thrust::transform(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), triplet_first, - triplet_first + major_vertices.size(), + triplet_first + tmp_major_vertices.size(), tmp_e_op_result_buffer_first, [adj_matrix_row_value_input_first, key_aggregated_e_op, @@ -408,7 +410,7 @@ void copy_v_transform_reduce_key_aggregated_out_nbr( w, *(adj_matrix_row_value_input_first + matrix_partition.get_major_offset_from_major_nocheck(major)), - kv_map.find(key)->second); + kv_map.find(key)->second.load(cuda::std::memory_order_relaxed)); }); tmp_minor_keys.resize(0, handle.get_stream()); tmp_key_aggregated_edge_weights.resize(0, handle.get_stream()); @@ -488,11 +490,12 @@ void copy_v_transform_reduce_key_aggregated_out_nbr( auto major_vertex_first = thrust::make_transform_iterator( thrust::make_counting_iterator(size_t{0}), [major_vertices = major_vertices.data()]
__device__(auto i) { - return ((i == 0) || (major_vertices[i] == major_vertices[i - 1])) + return ((i == 0) || (major_vertices[i] != major_vertices[i - 1])) ? major_vertices[i] : invalid_vertex_id::value; }); thrust::copy_if( + rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), major_vertex_first, major_vertex_first + major_vertices.size(), unique_major_vertices.begin(), @@ -506,9 +509,10 @@ void copy_v_transform_reduce_key_aggregated_out_nbr( thrust::make_permutation_iterator( vertex_value_output_first, thrust::make_transform_iterator( - major_vertices.begin(), + unique_major_vertices.begin(), [vertex_partition = vertex_partition_device_t(graph_view)] __device__( auto v) { return vertex_partition.get_local_vertex_offset_from_vertex_nocheck(v); })), + thrust::equal_to{}, reduce_op); thrust::transform(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), diff --git a/cpp/include/patterns/transform_reduce_by_adj_matrix_row_col_key_e.cuh b/cpp/include/patterns/transform_reduce_by_adj_matrix_row_col_key_e.cuh index 70b6dc92752..0b3588bc8c5 100644 --- a/cpp/include/patterns/transform_reduce_by_adj_matrix_row_col_key_e.cuh +++ b/cpp/include/patterns/transform_reduce_by_adj_matrix_row_col_key_e.cuh @@ -17,6 +17,8 @@ #include #include +#include +#include #include #include #include @@ -124,6 +126,35 @@ __global__ void for_all_major_for_all_nbr_low_degree( } } +// FIXME: better derive value_t from BufferType +template +std::tuple, BufferType> reduce_to_unique_kv_pairs( + rmm::device_uvector&& keys, BufferType&& value_buffer, cudaStream_t stream) +{ + thrust::sort_by_key(rmm::exec_policy(stream)->on(stream), + keys.begin(), + keys.end(), + get_dataframe_buffer_begin(value_buffer)); + auto num_uniques = + thrust::count_if(rmm::exec_policy(stream)->on(stream), + thrust::make_counting_iterator(size_t{0}), + thrust::make_counting_iterator(keys.size()), + [keys = keys.data()] __device__(auto i) { + return ((i == 0) || (keys[i] != keys[i - 1])) ? 
true : false; + }); + + rmm::device_uvector unique_keys(num_uniques, stream); + auto value_for_unique_key_buffer = allocate_dataframe_buffer(unique_keys.size(), stream); + thrust::reduce_by_key(rmm::exec_policy(stream)->on(stream), + keys.begin(), + keys.end(), + get_dataframe_buffer_begin(value_buffer), + unique_keys.begin(), + get_dataframe_buffer_begin(value_for_unique_key_buffer)); + + return std::make_tuple(std::move(unique_keys), std::move(value_for_unique_key_buffer)); +} + template (tmp_value_buffer)); } + std::tie(tmp_keys, tmp_value_buffer) = reduce_to_unique_kv_pairs( + std::move(tmp_keys), std::move(tmp_value_buffer), handle.get_stream()); if (GraphViewType::is_multi_gpu) { auto& comm = handle.get_comms(); auto const comm_size = comm.get_size(); - thrust::sort_by_key(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), - tmp_keys.begin(), - tmp_keys.end(), - get_dataframe_buffer_begin(tmp_value_buffer)); - - auto num_uniques = - thrust::count_if(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), - thrust::make_counting_iterator(size_t{0}), - thrust::make_counting_iterator(tmp_keys.size()), - [tmp_keys = tmp_keys.data()] __device__(auto i) { - return ((i == 0) || (tmp_keys[i] != tmp_keys[i - 1])) ? 
true : false; - }); - rmm::device_uvector unique_keys(num_uniques, handle.get_stream()); - auto value_for_unique_key_buffer = - allocate_dataframe_buffer(unique_keys.size(), handle.get_stream()); - - thrust::reduce_by_key(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), - tmp_keys.begin(), - tmp_keys.end(), - get_dataframe_buffer_begin(tmp_value_buffer), - unique_keys.begin(), - get_dataframe_buffer_begin(value_for_unique_key_buffer)); - rmm::device_uvector rx_unique_keys(0, handle.get_stream()); auto rx_value_for_unique_key_buffer = allocate_dataframe_buffer(0, handle.get_stream()); std::tie(rx_unique_keys, rx_value_for_unique_key_buffer, std::ignore) = groupby_gpuid_and_shuffle_kv_pairs( comm, - unique_keys.begin(), - unique_keys.end(), - get_dataframe_buffer_begin(value_for_unique_key_buffer), + tmp_keys.begin(), + tmp_keys.end(), + get_dataframe_buffer_begin(tmp_value_buffer), [key_func = detail::compute_gpu_id_from_vertex_t{comm_size}] __device__( auto val) { return key_func(val); }, handle.get_stream()); - // FIXME: we can reduce after shuffle - - tmp_keys = std::move(rx_unique_keys); - tmp_value_buffer = std::move(rx_value_for_unique_key_buffer); + std::tie(tmp_keys, tmp_value_buffer) = reduce_to_unique_kv_pairs( + std::move(rx_unique_keys), std::move(rx_value_for_unique_key_buffer), handle.get_stream()); } auto cur_size = keys.size(); - // FIXME: this can lead to frequent costly reallocation; we may be able to avoid this if we can - // reserve address space to avoid expensive reallocation. 
- // https://devblogs.nvidia.com/introducing-low-level-gpu-virtual-memory-management - keys.resize(cur_size + tmp_keys.size(), handle.get_stream()); - resize_dataframe_buffer(value_buffer, keys.size(), handle.get_stream()); - - thrust::copy(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), - tmp_keys.begin(), - tmp_keys.end(), - keys.begin() + cur_size); - thrust::copy(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), - get_dataframe_buffer_begin(tmp_value_buffer), - get_dataframe_buffer_begin(tmp_value_buffer) + tmp_keys.size(), - get_dataframe_buffer_begin(value_buffer) + cur_size); + if (cur_size == 0) { + keys = std::move(tmp_keys); + value_buffer = std::move(tmp_value_buffer); + } else { + // FIXME: this can lead to frequent costly reallocation; we may be able to avoid this if we + // can reserve address space to avoid expensive reallocation. + // https://devblogs.nvidia.com/introducing-low-level-gpu-virtual-memory-management + keys.resize(cur_size + tmp_keys.size(), handle.get_stream()); + resize_dataframe_buffer(value_buffer, keys.size(), handle.get_stream()); + + thrust::copy(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), + tmp_keys.begin(), + tmp_keys.end(), + keys.begin() + cur_size); + thrust::copy(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), + get_dataframe_buffer_begin(tmp_value_buffer), + get_dataframe_buffer_begin(tmp_value_buffer) + tmp_keys.size(), + get_dataframe_buffer_begin(value_buffer) + cur_size); + } } if (GraphViewType::is_multi_gpu) { - thrust::sort_by_key(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), - keys.begin(), - keys.end(), - get_dataframe_buffer_begin(value_buffer)); - - auto num_uniques = - thrust::count_if(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), - thrust::make_counting_iterator(size_t{0}), - thrust::make_counting_iterator(keys.size()), - [keys = keys.data()] __device__(auto i) { - return ((i == 0) || (keys[i] != keys[i - 1])) 
? true : false; - }); - rmm::device_uvector unique_keys(num_uniques, handle.get_stream()); - auto value_for_unique_key_buffer = - allocate_dataframe_buffer(unique_keys.size(), handle.get_stream()); - - thrust::reduce_by_key(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), - keys.begin(), - keys.end(), - get_dataframe_buffer_begin(value_buffer), - unique_keys.begin(), - get_dataframe_buffer_begin(value_for_unique_key_buffer)); - - keys = std::move(unique_keys); - value_buffer = std::move(value_for_unique_key_buffer); + std::tie(keys, value_buffer) = reduce_to_unique_kv_pairs( + std::move(keys), std::move(value_buffer), handle.get_stream()); } // FIXME: add init diff --git a/cpp/include/utilities/device_comm.cuh b/cpp/include/utilities/device_comm.cuh index 24b9147ce3c..7b9956902cc 100644 --- a/cpp/include/utilities/device_comm.cuh +++ b/cpp/include/utilities/device_comm.cuh @@ -973,10 +973,10 @@ device_gatherv(raft::comms::comms_t const& comm, size_t constexpr tuple_size = thrust::tuple_size::value_type>::value; - detail::device_allgatherv_tuple_iterator_element_impl() + detail::device_gatherv_tuple_iterator_element_impl() .run(comm, input_first, output_first, sendcount, recvcounts, displacements, root, stream); } From 55896052e05f4e1d27def51391458cb08c3516ca Mon Sep 17 00:00:00 2001 From: Seunghwa Kang <45857425+seunghwak@users.noreply.github.com> Date: Thu, 25 Feb 2021 17:21:52 -0500 Subject: [PATCH 4/5] Add new primitives: compute_in|out_degrees, compute_in|out_weight_sums to graph_view_t (#1394) Close https://github.com/rapidsai/cugraph/issues/1208 - [x] add compute_in|out_degrees, compute_in|out_weight_sums - [x] replace PageRank's custom code to compute out-weight-sums to use graph_view_t's compute_out_weight_sums - [x] add SG C++ tests Authors: - Seunghwa Kang (@seunghwak) Approvers: - Chuck Hastings (@ChuckHastings) - Alex Fender (@afender) URL: https://github.com/rapidsai/cugraph/pull/1394 --- .../experimental/detail/graph_utils.cuh | 22 +- 
cpp/include/experimental/graph_view.hpp | 12 + cpp/include/utilities/shuffle_comm.cuh | 4 +- cpp/src/experimental/graph.cu | 2 +- cpp/src/experimental/graph_view.cu | 228 +++++++++++++++++- cpp/src/experimental/pagerank.cu | 20 +- cpp/src/experimental/renumber_edgelist.cu | 4 +- cpp/tests/CMakeLists.txt | 20 ++ cpp/tests/experimental/degree_test.cpp | 165 +++++++++++++ cpp/tests/experimental/weight_sum_test.cpp | 186 ++++++++++++++ 10 files changed, 637 insertions(+), 26 deletions(-) create mode 100644 cpp/tests/experimental/degree_test.cpp create mode 100644 cpp/tests/experimental/weight_sum_test.cpp diff --git a/cpp/include/experimental/detail/graph_utils.cuh b/cpp/include/experimental/detail/graph_utils.cuh index 3ac2e2163c6..084d68b8ba4 100644 --- a/cpp/include/experimental/detail/graph_utils.cuh +++ b/cpp/include/experimental/detail/graph_utils.cuh @@ -25,6 +25,7 @@ #include #include +#include #include #include @@ -39,7 +40,7 @@ namespace detail { // compute the numbers of nonzeros in rows (of the graph adjacency matrix, if store_transposed = // false) or columns (of the graph adjacency matrix, if store_transposed = true) template -rmm::device_uvector compute_major_degree( +rmm::device_uvector compute_major_degrees( raft::handle_t const &handle, std::vector const &adj_matrix_partition_offsets, partition_t const &partition) @@ -120,7 +121,7 @@ rmm::device_uvector compute_major_degree( // compute the numbers of nonzeros in rows (of the graph adjacency matrix, if store_transposed = // false) or columns (of the graph adjacency matrix, if store_transposed = true) template -rmm::device_uvector compute_major_degree( +rmm::device_uvector compute_major_degrees( raft::handle_t const &handle, std::vector> const &adj_matrix_partition_offsets, partition_t const &partition) @@ -131,7 +132,22 @@ rmm::device_uvector compute_major_degree( adj_matrix_partition_offsets.end(), tmp_offsets.begin(), [](auto const &offsets) { return offsets.data(); }); - return 
compute_major_degree(handle, tmp_offsets, partition); + return compute_major_degrees(handle, tmp_offsets, partition); +} + +// compute the numbers of nonzeros in rows (of the graph adjacency matrix, if store_transposed = +// false) or columns (of the graph adjacency matrix, if store_transposed = true) +template +rmm::device_uvector compute_major_degrees(raft::handle_t const &handle, + edge_t const *offsets, + vertex_t number_of_vertices) +{ + rmm::device_uvector degrees(number_of_vertices, handle.get_stream()); + thrust::tabulate(rmm::exec_policy(handle.get_stream())->on(handle.get_stream()), + degrees.begin(), + degrees.end(), + [offsets] __device__(auto i) { return offsets[i + 1] - offsets[i]; }); + return degrees; } template diff --git a/cpp/include/experimental/graph_view.hpp b/cpp/include/experimental/graph_view.hpp index d2ae1150970..7598841fc1a 100644 --- a/cpp/include/experimental/graph_view.hpp +++ b/cpp/include/experimental/graph_view.hpp @@ -494,6 +494,12 @@ class graph_view_t(nullptr); } + rmm::device_uvector compute_in_degrees(raft::handle_t const& handle) const; + rmm::device_uvector compute_out_degrees(raft::handle_t const& handle) const; + + rmm::device_uvector compute_in_weight_sums(raft::handle_t const& handle) const; + rmm::device_uvector compute_out_weight_sums(raft::handle_t const& handle) const; + private: std::vector adj_matrix_partition_offsets_{}; std::vector adj_matrix_partition_indices_{}; @@ -638,6 +644,12 @@ class graph_view_t compute_in_degrees(raft::handle_t const& handle) const; + rmm::device_uvector compute_out_degrees(raft::handle_t const& handle) const; + + rmm::device_uvector compute_in_weight_sums(raft::handle_t const& handle) const; + rmm::device_uvector compute_out_weight_sums(raft::handle_t const& handle) const; + private: edge_t const* offsets_{nullptr}; vertex_t const* indices_{nullptr}; diff --git a/cpp/include/utilities/shuffle_comm.cuh b/cpp/include/utilities/shuffle_comm.cuh index 7e04c7e1972..da86f76b11d 100644 --- 
a/cpp/include/utilities/shuffle_comm.cuh +++ b/cpp/include/utilities/shuffle_comm.cuh @@ -69,7 +69,7 @@ rmm::device_uvector sort_and_count(raft::comms::comms_t const &comm, d_tx_value_counts = std::move(d_counts); } - return std::move(d_tx_value_counts); + return d_tx_value_counts; } template @@ -111,7 +111,7 @@ rmm::device_uvector sort_and_count(raft::comms::comms_t const &comm, d_tx_value_counts = std::move(d_counts); } - return std::move(d_tx_value_counts); + return d_tx_value_counts; } // inline to suppress a complaint about ODR violation diff --git a/cpp/src/experimental/graph.cu b/cpp/src/experimental/graph.cu index 5cf393bfce4..498bb4eaefe 100644 --- a/cpp/src/experimental/graph.cu +++ b/cpp/src/experimental/graph.cu @@ -278,7 +278,7 @@ graph_tget_handle_ptr()), adj_matrix_partition_offsets_, partition_); // optional expensive checks (part 2/3) diff --git a/cpp/src/experimental/graph_view.cu b/cpp/src/experimental/graph_view.cu index df92fd94194..f443608e424 100644 --- a/cpp/src/experimental/graph_view.cu +++ b/cpp/src/experimental/graph_view.cu @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -70,6 +71,83 @@ std::vector update_adj_matrix_partition_edge_counts( return adj_matrix_partition_edge_counts; } +template +rmm::device_uvector compute_minor_degrees( + raft::handle_t const& handle, + graph_view_t const& graph_view) +{ + rmm::device_uvector minor_degrees(graph_view.get_number_of_local_vertices(), + handle.get_stream()); + if (store_transposed) { + copy_v_transform_reduce_out_nbr( + handle, + graph_view, + thrust::make_constant_iterator(0) /* dummy */, + thrust::make_constant_iterator(0) /* dummy */, + [] __device__(vertex_t src, vertex_t dst, weight_t w, auto src_val, auto dst_val) { + return edge_t{1}; + }, + edge_t{0}, + minor_degrees.data()); + } else { + copy_v_transform_reduce_in_nbr( + handle, + graph_view, + thrust::make_constant_iterator(0) /* dummy */, + thrust::make_constant_iterator(0) /* dummy */, + [] 
__device__(vertex_t src, vertex_t dst, weight_t w, auto src_val, auto dst_val) { + return edge_t{1}; + }, + edge_t{0}, + minor_degrees.data()); + } + + return minor_degrees; +} + +template +rmm::device_uvector compute_weight_sums( + raft::handle_t const& handle, + graph_view_t const& graph_view) +{ + rmm::device_uvector weight_sums(graph_view.get_number_of_local_vertices(), + handle.get_stream()); + if (major == store_transposed) { + copy_v_transform_reduce_in_nbr( + handle, + graph_view, + thrust::make_constant_iterator(0) /* dummy */, + thrust::make_constant_iterator(0) /* dummy */, + [] __device__(vertex_t src, vertex_t dst, weight_t w, auto src_val, auto dst_val) { + return w; + }, + weight_t{0.0}, + weight_sums.data()); + } else { + copy_v_transform_reduce_out_nbr( + handle, + graph_view, + thrust::make_constant_iterator(0) /* dummy */, + thrust::make_constant_iterator(0) /* dummy */, + [] __device__(vertex_t src, vertex_t dst, weight_t w, auto src_val, auto dst_val) { + return w; + }, + weight_t{0.0}, + weight_sums.data()); + } + + return weight_sums; +} + } // namespace template on(default_stream), degrees.begin(), @@ -301,6 +379,154 @@ graph_view_t +rmm::device_uvector +graph_view_t>:: + compute_in_degrees(raft::handle_t const& handle) const +{ + if (store_transposed) { + return detail::compute_major_degrees( + handle, this->adj_matrix_partition_offsets_, this->partition_); + } else { + return compute_minor_degrees(handle, *this); + } +} + +template +rmm::device_uvector +graph_view_t>::compute_in_degrees(raft::handle_t const& handle) const +{ + if (store_transposed) { + return detail::compute_major_degrees( + handle, this->offsets_, this->get_number_of_local_vertices()); + } else { + return compute_minor_degrees(handle, *this); + } +} + +template +rmm::device_uvector +graph_view_t>:: + compute_out_degrees(raft::handle_t const& handle) const +{ + if (store_transposed) { + return compute_minor_degrees(handle, *this); + } else { + return 
detail::compute_major_degrees( + handle, this->adj_matrix_partition_offsets_, this->partition_); + } +} + +template +rmm::device_uvector +graph_view_t>::compute_out_degrees(raft::handle_t const& handle) const +{ + if (store_transposed) { + return compute_minor_degrees(handle, *this); + } else { + return detail::compute_major_degrees( + handle, this->offsets_, this->get_number_of_local_vertices()); + } +} + +template +rmm::device_uvector +graph_view_t>:: + compute_in_weight_sums(raft::handle_t const& handle) const +{ + if (store_transposed) { + return compute_weight_sums(handle, *this); + } else { + return compute_weight_sums(handle, *this); + } +} + +template +rmm::device_uvector graph_view_t< + vertex_t, + edge_t, + weight_t, + store_transposed, + multi_gpu, + std::enable_if_t>::compute_in_weight_sums(raft::handle_t const& handle) const +{ + if (store_transposed) { + return compute_weight_sums(handle, *this); + } else { + return compute_weight_sums(handle, *this); + } +} + +template +rmm::device_uvector +graph_view_t>:: + compute_out_weight_sums(raft::handle_t const& handle) const +{ + if (store_transposed) { + return compute_weight_sums(handle, *this); + } else { + return compute_weight_sums(handle, *this); + } +} + +template +rmm::device_uvector graph_view_t< + vertex_t, + edge_t, + weight_t, + store_transposed, + multi_gpu, + std::enable_if_t>::compute_out_weight_sums(raft::handle_t const& handle) const +{ + if (store_transposed) { + return compute_weight_sums(handle, *this); + } else { + return compute_weight_sums(handle, *this); + } +} + // explicit instantiation template class graph_view_t; diff --git a/cpp/src/experimental/pagerank.cu b/cpp/src/experimental/pagerank.cu index 058cbfe5966..c498d2864b4 100644 --- a/cpp/src/experimental/pagerank.cu +++ b/cpp/src/experimental/pagerank.cu @@ -142,23 +142,9 @@ void pagerank(raft::handle_t const& handle, // 2. 
compute the sums of the out-going edge weights (if not provided) - rmm::device_uvector tmp_vertex_out_weight_sums(0, handle.get_stream()); - if (precomputed_vertex_out_weight_sums == nullptr) { - tmp_vertex_out_weight_sums.resize(pull_graph_view.get_number_of_local_vertices(), - handle.get_stream()); - // FIXME: better refactor this out (computing out-degree). - copy_v_transform_reduce_out_nbr( - handle, - pull_graph_view, - thrust::make_constant_iterator(0) /* dummy */, - thrust::make_constant_iterator(0) /* dummy */, - [alpha] __device__(vertex_t src, vertex_t dst, weight_t w, auto src_val, auto dst_val) { - return w; - }, - weight_t{0.0}, - tmp_vertex_out_weight_sums.data()); - } - + auto tmp_vertex_out_weight_sums = precomputed_vertex_out_weight_sums == nullptr + ? pull_graph_view.compute_out_weight_sums(handle) + : rmm::device_uvector(0, handle.get_stream()); auto vertex_out_weight_sums = precomputed_vertex_out_weight_sums != nullptr ? precomputed_vertex_out_weight_sums : tmp_vertex_out_weight_sums.data(); diff --git a/cpp/src/experimental/renumber_edgelist.cu b/cpp/src/experimental/renumber_edgelist.cu index 6a5a1c732c2..b093a9adb22 100644 --- a/cpp/src/experimental/renumber_edgelist.cu +++ b/cpp/src/experimental/renumber_edgelist.cu @@ -224,7 +224,7 @@ rmm::device_uvector compute_renumber_map( labels.begin(), thrust::greater()); - return std::move(labels); + return labels; } template @@ -609,7 +609,7 @@ std::enable_if_t> renumber_edgelist( renumber_map.find( edgelist_minor_vertices, edgelist_minor_vertices + num_edgelist_edges, edgelist_minor_vertices); - return std::move(renumber_map_labels); + return renumber_map_labels; #else return rmm::device_uvector(0, handle.get_stream()); #endif diff --git a/cpp/tests/CMakeLists.txt b/cpp/tests/CMakeLists.txt index 5425c68e896..68b277871b1 100644 --- a/cpp/tests/CMakeLists.txt +++ b/cpp/tests/CMakeLists.txt @@ -331,6 +331,26 @@ set(EXPERIMENTAL_GRAPH_TEST_SRCS ConfigureTest(EXPERIMENTAL_GRAPH_TEST 
"${EXPERIMENTAL_GRAPH_TEST_SRCS}") +################################################################################################### +# - Experimental weight-sum tests ----------------------------------------------------------------- + +set(EXPERIMENTAL_WEIGHT_SUM_TEST_SRCS + "${CMAKE_CURRENT_SOURCE_DIR}/../../thirdparty/mmio/mmio.c" + "${CMAKE_CURRENT_SOURCE_DIR}/utilities/test_utilities.cpp" + "${CMAKE_CURRENT_SOURCE_DIR}/experimental/weight_sum_test.cpp") + +ConfigureTest(EXPERIMENTAL_WEIGHT_SUM_TEST "${EXPERIMENTAL_WEIGHT_SUM_TEST_SRCS}") + +################################################################################################### +# - Experimental degree tests --------------------------------------------------------------------- + +set(EXPERIMENTAL_DEGREE_TEST_SRCS + "${CMAKE_CURRENT_SOURCE_DIR}/../../thirdparty/mmio/mmio.c" + "${CMAKE_CURRENT_SOURCE_DIR}/utilities/test_utilities.cpp" + "${CMAKE_CURRENT_SOURCE_DIR}/experimental/degree_test.cpp") + +ConfigureTest(EXPERIMENTAL_DEGREE_TEST "${EXPERIMENTAL_DEGREE_TEST_SRCS}") + ################################################################################################### # - Experimental coarsening tests ----------------------------------------------------------------- diff --git a/cpp/tests/experimental/degree_test.cpp b/cpp/tests/experimental/degree_test.cpp new file mode 100644 index 00000000000..7c7b41cdacc --- /dev/null +++ b/cpp/tests/experimental/degree_test.cpp @@ -0,0 +1,165 @@ +/* + * Copyright (c) 2020-2021, NVIDIA CORPORATION. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include + +#include +#include +#include + +#include +#include +#include +#include + +#include + +#include +#include +#include +#include +#include + +template +void degree_reference(edge_t const* offsets, + vertex_t const* indices, + edge_t* degrees, + vertex_t num_vertices, + bool major) +{ + if (major) { + std::adjacent_difference(offsets + 1, offsets + num_vertices + 1, degrees); + } else { + std::fill(degrees, degrees + num_vertices, edge_t{0}); + for (vertex_t i = 0; i < num_vertices; ++i) { + for (auto j = offsets[i]; j < offsets[i + 1]; ++j) { + auto nbr = indices[j]; + ++degrees[nbr]; + } + } + } + + return; +} + +typedef struct Degree_Usecase_t { + std::string graph_file_full_path{}; + + Degree_Usecase_t(std::string const& graph_file_path) + { + if ((graph_file_path.length() > 0) && (graph_file_path[0] != '/')) { + graph_file_full_path = cugraph::test::get_rapids_dataset_root_dir() + "/" + graph_file_path; + } else { + graph_file_full_path = graph_file_path; + } + }; +} Degree_Usecase; + +class Tests_Degree : public ::testing::TestWithParam { + public: + Tests_Degree() {} + static void SetupTestCase() {} + static void TearDownTestCase() {} + + virtual void SetUp() {} + virtual void TearDown() {} + + template + void run_current_test(Degree_Usecase const& configuration) + { + raft::handle_t handle{}; + + auto graph = cugraph::test:: + read_graph_from_matrix_market_file( + handle, configuration.graph_file_full_path, false); + auto graph_view = graph.view(); + + std::vector h_offsets(graph_view.get_number_of_vertices() + 1); + std::vector h_indices(graph_view.get_number_of_edges()); + raft::update_host(h_offsets.data(), + graph_view.offsets(), + graph_view.get_number_of_vertices() + 1, + handle.get_stream()); + raft::update_host(h_indices.data(), + graph_view.indices(), + graph_view.get_number_of_edges(), + handle.get_stream());
+ CUDA_TRY(cudaStreamSynchronize(handle.get_stream())); + + std::vector h_reference_in_degrees(graph_view.get_number_of_vertices()); + std::vector h_reference_out_degrees(graph_view.get_number_of_vertices()); + + degree_reference(h_offsets.data(), + h_indices.data(), + h_reference_in_degrees.data(), + graph_view.get_number_of_vertices(), + store_transposed); + + degree_reference(h_offsets.data(), + h_indices.data(), + h_reference_out_degrees.data(), + graph_view.get_number_of_vertices(), + !store_transposed); + + CUDA_TRY(cudaDeviceSynchronize()); // for consistent performance measurement + + auto d_in_degrees = graph_view.compute_in_degrees(handle); + auto d_out_degrees = graph_view.compute_out_degrees(handle); + + CUDA_TRY(cudaDeviceSynchronize()); // for consistent performance measurement + + std::vector h_cugraph_in_degrees(graph_view.get_number_of_vertices()); + std::vector h_cugraph_out_degrees(graph_view.get_number_of_vertices()); + + raft::update_host( + h_cugraph_in_degrees.data(), d_in_degrees.data(), d_in_degrees.size(), handle.get_stream()); + raft::update_host(h_cugraph_out_degrees.data(), + d_out_degrees.data(), + d_out_degrees.size(), + handle.get_stream()); + CUDA_TRY(cudaStreamSynchronize(handle.get_stream())); + + ASSERT_TRUE(std::equal( + h_reference_in_degrees.begin(), h_reference_in_degrees.end(), h_cugraph_in_degrees.begin())) + << "In-degree values do not match with the reference values."; + ASSERT_TRUE(std::equal(h_reference_out_degrees.begin(), + h_reference_out_degrees.end(), + h_cugraph_out_degrees.begin())) + << "Out-degree values do not match with the reference values."; + } +}; + +// FIXME: add tests for type combinations + +TEST_P(Tests_Degree, CheckInt32Int32FloatTransposed) +{ + run_current_test(GetParam()); +} + +TEST_P(Tests_Degree, CheckInt32Int32FloatUntransposed) +{ + run_current_test(GetParam()); +} + +INSTANTIATE_TEST_CASE_P(simple_test, + Tests_Degree, + ::testing::Values(Degree_Usecase("test/datasets/karate.mtx"), + 
Degree_Usecase("test/datasets/web-Google.mtx"), + Degree_Usecase("test/datasets/ljournal-2008.mtx"), + Degree_Usecase("test/datasets/webbase-1M.mtx"))); + +CUGRAPH_TEST_PROGRAM_MAIN() diff --git a/cpp/tests/experimental/weight_sum_test.cpp b/cpp/tests/experimental/weight_sum_test.cpp new file mode 100644 index 00000000000..aeda7386314 --- /dev/null +++ b/cpp/tests/experimental/weight_sum_test.cpp @@ -0,0 +1,186 @@ +/* + * Copyright (c) 2020-2021, NVIDIA CORPORATION. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +#include +#include + +#include +#include +#include + +#include +#include +#include +#include + +#include + +#include +#include +#include +#include +#include + +template +void weight_sum_reference(edge_t const* offsets, + vertex_t const* indices, + weight_t const* weights, + weight_t* weight_sums, + vertex_t num_vertices, + bool major) +{ + if (!major) { std::fill(weight_sums, weight_sums + num_vertices, weight_t{0.0}); } + for (vertex_t i = 0; i < num_vertices; ++i) { + if (major) { + weight_sums[i] = + std::accumulate(weights + offsets[i], weights + offsets[i + 1], weight_t{0.0}); + } else { + for (auto j = offsets[i]; j < offsets[i + 1]; ++j) { + auto nbr = indices[j]; + weight_sums[nbr] += weights[j]; + } + } + } + + return; +} + +typedef struct WeightSum_Usecase_t { + std::string graph_file_full_path{}; + + WeightSum_Usecase_t(std::string const& graph_file_path) + { + if ((graph_file_path.length() > 0) && (graph_file_path[0] != '/')) { + graph_file_full_path = cugraph::test::get_rapids_dataset_root_dir() + "/" + graph_file_path; + } else { + graph_file_full_path = graph_file_path; + } + }; +} WeightSum_Usecase; + +class Tests_WeightSum : public ::testing::TestWithParam { + public: + Tests_WeightSum() {} + static void SetupTestCase() {} + static void TearDownTestCase() {} + + virtual void SetUp() {} + virtual void TearDown() {} + + template + void run_current_test(WeightSum_Usecase const& configuration) + { + raft::handle_t handle{}; + + auto graph = cugraph::test:: + read_graph_from_matrix_market_file( + handle, configuration.graph_file_full_path, true); + auto graph_view = graph.view(); + + std::vector h_offsets(graph_view.get_number_of_vertices() + 1); + std::vector h_indices(graph_view.get_number_of_edges()); + std::vector h_weights(graph_view.get_number_of_edges()); + raft::update_host(h_offsets.data(), + graph_view.offsets(), + graph_view.get_number_of_vertices() + 1, + handle.get_stream()); + raft::update_host(h_indices.data(), + 
graph_view.indices(), + graph_view.get_number_of_edges(), + handle.get_stream()); + raft::update_host(h_weights.data(), + graph_view.weights(), + graph_view.get_number_of_edges(), + handle.get_stream()); + CUDA_TRY(cudaStreamSynchronize(handle.get_stream())); + + std::vector h_reference_in_weight_sums(graph_view.get_number_of_vertices()); + std::vector h_reference_out_weight_sums(graph_view.get_number_of_vertices()); + + weight_sum_reference(h_offsets.data(), + h_indices.data(), + h_weights.data(), + h_reference_in_weight_sums.data(), + graph_view.get_number_of_vertices(), + store_transposed); + + weight_sum_reference(h_offsets.data(), + h_indices.data(), + h_weights.data(), + h_reference_out_weight_sums.data(), + graph_view.get_number_of_vertices(), + !store_transposed); + + CUDA_TRY(cudaDeviceSynchronize()); // for consistent performance measurement + + auto d_in_weight_sums = graph_view.compute_in_weight_sums(handle); + auto d_out_weight_sums = graph_view.compute_out_weight_sums(handle); + + CUDA_TRY(cudaDeviceSynchronize()); // for consistent performance measurement + + std::vector h_cugraph_in_weight_sums(graph_view.get_number_of_vertices()); + std::vector h_cugraph_out_weight_sums(graph_view.get_number_of_vertices()); + + raft::update_host(h_cugraph_in_weight_sums.data(), + d_in_weight_sums.data(), + d_in_weight_sums.size(), + handle.get_stream()); + raft::update_host(h_cugraph_out_weight_sums.data(), + d_out_weight_sums.data(), + d_out_weight_sums.size(), + handle.get_stream()); + CUDA_TRY(cudaStreamSynchronize(handle.get_stream())); + + auto threshold_ratio = weight_t{1e-4}; + auto threshold_magnitude = std::numeric_limits::min(); + auto nearly_equal = [threshold_ratio, threshold_magnitude](auto lhs, auto rhs) { + return std::abs(lhs - rhs) < + std::max(std::max(lhs, rhs) * threshold_ratio, threshold_magnitude); + }; + + ASSERT_TRUE(std::equal(h_reference_in_weight_sums.begin(), + h_reference_in_weight_sums.end(), + h_cugraph_in_weight_sums.begin(), + 
nearly_equal)) + << "In-weight-sum values do not match with the reference values."; + ASSERT_TRUE(std::equal(h_reference_out_weight_sums.begin(), + h_reference_out_weight_sums.end(), + h_cugraph_out_weight_sums.begin(), + nearly_equal)) + << "Out-weight-sum values do not match with the reference values."; + } +}; + +// FIXME: add tests for type combinations + +TEST_P(Tests_WeightSum, CheckInt32Int32FloatTransposed) +{ + run_current_test(GetParam()); +} + +TEST_P(Tests_WeightSum, CheckInt32Int32FloatUntransposed) +{ + run_current_test(GetParam()); +} + +INSTANTIATE_TEST_CASE_P(simple_test, + Tests_WeightSum, + ::testing::Values(WeightSum_Usecase("test/datasets/karate.mtx"), + WeightSum_Usecase("test/datasets/web-Google.mtx"), + WeightSum_Usecase("test/datasets/ljournal-2008.mtx"), + WeightSum_Usecase("test/datasets/webbase-1M.mtx"))); + +CUGRAPH_TEST_PROGRAM_MAIN() From ca895946189ae6cb00daa5c5bde1e37cb78788e4 Mon Sep 17 00:00:00 2001 From: Seunghwa Kang <45857425+seunghwak@users.noreply.github.com> Date: Mon, 1 Mar 2021 09:58:55 -0500 Subject: [PATCH 5/5] Add boost 1.0 license file. (#1401) #1411 added code (to address #1329) that follows the BOOST 1.0 license and this PR adds the BOOST 1.0 license to cuGraph codebase. 
Authors: - Seunghwa Kang (@seunghwak) Approvers: - Brad Rees (@BradReesWork) URL: https://github.com/rapidsai/cugraph/pull/1401 --- thirdparty/LICENSES/LICENSE.boost | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) create mode 100644 thirdparty/LICENSES/LICENSE.boost diff --git a/thirdparty/LICENSES/LICENSE.boost b/thirdparty/LICENSES/LICENSE.boost new file mode 100644 index 00000000000..36b7cd93cdf --- /dev/null +++ b/thirdparty/LICENSES/LICENSE.boost @@ -0,0 +1,23 @@ +Boost Software License - Version 1.0 - August 17th, 2003 + +Permission is hereby granted, free of charge, to any person or organization +obtaining a copy of the software and accompanying documentation covered by +this license (the "Software") to use, reproduce, display, distribute, +execute, and transmit the Software, and to prepare derivative works of the +Software, and to permit third-parties to whom the Software is furnished to +do so, all subject to the following: + +The copyright notices in the Software and this entire statement, including +the above license grant, this restriction and the following disclaimer, +must be included in all copies of the Software, in whole or in part, and +all derivative works of the Software, unless such copies or derivative +works are solely in the form of machine-executable object code generated by +a source language processor. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT +SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE +FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, +ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS IN THE SOFTWARE.