-
Notifications
You must be signed in to change notification settings - Fork 32
Description
I set up a set of clusters in Riak 1.4.2 (and this patchset https://github.com/basho/internal_wiki/wiki/Riak-1.4.2-release---beams-for-openx) and Riak 1.4.3 with a ring topology that is using cascading replication, where a connection between two clusters in the figure 1 below represents bidirectional replication.
__ XV __
/ \
CA LC
\ /
XF ---- XA
with realtime replication represented as:
XV <-> LC <-> XA <-> XF <-> CA <-> XV
Figure 1. Topology of Clusters
This was set up with the following script:
#!/bin/bash
echo "Attempting to activate Erlang using default kerl settings..."
. /opt/erlang/r15b01/activate
command -v erl >/dev/null 2>&1 || { echo >&2 "Erlang is required. Please install Erlang or activate via kerl."; exit 1; }
RINGSIZE=64
BACKEND=leveldb
while getopts r:b:h option
do
case "${option}" in
r)
RINGSIZE=${OPTARG};;
b)
BACKEND=${OPTARG};;
h)
echo "-r Set ring_creation_size. Default: 16"
echo "-b Set backend, bitcask/leveldb. Default: bitcask"
exit 1
;;
esac
done
RIAKBACKEND='riak_kv_bitcask_backend'
if [ "$BACKEND" = "leveldb" ]
then
RIAKBACKEND='riak_kv_eleveldb_backend'
fi
echo "Killing all active Riak processes..."
ps -ef | grep beam | grep -v grep | awk '{print $2}' | xargs kill -9
ulimit -n 4096
git clone http://github.com/basho/riak_ee
cd riak_ee
git checkout riak_ee-1.4.2
make rel
make devrel DEVNODES=15
NODE=1
while [ $NODE -lt 16 ]; do
echo Copying Patches to basho-patches for Node $NODE
cp ../riak_repl_09292013/riak_repl2_rtq.beam dev/dev$NODE/lib/basho-patches/riak_repl2_rtq.beam
cp ../riak_repl_09292013/riak_repl2_rtsink_conn.beam dev/dev$NODE/lib/basho-patches/riak_repl2_rtsink_conn.beam
cp ../riak_repl_09292013/riak_repl2_rtsource_conn.beam dev/dev$NODE/lib/basho-patches/riak_repl2_rtsource_conn.beam
echo Setting Ring Size to $RINGSIZE for Node $NODE
sed -i .bk "s/%{ring_creation_size, 64}/{ring_creation_size, $RINGSIZE}/g" dev/dev$NODE/etc/app.config
echo Setting backend to $RIAKBACKEND for Node $NODE
sed -i .bk "s/riak_kv_bitcask_backend/$RIAKBACKEND/g" dev/dev$NODE/etc/app.config
echo Disabling swfi in vm.args
sed -i .bk "s/+sfwi/#+sfwi/g" dev/dev$NODE/etc/vm.args
echo Starting Node $NODE
dev/dev$NODE/bin/riak start
let NODE=NODE+1
done
dev/dev2/bin/riak-admin cluster join dev1@127.0.0.1
dev/dev3/bin/riak-admin cluster join dev1@127.0.0.1
dev/dev5/bin/riak-admin cluster join dev4@127.0.0.1
dev/dev6/bin/riak-admin cluster join dev4@127.0.0.1
dev/dev8/bin/riak-admin cluster join dev7@127.0.0.1
dev/dev9/bin/riak-admin cluster join dev7@127.0.0.1
dev/dev11/bin/riak-admin cluster join dev10@127.0.0.1
dev/dev12/bin/riak-admin cluster join dev10@127.0.0.1
dev/dev14/bin/riak-admin cluster join dev13@127.0.0.1
dev/dev15/bin/riak-admin cluster join dev13@127.0.0.1
dev/dev1/bin/riak-admin cluster plan
dev/dev1/bin/riak-admin cluster commit
dev/dev1/bin/riak-admin status
dev/dev4/bin/riak-admin cluster plan
dev/dev4/bin/riak-admin cluster commit
dev/dev4/bin/riak-admin status
dev/dev7/bin/riak-admin cluster plan
dev/dev7/bin/riak-admin cluster commit
dev/dev7/bin/riak-admin status
dev/dev10/bin/riak-admin cluster plan
dev/dev10/bin/riak-admin cluster commit
dev/dev10/bin/riak-admin status
dev/dev13/bin/riak-admin cluster plan
dev/dev13/bin/riak-admin cluster commit
dev/dev13/bin/riak-admin status
dev/dev1/bin/riak-repl clustername XV
dev/dev4/bin/riak-repl clustername CA
dev/dev7/bin/riak-repl clustername LC
dev/dev10/bin/riak-repl clustername XA
dev/dev13/bin/riak-repl clustername XF
dev/dev1/bin/riak-repl connect 127.0.0.1:10046
dev/dev1/bin/riak-repl connect 127.0.0.1:10076
dev/dev4/bin/riak-repl connect 127.0.0.1:10136
dev/dev7/bin/riak-repl connect 127.0.0.1:10106
dev/dev10/bin/riak-repl connect 127.0.0.1:10136
# Virgina to California
dev/dev1/bin/riak-repl realtime enable CA
dev/dev1/bin/riak-repl realtime start CA
dev/dev4/bin/riak-repl realtime enable XV
dev/dev4/bin/riak-repl realtime start XV
# Virgina to Chicago
dev/dev1/bin/riak-repl realtime enable LC
dev/dev1/bin/riak-repl realtime start LC
dev/dev7/bin/riak-repl realtime enable XV
dev/dev7/bin/riak-repl realtime start XV
# Chicago to Amsterdam
dev/dev7/bin/riak-repl realtime enable XA
dev/dev7/bin/riak-repl realtime start XA
dev/dev10/bin/riak-repl realtime enable LC
dev/dev10/bin/riak-repl realtime start LC
# Amsterdam to Tokyo
dev/dev10/bin/riak-repl realtime enable XF
dev/dev10/bin/riak-repl realtime start XF
dev/dev13/bin/riak-repl realtime enable XA
dev/dev13/bin/riak-repl realtime start XA
# California to Tokyo
dev/dev4/bin/riak-repl realtime enable XF
dev/dev4/bin/riak-repl realtime start XF
dev/dev13/bin/riak-repl realtime enable CA
dev/dev13/bin/riak-repl realtime start CA
Once the connections have been made, and realtime has been started however not all sources are able to connect to their sinks.
The following sources show {connected, false} to their sinks:
CA -> XV
LC -> XV
XA -> LC
XF -> CA
XF -> XA
Giving a realtime replication topology like Figure 2.
XV -> LC -> XA -> XF <- CA <- XV
Figure 2. Topology of realtime source connections.
When load is added to XV, simulating the use case being tested where PUT traffic only goes to XV, pending counts rise while the queue remains empty.
For example, XF:
sources: [{source_stats,
[{pid,"<0.3333.0>"},
{message_queue_len,0},
{rt_source_connected_to,
[{source,"CA"},{pid,"<0.3333.0>"},{connected,false}]}]},
{source_stats,
[{pid,"<0.3255.0>"},
{message_queue_len,0},
{rt_source_connected_to,
[{source,"XA"},{pid,"<0.3255.0>"},{connected,false}]}]}]
fullsync_coordinator: []
fullsync_coordinator_srv: []
cluster_name: <<"XF">>
cluster_leader: 'dev15@127.0.0.1'
connected_clusters: [<<"CA">>,<<"XA">>]
realtime_queue_stats: [{bytes,768},
{max_bytes,104857600},
{consumers,[{"CA",
[{pending,297},
{unacked,0},
{drops,0},
{errs,0}]},
{"XA",
[{pending,297},
{unacked,0},
{drops,0},
{errs,0}]}]},
{overload_drops,0}]
In this current state, which using the script provided is repeatable for the same clusters every time, any object PUT to XF is never replicated to other clusters.
Additionally, the pending count in XF does not reflect the size of the realtime queue, which is empty.