From c86044e803546c31e4da793bde201120c06976ca Mon Sep 17 00:00:00 2001 From: Shuai Lin Date: Mon, 3 Apr 2017 17:00:22 +0800 Subject: [PATCH 1/2] [SPARK-15352][Doc] follow-up: add configuration docs for topology-aware block replication --- docs/configuration.md | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/docs/configuration.md b/docs/configuration.md index 2687f542b8bd3..009d8c36adeae 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1012,6 +1012,36 @@ Apart from these, the following properties are also available, and may be useful to get the replication level of the block to the initial number. + + spark.storage.replication.policy + org.apache.spark.storage.RandomBlockReplicationPolicy + + The policy to use for choosing peers when replicating blocks. The default policy would randomly + choose the peers to replicate to. A more resilient replication policy is provided by + org.apache.spark.storage.BasicBlockReplicationPolicy, which makes use of the + topology information of the hosts to choose the peers, much like the HDFS block replication + strategy: it tries to choose the first replica within the same rack, and a third replica on + a different rack. See spark.storage.replication.topologyMapper below for how to + provide the topology information for the hosts. + + + + spark.storage.replication.topologyMapper + org.apache.spark.storage.DefaultTopologyMapper + + The topology information of a host is determined by a topology mapping service defined by the + abstract class org.apache.spark.storage.TopologyMapper, which can be configured by + this property. A default implementation that assumes all hosts are in the same rack is provided + by org.apache.spark.storage.DefaultTopologyMapper. A file-based implementation is + provided by org.apache.spark.storage.FileBasedTopologyMapper, which reads the + topology information from the file named by the spark.storage.replication.topologyFile property. 
Each line + of this file has the format host1 = /rack1 and provides a mapping from a host + name to its rack information. Note: This configuration only takes effect when + spark.storage.replication.policy is set to a policy that takes the topology + information into consideration, e.g. + org.apache.spark.storage.BasicBlockReplicationPolicy. + + ### Execution Behavior From bf56ba274b00962bd5ccc9c0b098f17b934ea5fb Mon Sep 17 00:00:00 2001 From: Shuai Lin Date: Mon, 3 Apr 2017 17:13:39 +0800 Subject: [PATCH 2/2] Fixed the markdown table formatting. --- docs/configuration.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/docs/configuration.md b/docs/configuration.md index 009d8c36adeae..c77a01e983386 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1004,7 +1004,7 @@ Apart from these, the following properties are also available, and may be useful - spark.storage.replication.proactive + spark.storage.replication.proactive false Enables proactive block replication for RDD blocks. Cached RDD block replicas lost due to @@ -1013,8 +1013,10 @@ Apart from these, the following properties are also available, and may be useful - spark.storage.replication.policy - org.apache.spark.storage.RandomBlockReplicationPolicy + spark.storage.replication.policy + + org.apache.spark.storage.
RandomBlockReplicationPolicy + The policy to use for choosing peers when replicating blocks. The default policy would randomly choose the peers to replicate to. A more resilient replication policy is provided by @@ -1026,8 +1028,10 @@ Apart from these, the following properties are also available, and may be useful - spark.storage.replication.topologyMapper - org.apache.spark.storage.DefaultTopologyMapper + spark.storage.replication.topologyMapper + + org.apache.spark.storage.
DefaultTopologyMapper + The topology information of a host is determined by a topology mapping service defined by the abstract class org.apache.spark.storage.TopologyMapper, which can be configured by