@jeanlyn
Contributor

@jeanlyn jeanlyn commented Jun 16, 2015

Issue link: SPARK-8379
Currently, when we insert data into a dynamic partition with speculative tasks enabled, we get the following exception:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): 
Lease mismatch on /tmp/hive-jeanlyn/hive_2015-06-15_15-20-44_734_8801220787219172413-1/-ext-10000/ds=2015-06-15/type=2/part-00301.lzo 
owned by DFSClient_attempt_201506031520_0011_m_000189_0_-1513487243_53 
but is accessed by DFSClient_attempt_201506031520_0011_m_000042_0_-1275047721_57

This PR writes the data to a temporary directory when using dynamic partitions, so that speculative tasks do not write to the same file.
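
The idea can be illustrated with a minimal Python sketch. This is not the Spark or Hadoop code from the patch: the `_temporary/<attempt_id>` layout and the helper names `staging_path`, `write_attempt`, and `commit_attempt` are assumptions made up for the example. It only shows that two attempts of the same task never open the same file when each one stages its output under its own attempt ID, and that only one staged file ends up at the final path.

```python
import os
import tempfile


def staging_path(output_dir, attempt_id, partition, file_name):
    # Hypothetical layout: every task attempt stages files under its own directory.
    return os.path.join(output_dir, "_temporary", attempt_id, partition, file_name)


def write_attempt(output_dir, attempt_id, partition, file_name, rows):
    # Write the rows to the attempt's private staging file and return its path.
    path = staging_path(output_dir, attempt_id, partition, file_name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        for row in rows:
            f.write(row + "\n")
    return path


def commit_attempt(output_dir, staged, partition, file_name):
    # Move a staged file to its final location. Returns False when another
    # attempt has already committed. The check-then-rename here is not atomic;
    # a real system needs a coordinator to pick the winning attempt.
    final = os.path.join(output_dir, partition, file_name)
    if os.path.exists(final):
        return False
    os.makedirs(os.path.dirname(final), exist_ok=True)
    os.rename(staged, final)
    return True


def demo():
    output_dir = tempfile.mkdtemp()
    partition = os.path.join("ds=2015-06-15", "type=2")
    rows = ["a", "b"]
    # The original attempt and a speculative attempt of the same task.
    staged = [
        write_attempt(output_dir, attempt, partition, "part-00301", rows)
        for attempt in ("attempt_0", "attempt_1")
    ]
    committed = [
        commit_attempt(output_dir, path, partition, "part-00301") for path in staged
    ]
    with open(os.path.join(output_dir, partition, "part-00301")) as f:
        content = f.read()
    return staged, committed, content
```

Calling `demo()` returns the two distinct staging paths, which attempt's commit succeeded, and the content of the final file. Writing both attempts directly to the final path instead would reproduce the kind of conflict shown in the lease mismatch above.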

Contributor Author

This code never seems to be used, so it should be removed.

Contributor

Yes, I think you are right.

Contributor

@jeanlyn Yeah, this should be removed.

@chenghao-intel
Contributor

It seems the bug only affects dynamic partitioning in HiveContext. @jeanlyn, can you confirm that?

@scwf
Contributor

scwf commented Jun 17, 2015

I also hit this issue with dynamic partitions in HiveContext.

@jeanlyn
Contributor Author

jeanlyn commented Jun 17, 2015

@chenghao-intel, I think it only affects dynamic partitioning, because SparkHadoopWriter gets its writer from OutputFormat.getRecordWriter, and most implementations use FileOutputFormat.getTaskOutputPath to get the path.

@andrewor14
Contributor

ok to test

@SparkQA

SparkQA commented Jun 17, 2015

Test build #35053 has finished for PR 6833 at commit 64bbfab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

ok to test. Is this issue the same as the one reported in #6864? @liancheng

@SparkQA

SparkQA commented Jun 18, 2015

Test build #35171 has finished for PR 6833 at commit 64bbfab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

@andrewor14 They are not the same. #6864 affects the dynamic partitioning feature of external data sources, while this one is about dynamic partitions in Hive.

@liancheng
Contributor

LGTM, thanks for fixing this! Merging to master and branch-1.4.

asfgit pushed a commit that referenced this pull request Jun 21, 2015
Issue link: [SPARK-8379](https://issues.apache.org/jira/browse/SPARK-8379)
Currently, when we insert data into a dynamic partition with speculative tasks enabled, we get the following exception:
```
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
Lease mismatch on /tmp/hive-jeanlyn/hive_2015-06-15_15-20-44_734_8801220787219172413-1/-ext-10000/ds=2015-06-15/type=2/part-00301.lzo
owned by DFSClient_attempt_201506031520_0011_m_000189_0_-1513487243_53
but is accessed by DFSClient_attempt_201506031520_0011_m_000042_0_-1275047721_57
```
This PR writes the data to a temporary directory when using dynamic partitions, so that speculative tasks do not write to the same file.

Author: jeanlyn <jeanlyn92@gmail.com>

Closes #6833 from jeanlyn/speculation and squashes the following commits:

64bbfab [jeanlyn] use FileOutputFormat.getTaskOutputPath to get the path
8860af0 [jeanlyn] remove the never using code
e19a3bd [jeanlyn] avoid speculative tasks write same file

(cherry picked from commit a1e3649)
Signed-off-by: Cheng Lian <lian@databricks.com>
@asfgit asfgit closed this in a1e3649 Jun 21, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 22, 2015