Merged
12 changes: 8 additions & 4 deletions azure-pipelines.yml
@@ -163,7 +163,7 @@ stages:
${{ if and(ne(variables['System.TeamProject'], 'public'), notin(variables['Build.Reason'], 'PullRequest')) }}:
_OfficialBuildIdArgs: /p:OfficialBuildId=$(BUILD.BUILDNUMBER)
HADOOP_HOME: $(Build.BinariesDirectory)\hadoop
- DOTNET_WORKER_DIR: $(CurrentDotnetWorkerDir)
+ DOTNET_WORKER_DIR: $(CurrentDotnetWorkerDir)

steps:
- task: DownloadBuildArtifacts@0
@@ -431,7 +431,7 @@ stages:
${{ if and(ne(variables['System.TeamProject'], 'public'), notin(variables['Build.Reason'], 'PullRequest')) }}:
_OfficialBuildIdArgs: /p:OfficialBuildId=$(BUILD.BUILDNUMBER)
HADOOP_HOME: $(Build.BinariesDirectory)\hadoop
- DOTNET_WORKER_DIR: $(BackwardCompatibleDotnetWorkerDir)
+ DOTNET_WORKER_DIR: $(BackwardCompatibleDotnetWorkerDir)

steps:
- task: DownloadBuildArtifacts@0
@@ -558,12 +558,16 @@ stages:
env:
SPARK_HOME: $(Build.BinariesDirectory)\spark-2.4.6-bin-hadoop2.7

+ # Spark 3.0.0 uses Arrow 0.15.1, which contains a new Arrow spec. This breaks backward
+ # compatibility when using Microsoft.Spark.Worker with incompatible versions of Arrow.
+ # Skip Arrow tests until the backward compatibility Worker version is updated.

Review thread on the comment above:

Contributor: Is it easy to track whether we have different backward compatibility guarantees for different Spark versions?

Member Author: At the moment there are no published Workers that are backward compatible with 3.0 (the previous Workers only use Arrow 0.14.1 and aren't aware of the new spec). But I agree that this is a breaking change.

For backward compatibility, do we want to differentiate between Spark versions and test each against a different Worker version? Or one Worker version that we declare backward compatible with all Spark versions?

This can be addressed in a separate PR if needed.

Contributor: I would say the latter: one Worker version declared backward compatible with all Spark versions.

Member Author: Then I think we will have to wait until the next official Worker release before we can remove these extra filters.

Contributor: ... since we have one Worker binary to support all Spark versions.

Contributor (@imback82, Sep 11, 2020): Well, you can remove the backward compatibility test (a breaking change), then add it back when the new Worker is released.

Member Author: Do you want me to remove the extra filters for 3.0 and add the skip attribute in the unit tests? Or just remove the Spark 3.0.0 section from the backward compatibility tests?

Member Author: Removed 3.0 backward compatibility testing.
- task: DotNetCoreCLI@2
displayName: 'E2E tests for Spark 3.0.0'
inputs:
command: test
projects: '**/Microsoft.Spark*.E2ETest/*.csproj'
- arguments: '--configuration $(buildConfiguration) --filter $(TestsToFilterOut)'
+ arguments: "--configuration $(buildConfiguration) --filter $(TestsToFilterOut)&\
+ (FullyQualifiedName!=Microsoft.Spark.E2ETest.IpcTests.DataFrameTests.TestGroupedMapUdf)&\
+ (FullyQualifiedName!=Microsoft.Spark.E2ETest.IpcTests.DataFrameTests.TestVectorUdf)"
env:
SPARK_HOME: $(Build.BinariesDirectory)\spark-3.0.0-bin-hadoop2.7
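
The "skip attribute" discussed in the thread above is a custom xunit attribute used throughout these tests. A minimal sketch of how such a version-gated attribute can be implemented (the SparkSettings stub here is an assumption for the sketch; the real attribute lives in Microsoft.Spark.E2ETest.Utils and may differ):

using System;
using Xunit;

// Hypothetical stand-in for the test suite's Spark version source; the real
// suite derives the version from the Spark installation under test.
internal static class SparkSettings
{
    internal static Version Version { get; } =
        new Version(Environment.GetEnvironmentVariable("SPARK_VERSION") ?? "2.4.6");
}

// Behaves like [Fact], but reports the test as skipped when the Spark
// version under test is at or above the given version.
public sealed class SkipIfSparkVersionIsGreaterOrEqualTo : FactAttribute
{
    public SkipIfSparkVersionIsGreaterOrEqualTo(string version)
    {
        if (SparkSettings.Version >= new Version(version))
        {
            Skip = $"Ignored on Spark version >= {version}";
        }
    }
}

A test tagged [SkipIfSparkVersionIsGreaterOrEqualTo(Versions.V3_0_0)] runs on Spark 2.x and is reported as skipped on 3.0+; replacing it with plain [Fact], as the diffs below do, re-enables the test on every version.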

7 changes: 3 additions & 4 deletions src/csharp/Microsoft.Spark.E2ETest/IpcTests/BroadcastTests.cs
@@ -1,6 +1,5 @@
using System;
using System.Linq;
- using Microsoft.Spark.E2ETest.Utils;
using Microsoft.Spark.Sql;
using Xunit;
using static Microsoft.Spark.Sql.Functions;
@@ -35,7 +34,7 @@ public BroadcastTests(SparkFixture fixture)
/// <summary>
/// Test Broadcast support by using multiple broadcast variables in a UDF.
/// </summary>
- [SkipIfSparkVersionIsGreaterOrEqualTo(Versions.V3_0_0)]
+ [Fact]
public void TestMultipleBroadcastWithoutEncryption()
{
var obj1 = new TestBroadcastVariable(1, "first");
@@ -56,7 +55,7 @@ public void TestMultipleBroadcastWithoutEncryption()
/// Test Broadcast.Destroy() that destroys all data and metadata related to the broadcast
/// variable and makes it inaccessible from workers.
/// </summary>
- [SkipIfSparkVersionIsGreaterOrEqualTo(Versions.V3_0_0)]
+ [Fact]
public void TestDestroy()
{
var obj1 = new TestBroadcastVariable(5, "destroy");
@@ -97,7 +96,7 @@ public void TestDestroy()
/// Test Broadcast.Unpersist() deletes cached copies of the broadcast on the executors. If
/// the broadcast is used after unpersist is called, it is re-sent to the executors.
/// </summary>
- [SkipIfSparkVersionIsGreaterOrEqualTo(Versions.V3_0_0)]
+ [Fact]
public void TestUnpersist()
{
var obj = new TestBroadcastVariable(1, "unpersist");
src/csharp/Microsoft.Spark.E2ETest/IpcTests/DataFrameTests.cs
@@ -156,7 +156,7 @@ public void TestUDF()
}
}

- [SkipIfSparkVersionIsGreaterOrEqualTo(Versions.V3_0_0)]
+ [Fact]
public void TestVectorUdf()
{
Func<Int32Array, StringArray, StringArray> udf1Func =
@@ -224,7 +224,7 @@ public void TestVectorUdf()
}
}

- [SkipIfSparkVersionIsGreaterOrEqualTo(Versions.V3_0_0)]
+ [Fact]
public void TestDataFrameVectorUdf()
{
Func<Int32DataFrameColumn, ArrowStringDataFrameColumn, ArrowStringDataFrameColumn> udf1Func =
@@ -368,7 +368,6 @@ private static RecordBatch ArrowBasedCountCharacters(RecordBatch records)
returnLength);
}

-
[SkipIfSparkVersionIsGreaterOrEqualTo(Versions.V3_0_0)]
public void TestDataFrameGroupedMapUdf()
{
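
Both tests filtered out of the Spark 3.0.0 E2E run above are Arrow-backed vector UDF tests: the UDF receives whole Apache.Arrow column batches rather than single rows, so it is sensitive to the Arrow 0.15 IPC format change (a continuation marker now precedes each message length, which a Worker built against Arrow 0.14.x misreads). As a rough sketch of the kind of UDF body TestVectorUdf wraps, using only Apache.Arrow types (illustrative and simplified, with the UDF registration plumbing omitted; not the test's exact code):

using Apache.Arrow;

internal static class VectorUdfSketch
{
    // Formats each (age, name) pair into a string, one Arrow batch at a
    // time rather than row by row.
    internal static StringArray FormatRows(Int32Array ages, StringArray names)
    {
        var builder = new StringArray.Builder();
        for (int i = 0; i < names.Length; ++i)
        {
            builder.Append($"{names.GetString(i)} is {ages.GetValue(i) ?? 0} years old");
        }
        return builder.Build();
    }
}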