@steveloughran
Contributor

Cherry picked the parts of the initial SPARK-8064 WiP branch needed to get sql/hive to compile against hive 1.2.1. That's the ASF release packaged under org.apache.hive, not any fork.

Tests not run yet: that's what the machines are for

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36407 has finished for PR 7191 at commit 9c0336c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jul 2, 2015

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36412 has finished for PR 7191 at commit 9c0336c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

jenkins test this please

@SparkQA

SparkQA commented Jul 14, 2015

Test build #37253 has finished for PR 7191 at commit 9c0336c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran steveloughran force-pushed the stevel/feature/SPARK-8064-hive-1.2-002 branch from 9c0336c to b87eebf on July 15, 2015 13:33
@SparkQA

SparkQA commented Jul 15, 2015

Test build #37360 has finished for PR 7191 at commit b87eebf.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 15, 2015

Test build #37386 has finished for PR 7191 at commit 8667845.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran steveloughran force-pushed the stevel/feature/SPARK-8064-hive-1.2-002 branch from 8667845 to 4f1e210 on July 16, 2015 20:37
@SparkQA

SparkQA commented Jul 16, 2015

Test build #37538 has finished for PR 7191 at commit 4f1e210.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class HiveServerServerOptionsProcessor(serverName: String)

@steveloughran steveloughran force-pushed the stevel/feature/SPARK-8064-hive-1.2-002 branch from 4f1e210 to 778e97e on July 17, 2015 17:26
@SparkQA

SparkQA commented Jul 17, 2015

Test build #37649 timed out for PR 7191 at commit 778e97e after a configured wait of 175m.

@steveloughran steveloughran force-pushed the stevel/feature/SPARK-8064-hive-1.2-002 branch from 778e97e to 7d0fcf1 on July 22, 2015 20:11
@SparkQA

SparkQA commented Jul 22, 2015

Test build #38107 has finished for PR 7191 at commit 9772354.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class CreateArray(children: Seq[Expression]) extends Expression
    • case class CreateStruct(children: Seq[Expression]) extends Expression
    • case class CreateNamedStruct(children: Seq[Expression]) extends Expression

@SparkQA

SparkQA commented Jul 22, 2015

Test build #38105 timed out for PR 7191 at commit 7d0fcf1 after a configured wait of 175m.

@SparkQA

SparkQA commented Jul 23, 2015

Test build #38117 timed out for PR 7191 at commit ced5c09 after a configured wait of 175m.

@steveloughran steveloughran force-pushed the stevel/feature/SPARK-8064-hive-1.2-002 branch from ced5c09 to c7d8d82 on July 23, 2015 18:18
@vanzin
Contributor

vanzin commented Jul 23, 2015

Hi @steveloughran,

I actually spent some time recently updating the Hive code to work against 1.1 (not 1.2, but a lot of it should apply). I can't post it as a PR because I did not touch the thrift server code, but since you're working on that part also, I thought you'd benefit from the code I worked on.

Looking at your patch, I see that a lot of the things I had to change are missing. I've posted my patch here:
https://github.com/vanzin/spark/tree/hive-1.1

Feel free to use it any way you see fit. All the sql/hive unit tests pass with that patch, when built with Hive 1.1.

@SparkQA

SparkQA commented Jul 23, 2015

Test build #38253 timed out for PR 7191 at commit c7d8d82 after a configured wait of 175m.

@steveloughran
Contributor Author

vanzin: thanks for that work. I'd held off going near sql/hive as you'd said you were working on it, but I was just starting to stare at the tokenization code and wondering what I was going to do there...

I'm going to push up my next iteration, which adds more resilience to the tests (i.e. cleanup), but doesn't address any of the root causes. Then I'll see about how to merge in your code.

@vanzin
Contributor

vanzin commented Jul 23, 2015

> I'd held off going near sql/hive as you'd said you were working on it

I was working on the metastore support for newer Hive versions, which went in some time ago. Sorry if that wasn't clear and you've been holding back on your work for that reason - the patch I posted above is unrelated to the metastore work.

@SparkQA

SparkQA commented Jul 23, 2015

Test build #38254 timed out for PR 7191 at commit 01e2bde after a configured wait of 175m.

@SparkQA

SparkQA commented Jul 24, 2015

Test build #38301 has finished for PR 7191 at commit c303d2f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnionType(types: Seq[DataType]) extends DataType

@SparkQA

SparkQA commented Jul 24, 2015

Test build #38396 timed out for PR 7191 at commit 1eeb87f after a configured wait of 175m.

@steveloughran
Contributor Author

Test build 38301 timed out, presumably because the thrift server tests took 6+ minutes to fail when the server didn't come up. Changes to the pom to get jersey server back on the classpath on hadoop < 2.6 (i.e. reinstating hive-exec's inclusion of the yarn-rm-server module), plus improved fail-fast test setup, should address this.

As a result of the timeout, failures in the HiveCompatibilitySuite haven't been picked up.
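
For anyone wanting to reproduce the fail-fast idea outside the test suites, here is a minimal shell sketch. It is not the patch's code: the function name `wait_for_startup`, its arguments, and its exit codes are all hypothetical. It polls a log for a startup line, gives up as soon as the server process has died, and otherwise stops after a short timeout instead of hanging for minutes.

```shell
# wait_for_startup PID LOGFILE PATTERN TIMEOUT_SECONDS   (hypothetical helper)
# Exit status: 0 = PATTERN found in LOGFILE
#              1 = process PID is no longer running (fail fast)
#              2 = TIMEOUT_SECONDS elapsed without either of the above
wait_for_startup() {
  pid=$1
  logfile=$2
  pattern=$3
  timeout=$4
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Success: the expected startup line has been logged.
    if grep -Fq -- "$pattern" "$logfile" 2>/dev/null; then
      return 0
    fi
    # Fail fast: no point waiting for a server that has already exited.
    if ! kill -0 "$pid" 2>/dev/null; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 2
}

# Example call (PID and log path are placeholders):
#   wait_for_startup "$server_pid" server.log "Starting ThriftBinaryCLIService on port" 60
```

The 60-second figure in the example call mirrors the tightened startup timeout mentioned in the commit log; any other value works the same way.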

@SparkQA

SparkQA commented Jul 27, 2015

Test build #38508 has finished for PR 7191 at commit a1907bf.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnionType(types: Seq[DataType]) extends DataType

@steveloughran steveloughran force-pushed the stevel/feature/SPARK-8064-hive-1.2-002 branch from 181e494 to d7553b0 on July 28, 2015 01:25
@SparkQA

SparkQA commented Jul 28, 2015

Test build #38621 has finished for PR 7191 at commit d7553b0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RFormula(override val uid: String) extends Estimator[RFormulaModel] with RFormulaBase
    • case class UnionType(types: Seq[DataType]) extends DataType

@SparkQA

SparkQA commented Jul 28, 2015

Test build #38647 has finished for PR 7191 at commit c2ca071.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnionType(types: Seq[DataType]) extends DataType

@pwendell
Contributor

Hey @steveloughran and @vanzin. If I understand correctly, I think we need to modify the dependency on hive-exec to depend on the core artifact. Currently I think this depends on the default hive-exec artifact, which is a 20+MB jar that includes a bunch of other dependencies.

http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C1.2.1%7Cjar

@SparkQA

SparkQA commented Aug 1, 2015

Test build #39368 has finished for PR 7191 at commit fdf759b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StopWordsRemover(override val uid: String)
    • case class SortArray(base: Expression, ascendingOrder: Expression)

@steveloughran
Contributor Author

  1. This is a rebased patch; apologies to anyone who has branched off it.

  2. It uses our own spark-project/hive version, where hive-exec only contains the core hive packages (common, hive-exec, shims) and a shaded protobuf. The latter ensures that it works on the spark hadoop-1 branch. Longer term we'll work with the hive team to sort out "our differences".

  3. If you've done the maven build on or since Friday, you'll need to purge any locally cached maven copy of the artifacts:

    rm -rf ~/.m2/repository/org/spark-project/hive/
    
  4. If you've done the sbt build, do the same for its copies.

  5. Hint: the following finds all the various directories & jars; rm the directories it lists:

    find ~ | grep "hive-exec-1.2.1.spark"
    
  6. The sql/hive test org.apache.spark.sql.hive.ClasspathDependenciesSuite looks for the various transitive dependencies that need to be shaded (protobuf) and those which must not be found shaded (kryo & its transitives). If this test is failing, the wrong version of something is being pulled in.
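
The purge steps above can be combined into one helper. This is only a sketch under assumptions: the function name `purge_cached_hive` is made up, and it assumes the usual cache layouts, i.e. `org/spark-project/hive` under a maven repository and `org.spark-project.hive` under an ivy cache (which is where sbt normally keeps its copies). It prints each directory it deletes.

```shell
# purge_cached_hive ROOT...   (hypothetical helper)
# Removes cached org.spark-project.hive artifacts under each ROOT.
# Assumed layouts: <maven repo>/org/spark-project/hive
#                  <ivy cache>/org.spark-project.hive
purge_cached_hive() {
  for root in "$@"; do
    # Skip cache roots that do not exist on this machine.
    [ -d "$root" ] || continue
    # -prune stops find from descending into a directory that is about to go.
    find "$root" -type d \
      \( -path '*/org/spark-project/hive' -o -name 'org.spark-project.hive' \) \
      -prune -print -exec rm -rf {} +
  done
}

# Example call (destructive, so it is left commented out):
#   purge_cached_hive "$HOME/.m2/repository" "$HOME/.ivy2/cache"
```

Passing the cache roots explicitly, rather than searching all of `~`, keeps the deletion limited to places you have named.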

@pwendell
Contributor

pwendell commented Aug 1, 2015

I've cleared the maven/ivy cache on all the jenkins machines, so going to restart the build again. Jenkins, test this please.

@SparkQA

SparkQA commented Aug 1, 2015

Test build #39371 has finished for PR 7191 at commit 0bbe475.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 1, 2015

Test build #39373 has finished for PR 7191 at commit 0bbe475.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

I'm looking into the failures caused by the Parquet dependency. Hive shades Parquet classes into its own private namespace. I think we're missing a shading somewhere in the updated POM.

@steveloughran
Contributor Author

Latest patch includes the commits from @liancheng for HiveSubmit & the missing spark-hive parquet dependencies on the SBT test runs.

@SparkQA

SparkQA commented Aug 2, 2015

Test build #39435 has finished for PR 7191 at commit ef4af62.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 3, 2015

Test build #39562 has finished for PR 7191 at commit 7556d85.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pwendell
Contributor

pwendell commented Aug 3, 2015

@steveloughran any issues preventing us from merging?

@steveloughran
Contributor Author

None that I know of: I've tested the thriftserver code on -Phadoop-1 -Dhadoop.version=1.2.1, as well as all the way to Hadoop 2.7.1.

@marmbrus
Contributor

marmbrus commented Aug 3, 2015

Awesome, I'm going to merge to master and 1.5 then.

asfgit pushed a commit that referenced this pull request Aug 3, 2015
Cherry picked the parts of the initial SPARK-8064 WiP branch needed to get sql/hive to compile against hive 1.2.1. That's the ASF release packaged under org.apache.hive, not any fork.

Tests not run yet: that's what the machines are for

Author: Steve Loughran <stevel@hortonworks.com>
Author: Cheng Lian <lian@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Author: Patrick Wendell <patrick@databricks.com>

Closes #7191 from steveloughran/stevel/feature/SPARK-8064-hive-1.2-002 and squashes the following commits:

7556d85 [Cheng Lian] Updates .q files and corresponding golden files
ef4af62 [Steve Loughran] Merge commit '6a92bb09f46a04d6cd8c41bdba3ecb727ebb9030' into stevel/feature/SPARK-8064-hive-1.2-002
6a92bb0 [Cheng Lian] Overrides HiveConf time vars
dcbb391 [Cheng Lian] Adds com.twitter:parquet-hadoop-bundle:1.6.0 for Hive Parquet SerDe
0bbe475 [Steve Loughran] SPARK-8064 scalastyle rejects the standard Hadoop ASF license header...
fdf759b [Steve Loughran] SPARK-8064 classpath dependency suite to be in sync with shading in final (?) hive-exec spark
7a6c727 [Steve Loughran] SPARK-8064 switch to second staging repo of the spark-hive artifacts. This one has the protobuf-shaded hive-exec jar
376c003 [Steve Loughran] SPARK-8064 purge duplicate protobuf declaration
2c74697 [Steve Loughran] SPARK-8064 switch to the protobuf shaded hive-exec jar with tests to chase it down
cc44020 [Steve Loughran] SPARK-8064 remove hadoop.version from runtest.py, as profile will fix that automatically.
6901fa9 [Steve Loughran] SPARK-8064 explicit protobuf import
da310dc [Michael Armbrust] Fixes for Hive tests.
a775a75 [Steve Loughran] SPARK-8064 cherry-pick-incomplete
7404f34 [Patrick Wendell] Add spark-hive staging repo
832c164 [Steve Loughran] SPARK-8064 try to supress compiler warnings on Complex.java pasted-thrift-code
312c0d4 [Steve Loughran] SPARK-8064  maven/ivy dependency purge; calcite declaration needed
fa5ae7b [Steve Loughran] HIVE-8064 fix up hive-thriftserver dependencies and cut back on evicted references in the hive- packages; this keeps mvn and ivy resolution compatible, as the reconciliation policy is "by hand"
c188048 [Steve Loughran] SPARK-8064 manage the Hive depencencies to that -things that aren't needed are excluded -sql/hive built with ivy is in sync with the maven reconciliation policy, rather than latest-first
4c8be8d [Cheng Lian] WIP: Partial fix for Thrift server and CLI tests
314eb3c [Steve Loughran] SPARK-8064 deprecation warning  noise in one of the tests
17b0341 [Steve Loughran] SPARK-8064 IDE-hinted cleanups of Complex.java to reduce compiler warnings. It's all autogenerated code, so still ugly.
d029b92 [Steve Loughran] SPARK-8064 rely on unescaping to have already taken place, so go straight to map of serde options
23eca7e [Steve Loughran] HIVE-8064 handle raw and escaped property tokens
54d9b06 [Steve Loughran] SPARK-8064 fix compilation regression surfacing from rebase
0b12d5f [Steve Loughran] HIVE-8064 use subset of hive complex type whose types deserialize
fce73b6 [Steve Loughran] SPARK-8064 poms rely implicitly on the version of kryo chill provides
fd3aa5d [Steve Loughran] SPARK-8064 version of hive to d/l from ivy is 1.2.1
dc73ece [Steve Loughran] SPARK-8064 revert to master's determinstic pushdown strategy
d3c1e4a [Steve Loughran] SPARK-8064 purge UnionType
051cc21 [Steve Loughran] SPARK-8064 switch to an unshaded version of hive-exec-core, which must have been built with Kryo 2.21. This currently looks for a (locally built) version 1.2.1.spark
6684c60 [Steve Loughran] SPARK-8064 ignore RTE raised in blocking process.exitValue() call
e6121e5 [Steve Loughran] SPARK-8064 address review comments
aa43dc6 [Steve Loughran] SPARK-8064  more robust teardown on JavaMetastoreDatasourcesSuite
f2bff01 [Steve Loughran] SPARK-8064 better takeup of asynchronously caught error text
8b1ef38 [Steve Loughran] SPARK-8064: on failures executing spark-submit in HiveSparkSubmitSuite, print command line and all logged output.
5a9ce6b [Steve Loughran] SPARK-8064 add explicit reason for kv split failure, rather than array OOB. *does not address the issue*
642b63a [Steve Loughran] SPARK-8064 reinstate something cut briefly during rebasing
97194dc [Steve Loughran] SPARK-8064 add extra logging to the YarnClusterSuite classpath test. There should be no reason why this is failing on jenkins, but as it is (and presumably its CP-related), improve the logging including any exception raised.
335357f [Steve Loughran] SPARK-8064 fail fast on thrive process spawning tests on exit codes and/or error string patterns seen in log.
3ed872f [Steve Loughran] SPARK-8064 rename field double to  dbl
bca55e5 [Steve Loughran] SPARK-8064 missed one of the `date` escapes
41d6479 [Steve Loughran] SPARK-8064 wrap tests with withTable() calls to avoid table-exists exceptions
2bc29a4 [Steve Loughran] SPARK-8064 ParquetSuites to escape `date` field name
1ab9bc4 [Steve Loughran] SPARK-8064 TestHive to use sered2.thrift.test.Complex
bf3a249 [Steve Loughran] SPARK-8064: more resubmit than fix; tighten startup timeout to 60s. Still no obvious reason why jersey server code in spark-assembly isn't being picked up -it hasn't been shaded
c829b8f [Steve Loughran] SPARK-8064: reinstate yarn-rm-server dependencies to hive-exec to ensure that jersey server is on classpath on hadoop versions < 2.6
0b0f738 [Steve Loughran] SPARK-8064: thrift server startup to fail fast on any exception in the main thread
13abaf1 [Steve Loughran] SPARK-8064 Hive compatibilty tests sin sync with explain/show output from Hive 1.2.1
d14d5ea [Steve Loughran] SPARK-8064: DATE is now a predicate; you can't use it as a field in select ops
26eef1c [Steve Loughran] SPARK-8064: HIVE-9039 renamed TOK_UNION => TOK_UNIONALL while adding TOK_UNIONDISTINCT
3d64523 [Steve Loughran] SPARK-8064 improve diagns on uknown token; fix scalastyle failure
d0360f6 [Steve Loughran] SPARK-8064: delicate merge in of the branch vanzin/hive-1.1
1126e5a [Steve Loughran] SPARK-8064: name of unrecognized file format wasn't appearing in error text
8cb09c4 [Steve Loughran] SPARK-8064: test resilience/assertion improvements. Independent of the rest of the work; can be backported to earlier versions
dec12cb [Steve Loughran] SPARK-8064: when a CLI suite test fails include the full output text in the raised exception; this ensures that the stdout/stderr is included in jenkins reports, so it becomes possible to diagnose the cause.
463a670 [Steve Loughran] SPARK-8064 run-tests.py adds a hadoop-2.6 profile, and changes info messages to say "w/Hive 1.2.1" in console output
2531099 [Steve Loughran] SPARK-8064 successful attempt to get rid of pentaho as a transitive dependency of hive-exec
1d59100 [Steve Loughran] SPARK-8064 (unsuccessful) attempt to get rid of pentaho as a transitive dependency of hive-exec
75733fc [Steve Loughran] SPARK-8064 change thrift binary startup message to "Starting ThriftBinaryCLIService on port"
3ebc279 [Steve Loughran] SPARK-8064 move strings used to check for http/bin thrift services up into constants
c80979d [Steve Loughran] SPARK-8064: SparkSQLCLIDriver drops remote mode support. CLISuite Tests pass instead of timing out: undetected regression?
27e8370 [Steve Loughran] SPARK-8064 fix some style & IDE warnings
00e50d6 [Steve Loughran] SPARK-8064 stop excluding hive shims from dependency (commented out , for now)
cb4f142 [Steve Loughran] SPARK-8054 cut pentaho dependency from calcite
f7aa9cb [Steve Loughran] SPARK-8064 everything compiles with some commenting and moving of classes into a hive package
6c310b4 [Steve Loughran] SPARK-8064 subclass  Hive ServerOptionsProcessor to make it public again
f61a675 [Steve Loughran] SPARK-8064 thrift server switched to Hive 1.2.1, though it doesn't compile everywhere
4890b9d [Steve Loughran] SPARK-8064, build against Hive 1.2.1

(cherry picked from commit a2409d1)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@asfgit asfgit closed this in a2409d1 Aug 3, 2015
@steveloughran steveloughran deleted the stevel/feature/SPARK-8064-hive-1.2-002 branch August 3, 2015 23:10
asfgit pushed a commit that referenced this pull request Aug 4, 2015
…accidentally reverted

This PR removes the dependency reduced POM hack brought back by #7191

Author: tedyu <yuzhihong@gmail.com>

Closes #7919 from tedyu/master and squashes the following commits:

1bfbd7b [tedyu] [BUILD] Remove dependency reduced POM hack

(cherry picked from commit b211cbc)
Signed-off-by: Sean Owen <sowen@cloudera.com>