This repository was archived by the owner on Jun 14, 2024. It is now read-only.

@paryoja paryoja commented Oct 12, 2021

What is the context for this pull request?

What changes were proposed in this pull request?

Added a feature to enable Hyperspace through SparkSessionExtensions.

Does this PR introduce any user-facing change?

Yes. Users can now enable Hyperspace as follows:

spark-shell -c spark.sql.extensions=com.microsoft.hyperspace.HyperspaceSparkSessionExtension

or

val spark = SparkSession
       .builder()
       .appName("...")
       .master("...")
       .config("spark.sql.extensions", "com.microsoft.hyperspace.HyperspaceSparkSessionExtension")
       .getOrCreate()

How was this patch tested?

Manually with spark-shell. If an automated test is required, I will add it.

@sezruby sezruby requested review from clee704 and imback82 October 14, 2021 06:23
@paryoja paryoja changed the title Add spark session extension for Hyperspace [WIP] Add spark session extension for Hyperspace Oct 18, 2021
@paryoja paryoja force-pushed the feature/exetension branch from c62f5b6 to ab14466 Compare October 27, 2021 03:06
@paryoja paryoja closed this Oct 29, 2021
@paryoja paryoja deleted the feature/exetension branch October 29, 2021 04:47
@paryoja paryoja restored the feature/exetension branch October 29, 2021 04:48
@paryoja paryoja reopened this Oct 29, 2021
@paryoja paryoja changed the title [WIP] Add spark session extension for Hyperspace Add spark session extension for Hyperspace Oct 29, 2021
@paryoja paryoja force-pushed the feature/exetension branch 2 times, most recently from 923872a to 84b2969 Compare November 4, 2021 03:25
@paryoja paryoja force-pushed the feature/exetension branch from 0cd2a5b to c0d439f Compare November 4, 2021 07:38
@paryoja paryoja force-pushed the feature/exetension branch from c0d439f to 35d4ffe Compare November 4, 2021 07:55
@sezruby sezruby added the enhancement New feature or request label Nov 4, 2021
@sezruby sezruby linked an issue Nov 4, 2021 that may be closed by this pull request
@paryoja paryoja force-pushed the feature/exetension branch from 82820db to daf9d08 Compare November 8, 2021 00:52
add configure to control enabling hyperspace
add dummy rule to avoid different behavior of Extensions / apply hyperspace
add test for hyperspace extension
@paryoja paryoja force-pushed the feature/exetension branch from 89cdb02 to fa80dba Compare November 9, 2021 01:13
@sezruby sezruby left a comment

LGTM, thanks @paryoja!
@clee704 Could you have another look and approve the PR?

override def apply(extensions: SparkSessionExtensions): Unit = {
  extensions.injectOptimizerRule { sparkSession =>
    // Enable Hyperspace to leverage indexes.
    sparkSession.enableHyperspace()

According to the new model, enableHyperspace exists only for backward compatibility. I think it's better to factor the rule insertion code out of this method into a separate method, and invoke that method both here and from enableHyperspace.

@paryoja paryoja Nov 10, 2021

@clee704 Do you mean this part?

      if (!sparkSession.sessionState.experimentalMethods.extraOptimizations.contains(
          ApplyHyperspace)) {
        sparkSession.sessionState.experimentalMethods.extraOptimizations ++=
          ApplyHyperspace :: Nil
      }
      if (!sparkSession.sessionState.experimentalMethods.extraStrategies.contains(
          BucketUnionStrategy)) {
        sparkSession.sessionState.experimentalMethods.extraStrategies ++=
          BucketUnionStrategy :: Nil
      }

Where should I put this code? The package object hyperspace is mostly a customer-facing interface, so I'm not sure whether it is OK to create a function like this:

package object hyperspace {

  /**
   * Hyperspace-specific implicit class on SparkSession.
   */
  implicit class Implicits(sparkSession: SparkSession) {

    def enableHyperspace(): SparkSession = {
      HyperspaceConf.setHyperspaceApplyEnabled(sparkSession, true)
      addOptimizationsIfNeeded()
      sparkSession
    }

    private def addOptimizationsIfNeeded(): Unit = {
      if (!sparkSession.sessionState.experimentalMethods.extraOptimizations.contains(
          ApplyHyperspace)) {
        sparkSession.sessionState.experimentalMethods.extraOptimizations ++=
          ApplyHyperspace :: Nil
      }
      if (!sparkSession.sessionState.experimentalMethods.extraStrategies.contains(
          BucketUnionStrategy)) {
        sparkSession.sessionState.experimentalMethods.extraStrategies ++=
          BucketUnionStrategy :: Nil
      }
    }
  }
}

You can put addOptimizationsIfNeeded() in a companion object HyperspaceSparkSessionExtension and call the method from here and enableHyperspace().
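
A minimal sketch of the layout suggested above, assuming Spark 3.x is on the classpath. `SketchExtension`, `StandInForApplyHyperspace`, and `NoOpRule` are hypothetical names used only for illustration: the stand-in rule takes the place of `ApplyHyperspace`, and the `BucketUnionStrategy` registration is left out for brevity. This is not the merged implementation. Because `injectOptimizerRule` expects the builder to return a rule, the sketch registers the rules as a side effect and returns a rule that leaves the plan unchanged.

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical stand-in for Hyperspace's ApplyHyperspace rule; it does nothing.
object StandInForApplyHyperspace extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

// Hypothetical extension class, enabled via spark.sql.extensions.
class SketchExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectOptimizerRule { sparkSession =>
      // Register the rules through the shared helper in the companion object.
      SketchExtension.addOptimizationsIfNeeded(sparkSession)
      // The builder must return a rule, so return one that changes nothing.
      SketchExtension.NoOpRule
    }
  }
}

object SketchExtension {
  // Rule returned from the builder only to satisfy the injectOptimizerRule interface.
  private object NoOpRule extends Rule[LogicalPlan] {
    override def apply(plan: LogicalPlan): LogicalPlan = plan
  }

  // Shared helper: appends the rule only if it is not already registered,
  // so calling it from both the extension and enableHyperspace() is safe.
  def addOptimizationsIfNeeded(sparkSession: SparkSession): Unit = {
    val experimental = sparkSession.sessionState.experimentalMethods
    if (!experimental.extraOptimizations.contains(StandInForApplyHyperspace)) {
      experimental.extraOptimizations ++= StandInForApplyHyperspace :: Nil
    }
  }
}
```

With this layout, an `enableHyperspace()`-style method could delegate to `SketchExtension.addOptimizationsIfNeeded(sparkSession)`, and repeated calls would leave a single copy of the rule in `extraOptimizations` because of the `contains` check.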

@clee704 Can you check whether I implemented it as you expected?

@paryoja paryoja requested a review from clee704 November 12, 2021 02:46
*
* @param sparkSession Spark session that will use Hyperspace
*/
def addOptimizationsIfNeeded(sparkSession: SparkSession): Unit = {

I think it's better to place the function in package.scala.

@sezruby sezruby merged commit d8c4b79 into microsoft:master Nov 15, 2021

Development

Successfully merging this pull request may close these issues.

[FEATURE REQUEST]: Enable hyperspace with SparkSessionExtention

3 participants