Closed
Changes from all commits · 1220 commits
9c53057
[SPARK-16536][SQL][PYSPARK][MINOR] Expose `sql` in PySpark Shell
dongjoon-hyun Jul 14, 2016
39c836e
[SPARK-16503] SparkSession should provide Spark version
lw-lin Jul 14, 2016
db7317a
[SPARK-16448] RemoveAliasOnlyProject should not remove alias with met…
cloud-fan Jul 14, 2016
252d4f2
[SPARK-16500][ML][MLLIB][OPTIMIZER] add LBFGS convergence warning for…
WeichenXu123 Jul 14, 2016
e3f8a03
[SPARK-16403][EXAMPLES] Cleanup to remove unused imports, consistent …
BryanCutler Jul 14, 2016
c4bc2ed
[SPARK-14963][MINOR][YARN] Fix typo in YarnShuffleService recovery fi…
jerryshao Jul 14, 2016
b7b5e17
[SPARK-16505][YARN] Optionally propagate error during shuffle service…
Jul 14, 2016
1b5c9e5
[SPARK-16530][SQL][TRIVIAL] Wrong Parser Keyword in ALTER TABLE CHANG…
gatorsmile Jul 14, 2016
56183b8
[SPARK-16543][SQL] Rename the columns of `SHOW PARTITION/COLUMNS` com…
dongjoon-hyun Jul 14, 2016
093ebbc
[SPARK-16509][SPARKR] Rename window.partitionBy and window.orderBy to…
sun-rui Jul 14, 2016
12005c8
[SPARK-16538][SPARKR] fix R call with namespace operator on SparkSess…
felixcheung Jul 14, 2016
c576f9f
[SPARK-16529][SQL][TEST] `withTempDatabase` should set `default` data…
dongjoon-hyun Jul 14, 2016
31ca741
[SPARK-16528][SQL] Fix NPE problem in HiveClientImpl
jacek-lewandowski Jul 14, 2016
91575ca
[SPARK-16540][YARN][CORE] Avoid adding jars twice for Spark running o…
jerryshao Jul 14, 2016
01c4c1f
[SPARK-16553][DOCS] Fix SQL example file name in docs
shivaram Jul 14, 2016
972673a
[SPARK-16555] Work around Jekyll error-handling bug which led to sile…
JoshRosen Jul 14, 2016
2e4075e
[SPARK-16557][SQL] Remove stale doc in sql/README.md
rxin Jul 15, 2016
1832423
[SPARK-16546][SQL][PYSPARK] update python dataframe.drop
WeichenXu123 Jul 15, 2016
71ad945
[SPARK-16426][MLLIB] Fix bug that caused NaNs in IsotonicRegression
neggert Jul 15, 2016
5ffd5d3
[SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLl…
jkbradley Jul 15, 2016
611a8ca
[SPARK-16538][SPARKR] Add more tests for namespace call to SparkSessi…
felixcheung Jul 15, 2016
b2f24f9
[SPARK-16230][CORE] CoarseGrainedExecutorBackend to self kill if ther…
tejasapatil Jul 15, 2016
a1ffbad
[SPARK-16582][SQL] Explicitly define isNull = false for non-nullable …
sameeragarwal Jul 16, 2016
5ec0d69
[SPARK-3359][DOCS] More changes to resolve javadoc 8 errors that will…
srowen Jul 16, 2016
4167304
[SPARK-16112][SPARKR] Programming guide for gapply/gapplyCollect
Jul 16, 2016
c33e4b0
[SPARK-16507][SPARKR] Add a CRAN checker, fix Rd aliases
shivaram Jul 17, 2016
7b84758
[SPARK-16584][SQL] Move regexp unit tests to RegexpExpressionsSuite
rxin Jul 17, 2016
d27fe9b
[SPARK-16027][SPARKR] Fix R tests SparkSession init/stop
felixcheung Jul 18, 2016
480c870
[SPARK-16588][SQL] Deprecate monotonicallyIncreasingId in Scala/Java
rxin Jul 18, 2016
a529fc9
[MINOR][TYPO] fix fininsh typo
WeichenXu123 Jul 18, 2016
8ea3f4e
[SPARK-16055][SPARKR] warning added while using sparkPackages with sp…
krishnakalyan3 Jul 18, 2016
2877f1a
[SPARK-16351][SQL] Avoid per-record type dispatch in JSON when writing
HyukjinKwon Jul 18, 2016
96e9afa
[SPARK-16515][SQL] set default record reader and writer for script tr…
adrian-wang Jul 18, 2016
75f0efe
[SPARKR][DOCS] minor code sample update in R programming guide
felixcheung Jul 18, 2016
ea78edb
[SPARK-16590][SQL] Improve LogicalPlanToSQLSuite to check generated S…
dongjoon-hyun Jul 19, 2016
c4524f5
[HOTFIX] Fix Scala 2.10 compilation
rxin Jul 19, 2016
69c7730
[SPARK-16615][SQL] Expose sqlContext in SparkSession
rxin Jul 19, 2016
e5fbb18
[MINOR] Remove unused arg in als.py
zhengruifeng Jul 19, 2016
1426a08
[SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update
liancheng Jul 19, 2016
6ee40d2
[DOC] improve python doc for rdd.histogram and dataframe.join
mortada Jul 19, 2016
556a943
[MINOR][BUILD] Fix Java Linter `LineLength` errors
dongjoon-hyun Jul 19, 2016
21a6dd2
[SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant de…
keypointt Jul 19, 2016
6caa220
[MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar
ahmed-mahran Jul 19, 2016
8310c07
[SPARK-16600][MLLIB] fix some latex formula syntax error
WeichenXu123 Jul 19, 2016
6c4b9f4
[SPARK-16395][STREAMING] Fail if too many CheckpointWriteHandlers are…
srowen Jul 19, 2016
5d92326
[SPARK-16478] graphX (added graph caching in strongly connected compo…
Jul 19, 2016
6708914
[SPARK-16494][ML] Upgrade breeze version to 0.12
yanboliang Jul 19, 2016
0bd76e8
[SPARK-16620][CORE] Add back the tokenization process in `RDD.pipe(co…
lw-lin Jul 19, 2016
162d04a
[SPARK-16602][SQL] `Nvl` function should support numeric-string cases
dongjoon-hyun Jul 19, 2016
2ae7b88
[SPARK-15705][SQL] Change the default value of spark.sql.hive.convert…
yhuai Jul 19, 2016
004e29c
[SPARK-14702] Make environment of SparkLauncher launched process more…
Jul 20, 2016
9674af6
[SPARK-16568][SQL][DOCUMENTATION] update sql programming guide refres…
WeichenXu123 Jul 20, 2016
fc23263
[SPARK-10683][SPARK-16510][SPARKR] Move SparkR include jar test to Sp…
shivaram Jul 20, 2016
75146be
[SPARK-16632][SQL] Respect Hive schema when merging parquet schema.
Jul 20, 2016
0dc79ff
[SPARK-16440][MLLIB] Destroy broadcasted variables even on driver
Jul 20, 2016
95abbe5
[SPARK-15923][YARN] Spark Application rest api returns 'no such app: …
weiqingy Jul 20, 2016
4b079dc
[SPARK-16613][CORE] RDD.pipe returns values for empty partitions
srowen Jul 20, 2016
b9bab4d
[SPARK-15951] Change Executors Page to use datatables to support sort…
kishorvpatil Jul 20, 2016
e3cd5b3
[SPARK-16634][SQL] Workaround JVM bug by moving some code out of ctor.
Jul 20, 2016
e651900
[SPARK-16344][SQL] Decoding Parquet array of struct with a single fie…
liancheng Jul 20, 2016
75a06aa
[SPARK-16272][CORE] Allow config values to reference conf, env, syste…
Jul 21, 2016
cfa5ae8
[SPARK-16644][SQL] Aggregate should not propagate constraints contain…
cloud-fan Jul 21, 2016
1bf13ba
[MINOR][DOCS][STREAMING] Minor docfix schema of csv rather than parqu…
holdenk Jul 21, 2016
864b764
[SPARK-16226][SQL] Weaken JDBC isolation level to avoid locking when …
srowen Jul 21, 2016
8674054
[SPARK-16632][SQL] Use Spark requested schema to guide vectorized Par…
liancheng Jul 21, 2016
6203668
[SPARK-16640][SQL] Add codegen for Elt function
viirya Jul 21, 2016
69626ad
[SPARK-16632][SQL] Revert PR #14272: Respect Hive schema when merging…
liancheng Jul 21, 2016
235cb25
[SPARK-16194] Mesos Driver env vars
Jul 21, 2016
9abd99b
[SPARK-16656][SQL] Try to make CreateTableAsSelectSuite more stable
yhuai Jul 21, 2016
43a9b68
Copied parameters over from Estimator to Transformer
Nov 19, 2015
4b1e757
Copied parameters over from Estimator to Transformer
Nov 19, 2015
c59490c
[SPARK-10931] Fixed conflicts
May 12, 2016
d80a9e7
[SPARK-10931][PYSPARK][ML] PySpark ML Model updated to newest Spark c…
Jul 21, 2016
e56244b
[SPARK-10931][PYSPARK][ML] PySpark ML Model updated to newest Spark c…
Jul 21, 2016
46f80a3
[SPARK-16334] Maintain single dictionary per row-batch in vectorized …
sameeragarwal Jul 21, 2016
df2c6d5
[SPARK-16287][SQL] Implement str_to_map SQL function
techaddict Jul 22, 2016
94f14b5
[SPARK-16556][SPARK-16559][SQL] Fix Two Bugs in Bucket Specification
gatorsmile Jul 22, 2016
e1bd70f
[SPARK-16287][HOTFIX][BUILD][SQL] Fix annotation argument needs to be…
jaceklaskowski Jul 22, 2016
2c72a44
[SPARK-16487][STREAMING] Fix some batches might not get marked as ful…
ahmed-mahran Jul 22, 2016
b4e16bd
[GIT] add pydev & Rstudio project file to gitignore list
WeichenXu123 Jul 22, 2016
6c56fff
[SPARK-16650] Improve documentation of spark.task.maxFailures
Jul 22, 2016
47f5b88
[SPARK-16651][PYSPARK][DOC] Make `withColumnRenamed/drop` description…
dongjoon-hyun Jul 22, 2016
e10b874
[SPARK-16622][SQL] Fix NullPointerException when the returned value o…
viirya Jul 23, 2016
25db516
[SPARK-16561][MLLIB] fix multivarOnlineSummary min/max bug
WeichenXu123 Jul 23, 2016
ab6e4ae
[SPARK-16662][PYSPARK][SQL] fix HiveContext warning bug
WeichenXu123 Jul 23, 2016
86c2752
[SPARK-16690][TEST] rename SQLTestUtils.withTempTable to withTempView
cloud-fan Jul 23, 2016
53b2456
[SPARK-16380][EXAMPLES] Update SQL examples and programming guide for…
liancheng Jul 23, 2016
e3c7039
[MINOR] Close old PRs that should be closed but have not been
srowen Jul 24, 2016
d6795c7
[SPARK-16515][SQL][FOLLOW-UP] Fix test `script` on OS X/Windows...
lw-lin Jul 24, 2016
cc1d2dc
[SPARK-16463][SQL] Support `truncate` option in Overwrite mode for JD…
dongjoon-hyun Jul 24, 2016
37bed97
[PYSPARK] add picklable SparseMatrix in pyspark.ml.common
WeichenXu123 Jul 24, 2016
23e047f
[SPARK-16416][CORE] force eager creation of loggers to avoid shutdown…
Jul 24, 2016
1221ce0
[SPARK-16645][SQL] rename CatalogStorageFormat.serdeProperties to pro…
cloud-fan Jul 25, 2016
daace60
[SPARK-5581][CORE] When writing sorted map output file, avoid open / …
bchocho Jul 25, 2016
468a3c3
[SPARK-16699][SQL] Fix performance bug in hash aggregate on long stri…
ooq Jul 25, 2016
68b4020
[SPARK-16648][SQL] Make ignoreNullsExpr a child expression of First a…
liancheng Jul 25, 2016
7ffd99e
[SPARK-16674][SQL] Avoid per-record type dispatch in JDBC when reading
HyukjinKwon Jul 25, 2016
d27d362
[SPARK-16660][SQL] CreateViewCommand should not take CatalogTable
cloud-fan Jul 25, 2016
64529b1
[SPARK-16691][SQL] move BucketSpec to catalyst module and use it in C…
cloud-fan Jul 25, 2016
d6a5217
[SPARK-16668][TEST] Test parquet reader for row groups containing bot…
sameeragarwal Jul 25, 2016
79826f3
[SPARK-16698][SQL] Field names having dots should be allowed for data…
HyukjinKwon Jul 25, 2016
7ea6d28
[SPARK-16703][SQL] Remove extra whitespace in SQL generation for wind…
liancheng Jul 25, 2016
b73defd
[SPARKR][DOCS] fix broken url in doc
felixcheung Jul 25, 2016
ad3708e
[SPARK-16653][ML][OPTIMIZER] update ANN convergence tolerance param d…
WeichenXu123 Jul 25, 2016
dd784a8
[SPARK-16685] Remove audit-release scripts.
rxin Jul 25, 2016
978cd5f
[SPARK-15271][MESOS] Allow force pulling executor docker images
philipphoffmann Jul 25, 2016
3b6e1d0
[SPARK-16485][DOC][ML] Fixed several inline formatting in ml features…
lins05 Jul 25, 2016
fc17121
Revert "[SPARK-15271][MESOS] Allow force pulling executor docker images"
JoshRosen Jul 25, 2016
cda4603
[SQL][DOC] Fix a default name for parquet compression
maropu Jul 25, 2016
f5ea7fe
[SPARK-16166][CORE] Also take off-heap memory usage into consideratio…
jerryshao Jul 25, 2016
12f490b
[SPARK-16715][TESTS] Fix a potential ExprId conflict for Subexpressio…
zsxwing Jul 25, 2016
c979c8b
[SPARK-14131][STREAMING] SQL Improved fix for avoiding potential dead…
tdas Jul 25, 2016
db36e1e
[SPARK-15590][WEBUI] Paginate Job Table in Jobs tab
nblintao Jul 26, 2016
e164a04
[SPARK-16722][TESTS] Fix a StreamingContext leak in StreamingContextS…
zsxwing Jul 26, 2016
3fc4566
[SPARK-16678][SPARK-16677][SQL] Fix two View-related bugs
gatorsmile Jul 26, 2016
ba0aade
Fix description of spark.speculation.quantile
nwbvt Jul 26, 2016
8a8d26f
[SPARK-16672][SQL] SQLBuilder should not raise exceptions on EXISTS q…
dongjoon-hyun Jul 26, 2016
f99e34e
[SPARK-16724] Expose DefinedByConstructorParams
marmbrus Jul 26, 2016
815f3ee
[SPARK-16633][SPARK-16642][SPARK-16721][SQL] Fixes three issues relat…
yhuai Jul 26, 2016
7b06a89
[SPARK-16686][SQL] Remove PushProjectThroughSample since it is handle…
viirya Jul 26, 2016
6959061
[SPARK-16706][SQL] support java map in encoder
cloud-fan Jul 26, 2016
03c2743
[TEST][STREAMING] Fix flaky Kafka rate controlling test
tdas Jul 26, 2016
3b2b785
[SPARK-16675][SQL] Avoid per-record type dispatch in JDBC when writing
HyukjinKwon Jul 26, 2016
4c96955
[SPARK-16697][ML][MLLIB] improve LDA submitMiniBatch method to avoid …
WeichenXu123 Jul 26, 2016
a2abb58
[SPARK-16663][SQL] desc table should be consistent between data sourc…
cloud-fan Jul 26, 2016
0869b3a
[SPARK-15271][MESOS] Allow force pulling executor docker images
philipphoffmann Jul 26, 2016
0b71d9a
[SPARK-15703][SCHEDULER][CORE][WEBUI] Make ListenerBus event queue si…
dhruve Jul 26, 2016
738b4cc
[SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGenerator
ooq Jul 27, 2016
5b8e848
[SPARK-16621][SQL] Generate stable SQLs in SQLBuilder
dongjoon-hyun Jul 27, 2016
ef0ccbc
[SPARK-16729][SQL] Throw analysis exception for invalid date casts
petermaxlee Jul 27, 2016
3c3371b
[MINOR][ML] Fix some mistake in LinearRegression formula.
yanboliang Jul 27, 2016
045fc36
[MINOR][DOC][SQL] Fix two documents regarding size in bytes
viirya Jul 27, 2016
7e8279f
[SPARK-15254][DOC] Improve ML pipeline Cross Validation Scaladoc & PyDoc
krishnakalyan3 Jul 27, 2016
70f846a
[SPARK-5847][CORE] Allow for configuring MetricsSystem's use of app I…
markgrover Jul 27, 2016
bc4851a
[MINOR][DOC] missing keyword new
Jul 27, 2016
b14d7b5
[SPARK-16110][YARN][PYSPARK] Fix allowing python version to be specif…
KevinGrealish Jul 27, 2016
11d427c
[SPARK-16730][SQL] Implement function aliases for type casts
petermaxlee Jul 28, 2016
5c2ae79
[SPARK-15232][SQL] Add subquery SQL building tests to LogicalPlanToSQ…
dongjoon-hyun Jul 28, 2016
762366f
[SPARK-16552][SQL] Store the Inferred Schemas into External Catalog T…
gatorsmile Jul 28, 2016
9ade77c
[SPARK-16639][SQL] The query with having condition that contains grou…
viirya Jul 28, 2016
1178d61
[SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap
sylvinus Jul 28, 2016
3fd39b8
[SPARK-16764][SQL] Recommend disabling vectorized parquet reader on O…
sameeragarwal Jul 28, 2016
274f3b9
[SPARK-16772] Correct API doc references to PySpark classes + formatt…
nchammas Jul 28, 2016
d1d5069
[SPARK-16664][SQL] Fix persist call on Data frames with more than 200…
Jul 29, 2016
0557a45
[SPARK-16750][ML] Fix GaussianMixture training failed due to feature …
yanboliang Jul 29, 2016
04a2c07
[SPARK-16751] Upgrade derby to 10.12.1.1
a-roberts Jul 29, 2016
266b92f
[SPARK-16637] Unified containerizer
Jul 29, 2016
2c15323
[SPARK-16761][DOC][ML] Fix doc link in docs/ml-guide.md
sundapeng Jul 29, 2016
2182e43
[SPARK-16772][PYTHON][DOCS] Restore "datatype string" to Python API d…
nchammas Jul 29, 2016
bbc2475
[SPARK-16748][SQL] SparkExceptions during planning should not wrapped…
tdas Jul 30, 2016
0dc4310
[SPARK-16694][CORE] Use for/foreach rather than map for Unit expressi…
srowen Jul 30, 2016
bce354c
[SPARK-16696][ML][MLLIB] destroy KMeans bcNewCenters when loop finish…
WeichenXu123 Jul 30, 2016
a6290e5
[SPARK-16800][EXAMPLES][ML] Fix Java examples that fail to run due to…
BryanCutler Jul 30, 2016
957a8ab
[SPARK-16818] Exchange reuse incorrectly reuses scans over different …
ericl Jul 31, 2016
7c27d07
[SPARK-16812] Open up SparkILoop.getAddedJars
rxin Jul 31, 2016
064d91f
[SPARK-16813][SQL] Remove private[sql] and private[spark] from cataly…
rxin Jul 31, 2016
301fb0d
[SPARK-16731][SQL] use StructType in CatalogTable and remove CatalogC…
cloud-fan Aug 1, 2016
579fbcf
[SPARK-16805][SQL] Log timezone when query result does not match
rxin Aug 1, 2016
64d8f37
[SPARK-16726][SQL] Improve `Union/Intersect/Except` error messages on…
dongjoon-hyun Aug 1, 2016
2a0de7d
[SPARK-16485][DOC][ML] Remove useless latex in a log messge.
lins05 Aug 1, 2016
1e9b59b
[SPARK-16778][SQL][TRIVIAL] Fix deprecation warning with SQLContext
holdenk Aug 1, 2016
f93ad4f
[SPARK-16776][STREAMING] Replace deprecated API in KafkaTestUtils for…
HyukjinKwon Aug 1, 2016
338a98d
[SPARK-16791][SQL] cast struct with timestamp field fails
Aug 1, 2016
ab1e761
[SPARK-16774][SQL] Fix use of deprecated timestamp constructor & impr…
holdenk Aug 1, 2016
03d46aa
[SPARK-15869][STREAMING] Fix a potential NPE in StreamingJobProgressL…
zsxwing Aug 1, 2016
2eedc00
[SPARK-16828][SQL] remove MaxOf and MinOf
cloud-fan Aug 2, 2016
5184df0
[SPARK-16793][SQL] Set the temporary warehouse path to sc'conf in Tes…
jiangxb1987 Aug 2, 2016
10e1c0e
[SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings
liancheng Aug 2, 2016
a1ff72e
[SPARK-16850][SQL] Improve type checking error message for greatest/l…
petermaxlee Aug 2, 2016
d9e0919
[SPARK-16851][ML] Incorrect threshould length in 'setThresholds()' ev…
zhengruifeng Aug 2, 2016
dd8514f
[SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use M…
yinxusen Aug 2, 2016
511dede
[SPARK-15541] Casting ConcurrentHashMap to ConcurrentMap (master branch)
Aug 2, 2016
36827dd
[SPARK-16822][DOC] Support latex in scaladoc.
lins05 Aug 2, 2016
1dab63d
[SPARK-16837][SQL] TimeWindow incorrectly drops slideDuration in cons…
tmagrino Aug 2, 2016
146001a
[SPARK-16062] [SPARK-15989] [SQL] Fix two bugs of Python-only UDTs
viirya Aug 2, 2016
2330f3e
[SPARK-16836][SQL] Add support for CURRENT_DATE/CURRENT_TIMESTAMP lit…
hvanhovell Aug 2, 2016
cbdff49
[SPARK-16816] Modify java example which is also reflect in documentat…
phalodi Aug 2, 2016
a9beeaa
[SPARK-16855][SQL] move Greatest and Least from conditionalExpression…
cloud-fan Aug 2, 2016
e9fc0b6
[SPARK-16787] SparkContext.addFile() should not throw if called twice…
JoshRosen Aug 2, 2016
b73a570
[SPARK-16858][SQL][TEST] Removal of TestHiveSharedState
gatorsmile Aug 2, 2016
3861273
[SPARK-16796][WEB UI] Visible passwords on Spark environment page
Devian-ua Aug 2, 2016
ae22628
[SQL][MINOR] use stricter type parameter to make it clear that parque…
cloud-fan Aug 3, 2016
639df04
[SPARK-16831][PYTHON] Fixed bug in CrossValidator.avgMetrics
pkch Aug 3, 2016
b55f343
[SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's…
cloud-fan Aug 3, 2016
e6f226c
[SPARK-16596] [SQL] Refactor DataSourceScanExec to do partition disco…
ericl Aug 3, 2016
685b08e
[SPARK-14204][SQL] register driverClass rather than user-specified class
mchalek Aug 3, 2016
4775eb4
[SPARK-16770][BUILD] Fix JLine dependency management and version (Sca…
stsc-pentasys Aug 4, 2016
c5eb1df
[SPARK-16814][SQL] Fix deprecated parquet constructor usage
holdenk Aug 4, 2016
583d91a
[SPARK-16873][CORE] Fix SpillReader NPE when spillFile has no data
sharkdtu Aug 4, 2016
780c722
[MINOR][SQL] Fix minor formatting issue of SortAggregateExec.toString
liancheng Aug 4, 2016
27e815c
[SPARK-16888][SQL] Implements eval method for expression AssertNotNull
clockfly Aug 4, 2016
43f4fd6
[SPARK-16867][SQL] createTable and alterTable in ExternalCatalog shou…
cloud-fan Aug 4, 2016
9d7a474
[SPARK-16853][SQL] fixes encoder error in DataSet typed select
clockfly Aug 4, 2016
9d4e621
[SPARK-16802] [SQL] fix overflow in LongToUnsafeRowMap
Aug 4, 2016
ac2a26d
[SPARK-16884] Move DataSourceScanExec out of ExistingRDD.scala file
ericl Aug 4, 2016
be8ea4b
[SPARK-16875][SQL] Add args checking for DataSet randomSplit and sample
zhengruifeng Aug 4, 2016
462784f
[SPARK-16880][ML][MLLIB] make ann training data persisted if needed
WeichenXu123 Aug 4, 2016
1d78157
[SPARK-16877][BUILD] Add rules for preventing to use Java annotations…
HyukjinKwon Aug 4, 2016
0e2e5d7
[SPARK-16863][ML] ProbabilisticClassifier.fit check threshoulds' length
zhengruifeng Aug 4, 2016
9c15d07
[SPARK-15074][SHUFFLE] Cache shuffle index file to speedup shuffle fetch
Aug 4, 2016
d91c675
[HOTFIX] Remove unnecessary imports from #12944 that broke build
JoshRosen Aug 4, 2016
53e766c
MAINTENANCE. Cleaning up stale PRs.
Aug 4, 2016
1fa6444
[SPARK-16907][SQL] Fix performance regression for parquet table when …
clockfly Aug 5, 2016
faaefab
[SPARK-15726][SQL] Make DatasetBenchmark fairer among Dataset, DataFr…
inouehrs Aug 5, 2016
5effc01
[SPARK-16879][SQL] unify logical plans for CREATE TABLE and CTAS
cloud-fan Aug 5, 2016
c9f2501
[SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration
koeninger Aug 5, 2016
e026064
[MINOR] Update AccumulatorV2 doc to not mention "+=".
petermaxlee Aug 5, 2016
39a2b2e
[SPARK-16625][SQL] General data types to be mapped to Oracle
wangyum Aug 5, 2016
2460f03
[SPARK-16826][SQL] Switch to java.net.URI for parse_url()
sylvinus Aug 5, 2016
180fd3e
[SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs
BryanCutler Aug 5, 2016
1f96c97
[SPARK-13238][CORE] Add ganglia dmax parameter
ekasitk Aug 5, 2016
6cbde33
[SPARK-16750][FOLLOW-UP][ML] Add transformSchema for StringIndexer/Ve…
yanboliang Aug 5, 2016
e679bc3
[SPARK-16901] Hive settings in hive-site.xml may be overridden by Hiv…
yhuai Aug 5, 2016
55d6dad
[SPARK-16847][SQL] Prevent to potentially read corrupt statstics on b…
HyukjinKwon Aug 6, 2016
14dba45
[SPARK-16796][WEB UI] Mask spark.authenticate.secret on Spark environ…
Devian-ua Aug 6, 2016
2dd0388
[SPARK-16772][PYTHON][DOCS] Fix API doc references to UDFRegistration…
nchammas Aug 6, 2016
4f5f9b6
[SPARK-16925] Master should call schedule() after all executor exit e…
JoshRosen Aug 7, 2016
7aaa5a0
document that Mesos cluster mode supports python
Aug 7, 2016
b1ebe18
[SPARK-16932][DOCS] Changed programming guide to not reference old ac…
BryanCutler Aug 7, 2016
1275f64
[SPARK-16870][DOCS] Summary:add "spark.sql.broadcastTimeout" into doc…
biglobster Aug 7, 2016
6c1ecb1
[SPARK-16911] Fix the links in the programming guide
shiv4nsh Aug 7, 2016
bdfab9f
[SPARK-16909][SPARK CORE] Streaming for postgreSQL JDBC driver
princejwesley Aug 7, 2016
8d87252
[SPARK-16409][SQL] regexp_extract with optional groups causes NPE
srowen Aug 7, 2016
a16983c
[SPARK-16939][SQL] Fix build error by using `Tuple1` explicitly in St…
dongjoon-hyun Aug 7, 2016
e076fb0
[SPARK-16919] Configurable update interval for console progress bar
tejasapatil Aug 8, 2016
1db1c65
[SPARK-16404][ML] LeastSquaresAggregators serializes unnecessary data
sethah Aug 8, 2016
e10ca8d
[SPARK-16945] Fix Java Lint errors
weiqingy Aug 8, 2016
06f5dc8
[SPARK-16804][SQL] Correlated subqueries containing non-deterministic…
nsyca Aug 8, 2016
94a9d11
[SPARK-16906][SQL] Adds auxiliary info like input class and input sch…
clockfly Aug 8, 2016
ab12690
[SPARK-16457][SQL] Fix Wrong Messages when CTAS with a Partition By C…
gatorsmile Aug 8, 2016
5959df2
[SPARK-16936][SQL] Case Sensitivity Support for Refresh Temp Table
gatorsmile Aug 8, 2016
1739e75
[SPARK-16586][CORE] Handle JVM errors printed to stdout.
Aug 8, 2016
8650239
[SPARK-16953] Make requestTotalExecutors public Developer API to be c…
tdas Aug 8, 2016
9216901
[SPARK-16779][TRIVIAL] Avoid using postfix operators where they do no…
holdenk Aug 8, 2016
53d1c78
Update docs to include SASL support for RPC
Aug 8, 2016
df10658
[SPARK-16749][SQL] Simplify processing logic in LEAD/LAG processing.
hvanhovell Aug 8, 2016
bca43cd
[SPARK-16898][SQL] Adds argument type information for typed logical p…
clockfly Aug 9, 2016
e17a76e
[SPARK-16563][SQL] fix spark sql thrift server FetchResults bug
Aug 9, 2016
bb2b9d0
[SPARK-16610][SQL] Add `orc.compress` as an alias for `compression` o…
HyukjinKwon Aug 9, 2016
801e4d0
[SPARK-16606][CORE] Misleading warning for SparkContext.getOrCreate "…
srowen Aug 9, 2016
af710e5
[SPARK-16522][MESOS] Spark application throws exception on exit.
sun-rui Aug 9, 2016
2154345
[SPARK-16940][SQL] `checkAnswer` should raise `TestFailedException` f…
dongjoon-hyun Aug 9, 2016
62e6212
[SPARK-16809] enable history server links in dispatcher UI
Aug 9, 2016
511f52f
[SPARK-16964][SQL] Remove private[sql] and private[spark] from sql.ex…
rxin Aug 9, 2016
182e119
[SPARK-16933][ML] Fix AFTAggregator in AFTSurvivalRegression serializ…
yanboliang Aug 9, 2016
29081b5
[SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.c…
Aug 9, 2016
92da228
[SPARK-16905] SQL DDL: MSCK REPAIR TABLE
Aug 9, 2016
b89b3a5
[SPARK-16956] Make ApplicationState.MAX_NUM_RETRY configurable
JoshRosen Aug 9, 2016
7dc72ae
Copied parameters over from Estimator to Transformer
Nov 19, 2015
b3c6578
Merge branch 'SPARK-10931-pyspark-mllib' of github.com:evanyc15/spark…
Aug 11, 2016
6 changes: 6 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -17,6 +17,7 @@
.idea/
.idea_modules/
.project
.pydevproject
.scala_dependencies
.settings
/lib/
@@ -72,7 +73,12 @@ metastore/
metastore_db/
sql/hive-thriftserver/test_warehouses
warehouse/
spark-warehouse/

# For R session data
.RData
.RHistory
.Rhistory
*.Rproj
*.Rproj.*

51 changes: 51 additions & 0 deletions .travis.yml
@@ -0,0 +1,51 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Spark provides this Travis CI configuration file to help contributors
# check Scala/Java style conformance and JDK7/8 compilation easily
# during their preparing pull requests.
# - Scalastyle is executed during `maven install` implicitly.
# - Java Checkstyle is executed by `lint-java`.
# See the related discussion here.
# https://github.com/apache/spark/pull/12980

# 1. Choose OS (Ubuntu 14.04.3 LTS Server Edition 64bit, ~2 CORE, 7.5GB RAM)
sudo: required
dist: trusty

# 2. Choose language and target JDKs for parallel builds.
language: java
jdk:
- oraclejdk7
- oraclejdk8

# 3. Setup cache directory for SBT and Maven.
cache:
directories:
- $HOME/.sbt
- $HOME/.m2

# 4. Turn off notifications.
notifications:
email: false

# 5. Run maven install before running lint-java.
install:
- export MAVEN_SKIP_RC=1
- build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install

# 6. Run lint-java.
script:
- dev/lint-java
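Contributors who want the same checks without waiting on Travis can replay the steps above locally. This is a sketch assuming a Spark source checkout, where the `build/mvn` wrapper and `dev/lint-java` script referenced by the config exist:

```shell
# Replays the CI steps from the .travis.yml above, run from the root of a
# Spark source checkout. Assumes build/mvn and dev/lint-java are present there.
export MAVEN_SKIP_RC=1
build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
dev/lint-java
```

As the config's comments note, Scalastyle runs implicitly during the `install` step, so only the Java Checkstyle pass needs the separate `lint-java` invocation.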
3 changes: 2 additions & 1 deletion LICENSE
@@ -263,7 +263,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
(New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
(The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
(The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
(The New BSD License) Py4J (net.sf.py4j:py4j:0.9.2 - http://py4j.sourceforge.net/)
(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.1 - http://py4j.sourceforge.net/)
(Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - http://github.com/szeiger/junit-interface/)
(BSD licence) sbt and sbt-launch-lib.bash
(BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
@@ -296,3 +296,4 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
(MIT License) blockUI (http://jquery.malsup.com/block/)
(MIT License) RowsGroup (http://datatables.net/license/mit)
(MIT License) jsonFormatter (http://www.jqueryscript.net/other/jQuery-Plugin-For-Pretty-JSON-Formatting-jsonFormatter.html)
(MIT License) modernizr (https://github.com/Modernizr/Modernizr/blob/master/LICENSE)
13 changes: 5 additions & 8 deletions NOTICE
@@ -1,5 +1,5 @@
Apache Spark
Copyright 2014 The Apache Software Foundation.
Copyright 2014 and onwards The Apache Software Foundation.

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
@@ -12,7 +12,9 @@ Common Development and Distribution License 1.0
The following components are provided under the Common Development and Distribution License 1.0. See project link for details.

(CDDL 1.0) Glassfish Jasper (org.mortbay.jetty:jsp-2.1:6.1.14 - http://jetty.mortbay.org/project/modules/jsp-2.1)
(CDDL 1.0) JAX-RS (https://jax-rs-spec.java.net/)
(CDDL 1.0) Servlet Specification 2.5 API (org.mortbay.jetty:servlet-api-2.5:6.1.14 - http://jetty.mortbay.org/project/modules/servlet-api-2.5)
(CDDL 1.0) (GPL2 w/ CPE) javax.annotation API (https://glassfish.java.net/nonav/public/CDDL+GPL.html)
(COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0) (GNU General Public Library) Streaming API for XML (javax.xml.stream:stax-api:1.0-2 - no url defined)
(Common Development and Distribution License (CDDL) v1.0) JavaBeans Activation Framework (JAF) (javax.activation:activation:1.1 - http://java.sun.com/products/javabeans/jaf/index.jsp)

@@ -22,15 +24,10 @@ Common Development and Distribution License 1.1

The following components are provided under the Common Development and Distribution License 1.1. See project link for details.

(CDDL 1.1) (GPL2 w/ CPE) org.glassfish.hk2 (https://hk2.java.net)
(CDDL 1.1) (GPL2 w/ CPE) JAXB API bundle for GlassFish V3 (javax.xml.bind:jaxb-api:2.2.2 - https://jaxb.dev.java.net/)
(CDDL 1.1) (GPL2 w/ CPE) JAXB RI (com.sun.xml.bind:jaxb-impl:2.2.3-1 - http://jaxb.java.net/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-core (com.sun.jersey:jersey-core:1.8 - https://jersey.dev.java.net/jersey-core/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-core (com.sun.jersey:jersey-core:1.9 - https://jersey.java.net/jersey-core/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-guice (com.sun.jersey.contribs:jersey-guice:1.9 - https://jersey.java.net/jersey-contribs/jersey-guice/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-json (com.sun.jersey:jersey-json:1.8 - https://jersey.dev.java.net/jersey-json/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-json (com.sun.jersey:jersey-json:1.9 - https://jersey.java.net/jersey-json/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-server (com.sun.jersey:jersey-server:1.8 - https://jersey.dev.java.net/jersey-server/)
(CDDL 1.1) (GPL2 w/ CPE) jersey-server (com.sun.jersey:jersey-server:1.9 - https://jersey.java.net/jersey-server/)
(CDDL 1.1) (GPL2 w/ CPE) Jersey 2 (https://jersey.java.net)

========================================================================
Common Public License 1.0
12 changes: 6 additions & 6 deletions R/DOCUMENTATION.md
@@ -1,12 +1,12 @@
# SparkR Documentation

SparkR documentation is generated using in-source comments annotated using using
`roxygen2`. After making changes to the documentation, to generate man pages,
SparkR documentation is generated by using in-source comments and annotated by using
[`roxygen2`](https://cran.r-project.org/web/packages/roxygen2/index.html). After making changes to the documentation and generating man pages,
you can run the following from an R console in the SparkR home directory

library(devtools)
devtools::document(pkg="./pkg", roclets=c("rd"))

```R
library(devtools)
devtools::document(pkg="./pkg", roclets=c("rd"))
```
You can verify if your changes are good by running

R CMD check pkg/
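The updated instructions above amount to a two-step workflow, which can be sketched as a single shell session run from the SparkR home directory (this assumes R is on the PATH and the `devtools` package is already installed; it is a convenience wrapper, not part of the PR itself):

```shell
# Sketch of the documented SparkR workflow, run from $SPARK_HOME/R.
# Assumes R/Rscript are on PATH and the devtools package is installed.
Rscript -e 'library(devtools); devtools::document(pkg = "./pkg", roclets = c("rd"))'
R CMD check pkg/
```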
32 changes: 18 additions & 14 deletions R/README.md
@@ -1,12 +1,13 @@
# R on Spark

SparkR is an R package that provides a light-weight frontend to use Spark from R.

### Installing sparkR

Libraries of sparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
By default the above script uses the system wide installation of R. However, this can be changed to any user installed location of R by setting the environment variable `R_HOME` the full path of the base directory where R is installed, before running install-dev.sh script.
Example:
```
```bash
# where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
export R_HOME=/home/username/R
./install-dev.sh
@@ -17,8 +18,9 @@ export R_HOME=/home/username/R
#### Build Spark

Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example, to use the default Hadoop versions, you can run
```
build/mvn -DskipTests -Psparkr package

```bash
build/mvn -DskipTests -Psparkr package
```

#### Running sparkR
@@ -37,8 +39,8 @@ To set other options like driver memory, executor memory etc. you can pass in th

#### Using SparkR from RStudio

If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
```
If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
```R
# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/username/spark")
# This line loads SparkR from the installed directory
@@ -55,23 +57,25 @@ Once you have made your changes, please include unit tests for them and run exis

#### Generating documentation

The SparkR documentation (Rd files and HTML files) are not a part of the source repository. To generate them you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs and these packages need to be installed on the machine before using the script.
The SparkR documentation (Rd files and HTML files) is not a part of the source repository. To generate it, you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also `R/DOCUMENTATION.md`.

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
To run one of them, use `./bin/spark-submit <filename> <args>`. For example:

./bin/spark-submit examples/src/main/r/dataframe.R

You can also run the unit-tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):

R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
./R/run-tests.sh
```bash
./bin/spark-submit examples/src/main/r/dataframe.R
```
You can also run the unit tests for SparkR by running the following. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:
```bash
R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
./R/run-tests.sh
```

### Running on YARN

The `./bin/spark-submit` script can also be used to submit jobs to YARN clusters. You will need to set the YARN conf dir before doing so. For example, on CDH you can run
```
```bash
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
```
20 changes: 20 additions & 0 deletions R/WINDOWS.md
@@ -11,3 +11,23 @@ include Rtools and R in `PATH`.
directory in Maven in `PATH`.
4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`

## Unit tests

To run the SparkR unit tests on Windows, the following steps are required, assuming you are in the Spark root directory and do not already have Apache Hadoop installed:

1. Create a folder to download Hadoop related files for Windows. For example, `cd ..` and `mkdir hadoop`.

2. Download the relevant Hadoop bin package from [steveloughran/winutils](https://github.com/steveloughran/winutils). While these are not official ASF artifacts, they are built from the ASF release git hashes by a Hadoop PMC member on a dedicated Windows VM. For further reading, consult [Windows Problems on the Hadoop wiki](https://wiki.apache.org/hadoop/WindowsProblems).

3. Install the files into `hadoop\bin`; make sure that `winutils.exe` and `hadoop.dll` are present.

4. Set the environment variable `HADOOP_HOME` to the full path to the newly created `hadoop` directory.

5. Run unit tests for SparkR by running the command below. You need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first:

```cmd
R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
.\bin\spark-submit2.cmd --conf spark.hadoop.fs.default.name="file:///" R\pkg\tests\run-all.R
```

52 changes: 52 additions & 0 deletions R/check-cran.sh
@@ -0,0 +1,52 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

set -o pipefail
set -e

FWDIR="$(cd "$(dirname "$0")"; pwd)"
pushd "$FWDIR" > /dev/null

if [ ! -z "$R_HOME" ]
then
R_SCRIPT_PATH="$R_HOME/bin"
else
# if system wide R_HOME is not found, then exit
if ! command -v R > /dev/null; then
echo "Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed."
exit 1
fi
R_SCRIPT_PATH="$(dirname "$(command -v R)")"
fi
echo "USING R_HOME = $R_HOME"

# Build the latest docs
"$FWDIR/create-docs.sh"

# Build a zip file containing the source package
"$R_SCRIPT_PATH/"R CMD build "$FWDIR/pkg"

# Run check as-cran.
# TODO(shivaram): Remove the skip tests once we figure out the install mechanism

VERSION=$(grep Version "$FWDIR/pkg/DESCRIPTION" | awk '{print $NF}')

"$R_SCRIPT_PATH/"R CMD check --as-cran --no-tests SparkR_"$VERSION".tar.gz

popd > /dev/null
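
The version-extraction pipeline in the script above can be exercised on its own. A minimal sketch, assuming a throwaway `DESCRIPTION` fragment (the temp directory is purely illustrative):

```shell
#!/bin/bash
set -e

# Fabricate a minimal DESCRIPTION fragment to run the pipeline against.
tmpdir="$(mktemp -d)"
printf 'Package: SparkR\nVersion: 2.0.0\n' > "$tmpdir/DESCRIPTION"

# Same extraction as check-cran.sh: last field of the line containing "Version".
VERSION=$(grep Version "$tmpdir/DESCRIPTION" | awk '{print $NF}')

# The tarball name that R CMD check is pointed at.
echo "SparkR_${VERSION}.tar.gz"

rm -r "$tmpdir"
```

Running it prints `SparkR_2.0.0.tar.gz`, matching the tarball argument the script passes to `R CMD check`.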
7 changes: 6 additions & 1 deletion R/install-dev.sh
Expand Up @@ -38,7 +38,12 @@ pushd $FWDIR > /dev/null
if [ ! -z "$R_HOME" ]
then
R_SCRIPT_PATH="$R_HOME/bin"
else
else
# if system wide R_HOME is not found, then exit
if ! command -v R > /dev/null; then
echo "Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed."
exit 1
fi
R_SCRIPT_PATH="$(dirname "$(command -v R)")"
fi
echo "USING R_HOME = $R_HOME"
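
Both `install-dev.sh` and `check-cran.sh` share this `R_HOME` fallback. It can be sketched as a standalone function (the name `resolve_r_script_path` is illustrative, not part of either script):

```shell
#!/bin/bash

# Prefer an explicit R_HOME; otherwise fall back to whichever R is on the PATH.
resolve_r_script_path() {
  if [ -n "$R_HOME" ]; then
    echo "$R_HOME/bin"
  elif command -v R > /dev/null; then
    dirname "$(command -v R)"
  else
    echo "Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed." >&2
    return 1
  fi
}

# An explicit R_HOME always wins, regardless of what is on the PATH.
R_HOME=/opt/R resolve_r_script_path   # prints /opt/R/bin
```

This keeps the override behavior described in the README: setting `R_HOME` redirects both scripts to a user-installed R without touching the system installation.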
5 changes: 5 additions & 0 deletions R/pkg/.Rbuildignore
@@ -0,0 +1,5 @@
^.*\.Rproj$
^\.Rproj\.user$
^\.lintr$
^src-native$
^html$
10 changes: 5 additions & 5 deletions R/pkg/DESCRIPTION
@@ -1,20 +1,18 @@
Package: SparkR
Type: Package
Title: R frontend for Spark
Title: R Frontend for Apache Spark
Version: 2.0.0
Date: 2013-09-09
Date: 2016-07-07
Author: The Apache Software Foundation
Maintainer: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Imports:
methods
Depends:
R (>= 3.0),
methods,
Suggests:
testthat,
e1071,
survival
Description: R frontend for Spark
Description: The SparkR package provides an R frontend for Apache Spark.
License: Apache License (== 2.0)
Collate:
'schema.R'
@@ -26,6 +24,7 @@ Collate:
'pairRDD.R'
'DataFrame.R'
'SQLContext.R'
'WindowSpec.R'
'backend.R'
'broadcast.R'
'client.R'
@@ -38,4 +37,5 @@ Collate:
'stats.R'
'types.R'
'utils.R'
'window.R'
RoxygenNote: 5.0.1