Apache Spark 1.6.1 Released, a Cluster Computing Framework

jopen, 7 years ago


Apache Spark 1.6.1 has been released. Apache Spark is an open-source cluster computing framework similar to Hadoop, but the two differ in some useful ways that make Spark superior for certain workloads: Spark works with in-memory distributed datasets, so in addition to serving interactive queries, it can also optimize iterative workloads.
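The iterative-workload point can be sketched in plain Scala with a hypothetical toy dataset (no Spark required): once the data sits in memory, every pass over it is cheap, which is exactly what Spark exploits when a dataset is cached.

```scala
object IterativeSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical dataset held in memory; in Spark this would be a cached RDD.
    val points = Seq(1.0, 2.0, 3.0, 4.0)

    // Toy iterative computation: move an estimate halfway toward the mean
    // on each pass. Every iteration re-scans the same in-memory data
    // instead of re-reading it from storage.
    var estimate = 0.0
    for (_ <- 1 to 10) {
      val mean = points.sum / points.size
      estimate += (mean - estimate) / 2
    }
    println(estimate) // converges toward the mean, 2.5
  }
}
```

The same access pattern, with the dataset re-read from disk on each pass, is what makes iterative algorithms expensive on plain MapReduce.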

Spark is implemented in Scala and uses Scala as its application framework. Unlike Hadoop, Spark and Scala are tightly integrated, so Scala can manipulate distributed datasets as easily as local collection objects.
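A small sketch of that claim, using only the standard Scala collections (the Spark variant appears as comments because it needs a running SparkContext, which this sketch assumes as `sc`): the same filter/map/sum chain applies essentially unchanged to a local Seq and to a distributed RDD.

```scala
object CollectionSketch {
  def main(args: Array[String]): Unit = {
    val data = Seq(1, 2, 3, 4, 5)

    // Local Scala collection: sum of the squares of the even elements.
    val result = data.filter(_ % 2 == 0).map(x => x * x).sum
    println(result) // 4 + 16 = 20

    // The equivalent on a distributed dataset reads almost identically
    // (assumes an existing SparkContext named `sc`):
    //   val rdd = sc.parallelize(data)
    //   rdd.filter(_ % 2 == 0).map(x => x * x).sum()
  }
}
```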

Although Spark was created to support iterative jobs on distributed datasets, it is in practice a complement to Hadoop and can run in parallel on the Hadoop file system; this is enabled through a third-party cluster framework called Mesos. Spark was developed at the AMP Lab (Algorithms, Machines, and People Lab) at the University of California, Berkeley, and can be used to build large-scale, low-latency data analytics applications.


Improvements

  • [SPARK-10359] - Enumerate Spark's dependencies in a file and diff against it for new pull requests

Bug Fixes

  • [SPARK-7615] - MLLIB Word2Vec wordVectors divided by Euclidean Norm equals to zero

  • [SPARK-9844] - File appender race condition during SparkWorker shutdown

  • [SPARK-10524] - Decision tree binary classification with ordered categorical features: incorrect centroid

  • [SPARK-10847] - Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

  • [SPARK-11394] - PostgreDialect cannot handle BYTE types

  • [SPARK-11624] - Spark SQL CLI will set sessionstate twice

  • [SPARK-11972] - [Spark SQL] the value of 'hiveconf' parameter in CLI can't be got after enter spark-sql session

  • [SPARK-12006] - GaussianMixture.train crashes if an initial model is not None

  • [SPARK-12010] - Spark JDBC requires support for column-name-free INSERT syntax

  • [SPARK-12016] - word2vec load model can't use findSynonyms to get words

  • [SPARK-12026] - ChiSqTest gets slower and slower over time when number of features is large

  • [SPARK-12268] - pyspark shell uses execfile which breaks python3 compatibility

  • [SPARK-12300] - Fix schema inferance on local collections

  • [SPARK-12316] - Stack overflow with endless call of `Delegation token thread` when application end.

  • [SPARK-12327] - lint-r checks fail with commented code



Source: http://www.oschina.net//news/71432/apache-spark-1-6-1