Spark / SPARK-12006

GaussianMixture.train crashes if an initial model is not None

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0, 1.6.0
    • Fix Version/s: 1.4.2, 1.5.3, 1.6.1, 2.0.0
    • Component/s: MLlib, PySpark
    • Labels: None

    Description

      Steps to reproduce:

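      from pyspark.mllib.clustering import GaussianMixture
      from numpy import array

      # Assumes `sc` is an existing SparkContext (e.g. from the pyspark shell).
      data = sc.textFile("data/mllib/gmm_data.txt")
      parsedData = data.map(lambda line: array([float(x) for x in line.strip().split(' ')]))

      # The first call, without an initial model, succeeds.
      gmm = GaussianMixture.train(parsedData, 2)
      # Passing the trained model back in as initialModel triggers the crash.
      GaussianMixture.train(parsedData, 2, initialModel=gmm)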
      

      The source of the problem appears to be the initialModelWeights NumPy array. In 1.5 / 1.6 it leads to a net.razorvine.pickle.PickleException; in 1.4 we get: Method trainGaussianMixture([..., class org.apache.spark.mllib.linalg.DenseVector, class java.util.ArrayList, class java.util.ArrayList]) does not exist
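
      If the NumPy weights array is indeed what the Python-to-JVM serialization cannot handle, the conversion needed is small: turn the weights into a plain list of built-in Python floats before they are handed over. The following is a minimal sketch of that conversion only. The helper name to_plain_weights is hypothetical and not part of PySpark, and the tuple stands in for the weights of a trained model.

```python
def to_plain_weights(weights):
    """Return the weights as a list of built-in Python floats.

    Hypothetical helper: accepts any iterable of numbers, including a
    1-D NumPy array such as GaussianMixtureModel.weights.
    """
    return [float(w) for w in weights]


# Stand-in for the weights of a previously trained model.
weights = (0.25, 0.75)
plain = to_plain_weights(weights)
print(plain)
```

      This only illustrates the type conversion. In the affected releases the weights appear to be read from initialModel inside train itself, so the conversion has to happen there rather than in user code.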

          People

            Assignee: Maciej Szymkiewicz (zero323)
            Reporter: Maciej Szymkiewicz (zero323)
            Shepherd: Joseph K. Bradley
            Votes: 0
            Watchers: 4
