Spark / SPARK-12016

Word2Vec: a model loaded with Word2VecModel.load can't use findSynonyms to get words


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.2
    • Fix Version/s: 1.5.3, 1.6.1, 2.0.0
    • Component/s: PySpark
    • Labels: None
    • Environment: Ubuntu 14.04

    Description

      I used Word2Vec.fit to train a Word2VecModel and then saved the model to the file system. After loading the model back from the file system, transform('a') still returns a vector, but findSynonyms('a', 2) does not return the expected words.
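For context, MLlib documents findSynonyms(word, num) as returning up to num (word, cosine similarity) pairs for the words whose vectors are closest to the query word's vector, with the query word itself left out; that is why 'b' and 'c' are the expected answers for 'a' here. The following is a minimal pure-Python sketch of that ranking, assuming an in-memory dict of word vectors. The names cosine, find_synonyms and toy_vectors are hypothetical, and this is not Spark's implementation or the trained model's data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_synonyms(vectors, word, num):
    """Return up to `num` (word, similarity) pairs, most similar first,
    excluding the query word itself (hypothetical helper, not Spark's API)."""
    query = vectors[word]
    scored = [(w, cosine(query, vec)) for w, vec in vectors.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]


# Hypothetical 2-D vectors chosen so that "b" is closest to "a", then "c".
toy_vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.5, 0.5],
    "d": [-1.0, 0.0],
}

print([w for w, _ in find_synonyms(toy_vectors, "a", 2)])  # ['b', 'c']
```

Extracting s[0] from each pair, as the reproduction code does, yields just the words.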

      I used the following code to test Word2Vec:

      from pyspark import SparkContext
      from pyspark.mllib.feature import Word2Vec, Word2VecModel

      import tempfile
      from shutil import rmtree

      if __name__ == '__main__':
          sc = SparkContext('local', 'test')
          # Tiny corpus: "a" co-occurs with "b" 100 times and with "c" 10 times per sentence.
          sentence = "a b " * 100 + "a c " * 10
          localDoc = [sentence, sentence]
          doc = sc.parallelize(localDoc).map(lambda line: line.split(" "))
          model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc)

          # Synonyms from the freshly trained model.
          syms = model.findSynonyms("a", 2)
          print([s[0] for s in syms])

          # Save the model, load it back, and compare it with the original.
          path = tempfile.mkdtemp()
          model.save(sc, path)
          sameModel = Word2VecModel.load(sc, path)
          print(model.transform("a") == sameModel.transform("a"))
          syms = sameModel.findSynonyms("a", 2)
          print([s[0] for s in syms])

          # Remove the temporary model directory and stop Spark.
          try:
              rmtree(path)
          except OSError:
              pass
          sc.stop()

      The first print outputs [u'b', u'c']. The second prints True, so the loaded model returns the same vector for 'a', but the third prints [u'__class__'] instead of any synonyms.
      I don't know how to get 'b' or 'c' from sameModel.findSynonyms("a", 2).
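The expectation in this report, that a saved-and-reloaded model gives the same results as the original, can be written as a small check. This is a sketch, not part of PySpark: synonym_words, check_round_trip and FakeModel are hypothetical names. The check relies only on the transform and findSynonyms methods used in the reproduction code, and FakeModel is a stand-in for a real Word2VecModel so the check can run without Spark.

```python
def synonym_words(model, word, num):
    """Keep only the words from findSynonyms' (word, similarity) pairs."""
    return [pair[0] for pair in model.findSynonyms(word, num)]


def check_round_trip(original, loaded, word, num):
    """Compare a model with its reloaded copy; return a list of problems
    (an empty list means both models agree for this word)."""
    problems = []
    # Use `not (a == b)` so only __eq__ is required on the returned vectors.
    if not (original.transform(word) == loaded.transform(word)):
        problems.append("transform(%r) differs" % word)
    expected = synonym_words(original, word, num)
    actual = synonym_words(loaded, word, num)
    if expected != actual:
        problems.append("findSynonyms(%r, %d): expected %r, got %r"
                        % (word, num, expected, actual))
    return problems


class FakeModel(object):
    """Hypothetical stand-in for Word2VecModel, used only to exercise the check."""

    def __init__(self, vector, synonyms):
        self._vector = vector
        self._synonyms = synonyms

    def transform(self, word):
        return self._vector

    def findSynonyms(self, word, num):
        return self._synonyms[:num]


good = FakeModel([0.1, 0.2], [("b", 0.9), ("c", 0.7)])
# Same vector but unusable synonyms, mimicking the symptom described above.
broken = FakeModel([0.1, 0.2], [("__class__", 0.0)])

print(check_round_trip(good, good, "a", 2))  # []
print(check_round_trip(good, broken, "a", 2))
```

With real models, the same function could be called as check_round_trip(model, sameModel, "a", 2) after the load step; on the affected version one would expect it to report the findSynonyms mismatch while transform agrees.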


People

    • Assignee: L. C. Hsieh (viirya)
    • Reporter: yuangang.liu (ooniuniuoo)
    • Votes: 0
    • Watchers: 5
