Description
I use Word2Vec.fit to train a Word2VecModel and then save the model to the file system. After loading the model back from the file system, I can call transform('a') to get a vector, but findSynonyms('a', 2) no longer returns any real words.
I use the following code to test Word2Vec:
from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec, Word2VecModel
import os, tempfile
from shutil import rmtree

if __name__ == '__main__':
    sc = SparkContext('local', 'test')

    # Tiny corpus: "a" co-occurs mostly with "b" and sometimes with "c"
    sentence = "a b " * 100 + "a c " * 10
    localDoc = [sentence, sentence]
    doc = sc.parallelize(localDoc).map(lambda line: line.split(" "))

    # Train and query the freshly fitted model
    model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc)
    syms = model.findSynonyms("a", 2)
    print([s[0] for s in syms])

    # Save the model, load it back, and repeat the same queries
    path = tempfile.mkdtemp()
    model.save(sc, path)
    sameModel = Word2VecModel.load(sc, path)
    print(model.transform("a") == sameModel.transform("a"))
    syms = sameModel.findSynonyms("a", 2)
    print([s[0] for s in syms])

    # Clean up the temporary directory
    try:
        rmtree(path)
    except OSError:
        pass
The first print gives "[u'b', u'c']", as expected.
The second print gives "True", but the third gives "[u'__class__']".
I don't know how to get 'b' or 'c' from sameModel.findSynonyms("a", 2).
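Since transform still works on the loaded model, one possible stopgap is to rank words by cosine similarity by hand, which is, as far as I understand, the measure findSynonyms reports. The following is only a minimal sketch in plain Python, not the MLlib implementation: the helper names cosine_similarity and find_synonyms and the example vectors are made up for illustration. In practice the dictionary would have to be filled from sameModel.transform(w) for each word w of a vocabulary collected separately, which I have not tested here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def find_synonyms(word, vectors, num):
    """Return up to `num` (word, similarity) pairs, most similar first.

    `vectors` is a plain dict mapping word -> list of floats (hypothetical;
    it stands in for vectors obtained from the loaded model).
    """
    target = vectors[word]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]


# Made-up 2-dimensional vectors, for illustration only
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
}
print([w for w, _ in find_synonyms("a", vectors, 2)])
```

This scans the whole vocabulary for every query, so it only makes sense for small vocabularies or for checking that the loaded vectors are intact.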
Issue Links
- is duplicated by
  - SPARK-12680 Loading Word2Vec model in pyspark gives "ValueError: too many values to unpack" in findSynonyms (Closed)
- links to