[SPARK-12016] word2vec load model can't use findSynonyms to get words - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.5.2
Fix Version/s: 1.5.3, 1.6.1, 2.0.0
Component/s: PySpark
Labels: None
Environment: Ubuntu 14.04
Target Version/s: 1.5.3, 1.6.1, 2.0.0
Flags: Important

Description

I used Word2Vec.fit to train a Word2VecModel and then saved the model to the file system. When I load the model back from the file system, I can use transform('a') to get a vector, but I can't use findSynonyms('a', 2) to get similar words.
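For context, PySpark documents findSynonyms(word, num) as returning (word, cosineSimilarity) pairs: the num vocabulary words whose vectors are closest to the query word's vector, with the query word itself excluded. The snippet below is a simplified pure-Python illustration of that computation, not Spark's implementation. The dict-based `find_synonyms` and `cosine` helpers and the toy vectors are hypothetical and are not taken from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_synonyms(vectors, word, num):
    """Return up to `num` (word, similarity) pairs closest to `word`.

    `vectors` is a plain dict mapping each vocabulary word to its vector.
    The query word itself is excluded from the result.
    """
    query = vectors[word]
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w != word]
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]


# Hypothetical toy vectors, chosen so that "b" is closest to "a", then "c".
toy = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.5, 0.5, 0.0],
    "d": [0.0, 0.0, 1.0],
}
print([w for w, _ in find_synonyms(toy, "a", 2)])  # ['b', 'c']
```

Under these assumptions, a correctly loaded model would be expected to return word/similarity pairs shaped like this, rather than something like "__class__".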

I used the following code to test Word2Vec:

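from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec, Word2VecModel

import tempfile
from shutil import rmtree

if __name__ == '__main__':
    sc = SparkContext('local', 'test')

    # Tiny corpus: "a" co-occurs with "b" far more often than with "c".
    sentence = "a b " * 100 + "a c " * 10
    localDoc = [sentence, sentence]
    doc = sc.parallelize(localDoc).map(lambda line: line.split(" "))
    model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc)

    # Synonyms from the freshly trained model.
    syms = model.findSynonyms("a", 2)
    print([s[0] for s in syms])

    # Save the model, then load it back from the file system.
    path = tempfile.mkdtemp()
    model.save(sc, path)
    sameModel = Word2VecModel.load(sc, path)

    # Per this report, transform() agrees between the two models (prints True).
    print(model.transform("a") == sameModel.transform("a"))

    # Per this report, this prints [u'__class__'] instead of the expected words.
    syms = sameModel.findSynonyms("a", 2)
    print([s[0] for s in syms])

    # Clean up the temporary model directory and stop Spark.
    try:
        rmtree(path)
    except OSError:
        pass
    sc.stop()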

The first print outputs "[u'b', u'c']", the second prints "True", but the third prints "[u'__class__']".
I don't know how to get 'b' or 'c' from sameModel.findSynonyms("a", 2).
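The save/load comparison in the reproduction can be factored into a small round-trip check that compares the synonym words returned by the original and the reloaded model. This is a sketch that assumes only that each object exposes findSynonyms(word, num) returning (word, similarity) pairs. The helper names and the two stub classes are hypothetical and exist only so the check runs without Spark. With a live SparkContext, you would call round_trip_preserves_synonyms(model, sameModel, "a", 2) on the objects from the script above.

```python
def synonym_words(model, word, num):
    """Extract just the words from model.findSynonyms(word, num).

    Assumes findSynonyms returns an iterable of (word, similarity) pairs,
    which is what the reproduction's [s[0] for s in syms] relies on.
    """
    return [pair[0] for pair in model.findSynonyms(word, num)]


def round_trip_preserves_synonyms(original, loaded, word, num):
    """True if the loaded model returns the same synonym words as the original."""
    return synonym_words(original, word, num) == synonym_words(loaded, word, num)


class GoodModel(object):
    """Hypothetical stand-in returning well-formed (word, similarity) pairs."""

    def findSynonyms(self, word, num):
        return [("b", 0.99), ("c", 0.71)][:num]


class BrokenLoadedModel(object):
    """Hypothetical stand-in that reproduces only the printed symptom
    from this report ([u'__class__']), not its underlying cause."""

    def findSynonyms(self, word, num):
        return [("__class__", None)]


print(round_trip_preserves_synonyms(GoodModel(), GoodModel(), "a", 2))  # True
print(round_trip_preserves_synonyms(GoodModel(), BrokenLoadedModel(), "a", 2))  # False
```

Comparing only the words, not the similarity scores, keeps the check independent of floating-point formatting, while still catching the mismatch described in this issue.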


Issue Links

is duplicated by: SPARK-12680 Loading Word2Vec model in pyspark gives "ValueError: too many values to unpack" in findSynonyms (Closed)

links to: [Github] Pull Request #10100 (viirya)


People

Assignee: L. C. Hsieh

Reporter: yuangang.liu

Votes: 0

Watchers: 5

Dates

Created: 26/Nov/15 15:03

Updated: 07/Jan/16 03:45

Resolved: 14/Dec/15 17:59