I want to use a pretrained embedding model on each node of my cluster. For that, I wrote the following:
from gensim.models.fasttext import FastText as FT_gensim
# Load model (loads when this library is being imported)
model = FT_gensim.load_fasttext_format("/project/6008168/bib/wiki.en.bin")
pred = model[msg]
The vector of "nights" is output. But suppose I have my_rdd = (stringID, sentence) and I want to find the embedding vector of each sentence by summing up its word embedding vectors. In my solution, if a sentence consists of 3 words, the model will be loaded 3 times, which is not efficient. How can I load the model only one time per node?
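One pattern that might fit here is to cache the model in a module-level variable and process the RDD with mapPartitions, so that the load happens at most once per Python worker process rather than once per word. The sketch below is only an illustration under assumptions: the names get_model, load_fasttext and embed_partition are hypothetical, the sentence is split on whitespace, and Spark may start more than one Python worker per node, so "once per worker" is not strictly "once per node".

```python
_model = None  # cached model, one copy per Python worker process


def get_model(loader):
    """Return the cached model, calling loader() only on first use."""
    global _model
    if _model is None:
        _model = loader()
    return _model


def load_fasttext():
    # Hypothetical loader; gensim is imported lazily so the import and the
    # load both happen on the worker, using the call from the question.
    from gensim.models.fasttext import FastText as FT_gensim
    return FT_gensim.load_fasttext_format("/project/6008168/bib/wiki.en.bin")


def embed_partition(rows, loader=load_fasttext):
    """Yield (stringID, summed vector) for every row of one partition."""
    model = get_model(loader)  # loaded once, reused for every row and word
    for string_id, sentence in rows:
        vectors = [model[word] for word in sentence.split()]
        # Element-wise sum of the word vectors, returned as a plain list.
        yield string_id, [sum(component) for component in zip(*vectors)]


# Assumed usage on the RDD from the question:
# result = my_rdd.mapPartitions(embed_partition)
```

The loader is passed in as a parameter so the same function can be tried locally with a small dictionary in place of the real model before running it on the cluster.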