Gensim: a generator is not an iterator
When using Gensim's word2vec on a dataset stored in a database, I was pleased to see that the library accepts an iterator to represent the corpus, which makes it possible to process larger-than-memory datasets. So I wrote a generator function to stream text directly from the database, and ran into a strange message:
TypeError: You can't pass a generator as the sentences argument. Try an iterator.
Looking at Gensim's code, this is intentional, and for a good reason: Gensim is fine with iterating over the dataset, but it may need to iterate over it more than once (for example, one pass to build the vocabulary and further passes to train). A generator can be consumed only once; after that it is exhausted and yields nothing.
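To make the difference concrete, here is a minimal sketch using only the standard library. The in-memory SQLite database, the `docs` table, and the names `stream_sentences` and `SentenceStream` are all made up for illustration; they are not part of Gensim. The sketch shows that a generator gives nothing on a second pass, while an object whose `__iter__` starts a fresh generator can be iterated as often as needed.

```python
import sqlite3


def stream_sentences(conn):
    """Generator function: yields one whitespace-tokenized sentence per row."""
    for (text,) in conn.execute("SELECT text FROM docs ORDER BY id"):
        yield text.split()


class SentenceStream:
    """Restartable iterable: every call to __iter__ runs the query again."""

    def __init__(self, conn):
        self.conn = conn

    def __iter__(self):
        # A new generator is created for each pass over the data.
        return stream_sentences(self.conn)


# Hypothetical toy database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO docs (text) VALUES (?)",
    [("hello world",), ("gensim streams text",)],
)

# A bare generator: the second pass comes back empty.
gen = stream_sentences(conn)
first_pass = list(gen)
second_pass = list(gen)

# The wrapper class: both passes see the full data.
stream = SentenceStream(conn)
pass_one = list(stream)
pass_two = list(stream)

print(len(first_pass), len(second_pass))  # 2 0
print(len(pass_one), len(pass_two))       # 2 2
```

An object like `SentenceStream(conn)` is the kind of thing that could be passed as the `sentences` argument instead of the generator itself, since it can be walked over several times while still streaming rows one at a time.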
...