Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Implemented
- SystemDS 3.1
- None
Description
Move the current tokenization and n-gram logic into the multithreaded transformencode instruction. Implement a build phase that tokenizes the individual texts and removes duplicates, and an apply phase that merges the individual token lists into one. Both phases are multithreaded.
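The two phases could look roughly like the Python sketch below. It is only an illustration of the build/apply split, not the SystemDS implementation: the whitespace tokenizer, the function names (`build`, `apply_merge`, `tokenize_texts`), and the `(text_index, token)` output rows are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def build(text, n=1):
    """Build phase for one text: tokenize, form n-grams, drop duplicates."""
    # Assumption: a plain lowercase whitespace tokenizer stands in for the real one.
    tokens = text.lower().split()
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    # dict.fromkeys removes duplicates while keeping first-occurrence order.
    return list(dict.fromkeys(ngrams))


def apply_merge(per_text, pool):
    """Apply phase: merge the per-text token lists into one output list."""
    # Prefix sums give every text a disjoint slice of the output,
    # so the workers can write concurrently without locking.
    offsets = [0]
    for tokens in per_text:
        offsets.append(offsets[-1] + len(tokens))
    out = [None] * offsets[-1]

    def write(i):
        for j, tok in enumerate(per_text[i]):
            out[offsets[i] + j] = (i, tok)

    # Consume the iterator so that all writes have finished before returning.
    list(pool.map(write, range(len(per_text))))
    return out


def tokenize_texts(texts, n=1, workers=4):
    """Run the build phase and then the apply phase on a shared thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_text = list(pool.map(lambda t: build(t, n), texts))
        return apply_merge(per_text, pool)
```

For example, `tokenize_texts(["a b a", "b c"])` yields one merged list in which the repeated `a` of the first text appears only once. Duplicates are removed per text in this sketch; whether the merged list should also be deduplicated across texts is not stated in the description.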