restrict preprocessing to text normalization and move document term matrix to LDA CB #4

The applicability of the preprocessing CB is largely restricted to topic modeling, since its output is not preprocessed text but the document-term matrix (DTM). Storing the DTM also produces very large files.

Suggestion: 
- Restrict the preprocessing CB to general text preprocessing with preprocessed text as output. 
- Build the DTM as part of the LDA topic modeling CB. Do not save the DTM and vocab by default (or, if possible, save them only on demand).
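A minimal sketch of the proposed split, in stdlib-only Python. The function names (`normalize_text`, `build_dtm`, `lda_cb`) and the `save_dir` parameter are hypothetical and not part of the existing CBs; the normalization rules shown (lowercasing, replacing punctuation, collapsing whitespace) are an assumed example, and the actual LDA fitting is omitted.

```python
import json
import re
from collections import Counter
from pathlib import Path
from typing import Optional


def normalize_text(text: str) -> str:
    """Hypothetical preprocessing CB: returns normalized text, not a DTM."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    return " ".join(text.split())         # collapse runs of whitespace


def build_dtm(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a dense document-term matrix from already-normalized docs."""
    counts = [Counter(doc.split()) for doc in docs]
    vocab = sorted(set().union(*counts))  # all distinct terms, sorted
    dtm = [[c[term] for term in vocab] for c in counts]
    return vocab, dtm


def lda_cb(docs: list[str], save_dir: Optional[Path] = None):
    """Hypothetical LDA CB: builds the DTM in memory, saves it only on demand."""
    vocab, dtm = build_dtm(docs)
    if save_dir is not None:  # persist DTM and vocab only when asked
        save_dir.mkdir(parents=True, exist_ok=True)
        (save_dir / "vocab.json").write_text(json.dumps(vocab))
        (save_dir / "dtm.json").write_text(json.dumps(dtm))
    # Fitting the LDA model on `dtm` would happen here (omitted in this sketch).
    return vocab, dtm


if __name__ == "__main__":
    raw = ["The cat sat.", "The dog sat, too!"]
    docs = [normalize_text(t) for t in raw]  # reusable by any downstream CB
    vocab, dtm = lda_cb(docs)                # nothing is written to disk
    print(vocab, dtm)
```

With this split, the normalized text can feed other CBs as well, and the DTM only exists inside the LDA step. A real implementation would likely use a sparse matrix rather than the dense lists used here for readability.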
