How to use termWeights option #5

Hi! My name is Matt Britton, and I'm a student at Georgia Tech. My advisor is Alex Endert, a member of John Stasko's department.

I am working on a project to use Sententrees in a visualization of threaded replies in a forum (e.g. Reddit). My objective is to make it easier to navigate and summarize a large conversation.

I have a working prototype built on Sententree, but the algorithm tends to choose words with low content value, such as "I", "would", "think", "not", and "like". My guess is that these words predominate because forum text, unlike tweets, has a lot more structure and includes more prepositions, articles, and conjunctions than the corpus used in your examples.

I'm looking at ways to address this, and before I do my own text preprocessing, I'd like to investigate the "termWeights" object that can be passed to SententreeModel() as part of the "options" parameter. It looks like this value is parsed and passed to SententreeModel.growSeq(), but from what I can tell, it is not actually implemented there yet.

Can you confirm that my understanding of this code is correct? If so, I may implement weighting myself. Can you give me a sense of what you envisioned for this feature and how you intended it to work?
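For context, the preprocessing fallback mentioned above might look roughly like the sketch below. It is only an illustration: the stop-word list and the `removeStopWords` helper are hypothetical and not part of the Sententree code, and it assumes the text can be cleaned before it is handed to the model.

```typescript
// Hypothetical stop-word list (an assumption for illustration); a real list
// would be longer and tuned to the forum corpus.
const STOP_WORDS: ReadonlySet<string> = new Set([
  "i", "would", "think", "not", "like", "the", "a", "an", "and", "of", "to",
]);

// Lowercase the text, split it on anything that is not a letter, digit, or
// apostrophe, and drop empty tokens and stop words.
function removeStopWords(text: string): string {
  return text
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter((token) => token.length > 0 && !STOP_WORDS.has(token))
    .join(" ");
}

// Example: clean one reply before passing it on.
const cleaned = removeStopWords("I would not like the new moderation policy");
console.log(cleaned);
```

The drawback of this approach is that a word is either kept or dropped entirely and the original casing is lost, which is why a weighting option would be preferable if it is meant to work.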

Best,

Matt
