How to use termWeights option #5

@the-data-pro

Hi! My name is Matt Britton, and I'm a student at Georgia Tech. My advisor is Alex Endert, a member of John Stasko's department.

I am working on a project to use Sententrees in a visualization of threaded replies in a forum (e.g. Reddit). My objective is to make it easier to navigate and summarize a large conversation.

I have a prototype with a working Sententree, but the algorithm tends to choose irrelevant words with low content value, such as I, would, think, not, and like. My guess is that these words predominate because forum text, unlike tweets, has a lot more structure and includes more prepositions, articles, and conjunctions than the corpus used in your examples.

I'm looking at ways to address this, and before I do my own text preprocessing, I'd like to investigate the "termWeights" object that can be passed to SententreeModel() as part of the "options" parameter. It looks like this value is parsed and passed to SententreeModel.growSeq(), but from what I can tell, it is not actually implemented there yet.

Can you confirm that my understanding of this code is correct? If so, I may choose to implement weighting myself - can you give me a sense of what you envisioned for this feature and how you intended it to function?
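
For concreteness, here is a minimal sketch of the fallback preprocessing mentioned above: filtering low-content words out of each post before it reaches the model. It does not use any SentenTree API. `STOP_WORDS` and `removeStopWords` are hypothetical names, and the word list is a tiny illustrative sample, not a real stop-word list.

```javascript
// Hypothetical, deliberately short stop-word list (assumption for illustration).
const STOP_WORDS = new Set([
  'i', 'would', 'think', 'not', 'like',
  'the', 'a', 'an', 'and', 'of', 'to', 'is',
]);

// Lowercase the text, split it on anything that is not a letter, digit,
// or apostrophe, and drop empty tokens and stop words.
function removeStopWords(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter((token) => token.length > 0 && !STOP_WORDS.has(token))
    .join(' ');
}

const cleaned = removeStopWords('I would think the new API is not like the old one.');
console.log(cleaned);
```

The drawback of filtering up front is that the removed words also disappear from the rendered sentences, which is part of why a weighting option inside the model looks more attractive to me.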

Best,

Matt
