Hi! My name is Matt Britton, and I'm a student at Georgia Tech. My advisor is Alex Endert, a member of John Stasko's department.
I am working on a project that uses Sententrees to visualize threaded replies in a forum (e.g., Reddit). My goal is to make large conversations easier to navigate and summarize.
I have a working prototype built with Sententree, but the algorithm tends to select words with little content value, such as "I", "would", "think", "not", and "like". My guess is that these words dominate because forum text, unlike tweets, is more structured and contains more prepositions, articles, conjunctions, and similar function words than the corpus used in your examples.
I'm looking at ways to address this. Before doing my own text preprocessing, I'd like to investigate the "termWeights" object that can be passed to SententreeModel() in the "options" parameter. It looks like this value is parsed and passed to SententreeModel.growSeq(), but as far as I can tell, it isn't actually used there yet.
Can you confirm that my understanding of the code is correct? If so, I may implement weighting myself. Could you give me a sense of what you envisioned for this feature and how it was intended to work?
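For reference, here is a rough sketch of one way I imagine weighting could work. It is only my own assumption and is not based on how SententreeModel or growSeq() is actually structured; the names pickNextWord and DEFAULT_WEIGHT are hypothetical. The idea is to multiply each candidate word's frequency by a per-term weight (defaulting to 1) before picking the best-supported word, so that a small weight effectively demotes a stopword without removing it from the text.

```typescript
// Hypothetical sketch only: these names are illustrative and are NOT part of
// the Sententree codebase. Weights are stored in a Map (keyed by lowercase
// word) to avoid accidental hits on Object.prototype keys like "constructor".
type TermWeights = Map<string, number>;

// Weight used for any word that has no explicit entry in termWeights.
const DEFAULT_WEIGHT = 1;

/**
 * Given raw frequency counts for candidate next words, return the word with
 * the highest weighted score (count * weight), or null if no candidate has a
 * positive score.
 */
function pickNextWord(
  counts: Map<string, number>,
  termWeights: TermWeights = new Map(),
): string | null {
  let best: string | null = null;
  let bestScore = 0;
  for (const [word, count] of counts) {
    const weight = termWeights.get(word.toLowerCase()) ?? DEFAULT_WEIGHT;
    const score = count * weight;
    if (score > bestScore) {
      best = word;
      bestScore = score;
    }
  }
  return best;
}

// Example: down-weight common function words so a content word can win.
const counts = new Map<string, number>([
  ["I", 120],
  ["would", 90],
  ["moderators", 40],
  ["think", 75],
]);
const termWeights: TermWeights = new Map([
  ["i", 0.1],
  ["would", 0.1],
  ["think", 0.2],
]);

console.log(pickNextWord(counts)); // prints: I
console.log(pickNextWord(counts, termWeights)); // prints: moderators
```

With these made-up counts, "I" wins on raw frequency (120), but after weighting it scores only 12, so "moderators" (40) is chosen instead.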
Best,
Matt