Closed
Description
While trying to use the Topics model of repositories, I was confused by the different numbers of tokens in the published Topics and BOW models: Topics has 2,015,336 tokens, while BOW has 999,424.
import logging
from sourced.ml.models import BOW, Topics
from modelforge.backends import create_backend
from modelforge.index import GitIndex
logging.basicConfig(level=logging.INFO)
git_index = GitIndex(
    log_level=logging.INFO,
    index_repo="https://github.com/src-d/models",
    cache="~/.source{d}/models")
backend = create_backend(git_index=git_index)
topics = Topics(log_level=logging.INFO).load(backend=backend)
bow = BOW(log_level=logging.INFO).load(backend=backend)
assert topics.matrix.shape[1] == bow.matrix.shape[1], \
    "Topics has %s tokens, BOW has %s" % (
        topics.matrix.shape[1], bow.matrix.shape[1])

Could anybody please help me understand whether these numbers should be the same for the models to be used together? Or is the BOW model not per-repository?
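To make the question more concrete, here is a minimal sketch of the kind of check that would show how the two vocabularies relate. It assumes each model's vocabulary is available as a list of token strings in column order; that is an assumption for illustration, not a confirmed sourced.ml API, and the helper names and toy tokens below are hypothetical.

```python
def compare_vocabularies(tokens_a, tokens_b):
    """Count tokens unique to each vocabulary and tokens present in both."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    shared = set_a & set_b
    return {
        "only_a": len(set_a - shared),
        "only_b": len(set_b - shared),
        "shared": len(shared),
    }


def shared_column_indices(tokens_a, tokens_b):
    """Return (column in A, column in B) pairs for tokens found in both lists."""
    index_b = {token: j for j, token in enumerate(tokens_b)}
    return [(i, index_b[token])
            for i, token in enumerate(tokens_a)
            if token in index_b]


# Toy vocabularies standing in for the real models' token lists (hypothetical).
topics_tokens = ["foo", "bar", "baz", "qux"]
bow_tokens = ["bar", "qux", "zap"]

stats = compare_vocabularies(topics_tokens, bow_tokens)
pairs = shared_column_indices(topics_tokens, bow_tokens)
print(stats)
print(pairs)
```

If the shared part turned out to be large, the column pairs could be used to restrict both matrices to the common tokens before combining them.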
Thanks in advance!