Dor convensions #1

Open
db295 wants to merge 12 commits into master from dor-convensions

@db295 db295 commented Jul 17, 2021

No description provided.

@db295 db295 requested a review from ronfri48 July 17, 2021 17:52

@ronfri48 ronfri48 left a comment

Ping

parsers = [parser_class(LOCAL_TZ) for parser_class in PARSER_CLASSES]
for parser in parsers:
    logging.info(f"Parsing {parser.get_source()}")
    parser.parse()

Don't you need a try/except here?
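A minimal sketch of what that could look like, assuming one failing source should be logged and skipped rather than abort the whole run. `DummyParser`, `BrokenParser`, and `run_parsers` are hypothetical stand-ins for illustration; only `get_source()` and `parse()` come from the diff.

```python
import logging

logging.basicConfig(level=logging.INFO)


class DummyParser:
    """Hypothetical stand-in for a real parser class."""

    def __init__(self, source="dummy"):
        self._source = source

    def get_source(self):
        return self._source

    def parse(self):
        pass


class BrokenParser(DummyParser):
    """Hypothetical parser whose parse() always fails."""

    def parse(self):
        raise ValueError("bad feed")


def run_parsers(parsers):
    """Run every parser; log failures and return the sources that failed."""
    failed = []
    for parser in parsers:
        source = parser.get_source()
        logging.info("Parsing %s", source)
        try:
            parser.parse()
        except Exception:
            # logging.exception records the traceback along with the message
            logging.exception("Parsing %s failed", source)
            failed.append(source)
    return failed


failed_sources = run_parsers([DummyParser("a"), BrokenParser("b"), DummyParser("c")])
```

Catching a narrower exception type would be preferable once it is known what the parsers can actually raise.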

else:
    article_dict['abstract'] = article['description']
od = collections.OrderedDict(sorted(article_dict.items()))
# The bug of the key??

I think we should fix the bug in the feature extractors, not here.
I have already fixed it in my branch.



class CsvDataProvider:
    def __init__(self, data_files=r"../csvs"):

Rename this parameter.

dt = CsvDataProvider(data_files)

# TODO: setup? - prepare tables/cols
cols = itertools.chain.from_iterable([extractor.get_cols() for extractor in FEATURE_EXTRACTORS])

How are we going to handle duplicate columns?
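One possible answer, as a sketch only: drop repeated column names while keeping first-seen order. `DummyExtractor` and the column names are hypothetical; only `get_cols()` and `FEATURE_EXTRACTORS` appear in the diff, and whether silently dropping duplicates is acceptable (versus raising an error) is still an open question.

```python
import itertools


class DummyExtractor:
    """Hypothetical extractor exposing get_cols(), as in the diff."""

    def __init__(self, cols):
        self._cols = cols

    def get_cols(self):
        return self._cols


# Hypothetical extractors that both claim a "length" column
FEATURE_EXTRACTORS = [
    DummyExtractor(["title", "length"]),
    DummyExtractor(["length", "author"]),
]

all_cols = itertools.chain.from_iterable(e.get_cols() for e in FEATURE_EXTRACTORS)
# dict.fromkeys keeps the first occurrence of each key, in insertion order
unique_cols = list(dict.fromkeys(all_cols))
```

Materializing the result as a list also matters because `itertools.chain.from_iterable` returns a one-shot iterator: once something like `print(list(cols))` has consumed it, it yields nothing further.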


# Extract Features
for _id, article in dt.articles.iterrows():
    article_versions = dt.versions[(dt.versions["article_id"] == article["article_id"]) &

Why don't you use the primary key "id" from each table?

print(list(cols))

# Extract Features
for _id, article in dt.articles.iterrows():

Change _id to _?

article_versions = dt.versions[(dt.versions["article_id"] == article["article_id"]) &
                               (dt.versions["article_source"] == article["article_source"])]

for __id, single_version in article_versions.iterrows():

Change __id to _?

for __id, single_version in article_versions.iterrows():
    past_versions = article_versions[article_versions["version"] < single_version["version"]]
    for feature_extractor in FEATURE_EXTRACTORS:
        feature_extractor.extract(single_version, past_versions, article)

We have to insert the extracted features into a DataFrame, don't we?
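A rough sketch of collecting the results, under a loud assumption: that `extract()` returns a dict mapping feature names to values. The diff does not show what it returns, so `VersionCountExtractor`, the sample tables, and the `past_version_count` feature are all hypothetical.

```python
import pandas as pd


class VersionCountExtractor:
    """Hypothetical extractor; assumes extract() returns a dict of features."""

    def extract(self, single_version, past_versions, article):
        return {"past_version_count": len(past_versions)}


FEATURE_EXTRACTORS = [VersionCountExtractor()]

# Hypothetical stand-ins for dt.articles and dt.versions
articles = pd.DataFrame({"article_id": [1], "article_source": ["x"]})
versions = pd.DataFrame({
    "article_id": [1, 1],
    "article_source": ["x", "x"],
    "version": [0, 1],
})

rows = []
for _, article in articles.iterrows():
    article_versions = versions[(versions["article_id"] == article["article_id"]) &
                                (versions["article_source"] == article["article_source"])]
    for _, single_version in article_versions.iterrows():
        past_versions = article_versions[article_versions["version"] < single_version["version"]]
        # Key columns identify the row; extractor outputs are merged in
        row = {
            "article_id": int(article["article_id"]),
            "version": int(single_version["version"]),
        }
        for feature_extractor in FEATURE_EXTRACTORS:
            row.update(feature_extractor.extract(single_version, past_versions, article))
        rows.append(row)

# Build the DataFrame once at the end instead of growing it row by row
features = pd.DataFrame(rows)
```

Accumulating plain dicts and constructing the DataFrame once tends to be cheaper than appending to a DataFrame inside the loop. The sketch also uses `_` for both unused loop indices.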

beautifulsoup4==4.9.3
flake8==3.8.4
pandas

Pin a version for pandas, like the other requirements?

@ronfri48 ronfri48 assigned db295 and unassigned ronfri48 Jul 19, 2021

2 participants