Dor convensions by db295 · Pull Request #1 · db295/ILNewsDiff

db295 · 2021-07-17T17:52:57Z

No description provided.

ronfri48

Ping

ronfri48 · 2021-07-19T19:45:28Z

main.py

+    parsers = [parser_class(LOCAL_TZ) for parser_class in PARSER_CLASSES]
+    for parser in parsers:
+        logging.info(f"Parsing {parser.get_source()}")
+        parser.parse()


Don't you need a try/except here?
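
A minimal sketch of what such a guard could look like, assuming the goal is that one failing source should not stop the remaining parsers. `WorkingParser`, `BrokenParser`, and `run_parsers` are hypothetical stand-ins; only `get_source()` and `parse()` appear in the diff.

```python
import logging

logging.basicConfig(level=logging.INFO)


class WorkingParser:
    """Hypothetical stand-in for one of the PARSER_CLASSES."""

    def __init__(self, tz):
        self.tz = tz

    def get_source(self):
        return "working"

    def parse(self):
        return None


class BrokenParser(WorkingParser):
    """Hypothetical parser whose parse() always fails."""

    def get_source(self):
        return "broken"

    def parse(self):
        raise RuntimeError("feed unavailable")


def run_parsers(parser_classes, tz):
    """Run every parser and return the sources whose parse() raised."""
    failed = []
    for parser in [parser_class(tz) for parser_class in parser_classes]:
        logging.info(f"Parsing {parser.get_source()}")
        try:
            parser.parse()
        except Exception:
            # logging.exception records the traceback; the loop then continues.
            logging.exception(f"Parser {parser.get_source()} failed")
            failed.append(parser.get_source())
    return failed


failed_sources = run_parsers([BrokenParser, WorkingParser], tz=None)
```

Catching `Exception` this broadly is a judgment call; a narrower exception type may be preferable if the parsers' failure modes are known.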

ronfri48 · 2021-07-19T19:46:10Z

parsers/parser_utils.py

    else:
        article_dict['abstract'] = article['description']
    od = collections.OrderedDict(sorted(article_dict.items()))
+    # The bug of the key??


I think we should fix the bug in the feature extractors and not here.
I have already fixed it in my branch.

ronfri48 · 2021-07-19T19:46:34Z

process_data/csv_data_provider.py

+
+
+class CsvDataProvider:
+    def __init__(self, data_files=r"../csvs"):


Rename the parameter.

ronfri48 · 2021-07-19T19:47:15Z

process_data/process_data.py

+    dt = CsvDataProvider(data_files)
+
+    # TODO: setup? - prepare tables/cols
+    cols = itertools.chain.from_iterable([extractor.get_cols() for extractor in FEATURE_EXTRACTORS])


How are we going to handle duplicates?
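
One possible answer, sketched under the assumption that duplicate column names should collapse to their first occurrence: `dict.fromkeys` removes repeats while preserving order. The two extractor classes and their column names are hypothetical. Materialising the result as a list also matters because `itertools.chain.from_iterable` returns a one-shot iterator, so a later `list(cols)` would leave it empty for any subsequent use.

```python
import itertools


class TitleExtractor:
    """Hypothetical extractor; only get_cols() appears in the diff."""

    def get_cols(self):
        return ["article_id", "title_changed"]


class AbstractExtractor:
    """Hypothetical extractor that repeats the 'article_id' column."""

    def get_cols(self):
        return ["article_id", "abstract_changed"]


FEATURE_EXTRACTORS = [TitleExtractor(), AbstractExtractor()]

all_cols = itertools.chain.from_iterable(
    extractor.get_cols() for extractor in FEATURE_EXTRACTORS
)
# dict keys are unique and keep insertion order, so repeats are dropped
# and the first occurrence of each name wins.
cols = list(dict.fromkeys(all_cols))
print(cols)  # ['article_id', 'title_changed', 'abstract_changed']
```

If two extractors producing the same column name is actually a bug, raising an error on a collision would be the stricter alternative to silently deduplicating.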

ronfri48 · 2021-07-19T19:48:17Z

process_data/process_data.py

+
+    # Extract Features
+    for _id, article in dt.articles.iterrows():
+        article_versions = dt.versions[(dt.versions["article_id"] == article["article_id"]) &


Why don't you use the primary key "id" from each table?
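
A rough sketch of the lookup this would allow, assuming the versions table carried a foreign key to the articles table's `id`. The `article_pk` column is hypothetical and the sample rows are invented for illustration; the actual CSV schema is not shown in the diff.

```python
import pandas as pd

articles = pd.DataFrame({
    "id": [1, 2],
    "article_id": ["a", "a"],
    "article_source": ["src1", "src2"],
})
versions = pd.DataFrame({
    "id": [10, 11, 12],
    "article_pk": [1, 1, 2],  # hypothetical foreign key to articles["id"]
    "version": [1, 2, 1],
})

version_counts = {}
for _, article in articles.iterrows():
    # A single-column comparison replaces the (article_id, article_source) pair.
    article_versions = versions[versions["article_pk"] == article["id"]]
    version_counts[article["id"]] = len(article_versions)
```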

ronfri48 · 2021-07-19T19:48:43Z

process_data/process_data.py

+    print(list(cols))
+
+    # Extract Features
+    for _id, article in dt.articles.iterrows():


Change _id to _?

ronfri48 · 2021-07-19T19:48:52Z

process_data/process_data.py

+        article_versions = dt.versions[(dt.versions["article_id"] == article["article_id"]) &
+                                       (dt.versions["article_source"] == article["article_source"])]
+
+        for __id, single_version in article_versions.iterrows():


Change __id to _?

ronfri48 · 2021-07-19T19:49:31Z

process_data/process_data.py

+        for __id, single_version in article_versions.iterrows():
+            past_versions = article_versions[article_versions["version"] < single_version["version"]]
+            for feature_extractor in FEATURE_EXTRACTORS:
+                feature_extractor.extract(single_version, past_versions, article)


We have to insert it into a DataFrame, don't we?
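
A minimal sketch of one way to do that, assuming each `extract()` call returns a dict of feature values (its real return type is not shown in the diff). `VersionNumberExtractor` and the sample data are hypothetical. Rows are collected in a list and the DataFrame is built once at the end, which is generally cheaper than appending to a DataFrame inside the loop.

```python
import pandas as pd


class VersionNumberExtractor:
    """Hypothetical extractor; assumes extract() returns a dict of features."""

    def get_cols(self):
        return ["past_version_count"]

    def extract(self, single_version, past_versions, article):
        return {"past_version_count": len(past_versions)}


FEATURE_EXTRACTORS = [VersionNumberExtractor()]

article = pd.Series({"article_id": "a", "article_source": "src1"})
article_versions = pd.DataFrame({
    "article_id": ["a", "a", "a"],
    "article_source": ["src1", "src1", "src1"],
    "version": [1, 2, 3],
})

rows = []
for _, single_version in article_versions.iterrows():
    past_versions = article_versions[article_versions["version"] < single_version["version"]]
    # One output row per version, keyed so it can be joined back later.
    row = {"article_id": article["article_id"], "version": single_version["version"]}
    for feature_extractor in FEATURE_EXTRACTORS:
        row.update(feature_extractor.extract(single_version, past_versions, article))
    rows.append(row)

features = pd.DataFrame(rows)
```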

ronfri48 · 2021-07-19T19:49:47Z

requirements.txt

 beautifulsoup4==4.9.3
-flake8==3.8.4
+flake8==3.8.4
+pandas


db29 added 11 commits July 6, 2021 23:23

Used some linter

3719cdc

Remove the second gitignore

820ae24

Moved logging into a file

6847258

Moved parsers to the folder togther

d2d95f9

Move img genrating code into a folder togther

34c3c44

Comment about where might be the id bug

b49302a

Process data initial commit

ff98a64

Get database file as an argument

17ff3bb

Process data methods in dataProvider

605e45c

Feature Extractors start

a880f8e

Transfered from sqlite to csvs

49360c5

db295 requested a review from ronfri48 July 17, 2021 17:52

db295 assigned ronfri48 Jul 17, 2021

ronfri48 requested changes Jul 19, 2021

ronfri48 assigned db295 and unassigned ronfri48 Jul 19, 2021

Working better with pandas

3766558

db295 wants to merge 12 commits into master from dor-convensions
