fix: String to int conversion in scraping#474

Merged
edgarrmondragon merged 5 commits into MeltanoLabs:main from rluvaton:patch-1
Oct 14, 2025

@rluvaton
Contributor

Hey, I'm not familiar with the codebase at all, but I ran into this error, so I'm opening a PR.

If someone could take it over, that would be great!

Fix for:
```
2025-10-13T12:47:24.737839Z [info] An unhandled error occurred while syncing 'dependents'
2025-10-13T12:47:24.738171Z [info] An unhandled error occurred while syncing 'repositories'
2025-10-13T12:47:24.743866Z [info] invalid literal for int() with base 10: '1,808'
2025-10-13T12:47:24.743946Z [info] Traceback (most recent call last):
2025-10-13T12:47:24.744036Z [info]   File "tap-github", line 12, in <module>
2025-10-13T12:47:24.744184Z [info]     sys.exit(cli())
2025-10-13T12:47:24.744372Z [info]   File "site-packages/click/core.py", line 1462, in __call__
2025-10-13T12:47:24.744444Z [info]     return self.main(*args, **kwargs)
2025-10-13T12:47:24.744582Z [info]   File "site-packages/click/core.py", line 1383, in main
2025-10-13T12:47:24.744647Z [info]     rv = self.invoke(ctx)
2025-10-13T12:47:24.744781Z [info]   File "site-packages/singer_sdk/plugin_base.py", line 150, in invoke
2025-10-13T12:47:24.744844Z [info]     return super().invoke(ctx)
2025-10-13T12:47:24.744969Z [info]   File "site-packages/click/core.py", line 1246, in invoke
2025-10-13T12:47:24.745031Z [info]     return ctx.invoke(self.callback, **ctx.params)
2025-10-13T12:47:24.745153Z [info]   File "site-packages/click/core.py", line 814, in invoke
2025-10-13T12:47:24.745214Z [info]     return callback(*args, **kwargs)
2025-10-13T12:47:24.745335Z [info]   File "site-packages/singer_sdk/tap_base.py", line 554, in invoke
2025-10-13T12:47:24.745396Z [info]     tap.sync_all()
2025-10-13T12:47:24.745456Z [info]   File "site-packages/singer_sdk/tap_base.py", line 495, in sync_all
2025-10-13T12:47:24.745517Z [info]     stream.sync()
2025-10-13T12:47:24.745577Z [info]   File "site-packages/singer_sdk/streams/core.py", line 1354, in sync
2025-10-13T12:47:24.745638Z [info]     for _ in self._sync_records(context=context):
2025-10-13T12:47:24.745757Z [info]   File "site-packages/singer_sdk/streams/core.py", line 1251, in _sync_records
2025-10-13T12:47:24.745819Z [info]     self._process_record(
2025-10-13T12:47:24.745881Z [info]   File "site-packages/singer_sdk/streams/core.py", line 1180, in _process_record
2025-10-13T12:47:24.745941Z [info]     self._sync_children(copy.copy(context))
2025-10-13T12:47:24.746001Z [info]   File "site-packages/singer_sdk/streams/core.py", line 1376, in _sync_children
2025-10-13T12:47:24.746061Z [info]     child_stream.sync(context=child_context)
2025-10-13T12:47:24.746120Z [info]   File "site-packages/singer_sdk/streams/core.py", line 1354, in sync
2025-10-13T12:47:24.746180Z [info]     for _ in self._sync_records(context=context):
2025-10-13T12:47:24.746299Z [info]   File "site-packages/singer_sdk/streams/core.py", line 1229, in _sync_records
2025-10-13T12:47:24.746360Z [info]     for idx, record_result in enumerate(self.get_records(current_context)):
2025-10-13T12:47:24.746480Z [info]   File "site-packages/singer_sdk/streams/rest.py", line 631, in get_records
2025-10-13T12:47:24.746540Z [info]     yield from self.request_records(context)
2025-10-13T12:47:24.746600Z [info]   File "site-packages/singer_sdk/streams/rest.py", line 466, in request_records
2025-10-13T12:47:24.746659Z [info]     first_record = next(records)
2025-10-13T12:47:24.746781Z [info]   File "site-packages/tap_github/repository_streams.py", line 3189, in parse_response
2025-10-13T12:47:24.746842Z [info]     yield from scrape_dependents(response, self.logger)
2025-10-13T12:47:24.746900Z [info]   File "site-packages/tap_github/scraping.py", line 42, in scrape_dependents
2025-10-13T12:47:24.746960Z [info]     yield from _scrape_dependents(f"https://{base_url}/{link}", logger)
2025-10-13T12:47:24.747019Z [info]   File "site-packages/tap_github/scraping.py", line 61, in _scrape_dependents
2025-10-13T12:47:24.747079Z [info]     int(s.next_sibling.strip())
2025-10-13T12:47:24.747140Z [info] ValueError: invalid literal for int() with base 10: '1,808'
```
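
The error comes from `int()` rejecting the thousands separator in `'1,808'`. A minimal sketch of a conversion that avoids it, assuming the scraped count only uses commas as grouping separators (the helper name `parse_count` is hypothetical and not taken from the tap-github codebase):

```python
def parse_count(text: str) -> int:
    """Convert a scraped count such as '1,808' to an int.

    Hypothetical helper for illustration; assumes commas are only
    ever used as thousands separators in the scraped text.
    """
    # Strip surrounding whitespace, then drop the grouping commas
    # before handing the digits to int().
    return int(text.strip().replace(",", ""))


print(parse_count(" 1,808\n"))  # → 1808
```

Plain values without a separator, such as `'42'`, pass through unchanged, so a helper like this could replace the bare `int(...)` call without affecting small counts.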
@rluvaton rluvaton requested a review from a team as a code owner October 13, 2025 13:11
Member

@edgarrmondragon edgarrmondragon left a comment

Makes sense, thanks!

@edgarrmondragon edgarrmondragon changed the title fix string conversion in scraping fix: String to int conversion in scraping Oct 14, 2025
@edgarrmondragon edgarrmondragon added this pull request to the merge queue Oct 14, 2025
Merged via the queue into MeltanoLabs:main with commit 89476e6 Oct 14, 2025
12 of 14 checks passed