Skip to content

Comments

fix: just adding stuff from developer.py to synopsis developer#182

Merged
michaelneale merged 2 commits intomainfrom
add_fetch
Oct 23, 2024
Merged

fix: just adding stuff from developer.py to synopsis developer#182
michaelneale merged 2 commits intomainfrom
add_fetch

Conversation

@michaelneale
Copy link
Collaborator

This was stuff that didn't make it over yet

@michaelneale michaelneale requested a review from baxen October 22, 2024 23:37
@michaelneale michaelneale changed the title bug: just adding stuff from developer.py to synopsis developer fix: just adding stuff from developer.py to synopsis developer Oct 22, 2024
tmp_file.write(result)
tmp_text_file_path = tmp_file.name.replace(".html", ".txt")
plain_text = re.sub(
r"<head.*?>.*?</head>|<script.*?>.*?</script>|<style.*?>.*?</style>|<[^>]+>",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:nit: i think we should consider adding the html2text dependency (it's cheap and avoids the zalgo regex of doom)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is GPL (v3) so a no go (already looked at that)

return path

@tool
def fetch_web_content(self, url: str) -> str:
Copy link
Contributor

@lamchau lamchau Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be worth adding @cache though i'm not quite sure that plays well with the named temp file

edit: nevermind just noticed the temp file stays so i think memoizing would be great for refetching the same content. that being said, maybe we should add a clean up for old files?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think temp files that is implicit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelneale the named file is created each time this function is called so it's not cached at all

with tempfile.NamedTemporaryFile(delete=False, mode="w", suffix=f"_{friendly_name}.html") as tmp_file:

if we add @cache we'd memorize the function call in memory so it wouldn't need to hit disk

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the LLM gets the file back, not the content, so it won't call the fetch each time as it knows in its context that it has the file with the content

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, that makes sense

@michaelneale michaelneale merged commit e19006c into main Oct 23, 2024
@michaelneale michaelneale deleted the add_fetch branch October 23, 2024 01:32
@lamchau
Copy link
Contributor

lamchau commented Oct 23, 2024 via email

@michaelneale
Copy link
Collaborator Author

@lamchau yes! #184 - can do it that way

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants