fix: just adding stuff from developer.py to synopsis developer by michaelneale · Pull Request #182 · block/goose

michaelneale · 2024-10-22T23:37:18Z

This was stuff that didn't make it over yet


lamchau · 2024-10-22T23:56:04Z

src/goose/synopsis/toolkit.py

+                tmp_file.write(result)
+                tmp_text_file_path = tmp_file.name.replace(".html", ".txt")
+                plain_text = re.sub(
+                    r"<head.*?>.*?</head>|<script.*?>.*?</script>|<style.*?>.*?</style>|<[^>]+>",


:nit: i think we should consider adding the html2text dependency (it's cheap and avoids the zalgo regex of doom)

it is GPL (v3) so a no go (already looked at that)
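Since html2text is ruled out on licensing, one alternative is Python's own `html.parser`, which can do the same job without a regex. The sketch below is not what this PR ships. `html_to_text` and `_TextExtractor` are hypothetical names. Like the regex in the diff, it drops everything inside `<head>`, `<script>` and `<style>` and keeps the remaining text nodes, one per line.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping anything inside <head>, <script> or <style>."""

    SKIP_TAGS = {"head", "script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities such as &amp; into plain characters
        self._skip_depth = 0  # > 0 while we are inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Hypothetical helper: strip markup and return one visible text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "\n".join(parser.chunks)
```

Because a parser tracks tags rather than matching patterns, it generally copes with multi-line `<script>` blocks and quoted attributes that contain `>`, which pattern-based stripping can miss depending on the flags used.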

lamchau · 2024-10-23T00:00:31Z

src/goose/synopsis/toolkit.py

        return path
+
+    @tool
+    def fetch_web_content(self, url: str) -> str:


this might be worth adding @cache though i'm not quite sure that plays well with the named temp file

edit: nevermind just noticed the temp file stays so i think memoizing would be great for refetching the same content. that being said, maybe we should add a clean up for old files?
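On the clean-up question: with `NamedTemporaryFile(delete=False, ...)`, Python never removes these files. Whether they eventually go away depends on the OS's own temp-directory policy. Below is a minimal sketch of an explicit sweep. It assumes the files keep `NamedTemporaryFile`'s default `tmp` prefix in `tempfile.gettempdir()`, and `cleanup_old_fetch_files` is a hypothetical helper. Other programs also write `tmp*` files there, so a real version would want the fetch tool to pass its own distinctive `prefix=` and sweep only that.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def cleanup_old_fetch_files(
    max_age_seconds: float = 24 * 3600,
    directory: Optional[str] = None,
    prefix: str = "tmp",
) -> list[Path]:
    """Hypothetical helper: delete <prefix>*.html / <prefix>*.txt files older than max_age_seconds."""
    root = Path(directory or tempfile.gettempdir())
    cutoff = time.time() - max_age_seconds
    removed: list[Path] = []
    # The fetch tool writes an .html file plus a sibling .txt file with the same stem.
    for pattern in (f"{prefix}*.html", f"{prefix}*.txt"):
        for path in root.glob(pattern):
            try:
                if path.is_file() and path.stat().st_mtime < cutoff:
                    path.unlink()
                    removed.append(path)
            except OSError:
                # Already gone, or not ours to delete (permissions): skip it.
                continue
    return removed
```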

I think cleanup is implicit for temp files?

@michaelneale the named file is created each time this function is called so it's not cached at all

goose/src/goose/toolkit/developer.py

Line 97 in e19006c

with tempfile.NamedTemporaryFile(delete=False, mode="w", suffix=f"_{friendly_name}.html") as tmp_file:

if we add @cache we'd memoize the function call in memory so it wouldn't need to hit disk

the LLM gets the file paths back, not the content, so it won't call the fetch again each time, since its context already tells it which file holds the content

ah right, that makes sense

lamchau · 2024-10-23T01:41:43Z

ahh, didn't realize about licensing. is there an automation we can add to audit/check these things? i'm not very well versed in the legalese to know

On Tue, Oct 22, 2024 at 18:32 Michael Neale <***@***.***> wrote:

In src/goose/synopsis/toolkit.py:

> +        Args:
> +            url (str): url of the site to visit.
> +        Returns:
> +            (dict): A dictionary with two keys:
> +                - 'html_file_path' (str): Path to a html file which has the content of the page. It will be very large so use rg to search it or head in chunks. Will contain meta data and links and markup.
> +                - 'text_file_path' (str): Path to a plain text file which has the some of the content of the page. It will be large so use rg to search it or head in chunks. If content isn't there, try the html variant.
> +        """  # noqa
> +        friendly_name = re.sub(r"[^a-zA-Z0-9]", "_", url)[:50]  # Limit length to prevent filenames from being too long
> +
> +        try:
> +            result = httpx.get(url, follow_redirects=True).text
> +            with tempfile.NamedTemporaryFile(delete=False, mode="w", suffix=f"_{friendly_name}.html") as tmp_file:
> +                tmp_file.write(result)
> +                tmp_text_file_path = tmp_file.name.replace(".html", ".txt")
> +                plain_text = re.sub(
> +                    r"<head.*?>.*?</head>|<script.*?>.*?</script>|<style.*?>.*?</style>|<[^>]+>",

it is GPL (v3) so a no go (already looked at that)

michaelneale · 2024-10-23T07:56:00Z

@lamchau yes! #184 - can do it that way
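The contents of #184 aren't shown here, so the following is not that change. It sketches one possible shape for such a check using the standard library's `importlib.metadata`. The idea is to list installed distributions whose license metadata (the `License` or `License-Expression` fields, or `License ::` trove classifiers) mentions a marker such as `GPL`. `find_flagged_packages` is a hypothetical name. The metadata is self-reported and often missing, and `GPL` also matches LGPL and AGPL, so hits are for a human to review rather than an automatic verdict.

```python
from importlib.metadata import distributions
from typing import Iterable, Optional


def find_flagged_packages(
    dists: Optional[Iterable] = None,
    markers: tuple[str, ...] = ("GPL",),
) -> dict[str, list[str]]:
    """Hypothetical helper: map distribution name to the license strings that contain any marker."""
    flagged: dict[str, list[str]] = {}
    for dist in (distributions() if dists is None else dists):
        meta = dist.metadata
        if meta is None:  # defensive: broken installs can lack metadata
            continue
        # Newer License-Expression field, free-text License field, and license trove classifiers.
        candidates = [meta.get("License-Expression"), meta.get("License")]
        candidates += [c for c in (meta.get_all("Classifier") or []) if c.startswith("License ::")]
        hits = [s for s in candidates if s and any(m in s for m in markers)]
        if hits:
            flagged[meta.get("Name") or "<unknown>"] = hits
    return flagged


if __name__ == "__main__":
    for name, licenses in sorted(find_flagged_packages().items()):
        print(f"{name}: {'; '.join(licenses)}")
```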


just adding stuff from developer.py to synopsis developer that was missed

5625207

michaelneale requested a review from baxen October 22, 2024 23:37

michaelneale changed the title from "bug: just adding stuff from developer.py to synopsis developer" to "fix: just adding stuff from developer.py to synopsis developer" Oct 22, 2024

format

34f8b56

lamchau reviewed Oct 22, 2024

lamchau reviewed Oct 23, 2024

lamchau approved these changes Oct 23, 2024

michaelneale merged commit e19006c into main Oct 23, 2024

michaelneale deleted the add_fetch branch October 23, 2024 01:32

ahau-square pushed a commit that referenced this pull request May 2, 2025

fix: just adding stuff from developer.py to synopsis developer (#182)

bbb1f68

cbruyndoncx pushed a commit to cbruyndoncx/goose that referenced this pull request Jul 20, 2025

fix: just adding stuff from developer.py to synopsis developer (block…

dee650a

…#182)

michaelneale merged 2 commits into main from add_fetch


2 participants
