Add support for "deeper" search of optional property. by nightbloos · Pull Request #5 · nightbloos/goscraper

nightbloos · 2021-01-15T12:26:08Z

For some URLs we found that, for an unclear reason, we could not get the `og:type` data.
YouTube links were one example.
It turned out that YouTube keeps this metadata in the body (and not in the head, as most other services do).
Previously, the criterion for breaking out of the token-processing loop was "we have title + description + ogImage and we have passed head", so we could not process any other optional meta tags once we had passed the head.

Now we can control how many tokens are processed before breaking out of the loop (the loop also stops earlier if the required optional fields have already been found).
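The stop rule described above can be illustrated with a minimal sketch. This is not the code from this PR: the token stream is simplified to plain strings, `scan` and `result` are hypothetical names, and it assumes the depth counter counts every token processed from the start of the document.

```go
package main

import "fmt"

// result records which fields were seen and how many tokens were consumed.
type result struct {
	gotTitle, gotDescription, gotOgImage, gotOgType bool
	processed                                       int
}

// scan walks a simplified token stream (plain strings instead of real HTML
// tokens). It stops once the required fields are found and head is passed,
// provided that og:type was also found or the token budget is used up.
func scan(tokens []string, maxTokenDepth int) result {
	var r result
	headPassed := false
	depth := 0 // assumption: counts every token processed so far
	for _, tok := range tokens {
		depth++
		switch tok {
		case "title":
			r.gotTitle = true
		case "description":
			r.gotDescription = true
		case "og:image":
			r.gotOgImage = true
		case "og:type":
			r.gotOgType = true
		case "/head":
			headPassed = true
		}
		required := r.gotTitle && r.gotDescription && r.gotOgImage && headPassed
		if required && (r.gotOgType || depth >= maxTokenDepth) {
			break
		}
	}
	r.processed = depth
	return r
}

func main() {
	// YouTube-like layout: og:type appears only after </head>.
	tokens := []string{"title", "description", "og:image", "/head", "body", "og:type", "p"}
	for _, limit := range []int{0, 5, 100} {
		r := scan(tokens, limit)
		fmt.Printf("limit=%d processed=%d gotOgType=%v\n", limit, r.processed, r.gotOgType)
	}
}
```

With a limit of 0 the sketch stops right after the head, as the old code did; a larger limit lets it reach `og:type` in the body.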

vhbit · 2021-01-15T13:04:51Z

 	maxDocumentLength int64
 	url               string
 	maxRedirect       int
+	maxTokenDeep      int


Rename this to maxTokenDepth, and accordingly in the other places.

vhbit · 2021-01-15T13:05:25Z

 	doc.Preview.Name = scraper.Url.Host
 	// set default icon to web root if <link rel="icon" href="/favicon.ico"> not found
 	doc.Preview.Icon = fmt.Sprintf("%s://%s%s", scraper.Url.Scheme, scraper.Url.Host, "/favicon.ico")
+	deepCounter := 0


A simple depth should be enough as the name.

vhbit · 2021-01-15T15:31:01Z

+			if scraper.Options.MaxTokenDepth == 0 {
+				return nil
+			}
+			if ogType || depth >= scraper.Options.MaxTokenDepth {
+				return nil
+			}


Looking at this code, I was thinking that maybe scraper.Options.MaxTokenDepth == 0 is redundant, as depth >= scraper.Options.MaxTokenDepth will handle it anyway on the first iteration, right? So we could keep just the second if.

Another thing that struck me while reading this part is that it would read much better with slightly different naming, something like if gotOgType or if hasOgType.

I guess ogImage from above could be changed in the same vein to gotOgImage or hasOgImage?

nightbloos · Jan 15, 2021

> I guess ogImage from above could be changed in the same vein to gotOgImage or hasOgImage

In the original lib, as you can see, we have var ogImage bool. I decided to use the same naming.
By the way, I will fix the naming in both places.

> looking at this code I was thinking that maybe scraper.Options.MaxTokenDepth == 0 is redundant, as depth >= scraper.Options.MaxTokenDepth will handle it anyway on the first iteration right?

You are almost right.
But from a backward-compatibility perspective, I'm trying to keep logic that, with the default value, behaves as closely as possible to the old code.
The point is that in the future we will most probably have additional logic/checks after the first if statement.

vhbit · Jan 15, 2021

Ohh, that bright future which will come sometime and will force us to change the code! 🙃
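The redundancy question raised in this thread can be checked with a small sketch. The function names `stopWithZeroCheck` and `stopSingleCheck` are hypothetical, and the sketch assumes depth is never negative; under that assumption the explicit `== 0` branch and the single combined condition should give the same answer.

```go
package main

import "fmt"

// stopWithZeroCheck mirrors the two-step form from the diff:
// an explicit check for a zero limit, then the combined condition.
func stopWithZeroCheck(hasOgType bool, depth, maxTokenDepth int) bool {
	if maxTokenDepth == 0 {
		return true
	}
	return hasOgType || depth >= maxTokenDepth
}

// stopSingleCheck is the simplified form suggested in the review.
func stopSingleCheck(hasOgType bool, depth, maxTokenDepth int) bool {
	return hasOgType || depth >= maxTokenDepth
}

func main() {
	agree := true
	// Compare both forms over a small grid of non-negative inputs.
	for _, hasOgType := range []bool{false, true} {
		for depth := 0; depth <= 50; depth++ {
			for limit := 0; limit <= 50; limit++ {
				if stopWithZeroCheck(hasOgType, depth, limit) != stopSingleCheck(hasOgType, depth, limit) {
					agree = false
				}
			}
		}
	}
	fmt.Println("forms agree:", agree)
}
```

The two forms only diverge in intent: the explicit branch documents the default-value behaviour and leaves room for extra checks after it, which is the argument made in the reply above.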

vhbit reviewed Jan 15, 2021

nightbloos force-pushed the handle-optional-fields-with-max-tag-deep branch from fd2240b to 176dfae Compare January 15, 2021 14:27

vhbit reviewed Jan 15, 2021

nightbloos force-pushed the handle-optional-fields-with-max-tag-deep branch from 176dfae to f0a1002 Compare January 15, 2021 15:43

vhbit approved these changes Jan 15, 2021

nightbloos merged commit 44a43d8 into master Jan 18, 2021

nightbloos deleted the handle-optional-fields-with-max-tag-deep branch January 18, 2021 08:11

nightbloos merged 1 commit into master from handle-optional-fields-with-max-tag-deep
