Add support for "deeper" search of optional property. #5
maxDocumentLength int64
url string
maxRedirect int
maxTokenDeep int
This should be `maxTokenDepth`; please rename it accordingly in the other places as well.
doc.Preview.Name = scraper.Url.Host
// set default icon to web root if <link rel="icon" href="/favicon.ico"> not found
doc.Preview.Icon = fmt.Sprintf("%s://%s%s", scraper.Url.Scheme, scraper.Url.Host, "/favicon.ico")
deepCounter := 0
fd2240b to 176dfae
if scraper.Options.MaxTokenDepth == 0 {
	return nil
}
if ogType || depth >= scraper.Options.MaxTokenDepth {
	return nil
}
Looking at this code, I think the `scraper.Options.MaxTokenDepth == 0` check may be redundant, since `depth >= scraper.Options.MaxTokenDepth` will handle that case on the first iteration anyway, right? So we could keep just the second `if`.
Another thing that struck me while going over this part: it would read much better with slightly different naming, something like `if gotOgType` or `if hasOgType`.
I guess `ogImage` from above could be renamed in the same vein to `gotOgImage` or `hasOgImage`?
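The redundancy question can be checked with a small sketch. Everything here is a simplified assumption rather than the library's actual code: `Options` carries only the one field from the diff, the two helper functions return a hypothetical `bool` "stop" flag instead of `nil`, and `depth` is assumed to start at 0 and never go negative.

```go
package main

// Options mirrors only the field discussed in this thread (assumption:
// the real struct has more fields).
type Options struct {
	MaxTokenDepth int
}

// StopBothChecks keeps both conditions, as in the diff under review.
// It reports whether token processing should stop.
func StopBothChecks(opts Options, hasOgType bool, depth int) bool {
	if opts.MaxTokenDepth == 0 {
		return true
	}
	if hasOgType || depth >= opts.MaxTokenDepth {
		return true
	}
	return false
}

// StopSingleCheck keeps only the second condition.
func StopSingleCheck(opts Options, hasOgType bool, depth int) bool {
	return hasOgType || depth >= opts.MaxTokenDepth
}

// AgreeOnRange compares both variants for every limit in [0, maxLimit],
// every depth in [0, maxDepth], and both values of hasOgType.
func AgreeOnRange(maxLimit, maxDepth int) bool {
	for limit := 0; limit <= maxLimit; limit++ {
		opts := Options{MaxTokenDepth: limit}
		for depth := 0; depth <= maxDepth; depth++ {
			for _, has := range []bool{false, true} {
				if StopBothChecks(opts, has, depth) != StopSingleCheck(opts, has, depth) {
					return false
				}
			}
		}
	}
	return true
}
```

Under these assumptions the two variants agree, because `depth >= 0` always holds when the limit is 0. They would differ only if `depth` could be negative, or if more logic were later added between the two checks.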
> I guess `ogImage` from above could be changed in the same vein to `gotOgImage` or `hasOgImage`

In the original library, as you can see, we have `var ogImage bool`, so I decided to use the same naming.
Anyway, I will fix the naming in both places.

> Looking at this code I was thinking that maybe `scraper.Options.MaxTokenDepth == 0` is redundant, as `depth >= scraper.Options.MaxTokenDepth` will handle it anyway on the first iteration, right?

You are almost right.
But from a backward-compatibility (BC) perspective, I am trying to keep the logic as close as possible to the old code, so that with the default value it behaves the same as before.
The point is that in the future we will most probably have additional logic/checks after the first `if` statement.
ohh, that bright future which will come sometime and will force us to change the code! 🙃
For some URLs we found that, for some strange reason, we could not get the `og:type` data. YouTube links were one such case: YouTube keeps its metadata in the body (not in the head, as other services normally do). Previously, the criterion for breaking out of the token-processing loop was "we have title + description + ogImage and we have passed the head", so we could not process any other optional meta tags once we passed the head. Now we can control how many tokens we process before breaking out of the loop (the loop also stops early if the required optional fields have already been found).
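A minimal sketch of the loop behaviour described above, not the actual implementation. It assumes a simplified, pre-parsed token stream instead of a real HTML tokenizer; the `Token` and `Result` types, `SampleTokens`, and `Scan` are hypothetical names used only for illustration.

```go
package main

// Token is a hypothetical, simplified stand-in for an HTML token.
type Token struct {
	InHead   bool   // true while the token is still inside <head>
	Property string // "title", "description", "og:image", "og:type", or ""
}

// Result records which fields were found and how many tokens were read.
type Result struct {
	HasTitle, HasDescription, HasOgImage, HasOgType bool
	Consumed                                        int
}

// SampleTokens imitates a page that keeps og:type in the body.
func SampleTokens() []Token {
	return []Token{
		{InHead: true, Property: "title"},
		{InHead: true, Property: "description"},
		{InHead: true, Property: "og:image"},
		{InHead: false, Property: ""},
		{InHead: false, Property: ""},
		{InHead: false, Property: "og:type"},
		{InHead: false, Property: ""},
	}
}

// Scan reads tokens until the required fields are found and the head is
// passed, then keeps going for at most maxTokenDepth extra tokens (or
// until og:type is found). With maxTokenDepth == 0 it stops as soon as
// the head is passed, which matches the old criterion.
func Scan(tokens []Token, maxTokenDepth int) Result {
	var r Result
	depth := 0
	for _, t := range tokens {
		r.Consumed++
		switch t.Property {
		case "title":
			r.HasTitle = true
		case "description":
			r.HasDescription = true
		case "og:image":
			r.HasOgImage = true
		case "og:type":
			r.HasOgType = true
		}
		requiredDone := r.HasTitle && r.HasDescription && r.HasOgImage
		if !requiredDone || t.InHead {
			continue // still collecting required fields or still in head
		}
		if r.HasOgType || depth >= maxTokenDepth {
			break // optional field found, or depth budget used up
		}
		depth++
	}
	return r
}
```

With the sample tokens, `Scan(SampleTokens(), 0)` stops at the first body token without finding `og:type`, while `Scan(SampleTokens(), 5)` continues into the body and finds it.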
176dfae to f0a1002