Description
Summary
When crawling a GitHub organisation that has a lot of repositories, the API rate limit is exceeded at the very beginning of the crawl.
Type of Issue
It is a:
- bug
- request
- question regarding the documentation
Motivation
I am trying to crawl a GitHub organisation (https://github.com/python), but at a very early stage of the crawl github-crawler-starter-2.0.1-exec.jar throws this error:
2022-04-27 11:35:00.382 ERROR 23962 --- [ main] ication$$EnhancerBySpringCGLIB$$1b1fa732 : problem while running github crawler
com.fasterxml.jackson.module.kotlin.MissingKotlinParameterException: Instantiation of [simple type, class com.societegenerale.githubcrawler.model.SearchResult] value failed for JSON property items due to missing (therefore NULL) value for creator parameter items which is a non-nullable type
at [Source: (String)"{"message":"API rate limit exceeded for user ID [USERID].","documentation_url":"https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting"}"; line: 1, column: 158] (through reference chain: com.societegenerale.githubcrawler.model.SearchResult["items"])
[...]
or
com.fasterxml.jackson.module.kotlin.MissingKotlinParameterException: Instantiation of [simple type, class com.societegenerale.githubcrawler.model.SearchResult] value failed for JSON property items due to missing (therefore NULL) value for creator parameter items which is a non-nullable type
at [Source: (String)"{
"documentation_url": "https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#secondary-rate-limits",
"message": "You have exceeded a secondary rate limit. Please wait a few minutes before you try again."
}
"; line: 4, column: 1] (through reference chain: com.societegenerale.githubcrawler.model.SearchResult["items"])
at com.fasterxml.jackson.module.kotlin.KotlinValueInstantiator.createFromObjectWith(KotlinValueInstantiator.kt:116) ~[jackson-module-kotlin-2.12.6.jar!/:2.12.6]
at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:202) ~[jackson-databind-2.12.6.jar!/:2.12.6]
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:520) ~[jackson-databind-2.12.6.jar!/:2.12.6]
[...]
Both errors are of course caused by the rate limit: GitHub returns an error object that contains only message and documentation_url, so the items property that SearchResult expects is missing from the response.
Unfortunately, the application terminates right here.
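To illustrate what I mean, here is a minimal sketch of how a rate-limited response could be recognised before it is deserialized into SearchResult. RateLimitCheck and secondsToWait are hypothetical names of mine, not part of github-crawler. The header names (retry-after, x-ratelimit-remaining, x-ratelimit-reset) and the one-minute fallback follow my understanding of GitHub's REST API documentation, so please treat them as assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical helper, not part of github-crawler. */
final class RateLimitCheck {

    // Assumption: fallback wait when GitHub gives no usable header.
    static final long DEFAULT_WAIT_SECONDS = 60L;

    private RateLimitCheck() {}

    /**
     * Returns the number of seconds to wait before retrying, or empty if the
     * response does not look rate limited. Header names are expected in lower
     * case, and retry-after is assumed to hold a number of seconds.
     */
    static OptionalLong secondsToWait(int status, Map<String, String> headers,
                                      String body, long nowEpochSeconds) {
        if (status != 403 && status != 429) {
            return OptionalLong.empty();
        }
        String retryAfter = headers.get("retry-after");
        if (retryAfter != null) {
            return OptionalLong.of(Long.parseLong(retryAfter.trim()));
        }
        String reset = headers.get("x-ratelimit-reset");
        if ("0".equals(headers.get("x-ratelimit-remaining")) && reset != null) {
            // x-ratelimit-reset is a UTC epoch timestamp in seconds.
            long wait = Long.parseLong(reset.trim()) - nowEpochSeconds;
            return OptionalLong.of(Math.max(wait, 0L));
        }
        // No usable header: fall back to the message text seen in the logs above.
        if (body != null && body.toLowerCase(Locale.ROOT).contains("rate limit")) {
            return OptionalLong.of(DEFAULT_WAIT_SECONDS);
        }
        // A 403 for another reason (e.g. missing permissions) is not retried.
        return OptionalLong.empty();
    }
}
```

With a check like this, the crawler could sleep for the returned number of seconds and retry the query instead of failing on the missing items property.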
Current Behavior
Regardless of whether crawl-in-parallel is set to true or false, the rate limit is always exceeded.
Expected Behavior
A default setting that respects the GitHub API should be available, where the application waits 10 seconds between queries to avoid getting banned. Users should also be able to change the wait time between queries in the config file.
I hope this is still somehow possible in the current release. If I missed it, could you please let me know what I have to do to respect the GitHub API waiting time?
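As a rough sketch of the behaviour I have in mind, a small throttle could enforce a configurable minimum delay between two queries. QueryThrottle and its methods are hypothetical names of mine, not existing github-crawler classes, and the delay would have to come from a (not yet existing) config property.

```java
import java.time.Duration;

/** Hypothetical throttle, not part of github-crawler. */
final class QueryThrottle {

    private final long delayMillis;
    private long lastQueryMillis;
    private boolean hasQueried = false;

    QueryThrottle(Duration delay) {
        if (delay.isNegative()) {
            throw new IllegalArgumentException("delay must not be negative");
        }
        this.delayMillis = delay.toMillis();
    }

    /** Milliseconds still to wait at nowMillis before the next query may start. */
    synchronized long remainingMillis(long nowMillis) {
        if (!hasQueried) {
            return 0L; // the first query never waits
        }
        long elapsed = nowMillis - lastQueryMillis;
        return Math.max(delayMillis - elapsed, 0L);
    }

    /** Records that a query started at nowMillis. */
    synchronized void markQuery(long nowMillis) {
        hasQueried = true;
        lastQueryMillis = nowMillis;
    }

    /**
     * Blocks until the delay since the previous query has elapsed, then records
     * the new query. The lock is held while sleeping, so parallel callers are
     * served one at a time.
     */
    synchronized void awaitTurn() throws InterruptedException {
        long wait = remainingMillis(System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
        markQuery(System.currentTimeMillis());
    }
}
```

The crawler would create one instance, for example new QueryThrottle(Duration.ofSeconds(10)), and call awaitTurn() before every request to the GitHub API. Because the method is synchronized, the same delay would also apply when crawl-in-parallel is true.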
Please do not hesitate to contact me if you need more information.