    trust_remote_code = kwargs.get("trust_remote_code", False)
    index = cls._build_index(config, trust_remote_code=trust_remote_code)
pass down to dataset call
    split=self.dataset_split,
    dummy=self.use_dummy_dataset,
    revision=dataset_revision,
    trust_remote_code=trust_remote_code,
Here is load_dataset
    tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
    retriever = RagRetriever.from_pretrained(
    -    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True, dataset_revision="b24a417"
    +    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True, dataset_revision="b24a417", trust_remote_code=True,
I need to update other tests, but I would like to hear your opinion on the modeling code changes.
    EXPECTED_OUTPUT_TEXT_1 = "\"She's My Kind of Girl"
    EXPECTED_OUTPUT_TEXT_2 = "\"She's My Kind of Love"
Not sure why we have these short expected outputs.
They were already not matching as far back as 2024/04, and maybe even earlier.

@Cyrilvallez ready for a review. For the changes in the tests, there is something for @itazap to check.
    # PR #31938 cause the output being changed from `june 22, 2018` to `june 22 , 2018`.
    # Need @itazap to take a look
    # TODO: itazap
Cyrilvallez
left a comment
Alright, LGTM in general! Let's just wait for @itazap to confirm before merging!

The additional 'space' after the comma in the test TLDR: LGTM :)

Thank you @itazap for double checking 👍
    "wget",
    "-O",
    f"{cls.index_path}",
    "https://huggingface.co/datasets/hf-internal-testing/wiki_dpr_dummy/resolve/main/index",
Maybe rename the file to "index.faiss"? Optionally, also include what kind of index it is in the name; here I assume "flat_index.faiss"?
I will rename it and maybe document the steps I used to produce this dummy dataset for testing purposes.
What does this PR do?
`datasets` introduced `trust_remote_code` at some point (probably 2024/09), but RAG's modeling code isn't handling this, and we get an error.

This PR makes the necessary changes so that users can at least specify this argument in `from_pretrained`, which then passes it down to the `datasets` call.
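
To illustrate the idea of threading the flag through, here is a minimal, self-contained sketch. It does not use the real `transformers` or `datasets` code: `load_dataset_stub`, `SketchIndex`, `SketchRetriever`, and the dataset name `wiki_dpr_sketch` are hypothetical stand-ins, and the `dataset_split` keyword and its `"train"` default are assumptions made only for this example. It shows one way a keyword read in `from_pretrained` could reach the dataset-loading call.

```python
from typing import Any, Dict, Optional


def load_dataset_stub(
    name: str,
    *,
    split: str,
    revision: Optional[str] = None,
    trust_remote_code: bool = False,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the dataset call; it only echoes its arguments."""
    return {
        "name": name,
        "split": split,
        "revision": revision,
        "trust_remote_code": trust_remote_code,
    }


class SketchIndex:
    """Hypothetical index that stores the flag until the dataset is loaded."""

    def __init__(
        self,
        dataset_name: str,
        split: str,
        revision: Optional[str],
        trust_remote_code: bool,
    ) -> None:
        self.dataset_name = dataset_name
        self.split = split
        self.revision = revision
        self.trust_remote_code = trust_remote_code

    def init_index(self) -> Dict[str, Any]:
        # The flag is passed down to the (stubbed) dataset call here.
        return load_dataset_stub(
            self.dataset_name,
            split=self.split,
            revision=self.revision,
            trust_remote_code=self.trust_remote_code,
        )


class SketchRetriever:
    """Hypothetical retriever; only the keyword threading is modelled."""

    def __init__(self, index: SketchIndex) -> None:
        self.index = index

    @classmethod
    def from_pretrained(cls, dataset_name: str, **kwargs: Any) -> "SketchRetriever":
        # Default to False so remote code is only trusted when the caller opts in.
        trust_remote_code = kwargs.get("trust_remote_code", False)
        index = SketchIndex(
            dataset_name,
            split=kwargs.get("dataset_split", "train"),
            revision=kwargs.get("dataset_revision"),
            trust_remote_code=trust_remote_code,
        )
        return cls(index)


retriever = SketchRetriever.from_pretrained(
    "wiki_dpr_sketch", dataset_revision="b24a417", trust_remote_code=True
)
call_args = retriever.index.init_index()
print(call_args["trust_remote_code"])
```

Defaulting to `False` keeps the behavior opt-in, so a caller who never mentions the flag does not end up trusting remote code by accident.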