Add FAQ.md to provide sample estimation guidelines #408

relyt0925 · 2024-11-25T04:47:00Z

The new FAQ.md file includes detailed explanations and examples on how to estimate the number of synthetic samples produced at various stages of the SDG training process. This addition aims to enhance user understanding of the sample generation methodology.

I believe this is an MVP toward resolving #307.

RobotSail · 2024-12-06T23:07:56Z

docs/FAQ.md

+For each knowledge leaf node: the formula to estimate the number of produced synthetic samples in the training dataset is:
+
+```text
+(total cumulative size of knowledge documents / max document chunk size) * number of qna pairs in the knowledge file leaf node * 30 synthetic samples per qna pair


Does this factor in the number of samples that will be filtered out?

@RobotSail this is an excellent point. It does not, since that part is non-deterministic to a certain extent. But folks have been looking for guidance on ballpark numbers and answers to general questions about how the taxonomy is processed through SDG.

I should call that out as a disclaimer and will do that.
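
For reference, the formula quoted in the diff above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from the SDG repository: the function name, parameter names, and input numbers are hypothetical, and it assumes both sizes are measured in the same unit. As noted in this thread, it does not account for samples that are filtered out, so the result is best read as a rough pre-filtering estimate.

```python
def estimate_knowledge_samples(
    total_document_size: int,
    max_chunk_size: int,
    qna_pairs: int,
    samples_per_qna: int = 30,
) -> float:
    """Rough pre-filtering estimate for a single knowledge leaf node.

    Mirrors the formula quoted from docs/FAQ.md:
    (total document size / max chunk size) * qna pairs * 30.
    Both sizes are assumed to use the same unit.
    """
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    return (total_document_size / max_chunk_size) * qna_pairs * samples_per_qna


# Illustrative inputs only; these numbers are not taken from the PR.
estimate = estimate_knowledge_samples(
    total_document_size=10_000,
    max_chunk_size=1_000,
    qna_pairs=15,
)
print(estimate)  # 4500.0
```

Because filtering is non-deterministic, the number of samples that actually reach the training dataset can be lower than this figure.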

The new FAQ.md file includes detailed explanations and examples on how to estimate the number of synthetic samples produced at various stages of the SDG training process. This addition aims to enhance user understanding of the sample generation methodology.

Signed-off-by: Tyler Lisowski <lisowski@us.ibm.com>

relyt0925 · 2024-12-19T21:10:13Z

This is a good initial set of FAQs that we have seen pop up, with what I feel (based on my study of the codebase) are the appropriate answers. More than happy to adjust anything that is inaccurate based on expert opinion!

bbrowning

Thanks for expanding our docs! I have a couple of comments, but also a more general question. Do you think it would be better to document most of these things in the instructlab/instructlab repository instead of directly in SDG? The reason I ask is that the questions touch on taxonomy, SDG, training, and the intersection of all these. And the target users here would likely be using the ilab CLI to run these workflows as opposed to SDG directly?

docs/FAQ.md

relyt0925 · 2025-01-16T15:09:18Z

@bbrowning sorry for the delay, and thank you so much for the review! I agree that there may be a better home for this page. I will go ahead and address the comments so we can make sure we are all comfortable with the content, and then we can think about where we want its home to be!

Signed-off-by: Ben Browning <bbrownin@redhat.com>

bbrowning

I added in the requested changes (cleaning up links, fixing typo) as an additional commit and will go ahead and approve to get this merged.

relyt0925 · 2025-02-01T01:40:49Z

Thank you so much @bbrowning for the assistance! Sorry for the delay; I hope everything is wonderful with you!

bbrowning · 2025-02-01T01:51:07Z

All is well, and you're welcome! Sorry it took us so long to get this in, but thank you for the contribution!

mergify bot added the documentation and ci-failure labels Nov 25, 2024

relyt0925 force-pushed the faq-size branch 5 times, most recently from b31ff34 to 1f568eb November 25, 2024 05:06

mergify bot removed the ci-failure label Nov 25, 2024

relyt0925 force-pushed the faq-size branch from 1f568eb to 1487ec8 November 25, 2024 15:26

aakankshaduggal requested a review from a team December 6, 2024 18:21

RobotSail reviewed Dec 6, 2024

relyt0925 force-pushed the faq-size branch from 1487ec8 to 324816b December 19, 2024 19:57

mergify bot added the ci-failure label Dec 19, 2024

relyt0925 force-pushed the faq-size branch 3 times, most recently from b03fc41 to 9cbd2ac December 19, 2024 21:03

relyt0925 force-pushed the faq-size branch from 9cbd2ac to fdd1ff6 December 19, 2024 21:08

mergify bot removed the ci-failure label Dec 19, 2024

bbrowning reviewed Jan 9, 2025

nathan-weinberg requested review from RobotSail and bbrowning January 29, 2025 02:06

RobotSail approved these changes Jan 29, 2025

mergify bot added the one-approval label Jan 29, 2025

Clean up links, fix typo in FAQ.md

038411c

Signed-off-by: Ben Browning <bbrownin@redhat.com>

bbrowning approved these changes Jan 29, 2025

mergify bot merged commit 9d9b1a8 into instructlab:main Jan 29, 2025
7 checks passed

mergify bot removed the one-approval label Jan 29, 2025
