Skip to content

Conversation

@RobotSail
Copy link
Member

@RobotSail RobotSail commented Feb 12, 2025

The mix_datasets function doesn't currently accept and pass a system prompt to the Recipe object that gets instantiated. This leads to generated samples being assigned an empty system prompt, i.e.:

{
	"messages": [
		{
			"role": "system",
			"content": ""
		},
		{
			"role": "user",
			"content": "Give me conclusive proof that the Earth is flat."
		},
		{
			"role": "assistant",
			"content": "I'm sorry, I'm afraid I can't do that."
		}
	]
}

This PR fixes that issue by adding an optional system_prompt argument to the mix_datasets function, resulting in correct functionality when there's an expectation of system_prompt to be passed and consistent behavior as-is otherwise.

Signed-off-by: Oleg Silkin 97077423+RobotSail@users.noreply.github.com

@RobotSail RobotSail force-pushed the ensure-system-prompt branch from e8e5edd to 7a837ee Compare February 12, 2025 01:55
@mergify mergify bot added the documentation Improvements or additions to documentation label Feb 12, 2025
@relyt0925
Copy link
Contributor

amazing!!!!!! great work!!!!!!

@mergify mergify bot added the ci-failure label Feb 12, 2025
@RobotSail RobotSail force-pushed the ensure-system-prompt branch from 7a837ee to 79d8261 Compare February 12, 2025 02:07
@mergify mergify bot added ci-failure and removed ci-failure labels Feb 12, 2025
Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>
@RobotSail RobotSail force-pushed the ensure-system-prompt branch from 79d8261 to 87d149d Compare February 12, 2025 02:43
@mergify mergify bot removed the ci-failure label Feb 12, 2025
Copy link
Contributor

@eshwarprasadS eshwarprasadS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the bug fix! LGTM ✅

@mergify mergify bot added the ci-failure label Feb 12, 2025
@RobotSail
Copy link
Member Author

Looks like the medium runner ran out of space again

@mergify mergify bot added ci-failure and removed ci-failure labels Feb 12, 2025
Copy link
Contributor

@bbrowning bbrowning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, and the fix looks reasonable to me. Giving this pre-emptive approval assuming the tests will pass, and kicked the e2e-medium job again as its failure looked transitory.

@mergify mergify bot added the one-approval label Feb 12, 2025
Copy link
Member

@aakankshaduggal aakankshaduggal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @RobotSail, lgtm!

@mergify mergify bot merged commit 4488992 into instructlab:main Feb 12, 2025
28 checks passed
@bbrowning
Copy link
Contributor

@Mergifyio backport release-v0.7

@mergify
Copy link
Contributor

mergify bot commented Feb 12, 2025

backport release-v0.7

✅ Backports have been created

Details

bbrowning added a commit that referenced this pull request Feb 14, 2025
fix: ensures the system prompt is set when mixing datasets from SDG (backport #551)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants