Conversation
API change check: APIView has identified API-level changes in this PR and created the following API reviews.
```python
result = await super()._do_eval(eval_input)
real_result = {}
real_result[self._output_prefix + "_label"] = (
    result[EvaluationMetrics.GROUNDEDNESS + "_score"] >= self._passing_score
)
```
@MilesHolland, why do we not output the binary output as part of the AACS API? Is it because it is not part of the service call?
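As a rough illustration of what the quoted snippet computes (the function name, dictionary keys, and default threshold below are assumptions for the sketch, not the SDK's actual API):

```python
# Hypothetical sketch of the binarization in the quoted diff: the service
# returns a 1-5 groundedness score, and the evaluator derives a boolean
# label by comparing it to a passing threshold.

def binarize_groundedness(result: dict, passing_score: int = 3,
                          output_prefix: str = "groundedness_pro") -> dict:
    real_result = {}
    # True when the raw score meets or exceeds the threshold
    real_result[output_prefix + "_label"] = (
        result["groundedness_score"] >= passing_score
    )
    return real_result
```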
```python
    azure_ai_project,
    **kwargs,
):
    self._passing_score = 3  # TODO update once the binarization PR is merged
```
@posaninagendra, in order to reach parity with AACS groundedness, any ungrounded content detected will make the AACS binary output `ungroundedDetected` True.

- Isn't `ungroundedDetected` part of the service call output?
- If not (meaning the SDK only receives `ungroundedPercentage` as the output), then to match the logic for `ungroundedDetected`, this `self._passing_score` should be 5, right?
For reference, here is the sample output from the AACS doc:

```json
{
  "ungroundedDetected": true,
  "ungroundedPercentage": 1,
  "ungroundedDetails": [
    {
      "text": "12/hour.",
      "offset": { "utf8": 0, "utf16": 0, "codePoint": 0 },
      "length": { "utf8": 8, "utf16": 8, "codePoint": 8 },
      "reason": "None. The premise mentions a pay of \"10/hour\" but does not mention \"12/hour.\" It's neutral. "
    }
  ]
}
```
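To make the relationship in the sample concrete: the payload suggests `ungroundedDetected` is true whenever any portion of the response is ungrounded, i.e. whenever `ungroundedPercentage` is strictly greater than zero. A minimal sketch of that reading (an assumption based only on the sample above, not the documented service contract):

```python
def ungrounded_detected(ungrounded_percentage: float) -> bool:
    """Derive the AACS-style binary flag from the ungrounded percentage.

    Assumption: any nonzero ungrounded portion flags the response, which is
    why the thread above argues only a fully passing score should count.
    """
    return ungrounded_percentage > 0
```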
* Adding service based groundedness
* groundedness pro eval
* remove groundedness and fix unit tests
* run black
* change evaluate label
* Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_service_groundedness/_service_groundedness.py (Co-authored-by: Neehar Duvvuri <40341266+needuv@users.noreply.github.com>)
* Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_service_groundedness/_service_groundedness.py (Co-authored-by: Neehar Duvvuri <40341266+needuv@users.noreply.github.com>)
* comments and CL
* re record tests
* black and pylint
* comments
* nits
* analysis
* re cast
* more mypy appeasement

Co-authored-by: Ankit Singhal <anksing@microsoft.com>
Co-authored-by: Neehar Duvvuri <40341266+needuv@users.noreply.github.com>
Add a new service-based groundedness evaluator, which uses the RAI service to determine groundedness.
This has a few extra adaptations compared to a normal RAI service evaluator, including:

- a change to the `evaluate` function to rename the evaluator's output label to a passing score when aggregated into a metric.
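A hedged sketch of what that aggregation step might look like: the per-row boolean label is collapsed and renamed into a passing-rate metric. All names and the exact renaming scheme here are hypothetical, not the SDK's actual implementation:

```python
def aggregate_label_as_metric(rows: list, prefix: str = "groundedness_pro") -> dict:
    """Collapse per-row boolean labels into a single passing-rate metric.

    Hypothetical: per-row "<prefix>_label" entries become one aggregate
    "<prefix>_passing_rate" value (fraction of rows that passed).
    """
    labels = [bool(row[prefix + "_label"]) for row in rows]
    return {prefix + "_passing_rate": sum(labels) / len(labels)}
```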