RoPE loses precision for Llama / Gemma + Gemma logits.float() by danielhanchen · Pull Request #29285 · huggingface/transformers

danielhanchen · 2024-02-26T06:21:58Z

When I was implementing Gemma for Unsloth, I noticed that when one uses bfloat16, the RoPE embeddings get autocast to bfloat16, even though they need to stay in float32. This causes the positional encodings to lose precision dramatically, especially at very large context lengths.

Below I pasted the image on how HF for now handles RoPE. You can see the loss in precision when using bfloat16. I manually autocasted it to float32 in Unsloth, and you can see the expected positional encodings.
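The size of the effect can be approximated without a GPU. The following is a minimal pure-Python sketch, not the Transformers or Unsloth code: `to_bfloat16` and `rope_angle` are hypothetical helpers, and the low-precision path simply rounds both inputs and the product to bfloat16, which is an assumption about how an autocast matmul behaves. It uses the highest-frequency RoPE channel (inverse frequency 1.0), where the rotation angle equals the position.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", float(x)))[0]
    # bfloat16 keeps the top 16 bits of the float32 encoding.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def rope_angle(position: int, inv_freq: float, low_precision: bool) -> float:
    """Angle position * inv_freq, optionally emulating a bfloat16 computation."""
    if low_precision:
        # Assumption: inputs and result are each rounded to bfloat16.
        return to_bfloat16(to_bfloat16(position) * to_bfloat16(inv_freq))
    return position * inv_freq


for position in (100, 1001, 4095, 8191):
    exact = rope_angle(position, 1.0, low_precision=False)
    approx = rope_angle(position, 1.0, low_precision=True)
    # Error in the angle (radians) and in the cosine fed to the rotation.
    print(position, abs(exact - approx), abs(math.cos(exact) - math.cos(approx)))
```

Short positions such as 100 come out exact, while positions in the thousands are off by a sizeable fraction of a radian in this channel, which is consistent with the error growing with context length.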

I couldn't find why Unsloth's error could not match that of HF's original Gemma implementation. On float16, this issue does not occur, with HF and Unsloth's training loss curve being equivalent:

However when I switched over to bfloat16, HF and Unsloth's training losses diverge at the start, and Unsloth always retains a lower loss as training goes on:

If you look at the losses more carefully (same seed), you can see the differences more closely.

The culprit I found was

inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
position_ids_expanded = position_ids[:, None, :].float()
freqs = (inv_freq_expanded @ position_ids_expanded).transpose(1, 2)
emb = torch.cat((freqs, freqs), dim=-1)

where if one uses torch.autocast(), freqs = (inv_freq_expanded @ position_ids_expanded).transpose(1, 2) gets done in bfloat16 and not float32. I propose we turn off autocast to force float32, i.e.:

with torch.autocast(device_type=position_ids_expanded.device.type, enabled=False):
    freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)

This ensures torch.autocast does not automatically downcast the RoPE embeddings to float16 / bfloat16. My proposed fix shows the following loss curve:

Also, in Gemma, a 1 liner was missed :) logits = logits.float() must be placed to upcast the logits to float32. Although it should be done automatically in torch.autocast, it's best to keep the convention as done in llama, mistral and other models. Gemma's implementation seems to maybe have forgotten this 1 line :)

Llama - Force float32 since bfloat16 loses precision on long contexts

Fix RoPE and logits.float()

danielhanchen · 2024-02-26T06:22:57Z

Forgot to add I'm not certain if this will break CUDAGraphs for faster inference - hopefully not

ArthurZucker

I'll have to check the compile test and everything, but we usually hate this kind of change 🫣 the bug is real, I'll see if I can find a good alternative as this is pretty much only for training! Great catch 🤗

danielhanchen · 2024-02-26T08:07:24Z

Sadly unsure if it's just for training :(( For inference I don't remember up to which context length, bfloat16 won't be an issue. I think it was up to 4096. However, bfloat16 loses precision even for inference sadly after 4096 context lengths. 8192 definitely - bfloat16 essentially thinks the last 4 tokens are all position 8192 ie [8192, 8192, 8192, 8192], whilst the correct float32 is [8188, 8189, 8190, 8191].
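The [8192, 8192, 8192, 8192] example follows directly from the bfloat16 format, which keeps only 8 significant bits. Here is a small pure-Python sketch of that rounding; `to_bfloat16` and `first_inexact_position` are hypothetical helpers written for illustration, not PyTorch functions, and the rounding mode (round-to-nearest-even) is an assumption about the conversion.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", float(x)))[0]
    # Keep the top 16 bits of the float32 encoding, rounding the discarded half.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def first_inexact_position(limit: int):
    """Smallest integer position below `limit` that bfloat16 cannot store exactly."""
    for position in range(limit):
        if to_bfloat16(position) != position:
            return position
    return None


last_four = [8188, 8189, 8190, 8191]
print([to_bfloat16(p) for p in last_four])  # [8192.0, 8192.0, 8192.0, 8192.0]
print(first_inexact_position(10_000))       # 257
```

Integer positions stop being exactly representable well before 4096; the spacing between neighbouring bfloat16 values doubles at each power of two and is 32 just below 8192, which is why the last four positions collapse onto one value.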

ArthurZucker

LGTM let's do no grad and autocast, I'll test compile once you have both!

ArthurZucker

LGTM, before merging I'll ping @pacman100, @younesbelkada and @fxmarty as this is pretty important! Feel free to comment if you are against these changes!

ArthurZucker · 2024-02-27T09:23:28Z

                self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64, device=x.device).float() / self.dim)
            )
-
+        


gante · 2024-02-27T11:26:15Z

@danielhanchen .sin() and .cos() should ideally happen in FP32 as well. Have you noticed any performance changes if you force them to happen in FP32?

danielhanchen · 2024-02-27T15:15:03Z

@gante Actually interesting point - I can see torch.autocast does asin, sinh, etc. in float32, but it doesn't list sin itself - I'll have to check if .sin() is done in float32 or float16

danielhanchen · 2024-02-28T06:24:03Z

@ArthurZucker I checked everything and it's working! You guys can double check if anything is wrong. You can push the commit whenever. Thank you! :)

ArthurZucker

LGTM! Thanks a mile for this.
Let's make sure you run make style and make fixup for the last CIs

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

This reverts commit b860a22.

kwen2501 · 2024-07-14T13:41:05Z

FYI -- as reported in pytorch/pytorch#128394, this change seems to break torch.export of the Llama 2 model. There seem to be two causes of the break:

1. torch.autocast does not support meta device, as suggested in its documentation:

device_type (str, required) – Device type to use. Possible values are: ‘cuda’, ‘cpu’, ‘xpu’ and ‘hpu’. The type is the same as the type attribute of a torch.device. Thus, you may obtain the device type of a tensor using Tensor.device.type.

Creating a model in meta mode seems to be quite useful for quite a few applications, in particular when materializing the whole model on a single device is infeasible.

2. torch.export today does not work well with torch.autocast, as torch.amp.autocast_mode._enter_autocast is not a valid ATen op. See detailed discussion here.

Wondering if there is a way to reenable export of Llama 2? Thanks!

Cc: @angelayi @lessw2020

gante · 2024-07-14T15:48:27Z

Hi @kwen2501 👋 Happy to iterate with you to get a working solution for torch.export 🤗

I'm not experienced with torch.autocast, so I have a question for you. Our current code has

        with torch.autocast(device_type=device_type, enabled=False):
            freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)

Isn't .float() and the context manager disabling autocast redundant for autocasting purposes? If so, given that freqs would always be a float32 tensor, removing the context manager should do the trick

lessw2020 · 2024-07-14T18:24:29Z

Hi @gante and @kwen2501 and @angelayi,
@gante - you are correct that the context manager looks redundant here: autocast is disabled inside it, and the relevant variables are explicitly upcast to float anyway.

(To your question, the short summary is that autocast dynamically lowers the precision to fp16 where it judges it can do so without losing precision. In this case it appears that for long contexts the lower precision could have a negative effect, hence the need to ensure the freqs calculation is always done in fp32.)

To try and speed things up in resolving, I've made a PR that removes the device type calcs that were being fed to the disabled autocast and of course removed the autocast itself.

When I tested this on Friday, removing the autocast did enable torch.export to function as expected so this should resolve this for us.

But appreciate any feedback on the PR in case I have missed something!
(adding @danielhanchen to the PR reviewers as well since he isolated the original core issue)

lessw2020 · 2024-07-14T18:35:56Z

Note - PR is failing CI due to this - @gante, would you be able to help review this error?

File "/root/transformers/utils/check_copies.py", line 856, in check_copies
    raise Exception(
Exception: Found the following copy inconsistencies:
- src/transformers/models/mistral/modeling_mistral.py: copy does not match models.llama.modeling_llama.LlamaRotaryEmbedding.forward at line 95
- src/transformers/models/olmo/modeling_olmo.py: copy does not match models.llama.modeling_llama.LlamaRotaryEmbedding at line 97
Run `make fix-copies` or `python utils/check_copies.py --fix_and_overwrite` to fix them.

danielhanchen · 2024-07-14T20:28:01Z

I'll comment here as well - the primary reason why the context manager was placed was because mixed precision training would cause torch.amp.autocast to randomly ignore the .float() upcast, and so the solution was to force the autocast to get disabled in that region - if newer Torch versions don't downcast, then the autocast can be safely removed.
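The behaviour described here can be checked on CPU. The sketch below assumes a PyTorch build in which CPU autocast with bfloat16 is available and treats matmul as a lower-precision op; the tensor shapes and values are made up for illustration and are not the Transformers code.

```python
import torch

# Toy stand-ins for the tensors in the rotary embedding forward pass (illustrative values).
inv_freq_expanded = torch.rand(1, 4, 1)                          # (batch, dim/2, 1), float32
position_ids_expanded = torch.arange(8)[None, None, :].float()   # (batch, 1, seq), float32

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # The explicit .float() calls do not protect the matmul: autocast casts its inputs again.
    freqs_autocast = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)

    # Locally disabling autocast keeps the matmul in float32.
    with torch.autocast(device_type="cpu", enabled=False):
        freqs_fp32 = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)

print(freqs_autocast.dtype, freqs_fp32.dtype)
```

Where that assumption holds, the first result is bfloat16 and the second is float32, which is why removing the context manager while an outer autocast region is active would bring the precision loss back.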

kwen2501 · 2024-07-15T01:53:42Z

Thanks @gante @lessw2020 @danielhanchen !
It seems the autocast disabler was intended to work in a bigger autocast enabling region (added by a user).

I wonder if the user can divide the autocast enabling region into two to skip this disabled region? Perhaps we can add a note to the LlamaRotaryEmbedding module to warn that applying autocast would cause precision loss?

Longer term, maybe it would be worth adding a torch.no_autocast API -- the current way of enabled=False is a bit confusing.

lessw2020 · 2024-07-15T02:04:15Z

Thanks @danielhanchen for the insight to the reasoning behind the autocast w/ enabled = False, here (I suspected there was likely more to the story).

I did want to add that we had the inverse of this situation with fsdp - for fsdp mixed precision, if you explicitly set a model dtype to something esp .bfloat16, and then pass to FSDP, we respect that. That behaviour however, then generated a big commotion from HF about why wouldn't we secretly auto override this for mixed precision and keep fp32 weights. (because we respect what the user has directed).
Anyway, it seems the behaviour is the 180 degree opposite here - if a user sets .float(), then that should be respected as a direct dtype command...not just something to be arbitrarily overridden without a blocking disabled autocast.
Regardless, agree with @kwen2501 that there needs to be a direct way for the user to spec a dtype and not have that secretly overridden, which would then be consistent with mixed precision in FSDP (i.e. you spec it, you got it, the end).

ArthurZucker · 2024-07-15T12:36:20Z

that there needs to be a direct way for the user to spec a dtype and not have that secretly overridden, which would then be consistent with mixed precision in FSDP (i.e. you spec it, you got it, the end).

do you mean for ROPE?

gante · 2024-07-16T10:18:44Z

#31959 (comment)

"What I'm reading then is that we SHOULDN'T merge this PR until we confirm that pytorch doesn't change the type of an explicit .float() cast when autocast is active. Otherwise, we will get a regression (thank you @danielhanchen for confirming 🤗 )"

lessw2020 · 2024-07-18T17:02:47Z

Hi all - we have a much better solution now by making some changes to how PP does the model tracing via PT export and with that, it now handles the autocast issue directly so no changes needed here in the transformer code.
Thus, I'm going to close my earlier PR as the proper fix is within export/PP directly.
The PR for reference is here in PyTorch:
pytorch/pytorch#130998

Thanks @danielhanchen @gante @ArthurZucker @kwen2501 for the details and convo on this.
I'm glad we are able to have a more fundamental fix for this!

ArthurZucker · 2024-07-22T13:09:34Z

Thanks for updating, great to know that overall this will be seamlessly fixed for everyone! 🤗

### Details:
- *Sin/Cos table generation must run in f32 otherwise it has accuracy issue*
- Reference : huggingface/transformers#29285

### Tickets:
- *CVS-146672*

However, their names must comply with the existing name convention ### Tickets: - E-131878 commit 3c5a52e30f363cd25122ccd5b8bc081d717e8e03 Author: Damian Kurek <damian.kurek@intel.com> Date: Fri Aug 2 10:55:38 2024 +0200 [GPU] Use parallel sum reduction in MVN BFYX OPT kernel (#25840) Optimize MVN BFYX OPT kernel ### Details: - Use parallel sum reduction in order to improve efficiency ### Tickets: - 148585 commit ab613e115267feece039e409dea6fb8e10371746 Author: Maxim Vafin <maxim.vafin@intel.com> Date: Fri Aug 2 10:55:20 2024 +0200 [PT FE] Move sending telemetry to stage after conversion is done (#25855) ### Details: - *Previously telemetry was send every time `FrameworkNode` is created. Now we send it only when `FrameworkNode` exist in the model and only once per op type* ### Tickets: - *ticket-id* commit 7bc7fb0cb64d09fd258724f6e3c5935f162cd129 Author: Georgy Krivoruchko <georgy.krivoruchko@intel.com> Date: Fri Aug 2 14:24:47 2024 +0400 Changed dependency types-setuptools (#25872) ### Details: - Solution verification ### Tickets: - N/A commit a9c004798b56eb5e74502a44ef320ff5333d23dd Author: Wilson Seok <wilson.seok@intel.com> Date: Thu Aug 1 13:39:13 2024 -0700 [GPU] Add crop check in optimize check of buffer fusing (#25850) ### Details: - Add crop check in optimize check of buffer fusing to pass through simple dynamic shape crop case ### Tickets: - Follow up PR25737 commit 7b11ba4c1bec8f7e2063f1c43b0c10a314d724bd Author: Alexey Smirnov <alexey.smirnov@intel.com> Date: Thu Aug 1 20:32:47 2024 +0100 [NPUW] Introduce new passes to online partitioning (#25679) Config (internal/extended): ``` "NPU_COMPILATION_MODE_PARAMS" : "compute-layers-with-higher-precision=Sqrt,Power,ReduceMean,Add_RMSNorm", "NPU_USE_NPUW" : "YES", "NPUW_FOLD" : "YES", "NPUW_DCOFF_TYPE" : "f16", "NPUW_DCOFF_SCALE" : "YES", "NPUW_ONLINE_ISOLATE" : "P:DQMatMulGQ/compute,P:DQMatMulCW/compute,P:RMSNorm/compute", "NPUW_ONLINE_NOFOLD" : "compute" ``` Config (user/basic): ``` 
"NPU_COMPILATION_MODE_PARAMS" : "compute-layers-with-higher-precision=Sqrt,Power,ReduceMean,Add_RMSNorm", "NPU_USE_NPUW" : "YES", "NPUW_FOLD" : "YES", "NPUW_DCOFF_TYPE" : "f16", "NPUW_DCOFF_SCALE" : "YES", "NPUW_ONLINE_PIPELINE" : "COMPUTE" ``` --------- Co-authored-by: Dmitry Matveev <dmitry.matveev@intel.com> commit 3b4e747c8687d0e11501b63f3e425a335e8c9641 Author: Ilya Lavrenov <ilya.lavrenov@intel.com> Date: Thu Aug 1 21:23:30 2024 +0400 Allow to override CPACK_ARCHIVE_COMPONENT_INSTALL (#25867) ### Details: - To override by external cmake options - Useful for GenAI to create a single archive commit 605b13fbee58b48cf27cf0e64ac154148dfd8b39 Author: Alicja Miloszewska <alicja.miloszewska@intel.com> Date: Thu Aug 1 08:12:30 2024 -0700 [PyOV] Add more ov.Model constructors (#25635) ### Details: - Accept sinks as output ports in addition to generic nodes and op class instances in `ov.Model` ctors - Add test Added support for: - `Model(results: List[openvino._pyopenvino.op.Result], sinks: List[ov::Output<ov::Node>], parameters: List[openvino._pyopenvino.op.Parameter], name: str = '')` - `Model(results: List[ov::Output<ov::Node>], sinks: List[ov::Output<ov::Node>], parameters: List[openvino._pyopenvino.op.Parameter], name: str = '')` ### Tickets: - *[CVS-131037](https://jira.devtools.intel.com/browse/CVS-131037)* --------- Co-authored-by: Anastasia Kuporosova <anastasia.kuporosova@intel.com> commit 754f48a0d96d0451fd7c7cf4a68019dfafd20c5e Author: Pawel Raasz <pawel.raasz@intel.com> Date: Thu Aug 1 17:10:38 2024 +0200 [core] Unify axis normalization/validation utils (#25614) ### Details: - Split function for smaller simper utils, responsible for validation or normalization or more complex doing both. - Unify the functions parameters order - Remove redundant check of rank - Produce smaller binary size - Fix Coverity issue `Improper use of negative value`. 
### Tickets: - CVS-136544 commit 2e399de62eed4ab212e36032380e6972921b5cd9 Author: Alexandra Sidorova <alexandra.sidorova@intel.com> Date: Thu Aug 1 18:26:25 2024 +0400 [Tests] Commented out debug prints in input range generation (#25848) ### Details: - *Commented out debug prints in input range generation in test infrastructure to avoid large outputs during test executions:* ![image](https://github.com/user-attachments/assets/8e19df2c-2bd2-4327-91cd-da439d0da544) ### Tickets: - *N/A* commit 2f8c265b6cb9b078757b71b0a81d6b95bfd4bcb8 Author: Maciej Smyk <maciejx.smyk@intel.com> Date: Thu Aug 1 16:14:30 2024 +0200 [DOCS] CODEOWNER update for master (#25863) JIRA: 148360 Update of documentation paths for codeowner groups. commit 81e7b21e6bec757398fdb4074e085799ee5c795c Author: Andrei Kashchikhin <andrey.kashchikhin@intel.com> Date: Thu Aug 1 15:06:59 2024 +0100 [CI] [GHA] Get VCPKG version from repository (#25862) ### Tickets: - *132496* commit 504873014ccc800005504841d9819ccf04abc312 Author: Prakash <qxprakash@gmail.com> Date: Thu Aug 1 17:57:11 2024 +0530 [OV JS] Add vision-background-removal sample notebook (#25714) ### Details: - added vision-background-removal notebook - added comments and formatting ### Things Remaining: - adding the sample in the readme - adding the weights download once the unet model ir gets uploaded @vishniakov-nikolai @almilosz please give feedback With Regards Prakash commit fb4e2d3e832d488f94012cc5e4cde1a6d4c4bf44 Author: Vishniakov Nikolai <nikolai.vishniakov@intel.com> Date: Thu Aug 1 14:26:31 2024 +0200 [OVJS] Update openvino-node binaries to 2024.3 in master (#25823) ### Details: - update openvino-node package version to 2024.3.0 in master branch commit 7e16d63b042371655f75869890a770aa9c01e703 Author: Andrei Kashchikhin <andrey.kashchikhin@intel.com> Date: Thu Aug 1 12:55:11 2024 +0100 [CI] [GHA] Gather statistics on newly added Ubuntu workflows (#25856) New workflows were introduced in 
https://github.com/openvinotoolkit/openvino/pull/25234 but were not added to the workflow that gathers statistics. ### Tickets: - *144917* commit 18e775ff8d7c56e0ba3bfbdb6c94494eddb2d4ce Author: Aleksandr Voron <aleksandr.voron@intel.com> Date: Thu Aug 1 13:35:38 2024 +0200 [CPU][ARM] MLAS transpose executor deprioritised (#25854) ### Details: - The latest performance reports on Ampere show ACL transpose executor provides better performance rather than MLAS Transpose executor (details are in the ticket). Therefore, MLAS Transpose executor priority has been decreased. - Redundant check has been deleted in ACL Transpose executor. ### Tickets: - CVS-148625 commit a0062533f09fc2362004cb7c179ca88d6a4549cd Author: Ilya Lavrenov <ilya.lavrenov@intel.com> Date: Thu Aug 1 16:59:24 2024 +0400 Added version for OpenVINO developer package local version (#25859) ### Details: - To allow to select developer package of specific version - Required for GenAI build as part of OpenVINO extra modules commit eda2f7f40598cce2f970ea635454546844a801ba Author: Zhang Yi <yi3.zhang@intel.com> Date: Thu Aug 1 19:27:09 2024 +0800 [Core][CPU]markup rope's sin/cos generation with f32 (#25662) ### Details: - *Sin/Cos table generation must run in f32 otherwise it has accuracy issue* - Reference : https://github.com/huggingface/transformers/pull/29285 ### Tickets: - *CVS-146672* commit 45b4737e706d0b06f5dd5c4e513fc181ddf4c3ba Author: Karol Blaszczak <karol.blaszczak@intel.com> Date: Thu Aug 1 13:06:49 2024 +0200 [DOCS] supportedmodels table fix 24.3 (#25860) port: https://github.com/openvinotoolkit/openvino/pull/25818 commit 546daf2959928457116fcb807337a511da37c8d9 Author: M <mortezaho.1376@gmail.com> Date: Thu Aug 1 03:26:00 2024 -0700 [GSOC][CPU][ARM] Add NEON implementation for attention softmax (#25616) ### Details: - This PR aims to add NEON implementation for attention softmax commit 7617b37f047b29c67e5010bc54b40ed6de858d76 Author: Karol Blaszczak <karol.blaszczak@intel.com> Date: Thu Aug 1 
11:51:53 2024 +0200 [DOCS] add benchmark results for phi (#25838) (#25851) port: https://github.com/openvinotoolkit/openvino/pull/25838 Co-authored-by: Michael Frank Hansen <michael.f.hansen@intel.com> commit 508795f44e301d5f848a212dbfc1257d8552a09b Author: Prakash <qxprakash@gmail.com> Date: Thu Aug 1 15:03:25 2024 +0530 [OV JS] Add vision-background-removal sample script (#25698) ### Details: - added script code and added the unet model weights inside the directory -- ```/openvino/samples/js/node/assets/models``` @vishniakov-nikolai can you please upload it - focused on the implementaion and formatting - output images for now will be saved in the same directory , I will change it later as per your feedback - @vishniakov-nikolai I am a bit doubtful about my naming convention so let me know if I need to modify any names ### Things remaining - [x] Proper comments remaining - [x] Bit of refactoring - [x] Readme Please provide Feedback @vishniakov-nikolai @almilosz With Regards Prakash commit dc3eaf0a2b816fc32a59e79455bce33ec54f535c Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu Aug 1 07:26:11 2024 +0000 Bump actions/upload-artifact from 4.3.3 to 4.3.4 (#25846) Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.3.3 to 4.3.4. 
<details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/upload-artifact/releases">actions/upload-artifact's releases</a>.</em></p> <blockquote> <h2>v4.3.4</h2> <h2>What's Changed</h2> <ul> <li>Update <code>@actions/artifact</code> version, bump dependencies by <a href="https://github.com/robherley"><code>@robherley</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/584">actions/upload-artifact#584</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/upload-artifact/compare/v4.3.3...v4.3.4">https://github.com/actions/upload-artifact/compare/v4.3.3...v4.3.4</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/actions/upload-artifact/commit/0b2256b8c012f0828dc542b3febcab082c67f72b"><code>0b2256b</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/584">#584</a> from actions/robherley/bump-pkgs</li> <li><a href="https://github.com/actions/upload-artifact/commit/488dcefb9bf01619ac19bad29c5c5409a1e4dd4c"><code>488dcef</code></a> licensed cache</li> <li><a href="https://github.com/actions/upload-artifact/commit/04c51f57662651dd3333286989e2db1111c0fd07"><code>04c51f5</code></a> ncc</li> <li><a href="https://github.com/actions/upload-artifact/commit/32a9e276a8f8ac18b4b2dce8213ed340ed4e5ed8"><code>32a9e27</code></a> bump <code>@actions/artifact</code> and npm audit</li> <li><a href="https://github.com/actions/upload-artifact/commit/552bf3722c16e81001aea7db72d8cedf64eb5f68"><code>552bf37</code></a> new version</li> <li><a href="https://github.com/actions/upload-artifact/commit/79616d2ded92999fceefea2ca2e4bdf6101fa919"><code>79616d2</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/565">#565</a> from actions/eggyhead/use-artifact-v2.1.6</li> <li>See full diff in <a 
href="https://github.com/actions/upload-artifact/compare/v4.3.3...0b2256b8c012f0828dc542b3febcab082c67f72b">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/upload-artifact&package-manager=github_actions&previous-version=4.3.3&new-version=4.3.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit fa949478e149f17cce514ebd0d019e8766ef249d Author: Karol Blaszczak <karol.blaszczak@intel.com> Date: Thu Aug 1 09:04:13 2024 +0200 [DOCS] rn fixes and model table (#25835) commit ba681ed72d7e30b2fe94e1cfc5a950a0bcf9bb54 Author: Wilson Seok <wilson.seok@intel.com> Date: Wed Jul 31 20:53:18 2024 -0700 [GPU] Rollback whlie-loop structure for 2nd stage of optimize all crops (#25737) ### Details: - Rollback while-loop structure for 2nd stage of optimize all crops because it has regression for reshape case which has padding. ### Tickets: - 146653 commit 8cfd586e6128055b600e1abe9dcce263071dec7d Author: Eddy Kim <eddy.kim@intel.com> Date: Thu Aug 1 10:05:32 2024 +0900 [GPU] group_normalization for bfzyx (#25753) ### Details: - This PR updates the `group_normalization_bfyx` kernel to support bfzyx format. - Additionally, this PR fixes the output feature calculation logic of the group_norm_fsv16 kernel and a model caching related logic for dynamic model. 
### Tickets: - 147841 commit 13b3e4703e32053797099256849b78ebfef6d49c Author: Roman Kazantsev <roman.kazantsev@intel.com> Date: Thu Aug 1 01:44:49 2024 +0400 [TF FE] Stabilize Bitwise layer tests on all platforms and fix u16 bug (#25843) **Details:** Fix u16 bug "Tensor data with element type u16, is not representable as pointer to i32" **Ticket:** 122716 --------- Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com> commit d2ab797a0fff1f95ec9ea39e444798dbba499cf6 Author: Ilya Lavrenov <ilya.lavrenov@intel.com> Date: Wed Jul 31 23:22:43 2024 +0400 Fixed compilation with clang and libc++ (#25813) ### Details: - *item1* - *...* ### Tickets: - Closes https://github.com/openvinotoolkit/openvino/issues/25420 commit b26c533421b1ca3f3254df1de14300dbe928405b Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed Jul 31 21:01:11 2024 +0200 Update setuptools requirement from <72,>=65.6.1 to >=65.6.1,<73 in /src/bindings/python (#25792) Updates the requirements on [setuptools](https://github.com/pypa/setuptools) to permit the latest version. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/pypa/setuptools/blob/main/NEWS.rst">setuptools's changelog</a>.</em></p> <blockquote> <h1>v72.1.0</h1> <h2>Features</h2> <ul> <li>Restore the tests command and deprecate access to the module. (<a href="https://redirect.github.com/pypa/setuptools/issues/4519">#4519</a>) (<a href="https://redirect.github.com/pypa/setuptools/issues/4520">#4520</a>)</li> </ul> <h1>v72.0.0</h1> <h2>Deprecations and Removals</h2> <ul> <li>The test command has been removed. Users relying on 'setup.py test' will need to migrate to another test runner or pin setuptools before this version. 
(<a href="https://redirect.github.com/pypa/setuptools/issues/931">#931</a>)</li> </ul> <h1>v71.1.0</h1> <h2>Features</h2> <ul> <li> <p>Added return types to typed public functions -- by :user:<code>Avasam</code></p> <p>Marked <code>pkg_resources</code> as <code>py.typed</code> -- by :user:<code>Avasam</code> (<a href="https://redirect.github.com/pypa/setuptools/issues/4409">#4409</a>)</p> </li> </ul> <h2>Misc</h2> <ul> <li><a href="https://redirect.github.com/pypa/setuptools/issues/4492">#4492</a></li> </ul> <h1>v71.0.4</h1> <h2>Bugfixes</h2> <ul> <li>Removed lingering unused code around Distribution._patched_dist. (<a href="https://redirect.github.com/pypa/setuptools/issues/4489">#4489</a>)</li> </ul> <h1>v71.0.3</h1> <h2>Bugfixes</h2>  </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/pypa/setuptools/commit/441799f8b45a1a01c608db49333403db1b0d7100"><code>441799f</code></a> Bump version: 72.0.0 → 72.1.0</li> <li><a href="https://github.com/pypa/setuptools/commit/59aff448e79415ee3e491a8426553b373d7914e5"><code>59aff44</code></a> Merge pull request <a href="https://redirect.github.com/pypa/setuptools/issues/4522">#4522</a> from pypa/feature/graceful-drop-tests</li> <li><a href="https://github.com/pypa/setuptools/commit/c437aaa8d5b969a9fe8c8147463bfcb85b31ab26"><code>c437aaa</code></a> Restore the tests command and deprecate access to the module.</li> <li><a href="https://github.com/pypa/setuptools/commit/a6726b95f7a50dc5945e012050f00450c883fdcd"><code>a6726b9</code></a> Add celery and requests to the packages that test integration. 
Ref <a href="https://redirect.github.com/pypa/setuptools/issues/4520">#4520</a></li> <li><a href="https://github.com/pypa/setuptools/commit/5e1b3c414779317bc3e105d9bae82ce70c22dbf9"><code>5e1b3c4</code></a> Bump version: 71.1.0 → 72.0.0</li> <li><a href="https://github.com/pypa/setuptools/commit/4c0b9f3ee6ee47c597572655567f215c08c90137"><code>4c0b9f3</code></a> Merge pull request <a href="https://redirect.github.com/pypa/setuptools/issues/4458">#4458</a> from pypa/debt/remove-test-command</li> <li><a href="https://github.com/pypa/setuptools/commit/be8e3a09812f0a3717045098ac6ce7b52fc7d202"><code>be8e3a0</code></a> Merge pull request <a href="https://redirect.github.com/pypa/setuptools/issues/4507">#4507</a> from pypa/docs/4483-install-core-extra</li> <li><a href="https://github.com/pypa/setuptools/commit/99d2c722ca5d58ef1360ed86a3252cc16bd84dfd"><code>99d2c72</code></a> Add documentation clarifying how to reliably install setuptools with its depe...</li> <li><a href="https://github.com/pypa/setuptools/commit/63c89f93d6d43ff96ce5f7f5a862395f924905d0"><code>63c89f9</code></a> 👹 Feed the hobgoblins (delint).</li> <li><a href="https://github.com/pypa/setuptools/commit/c405ac1bf29b945db9af7ba9b0dd77e4d871f72a"><code>c405ac1</code></a> Merge branch 'main' into debt/remove-test-command</li> <li>Additional commits viewable in <a href="https://github.com/pypa/setuptools/compare/v65.6.1...v72.1.0">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
[//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com> Co-authored-by: Anastasia Kuporosova <anastasia.kuporosova@intel.com> commit a60140ef5c60f60304ad2a70ebff0f80f97cd51c Author: Dmitry Matveev <dmitry.matveev@intel.com> Date: Wed Jul 31 19:16:29 2024 +0100 Add NPUW to labeler (#25839) ### Details: - Mark changes under "src/plugins/intel_npu/src/plugin/npuw" with NPUW label ### Tickets: - n/a commit 
3a9464dc34900b8ee11249f6f56f7a3636a796c8 Author: Vladislav Golubev <vladislav.golubev@intel.com> Date: Wed Jul 31 20:01:30 2024 +0200 [Snippets] Support Brgemm with transposed_b via BrgemmCopyB (#24932) ### Details: - *Support FP32/BF16/I8 matmuls with transpose_b=true via BrgemmCopyB* - *BrgemmCopyB emitter: handle tail iteration by N before the main body* - *Remove workaround on LDB and N dim rounding in brgemm emitters and related buffers* ### Tickets: - *CVS-114487* ## TODO: - [ ] BufferAllocation test for FP32 brgemm with repacking - [ ] SetBrgemmCopyBBuffersShape tests - [ ] MHA with transpose B for low precisions (FP32 already exists) - [ ] FuseTransposeBrgemm tests commit f48b30aab7ae2bb05c9f3709f9398eefe17ff66f Author: Andrei Kashchikhin <andrey.kashchikhin@intel.com> Date: Wed Jul 31 18:39:31 2024 +0100 [CI] [GHA] Introduce additional Ubuntu versions via separate workflows (#25234) ### Details: - This is a sister PR to #25202, the idea is the same: test more Linux flavours. This PR adds Ubuntu 22/24 as separate workflows instead of a matrix used in #25202. - The approach with separate workflows seems better as it does not require unique names for artefacts for matrix jobs and dependent jobs thus making it easier to write and maintain w/o magic strings. ### Tickets: - *144917* commit 161fce5d380e6ab3bdf0dcc6109ea904f11672bd Author: Zlobin Vladimir <vladimir.zlobin@intel.com> Date: Wed Jul 31 20:01:50 2024 +0400 Update open-model_zoo submodule (#25826) commit 25455a0dd97d9c724522dab43f2a019e2a6643d0 Author: Ujjayant Kadian <118752727+ujjayant-kadian@users.noreply.github.com> Date: Wed Jul 31 16:28:45 2024 +0100 NPUW: Change the sub-byte (i4) element order in the unpack procedure to match OpenVINO 2024.0 (#25827) ### Details: In the latest versions of OpenVINO the sub-byte order is defined as [1,0] meaning that first (MSB) 4 bits of an 8-bit vector form 1st element, and the last (LSB) 4 bits of an 8-bit vector form 0th element. 
Our unpack procedures for i4 were aligned with the older representation, where sub-byte order was defined as [0,1] meaning that first (MSB) 4 bits of an 8-bit vector form 0th element, and the last (LSB) 4 bits were the 1st element. **Updated these unpack functions to use this new order.** ### Tickets: - *121052* commit 3e058b90a891fee9e707dd9c2859492fa5166f71 Author: Roman Lyamin <Roman.Lyamin@intel.com> Date: Wed Jul 31 18:45:15 2024 +0400 [GPU] Fix lws calculation for reorder_kernel_bfyx_to_blocked_format kernel (#25830) ### Tickets: - *[146165](https://jira.devtools.intel.com/browse/CVS-146165)* commit a5d82f2ebf15bb11b452a4027c6b7ae54ca2951c Author: Sebastian Golebiewski <sebastianx.golebiewski@intel.com> Date: Wed Jul 31 15:04:21 2024 +0200 [DOCS] Updating Edit Button for articles for master (#25832) Porting: https://github.com/openvinotoolkit/openvino/pull/25831 commit 98956aa41354f0402bc7e84ad993efef21cb8cf8 Author: Alexandra Sidorova <alexandra.sidorova@intel.com> Date: Wed Jul 31 16:54:52 2024 +0400 [CPU][RISCV64] Fixed onednn build for RVV case (#24151) ### Details: - *Missed include `primitive.hpp` in RVV pooling implementation* - *oneDNN PR: https://github.com/openvinotoolkit/oneDNN/pull/259* - *It's not seen in CI since OV is built with default `-march=rv64imafdc` - without vector intrinsic support. 
Need to build with RVV support (`-march=rv64gcv0p7`)* ### Tickets: - *N/A* commit 10620e9fd68cbfb2f6ae2a1298e6af8425367bfe Author: Sun Xiaoxia <xiaoxia.sun@intel.com> Date: Wed Jul 31 19:54:29 2024 +0800 Fix executor memory leak when "-nstreams 0" (#25778) ### Details: - *create executor config when streams=0* ### Tickets: - *146686* commit cae739b96354aff83945767d2fad094e03ebebce Author: Edward Shogulin <edward.shogulin@intel.com> Date: Wed Jul 31 12:28:41 2024 +0100 [LPT] Dequantization precision reusage (#25668) ### Details: - *NNCF quantized fp16 model on GPU support* ### Tickets: - *CVS-126300* commit 3e49c22ff76f55304ea2bb1a832fce8b2a04ea69 Author: Alexandra Sidorova <alexandra.sidorova@intel.com> Date: Wed Jul 31 15:24:23 2024 +0400 [Snippets] Added auto sorting of LoopPorts (#25623) ### Details: - *Added support of expression enumeration - new attribute `m_exec_num` of `Expression`. Calculated as `exec_num_left + (exec_num_right - exec_num_left) / 2`. Now we can figure out which expression is executed earlier than another using `m_exec_num O(1)` instead of `find(begin(), end(), smth) == end() O(n)`* - *Refactored LoopInfo interface: united all `update` and `replace` into one `replace_with_new_ports`.* - *Added auto sorting of ports in LoopInfo: after port replacing, new expression/node insertion using helpers - loop ports are automatically reordered by expression execution numbers* - *Removed previous workarounds with `GetTopologicalOrder` from tokenization pass* ### Tickets: - *113536* - *142990* - *137819* commit 89b49c10ca719505712b53cf44370dbdb3782fbc Author: Karol Blaszczak <karol.blaszczak@intel.com> Date: Wed Jul 31 13:12:50 2024 +0200 [DOCS] 24.3 archives and final touches (#25829) port: https://github.com/openvinotoolkit/openvino/pull/25828 commit f0d7cd8c22e2a994a4371cc5e15d6be33c9e6785 Author: Sebastian Golebiewski <sebastianx.golebiewski@intel.com> Date: Wed Jul 31 13:05:07 2024 +0200 [DOCS] Updating Tool Ecosystem article (#25824) Adding 
information on OpenVINO-based AI projects. Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com> Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com> commit ea6731f8a75b907eea1ee9317c2cd89a2d54e4c4 Merge: 70b8346d72 3c713d4aec Author: Ujjayant Kadian <118752727+ujjayant-kadian@users.noreply.github.com> Date: Wed Jul 31 11:56:25 2024 +0100 Merge branch 'master' into uk/changing-sub-byte-i4-element-order commit 11c01898f507c1abb7d64d70f89ffcc281081373 Author: Roman Kazantsev <roman.kazantsev@intel.com> Date: Wed Jul 31 14:19:01 2024 +0400 [TF FE] Support TensorListConcatV2 operation for multiple undefined dims in element_shape (#25814) **Details:** Support TensorListConcatV2 operation for multiple undefined dims in element_shape **Ticket:** 105671 --------- Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com> commit 3c713d4aec23c825baa71fd524f93140bc928ce9 Author: Chen Xu <chen.xu@intel.com> Date: Wed Jul 31 17:32:10 2024 +0800 [CPU] Avoid rounding to zero for Reduce node in quantized models (#25766) ### Details: - *If the Reduce node has both input and output precision to be integers from the original model, then rounding to zero should be done before converting intermediate floating point value to integer.* - *However, if such integer precisions are resulted from quantization, then we should not do such rounding, in order to maintain accuracy.* - *Add corresponding test cases.* ### Tickets: - *CVS-147352*

danielhanchen added 3 commits February 26, 2024 17:04

Update modeling_llama.py

7a25720

Llama - Force float32 since bfloat16 loses precision on long contexts

Update modeling_llama.py

db8237f

Update modeling_gemma.py

3de95c4

Fix RoPE and logits.float()
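
To see why the commits above force float32, here is a minimal sketch that emulates bfloat16 rounding in pure Python. `to_bfloat16` is a hypothetical helper written for this illustration (it is not a PyTorch or transformers function), and it assumes finite, non-negative inputs. bfloat16 keeps only 8 significant bits, so integer positions above 256 are no longer all representable and neighbouring positions collapse onto the same value:

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even); hypothetical helper."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the top 16 bits of float32; the bias makes truncation round to nearest even.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


positions = range(8192)  # an 8K-token context
rounded = [to_bfloat16(float(p)) for p in positions]

print(to_bfloat16(255.0))   # 255.0: still exact
print(to_bfloat16(257.0))   # 256.0: collides with position 256
print(to_bfloat16(8191.0))  # 8192.0: off by one whole position
print(len(set(rounded)), "distinct values for", len(rounded), "positions")
```

For the first RoPE frequency pair (inverse frequency 1) the rotation angle equals the position, so a rounding error of one position is a one-radian error in the argument of `cos` and `sin`.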

ArthurZucker reviewed Feb 26, 2024

Comment thread src/transformers/models/llama/modeling_llama.py Outdated

ArthurZucker mentioned this pull request Feb 27, 2024

torch.arange use should not use dtype=float for integer ranges, conflicts w/ DS zero.Init() #28685

Closed

4 tasks

ArthurZucker reviewed Feb 27, 2024

Comment thread src/transformers/models/gemma/modeling_gemma.py Outdated

ArthurZucker reviewed Feb 27, 2024

danielhanchen added 3 commits February 27, 2024 19:56

Merge branch 'huggingface:main' into main

9e5cbb0

@torch.no_grad()

99d564e

@torch.no_grad()

d0c08bf

ArthurZucker approved these changes Feb 27, 2024

fxmarty reviewed Feb 27, 2024

Comment thread src/transformers/models/gemma/modeling_gemma.py

Comment thread src/transformers/models/gemma/modeling_gemma.py Outdated

Comment thread src/transformers/models/gemma/modeling_gemma.py

Comment thread src/transformers/models/gemma/modeling_gemma.py

gante mentioned this pull request Feb 27, 2024

LLaMA RoPE precision with bf16 model #29301

Closed

4 tasks

ArthurZucker mentioned this pull request Feb 28, 2024

gemma-7b-it chat fails (gemma-7b-it聊天失败) hiyouga/LlamaFactory#2540

Closed

danielhanchen added 3 commits February 28, 2024 16:17

Merge branch 'huggingface:main' into main

bd3a214

Cos, Sin to float32

abffebb

cos, sin to float32

c2e31bf

ArthurZucker approved these changes Feb 28, 2024


Comment thread src/transformers/models/gemma/modeling_gemma.py Outdated

Comment thread src/transformers/models/llama/modeling_llama.py Outdated

danielhanchen and others added 6 commits February 28, 2024 21:30

Update src/transformers/models/gemma/modeling_gemma.py

f487800

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Update src/transformers/models/llama/modeling_llama.py

c852675

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Resolve PR conflicts

1a50a4b

Fix RoPE for llama

b860a22

Revert "Fix RoPE for llama"

790e4a3

This reverts commit b860a22.

Merge remote-tracking branch 'upstream/main'

06c7634

danielhanchen mentioned this pull request Mar 9, 2024

Keep rope at float32 precision keras-team/keras-hub#1497

Merged

ArthurZucker mentioned this pull request Apr 5, 2024

Move instantiation of RoPE from MistralAttention to MistralModel #30072

Closed

5 tasks

EricLBuehler mentioned this pull request Apr 29, 2024

Sliding window for phi3 EricLBuehler/mistral.rs#244

Merged

molbap mentioned this pull request May 24, 2024

Fix for hardcoded final_labels to enable loss calculation in PaliGemma #30987

Closed

ArthurZucker mentioned this pull request Jun 6, 2024

bf16 is more unstable than fp16, when looking at the difference of generation logprobs and forward logprobs #31267

Closed

4 tasks

vwxyzjn mentioned this pull request Jun 6, 2024

Got an abnormally high loss when training Gemma-7B. huggingface/trl#1709

Closed

angelayi mentioned this pull request Jul 10, 2024

[export] Failed to trace HF Llama2 model pytorch/pytorch#128394

Closed

lessw2020 mentioned this pull request Jul 14, 2024

remove disabled autocast context and device type calc, as freqs are already force upcast to float precision (and fix https://github.com/pytorch/pytorch/issues/128394) #31959

Closed

4 tasks

zhangYiIntel mentioned this pull request Jul 22, 2024

[Core][CPU]markup rope's sin/cos generation with f32 openvinotoolkit/openvino#25662

Merged

davedgd mentioned this pull request Sep 27, 2024

Fix tensors on "two devices" issue #32420 #33742

Closed

5 tasks

viclzhu mentioned this pull request Oct 12, 2024

Lower precision RoPE computation leads to training instability NVIDIA/TransformerEngine#1245

Open

ivankrylatskoe mentioned this pull request Nov 8, 2024

Different LlamaRotaryEmbedding in old and new versions of transformers #34657

Closed

4 tasks

banyan-god mentioned this pull request Nov 29, 2024

RoPE has precision errors when used with BFloat16 KellerJordan/modded-nanogpt#39

Closed

xenova mentioned this pull request Mar 5, 2025

Proper handling of repeated fp16 conversion. microsoft/onnxconverter-common#310

Open

zijunx mentioned this pull request Apr 30, 2025

Adapt omniquant to transformers 4.41.0 OpenGVLab/OmniQuant#107

Closed

zucchini-nlp mentioned this pull request Jun 13, 2025

align xpu's autocast behavior w/ cuda by using device agnostic torch APIs #38284

Merged

		self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64, device=x.device).float() / self.dim)
		)
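For context, the fragment above looks like the exponent term of a RoPE inverse-frequency computation. The following is a minimal pure-Python sketch, not the transformers implementation, of why position ids that get rounded to bfloat16 lose precision at long contexts. The helper names `to_bfloat16` and `rope_angle`, and the values `dim=128` and `base=10000.0`, are illustrative assumptions rather than anything taken from the PR.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a float.

    Hypothetical helper for illustration; valid for finite inputs well below
    the bfloat16 maximum.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps only the top 16 bits of the float32 bit pattern.
    bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", rounded))[0]


def rope_angle(position: float, pair_index: int, dim: int = 128, base: float = 10000.0) -> float:
    """Angle position * inv_freq for one rotary pair (assumed dim and base)."""
    inv_freq = 1.0 / (base ** (2 * pair_index / dim))
    return position * inv_freq


# bfloat16 carries 8 significant bits, so neighbouring large positions collapse
# onto the same representable value.
positions = [8189.0, 8190.0, 8191.0]
rounded_positions = [to_bfloat16(p) for p in positions]
print(rounded_positions)

# For the fastest-rotating pair (inv_freq == 1), the angle error equals the
# position error introduced by the rounding.
exact = rope_angle(8191.0, pair_index=0)
lossy = rope_angle(to_bfloat16(8191.0), pair_index=0)
print(abs(exact - lossy), math.sin(exact), math.sin(lossy))
```

In this sketch three distinct token positions map to the same bfloat16 value, so their rotary angles become indistinguishable. Keeping the position-times-frequency product in float32 avoids that collapse, which is the motivation the PR description gives for disabling autocast around it.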

Conversation

danielhanchen commented Feb 26, 2024

danielhanchen commented Feb 26, 2024

ArthurZucker left a comment

danielhanchen commented Feb 26, 2024

ArthurZucker left a comment

ArthurZucker left a comment

ArthurZucker Feb 27, 2024

HuggingFaceDocBuilderDev commented Feb 27, 2024

gante commented Feb 27, 2024

danielhanchen commented Feb 27, 2024

danielhanchen commented Feb 28, 2024

ArthurZucker left a comment

kwen2501 commented Jul 14, 2024

gante commented Jul 14, 2024

lessw2020 commented Jul 14, 2024 (edited)

lessw2020 commented Jul 14, 2024

danielhanchen commented Jul 14, 2024

kwen2501 commented Jul 15, 2024

lessw2020 commented Jul 15, 2024

ArthurZucker commented Jul 15, 2024

gante commented Jul 16, 2024

lessw2020 commented Jul 18, 2024

ArthurZucker commented Jul 22, 2024


9 participants
