RoPE loses precision for Llama / Gemma + Gemma logits.float() #29285
ArthurZucker merged 20 commits into huggingface:main
Llama - Force float32 since bfloat16 loses precision on long contexts
Fix RoPE and logits.float()
|
Forgot to add I'm not certain if this will break CUDAGraphs for faster inference - hopefully not |
ArthurZucker
left a comment
I'll have to check the compile test and everything, but we usually hate these kinds of changes 🫣 The bug is real; I'll see if I can find a good alternative, as this is pretty much only for training! Great catch 🤗
|
Sadly, I'm unsure it's just for training :(( For inference, I don't remember up to which context length bfloat16 isn't an issue; I think it was up to 4096. Beyond a context length of 4096, however, bfloat16 loses precision even for inference. At 8192 it definitely does: bfloat16 essentially thinks the last 4 tokens are all at position 8192, i.e. [8192, 8192, 8192, 8192], whilst the correct float32 positions are [8188, 8189, 8190, 8191].
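To make the rounding concrete, here is a minimal sketch that emulates bfloat16 rounding with the Python standard library only (no PyTorch needed). The helper `to_bfloat16` is a hypothetical name introduced for illustration; it assumes round-to-nearest-even and ignores NaN/infinity handling.

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round a float to the nearest bfloat16 value (round-to-nearest-even)."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps only the upper 16 bits; round on the 16 discarded bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]

positions = [8188, 8189, 8190, 8191]
rounded = [int(to_bfloat16(float(p))) for p in positions]
print(rounded)  # [8192, 8192, 8192, 8192]
```

bfloat16 keeps 8 significant bits, so between 4096 and 8192 adjacent representable values are 32 apart, which is why the last four positions collapse onto 8192.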
ArthurZucker
left a comment
LGTM let's do no grad and autocast, I'll test compile once you have both!
ArthurZucker
left a comment
LGTM, before merging I'll ping @pacman100, @younesbelkada and @fxmarty as this is pretty important! Feel free to comment if you are against these changes!
self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64, device=x.device).float() / self.dim)
)
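For context, the fragment above looks like the exponent term of the RoPE inverse frequencies. Below is a plain-Python sketch of that formula, assuming the usual definition inv_freq[i] = 1 / base^(2i/dim). The function name `rope_inv_freq` and the defaults `dim=128` and `base=10000.0` are illustrative assumptions, not taken from the PR.

```python
def rope_inv_freq(dim: int = 128, base: float = 10000.0) -> list:
    """Inverse frequencies 1 / base**(i / dim) for even i in [0, dim)."""
    # Indices are generated as exact integers and only then divided as floats,
    # mirroring the int64 arange followed by .float() in the fragment.
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]

inv_freq = rope_inv_freq()
print(len(inv_freq), inv_freq[0])  # 64 1.0
```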
|
@danielhanchen |
|
@gante Actually, interesting point - I can see torch.autocast runs asin, sinh, etc. in float32, but it doesn't list sin itself - I'll have to check if .sin() is done in float32 or float16 |
|
@ArthurZucker I checked everything and it's working! You guys can double check if anything is wrong. You can push the commit whenever. Thank you! :) |
ArthurZucker
left a comment
LGTM! Thanks a mile for this.
Let's make sure you run make style and make fixup for the last CIs
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This reverts commit b860a22.
|
FYI -- as reported in pytorch/pytorch#128394, this change seems to break torch.export of the Llama 2 model. There seem to be two causes of the break:
Creating a model in
Wondering if there is a way to reenable export of Llama 2? Thanks! Cc: @angelayi @lessw2020 |
|
Hi @kwen2501 👋 Happy to iterate with you to get a working solution for I'm not experienced with
with torch.autocast(device_type=device_type, enabled=False):
    freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
Isn't |
|
Hi @gante, @kwen2501 and @angelayi (to your question, the short summary is that autocast will dynamically lower the precision to fp16 where it feels it can do so without losing precision. In this case it appears that for long contexts the lower precision could have a negative effect, hence the need to ensure the freqs calculation is always done in fp32). To try to speed up the resolution, I've made a PR that removes the device-type calculations that were being fed to the disabled autocast, and of course removes the autocast itself. When I tested this on Friday, removing the autocast did enable torch.export to function as expected, so this should resolve the issue for us. I'd appreciate any feedback on the PR in case I have missed something! |
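As a rough illustration of why the `freqs` computation is kept in fp32, the sketch below rounds the rotary angles to fp16 with the standard library's half-precision `struct` format and compares the resulting cosines. The names `to_fp16` and `rope_angles`, and the values `dim=128`, `base=10000.0` and `position=8191`, are assumptions chosen for illustration. This emulates the rounding only and does not reproduce what autocast actually does inside PyTorch.

```python
import math
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def rope_angles(position: int, dim: int = 128, base: float = 10000.0) -> list:
    """Rotary angles position * inv_freq[i] for one token position."""
    return [position / (base ** (i / dim)) for i in range(0, dim, 2)]

position = 8191
angles = rope_angles(position)

# The highest-frequency angle equals the position itself; fp16 cannot hold 8191.
print(to_fp16(angles[0]))  # 8192.0

# Worst-case cosine error when the angles are rounded to fp16.
max_err = max(abs(math.cos(a) - math.cos(to_fp16(a))) for a in angles)
print(max_err > 0.5)  # True
```

An angle that is off by a full radian puts the rotation in a completely different place on the unit circle, so the damage is not a small numerical wobble.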
|
Note - PR is failing CI due to this - @gante, would you be able to help review this error? |
|
I'll comment here as well - the primary reason why the context manager was placed was because mixed precision training would cause |
|
Thanks @gante @lessw2020 @danielhanchen! I wonder if the user can divide the autocast-enabled region into two to skip this disabled region? Perhaps we can add a note to the Longer term, maybe it would be worth adding a |
|
Thanks @danielhanchen for the insight into the reasoning behind the autocast with enabled=False here (I suspected there was likely more to the story). I did want to add that we had the inverse of this situation with FSDP: for FSDP mixed precision, if you explicitly set a model dtype to something, especially .bfloat16, and then pass it to FSDP, we respect that. That behaviour, however, generated a big commotion from HF about why we wouldn't secretly auto-override this for mixed precision and keep fp32 weights (because we respect what the user has directed). |
do you mean for ROPE? |
|
"What I'm reading then is that we SHOULDN'T merge this PR until we confirm that pytorch doesn't change the type of an explicit .float() cast when autocast is active. Otherwise, we will get a regression (thank you @danielhanchen for confirming 🤗 )" |
|
Hi all - we have a much better solution now by making some changes to how PP does the model tracing via PT export and with that, it now handles the autocast issue directly so no changes needed here in the transformer code. Thanks @danielhanchen @gante @ArthurZucker @kwen2501 for the details and convo on this. |
|
Thanks for updating, great to know that overall this will be seamlessly fixed for everyone! 🤗 |
### Details:
- *Sin/Cos table generation must run in f32 otherwise it has accuracy issue*
- Reference : huggingface/transformers#29285
### Tickets:
- *CVS-146672*
commit 67fd2eb6d83435b195ef56004d7d9f9c2a728502
Merge: 5f09ab51c0 9432b3d2a5
Author: Ujjayant Kadian <118752727+ujjayant-kadian@users.noreply.github.com>
Date: Tue Aug 6 13:07:36 2024 +0100
Merge branch 'master' into uk/changing-sub-byte-i4-element-order
commit 9432b3d2a577bc27e8008d85002ce57c4b0e3159
Author: Min, Byungil <byungil.min@intel.com>
Date: Tue Aug 6 19:20:02 2024 +0900
[GPU] Bugfix reorder for byfx format (#25782)
+ Reorder returns OOR error while handling byfx from a fused permute
parent
### Details:
- *item1*
- *...*
### Tickets:
- CVS-147330
---------
Signed-off-by: Min, Byung-il <byungil.min@intel.com>
commit 606d909ab8ec130fd7c6a9d2d56a839978903a2f
Author: Bogdan Pereanu <bogdan.pereanu@intel.com>
Date: Tue Aug 6 13:12:32 2024 +0300
[NPU] Disable MCL in case of UD28 (#25903)
### Details:
- *The UD28 Windows driver version doesn't support the
MutableCommandList feature as expected, so this feature is disabled in the plugin
when this driver is used*
### Tickets:
- *EISW-133845*
commit b6447980be06caf6bb6c1592eee4eb6de094218c
Author: Anastasiia Pnevskaia <anastasiia.pnevskaia@intel.com>
Date: Tue Aug 6 10:26:04 2024 +0200
[DOCS] Corrected build guides in docs. (#25922)
### Details:
- Corrected build guides in docs.
### Tickets:
-
commit 265dfad8ebcdae2b17611d833ec8da0f0ddc9bd2
Author: Przemyslaw Wysocki <przemyslaw.wysocki@intel.com>
Date: Tue Aug 6 10:19:41 2024 +0200
Change index precision from `i64` to `i32` in MaxPool14 to MaxPool8 downgrade transformation (#25514)
### Tickets:
- CVS-146277
commit 9eeb7a18d5ae039d1b406cab405ad2083dc5680c
Author: Maciej Smyk <maciejx.smyk@intel.com>
Date: Tue Aug 6 09:38:15 2024 +0200
[DOCS] Dependencies and Building for OpenVINO GenAI article for master (#25908)
Adding information on the OpenVINO GenAI Dependencies and ref-link to
the GenAI building in user docs.
commit cbf4035c257042aec180102d434287c27d9cd2f6
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue Aug 6 11:16:55 2024 +0400
Bump hendrikmuhs/ccache-action from 1.2.13 to 1.2.14 (#25917)
Bumps
[hendrikmuhs/ccache-action](https://github.com/hendrikmuhs/ccache-action)
from 1.2.13 to 1.2.14.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/hendrikmuhs/ccache-action/releases">hendrikmuhs/ccache-action's
releases</a>.</em></p>
<blockquote>
<h2>v1.2.14</h2>
<h2>What's Changed</h2>
<ul>
<li>Add sccache to PATH after installation by <a
href="https://github.com/kendalharland"><code>@kendalharland</code></a>
in <a
href="https://redirect.github.com/hendrikmuhs/ccache-action/pull/204">hendrikmuhs/ccache-action#204</a></li>
<li>Make ccache-action respect environment variables by <a
href="https://github.com/TrentHouliston"><code>@TrentHouliston</code></a>
in <a
href="https://redirect.github.com/hendrikmuhs/ccache-action/pull/217">hendrikmuhs/ccache-action#217</a></li>
<li>updates</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/kendalharland"><code>@kendalharland</code></a>
made their first contribution in <a
href="https://redirect.github.com/hendrikmuhs/ccache-action/pull/204">hendrikmuhs/ccache-action#204</a></li>
<li><a href="https://github.com/cclauss"><code>@cclauss</code></a> made
their first contribution in <a
href="https://redirect.github.com/hendrikmuhs/ccache-action/pull/213">hendrikmuhs/ccache-action#213</a></li>
<li><a
href="https://github.com/TrentHouliston"><code>@TrentHouliston</code></a>
made their first contribution in <a
href="https://redirect.github.com/hendrikmuhs/ccache-action/pull/217">hendrikmuhs/ccache-action#217</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/hendrikmuhs/ccache-action/compare/v1...v1.2.14">https://github.com/hendrikmuhs/ccache-action/compare/v1...v1.2.14</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/hendrikmuhs/ccache-action/commit/ed74d11c0b343532753ecead8a951bb09bb34bc9"><code>ed74d11</code></a>
Bump <code>@types/node</code> from 22.0.0 to 22.1.0 (<a
href="https://redirect.github.com/hendrikmuhs/ccache-action/issues/222">#222</a>)</li>
<li><a
href="https://github.com/hendrikmuhs/ccache-action/commit/a92dd99d2cf20a1db8898b00bb383b234fb1cf15"><code>a92dd99</code></a>
Bump <code>@types/node</code> from 20.14.11 to 22.0.0 (<a
href="https://redirect.github.com/hendrikmuhs/ccache-action/issues/220">#220</a>)</li>
<li><a
href="https://github.com/hendrikmuhs/ccache-action/commit/aa7d29411285c29f578109e54b7a8d8155c2fbb3"><code>aa7d294</code></a>
Bump typescript from 5.5.3 to 5.5.4 (<a
href="https://redirect.github.com/hendrikmuhs/ccache-action/issues/218">#218</a>)</li>
<li><a
href="https://github.com/hendrikmuhs/ccache-action/commit/6f0874030891bf49d844fff92b862568f093dabe"><code>6f08740</code></a>
Make ccache-action respect environment variables (<a
href="https://redirect.github.com/hendrikmuhs/ccache-action/issues/217">#217</a>)</li>
<li><a
href="https://github.com/hendrikmuhs/ccache-action/commit/ed979992cda44142d976add1d5a7d6f39f7e8b67"><code>ed97999</code></a>
Bump <code>@types/node</code> from 20.14.10 to 20.14.11 (<a
href="https://redirect.github.com/hendrikmuhs/ccache-action/issues/216">#216</a>)</li>
<li><a
href="https://github.com/hendrikmuhs/ccache-action/commit/ca1e5062f3378412bbfeb780d1ebe3c2a4913081"><code>ca1e506</code></a>
Bump actions/checkout from 2 to 4 (<a
href="https://redirect.github.com/hendrikmuhs/ccache-action/issues/214">#214</a>)</li>
<li><a
href="https://github.com/hendrikmuhs/ccache-action/commit/069136ab7ab2267ea6624fde73f80d7d472d323e"><code>069136a</code></a>
Bump <code>@types/node</code> from 20.14.9 to 20.14.10 (<a
href="https://redirect.github.com/hendrikmuhs/ccache-action/issues/212">#212</a>)</li>
<li><a
href="https://github.com/hendrikmuhs/ccache-action/commit/3cf745af56c860cc76c89ffd830efec6aef03b56"><code>3cf745a</code></a>
Bump typescript from 5.5.2 to 5.5.3 (<a
href="https://redirect.github.com/hendrikmuhs/ccache-action/issues/211">#211</a>)</li>
<li><a
href="https://github.com/hendrikmuhs/ccache-action/commit/9a0cc152966f2c3f3df86a6e0364da1608924006"><code>9a0cc15</code></a>
Keep GitHub Actions up to date with GitHub's Dependabot (<a
href="https://redirect.github.com/hendrikmuhs/ccache-action/issues/213">#213</a>)</li>
<li><a
href="https://github.com/hendrikmuhs/ccache-action/commit/b7c0e162a73e852cdd80bd368aa77e7801fce009"><code>b7c0e16</code></a>
Bump <code>@types/node</code> from 20.14.8 to 20.14.9 (<a
href="https://redirect.github.com/hendrikmuhs/ccache-action/issues/210">#210</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/hendrikmuhs/ccache-action/compare/c92f40bee50034e84c763e33b317c77adaa81c92...ed74d11c0b343532753ecead8a951bb09bb34bc9">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
commit d9d5ace62609f909cdaac0ac8073d78a1f19607d
Author: Vladimir Paramuzov <vladimir.paramuzov@intel.com>
Date: Tue Aug 6 10:24:43 2024 +0400
[Transformations] Extend MoveEltwiseUpThroughData pass with per channel case (#24401)
### Details:
- Added pass to swap Reshape/Squeeze/Unsqueeze -> Eltwise (per channel)
commit 513c812fcf5049cab4084b7a09e862f7df357880
Author: Gorokhov Dmitriy <dmitry.gorokhov@intel.com>
Date: Tue Aug 6 09:33:29 2024 +0400
[CPU] FullyConnected weights compression: mxfp4 (wei=f4e2m1, scales=f8e8m0) support (#25783)
### Details:
- This PR extends FC weights compression support with mxfp4 (wei=f4e2m1,
scales=f8e8m0) precision
- ISA coverage: avx2, avx512
- oneDNN fork changes:
https://github.com/openvinotoolkit/oneDNN/pull/258
### Tickets:
- [CVS-142986](https://jira.devtools.intel.com/browse/CVS-142986)
### Dependencies:
oneDNN 3.5 migration:
https://github.com/openvinotoolkit/openvino/pull/25153
commit 73e1b94625c277ad89d4a613eef889213a1b856e
Author: Vladimir Paramuzov <vladimir.paramuzov@intel.com>
Date: Tue Aug 6 09:21:09 2024 +0400
[GPU][TRANSFORMATIONS] Disable per pass validation in some cases (#25874)
### Details:
- Disable per pass validation for GPU specific passes and mixed
precision markup to improve model loading time
commit 7fd8b2ed77d4b31cab9556742320a793506f7327
Author: Vladimir Paramuzov <vladimir.paramuzov@intel.com>
Date: Tue Aug 6 09:16:49 2024 +0400
[GPU] Dynamic pipeline host opt (#25886)
### Details:
- Reduce count of copies for layouts/shapes and other complex objects
commit d604f1d8b2a60fa68b704c2a8f81e283c4aa2f0f
Author: Michal Miotk <michal.miotk@intel.com>
Date: Tue Aug 6 00:54:25 2024 +0200
fix for confused input with output in assert error message (#25915)
### Details:
- short fix for message
### Tickets:
- N/A
commit f8d0e8c47c5be32b2e5e44e4449a337fcbc130fb
Author: Andrew Kwangwoong Park <andrew.park@intel.com>
Date: Tue Aug 6 02:52:42 2024 +0900
Revert "[GPU] Avoid crop buffer fusing when dynamic shape and squeeze/unsqueeze reshape mode" (#25895)
### Details:
- This reverts https://github.com/openvinotoolkit/openvino/pull/25700
- As support for Crop->Reshape(Squeeze/Unsqueeze modes) buffer
optimization was added by
https://github.com/openvinotoolkit/openvino/pull/25836
### Tickets:
- 146626
commit 5264c9995f3a41b642a3359155edb719243944a1
Author: Karol Blaszczak <karol.blaszczak@intel.com>
Date: Mon Aug 5 18:41:37 2024 +0200
[DOCS] tiny article name changes (#25910)
commit 3cf27441ff5cd497499bc37d92e55d901e88ca59
Author: Ilya Lavrenov <ilya.lavrenov@intel.com>
Date: Mon Aug 5 19:47:38 2024 +0400
Removed GHA WA for older ONNX versions (#25912)
### Details:
- Removed WA introduced here
https://github.com/openvinotoolkit/openvino/pull/25234 because ONNX
version is updated here
https://github.com/openvinotoolkit/openvino/pull/24242
commit afb194f3747ed56ab524500842cb50281abe41a9
Author: Rinne <AsakusaRinne@gmail.com>
Date: Mon Aug 5 22:33:17 2024 +0800
[JAX FE] Add translation for more operations. (#25292)
### Details:
- *Add the translation for reduce_window_max, reduce_window_sum, rsqrt,
reshape , squeeze, slice, broadcast_in_dim, copy, dot_general and
transpose of JAX frontend*
- *Add corresponding test*
### Tickets:
- *CVS-145575*
- *CVS-145583*
- *CVS-145580*
- *CVS-145574*
- *CVS-145581*
- *CVS-145579*
- *CVS-145582*
- *CVS-145573*
- *CVS-145578*
NOTE: this PR should be merged after #25290
---
@mvafin Could you please help to review this PR?
cc @rkazants
---------
Co-authored-by: Maxim Vafin <maxim.vafin@intel.com>
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
commit c30a0bcf6ba4f1b75412c353cefe63f97f6ee33c
Author: Georgy Krivoruchko <georgy.krivoruchko@intel.com>
Date: Mon Aug 5 18:12:22 2024 +0400
[ONNX] Aligned behavior for ReduceProd-11,13,18 (#25875)
### Details:
- Aligned behavior of ReduceProd operation
### Tickets:
- 143347
commit 10a2e91d2502bc7bc5aa7c2fbcc5b845c7a00975
Author: Aleksandr Voron <aleksandr.voron@intel.com>
Date: Mon Aug 5 15:54:19 2024 +0200
[CPU][ARM] Enable ACL MVN executor for `initAcrossChannels` option in NHWC layout (#25905)
### Details:
- This configuration (initAcrossChannels is true and NHWC is used) was
disabled for ACL executor to enable `yolo_v3_tiny`. The last check shows
this restriction is not required anymore.
### Tickets:
- *ticket-id*
commit 12a5e5a505da2d793bed99efdfdc4bda42be9850
Author: Georgy Krivoruchko <georgy.krivoruchko@intel.com>
Date: Mon Aug 5 17:06:03 2024 +0400
[ONNX] Switched to ONNX 1.16.0 (#24242)
### Details:
- Switched to ONNX 1.16.0
- Removed WA for ONNX 1.15.0
- ONNXRuntime for tests 1.18.1
### Tickets:
- 136748, 138876
---------
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
commit 7eedf84ef21918e84f4488a582645a82d921e507
Author: Luo Cheng <cheng.luo@intel.com>
Date: Mon Aug 5 21:05:08 2024 +0800
[CPU] Add score output for PagedAttention (#25594)
### Details:
- *Add score output for PagedAttention*
- *...*
### Tickets:
- *[146969](https://jira.devtools.intel.com/browse/CVS-146969)*
commit a6413b415ff8cc7cd9eb9cf3cfe96334bd1907e4
Author: Przemyslaw Wysocki <przemyslaw.wysocki@intel.com>
Date: Mon Aug 5 15:00:01 2024 +0200
[PyOV] Replace `std::stringstream` with `std::fstream` in `import_model` (#25724)
### Details:
- The current implementation breaks when the model size is > 2gb
- `std::fstream` does not limit the model size
- Tested in
https://github.com/openvinotoolkit/openvino/blob/master/src/bindings/python/tests/test_runtime/test_compiled_model.py#L57
- The fix has been verified
### TODO:
- Should we simulate > 2gb model case in tests?
### Tickets:
- EISW-130771
commit e35acf91e9a953ee081d0bae355a7e848ef41b86
Author: Attila Csok <attila.csok@intel.com>
Date: Mon Aug 5 15:45:52 2024 +0300
[intel-npu] Adding NPU_TURBO option to plugin (#25646)
### Details:
- Adding npu_turbo option for intel-npu plugin
- updating documentation with turbo and other missing properties
Master backport of
https://github.com/openvinotoolkit/openvino/pull/25603
### Tickets:
- [*ticket-id*](https://jira.devtools.intel.com/browse/CVS-147038)
commit 64c5f67a5aa31b020d295c210b0345bdd74e4dbb
Author: Ilya Lavrenov <ilya.lavrenov@intel.com>
Date: Mon Aug 5 14:08:22 2024 +0400
Fixed compatibility with new version of 'wheel' (#25899)
### Details:
- *item1*
- *...*
### Tickets:
- *ticket-id*
commit c664ca7f288f59722d82e9bfbb994f0c7c1e232e
Author: Xuejun Zhai <xuejun.zhai@intel.com>
Date: Sun Aug 4 17:32:04 2024 +0800
Clean meta plugin tests from CPU/GPU plugin (#24477)
### Details:
- Move BATCH related test out from CPU/GPU func test to BATCH func test
- Move HETERO related test out from CPU/GPU func test to HETERO func
test
- *...*
### Tickets:
- *ticket-id*
---------
Signed-off-by: Zhai, Xuejun <xuejun.zhai@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
commit 59a0f019913681287f82553a07e0b299404de821
Author: Peyara Nando <nandu45@outlook.com>
Date: Sat Aug 3 05:45:30 2024 +0530
Implemented getOutputElementType (#25760)
Implemented Method on c++ side.
Updated typescript definitions.
Created unit tests.
For Issue
[https://github.com/openvinotoolkit/openvino/issues/25406](https://github.com/openvinotoolkit/openvino/issues/25406)
Resolved merge errors
---------
Co-authored-by: Alicja Miloszewska <alicja.miloszewska@intel.com>
commit 5f09ab51c00ed0d207bc02963783efe597dda5de
Author: Kadian <ujjayant.kadian@intel.com>
Date: Fri Aug 2 16:14:42 2024 +0100
Modified comments
commit 0f1ad2b95de9d7985f8db93e99450bb490c260d0
Merge: 99523fc962 ae454eebbd
Author: Kadian <ujjayant.kadian@intel.com>
Date: Fri Aug 2 16:08:41 2024 +0100
Merge branch 'uk/changing-sub-byte-i4-element-order' of github.com:ujjayant-kadian/openvino into uk/changing-sub-byte-i4-element-order
commit 99523fc9624738b9af5fdd1ca58aa301f44d49df
Author: Kadian <ujjayant.kadian@intel.com>
Date: Fri Aug 2 13:06:18 2024 +0100
Added a new pattern in pattern matcher
[CPU] Avoid rounding to zero for Reduce node in quantized models (#25766)
- *If the Reduce node has both input and output precision to be integers
from the original model, then rounding to zero should be done before
converting intermediate floating point value to integer.*
- *However, if such integer precisions are resulted from quantization,
then we should not do such rounding, in order to maintain accuracy.*
- *Add corresponding test cases.*
- *CVS-147352*
Correct clang format issues
Tried to resolve the segmentation fault
Corrected clang format error
Tried to correct segmentation fault
Removed std::move
Using std::move with much more caution
commit ae454eebbdde2d2582cbe43e5a10e62a7ec61d50
Merge: 46b84b994e b2319a5bea
Author: Ujjayant Kadian <118752727+ujjayant-kadian@users.noreply.github.com>
Date: Fri Aug 2 16:04:40 2024 +0100
Merge branch 'openvinotoolkit:master' into uk/changing-sub-byte-i4-element-order
commit d29948c758501bafe807ff0feeed8875574545a6
Author: Roman Kazantsev <roman.kazantsev@intel.com>
Date: Fri Aug 2 19:02:31 2024 +0400
[TF FE][SDL] Fix performance inefficiencies (#25884)
**Details:** Fix performance inefficiencies
**Ticket:** 148599
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
commit 46b84b994e55306f89aa437fd2271b6164e548b1
Author: Kadian <ujjayant.kadian@intel.com>
Date: Fri Aug 2 15:45:34 2024 +0100
Using std::move with much more caution
commit cb2814d2ee5e832b7e8a0809f55185f305133cad
Author: Kadian <ujjayant.kadian@intel.com>
Date: Fri Aug 2 15:43:28 2024 +0100
Removed std::move
commit ae13bed22c9a222611dce04075f3bbe6ac87091e
Author: Kadian <ujjayant.kadian@intel.com>
Date: Fri Aug 2 15:34:40 2024 +0100
Tried to correct segmentation fault
commit a98775ad74bcdf2bcc58820f2373adcaf3d98dff
Author: ujjayant-kadian <ujjayant.kadian@intel.com>
Date: Fri Aug 2 14:31:54 2024 +0000
Corrected clang format error
commit c2ba823ef43f6804b07781af3707220be184f541
Author: Kadian <ujjayant.kadian@intel.com>
Date: Fri Aug 2 15:24:24 2024 +0100
Tried to resolve the segmentation fault
commit a33afe422f0ff9f655dd9f660d35f441e148433e
Author: Sergey Shlyapnikov <sergey.shlyapnikov@intel.com>
Date: Fri Aug 2 18:17:31 2024 +0400
[GPU] Fix Crop->Reshape (Squeeze/Unsqueeze modes) buffer optimization (#25836)
These changes fix a significant accuracy issue (reducing perplexity from
120 000 to 17) for Llama models with precalculated constant sin/cos
values. However, there is still a problem with sin/cos representation in
FP16 precision, which will be addressed in a separate PR.
### Details:
- Fixed Crop->Reshape (Squeeze/Unsqueeze modes) buffer optimization
- Update rope_ref kernel to support dynamic paddings for cos/sin inputs
- Fix propagate_padding() function and update shape infer tests
### Tickets:
- [CVS-148220](https://jira.devtools.intel.com/browse/CVS-148220),
[CVS-146283](https://jira.devtools.intel.com/browse/CVS-146283)
commit b2319a5bea85fd057d1e3ea102e83d8d6af6c6db
Author: Alexandra Sidorova <alexandra.sidorova@intel.com>
Date: Fri Aug 2 18:09:25 2024 +0400
[Snippets][CPU] Added Brgemm FP32 blocking support by dynamic K, N dimensions (#25745)
### Details:
- *Added update support of `K` and `N` dimensions for Brgemm block in
`BrgemmKernelExecutor::update_config`*
### Tickets:
- *147852*
### Prerequisites:
- [x] https://github.com/openvinotoolkit/openvino/pull/25378
commit b625fcbfbf95be80d1fe57f471a02b8fd31d94ef
Author: Roman Kazantsev <roman.kazantsev@intel.com>
Date: Fri Aug 2 17:44:16 2024 +0400
[TF FE] Extend UnsortedSegmentSum for ND indices (#25877)
**Details:** This extension is needed for some customer model
**Ticket:** 148750
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
commit 64072f63e7afc66a3e7a49f2bc00d3ae0f695b02
Author: Maxim Vafin <maxim.vafin@intel.com>
Date: Fri Aug 2 15:38:01 2024 +0200
[PT FE] Update GHA tests (#25868)
### Details:
- *item1*
- *...*
### Tickets:
- *ticket-id*
commit ab1e8dec8341f7ada47d959de895056ddb93ff52
Author: Karol Blaszczak <karol.blaszczak@intel.com>
Date: Fri Aug 2 15:18:03 2024 +0200
[DOCS] ovms llm data master (#25880)
commit a9f670a1073b3ce7660b3dac133bed4b45e518d5
Author: mei, yang <yang.mei@intel.com>
Date: Fri Aug 2 20:55:52 2024 +0800
[CPU] Align cpu execution order before/after ResolveComplexInplaceConflicts() (#24937)
### Details:
- *Align cpu execution order before/after
ResolveComplexInplaceConflicts()*
- *Keep order information of Results and Parameters when dump CPU graph
to ov::Model*
- *Let MemoryInput always execute first to avoid potential issue because
it will update its sibling MemoryOutput memory after execution*
### Tickets:
- *CVS-134638*
- *CVS-148497*
### Description:
- The CPU execution order of some nodes may change after
https://github.com/openvinotoolkit/openvino/blob/2024.2.0.dev20240513/src/plugins/intel_cpu/src/graph.cpp#L285.
Sometimes that may give ResolveComplexInplaceConflicts() incorrect
execution order information, which may lead
ResolveComplexInplaceConflicts() to the wrong conclusion about which edge
memory should be shared. So this PR adds SortTopologically() right before
ResolveComplexInplaceConflicts() so that the execution order does not change much
before/after ResolveComplexInplaceConflicts()
- *The node order of CPU graph topology is not stable. For example in
below graph*

*If Parameter0 comes before Parameter1 in graphNodes, the original
SortTopologically() will first recurse down from Parameter0. So
in the final sorted graphNodes, Parameter0 will be sorted after Parameter1.
Then in the second round of SortTopologically(), it will first recurse from
Parameter1, and in the final sorted graphNodes, Parameter0 will be sorted
before Parameter1 again. As a result, ReduceProd is sometimes executed
before ScatterNDUpdate and sometimes after
ScatterNDUpdate, which misleads ResolveComplexInplaceConflicts()*
- *MemoryInput will update its sibling MemoryOutput memory after
execution. To avoid memory changes during the execution of other nodes,
always let MemoryInput execute first*
commit 2e95269d14cfb7c865f2fd5e2329d6c9523469a4
Author: ujjayant-kadian <ujjayant.kadian@intel.com>
Date: Fri Aug 2 12:35:17 2024 +0000
Correct clang format issues
commit 63e9e38413e223e645029baf18359bf5df21b076
Merge: 6bc933a4dd ea6731f8a7
Author: Kadian <ujjayant.kadian@intel.com>
Date: Fri Aug 2 13:07:13 2024 +0100
Merge branch 'uk/changing-sub-byte-i4-element-order' of github.com:ujjayant-kadian/openvino into uk/changing-sub-byte-i4-element-order
commit da2a4e770a163af6419e0d9e46594e58dbc8ef64
Author: Aleksandr Voron <aleksandr.voron@intel.com>
Date: Fri Aug 2 13:33:21 2024 +0200
[CPU][ARM] Added debug logs to ACL Interpolate executor (#25866)
### Details:
- Added debug logs to ACL Interpolate executor to debug easier
- Remove redundant check (since it duplicates the check
https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_cpu/src/nodes/executors/acl/acl_interpolate.cpp#L135_L136)
### Tickets:
- *ticket-id*
commit 34d41aeb078eca2c0d55c556011bea1ba7729fdd
Author: Maksim Doronin <maksim.doronin@intel.com>
Date: Fri Aug 2 11:40:44 2024 +0100
Add folder parameter for reference blobs in SIT (#25651)
### Details:
- Adding a new optional parameter for SIT to specify a directory with
reference outputs. So instead of running NetsVal-CalcRef on CPU or
TEMPLATE we can re-use predefined reference outputs. However, their
names must comply with the existing name convention
### Tickets:
- E-131878
commit 3c5a52e30f363cd25122ccd5b8bc081d717e8e03
Author: Damian Kurek <damian.kurek@intel.com>
Date: Fri Aug 2 10:55:38 2024 +0200
[GPU] Use parallel sum reduction in MVN BFYX OPT kernel (#25840)
Optimize MVN BFYX OPT kernel
### Details:
- Use parallel sum reduction in order to improve efficiency
### Tickets:
- 148585
commit ab613e115267feece039e409dea6fb8e10371746
Author: Maxim Vafin <maxim.vafin@intel.com>
Date: Fri Aug 2 10:55:20 2024 +0200
[PT FE] Move sending telemetry to stage after conversion is done (#25855)
### Details:
- *Previously telemetry was sent every time a `FrameworkNode` was created.
Now we send it only when a `FrameworkNode` exists in the model and only
once per op type*
### Tickets:
- *ticket-id*
commit 7bc7fb0cb64d09fd258724f6e3c5935f162cd129
Author: Georgy Krivoruchko <georgy.krivoruchko@intel.com>
Date: Fri Aug 2 14:24:47 2024 +0400
Changed dependency types-setuptools (#25872)
### Details:
- Solution verification
### Tickets:
- N/A
commit a9c004798b56eb5e74502a44ef320ff5333d23dd
Author: Wilson Seok <wilson.seok@intel.com>
Date: Thu Aug 1 13:39:13 2024 -0700
[GPU] Add crop check in optimize check of buffer fusing (#25850)
### Details:
- Add crop check in optimize check of buffer fusing to pass through
simple dynamic shape crop case
### Tickets:
- Follow up PR25737
commit 7b11ba4c1bec8f7e2063f1c43b0c10a314d724bd
Author: Alexey Smirnov <alexey.smirnov@intel.com>
Date: Thu Aug 1 20:32:47 2024 +0100
[NPUW] Introduce new passes to online partitioning (#25679)
Config (internal/extended):
```
"NPU_COMPILATION_MODE_PARAMS" : "compute-layers-with-higher-precision=Sqrt,Power,ReduceMean,Add_RMSNorm",
"NPU_USE_NPUW" : "YES",
"NPUW_FOLD" : "YES",
"NPUW_DCOFF_TYPE" : "f16",
"NPUW_DCOFF_SCALE" : "YES",
"NPUW_ONLINE_ISOLATE" : "P:DQMatMulGQ/compute,P:DQMatMulCW/compute,P:RMSNorm/compute",
"NPUW_ONLINE_NOFOLD" : "compute"
```
Config (user/basic):
```
"NPU_COMPILATION_MODE_PARAMS" : "compute-layers-with-higher-precision=Sqrt,Power,ReduceMean,Add_RMSNorm",
"NPU_USE_NPUW" : "YES",
"NPUW_FOLD" : "YES",
"NPUW_DCOFF_TYPE" : "f16",
"NPUW_DCOFF_SCALE" : "YES",
"NPUW_ONLINE_PIPELINE" : "COMPUTE"
```
---------
Co-authored-by: Dmitry Matveev <dmitry.matveev@intel.com>
commit 3b4e747c8687d0e11501b63f3e425a335e8c9641
Author: Ilya Lavrenov <ilya.lavrenov@intel.com>
Date: Thu Aug 1 21:23:30 2024 +0400
Allow to override CPACK_ARCHIVE_COMPONENT_INSTALL (#25867)
### Details:
- To override by external cmake options
- Useful for GenAI to create a single archive
commit 605b13fbee58b48cf27cf0e64ac154148dfd8b39
Author: Alicja Miloszewska <alicja.miloszewska@intel.com>
Date: Thu Aug 1 08:12:30 2024 -0700
[PyOV] Add more ov.Model constructors (#25635)
### Details:
- Accept sinks as output ports in addition to generic nodes and op class
instances in `ov.Model` ctors
- Add test
Added support for:
- `Model(results: List[openvino._pyopenvino.op.Result], sinks:
List[ov::Output<ov::Node>], parameters:
List[openvino._pyopenvino.op.Parameter], name: str = '')`
- `Model(results: List[ov::Output<ov::Node>], sinks:
List[ov::Output<ov::Node>], parameters:
List[openvino._pyopenvino.op.Parameter], name: str = '')`
### Tickets:
- *[CVS-131037](https://jira.devtools.intel.com/browse/CVS-131037)*
---------
Co-authored-by: Anastasia Kuporosova <anastasia.kuporosova@intel.com>
commit 754f48a0d96d0451fd7c7cf4a68019dfafd20c5e
Author: Pawel Raasz <pawel.raasz@intel.com>
Date: Thu Aug 1 17:10:38 2024 +0200
[core] Unify axis normalization/validation utils (#25614)
### Details:
- Split the function into smaller, simpler utils responsible for either validation or
normalization, plus a more complex one that does both.
- Unify the functions parameters order
- Remove redundant check of rank
- Produce smaller binary size
- Fix Coverity issue `Improper use of negative value`.
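
The split described above (validate, normalize, or both) can be illustrated with a minimal Python sketch. This is not the OpenVINO C++ code; the function names `is_axis_valid`, `normalize_axis` and `try_normalize_axis` are hypothetical, and the only assumption is the usual convention that a valid axis for rank `r` lies in `[-r, r - 1]`.

```python
def is_axis_valid(axis: int, rank: int) -> bool:
    """Validation only: True if axis lies in the closed range [-rank, rank - 1]."""
    return -rank <= axis < rank


def normalize_axis(axis: int, rank: int) -> int:
    """Normalization only: map an already-valid, possibly negative axis to [0, rank - 1]."""
    return axis + rank if axis < 0 else axis


def try_normalize_axis(axis: int, rank: int) -> int:
    """Validate first, then normalize; raises ValueError for an out-of-range axis."""
    if not is_axis_valid(axis, rank):
        raise ValueError(f"axis {axis} is out of range [{-rank}, {rank - 1}]")
    return normalize_axis(axis, rank)
```

Keeping validation separate means a negative value is never used as an index before it has been checked, which is the kind of problem the Coverity item above refers to.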
### Tickets:
- CVS-136544
commit 2e399de62eed4ab212e36032380e6972921b5cd9
Author: Alexandra Sidorova <alexandra.sidorova@intel.com>
Date: Thu Aug 1 18:26:25 2024 +0400
[Tests] Commented out debug prints in input range generation (#25848)
### Details:
- *Commented out debug prints in input range generation in test
infrastructure to avoid large outputs during test executions:*

### Tickets:
- *N/A*
commit 2f8c265b6cb9b078757b71b0a81d6b95bfd4bcb8
Author: Maciej Smyk <maciejx.smyk@intel.com>
Date: Thu Aug 1 16:14:30 2024 +0200
[DOCS] CODEOWNER update for master (#25863)
JIRA: 148360
Update of documentation paths for codeowner groups.
commit 81e7b21e6bec757398fdb4074e085799ee5c795c
Author: Andrei Kashchikhin <andrey.kashchikhin@intel.com>
Date: Thu Aug 1 15:06:59 2024 +0100
[CI] [GHA] Get VCPKG version from repository (#25862)
### Tickets:
- *132496*
commit 504873014ccc800005504841d9819ccf04abc312
Author: Prakash <qxprakash@gmail.com>
Date: Thu Aug 1 17:57:11 2024 +0530
[OV JS] Add vision-background-removal sample notebook (#25714)
### Details:
- added vision-background-removal notebook
- added comments and formatting
### Things Remaining:
- adding the sample in the readme
- adding the weights download once the unet model ir gets uploaded
@vishniakov-nikolai @almilosz please give feedback
With Regards
Prakash
commit fb4e2d3e832d488f94012cc5e4cde1a6d4c4bf44
Author: Vishniakov Nikolai <nikolai.vishniakov@intel.com>
Date: Thu Aug 1 14:26:31 2024 +0200
[OVJS] Update openvino-node binaries to 2024.3 in master (#25823)
### Details:
- update openvino-node package version to 2024.3.0 in master branch
commit 7e16d63b042371655f75869890a770aa9c01e703
Author: Andrei Kashchikhin <andrey.kashchikhin@intel.com>
Date: Thu Aug 1 12:55:11 2024 +0100
[CI] [GHA] Gather statistics on newly added Ubuntu workflows (#25856)
New workflows were introduced in
https://github.com/openvinotoolkit/openvino/pull/25234 but were not
added to the workflow that gathers statistics.
### Tickets:
- *144917*
commit 18e775ff8d7c56e0ba3bfbdb6c94494eddb2d4ce
Author: Aleksandr Voron <aleksandr.voron@intel.com>
Date: Thu Aug 1 13:35:38 2024 +0200
[CPU][ARM] MLAS transpose executor deprioritised (#25854)
### Details:
- The latest performance reports on Ampere show that the ACL transpose executor
provides better performance than the MLAS Transpose executor (details
are in the ticket). Therefore, MLAS Transpose executor priority has been
decreased.
- Redundant check has been deleted in ACL Transpose executor.
### Tickets:
- CVS-148625
commit a0062533f09fc2362004cb7c179ca88d6a4549cd
Author: Ilya Lavrenov <ilya.lavrenov@intel.com>
Date: Thu Aug 1 16:59:24 2024 +0400
Added version for OpenVINO developer package local version (#25859)
### Details:
- To allow to select developer package of specific version
- Required for GenAI build as part of OpenVINO extra modules
commit eda2f7f40598cce2f970ea635454546844a801ba
Author: Zhang Yi <yi3.zhang@intel.com>
Date: Thu Aug 1 19:27:09 2024 +0800
[Core][CPU]markup rope's sin/cos generation with f32 (#25662)
### Details:
- *Sin/Cos table generation must run in f32; otherwise it has accuracy
issues*
- Reference : https://github.com/huggingface/transformers/pull/29285
### Tickets:
- *CVS-146672*
commit 45b4737e706d0b06f5dd5c4e513fc181ddf4c3ba
Author: Karol Blaszczak <karol.blaszczak@intel.com>
Date: Thu Aug 1 13:06:49 2024 +0200
[DOCS] supportedmodels table fix 24.3 (#25860)
port: https://github.com/openvinotoolkit/openvino/pull/25818
commit 546daf2959928457116fcb807337a511da37c8d9
Author: M <mortezaho.1376@gmail.com>
Date: Thu Aug 1 03:26:00 2024 -0700
[GSOC][CPU][ARM] Add NEON implementation for attention softmax (#25616)
### Details:
- This PR aims to add NEON implementation for attention softmax
commit 7617b37f047b29c67e5010bc54b40ed6de858d76
Author: Karol Blaszczak <karol.blaszczak@intel.com>
Date: Thu Aug 1 11:51:53 2024 +0200
[DOCS] add benchmark results for phi (#25838) (#25851)
port: https://github.com/openvinotoolkit/openvino/pull/25838
Co-authored-by: Michael Frank Hansen <michael.f.hansen@intel.com>
commit 508795f44e301d5f848a212dbfc1257d8552a09b
Author: Prakash <qxprakash@gmail.com>
Date: Thu Aug 1 15:03:25 2024 +0530
[OV JS] Add vision-background-removal sample script (#25698)
### Details:
- added script code and added the unet model weights inside the
directory -- ```/openvino/samples/js/node/assets/models```
@vishniakov-nikolai can you please upload it
- focused on the implementation and formatting
- output images for now will be saved in the same directory; I will
change it later as per your feedback
- @vishniakov-nikolai I am a bit doubtful about my naming convention so
let me know if I need to modify any names
### Things remaining
- [x] Proper comments remaining
- [x] Bit of refactoring
- [x] Readme
Please provide Feedback @vishniakov-nikolai @almilosz
With Regards
Prakash
commit dc3eaf0a2b816fc32a59e79455bce33ec54f535c
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Thu Aug 1 07:26:11 2024 +0000
Bump actions/upload-artifact from 4.3.3 to 4.3.4 (#25846)
Bumps
[actions/upload-artifact](https://github.com/actions/upload-artifact)
from 4.3.3 to 4.3.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/upload-artifact/releases">actions/upload-artifact's
releases</a>.</em></p>
<blockquote>
<h2>v4.3.4</h2>
<h2>What's Changed</h2>
<ul>
<li>Update <code>@actions/artifact</code> version, bump dependencies by
<a href="https://github.com/robherley"><code>@robherley</code></a> in
<a
href="https://redirect.github.com/actions/upload-artifact/pull/584">actions/upload-artifact#584</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/upload-artifact/compare/v4.3.3...v4.3.4">https://github.com/actions/upload-artifact/compare/v4.3.3...v4.3.4</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/actions/upload-artifact/commit/0b2256b8c012f0828dc542b3febcab082c67f72b"><code>0b2256b</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/upload-artifact/issues/584">#584</a>
from actions/robherley/bump-pkgs</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/488dcefb9bf01619ac19bad29c5c5409a1e4dd4c"><code>488dcef</code></a>
licensed cache</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/04c51f57662651dd3333286989e2db1111c0fd07"><code>04c51f5</code></a>
ncc</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/32a9e276a8f8ac18b4b2dce8213ed340ed4e5ed8"><code>32a9e27</code></a>
bump <code>@actions/artifact</code> and npm audit</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/552bf3722c16e81001aea7db72d8cedf64eb5f68"><code>552bf37</code></a>
new version</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/79616d2ded92999fceefea2ca2e4bdf6101fa919"><code>79616d2</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/upload-artifact/issues/565">#565</a>
from actions/eggyhead/use-artifact-v2.1.6</li>
<li>See full diff in <a
href="https://github.com/actions/upload-artifact/compare/v4.3.3...0b2256b8c012f0828dc542b3febcab082c67f72b">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
commit fa949478e149f17cce514ebd0d019e8766ef249d
Author: Karol Blaszczak <karol.blaszczak@intel.com>
Date: Thu Aug 1 09:04:13 2024 +0200
[DOCS] rn fixes and model table (#25835)
commit ba681ed72d7e30b2fe94e1cfc5a950a0bcf9bb54
Author: Wilson Seok <wilson.seok@intel.com>
Date: Wed Jul 31 20:53:18 2024 -0700
[GPU] Rollback while-loop structure for 2nd stage of optimize all crops (#25737)
### Details:
- Rollback while-loop structure for 2nd stage of optimize all crops
because it has regression for reshape case which has padding.
### Tickets:
- 146653
commit 8cfd586e6128055b600e1abe9dcce263071dec7d
Author: Eddy Kim <eddy.kim@intel.com>
Date: Thu Aug 1 10:05:32 2024 +0900
[GPU] group_normalization for bfzyx (#25753)
### Details:
- This PR updates the `group_normalization_bfyx` kernel to support bfzyx
format.
- Additionally, this PR fixes the output feature calculation logic of
the group_norm_fsv16 kernel and a model caching related logic for
dynamic model.
### Tickets:
- 147841
commit 13b3e4703e32053797099256849b78ebfef6d49c
Author: Roman Kazantsev <roman.kazantsev@intel.com>
Date: Thu Aug 1 01:44:49 2024 +0400
[TF FE] Stabilize Bitwise layer tests on all platforms and fix u16 bug (#25843)
**Details:** Fix u16 bug "Tensor data with element type u16, is not
representable as pointer to i32"
**Ticket:** 122716
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
commit d2ab797a0fff1f95ec9ea39e444798dbba499cf6
Author: Ilya Lavrenov <ilya.lavrenov@intel.com>
Date: Wed Jul 31 23:22:43 2024 +0400
Fixed compilation with clang and libc++ (#25813)
### Tickets:
- Closes https://github.com/openvinotoolkit/openvino/issues/25420
commit b26c533421b1ca3f3254df1de14300dbe928405b
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Wed Jul 31 21:01:11 2024 +0200
Update setuptools requirement from <72,>=65.6.1 to >=65.6.1,<73 in /src/bindings/python (#25792)
Updates the requirements on
[setuptools](https://github.com/pypa/setuptools) to permit the latest
version.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pypa/setuptools/blob/main/NEWS.rst">setuptools's
changelog</a>.</em></p>
<blockquote>
<h1>v72.1.0</h1>
<h2>Features</h2>
<ul>
<li>Restore the tests command and deprecate access to the module. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4519">#4519</a>)
(<a
href="https://redirect.github.com/pypa/setuptools/issues/4520">#4520</a>)</li>
</ul>
<h1>v72.0.0</h1>
<h2>Deprecations and Removals</h2>
<ul>
<li>The test command has been removed. Users relying on 'setup.py test'
will need to migrate to another test runner or pin setuptools before
this version. (<a
href="https://redirect.github.com/pypa/setuptools/issues/931">#931</a>)</li>
</ul>
<h1>v71.1.0</h1>
<h2>Features</h2>
<ul>
<li>
<p>Added return types to typed public functions -- by
:user:<code>Avasam</code></p>
<p>Marked <code>pkg_resources</code> as <code>py.typed</code> -- by
:user:<code>Avasam</code> (<a
href="https://redirect.github.com/pypa/setuptools/issues/4409">#4409</a>)</p>
</li>
</ul>
<h2>Misc</h2>
<ul>
<li><a
href="https://redirect.github.com/pypa/setuptools/issues/4492">#4492</a></li>
</ul>
<h1>v71.0.4</h1>
<h2>Bugfixes</h2>
<ul>
<li>Removed lingering unused code around Distribution._patched_dist. (<a
href="https://redirect.github.com/pypa/setuptools/issues/4489">#4489</a>)</li>
</ul>
<h1>v71.0.3</h1>
<h2>Bugfixes</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/pypa/setuptools/commit/441799f8b45a1a01c608db49333403db1b0d7100"><code>441799f</code></a>
Bump version: 72.0.0 → 72.1.0</li>
<li><a
href="https://github.com/pypa/setuptools/commit/59aff448e79415ee3e491a8426553b373d7914e5"><code>59aff44</code></a>
Merge pull request <a
href="https://redirect.github.com/pypa/setuptools/issues/4522">#4522</a>
from pypa/feature/graceful-drop-tests</li>
<li><a
href="https://github.com/pypa/setuptools/commit/c437aaa8d5b969a9fe8c8147463bfcb85b31ab26"><code>c437aaa</code></a>
Restore the tests command and deprecate access to the module.</li>
<li><a
href="https://github.com/pypa/setuptools/commit/a6726b95f7a50dc5945e012050f00450c883fdcd"><code>a6726b9</code></a>
Add celery and requests to the packages that test integration. Ref <a
href="https://redirect.github.com/pypa/setuptools/issues/4520">#4520</a></li>
<li><a
href="https://github.com/pypa/setuptools/commit/5e1b3c414779317bc3e105d9bae82ce70c22dbf9"><code>5e1b3c4</code></a>
Bump version: 71.1.0 → 72.0.0</li>
<li><a
href="https://github.com/pypa/setuptools/commit/4c0b9f3ee6ee47c597572655567f215c08c90137"><code>4c0b9f3</code></a>
Merge pull request <a
href="https://redirect.github.com/pypa/setuptools/issues/4458">#4458</a>
from pypa/debt/remove-test-command</li>
<li><a
href="https://github.com/pypa/setuptools/commit/be8e3a09812f0a3717045098ac6ce7b52fc7d202"><code>be8e3a0</code></a>
Merge pull request <a
href="https://redirect.github.com/pypa/setuptools/issues/4507">#4507</a>
from pypa/docs/4483-install-core-extra</li>
<li><a
href="https://github.com/pypa/setuptools/commit/99d2c722ca5d58ef1360ed86a3252cc16bd84dfd"><code>99d2c72</code></a>
Add documentation clarifying how to reliably install setuptools with its
depe...</li>
<li><a
href="https://github.com/pypa/setuptools/commit/63c89f93d6d43ff96ce5f7f5a862395f924905d0"><code>63c89f9</code></a>
👹 Feed the hobgoblins (delint).</li>
<li><a
href="https://github.com/pypa/setuptools/commit/c405ac1bf29b945db9af7ba9b0dd77e4d871f72a"><code>c405ac1</code></a>
Merge branch 'main' into debt/remove-test-command</li>
<li>Additional commits viewable in <a
href="https://github.com/pypa/setuptools/compare/v65.6.1...v72.1.0">compare
view</a></li>
</ul>
</details>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
Co-authored-by: Anastasia Kuporosova <anastasia.kuporosova@intel.com>
commit a60140ef5c60f60304ad2a70ebff0f80f97cd51c
Author: Dmitry Matveev <dmitry.matveev@intel.com>
Date: Wed Jul 31 19:16:29 2024 +0100
Add NPUW to labeler (#25839)
### Details:
- Mark changes under "src/plugins/intel_npu/src/plugin/npuw" with NPUW
label
### Tickets:
- n/a
commit 3a9464dc34900b8ee11249f6f56f7a3636a796c8
Author: Vladislav Golubev <vladislav.golubev@intel.com>
Date: Wed Jul 31 20:01:30 2024 +0200
[Snippets] Support Brgemm with transposed_b via BrgemmCopyB (#24932)
### Details:
- *Support FP32/BF16/I8 matmuls with transpose_b=true via BrgemmCopyB*
- *BrgemmCopyB emitter: handle tail iteration by N before the main body*
- *Remove workaround on LDB and N dim rounding in brgemm emitters and
related buffers*
### Tickets:
- *CVS-114487*
## TODO:
- [ ] BufferAllocation test for FP32 brgemm with repacking
- [ ] SetBrgemmCopyBBuffersShape tests
- [ ] MHA with transpose B for low precisions (FP32 already exists)
- [ ] FuseTransposeBrgemm tests
commit f48b30aab7ae2bb05c9f3709f9398eefe17ff66f
Author: Andrei Kashchikhin <andrey.kashchikhin@intel.com>
Date: Wed Jul 31 18:39:31 2024 +0100
[CI] [GHA] Introduce additional Ubuntu versions via separate workflows (#25234)
### Details:
- This is a sister PR to #25202, the idea is the same: test more Linux
flavours. This PR adds Ubuntu 22/24 as separate workflows instead of a
matrix used in #25202.
- The approach with separate workflows seems better as it does not
require unique names for artefacts for matrix jobs and dependent jobs
thus making it easier to write and maintain w/o magic strings.
### Tickets:
- *144917*
commit 161fce5d380e6ab3bdf0dcc6109ea904f11672bd
Author: Zlobin Vladimir <vladimir.zlobin@intel.com>
Date: Wed Jul 31 20:01:50 2024 +0400
Update open-model_zoo submodule (#25826)
commit 25455a0dd97d9c724522dab43f2a019e2a6643d0
Author: Ujjayant Kadian <118752727+ujjayant-kadian@users.noreply.github.com>
Date: Wed Jul 31 16:28:45 2024 +0100
NPUW: Change the sub-byte (i4) element order in the unpack procedure to match OpenVINO 2024.0 (#25827)
### Details:
In the latest versions of OpenVINO the sub-byte order is defined as
[1,0]
meaning that first (MSB) 4 bits of an 8-bit vector form 1st element, and
the last (LSB) 4 bits of an 8-bit vector form 0th element.
Our unpack procedures for i4 were aligned with the older representation,
where sub-byte order was defined as
[0,1]
meaning that first (MSB) 4 bits of an 8-bit vector form 0th element, and
the last (LSB) 4 bits were the 1st element.
**Updated these unpack functions to use this new order.**
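
The two nibble orders described above can be compared with a small Python sketch. It is only an illustration, not the NPUW unpack code: the names `sign_extend_i4`, `unpack_i4` and the `low_nibble_first` flag are hypothetical, and it assumes i4 values are stored in two's complement.

```python
from typing import List


def sign_extend_i4(nibble: int) -> int:
    """Interpret a 4-bit value (0..15) as a signed two's-complement i4 (-8..7)."""
    return nibble - 16 if nibble & 0x8 else nibble


def unpack_i4(packed: bytes, low_nibble_first: bool = True) -> List[int]:
    """Unpack two i4 elements per byte.

    low_nibble_first=True  models the newer [1,0] order: LSB nibble is element 0.
    low_nibble_first=False models the older [0,1] order: MSB nibble is element 0.
    """
    out: List[int] = []
    for byte in packed:
        lo = sign_extend_i4(byte & 0x0F)         # last (LSB) 4 bits
        hi = sign_extend_i4((byte >> 4) & 0x0F)  # first (MSB) 4 bits
        out.extend((lo, hi) if low_nibble_first else (hi, lo))
    return out
```

For the single byte `0x2F`, the newer order yields `[-1, 2]` while the older order yields `[2, -1]`, so mixing the two silently swaps every pair of weights.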
### Tickets:
- *121052*
commit 3e058b90a891fee9e707dd9c2859492fa5166f71
Author: Roman Lyamin <Roman.Lyamin@intel.com>
Date: Wed Jul 31 18:45:15 2024 +0400
[GPU] Fix lws calculation for reorder_kernel_bfyx_to_blocked_format kernel (#25830)
### Tickets:
- *[146165](https://jira.devtools.intel.com/browse/CVS-146165)*
commit a5d82f2ebf15bb11b452a4027c6b7ae54ca2951c
Author: Sebastian Golebiewski <sebastianx.golebiewski@intel.com>
Date: Wed Jul 31 15:04:21 2024 +0200
[DOCS] Updating Edit Button for articles for master (#25832)
Porting: https://github.com/openvinotoolkit/openvino/pull/25831
commit 98956aa41354f0402bc7e84ad993efef21cb8cf8
Author: Alexandra Sidorova <alexandra.sidorova@intel.com>
Date: Wed Jul 31 16:54:52 2024 +0400
[CPU][RISCV64] Fixed onednn build for RVV case (#24151)
### Details:
- *Missed include `primitive.hpp` in RVV pooling implementation*
- *oneDNN PR: https://github.com/openvinotoolkit/oneDNN/pull/259*
- *It's not seen in CI since OV is built with default
`-march=rv64imafdc` - without vector intrinsic support. Need to build
with RVV support (`-march=rv64gcv0p7`)*
### Tickets:
- *N/A*
commit 10620e9fd68cbfb2f6ae2a1298e6af8425367bfe
Author: Sun Xiaoxia <xiaoxia.sun@intel.com>
Date: Wed Jul 31 19:54:29 2024 +0800
Fix executor memory leak when "-nstreams 0" (#25778)
### Details:
- *create executor config when streams=0*
### Tickets:
- *146686*
commit cae739b96354aff83945767d2fad094e03ebebce
Author: Edward Shogulin <edward.shogulin@intel.com>
Date: Wed Jul 31 12:28:41 2024 +0100
[LPT] Dequantization precision reusage (#25668)
### Details:
- *NNCF quantized fp16 model on GPU support*
### Tickets:
- *CVS-126300*
commit 3e49c22ff76f55304ea2bb1a832fce8b2a04ea69
Author: Alexandra Sidorova <alexandra.sidorova@intel.com>
Date: Wed Jul 31 15:24:23 2024 +0400
[Snippets] Added auto sorting of LoopPorts (#25623)
### Details:
- *Added support of expression enumeration - new attribute `m_exec_num`
of `Expression`. Calculated as `exec_num_left + (exec_num_right -
exec_num_left) / 2`. Now we can figure out which expression is executed
earlier than another using `m_exec_num O(1)` instead of `find(begin(),
end(), smth) == end() O(n)`*
- *Refactored LoopInfo interface: united all `update` and `replace` into
one `replace_with_new_ports`.*
- *Added auto sorting of ports in LoopInfo: after port replacing, new
expression/node insertion using helpers - loop ports are automatically
reordered by expression execution numbers*
- *Removed previous workarounds with `GetTopologicalOrder` from
tokenization pass*
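
The midpoint numbering described above can be sketched in a few lines of Python. This is a simplified illustration, not the Snippets implementation: the `Expression` class, `insert_between` and `is_executed_before` are hypothetical names, the execution number is assumed to be a float, and what happens once the gap between two neighbours is exhausted is not covered because the source does not describe it.

```python
from typing import List


class Expression:
    """A stand-in for an expression with an execution number."""

    def __init__(self, name: str, exec_num: float):
        self.name = name
        self.exec_num = exec_num


def exec_num_between(left: float, right: float) -> float:
    """Formula from the description: exec_num_left + (exec_num_right - exec_num_left) / 2."""
    return left + (right - left) / 2


def insert_between(order: List[Expression], index: int, name: str) -> Expression:
    """Insert a new expression between order[index] and order[index + 1]."""
    left, right = order[index], order[index + 1]
    new_expr = Expression(name, exec_num_between(left.exec_num, right.exec_num))
    order.insert(index + 1, new_expr)
    return new_expr


def is_executed_before(a: Expression, b: Expression) -> bool:
    """O(1) ordering check, replacing a linear search through the list."""
    return a.exec_num < b.exec_num


def sort_ports(ports: List[Expression]) -> List[Expression]:
    """Reorder ports by the execution number of their expressions."""
    return sorted(ports, key=lambda e: e.exec_num)
```

Because only the new expression receives a number, existing expressions keep theirs and no renumbering pass is needed on a plain insertion.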
### Tickets:
- *113536*
- *142990*
- *137819*
commit 89b49c10ca719505712b53cf44370dbdb3782fbc
Author: Karol Blaszczak <karol.blaszczak@intel.com>
Date: Wed Jul 31 13:12:50 2024 +0200
[DOCS] 24.3 archives and final touches (#25829)
port: https://github.com/openvinotoolkit/openvino/pull/25828
commit f0d7cd8c22e2a994a4371cc5e15d6be33c9e6785
Author: Sebastian Golebiewski <sebastianx.golebiewski@intel.com>
Date: Wed Jul 31 13:05:07 2024 +0200
[DOCS] Updating Tool Ecosystem article (#25824)
Adding information on OpenVINO-based AI projects.
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
commit ea6731f8a75b907eea1ee9317c2cd89a2d54e4c4
Merge: 70b8346d72 3c713d4aec
Author: Ujjayant Kadian <118752727+ujjayant-kadian@users.noreply.github.com>
Date: Wed Jul 31 11:56:25 2024 +0100
Merge branch 'master' into uk/changing-sub-byte-i4-element-order
commit 11c01898f507c1abb7d64d70f89ffcc281081373
Author: Roman Kazantsev <roman.kazantsev@intel.com>
Date: Wed Jul 31 14:19:01 2024 +0400
[TF FE] Support TensorListConcatV2 operation for multiple undefined dims in element_shape (#25814)
**Details:** Support TensorListConcatV2 operation for multiple undefined
dims in element_shape
**Ticket:** 105671
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
commit 3c713d4aec23c825baa71fd524f93140bc928ce9
Author: Chen Xu <chen.xu@intel.com>
Date: Wed Jul 31 17:32:10 2024 +0800
[CPU] Avoid rounding to zero for Reduce node in quantized models (#25766)
### Details:
- *If the Reduce node's input and output precisions are both integers
in the original model, then rounding to zero should be done before
converting the intermediate floating point value to integer.*
- *However, if such integer precisions result from quantization,
then we should not do such rounding, in order to maintain accuracy.*
- *Add corresponding test cases.*
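
A rough Python sketch of the distinction above, using a mean reduction as the example. This is not the CPU plugin code: the function name and flag are hypothetical, and it is an assumption that the quantized path converts with round-to-nearest instead of truncation.

```python
import math
from typing import Sequence


def reduce_mean_to_int(values: Sequence[int], ints_come_from_quantization: bool) -> int:
    """Reduce integers via a floating point intermediate, then convert back to int."""
    mean = sum(values) / len(values)  # intermediate floating point value
    if ints_come_from_quantization:
        # Assumption: no round-to-zero step; Python's round() rounds to nearest
        # (ties go to the even integer).
        return round(mean)
    # Integer precisions from the original model: round toward zero first.
    return math.trunc(mean)
```

For the inputs `[2, 3, 3]` the intermediate mean is about 2.67, so the original-integer path gives 2 while the quantized path gives 3, which shows how truncation can cost accuracy in the quantized case.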
### Tickets:
- *CVS-147352*
Tagging @ArthurZucker :)
When I was implementing Gemma for Unsloth, I noticed that when one uses bfloat16, the RoPE embeddings get autocast to bfloat16, when we require them to be in float32. This causes the positional encodings to lose precision dramatically, especially for very large context lengths.
Below is an image showing how HF currently handles RoPE. You can see the loss in precision when using bfloat16. I manually cast it to float32 in Unsloth, and you can see the expected positional encodings.
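
To make the precision loss concrete without needing a GPU, here is a minimal pure-Python sketch that simulates float32 to bfloat16 rounding on position indices. It is not the HF or Unsloth code; the helper name `to_bfloat16` is made up for illustration, and it only handles ordinary finite values such as positions.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a Python float."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits; round to nearest even on the dropped 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


# The last four positions of an 8192-token context.
positions = [8188, 8189, 8190, 8191]
print([to_bfloat16(float(p)) for p in positions])
```

bfloat16 has only 8 significant bits, so between 4096 and 8192 representable values are 32 apart. All four positions above collapse to 8192.0, whereas float32 keeps them distinct.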

I couldn't find why Unsloth's error could not match that of HF's original Gemma implementation. On float16, this issue does not occur, with HF and Unsloth's training loss curve being equivalent:

However when I switched over to bfloat16, HF and Unsloth's training losses diverge at the start, and Unsloth always retains a lower loss as training goes on:

If you look at the losses more carefully (same seed), you can see the differences more closely.

The culprit I found was the line where, if one uses `torch.autocast()`, `freqs = (inv_freq_expanded @ position_ids_expanded).transpose(1, 2)` gets done in bfloat16 and not float32. I propose we turn off autocast to force float32. This ensures `torch.autocast` does not automatically downcast the RoPE embeddings to float16 / bfloat16. My proposed fix shows the following loss curve:

Also, in Gemma, a one-liner was missed :) `logits = logits.float()` must be placed to upcast the `logits` to float32. Although it should be done automatically in `torch.autocast`, it's best to keep the convention as done in llama, mistral and other models. Gemma's implementation seems to maybe have forgotten this one line :)