Flash mla interface #44054

Draft
ArthurZucker wants to merge 23 commits into main from flash-mla-interface

Collaborator

@ArthurZucker ArthurZucker commented Feb 16, 2026

What does this PR do?

Adds a flash MLA interface.

  • It does not work yet: I get a segfault.
  • We don't leverage the paged cache, so I reckon it is less efficient than a paged-cache path would be.
Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 5966.29it/s]
Download complete: : 0.00B [00:00, ?B/s]
Fatal Python error: Segmentation fault

Thread 0x00007f580d3e4700 (most recent call first):
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 359 in wait
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 655 in wait
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_timeout.py", line 43 in _on_run
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1032 in _bootstrap
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydev_bundle/pydev_monkey.py", line 1134 in __call__

Thread 0x00007f5a5c7fe700 (most recent call first):
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 359 in wait
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 655 in wait
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/tqdm/_monitor.py", line 60 in run
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1032 in _bootstrap
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydev_bundle/pydev_monkey.py", line 1134 in __call__

Thread 0x00007f5f197e1700 (most recent call first):
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 359 in wait
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 655 in wait
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2257 in _do_wait_suspend
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2188 in do_wait_suspend
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/tqdm/std.py", line 765 in get_lock
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/tqdm/_monitor.py", line 66 in run
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1032 in _bootstrap
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydev_bundle/pydev_monkey.py", line 1134 in __call__

Thread 0x00007f63b544f700 (most recent call first):
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 359 in wait
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 655 in wait
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 325 in _on_run
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1032 in _bootstrap

Thread 0x00007f63b5c54700 (most recent call first):
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 359 in wait
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 655 in wait
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 279 in _on_run
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1032 in _bootstrap

Thread 0x00007f63b6459700 (most recent call first):
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 227 in _read_line
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 245 in _on_run
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1032 in _bootstrap

Thread 0x00007f63b6c5e700 (most recent call first):
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 359 in wait
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/queue.py", line 180 in get
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 390 in _on_run
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/threading.py", line 1032 in _bootstrap

Current thread 0x00007f63b7dd7b80 (most recent call first):
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1209 in __call__
  File "/fsx/arthur/hub/models--kernels-community--flash-mla/snapshots/c0b7b5c72eb5e67f1e519198c866e0d8a3511290/build/torch210-cxx11-cu128-x86_64-linux/flash_mla_interface.py", line 208 in flash_mla_sparse_fwd
  File "/fsx/arthur/hub/models--kernels-community--flash-mla/snapshots/c0b7b5c72eb5e67f1e519198c866e0d8a3511290/build/torch210-cxx11-cu128-x86_64-linux/__init__.py", line 59 in flash_mla_sparse_fwd
  File "<string>", line 1 in <module>
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_vars.py", line 268 in eval_in_context
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_vars.py", line 580 in evaluate_expression
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_vars.py", line 305 in _run_with_interrupt_thread
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_vars.py", line 334 in _run_with_unblock_threads
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_vars.py", line 369 in new_func
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 1214 in internal_evaluate_expression_json
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 565 in do_it
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2248 in _do_wait_suspend
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2188 in do_wait_suspend
  File "/fsx/arthur/transformers/src/transformers/integrations/flash_mla.py", line 225 in _flash_mla_sparse_forward
  File "/fsx/arthur/transformers/src/transformers/integrations/flash_mla.py", line 112 in flash_mla_attention_forward
  File "/fsx/arthur/transformers/src/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 428 in forward
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/accelerate/hooks.py", line 175 in new_forward
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
  File "/fsx/arthur/transformers/src/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 605 in forward
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/accelerate/hooks.py", line 175 in new_forward
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
  File "/fsx/arthur/transformers/src/transformers/modeling_layers.py", line 93 in __call__
  File "/fsx/arthur/transformers/src/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 791 in forward
  File "/fsx/arthur/transformers/src/transformers/utils/output_capturing.py", line 253 in wrapper
  File "/fsx/arthur/transformers/src/transformers/utils/generic.py", line 915 in wrapper
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
  File "/fsx/arthur/transformers/src/transformers/models/glm_moe_dsa/modeling_glm_moe_dsa.py", line 856 in forward
  File "/fsx/arthur/transformers/src/transformers/utils/generic.py", line 841 in wrapper
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/accelerate/hooks.py", line 175 in new_forward
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
  File "/fsx/arthur/transformers/src/transformers/generation/utils.py", line 3863 in _prefill
  File "/fsx/arthur/transformers/src/transformers/generation/utils.py", line 2869 in _sample
  File "/fsx/arthur/transformers/src/transformers/generation/utils.py", line 2674 in generate
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124 in decorate_context
  File "/fsx/arthur/scripts/glm_dsa.py", line 30 in main
  File "/fsx/arthur/.venv/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 362 in wrapper
  File "/fsx/arthur/scripts/glm_dsa.py", line 39 in <module>
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118 in _run_code
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127 in _run_module_code
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310 in run_path
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 358 in run_file
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 508 in main
  File "/admin/home/arthur/.vscode-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71 in <module>
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/runpy.py", line 88 in _run_code
  File "/admin/home/arthur/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/runpy.py", line 198 in _run_module_as_main

Extension modules: _pydevd_bundle.pydevd_cython, _pydevd_sys_monitoring_cython, _pydevd_sys_monitoring._pydevd_sys_monitoring_cython, numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, regex._regex, yaml._yaml, markupsafe._speedups, PIL._imaging, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, sentencepiece._sentencepiece, sklearn.__check_build._check_build, psutil._psutil_linux, psutil._psutil_posix, cython.cimports.libc.math, scipy._lib._ccallback_c, charset_normalizer.md, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, 
scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, 
pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, _cffi_backend, cuda_utils, __triton_launcher (total: 183)
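
The dump above has the format produced by Python's standard `faulthandler` module: a "Fatal Python error" line, one traceback per thread, then the list of loaded extension modules. For anyone who wants to capture the same kind of trace outside a debugger, here is a minimal sketch. It does not call the flash-mla kernel; the deliberate NULL read through `ctypes.string_at(0)` is only a stand-in for a crashing native op, and the helper name `run_crashing_child` is made up for illustration.

```python
import subprocess
import sys
import textwrap

# Child script: enable faulthandler, then crash on purpose by reading address 0.
CHILD = textwrap.dedent(
    """
    import ctypes
    import faulthandler

    faulthandler.enable()   # dump Python tracebacks of all threads on fatal signals
    ctypes.string_at(0)     # deliberate NULL read, stand-in for a crashing native kernel
    """
)


def run_crashing_child() -> subprocess.CompletedProcess:
    """Run the crashing script in a subprocess so the parent process survives."""
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_crashing_child()
    print("return code:", result.returncode)
    print(result.stderr)  # contains the faulthandler dump
```

The same handler can be turned on without touching the script by running `python -X faulthandler script.py` or by setting `PYTHONFAULTHANDLER=1` in the environment. Running the crash in a subprocess keeps the parent alive, which helps when bisecting which inputs trigger the fault.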

JaredforReal and others added 13 commits February 11, 2026 20:34
Signed-off-by: JaredforReal <w13431838023@gmail.com>

@LalithaMV LalithaMV left a comment

There are some conflicts in the branch as well as failing checks; please take a look and resolve them.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: glm_moe_dsa

@stevhliu
Member

Is the flash-mla interface specific to glm-moe-dsa, or can other MoEs with MLA-style attention also use it? If other models can use it as well, it would be worth adding to the table in the docs here :)
