QNN-AOT Qwen3-4B not working #659

@LXY1226

Description

Hello, my device is an SM8850 (SD Gen 5 Elite) with 16 GB of RAM. I have successfully run Qwen3-1.7B on it with normal output, at roughly 210 tps (prefill) and 36 tps (decode). However, when I run the 4B model with the configuration from this repository, loading still works but the output is garbled:
```shell
export DEVICE_WORKDIR=/data/local/tmp/mllm-qwen3-4b-sm8850
cd ${DEVICE_WORKDIR}
export LD_LIBRARY_PATH=${DEVICE_WORKDIR}
export ADSP_LIBRARY_PATH="${DEVICE_WORKDIR};/vendor/lib/rfsa/adsp;/system/lib/rfsa/adsp;/dsp"
./mllm-qwen3-aot-runner -m qwen3-4B-lpbq-sha.bin -t qwen3-tokenizer.json -c config_4B.json --ar_len 32 <<< hi
```

```
[QNN_DEBUG] QNNBackend::QNNPerf::create
[QNN_DEBUG] QNNBackend::QNNPerf::setPowerConfigBurst
[QNN_DEBUG] QNNBackend::QNNPerf::setRpcLatencyAndPolling
[INFO] /repo/mllm/mllm/backends/qnn/QNNBackend.cpp:100 QNN Perf created successfully
[QNN_DEBUG] QNNBackend::QNNBackend end
[INFO] /repo/mllm/mllm/backends/qnn/Register.cpp:24 QNN context path exists: qwen3-4B-lpbq-sha.bin
[INFO] /repo/mllm/mllm/backends/qnn/QNNBackend.cpp:497 Context binary file opened successfully: qwen3-4B-lpbq-sha.bin
[INFO] /repo/mllm/mllm/backends/qnn/QNNBackend.cpp:500 Context binary file size: 2828 MB
[INFO] /repo/mllm/mllm/backends/qnn/QNNBackend.cpp:518 System context created successfully
[INFO] /repo/mllm/mllm/backends/qnn/QNNBackend.cpp:530 Context binary info retrieved successfully
[INFO] /repo/mllm/mllm/backends/qnn/QNNBackend.cpp:545 System context freed successfully
[INFO] /repo/mllm/mllm/backends/qnn/QNNBackend.cpp:557 Context created from binary successfully
[INFO] /repo/mllm/mllm/backends/qnn/QNNModel.cpp:140 QNNModel::loadGraphTensorInfo() loaded 75 input tensors and 73 output tensors for graph: model.0.s32
[INFO] /repo/mllm/mllm/backends/qnn/QNNBackend.cpp:587 Successfully created QNNModel for graph: model.0.s32
[INFO] /repo/mllm/mllm/backends/qnn/QNNModel.cpp:140 QNNModel::loadGraphTensorInfo() loaded 75 input tensors and 73 output tensors for graph: model.0.s1
[INFO] /repo/mllm/mllm/backends/qnn/QNNBackend.cpp:587 Successfully created QNNModel for graph: model.0.s1
[INFO] /repo/mllm/mllm/backends/qnn/QNNBackend.cpp:601 QNN context retrieved from qnn_context.bin with 2 QNNModels(QnnGraphs)
[INFO] /repo/mllm/mllm/backends/qnn/Register.cpp:28 QNN context loaded successfully from qwen3-4B-lpbq-sha.bin
[INFO] /repo/mllm/mllm/backends/qnn/Register.cpp:45 QNN memory manager registered
[INFO] /repo/mllm/mllm/backends/base/PluginSystem.cpp:89 Register customized op: DequantizeAdd:4097 -> QNN
💬 Prompt text (or 'exit/quit'): [INFO] /repo/mllm/mllm/backends/qnn/aot_rt/PromptProcessor.cpp:129 num_tokens: 13
,\
,",착
愈发,\
晦,\
,
,\
,\
^CError: Received signal2 - SIGINT (Interrupt from keyboard)
Shutting down...
```

(The decode output is almost entirely blank lines, interspersed with the garbage tokens shown above; blank lines are collapsed here. I stopped it with Ctrl-C.)

Since the README does not mention a successful 4B run either, I would like to know whether the QNN-AOT path currently supports the 4B model, and if so, how to use it.

Thank you very much.
