
[PROPOSAL]: Refactor inference engine by selecting backend during init of modules #5773

@char-1ee

Proposal

This proposal requests a refactor of the current InferenceEngine design.

Related PR:

Code touched:

I’ve noticed a limitation in how our current inference engine integrates parameters from external sources (e.g., the inference engine config). As it stands, external parameters can only be passed during model sharding, via the from_native_module interface. However, from_native_module is primarily designed for replacing model layers, so using it to carry configuration as well violates the single-responsibility principle.

This approach makes it hard to introduce or adjust modeling parameters after initialization, because any additional parameter must be passed as **kwargs via from_native_module. That is a poor fit for predefined configuration options that should be fixed early in model setup (e.g., use_cuda_kernel, use_spec_dec, use_flash_attn). These options tell the InferenceEngine which generation strategies and ops to use, yet the choice is currently made during the model's forward pass.
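
To make the difference concrete, here is a minimal sketch in plain Python. It is not the actual ColossalAI code: the two attention functions and both classes are hypothetical stand-ins, and only the flag name use_cuda_kernel comes from the proposal. The first class re-checks the flag on every forward call; the second resolves the backend once at init.

```python
from typing import Callable, List

# Hypothetical stand-ins for two backends; these are NOT real ColossalAI ops.
def naive_attention(x: List[float]) -> List[float]:
    return [v * 2.0 for v in x]

def fused_attention(x: List[float]) -> List[float]:
    # Same result as naive_attention; imagine a faster kernel here.
    return [v * 2.0 for v in x]


class ForwardDispatchAttention:
    """Pattern described above: the flag is inspected on every forward call."""

    def __init__(self, use_cuda_kernel: bool = False) -> None:
        self.use_cuda_kernel = use_cuda_kernel

    def forward(self, x: List[float]) -> List[float]:
        if self.use_cuda_kernel:
            return fused_attention(x)
        return naive_attention(x)


class InitDispatchAttention:
    """Pattern the title asks for: the backend is selected once, at init."""

    def __init__(self, use_cuda_kernel: bool = False) -> None:
        self._backend: Callable[[List[float]], List[float]] = (
            fused_attention if use_cuda_kernel else naive_attention
        )

    def forward(self, x: List[float]) -> List[float]:
        return self._backend(x)
```

Both classes return the same values; the difference is only where the decision lives, which is what this proposal is about.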

Given the above, I propose two possible solutions:

  1. Global Context Object: Introduce a global context object whose lifecycle follows that of the inference engine. Its member properties can be read at any point during inference, which gives us centralized and consistent configuration management.
  2. InferenceShardformer Wrapper: Implement a wrapper around the existing shardformer, named InferenceShardformer. This class provides a new interface for parameter passing and can hold the various inference states, which makes the design easier to extend.
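
A rough sketch of what the two options could look like, in plain Python with no ColossalAI imports. Everything here is an assumption made for discussion: InferenceFlags, InferenceContext, and build_layer are hypothetical names, and this InferenceShardformer does not wrap the real shardformer. Only the three flag names are taken from the text above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class InferenceFlags:
    """Hypothetical container for the predefined options named above."""
    use_cuda_kernel: bool = False
    use_spec_dec: bool = False
    use_flash_attn: bool = False


class InferenceContext:
    """Option 1 (sketch): a process-wide context readable from anywhere.

    The engine would call set() when it is created and clear() when it is
    torn down, so the context follows the engine's lifecycle.
    """
    _current: Optional[InferenceFlags] = None

    @classmethod
    def set(cls, flags: InferenceFlags) -> None:
        cls._current = flags

    @classmethod
    def get(cls) -> InferenceFlags:
        if cls._current is None:
            raise RuntimeError("InferenceContext has not been set")
        return cls._current

    @classmethod
    def clear(cls) -> None:
        cls._current = None


class InferenceShardformer:
    """Option 2 (sketch): a wrapper that owns the flags explicitly.

    A real version would delegate to the existing shardformer; this one only
    shows the flags being consumed when a layer is built, not in forward.
    """

    def __init__(self, flags: InferenceFlags) -> None:
        self.flags = flags

    def build_layer(self, name: str) -> Dict[str, Any]:
        backend = "flash" if self.flags.use_flash_attn else "naive"
        return {"name": name, "attn_backend": backend}
```

The trade-off, as I see it: the context object needs no signature changes but adds hidden global state, while the wrapper keeps the dependency explicit at the cost of a new class to maintain.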

A PR will follow once this has been discussed with the maintainers.

Self-service

  • I'd be willing to do some initial work on this proposal myself.
