add solar pro support #9541
solar pro introduces block skip connections, where blocks are connected to other, non-sequential blocks with a scale multiple. this change adds 4 new keys to store the skip connections and one new tensor to store the scalar. the scalar is implemented as a 1-dimensional tensor with 2 elements derived from the model's bskcn_tv configuration. in general, the values are (bskcn_tv, 1 - bskcn_tv).
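A minimal sketch of the scale pair described above, in plain Python. The helper names `bskcn_scales` and `blend` are hypothetical, and the choice of which element weights the saved block output versus the current one is an assumption for illustration, not something the description confirms.

```python
def bskcn_scales(bskcn_tv: float) -> tuple[float, float]:
    """Return the two-element scale pair (bskcn_tv, 1 - bskcn_tv)."""
    return (bskcn_tv, 1.0 - bskcn_tv)


def blend(saved: list[float], current: list[float],
          scales: tuple[float, float]) -> list[float]:
    """Combine a saved block output with the current one.

    Assumption: scales[0] weights the saved (skip) values and
    scales[1] weights the current values.
    """
    return [scales[0] * s + scales[1] * c for s, c in zip(saved, current)]
```

Because the two scales sum to 1, the blend is a weighted average of the two inputs rather than a plain residual add.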
}
}

bool n_bskcn(uint32_t n, uint32_t il = 0) const {
The n_ prefix implies that this returns an integer, but it returns a boolean.
is this PR active and maintained?
def prepare_tensors(self):
    if bskcn_tv := self.find_hparam(['bskcn_tv'], optional=True):
        # use bskcn_tv[1] for inference since bskcn_tv[0] is for training
        self.gguf_writer.add_tensor(self.format_tensor_name(gguf.MODEL_TENSOR.BSKCN_TV), np.array([bskcn_tv[1], 1 - bskcn_tv[1]], dtype=np.float32))

    super().prepare_tensors()
I think this should override generate_extra_tensors instead of prepare_tensors. Otherwise LoRA conversion will not work properly, at least since #9396.
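One way the suggestion above might look, as a rough sketch. `SketchModel` is a hypothetical stand-in for the converter's model class, the tensor name string is a placeholder for whatever `format_tensor_name(gguf.MODEL_TENSOR.BSKCN_TV)` would produce, and plain lists stand in for real tensors. It assumes `generate_extra_tensors` is a generator of `(name, tensor)` pairs.

```python
from typing import Any, Iterable, Optional


class SketchModel:
    """Hypothetical stand-in for the converter's model class."""

    def __init__(self, hparams: dict[str, Any]):
        self.hparams = hparams

    def find_hparam(self, keys: list[str], optional: bool = False) -> Optional[Any]:
        # Simplified lookup: return the first key present in hparams.
        for key in keys:
            if key in self.hparams:
                return self.hparams[key]
        if optional:
            return None
        raise KeyError(f"could not find any of: {keys}")

    def generate_extra_tensors(self) -> Iterable[tuple[str, list[float]]]:
        bskcn_tv = self.find_hparam(["bskcn_tv"], optional=True)
        if bskcn_tv is not None:
            # bskcn_tv[1] is the inference value; bskcn_tv[0] is for training.
            tv = bskcn_tv[1]
            # "bskcn_tv" is a placeholder tensor name for this sketch.
            yield ("bskcn_tv", [tv, 1.0 - tv])
```

Yielding the tensor instead of writing it directly leaves the decision of whether and how to write it to the caller.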
if (hparams.n_bskcn(2, il)) {
    inpSA = ggml_add(
        ctx0,
        ggml_mul(ctx0, bskcn_1, ggml_view_1d(ctx0, model.layers[il].bskcn_tv, 1, 0)),
bskcn_1 is not necessarily initialized here, because a model file could be crafted to make hparams.n_bskcn(2, il) return true while making hparams.n_bskcn(1, il) always return false.
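The concern above could be caught by validating the skip tables before the graph is built. The sketch below is written in Python for brevity rather than in the loader's C++. The function name is hypothetical, and it assumes each table is a per-layer list of 0/1 flags where a saved value must come from a strictly earlier layer.

```python
from typing import Optional, Sequence


def first_unmatched_consumer(producer_flags: Sequence[int],
                             consumer_flags: Sequence[int]) -> Optional[int]:
    """Return the first layer that reads a skip value before any layer
    has saved one, or None if every read is preceded by a save.

    Assumption: a save in the same layer does not count, only an
    earlier one does.
    """
    if len(producer_flags) != len(consumer_flags):
        raise ValueError("skip tables must cover the same number of layers")
    saved = False
    for il, (produces, consumes) in enumerate(zip(producer_flags, consumer_flags)):
        if consumes and not saved:
            return il
        if produces:
            saved = True
    return None
```

A loader could reject the model file when this returns a layer index, so a crafted file never reaches the point where the uninitialized value would be used.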
for i, bskcn in enumerate(self.hparams[k] for k in self.hparams.keys() if k.startswith("bskcn_") and k != 'bskcn_tv'):
    # store the skip connections as a layer index where a non-zero value indicates a skip connection
    # this approach simplifies lookup at inference time
    self.gguf_writer.add_block_skip_connection(i, [1 if n in bskcn else 0 for n in range(self.block_count)])
This assumes bskcn_{n} are in the correct order in config.json. Why not instead iterate them by their names?
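A small sketch of iterating the keys by name so the result does not depend on their order in config.json. Both helper names are hypothetical, and the regular expression assumes the keys follow the pattern `bskcn_<number>`.

```python
import re
from typing import Any


def ordered_bskcn_lists(hparams: dict[str, Any]) -> list[tuple[int, list[int]]]:
    """Return (n, layers) for every key named bskcn_<n>, sorted by n.

    Keys such as bskcn_tv do not match the numeric pattern and are skipped.
    """
    pattern = re.compile(r"bskcn_(\d+)")
    found = []
    for key, value in hparams.items():
        match = pattern.fullmatch(key)
        if match:
            found.append((int(match.group(1)), value))
    return sorted(found, key=lambda item: item[0])


def skip_flags(layers: list[int], block_count: int) -> list[int]:
    """Per-layer 0/1 flags, where 1 marks a layer listed in `layers`."""
    return [1 if n in layers else 0 for n in range(block_count)]
```

Each `(n, layers)` pair can then be written with an index derived from `n` itself (for example `n - 1`, if the writer expects zero-based indices) instead of from the enumeration position.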
@mxyng Is this PR still on?