[FIX Breaking]: PrependedConversationConfig and Attack Param Consistency #1299
base: main
Conversation
    """
    if not prepended_conversation:
        return 0
    return sum(1 for msg in prepended_conversation if msg.role == "assistant")
If it's prepended, should we count simulated_assistant instead of assistant?
    For PromptChatTarget:
    - Adds prepended messages to memory with simulated_assistant role
    - All messages get new UUIDs
I am fairly sure the docstring doesn't resolve properly like this... might want to check.
    For non-chat PromptTarget:
    - If config.non_chat_target_behavior="normalize_first_turn": normalizes
      conversation to string and prepends to context.next_message
    - If config.non_chat_target_behavior="raise": raises ValueError
same concern as above, but you probably also want to put the code in backquotes
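One way the bullet list could be written so it renders: blank line before the list, and literal values in double backquotes (assuming the project builds docs with Sphinx/reST; the exact markup depends on its doc toolchain):

```python
def process_prepended_conversation():
    """Process a prepended conversation before sending it to the target.

    For non-chat ``PromptTarget``:

    - If ``config.non_chat_target_behavior="normalize_first_turn"``: normalizes
      the conversation to a string and prepends it to ``context.next_message``.
    - If ``config.non_chat_target_behavior="raise"``: raises ``ValueError``.
    """
```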
    context.memory_labels = combine_dict(existing_dict=memory_labels, new_dict=context.memory_labels)

    # Initialize conversation state
    state = ConversationState()
    logger.debug(f"Preparing conversation with ID: {conversation_id}")
    prepended_conversation = context.prepended_conversation

    # Do not proceed if no history is provided
    if not prepended_conversation:
Perhaps not to be addressed here, but something I think about a lot is the following:
Operator creates 10 turns with crescendo. Later on, they branch off manually from that conversation after 5 turns and do something else (could be manual probing or another attack). How does that work in terms of metadata (like the attack identifier, converter identifier, etc.) and memory labels?
Ideally, I would hope the metadata of each piece shows what was applied to it, but would the memory labels be a combination of the old and new ones or only the new ones? From what I'm seeing in this PR I'm assuming it's just the new ones. I think (?) that makes sense because you wouldn't want it to have a label of an old op, for example, if you prepended the conversation from there into a new op.
[wasn't necessarily for these lines, but the combine_dict reminded me of this doubt]
    if config.non_chat_target_behavior == "raise":
        raise ValueError(
            "prepended_conversation requires target to be a PromptChatTarget. "
Should this say which target?
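A hedged sketch of including the concrete target class in the error message so the operator knows which target failed (class names here are illustrative stand-ins, not the real PyRIT types):

```python
class PromptChatTarget:
    """Stand-in for the real PromptChatTarget base class."""


class SomeNonChatTarget:
    """Hypothetical non-chat target used only for illustration."""


def check_target(target):
    # Name the concrete class in the message so the error points at the
    # offending target instead of a generic "not a PromptChatTarget".
    if not isinstance(target, PromptChatTarget):
        raise ValueError(
            f"prepended_conversation requires target to be a PromptChatTarget, "
            f"but got {type(target).__name__}."
        )
```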
    last_piece = valid_messages[-1].get_piece()
    if last_piece.api_role == "assistant":
        state.last_assistant_message_scores = list(
            self._memory.get_prompt_scores(prompt_ids=[str(last_piece.original_prompt_id)])
the message may have multiple pieces with scores, right?
    class PrependedConversationConfig:
        """
        Configuration for controlling how prepended conversations are processed before
        being sent to targets.
to all targets? For adversarial_chat it's fine not to convert, right?
    # - "normalize_first_turn": Normalize the prepended conversation into a string and
    #   store it in ConversationState.normalized_prepended_context. This context will be
    #   prepended to the first message sent to the target. Uses objective_target_context_normalizer
    #   if provided, otherwise falls back to ConversationContextNormalizer.
Is that for the many-shot jailbreak case with a PromptTarget?
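For the non-chat case, a normalizer along these lines seems plausible: flatten the role/content pairs into one string that gets prepended to the first turn (a hypothetical sketch; the real ConversationContextNormalizer may format things differently):

```python
def normalize_conversation(messages):
    """Flatten a prepended conversation into a single string so it can be
    prepended to the first message sent to a non-chat PromptTarget.
    Hypothetical normalizer; role prefixes are illustrative."""
    lines = []
    for msg in messages:
        role = (
            "Assistant"
            if msg["role"] in ("assistant", "simulated_assistant")
            else "User"
        )
        lines.append(f"{role}: {msg['content']}")
    return "\n".join(lines)
```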
    return self.message_normalizer or ConversationContextNormalizer()

    @classmethod
    def default(cls) -> "PrependedConversationConfig":
if you import annotations from future you should be able to remove the double quotes.
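What the comment refers to: with postponed evaluation of annotations (PEP 563), a class can name itself in a return annotation without quoting (a minimal sketch, not the real config class):

```python
from __future__ import annotations  # must precede all other imports

from dataclasses import dataclass


@dataclass
class PrependedConversationConfig:
    @classmethod
    def default(cls) -> PrependedConversationConfig:  # no quotes needed
        return cls()
```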
same below
    @dataclass
    class PrependedConversationConfig:
all of this is rather complicated. Don't we need documentation?
    prepended_conversation=context.prepended_conversation,
    request_converters=self._request_converters,
    response_converters=self._response_converters,
    prepended_conversation_config=self._prepended_conversation_config,
prepended_conversation used to come from the context; now it's an arg here. Is that a conscious choice? From a quick glance, it seems like a better fit in the context?
    - refused_text: The text that was refused (from context.next_message if
      there's a refusal), empty string if no refusal
    - objective_score: The objective score if found, None otherwise
again, pretty sure the website won't render with this. Please check
    )

    # The first message sent should contain the next_message content with image preserved
    first_call = mock_normalizer.send_prompt_async.call_args_list[0]
should we also assert that it's going to the objective target?
    objective_target=mock_chat_target,
    attack_adversarial_config=adversarial_config,
    attack_scoring_config=scoring_config,
    max_turns=5,
do we want to make any assertions about the other turns?
    conversation = list(memory.get_conversation(conversation_id=conversation_id))

    # Should have prepended messages in memory
    assert len(conversation) >= 2, f"Expected at least 2 prepended messages, got {len(conversation)}"
at least? We know exactly how long it is (2) so why at least?
PrependedConversationConfig
Added a new PrependedConversationConfig dataclass to configure how prepended conversations are handled. Supports configurable behavior for non-chat targets (normalize_first_turn vs raise) and allows specifying which roles should have converters applied. Includes factory methods for common configurations.
Consistent next_message Handling
ConversationManager Refactoring
Test Coverage
Breaking Changes
Nothing major IMO; I think both of these are really internal.