Implement support for xsl:iterate #122

Merged
faassen merged 2 commits into Paligo:main from bojidar-bg:xsl-iterate
Sep 18, 2025

bojidar-bg commented Sep 11, 2025

So... xsl:iterate is a bit of a weird one.
Within XSL's functional language, it fits as a tail-recursive function which produces the rest of the elements at that place in the template.
Here's a possible example:

<xsl:iterate select="/foo">
  <xsl:param name="n" select="0"/>
  <xsl:value-of select="$n"/>
  <xsl:choose> <!-- tail-position choose -->
    <xsl:when test="$n = 4">
      <xsl:break/> <!-- tail early return -->
    </xsl:when>
    <xsl:otherwise>
      Text
      <xsl:next-iteration> <!-- tail recursive call -->
        <xsl:with-param name="n" select="$n + 1"/>
      </xsl:next-iteration>
    </xsl:otherwise>
  </xsl:choose>
</xsl:iterate>
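Read as a tail-recursive function, the example above might look roughly like the Rust sketch below. The `Item` type and the `iterate` function are hypothetical names chosen for illustration and are not part of Xee; note also that Rust itself does not guarantee tail-call elimination, so this only mirrors the shape of the recursion.

```rust
/// Hypothetical output item: the example emits numbers and literal text.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Number(i64),
    Text(&'static str),
}

/// One call corresponds to one iteration over the remaining input items.
/// `n` plays the role of the xsl:param; `out` holds everything emitted so far.
fn iterate(rest: &[&str], n: i64, mut out: Vec<Item>) -> Vec<Item> {
    // Input exhausted: the loop simply ends (the example has no xsl:on-completion).
    let Some((_current, rest)) = rest.split_first() else {
        return out;
    };
    // <xsl:value-of select="$n"/>
    out.push(Item::Number(n));
    if n == 4 {
        // <xsl:break/>: early return in tail position
        out
    } else {
        out.push(Item::Text("Text"));
        // <xsl:next-iteration>: recursive call in tail position
        iterate(rest, n + 1, out)
    }
}
```

Passing the output along as an accumulator is what keeps the recursive call in tail position in this sketch; without it, the result of the call would still have to be appended to the current iteration's output.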

However, translating that to Xee's functional IR turns out to be tricky:

  • Xee's IR encodes appending elements with its special comma operator. As a result, neither the xsl:next-iteration nor the xsl:choose is really in "tail position" (just as the call in 1 + x() is not a tail call); instead we have something like:
ir::Binary {
  op: Comma,
  left: <..everything before xsl:choose..>,
  right: ir::If {
    <...>
    else_: ir::Binary {
      op: Comma,
      left: <..everything before xsl:next-iteration..>,
      right: <tail-position xsl:next-iteration>
    }
  }
}
  • We could presumably place the next-iteration in tail position near the root of the IR tree, outside of any binary operators (yet still inside ir::If-s and ir::Let-s); but that's gnarly, since we would have to turn any comma operators containing xsl:choose-s and similar conditionals along the way "inside-out" to get them out of the tail position.
  • We could, presumably, make it so that some form of tail call optimization in the IR-to-bytecode compiler understands comma operators, so that we can still preserve the idea that xsl:iterate is a tail-recursive function. That feels even less tractable than the previous point...
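To make the tail-position problem concrete, here is a minimal sketch using a toy IR. The `Expr` enum and `next_iteration_in_tail_position` are invented for illustration and are far simpler than Xee's real `ir` types; the only assumption carried over from the discussion is that operands of the comma operator are not tail positions, while branches of a conditional inherit the position of the conditional.

```rust
/// Toy IR, loosely modelled on the shapes shown above (hypothetical, not Xee's).
#[derive(Debug)]
enum Expr {
    /// Some output-producing expression we do not look into.
    Emit(&'static str),
    /// Sequence concatenation (the "comma operator").
    Comma(Box<Expr>, Box<Expr>),
    /// Conditional with both branches.
    If { then: Box<Expr>, else_: Box<Expr> },
    /// Stand-in for xsl:next-iteration.
    NextIteration,
}

/// Returns true if every `NextIteration` inside `expr` is in strict tail
/// position, given whether `expr` itself is in tail position.
fn next_iteration_in_tail_position(expr: &Expr, is_tail: bool) -> bool {
    match expr {
        Expr::Emit(_) => true,
        Expr::NextIteration => is_tail,
        // Both operands still have to be concatenated afterwards,
        // so neither side of a comma is a tail position.
        Expr::Comma(left, right) => {
            next_iteration_in_tail_position(left, false)
                && next_iteration_in_tail_position(right, false)
        }
        // Branches of a conditional inherit the position of the conditional.
        Expr::If { then, else_ } => {
            next_iteration_in_tail_position(then, is_tail)
                && next_iteration_in_tail_position(else_, is_tail)
        }
    }
}

/// Builds the shape from the example: output, then a choose whose
/// otherwise-branch is again "output, then next-iteration".
fn example_ir() -> Expr {
    Expr::Comma(
        Box::new(Expr::Emit("value-of $n")),
        Box::new(Expr::If {
            then: Box::new(Expr::Emit("break")),
            else_: Box::new(Expr::Comma(
                Box::new(Expr::Emit("Text")),
                Box::new(Expr::NextIteration),
            )),
        }),
    )
}
```

Under these assumptions the example shape fails the check, while a bare conditional with next-iteration directly in a branch passes; turning commas "inside-out" would amount to rewriting the first shape into the second.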

So, instead of all of that tail-recursive nastiness ( 😇 ), this PR implements xsl:next-iteration and xsl:break as a kind of functional effect scoped by the xsl:iterate loop. They imperatively set values on the xsl:iterate loop that control its next iteration, but those side effects cannot be observed by the subsequent code in the current iteration. Within the IR, IterateBreak and IterateLetNext act as decorators of sorts: they change how the subsequent iteration executes, but otherwise pass expression results through unchanged.
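The "effects scoped by the loop" idea can be sketched outside of Xee as follows. Everything here (`LoopControl`, `iterate_break`, `iterate_let_next`, `run_iterate`) is a hypothetical illustration of the concept, not the PR's actual IR or bytecode: the two decorator functions record a request on the enclosing loop and hand their value back unchanged, and only the loop driver acts on the request once the body has finished.

```rust
/// What the body asked the enclosing loop to do (hypothetical names).
#[derive(Default)]
struct LoopControl {
    stop: bool,
    next_n: Option<i64>,
}

/// Decorator for xsl:break: record the request, pass the value through.
fn iterate_break<T>(ctl: &mut LoopControl, value: T) -> T {
    ctl.stop = true;
    value
}

/// Decorator for xsl:next-iteration: stage the next `n`, pass the value through.
fn iterate_let_next<T>(ctl: &mut LoopControl, n: i64, value: T) -> T {
    ctl.next_n = Some(n);
    value
}

/// The loop body from the example; it never reads `ctl`, it only writes to it.
fn body(n: i64, ctl: &mut LoopControl) -> Vec<String> {
    let mut out = vec![n.to_string()];
    if n == 4 {
        out.extend(iterate_break(ctl, Vec::<String>::new()));
    } else {
        out.push("Text".to_string());
        out.extend(iterate_let_next(ctl, n + 1, Vec::<String>::new()));
    }
    out
}

/// The loop driver: the only place where the recorded effects are observed.
fn run_iterate(items: &[&str]) -> Vec<String> {
    let mut n = 0;
    let mut out = Vec::new();
    for _item in items {
        let mut ctl = LoopControl::default();
        out.extend(body(n, &mut ctl));
        if ctl.stop {
            break;
        }
        if let Some(next) = ctl.next_n {
            n = next;
        }
    }
    out
}
```

Because the body only writes to `LoopControl` and the driver only reads it after the body returns, the ordinary comma-style concatenation of results is left alone and nothing needs to be in tail position.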

Hope that description helps explain my line of thought in implementing what I did 😊

Comment on lines +184 to +191
pub struct Iterate {
    pub context_names: ContextNames,
    pub loop_name: Name,
    pub var_atom: AtomS,
    pub params: Vec<IterateParam>,
    pub expr: Box<ExprS>,
    pub on_complete: Option<Box<ExprS>>,
}
Contributor Author

I'm not sure if my IR changes are good here. This ir::Iterate definition feels far too specific in the way it bundles iterate control flow (loop_name), iterate parameters (params), and the on_complete statement, but given the bytecode I need to generate in compile_iterate, I couldn't see a way to weave those in without making them a part of ir::Iterate.

Comment on lines +732 to +735
self.compile_sequence_loop_end(span);
// pop sequence length name & index;
self.scopes.pop_name();
self.scopes.pop_name();
Contributor Author

Is there a reason for compile_sequence_loop_end to only pop the values added by compile_sequence_loop_init and not the names?
(I guess compile_quantified is special?)

Collaborator

I can't recall a good reason so if that refactoring is possible I'm happy to see it applied!

Contributor Author

IIRC, I looked at it, and it would require an extra VarSet instruction in compile_quantified. Might get around to it at some point, especially if something brings me back to the loop code 😅

}
// Finally, emit a value as the result of IterateLetNext (typ. an empty sequence)
// self.builder.emit_constant(sequence::Sequence::default(), span);
self.compile_expr(&iterate_let_next.return_expr)?;
Contributor Author

As the dead code shows, I'm not sure whether next-iteration in the IR should have a return_expr expression or not.
Rationale for including it: it makes the IR more flexible and consistent with ir::IterateBreak.
Rationale against: under XSLT 3.0 it is never going to be set to anything but an empty sequence. xsl:next-iteration cannot specify extra children; those always go before the next-iteration element. (Any extra children of xsl:break could equally go before it, yet the standard does allow xsl:break to specify extra children to add, huh!)

type_: param.as_.clone(),
})
})
.collect::<error::SpannedResult<Vec<_>>>()?;
Contributor Author

Is there a simpler Rust pattern to use around here?
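For reference, collecting an iterator of `Result`s into a `Result<Vec<_>, E>` is, as far as I know, already the standard Rust idiom for "map with a fallible step, stop at the first error". The main simplification available is dropping the turbofish when the target type is known from context, as in this self-contained sketch. `Param`, `parse_param`, and `parse_params` are hypothetical stand-ins and do not correspond to Xee's `IterateParam` or `error::SpannedResult`.

```rust
#[derive(Debug, PartialEq)]
struct Param {
    name: String,
    value: i64,
}

/// Hypothetical fallible conversion standing in for compiling one parameter.
fn parse_param(raw: &str) -> Result<Param, String> {
    let (name, value) = raw
        .split_once('=')
        .ok_or_else(|| format!("missing '=' in {raw:?}"))?;
    let value = value
        .trim()
        .parse::<i64>()
        .map_err(|e| format!("bad value in {raw:?}: {e}"))?;
    Ok(Param {
        name: name.trim().to_string(),
        value,
    })
}

/// Converts all parameters, returning the first error encountered, if any.
fn parse_params(raw: &[&str]) -> Result<Vec<Param>, String> {
    // The function's return type drives `collect`, so no turbofish is needed.
    raw.iter().map(|r| parse_param(r)).collect()
}
```

When the collected value is bound with `let` and followed by `?`, a type annotation on the binding (`let params: Vec<_> = ...collect::<Result<_, _>>()?;`) is an alternative spelling; which one reads better is largely a matter of taste.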

return_expr: Box::new(return_expr.expr()),
});
let result = return_expr.bind_expr_no_span(&mut self.variables, let_next);
Ok(result)
Contributor Author

(result can be inlined, will clean up later)

}
// Then, store them back into the variables (in reverse, so they match up)
for param in iterate_let_next.params.iter().rev() {
self.compile_variable_set(&param.name, span)?;
Contributor Author

(Currently, parameters set by IterateLetNext do update variables before the iteration is over; I could fix that by adding more variables (basically have two slots on the stack for each variable all the time), but it would result in less optimal bytecode, especially when next-iteration doesn't set all parameters.)
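The trade-off described here can be illustrated with a small sketch that contrasts writing a parameter in place with staging it in a second slot until the iteration ends. The `Params` type and its methods are hypothetical and operate on plain vectors rather than on Xee's bytecode stack.

```rust
/// Loop parameters with a staging area; all names are hypothetical.
struct Params {
    /// Values visible to the current iteration.
    current: Vec<i64>,
    /// Values staged for the next iteration (the "second slot" per parameter).
    pending: Vec<Option<i64>>,
}

impl Params {
    fn new(initial: Vec<i64>) -> Self {
        let pending = vec![None; initial.len()];
        Params { current: initial, pending }
    }

    fn get(&self, index: usize) -> i64 {
        self.current[index]
    }

    /// In-place update: immediately visible to the rest of the current iteration.
    fn set_now(&mut self, index: usize, value: i64) {
        self.current[index] = value;
    }

    /// Staged update: invisible until `commit` is called.
    fn set_next(&mut self, index: usize, value: i64) {
        self.pending[index] = Some(value);
    }

    /// Called between iterations: apply staged values, keep the others as-is.
    fn commit(&mut self) {
        for (slot, staged) in self.current.iter_mut().zip(self.pending.iter_mut()) {
            if let Some(value) = staged.take() {
                *slot = value;
            }
        }
    }
}
```

The staged variant costs an extra slot per parameter plus a commit step on every iteration, even for parameters that next-iteration never sets, which is the overhead the comment above is weighing.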

faassen commented Sep 11, 2025

Nice! Hey, just a heads up; I'm on vacation for the next few days so I'll likely be able to give feedback next week.

faassen commented Sep 11, 2025

the tail call optimization in the IR-to-bytecode compiler

I didn't know I created such an optimization!

@bojidar-bg
Contributor Author

Er... I meant "the tail call optimization just described in the previous point", but alas, my brain moved on before my fingers could finish typing that 😂

Take your time! I'll probably move on to other things with xsl:param/xsl:with-param; those seem like low-hanging fruit that would make a lot of tests pass moving forward. (:

faassen commented Sep 18, 2025

It's interesting that this only makes 4 iterate tests pass; what's holding up the passing of the other tests? Are those missing iterate features or other XSLT features that are blocking them?

faassen left a comment

Thank you for all this hard work! You figured out a lot of details and it's great to see you managed to extend the IR & IR -> bytecode compiler in particular.

I feel bad for taking a while to review. It's also difficult to build enough context to do a proper review.

There are a few things helping here that make me approve it:

  • this mostly deals with XSLT

  • a few more things are added to the IR; they might be overly specific but I'm good with that for now.

  • the interpreter is virtually untouched besides errors.

So given all this, I'm going to err on the side of accepting PRs on this topic, because I don't want to block development and I'd only get in the way. When the interpreter is touched or there is a significant IR compilation change that's where I want to ensure XPath doesn't have a regression, but we have a lot of tests to cover us there.

@faassen faassen merged commit 0667627 into Paligo:main Sep 18, 2025
1 check passed
@github-actions github-actions bot mentioned this pull request Aug 30, 2025