Merged
Commits
25 commits
019f2e2
Migrate storage containers to noserde::Buffer
josephbirkner Feb 18, 2026
fecb1ba
Point noserde CPM dependency to josephbirkner fork
josephbirkner Feb 18, 2026
8989316
Remove stale StringRange bitsery serializer
josephbirkner Feb 18, 2026
d667eba
Use vector<uint8_t> instead of stringstream.
josephbirkner Feb 19, 2026
5ed51b5
Enable fast serialization for ArrayArena.
josephbirkner Feb 19, 2026
3922313
Introduce compactHeads_ for arrays.
josephbirkner Feb 20, 2026
ae9f4ea
model: add ModelColumn and tagged type validation
josephbirkner Feb 24, 2026
d337c2e
model: Finish code orga for ModelColumn infrastructure.
josephbirkner Feb 24, 2026
d2f3936
test: migrate complex serialization reads to vector input
josephbirkner Feb 24, 2026
1337f58
Remove struct layout validator.
josephbirkner Mar 4, 2026
3b9c1b0
Simplify ModelColumn serialization wire format
josephbirkner Mar 4, 2026
9e3157f
Move singleton array storage to dedicated feature branch
josephbirkner Mar 4, 2026
d5313cc
Add fixedSize array flag.
josephbirkner Mar 4, 2026
c324b29
Add split TwoPart storage for object fields and array arenas
josephbirkner Mar 5, 2026
e9c17b1
Merge remote-tracking branch 'origin/v0.6.3' into sync/noserde
josephbirkner Mar 9, 2026
e4b4ed2
Merge remote-tracking branch 'origin/noserde' into sync/split
josephbirkner Mar 9, 2026
9a3911b
model: address split storage review comments
josephbirkner Mar 9, 2026
f7cdab7
expr: Add Unique Identifier to Expressions
johannes-wolf Mar 10, 2026
75c11e6
diagnostics: Rework Diagnostics
johannes-wolf Mar 10, 2026
a52ceee
expr: Make eval Const Again
johannes-wolf Mar 10, 2026
c6e04e3
diagnostics: Cursor Fixes
johannes-wolf Mar 11, 2026
33c42ac
diagnostics: Remove Environment & AST Dependencies
johannes-wolf Mar 13, 2026
c80b031
Merge pull request #137 from Klebert-Engineering/feature/split-field-…
josephbirkner Mar 16, 2026
1223db0
Merge pull request #140 from Klebert-Engineering/rework-diagnostics-f…
josephbirkner Mar 16, 2026
202d867
model: document split storage and address review issues
josephbirkner Mar 16, 2026
3 changes: 3 additions & 0 deletions CMakeLists.txt
@@ -71,6 +71,7 @@ add_library(simfil ${LIBRARY_TYPE}
src/value.cpp
src/overlay.cpp
src/exception-handler.cpp
src/expression-visitor.cpp
src/model/model.cpp
src/model/nodes.cpp
src/model/string-pool.cpp)
@@ -94,8 +95,10 @@ target_sources(simfil PUBLIC
include/simfil/transient.h
include/simfil/simfil.h
include/simfil/exception-handler.h
include/simfil/expression-visitor.h

include/simfil/model/arena.h
include/simfil/model/column.h
include/simfil/model/string-pool.h
include/simfil/model/model.h
include/simfil/model/nodes.h
39 changes: 32 additions & 7 deletions docs/simfil-dev-guide.md
@@ -62,6 +62,24 @@ Objects and arrays do not embed child nodes directly. Instead, they maintain `Mo

`StringPool` maintains the mapping between strings and the `StringId` integers stored in object fields. The base `Model` interface exposes `lookupStringId` so that serialization code such as `ModelNode::toJson` can recover human-readable field names. `ModelPool::setStrings` allows a pool to adopt a different `StringPool`, populating any missing field names along the way. This operation is used by higher-level components that need to merge data from several pools into a unified string namespace.
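The interning contract described above can be sketched as a standalone model: strings map to dense integer ids, and ids resolve back to names for serialization. The class and method names here are illustrative, not the real simfil `StringPool` API.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified string-interning pool in the spirit of StringPool: emplace()
// deduplicates, resolve() mirrors Model::lookupStringId for toJson-style
// output. Names are illustrative stand-ins for the real interface.
using StringId = std::uint16_t;

class MiniStringPool {
public:
    StringId emplace(const std::string& s) {
        auto it = idForString_.find(s);
        if (it != idForString_.end())
            return it->second;
        auto id = static_cast<StringId>(strings_.size());
        strings_.push_back(s);
        idForString_.emplace(s, id);
        return id;
    }

    // Recover the human-readable name for a stored id, if any.
    std::optional<std::string> resolve(StringId id) const {
        if (id >= strings_.size())
            return std::nullopt;
        return strings_[id];
    }

private:
    std::vector<std::string> strings_;
    std::unordered_map<std::string, StringId> idForString_;
};
```

Adopting a different pool, as `ModelPool::setStrings` does, amounts to re-interning every referenced name into the new pool and rewriting the stored ids.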

### ModelColumn

The primitive storage building block below `ModelPool` and `ArrayArena` is `ModelColumn<T, RecordsPerPage, StoragePolicy>`. A model column stores a single fixed-width record stream and exposes bulk byte operations for serialization and deserialization. The generic implementation accepts three families of types:

- fixed-width scalar types (`bool`, fixed-width integers, fixed-width enums, `float`, `double`)
- explicitly tagged external record types via `MODEL_COLUMN_TYPE(expected_size)`
- other approved native POD records that are trivially copyable and standard-layout

The column implementation assumes little-endian hosts and treats the in-memory representation as the wire representation. `bytes()` returns the canonical payload bytes for the current record stream; `assign_bytes()` and `read_payload_from_bitsery()` perform the inverse operation. For vector-backed columns this is one contiguous bulk copy; for segmented storage the same payload is copied chunk-by-chunk while preserving the same wire layout.
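For a vector-backed column, the "memory is the wire format" rule reduces to one bulk copy in each direction. The sketch below models that contract with free functions; the real `ModelColumn` member functions have different signatures, and the little-endian host assumption carries over.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Sketch of bytes()/assign_bytes() for a contiguous record stream: the
// payload is the records' in-memory bytes, valid on little-endian hosts.
// Function names mirror the prose loosely; this is not simfil code.
template <class T>
std::vector<std::uint8_t> bytes(const std::vector<T>& records) {
    static_assert(std::is_trivially_copyable_v<T>);
    std::vector<std::uint8_t> out(records.size() * sizeof(T));
    if (!out.empty())
        std::memcpy(out.data(), records.data(), out.size());
    return out;
}

template <class T>
std::vector<T> assign_bytes(const std::vector<std::uint8_t>& payload) {
    static_assert(std::is_trivially_copyable_v<T>);
    std::vector<T> records(payload.size() / sizeof(T));
    if (!payload.empty())
        std::memcpy(records.data(), payload.data(), payload.size());
    return records;
}
```

Segmented storage writes the same payload, just copied chunk-by-chunk instead of in one `memcpy`.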

`RecordsPerPage` defines the number of records stored per page, not the page size in bytes. The effective page size is `RecordsPerPage * sizeof(T)`, and segmented storage requires that value to be a multiple of the record size. This keeps page boundaries aligned with record boundaries and lets callers reason about capacity in record counts instead of byte counts.
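The record-count sizing rule can be made concrete with a little arithmetic helper. This is a standalone model of the layout contract, not the simfil implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Pages are sized in records; the byte size is derived, so a record index
// always maps to a (page, slot) pair without straddling a page boundary.
template <class T, std::size_t RecordsPerPage>
struct PageLayout {
    static constexpr std::size_t pageBytes = RecordsPerPage * sizeof(T);
    static constexpr std::size_t pageOf(std::size_t record) { return record / RecordsPerPage; }
    static constexpr std::size_t slotOf(std::size_t record) { return record % RecordsPerPage; }
};
```

With `T = uint32_t` and `RecordsPerPage = 1024`, each page holds 4096 bytes, and record 1024 is the first slot of the second page.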

### Split pair columns with `TwoPart`

`TwoPart<A, B>` is a logical pair type used when a compound record should behave like `{A, B}` in C++ but should not pay struct-padding costs on the wire. `ModelColumn<TwoPart<A, B>>` specializes the generic column by storing the `first()` and `second()` members in two synchronized child columns. Reads and writes still happen through a pair-like ref proxy, but serialization concatenates the dense payload of the first column and the dense payload of the second column.

The main current use is object member storage. `detail::ObjectField` is defined as `TwoPart<StringId, ModelNodeAddress>`, so object fields still behave like `(name, value)` pairs while the wire payload remains dense and deterministic regardless of host padding rules.
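The padding argument is easy to demonstrate with the shapes `detail::ObjectField` uses: a `uint16_t` name next to a `uint32_t` address. A plain struct typically pays per-record alignment padding, while two parallel columns store exactly six bytes per record. The types below are illustrative stand-ins, not the real simfil types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Array-of-structs: the compiler usually pads this to 8 bytes per record.
struct PaddedField { std::uint16_t name; std::uint32_t value; };

// Split columns in the spirit of TwoPart: names and values live in two
// synchronized vectors, so the wire payload is dense regardless of padding.
struct SplitFields {
    std::vector<std::uint16_t> names;
    std::vector<std::uint32_t> values;
    void push(std::uint16_t n, std::uint32_t v) { names.push_back(n); values.push_back(v); }
    std::size_t wireBytes() const {
        return names.size() * sizeof(std::uint16_t) + values.size() * sizeof(std::uint32_t);
    }
};
```

Four records cost 24 bytes on the wire in the split form, versus `4 * sizeof(PaddedField)` for the struct layout on most platforms.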

### Value representation

`Value` is the runtime carrier for scalar and structured results:
@@ -127,25 +145,32 @@ classDiagram

`BaseArray<ModelType, ModelNodeType>` provides the generic implementation of array behaviour for model pools. It owns a pointer to an `ArrayArena<ModelNodeAddress, …>` and an `ArrayIndex` into that arena. The base class implements `type()` (always `Array`), `at()`, `size()`, and `iterate()` in terms of the arena. `Array` itself is a thin wrapper over `BaseArray<ModelPool, ModelNode>` that adds convenience overloads for appending scalars, which internally delegate to `ModelPool::newSmallValue` or `ModelPool::newValue` and then record the resulting address in the arena.

`BaseObject<ModelType, ModelNodeType>` plays the same role for object nodes. It stores key–value pairs as `{StringId, ModelNodeAddress}` elements inside an `ArrayArena`. The base class implements `type()` (always `Object`), `get(StringId)`, `keyAt()`, `at()` (interpreting the array as an ordered sequence of fields), and `iterate()`. The concrete `Object` subclass adds convenience `addField` overloads for common scalar types and an `extend` method that copies all fields from another `Object`.
`BaseObject<ModelType, ModelNodeType>` plays the same role for object nodes. It stores key–value pairs as `detail::ObjectField` elements inside an `ArrayArena`; that type is currently `TwoPart<StringId, ModelNodeAddress>`, so names and child addresses are physically stored in split columns while the API still behaves like a logical pair sequence. The base class implements `type()` (always `Object`), `get(StringId)`, `keyAt()`, `at()` (interpreting the array as an ordered sequence of fields), and `iterate()`. The concrete `Object` subclass adds convenience `addField` overloads for common scalar types and an `extend` method that copies all fields from another `Object`.

`ProceduralObject` extends `Object` with a bounded number of synthetic fields. These fields are represented as `std::function<ModelNode::Ptr(LambdaThisType const&)>` callbacks in a `small_vector`. Accessors such as `get`, `at`, `keyAt`, and `iterate` first consult the procedural fields and then fall back to the underlying `Object` storage. This pattern makes it possible to expose computed members alongside stored ones without materialising them permanently in the arena.
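The consult-synthetic-first, fall-back-to-stored lookup order can be modelled in a few lines. The real class uses `StringId` keys, `ModelNode::Ptr` values, and a `small_vector`; this sketch substitutes `std::string`/`int` for clarity.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Sketch of the ProceduralObject lookup order: computed fields are checked
// first, then stored fields. Names and value types are illustrative.
class ComputedObject {
public:
    using Getter = std::function<int()>;

    void addStored(std::string key, int value) { stored_[std::move(key)] = value; }
    void addComputed(std::string key, Getter fn) { computed_.emplace_back(std::move(key), std::move(fn)); }

    std::optional<int> get(const std::string& key) const {
        for (const auto& [k, fn] : computed_)  // procedural fields win
            if (k == key)
                return fn();
        auto it = stored_.find(key);           // fall back to stored fields
        if (it != stored_.end())
            return it->second;
        return std::nullopt;
    }

private:
    std::vector<std::pair<std::string, Getter>> computed_;
    std::map<std::string, int> stored_;
};
```

A computed field with the same key as a stored one shadows it, which matches the accessor order described above.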

`OverlayNode` is an orthogonal mechanism that wraps an arbitrary underlying node and maintains a separate map `<StringId, Value>` of overlay children. Calls to `get` and `iterate` first visit the injected children and then delegate to the wrapped node. The overlay itself derives from `MandatoryDerivedModelNodeBase` and uses an `OverlayNodeStorage` `Model` implementation to resolve access.

### Array arena details

The `ArrayArena` template implements the append-only sequences used by arrays and objects. Conceptually, it manages a collection of logical arrays, each of which may consist of one or more “chunks” backed by a single `segmented_vector<ElementType, PageSize>`. A logical array is identified by an `ArrayIndex`. For each index, the arena stores a head `Chunk` in `heads_` and, if the array grows beyond the head’s capacity, additional continuation chunks in `continuations_`.
The `ArrayArena` template implements the append-only sequences used by arrays and objects. Conceptually, it manages a collection of logical arrays, each of which may use one of two physical representations:

- a regular growable chunk chain backed by `heads_`, `continuations_`, and `data_`
- a singleton handle backed by `singletonValues_` and `singletonOccupied_`

Regular arrays behave like the historical arena implementation. Each logical array is identified by an `ArrayIndex` and starts with a head `Chunk` in `heads_`. If the array grows beyond the head’s capacity, the arena allocates continuation chunks in `continuations_`. Each chunk records an `offset` into `data_`, a `capacity`, and a `size`. For a head chunk, `size` also tracks the total logical length of the array; for continuation chunks, `size` is local to that chunk. The `next` and `last` indices form a singly-linked list from the head to the tail chunk.

`new_array(initialCapacity, fixedSize)` controls which representation is chosen. If `fixedSize` is `false`, even `initialCapacity == 1` creates a regular growable array. If `fixedSize` is `true` and `initialCapacity == 1`, the arena instead returns a singleton handle. That handle represents a 0-or-1 element logical array with no head chunk allocation. This is useful for storage patterns where one-element arrays are common and known not to grow later.

Each `Chunk` records an `offset` into the `data_` vector, a `capacity`, and a `size`. For a head chunk, `size` also tracks the total logical length of the array; for continuation chunks, `size` expresses the number of valid elements in that chunk only. The `next` and `last` indices form a singly-linked list from the head to the tail chunk. `new_array(initialCapacity)` reserves a contiguous region in `data_`, initialises the head chunk with the offset and capacity, and returns a fresh `ArrayIndex`.
When a caller appends an element to a regular array via `push_back` or `emplace_back`, the arena calls `ensure_capacity_and_get_last_chunk_unlocked`. This function locates the current tail chunk (either the head or a continuation). If the tail still has spare capacity, it is returned directly; otherwise, the function allocates a new continuation chunk with capacity doubled relative to the previous tail, extends `data_`, links the new chunk into `continuations_`, and updates the head’s `last` pointer. Singleton handles do not use this growth path; they allow at most one element and reject further appends.

When a caller appends an element via `push_back` or `emplace_back`, the arena calls `ensure_capacity_and_get_last_chunk`. This function locates the current tail chunk (either the head or a continuation). If the tail still has spare capacity, it is returned directly; otherwise, the function allocates a new continuation chunk with capacity doubled relative to the previous tail, extends `data_` accordingly, links the new chunk into `continuations_`, and updates the head’s `last` pointer. This growth strategy guarantees amortised constant time for appends while avoiding large reallocations.
Element access via `at(ArrayIndex, i)` dispatches by representation. Singleton handles resolve directly against `singletonValues_`. Compact arenas resolve against the compact head metadata. Regular arrays walk the chunk list, subtracting full chunk capacities from the requested index until the index falls within the current chunk’s capacity and size. This keeps the public API uniform while allowing denser storage for the common singleton case.

Element access via `at(ArrayIndex, i)` walks the chunk list for the target array. It subtracts full chunk capacities from the requested index until the index falls within the current chunk’s capacity and size, and then returns a reference to `data_[offset + localIndex]`. This guarantees O(number_of_chunks) access in the worst case, but in practice the number of chunks per array remains small because capacities grow geometrically.
The arena also supports a compact serialization mode. In that mode, `compactHeads_` stores only `{offset, size}` metadata for each regular array, while `data_` already contains a dense payload without chunk gaps. Runtime head chunks are materialized lazily from `compactHeads_` when a later mutation requires growable chunk state again. This allows serialized arenas to stay compact without forcing the mutable runtime representation onto the wire.
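The regular-array mechanics — one shared backing vector, chunks recording `{offset, capacity, size}`, doubling growth, and a chunk-walking `at()` — can be reduced to a single-array model. The real `ArrayArena` adds arena-level indexing, singleton handles, compact heads, and locking; this is a sketch of the chunk chain only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone model of one logical array's chunk chain. Chunk capacities
// grow geometrically, and at() subtracts full capacities until the index
// falls inside the current chunk, as described in the prose above.
struct MiniChunkedArray {
    struct Chunk { std::size_t offset, capacity, size; };

    explicit MiniChunkedArray(std::size_t initialCapacity) {
        chunks_.push_back({allocate(initialCapacity), initialCapacity, 0});
    }

    void push_back(int v) {
        if (chunks_.back().size == chunks_.back().capacity) {
            // Tail is full: allocate a continuation with doubled capacity.
            std::size_t cap = chunks_.back().capacity * 2;
            chunks_.push_back({allocate(cap), cap, 0});
        }
        Chunk& tail = chunks_.back();
        data_[tail.offset + tail.size++] = v;
        ++totalSize_;
    }

    int at(std::size_t i) const {
        for (const auto& c : chunks_) {
            if (i < c.size)
                return data_[c.offset + i];
            i -= c.capacity;  // skip a full earlier chunk
        }
        assert(false && "index out of range");
        return 0;
    }

    std::size_t size() const { return totalSize_; }
    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::size_t allocate(std::size_t n) {
        std::size_t off = data_.size();
        data_.resize(off + n);
        return off;
    }

    std::vector<int> data_;
    std::vector<Chunk> chunks_;
    std::size_t totalSize_ = 0;
};
```

Because capacities double, an array of n elements needs only O(log n) chunks, which keeps the chunk walk in `at()` cheap in practice.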

The arena also provides higher-level iteration facilities. The `begin(array)`/`end(array)` pair yields an iterator over the elements of a specific logical array. The `iterate(ArrayIndex, lambda)` helper executes a callback on every element and supports two signatures: a unary callback receiving a reference to the element, and a binary callback receiving both the element and its global index. This is used by `BaseArray::iterate` to implement `ModelNode::iterate` efficiently without allocating intermediate containers.
The higher-level iteration facilities follow the same dispatch rules. `begin(array)`/`end(array)` iterate one logical array, while the top-level arena iterator skips the sentinel head entry and also yields singleton handles. `iterate(ArrayIndex, lambda)` supports unary callbacks receiving a value and binary callbacks receiving both a value and its logical index. This is used by `BaseArray::iterate` and `BaseObject::iterate` to expose child traversal without materializing temporary containers.
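The dual-signature `iterate` helper can be expressed with a compile-time arity check. The sketch iterates a flat vector instead of chunked storage; the dispatch idea is the same.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Sketch of iterate() supporting both callback shapes: a unary callback
// receives each element, a binary one additionally receives the logical
// index. The branch is resolved at compile time via std::is_invocable.
template <class Lambda>
void iterate(const std::vector<int>& elems, Lambda&& fn) {
    for (std::size_t i = 0; i < elems.size(); ++i) {
        if constexpr (std::is_invocable_v<Lambda, int, std::size_t>)
            fn(elems[i], i);  // binary: element + logical index
        else
            fn(elems[i]);     // unary: element only
    }
}
```

Callers pick whichever shape they need without any runtime flag, which is how a single helper can serve both `BaseArray::iterate` and index-aware consumers.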

Thread-safety is conditional. If `ARRAY_ARENA_THREAD_SAFE` is defined, the arena uses a shared mutex to protect growth and element access. Appends and `new_array` take an exclusive lock only when allocating new chunks; reads can proceed with shared locks. Simfil itself does not require the arena to be thread-safe as long as model construction happens before concurrent evaluation, but the hooks are present for embedders that need concurrent writers.
Thread-safety is conditional. If `ARRAY_ARENA_THREAD_SAFE` is defined, the arena uses a shared mutex to protect growth and element access. Reads use shared locks, while mutations and compact-to-runtime materialization take an exclusive lock. Simfil itself does not require the arena to be thread-safe as long as model construction happens before concurrent evaluation, but the hooks are present for embedders that need concurrent writers.

## Parser, tokens, and AST

115 changes: 100 additions & 15 deletions include/simfil/diagnostics.h
@@ -2,29 +2,48 @@

#pragma once

#include "simfil/sourcelocation.h"
#include "simfil/value.h"
#include "simfil/token.h"
#include "simfil/error.h"
#include "simfil/expression.h"

#include <limits>
#include <tl/expected.hpp>
#include <optional>
#include <vector>
#include <string>
#include <memory>
#include <cstdlib>

namespace simfil
{

class AST;
class Expr;
struct Environment;
struct ModelNode;

/** Query Diagnostics. */
struct Diagnostics
class Diagnostics
{
static constexpr std::uint32_t InvalidIndex = std::numeric_limits<std::uint32_t>::max();
public:
using ExprId = std::uint32_t;
struct FieldExprData
{
SourceLocation location;
std::uint32_t hits = 0;
std::uint32_t evaluations = 0;
std::string name;
};


struct ComparisonExprData
{
SourceLocation location;
TypeFlags leftTypes;
TypeFlags rightTypes;
std::uint32_t evaluations = 0u;
std::uint32_t falseResults = 0u;
std::uint32_t trueResults = 0u;
};

struct Message
{
@@ -42,6 +61,12 @@ struct Diagnostics
Diagnostics(Diagnostics&&) noexcept;
~Diagnostics();

/**
* Get diagnostics data for a single Expr.
*/
template <class DiagnosticsDataType>
auto get(const Expr& expr) -> DiagnosticsDataType&;

/**
* Append/merge another diagnostics object into this one.
*/
@@ -53,22 +78,82 @@
auto write(std::ostream& stream) const -> tl::expected<void, Error>;
auto read(std::istream& stream) -> tl::expected<void, Error>;

struct Data;
private:
friend auto eval(Environment&, const AST&, const ModelNode&, Diagnostics*) -> tl::expected<std::vector<Value>, Error>;
friend auto diagnostics(Environment& env, const AST& ast, const Diagnostics& diag) -> tl::expected<std::vector<Message>, Error>;

std::unique_ptr<Data> data;

/**
* Collect diagnostics data from an AST.
* Build the exprIndex_ map for the AST.
*/
auto collect(Expr& ast) -> void;
auto prepareIndices(const Expr& ast) -> void;

/** ExprId to diagnostics data index mapping. */
std::vector<std::uint32_t> exprIndex_;

/** FieldExpr diagnostics data. */
std::vector<FieldExprData> fieldData_;

/** ComparisonExpr diagnostics data. */
std::vector<ComparisonExprData> comparisonData_;

private:
friend auto diagnostics(const Diagnostics& diag) -> tl::expected<std::vector<Message>, Error>;

/**
* Build messages from this object's diagnostics data.
*/
auto buildMessages(Environment& env, const AST& ast) const -> std::vector<Message>;
auto buildMessages() const -> std::vector<Message>;

mutable std::mutex mtx_;
};

namespace detail
{

template <class T>
struct DiagnosticsStorage;

template <>
struct DiagnosticsStorage<Diagnostics::FieldExprData>
{
static auto get(Diagnostics& diag)
{
return &diag.fieldData_;
}
};

template <>
struct DiagnosticsStorage<Diagnostics::ComparisonExprData>
{
static auto get(Diagnostics& diag)
{
return &diag.comparisonData_;
}
};

}

/**
* Get typed diagnostics data for a single Expr.
*/
template <class DiagnosticsDataType>
auto Diagnostics::get(const Expr& expr) -> DiagnosticsDataType&
{
auto* data = detail::DiagnosticsStorage<DiagnosticsDataType>::get(*this);

const auto id = expr.id();
if (exprIndex_.size() <= id) [[unlikely]] {
exprIndex_.resize(id + 1u, Diagnostics::InvalidIndex);
exprIndex_[id] = data->size();
}

auto index = exprIndex_[id];
if (index == Diagnostics::InvalidIndex) {
exprIndex_[id] = data->size();
index = exprIndex_[id];
}

if (data->size() <= index) {
data->resize(index + 1u);
}

return (*data)[index];
}

}
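The `exprIndex_` scheme above — a sparse id-to-slot map grown lazily, allocating a data record only for ids actually queried — can be modelled in isolation. Names below loosely mirror the header; the record type is a placeholder.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Standalone model of the Diagnostics::get() indexing: ids index a sparse
// vector of slot numbers; unseen ids get a fresh dense slot on first access.
constexpr std::uint32_t InvalidIndex = std::numeric_limits<std::uint32_t>::max();

struct Counter { std::uint32_t evaluations = 0; };

class SparseSlots {
public:
    Counter& get(std::uint32_t id) {
        if (index_.size() <= id)
            index_.resize(id + 1u, InvalidIndex);
        if (index_[id] == InvalidIndex) {
            index_[id] = static_cast<std::uint32_t>(data_.size());
            data_.emplace_back();  // allocate a dense slot lazily
        }
        return data_[index_[id]];
    }
    std::size_t slots() const { return data_.size(); }

private:
    std::vector<std::uint32_t> index_;
    std::vector<Counter> data_;
};
```

The payoff is that a query touching only a handful of expression ids stores only that many data records, even if ids are drawn from a large range.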
5 changes: 4 additions & 1 deletion include/simfil/environment.h
@@ -21,6 +21,7 @@ namespace simfil

class Expr;
class Function;
class Diagnostics;
struct ResultFn;
struct Debug;

@@ -138,6 +139,7 @@ struct Environment
struct Context
{
Environment* const env;
Diagnostics* const diag;

/* Current phase under which the evaluation
* takes place. */
@@ -151,7 +153,8 @@ struct Context
/* Timeout after which the evaluation should be canceled. */
std::optional<std::chrono::time_point<std::chrono::steady_clock>> timeout;

Context(Environment* env, Phase = Phase::Evaluation);
Context() = delete;
Context(Environment* env, Diagnostics* diag, Phase = Phase::Evaluation);

auto canceled() const -> bool
{
77 changes: 77 additions & 0 deletions include/simfil/expression-visitor.h
@@ -0,0 +1,77 @@
// Copyright (c) Navigation Data Standard e.V. - See "LICENSE" file.

#pragma once

#include <cstdlib>

namespace simfil
{

class Expr;
class WildcardExpr;
class AnyChildExpr;
class MultiConstExpr;
class ConstExpr;
class SubscriptExpr;
class SubExpr;
class AnyExpr;
class EachExpr;
class CallExpression;
class UnpackExpr;
class UnaryWordOpExpr;
class BinaryWordOpExpr;
class FieldExpr;
class PathExpr;
class AndExpr;
class OrExpr;
struct OperatorEq;
struct OperatorNeq;
struct OperatorLt;
struct OperatorLtEq;
struct OperatorGt;
struct OperatorGtEq;
template <class> class UnaryExpr;
template <class> class BinaryExpr;

/**
* Visitor base for visiting expressions recursively.
*/
class ExprVisitor
{
public:
ExprVisitor();
virtual ~ExprVisitor();

virtual void visit(const Expr& expr);
virtual void visit(const WildcardExpr& expr);
virtual void visit(const AnyChildExpr& expr);
virtual void visit(const MultiConstExpr& expr);
virtual void visit(const ConstExpr& expr);
virtual void visit(const SubscriptExpr& expr);
virtual void visit(const SubExpr& expr);
virtual void visit(const AnyExpr& expr);
virtual void visit(const EachExpr& expr);
virtual void visit(const CallExpression& expr);
virtual void visit(const PathExpr& expr);
virtual void visit(const FieldExpr& expr);
virtual void visit(const UnpackExpr& expr);
virtual void visit(const UnaryWordOpExpr& expr);
virtual void visit(const BinaryWordOpExpr& expr);
virtual void visit(const AndExpr& expr);
virtual void visit(const OrExpr& expr);
virtual void visit(const BinaryExpr<OperatorEq>& expr);
virtual void visit(const BinaryExpr<OperatorNeq>& expr);
virtual void visit(const BinaryExpr<OperatorLt>& expr);
virtual void visit(const BinaryExpr<OperatorLtEq>& expr);
virtual void visit(const BinaryExpr<OperatorGt>& expr);
virtual void visit(const BinaryExpr<OperatorGtEq>& expr);

protected:
/* Returns the index of the current expression */
std::size_t index() const;

private:
std::size_t index_ = 0;
};

}
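The visitor interface above follows the classic double-dispatch pattern: each expression type accepts a visitor and calls the `visit` overload for its own type, and unhandled overloads fall through to a default that recurses into children. A minimal model with a two-type hierarchy — the real simfil hierarchy is much larger, and these type names are illustrative:

```cpp
#include <memory>
#include <vector>

struct Visitor;

// Tiny expression stand-in: a node with children and an accept() hook.
struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor& v) const = 0;
    std::vector<std::unique_ptr<Node>> children;
};

struct FieldNode;
struct PathNode;

// Visitor base: default overloads just recurse into children, so a subclass
// only overrides the node types it cares about.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const FieldNode& n);
    virtual void visit(const PathNode& n);
    void visitChildren(const Node& n) {
        for (const auto& c : n.children)
            c->accept(*this);
    }
};

struct FieldNode : Node { void accept(Visitor& v) const override { v.visit(*this); } };
struct PathNode : Node { void accept(Visitor& v) const override { v.visit(*this); } };

void Visitor::visit(const FieldNode& n) { visitChildren(n); }
void Visitor::visit(const PathNode& n) { visitChildren(n); }

// Example subclass: count field accesses anywhere in the tree.
struct FieldCounter : Visitor {
    int fields = 0;
    void visit(const FieldNode& n) override { ++fields; visitChildren(n); }
};
```

A subclass like `FieldCounter` is the shape a diagnostics pass would take: override the expression types of interest, let the base class handle traversal of everything else.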