IVR_Technical_Training_Suite
Production graph-based workflow engine built on `Unity Graph Toolkit` with blueprint architecture, state trackers, and multi-interaction XR node system
Overview
IVR Technical Training Suite is a Unity 6 framework for creating step-by-step XR technical training simulations. Designers author training procedures as visual workflow graphs in the Unity Editor using a custom node editor built on Unity's Graph Toolkit. Each node represents a training step with an associated XR interaction (grab, socket placement, trigger enter, filtered interactor-to-interactable pairing, and others) that the trainee must physically complete before the workflow advances. Context nodes extend this to multi-object steps, allowing a single graph node to gate on several interactables simultaneously with configurable aggregation (all, any, or N-of-M). At runtime, WorkflowGraphEngine executes the graph asynchronously, polling XR state trackers each frame and signaling downstream engines for UI updates, hint delivery, and session logging. The suite ships as a reusable Unity UPM package and serves as the core training engine in multiple deployed VR simulations.
Role Summary
- Sole architect and developer of the entire framework across both major versions
- Designed the node lifecycle contract (WorkflowNodeRuntimeBase) and the async execution pipeline in WorkflowGraphEngine
- Built all custom node editor integrations on top of Unity Graph Toolkit's experimental API, including WorkflowGraphEditor, WorkflowGraphImporter, and all node editor/blueprint classes
- Designed and implemented the WorkflowObjectRegistry + WorkflowObjectMarker scene binding system
- Implemented the ServiceLocator with global, scene-level, and hierarchy-level resolution scopes
- Built all XR-specific node types and state trackers (interactable, interactor, filtered interaction, collider trigger, multi-state)
- Packaged and configured the framework as a distributable UPM package for use across multiple Unity projects
- Designed and built the generic MultiStateTracker<TTarget, TState> aggregation framework and the ColliderTriggerStateTracker that extends it, including the PhysicsTriggerRelay bridge component
- Implemented Context Nodes using Graph Toolkit's ContextNode and BlockNode APIs for multi-interactable workflow steps, with corresponding editor and runtime conversion pipelines
- Built the InputSystemNodeRuntime with an arm/fire state machine pattern for non-XR input conditions
Non-Technical Summary
IVR Technical Training Suite is the engine behind VR training programs where a technician learns to assemble a drone step by step inside a virtual environment. Each step says "pick up this part" or "place it in this slot," and the system will not advance until the trainee actually performs that action in VR.
Instead of hard-coding each training scenario, the suite provides a visual tool that lets anyone design step-by-step procedures like connecting puzzle pieces in a flowchart. Each piece represents a physical action the trainee must complete. Once the workflow is designed, it runs on any VR headset, tracks what the trainee does, shows helpful hints when they get stuck, and logs everything they completed and when. Some training steps require multiple actions at once, such as placing three separate parts before the workflow moves on, and the system handles this natively, without the designer needing to chain individual steps together.
The system is packaged as a reusable module, meaning new training simulations can be built on top of it without rebuilding the core engine. It has been used to power multiple training programs and is designed to scale to any technical domain that benefits from hands-on VR instruction.
Highlights
- Developed a graph-based extended reality (XR) training workflow engine in Unity 6 that lets instructional designers author multi-step virtual reality training procedures as visual node graphs, with no additional code required
- Engineered an asynchronous node lifecycle (PreInitializeAsync → InitializeAsync → OnEnterAsync → OnExitAsync) built on C# async/await and per-frame coroutine polling, so XR interaction requirements gate workflow progression until physical completion criteria are met
- Implemented a WorkflowObjectRegistry with string-key scene object resolution, decoupling workflow graph assets from specific scene setups, and enabling the same authored workflow to run across multiple training environments
- Built a modular multi-engine subscriber architecture (WorkflowEngineWatchDog, WorkflowHintEngine, WorkflowUIEngine) that independently handles session action logging, contextual hint delivery, and UI Toolkit step display by subscribing to engine lifecycle events
- Architected a dual-layer Blueprint pattern separating editor-time node definition (INodeAuthoringBlueprint, IConvertibleToRuntimeNode) from runtime execution (INodeRuntimeBlueprint), enabling new interaction node types to be added without modifying the core engine
- Built a full hierarchical ServiceLocator dependency injection system with global, scene-level, and per-object resolution and automatic domain reset, replacing FindObjectOfType calls across the codebase
- Designed a generic MultiStateTracker<TTarget, TState> framework with configurable aggregation modes (Any, All, N-of-M) and dynamic target binding, enabling multi-object interaction requirements such as "place all three parts before advancing" to be expressed declaratively rather than with per-step custom code.
- Implemented Context Nodes on top of Unity Graph Toolkit's ContextNode and BlockNode APIs, allowing designers to author multi-interactable workflow steps, including filtered interactor-to-interactable pairs and combined collider trigger conditions, directly in the visual graph editor without writing C#.
Quick Highlights
- Visual workflow graph authoring in the Unity Editor; no code required for content designers
- Async node execution with per-frame XR interaction polling gates workflow steps until physical conditions are met
- WorkflowObjectRegistry decouples graph assets from scene layouts: author once, deploy across multiple scenes
- Modular subscriber engines handle UI, hints, and session logging independently via event subscription
- Ships as a distributable Unity UPM package with separate runtime and editor assembly definitions
- Used as the training engine in at least two deployed VR simulations
- Context nodes enable multi-object interaction steps, gating on several interactables with Any/All/N-of-M aggregation in one graph node
- Generic MultiStateTracker base enables new multi-target tracker types with minimal boilerplate
- Input System node type supports non-XR input conditions (button presses, action triggers) within the same workflow graph
Technical Breakdown
WorkflowGraphEngine and Async Execution
WorkflowGraphEngine is a MonoBehaviour that orchestrates the full training session lifecycle. On Start(), it iterates the node list in WorkflowGraphRuntime twice: first calling PreInitializeAsync() on every node to resolve all scene object references upfront and detect configuration errors before execution begins, then iterating sequentially, calling ExecuteNodeAsync() on each node in order. This sequential async pattern means the engine awaits each node to completion before advancing, which is the mechanism by which physical interaction requirements gate workflow progression.
WorkflowNodeRuntimeBase defines the async lifecycle contract: PreInitializeAsync (scene object resolution), InitializeAsync (per-node configuration, e.g. Rigidbody kinematic state), OnEnterAsync (main blocking interaction wait), and OnExitAsync (teardown, always executed via finally). Every concrete node delegates its behavior to a runtime Blueprint, keeping node classes thin.
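Sketched as C#, the lifecycle contract might look like the following. Method names come from the text above; the exact signatures, the ScriptableObject base, and the body of ExecuteNodeAsync are assumptions:

```csharp
using System.Threading.Tasks;
using UnityEngine;

// Illustrative sketch of the node lifecycle contract (signatures assumed).
public abstract class WorkflowNodeRuntimeBase : ScriptableObject
{
    public abstract Task PreInitializeAsync();                   // resolve scene object references upfront
    public virtual Task InitializeAsync() => Task.CompletedTask; // per-node config (e.g. kinematic state)
    protected abstract Task OnEnterAsync();                      // main blocking interaction wait
    protected virtual Task OnExitAsync() => Task.CompletedTask;  // teardown

    public async Task ExecuteNodeAsync()
    {
        await InitializeAsync();
        try     { await OnEnterAsync(); }  // completes only when the physical condition is met
        finally { await OnExitAsync(); }   // teardown always runs, matching the finally guarantee above
    }
}
```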
Blueprint Pattern
Editor blueprints (implementing INodeAuthoringBlueprint / IConvertibleToRuntimeNode) define the visual graph node's ports and options in the Unity Graph Toolkit editor and serialize themselves into a runtime-compatible representation. Runtime blueprints (implementing INodeRuntimeBlueprint) carry the serialized data and execute the actual async interaction logic. This split keeps the editor-time and runtime assemblies separate, and new interaction types can be added by implementing both interfaces without touching the engine.
Concrete runtime blueprints include XRInteractableRuntimeBlueprint (hover/select/activate requirement polling on XRBaseInteractable), TriggerColliderRuntimeBlueprint (physics collider enter/exit gates with AggregationMode and LayerMask filtering), and XRFilteredInteractionRuntimeBlueprint (filtered interactor-to-interactable relationship checks with IXRHoverFilter/IXRSelectFilter injection and arm/complete lifecycle).
WorkflowObjectRegistry and Scene Binding
WorkflowObjectRegistry is a ScriptableObject that bridges the gap between graph assets and scene objects. At startup, WorkflowGraphEngine.Awake() calls Registry.Refresh(), which uses FindObjectsByType to build a Dictionary cache. Graph nodes reference objects by string keys (typed via XRInteractableWorkflowDefinition, XRInteractorWorkflowDefinition, or ColliderWorkflowDefinition wrapper types), and Registry.GetMarker(key) resolves them to scene components. This fully decouples workflow graph assets from specific scene hierarchies.
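The refresh-and-resolve flow can be sketched as follows; the class and method names come from the text, while the marker's Key field and the exact signatures are assumptions:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical marker component: exposes the string key that graph nodes reference.
public class WorkflowObjectMarker : MonoBehaviour
{
    public string Key;
}

// Sketch of the registry's refresh/resolve flow (signatures assumed).
[CreateAssetMenu(menuName = "IVR/Workflow Object Registry")]
public class WorkflowObjectRegistry : ScriptableObject
{
    private readonly Dictionary<string, WorkflowObjectMarker> _cache = new();

    public void Refresh()
    {
        _cache.Clear();
        foreach (var marker in FindObjectsByType<WorkflowObjectMarker>(FindObjectsSortMode.None))
            _cache[marker.Key] = marker;   // scene objects register themselves by key
    }

    public WorkflowObjectMarker GetMarker(string key) =>
        _cache.TryGetValue(key, out var marker)
            ? marker
            : throw new KeyNotFoundException($"No WorkflowObjectMarker bound for key '{key}'");
}
```

Throwing on a missing key (rather than returning null) is a design choice that surfaces binding errors during the upfront PreInitialize pass instead of mid-session.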
Multi-Engine Subscriber Architecture
Three satellite engines subscribe to WorkflowGraphEngine's C# events (OnNodeStarted, OnNodeCompleted, OnWorkflowStart, OnWorkflowCompleted), each handling a single responsibility:
- WorkflowEngineWatchDog: Appends ActionLog entries (timestamp + node title) to a serialized list for session auditability
- WorkflowHintEngine: Receives step start/complete events and controls hint visibility based on an allowHints flag
- WorkflowUIEngine: Queries the current node's title, description, and hint fields and updates named UIDocument labels via UI Toolkit, with a StepHintButton that reveals hints on demand
WorkflowGraphEngine uses [RequireComponent(typeof(WorkflowEngineWatchDog), typeof(WorkflowHintEngine))] to guarantee these engines are always co-located.
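Using the watchdog as the example, a subscriber engine might look like the following sketch. The event and class names are from the document; the ActionLog fields and handler signatures are assumptions:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical log entry shape (timestamp + node title, per the document).
[System.Serializable]
public struct ActionLog
{
    public float Timestamp;
    public string NodeTitle;
}

// Sketch of the subscriber pattern: one engine, one responsibility.
[RequireComponent(typeof(WorkflowGraphEngine))]
public class WorkflowEngineWatchDog : MonoBehaviour
{
    [SerializeField] private List<ActionLog> _log = new();
    private WorkflowGraphEngine _engine;

    private void OnEnable()
    {
        _engine = GetComponent<WorkflowGraphEngine>();
        _engine.OnNodeCompleted += LogCompletion;   // subscribe to the engine's lifecycle event
    }

    private void OnDisable() => _engine.OnNodeCompleted -= LogCompletion;

    private void LogCompletion(WorkflowNodeRuntimeBase node) =>
        _log.Add(new ActionLog { Timestamp = Time.time, NodeTitle = node.name });
}
```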
XR State Tracker System
All XR interaction nodes use a per-frame polling approach rather than event-driven completion. Each tracked scene object carries a MonoBehaviour state tracker that listens to XR Interaction Toolkit (XRITK) events and accumulates state flags. Runtime blueprints call IsHoverMet(requirement), IsSelectMet(requirement), and IsActivateMet(requirement) each frame inside a Unity Coroutine yielded from OnEnterAsync, completing a TaskCompletionSource when all requirements are satisfied.
Single-target trackers extend BaseInteractionStateTracker<T>, which auto-registers callbacks on OnEnable and unregisters on OnDisable. XRInteractableStateTracker tracks hover/select/activate on XRBaseInteractable; XRInteractorStateTracker mirrors this for XRBaseInteractor; XRFilteredInteractionStateTracker implements both IXRHoverFilter and IXRSelectFilter to constrain an interactor's attention to a specific interactable, enabling precise "place this exact object in this exact socket" step conditions.
Multi-target tracking is handled by MultiStateTracker<TTarget, TState>, a generic abstract base that manages a Dictionary<TTarget, TState> of per-target state and an Action-based unsubscribe registry. BindTargets(IEnumerable<TTarget>) dynamically wires new targets; Evaluate(Func<TState, bool>, AggregationMode, nRequired) aggregates per-target predicates with Any, All, or N-of-M semantics. ColliderTriggerStateTracker extends this base, bridging Unity physics callbacks through a sealed PhysicsTriggerRelay MonoBehaviour that converts OnTriggerEnter/Exit into C# events, with per-collider Entered/Exited state and LayerMask + allowed-object filtering.
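The generic aggregation core can be sketched as follows. The public surface (BindTargets, Evaluate, AggregationMode, the three template methods) mirrors the document; having Attach return its own teardown Action is an assumption about how the unsubscribe registry works:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using UnityEngine;

public enum AggregationMode { Any, All, NOfM }

// Sketch of the generic multi-target tracker base (details assumed).
public abstract class MultiStateTracker<TTarget, TState> : MonoBehaviour
{
    protected readonly Dictionary<TTarget, TState> States = new();
    private readonly Dictionary<TTarget, Action> _unsubscribe = new();

    public void BindTargets(IEnumerable<TTarget> targets)
    {
        foreach (var target in targets)
        {
            if (States.ContainsKey(target)) continue;    // already bound
            var state = NewState();
            States[target] = state;
            _unsubscribe[target] = Attach(target, state); // wire events, keep the teardown
        }
    }

    public bool Evaluate(Func<TState, bool> predicate, AggregationMode mode, int nRequired = 1)
    {
        int met = States.Values.Count(predicate);
        return mode switch
        {
            AggregationMode.Any  => met >= 1,
            AggregationMode.All  => States.Count > 0 && met == States.Count,
            AggregationMode.NOfM => met >= nRequired,
            _                    => false
        };
    }

    // The three template methods concrete trackers override:
    protected abstract TState NewState();
    protected abstract void ResetState(TState state);
    protected abstract Action Attach(TTarget target, TState state);
}
```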
Context Nodes and Multi-Interactable Steps
Context nodes extend the workflow graph to multi-object interaction requirements. On the editor side, WorkflowContextNodeEditorBase extends Graph Toolkit's ContextNode and implements IConvertibleToRuntimeNode, providing shared port definitions (execution, title, description, hint) and the CreateIO() factory for reading port/option values during conversion. Concrete context editors compose INodeAuthoringBlueprint instances. For example, XRMultiInteractionContextNodeEditor holds an XRInteractableEditorBlueprint and delegates DefineOptions/DefinePorts to it. XRMultiInteractionColliderContextNodeEditor inherits from this and adds a ColliderTriggerBlueprint for combined interaction + collider conditions. XRMultiFilteredInteractionContextNodeEditor uses Graph Toolkit's BlockNode API (via XRFilteredBlockNodeEditor) to let designers add multiple interactor-interactable pairs inside a single context node.
At runtime, XRMultiInteractionContextNodeRuntime resolves a List<XRInteractableWorkflowDefinition> to their state trackers in InitializeAsync and polls all of them in a single PerformInteractionCheck coroutine until every tracker's requirements are met. XRMultiInteractionColliderContextNodeRuntime extends this by additionally resolving a ColliderTriggerStateTracker and adding its condition to the poll loop.
Input System Integration
InputSystemNodeRuntime enables non-XR input conditions within the workflow graph. It resolves an InputActionReference to an InputAction at initialization, then uses an arm/fire state machine in OnEnterAsync: OnActionCanceled sets _armed = true (meaning the action was released), and OnActionStarted completes the TaskCompletionSource only when _armed is true, preventing a held button from immediately satisfying the step. OnExitAsync unsubscribes all callbacks.
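Isolated from the node class, the arm/fire gate might be sketched like this. The _armed/callback logic follows the text; the wrapper class itself is hypothetical, and a real implementation would likely also arm immediately when the action is not already in progress at step entry:

```csharp
using System.Threading.Tasks;
using UnityEngine.InputSystem;

// Hypothetical wrapper around the arm/fire pattern described above.
public class ArmFireGate
{
    private readonly InputAction _action;
    private readonly TaskCompletionSource<bool> _tcs = new();
    private bool _armed;   // set once the action has been released at least once

    public ArmFireGate(InputActionReference reference)
    {
        _action = reference.action;
        _action.started  += OnActionStarted;
        _action.canceled += OnActionCanceled;
        _action.Enable();
    }

    public Task WaitForPress() => _tcs.Task;

    // Release arms the gate...
    private void OnActionCanceled(InputAction.CallbackContext _) => _armed = true;

    // ...and only an armed press completes the step, so a button already held
    // when the node is entered cannot satisfy it immediately.
    private void OnActionStarted(InputAction.CallbackContext _)
    {
        if (_armed) _tcs.TrySetResult(true);
    }

    public void Dispose()   // mirrors OnExitAsync's unsubscribe
    {
        _action.started  -= OnActionStarted;
        _action.canceled -= OnActionCanceled;
    }
}
```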
ServiceLocator
A full dependency injection system with three resolution scopes: Global (singleton DontDestroyOnLoad), Scene-level (one per Unity Scene), and per-hierarchy (climbs transform.parent). ServiceLocator.For(mb) automatically selects the most specific scope. Static fields are reset on [RuntimeInitializeOnLoadMethod(RuntimeInitializeLoadType.SubsystemRegistration)] to guarantee a clean state on domain reload.
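The three-scope resolution order can be sketched as follows. For(), the scope model, and the SubsystemRegistration reset come from the document; the container internals are assumptions:

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.SceneManagement;

// Sketch of hierarchical service resolution (internals assumed).
public class ServiceLocator : MonoBehaviour
{
    private static ServiceLocator _global;
    private static readonly Dictionary<Scene, ServiceLocator> SceneScopes = new();
    private readonly Dictionary<Type, object> _services = new();

    public void Register<T>(T service) => _services[typeof(T)] = service;
    public T Get<T>() => (T)_services[typeof(T)];

    // Most specific scope wins: hierarchy -> scene -> global.
    public static ServiceLocator For(MonoBehaviour mb)
    {
        var hierarchyScope = mb.GetComponentInParent<ServiceLocator>();
        if (hierarchyScope != null) return hierarchyScope;
        if (SceneScopes.TryGetValue(mb.gameObject.scene, out var sceneScope)) return sceneScope;
        return _global;
    }

    // Statics survive when Enter Play Mode domain reload is disabled,
    // so they are cleared explicitly on subsystem registration.
    [RuntimeInitializeOnLoadMethod(RuntimeInitializeLoadType.SubsystemRegistration)]
    private static void ResetStatics()
    {
        _global = null;
        SceneScopes.Clear();
    }
}
```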
Systems Used
- Workflow Graph Engine: Sequential async execution engine that processes WorkflowGraphRuntime nodes one-by-one, raising OnNodeStarted, OnNodeCompleted, OnWorkflowStart, and OnWorkflowCompleted events for downstream subscribers
- Workflow Object Registry: ScriptableObject that maps string keys to scene WorkflowObjectMarker components at runtime, decoupling graph asset authoring from scene configuration
- Node Runtime Blueprint System: Dual-layer pattern separating editor-time node authoring (INodeAuthoringBlueprint) from runtime execution (INodeRuntimeBlueprint), enabling modular addition of new interaction types without modifying the core engine
- Multi-Engine Subscriber Architecture: Loosely coupled satellite engines (WorkflowEngineWatchDog, WorkflowHintEngine, WorkflowUIEngine) that subscribe to WorkflowGraphEngine events to independently handle logging, hint delivery, and UI updates
- Service Locator: Hierarchical dependency injection system supporting global, scene-level, and per-object service resolution with automatic static reset on Unity subsystem registration
- XR State Tracker System: Per-component MonoBehaviour trackers that monitor and expose XR and physics interaction state for per-frame polling by runtime nodes
- Multi-State Tracker Framework: Generic MultiStateTracker<TTarget, TState> base with per-target state dictionaries, dynamic target binding, and configurable aggregation (Any/All/N-of-M) for multi-object interaction requirements
- Context Node System: Editor-time ContextNode and BlockNode integration enabling multi-interactable workflow steps with composed INodeAuthoringBlueprint instances and runtime conversion to multi-tracker polling nodes
- Input System Node: InputSystemNodeRuntime with arm/fire state machine pattern for integrating Unity Input System actions into the workflow execution pipeline
Impact & Results
- Deployed as the core training engine in at least two VR simulations (T-Rex QA Training, drone assembly workflow)
- Reduced new training scenario creation to a graph-authoring task; no additional C# required for new workflows
- Async lifecycle with upfront PreInitialize pass eliminates mid-session null-reference failures by validating all scene bindings before execution begins
- Registry-based scene binding enables the same workflow asset to run across multiple scene configurations without modification
- Modular subscriber architecture allows disabling or extending UI, hint, and logging behavior independently without touching the execution engine
- Context nodes with configurable aggregation reduced multi-object training steps from multiple chained nodes to a single authored graph node, simplifying workflow design for complex assembly sequences
Deep Dive
Why Graph Toolkit Over a Custom Graph Editor
The v1.0 Handler architecture was straightforward to build but required code changes for every new training scenario. Workflow authoring was developer-only. The switch to Unity's Graph Toolkit provided a visual authoring surface in the Unity Editor that content designers can use without writing code. Because Graph Toolkit was experimental (0.2.0-exp.1), substantial custom infrastructure was built on top of it, including the WorkflowGraphImporter for asset processing, all node editor and blueprint classes, and the runtime conversion pipeline from editor nodes to serialized WorkflowNodeRuntimeBase instances.
Async Execution and the Coroutine Bridge
Unity's MonoBehaviour lifecycle does not natively integrate with C# Task-based async, and XRITK interaction events fire on the main thread. The node execution model uses a TaskCompletionSource with a Unity Coroutine bridge inside OnEnterAsync: the coroutine polls state tracker flags each frame (yield return null) and completes the TCS when all conditions are met. This lets the engine await a Task while the actual wait logic runs inside Unity's coroutine scheduler, with no threading issues, no callbacks, and clean linear code flow in the engine.
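The bridge can be sketched as a small helper. The tracker method names come from the document; the wrapper class, its fields, and the InteractionRequirement holder are hypothetical:

```csharp
using System.Collections;
using System.Threading.Tasks;
using UnityEngine;

// Sketch of the TaskCompletionSource + coroutine bridge (types assumed).
public class InteractionWait
{
    private readonly MonoBehaviour _runner;   // any active MonoBehaviour to host the coroutine
    private readonly XRInteractableStateTracker _tracker;

    public InteractionWait(MonoBehaviour runner, XRInteractableStateTracker tracker)
    {
        _runner = runner;
        _tracker = tracker;
    }

    // Called from OnEnterAsync: the engine awaits the Task while the actual
    // waiting happens frame-by-frame inside Unity's coroutine scheduler.
    public Task WaitUntilMetAsync(InteractionRequirement requirement)
    {
        var tcs = new TaskCompletionSource<bool>();
        _runner.StartCoroutine(Poll(requirement, tcs));
        return tcs.Task;
    }

    private IEnumerator Poll(InteractionRequirement requirement, TaskCompletionSource<bool> tcs)
    {
        while (!(_tracker.IsHoverMet(requirement) &&
                 _tracker.IsSelectMet(requirement) &&
                 _tracker.IsActivateMet(requirement)))
            yield return null;   // re-check next frame; everything stays on the main thread
        tcs.SetResult(true);     // unblocks the awaiting engine
    }
}
```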
Filtered Interactions and the IXRHoverFilter/IXRSelectFilter Approach
For training steps that require placing a specific object in a specific socket, a simple "was this socket entered?" check is insufficient because the interactor may also be hovering other objects. XRFilteredInteractionStateTracker implements both IXRHoverFilter and IXRSelectFilter, injecting itself into the XRITK filter stack so only interactions involving the designated interactable are reported. This makes it possible to author steps like "pick up part A and place it in socket B" with precise, unambiguous completion conditions.
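The injection pattern might look like the following sketch. IXRHoverFilter/IXRSelectFilter and the interactor's hoverFilters/selectFilters lists are XRITK APIs; the class shown here is illustrative rather than the suite's actual tracker, and XRITK 3.x namespaces may differ:

```csharp
using UnityEngine;
using UnityEngine.XR.Interaction.Toolkit;
using UnityEngine.XR.Interaction.Toolkit.Filtering;

// Illustrative filter: constrains an interactor to one designated interactable.
public class DesignatedInteractableFilter : MonoBehaviour, IXRHoverFilter, IXRSelectFilter
{
    [SerializeField] private XRBaseInteractor _interactor;
    [SerializeField] private XRBaseInteractable _designated;

    public bool canProcess => isActiveAndEnabled;

    private void OnEnable()
    {
        _interactor.hoverFilters.Add(this);    // inject into the interactor's filter stack
        _interactor.selectFilters.Add(this);
    }

    private void OnDisable()
    {
        _interactor.hoverFilters.Remove(this);
        _interactor.selectFilters.Remove(this);
    }

    // Every hover/select not involving the designated interactable is rejected,
    // so trackers only ever see the exact pairing the step requires.
    public bool Process(IXRHoverInteractor interactor, IXRHoverInteractable interactable) =>
        ReferenceEquals(interactable, _designated);

    public bool Process(IXRSelectInteractor interactor, IXRSelectInteractable interactable) =>
        ReferenceEquals(interactable, _designated);
}
```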
Package Structure and Multi-Project Reuse
The framework is structured as a Unity UPM package with separate runtime and editor assemblies. This allows other Unity projects to consume it via the Package Manager without importing the development project's demo scenes or Synty assets. The Trex-QA-Training-VR project demonstrates this pattern: it carries its own project-specific assembly, using the framework as a dependency while building scenario-specific logic on top.
Why a Generic MultiStateTracker
Single-target trackers (BaseInteractionStateTracker<T>) work well for one-object steps, but training scenarios frequently require "place all N parts" or "trigger any of these colliders." Rather than duplicating polling logic per tracker type, MultiStateTracker<TTarget, TState> factors out target registration, per-target state storage, and aggregation evaluation into a single generic base. Concrete implementations override three template methods (NewState(), ResetState(TState), and Attach(TTarget, TState)) and get AggregationMode support for free. ColliderTriggerStateTracker demonstrates the pattern: it overrides Attach to subscribe via PhysicsTriggerRelay (a sealed MonoBehaviour that converts OnTriggerEnter/Exit into C# events, avoiding the need for the tracker itself to be on the collider's GameObject) and adds per-collider LayerMask and allowed-object filtering on top of the base Evaluate.
Context Nodes and the BlockNode Pattern
Standard workflow nodes map one-to-one with a single interaction requirement. Context nodes break this constraint by leveraging Graph Toolkit's ContextNode API, which supports embedded BlockNodes, sub-nodes that live inside a parent context node in the visual editor. XRMultiFilteredInteractionContextNodeEditor uses this to let designers add multiple interactor-interactable filter pairs inside a single node. At conversion time, the context editor iterates its block nodes, reads each one's port values via ImportService, and populates a list on the runtime node. This avoids the combinatorial explosion of creating separate graph nodes for every pairing in a multi-part assembly step.
Known Limitations and Future Direction
The current sequential execution model executes nodes linearly, with no branching and no parallel paths. Adding branching would require promoting WorkflowGraphRuntime from a flat list to a proper adjacency structure and updating WorkflowGraphEngine to traverse it graph-first rather than list-first. The Graph Toolkit graph model already supports this topology; the execution engine is the bottleneck. This is the primary planned architectural expansion for v3.0.
v2.0 — Graph Toolkit Workflow Engine
v2.0 Highlights
- Developed a graph-based extended reality (XR) training workflow engine in Unity 6 that lets instructional designers author multi-step virtual reality training procedures as visual node graphs, with no additional code required
- Engineered an asynchronous node lifecycle (PreInitializeAsync → InitializeAsync → OnEnterAsync → OnExitAsync) built on C# async/await and per-frame coroutine polling, so XR interaction requirements gate workflow progression until physical completion criteria are met
- Implemented a WorkflowObjectRegistry with string-key scene object resolution, decoupling workflow graph assets from specific scene setups, and enabling the same authored workflow to run across multiple training environments
- Built a modular multi-engine subscriber architecture (WorkflowEngineWatchDog, WorkflowHintEngine, WorkflowUIEngine) that independently handles session action logging, contextual hint delivery, and UI Toolkit step display by subscribing to engine lifecycle events
- Architected a dual-layer Blueprint pattern separating editor-time node definition (INodeAuthoringBlueprint, IConvertibleToRuntimeNode) from runtime execution (INodeRuntimeBlueprint), enabling new interaction node types to be added without modifying the core engine
- Built a full hierarchical ServiceLocator dependency injection system with global, scene-level, and per-object resolution and automatic domain reset, replacing FindObjectOfType calls across the codebase
- Designed a generic MultiStateTracker<TTarget, TState> framework with configurable aggregation modes (Any, All, N-of-M) and dynamic target binding, enabling multi-object interaction requirements such as "place all three parts before advancing" to be expressed declaratively rather than with per-step custom code.
- Implemented Context Nodes on top of Unity Graph Toolkit's ContextNode and BlockNode APIs, allowing designers to author multi-interactable workflow steps, including filtered interactor-to-interactable pairs and combined collider trigger conditions, directly in the visual graph editor without writing C#.
v2.0 Overview
IVR Technical Training Suite is a Unity 6 framework for creating step-by-step XR technical training simulations. Designers author training procedures as visual workflow graphs in the Unity Editor using a custom node editor built on Unity's Graph Toolkit. Each node represents a training step with an associated XR interaction (grab, socket placement, trigger enter, filtered interactor-to-interactable pairing, and others) that the trainee must physically complete before the workflow advances. Context nodes extend this to multi-object steps, allowing a single graph node to gate on several interactables simultaneously with configurable aggregation (all, any, or N-of-M). At runtime, WorkflowGraphEngine executes the graph asynchronously, polling XR state trackers each frame and signaling downstream engines for UI updates, hint delivery, and session logging. The suite ships as a reusable Unity UPM package and serves as the core training engine in multiple deployed VR simulations.
v2.0 Technical Breakdown
WorkflowGraphEngine and Async Execution
WorkflowGraphEngine is a MonoBehaviour that orchestrates the full training session lifecycle. On Start(), it iterates the node list in WorkflowGraphRuntime twice: first calling PreInitializeAsync() on every node to resolve all scene object references upfront and detect configuration errors before execution begins, then iterating sequentially, calling ExecuteNodeAsync() on each node in order. This sequential async pattern means the engine awaits each node to completion before advancing, which is the mechanism by which physical interaction requirements gate workflow progression.
WorkflowNodeRuntimeBase defines the async lifecycle contract: PreInitializeAsync (scene object resolution), InitializeAsync (per-node configuration, e.g. Rigidbody kinematic state), OnEnterAsync (main blocking interaction wait), and OnExitAsync (teardown, always executed via finally). Every concrete node delegates its behavior to a runtime Blueprint, keeping node classes thin.
Blueprint Pattern
Editor blueprints (implementing INodeAuthoringBlueprint / IConvertibleToRuntimeNode) define the visual graph node's ports and options in the Unity Graph Toolkit editor and serialize themselves into a runtime-compatible representation. Runtime blueprints (implementing INodeRuntimeBlueprint) carry the serialized data and execute the actual async interaction logic. This split keeps the editor-time and runtime assemblies separate, and new interaction types can be added by implementing both interfaces without touching the engine.
Concrete runtime blueprints include XRInteractableRuntimeBlueprint (hover/select/activate requirement polling on XRBaseInteractable), TriggerColliderRuntimeBlueprint (physics collider enter/exit gates with AggregationMode and LayerMask filtering), and XRFilteredInteractionRuntimeBlueprint (filtered interactor-to-interactable relationship checks with IXRHoverFilter/IXRSelectFilter injection and arm/complete lifecycle).
WorkflowObjectRegistry and Scene Binding
WorkflowObjectRegistry is a ScriptableObject that bridges the gap between graph assets and scene objects. At startup, WorkflowGraphEngine.Awake() calls Registry.Refresh(), which uses FindObjectsByType to build a Dictionary cache. Graph nodes reference objects by string keys (typed via XRInteractableWorkflowDefinition, XRInteractorWorkflowDefinition, or ColliderWorkflowDefinition wrapper types), and Registry.GetMarker(key) resolves them to scene components. This fully decouples workflow graph assets from specific scene hierarchies.
Multi-Engine Subscriber Architecture
Three satellite engines subscribe to WorkflowGraphEngine's C# events (OnNodeStarted, OnNodeCompleted, OnWorkflowStart, OnWorkflowCompleted), each handling a single responsibility:
- WorkflowEngineWatchDog: Appends ActionLog entries (timestamp + node title) to a serialized list for session auditability
- WorkflowHintEngine: Receives step start/complete events and controls hint visibility based on an allowHints flag
- WorkflowUIEngine: Queries the current node's title, description, and hint fields and updates named UIDocument labels via UI Toolkit, with a StepHintButton that reveals hints on demand
WorkflowGraphEngine uses [RequireComponent(typeof(WorkflowEngineWatchDog), typeof(WorkflowHintEngine))] to guarantee these engines are always co-located.
XR State Tracker System
All XR interaction nodes use a per-frame polling approach rather than event-driven completion. Each tracked scene object carries a MonoBehaviour state tracker that listens to XRITK events and accumulates state flags. Runtime blueprints call IsHoverMet(requirement), IsSelectMet(requirement), and IsActivateMet(requirement) each frame inside a Unity Coroutine yielded from OnEnterAsync, completing a TaskCompletionSource when all requirements are satisfied.
Single-target trackers extend BaseInteractionStateTracker<T>, which auto-registers callbacks on OnEnable and unregisters on OnDisable. XRInteractableStateTracker tracks hover/select/activate on XRBaseInteractable; XRInteractorStateTracker mirrors this for XRBaseInteractor; XRFilteredInteractionStateTracker implements both IXRHoverFilter and IXRSelectFilter to constrain an interactor's attention to a specific interactable, enabling precise "place this exact object in this exact socket" step conditions.
Multi-target tracking is handled by MultiStateTracker<TTarget, TState>, a generic abstract base that manages a Dictionary<TTarget, TState> of per-target state and an Action-based unsubscribe registry. BindTargets(IEnumerable<TTarget>) dynamically wires new targets; Evaluate(Func<TState, bool>, AggregationMode, nRequired) aggregates per-target predicates with Any, All, or N-of-M semantics. ColliderTriggerStateTracker extends this base, bridging Unity physics callbacks through a sealed PhysicsTriggerRelay MonoBehaviour that converts OnTriggerEnter/Exit into C# events, with per-collider Entered/Exited state and LayerMask + allowed-object filtering.
Context Nodes and Multi-Interactable Steps
Context nodes extend the workflow graph to multi-object interaction requirements. On the editor side, WorkflowContextNodeEditorBase extends Graph Toolkit's ContextNode and implements IConvertibleToRuntimeNode, providing shared port definitions (execution, title, description, hint) and the CreateIO() factory for reading port/option values during conversion. Concrete context editors compose INodeAuthoringBlueprint instances. For example, XRMultiInteractionContextNodeEditor holds an XRInteractableEditorBlueprint and delegates DefineOptions/DefinePorts to it. XRMultiInteractionColliderContextNodeEditor inherits from this and adds a ColliderTriggerBlueprint for combined interaction + collider conditions. XRMultiFilteredInteractionContextNodeEditor uses Graph Toolkit's BlockNode API (via XRFilteredBlockNodeEditor) to let designers add multiple interactor-interactable pairs inside a single context node.
At runtime, XRMultiInteractionContextNodeRuntime resolves a List<XRInteractableWorkflowDefinition> to their state trackers in InitializeAsync and polls all of them in a single PerformInteractionCheck coroutine until every tracker's requirements are met. XRMultiInteractionColliderContextNodeRuntime extends this by additionally resolving a ColliderTriggerStateTracker and adding its condition to the poll loop.
Input System Integration
InputSystemNodeRuntime enables non-XR input conditions within the workflow graph. It resolves an InputActionReference to an InputAction at initialization, then uses an arm/fire state machine in OnEnterAsync: OnActionCanceled sets _armed = true (meaning the action was released), and OnActionStarted completes the TaskCompletionSource only when _armed is true, preventing a held button from immediately satisfying the step. OnExitAsync unsubscribes all callbacks.
ServiceLocator
A full dependency injection system with three resolution scopes: Global (singleton DontDestroyOnLoad), Scene-level (one per Unity Scene), and per-hierarchy (climbs transform.parent). ServiceLocator.For(mb) automatically selects the most specific scope. Static fields are reset on [RuntimeInitializeOnLoadMethod(RuntimeInitializeLoadType.SubsystemRegistration)] to guarantee a clean state on domain reload.
v1.0 — Handler/Mediator Pattern
v1.0 Highlights
- Built the initial VR training framework architecture in Unity 6, implementing a coordinated MonoBehaviour handler system (BaseHandler, ProcessHandler, StepHandler, AffordanceHandler, SettingsHandler, UIHandler) for managing step-sequenced XR training scenarios
- Designed an XR Mediator abstraction layer (XRBaseMediator, XRInteractableMediator) to bridge Unity's XR Interaction Toolkit event model to discrete training step advancement logic, decoupling interaction detection from workflow control flow
- Established the project's Unity 6 + OpenXR + XRITK 3.x foundation, configuring XR Management, OpenXR feature sets for Meta Quest, and URP for the training suite's multi-project runtime environment
v1.0 Overview
The v1.0 implementation established the foundational project structure and a MonoBehaviour-based workflow execution system. A central ProcessHandler coordinated a set of specialized handlers (StepHandler, AffordanceHandler, SettingsHandler, UIHandler) discovered via FindObjectOfType. Training procedures were expressed as StepData and MediatorData ScriptableObjects, and an XRBaseMediator / XRInteractableMediator pair translated Unity XRITK interaction events into step completion signals. While functional for simple linear workflows, the approach required code changes for every new training procedure and offered no visual authoring surface for content designers.
v1.0 Technical Breakdown
Handler Architecture
BaseHandler provided the shared MonoBehaviour scaffold, holding protected references to AffordanceHandler, ProcessHandler, SettingsHandler, and UIHandler, all resolved via FindObjectOfType in Awake. All domain handlers inherited from BaseHandler and could override OnEnable, OnDisable, Awake, Start, and Update. StepHandler maintained the current step index and advanced the procedure when a step's completion condition was met. ProcessHandler orchestrated the sequence, delegating to StepHandler for step advancement and to UIHandler for display updates.
XR Mediator System
XRBaseMediator listened to XRITK interaction events (hover, select, activate) and raised internal mediator events when conditions were satisfied. XRInteractableMediator subclassed it to target specific XRBaseInteractable components. The StepHandler subscribed to mediator events to determine when to advance. This was the precursor to the v2.0 state tracker + polling pattern: event-driven rather than polled, but lacking the configurability and composability of the later blueprint approach.
Limitations That Drove v2.0
Every new training scenario required writing new handler subclasses or modifying existing ones. The FindObjectOfType service discovery pattern was fragile in scenes with multiple training modules. There was no visual representation of the workflow for non-developer stakeholders. These limitations directly motivated the architectural overhaul introduced in v2.0.