Gesture_Interface

Kinect-powered gesture recognition display for interactive content browsing

ROLE: Lead XR Developer
PERIOD: Jul 2022 — Present
PLATFORMS: Windows
STATUS: ACTIVE
DOMAIN: Unity
C# · Unity · Kinect SDK · Universal Render Pipeline · UI Toolkit · Cinemachine · Timeline · Input System
[Cover image: Gesture Interface]

Overview

Gesture Interface v3.0 is a comprehensive architectural overhaul that introduced event-driven state management, dependency injection, and a sequential startup pipeline. All mode handlers were refactored to use interface-driven composition through a ModeOrchestrator, and the entire codebase was reorganized under a unified namespace. This version represents the maturation from a functional application into a well-architected, maintainable, and extensible system.

Non-Technical Summary

Under the hood, this version was about making the system professional-grade. Previously, the different parts of the application talked to each other directly: the gesture detector would call the display handler, which would call the content loader. If any piece changed, everything connected to it had to change too.

In v3.0, the entire system was redesigned so that components communicate through events and interfaces rather than direct connections. Think of it like upgrading from a telephone tree (where each person calls the next) to a bulletin board system (where anyone can post updates and anyone can read them). This makes it much easier to add new features, fix bugs, or swap out components without breaking everything else.

Highlights

  • Architected an event-driven state machine (AppFlowBehaviour) with queued mode transitions, transition scoping, and SafeEvent invocation to eliminate race conditions and exception cascades across 3 runtime modes
  • Designed and implemented a dependency injection framework using [AutoRegister] and [Inject] attributes for scene-scoped service resolution
  • Built a sequential startup pipeline with 7 ordered initialization steps (service validation, environment checks, config loading, server health, content processing, and Kinect initialization) with graceful fallbacks at each stage
  • Refactored all mode handlers to interface-driven composition with ModeOrchestrator coordinating IAppCommands and IAppActivity for fully decoupled mode lifecycle management

Quick Highlights

  • Event-driven state machine with SafeEvent preventing exception cascades
  • Dependency injection via [AutoRegister] and [Inject] attributes
  • Sequential 7-step startup pipeline with graceful fallbacks
  • Interface-driven mode orchestration with IAppCommands/IAppActivity
  • Thread-safe async/await with UnityMainThread synchronization

Technical Breakdown

Event-Driven State Machine: AppFlowBehaviour implements IAppFlow and manages runtime mode transitions through an event-driven architecture. Mode change requests are queued during active transitions, preventing race conditions. TransitionScope tracks ownership of active transitions, and when a handler begins a transition, it acquires a scope that must be disposed before new transitions can proceed. SafeEvent wraps event invocation to catch and log exceptions from individual subscribers without breaking the event chain, preventing a single faulty handler from crashing the entire application. Events include ModeChanged, TransitionChanged, and InteractionOccurred.
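The SafeEvent idea can be sketched in isolation. This is a minimal illustration of the pattern, not the project's implementation; the Subscribe/Invoke surface shown here is an assumption:

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch: each subscriber is invoked inside its own try/catch,
// so one faulty handler cannot break the rest of the event chain.
public sealed class SafeEvent<T>
{
    private readonly List<Action<T>> _subscribers = new();

    public void Subscribe(Action<T> handler) => _subscribers.Add(handler);
    public void Unsubscribe(Action<T> handler) => _subscribers.Remove(handler);

    public void Invoke(T payload)
    {
        // Copy first so handlers may subscribe/unsubscribe during invocation.
        foreach (var handler in _subscribers.ToArray())
        {
            try { handler(payload); }
            catch (Exception ex)
            {
                // The real system would route this to Unity's logger.
                Console.Error.WriteLine($"SafeEvent subscriber threw: {ex.Message}");
            }
        }
    }
}
```

The key property is that an exception in one subscriber is logged and swallowed, so later subscribers still receive the payload.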

Dependency Injection: The ServiceRegistry provides scene-scoped dependency injection. Services are registered using [AutoRegister] attributes on MonoBehaviours, which are discovered at scene load. Consumer scripts use [Inject] attributes on fields to receive resolved services. The registry validates that all required services are registered during the startup pipeline, failing fast with clear error messages if dependencies are missing. This replaced the previous pattern of direct GetComponent and singleton references throughout the codebase.
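The attribute-driven resolution can be sketched with plain reflection. ServiceRegistry, [AutoRegister], and [Inject] are the project's names; everything else below (method signatures, the example service and consumer) is an illustrative assumption, and the real system performs registration over MonoBehaviours at scene load:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Class)]
public sealed class AutoRegisterAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Field)]
public sealed class InjectAttribute : Attribute { }

public sealed class ServiceRegistry
{
    private readonly Dictionary<Type, object> _services = new();

    public void Register(object service) => _services[service.GetType()] = service;

    // Resolve by exact type, or by any implemented interface; fail fast if missing.
    public object Resolve(Type type) =>
        _services.TryGetValue(type, out var s)
            ? s
            : _services.Values.FirstOrDefault(type.IsInstanceOfType)
              ?? throw new InvalidOperationException($"Missing service: {type.Name}");

    // Fill every [Inject]-marked field on a consumer with a resolved service.
    public void InjectInto(object consumer)
    {
        var fields = consumer.GetType()
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .Where(f => f.IsDefined(typeof(InjectAttribute)));
        foreach (var field in fields)
            field.SetValue(consumer, Resolve(field.FieldType));
    }
}

// Hypothetical service and consumer, for illustration only.
[AutoRegister] public sealed class ContentService { public string Name => "content"; }
public sealed class Consumer { [Inject] public ContentService Service; }
```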

Sequential Startup Pipeline: StartupProcessProvider defines a 7-step ordered initialization sequence:

  1. Validate Services: ensure all critical services are registered
  2. Validate Environment: check persistent storage paths, layer setup, and UI document availability
  3. Load Settings: async config initialization from the AppConfigAsset ScriptableObject
  4. Check Server: HEAD request to the content server for availability
  5. Process Content: load local content or trigger remote download based on server availability
  6. Kinect Init: initialize the Kinect sensor with a timeout fallback for environments without hardware
  7. Complete: transition to the initial runtime mode

Each step reports progress to the InitializationUIHandler.
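An ordered pipeline like this can be modeled as a list of named async steps with a critical/non-critical flag for the graceful fallbacks. StartupStep, StartupPipeline, and the delegate signatures below are illustrative assumptions; only StartupProcessProvider and InitializationUIHandler come from the project:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// A single startup step: a display name, an async body, and whether failure aborts startup.
public sealed record StartupStep(string Name, Func<Task<bool>> Run, bool Critical);

public sealed class StartupPipeline
{
    private readonly IReadOnlyList<StartupStep> _steps;

    // Reported before each step: step name, 1-based index, total count.
    public event Action<string, int, int> Progress;

    public StartupPipeline(IReadOnlyList<StartupStep> steps) => _steps = steps;

    public async Task<bool> RunAsync()
    {
        for (int i = 0; i < _steps.Count; i++)
        {
            Progress?.Invoke(_steps[i].Name, i + 1, _steps.Count);
            bool ok = await _steps[i].Run();
            // Non-critical steps (e.g. Kinect init without hardware) fall back gracefully.
            if (!ok && _steps[i].Critical) return false;
        }
        return true;
    }
}
```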

Config Registry: AppConfigProvider loads configuration from AppConfigAsset ScriptableObjects, supporting runtime reload via ConfigReloaded events. The AppConfig structure provides typed access to server settings (URL, credentials), content paths, body rig configuration, and idle timing parameters. Thread-safe async initialization ensures config is available before dependent systems start.
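The reload-event shape can be sketched as follows. AppConfig, AppConfigProvider, and ConfigReloaded are the project's names; the specific fields and the Load signature are assumptions for illustration:

```csharp
using System;

// Illustrative typed config; only the categories (server, content, idle timing)
// come from the project description.
public sealed class AppConfig
{
    public string ServerUrl;
    public string ContentPath;
    public float IdleTimeoutSeconds;
}

public sealed class AppConfigProvider
{
    public event Action<AppConfig> ConfigReloaded;
    public AppConfig Current { get; private set; }

    // In the real system this is async and sourced from an AppConfigAsset ScriptableObject.
    public void Load(AppConfig config)
    {
        Current = config;
        ConfigReloaded?.Invoke(config); // dependents re-read settings at runtime
    }
}
```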

Mode Orchestrator: ModeOrchestrator implements both IAppCommands (for requesting mode changes and marquee transitions) and IAppActivity (for reporting user interaction). This centralizes mode coordination: InputHandler calls IAppCommands.RequestMarquee(), GestureHandler subscribes to IGestureHandler events, and the orchestrator manages the lifecycle. This replaced the previous pattern where handlers directly referenced each other.
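The two-interface split can be sketched like this. IAppCommands, IAppActivity, RequestMarquee, and the three mode names come from the project; the remaining member signatures and the RuntimeMode enum are assumptions:

```csharp
using System;

public enum RuntimeMode { Gesture, Idle, Crowd }

// Commands: handlers request state changes without knowing who executes them.
public interface IAppCommands
{
    void RequestMode(RuntimeMode mode);
    void RequestMarquee(int direction); // e.g. -1 = left, +1 = right
}

// Activity: handlers report user interaction, e.g. to reset idle timers.
public interface IAppActivity
{
    void ReportInteraction();
}

// Minimal illustrative orchestrator implementing both roles.
public sealed class ModeOrchestratorSketch : IAppCommands, IAppActivity
{
    public RuntimeMode Current { get; private set; } = RuntimeMode.Idle;
    public DateTime LastInteraction { get; private set; }

    public void RequestMode(RuntimeMode mode) => Current = mode;
    public void RequestMarquee(int direction) { /* forwarded to the animator in the real system */ }
    public void ReportInteraction() => LastInteraction = DateTime.UtcNow;
}
```

Consumers depend only on the interface they need, which is what decouples the handlers from each other.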

Threading: UnityMainThread provides thread-safe async/await support, ensuring callbacks from async operations return to Unity's main thread. CancellationToken integration enables clean shutdown of long-running operations during mode transitions or application exit.
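The usual way to marshal work back to Unity's main thread is a concurrent queue drained from Update(). This sketch shows that mechanism in isolation; UnityMainThread is the project's name, while the class below and its members are assumptions:

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative main-thread dispatcher: background threads enqueue actions,
// and a MonoBehaviour's Update() drains the queue on the main thread.
public sealed class MainThreadDispatcher
{
    private readonly ConcurrentQueue<Action> _queue = new();

    // Safe to call from any thread.
    public void Enqueue(Action action) => _queue.Enqueue(action);

    // Called once per frame from the main thread in the real system.
    public void Drain()
    {
        while (_queue.TryDequeue(out var action)) action();
    }
}
```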

Systems Used

  • Event-Driven State Machine: AppFlowBehaviour with queued mode transitions, transition scoping, and SafeEvent invocation for decoupled runtime state management
  • Service Registry Dependency Injection: Scene-scoped service registration with [AutoRegister] and [Inject] attributes for loose coupling and testability
  • Sequential Startup Pipeline: Ordered startup process with validation, config loading, server checking, content processing, and Kinect initialization steps
  • Config Registry System: ScriptableObject-based configuration with runtime reload support, async initialization, and ConfigReloaded events
  • Content Service Architecture: Modular content pipeline with ContentParser, ContentMaterializer, ContentCarousel, and PersistentContentStorage components
  • Body Rig Priority System: Multi-body skeleton tracking with priority manager, zone interactors, interaction gates, and boundary policies
  • Marquee Animation Engine: Direction-aware slide transitions with PanelPair double-buffering, MarqueeAnimator coroutines, and transition scope locking
  • Mode Orchestrator Pattern: ModeOrchestrator implementing IAppCommands and IAppActivity interfaces for centralized mode handler coordination

Deep Dive

The v3.0 architectural overhaul was driven by necessity: the codebase had grown complex enough that tight coupling caused cascading breakages during routine modifications.

Event System Design: The decision to use an event-driven architecture over a traditional state machine pattern was motivated by the need for multiple independent systems to react to state changes without knowing about each other. AppFlowBehaviour publishes ModeChanged events, and any number of subscribers, including UI handlers, body rig system, and content service, can respond independently. The SafeEvent wrapper was introduced after encountering a production issue where an exception in one subscriber's handler prevented all subsequent subscribers from receiving the event, effectively breaking the entire application from a single error.

Dependency Injection Approach: Rather than adopting a heavyweight DI framework like Zenject, a lightweight custom ServiceRegistry was built specifically for Unity's MonoBehaviour lifecycle. The [AutoRegister] attribute automatically discovers and registers services during scene initialization, while [Inject] resolves dependencies before Start() is called. This approach was simpler to debug than reflection-heavy frameworks and had minimal performance overhead. The registry supports scene-scoped lifetimes, which aligns with Unity's scene-based architecture.

Startup Pipeline Architecture: The sequential startup pipeline solved a recurring problem: initialization order dependencies. Previously, scripts relied on Unity's non-deterministic Awake()/Start() ordering, leading to intermittent null reference errors when services initialized in the wrong order. StartupProcessProvider makes the order explicit and adds validation at each step. The fallback design (skip Kinect if hardware is absent, use local content if server is unreachable) was critical for development environments where not all infrastructure is available.

Refactoring Strategy: The refactoring was executed in small, incremental commits rather than a single large rewrite. Each commit introduced one architectural change (e.g., "Implement EventSystem framework", "Add ServiceRegistry system", "Refactor BaseHandler to use [Inject]"), making it possible to bisect regressions. The body rig system, content service, and mode handlers were each refactored independently to use the new patterns, validating the approach at each step.


v1.0 — Original Kinect Prototype

v1.0 Highlights

  • Built an interactive gesture-recognition prototype integrating Microsoft Kinect v2 with Unity for a museum-style content display, enabling touchless navigation through physical hand swipes
  • Engineered real-time skeleton tracking with collider-based gesture detection, translating 3D hand positions into directional content transitions
  • Led a multi-contributor integration effort, combining work from 4+ team members into a cohesive swipe-based content browsing experience
  • Implemented dynamic image loading from local storage with fade-in/fade-out transitions and animation-gated swipe availability to prevent input conflicts

v1.0 Overview

Gesture Interface v1.0 was the initial prototype for a touchless, gesture-driven content display system built for the SCiL (Salisbury Center for Interactive Learning) lab. The application used a Microsoft Kinect v2 sensor to track user hand positions in 3D space, detecting left and right swipe motions through Unity trigger colliders. When a swipe was detected, the display transitioned between content images with fade animations. The prototype established the core interaction model (stand in front of a screen, wave your hand, and browse content) that would carry through all future versions.

v1.0 Technical Breakdown

Kinect Integration: The Kinect v2 SDK was imported directly into Unity with native Windows plugins. A KinectManager script polled the sensor for body frames, extracting joint positions for tracked skeletons. Hand joint positions were converted from Kinect's coordinate space to Unity world space for collision detection.

Gesture Detection: Left and right swipe gestures were detected using Unity trigger colliders positioned in 3D space. When a tracked hand entered a trigger zone, it registered a swipe event. The collider positions were iteratively tuned across multiple commits to achieve reliable detection without false positives. Animation state gating prevented swipe events from firing during active transitions.
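The animation-gating logic can be isolated from the Unity collider types. SwipeGate and its members are hypothetical names sketching the gate described above (in the prototype, OnHandEnteredZone corresponds to a hand entering a trigger collider):

```csharp
using System;

// Illustrative swipe gate: swipe events fire only when no transition is active,
// mirroring the prototype's animation-gated swipe availability.
public sealed class SwipeGate
{
    public bool Transitioning { get; set; }

    public event Action<int> SwipeDetected; // -1 = left, +1 = right

    // Called when a tracked hand enters a left/right trigger zone.
    public void OnHandEnteredZone(int direction)
    {
        if (!Transitioning) SwipeDetected?.Invoke(direction);
    }
}
```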

Content Display: Images were loaded from a desktop folder at runtime. A ContentData structure held image references, and a DisplayHandler managed Canvas-based UI elements with CanvasGroup fade coroutines for smooth transitions between content items.

State Management: A simple RuntimeStatus enum tracked the application state. Swipe input was only accepted when the status was set to Gesture, preventing interactions during initialization or transitions.


v2.0 — Display System & Body Rig Architecture

v2.0 Highlights

  • Architected and built a complete body rig skeleton system with runtime rig generation, multi-body tracking, joint smoothing, and priority-based hand detection from Kinect v2 data
  • Designed a multi-mode content display supporting Gesture, Idle, and Crowd viewing modes with transition queuing, scope-based locking, and cross-dissolve animations
  • Migrated the entire UI layer from Canvas to Unity UI Toolkit, implementing marquee slide animations with double-buffered panel pairs for seamless content transitions
  • Built a content loading pipeline with JSON metadata parsing, async texture materialization, WebDAV-based remote downloads with retry logic, and fingerprint-based change detection
  • Engineered multi-display support enabling the application to drive multiple physical screens from a single Unity instance

v2.0 Overview

Gesture Interface v2.0 was a ground-up rebuild that transformed the prototype into a production-ready interactive display system. The project was re-initialized with Git LFS, migrated from Canvas UI to Unity UI Toolkit, and introduced a sophisticated body rig skeleton visualization system. Three distinct viewing modes (Gesture, Idle, and Crowd) allowed the system to adapt its presentation based on user engagement: Gesture for interactive paired content, Idle for automatic slideshow, and Crowd for fullscreen rotation. A complete content pipeline supported both local and remote content with async loading, and the Kinect integration was expanded with priority-based multi-body tracking and zone-based interaction detection.

v2.0 Technical Breakdown

Body Rig System: The BodyRigSystemBehaviour orchestrates multi-body skeleton tracking. It receives tracked body data from KinectMediator, which bridges the Windows.Kinect SDK to the application through a KinectBodyToTrackedBodyAdapter. For each tracked body, a runtime skeleton rig is created using BodyRigRuntimeRigBuilder, which generates a hierarchy of joint GameObjects connected by SkeletonRenderer-drawn bones. A BoneMap maps Kinect joint types to visual bone segments. The BodyRigPriorityManager determines which body has "focus" based on highest hand position, and BodyRigZoneInteractor detects hand entry/exit from trigger zones. BodyRigSystemPolicy enforces interaction boundaries, while BodyRigInteractionGate guards input acceptance based on current application state.

Content Pipeline: Content follows a multi-stage pipeline. ContentParser reads JSON metadata files describing paired left/right content items with titles, captions, and image paths. ContentMaterializer handles async image loading and Unity texture creation. PersistentContentStorage manages local file enumeration, and ContentCarousel provides navigation control for cycling through content pairs. RemoteDataServiceBehaviour supports WebDAV-based file downloads with retry logic, exponential backoff, and manifest-based change detection.
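The retry-with-exponential-backoff behavior can be sketched independently of WebDAV. RetryPolicy, its method, and its parameters are illustrative assumptions; only the backoff technique itself comes from the project description:

```csharp
using System;
using System.Threading.Tasks;

public static class RetryPolicy
{
    // Illustrative: retries a download attempt, doubling the delay after each
    // failure (baseDelay, 2*baseDelay, 4*baseDelay, ...).
    public static async Task<bool> DownloadWithRetry(
        Func<Task<bool>> attempt, int maxRetries, TimeSpan baseDelay)
    {
        for (int i = 0; i <= maxRetries; i++)
        {
            if (await attempt()) return true;
            if (i < maxRetries)
            {
                var delay = TimeSpan.FromMilliseconds(
                    baseDelay.TotalMilliseconds * Math.Pow(2, i));
                await Task.Delay(delay);
            }
        }
        return false;
    }
}
```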

Mode Handlers: Three mode handlers extend a common ModeHandlerBase. GestureHandler displays paired left/right content and drives marquee transitions via MarqueeAnimator. IdleHandler triggers automatic marquee transitions at configurable intervals when no user input is detected. CrowdHandler shows fullscreen content panels with fade-in/fade-out cross-dissolve animations, selecting 3 random non-repeating images per cycle.

UI System: The UI layer uses Unity UI Toolkit with three document instances (Left, Right, Middle) managed by UIToolkitHandler. GestureUIView implements a PanelPair pattern with "current" and "buffer" visual elements for smooth transitions. MarqueeAnimator slides content in from off-screen based on swipe direction. PulsingText provides configurable animated text for idle prompts.
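The PanelPair double-buffering reduces to a current/buffer swap: the buffer element is populated off-screen, slid in, and then the roles exchange. PanelPair is the project's name; the generic shape and Swap method below are assumptions:

```csharp
// Illustrative double-buffer pair: "Buffer" is prepared off-screen, then
// swapped with "Current" once the slide animation completes.
public sealed class PanelPair<T>
{
    public T Current { get; private set; }
    public T Buffer { get; private set; }

    public PanelPair(T current, T buffer)
    {
        Current = current;
        Buffer = buffer;
    }

    public void Swap() => (Current, Buffer) = (Buffer, Current);
}
```

Because only references swap, the just-hidden panel is immediately reusable as the next transition's staging element.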

Input System: InputHandler subscribes to Unity's new Input System for keyboard and gesture input, generating swipe events (left/right) and mode hotkeys. Input is throttled during transitions to prevent event stacking.


v3.0 — Modular Architecture & BelmontHomestead Packages

This is the current version; its highlights, overview, and technical breakdown are documented at the top of this page.