PersonAI
Voice-enabled AI historical personas with RAG-grounded responses, ElevenLabs TTS, and dynamic persona configuration
Overview
PersonAI is a Unity application developed for the Shenandoah Center for Immersive Learning that enables users to have voice-enabled, grounded conversations with AI-driven historical personas, initially America's Founding Fathers, in a 3D Constitutional Convention environment. The application uses Google Vertex AI (Gemini) as its LLM backbone with a custom dual-path RAG pipeline that retrieves relevant documents from Discovery Engine datastores and injects them into the conversation context, producing cited and factually grounded responses.
Users can interact via text or voice (Google Cloud Speech-to-Text), and persona responses are spoken aloud through ElevenLabs streaming text-to-speech with configurable pacing. The system supports dynamic persona configuration hosted on Google Cloud Storage, allowing new personas and knowledge bases to be authored and published through a custom Unity Editor toolset without code changes.
Role Summary
- Designed the full application architecture across all four major versions, making every technical decision from AI backend selection to namespace structure and dependency management
- Built the entire custom Vertex AI integration from scratch: VertexChatService, VertexChatRequestBuilder, VertexChatResponseParser, VertexPathBuilder, and the dual-path RAG pipeline with DiscoveryEngineSearchClient
- Implemented the full authentication stack: OAuth 2.0 PKCE loopback flow, service account JWT bearer flow with custom DER/PKCS#8 RSA parser, identity management with factory pattern, and EditorPrefs-based token persistence
- Built the streaming voice engine (ElevenLabsStreamingVoiceEngine) with thread-safe queue-based playback and crossfade, and the voice-to-text pipeline with Google Cloud STT
- Designed and implemented the GCS-based persona configuration system with multi-level caching, the Editor Persona Browser with authoring services and GCS publishing, and the startup pipeline orchestration
- Performed the complete namespace refactor (108 using statements, 95 namespace declarations) and codebase cleanup across all 156 source files
Non-Technical Summary
PersonAI is an interactive educational application built for Shenandoah University that lets students and visitors have natural conversations with AI versions of historical figures, starting with America's Founding Fathers like Benjamin Franklin and George Washington. Users can type questions or speak them aloud, and the AI persona responds with historically grounded answers delivered through a realistic synthesized voice. The experience takes place in a recreated 3D Independence Hall environment.
Every response is backed by real source documents. The system searches through curated knowledge bases to find relevant historical information, weaves that context into the conversation, and provides citations so users can verify what they are told. The AI draws from actual historical records and documents, then identifies exactly where it found that information.
The application is designed so that new personas and knowledge bases can be added without touching any code. University staff can author new characters, upload their knowledge documents, and configure personality settings through a built-in editing tool. The system handles everything from securely authenticating with cloud services to managing which persona is speaking and what voice they use.
Highlights
- Architected a 156-file Unity application for real-time AI conversations with historical personas, evolving across four major versions from Inworld AI SDK to a custom Vertex AI integration with dual-path RAG, streaming TTS, and voice input.
- Designed a dual-path RAG pipeline querying Discovery Engine datastores for deterministic grounded responses with citation extraction, falling back to model-driven retrieval tools when primary retrieval returns no results.
- Built a multi-mode authentication system supporting OAuth 2.0 PKCE and service account JWT bearer flows, including a custom DER/PKCS#8 RSA key parser needed because the standard .NET key-import APIs are unavailable in Unity's runtime.
- Engineered a streaming voice engine with ElevenLabs SDK using thread-safe queue-based playback, crossfade de-clicking, and pitch-based pacing, paired with Google Cloud Speech-to-Text for bidirectional voice interaction.
Quick Highlights
- Dual-path RAG architecture: deterministic Discovery Engine retrieval with injected context as the primary path, model-driven grounding tools as fallback, ensuring cited, factually grounded responses
- Full voice loop: Google Cloud Speech-to-Text for input, ElevenLabs streaming TTS for output, with crossfade de-clicking and speech pacing control
- Dynamic persona system: GCS-hosted configuration (collection.json, persona.json, system.md, profile.png) with LRU caching, TTL expiration, and a dedicated Editor authoring workspace with optimistic concurrency via GCS generation matching
- Secure multi-mode authentication: OAuth 2.0 PKCE for interactive users, service account JWT with custom RSA/DER key parsing for server-to-server, with no hardcoded secrets
- Four-version evolution from Inworld AI SDK (2023) through a complete architectural rewrite to custom Vertex AI integration with RAG (2026), with an experimental OpenAI branch
Technical Breakdown
Startup and Application Lifecycle
The application boots through a declarative sequential pipeline orchestrated by StartupPipelineProvider, which extends SequentialPipelineProviderBehaviour from the sequential pipeline package. The pipeline executes five ordered steps: ValidateBootstrapReferencesStep (ensures scene references are intact), InitializeConfigStep (loads environment-aware settings via AppConfigProvider), InjectSecretsStep (loads secrets.json via the secrets provider and validates against SecretsManifest), ValidateConfigStep (runs ConfigValidator checks on URLs, scene names, and parameter ranges), and ValidateServicesStep (confirms required services are registered in the ServiceLocator). StartupRoot is the singleton MonoBehaviour with DefaultExecutionOrder(-2400) that ensures DontDestroyOnLoad persistence. On pipeline success, the UI shell scene is loaded asynchronously via AppSceneFlowService.
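The five ordered steps amount to a fail-fast sequential pipeline: each step either succeeds or aborts the boot with a named failure. A minimal sketch of that control flow (in Python for illustration, with the steps stubbed out; the actual implementation is C# MonoBehaviour steps, and all names here are stand-ins):

```python
from typing import Callable, Optional

class PipelineResult:
    """Outcome of a pipeline run: success, or the name of the step that failed."""
    def __init__(self, ok: bool, failed_step: Optional[str] = None):
        self.ok = ok
        self.failed_step = failed_step

def run_pipeline(steps: list[tuple[str, Callable[[], bool]]]) -> PipelineResult:
    """Run steps in declared order; abort on the first failure (fail-fast)."""
    for name, step in steps:
        if not step():
            return PipelineResult(False, failed_step=name)
    return PipelineResult(True)

# Ordered steps mirroring the five described above (stubbed for illustration).
steps = [
    ("ValidateBootstrapReferences", lambda: True),
    ("InitializeConfig",            lambda: True),
    ("InjectSecrets",               lambda: True),
    ("ValidateConfig",              lambda: True),
    ("ValidateServices",            lambda: True),
]
```

On success the caller proceeds to load the shell scene; on failure it has the offending step's name for diagnostics.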
Persona Configuration and Management
The persona system is domain-driven with three layers: Domain (PersonaEngine, PersonaCollection, PersonaDefinition, CollectionDefinition), Infrastructure (GcsPersonaConfigProvider, BackendPersonaProvider, GcsPersonaPublishClient, DiscoveryEngineSearchClient), and Config (PersonaConfigurationService, PersonaConfig, PersonaIndexModels). PersonaEngine is the central orchestrator implementing IPersonaEngine, managing collection loading, persona selection, and configuration persistence with event-driven notifications (OnCollectionChanged, OnPersonaChanged, IndexLoadCompleted).
GcsPersonaConfigProvider implements four provider interfaces and loads hierarchical GCS objects (config.json, personas/index.json, collection.json, persona.json, system.md, profile.png) with multi-level caching: a 5-minute TTL for configs and LRU eviction for both profile textures (max 32) and persona configs (max 32). Datastore collection IDs and knowledge store IDs are derived from collection/persona identifiers or overridden via CollectionDefinition fields.
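The caching policy described above combines a freshness check (TTL) with a capacity bound (LRU eviction). A compact sketch of one such cache (Python for illustration; the real provider is C# and keeps separate caches for configs and textures, so the class and parameter names here are hypothetical):

```python
import time
from collections import OrderedDict

class TtlLruCache:
    """Illustrative cache combining a time-to-live with LRU eviction."""

    def __init__(self, max_entries=32, ttl_seconds=300, clock=time.monotonic):
        self._data = OrderedDict()   # key -> (value, stored_at), oldest first
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock          # injectable for testing

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at > self._ttl:   # expired: drop and miss
            del self._data[key]
            return None
        self._data.move_to_end(key)                 # mark most recently used
        return value

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (value, self._clock())
        if len(self._data) > self._max:             # evict least recently used
            self._data.popitem(last=False)
```

With `max_entries=32` and `ttl_seconds=300` this mirrors the 32-entry / 5-minute policy described above.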
Vertex AI Chat Service and RAG Pipeline
VertexChatService implements IChatService with a dual-path strategy. The primary RAG path calls DiscoveryEngineSearchClient.SearchMultipleAsync() to query Discovery Engine datastores in parallel, receiving RetrievedChunk objects (title, content, sourceUri, documentId). These chunks are formatted by VertexChatRequestBuilder.BuildWithContext() and injected directly into the Gemini system instruction for deterministic grounded generation.
If the RAG path fails or returns no chunks, the fallback retrieval-tool path fires. VertexChatRequestBuilder.Build() constructs a request with vertexAiSearch retrieval tools attached, letting the model decide when to ground its response. VertexChatResponseParser.ParseGeminiResponse() returns a ParseResult struct containing text, citations, and grounding diagnostics. The parser supports six citation extraction strategies: groundingSupports, legacy citationMetadata, webSearchSources, retrievalSources, inline citations, and chunk-to-citation conversion, with deduplication based on URL/title/snippet combination.
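Because six extraction strategies can surface the same source more than once, deduplication keys on the URL/title/snippet combination. A sketch of that step (Python for illustration; plain dicts stand in for the C# ParsedCitation objects):

```python
def dedupe_citations(citations):
    """Collapse citations sharing the same (url, title, snippet) composite key,
    preserving first-seen order across all extraction strategies."""
    seen = set()
    unique = []
    for c in citations:
        key = (c.get("url", ""), c.get("title", ""), c.get("snippet", ""))
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique
```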
Authentication
IdentityService is the central identity manager supporting two modes via IdentityProviderFactory. The OAuth path creates a GoogleOAuthPkceFlow with loopback redirect, EditorPrefsTokenStore for persistence, token inspector, and revoker, supporting scopes for openid, email, profile, and cloud-platform. The service account path uses GoogleServiceAccountAuthProvider which implements OAuth 2.0 JWT bearer flow: it parses PEM-encoded RSA private keys using a custom DerReader class that handles PKCS#8 unwrapping to PKCS#1 (necessary because Unity's runtime lacks ImportPkcs8PrivateKey), signs JWTs with RSA-SHA256, and exchanges them for access tokens with caching and expiry management.
Voice Synthesis
VoiceEngine is the abstract base class implementing a template method pattern with SpeakRoutine() as the extension point. ElevenLabsStreamingVoiceEngine extends this with real-time streaming via the ElevenLabs SDK. It uses a dual-coroutine architecture: one for streaming audio chunks into a thread-safe queue (with locks), another for sequential playback with configurable crossfade de-clicking (FadeVolume()). Speech speed is controlled via AudioSource pitch adjustment. Buffer management enforces maxQueuedClips and maxBufferedSeconds limits.
Voice-to-Text
VoiceToTextService delegates to GoogleCloudVoiceToTextService, which records via Unity's Microphone API with clamped sample rates, converts multi-channel float samples to mono PCM16 bytes (FloatToPcm16Mono()), and posts base64-encoded audio to Google Cloud Speech-to-Text v1 with automatic punctuation enabled. Authentication uses the identity service's OAuth access token.
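The downmix-and-encode step is straightforward: average the interleaved channels into one sample, clamp to the valid range, and emit each sample as little-endian signed 16-bit PCM. A sketch of that conversion (Python for illustration; the actual FloatToPcm16Mono() is C# operating on Unity sample buffers):

```python
def float_to_pcm16_mono(samples, channels):
    """Downmix interleaved float samples (-1..1) to mono and encode as
    little-endian signed 16-bit PCM bytes."""
    out = bytearray()
    for i in range(0, len(samples), channels):
        frame = samples[i:i + channels]
        mono = sum(frame) / len(frame)       # average channels to mono
        mono = max(-1.0, min(1.0, mono))     # clamp to valid range
        value = int(mono * 32767)            # scale to 16-bit
        out += value.to_bytes(2, "little", signed=True)
    return bytes(out)
```

The resulting byte buffer is what gets base64-encoded into the Speech-to-Text request body.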
Networking
UnityWebRequestHttpClient is a sealed singleton implementing IHttpClient with exponential backoff (250 ms base, 10 s cap, full jitter), respect for Retry-After headers (both delta-seconds and HTTP-date formats), idempotent-method detection, transient-error detection, and a status-code retry policy (429 and 500-599).
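The retry math is worth spelling out. "Full jitter" means each delay is drawn uniformly from zero up to the exponential ceiling, which spreads retries out instead of synchronizing them. A sketch of the policy with the parameters quoted above (Python for illustration; function names are hypothetical, not the C# API):

```python
import random

BASE_DELAY = 0.25   # 250 ms base
MAX_DELAY = 10.0    # 10 s cap
RETRYABLE = {429} | set(range(500, 600))

def is_retryable(status_code: int) -> bool:
    """Status-code retry policy: 429 plus the 5xx range."""
    return status_code in RETRYABLE

def backoff_delay(attempt: int, rng=random.random) -> float:
    """Full-jitter exponential backoff:
    uniform in [0, min(cap, base * 2^attempt)]."""
    ceiling = min(MAX_DELAY, BASE_DELAY * (2 ** attempt))
    return rng() * ceiling
```

A Retry-After header, when present, would override the computed delay entirely.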
UI System
The UI is built on Unity UI Toolkit with UXML documents and USS stylesheets. MenuController manages the shell with three submenus (MainMenuController, PersonaSelectionController, ChatMenuController) inheriting from MenuControllerBase with lifecycle hooks (OnBind, OnShow, OnHide). The chat subsystem is decomposed into ChatTranscriptBuilder, ChatTypewriterEffect, ChatSourcesRenderer, and ChatVoiceInputHandler. StyleKit provides dark/light theme support with design token overrides. Platform-specific USS files handle sizing for macOS and iOS.
Editor Tooling
The Editor Persona Browser provides a full authoring workspace with adapter-pattern data sources (EditModePersonaBrowserDataSource / RuntimePersonaBrowserDataSource) and six authoring services that read/write GCS JSON files with optimistic concurrency via generation matching. PersonaEditModeContext bootstraps config and identity for editor workflows. PersonaPublishPolicy validates publish readiness.
Error Handling and Utilities
The codebase uses a Result<T> monad throughout with AppError (code, message, statusCode, retryable flag, user-friendly messages). ErrorBoundary provides try/catch wrappers for resilient components. AsyncRunner runs tasks with automatic exception logging. UnityMainThread dispatches to the main thread via SynchronizationContext with a fallback queue.
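The Result pattern lets success and failure flow through the same return path without exceptions, composing via map/bind. A minimal sketch of the shape (Python for illustration; the actual Result<T>/AppError types are C# and carry richer error metadata):

```python
class Result:
    """Minimal Result sketch: holds either a value or an error,
    and short-circuits chained operations on failure."""

    def __init__(self, value=None, error=None):
        self.value, self.error = value, error

    @classmethod
    def ok(cls, value):
        return cls(value=value)

    @classmethod
    def fail(cls, error):
        return cls(error=error)

    @property
    def is_ok(self):
        return self.error is None

    def map(self, fn):
        """Transform the value; pass failures through untouched."""
        return Result.ok(fn(self.value)) if self.is_ok else self

    def bind(self, fn):
        """Chain an operation that itself returns a Result."""
        return fn(self.value) if self.is_ok else self
```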
Systems Used
- Dual-Path RAG Pipeline - Primary Discovery Engine retrieval with context injection, falling back to model-driven grounding tools
- Discovery Engine Search Client - Parallel Vertex AI Search datastore queries returning retrieved chunks with source metadata
- Vertex Chat Request Builder - Dual-path request construction with context injection and retrieval-tool fallback
- Vertex Chat Response Parser - Multi-format citation extraction with deduplication across six response schema variants
- Persona Engine - Collection and persona lifecycle management with event-driven architecture
- Editor Persona Browser - Authoring workspace with GCS publishing and optimistic concurrency via generation matching
Impact & Results
- Delivered a production-ready application for Shenandoah University's educational programs, enabling students to interact with AI-driven historical personas in an immersive 3D environment
- Evolved the codebase through four major architectural versions over 2.5 years (September 2023 to March 2026), progressing from a third-party SDK dependency to a fully custom AI integration
- Built 156 C# source files (130 runtime, 26 editor) organized across a clean domain-driven namespace hierarchy with seven subsystems
- Implemented citation-grounded AI responses via RAG pipeline, ensuring educational accuracy with verifiable sources, supporting six different citation metadata formats across Vertex AI response schemas
Deep Dive
Dual-Path RAG Architecture
The most architecturally significant subsystem is the dual-path RAG pipeline in VertexChatService. When a user sends a message, the service first evaluates whether the active persona has a configured datastore. If it does, the primary RAG path executes: DiscoveryEngineSearchClient.SearchMultipleAsync() fires parallel HTTP requests to one or more Discovery Engine datastores, each returning RetrievedChunk objects with title, content, sourceUri, and documentId. The client uses a minimal request body (omitting contentSearchSpec) to avoid enterprise-tier API requirements, and supports both standard and enterprise edition response formats with priority-based content extraction (snippets, extractive answers, then text fields).
The retrieved chunks are formatted by VertexChatRequestBuilder.BuildWithContext() into a structured context block injected into the Gemini system instruction. This approach is deterministic: the model always sees the retrieved context and must ground its response in it. The request includes full conversation history (multi-turn ChatMessage objects), generation configuration (temperature, topP, topK, maxOutputTokens), and safety settings mapped from PersonaDefinition.
If the RAG path fails (no chunks returned, datastore misconfiguration, or HTTP errors), the fallback retrieval-tool path fires. VertexChatRequestBuilder.Build() constructs a request with vertexAiSearch retrieval tools attached, allowing Gemini to autonomously decide when to query the datastore. VertexChatResponseParser.ParseGeminiResponse() handles both paths uniformly, returning a ParseResult struct with extracted text, a list of ParsedCitation objects (title, url, snippet, confidenceScore), and grounding diagnostics. The parser navigates six different citation metadata formats and deduplicates citations based on a URL+title+snippet composite key.
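Stripped of the HTTP plumbing, the dual-path decision is a small piece of control flow: attempt deterministic retrieval, and on an empty result or an error, fall back to the retrieval-tool request. A sketch with the collaborators passed in as plain functions (Python for illustration; all parameter names are stand-ins for the C# services described above):

```python
def answer(query, search_client, build_with_context, build_with_tools, generate):
    """Dual-path RAG sketch: deterministic retrieval first; fall back to a
    retrieval-tool request when no chunks come back or search fails."""
    try:
        chunks = search_client(query)
    except Exception:
        chunks = []                                  # treat errors as a miss
    if chunks:
        request = build_with_context(query, chunks)  # primary: inject context
    else:
        request = build_with_tools(query)            # fallback: model-driven grounding
    return generate(request)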
Custom RSA Key Parsing for Unity Runtime
Unity's runtime lacks the RSA.ImportPkcs8PrivateKey() method available in modern .NET. GoogleServiceAccountAuthProvider solves this with a custom DerReader class that manually parses ASN.1 DER-encoded binary structures. The parser reads PEM-encoded private keys, strips headers/footers, base64-decodes to DER bytes, navigates the PKCS#8 PrivateKeyInfo envelope (SEQUENCE, version INTEGER, AlgorithmIdentifier, OCTET STRING), and extracts the inner PKCS#1 RSAPrivateKey structure. This is then used with RSACryptoServiceProvider.ImportParameters() to construct the RSA key for JWT signing. The implementation handles both wrapped (PKCS#8) and unwrapped (PKCS#1) key formats.
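The core of any DER reader is walking tag-length-value (TLV) triples, including the long-form length encoding used for large structures like RSA moduli. A sketch of that primitive (Python for illustration; the actual DerReader is C# and layers PKCS#8/PKCS#1 navigation on top of this):

```python
def read_tlv(der: bytes, offset: int):
    """Read one ASN.1 DER tag-length-value at `offset`.
    Returns (tag, value_bytes, next_offset). Handles both short-form
    lengths (< 128) and long-form lengths (high bit set, next N bytes
    hold the length) -- enough to walk a PKCS#8 envelope."""
    tag = der[offset]
    length = der[offset + 1]
    offset += 2
    if length & 0x80:                     # long form: low 7 bits = byte count
        n = length & 0x7F
        length = int.from_bytes(der[offset:offset + n], "big")
        offset += n
    return tag, der[offset:offset + length], offset + length

# Example: SEQUENCE { INTEGER 5 } encodes as 30 03 02 01 05
```

Repeated calls to a primitive like this descend through SEQUENCE, INTEGER, and OCTET STRING nodes until the inner RSAPrivateKey fields are reached.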
GCS Persona Configuration with Editor Authoring
The persona configuration system uses a hierarchical GCS object structure. At the top level, config.json holds global settings (datastoreLocation, fallback model, safety defaults). Below that, personas/index.json lists all available collections and personas. Each collection has a collection.json with display metadata and optional datastore overrides. Each persona within a collection has persona.json (generation settings, safety settings, grounding config, speech pacing) and system.md (the full system prompt), plus an optional profile.png.
GcsPersonaConfigProvider implements four provider interfaces and loads these objects with a two-tier caching strategy: a dictionary-based cache with 5-minute TTL for config objects, and separate LRU caches (max 32 entries each) for profile textures and persona configs.
The Editor Persona Browser provides a complete authoring workflow. Six authoring services each wrap GcsPersonaPublishClient for reading and writing GCS objects. All write operations use optimistic concurrency via GCS ifGenerationMatch headers. The editor tracks the generation number of each loaded document and includes it in update requests, ensuring no silent overwrites if another editor session has modified the same object.
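The ifGenerationMatch contract is what makes concurrent editing safe: a write only succeeds if the object's generation has not moved since the caller read it. A toy model of those semantics (Python for illustration; FakeGcsObject is a hypothetical stand-in, not the GCS API):

```python
class ConflictError(Exception):
    """Raised when a write's expected generation is stale."""

class FakeGcsObject:
    """Toy stand-in enforcing ifGenerationMatch semantics: an update
    succeeds only when the caller's generation matches the stored one,
    and each successful write bumps the generation."""

    def __init__(self, data, generation=1):
        self.data, self.generation = data, generation

    def read(self):
        return self.data, self.generation

    def write(self, data, if_generation_match):
        if if_generation_match != self.generation:
            raise ConflictError("generation mismatch; reload and retry")
        self.data = data
        self.generation += 1
        return self.generation
```

An editor session that loses the race gets a conflict instead of silently clobbering the other session's changes, and can reload and retry.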
Streaming Voice Engine Internals
ElevenLabsStreamingVoiceEngine manages real-time audio streaming through a dual-coroutine architecture. The first coroutine (SpeakRoutine) calls the ElevenLabs SDK's streaming API, receiving VoiceClip objects asynchronously. Each clip's internal AudioClip is extracted via reflection (the SDK does not expose it publicly) and enqueued into a thread-safe Queue protected by a lock object. The second coroutine (PlayQueueSequentially) polls the queue and plays clips in order through a Unity AudioSource.
Between clips, FadeVolume() performs smooth volume crossfades over a configurable duration to eliminate audible clicks at clip boundaries. Speech pacing is achieved by adjusting the AudioSource.pitch property based on the persona's configured speech speed multiplier. Buffer management enforces limits on both the number of queued clips and total buffered audio duration to prevent memory pressure during long responses. When synthesis is canceled, both coroutines are stopped, the queue is drained, and all allocated AudioClip objects are explicitly destroyed to prevent memory leaks.
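The producer/consumer split above can be modeled with a bounded queue: the streaming side enqueues chunks (blocking when the buffer is full, which is the backpressure that caps memory), while the playback side drains them in order. A thread-based sketch (Python for illustration; the actual engine uses Unity coroutines and a lock-protected Queue, and "playing" here is just collecting):

```python
import threading, queue

def stream_and_play(chunks, max_queued=4):
    """Producer/consumer sketch: one thread enqueues audio chunks
    (standing in for the streaming coroutine), the main loop dequeues
    and 'plays' them sequentially. Queue size bounds buffering."""
    q = queue.Queue(maxsize=max_queued)
    SENTINEL = object()

    def producer():
        for chunk in chunks:
            q.put(chunk)           # blocks when the buffer is full (backpressure)
        q.put(SENTINEL)            # signal end of stream

    threading.Thread(target=producer, daemon=True).start()

    played = []
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        played.append(item)        # play in arrival order
    return played
```

Cancellation in the real engine adds a third concern on top of this shape: stopping both sides, draining the queue, and destroying the clips.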
v1.0 — Inworld AI Native
v1.0 Highlights
- Built a 3D conversational AI prototype for Shenandoah University's Constitutional Convention exhibit using Inworld AI SDK, importing custom Founding Fathers avatars into a recreated Independence Hall environment
- Developed a Canvas-based chat UI with message bubble system, delegate portrait display, and fade-in/fade-out transitions managed by NewSimulationHandler
- Integrated Cinemachine for dynamic camera management across multiple avatar characters, with PlayerHandler coordinating avatar selection, screen UI, and camera targets
v1.0 Overview
PersonAI v1.0 was the initial prototype built for Shenandoah University's Constitutional Convention 2023 event. The application used the Inworld AI SDK to power real-time conversational AI characters, Benjamin Franklin and George Washington, placed in a 3D recreation of Independence Hall. Users navigated the environment and interacted with the Founding Fathers through a Canvas-based chat UI, with Inworld handling all natural language understanding, response generation, and character voice. The project established the core concept of conversational historical personas in an immersive 3D environment.
v1.0 Technical Breakdown
Architecture
The v1.0 architecture was built around the Inworld AI SDK, which provided the full conversational AI stack (NLU, response generation, voice synthesis, and animation). PlayerHandler was the central MonoBehaviour managing avatar selection, screen UI state, and Cinemachine camera targets. NewSimulationHandler managed UI transitions with CanvasGroup-based fade-in/fade-out effects and customizable UI layouts.
Chat Interface
The chat interface used Unity's Canvas system with TextMesh Pro text prefabs for message rendering, arranged in a scrollable log. Audio was managed by SimpleAudioManager for ambient and interaction sounds.
Environment
The 3D environment consisted of the Constitutional Convention workspace scene with baked lighting, imported delegate assets (portraits and 3D models), and custom attire materials for Shenandoah University branding. The project used Unity 2022.3.17f1 with Newtonsoft JSON imported to satisfy Inworld SDK dependencies.
v2.0 — Inworld 3.0 Migration
v2.0 Highlights
- Migrated the application to Inworld AI SDK v3.0, removing all Innequin Samples dependencies and rebuilding character interaction scripts from scratch for full control over the conversation pipeline
- Integrated the Inworld Starter Kit UI framework and built custom IntroductionUI and UsernameHandler components, replacing the previous ad-hoc Canvas layout with a more structured interaction flow
- Created a new standalone ChatUI implementation decoupled from SDK sample code, enabling independent iteration on the conversation interface
v2.0 Overview
PersonAI v2.0 was a migration to Inworld AI SDK v3.0, driven by breaking changes in the SDK's API surface. The upgrade required removing all dependencies on the Innequin Samples package and rebuilding character scripts from scratch without inheriting from SDK sample classes. This version introduced the Inworld Starter Kit for standardized UI patterns, along with custom components for user introduction and chat interaction. While still powered by Inworld for conversational AI, this version established the pattern of owning the interaction layer rather than relying on SDK samples.
v2.0 Technical Breakdown
SDK Migration
The v2.0 migration centered on replacing inherited SDK sample scripts with standalone implementations. Character interaction scripts were rewritten from scratch to use the Inworld 3.0 API directly, removing the coupling to InworldCharacter sample components. The Inworld Starter Kit was imported to provide a baseline UI framework.
UI and Interaction
UsernameHandler managed user identity input and session initialization. IntroductionUI provided a structured onboarding flow before entering the chat experience. The ChatUI was rebuilt as an independent Canvas-based interface, no longer extending or referencing SDK sample UI code.
Scene and Platform
The scene structure was expanded to include an aviation-themed scene alongside the original Founding Fathers environment. The project remained on Unity 2022.3.17f1.
v3.0 — Vertex AI + Custom Architecture
v3.0 Highlights
- Executed a complete architectural rewrite, replacing the Inworld AI SDK with a custom Google Vertex AI (Gemini) integration built from scratch, including VertexChatService, request builder, response parser, and grounding tool support
- Implemented dual authentication supporting both OAuth 2.0 PKCE (loopback) for interactive users and service account JWT with a custom DER/PKCS#8 RSA key parser to work around Unity runtime limitations
- Built a streaming text-to-speech engine using ElevenLabs SDK with dual-coroutine architecture, thread-safe queue-based playback, crossfade de-clicking, and pitch-based speech pacing control
- Designed a domain-driven persona configuration system backed by Google Cloud Storage, with hierarchical config loading (config.json, index.json, collection.json, persona.json, system.md) and multi-level LRU/TTL caching
- Migrated from Canvas UI to Unity UI Toolkit (UI Elements) with UXML/USS documents, implementing a three-scene architecture (Bootstrap, Shell, Chat) with submenu lifecycle management
v3.0 Overview
PersonAI v3.0 was a ground-up architectural rewrite that replaced the third-party Inworld AI SDK with a fully custom integration. The new architecture used Google Vertex AI (Gemini) for conversational AI, ElevenLabs for streaming text-to-speech, and Google Cloud Speech-to-Text for voice input, each integrated directly via REST APIs with custom C# clients. The application was migrated to Unity 6 with a domain-driven runtime namespace hierarchy (App, Authentication, Networking, Persona, UI, Voice, VoiceToText), UI Toolkit-based interface, and a sequential startup pipeline. This version established the modular architecture that v4.0 would later refine.
v3.0 Technical Breakdown
Core Subsystems
The v3.0 rewrite introduced seven runtime subsystems. VertexChatService implemented IChatService with Gemini generateContent API calls, using VertexChatRequestBuilder for request construction (system instruction, conversation history, generation config, safety settings, grounding tools) and VertexChatResponseParser for response extraction with citation handling.
Authentication
IdentityService supported dual authentication via IdentityProviderFactory: GoogleOAuthPkceFlow with loopback redirect for interactive use, and GoogleServiceAccountAuthProvider with custom DerReader for PKCS#8 RSA key parsing.
Voice and Audio
ElevenLabsStreamingVoiceEngine used a dual-coroutine architecture for streaming TTS with thread-safe queueing and crossfade. GoogleCloudVoiceToTextService recorded via the Microphone API, encoded to PCM16, and posted to the Speech-to-Text REST API.
Networking and Persona
UnityWebRequestHttpClient provided HTTP transport with exponential backoff and retry policies. The persona system used GcsPersonaConfigProvider to load hierarchical GCS objects.
UI
The UI was rebuilt with UI Toolkit using UXML documents for three scenes (Bootstrap, Shell, Chat) with MenuControllerBase providing lifecycle hooks. MichaelReynolds.ServiceRegistry and MichaelReynolds.ConfigRegistry provided dependency injection and configuration management.
v4.0 — BelmontHomestead Package Ecosystem + RAG Pipeline
v4.0 Highlights
- Architected a 156-file Unity application for real-time AI conversations with historical personas, evolving across four major versions from Inworld AI SDK to a custom Vertex AI integration with dual-path RAG, streaming TTS, and voice input.
- Designed a dual-path RAG pipeline querying Discovery Engine datastores for deterministic grounded responses with citation extraction, falling back to model-driven retrieval tools when primary retrieval returns no results.
- Built a multi-mode authentication system supporting OAuth 2.0 PKCE and service account JWT bearer flows, including a custom DER/PKCS#8 RSA key parser needed because the standard .NET key-import APIs are unavailable in Unity's runtime.
- Engineered a streaming voice engine with ElevenLabs SDK using thread-safe queue-based playback, crossfade de-clicking, and pitch-based pacing, paired with Google Cloud Speech-to-Text for bidirectional voice interaction.
v4.0 Overview
PersonAI is a Unity application developed for the Shenandoah Center for Immersive Learning that enables users to have voice-enabled, grounded conversations with AI-driven historical personas, initially America's Founding Fathers, in a 3D Constitutional Convention environment. The application uses Google Vertex AI (Gemini) as its LLM backbone with a custom dual-path RAG pipeline that retrieves relevant documents from Discovery Engine datastores and injects them into the conversation context, producing cited and factually grounded responses.
Users can interact via text or voice (Google Cloud Speech-to-Text), and persona responses are spoken aloud through ElevenLabs streaming text-to-speech with configurable pacing. The system supports dynamic persona configuration hosted on Google Cloud Storage, allowing new personas and knowledge bases to be authored and published through a custom Unity Editor toolset without code changes.
v4.0 Technical Breakdown
Startup and Application Lifecycle
The application boots through a declarative sequential pipeline orchestrated by StartupPipelineProvider, which extends SequentialPipelineProviderBehaviour from the sequential pipeline package. The pipeline executes five ordered steps: ValidateBootstrapReferencesStep (ensures scene references are intact), InitializeConfigStep (loads environment-aware settings via AppConfigProvider), InjectSecretsStep (loads secrets.json via the secrets provider and validates against SecretsManifest), ValidateConfigStep (runs ConfigValidator checks on URLs, scene names, and parameter ranges), and ValidateServicesStep (confirms required services are registered in the ServiceLocator). StartupRoot is the singleton MonoBehaviour with DefaultExecutionOrder(-2400) that ensures DontDestroyOnLoad persistence. On pipeline success, the UI shell scene is loaded asynchronously via AppSceneFlowService.
Persona Configuration and Management
The persona system is domain-driven with three layers: Domain (PersonaEngine, PersonaCollection, PersonaDefinition, CollectionDefinition), Infrastructure (GcsPersonaConfigProvider, BackendPersonaProvider, GcsPersonaPublishClient, DiscoveryEngineSearchClient), and Config (PersonaConfigurationService, PersonaConfig, PersonaIndexModels). PersonaEngine is the central orchestrator implementing IPersonaEngine, managing collection loading, persona selection, and configuration persistence with event-driven notifications (OnCollectionChanged, OnPersonaChanged, IndexLoadCompleted).
GcsPersonaConfigProvider implements four provider interfaces and loads hierarchical GCS objects (config.json, personas/index.json, collection.json, persona.json, system.md, profile.png) with multi-level caching: a 5-minute TTL for configs and LRU eviction for both profile textures (max 32) and persona configs (max 32). Datastore collection IDs and knowledge store IDs are derived from collection/persona identifiers or overridden via CollectionDefinition fields.
Vertex AI Chat Service and RAG Pipeline
VertexChatService implements IChatService with a dual-path strategy. The primary RAG path calls DiscoveryEngineSearchClient.SearchMultipleAsync() to query Discovery Engine datastores in parallel, receiving RetrievedChunk objects (title, content, sourceUri, documentId). These chunks are formatted by VertexChatRequestBuilder.BuildWithContext() and injected directly into the Gemini system instruction for deterministic grounded generation.
If the RAG path fails or returns no chunks, the fallback retrieval-tool path fires. VertexChatRequestBuilder.Build() constructs a request with vertexAiSearch retrieval tools attached, letting the model decide when to ground its response. VertexChatResponseParser.ParseGeminiResponse() returns a ParseResult struct containing text, citations, and grounding diagnostics. The parser supports six citation extraction strategies: groundingSupports, legacy citationMetadata, webSearchSources, retrievalSources, inline citations, and chunk-to-citation conversion, with deduplication based on URL/title/snippet combination.
Authentication
IdentityService is the central identity manager supporting two modes via IdentityProviderFactory. The OAuth path creates a GoogleOAuthPkceFlow with loopback redirect, EditorPrefsTokenStore for persistence, token inspector, and revoker, supporting scopes for openid, email, profile, and cloud-platform. The service account path uses GoogleServiceAccountAuthProvider which implements OAuth 2.0 JWT bearer flow: it parses PEM-encoded RSA private keys using a custom DerReader class that handles PKCS#8 unwrapping to PKCS#1 (necessary because Unity's runtime lacks ImportPkcs8PrivateKey), signs JWTs with RSA-SHA256, and exchanges them for access tokens with caching and expiry management.
Voice Synthesis
VoiceEngine is the abstract base class implementing a template method pattern with SpeakRoutine() as the extension point. ElevenLabsStreamingVoiceEngine extends this with real-time streaming via the ElevenLabs SDK. It uses a dual-coroutine architecture: one for streaming audio chunks into a thread-safe queue (with locks), another for sequential playback with configurable crossfade de-clicking (FadeVolume()). Speech speed is controlled via AudioSource pitch adjustment. Buffer management enforces maxQueuedClips and maxBufferedSeconds limits.
Voice-to-Text
VoiceToTextService delegates to GoogleCloudVoiceToTextService, which records via Unity's Microphone API with clamped sample rates, converts multi-channel float samples to mono PCM16 bytes (FloatToPcm16Mono()), and posts base64-encoded audio to Google Cloud Speech-to-Text v1 with automatic punctuation enabled. Authentication uses the identity service's OAuth access token.
Networking
UnityWebRequestHttpClient is a sealed singleton implementing IHttpClient with exponential backoff (250 ms base, 10 s cap, full jitter), respect for Retry-After headers (both delta-seconds and HTTP-date formats), idempotent-method detection, transient-error detection, and a status-code retry policy (429 and 500-599).
UI System
The UI is built on Unity UI Toolkit with UXML documents and USS stylesheets. MenuController manages the shell with three submenus (MainMenuController, PersonaSelectionController, ChatMenuController) inheriting from MenuControllerBase with lifecycle hooks (OnBind, OnShow, OnHide). The chat subsystem is decomposed into ChatTranscriptBuilder, ChatTypewriterEffect, ChatSourcesRenderer, and ChatVoiceInputHandler. StyleKit provides dark/light theme support with design token overrides. Platform-specific USS files handle sizing for macOS and iOS.
Editor Tooling
The Editor Persona Browser provides a full authoring workspace with adapter-pattern data sources (EditModePersonaBrowserDataSource / RuntimePersonaBrowserDataSource) and six authoring services that read/write GCS JSON files with optimistic concurrency via generation matching. PersonaEditModeContext bootstraps config and identity for editor workflows. PersonaPublishPolicy validates publish readiness.
Error Handling and Utilities
The codebase uses a Result<T> monad throughout with AppError (code, message, statusCode, retryable flag, user-friendly messages). ErrorBoundary provides try/catch wrappers for resilient components. AsyncRunner runs tasks with automatic exception logging. UnityMainThread dispatches to the main thread via SynchronizationContext with a fallback queue.