001 / Pontex Native Integrations
Unity 2022.3 LTS · IL2CPP & Burst Ready

The bridge between real-time game logic and on-device AI.

Pontex brings frame-synchronous LLM inference directly into Unity, with no network round-trip. An ECS-driven pipeline, native RAG, and local voice synthesis all run on your player’s hardware. No API keys. No cloud costs. 100% offline.

// 002 — Performance

Engineered for the main thread.

0 ms · Cloud Round-Trip
100% · IL2CPP & Burst Ready
3 · Sub-Systems (LLM/STT/TTS)
0 · Managed Heap Allocs

// 003 — Local. Always.

Built for the Unity main thread.

Cloud AI ruins immersion with network latency and ruins budgets with API costs. Pontex is engineered specifically for high-frequency game loops.

By bypassing the managed C# heap and running through our native C++ backend, Pontex keeps inference out of the garbage collector's path and off the main thread, so it does not cost your game frames. Inference is dispatched across worker threads via Unity’s Job System and Burst-compiled jobs, and automatically uses the player’s CPU or, where available, a CUDA or Vulkan GPU backend.

// 004 — Infrastructure

Production-grade infrastructure, in the box.

01 / Pipeline

Asynchronous ECS pipeline.

Forget coroutines. Our LifecycleSystem drives inference natively using Unity's ECS. Token generation is monitored via JobHandle and sampled through Burst-compiled jobs, guaranteeing zero main-thread blocking.

ECS · Burst · JobHandle

02 / Tool Calling

Zero-reflection tool calling.

Give your AI the ability to trigger in-game events securely. Roslyn source generators detect [AITool] attributes at compile time and emit direct C# call code, bypassing System.Reflection entirely for lower call overhead and strict IL2CPP compatibility.
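
To make the idea concrete, here is a minimal sketch of what reflection-free dispatch can look like. It is not Pontex's actual generator output: the AIToolAttribute definition, the MerchantTools methods, and the GeneratedToolDispatcher class are all hypothetical names chosen for illustration, and the switch statement stands in for code a source generator might emit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the [AITool] marker; the real attribute may differ.
[AttributeUsage(AttributeTargets.Method)]
public sealed class AIToolAttribute : Attribute { }

// Hypothetical game-side tools the model is allowed to call.
public static class MerchantTools
{
    public static readonly List<string> Log = new List<string>();

    [AITool]
    public static string GiveGold(int amount)
    {
        Log.Add("gold:" + amount);
        return "Gave " + amount + " gold";
    }

    [AITool]
    public static string OpenShop()
    {
        Log.Add("shop");
        return "Shop opened";
    }
}

// Assumed shape of generated code: a plain switch over tool names with
// direct method calls. Nothing here uses MethodInfo.Invoke, so every call
// target is visible to the compiler ahead of time.
public static class GeneratedToolDispatcher
{
    public static bool TryInvoke(string toolName, string[] args, out string result)
    {
        switch (toolName)
        {
            case "GiveGold":
                // Validate the model-supplied argument before calling.
                if (args.Length == 1 && int.TryParse(args[0], out int amount))
                {
                    result = MerchantTools.GiveGold(amount);
                    return true;
                }
                break;

            case "OpenShop":
                if (args.Length == 0)
                {
                    result = MerchantTools.OpenShop();
                    return true;
                }
                break;
        }

        // Unknown tool or malformed arguments: reject without side effects.
        result = string.Empty;
        return false;
    }
}
```

Because unknown tool names and malformed arguments fall through to a rejected call, the model can only trigger methods that were explicitly marked at compile time.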

Roslyn · IL2CPP · [AITool]

03 / RAG

Native vector database.

Equip NPCs with persistent memory and world lore. The offline Knowledge Baker converts .txt lore into searchable .json vectors. At runtime, the Burst-compiled CalculateCosineSimilarityJob searches thousands of entities in O(n) time.
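
For readers unfamiliar with the technique, the following plain C# sketch shows what an O(n) cosine-similarity search does. It is an illustration only, not the Burst-compiled CalculateCosineSimilarityJob itself: the VectorSearch class and its method names are hypothetical, and managed arrays are used here for readability where the real job would presumably work on native containers.

```csharp
using System;

// Hypothetical helper illustrating a brute-force cosine-similarity search.
public static class VectorSearch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        float denom = (float)(Math.Sqrt(normA) * Math.Sqrt(normB));
        return denom > 0f ? dot / denom : 0f;
    }

    // Linear scan: one similarity per stored vector, so the cost grows
    // as O(n) in the number of entries. Returns -1 for an empty store.
    public static int FindBestMatch(float[] query, float[][] entries)
    {
        int bestIndex = -1;
        float bestScore = float.NegativeInfinity;

        for (int i = 0; i < entries.Length; i++)
        {
            float score = CosineSimilarity(query, entries[i]);
            if (score > bestScore)
            {
                bestScore = score;
                bestIndex = i;
            }
        }

        return bestIndex;
    }
}
```

A linear scan like this has no index to build or maintain, which suits lore sets in the thousands of entries; each similarity is independent of the others, which is what makes the loop a natural fit for a parallel job.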

RAG · Burst · O(n) search

04 / Voice

Complete voice pipeline.

True multimodal immersion. Convert player microphone input to text on-device with the local Whisper STT engine, then stream LLM responses into the Piper TTS engine for dynamic voice acting, synchronized with Unity's AudioSource.

Whisper STT · Piper TTS · AudioSource

// 005 — Developer Experience

From prototype to production.

Whether you prefer dragging components or writing unmanaged C#, Pontex adapts to your workflow.

For designers: the 5-minute NPC.

No engineering required. Use the Engine Dashboard to load models. Drop an AgentClient onto any GameObject, assign a Persona, and wire up standard UnityEvents.

Inspector: Agent Client (Script)
    Model File: mistral-7b-q4.gguf
    System Prompt
    RAG Settings
        Knowledge Base: MerchantLoreData
        Max Context Docs
    Voice (TTS)
        Voice Model

For engineers: bare-metal API access.

Need absolute control? Interface directly with RuntimeNative. Allocate unmanaged pointers, schedule batch inference jobs, and build fully memory-managed pipelines.

AILogicSystem.cs
using Unity.Entities;
using Unity.Jobs;
using Unity.Collections;
using Pontex.Native;

public partial struct InferenceSystem : ISystem
{
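    public void OnUpdate(ref SystemState state)
    {
        // Hypothetical batch size; in practice, the number of pending requests.
        const int count = 64;

        // 1. Allocate unmanaged request buffer
        var requests = new NativeArray<TokenRequest>(
            count, Allocator.TempJob);

        // 2. Schedule Native C++ Inference Job
        var inferenceJob = new EvaluateLLMJob
        {
            RuntimePtr = PontexRuntime.GetSharedInstance(),
            Requests   = requests,
            MaxTokens  = 128
        };

        // 3. Dispatch to worker threads; OnUpdate returns without waiting
        state.Dependency = inferenceJob.ScheduleBatch(
            requests.Length, 32, state.Dependency);

        // 4. Release the request buffer once the job has finished
        state.Dependency = requests.Dispose(state.Dependency);
    }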
}
Get Early Access

Stop renting intelligence. Own it.

Join the developers building the next generation of real-time games, powered by hardware-agnostic, on-device AI. Frame-synchronous. Offline. Yours.

Request Early Access