Cloud AI ruins immersion with network latency and ruins budgets with API costs. Pontex is engineered specifically for high-frequency game loops.
By bypassing the managed C# heap and running through our native C++ backend, Pontex ensures your game never drops a frame. Inference is dispatched across worker threads via Unity’s Job System and Burst Compiler, automatically utilizing the player’s CPU, CUDA, or Vulkan hardware.
Forget coroutines. Our LifecycleSystem drives inference natively using Unity's ECS. Token generation is monitored via JobHandle and sampled through Burst-compiled jobs, guaranteeing zero main-thread blocking.
Give your AI the ability to trigger in-game events securely. Roslyn Source Generators detect [AITool] attributes at compile time and emit direct C# execution code, bypassing System.Reflection entirely for AAA performance and strict IL2CPP compatibility.
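To make the reflection-free idea concrete, here is a minimal sketch of the kind of dispatch code a source generator could emit. Everything in it is an assumption for illustration: the `AIToolAttribute` definition is a stand-in for the real attribute, `DoorTools` and `TryInvokeTool` are hypothetical names, and the "generated" half is written by hand rather than produced by Roslyn.

```csharp
using System;

// Hypothetical stand-in for the [AITool] attribute named above.
[AttributeUsage(AttributeTargets.Method)]
public sealed class AIToolAttribute : Attribute { }

// Hand-authored gameplay code: methods the AI is allowed to call.
public partial class DoorTools
{
    public bool IsOpen { get; private set; }

    [AITool]
    public void OpenDoor() => IsOpen = true;

    [AITool]
    public void CloseDoor() => IsOpen = false;
}

// What a generator might emit (written by hand here): a plain switch
// over tool names, so no System.Reflection lookup happens at runtime.
public partial class DoorTools
{
    public bool TryInvokeTool(string toolName)
    {
        switch (toolName)
        {
            case nameof(OpenDoor):
                OpenDoor();
                return true;
            case nameof(CloseDoor):
                CloseDoor();
                return true;
            default:
                // Unknown names are rejected instead of being resolved dynamically.
                return false;
        }
    }
}
```

Because every callable method is listed explicitly at compile time, an IL2CPP build can see each call site, and a tool name the model invents simply returns false.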
Equip NPCs with persistent memory and world lore. The offline Knowledge Baker converts .txt lore into searchable .json vectors. At runtime, the Burst-compiled CalculateCosineSimilarityJob searches thousands of entities in O(n) time.
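For readers curious what a linear cosine-similarity search involves, the sketch below shows the idea in plain managed C#. It is not the Burst-compiled CalculateCosineSimilarityJob; the `LoreSearch` class and its method names are hypothetical, and it uses ordinary arrays rather than native containers.

```csharp
using System;

public static class LoreSearch
{
    // Cosine similarity between two equal-length embedding vectors.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vector lengths differ.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction; treat it as unrelated.
        if (normA == 0 || normB == 0) return 0f;
        return (float)(dot / Math.Sqrt(normA * normB));
    }

    // Linear scan: one similarity computation per stored entry.
    // Returns the index of the best match, or -1 if there are no entries.
    public static int FindBestMatch(float[] query, float[][] entries)
    {
        int bestIndex = -1;
        float bestScore = float.NegativeInfinity;
        for (int i = 0; i < entries.Length; i++)
        {
            float score = CosineSimilarity(query, entries[i]);
            if (score > bestScore)
            {
                bestScore = score;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

The scan touches each entry once, so the cost grows linearly with the number of stored vectors (and with the embedding dimension inside each comparison).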
True multimodal immersion. Convert player microphone input to text instantly via the local Whisper STT engine. Stream LLM responses into the Piper TTS engine for dynamic voice acting — synchronized with Unity's AudioSource.
Whether you prefer dragging components or writing unmanaged C#, Pontex adapts to your workflow.
No engineering required. Use the Engine Dashboard to load models. Drop an AgentClient onto any GameObject, assign a Persona, and wire up standard UnityEvents.
Need absolute control? Interface directly with RuntimeNative. Allocate unmanaged pointers, schedule batch inference jobs, and build fully memory-managed pipelines.
using Unity.Entities;
using Unity.Jobs;
using Unity.Collections;
using Pontex.Native;

public partial struct InferenceSystem : ISystem
{
    // Batch size (placeholder value; set to the number of pending requests)
    private const int RequestCount = 64;

    public void OnUpdate(ref SystemState state)
    {
        // 1. Allocate unmanaged request buffer
        var requests = new NativeArray<TokenRequest>(
            RequestCount, Allocator.TempJob);

        // 2. Schedule Native C++ Inference Job
        var inferenceJob = new EvaluateLLMJob
        {
            RuntimePtr = PontexRuntime.GetSharedInstance(),
            Requests = requests,
            MaxTokens = 128
        };

        // 3. Dispatch in batches of 32 without blocking the main thread
        state.Dependency = inferenceJob.ScheduleBatch(
            requests.Length, 32, state.Dependency);

        // 4. Free the TempJob buffer once the job has finished
        state.Dependency = requests.Dispose(state.Dependency);
    }
}

Join the developers building the next generation of real-time games, powered by hardware-agnostic, on-device AI. Frame-synchronous. Offline. Yours.
Request Early Access