THE CORE ARCHITECTURE

Dive deep into the technology that makes AVA possible.

vLLM Engine

A high-throughput, memory-efficient LLM serving engine. It uses PagedAttention to store the attention key and value (KV) cache in fixed-size blocks rather than one contiguous buffer per request, which cuts memory waste and keeps inference fast for local models.
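
To make the paging idea concrete, here is a minimal sketch of the bookkeeping behind a block-based KV cache. It is a toy illustration, not vLLM's implementation: the class `PagedKVCache` and its methods are hypothetical names, and it tracks only block indices, not actual tensors.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy allocator: maps each sequence's tokens onto fixed-size physical blocks."""

    num_blocks: int  # physical blocks available in the pool (assumed value)
    block_size: int  # tokens stored per block (assumed value)
    free_blocks: list[int] = field(init=False)
    block_tables: dict[str, list[int]] = field(default_factory=dict)
    seq_lens: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every physical block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a slot for one new token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        offset = length % self.block_size
        if offset == 0:
            # The last block is full (or the sequence is new): take a fresh block.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop(0))
        self.seq_lens[seq_id] = length + 1
        return table[-1], offset

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

Because a sequence only claims a new block when its last one is full, at most one partially filled block per sequence is ever wasted, and blocks freed by one request can be reused immediately by another.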

Ray Framework

A unified framework for scaling AI applications. The AVA SDK uses Ray to orchestrate distributed inference and to schedule work across your GPU and CPU, so several tasks can run at once without competing for the same resources.

LlamaFactory

A framework for fine-tuning large language models. We provide predefined recipes to fine-tune Llama 3 and other models for gaming and assistance use cases within the AVA ecosystem.
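
For a sense of what a fine-tuning recipe might contain, here is a hedged sketch that builds a LoRA supervised fine-tuning config in the style of LlamaFactory's published examples. It is not one of AVA's predefined recipes: the helper names, the dataset name, the output path, and every hyperparameter value are assumptions chosen for illustration.

```python
import json
from pathlib import Path


def build_recipe(dataset: str, output_dir: str) -> dict:
    """Return a LoRA SFT config using LlamaFactory-style keys (values are illustrative)."""
    return {
        "model_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
        "stage": "sft",
        "do_train": True,
        "finetuning_type": "lora",
        "lora_target": "all",
        "dataset": dataset,  # must be registered in LlamaFactory's dataset info file
        "template": "llama3",
        "cutoff_len": 2048,
        "output_dir": output_dir,
        "per_device_train_batch_size": 1,
        "gradient_accumulation_steps": 8,
        "learning_rate": 1e-4,
        "num_train_epochs": 3.0,
        "bf16": True,
    }


def write_recipe(recipe: dict, path: str) -> Path:
    """Serialize the recipe to a JSON file and return its path."""
    target = Path(path)
    target.write_text(json.dumps(recipe, indent=2), encoding="utf-8")
    return target
```

A config file like this is typically passed to `llamafactory-cli train <config>`. LlamaFactory's own examples use YAML, so check its documentation for the formats your installed version accepts.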