The easiest & fastest way to run customized and fine-tuned Large Language Models (LLMs) locally or on the edge

Lightweight, Fast, Portable, Rust-powered and OpenAI compatible


Why choose the Rust + Wasm tech stack

Total runtime size is 30MB vs 4GB for Python and 350MB for Ollama.
Full native speed on GPUs.
Build and deploy a cross-platform binary app across different CPUs, GPUs, and OSes, instead of rebuilding for each hardware target.
Sandboxed and isolated execution on untrusted devices. Ready for the cloud.
Modern languages for inference apps
Rust for now; JS and Go are coming soon.
Supported in Docker, containerd, Podman, and Kubernetes.
OpenAI compatible
Seamlessly integrate into the OpenAI tooling ecosystem.
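OpenAI compatibility means a client can send the same kind of request it would send to OpenAI's chat completions endpoint. The sketch below assumes the standard OpenAI `/v1/chat/completions` request shape (a `model` string plus a list of `role`/`content` messages). It uses only the Rust standard library to build the JSON body such a client would POST. The model name `llama-2-7b-chat` and the address `localhost:8080` are hypothetical placeholders, not values confirmed by this page.

```rust
// Minimal sketch: build an OpenAI-style chat completion request body
// with only the standard library. The model name and server address
// used in main() are hypothetical placeholders.

/// Escape a string so it can be embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Other control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the JSON body for a POST to /v1/chat/completions.
/// Each message is a (role, content) pair, e.g. ("user", "Hello").
fn chat_request_body(model: &str, messages: &[(&str, &str)]) -> String {
    let msgs: Vec<String> = messages
        .iter()
        .map(|(role, content)| {
            format!(
                r#"{{"role":"{}","content":"{}"}}"#,
                json_escape(role),
                json_escape(content)
            )
        })
        .collect();
    format!(
        r#"{{"model":"{}","messages":[{}]}}"#,
        json_escape(model),
        msgs.join(",")
    )
}

fn main() {
    let body = chat_request_body(
        "llama-2-7b-chat", // hypothetical model name
        &[
            ("system", "You are a helpful assistant."),
            ("user", "What is WasmEdge?"),
        ],
    );
    // Hypothetical local address; point this at wherever your server listens.
    println!("POST http://localhost:8080/v1/chat/completions");
    println!("{}", body);
}
```

The JSON is assembled by hand here only to keep the sketch dependency-free. A real app would more likely use the `serde_json` crate together with an HTTP client.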

Supported Models

Llama 2 is a family of LLMs released by Meta, ranging from 7B to 70B parameters. It is one of the most widely used LLMs, and many other LLMs are built on top of it.

Run Llama 2 on your own device now

Code Llama is an LLM for code, based on Llama 2. It provides state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks.

Run Code Llama on your own device now

Mistral is a 7B instruction-tuned LLM released by Mistral AI. It is a true open-source model licensed under Apache 2.0.

Run Mistral on your own device now

OpenChat is a 7B Large Language Model (LLM) developed by Alignment Lab AI.

Run OpenChat 3.5 on your own device now

MistralLite is a fine-tuned Mistral-7B-v0.1 language model released by AWS, with an enhanced ability to process long contexts (up to 32K tokens).

Run MistralLite on your own device now

TinyLlama-1.1B is a fine-tuned Llama 2-based LLM with 1.1B parameters. It is a versatile choice for applications with limited compute and memory resources.

Run TinyLlama-1.1B on your own device now.

Zephyr is a fine-tuned Mistral-7B-v0.1 language model released by the Hugging Face team. Removing the built-in alignment of its training datasets boosted its performance on MT-Bench.

Run Zephyr on your own device now

Vicuna is a chat assistant released by the LMSYS team, trained by fine-tuning Llama 2 on user-shared conversations collected from ShareGPT.

Run Vicuna on your own device now

Baichuan2 is the new generation of large-scale open-source language models launched by Baichuan Intelligence Inc.

Run Baichuan2-13B on your own device now

Yi-34B is a large language model trained from scratch by developers at 01.AI.

Run Yi-34B on your own device now.

CausalLM is a fine-tuned Llama 2 language model released by the Causal team. Its authors claim that the 7B model may outperform all existing models with 33B parameters or fewer.

Run CausalLM-7B on your own device now.

WizardCoder is a specialized LLM tailored for coding tasks.

Run WizardCoder on your own device now.

How it works

* Or build your own app from Rust source code

