Model · Apache 2.0

Orchid 1.0

The first competitive LLM trained and aligned in Colombia — a 2B ternary-weight model fine-tuned from Microsoft BitNet b1.58-2B-4T, aligned with ORPO for unbiased, multilingual responses that run without the cloud.

ternary · I2_S · English · Spanish · 4,096-token context · 4,127 downloads / mo

Standard benchmarks

Measured against its own base model.

Scored via log-probability on a live ternative server, matching the lm-evaluation-harness methodology (50 samples each). The ARC-Challenge gain suggests the reasoning fine-tuning transferred; the HellaSwag and MMLU dips are the expected ORPO alignment tax, trading some factual-recall breadth for reasoning quality and bias mitigation.
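As a rough illustration of log-probability scoring, the sketch below picks the answer whose continuation the model finds most likely, with an optional length-normalized variant like the one used for HellaSwag. It assumes normalization divides by the continuation's character length, which is how the harness's normalized accuracy is usually described; the function name and the toy log-probs are hypothetical, not taken from the Orchid evaluation code.

```python
def pick_answer(choices, token_logprobs, length_normalize=False):
    """Return the index of the best-scoring choice.

    choices:         candidate continuations (strings)
    token_logprobs:  one list of per-token log-probs per choice,
                     as a model server would return them
    """
    scores = []
    for text, logprobs in zip(choices, token_logprobs):
        score = sum(logprobs)  # log P(continuation | prompt)
        if length_normalize:
            # Assumption: normalize by character length of the continuation
            score /= max(len(text), 1)
        scores.append(score)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: a short answer vs. a longer one (made-up log-probs)
choices = ["yes", "it depends on the weather"]
token_logprobs = [[-2.0], [-0.5] * 6]

raw_pick = pick_answer(choices, token_logprobs)
norm_pick = pick_answer(choices, token_logprobs, length_normalize=True)
print(raw_pick, norm_pick)
```

Raw scoring favors short continuations because every extra token adds a negative term; the normalized variant removes most of that bias, which is why the two can disagree on the same item.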

Benchmark	Orchid 1.0	BitNet b1.58-2B base	Delta
ARC-Challenge	56.0%	49.9%	+6.1 pp
WinoGrande	74.0%	—	—
HellaSwag (length-norm)	52.0%	68.4%	−16.4 pp
MMLU (57 subjects)	38.6%	53.2%	−14.6 pp

WinoGrande 74.0% is strong for a 2B model, comparable to the published score of Llama 3.2 3B (~74%).
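To read the deltas against sampling noise, the sketch below recomputes them and estimates a binomial standard error. It assumes each Orchid score rests on 50 independent items, per the "50 samples each" note; if a benchmark used more items in total, the error bar shrinks accordingly. The helper name is hypothetical.

```python
import math


def standard_error_pp(accuracy_pct, n_samples):
    """Binomial standard error of an accuracy, in percentage points."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_samples)


# (Orchid 1.0, base model) accuracies from the table above
ROWS = {
    "ARC-Challenge": (56.0, 49.9),
    "HellaSwag": (52.0, 68.4),
    "MMLU": (38.6, 53.2),
}
N_SAMPLES = 50  # assumption: items behind each Orchid score

for name, (orchid, base) in ROWS.items():
    delta = round(orchid - base, 1)
    se = standard_error_pp(orchid, N_SAMPLES)
    print(f"{name}: delta {delta:+.1f} pp, Orchid s.e. ~{se:.1f} pp at n={N_SAMPLES}")
```

At n=50 one standard error is roughly 7 pp, so the large HellaSwag and MMLU drops stand well clear of noise, while the ARC-Challenge gain is about the size of a single standard error.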

Internal benchmark v2

#3 of 12 models, top open-weight.

100 questions across 8 categories, semantic-similarity scoring. Orchid ranks above every open-weight model tested, including 7B–9B systems.
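A minimal sketch of what semantic-similarity scoring can look like: each answer is embedded, compared to a reference answer by cosine similarity, and counted as correct above a threshold. The embedding function, the 0.7 threshold, and all names here are assumptions for illustration; the bag-of-words `embed` is only a toy stand-in for a real sentence-embedding model, and the benchmark's actual scorer is not described on this page.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Toy embedding: lowercase word counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def grade(answer, reference, embed=bag_of_words, threshold=0.7):
    """True if the answer is close enough to the reference."""
    return cosine(embed(answer), embed(reference)) >= threshold


def benchmark_score(pairs, **kwargs):
    """Percentage of (answer, reference) pairs graded as correct."""
    passed = sum(grade(a, r, **kwargs) for a, r in pairs)
    return 100.0 * passed / len(pairs)


pairs = [
    ("water boils at 100 degrees celsius", "water boils at 100 degrees celsius"),
    ("the sky is green", "water boils at 100 degrees celsius"),
]
print(benchmark_score(pairs))
```

Scores from this kind of grader depend heavily on the encoder and the threshold, so rankings are best compared only between models graded with the same setup.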

Category	Score
Science	100%
Math	93.3%
Coding	93.3%

#	Model	Score
1	Claude 3.5 Sonnet	89.5%
2	GPT-4o	89.2%
3	Orchid 1.0 · 2B	87.9%
4	BitNet b1.58-2B base	84.2%
5	Kimi k1.5	82.2%
6	Qwen2.5-7B	78.4%

Training

Four stages, one 4 GB laptop.

All training ran on a single NVIDIA RTX 3050 laptop GPU — 4 GB VRAM, 16 GB RAM, Windows 11. No cloud compute.

Stage	Method	Data	Time
SFT-A	LoRA r=16	50 reasoning samples	~1 h
SFT-B	LoRA r=16	5,500 samples	~88 h
ORPO-2	LoRA r=8	2,038 pairs	~26 h
ORPO-3	LoRA r=8	2,104 pairs	~54 h
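For a sense of scale, the sketch below totals the stage times and shows how few parameters LoRA trains per adapted matrix at r=16 and r=8. LoRA adds two low-rank factors to a weight matrix, so the count is rank × (d_in + d_out). The 2560 × 2560 layer shape is a hypothetical example, not read from the model config, and the hours are the approximate figures from the table.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


# (stage, LoRA rank, approximate hours) from the table above
STAGES = [("SFT-A", 16, 1), ("SFT-B", 16, 88), ("ORPO-2", 8, 26), ("ORPO-3", 8, 54)]

# Hypothetical layer shape, for illustration only
D_IN = D_OUT = 2560

total_hours = sum(hours for _, _, hours in STAGES)
print(f"total training time: ~{total_hours} h (~{total_hours / 24:.1f} days)")

full = D_IN * D_OUT  # parameters in the frozen full matrix
for name, rank, _ in STAGES:
    added = lora_params(D_IN, D_OUT, rank)
    share = 100 * added / full
    print(f"{name}: r={rank} adds {added:,} params per matrix ({share:.2f}% of full)")
```

Halving the rank for the ORPO stages halves the adapter size and the optimizer state that goes with it, which matters when the whole run has to fit in 4 GB.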

What made 4 GB possible

Pre-tokenize the dataset before loading the model — avoids startup OOM
device_map="auto" — GPU + CPU split via Accelerate
Gradient checkpointing with bf16=True
ORPO needs no reference model (nothing to load as ref_model), which saves ~1.2 GB vs DPO
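The first item above can be sketched as a two-phase script: tokenize everything to a cache file first, then load the model and stream token ids from disk. This is a minimal illustration under assumptions, not the project's training code: `pretokenize`, `iter_cached`, and the character-level `toy_tokenize` are hypothetical names, and in a real run the `tokenize` callable would be the model's own tokenizer.

```python
import json
import os
import tempfile


def pretokenize(texts, tokenize, cache_path, max_len=4096):
    """Tokenize once and write token ids to a JSONL cache; return row count."""
    count = 0
    with open(cache_path, "w", encoding="utf-8") as f:
        for text in texts:
            ids = tokenize(text)[:max_len]  # truncate to the context window
            f.write(json.dumps({"input_ids": ids}) + "\n")
            count += 1
    return count


def iter_cached(cache_path):
    """Stream cached token ids one example at a time."""
    with open(cache_path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)["input_ids"]


def toy_tokenize(text):
    """Stand-in tokenizer: one id per character."""
    return [ord(ch) for ch in text]


cache = os.path.join(tempfile.mkdtemp(), "train.jsonl")
n_rows = pretokenize(["hola", "hello world"], toy_tokenize, cache)

# Phase 2 would load the model here, after raw text and tokenizer
# buffers have been released, then iterate over the cache.
first = next(iter_cached(cache))
print(n_rows, first)
```

Streaming from the cache keeps only one example in memory at a time, so dataset preparation never competes with model loading for RAM.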

Model files

Two files to run Orchid

File	Size
ggml-model-i2_s.gguf	~1.1 GB
dpo_aligned-lora.gguf	~90 MB

The base GGUF holds the ternary weights; the adapter applies alignment at runtime without re-quantizing.
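Why the adapter can stay a separate file: in the standard LoRA formulation the output is the frozen base projection plus a scaled low-rank correction, so the ternary matrix is never modified. The sketch below shows that arithmetic in plain Python with tiny made-up matrices. Whether the ternative engine applies the adapter exactly this way is an assumption; all names and values are illustrative.

```python
def matvec(matrix, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def forward_with_adapter(w_ternary, lora_a, lora_b, x, scale=1.0):
    """y = W x + scale * B (A x), leaving W untouched."""
    base = matvec(w_ternary, x)   # frozen weights in {-1, 0, +1}
    low = matvec(lora_a, x)       # project down to rank r
    delta = matvec(lora_b, low)   # project back up to d_out
    return [b + scale * d for b, d in zip(base, delta)]


# Toy shapes: d_in=3, d_out=2, rank=1 (values are made up)
W = [[1, 0, -1],
     [0, 1, 1]]
A = [[0.5, 0.0, 0.5]]
B = [[1.0], [-2.0]]
x = [1.0, 2.0, 3.0]

y = forward_with_adapter(W, A, B, x, scale=0.5)
print(y)
```

Merging the correction into W would produce non-ternary values and force a re-quantization step, which is what keeping the adapter as a runtime add-on avoids.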

Hardware requirements

It probably runs on what you have

Component	Minimum	Recommended
GPU VRAM	None (CPU only)	4 GB
RAM	8 GB	16 GB
Storage	1.3 GB	2 GB
OS	Windows / Linux	—

Honest limitations

What Orchid isn't

MMLU 38.6% — the documented alignment tax from ORPO.
Spanish is functional (80% internal) but not state-of-the-art.
4,096-token context, inherited from the BitNet base.
Requires the ternative engine — llama.cpp produces wrong output.
Identity is strongest with a system prompt.

Cite it

Citation

orchid.bib

@misc{romerochisco2026orchid,
  title  = {Orchid 1.0: A Reproducible Recipe for
            Aligned Ternary-Weight Language Models
            on Consumer Hardware},
  author = {Romero Chisco, Michelangelo},
  year   = {2026},
  doi    = {10.5281/zenodo.20452163},
  publisher = {Zenodo}
}

Read the technical paper (PDF) →

Run Orchid in a few minutes.

Friendly desktop app, or full technical instructions for the model and engine.

Download Orchid Desktop · Model card ↗