Model · Apache 2.0

Orchid 1.0

The first competitive LLM trained and aligned in Colombia — a 2B ternary-weight model fine-tuned from Microsoft BitNet b1.58-2B-4T, aligned with ORPO for unbiased, multilingual responses that run without the cloud.

ternary · I2_S · English · Spanish · 4,096-token context · 4,127 downloads / mo

Standard benchmarks

Measured against its own base model.

Scored via log-probability on a live ternative server (the engine Orchid requires), following the lm-evaluation-harness methodology with 50 samples per benchmark. The ARC-Challenge gain indicates that the reasoning fine-tuning transferred. The HellaSwag and MMLU dips are the expected ORPO alignment tax: some factual-recall breadth traded for reasoning quality and bias mitigation.
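As a rough illustration of log-probability scoring for multiple-choice benchmarks, the sketch below picks the answer whose continuation receives the highest summed log-probability, optionally divided by the continuation length (the "length-norm" variant). The function names and the choice of character length as the normalizer are assumptions made for illustration, and the numbers are toy values, not Orchid outputs.

```python
from typing import Sequence


def pick_choice(logprobs: Sequence[float], choices: Sequence[str],
                length_normalize: bool = False) -> int:
    """Return the index of the best-scoring answer choice.

    logprobs[i] is the summed log-probability the model assigns to
    choices[i] as a continuation of the question.
    """
    if not choices or len(logprobs) != len(choices):
        raise ValueError("need exactly one log-probability per choice")
    scores = [
        # Assumption: normalize by character length of the continuation.
        lp / max(len(text), 1) if length_normalize else lp
        for lp, text in zip(logprobs, choices)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of questions where the picked index matches the gold index."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Toy example: a short wrong ending versus a longer correct one.
choices = ["yes", "a much longer but correct ending"]
logprobs = [-4.0, -12.0]

raw_pick = pick_choice(logprobs, choices)                          # favors the short ending
norm_pick = pick_choice(logprobs, choices, length_normalize=True)  # favors the long ending
```

Length normalization matters when answer choices differ a lot in length, because a longer continuation accumulates more negative log-probability even when it is the better answer.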

Benchmark                 Orchid 1.0   BitNet b1.58-2B base   Delta
ARC-Challenge             56.0%        49.9%                  +6.1 pp
WinoGrande                74.0%
HellaSwag (length-norm)   52.0%        68.4%                  −16.4 pp
MMLU (57 subjects)        38.6%        53.2%                  −14.6 pp

WinoGrande 74.0% is strong for 2B — comparable to the published score of Llama 3.2 3B (~74%).
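The deltas in the table are plain differences in percentage points. The sketch below recomputes them and, as an extra reading aid, estimates the sampling error of a single score. The error estimate assumes each benchmark score is a simple binomial proportion over 50 independent samples, which is an assumption about how the 50-sample evaluation is aggregated.

```python
import math

# Accuracy in percent, copied from the table: (Orchid 1.0, BitNet base).
scores = {
    "ARC-Challenge": (56.0, 49.9),
    "HellaSwag (length-norm)": (52.0, 68.4),
    "MMLU (57 subjects)": (38.6, 53.2),
}


def delta_pp(model_pct: float, base_pct: float) -> float:
    """Difference in percentage points, rounded to one decimal."""
    return round(model_pct - base_pct, 1)


def standard_error_pp(accuracy_pct: float, n: int) -> float:
    """Binomial standard error of an accuracy, in percentage points."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


deltas = {name: delta_pp(m, b) for name, (m, b) in scores.items()}

N_SAMPLES = 50  # assumption: 50 independent samples behind each score
arc_se = standard_error_pp(56.0, N_SAMPLES)  # roughly 7 percentage points
```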


Internal benchmark v2

#3 of 12 models, top open-weight.

100 questions across 8 categories, semantic-similarity scoring. Orchid ranks above every open-weight model tested, including 7B–9B systems.
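Semantic-similarity scoring usually means comparing a vector for the model's answer with a vector for a reference answer. The sketch below shows the general shape using cosine similarity. The `embed` function is a hypothetical stand-in (plain word counts) for whatever embedding model the internal benchmark actually uses, which is not described here, and averaging the similarities into a percentage is likewise an assumption.

```python
import math
import re
from collections import Counter
from typing import Sequence


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a sentence-embedding model: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def benchmark_score(answers: Sequence[str], references: Sequence[str]) -> float:
    """Mean answer-to-reference similarity, expressed as a percentage."""
    sims = [cosine_similarity(embed(a), embed(r))
            for a, r in zip(answers, references)]
    return 100.0 * sum(sims) / len(sims)


same = cosine_similarity(embed("Water boils at 100 C"), embed("water boils at 100 c"))
different = cosine_similarity(embed("Water boils at 100 C"), embed("Paris"))
```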

Science: 100%
Math: 93.3%
Coding: 93.3%

#   Model                  Score
1   Claude 3.5 Sonnet      89.5%
2   GPT-4o                 89.2%
3   Orchid 1.0 · 2B        87.9%
4   BitNet b1.58-2B base   84.2%
5   Kimi k1.5              82.2%
6   Qwen2.5-7B             78.4%

Training

Four stages, one 4 GB laptop.

All training ran on a single NVIDIA RTX 3050 laptop GPU — 4 GB VRAM, 16 GB RAM, Windows 11. No cloud compute.

Stage    Method      Data             Time
SFT-A    LoRA r=16   Reasoning (50)   ~1 h
SFT-B    LoRA r=16   5,500 samples    ~88 h
ORPO-2   LoRA r=8    2,038 pairs      ~26 h
ORPO-3   LoRA r=8    2,104 pairs      ~54 h

What made 4 GB possible
  • Pre-tokenize the dataset before loading the model — avoids startup OOM
  • device_map="auto" — GPU + CPU split via Accelerate
  • Gradient checkpointing with bf16=True
  • ORPO with ref_model=None — saves ~1.2 GB vs DPO
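The settings in the list can be gathered into plain keyword-argument dictionaries, as in the sketch below. The key names match Hugging Face conventions (`device_map` for `from_pretrained`, `bf16` and `gradient_checkpointing` for `TrainingArguments`-style configs, `r` for PEFT's `LoraConfig`), but the batch size and accumulation values are assumptions chosen for illustration, not the values used to train Orchid.

```python
# Step 0 (not shown): tokenize the dataset and save it to disk BEFORE any
# model weights are loaded, so tokenization buffers and weights never
# compete for memory at startup.

# Passed when loading the model, e.g. from_pretrained(name, **model_kwargs).
model_kwargs = {
    "device_map": "auto",  # Accelerate places layers on GPU first, then CPU
}

# Passed to a TrainingArguments-style config.
training_kwargs = {
    "bf16": True,                       # from the list above
    "gradient_checkpointing": True,     # recompute activations to save VRAM
    "per_device_train_batch_size": 1,   # assumption: smallest possible batch
    "gradient_accumulation_steps": 8,   # assumption: illustrative value
}

# Passed to a LoRA adapter config; r=8 is the rank listed for the ORPO stages.
lora_kwargs = {"r": 8, "task_type": "CAUSAL_LM"}


def effective_batch_size(cfg: dict) -> int:
    """Samples contributing to each optimizer step on a single GPU."""
    return cfg["per_device_train_batch_size"] * cfg["gradient_accumulation_steps"]


batch = effective_batch_size(training_kwargs)
```

Keeping the per-device batch at 1 and recovering a larger effective batch through accumulation is a common way to fit training into a small VRAM budget.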

Model files

Two files to run Orchid

File                    Size
ggml-model-i2_s.gguf    ~1.1 GB
dpo_aligned-lora.gguf   ~90 MB

The base GGUF holds the ternary weights; the adapter applies alignment at runtime without re-quantizing.
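To see why an adapter can be applied without re-quantizing, consider the standard LoRA formulation: the layer output is the frozen base projection plus a scaled low-rank correction, y = W x + (alpha / r) B (A x). The sketch below works through a toy example in plain Python. Whether the ternative engine applies the adapter in exactly this way is an assumption, and the matrices are made-up values.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """Compute y = W x + (alpha / r) * B (A x) without modifying W."""
    base = matvec(W, x)                 # frozen (here: ternary) weights
    low_rank = matvec(B, matvec(A, x))  # adapter path through rank r
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, low_rank)]


# Toy 2x2 layer with ternary base weights and a rank-1 adapter.
W = [[1, 0], [-1, 1]]   # entries restricted to {-1, 0, 1}
A = [[0.5, 0.5]]        # shape r x d_in
B = [[1.0], [2.0]]      # shape d_out x r
x = [2.0, 4.0]

y = lora_forward(W, A, B, x, alpha=1.0, r=1)
```

Because the correction is added to the output rather than merged into `W`, the base weights stay ternary on disk and in memory.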

Hardware requirements

It probably runs on what you have

Component   Minimum   Recommended
GPU VRAM    0 (CPU)   4 GB
RAM         8 GB      16 GB
Storage     1.3 GB    2 GB
OS          Windows / Linux

Honest limitations

What Orchid isn't

  • MMLU 38.6% — the documented alignment tax from ORPO.
  • Spanish is functional (80% internal) but not state-of-the-art.
  • 4,096-token context, inherited from the BitNet base.
  • Requires the ternative engine — llama.cpp produces wrong output.
  • Identity is strongest with a system prompt.

Cite it

Citation

orchid.bib
@misc{romerochisco2026orchid,
  title  = {Orchid 1.0: A Reproducible Recipe for
            Aligned Ternary-Weight Language Models
            on Consumer Hardware},
  author = {Romero Chisco, Michelangelo},
  year   = {2026},
  doi    = {10.5281/zenodo.20452163},
  publisher = {Zenodo}
}
Read the technical paper (PDF)

Run Orchid in a few minutes.

Friendly desktop app, or full technical instructions for the model and engine.