Apple's on-device Vision framework from the command line — OCR, image classification, barcode detection, and face detection.
No API keys. No cloud. No network. The Vision framework is already on your Mac — auge lets you use it from the terminal.
Every Mac ships with Apple's Vision framework — a powerful on-device computer vision engine for text recognition, image classification, barcode scanning, and face detection. But it's only accessible through Swift/ObjC code. auge wraps it in a UNIX CLI — so you can use it from the terminal, shell scripts, and pipelines.
- OCR — extract text from screenshots, scans, PDFs
- Classification — identify what's in an image (1000+ categories)
- Barcodes — scan QR codes, EAN, Code128, and more
- Face detection — count faces and get bounding boxes
- Pipe-friendly — works with
jq,xargs,apfel, and shell scripts - Zero cost — no API keys, no cloud, no subscriptions, no dependencies
- macOS 10.15+ (classification requires macOS 12+)
- Building from source requires Command Line Tools with Swift 6.3. No Xcode required.
Homebrew (recommended):
brew tap Arthur-Ficial/tap
brew install Arthur-Ficial/tap/augeBuild from source:
git clone https://github.com/Arthur-Ficial/auge.git
cd auge
make install# OCR — extract text from images
auge --ocr screenshot.png
auge --ocr scan.pdf
auge --ocr image1.png image2.png
# Classification — what's in this image?
auge --classify photo.jpg
auge --classify photo.jpg --top 5
# Barcodes — scan QR codes and barcodes
auge --barcode product.jpg
# Face detection — count and locate faces
auge --faces group.jpgauge --ocr screenshot.png -o json | jq .results.lines
auge --classify photo.jpg -o json | jq '.results.classifications[:3]'
auge --faces group.jpg -o json | jq .results.count{
"file" : "screenshot.png",
"metadata" : {
"on_device" : true,
"version" : "0.0.5"
},
"mode" : "ocr",
"results" : {
"lines" : ["Hello", "World"],
"text" : "Hello\nWorld"
}
}# OCR a screenshot, summarize with apfel
auge --ocr screenshot.png | apfel "summarize this"
# OCR all PNGs in a directory
ls *.png | auge --ocr
# Pipe file paths via stdin
find . -name "*.jpg" | auge --classify --top 3
# Chain with jq for structured extraction
auge --ocr receipt.png -o json | jq -r .results.textSee demo/ for real-world shell scripts powered by auge.
screenshot — capture screen and extract text instantly:
demo/screenshot # full screen OCR
demo/screenshot -r # drag to select a region
demo/screenshot -c # copy text to clipboard
demo/screenshot | grep "error" # find errors on screenclipboard-ocr — OCR an image from the clipboard:
# Press Cmd+Ctrl+Shift+4 to screenshot a region to clipboard, then:
demo/clipboard-ocr # print extracted text
demo/clipboard-ocr -c # replace clipboard image with text
demo/clipboard-ocr | apfel "summarize this"describe — describe an image in natural language (auge + apfel):
demo/describe photo.jpg # "A cat sleeping on a blue couch..."
demo/describe screenshot.png -c # describe and copytranslate — OCR text from image and translate (auge + apfel):
demo/translate menu.jpg # translate to English
demo/translate -l German sign.png # translate to German
demo/translate -l Japanese doc.pdfreceipt — extract structured data from receipt photos (auge + apfel):
demo/receipt grocery.jpg # vendor, date, total, items
demo/receipt -j scan.png | jq .totalexplain-image — full image analysis (auge + apfel):
demo/explain-image screenshot.png # classify + OCR + faces + barcodes → explanation
demo/explain-image error.png -c # explain an error dialogAlso in demo/:
- qr — read QR codes and barcodes, optionally open URLs
- sort-images — classify all images in a directory, group by category
- diff-text — OCR two images and diff the extracted text
- faces — count faces across photos with per-file summary
- monitor — watch mode: periodic screen OCR, alert on text changes or pattern match
auge --ocr <image> Extract text from image (OCR)
auge --classify <image> Classify image content
auge --barcode <image> Detect barcodes and QR codes
auge --faces <image> Detect faces
auge --release Show detailed release and build info
| Flag | Description |
|---|---|
-o, --output <fmt> |
Output format: plain, json, md, or ndjson |
--plain / --json / --md / --ndjson |
Shorthand for -o <fmt> |
--compact |
Single-line compact JSON (when output is JSON) |
-q, --quiet |
Suppress non-essential output |
--no-color |
Disable ANSI colors |
--clipboard |
Read image from the macOS clipboard (NSPasteboard) |
--dpi <n> |
PDF rasterization DPI 72-600 (default: 200) |
--prefer-embedded |
Use PDF text layer when present (default) |
--no-prefer-embedded |
Force OCR even on searchable PDFs |
--langs <a,b,c> |
BCP-47 OCR language hints (e.g. en-US,de-DE) |
--enhance |
Upscale tiny images before OCR (helps small text) |
--clean |
FoundationModels post-pass: dehyphenate, reflow, fix OCR errors (macOS 26+) |
--top <n> |
Max classification results (default: 10) |
--min-confidence <n> |
Min confidence threshold 0-1 (default: 0.01) |
-v, --version |
Print version |
--release |
Show detailed version, build, and capability info |
-h, --help |
Show help |
| Code | Meaning |
|---|---|
| 0 | Success (also: no text/results found — not an error) |
| 1 | Runtime error (bad file, invalid image, analysis failure) |
| 2 | Usage error (bad flags, missing arguments) |
| 5 | Vision framework unavailable |
| Variable | Description |
|---|---|
NO_COLOR |
Disable colors (no-color.org) |
| Mode | Framework Request | macOS | Output |
|---|---|---|---|
--ocr |
VNRecognizeTextRequest |
10.15+ | Text lines |
--classify |
VNClassifyImageRequest |
12+ | Labels with confidence |
--barcode |
VNDetectBarcodesRequest |
10.13+ | Payload + symbology |
--faces |
VNDetectFaceRectanglesRequest |
10.13+ | Count + bounding boxes |
PNG, JPEG, TIFF, BMP, GIF, HEIC, PDF
CLI (--ocr/--classify/--barcode/--faces)
│
├─→ ImageSource.validatePath() — file validation (AugeCore)
├─→ Analyzer.recognizeText() — VNRecognizeTextRequest
├─→ Analyzer.classifyImage() — VNClassifyImageRequest
├─→ Analyzer.detectBarcodes() — VNDetectBarcodesRequest
└─→ Analyzer.detectFaces() — VNDetectFaceRectanglesRequest
│
└─→ Vision framework (100% on-device, zero network)
Built with Swift 6.3 strict concurrency. Single Package.swift, three targets:
AugeCore— pure logic library (no Vision dependency, unit-testable)auge— executable (CLI + Vision framework)auge-tests— 115 unit tests, pure Swift runner (no XCTest)
No Xcode required. Builds and tests with Command Line Tools only.
# Build + install (auto-bumps patch version each time)
make install # build release + install to /usr/local/bin
make build # build release only (no install)
# Version management (zero manual editing)
make version # print current version
make release-minor # bump minor: 0.0.x -> 0.1.0
make release-major # bump major: 0.x.y -> 1.0.0
# Debug build (no version bump)
swift build # quick debug build
# Tests
swift run auge-tests # 115 pure Swift unit tests (no XCTest needed)
bash Tests/integration/run.sh # 17 integration tests (end-to-end CLI)Every make build/make install automatically:
- Bumps the patch version (
.versionfile is the single source of truth) - Updates the README version badge
- Generates build metadata (commit, date, Swift version) viewable via
auge --release
| Suite | Tests | Covers |
|---|---|---|
| AugeErrorTests | 10 | Error classification, CLI labels, exit codes, messages |
| AugeErrorDeepTests | 18 | Every keyword variant, case insensitivity, passthrough, cross-type |
| ImageSourceTests | 16 | Extension validation, path validation |
| ImageSourceDeepTests | 11 | Unicode paths, URL correctness, error propagation, exhaustive formats |
| ResultFormatterTests | 15 | OCR, classification, barcode, face formatting + JSON encoding |
| ResultFormatterDeepTests | 23 | Exact formats, boundary values, round-trip JSON, key names, unicode |
| CLIParsingTests | 22 | Edge cases, dotfiles, special chars, stress tests |
| Integration | 17 | CLI basics, exit codes, OCR, classify, faces, barcodes, piping, quiet |
| Total | 132 |
| Tool | What | Apple Framework | Repo |
|---|---|---|---|
| apfel | LLM (text generation) | FoundationModels | Arthur-Ficial/apfel |
| ohr | Speech-to-text | SpeechAnalyzer | Arthur-Ficial/ohr |
| kern | Text embeddings | NLContextualEmbedding | Arthur-Ficial/kern |
| auge | Vision / OCR | Vision | Arthur-Ficial/auge |
Meta-repo: apfel-ecosystem
