vision

Safe Rust bindings for Apple's Vision framework — on-device OCR, object detection, face landmarks, and other computer vision tasks on macOS.

Status: v0.16.0 keeps the full Vision request surface, adds a Tier-1 async_api module for one-shot OCR / face / barcode / segmentation workflows, and ships a fully-implemented COVERAGE.md + COVERAGE_AUDIT.md matrix plus a gold-standard multi-file Swift bridge.

Quick start — OCR

use apple_vision::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let recognizer = TextRecognizer::new()
        .with_recognition_level(RecognitionLevel::Accurate)
        .with_language_correction(true);

    let observations = recognizer.recognize_in_path("screenshot.png")?;
    for obs in &observations {
        println!("[{:.2}] '{}'", obs.confidence, obs.text);
    }
    Ok(())
}

Composes with the rest of the doom-fish stack

screencapturekit-rs / capture ──► IOSurface / PNG ──► vision ──► text
                                                          │
                                                          ▼
                                                  foundation-models
                                                  ("summarise this")

Feature flags

All request-type modules can be enabled independently, and the default feature set still enables the full Vision surface. v0.16.0 also adds an optional async feature for executor-agnostic Future wrappers around the Tier-1 one-shot request surface.

Async API

Enable async plus the request features you need:

apple-vision = { version = "0.16.0", features = ["async", "recognize_text"] }

use apple_vision::async_api::AsyncRecognizeText;
use apple_vision::RecognitionLevel;

# async fn example() -> Result<(), Box<dyn std::error::Error>> {
let texts = AsyncRecognizeText::new(RecognitionLevel::Accurate, true)
    .recognize_in_path("screenshot.png")
    .await?;
println!("found {} text observations", texts.len());
# Ok(())
# }

Tier-1 currently covers background-queue wrappers for OCR, face detection, barcode detection, and person segmentation. Multi-fire delegate / stream-style Vision APIs remain future Tier-2 work.

Roadmap

Single-image Vision requests (OCR, faces, landmarks, pose, contours, saliency, segmentation, Core ML, and the rest of the stateless request surface)
Pairwise image-registration requests (VNTranslationalImageRegistrationRequest, VNHomographicImageRegistrationRequest)
Stateful tracking requests (VNTrackObjectRequest, VNTrackRectangleRequest, VNTrackOpticalFlowRequest, VNTrackTranslationalImageRegistrationRequest, VNTrackHomographicImageRegistrationRequest)
Header-audited request + observation coverage matrix (COVERAGE.md) with dedicated wrappers for every current request/observation type and a split Swift bridge (all bridge files stay under 500 lines)
Explicit VNRequest / VNObservation / request-handler / VNVideoProcessor wrappers for OCR pipelines, plus base request/observation helpers reused across the rest of the crate
Async API (Tier-1 Future wrappers for OCR, face detection, barcode detection, and person segmentation; Tier-2 stream/delegate surfaces still TBD)

License

Licensed under either of Apache-2.0 or MIT at your option.

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
.github/workflows		.github/workflows
examples		examples
src		src
swift-bridge		swift-bridge
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
COVERAGE.md		COVERAGE.md
COVERAGE_AUDIT.md		COVERAGE_AUDIT.md
COVERAGE_AUDIT_V2.md		COVERAGE_AUDIT_V2.md
Cargo.toml		Cargo.toml
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
README.md		README.md
build.rs		build.rs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

vision

Quick start — OCR

Composes with the rest of the doom-fish stack

Feature flags

Async API

Roadmap

License

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

vision

Quick start — OCR

Composes with the rest of the doom-fish stack

Feature flags

Async API

Roadmap

License

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages