Safe Rust bindings for Apple's Vision framework — on-device OCR, object detection, face landmarks, and other computer vision tasks on macOS.
Status: v0.16.0 keeps the full Vision request surface, adds a Tier-1 `async_api` module for one-shot OCR / face / barcode / segmentation workflows, and ships a fully implemented `COVERAGE.md` + `COVERAGE_AUDIT.md` matrix plus a gold-standard multi-file Swift bridge.

```rust
use apple_vision::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let recognizer = TextRecognizer::new()
        .with_recognition_level(RecognitionLevel::Accurate)
        .with_language_correction(true);

    let observations = recognizer.recognize_in_path("screenshot.png")?;
    for obs in &observations {
        println!("[{:.2}] '{}'", obs.confidence, obs.text);
    }
    Ok(())
}
```

```text
screencapturekit-rs / capture ──► IOSurface / PNG ──► vision ──► text
                                                                  │
                                                                  ▼
                                                          foundation-models
                                                          ("summarise this")
```

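
The last hop in the diagram, from recognised text to a summarisation prompt, could look roughly like the sketch below. It does not use this crate's types: `TextObservation` is a hypothetical stand-in that models only the `text` and `confidence` fields printed in the example above, and `observations_to_prompt` and the `0.5` threshold are illustrative assumptions, not part of any API.

```rust
/// Hypothetical stand-in for a text observation; only the two fields used
/// in the example above are modelled here.
#[derive(Debug, Clone)]
struct TextObservation {
    text: String,
    confidence: f32,
}

/// Keep observations at or above `min_confidence`, drop blank strings,
/// and join the rest line by line under a summarisation instruction.
fn observations_to_prompt(observations: &[TextObservation], min_confidence: f32) -> String {
    let body = observations
        .iter()
        .filter(|o| o.confidence >= min_confidence)
        .map(|o| o.text.trim())
        .filter(|t| !t.is_empty())
        .collect::<Vec<_>>()
        .join("\n");
    format!("Summarise this:\n{body}")
}

fn main() {
    let observations = vec![
        TextObservation { text: "Invoice #42".into(), confidence: 0.98 },
        TextObservation { text: "smudge".into(), confidence: 0.12 },
        TextObservation { text: "Total: 19.99".into(), confidence: 0.91 },
    ];
    // The low-confidence "smudge" line is filtered out before prompting.
    println!("{}", observations_to_prompt(&observations, 0.5));
}
```

Filtering before prompting keeps low-confidence OCR noise out of the downstream model; the right threshold depends on the input images.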
All request-type modules can be enabled independently, and the default feature set still enables the full Vision surface. v0.16.0 also adds an optional `async` feature for executor-agnostic `Future` wrappers around the Tier-1 one-shot request surface.
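
As a sketch of enabling a single request module, the fragment below turns off the default feature set and opts back in to OCR only. It assumes `recognize_text` has no further required features; `default-features = false` is standard Cargo syntax rather than anything specific to this crate.

```toml
[dependencies]
# Assumption: recognize_text is sufficient on its own for the OCR wrappers.
apple-vision = { version = "0.16.0", default-features = false, features = ["recognize_text"] }
```
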
Enable `async` plus the request features you need:

```toml
apple-vision = { version = "0.16.0", features = ["async", "recognize_text"] }
```

```rust
use apple_vision::async_api::AsyncRecognizeText;
use apple_vision::RecognitionLevel;

# async fn example() -> Result<(), Box<dyn std::error::Error>> {
let texts = AsyncRecognizeText::new(RecognitionLevel::Accurate, true)
    .recognize_in_path("screenshot.png")
    .await?;
println!("found {} text observations", texts.len());
# Ok(())
# }
```

Tier-1 currently covers background-queue wrappers for OCR, face detection, barcode detection, and person segmentation. Multi-fire delegate / stream-style Vision APIs remain future Tier-2 work.

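
To illustrate what "executor-agnostic" can mean in practice, here is a minimal, std-only sketch of a one-shot future completed from a background thread and driven by a tiny hand-rolled `block_on`. It does not show this crate's internals: `OneShot`, `Shared`, `ThreadWaker`, and `block_on` are hypothetical names, and the worker thread merely stands in for a background-queue completion callback.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// State shared between the worker thread and the future (hypothetical).
struct Shared<T> {
    value: Option<T>,
    waker: Option<Waker>,
}

/// A future that resolves once the background work has stored its result.
struct OneShot<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

impl<T: Send + 'static> OneShot<T> {
    fn spawn(work: impl FnOnce() -> T + Send + 'static) -> Self {
        let shared = Arc::new(Mutex::new(Shared { value: None, waker: None }));
        let worker = Arc::clone(&shared);
        thread::spawn(move || {
            let result = work();
            let mut guard = worker.lock().unwrap();
            guard.value = Some(result);
            // Notify whichever executor last polled the future, if any.
            if let Some(waker) = guard.waker.take() {
                waker.wake();
            }
        });
        OneShot { shared }
    }
}

impl<T> Future for OneShot<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut guard = self.shared.lock().unwrap();
        match guard.value.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Remember the current waker so the worker can wake us later.
                guard.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor: poll, park until woken, repeat.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    // The closure stands in for a one-shot request running off-thread.
    let texts = block_on(OneShot::spawn(|| vec!["hello".to_string(), "world".to_string()]));
    println!("found {} text observations", texts.len());
}
```

Because the future relies only on the `Waker` handed to `poll`, the same value could be awaited from any runtime instead of this `block_on`.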
- Single-image Vision requests (OCR, faces, landmarks, pose, contours, saliency, segmentation, Core ML, and the rest of the stateless request surface)
- Pairwise image-registration requests (`VNTranslationalImageRegistrationRequest`, `VNHomographicImageRegistrationRequest`)
- Stateful tracking requests (`VNTrackObjectRequest`, `VNTrackRectangleRequest`, `VNTrackOpticalFlowRequest`, `VNTrackTranslationalImageRegistrationRequest`, `VNTrackHomographicImageRegistrationRequest`)
- Header-audited request + observation coverage matrix (`COVERAGE.md`) with dedicated wrappers for every current request/observation type and a split Swift bridge (all bridge files stay under 500 lines)
- Explicit `VNRequest` / `VNObservation` / request-handler / `VNVideoProcessor` wrappers for OCR pipelines, plus base request/observation helpers reused across the rest of the crate
- Async API (Tier-1 `Future` wrappers for OCR, face detection, barcode detection, and person segmentation; Tier-2 stream/delegate surfaces still TBD)

Licensed under either of Apache-2.0 or MIT at your option.
