<!--
{
  "availability" : [

  ],
  "documentType" : "symbol",
  "framework" : "Vision",
  "identifier" : "/documentation/Vision",
  "metadataVersion" : "0.1.0",
  "role" : "Framework",
  "symbol" : {
    "kind" : "Framework",
    "modules" : [
      "Vision"
    ],
    "preciseIdentifier" : "Vision"
  },
  "title" : "Vision"
}
-->

# Vision

Analyze image and video content in your app using computer vision algorithms for object detection, text recognition, and image segmentation.

## Overview

The Vision framework provides pretrained machine learning models for computer vision tasks. Use Vision to analyze still images and video for a variety of purposes, including:

- Recognizing text in 26 languages across everyday objects, documents, and photos
- Detecting barcodes and QR codes
- Detecting faces and analyzing facial features
- Isolating people and foreground objects with subject lifting
- Tracking body poses of people and animals for action and gesture recognition
- Classifying images for categorization and search
- Measuring image quality and comparing visual similarity

![A dog isolated from its background through subject lifting.](images/Vision/vision-framework-subject-lifting~dark@2x.png)

All Vision analysis tasks follow the same steps: create a request, perform it on an image or video frame, and read the resulting observations. For example, to recognize text in an image, create a [`RecognizeTextRequest`](/documentation/Vision/RecognizeTextRequest), perform it on the image data, and iterate over the observations it returns. Each request type conforms to the [`VisionRequest`](/documentation/Vision/VisionRequest) protocol.

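```swift
import Vision

// Recognize the text in an image and return each observation's transcript.
func scanText(in imageData: Data) async throws -> [String] {
    // Create a request for the type of analysis to perform.
    let request = RecognizeTextRequest()

    // Perform the request on the image data to get an array of observations.
    let observations = try await request.perform(on: imageData)

    // Store observations for use in your app.
    var scannedText: [String] = []
    for observation in observations {
        scannedText.append(observation.transcript)
    }
    return scannedText
}
```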

The request returns an array of observation objects that contain the image-analysis results. Each observation type provides specific details about the analysis results, such as recognized text, confidence scores, and bounding box locations.
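Observations also carry a confidence score and, for location-based results, a bounding box. The following is a minimal sketch that keeps only the text regions whose best reading meets a confidence threshold. The function name `confidentText(in:minimumConfidence:)` and the `0.5` default threshold are illustrative choices, not Vision API. The sketch assumes that `RecognizedTextObservation` exposes `topCandidates(_:)` and a normalized `boundingBox`, as its reference documentation describes.

```swift
import Vision

// Hypothetical helper (not a Vision API): returns the best reading of each
// text region whose confidence meets `minimumConfidence`, paired with the
// region's normalized bounding box (origin at the lower-left of the image).
@available(iOS 18.0, macOS 15.0, *)
func confidentText(
    in imageData: Data,
    minimumConfidence: Float = 0.5
) async throws -> [(text: String, box: NormalizedRect)] {
    let observations = try await RecognizeTextRequest().perform(on: imageData)
    return observations.compactMap { observation in
        // topCandidates(1) yields at most one candidate: the highest-ranked reading.
        guard let best = observation.topCandidates(1).first,
              best.confidence >= minimumConfidence else {
            return nil
        }
        return (text: best.string, box: observation.boundingBox)
    }
}
```

A higher threshold drops more uncertain readings, but it can also discard legitimate text that is small or low in contrast.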

For observations that describe image locations, such as face bounding boxes or text regions, Vision uses a normalized coordinate system where values range from `0.0` to `1.0`, with the origin at the lower-left corner of the image. For more information on coordinate types and conversion helpers, see [Image locations and regions](https://developer.apple.com/documentation/vision#Image-locations-and-regions).
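Vision's coordinate types provide their own conversion helpers. To show the arithmetic behind those conversions, the following sketch defines a hypothetical helper, `pixelRect(fromNormalized:imageSize:)`, which is not part of Vision. It maps a normalized rectangle with a lower-left origin to pixel coordinates with an upper-left origin, which is the convention UIKit and SwiftUI use. The helper operates on a plain `CGRect` that holds normalized values.

```swift
import Foundation

/// Hypothetical helper (not a Vision API): converts a rectangle in a
/// normalized, lower-left-origin space to pixel coordinates with an
/// upper-left origin.
func pixelRect(fromNormalized rect: CGRect, imageSize: CGSize) -> CGRect {
    // Scale the normalized values (0.0...1.0) to the image's pixel dimensions.
    let width = rect.size.width * imageSize.width
    let height = rect.size.height * imageSize.height
    let x = rect.origin.x * imageSize.width

    // Flip the y-axis. With a lower-left origin, the rectangle's top edge is at
    // origin.y + height. Measured from the top edge of the image instead, that
    // edge is at 1.0 - (origin.y + height).
    let normalizedTop = rect.origin.y + rect.size.height
    let y = (1.0 - normalizedTop) * imageSize.height

    return CGRect(x: x, y: y, width: width, height: height)
}

// A box that covers the lower-left quarter of a 400 x 200 image.
let box = pixelRect(fromNormalized: CGRect(x: 0, y: 0, width: 0.5, height: 0.5),
                    imageSize: CGSize(width: 400, height: 200))
// box is x: 0, y: 100, width: 200, height: 100
```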

You can also perform multiple requests on the same image. For more information, see [`ImageRequestHandler`](/documentation/Vision/ImageRequestHandler) in the Request handlers section.
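For example, the following minimal sketch runs face detection and text recognition on a single image through one handler. It assumes that `ImageRequestHandler` provides an initializer that takes `Data`, and a variadic `perform(_:)` overload that returns one result per request, in order. Check the `ImageRequestHandler` reference for the exact signatures. The function name `analyzeFacesAndText(in:)` is illustrative.

```swift
import Vision

// Hypothetical helper: runs two requests on one image through a single handler,
// so the image is loaded once and shared across both requests.
@available(iOS 18.0, macOS 15.0, *)
func analyzeFacesAndText(in imageData: Data) async throws -> (faceCount: Int, lines: [String]) {
    let handler = ImageRequestHandler(imageData)

    // Assumes the variadic perform(_:) overload, which returns a tuple of
    // results in the same order as the requests.
    let (faces, textObservations) = try await handler.perform(
        DetectFaceRectanglesRequest(),
        RecognizeTextRequest()
    )

    // Keep the highest-ranked reading of each text region.
    let lines = textObservations.compactMap { $0.topCandidates(1).first?.string }
    return (faces.count, lines)
}
```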

This pattern applies to all Vision requests, whether you’re detecting faces, tracking motion, analyzing image quality, or performing custom analysis with Core ML models. Each request type returns observations specific to its analysis task.

> Note:
> Starting in iOS 18.0, the Vision framework provides a new Swift-only API. To view the original API, see [Original Objective-C and Swift API](/documentation/Vision/original-objective-c-and-swift-api).

## Topics

### Text and document analysis

[Locating and displaying recognized text](/documentation/Vision/locating-and-displaying-recognized-text)

Perform text recognition on a photo using the Vision framework’s text-recognition request.

[Recognizing tables within a document](/documentation/Vision/recognize-tables-within-a-document)

Scan a document that contains a table and extract its content in a formatted way.

[`DetectBarcodesRequest`](/documentation/Vision/DetectBarcodesRequest)

A request that detects barcodes in an image.

[`DetectDocumentSegmentationRequest`](/documentation/Vision/DetectDocumentSegmentationRequest)

A request that detects rectangular regions that contain text in the input image.

[`DetectTextRectanglesRequest`](/documentation/Vision/DetectTextRectanglesRequest)

An image-analysis request that finds regions of visible text in an image.

[`RecognizeDocumentsRequest`](/documentation/Vision/RecognizeDocumentsRequest)

An image-analysis request to scan an image of a document and provide information about its structure.

[`RecognizeTextRequest`](/documentation/Vision/RecognizeTextRequest)

An image-analysis request that recognizes text in an image.

### Facial analysis

[Analyzing a selfie and visualizing its content](/documentation/Vision/analyzing-a-selfie-and-visualizing-its-content)

Calculate face-capture quality and visualize facial features for a collection of images using the Vision framework.

[`DetectFaceCaptureQualityRequest`](/documentation/Vision/DetectFaceCaptureQualityRequest)

A request that produces a floating-point number that represents the capture quality of a face in a photo.

[`DetectFaceLandmarksRequest`](/documentation/Vision/DetectFaceLandmarksRequest)

An image-analysis request that finds facial features like eyes and mouth in an image.

[`DetectFaceRectanglesRequest`](/documentation/Vision/DetectFaceRectanglesRequest)

A request that finds faces within an image.

### Image segmentation and subject lifting

[Segmenting objects using taps, scribbles or rectangles](/documentation/Vision/segmenting-objects-using-taps-scribbles-or-rectangles)

Select objects or regions in a photo using taps, scribbles, or rectangle selection, and generate a segmentation mask using the iterative segmentation API.

[`GenerateForegroundInstanceMaskRequest`](/documentation/Vision/GenerateForegroundInstanceMaskRequest)

A request that generates an instance mask of noticeable objects to separate from the background.

[`GeneratePersonInstanceMaskRequest`](/documentation/Vision/GeneratePersonInstanceMaskRequest)

A request that produces a mask of individual people it finds in the input image.

[`GeneratePersonSegmentationRequest`](/documentation/Vision/GeneratePersonSegmentationRequest)

A request that produces a matte image for a person it finds in the input image.

### Pose analysis

[`DetectAnimalBodyPoseRequest`](/documentation/Vision/DetectAnimalBodyPoseRequest)

A request that detects an animal body pose.

[`DetectHumanBodyPose3DRequest`](/documentation/Vision/DetectHumanBodyPose3DRequest)

A request that detects points on human bodies in 3D space, relative to the camera.

[`DetectHumanBodyPoseRequest`](/documentation/Vision/DetectHumanBodyPoseRequest)

A request that detects a human body pose.

[`DetectHumanHandPoseRequest`](/documentation/Vision/DetectHumanHandPoseRequest)

A request that detects a human hand pose.

[Supporting Pose Types](/documentation/Vision/supporting-pose-types)

Types you use when working with pose analysis.

### Image classification and recognition

[Classifying images for categorization and search](/documentation/Vision/classifying-images-for-categorization-and-search)

Analyze and label images using a Vision classification request.

[`ClassifyImageRequest`](/documentation/Vision/ClassifyImageRequest)

A request to classify an image.

[`DetectHumanRectanglesRequest`](/documentation/Vision/DetectHumanRectanglesRequest)

A request that finds rectangular regions that contain people in an image.

[`RecognizeAnimalsRequest`](/documentation/Vision/RecognizeAnimalsRequest)

A request that recognizes animals in an image.

### Shape and edge detection

[`DetectContoursRequest`](/documentation/Vision/DetectContoursRequest)

A request that detects the contours of the edges of an image.

[`DetectHorizonRequest`](/documentation/Vision/DetectHorizonRequest)

An image-analysis request that determines the horizon angle in an image.

[`DetectRectanglesRequest`](/documentation/Vision/DetectRectanglesRequest)

An image-analysis request that finds projected rectangular regions in an image.

### Image quality and saliency analysis

[Implementing saliency-based image cropping in iOS and watchOS](/documentation/Vision/implementing-saliency-based-image-cropping-in-iOS-and-watchOS)

Crop an image to the regions most likely to draw people's attention in your iOS or watchOS app.

[Generating high-quality thumbnails from videos](/documentation/Vision/generating-thumbnails-from-videos)

Identify the most visually pleasing frames in a video by using the image-aesthetics scores request.

[`CalculateImageAestheticsScoresRequest`](/documentation/Vision/CalculateImageAestheticsScoresRequest)

A request that analyzes an image for aesthetically pleasing attributes.

[`DetectLensSmudgeRequest`](/documentation/Vision/DetectLensSmudgeRequest)

A request that detects a smudge on a lens from an image or video frame capture.

[`GenerateAttentionBasedSaliencyImageRequest`](/documentation/Vision/GenerateAttentionBasedSaliencyImageRequest)

A request that produces a heat map that identifies the parts of an image most likely to draw attention.

[`GenerateObjectnessBasedSaliencyImageRequest`](/documentation/Vision/GenerateObjectnessBasedSaliencyImageRequest)

A request that generates a heat map that identifies the parts of an image most likely to represent objects.

### Motion and object tracking

[`DetectTrajectoriesRequest`](/documentation/Vision/DetectTrajectoriesRequest)

A request that detects the trajectories of shapes moving along a parabolic path.

[`TrackObjectRequest`](/documentation/Vision/TrackObjectRequest)

An image-analysis request that tracks the movement of a previously identified object across multiple images or video frames.

[`TrackOpticalFlowRequest`](/documentation/Vision/TrackOpticalFlowRequest)

A request that computes a motion vector for each pixel, describing how image content moved from a previous image to the current image.

[`TrackRectangleRequest`](/documentation/Vision/TrackRectangleRequest)

An image-analysis request that tracks movement of a previously identified rectangular object across multiple images or video frames.

### Image registration and comparison

[`GenerateImageFeaturePrintRequest`](/documentation/Vision/GenerateImageFeaturePrintRequest)

An image-based request to generate feature prints from an image.

[`TrackHomographicImageRegistrationRequest`](/documentation/Vision/TrackHomographicImageRegistrationRequest)

An image-analysis request that you track over time to determine the perspective warp matrix necessary to align the content of two images.

[`TrackTranslationalImageRegistrationRequest`](/documentation/Vision/TrackTranslationalImageRegistrationRequest)

An image-analysis request that you track over time to determine the affine transform necessary to align the content of two images.

### Custom Core ML integration

[`CoreMLRequest`](/documentation/Vision/CoreMLRequest)

An image-analysis request that uses a Core ML model to process images.

### Protocols

[`ImageProcessingRequest`](/documentation/Vision/ImageProcessingRequest)

A type for image-analysis requests that focus on a specific part of an image.

[`PoseProviding`](/documentation/Vision/PoseProviding)

An observation that provides a collection of joints that make up a pose.

[`StatefulRequest`](/documentation/Vision/StatefulRequest)

The protocol for a type that builds evidence of a condition over time.

[`TargetedRequest`](/documentation/Vision/TargetedRequest)

A type for analyzing two images together.

[`VisionObservation`](/documentation/Vision/VisionObservation)

A type for objects produced by image-analysis requests.

[`VisionRequest`](/documentation/Vision/VisionRequest)

A type for image-analysis requests.

### Request handlers

[`ImageRequestHandler`](/documentation/Vision/ImageRequestHandler)

An object that processes one or more image-analysis requests pertaining to a single image.

[`TargetedImageRequestHandler`](/documentation/Vision/TargetedImageRequestHandler)

An object that performs image-analysis requests on two images.

[`VideoProcessor`](/documentation/Vision/VideoProcessor)

An object that performs offline analysis of video content.

### Image locations and regions

[`NormalizedPoint`](/documentation/Vision/NormalizedPoint)

A point in a 2D coordinate system.

[`NormalizedRect`](/documentation/Vision/NormalizedRect)

The location and dimensions of a rectangle.

[`NormalizedRegion`](/documentation/Vision/NormalizedRegion)

A polygon composed of normalized points.

[`NormalizedCircle`](/documentation/Vision/NormalizedCircle)

The center point and radius of a 2D circle.

[`BoundingBoxProviding`](/documentation/Vision/BoundingBoxProviding)

A protocol for objects that have a bounding box.

[`BoundingRegionProviding`](/documentation/Vision/BoundingRegionProviding)

A protocol for objects that have a defined boundary in an image.

[`QuadrilateralProviding`](/documentation/Vision/QuadrilateralProviding)

A protocol for objects that have a bounding quadrilateral.

[`CoordinateOrigin`](/documentation/Vision/CoordinateOrigin)

The origin of a coordinate system relative to an image.

### Errors

[`VisionError`](/documentation/Vision/VisionError)

The errors that the framework produces.

### Legacy API

[Original Objective-C and Swift API](/documentation/Vision/original-objective-c-and-swift-api)



---

Copyright &copy; 2026 Apple Inc. All rights reserved. | [Terms of Use](https://www.apple.com/legal/internet-services/terms/site.html) | [Privacy Policy](https://www.apple.com/privacy/privacy-policy)
