WWDC26 · 16 min · App Services / AI & Machine Learning

What’s new in image understanding

Build image understanding into your app with the latest Vision framework and Foundation Models framework updates. The new tap-to-segment request generates a segmentation mask from a seed point and refines it as you add points, and Vision now supports watchOS. Combine the new image support in the Apple Foundation Model with OCR, barcode scanning, and your own tools to deliver LLM-powered visual understanding in your app.

Watch at developer.apple.com ↗

Chapters

0:00 — Introduction
1:36 — Segment images with tap-to-segment
5:50 — Image inputs for Foundation Models
7:57 — Image-based tool calling
13:09 — Vision on watchOS
14:39 — Next steps

Code shown on screen · 5 snippets

Segment images (tap-to-segment) swift · at 4:15 ↗

// Generate a segmentation mask of an object with a seed point
let handler = ImageRequestHandler(image)
let request = GenerateIterativeSegmentationRequest(seed: point)
let observation = try await handler.perform(request)
let mask = observation?.pixelBuffer

// Refine the mask with a new point
request.addIncludedPoint(newPoint)
let refinedObservation = try await handler.perform(request)
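
The snippet above does not show where `point` comes from. As a minimal sketch, the helper below converts a tap in a view into a normalized point. It assumes the seed is expressed in normalized coordinates (0 to 1) with a lower-left origin, as Vision commonly uses, and that the image fills the view exactly; neither is confirmed by the session. The name `normalizedSeedPoint(forTap:in:)` is hypothetical.

```swift
import Foundation

/// Converts a tap location in a view (points, top-left origin) into a
/// normalized point (0...1 on each axis, lower-left origin).
/// Assumption: the image fills the view exactly, with no letterboxing.
/// Returns nil when the view has no area or the tap falls outside it.
func normalizedSeedPoint(forTap tap: CGPoint, in viewSize: CGSize) -> CGPoint? {
    guard viewSize.width > 0, viewSize.height > 0 else { return nil }

    let x = tap.x / viewSize.width
    // Flip the y axis: view coordinates grow downward,
    // the assumed normalized space grows upward.
    let y = 1.0 - tap.y / viewSize.height

    guard (0...1).contains(x), (0...1).contains(y) else { return nil }
    return CGPoint(x: x, y: y)
}
```

If the image is aspect-fitted inside the view, the tap would first need to be mapped into the displayed image rectangle before normalizing.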

Generate an image caption with Foundation Models swift · at 6:41 ↗

// Generate an image caption with Foundation Models
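import FoundationModels

// Assumed setup (not shown on screen): a session using the default system model
let session = LanguageModelSession()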

let prompt = Prompt {
    "Generate a caption for this image"
    Attachment(image)
}
let response = try await session.respond(to: prompt)
let caption = response.content

Create an image-based tool swift · at 9:55 ↗

// Create an image-based tool
struct PlantIdentifierTool: Tool {
    @SessionProperty(\.history) var history

    @Generable
    struct Arguments {
        var image: ImageReference
    }

    func call(arguments: Arguments) async throws -> String {
        let imageReference = arguments.image
        let transcript = Transcript(history)
        guard let imageAttachment = imageReference.resolve(in: transcript) else {
            throw AppError.imageNotFound
        }
        let image = try imageAttachment.pixelBuffer()
        return classifyPlant(image)
    }
}

Use Vision tools swift · at 12:09 ↗

// Use Vision tools
import FoundationModels
import Vision

let session = LanguageModelSession(model: model, tools: [BarcodeReaderTool()])
let response = try await session.respond(generating: EventInfo.self) {
    "Get the date, location, and website from this flyer"
    Attachment(image)
        .label("flyer")
}
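
The snippet above generates an `EventInfo` value, but the type's definition is not listed. One possible shape, sketched here as an assumption, is a `@Generable` struct whose fields mirror the three items the prompt asks for. The property names and guide descriptions are hypothetical.

```swift
import FoundationModels

// Hypothetical output type: the session listing does not include
// the definition of EventInfo, so this shape is an assumption.
@Generable
struct EventInfo {
    @Guide(description: "The date of the event as printed on the flyer")
    var date: String

    @Guide(description: "The venue or address where the event takes place")
    var location: String

    @Guide(description: "The event website URL, for example one decoded from a barcode")
    var website: String
}
```

With a type like this, `response.content` would be an `EventInfo`, so the fields can be read directly, for example `response.content.website`.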

Create a crop that highlights a prominent subject (watchOS / saliency) swift · at 13:54 ↗

// Create a crop that highlights a prominent subject
func generateImageCrop(in image: CGImage) async throws -> NormalizedRect? {
    let request = GenerateObjectnessBasedSaliencyImageRequest()
    let observation = try await request.perform(on: image)
    let prominentObjects = observation.salientObjects
    return prominentObjects.first
}
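
To apply the returned region as an actual crop, the normalized rectangle has to be mapped to pixel coordinates. The sketch below does that arithmetic by hand, assuming the rectangle is normalized (0 to 1) with a lower-left origin and that the crop API expects pixels with an upper-left origin, as `CGImage.cropping(to:)` does. The function name `pixelCropRect(for:imageWidth:imageHeight:)` is hypothetical.

```swift
import Foundation

/// Maps a normalized rect (0...1, lower-left origin) to a pixel rect with
/// an upper-left origin, suitable for CGImage.cropping(to:).
/// The result is rounded outward to whole pixels and clamped to the image.
func pixelCropRect(for normalized: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    let width = CGFloat(imageWidth)
    let height = CGFloat(imageHeight)

    let rect = CGRect(
        x: normalized.minX * width,
        // Flip the y axis: the top edge in upper-left space is
        // measured from the rect's maxY in lower-left space.
        y: (1 - normalized.maxY) * height,
        width: normalized.width * width,
        height: normalized.height * height
    )

    let imageBounds = CGRect(x: 0, y: 0, width: width, height: height)
    return rect.integral.intersection(imageBounds)
}
```

A caller might pass the rectangle returned by `generateImageCrop(in:)`, converted to a `CGRect`, together with `image.width` and `image.height`, and hand the result to `image.cropping(to:)`.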

Resources

Documentation Segmenting objects using taps, scribbles or rectangles
Sample code Implementing saliency-based image cropping in iOS and watchOS

Related sessions

Deep dive into the Foundation Models framework (WWDC25 · 26 min · 16 snippets)
Discover Swift enhancements in the Vision framework (WWDC24 · 17 min)
What’s new in the Foundation Models framework (WWDC26 · 21 min · 7 snippets)
