Dunfey · Hotel WWDC as data, est. 1983
Front desk everything
Years
Topics

2026 App ServicesAI & Machine Learning

WWDC26 · 16 min · App Services / AI & Machine Learning

What’s new in image understanding

Unlock powerful image understanding with the latest Vision framework and Foundation Models framework updates. The new tap-to-segment request lets you segment images in new ways, and Vision now supports watchOS. Combine the new image support in Apple Foundation Model together with OCR, barcode scanning and your own tools to deliver LLM-powered visual understanding in your app.

Watch at developer.apple.com ↗

Transcript all transcripts

Chapters

  • 0:00 — Introduction
  • 1:36 — Segment images with tap-to-segment
  • 5:50 — Image inputs for Foundation Models
  • 7:57 — Image-based tool calling
  • 13:09 — Vision on watchOS
  • 14:39 — Next steps

Code shown on screen · 5 snippets

Segment images (tap-to-segment) swift · at 4:15 ↗
// Generate a segmentation mask of an object with a seed point
let handler = ImageRequestHandler(image)
let request = GenerateIterativeSegmentationRequest(seed: point)
let observation = try await handler.perform(request)
let mask = observation?.pixelBuffer

// Refine the mask with a new point
request.addIncludedPoint(newPoint)
let refinedObservation = try await handler.perform(request)
Generate an image caption with Foundation Models swift · at 6:41 ↗
// Generate an image caption with Foundation Models
import FoundationModels

let prompt = Prompt {
    "Generate a caption for this image"
    Attachment(image)
}
let response = try await session.respond(to: prompt)
let caption = response.content
Create an image-based tool swift · at 9:55 ↗
// Create an image-based tool
struct PlantIdentifierTool: Tool {
    @SessionProperty(\.history) var history

    @Generable
    struct Arguments {
        var image: ImageReference
    }

    func call(arguments: Arguments) async throws -> String {
        let imageReference = arguments.image
        let transcript = Transcript(history)
        guard let imageAttachment = imageReference.resolve(in: transcript) else {
            throw AppError.imageNotFound
        }
        let image = try imageAttachment.pixelBuffer()
        return classifyPlant(image)
    }
}
Use Vision tools swift · at 12:09 ↗
// Use Vision tools
import FoundationModels
import Vision

let session = LanguageModelSession(model: model, tools: [BarcodeReaderTool()])
let response = try await session.respond(generating: EventInfo.self) {
    "Get the date, location, and website from this flyer"
    Attachment(image)
        .label("flyer")
}
Create a crop that highlights a prominent subject (watchOS / saliency) swift · at 13:54 ↗
// Create a crop that highlights a prominent subject
func generateImageCrop(in image: CGImage) async throws -> NormalizedRect? {
    let request = GenerateObjectnessBasedSaliencyImageRequest()
    let observation = try await request.perform(on: image)
    let prominentObjects = observation.salientObjects
    return prominentObjects.first
}

Resources