Blog · Analysis · Last reviewed June 16, 2026

The Alt-Text Model Becomes the Access Clerk

AI-generated image descriptions can widen access. They can also become the hidden clerk deciding what a blind or low-vision person is allowed to know about the visual web.

The Access Clerk

An image without a usable description is not silent for everyone. It is selectively silent. A sighted user sees the chart, warning label, map, meme, family photo, product defect, ballot sample, protest sign, or evacuation route. A screen reader user may hear only "image," a file name, or an automatic guess.

The alt-text model enters at that threshold. It looks at a picture and writes a sentence for someone else to hear. That can be generous. It can make an unlabeled web more navigable, a photo archive more searchable, and a document less hostile. But the model is also an access clerk. It decides which visual facts cross the boundary into language and which disappear.

What the Standard Requires

As of June 16, 2026, the accessibility baseline is not vague. W3C's WCAG 2.2 guidance for Success Criterion 1.1.1 explains that non-text content such as pictures, charts, diagrams, animations, and controls needs text alternatives, with decorative content marked so assistive technologies can ignore it. Section508.gov describes alternative text as text that conveys the meaning of an image for people with vision disabilities and says that without alt text, screen reader users cannot access image-provided content.

The legal pressure is also increasing. The Department of Justice's Title II web and mobile accessibility fact sheet says state and local government web content and mobile apps must use WCAG 2.1, Level AA as the technical standard. The same fact sheet notes that an April 2026 interim final rule extended compliance dates to April 26, 2027 for public entities with populations of 50,000 or more, and April 26, 2028 for smaller public entities and special district governments.

Those rules do not say "generate any caption." They point toward meaningful access. A caption that names objects but misses the point can satisfy the shape of accessibility while failing the person.

What the Model Adds

The model layer is already ordinary. Microsoft says Microsoft 365 can generate alt text automatically or on demand. Its current FAQ says Copilot+ PCs generate alt text locally at image insertion, while other devices send the image to a cloud model when a user requests it. Microsoft also says AI-generated descriptions are labeled as possibly inaccurate and encourages users to review and revise them, especially for sensitive or specialized content.

Apple's VoiceOver Recognition can describe images in apps and on the web using on-device intelligence, and its support page warns that VoiceOver Recognition should not be relied on where harm or injury could result, in high-risk situations, for navigation, or for medical diagnosis or treatment. Be My Eyes describes Be My AI as an AI-powered visual description feature for blind and low-vision users, with human support available for tougher requests. Meta's Automatic Alt Text research page says Facebook deployed a computer-vision system to generate photo alt text for screen reader users.

These are real accessibility tools. The point is not to reject them. The point is to govern the difference between assistance and substitution.

Where Context Breaks

Alt text is not a generic photograph caption. It is a contextual act. The same image can need different descriptions in a museum catalog, a medical training slide, a school newsletter, a police evidence file, a product recall, or a public-health warning.

A model may say "bar chart" when the important fact is that cases doubled after a policy change. It may say "crowd outdoors" when the important fact is that police are blocking an exit. It may describe a medication box but omit dosage warnings because they are small. It may read a meme literally and miss the joke. It may treat a decorative flourish as content or a content-bearing watermark as decoration.

Section508.gov names this failure mode directly: a common mistake is alt text that is a computer-generated visual description but does not describe the relevant content of the image. That is the whole governance problem in one sentence. Vision recognition can identify things. Accessibility needs purpose.

Governance Standard

A serious AI alt-text workflow should start with the author, not the model. The author knows why the image is there.

First, require human review for institutional content. Public services, education, health care, employment, voting, benefits, safety instructions, financial records, legal notices, and emergency information should not publish machine alt text without review by someone responsible for the page.

Second, preserve the distinction between decorative and meaningful images. A model that fills every empty alt field can create noise. Decorative content should be marked correctly so the user is not forced to hear irrelevant visual clutter.

Third, disclose generated descriptions in editing tools. Editors should know whether a description was human-written, machine-generated, or machine-drafted and human-approved. Users should have a way to report bad descriptions.

Fourth, test with affected users. Evaluation should involve blind and low-vision users directly, and it should cover screen reader workflows, refreshable braille, low-bandwidth modes, and multilingual pages, across charts, screenshots, scanned documents, maps, memes, and forms.

Fifth, set privacy limits. If an image is sent to a cloud model to generate alt text, the editor should know what is transmitted, retained, and used for training. Private documents, faces, medical images, student records, and legal material need stricter handling.

Sixth, audit the record. Important documents should store the source image, alt text, approval state, date, tool, and responsible editor. Accessibility is maintenance, not a one-time caption.
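
The record and review rules above can be made concrete in a few lines. What follows is a minimal sketch in Python, not a reference implementation: the names AltTextRecord, Provenance, and publication_problems are hypothetical, the file path and editor name are invented for illustration, and the rule that a decorative image carries empty alt text is an assumption borrowed from common HTML practice. It covers only the first, second, third, and sixth points; user testing and privacy limits do not reduce to a check like this.

```python
from dataclasses import dataclass, replace
from datetime import date
from enum import Enum
from typing import List, Optional


class Provenance(Enum):
    """How the description came to exist (hypothetical labels)."""
    HUMAN_WRITTEN = "human-written"
    MACHINE_GENERATED = "machine-generated"
    MACHINE_DRAFTED_HUMAN_APPROVED = "machine-drafted, human-approved"


@dataclass(frozen=True)
class AltTextRecord:
    """One audit entry per image: source, text, approval state, date, tool, editor."""
    source_image: str                   # path or identifier of the source image
    alt_text: str                       # assumed empty when the image is decorative
    decorative: bool
    provenance: Provenance
    tool: Optional[str]                 # generating tool, None if human-written
    responsible_editor: Optional[str]   # None until someone takes responsibility
    recorded_on: date


def publication_problems(record: AltTextRecord, institutional: bool) -> List[str]:
    """Return the reasons this record should not be published yet."""
    problems = []
    has_text = bool(record.alt_text.strip())
    if record.decorative and has_text:
        problems.append("decorative image should carry empty alt text")
    if not record.decorative and not has_text:
        problems.append("meaningful image has no description")
    if institutional and record.provenance is Provenance.MACHINE_GENERATED:
        problems.append("machine-generated text needs human review")
    if institutional and record.responsible_editor is None:
        problems.append("no responsible editor recorded")
    return problems


# Hypothetical usage: a machine draft on an institutional page.
draft = AltTextRecord(
    source_image="charts/cases.png",
    alt_text="Bar chart.",
    decorative=False,
    provenance=Provenance.MACHINE_GENERATED,
    tool="example-captioner",
    responsible_editor=None,
    recorded_on=date(2026, 6, 16),
)
print(publication_problems(draft, institutional=True))

# After an editor rewrites the text for the page's purpose and signs off.
approved = replace(
    draft,
    alt_text="Bar chart: cases doubled after the policy change.",
    provenance=Provenance.MACHINE_DRAFTED_HUMAN_APPROVED,
    responsible_editor="j.doe",
)
print(publication_problems(approved, institutional=True))
```

The design choice worth noting is that the check returns reasons rather than a yes or no. An editor who is told why a description is blocked can fix it; a silent pass or fail would recreate the hidden clerk inside the publishing tool.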

What This Changes

The visual internet has always been edited for access. Someone chose the image, cropped it, captioned it, omitted it, or treated it as decoration. AI does not remove that editorial layer. It hides it behind a sentence that sounds finished.

The Spiralist reading is simple: an alt-text model is a small interface with institutional consequences. It can make a page more open, but it can also make exclusion harder to notice. The person who cannot see the image may be least able to know whether the description is missing the decisive fact.

Good alt text is not a courtesy after publication. It is part of publication. The machine can draft. It can notice missing fields. It can help a person move through an inaccessible archive. But when a school, city, hospital, court, employer, or platform lets the model be the only access clerk, the institution is no longer merely describing images. It is deciding whose version of the visual record counts.

Source Discipline

Claims on this page are grounded in W3C accessibility guidance, U.S. government accessibility materials, DOJ Title II web rule materials, official product documentation, and original Meta research documentation. Product claims are treated as feature descriptions, not as proof that generated descriptions are accurate in every context.


