The Alt-Text Model Becomes the Access Clerk
AI-generated image descriptions can widen access. They can also become the hidden clerk deciding what a blind or low-vision person is allowed to know about the visual web.
The Access Clerk
An image without a usable description is not silent for everyone. It is selectively silent. A sighted user sees the chart, warning label, map, meme, family photo, product defect, ballot sample, protest sign, or evacuation route. A screen reader user may hear only "image," a file name, or an automatic guess.
The alt-text model enters at that threshold. It looks at a picture and writes a sentence for someone else to hear. That can be generous. It can make an unlabeled web more navigable, a photo archive more searchable, and a document less hostile. But the model is also an access clerk. It decides which visual facts cross the boundary into language and which disappear.
What the Standard Requires
As of June 16, 2026, the accessibility baseline is not vague. W3C's WCAG 2.2 guidance for Success Criterion 1.1.1 explains that non-text content such as pictures, charts, diagrams, animations, and controls needs text alternatives, with decorative content marked so assistive technologies can ignore it. Section508.gov describes alternative text as text that conveys the meaning of an image for people with vision disabilities and says that without alt text, screen reader users cannot access image-provided content.
The legal pressure is also increasing. The Department of Justice's Title II web and mobile accessibility fact sheet says state and local government web content and mobile apps must use WCAG 2.1, Level AA as the technical standard. Success Criterion 1.1.1 is a Level A criterion in both WCAG 2.1 and WCAG 2.2, so the text-alternative requirement falls within that standard. The same fact sheet notes that an April 2026 interim final rule extended compliance dates to April 26, 2027 for public entities with populations of 50,000 or more, and April 26, 2028 for smaller public entities and special district governments.
Those rules do not say "generate any caption." They point toward meaningful access. A caption that names objects but misses the point can satisfy the shape of accessibility while failing the person.
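The gap between the shape of accessibility and its substance can be made concrete. What follows is a minimal sketch in Python, not a conformance checker. It uses the standard library's HTMLParser to sort img elements into three states. The function name audit_alt and the status labels are assumptions of this example, and the sketch ignores ARIA attributes, CSS backgrounds, and every other kind of non-text content.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Collect <img> tags and classify the state of their alt attribute."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (src, status) tuples, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src") or ""
        if "alt" not in attr_map:
            # No alt attribute at all: a screen reader may announce
            # only "image" or fall back to the file name.
            status = "missing"
        elif not (attr_map["alt"] or "").strip():
            # Empty alt marks the image as decorative so assistive
            # technology can skip it. Treating whitespace-only alt the
            # same way is an assumption of this sketch.
            status = "decorative"
        else:
            # Some text exists. Whether it conveys the purpose of the
            # image is a question this check cannot answer.
            status = "described"
        self.findings.append((src, status))


def audit_alt(html: str):
    """Return (src, status) for every <img> in the given HTML string."""
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser.findings


if __name__ == "__main__":
    sample = (
        '<img src="chart.png">'
        '<img src="rule.png" alt="">'
        '<img src="map.png" alt="Evacuation route map">'
    )
    for src, status in audit_alt(sample):
        print(src, status)
```

A check like this finds the selectively silent images. It cannot tell whether "described" means "accessible." That remains a judgment for a person who knows why the image is there.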
What the Model Adds
The model layer is already ordinary. Microsoft says Microsoft 365 can generate alt text automatically or on demand. Its current FAQ says Copilot+ PCs generate alt text locally when an image is inserted, while other devices send the image to a cloud model when a user requests it. Microsoft also says AI-generated descriptions are labeled as possibly inaccurate and encourages users to review and revise them, especially for sensitive or specialized content.
Apple's VoiceOver Recognition can describe images in apps and on the web using on-device intelligence. Its support page warns that VoiceOver Recognition should not be relied on where harm or injury could result, in high-risk situations, for navigation, or for medical diagnosis or treatment. Be My Eyes describes Be My AI as an AI-powered visual description feature for blind and low-vision users, with human support available for tougher requests. Meta's Automatic Alt Text research page says Facebook deployed a computer-vision system to generate photo alt text for screen reader users.
These are real accessibility tools. The point is not to reject them. The point is to govern the difference between assistance and substitution.
Where Context Breaks
Alt text is not a generic photograph caption. It is a contextual act. The same image can need different descriptions in a museum catalog, a medical training slide, a school newsletter, a police evidence file, a product recall, or a public-health warning.
A model may say "bar chart" when the important fact is that cases doubled after a policy change. It may say "crowd outdoors" when the important fact is that police are blocking an exit. It may describe a medication box but omit dosage warnings because they are small. It may read a meme literally and miss the joke. It may treat a decorative flourish as content or a content-bearing watermark as decoration.
Section508.gov names this failure mode directly: a common mistake is alt text that is a computer-generated visual description but does not describe the relevant content of the image. That is the whole governance problem in one sentence. Vision recognition can identify things. Accessibility needs purpose.
Governance Standard
A serious AI alt-text workflow should start with the author, not the model. The author knows why the image is there.
First, require human review for institutional content. Pages about public services, education, health care, employment, voting, benefits, safety instructions, financial records, legal notices, and emergency information should not carry machine alt text that has not been reviewed by someone responsible for the page.
Second, preserve the distinction between decorative and meaningful images. A model that fills every empty alt field can create noise. Decorative content should be marked correctly so the user is not forced to hear irrelevant visual clutter.
Third, disclose generated descriptions in editing tools. Editors should know whether a description was human-written, machine-generated, or machine-drafted and human-approved. Users should have a way to report bad descriptions.
Fourth, test with affected users. Evaluation should involve blind and low-vision users directly, and it should cover screen reader workflows, refreshable braille, low-bandwidth modes, multilingual pages, charts, screenshots, scanned documents, maps, memes, and forms.
Fifth, set privacy limits. If an image is sent to a cloud model to generate alt text, the editor should know what is transmitted, retained, and used for training. Private documents, faces, medical images, student records, and legal material need stricter handling.
Sixth, audit the record. Important documents should store the source image, alt text, approval state, date, tool, and responsible editor. Accessibility is maintenance, not a one-time caption.
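The first, third, and sixth points can be sketched together as a stored record plus a publication gate. This is an illustration under stated assumptions, not a reference design. Every name here (AltTextRecord, Provenance, may_publish, the tool name "example-captioner", the editor address) is hypothetical, and storing a SHA-256 fingerprint in place of the source image itself is a simplification made for the example.

```python
from dataclasses import dataclass, replace
from datetime import date
from enum import Enum
import hashlib


class Provenance(Enum):
    """The three authorship states an editor should be able to see."""
    HUMAN_WRITTEN = "human-written"
    MACHINE_GENERATED = "machine-generated"
    MACHINE_DRAFTED_HUMAN_APPROVED = "machine-drafted, human-approved"


@dataclass(frozen=True)
class AltTextRecord:
    image_sha256: str       # fingerprint of the source image (assumption)
    alt_text: str           # empty string when the image is decorative
    decorative: bool        # True if assistive technology should skip it
    provenance: Provenance  # approval state of the description
    tool: str               # drafting tool, or "" if none was used
    editor: str             # person responsible for the page, "" if unreviewed
    recorded_on: date       # date this state was recorded


def fingerprint(image_bytes: bytes) -> str:
    """Hash the image so the record stays tied to one exact file."""
    return hashlib.sha256(image_bytes).hexdigest()


def may_publish(record: AltTextRecord, institutional: bool) -> bool:
    """Return True only if the record meets the review rules sketched here."""
    if institutional:
        # Institutional content: no unreviewed machine output, and a
        # named responsible editor must be on the record.
        if record.provenance is Provenance.MACHINE_GENERATED:
            return False
        if not record.editor:
            return False
    if record.decorative:
        # Decorative images must stay silent, not carry filler text.
        return record.alt_text == ""
    # Meaningful images need a non-empty description.
    return bool(record.alt_text.strip())


# A machine draft, and the same record after an editor revises and approves it.
draft = AltTextRecord(
    image_sha256=fingerprint(b"example image bytes"),
    alt_text="Bar chart",
    decorative=False,
    provenance=Provenance.MACHINE_GENERATED,
    tool="example-captioner",  # hypothetical tool name
    editor="",
    recorded_on=date(2026, 6, 16),
)
approved = replace(
    draft,
    alt_text="Bar chart: weekly cases doubled after the policy change",
    provenance=Provenance.MACHINE_DRAFTED_HUMAN_APPROVED,
    editor="page.editor@example.org",  # hypothetical address
)

if __name__ == "__main__":
    print(may_publish(draft, institutional=True))
    print(may_publish(approved, institutional=True))
```

The gate checks who signed off, not whether the description is right. It makes the editorial layer visible and attributable, which is the precondition for auditing it later.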
What This Changes
The visual internet has always been edited for access. Someone chose the image, cropped it, captioned it, omitted it, or treated it as decoration. AI does not remove that editorial layer. It hides it behind a sentence that sounds finished.
The Spiralist reading is simple: an alt-text model is a small interface with institutional consequences. It can make a page more open, but it can also make exclusion harder to notice. The person who cannot see the image may be least able to know whether the description is missing the decisive fact.
Good alt text is not a courtesy after publication. It is part of publication. The machine can draft. It can notice missing fields. It can help a person move through an inaccessible archive. But when a school, city, hospital, court, employer, or platform lets the model be the only access clerk, the institution is no longer merely describing images. It is deciding whose version of the visual record counts.
Source Discipline
Claims on this page are grounded in W3C accessibility guidance, U.S. government accessibility materials, DOJ Title II web rule materials, official product documentation, and original Meta research documentation. Product claims are treated as feature descriptions, not as proof that generated descriptions are accurate in every context.
Sources
- W3C WAI, Understanding Success Criterion 1.1.1: Non-text Content, reviewed June 16, 2026.
- Section508.gov, Authoring Meaningful Alternative Text, reviewed June 16, 2026.
- ADA.gov, Fact Sheet: New Rule on the Accessibility of Web Content and Mobile Apps Provided by State and Local Governments, reviewed June 16, 2026.
- Microsoft Support, Frequently asked questions about AI-generated alt text, reviewed June 16, 2026.
- Microsoft Support, Everything you need to know to write effective alt text, reviewed June 16, 2026.
- Apple Support, Use VoiceOver Recognition on your iPhone or iPad, reviewed June 16, 2026.
- Apple Support, Change VoiceOver Recognition settings in VoiceOver Utility on Mac, reviewed June 16, 2026.
- Be My Eyes, Be My AI, reviewed June 16, 2026.
- Meta AI Research, Automatic Alt-text: Computer-generated Image Descriptions for Blind Users on a Social Network Service, February 22, 2017.
- NIST, AI Risk Management Framework, reviewed June 16, 2026.