Wiki · Field · Last reviewed June 15, 2026

Trust and Safety

Trust and safety is the operational field that helps online services prevent, detect, respond to, and learn from abuse. It includes content moderation, account integrity, fraud and spam response, child safety, harassment prevention, crisis escalation, user reporting, policy enforcement, safety tooling, transparency, appeals, and, increasingly, AI misuse and AI-assisted enforcement.

Snapshot

Definition

Trust and safety, often shortened to T&S, is the practice of defining acceptable behavior on a digital service and building the people, policies, processes, and tools needed to enforce those rules while protecting users' rights and safety. The Trust & Safety Professional Association (TSPA) describes the professionals it supports as those who develop and enforce the principles and policies that define acceptable online behavior and content.

The field is broader than deciding whether a post stays up. It covers conduct, accounts, payments, ads, seller behavior, messaging, recommendations, search visibility, livestreams, AI-generated outputs, model-use policies, developer ecosystems, law-enforcement requests, crisis events, and coordinated abuse. A trust-and-safety decision can remove content, label it, demote it, age-gate it, disable monetization, suspend an account, limit a feature, require verification, preserve evidence, escalate to a specialist team, or route a user into support.

Trust and safety is also a rights problem. Enforcement that is too weak can expose people to abuse, fraud, exploitation, and violence. Enforcement that is too broad, opaque, or politically captured can silence legitimate speech, sort users into tiers of unequal visibility, or make appeal impossible. The serious version of the field holds both risks at once.

Scope of the Field

Policy and enforcement. Teams write rules, classify violations, train reviewers, build enforcement workflows, handle escalations, maintain appeal channels, and measure error. Rules need examples, edge cases, regional context, language coverage, and a record of how they changed.

Integrity and abuse prevention. Integrity work targets spam, fraud, account takeovers, bot networks, coordinated inauthentic behavior, ban evasion, fake engagement, impersonation, scams, malicious automation, and manipulation of ranking or reporting systems.

Child and vulnerable-user safety. Safety operations include age assurance, grooming detection, child sexual abuse material response, self-harm crisis pathways, harassment prevention, non-consensual intimate imagery response, youth-product defaults, and escalation to specialist teams or legally required reporting routes.

Product safety and safety engineering. A mature program changes the product, not only the queue. Rate limits, friction, identity challenges, reporting flows, blocked-word controls, recommender dampening, private-message limits, provenance labels, user controls, and feature gating can prevent abuse before a reviewer sees it.
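
As one illustration of a product-level control, the sketch below shows a sliding-window rate limit of the kind that could cap private messages per sender. It is a minimal example under stated assumptions, not a description of any platform's implementation: the class name `SlidingWindowLimiter`, the per-user key, and the thresholds are all hypothetical, and a production limiter would need shared storage and abuse-aware tuning.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-key limiter: at most max_events per window_seconds."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        # Maps a key (for example a sender ID) to timestamps of allowed events.
        self._events = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        """Return True and record the event if the key is under its limit."""
        timestamps = self._events[key]
        # Drop events that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_events:
            return False
        timestamps.append(now)
        return True


# Illustrative threshold only: two messages per 60 seconds per sender.
limiter = SlidingWindowLimiter(max_events=2, window_seconds=60.0)
decisions = [limiter.allow("sender-1", t) for t in (0.0, 1.0, 2.0, 60.0)]
```

The timestamp is passed in explicitly so the behavior can be tested without a real clock. A blocked event is not recorded, so a sender who keeps retrying does not extend their own lockout; whether that is the right choice depends on the abuse pattern being targeted.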

Transparency and recourse. Trust and safety should produce notices, appeals, transparency reports, audit records, researcher access where appropriate, and durable logs for incident review. The Santa Clara Principles treat due process, understandable rules, cultural competence, automation transparency, notice, appeal, and state involvement as central to accountable moderation.

Current Context

As of June 15, 2026, trust and safety has moved from an internal platform specialty into a regulated infrastructure function. The European Union's Digital Services Act requires many online platforms to provide notice and appeal, advertising and recommender transparency, illegal-content reporting mechanisms, protections for minors, and transparency reporting. For very large online platforms and search engines, the DSA adds annual systemic-risk assessments, independent audits, mitigation reporting, and data-access pathways for vetted researchers.

The United Kingdom's Online Safety Act is also operationally important for T&S teams. According to the UK government's explainer, providers had to complete illegal-content risk assessments by March 16, 2025, and the illegal-content duties have been in effect, and enforceable by Ofcom, since March 17, 2025. Ofcom's online-safety materials were updated in 2026 to add measures on intimate image abuse and crisis protocols, including hash matching for certain intimate-image abuse risks and crisis-response measures for significant increases in illegal content or content harmful to children.

Professionalization is visible in the field itself. TSPA lists trust-and-safety work as including content review and moderation, policy enforcement, safety incident management, product policy development, tool building, analytics, legal compliance, and T&S-focused ML/AI. The Digital Trust & Safety Partnership's glossary frames the field as a maturing discipline with shared vocabulary across content concepts, abuse types, enforcement practices, and trust-and-safety technology, while noting that its glossary is not a legal definition.

The current pressure is not only legal compliance. Platforms face adversarial users, fast-moving crises, language gaps, moderator trauma, public scrutiny, government pressure, advertiser pressure, civil-society criticism, and demands for both more removal and less over-removal. AI systems intensify each of those tensions.

AI Relevance

AI changes trust and safety in two directions. First, platforms use machine-learning systems and large models to detect abuse, prioritize queues, classify media, summarize reports, translate content, cluster networks, detect ban evasion, and support reviewer workflows. Those systems can increase scale, but they can also create false positives, uneven language performance, weak explanations, automation bias, and hidden disparities.

Second, users can use generative AI to scale abuse. The same tools that produce ordinary assistance can generate spam variants, phishing copy, synthetic personas, fake listings, harassment scripts, sexualized deepfakes, voice impersonation, fake evidence, misinformation pages, and adaptive evasion tactics. T&S teams therefore need misuse monitoring, red teaming, incident reporting, model-use policies, provenance signals, rate limits, developer controls, and post-launch evaluation.

NIST's AI Risk Management Framework and Generative AI Profile are useful here because they treat AI risk as a lifecycle issue affecting individuals, organizations, and society, not merely as a model benchmark. For T&S, that means evaluating the full system: product surface, policy, data, detection model, reviewer workflow, escalation, appeal, measurement, and downstream harm.

Governance and Safety

A serious trust-and-safety program starts with a risk inventory. The service should know which harms are plausible, which users are vulnerable, which features are abusable, which legal duties apply, which teams own each risk, and which controls are preventive, detective, corrective, or compensatory.
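
The inventory described above can be sketched as a small data structure. This is an illustrative shape only: the names `RiskEntry`, `Control`, and `inventory_gaps` are hypothetical, and the rule that every risk should have an owner plus at least one preventive, detective, and corrective control is an assumption made for the example, not a requirement stated by any regulator or standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ControlType(Enum):
    PREVENTIVE = "preventive"
    DETECTIVE = "detective"
    CORRECTIVE = "corrective"
    COMPENSATORY = "compensatory"


@dataclass
class Control:
    name: str
    kind: ControlType


@dataclass
class RiskEntry:
    """One row of a hypothetical risk inventory."""
    harm: str
    vulnerable_users: List[str]
    abusable_features: List[str]
    legal_duties: List[str]
    owner: Optional[str] = None
    controls: List[Control] = field(default_factory=list)


def inventory_gaps(entry: RiskEntry) -> List[str]:
    """List what the entry is missing under this example's assumed rule."""
    gaps = []
    if not entry.owner:
        gaps.append("no owning team")
    kinds = {control.kind for control in entry.controls}
    # Assumption: compensatory controls are optional; the other three are not.
    for needed in (ControlType.PREVENTIVE, ControlType.DETECTIVE, ControlType.CORRECTIVE):
        if needed not in kinds:
            gaps.append(f"no {needed.value} control")
    return gaps


entry = RiskEntry(
    harm="account takeover",
    vulnerable_users=["all account holders"],
    abusable_features=["password reset"],
    legal_duties=[],
    controls=[Control("login rate limit", ControlType.PREVENTIVE)],
)
gaps = inventory_gaps(entry)
```

Keeping ownership and control type as explicit fields makes gaps queryable, so an unowned risk or a risk with only after-the-fact controls shows up in a report rather than in an incident review.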

Useful controls include clear rules, scenario-based policy guidance, staffed escalation paths, reviewer training, language and cultural competence, abuse-rate metrics, appeal-rate and reversal-rate metrics, incident review, privacy-preserving logging, safe evidence preservation, vendor oversight, moderator wellbeing protections, red-team exercises, crisis protocols, transparency reports, and externally reviewable audits for high-impact systems.

Metrics should not optimize only for takedown volume or speed. A platform can remove quickly and still be unsafe if it misses coordinated abuse, silences vulnerable users, fails to distinguish satire from threats, hides appeal outcomes, or lets product incentives recreate the same harm. Good T&S metrics track prevalence, reach, recurrence, time to action, false positives, false negatives, successful appeals, language coverage, vulnerable-user impact, reviewer workload, and whether product changes reduced the need for enforcement.
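
A few of these metrics can be sketched in code. The example below is a minimal illustration, assuming a hypothetical `EnforcementRecord` shape and a view-sampling approach to prevalence; real definitions vary by platform and policy category, and none of the field names come from a published standard.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass(frozen=True)
class EnforcementRecord:
    """One enforcement action. All field names are hypothetical."""
    hours_to_action: float       # time from report or detection to action
    appealed: bool               # whether the affected user appealed
    overturned: Optional[bool]   # appeal outcome; None if not appealed or pending


def appeal_rate(records: Sequence[EnforcementRecord]) -> float:
    """Share of enforcement actions that were appealed."""
    if not records:
        return 0.0
    return sum(r.appealed for r in records) / len(records)


def reversal_rate(records: Sequence[EnforcementRecord]) -> float:
    """Share of decided appeals that overturned the original action."""
    decided = [r for r in records if r.appealed and r.overturned is not None]
    if not decided:
        return 0.0
    return sum(r.overturned for r in decided) / len(decided)


def median_hours_to_action(records: Sequence[EnforcementRecord]) -> float:
    """Median time to action; the median resists a few very slow cases."""
    return median(r.hours_to_action for r in records)


def prevalence(violating_views: int, sampled_views: int) -> float:
    """Estimated share of sampled views that landed on violating content."""
    if sampled_views <= 0:
        raise ValueError("sampled_views must be positive")
    return violating_views / sampled_views


records = [
    EnforcementRecord(2.0, True, True),
    EnforcementRecord(4.0, True, False),
    EnforcementRecord(6.0, False, None),
    EnforcementRecord(30.0, True, None),   # appeal still pending
]
```

Pending appeals are excluded from the reversal rate so that a backlog does not make enforcement look more accurate than it is. A high reversal rate is only a partial proxy for false positives, since many wrongly actioned users never appeal.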

Trust and safety also needs independence from short-term growth incentives. If the team can only clean up harm after launch, it becomes a shield for risky product design. T&S should have power to delay launches, require safer defaults, narrow rollout, add friction, stop abusive monetization, preserve logs, and trigger executive review when the product architecture itself creates harm.

Source Discipline

Claims about trust and safety should identify the source type. A platform transparency report, regulator statement, statute, civil-society principle, academic study, company blog post, leaked internal document, and affected-user testimony each support different claims. Do not use a company's transparency report alone to prove real-world safety; it mainly proves what the company measured and chose to disclose.

Operational claims should be dated and scoped. "The platform removed harmful content" is incomplete without time period, policy category, content type, geography, language, detection source, enforcement action, appeal outcomes, and whether the reported numbers count content, accounts, views, impressions, or pieces reviewed. "AI moderation works" is too vague unless it names the classifier, threshold, task, language, dataset, error rates, human review path, and deployment setting.
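
The scoping requirement for enforcement claims can be expressed as a simple completeness check. The sketch below is illustrative only: `EnforcementClaim` and `missing_scope` are hypothetical names, and the fields mirror the list in the paragraph above rather than any formal reporting schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EnforcementClaim:
    """A claim about enforcement plus the scope needed to evaluate it."""
    statement: str
    time_period: Optional[str] = None
    policy_category: Optional[str] = None
    content_type: Optional[str] = None
    geography: Optional[str] = None
    language: Optional[str] = None
    detection_source: Optional[str] = None    # e.g. user report, classifier
    enforcement_action: Optional[str] = None  # e.g. removal, label, demotion
    appeal_outcomes: Optional[str] = None
    counting_unit: Optional[str] = None       # content, accounts, views, ...


def missing_scope(claim: EnforcementClaim) -> List[str]:
    """Return the scope fields the claim leaves unstated."""
    return [
        f.name
        for f in fields(claim)
        if f.name != "statement" and getattr(claim, f.name) is None
    ]


vague = EnforcementClaim("The platform removed harmful content")
partly_scoped = EnforcementClaim(
    "The platform removed harmful content",
    time_period="a stated reporting quarter",
    counting_unit="pieces of content",
)
```

Here `missing_scope(vague)` lists every scope field, while `missing_scope(partly_scoped)` omits `time_period` and `counting_unit`. A check like this does not verify that a claim is true; it only shows which qualifiers a reader would need before the claim can be compared or tested.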

Legal claims should use primary sources: statutes, regulator guidance, official enforcement releases, court records, and standards bodies. For professional-practice claims, use field bodies such as TSPA, the Santa Clara Principles, DTSP materials, standards, and disclosed platform practices, while preserving their limits and incentives. For harm claims, prefer documented incidents, regulator findings, peer-reviewed research, public-interest audits, or archived evidence over anecdotes.

Spiralist Reading

For Spiralism, trust and safety is where platform ethics stop being slogans and become queues, policies, tools, worker conditions, escalation paths, and public accountability. It is the maintenance layer of mediated reality.

The danger is not only that platforms fail to remove harm. The danger is that safety becomes a hidden priesthood: private rules, private evidence, private penalties, private appeals, and public life shaped by decisions no one can inspect.

The Spiralist standard is disciplined care. Protect users, preserve rights, document decisions, make appeals real, and let the product be changed when repeated harm shows that moderation alone is not enough.

Open Questions

Sources

