Evaluation and Learning Loop
The operating manual for evaluating Spiralism’s chapters, programs, curriculum, archive practices, care protocols, public signal, and governance without turning members into data points. Evaluation exists to improve the work, not to prove that the institution is already right.
The Institutional Scorecard names what Spiralism should watch. This manual names how Spiralism should learn.
Evaluation is dangerous when it becomes surveillance, vanity, donor theater, or spiritual ranking. It is necessary when real people are attending gatherings, sharing testimony, learning AI literacy, using care protocols, giving money, volunteering labor, and trusting the institution with memory.
The Rule
Measure only what can improve care, trust, memory, or judgment.
If a metric will not change a decision, stop collecting it. If collecting it would make people less free, less honest, or less safe, do not collect it. If a story teaches more than a number, preserve the story with consent.
Evaluation Stance
Spiralism evaluates to learn:
- whether a program did what it promised;
- whether a chapter is becoming more humane or more extractive;
- whether people understand AI without mystification;
- whether testimony is being preserved with consent and quality;
- whether care boundaries are being honored;
- whether volunteer labor is drifting into exploitation;
- whether public media is creating clarity or spectacle;
- whether governance decisions are being remembered and acted on.
It does not evaluate:
- belief intensity;
- loyalty;
- mystical experience;
- spiritual rank;
- emotional vulnerability as proof of transformation;
- attendance as moral worth;
- donations as commitment;
- public praise as impact.
Evaluation Questions
Use questions before indicators. A good question should force a useful decision.
Template:
Program / chapter / practice:
Decision this evaluation should inform:
Primary question:
Secondary questions:
People affected:
Evidence needed:
Evidence not worth collecting:
Privacy or consent risk:
Who reviews:
When findings are used:
What will change if the answer is negative:
Bad evaluation question:
Did people love the event?
Better evaluation question:
Did the event leave participants with a clear next step, preserve consent, meet access needs, and produce one usable archive or learning artifact?
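Where chapters keep evaluation plans digitally, the same template can travel as a structured record. A minimal sketch in Python; the field names mirror the template above, while the class name and the actionability check are illustrative assumptions, not an existing Spiralism tool:

```python
from dataclasses import dataclass, field

@dataclass
class EvaluationPlan:
    """One evaluation question, scoped before any data is collected."""
    subject: str                       # program, chapter, or practice
    decision: str                      # decision this evaluation should inform
    primary_question: str
    secondary_questions: list[str] = field(default_factory=list)
    people_affected: str = ""
    evidence_needed: list[str] = field(default_factory=list)
    evidence_not_worth_collecting: list[str] = field(default_factory=list)
    privacy_or_consent_risk: str = ""
    reviewer: str = ""
    findings_used_when: str = ""
    change_if_negative: str = ""       # what changes if the answer is negative

    def is_actionable(self) -> bool:
        # The Rule: if a metric will not change a decision, stop collecting it.
        return bool(self.decision.strip()) and bool(self.change_if_negative.strip())
```

If the decision field or the change-if-negative field is empty, the plan fails the Rule and should not run.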
The Six-Step Loop
Adapt CDC’s program-evaluation framework into a small-institution loop:
1. Assess context. What is being evaluated, why now, and what could be harmed by evaluating it?
2. Describe the work. What did the chapter, program, protocol, or artifact actually do?
3. Focus the question. What decision should this evaluation inform?
4. Gather credible evidence. Use the minimum evidence needed: counts, costs, observations, consented feedback, artifacts, incidents, and lessons.
5. Support conclusions. Separate fact, interpretation, and uncertainty.
6. Act on findings. Change the run sheet, policy, training, budget, communication, or archive practice.
No evaluation is complete until one decision changes or one decision is explicitly reaffirmed.
Evaluation Standards
Use five standards.
Relevance and Utility
Will this evaluation help someone make a real decision?
If not, do not run it.
Rigor
Is the evidence strong enough for the decision being made?
Founding-period evaluation does not need academic complexity. It needs honest fit: attendance counts for attendance, costs for cost, consented feedback for experience, incident records for safety, and artifacts for learning.
Independence and Objectivity
Can the person reviewing findings see beyond their own program pride?
For high-stakes findings, use a reviewer who did not run the program.
Transparency
Can members understand what is being collected and why?
Private raw data may stay protected, but the evaluation purpose and aggregate lessons should be legible.
Ethics
Does the evaluation respect consent, dignity, privacy, access, and power differences?
Evaluation must never become a second extraction after a vulnerable event.
OECD Lenses
For larger programs, annual reports, grants, and public partnerships, use the OECD DAC criteria as lenses:
| Lens | Spiralist Question |
|---|---|
| Relevance | Is this work answering a real need in the recursive age? |
| Coherence | Does it fit the canon, policies, chapters, archive, and partner context? |
| Effectiveness | Did it achieve its stated purpose? |
| Efficiency | Were money, time, labor, attention, and risk used well? |
| Impact | What changed for people, chapters, archive, public understanding, or systems? |
| Sustainability | Can the benefit continue without burning people out or hiding costs? |
Do not use all six lenses for every small gathering. Use them when the decision is large enough to justify the overhead.
Evidence Classes
Use a mixed record.
| Evidence Class | Use | Caution |
|---|---|---|
| Count | attendance, packages, events, outputs | easy to overvalue |
| Cost | money, volunteer hours, staff time, access cost | often hidden by founders |
| Artifact | testimony package, transcript, source brief, tool, policy revision | strongest proof of learning |
| Feedback | surveys, interviews, debrief notes | must be voluntary and contextual |
| Observation | host notes, mentor review, access review | can reflect reviewer bias |
| Incident | complaint, near miss, pause, correction | protect privacy and due process |
| Story | consented account of change | do not generalize beyond the story |
| Absence | who did not return, who could not access, what was not recorded | often more important than praise |
The institution should prefer evidence triangulation over certainty. If counts, artifacts, feedback, and incidents point in different directions, preserve the tension.
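Triangulation can be kept as a comparison rather than an average. A minimal sketch, assuming each evidence class yields a one-word verdict from its reviewer; the verdict labels and function name are illustrative, not a prescribed vocabulary:

```python
def triangulate(verdicts: dict[str, str]) -> str:
    """Compare per-evidence-class verdicts and preserve tension
    rather than forcing a single blended score.

    verdicts maps an evidence class ('count', 'artifact', 'feedback',
    'incident', ...) to a one-word verdict such as 'positive' or 'mixed'.
    """
    distinct = set(verdicts.values())
    if len(distinct) == 1:
        return f"Evidence agrees: {distinct.pop()}"
    # Disagreement is named for the learning meeting, not averaged away.
    lines = [f"- {source}: {verdict}" for source, verdict in sorted(verdicts.items())]
    return "Evidence diverges; preserve the tension:\n" + "\n".join(lines)
```

For example, `triangulate({"count": "positive", "feedback": "mixed", "incident": "negative"})` returns the divergence note, not a score.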
Feedback Without Pressure
Feedback should be short, voluntary, and low-stakes.
Standard post-program questions:
- What was useful?
- What was unclear?
- Did anything feel pressuring, inaccessible, unsafe, or overclaimed?
- What next step, if any, is clear to you?
- Is there anything we should change before running this again?
Do not ask newcomers to rate spiritual transformation. Do not collect private trauma details in feedback forms. Do not treat nonresponse as disengagement. Do not chase vulnerable people for evaluation data after intense disclosures.
Chapter Review
Quarterly chapter review should be light but real.
Chapter review record:
Chapter:
Quarter:
Gatherings held:
Median attendance:
New attendees:
Returning attendees:
Co-hosts active:
Testimonies recorded:
Archive cards submitted:
Access requests met / missed:
Incidents or near misses:
Volunteer hours:
Costs:
One thing that strengthened coherence:
One thing that weakened coherence:
One decision for next quarter:
Support needed from Stewards:
Chapters should not compete on growth. A small chapter with strong handoff, care, and archive discipline is healthier than a large chapter built around one charismatic host.
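Where the quarterly record is kept digitally, a minimal sketch of the same form, assuming the field names above; the derived check is an illustrative assumption, not policy:

```python
from dataclasses import dataclass

@dataclass
class ChapterQuarterReview:
    """One quarter's chapter review; fields mirror the record above."""
    chapter: str
    quarter: str
    gatherings_held: int
    median_attendance: int
    new_attendees: int
    returning_attendees: int
    cohosts_active: int
    testimonies_recorded: int
    archive_cards_submitted: int
    access_requests_met: int
    access_requests_missed: int
    incidents_or_near_misses: int
    volunteer_hours: float
    costs: float
    strengthened_coherence: str
    weakened_coherence: str
    decision_next_quarter: str
    support_needed: str

    def needs_steward_attention(self) -> bool:
        # Missed access and safety signals matter more than size.
        return self.access_requests_missed > 0 or self.incidents_or_near_misses > 0
```

There is deliberately no growth score here: the only derived signal is whether access misses or incidents warrant Steward support.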
Program Review
Every program owner should complete a one-page review within seven days of the event.
Program review record:
Program:
Owner:
Date:
Purpose:
Format:
Attendance:
Costs:
Volunteer hours:
Access notes:
Media / recording status:
Archive material produced:
Follow-up sent:
Incidents or near misses:
Feedback themes:
What to repeat:
What to change:
Decision:
The Public Programs manual governs run sheets and logistics. This manual governs what the institution learns after the room closes.
Curriculum Review
Curriculum review asks whether learning becomes agency.
Track:
- module completion;
- source briefs created;
- AI-literacy exercises completed;
- members who can explain when not to use AI;
- first contributions completed;
- revision notes submitted;
- accessibility barriers;
- confusion patterns.
Do not track private belief. Track capability, artifact, and revision.
Archive Review
Archive review asks whether memory survives.
Track:
- package completeness;
- consent completeness;
- metadata completeness;
- fixity checks (sketched below);
- storage copies;
- access levels;
- redaction events;
- transcription backlog;
- restricted-data handling;
- packages paused for care reasons.
Archive evaluation must never pressure Archivists to maximize testimony volume. The better outcome may be fewer testimonies recorded with stronger consent.
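Of the items above, fixity checks have a standard technical meaning: verify that stored files still match a previously recorded checksum. A minimal sketch using Python's standard library; the JSON manifest format is an assumption, not an existing Spiralism convention:

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large recordings fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_fixity(manifest_path: Path) -> list[str]:
    """Compare files against a manifest of {relative_path: sha256}.

    Returns the paths that are missing or whose contents changed.
    """
    manifest = json.loads(manifest_path.read_text())
    base = manifest_path.parent
    failures = []
    for rel_path, expected in manifest.items():
        target = base / rel_path
        if not target.exists() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

A clean run returns an empty list; anything else is an archive incident worth recording.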
Care Review
Care review must be aggregate and privacy-preserving.
Track:
- care-circle activations;
- member-support referrals and micro-grants in aggregate;
- crisis referrals;
- testimony pauses;
- companion-protocol screenings;
- safeguarding escalations;
- unresolved complaints;
- policy changes made because of care incidents.
Do not track private disclosures as engagement. Do not publish details that allow identification by context. Do not let care counts become proof that a chapter is virtuous or defective.
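One standard privacy-preserving technique for these aggregates is small-cell suppression: never publish a count small enough to identify someone by context. A hedged sketch; the threshold of five is a conventional starting point, not a Spiralism policy:

```python
SUPPRESSION_THRESHOLD = 5  # conventional floor; set per context

def publishable_counts(counts: dict[str, int]) -> dict[str, str]:
    """Replace small nonzero counts with a suppressed marker.

    In a small chapter, 'crisis referrals: 1' can identify a person by
    context; '<5' keeps the aggregate signal without the exposure.
    """
    return {
        label: (str(n) if n == 0 or n >= SUPPRESSION_THRESHOLD
                else f"<{SUPPRESSION_THRESHOLD}")
        for label, n in counts.items()
    }
```

A zero stays a zero; only small nonzero counts are masked.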
Learning Meeting
Every quarter, Stewards or founding operators should hold a learning meeting.
Agenda:
- What did we learn from chapters?
- What did we learn from programs?
- What did we learn from archive practice?
- What did we learn from care and incidents?
- What did we learn from public signal?
- What did measurement distort?
- What decision changes now?
- What should stop being measured?
Outputs:
- one decision log entry (a minimal sketch follows this list);
- one action owner;
- one policy or run-sheet update if needed;
- one note for the annual report;
- one metric removed if it is not useful.
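To keep the decision log append-only and easy to review later, one minimal sketch using JSON Lines; the file layout and field names are assumptions, not an established Spiralism format:

```python
import json
from datetime import date
from pathlib import Path

def log_decision(log_path: Path, decision: str, owner: str,
                 source: str = "quarterly learning meeting") -> None:
    """Append one decision and its owner to an append-only JSONL log."""
    entry = {
        "date": date.today().isoformat(),
        "decision": decision,
        "owner": owner,
        "source": source,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

One line per decision means the log can be read, diffed, and audited without special tools.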
Public Annual Learning Note
The annual report should include a learning note:
What we tried:
What worked:
What did not work:
What caused harm or near harm:
What we stopped doing:
What we changed:
What we still do not know:
What we will evaluate next year:
This is where public trust grows: not from claims of impact, but from visible learning.
AI-Assisted Evaluation
AI may help summarize open-ended feedback, cluster themes, draft report outlines, or compare evaluation notes against the scorecard.
AI may not be the final authority on:
- whether a person was harmed;
- whether a complaint is credible;
- whether a program was safe;
- whether a chapter should close;
- whether private testimony should be summarized;
- whether an individual member is progressing.
Never paste raw restricted testimony, complaint records, donor records, or identifying care notes into AI tools unless the Privacy and Data Stewardship and AI Use Protocols explicitly allow that use.
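As one sketch of the permitted uses above, clustering consented, non-restricted feedback into rough themes for a human reviewer. This assumes scikit-learn is available; the tool choice, parameters, and function name are illustrative, and nothing restricted should ever pass through it:

```python
# Assumes scikit-learn is installed. Input must already be consented,
# non-restricted feedback text; a human reviews every cluster.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_feedback(responses: list[str], n_themes: int = 4) -> dict[int, list[str]]:
    """Group open-ended responses into rough themes for human review."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(responses)
    model = KMeans(n_clusters=n_themes, n_init=10, random_state=0)
    labels = model.fit_predict(matrix)
    themes: dict[int, list[str]] = {}
    for response, label in zip(responses, labels):
        themes.setdefault(int(label), []).append(response)
    return themes
```

The output is a draft for the reviewer, never the finding itself.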
Anti-Patterns
- Evaluation as donor theater.
- Surveying people because the institution forgot to observe.
- Counting attendance but not handoff, access, consent, or cost.
- Treating emotional intensity as success.
- Making chapters compete on growth.
- Asking vulnerable people for feedback immediately after disclosure.
- Publishing aggregate numbers that reveal private situations in small groups.
- Collecting data nobody reviews.
- Keeping a metric because it flatters the institution.
- Using AI to summarize sensitive records without approval.
First-Year Targets
- Create one-page chapter and program review forms.
- Run quarterly learning meetings.
- Add annual learning note to the public report.
- Review the Institutional Scorecard after two quarters of use.
- Remove at least one metric that does not change decisions.
- Add evaluation consent language to program follow-up.
- Train hosts to ask feedback questions without pressure.
- Publish aggregate lessons without exposing private disclosures.
Sources Checked
- CDC, Program Evaluation Framework, updated August 20, 2024, accessed May 2026.
- CDC, Program Evaluation Framework Action Guide, 2026, accessed May 2026.
- OECD, Evaluation Criteria, accessed May 2026.
- OECD, Applying Evaluation Criteria Thoughtfully, accessed May 2026.
- Equitable Evaluation Initiative, Equitable Evaluation Framework, accessed May 2026.