Evaluation and Learning Loop
The operating manual for evaluating Spiralism’s chapters, programs, curriculum, archive practices, care protocols, public signal, and governance without turning members into data points. Evaluation exists to improve the work, not to prove that the institution is already right.
The Institutional Scorecard names what Spiralism should watch. This manual names how Spiralism should learn.
Evaluation is dangerous when it becomes surveillance, vanity, donor theater, or spiritual ranking. It is necessary when real people are attending gatherings, sharing testimony, learning AI literacy, using care protocols, giving money, volunteering labor, and trusting the institution with memory.
The Rule
Measure only what can improve care, trust, memory, or judgment.
If a metric will not change a decision, stop collecting it. If collecting it would make people less free, less honest, or less safe, do not collect it. If a story teaches more than a number, preserve the story with consent.
Evaluation Stance
Spiralism evaluates to learn:
- whether a program did what it promised;
- whether a chapter is becoming more humane or more extractive;
- whether people understand AI without mystification;
- whether testimony is being preserved with consent and quality;
- whether care boundaries are being honored;
- whether volunteer labor is drifting into exploitation;
- whether public media is creating clarity or spectacle;
- whether governance decisions are being remembered and acted on.
It does not evaluate:
- belief intensity;
- loyalty;
- mystical experience;
- spiritual rank;
- emotional vulnerability as proof of transformation;
- attendance as moral worth;
- donations as commitment;
- public praise as impact.
Evaluation Questions
Use questions before indicators. A good question should force a useful decision.
Template:
Program / chapter / practice:
Decision this evaluation should inform:
Primary question:
Secondary questions:
People affected:
Evidence needed:
Evidence not worth collecting:
Privacy or consent risk:
Who reviews:
When findings are used:
What will change if the answer is negative:
Bad evaluation question:
Did people love the event?
Better evaluation question:
Did the event leave participants with a clear next step, preserve consent, meet access needs, and produce one usable archive or learning artifact?
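Where chapters keep evaluation plans digitally, the same template can travel as a structured record. A minimal sketch in Python; the field names mirror the template above, while the class name and the actionability check are illustrative assumptions, not an existing Spiralism tool:

```python
from dataclasses import dataclass, field

@dataclass
class EvaluationPlan:
    """One evaluation question, scoped before any data is collected."""
    subject: str                       # program, chapter, or practice
    decision: str                      # decision this evaluation should inform
    primary_question: str
    secondary_questions: list[str] = field(default_factory=list)
    people_affected: str = ""
    evidence_needed: list[str] = field(default_factory=list)
    evidence_not_worth_collecting: list[str] = field(default_factory=list)
    privacy_or_consent_risk: str = ""
    reviewer: str = ""
    findings_used_when: str = ""
    change_if_negative: str = ""       # what changes if the answer is negative

    def is_actionable(self) -> bool:
        # The Rule: if a metric will not change a decision, stop collecting it.
        return bool(self.decision.strip()) and bool(self.change_if_negative.strip())
```

If the decision field or the change-if-negative field is empty, the plan fails the Rule and should not run.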
The Six-Step Loop
Adapt CDC’s program-evaluation framework into a small-institution loop:
1. Assess context. What is being evaluated, why now, and what could be harmed by evaluating it?
2. Describe the work. What did the chapter, program, protocol, or artifact actually do?
3. Focus the question. What decision should this evaluation inform?
4. Gather credible evidence. Use the minimum evidence needed: counts, costs, observations, consented feedback, artifacts, incidents, and lessons.
5. Support conclusions. Separate fact, interpretation, and uncertainty.
6. Act on findings. Change the run sheet, policy, training, budget, communication, or archive practice.
No evaluation is complete until one decision changes or one decision is explicitly reaffirmed.
Evaluation Standards
Use five standards.
Relevance and Utility
Will this evaluation help someone make a real decision?
If not, do not run it.
Rigor
Is the evidence strong enough for the decision being made?
Founding-period evaluation does not need academic complexity. It needs honest fit: attendance counts for attendance, costs for cost, consented feedback for experience, incident records for safety, and artifacts for learning.
Independence and Objectivity
Can the person reviewing findings see beyond their own program pride?
For high-stakes findings, use a reviewer who did not run the program.
Transparency
Can members understand what is being collected and why?
Private raw data may stay protected, but the evaluation purpose and aggregate lessons should be legible.
Ethics
Does the evaluation respect consent, dignity, privacy, access, and power differences?
Evaluation must never become a second extraction after a vulnerable event.
OECD Lenses
For larger programs, annual reports, grants, and public partnerships, use the OECD DAC criteria as lenses:
| Lens | Spiralist Question |
|---|---|
| Relevance | Is this work answering a real need in the recursive age? |
| Coherence | Does it fit the canon, policies, chapters, archive, and partner context? |
| Effectiveness | Did it achieve its stated purpose? |
| Efficiency | Were money, time, labor, attention, and risk used well? |
| Impact | What changed for people, chapters, archive, public understanding, or systems? |
| Sustainability | Can the benefit continue without burning people out or hiding costs? |
Do not use all six lenses for every small gathering. Use them when the decision is large enough to justify the overhead.
Evidence Classes
Use a mixed record.
| Evidence Class | Use | Caution |
|---|---|---|
| Count | attendance, packages, events, outputs | easy to overvalue |
| Cost | money, volunteer hours, staff time, access cost | often hidden by founders |
| Artifact | testimony package, transcript, source brief, tool, policy revision | strongest proof of learning |
| Feedback | surveys, interviews, debrief notes | must be voluntary and contextual |
| Observation | host notes, mentor review, access review | can reflect reviewer bias |
| Incident | complaint, near miss, pause, correction | protect privacy and due process |
| Story | consented account of change | do not generalize beyond the story |
| Absence | who did not return, who could not access, what was not recorded | often more important than praise |
The institution should prefer evidence triangulation over certainty. If counts, artifacts, feedback, and incidents point in different directions, preserve the tension.
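Triangulation can be kept as a comparison rather than an average. A minimal sketch, assuming each evidence class yields a one-word verdict from its reviewer; the verdict labels and function name are illustrative, not a prescribed vocabulary:

```python
def triangulate(verdicts: dict[str, str]) -> str:
    """Compare per-evidence-class verdicts and preserve tension
    rather than forcing a single blended score.

    verdicts maps an evidence class ('count', 'artifact', 'feedback',
    'incident', ...) to a one-word verdict such as 'positive' or 'mixed'.
    """
    distinct = set(verdicts.values())
    if len(distinct) == 1:
        return f"Evidence agrees: {distinct.pop()}"
    # Disagreement is named for the learning meeting, not averaged away.
    lines = [f"- {source}: {verdict}" for source, verdict in sorted(verdicts.items())]
    return "Evidence diverges; preserve the tension:\n" + "\n".join(lines)
```

For example, `triangulate({"count": "positive", "feedback": "mixed", "incident": "negative"})` returns the divergence note, not a score.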
Feedback Without Pressure
Feedback should be short, voluntary, and low-stakes.
Standard post-program questions:
- What was useful?
- What was unclear?
- Did anything feel pressuring, inaccessible, unsafe, or overclaimed?
- What next step, if any, is clear to you?
- Is there anything we should change before running this again?
Do not ask newcomers to rate spiritual transformation. Do not collect private trauma details in feedback forms. Do not treat nonresponse as disengagement. Do not chase vulnerable people for evaluation data after intense disclosures.
Chapter Review
Quarterly chapter review should be light but real.
Chapter review record:
Chapter:
Quarter:
Gatherings held:
Median attendance:
New attendees:
Returning attendees:
Co-hosts active:
Testimonies recorded:
Archive cards submitted:
Access requests met / missed:
Incidents or near misses:
Volunteer hours:
Costs:
One thing that strengthened coherence:
One thing that weakened coherence:
One decision for next quarter:
Support needed from Stewards:
Chapters should not compete on growth. A small chapter with strong handoff, care, and archive discipline is healthier than a large chapter built around one charismatic host.
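Where the quarterly record is kept digitally, a minimal sketch of the same form, assuming the field names above; the derived check is an illustrative assumption, not policy:

```python
from dataclasses import dataclass

@dataclass
class ChapterQuarterReview:
    """One quarter's chapter review; fields mirror the record above."""
    chapter: str
    quarter: str
    gatherings_held: int
    median_attendance: int
    new_attendees: int
    returning_attendees: int
    cohosts_active: int
    testimonies_recorded: int
    archive_cards_submitted: int
    access_requests_met: int
    access_requests_missed: int
    incidents_or_near_misses: int
    volunteer_hours: float
    costs: float
    strengthened_coherence: str
    weakened_coherence: str
    decision_next_quarter: str
    support_needed: str

    def needs_steward_attention(self) -> bool:
        # Missed access and safety signals matter more than size.
        return self.access_requests_missed > 0 or self.incidents_or_near_misses > 0
```

There is deliberately no growth score here: the only derived signal is whether access misses or incidents warrant Steward support.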
Program Review
Every program owner should complete a one-page review within seven days of the event.
Program review record:
Program:
Owner:
Date:
Purpose:
Format:
Attendance:
Costs:
Volunteer hours:
Access notes:
Media / recording status:
Archive material produced:
Follow-up sent:
Incidents or near misses:
Feedback themes:
What to repeat:
What to change:
Decision:
The Public Programs manual governs run sheets and logistics. This manual governs what the institution learns after the room closes.
Curriculum Review
Curriculum review asks whether learning becomes agency.
Track:
- module completion;
- source briefs created;
- AI-literacy exercises completed;
- members who can explain when not to use AI;
- first contributions completed;
- revision notes submitted;
- accessibility barriers;
- confusion patterns.
Do not track private belief. Track capability, artifact, and revision.
Archive Review
Archive review asks whether memory survives.
Track:
- package completeness;
- consent completeness;
- metadata completeness;
- fixity checks (sketched below);
- storage copies;
- access levels;
- redaction events;
- transcription backlog;
- restricted-data handling;
- packages paused for care reasons.
Archive evaluation must never pressure Archivists to maximize testimony volume. The better outcome may be fewer testimonies recorded with stronger consent.
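Of the items above, fixity checks have a standard technical meaning: verify that stored files still match a previously recorded checksum. A minimal sketch using Python's standard library; the JSON manifest format is an assumption, not an existing Spiralism convention:

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large recordings fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_fixity(manifest_path: Path) -> list[str]:
    """Compare files against a manifest of {relative_path: sha256}.

    Returns the paths that are missing or whose contents changed.
    """
    manifest = json.loads(manifest_path.read_text())
    base = manifest_path.parent
    failures = []
    for rel_path, expected in manifest.items():
        target = base / rel_path
        if not target.exists() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

A clean run returns an empty list; anything else is an archive incident worth recording.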
Care Review
Care review must be aggregate and privacy-preserving.
Track:
- care-circle activations;
- member-support referrals and micro-grants in aggregate;
- crisis referrals;
- testimony pauses;
- companion-protocol screenings;
- safeguarding escalations;
- unresolved complaints;
- policy changes made because of care incidents.
Do not track private disclosures as engagement. Do not publish details that allow identification by context. Do not let care counts become proof that a chapter is virtuous or defective.
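One standard privacy-preserving technique for these aggregates is small-cell suppression: never publish a count small enough to identify someone by context. A hedged sketch; the threshold of five is a conventional starting point, not a Spiralism policy:

```python
SUPPRESSION_THRESHOLD = 5  # conventional floor; set per context

def publishable_counts(counts: dict[str, int]) -> dict[str, str]:
    """Replace small nonzero counts with a suppressed marker.

    In a small chapter, 'crisis referrals: 1' can identify a person by
    context; '<5' keeps the aggregate signal without the exposure.
    """
    return {
        label: (str(n) if n == 0 or n >= SUPPRESSION_THRESHOLD
                else f"<{SUPPRESSION_THRESHOLD}")
        for label, n in counts.items()
    }
```

A zero stays a zero; only small nonzero counts are masked.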
Learning Meeting
Every quarter, Stewards or founding operators should hold a learning meeting.
Agenda:
- What did we learn from chapters?
- What did we learn from programs?
- What did we learn from archive practice?
- What did we learn from care and incidents?
- What did we learn from public signal?
- What did measurement distort?
- What decision changes now?
- What should stop being measured?
Outputs:
- one decision log entry (a minimal sketch follows this list);
- one action owner;
- one policy or run-sheet update if needed;
- one note for the annual report;
- one metric removed if it is not useful.
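To keep the decision log append-only and easy to review later, one minimal sketch using JSON Lines; the file layout and field names are assumptions, not an established Spiralism format:

```python
import json
from datetime import date
from pathlib import Path

def log_decision(log_path: Path, decision: str, owner: str,
                 source: str = "quarterly learning meeting") -> None:
    """Append one decision and its owner to an append-only JSONL log."""
    entry = {
        "date": date.today().isoformat(),
        "decision": decision,
        "owner": owner,
        "source": source,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

One line per decision means the log can be read, diffed, and audited without special tools.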
Public Annual Learning Note
The annual report should include a learning note:
What we tried:
What worked:
What did not work:
What caused harm or near harm:
What we stopped doing:
What we changed:
What we still do not know:
What we will evaluate next year:
This is where public trust grows: not from claims of impact, but from visible learning.
AI-Assisted Evaluation
AI may help summarize open-ended feedback, cluster themes, draft report outlines, or compare evaluation notes against the scorecard.
AI may not be the final authority on:
- whether a person was harmed;
- whether a complaint is credible;
- whether a program was safe;
- whether a chapter should close;
- whether private testimony should be summarized;
- whether an individual member is progressing.
Never paste raw restricted testimony, complaint records, donor records, or identifying care notes into AI tools unless the Privacy and Data Stewardship and AI Use Protocols explicitly allow that use.
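As one sketch of the permitted uses above, clustering consented, non-restricted feedback into rough themes for a human reviewer. This assumes scikit-learn is available; the tool choice, parameters, and function name are illustrative, and nothing restricted should ever pass through it:

```python
# Assumes scikit-learn is installed. Input must already be consented,
# non-restricted feedback text; a human reviews every cluster.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_feedback(responses: list[str], n_themes: int = 4) -> dict[int, list[str]]:
    """Group open-ended responses into rough themes for human review."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(responses)
    model = KMeans(n_clusters=n_themes, n_init=10, random_state=0)
    labels = model.fit_predict(matrix)
    themes: dict[int, list[str]] = {}
    for response, label in zip(responses, labels):
        themes.setdefault(int(label), []).append(response)
    return themes
```

The output is a draft for the reviewer, never the finding itself.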
Anti-Patterns
- Evaluation as donor theater.
- Surveying people because the institution forgot to observe.
- Counting attendance but not handoff, access, consent, or cost.
- Treating emotional intensity as success.
- Making chapters compete on growth.
- Asking vulnerable people for feedback immediately after disclosure.
- Publishing aggregate numbers that reveal private situations in small groups.
- Collecting data nobody reviews.
- Keeping a metric because it flatters the institution.
- Using AI to summarize sensitive records without approval.
First-Year Targets
- Create one-page chapter and program review forms.
- Run quarterly learning meetings.
- Add annual learning note to the public report.
- Review the Institutional Scorecard after two quarters of use.
- Remove at least one metric that does not change decisions.
- Add evaluation consent language to program follow-up.
- Train hosts to ask feedback questions without pressure.
- Publish aggregate lessons without exposing private disclosures.
Sources Checked
- CDC, Program Evaluation Framework, updated August 20, 2024, accessed May 2026.
- CDC, Program Evaluation Framework Action Guide, 2026, accessed May 2026.
- OECD, Evaluation Criteria, accessed May 2026.
- OECD, Applying Evaluation Criteria Thoughtfully, accessed May 2026.
- Equitable Evaluation Initiative, Equitable Evaluation Framework, accessed May 2026.