Shared Nature, Unique Nurture: PRISM for Pluralistic LLMs via Dynamic Epistemic Graphs

Guancheng Tu1, Shiyang Zhang2, Tianyu Zhang3, Yi Zhang4, Diji Yang4
1 University of Pennsylvania, 2 Yale University, 3 MILA, 4 UC Santa Cruz
Paper | Code

TL;DR

New paradigm -- Epistemic Evolution.

We propose a new view of LLM reasoning: instead of repeatedly sampling from a fixed cognitive state, models evolve distinct epistemic trajectories during inference, breaking the Artificial Hivemind phenomenon.

Framework -- PRISM pipeline.

Given a query, PRISM performs dispersion-oriented wild search to gather heterogeneous evidence, organizes retrieved concepts into an on-the-fly Epistemic Graph, and conditions generation on this structured cognitive state to produce individualized reasoning paths.

Strong performance.

PRISM achieves state-of-the-art results across three benchmark domains.

On NoveltyBench, PRISM improves diversity with Distinct score gains of up to +45% (Qwen3-4B), including +29% on GPT-4o-mini.

On IdeaBench, PRISM boosts scientific hypothesis novelty, improving the Novelty Insight Score by up to +44% (GPT-4o-mini).

On RareBench rare-disease diagnosis, PRISM increases Recall@10 from 32.0% to 52.0% (+20 absolute points), substantially improving long-tail discovery.

Inference-time only.

PRISM requires no finetuning, retraining, or architectural modification. Diversity emerges purely from reorganizing reasoning at inference time.

Why PRISM Matters

The Artificial Hivemind

Modern LLMs appear diverse, yet their outputs increasingly converge toward similar reasoning patterns and conclusions. Across prompts, random seeds, and even different model families, generations collapse into the same high-probability responses—a semantic-level mode collapse.

This is known as the Artificial Hivemind.

When models trained on shared data optimize toward similar likelihood objectives, diversity becomes stylistic rather than epistemic. When every model thinks the same, exploration stops.

Query: "Write a metaphor about time"

GPT-4o, Claude, Gemini, and Qwen3 all reply: "Time is a river."

Same answer, every model, every time.

Why This Matters for Research

Scientific progress depends on exploring alternative hypotheses. If reasoning collapses into a single consensus trajectory:

  • unconventional explanations are never proposed,
  • minority hypotheses disappear,
  • discovery becomes constrained by probability rather than insight.

True diversity means constructing different coherent reasoning paths, not sampling noise.
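One way to make the difference between collapse and diversity concrete is a lexical diversity proxy such as distinct-n: the fraction of unique n-grams across a set of generations. This is only a toy illustration (the Distinct score reported later on NoveltyBench is a different, semantics-aware metric); the example responses below are invented for demonstration.

```python
def distinct_n(responses, n=2):
    """Fraction of unique n-grams across a set of responses.

    Near 0 means the generations collapse onto the same phrasing;
    near 1 means each generation contributes new n-grams.
    """
    ngrams = []
    for r in responses:
        toks = r.lower().split()
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

# A "hivemind" set: five identical generations share every bigram.
hive = ["time is a river"] * 5
# A pluralistic set: each generation explores a different metaphor.
plural = ["time is a river",
          "time is a moth eating the present",
          "time is compound interest on attention",
          "time is a slow leak in a sealed room",
          "time is a museum curating itself"]

print(distinct_n(hive))    # low: collapsed outputs
print(distinct_n(plural))  # high: diverse outputs
```

Note that distinct-n only sees surface wording; two paraphrases of "time is a river" would score as diverse here, which is exactly why semantic-level metrics are needed to expose the Hivemind effect.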

Why This Matters for AI4Science

Many real-world problems are long-tail by nature:

  • rare disease diagnosis
  • biological hypothesis generation
  • materials discovery
  • open-ended research ideation

In these domains, conservative answers are often wrong. Exploration must remain grounded in evidence, logically consistent, yet epistemically diverse.

PRISM enables structured exploration without sacrificing reliability.

Epistemic Evolution: A Paradigm for Pluralistic Reasoning

Human cognition is shaped not only by shared knowledge but by unique experiences and internal interpretations. We formalize this process as Epistemic Evolution, a paradigm describing how reasoning diversity emerges through evolving cognitive states.

Reasoning unfolds through three stages:

Phase I: Experiencing (Exploration)

Individuals develop through exposure to diverse environments. Similarly, a reasoning system must encounter heterogeneous information. This phase prioritizes dispersion over relevance, simulating stochastic real-world experiences and expanding the cognitive search space.

Phase II: Cognitive Internalization (Exploitation)

Raw experience is noise until structured. Here, scattered observations are organized into a stable mental state, transforming transient evidence into structured cognitive relationships.

Phase III: Contextualized Expression (Generation)

The model generates responses conditioned on this individualized mental state. Outputs become synthesized perspectives rather than direct retrieval from memory. Each generation reflects a unique epistemic trajectory.

PRISM: Instantiating Epistemic Evolution in LLMs

PRISM operationalizes Epistemic Evolution through an inference-time pipeline. It modifies how reasoning unfolds, not model parameters.

1. Wild Search: stochastic lexical seeds are introduced to gather semantically diverse evidence.

2. Node Construction: retrieved concepts are transformed into Context Nodes and Spark Nodes.

3. Cognitive Operators: relationships between ideas are formed via analogy and conceptual blending.

4. Epistemic Graph: the resulting graph becomes structured conditioning context.

5. Generation: the base LLM generates responses along a unique reasoning trajectory.
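The five stages can be sketched as a single inference-time function. This is a minimal toy sketch, not the released implementation: the seed sampling, the all-pairs edge linkage, and the `llm_generate` callback are stand-ins for the actual Wild Search, cognitive operators, and base model described above.

```python
import random

def prism_answer(query, corpus, llm_generate, k_seeds=3, rng=None):
    """Toy sketch of the PRISM pipeline (names and heuristics are illustrative)."""
    rng = rng or random.Random()
    # 1. Wild Search: stochastic lexical seeds favor dispersion over relevance.
    vocab = sorted({w for doc in corpus for w in doc.split()})
    seeds = rng.sample(vocab, k_seeds)
    evidence = [doc for doc in corpus if any(s in doc for s in seeds)]
    # 2. Node Construction: evidence -> Context Nodes, seeds -> Spark Nodes.
    nodes = [("context", e) for e in evidence] + [("spark", s) for s in seeds]
    # 3./4. Cognitive Operators + Epistemic Graph: here a naive all-pairs
    # linkage stands in for analogy and conceptual blending between nodes.
    edges = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    graph = {"nodes": nodes, "edges": edges}
    # 5. Generation: condition the base LLM on the serialized cognitive state.
    context = "; ".join(f"{kind}:{value}" for kind, value in graph["nodes"])
    return llm_generate(f"Given this epistemic graph [{context}], answer: {query}")

# Usage with a stub "LLM" that just echoes its prompt:
corpus = ["time flows like water", "clocks measure decay", "memory bends duration"]
answer = prism_answer("What is time?", corpus, llm_generate=lambda p: p,
                      rng=random.Random(0))
print(answer)
```

Because the seeds are resampled per call, two calls with different random states condition generation on different graphs, which is the mechanism by which distinct epistemic trajectories emerge.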

  • Model-Agnostic: works with any LLM
  • Inference-Time: no retraining needed
  • Safety-Preserving: no weight modification
  • SOTA Results: across 3 benchmarks

Quantitative Results

PRISM achieves consistent improvements across creativity, scientific discovery, and diagnostic benchmarks -- demonstrating meaningful exploration, not random noise.

NoveltyBench (Distinct Score)

  Qwen3-4B: 3.09 -> 4.48 (+45%)
  CrPO-LLaMA-3.1: 7.35 -> 7.67 (+4%)
  GPT-4o-mini: 2.65 -> 3.41 (+29%)

IdeaBench (Novelty Insight Score)

  Qwen3-4B: 0.72 -> 0.96 (+33%)
  GPT-4o-mini: 0.45 -> 0.65 (+44%)

RareBench (Recall@10)

  Vanilla Model: 32.0% (baseline)
  Flat RAG: 28.0% (-13%)
  PRISM (Expert): 52.0% (+63%)
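Recall@K, the RareBench metric, is the fraction of gold diagnoses that appear among a system's top-K ranked candidates. A minimal sketch of the standard metric (not the benchmark's evaluation harness; the disease names are placeholders):

```python
def recall_at_k(ranked_predictions, gold_set, k=10):
    """Fraction of gold items found in the top-k ranked predictions."""
    top_k = set(ranked_predictions[:k])
    return len(top_k & set(gold_set)) / len(gold_set)

# Hypothetical ranked differential with two gold diagnoses:
preds = ["disease_a", "disease_b", "disease_c", "disease_d"]
print(recall_at_k(preds, {"disease_b", "disease_x"}, k=3))  # 0.5
```

For long-tail diagnosis this metric rewards surfacing a rare but correct hypothesis anywhere in the top K, which is exactly the behavior conservative, high-probability decoding suppresses.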

The similarity between Qwen3-Vanilla and Qwen3-PRISM (0.68) is lower than between Qwen3-Vanilla and GPT-Vanilla (0.78).

PRISM induces greater divergence than the inherent differences between major model families.
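A cross-model similarity figure like the 0.68 vs. 0.78 comparison can be computed as the mean pairwise cosine similarity between two sets of response embeddings. A self-contained sketch, using toy 2-d vectors in place of real sentence-encoder embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_cross_similarity(set_a, set_b):
    """Average cosine similarity over all cross-set embedding pairs."""
    sims = [cosine(u, v) for u in set_a for v in set_b]
    return sum(sims) / len(sims)

# Toy embeddings standing in for one model's vanilla vs. PRISM responses:
vanilla = [(1.0, 0.0), (0.0, 1.0)]
prism = [(1.0, 0.0)]
print(mean_cross_similarity(vanilla, prism))  # 0.5
```

A lower cross-set mean indicates the two systems occupy less overlapping regions of embedding space, which is how "PRISM diverges from its own base model more than model families diverge from each other" is quantified.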

Artificial Hivemind Experiments

PCA visualizations of response distributions under different prompts. Each figure compares the baseline generation (left) with PRISM (right), illustrating how our method promotes more diverse and multi-centered semantic structures.

In each comparison, the vanilla model's responses form a tight cluster, while PRISM produces diverse, multi-centered distributions across GPT-4o-mini, Claude, Gemini, and Qwen3.

Prompt: “Write a metaphor about time” — Vanilla Model (left) vs. PRISM System (right)
Prompt: “Name one meaning of life.” — Vanilla Model (left) vs. PRISM System (right)
Prompt: “How can I live on $1,000 per month?” — Vanilla Model (left) vs. PRISM System (right)
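Such panels can be reproduced by embedding each response with any sentence encoder and projecting to 2D with PCA. A NumPy-only sketch (the toy vectors below stand in for real response embeddings):

```python
import numpy as np

def pca_2d(embeddings):
    """Project response embeddings onto their top-2 principal components."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0)                       # center the cloud
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T                          # shape: (n_responses, 2)

# Toy embeddings: a collapsed "hivemind" cluster vs. a dispersed set.
collapsed = [[1.00, 1.00, 0.0], [1.01, 0.99, 0.0], [0.99, 1.01, 0.0]]
dispersed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(pca_2d(collapsed).std())  # tiny spread: points nearly coincide
print(pca_2d(dispersed).std())  # larger spread: multi-centered structure
```

The spread of the projected points is what the left/right panels contrast visually: vanilla generations collapse to one center, PRISM generations do not.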

Try PRISM Yourself

We are preparing an interactive demo where you can experience PRISM firsthand -- watch any LLM refract into a spectrum of diverse perspectives.