About this tool

r/dreams insights is an exploratory tool for searching a large corpus of self-reported dreams scraped from Reddit and seeing which topics a search term over- or under-indexes on. Each dream is tagged with BERTopic topics and Leiden topic communities.

How it works

  1. Dreams are extracted from Reddit posts and embedded with a sentence-level transformer (dreams-sentence-v1).
  2. BERTopic assigns each dream sentence to one of ~10k topics; each dream keeps at most 20 topic tags, ranked by how many of its sentences fall in each topic.
  3. Topics are grouped into ~170 topic communities via Leiden clustering on a cross-topic co-occurrence graph (NPMI-weighted).
  4. When you search for a term, we find the dreams whose text contains it and compare their topic/community distribution against a fixed 10,000-dream random baseline, using Fisher's exact test with Bonferroni correction.
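
Step 4 can be illustrated with a minimal sketch. It assumes each label is tested on its own 2x2 table (dreams with and without the label, among the search hits and among the baseline) and a significance level of 0.05. Neither detail is stated above. The function names and the toy counts are hypothetical, and the test is written against the Python standard library rather than whatever the tool itself uses.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table sharing the observed margins.
    probs = {
        x: comb(row1, x) * comb(row2, col1 - x) / denom
        for x in range(lo, hi + 1)
    }
    p_obs = probs[a]
    # Sum the tables that are no more likely than the observed one.
    p = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def over_under_index(hit_counts, n_hits, base_counts, n_base, alpha=0.05):
    """Compare label frequency in search hits against the baseline sample."""
    labels = set(hit_counts) | set(base_counts)
    n_tests = len(labels)  # in the real tool: ~170 communities or ~10k topics
    out = {}
    for label in sorted(labels):
        a = hit_counts.get(label, 0)   # hits carrying the label
        c = base_counts.get(label, 0)  # baseline dreams carrying the label
        p = fisher_exact_two_sided(a, n_hits - a, c, n_base - c)
        p_adj = min(1.0, p * n_tests)  # Bonferroni: scale p by the test count
        ratio = (a / n_hits) / (c / n_base) if c else float("inf")
        out[label] = {
            "ratio": ratio,
            "p_adj": p_adj,
            "significant": p_adj < alpha,
        }
    return out


# Toy example (made-up counts): 100 matching dreams vs. a 10,000-dream baseline.
results = over_under_index(
    hit_counts={"falling": 40, "school": 5},
    n_hits=100,
    base_counts={"falling": 500, "school": 500},
    n_base=10_000,
)
for label, row in results.items():
    print(label, row)
```

A ratio above 1 means the label is over-indexed among the search hits, and below 1 means it is under-indexed. The Bonferroni step multiplies each raw p-value by the number of labels tested, which is why the number of tests matters so much for what ends up flagged as significant.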

Label quality (v1 review)

All topic and community labels were generated by Gemini 2.5 Flash Lite in Phase 4. A random sample was manually reviewed:

Both label types are shipped as v1. Known misses:

Data freshness

v1 caveats

The baseline is a fixed 10,000-dream uniform random sample across the full corpus, with no adjustment for date or subreddit confounds. If your term happens to be concentrated in one subreddit or one time window, the over/under-indexing will reflect that concentration, not anything specific to the term itself. Use the Communities tab (default) for the view corrected across only ~170 tests; the Topics tab is Bonferroni-corrected across ~10k tests, which is a much harsher threshold.
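
To make the difference between the two tabs concrete, here is a small worked example of the per-test Bonferroni thresholds. It assumes a significance level of 0.05, which is not stated above, and uses the approximate test counts of 170 communities and 10,000 topics. The function name is hypothetical.

```python
ALPHA = 0.05  # assumed significance level; not stated in the text


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff after Bonferroni correction."""
    return alpha / n_tests


communities = bonferroni_threshold(ALPHA, 170)   # ~170 topic communities
topics = bonferroni_threshold(ALPHA, 10_000)     # ~10k topics

print(f"communities: raw p must be below {communities:.2e}")
print(f"topics:      raw p must be below {topics:.2e}")
print(f"topics cutoff is about {communities / topics:.0f}x stricter")
```

Under these assumptions, a raw p-value that passes comfortably on the Communities tab can still miss the Topics cutoff, so an empty Topics result does not by itself mean there is no effect.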