SIOP 2026 — Topic Dashboard

What is this dashboard?

This is an exploratory look at the SIOP 2026 conference programme. It is built from two pieces of public information for each session: the session title and the number of people who added that session to their personal Whova agenda ("agenda-adds"). Sessions were tagged with one or more scientific topic labels by an LLM running an active-learning loop, then placed on a 2D map by embedding each topic name with a local multilingual sentence-embedding model (gte-multilingual-base) and projecting the embeddings to two dimensions with classical MDS on cosine distance. Topics that are semantically similar end up near each other.

Important caveats. Agenda-adds are a proxy for interest, not a measure of attendance, session quality, or scientific importance. They tell us which sessions people thought were worth bookmarking before the conference — nothing more. Topics are not mutually exclusive, so totals double-count multi-topic sessions. Poster sessions are excluded from this dashboard: each Whova "Posters" listing aggregates many independent posters under one entry, so its agenda-add count isn't comparable to single-presentation sessions. Treat everything here as exploratory.

The three tabs

1. Topic map

Each label is one of 30 topics that emerged from the active-learning tagging loop. Position reflects semantic similarity (closer = more related in meaning). The dropdown "Size by" controls what drives the size of the labels and dots (no colour coding — only size encodes value):

Median session residual log (default)	Median of session residuals after adjusting for day and time slot. Bigger label = more over-indexed (more interest than expected); smaller label = more under-indexed. Median is used (rather than mean) so a single outlier session can't dominate a topic's score.
Frequency across sessions	How many sessions carry this topic.
Agenda-adds across sessions	Sum of agenda-add counts across the topic's sessions. Not unique people.
Median agenda-adds per session	Median number of adds among the topic's sessions — robust to outliers.
No scaling	Uniform font size. Use to read the layout itself without size influence.

Click any topic — or a combination — to filter. Selected topics highlight in orange; the right-hand panel shows the sessions that match all selected topics, sorted by agenda-adds. Click again or use the pills to remove. Hovering any label or dot shows a tooltip with the underlying numbers.

2. Sessions table

One row per session. You can search by title, multi-select tag filters (sessions must contain every chosen tag), sort any column by clicking its header, and download the current filtered & sorted view as CSV.
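The filter rules above can be sketched in a few lines. This is an illustration only, not the dashboard's actual code: the field names (`title`, `topics`, `adds`) and both function names are assumptions made for the example.

```python
import csv
import io

def filter_sessions(sessions, query="", required_tags=(), sort_by="adds", descending=True):
    """Keep sessions whose title contains `query` and that carry EVERY required tag."""
    q = query.lower()
    need = set(required_tags)
    rows = [
        s for s in sessions
        if q in s["title"].lower() and need <= set(s["topics"])  # subset test = AND filter
    ]
    return sorted(rows, key=lambda s: s[sort_by], reverse=descending)

def to_csv(rows, columns=("title", "topics", "adds")):
    """Serialise the current filtered and sorted view as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    for s in rows:
        writer.writerow(["; ".join(s[c]) if c == "topics" else s[c] for c in columns])
    return buf.getvalue()
```

Choosing more tags can only shrink the result, because each session has to satisfy all of them at once.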

The last column, "session attention ratio", is the session's agenda-adds divided by what the baseline model would predict for a session held on the same day, in the same time slot. The number is colour-coded for quick scanning (deep red = strongly over-indexed, deep blue = strongly under-indexed); this is the only place on the dashboard where colour encodes a value. At the topic level (used for size on the map), the equivalent figure is the median of the residuals across the topic's sessions, exponentiated.

1.00 ≈ exactly as expected.
> 1 = more agenda-adds than expected (e.g. 1.50 = 50% more).
< 1 = fewer than expected (e.g. 0.70 = 30% fewer).

3. About / How to use

This page.

Methodology in plain language

How topics were created

Starting from two seed topics ("Artificial Intelligence", "Well-being"), an LLM (Gemini Flash) iterated through every session, tagged each with all relevant topics, then proposed new candidate topics from sessions that had no match. The loop stopped when no useful new topics were produced. The vocabulary was then filtered to keep scientific subject areas and drop session-format labels (Panel, Workshop, Poster, etc.). Final vocabulary: 30 topics. After excluding the 17 Whova "Posters" listings (which aggregate many independent posters under a single entry) and 2 placeholder "Friday Seminars" listings (no-topic umbrellas), the dashboard covers 473 single-presentation sessions.
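The shape of that loop can be sketched as follows. This is a minimal illustration and not the pipeline that was actually run: `tag_session` and `propose_topics` stand in for the LLM calls, the two toy functions at the bottom replace them with naive string rules, and `FORMAT_LABELS` lists only the three format labels named above.

```python
FORMAT_LABELS = {"Panel", "Workshop", "Poster"}  # partial list; the real filter was broader

def build_vocabulary(titles, seeds, tag_session, propose_topics, max_rounds=10):
    """Grow a topic vocabulary until no new topics are proposed, then tag every title."""
    vocab = list(seeds)
    for _ in range(max_rounds):
        # Sessions that match no current topic drive the next round of proposals.
        unmatched = [t for t in titles if not tag_session(t, vocab)]
        candidates = propose_topics(unmatched, vocab) if unmatched else []
        new = []
        for c in candidates:
            if c not in vocab and c not in new:
                new.append(c)
        if not new:
            break  # stopping rule: nothing new was proposed
        vocab.extend(new)
    # Drop session-format labels, then tag against the final vocabulary.
    kept = [t for t in vocab if t not in FORMAT_LABELS]
    tags = {title: tag_session(title, kept) for title in titles}
    return kept, tags

# Toy stand-ins for the LLM (hypothetical, for illustration only).
def keyword_tagger(title, vocab):
    return [t for t in vocab if t.lower() in title.lower()]

def first_word_proposer(unmatched, vocab):
    return [title.split()[0] for title in unmatched]

titles = [
    "Artificial Intelligence in hiring",
    "Well-being at work",
    "Leadership development pipelines",
    "Panel on remote teams",
]
vocab, tags = build_vocabulary(
    titles, ["Artificial Intelligence", "Well-being"], keyword_tagger, first_word_proposer
)
print(vocab)
```

With these toy rules the loop adds "Leadership" and "Panel" in the first round, stops in the second, and the format filter then removes "Panel", which leaves the last title untagged.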

How the map is laid out

Each topic name was passed through a local multilingual embedding model (gte-multilingual-base), producing a 768-dimensional vector that captures meaning. The vectors were reduced to 2D with classical multidimensional scaling (MDS) on cosine distance, which tries to preserve every pairwise distance, so a topic's neighbours on the map approximate its semantic neighbours in embedding space. Thirty points in 768 dimensions cannot be flattened to 2D without some distortion, so individual distances are approximate, but the global structure (which topic clusters with which) is faithful and stable across re-runs.
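For readers who want to see the projection step, here is a minimal sketch of classical MDS on cosine distance using only the Python standard library. It is not the dashboard's code. The input vectors in any demo would be toy stand-ins for the real 768-dimensional embeddings, and the eigenvectors are found with plain power iteration, an assumption made to avoid a linear-algebra dependency; a real pipeline would more likely call a library eigensolver.

```python
import math
import random

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def top_eigenpairs(B, k, iters=500, seed=0):
    """Dominant eigenpairs of a symmetric matrix via power iteration with deflation.

    Caveat: power iteration finds the eigenvalue of largest magnitude, so this
    sketch assumes the leading eigenvalues of B are positive.
    """
    rng = random.Random(seed)
    n = len(B)
    B = [row[:] for row in B]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate so the next pass finds the next eigenpair
            for j in range(n):
                B[i][j] -= lam * v[i] * v[j]
    return pairs

def classical_mds(D, dims=2):
    """Map a symmetric distance matrix to `dims` coordinates per point."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double centring turns squared distances into an inner-product matrix.
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    pairs = top_eigenpairs(B, dims)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(n)]
```

Because the layout comes from an eigendecomposition of a fixed matrix, the same distances give the same map up to reflection or rotation, which is consistent with the stability across re-runs noted above.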

How "interest vs. expected" is computed

A simple linear regression predicts log(agenda-adds + 1) from the day and the time slot. The residual for each session is what's left over after that prediction — i.e. how much more (or less) interest the session attracted than its scheduling peers. attention_ratio = exp(residual) turns that into an intuitive multiplier (1.50 = 50% more than expected). Aggregated to the topic level, it identifies topics that over-index on attendee interest.
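As a rough illustration of that calculation, the sketch below fits an ordinary least-squares baseline on dummy-coded day and time slot, then exponentiates the residuals. It is a sketch under assumptions, not the dashboard's code: the additive dummy coding (no day-by-slot interaction), the field names `day`, `slot` and `adds`, and both function names are choices made for the example.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if A is singular (e.g. a day or slot with no sessions).
    """
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def attention_ratios(sessions):
    """Return (residuals, ratios) for sessions with 'day', 'slot' and 'adds' keys."""
    days = sorted({s["day"] for s in sessions})
    slots = sorted({s["slot"] for s in sessions})

    def features(s):
        # Intercept plus one dummy per day and per slot, first level as reference.
        return ([1.0]
                + [float(s["day"] == d) for d in days[1:]]
                + [float(s["slot"] == t) for t in slots[1:]])

    X = [features(s) for s in sessions]
    y = [math.log(s["adds"] + 1) for s in sessions]
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    residuals = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    return residuals, [math.exp(e) for e in residuals]
```

Because the model is fitted to log(agenda-adds + 1), exp(residual) is strictly the ratio of (adds + 1) to the predicted (adds + 1); for sessions with more than a handful of adds the difference is negligible.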

The residual adjusts only for day and time slot. It cannot rule out other scheduling effects (such as competing sessions in the same slot), "must-see" speakers, or upstream selection bias in the programme. Don't read it as scientific importance.

How to use this

Find over-indexed topics — leave "Size by" on its default Median session residual log and look for the biggest labels: those topics attracted more interest than expected given when their sessions were scheduled. The smallest labels in that view are the most under-indexed.
Find core areas — switch to Frequency across sessions: where the programme spent its real estate.
Find broad-appeal sessions — switch to Agenda-adds across sessions for raw audience footprint.
Compare apples to apples — switch to Median agenda-adds per session so a few mega-sessions don't dominate.
Drill into a topic — click it on the map, read the matching sessions on the right, then jump to the Sessions table to filter / sort / download.
Combine topics — click two or more (e.g. "Artificial Intelligence" + "Well-being") to find sessions sitting at their intersection.

What the numbers mean (cheat sheet)

Sessions tagged — count of sessions a topic appears in.
Agenda-adds (sum) — total adds across those sessions. Not unique people.
Median per session — typical-session interest level for the topic.
Median session residual (log) — typical leftover after the day/time baseline, taken across the topic's sessions. 0 ≈ baseline, > 0 = over-indexed, < 0 = under-indexed. Median is used so one outlier session can't dominate.
Time-adjusted attention ratio — the median residual exponentiated. 1.30 = 30% more interest than expected for a typical session under that topic.
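
The five figures above can be reproduced from session-level data along these lines. This is a sketch only: the field names (`topics`, `adds`, `residual`) and the function name are assumptions for the example, and each session is assumed to already carry its residual from the day/time baseline.

```python
import math
from collections import defaultdict
from statistics import median

def topic_scores(sessions):
    """Aggregate session-level numbers into the per-topic cheat-sheet figures."""
    by_topic = defaultdict(list)
    for s in sessions:
        for t in s["topics"]:  # a multi-topic session is counted under each of its topics
            by_topic[t].append(s)
    scores = {}
    for topic, rows in by_topic.items():
        med_resid = median(r["residual"] for r in rows)
        scores[topic] = {
            "sessions_tagged": len(rows),
            "adds_sum": sum(r["adds"] for r in rows),
            "median_adds": median(r["adds"] for r in rows),
            "median_residual": med_resid,
            "attention_ratio": math.exp(med_resid),
        }
    return scores
```

Taking the median before exponentiating is what keeps a single runaway session from inflating a topic's ratio. Summing `adds_sum` across topics will exceed the true total whenever sessions carry more than one topic.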

Known limitations

Title text from the source CSV had encoding loss upstream; smart apostrophes were recovered where unambiguous, but some em-dashes and ellipses were dropped.
Agenda-adds are not unique people. One person who adds three sessions counts three times.
The baseline model is intentionally simple. It does not control for session format, room size, speaker prominence, or competing sessions in the same slot.
Topics with very few sessions (n < 5) can swing wildly; treat their ratios with caution.
This is exploratory, not causal.