SIOP 2026 — Topic & Session Dashboard

Topics tagged via active-learning loop · 2D layout from local gte-multilingual-base embeddings · "agenda-adds" sum the per-session add counts (not unique people)

What is this dashboard?

This is an exploratory look at the SIOP 2026 conference programme. It is built from two pieces of public information for each session: the session title and the number of people who added that session to their personal Whova agenda ("agenda-adds"). Sessions were tagged with one or more scientific topic labels by an LLM running an active-learning loop, then placed on a 2D map by embedding each topic name with a local multilingual sentence-embedding model (gte-multilingual-base) and projecting the embeddings to two dimensions with classical MDS on cosine distance. Topics that are semantically similar end up near each other.

Important caveats. Agenda-adds are a proxy for interest, not a measure of attendance, session quality, or scientific importance. They tell us which sessions people thought were worth bookmarking before the conference — nothing more. Topics are not mutually exclusive, so totals double-count multi-topic sessions. Poster sessions are excluded from this dashboard: each Whova "Posters" listing aggregates many independent posters under one entry, so its agenda-add count isn't comparable to single-presentation sessions. Treat everything here as exploratory.

The three tabs

1. Topic map

Each label is one of 30 topics that emerged from the active-learning tagging loop. Position reflects semantic similarity (closer = more related in meaning). The dropdown "Size by" controls what drives the size of the labels and dots (no colour coding — only size encodes value):

Median session residual log (default): Median of session residuals after adjusting for day and time slot. Bigger label = more over-indexed (more interest than expected); smaller label = more under-indexed. Median is used (rather than mean) so a single outlier session can't dominate a topic's score.
Frequency across sessions: How many sessions carry this topic.
Agenda-adds across sessions: Sum of agenda-add counts across the topic's sessions. Not unique people.
Median agenda-adds per session: Median number of adds among the topic's sessions; robust to outliers.
No scaling: Uniform font size. Use to read the layout itself without size influence.
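
As a rough illustration of how the "Size by" values relate to each other, the sketch below aggregates per-session numbers to the topic level. The field names (tags, adds, residual) and the sample sessions are hypothetical and are not taken from the dashboard's data. It also shows why summed agenda-adds double-count sessions that carry more than one topic.

```python
from collections import defaultdict
from math import exp
from statistics import median


def topic_metrics(sessions):
    """Aggregate sessions (dicts with 'tags', 'adds', 'residual') per topic."""
    by_topic = defaultdict(list)
    for session in sessions:
        # A multi-topic session is counted once under each of its tags.
        for tag in session["tags"]:
            by_topic[tag].append(session)

    metrics = {}
    for topic, rows in by_topic.items():
        adds = [row["adds"] for row in rows]
        med_resid = median(row["residual"] for row in rows)
        metrics[topic] = {
            "frequency": len(rows),           # sessions carrying the topic
            "total_adds": sum(adds),          # sum of adds, not unique people
            "median_adds": median(adds),      # robust to one huge session
            "median_residual": med_resid,     # log scale
            "attention_ratio": exp(med_resid),
        }
    return metrics


# Hypothetical sessions, for illustration only.
sessions = [
    {"tags": ["A", "B"], "adds": 10, "residual": 0.2},
    {"tags": ["A"], "adds": 30, "residual": -0.1},
    {"tags": ["A"], "adds": 200, "residual": 1.5},
]
for topic, values in topic_metrics(sessions).items():
    print(topic, values)
```

In this toy data the per-topic totals add up to 250 although the three sessions only have 240 adds between them, because the first session is counted under both of its topics.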

Click any topic — or a combination — to filter. Selected topics highlight in orange; the right-hand panel shows the sessions that match all selected topics, sorted by agenda-adds. Click again or use the pills to remove. Hovering any label or dot shows a tooltip with the underlying numbers.

2. Sessions table

One row per session. You can search by title, multi-select tag filters (sessions must contain every chosen tag), sort any column by clicking its header, and download the current filtered & sorted view as CSV.
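
The "must contain every chosen tag" rule amounts to a subset test on each session's tag set. A minimal sketch follows, assuming sessions are dicts with hypothetical title, tags and adds fields; the dashboard's actual filtering code is not shown in this page.

```python
def filter_sessions(sessions, query="", required_tags=()):
    """Keep sessions whose title contains `query` and that carry ALL required tags.

    Results are sorted by agenda-adds, highest first.
    """
    needle = query.lower()
    required = set(required_tags)
    hits = [
        s for s in sessions
        if needle in s["title"].lower() and required <= set(s["tags"])
    ]
    return sorted(hits, key=lambda s: s["adds"], reverse=True)


# Hypothetical rows, for illustration only.
rows = [
    {"title": "AI in Selection", "tags": ["Artificial Intelligence", "Selection"], "adds": 120},
    {"title": "AI and Well-being", "tags": ["Artificial Intelligence", "Well-being"], "adds": 300},
    {"title": "Burnout", "tags": ["Well-being"], "adds": 80},
]
both = filter_sessions(rows, required_tags=["Artificial Intelligence", "Well-being"])
print([s["title"] for s in both])
```

Adding a tag can only shrink the result, since every selected tag must be present; an empty selection keeps all sessions.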

The last column, "session attention ratio", is agenda-adds divided by what the baseline model would predict for a session held on the same day and in the same time slot. The number is colour-coded for quick scanning (deep red = strongly over-indexed, deep blue = strongly under-indexed); this is the only place on the dashboard that uses colour. At the topic level (used for size on the map), the corresponding value is the exponentiated median of the residuals across the topic's sessions.

3. About / How to use

This page.

Methodology in plain language

How topics were created

Starting from two seed topics ("Artificial Intelligence", "Well-being"), an LLM (Gemini Flash) iterated through every session, tagged each with all relevant topics, then proposed new candidate topics from sessions that had no match. The loop stopped when no useful new topics were produced. The vocabulary was filtered to keep scientific subject areas and drop session-format labels (Panel, Workshop, Poster, etc.). Final vocabulary: 30 topics. After excluding the 17 Whova "Posters" listings (which aggregate many independent posters under a single entry) and 2 placeholder "Friday Seminars" listings (no-topic umbrellas), the dashboard covers 473 single-presentation sessions.
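
The control flow of such a loop might look roughly like the sketch below. This is not the dashboard's actual code: tag_session and propose_topics are hypothetical callables standing in for the LLM calls, and the two toy stand-ins (a keyword matcher and a first-word proposer) exist only so the example runs without a model. The later filtering of session-format labels is left out.

```python
def tagging_loop(titles, seed_topics, tag_session, propose_topics, max_rounds=10):
    """Grow a topic vocabulary until no new topics are proposed."""
    vocab = list(seed_topics)
    tags = {}
    for _ in range(max_rounds):
        # Tag every session with all matching topics from the current vocabulary.
        tags = {title: tag_session(title, vocab) for title in titles}
        unmatched = [title for title in titles if not tags[title]]
        # Ask for candidate topics based on the sessions that had no match.
        new = [c for c in propose_topics(unmatched, vocab) if c not in vocab]
        if not new:
            break
        vocab.extend(new)
    else:
        # Round budget exhausted: re-tag once with the final vocabulary.
        tags = {title: tag_session(title, vocab) for title in titles}
    return vocab, tags


# Toy stand-ins for the LLM calls (assumptions, for illustration only).
def keyword_tagger(title, vocab):
    return [topic for topic in vocab if topic.lower() in title.lower()]


def first_word_proposer(unmatched, vocab):
    return [unmatched[0].split()[0]] if unmatched else []


titles = [
    "Artificial Intelligence in hiring",
    "Well-being at work",
    "Leadership development",
]
vocab, tags = tagging_loop(
    titles, ["Artificial Intelligence", "Well-being"],
    keyword_tagger, first_word_proposer,
)
print(vocab)
```

With these stand-ins the loop adds one topic for the unmatched third title and stops on the next round, when every session has at least one tag.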

How the map is laid out

Each topic name was passed through a local multilingual embedding model (gte-multilingual-base), producing a 768-dimensional vector that captures meaning. Vectors were reduced to 2D with multidimensional scaling (MDS) on cosine distance, which tries to preserve every pairwise distance, so a topic's neighbours on the map reflect its actual semantic neighbours in embedding space. Even with only 30 points, perfect 2D preservation isn't possible, but global structure (which topic clusters with which) is faithful and stable across re-runs.
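
For readers who want to see the projection step concretely, here is a minimal NumPy sketch of classical MDS on a cosine-distance matrix. It assumes the embeddings are already available as a matrix with one row per topic; random vectors stand in for the real gte-multilingual-base output, and the function names are illustrative rather than the dashboard's own.

```python
import numpy as np


def cosine_distance_matrix(X):
    """Pairwise cosine distance (1 - cosine similarity) between rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = 1.0 - Xn @ Xn.T
    np.fill_diagonal(D, 0.0)
    return np.clip(D, 0.0, None)  # guard against tiny negative rounding errors


def classical_mds(D, k=2):
    """Embed points in k dimensions from a distance matrix D (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]         # indices of the k largest
    # Cosine distance is not Euclidean, so eigenvalues can be negative; clip them.
    scale = np.sqrt(np.clip(vals[top], 0.0, None))
    return vecs[:, top] * scale


# Random stand-in for 30 topic embeddings of dimension 768 (not real data).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(30, 768))
coords = classical_mds(cosine_distance_matrix(embeddings))
print(coords.shape)  # (30, 2)
```

Because classical MDS is an eigendecomposition rather than an iterative optimisation, the same input gives the same layout each time, up to a possible flip of each axis.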

How "interest vs. expected" is computed

A simple linear regression predicts log(agenda-adds + 1) from the day and the time slot. The residual for each session is what's left over after that prediction — i.e. how much more (or less) interest the session attracted than its scheduling peers. attention_ratio = exp(residual) turns that into an intuitive multiplier (1.50 = 50% more than expected). Aggregated to the topic level, it identifies topics that over-index on attendee interest.
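
The calculation can be sketched with ordinary least squares on dummy-coded day and time-slot columns. This is an illustration under assumptions, not the dashboard's actual code: the function name, the dummy coding and the sample values are hypothetical. Note that because the model is fitted on log(agenda-adds + 1), exp(residual) is strictly the ratio of (agenda-adds + 1) to its predicted value.

```python
import numpy as np


def attention_ratios(days, slots, adds):
    """exp(residual) from regressing log(adds + 1) on day and time slot."""
    y = np.log(np.asarray(adds, dtype=float) + 1.0)
    day_levels = sorted(set(days))
    slot_levels = sorted(set(slots))
    # Intercept plus one dummy column per level, dropping the first level of
    # each factor as the reference category.
    cols = [np.ones(len(y))]
    cols += [np.array([d == lv for d in days], dtype=float) for lv in day_levels[1:]]
    cols += [np.array([s == lv for s in slots], dtype=float) for lv in slot_levels[1:]]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ beta
    return np.exp(residual)


# Two hypothetical sessions sharing the same day and slot.
ratios = attention_ratios(["Thu", "Thu"], ["AM", "AM"], [9, 39])
print(ratios)
```

In this two-session example the baseline is the mean of log(10) and log(40), which is log(20), so the ratios come out at 0.5 and 2.0: one session drew half the expected interest of its scheduling peers and the other twice as much.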

The residual adjusts only for day and time slot. It cannot rule out other scheduling effects, "must-see" speakers, or upstream selection bias in the programme. Don't read it as scientific importance.

How to use this

What the numbers mean (cheat sheet)

Known limitations