Data Recap · Episode 04

Cookies That Taste Like Summer

ChatGPT posted its highest composite score in four episodes. Gemini won the room anyway. Someone put basil in a sugar cookie and two neighbors independently identified it — and both chose it as their favorite.

Tasters: 15
Data Points: 135
Composite Winner: ChatGPT
Room Winner: Gemini
Favorites: 7 / 15

The Scoreboard

ChatGPT led on composite score, texture, and smell, with smell its strongest individual metric in the series. Gemini led on taste and had the most favorite votes. Claude finished last or tied for last in every category for the second episode running.

Gemini — The Berry Patch
ChatGPT — The Overachiever
Claude — The Beach Day

Average scores by criterion

15 tasters · scored 1–10 on Taste, Texture, and Smell

Composite averages

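| Cookie | Taste | Texture | Smell | Favorites | Want Again | Composite |
|---|---|---|---|---|---|---|
| ChatGPT — The Overachiever | 7.27 | 8.07 | 8.15 | 6 / 15 | 67% | 7.83 |
| Gemini — The Berry Patch | 7.60 | 7.93 | 7.21 | 7 / 15 | 80% | 7.58 |
| Claude — The Beach Day | 7.13 | 7.53 | 7.21 | 2 / 15 | 60% | 7.29 |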
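
As a quick check on the table above, the composite column can be reproduced by averaging the three criterion columns, and the two "winners" fall out of the same data. This is a minimal Python sketch: the dictionary layout and the names `SCORES`, `composite`, and `published` are assumptions made for illustration, not the site's actual pipeline.

```python
# Criterion averages, favorites, and published composites copied from the
# Episode 4 scoreboard. The data layout and names are illustrative assumptions.
SCORES = {
    "ChatGPT": {"taste": 7.27, "texture": 8.07, "smell": 8.15, "favorites": 6, "published": 7.83},
    "Gemini":  {"taste": 7.60, "texture": 7.93, "smell": 7.21, "favorites": 7, "published": 7.58},
    "Claude":  {"taste": 7.13, "texture": 7.53, "smell": 7.21, "favorites": 2, "published": 7.29},
}

def composite(row):
    """Unweighted mean of the three scored criteria."""
    return (row["taste"] + row["texture"] + row["smell"]) / 3

recomputed = {name: round(composite(row), 2) for name, row in SCORES.items()}
composite_winner = max(SCORES, key=lambda name: composite(SCORES[name]))
room_winner = max(SCORES, key=lambda name: SCORES[name]["favorites"])

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.2f}, published {SCORES[name]['published']:.2f}")
print("Composite winner:", composite_winner)
print("Room winner:", room_winner)
```

The recomputed values are compared against the published column rather than assumed equal, since the published figures may have been rounded from unrounded per-taster data.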

The Math and the Room Disagreed

ChatGPT led the composite and two of the three scored criteria. Gemini led on every democratic measure: favorites, want-again rate, and per-taster wins. This is the second episode where composite score and favorite votes pointed in different directions.

Composite Score: 7.83 (ChatGPT)
Favorites: 7/15, 47% (Gemini)
Want Again: 80% (Gemini)
Per-Taster Wins: 6/15 (Gemini)

Favorites & Want-Again Rate

The cookie that scored highest on paper did not get the most favorites or the highest want-again rate.

Per-Taster Breakdown

Gemini won 6 individual matchups, ChatGPT won 5, and Claude won 2. Two tasters, Tasters 7 and 15, produced mathematical ties, and the two picked different favorites.

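| Taster | Age | Gender | Winner | Top composite |
|---|---|---|---|---|
| T1 | 11 | M | ChatGPT | 10.000 |
| T2 | 15 | F | Gemini | 9.000 |
| T3 | 19 | M | Gemini | 8.667 |
| T4 | 45 | F | Gemini | 9.333 |
| T5 | 19 | F | ChatGPT | 7.667 |
| T6 | 50 | M | Claude | 6.667 |
| T7 | 47 | M | Tie | 7.333 |
| T8 | 47 | M | Gemini | 9.000 |
| T9 | 78 | M | Gemini | 8.000 |
| T10 | 74 | F | Gemini | 5.667 |
| T11 | 35 | F | ChatGPT | 9.667 |
| T12 | 55 | F | Claude | 9.333 |
| T13 | 53 | F | ChatGPT | 8.333 |
| T14 | 23 | F | ChatGPT | 9.000 |
| T15 | 55 | M | Tie | 9.667 |
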
Gemini wins (6)
ChatGPT wins (5)
Claude wins (2)
Tied (2)

Composite = average of taste, texture, and smell. Where smell was not recorded, average of available scores used.
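
The composite rule in the note above, including the fallback when smell was not recorded, can be sketched in a few lines of Python. The function names `composite` and `winner` are hypothetical, and the tie handling is an assumption about how "mathematical ties" were detected, not a description of the production spreadsheet.

```python
def composite(taste, texture, smell=None):
    """Mean of the available scores; smell is skipped when it was not recorded."""
    available = [score for score in (taste, texture, smell) if score is not None]
    return sum(available) / len(available)

def winner(composites, tol=1e-9):
    """Return the top-scoring cookie, or "TIE" if two or more share the top score."""
    best = max(composites.values())
    leaders = [name for name, score in composites.items() if abs(score - best) <= tol]
    return leaders[0] if len(leaders) == 1 else "TIE"

# Taster 14's ChatGPT scores as reported in this recap: taste 7, texture 10, smell 10.
t14_chatgpt = composite(7, 10, 10)
print(round(t14_chatgpt, 3))  # 9.0

# Hypothetical scorecard (not from the episode data) with smell left blank.
two_score = composite(8, 9)
print(two_score)  # 8.5

# Values consistent with Taster 7's row (tie at 7.333, Claude at 5.00).
# Which two cookies tied is inferred here, not stated in the recap.
t7_result = winner({"ChatGPT": 7.333, "Gemini": 7.333, "Claude": 5.0})
print(t7_result)  # TIE
```

One consequence of the fallback: a cookie scored on only two criteria is averaged over two numbers, so composites from incomplete scorecards are not strictly comparable with three-score composites.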

The Basil and the Snickerdoodle

Two independent signals emerged from the scorecard language — one for ChatGPT's cookie and one for Claude's. Both revealed something about how the recipes were actually being experienced.

The Basil Signal — ChatGPT

ChatGPT's recipe included 6g of fresh basil. Two tasters independently identified it as the main flavor.

TASTER 5 · Age 19 · Female
"Fascinating — strawberry & basil"
ChatGPT taste: 7/10 · Texture: 8/10 · Chose ChatGPT as favorite. The only taster to use the word "intriguing" as a descriptor.
TASTER 14 · Age 23 · Female
"Basil"
ChatGPT taste: 7/10 · Texture: 10/10 · Smell: 10/10 · Chose ChatGPT as favorite. Described the cookie as "interesting."

What this means: Both tasters who detected the basil chose ChatGPT as their favorite. The ingredient that was the riskiest creative choice turned out to be the strongest converting factor for the tasters who caught it. ChatGPT's cookie rewarded attentive palates.

The Snickerdoodle Signal — Claude

Two tasters independently described Claude's cookie as "snickerdoodle" — a cinnamon cookie. Claude's recipe contains no cinnamon.

TASTER 7 · Age 47 · Male
"Snickerdoodle"
Claude score: 5.00 composite · Did not want another · Chose Gemini as favorite.
TASTER 8 · Age 47 · Male
"Snickerdoodle"
Claude score: 8.50 composite · Did want another · Chose Gemini as favorite. Different outcome, same word.

What this means: The issue was visual, not flavor. Claude's turbinado sugar coating produces a crackled golden crust that looks like a snickerdoodle before the first bite. Tasters arrived expecting cinnamon and found citrus and coconut instead. A cookie that looks like one thing and tastes like another sets up a mismatch the flavor has to overcome — and in a blind tasting, first impressions are everything.

The Language of Each Cookie

Gemini's words were clear and sensory. ChatGPT's split between the curious and the confused. Claude's coconut and turbinado profile produced the widest range of descriptors of any cookie this episode.

Gemini — The Berry Patch · Descriptors
Delicious, A-mazing, Muffin, Muffin top/chewy, Moist/Fresh, Fresh, Balanced, Lemony, Blueberry, Lemon-Blue, Thick, Unexpected, Doughy, Disappointing, Bland
ChatGPT — The Overachiever · Descriptors
Sweet, Sweet, Intriguing, Ambitious, Lemonade, Bright, Interesting, Appealing, Fruity, Tangy, Good, Yummy, Moist, Weird, Off
Claude — The Beach Day · Descriptors
Snickerdoodle, Snickerdoodle, Sweet, Sweet, Crunchy, Crunchy, Lovely, Beachy, Lemon-Coconut, Sugar Cookie, Savory, Salty, Dry, Average, Flat
Gemini · Main Flavor Identified
Blueberry ×8, Berry ×2, Marionberry, Lemon-Blue, Sugar Cookie Dough, Vanilla, Dough
ChatGPT · Main Flavor Identified
Strawberry ×3, Lemon ×3, Basil ×2, Raspberry, Grapefruit, Cranberry, Fruit Tart, Something I Don't Like, Sweet, Frosting/Lemon/Citrus
Claude · Main Flavor Identified
Coconut ×6, Citrus/Lemon ×3, Brown Sugar, Butter, Lemon & Salt, Sugar Cookie, Lightly Greasy, Something I Don't Like

Highlighted chips = appeared 2+ times or notable. Nine out of fifteen tasters identified blueberry as Gemini's main flavor — the strongest single-ingredient identification of any cookie in any episode.

Who Was Tasting

15 tasters. Ages 11–78. 8 female, 7 male. The age groups split perfectly — five tasters each in Under 26, 26–50, and 51+. The demographic story this episode was about coconut.
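
The demographic split can be checked directly from the ages and genders listed in the per-taster breakdown. The sketch below is illustrative only: the `TASTERS` list is transcribed from that breakdown, and the `age_group` helper and its cutoffs are assumptions based on the group labels used in this recap.

```python
from collections import Counter

# (age, gender) for T1 through T15, read from the per-taster breakdown.
TASTERS = [
    (11, "M"), (15, "F"), (19, "M"), (45, "F"), (19, "F"),
    (50, "M"), (47, "M"), (47, "M"), (78, "M"), (74, "F"),
    (35, "F"), (55, "F"), (53, "F"), (23, "F"), (55, "M"),
]

def age_group(age):
    """Bucket an age into the three groups used in this recap."""
    if age < 26:
        return "Under 26"
    if age <= 50:
        return "26–50"
    return "51+"

by_group = Counter(age_group(age) for age, _ in TASTERS)
by_gender = Counter(gender for _, gender in TASTERS)
ages = [age for age, _ in TASTERS]

print(dict(by_group))
print(dict(by_gender))
print("Age range:", min(ages), "to", max(ages))
```

The same bucketing function could be reused to group favorites or want-again votes by age, given the per-taster votes, which this recap reports only in aggregate.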

Favorites by gender

Male tasters chose Gemini at a 71% rate — 5 of 7. Claude received zero male favorites.

Favorites by age group

Under-26 gave Claude zero favorites and zero want-anothers. 26–50 gave Gemini 100% want-again.

Want-again by gender

Female tasters had Gemini and ChatGPT tied at 75%. Male want-again favored Gemini (86%).

Want-again by age group

26–50 and 51+ both gave Gemini 100%. The 51+ group gave Claude 100% as well — its strongest demographic result.

The Series Arc

Each AI has now won the composite at least once. Claude won Episodes 1 and 2. Gemini won Episode 3. ChatGPT leads composite in Episode 4 for the first time. Gemini is the only AI that has never finished last.

Composite score by episode

Episodes 1–4 · Note: Episode 1 used a 5-metric composite (taste, chocolate, texture, toffee, smell). Episodes 2–4 use 3 metrics (taste, texture, smell).

Want-again rate by episode

Gemini has been at or above 70% every episode. ChatGPT's worst result was 40% in Episode 1; Episode 4 is 67%. Claude's range has been the widest of any AI — 90% in Episode 1, 54% in Episode 3.

Episode rankings — composite score

| Episode | Brief | 1st | 2nd | 3rd |
|---|---|---|---|---|
| Ep 1 | Toffee-forward chocolate chip | Claude 7.38 | Gemini 6.84 | ChatGPT 6.20 |
| Ep 2 | Classic chocolate chip | Claude 7.73 | Gemini 7.50 | ChatGPT 7.41 |
| Ep 3 | Umami-forward chocolate chip | Gemini 7.64 | ChatGPT 7.08 | Claude 6.26 |
| Ep 4 | Cookies that taste like summer | ChatGPT 7.83 | Gemini 7.58 | Claude 7.29 |
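
The series-level claims can be read straight off the rankings table. As a rough illustration, the Python sketch below encodes each episode's finishing order and derives who has won, who has never finished last, and how many distinct finishing orders have appeared. The `RANKINGS` structure and variable names are assumptions for this example.

```python
# Composite finishing order per episode (1st, 2nd, 3rd), copied from the table above.
RANKINGS = {
    1: ["Claude", "Gemini", "ChatGPT"],
    2: ["Claude", "Gemini", "ChatGPT"],
    3: ["Gemini", "ChatGPT", "Claude"],
    4: ["ChatGPT", "Gemini", "Claude"],
}

ais = set(RANKINGS[1])
winners = {order[0] for order in RANKINGS.values()}
last_place = {order[-1] for order in RANKINGS.values()}
never_last = ais - last_place
distinct_orders = {tuple(order) for order in RANKINGS.values()}

print("Won at least once:", sorted(winners))
print("Never finished last:", sorted(never_last))
print("Distinct finishing orders:", len(distinct_orders), "of 6 possible")
```

With three AIs there are six possible finishing orders, so four episodes cannot cover all of them. What the table does show is that all three AIs have now finished first at least once.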

Episode 3 Data Error

What happened

In the Episode 3 video, ChatGPT's want-again rate was stated as 77%. After returning to the raw scorecards, the correct count is 62% — 8 of 13 tasters said yes to another ChatGPT cookie, not 10. The error appears to have been a counting mistake during production.
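
The arithmetic behind the correction is simple enough to verify. In this small sketch, the helper name `want_again_rate` is hypothetical and the counts are the ones given in the paragraph above.

```python
def want_again_rate(yes_count, tasters):
    """Share of tasters who wanted another cookie, as a whole-number percent."""
    return round(100 * yes_count / tasters)

# Episode 3, ChatGPT: 13 tasters, per the correction above.
stated = want_again_rate(10, 13)     # the miscounted figure from the video
corrected = want_again_rate(8, 13)   # the count from the raw scorecards

print(f"Stated: {stated}%  Corrected: {corrected}%")
```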

What it changes

The Episode 3 result and rankings are unaffected. Gemini still won by a substantial margin on all measures. ChatGPT's want-again rate at 62% is still a majority, still ahead of Claude's 54%, and still reflects the same relative standing. The corrected number appears in all charts and comparisons on this site from this point forward.

A note on Episode 4 data

Gemini's want-again rate was stated as 87% in this episode's video. The correct count from the raw scorecards is 80% — 12 of 15 tasters said yes. This was corrected on screen. The website reflects the accurate figure throughout.

Watch the Episode

See the Full Tasting

The complete blind tasting, the basil reveal, the snickerdoodle visual confusion, the composite vs. room split, and the four-episode series arc — all on camera.
