Data Recap · Episode 04

Cookies That Taste Like Summer

ChatGPT posted its highest composite score in four episodes. Gemini won the room anyway. Someone put basil in a sugar cookie and two neighbors independently identified it — and both chose it as their favorite.

Tasters: 15

Data Points: 135

Composite Winner: ChatGPT

Room Winner: Gemini

Favorites: 7 / 15 (Gemini)

Tasting Results

The Scoreboard

ChatGPT led on composite score, texture, and smell, with smell (8.15) its strongest individual metric in the series. Gemini led on taste and had the most favorite votes. Claude finished third on composite for the second episode running, placing third on taste and texture and tying Gemini on smell.

Gemini — The Berry Patch

ChatGPT — The Overachiever

Claude — The Beach Day

Average scores by criterion

15 tasters · scored 1–10 on Taste, Texture, and Smell

Composite averages

Cookie	Taste	Texture	Smell	Favorites	Want Again	Composite
ChatGPT — The Overachiever	7.27	8.07	8.15	6 / 15	67%	7.83
Gemini — The Berry Patch	7.60	7.93	7.21	7 / 15	80%	7.58
Claude — The Beach Day	7.13	7.53	7.21	2 / 15	60%	7.29

The Result

The Math and the Room Disagreed

ChatGPT led on composite score, texture, and smell. Gemini led on taste and on every democratic measure. This is the second episode where composite score and favorite votes pointed in different directions.

Composite Score: 7.83 (ChatGPT)

Favorites: 7/15 (Gemini, 47%)

Want Again: 80% (Gemini)

Per-Taster Wins: 6/15 (Gemini)

Favorites & Want-Again Rate

The cookie that scored highest on paper did not get the most favorites or the highest want-again rate.

Individual Results

Per-Taster Breakdown

Gemini won 6 individual matchups, ChatGPT won 5, and Claude won 2. Two tasters, Tasters 7 and 15, produced mathematical ties, and their favorite picks differed from each other.

Taster	Age	Gender	Per-taster winner	Top composite
T1	11	M	ChatGPT	10.000
T2	15	F	Gemini	9.000
T3	19	M	Gemini	8.667
T4	45	F	Gemini	9.333
T5	19	F	ChatGPT	7.667
T6	50	M	Claude	6.667
T7	47	M	Tie	7.333
T8	47	M	Gemini	9.000
T9	78	M	Gemini	8.000
T10	74	F	Gemini	5.667
T11	35	F	ChatGPT	9.667
T12	55	F	Claude	9.333
T13	53	F	ChatGPT	8.333
T14	23	F	ChatGPT	9.000
T15	55	M	Tie	9.667

Totals: Gemini wins (6) · ChatGPT wins (5) · Claude wins (2) · Tied (2)

Composite = average of taste, texture, and smell. Where smell was not recorded, the average of the available scores was used.
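The composite rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each score is an integer from 1 to 10 and an unrecorded smell score is stored as `None`; the function names `composite` and `taster_winner` are hypothetical and not taken from the show's own spreadsheet.

```python
from fractions import Fraction
from typing import Optional


def composite(taste: int, texture: int, smell: Optional[int] = None) -> Fraction:
    """Average of the recorded 1-10 scores; smell is skipped when it is None."""
    scores = [s for s in (taste, texture, smell) if s is not None]
    # Fraction keeps the average exact, so tied composites compare equal.
    return Fraction(sum(scores), len(scores))


def taster_winner(cards: dict[str, tuple[int, int, Optional[int]]]) -> str:
    """Cookie with the highest composite for one taster, or "Tie" if the top is shared."""
    composites = {cookie: composite(*scores) for cookie, scores in cards.items()}
    best = max(composites.values())
    leaders = [cookie for cookie, value in composites.items() if value == best]
    return leaders[0] if len(leaders) == 1 else "Tie"


# Taster 14's ChatGPT card from the recap: taste 7, texture 10, smell 10.
print(float(composite(7, 10, 10)))  # prints 9.0

# Hypothetical scorecard (invented numbers, not real data) that produces a tie.
print(taster_winner({"ChatGPT": (8, 7, 7), "Gemini": (7, 8, 7), "Claude": (6, 7, None)}))  # prints Tie
```

Keeping the average as an exact fraction is a design choice for the sketch: thirds such as 7.333 never quite equal each other in floating point unless computed identically, while fractions make tie detection unambiguous.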

Signal Moments

The Basil and the Snickerdoodle

Two independent signals emerged from the scorecard language — one for ChatGPT's cookie and one for Claude's. Both revealed something about how the recipes were actually being experienced.

The Basil Signal — ChatGPT

ChatGPT's recipe included 6g of fresh basil. Two tasters independently identified it as the main flavor.

TASTER 5 · Age 19 · Female

"Fascinating — strawberry & basil"

ChatGPT taste: 7/10 · Texture: 8/10 · Chose ChatGPT as favorite. The only taster to use the word "intriguing" as a descriptor.

TASTER 14 · Age 23 · Female

"Basil"

ChatGPT taste: 7/10 · Texture: 10/10 · Smell: 10/10 · Chose ChatGPT as favorite. Described the cookie as "interesting."

What this means: Both tasters who detected the basil chose ChatGPT as their favorite. The recipe's riskiest creative choice won over every taster who caught it, though that was only two people. ChatGPT's cookie rewarded attentive palates.

The Snickerdoodle Signal — Claude

Two tasters independently described Claude's cookie as "snickerdoodle" — a cinnamon cookie. Claude's recipe contains no cinnamon.

TASTER 7 · Age 47 · Male

"Snickerdoodle"

Claude score: 5.00 composite · Did not want another · Chose Gemini as favorite.

TASTER 8 · Age 47 · Male

"Snickerdoodle"

Claude score: 8.50 composite · Did want another · Chose Gemini as favorite. Different outcome, same word.

What this means: The issue was visual, not flavor. Claude's turbinado sugar coating produces a crackled golden crust that looks like a snickerdoodle before the first bite. Tasters arrived expecting cinnamon and found citrus and coconut instead. A cookie that looks like one thing and tastes like another sets up a mismatch the flavor has to overcome — and in a blind tasting, first impressions are everything.

Taster Language

The Language of Each Cookie

Gemini's words were clear and sensory. ChatGPT's split between the curious and the confused. Claude's coconut and turbinado profile produced the widest range of any cookie this episode.

Gemini — The Berry Patch · Descriptors

Delicious, A-mazing, Muffin, Muffin top/chewy, Moist/Fresh, Fresh, Balanced, Lemony, Blueberry, Lemon-Blue, Thick, Unexpected, Doughy, Disappointing, Bland

ChatGPT — The Overachiever · Descriptors

Sweet ×2, Intriguing, Ambitious, Lemonade, Bright, Interesting, Appealing, Fruity, Tangy, Good, Yummy, Moist, Weird, Off

Claude — The Beach Day · Descriptors

Snickerdoodle ×2, Sweet ×2, Crunchy ×2, Lovely, Beachy, Lemon-Coconut, Sugar Cookie, Savory, Salty, Dry, Average, Flat

Gemini · Main Flavor Identified

Blueberry ×8, Berry ×2, Marionberry, Lemon-Blue, Sugar Cookie Dough, Vanilla, Dough

ChatGPT · Main Flavor Identified

Strawberry ×3, Lemon ×3, Basil ×2, Raspberry, Grapefruit, Cranberry, Fruit Tart, Something I Don't Like, Sweet Frosting/Lemon/Citrus

Claude · Main Flavor Identified

Coconut ×6, Citrus/Lemon ×3, Brown Sugar, Butter, Lemon & Salt, Sugar Cookie, Lightly Greasy, Something I Don't Like

Highlighted chips appeared two or more times or were otherwise notable. Nine of fifteen tasters named blueberry as Gemini's main flavor (eight as "Blueberry," one as "Lemon-Blue"), the strongest single-ingredient identification of any cookie in any episode.

Panel Demographics

Who Was Tasting

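15 tasters. Ages 11–78. 8 female, 7 male. The age groups split perfectly: five tasters each in Under 26, 26–50, and 51+. The demographic story this episode was about coconut: Claude's coconut cookie drew zero favorites and zero want-agains from the under-26 group, but a 100% want-again rate from the 51+ group.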
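The panel split can be re-derived from the per-taster breakdown. A minimal sketch, assuming the bucket edges are exactly as labeled (Under 26 means ages up to 25, 26–50 inclusive, then 51+); the ages and genders are copied from the per-taster breakdown, and `age_group` is a hypothetical helper name.

```python
from collections import Counter

# (taster, age, gender) copied from the per-taster breakdown.
PANEL = [
    (1, 11, "M"), (2, 15, "F"), (3, 19, "M"), (4, 45, "F"), (5, 19, "F"),
    (6, 50, "M"), (7, 47, "M"), (8, 47, "M"), (9, 78, "M"), (10, 74, "F"),
    (11, 35, "F"), (12, 55, "F"), (13, 53, "F"), (14, 23, "F"), (15, 55, "M"),
]


def age_group(age: int) -> str:
    """Map an age onto the recap's three buckets (edges are an assumption)."""
    if age < 26:
        return "Under 26"
    if age <= 50:
        return "26–50"
    return "51+"


by_age = Counter(age_group(age) for _, age, _ in PANEL)
by_gender = Counter(gender for _, _, gender in PANEL)

# Each bucket should hold 5 tasters, and the panel should be 8 F and 7 M.
print(dict(by_age))
print(dict(by_gender))
```

The same `age_group` mapping could be reused to bucket favorites or want-again votes, provided per-taster picks are available; this recap does not list them for every taster.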

Favorites by gender

Male tasters chose Gemini at a 71% rate — 5 of 7. Claude received zero male favorites.

Favorites by age group

Under-26 gave Claude zero favorites and zero want-anothers. 26–50 gave Gemini 100% want-again.

Want-again by gender

Female tasters had Gemini and ChatGPT tied at 75%. Male want-again favored Gemini (86%).

Want-again by age group

26–50 and 51+ both gave Gemini 100%. The 51+ group gave Claude 100% as well — its strongest demographic result.

Four Episodes In

The Series Arc

Every AI has now won at least one episode on composite score. Claude won Episodes 1 and 2, Gemini won Episode 3, and ChatGPT leads composite in Episode 4 for the first time. Gemini is the only AI that has never finished last.

Composite score by episode

Episodes 1–4 · Note: Episode 1 used a 5-metric composite (taste, chocolate, texture, toffee, smell). Episodes 2–4 use 3 metrics (taste, texture, smell).

Want-again rate by episode

Gemini has been at or above 70% every episode. ChatGPT's worst result was 40% in Episode 1; its Episode 4 rate is 67%. Claude's range has been the widest of any AI: 90% in Episode 1, 54% in Episode 3.

Episode rankings — composite score

Episode	Brief	1st	2nd	3rd
Ep 1	Toffee-forward chocolate chip	Claude 7.38	Gemini 6.84	ChatGPT 6.20
Ep 2	Classic chocolate chip	Claude 7.73	Gemini 7.50	ChatGPT 7.41
Ep 3	Umami-forward chocolate chip	Gemini 7.64	ChatGPT 7.08	Claude 6.26
Ep 4	Cookies that taste like summer	ChatGPT 7.83	Gemini 7.58	Claude 7.29

A Correction

Episode 3 Data Error

What happened

In the Episode 3 video, ChatGPT's want-again rate was stated as 77%. After returning to the raw scorecards, the correct count is 62% — 8 of 13 tasters said yes to another ChatGPT cookie, not 10. The error appears to have been a counting mistake during production.

What it changes

The Episode 3 result and rankings are unaffected. Gemini still won by a substantial margin on all measures. ChatGPT's want-again rate at 62% is still a majority, still ahead of Claude's 54%, and still reflects the same relative standing. The corrected number appears in all charts and comparisons on this site from this point forward.

A note on Episode 4 data

Gemini's want-again rate was stated as 87% in this episode's video. The correct count from the raw scorecards is 80% — 12 of 15 tasters said yes. This was corrected on screen. The website reflects the accurate figure throughout.
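The correction arithmetic can be checked with a short Python sketch. It assumes rates are rounded to the nearest whole percent with halves rounded up; the recap does not state its rounding rule, and `want_again_rate` is a hypothetical helper name.

```python
def want_again_rate(yes: int, tasters: int) -> int:
    """Share of tasters who wanted another cookie, as a whole percent.

    Integer arithmetic with round-half-up: floor(100 * yes / tasters + 1/2).
    This avoids floating-point error and Python's round-half-to-even rule.
    """
    return (200 * yes + tasters) // (2 * tasters)


# (label, yes votes, tasters) using counts stated in the correction notes.
CHECKS = [
    ("Ep 3 ChatGPT, count behind the video's figure", 10, 13),
    ("Ep 3 ChatGPT, corrected", 8, 13),
    ("Ep 4 Gemini, corrected", 12, 15),
]

for label, yes, tasters in CHECKS:
    print(f"{label}: {yes}/{tasters} -> {want_again_rate(yes, tasters)}%")

# For reference only: the video's 87% for Gemini corresponds to 13 of 15,
# a count the recap does not state directly.
print(want_again_rate(13, 15))
```

With the small panels in this series, a single vote moves a rate by roughly 6 to 8 percentage points, which is why one miscounted scorecard is enough to produce errors of this size.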

Watch the Episode

See the Full Tasting

The complete blind tasting, the basil reveal, the snickerdoodle visual confusion, the composite vs. room split, and the four-episode series arc — all on camera.

Watch on YouTube ↗