Cookies That Taste Like Summer
ChatGPT posted its highest composite score in four episodes. Gemini won the room anyway. Someone put basil in a sugar cookie and two neighbors independently identified it — and both chose it as their favorite.
The Scoreboard
ChatGPT led on composite score and on smell, its strongest individual metric in the series. Gemini led on taste and had the most favorite votes. Claude came in last in every category for the second episode running, tying Gemini only on smell.
Average scores by criterion
15 tasters · scored 1–10 on Taste, Texture, and Smell
Composite averages
| Cookie | Taste | Texture | Smell | Favorites | Want Again | Composite |
|---|---|---|---|---|---|---|
| ChatGPT — The Overachiever | 7.27 | 8.07 | 8.15 | 6 / 15 | 67% | 7.83 |
| Gemini — The Berry Patch | 7.60 | 7.93 | 7.21 | 7 / 15 | 80% | 7.58 |
| Claude — The Beach Day | 7.13 | 7.53 | 7.21 | 2 / 15 | 60% | 7.29 |
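As a quick sanity check on the table above, the composite column can be reproduced by averaging the three criterion columns. This is a minimal sketch that uses only the published averages; the names `table` and `recomputed` are illustrative, not part of the series' actual tooling.

```python
# Published per-criterion averages and composites from the Episode 4 table.
table = {
    "ChatGPT": {"taste": 7.27, "texture": 8.07, "smell": 8.15, "composite": 7.83},
    "Gemini": {"taste": 7.60, "texture": 7.93, "smell": 7.21, "composite": 7.58},
    "Claude": {"taste": 7.13, "texture": 7.53, "smell": 7.21, "composite": 7.29},
}


def recomputed(row: dict) -> float:
    """Average taste, texture, and smell, rounded to two decimals."""
    return round((row["taste"] + row["texture"] + row["smell"]) / 3, 2)


for cookie, row in table.items():
    print(cookie, recomputed(row), row["composite"])
```

Each recomputed value should match the published composite to two decimal places.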
The Math and the Room Disagreed
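ChatGPT led the composite and two of the three scored metrics, texture and smell, while Gemini led on taste. Gemini also led on every democratic measure: favorite votes and want-again rate. This is the second episode where composite score and favorite votes pointed in different directions.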
Favorites & Want-Again Rate
The cookie that scored highest on paper did not get the most favorites or the highest want-again rate.
Per-Taster Breakdown
Gemini won 6 individual matchups. ChatGPT won 5. Claude won 2. Two tasters, Tasters 7 and 15, produced exact ties for the top composite, and the two picked different favorites.
Composite = average of taste, texture, and smell. Where smell was not recorded, the average of the available scores is used.
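The composite rule and the per-taster matchups can be sketched roughly as follows. The scorecards here are hypothetical, made up purely for illustration and not real taster data, and the function names `composite` and `matchup_winner` are assumptions rather than anything used in production.

```python
from statistics import mean
from typing import Optional


def composite(taste: float, texture: float, smell: Optional[float] = None) -> float:
    """Average the available scores; smell is skipped when it was not recorded."""
    scores = [s for s in (taste, texture, smell) if s is not None]
    return mean(scores)


def matchup_winner(card: dict) -> Optional[str]:
    """Return the cookie with the highest composite on one scorecard, or None on a tie for first."""
    composites = {cookie: composite(*scores) for cookie, scores in card.items()}
    best = max(composites.values())
    leaders = [cookie for cookie, value in composites.items() if value == best]
    return leaders[0] if len(leaders) == 1 else None


# Hypothetical scorecards as (taste, texture, smell); None means smell was not recorded.
card_a = {"ChatGPT": (8, 9, 9), "Gemini": (7, 8, None), "Claude": (6, 7, 7)}
card_b = {"ChatGPT": (8, 8, 8), "Gemini": (9, 7, 8), "Claude": (6, 7, 7)}

print(matchup_winner(card_a))  # clear winner on this card
print(matchup_winner(card_b))  # two cookies tie for first, so no winner
```

A tie for first is reported as no winner, which is how a taster could land in the "mathematical tie" column while still naming a favorite.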
The Basil and the Snickerdoodle
Two independent signals emerged from the scorecard language — one for ChatGPT's cookie and one for Claude's. Both revealed something about how the recipes were actually being experienced.
The Basil Signal — ChatGPT
ChatGPT's recipe included 6g of fresh basil. Two tasters independently identified it as the main flavor.
What this means: Both tasters who detected the basil chose ChatGPT as their favorite. The riskiest creative choice in the recipe turned out to be what won over the tasters who caught it. ChatGPT's cookie rewarded attentive palates.
The Snickerdoodle Signal — Claude
Two tasters independently described Claude's cookie as "snickerdoodle" — a cinnamon cookie. Claude's recipe contains no cinnamon.
What this means: The issue was visual, not flavor. Claude's turbinado sugar coating produces a crackled golden crust that looks like a snickerdoodle before the first bite. Tasters arrived expecting cinnamon and found citrus and coconut instead. A cookie that looks like one thing and tastes like another sets up a mismatch the flavor has to overcome — and in a blind tasting, first impressions are everything.
The Language of Each Cookie
Gemini's words were clear and sensory. ChatGPT's split between the curious and the confused. Claude's coconut and turbinado profile drew the widest range of descriptions of any cookie this episode.
Nine of the fifteen tasters identified blueberry as Gemini's main flavor, the strongest single-ingredient identification of any cookie in any episode.
Who Was Tasting
15 tasters. Ages 11–78. 8 female, 7 male. The age groups split perfectly — five tasters each in Under 26, 26–50, and 51+. The demographic story this episode was about coconut.
Favorites by gender
Male tasters chose Gemini at a 71% rate — 5 of 7. Claude received zero male favorites.
Favorites by age group
Under-26 gave Claude zero favorites and zero want-anothers. 26–50 gave Gemini 100% want-again.
Want-again by gender
Female tasters had Gemini and ChatGPT tied at 75%. Male want-again favored Gemini (86%).
Want-again by age group
26–50 and 51+ both gave Gemini 100%. The 51+ group gave Claude 100% as well — its strongest demographic result.
The Series Arc
Each AI has now topped the composite ranking at least once. Claude won Episodes 1 and 2. Gemini won Episode 3. ChatGPT leads composite in Episode 4 for the first time. Gemini is the only AI that has never finished last.
Composite score by episode
Episodes 1–4 · Note: Episode 1 used a 5-metric composite (taste, chocolate, texture, toffee, smell). Episodes 2–4 use 3 metrics (taste, texture, smell).
Want-again rate by episode
Gemini has been at or above 70% every episode. ChatGPT's worst result was 40% in Episode 1; Episode 4 is 67%. Claude's range has been the widest of any AI — 90% in Episode 1, 54% in Episode 3.
Episode rankings — composite score
| Episode | Brief | 1st | 2nd | 3rd |
|---|---|---|---|---|
| Ep 1 | Toffee-forward chocolate chip | Claude 7.38 | Gemini 6.84 | ChatGPT 6.20 |
| Ep 2 | Classic chocolate chip | Claude 7.73 | Gemini 7.50 | ChatGPT 7.41 |
| Ep 3 | Umami-forward chocolate chip | Gemini 7.64 | ChatGPT 7.08 | Claude 6.26 |
| Ep 4 | Cookies that taste like summer | ChatGPT 7.83 | Gemini 7.58 | Claude 7.29 |
Episode 3 Data Error
What happened
In the Episode 3 video, ChatGPT's want-again rate was stated as 77%. A recheck of the raw scorecards gives 62%: 8 of 13 tasters said yes to another ChatGPT cookie, not 10. The error appears to have been a counting mistake during production.
What it changes
The Episode 3 result and rankings are unaffected. Gemini still won by a substantial margin on all measures. ChatGPT's want-again rate at 62% is still a majority, still ahead of Claude's 54%, and still reflects the same relative standing. The corrected number appears in all charts and comparisons on this site from this point forward.
A note on Episode 4 data
Gemini's want-again rate was stated as 87% in this episode's video. The correct count from the raw scorecards is 80% — 12 of 15 tasters said yes. This was corrected on screen. The website reflects the accurate figure throughout.
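The corrected percentages follow directly from the yes-counts quoted above. Here is a small sketch of that arithmetic, assuming rates are rounded to the nearest whole percent; the helper name `want_again_rate` is illustrative.

```python
def want_again_rate(yes: int, total: int) -> int:
    """Share of tasters who said yes, as a whole-number percentage."""
    return round(100 * yes / total)


# (label, yes votes, tasters) taken from the correction notes above.
counts = [
    ("Ep 3 ChatGPT, as stated in the video", 10, 13),
    ("Ep 3 ChatGPT, corrected", 8, 13),
    ("Ep 4 Gemini, corrected", 12, 15),
]

for label, yes, total in counts:
    print(f"{label}: {yes}/{total} = {want_again_rate(yes, total)}%")
```

Run against the counts in the notes, this gives 77% and 62% for Episode 3 and 80% for Episode 4, matching the figures stated on this page.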
See the Full Tasting
The complete blind tasting, the basil reveal, the snickerdoodle visual confusion, the composite vs. room split, and the four-episode series arc — all on camera.
Watch on YouTube ↗