AI Duel
Abinav 0
vs
Pranav 0

One screen.
Two brothers.
Four ways to duel.

Four hands-on AI projects you build together — then race each other through. Train, prompt, predict, deploy. Whoever wins the most rounds wins the season.

PROJECT 01

Doodle Duel

Teach the AI to tell two things apart by drawing examples. Then duel: who can draw clearer pictures?

teaches → classification · features · training data
PROJECT 02

Mind Reader

Write sentences. Guess which one the AI will think is closest to your opponent's query. Read its mind.

teaches → embeddings · cosine · vector DB
PROJECT 03

Prompt Wars

Same challenge, two prompts. An automatic judge scores both. Precision beats eloquence, every time.

teaches → prompting · constraints · format
PROJECT 04

Agent Arena

Each player builds an agent with tools and memory. Same task. Side-by-side traces. Best agent wins.

teaches → tools · memory · agent loop
★ CHAMPIONSHIP

The Gauntlet

Random rotation through all four projects. First to win 3 takes the season.

teaches → all of it · under pressure

SEASON STANDINGS

Project 01 · Classification

Doodle Duel

Co-train a classifier. Then take turns drawing test pictures: the AI classifies each one, and the cleaner picture wins the point.

▾ Deeper — how the classifier actually works

When you press "+ add", the drawing is shrunk to a 16×16 grayscale grid — that's 256 numbers per drawing. Each label's gallery becomes a list of vectors.
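A minimal sketch of that downsampling step, assuming the canvas arrives as a square grid of 0-to-1 ink intensities whose side is a multiple of 16, and that each of the 16×16 cells is averaged (average pooling). The function name `to_vector` and the pooling choice are hypothetical, not the app's actual code.

```python
def to_vector(canvas: list[list[float]], grid: int = 16) -> list[float]:
    """Average-pool a square canvas of 0..1 intensities into grid x grid cells (hypothetical helper)."""
    size = len(canvas)
    if size % grid != 0 or any(len(row) != size for row in canvas):
        raise ValueError("canvas must be square with a side divisible by grid")
    cell = size // grid  # pixels per cell along each axis
    vector = []
    for gy in range(grid):
        for gx in range(grid):
            # Sum every pixel inside this cell, then divide by the cell's area.
            total = 0.0
            for y in range(gy * cell, (gy + 1) * cell):
                for x in range(gx * cell, (gx + 1) * cell):
                    total += canvas[y][x]
            vector.append(total / (cell * cell))
    return vector  # grid * grid numbers, row by row


# A blank 64x64 canvas with one fully inked 4x4 block in the top-left corner.
canvas = [[0.0] * 64 for _ in range(64)]
for y in range(4):
    for x in range(4):
        canvas[y][x] = 1.0

vec = to_vector(canvas)
print(len(vec), vec[0], vec[1])  # 256 1.0 0.0
```

Each label's gallery is then just a list of such 256-number vectors.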

When a new doodle comes in, the model computes its 256-number vector and finds the 3 nearest neighbours across both classes (by cosine distance). The majority vote wins, and the margin of that vote is shown as "confidence". This is plain k-NN (k-nearest neighbours), one of the simplest classifiers there is, with no neural net involved.
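The nearest-neighbour vote can be sketched as follows. Assumptions, all hypothetical: the gallery is a dict from label to a list of vectors, `knn_classify` returns the winning label plus the share of the 3 votes it received as its "confidence", and the toy vectors are 3 numbers long instead of 256 so the example stays readable.

```python
import math
from collections import Counter


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # assumption: treat an empty drawing as maximally far away
    return 1.0 - dot / norm


def knn_classify(query: list[float], gallery: dict[str, list[list[float]]], k: int = 3):
    """Return (label, confidence) by majority vote among the k nearest gallery vectors."""
    scored = [
        (cosine_distance(query, vec), label)
        for label, vecs in gallery.items()
        for vec in vecs
    ]
    scored.sort(key=lambda pair: pair[0])  # nearest first
    votes = Counter(label for _, label in scored[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k  # confidence = fraction of the k votes won (assumption)


# Hypothetical toy gallery with 3-number vectors.
gallery = {
    "cat": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
    "fish": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1], [0.1, 0.9, 0.0]],
}
print(knn_classify([1.0, 0.1, 0.0], gallery))  # ('cat', 1.0)
```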

Bigger lesson: a clean, diverse set of examples beats a clever algorithm. If your "cat" doodles all look similar, the model will only recognise cats that look like yours.

Project 02 · Embeddings

Mind Reader

Write sentences into a shared library. On your turn, write a query — your opponent must guess which library sentence the AI picks as #1. Read the AI's mind.

▾ Deeper — how the AI compares meaning

Every sentence is turned into a 12-number vector by hashing each word into one of 12 bins (a tiny embedding). Sentences that share words end up with similar vectors.
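A sketch of that hashing trick, assuming words are lowercased, split on anything that isn't a letter, digit, or apostrophe, and counted into a bin picked by CRC-32. Python's built-in `hash()` is randomized per process for strings, so a stable hash is used instead. The name `embed` and the choice of CRC-32 are assumptions, not the game's actual code.

```python
import re
import zlib

BINS = 12


def embed(sentence: str, bins: int = BINS) -> list[int]:
    """Hashing-trick embedding: count how many words land in each of `bins` buckets."""
    vector = [0] * bins
    for word in re.findall(r"[a-z0-9']+", sentence.lower()):
        # CRC-32 is deterministic across runs, unlike Python's built-in hash() for str.
        vector[zlib.crc32(word.encode("utf-8")) % bins] += 1
    return vector


a = embed("The dog bites the man")
b = embed("the man bites the dog")
print(a == b)  # True
```

Because the vector only records which bins the words fall into, reordering a sentence's words leaves it unchanged.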

To rank the library, the AI computes cos(query, sentence) for every sentence — that's the same cosine similarity from Day 7½ of the course. Highest cosine wins. We show all 10 numbers so you can see exactly why one beat another.
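Ranking the library then comes down to one cosine per sentence followed by a sort. In this hypothetical sketch the vectors are hand-written and only 4 numbers long for readability (the game's are 12), and `rank_library` is an illustrative name.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity in [-1, 1]; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(query_vec: list[float], library: dict[str, list[float]]):
    """Return (score, sentence) pairs, highest cosine first."""
    scores = [(cosine(query_vec, vec), text) for text, vec in library.items()]
    return sorted(scores, key=lambda s: s[0], reverse=True)


# Hypothetical library: sentence label -> precomputed vector.
library = {"A": [1, 0, 0, 1], "B": [0, 1, 1, 0], "C": [1, 1, 0, 0]}
ranking = rank_library([2, 0, 0, 1], library)
for score, text in ranking:
    print(f"{text}: {score:.3f}")  # A: 0.949, C: 0.632, B: 0.000
```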

Bigger lesson: this toy embedding ignores word order and only records which words are in the sentence. Real embedding models, such as transformer-based ones, do take word order into account. The intuition you're learning here is still the foundation.

Project 03 · Prompting

Prompt Wars

Same challenge. Two prompts. One deterministic model answers both, and an automatic judge scores the replies: it counts bullets, validates JSON, and checks for keywords. Eloquence won't save you; precision will.

▾ Deeper — what the "model" and judge actually do

The "model" here is a tiny rule engine — it inspects your prompt for keywords ("bullet", "haiku", "json", "pirate", explicit counts like "5 sentences") and renders text accordingly. It's deterministic: the same prompt always produces the same reply.
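A toy rule engine in that spirit might look like the sketch below. The specific rules, the default count of 3, the canned haiku, and the name `rule_engine` are all assumptions for illustration; the point is that a fixed keyword lookup makes the reply a pure function of the prompt.

```python
import json
import re

# Canned haiku (a common English rendering of Basho's frog haiku), used for illustration.
HAIKU = "an old silent pond\na frog jumps into the pond\nsplash! silence again"


def rule_engine(prompt: str) -> str:
    """Hypothetical deterministic 'model': keywords in the prompt pick the output shape."""
    p = prompt.lower()
    # Explicit counts such as "5 bullets", "5 short bullets", or "2 sentences".
    m = re.search(r"\b(\d+)\s+(?:\w+\s+)?(bullets?|sentences?|items?)\b", p)
    count = int(m.group(1)) if m else 3  # assumption: default to 3
    if "json" in p:
        return json.dumps({"items": [f"item {i + 1}" for i in range(count)]})
    if "bullet" in p:
        return "\n".join(f"- point {i + 1}" for i in range(count))
    if "haiku" in p:
        return HAIKU
    reply = " ".join(f"Sentence {i + 1}." for i in range(count))
    if "pirate" in p:
        reply = "Arr! " + reply
    return reply


print(rule_engine("Give me 5 short bullets"))
print(rule_engine("Write 2 sentences like a pirate"))  # Arr! Sentence 1. Sentence 2.
```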

The judge then runs measurable checks: count bullets, count syllables (for haiku), JSON.parse for valid JSON, regex for required keywords. Each check is worth 1 point.
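A sketch of such a judge, assuming each challenge is described by a small hypothetical `spec` dict. Python's `json.loads` stands in for JavaScript's `JSON.parse`, and the syllable counter is a rough vowel-group heuristic (it miscounts words such as "wakes"), not a dictionary lookup.

```python
import json
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def judge(reply: str, spec: dict) -> tuple[int, dict]:
    """Run each check named in `spec`; every passed check is worth 1 point."""
    results = {}
    if "bullets" in spec:
        bullets = [ln for ln in reply.splitlines() if re.match(r"\s*[-*•]\s+", ln)]
        results["bullets"] = len(bullets) == spec["bullets"]
    if spec.get("haiku"):
        lines = [ln for ln in reply.splitlines() if ln.strip()]
        counts = [sum(count_syllables(w) for w in re.findall(r"[a-z]+", ln.lower())) for ln in lines]
        results["haiku"] = counts == [5, 7, 5]
    if spec.get("json"):
        try:
            json.loads(reply)
            results["json"] = True
        except json.JSONDecodeError:
            results["json"] = False
    for keyword in spec.get("keywords", []):
        pattern = rf"\b{re.escape(keyword)}\b"
        results[f"keyword:{keyword}"] = re.search(pattern, reply, re.IGNORECASE) is not None
    return sum(results.values()), results


haiku = "an old silent pond\na frog jumps into the pond\nsplash! silence again"
print(judge(haiku, {"haiku": True, "keywords": ["pond"]}))
```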

Bigger lesson: real LLMs also reward specificity. "5 short bullets" beats "make a list". "Reply in valid JSON" beats "give me data". The skill that wins here is the same skill that wins in production.

Project 04 · Agents

Agent Arena

Both players assemble an agent. Pick tools, write a system prompt, seed memory. Same task. Watch both agents run side by side. Judge by tool fit, requirements hit, and trace length.

▾ Deeper — how the agents run and how they're judged

Each agent goes through the THINK → ACT → OBSERVE loop. The trace branches based on which tools you gave it. No tool for the job → BLOCKED step (and lost points).
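The loop can be sketched as a simulation in which each task step names the tool it needs. `Agent`, `run_agent`, and the tool names are hypothetical, and real tool calls are replaced by canned observations.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    tools: set[str]
    memory: list[str] = field(default_factory=list)


def run_agent(agent: Agent, task_steps: list[tuple[str, str]]) -> list[str]:
    """task_steps: (goal, tool_needed) pairs. Returns the THINK/ACT/OBSERVE trace."""
    trace = []
    for goal, tool in task_steps:
        trace.append(f"THINK: I need to {goal}")
        if tool not in agent.tools:
            # No tool for the job: the step is blocked and the agent moves on.
            trace.append(f"BLOCKED: no '{tool}' tool")
            continue
        trace.append(f"ACT: call {tool}")
        observation = f"{tool} result for '{goal}'"  # canned stand-in for a real tool call
        agent.memory.append(observation)
        trace.append(f"OBSERVE: {observation}")
    return trace


task = [("look up the weather", "search"), ("convert to Celsius", "calculator")]
short = run_agent(Agent("A", {"search"}), task)
full = run_agent(Agent("B", {"search", "calculator"}), task)
print("\n".join(short))
```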

Auto-judge scores each agent on: tool fit (did the picked tools match the task?), requirements hit (did the final answer mention the key things?), and trace efficiency (shorter is better — but not so short it skipped a needed step).
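One way to turn those three criteria into numbers, under stated assumptions: tool fit as the overlap between picked and needed tools (Jaccard similarity, so extra tools cost points too), requirements hit as the fraction of key phrases found in the final answer, and efficiency as ideal trace length divided by actual length, dropping to 0 when the trace is shorter than ideal (a skipped step). The equal weighting and the name `score_agent` are assumptions, not the game's real rubric.

```python
def score_agent(picked_tools, needed_tools, final_answer, requirements, trace_len, ideal_len):
    """Hypothetical scorer: three sub-scores in [0, 1], summed with equal weight."""
    picked, needed = set(picked_tools), set(needed_tools)
    union = picked | needed
    # Jaccard overlap: missing tools and unneeded extras both lower the score.
    tool_fit = len(picked & needed) / len(union) if union else 1.0
    answer = final_answer.lower()
    hits = sum(1 for req in requirements if req.lower() in answer)
    requirements_hit = hits / len(requirements) if requirements else 1.0
    if trace_len < ideal_len:
        efficiency = 0.0  # too short: a needed step was skipped
    else:
        efficiency = ideal_len / trace_len if trace_len else 1.0
    total = tool_fit + requirements_hit + efficiency
    return {"tool_fit": tool_fit, "requirements_hit": requirements_hit,
            "efficiency": efficiency, "total": total}


answer = "It is 21 degrees Celsius in Oslo."
focused = score_agent({"search", "calculator"}, {"search", "calculator"},
                      answer, ["Celsius", "Oslo"], trace_len=6, ideal_len=6)
sink = score_agent({"search", "calculator", "email", "calendar", "browser"},
                   {"search", "calculator"}, answer, ["Celsius", "Oslo"],
                   trace_len=9, ideal_len=6)
print(focused["total"], round(sink["total"], 3))  # 3.0 2.067
```

Under these assumptions the focused agent outscores the kitchen-sink one even with an identical final answer.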

Bigger lesson: more tools isn't better. A focused agent with the right 2-3 tools and a sharp prompt beats a kitchen-sink agent every time.

↪ PASS THE LAPTOP

Pass to Pranav

Hand the laptop over. When you're ready, press start.