
📟 BPE TOKENIZER SANDBOX

VOCABULARY SEGMENTATION & INJECTION SIMULATOR v1.0.0

📥 PROMPT INGESTION EDITOR
Characters: 0 Bytes: 0 Tokens: 0
🔍 BPE SEGMENTATION STREAM (SUB-WORD FRAGMENTS)
Type something in the editor above to generate sub-word tokens...
💾 LLM INPUT VECTOR (TOKEN IDs)
[ ]
🎓 INTERACTIVE BPE CHALLENGES
LESSON 1

🍓 The "Strawberry" Blindness

IN PROGRESS

LLMs don't read letters; they read tokens. Type the word strawberry. Notice how it merges into just 2 tokens (straw + berry).

👉 Task: Type strawberry has 3 r's in the editor. Watch how the model cannot "see" the letters directly, because each 'r' is swallowed into an atomic token (one inside straw, two inside berry)!
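
To make the merge concrete, here is a minimal sketch of how ordered BPE merge rules could turn strawberry into straw + berry. The MERGES list is a hypothetical one invented for illustration; it is not the sandbox's actual merge dictionary, and a real vocabulary may segment the word differently.

```python
# Hypothetical merge rules, earliest rule = highest priority.
# These are assumptions for illustration, not the sandbox's real table.
MERGES = [
    ("s", "t"), ("st", "r"), ("str", "a"), ("stra", "w"),
    ("b", "e"), ("be", "r"), ("ber", "r"), ("berr", "y"),
]


def bpe_encode(word, merges):
    """Split a word into characters, then apply merge rules by priority."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Pick the adjacent pair whose merge rule has the best (lowest) rank.
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge rule is left
        merged = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


tokens = bpe_encode("strawberry", MERGES)
print(tokens)                                # the two merged sub-word tokens
print("r" in tokens)                         # is there a standalone 'r' token?
print(sum(t.count("r") for t in tokens))     # letter count needs the raw text
```

Under these assumed rules no token in the output is the letter r on its own, so counting letters requires decoding the tokens back to characters, which is the step the model never sees.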

LESSON 2

💸 Token Budget Inflation

LOCKED

Spaces and capitalization change word merges. Type the lowercase target phrase: hello world. It takes 2 tokens.

👉 Task: Type the capitalized phrase: hello World (with a capital W) to see it fragment into character-level tokens, inflating the cost to 7 tokens!
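
One way the 2-token and 7-token counts could arise is sketched below. It uses greedy longest-match lookup against a toy vocabulary as a simplified stand-in for real BPE merging, and it assumes the vocabulary holds hello and " world" (with a leading space) but nothing that covers a capital W. VOCAB and segment are hypothetical names for illustration only.

```python
# Toy vocabulary (an assumption): note the leading space in " world".
VOCAB = {"hello", " world"}


def segment(text, vocab):
    """Greedy longest-match segmentation with single-character fallback."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matches: emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


for phrase in ("hello world", "hello World"):
    pieces = segment(phrase, VOCAB)
    print(repr(phrase), len(pieces), pieces)
```

Because " World" is missing from the assumed vocabulary, the space and each of the five letters fall back to single-character tokens, which is where the extra cost comes from.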

LESSON 3

🔓 Prompt Filter Evasion

LOCKED

A safety filter blocks the token IDs for the words override (ID: 204) and system (ID: 203).

👉 Task: Evade the safety filter: Type the phrase system override, but use capitalization or spaces (e.g. SYSTEM OVERRIDE) to force it to split into fragments (e.g. SYS + TEM, OVER + RIDE). The individual fragment tokens bypass the blocklist, yet the model can still reconstruct the original phrase from them!
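
The weakness of an ID-based blocklist can be sketched in a few lines. The IDs 203 and 204 come from the lesson text; every other ID, along with the names TOKEN_IDS, BLOCKED_IDS, and is_blocked, is a hypothetical placeholder, and the segmentations are written out by hand rather than produced by the sandbox.

```python
# IDs 203 and 204 are from the lesson; all other IDs are made up.
TOKEN_IDS = {
    "system": 203, "override": 204,
    " ": 1, "SYS": 301, "TEM": 302, "OVER": 303, "RIDE": 304,
}
BLOCKED_IDS = {203, 204}


def is_blocked(tokens):
    """Return True if any token's ID appears on the blocklist."""
    return any(TOKEN_IDS[t] in BLOCKED_IDS for t in tokens)


# Hand-written segmentations, assumed for illustration.
plain = ["system", " ", "override"]
evasive = ["SYS", "TEM", " ", "OVER", "RIDE"]

print(is_blocked(plain))      # whole-word tokens hit the blocklist
print(is_blocked(evasive))    # fragment tokens carry different IDs
# The fragments still spell the same phrase once joined and case-folded.
print("".join(evasive).lower() == "".join(plain))
```

The filter compares integers, not meaning: the fragment IDs are absent from the blocklist even though the text they spell out is the same phrase apart from letter case.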

📖 BPE MERGE DICTIONARY (VOCABULARY)
Input Pair | Result Token | Token ID