Thumbnail IQ

The only thumbnail score that compares against your actual niche feed.

Two layers, one number. Layer 1 measures contrast, faces, text coverage, WCAG readability, and vibrancy with real computer vision. Layer 2 hands the image to Claude vision and asks whether it can stand out against the top 3 thumbnails actually winning in your keyword + format + size bracket. Free creators get one full run per cycle.

Score a thumbnail → · See how it works

Free plan unlocks one full analysis · ~25 seconds per run · re-upload revised versions for side-by-side history

The 0–100 score

One number that fuses pixel measurements with niche-aware judgement.

Layer 1 contributes up to 60 deterministic points (contrast, face, text, readability, vibrancy, dimensions, file size). Layer 2 contributes up to 40 vision points (emotion, text psychology, color psychology, composition, title-thumbnail fit, feed distinctiveness). The scale is the same every time, so an 81 next month is genuinely better than tonight’s 78.

Layer 1: pixel measurements that should be measured
Layer 2: judgements that only judgement can make
Same scale every run: track improvement across versions
Compared against your niche, not generic best practice
I TRIED THIS FOR 30 DAYS

I tried this for 30 days...

1280×720 · listicle · micro channel

78/100 · Stands out

Layer 1 · algorithm · 48/60 · OpenCV · OCR · WCAG

Layer 2 · vision AI · 30/40 · Sonnet 4.6 vs niche

Contrast (stddev 87): 100
Face coverage 24%: 100
Text readability (WCAG): 70
Feed distinctiveness: 80
Thirteen scoring dimensions

We measure what should be measured, and judge what should be judged.

Seven Layer 1 components run pixel-level computer vision (60 points). Six Layer 2 dimensions run Claude Sonnet 4.6 vision against your niche feed (40 points). Each one returns a score, a one-sentence verdict referencing exact visual elements, and a concrete fix when below 8.
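
As a rough sketch of how the two layers could fuse into one number: the 60-point and 40-point budgets and the fix threshold of 8 come from this page, but the 0 to 10 grading of each vision dimension, the linear rescale onto 40 points, and the names `combine_scores` and `needs_fix` are assumptions made for illustration.

```python
LAYER1_MAX = 60    # stated on this page
LAYER2_MAX = 40    # stated on this page
FIX_THRESHOLD = 8  # a fix is returned when a dimension scores below 8


def combine_scores(layer1_points, layer2_grades):
    """Fuse Layer 1 points (out of 60) with Layer 2 grades into a 0-100 score.

    layer2_grades maps a dimension name to a grade on an ASSUMED 0-10 scale.
    """
    if not 0 <= layer1_points <= LAYER1_MAX:
        raise ValueError("Layer 1 points must be between 0 and 60")
    raw = sum(layer2_grades.values())
    raw_max = 10 * len(layer2_grades)
    # ASSUMPTION: the raw grade total is rescaled linearly onto 40 points.
    layer2_points = LAYER2_MAX * raw / raw_max if raw_max else 0.0
    return round(layer1_points + layer2_points)


def needs_fix(layer2_grades):
    """Names of the dimensions that would come back with a concrete fix."""
    return sorted(name for name, grade in layer2_grades.items()
                  if grade < FIX_THRESHOLD)
```

Under these assumptions, 48 Layer 1 points plus six vision grades totalling 45 out of 60 gives 48 + 30 = 78, the same split shown in the example scorecard.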

Layer 1 · algorithm · 60 points · 7 components
01

Dimensions

5 pts

Full credit at 1280×720 (YouTube’s recommended resolution). Partial credit at any other 16:9 size. Zero for any other aspect ratio, because those thumbnails get cropped or letterboxed in feeds, which tanks CTR.

02

File size

5 pts

Full credit under 2 MB. Partial under 4 MB. Zero above 4 MB (loads slowly on mobile data and hurts the first-frame impression).
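
The first two components are simple threshold checks. A minimal sketch follows; the 5-point maximum and the cutoffs come from the descriptions above, while the partial-credit value of 3, the use of binary megabytes, and the function names are assumptions.

```python
MB = 1024 * 1024  # ASSUMPTION: binary megabytes; the page only says "MB"


def dimension_points(width, height, partial=3):
    """5 points at exactly 1280x720, partial at any other 16:9 size, else 0."""
    if (width, height) == (1280, 720):
        return 5
    # Cross-multiply to test the 16:9 ratio without floating-point error.
    if width * 9 == height * 16:
        return partial  # ASSUMPTION: the partial value is not published
    return 0


def file_size_points(size_bytes, partial=3):
    """5 points under 2 MB, partial under 4 MB, else 0."""
    if size_bytes < 2 * MB:
        return 5
    if size_bytes < 4 * MB:
        return partial  # ASSUMPTION: the partial value is not published
    return 0
```

For example, a 1920×1080 upload keeps the right ratio but earns only partial credit, and a square 1080×1080 image earns nothing on this component.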

03

Contrast (stddev)

15 pts

Greyscale standard deviation across the whole image. >80 wins full credit, because high contrast separates the thumbnail from its feed neighbours. <30 means the thumbnail looks washed out.
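
To make the measurement concrete, here is a dependency-free sketch of a greyscale standard deviation. It assumes the common BT.601 luma weights for the greyscale conversion and a population standard deviation; the production pipeline is described as OpenCV-based, so treat this as an illustration of the idea rather than the exact code. The band labels and function names are hypothetical.

```python
from statistics import pstdev


def grey_stddev(pixels):
    """Standard deviation of greyscale values for (r, g, b) pixels in 0-255."""
    # ASSUMPTION: BT.601 luma weights for the RGB-to-grey conversion.
    greys = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    return pstdev(greys)


def contrast_band(stddev):
    """Map a stddev onto the bands named in the component description."""
    if stddev > 80:
        return "full credit"
    if stddev < 30:
        return "washed out"
    # The page does not say how the 30-80 range is scored.
    return "mid-range"
```

A thumbnail that is half black and half white has a stddev of about 127.5, well into full credit; a flat grey image has a stddev of 0.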

04

Face presence

10 pts

Haar cascade face detection. >20% of image area = full credit (faces drive CTR). 10–20% = partial. Any smaller detected face = base credit. Zero faces is fine if Layer 2 judges that the scene compensates.
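
Once the detector has returned bounding boxes, coverage is plain arithmetic. The sketch below takes boxes as `(x, y, w, h)` rectangles, the shape a Haar cascade detector typically returns; the partial and base point values (6 and 3) and both function names are assumptions, since only the tiers themselves are stated above.

```python
def face_coverage(boxes, image_w, image_h):
    """Fraction of the image area covered by detected face rectangles."""
    # Overlapping boxes are counted twice here; a stricter version
    # would merge them before summing.
    area = sum(w * h for _x, _y, w, h in boxes)
    return min(1.0, area / (image_w * image_h))


def face_points(boxes, image_w, image_h, partial=6, base=3):
    """10 points above 20% coverage, partial for 10-20%, base if any face."""
    coverage = face_coverage(boxes, image_w, image_h)
    if coverage > 0.20:
        return 10
    if coverage >= 0.10:
        return partial  # ASSUMPTION: the partial value is not published
    if boxes:
        return base     # ASSUMPTION: the base value is not published
    return 0
```

A single 640×360 face box on a 1280×720 thumbnail covers 25% of the frame and earns full credit.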

05

Text presence

10 pts

10–30% of the image covered by text wins full credit (the sweet spot: readable on mobile without crowding the visual). >30% feels cluttered and is capped.

06

Text readability (WCAG)

10 pts

Real WCAG luminance-contrast ratio between text and its background. >7:1 wins full credit (AAA). >4.5:1 is partial. Below 3:1 reads as a smear at 200px.
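
The ratio itself follows the published WCAG 2.x definition: linearize each sRGB channel, compute relative luminance, then divide the lighter luminance by the darker with a 0.05 offset on each. The sketch below implements that formula; how the text and background colors are sampled from each text box is not shown, and `readability_band` is a hypothetical helper that only mirrors the thresholds quoted above.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two (r, g, b) colors, from 1 to 21."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def readability_band(ratio):
    if ratio > 7:
        return "full credit (AAA)"
    if ratio > 4.5:
        return "partial credit"
    if ratio < 3:
        return "smear at 200px"
    # The page does not say how ratios from 3:1 to 4.5:1 are scored.
    return "unspecified"
```

White text on pure black gives the maximum ratio of 21:1, while white on mid-grey `(118, 118, 118)` lands just above 4.5:1 and only earns partial credit.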

07

Color vibrancy

5 pts

Mean HSV saturation. >120 wins full credit (vivid). K-means also extracts the dominant colors so Layer 2 can compare them against the niche palette.
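
A quick way to see what the saturation threshold means, assuming the 120 cutoff is on OpenCV’s 8-bit scale where saturation runs from 0 to 255 (the page does not state the scale). This sketch uses Python’s standard `colorsys` module instead of OpenCV, and the function names are hypothetical.

```python
import colorsys


def mean_saturation(pixels):
    """Mean HSV saturation of (r, g, b) pixels, on an assumed 0-255 scale."""
    total, count = 0.0, 0
    for r, g, b in pixels:
        _h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total += s * 255  # colorsys returns 0-1; rescale to 0-255
        count += 1
    return total / count if count else 0.0


def is_vivid(pixels):
    """True when mean saturation clears the full-credit threshold of 120."""
    return mean_saturation(pixels) > 120
```

Pure red has a saturation of 255 and neutral grey has 0, so an image that is half of each averages 127.5 and just clears the bar.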

Layer 2 · vision AI · 40 points · 6 dimensions
08

Facial emotion

10 pts

What specific emotion is expressed? Is it readable at 200px (mobile feed size)? Does it match the video’s promise? If no face, does the scene create equivalent emotional pull?

09

Text psychology

10 pts

Does the text create curiosity tension without revealing the answer? Does it complement or contradict the image? Bold enough for mobile? If no text, scored against whether the visual is strong enough alone.

10

Color psychology

10 pts

Are colors emotionally congruent with the topic? Is there a single dominant color that separates this in the feed? Compared directly against the benchmark color palette.

11

Composition & visual hierarchy

10 pts

Where does the eye go first? Is there visual tension? Is the most important element in a rule-of-thirds power zone? Mobile-first read, since most YouTube viewing is mobile.

12

Title-thumbnail relationship

10 pts

Do the title and thumbnail tell DIFFERENT parts of the same story (the gold standard), or is the thumbnail just illustrating the title? Scored zero if no title was provided.

13

Feed distinctiveness

10 pts

Compared against the actual top 3 benchmark thumbnails for your niche. Would this stand out, blend in, or disappear? The verdict names the single most distinctive element, or explains exactly why the thumbnail blends in.

Niche benchmark · listicle · micro · 10 top performers

Yours
Contrast 87 · benchmark avg 64 → +36%
Face yes (24%) · 80% of top performers also have a face

Top 3 by velocity: 12.4K/d · 9.1K/d · 6.7K/d

Niche signature
Face rate: 80%
Text-overlay rate: 90%
Common color palette: [color swatches]

Niche-aware benchmarking

Compared against the channels you’ll actually be next to.

For every analysis we build a benchmark pool: top 50 niche videos → above-median velocity → last 12 months → >10K views → format match → size-bracket match. The top 10 by velocity become the comparison set. Layer 1 runs on each of their thumbnails, the metrics are averaged, and your face %, text %, contrast, and vibrancy are compared head-to-head. The pool is cached per niche for 30 days and shared across users, so most runs hit a warm cache.
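
The filter chain above can be sketched as a single function. The `Video` record, its field names, and `build_pool` are hypothetical; the sketch also assumes the input list is already ranked by the niche search, that "above-median" means strictly above the median velocity of the top 50, and that "last 12 months" means 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import median


@dataclass
class Video:
    video_id: str
    views: int
    published: date
    fmt: str      # tutorial / listicle / story / comparison / revelation
    bracket: str  # nano / micro / mid / macro


def velocity(video, today):
    """Views per day since publish (at least one day, to avoid dividing by 0)."""
    return video.views / max((today - video.published).days, 1)


def build_pool(videos, fmt, bracket, today, pool_size=10):
    candidates = videos[:50]  # top 50 niche videos
    if not candidates:
        return []
    cutoff = median(velocity(v, today) for v in candidates)
    kept = [
        v for v in candidates
        if velocity(v, today) > cutoff            # above-median velocity
        and (today - v.published).days <= 365     # last 12 months
        and v.views > 10_000                      # more than 10K views
        and v.fmt == fmt                          # format match
        and v.bracket == bracket                  # size-bracket match
    ]
    kept.sort(key=lambda v: velocity(v, today), reverse=True)
    return kept[:pool_size]                       # top 10 by velocity
```

Ranking by velocity rather than raw views is what keeps an old viral hit from crowding out a recent winner: a video with 900K views over 900 days runs at 1K/day and loses to one with 100K views in 10 days.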

Format-aware

Tutorial / listicle / story / comparison / revelation, each pulled separately.

Size-bracketed

Nano / micro / mid / macro. Your peers, not MrBeast.

Velocity-ranked

Views per day since publish. Recent winners, not stale viral hits.

Cached & shared

30-day pool TTL across users. Most runs hit a warm pool.
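
One way to picture the shared 30-day pool is a small time-to-live cache keyed by niche rather than by user. Everything in this sketch (the `PoolCache` class, the lower-cased keyword key, the in-memory dictionary) is an illustrative assumption; only the 30-day TTL and the cross-user sharing come from this page.

```python
import time

POOL_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days, as stated on this page


class PoolCache:
    """In-memory TTL cache keyed by (keyword, format, size bracket)."""

    def __init__(self, ttl=POOL_TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}

    def get_or_build(self, keyword, fmt, bracket, build):
        # The key has no user id, so every user in a niche shares one pool.
        key = (keyword.lower(), fmt, bracket)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # warm pool: skip the rebuild
        pool = build()               # cold or expired: rebuild and store
        self._entries[key] = (now, pool)
        return pool
```

The first run in a niche pays for the build; later runs within 30 days reuse the stored pool, which is why most analyses return faster.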

How it works

From upload to scored verdict in about 30 seconds

Five stages. Re-upload a revised version anytime. The version-history panel tracks the score across iterations so you can see exactly what moved the needle.

01

Upload + context

Drop the image, paste the draft title, pick the keyword you’re targeting. Or pull the title and keyword from a video idea you generated in Competitor Analysis.

02

Layer 1 measures

OpenCV detects faces, pytesseract reads any text, WCAG luminance ratio scores readability, k-means extracts dominant colors, HSV measures vibrancy. 60 points.

03

Niche pool built

Top 10 thumbnails for your keyword + format + size bracket are fetched + scored. Pool cached 30 days, shared across users. Most runs hit a warm pool.

04

Layer 2 vision call

Claude Sonnet 4.6 sees your thumbnail alongside the top 3 benchmark thumbnails and scores 6 psychological dimensions in context. 40 points.

05

Combined result

Score 0–100, per-dimension verdict + fix, biggest win, biggest fix, emotion label, feed-position tag, percentile vs peers, version saved to history.

Output structure

Seven distinct output blocks. Every one is fixable.

The studio doesn’t hand you a number and a vague verdict. Each block renders separately so you can scan, iterate, and re-upload, and the history panel keeps every version side by side.

Combined score 0–100

Layer 1 contributes up to 60 (deterministic CV). Layer 2 contributes up to 40 (Claude vision). Same scale every time so you can track improvement across versions.

Niche-aware vision read

Claude scores against the actual top 3 benchmark thumbnails for your keyword + format + size bracket, not against generic best practices, so the feed-distinctiveness score reflects the thumbnails you actually compete with.

Per-dimension verdict + fix

Each of the 13 dimensions returns a one-sentence verdict referencing exact visual elements, plus one concrete fix when the score is below 8. Verdicts name specific colors, words, and positions.

Biggest win + biggest fix

The single strongest element to keep. The single highest-impact change to make. Plus an emotion label, feed-position tag, and click-through prediction vs niche average.

Niche benchmark comparison

Your face %, text %, vibrancy, and contrast are each plotted against the niche average. The same metrics from the same algorithm: apples to apples.

Percentile vs peers

Where your thumbnail ranks among every other Thumbnail IQ analysis run for this exact keyword + format + size bracket, so you know whether 78/100 is good for THIS niche.

Version history

Re-upload a revised version and the score is tracked side by side. The history panel shows which iteration moved which dimension and how close you are to the niche top.

What powers it

Open-source CV + Sonnet 4.6 vision. Public data only.

Layer 1 runs entirely on our infrastructure: no third-party scoring API, no per-image fees. Layer 2 calls Claude Sonnet 4.6 with your thumbnail and the top 3 benchmark images. Benchmark thumbnails come from the official YouTube Data API: the same public images anyone visiting those channels can see. Each analysis spends one credit on paid plans; the free tier gets one full analysis per cycle.

Face detection

OpenCV Haar cascade · frontal-face classifier

Text OCR

pytesseract · sparse-text page mode (psm 11)

Color extraction

OpenCV k-means · k=3 dominant + HSV saturation

Readability ratio

WCAG 2.2 luminance contrast · sampled per text box

Niche benchmark

YouTube Data API · top 10 by view velocity, 30-day cache

Vision model

Claude Sonnet 4.6 · 4-image input · ~12s on warm cache

By plan

How many thumbnail scores you get each month

Free creators get one full two-layer analysis per cycle so you can try the engine on a real thumbnail. Paid plans charge one credit per run on the same engine, with no feature differences. Each re-uploaded version is a fresh analysis and a fresh credit.

Free

1

analysis

per cycle

One thumbnail score per cycle. Full two-layer analysis

Solo

20

analyses

included per month

Score every iteration · 3 channels

Most popular

Growth

50

analyses

included per month

Same engine, higher monthly allowance · 5 channels

Agency

150

analyses

included per month

Pooled across 10 channels · per-version history

Same two-layer engine across every plan, including free.

See full pricing →
FAQ

Questions about the scoring engine, answered honestly.

Real answers based on how the product behaves: the two layers, the niche pool, the size brackets, version history, and what won’t work.

Still have questions? Email us →
Vision models are good at semantic reads (emotion, composition, psychology) but unreliable at measurements. Asking Claude "what’s the contrast ratio of this text?" gets a confident guess. Layer 1 measures the things that should be measured: actual pixel stddev for contrast, real Haar-cascade face detection, OCR for text coverage, WCAG luminance ratios for readability, k-means for dominant colors. Layer 2 then judges what only judgment can judge: does the emotion match the topic, does the composition lead the eye, does this stand out against the actual niche feed? The split is the whole point.
For your keyword + format + size bracket we fetch the top 50 YouTube videos via the official Data API, then filter for above-median view velocity, last-12-months only, >10K views, format match (tutorial / listicle / story / comparison / revelation), and channel-size bracket match (nano <10K, micro <100K, mid <1M, macro 1M+). The top 10 by velocity become your benchmark pool. We run Layer 1 on each of their thumbnails and average the metrics. The pool is cached for 30 days and shared across users on the same niche, so most runs reuse a pool that has already been built instead of rebuilding it.
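
The size-bracket cutoffs quoted above translate directly into a lookup. The function name `size_bracket` is hypothetical, and the sketch assumes the cutoffs apply to subscriber counts, as the "5K-sub" example in this FAQ suggests.

```python
def size_bracket(subscribers):
    """Map a subscriber count onto the nano / micro / mid / macro brackets."""
    if subscribers < 10_000:
        return "nano"
    if subscribers < 100_000:
        return "micro"
    if subscribers < 1_000_000:
        return "mid"
    return "macro"
```

A 5K-subscriber channel is benchmarked as nano, and a channel crosses into micro at exactly 10K.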
Comparing a 5K-sub thumbnail against MrBeast’s feed is useless. The benchmark pool only includes top performers in your size bracket (nano / micro / mid / macro), so the score reflects what actually wins among channels that real viewers see alongside yours. A 78 on Thumbnail IQ for a nano channel means the thumbnail beats the average top-performing nano thumbnail in your niche: a target you can actually hit.
Layer 1 uses OpenCV’s Haar cascade for detection (presence, count, position, coverage percentage). Detection is reliable for forward-facing faces; it misses heavy profile shots and partial faces. Emotion is a Layer 2 read: Claude vision describes the specific emotion ("intense focus", "barely-suppressed laugh") and judges whether it’s readable at 200px. If Layer 1 misses your face but Layer 2 sees it, the vision score still credits you; nothing is double-penalized.
Text presence scores zero in Layer 1. Layer 2’s text-psychology dimension also scores 0 unless the visual is exceptionally strong, in which case Claude is allowed to flag it as an intentional choice (some niches like ASMR or cinematic vlogs win without text). The combined score will still come out reasonable if the rest of the thumbnail compensates. We don’t hand back "ADD TEXT" as the universal fix; the suggestion is contextual to your niche.
Yes, that’s the primary use case. Upload the image, paste your draft title, pick the keyword you’re targeting. The studio runs both layers, compares against the niche pool, returns the score and the per-dimension fixes. Iterate, re-upload, score again. Every version is tracked in the history panel so you can see exactly which change moved the score, and by how much.
Often, yes. Both surfaces use YouTube’s niche-search results as the source of truth for "who’s winning here". The benchmark pool for thumbnails additionally filters by channel-size bracket and format, so the comparison set is sharper than what SEO Studio uses for title rewrites. If you’ve linked a video idea from competitor research, Thumbnail IQ explicitly references the competitor gap that idea exploits and judges whether your thumbnail can win against those exact channels.
For every Thumbnail IQ analysis run on your same keyword + format + size bracket (across all users, since most niches have multiple creators using the tool), we compute the average algorithm score. Your percentile is "how many of those analyses scored below yours". A 78/100 might be 92nd percentile in some niches and 60th in others. The percentile is what tells you whether your number is competitive. New niches with no peers yet show 50th percentile by default until enough data accumulates.
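
A minimal sketch of that percentile, read as the share of peer analyses that scored strictly below yours. The name `percentile_vs_peers` and the `min_peers` cutoff of 20 are assumptions; the answer above only says the default is 50th percentile "until enough data accumulates" without giving a number.

```python
def percentile_vs_peers(score, peer_scores, min_peers=20):
    """Percent of peer analyses in the same niche that scored below `score`."""
    # ASSUMPTION: fewer than `min_peers` analyses counts as "not enough data".
    if len(peer_scores) < min_peers:
        return 50
    below = sum(1 for s in peer_scores if s < score)
    return round(100 * below / len(peer_scores))
```

With peers at 50, 60, 70, and 90 (and `min_peers=1` for the sake of a small example), a 78 beats three of four and lands at the 75th percentile.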
Layer 1 works the same: pixel measurements don’t care about the platform. Layer 2 currently judges against the standard 16:9 long-form benchmark pool, so feed-distinctiveness scoring for vertical Shorts thumbnails is approximate. Shorts get less play from the thumbnail itself (most plays start before the thumbnail loads), so this is intentionally not the top priority right now. If your Shorts thumbnails are critical to your funnel, email support and we’ll prioritize the Shorts pool build.
~20–35 seconds end-to-end on a fresh niche (Layer 1 on your image, fetch + Layer 1 on benchmark thumbnails if pool isn’t cached, then Layer 2 vision call). Cached niches return in ~10–15 seconds. Free tier gets 1 thumbnail analysis per cycle; paid plans charge one credit per run (Solo 20, Growth 50, Agency 150 pooled). Re-uploading a revised version of the same thumbnail charges a new credit because we re-run both layers from scratch.
Your uploaded thumbnail is stored on our infrastructure so the analysis can rehydrate when you reopen it later, and so the version-history panel can compare iterations. It is never shown to other users and never used as benchmark data for other channels. The benchmark pool only ever contains public thumbnails from the YouTube API, from videos that are already published and ranking. You can permanently clear an upload from the analysis history at any time.
It’s the highest-impact Layer 2 dimension. We show Claude your thumbnail alongside the actual top 3 benchmark thumbnails (by view velocity) for your exact niche, format, and size bracket, and ask: would this stand out, blend in, or disappear in that feed? The score is anchored to the visual context a real viewer would see your thumbnail in, which is the only honest way to judge "click-worthiness". Generic best-practice advice can’t do this.

Score your next thumbnail against the niche

Free plan unlocks one full two-layer analysis. Solo gets 20 / month, Growth 50, Agency 150 pooled. Most users move their score 12+ points within the first three iterations.

Score a thumbnail →