We Put Our Watermarks Through Real WordPress and CDN Pipelines. Here's What Actually Survived.
Last time, we ran a controlled lab benchmark. This time, we pushed the same marked images through a real WordPress installation and a real Cloudinary CDN. 366 decode tests later, we have numbers — and a few surprises.
The short version
Across 11 images pushed through real WordPress renditions and Cloudinary CDN transforms, our leading experimental candidate recovered provenance in 111 out of 122 test cases (91%) — versus 109 out of 122 for the current baseline (89%). The gap is smaller than in our lab test. But the real story is in where each one fails — and what that means for the migration decision.
Why we needed to run this
Two weeks ago, we published our first controlled benchmark — 11 images, 3 watermark configurations, 11 synthetic transformations each. The experimental candidate cut failures from 7 to 2. That was a good signal.
But we said explicitly: those transformations were run in a controlled lab. We simulated what WordPress, CDNs, and social platforms do — but we had not actually sent marked images through the real thing.
This matters because real pipelines are messy. WordPress does not just resize: it applies its own JPEG quality settings, sometimes re-encodes to WebP, and generates up to seven renditions per upload (it never upscales, so smaller originals get fewer sizes). A CDN like Cloudinary applies its own compression algorithms, format conversions, and quality settings that may differ from what we simulated.
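That skip rule is also why the per-rendition counts in the result tables vary. A minimal Python sketch of it, using the default WordPress size names and widths (simplified to width only; the real logic also considers height and crop flags):

```python
# Sketch: which standard renditions WordPress generates for a given original
# width. Widths are the WordPress defaults; WordPress skips any rendition
# larger than the original because it never upscales.
RENDITION_WIDTHS = {
    "thumbnail": 150,
    "medium": 300,
    "medium_large": 768,
    "large": 1024,
    "1536x1536": 1536,
    "2048x2048": 2048,
}

def generated_renditions(original_width: int) -> list[str]:
    """Return the rendition names WordPress would produce, plus 'full'."""
    names = [name for name, w in RENDITION_WIDTHS.items() if w < original_width]
    return names + ["full"]
```

For example, a 900 px wide original gets thumbnail, medium, and medium_large, but no large, 1536, or 2048 rendition.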
So we ran the same 11 original images — each marked with all three watermark candidates — through a real WordPress site and a real Cloudinary account. This is what came back.
What we actually did
Scale
- 11 original images
- 3 watermark candidates
- 33 marked uploads
- up to 12 transforms per image (WordPress skips renditions larger than the original)
- 366 total decode checks
Real infrastructure
- WordPress 6.x (live site)
- 7 WordPress rendition sizes
- Cloudinary CDN
- 5 CDN transform profiles
For WordPress, we uploaded each marked image via the REST API and let WordPress generate all its standard renditions — thumbnail, medium, medium_large, large, 1536, 2048, and full. For Cloudinary, we uploaded the same marked images and applied five transform profiles representing common CDN delivery scenarios: hero images, aggressive JPEG, WebP, square crops, and small thumbnails.
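The five Cloudinary profiles map directly onto URL transformation strings. A sketch of how such delivery URLs can be built (the cloud name and public ID are placeholders, and the parameter values are our reading of the profiles above, not a canonical configuration):

```python
# Sketch: Cloudinary delivery URLs for five transform profiles like the ones
# in this benchmark. Cloud name and public ID are placeholders.
BASE = "https://res.cloudinary.com/{cloud}/image/upload/{tx}/{public_id}"

PROFILES = {
    "hero":     "c_limit,w_1200,q_85,f_jpg",  # hero image
    "standard": "c_limit,w_800,q_60,f_jpg",   # aggressive JPEG
    "webp":     "c_limit,w_800,q_80,f_webp",  # WebP conversion
    "square":   "c_fill,w_800,h_800",         # square crop
    "thumb":    "c_limit,w_400,q_70,f_jpg",   # small thumbnail
}

def transform_url(cloud: str, public_id: str, profile: str) -> str:
    """Build the delivery URL for one named transform profile."""
    return BASE.format(cloud=cloud, tx=PROFILES[profile], public_id=public_id)
```

Fetching each of these URLs yields the exact bytes a visitor's browser would receive, which is what the decoder then runs against.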
Then we decoded every single rendition against the original watermark. No filtering, no cherry-picking. Every result is in the tables below.
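Aggregation was a straight tally over every decode result. A minimal sketch of that bookkeeping (the record shape is hypothetical, not our actual script):

```python
# Sketch: aggregating raw decode results into per-candidate recovery rates.
# Each record is (candidate, pipeline, transform, recovered); nothing is
# filtered out before tallying.
from collections import defaultdict

def recovery_rates(results):
    """Return {candidate: (recovered, total, rate)} over all records."""
    hits, totals = defaultdict(int), defaultdict(int)
    for candidate, _pipeline, _transform, recovered in results:
        totals[candidate] += 1
        hits[candidate] += int(recovered)
    return {c: (hits[c], totals[c], hits[c] / totals[c]) for c in totals}
```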
The overall numbers
| Configuration | Recovered | Rate | Note |
|---|---|---|---|
| Current baseline | 109 / 122 | 89% | Production default today |
| Experimental Q | 111 / 122 | 91% | Best overall — 2 fewer failures |
| Experimental P | 91 / 122 | 75% | Best on crops, weak on small sizes |
The first thing you notice: the gap between baseline and Candidate Q is smaller than in our lab test. In the lab, the experimental candidate had 119/121 vs. 114/121 — a clean 5-failure advantage. In the real world, it is 111/122 vs. 109/122 — only 2 fewer failures.
That is an honest result. Real pipelines narrow the gap because they combine transformations differently than our synthetic tests did. But Candidate Q still leads, and importantly: it never performed worse than the baseline on any single transform category.
WordPress: the detailed picture
WordPress generates multiple sizes per upload. Here is how each watermark candidate performed across all seven standard rendition sizes:
WordPress renditions (7 sizes, up to 11 images each)
| Transform | Baseline | Candidate Q | Candidate P |
|---|---|---|---|
| Full size (original) | 11/11 | 11/11 | 10/11 |
| 2048 × 2048 | 6/6 | 6/6 | 6/6 |
| 1536 × 1536 | 8/8 | 8/8 | 8/8 |
| Large (1024 px) | 10/10 | 10/10 | 10/10 |
| Medium-large (768 px) | 10/10 | 10/10 | 10/10 |
| Medium (300 px) | 10/11 | 11/11 | 3/11 |
| Thumbnail (150 × 150) | 3/11 | 4/11 | 1/11 |
Three things stand out:
Candidate Q fixed the WordPress “medium” gap
The baseline lost 1 out of 11 images at the 300 px medium size. Candidate Q recovered all 11. This is exactly the kind of small-but-meaningful improvement that adds up across a site with hundreds of images.
Thumbnails are brutal — for everyone
At 150 × 150 pixels, the watermark barely survives regardless of configuration. The baseline recovered 3 out of 11, Candidate Q got 4, Candidate P only 1. This is not a bug — it is a physics problem. At that resolution, there simply are not enough pixels to carry a robust watermark signal. Our recommendation: treat thumbnails as outside the watermark recovery envelope and rely on the audit trail + fingerprint match for those sizes.
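For sizes below that recovery envelope, the fallback is a perceptual fingerprint rather than the watermark. As one illustration of the idea (a classic difference-hash, not MarkMyAI's actual fingerprint), a dHash survives heavy downscaling precisely because it is computed from a tiny resampled grid:

```python
# Sketch: a difference-hash (dHash) style perceptual fingerprint, the kind of
# signal that can still match a 150x150 thumbnail after the watermark signal
# is gone. Input is a grayscale grid already resampled to 9 columns x 8 rows;
# a real pipeline would resize the image first (e.g. with Pillow).

def dhash_bits(grid):
    """Return 64 bits: 1 where a pixel is brighter than its right neighbor."""
    assert len(grid) == 8 and all(len(row) == 9 for row in grid)
    return [int(row[i] > row[i + 1]) for row in grid for i in range(8)]

def hamming(a, b):
    """Count differing bits; small distances indicate the same source image."""
    return sum(x != y for x, y in zip(a, b))
```

Matching then reduces to comparing a thumbnail's 64-bit hash against the hashes of registered originals and accepting matches below a distance threshold.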
Everything above 768 px: near-perfect for baseline and Q
At medium_large and above, both the baseline and Candidate Q achieved perfect or near-perfect recovery. This confirms that for the sizes visitors actually see on a published page — hero images, content images, gallery views — the watermark layer is reliable.
Cloudinary CDN: the detailed picture
CDN transforms are different from WordPress renditions. Instead of fixed sizes, they apply on-the-fly format conversion, quality adjustment, and cropping. Here is how each candidate performed:
Cloudinary CDN transforms (5 profiles × 11 images)
| Transform | Baseline | Candidate Q | Candidate P |
|---|---|---|---|
| Hero image (1200 px, JPEG q85) | 11/11 | 11/11 | 10/11 |
| Standard (800 px, JPEG q60) | 11/11 | 11/11 | 9/11 |
| WebP (800 px, q80) | 11/11 | 11/11 | 10/11 |
| Square crop (800 × 800) | 8/11 | 8/11 | 10/11 |
| Small thumbnail (400 px, JPEG q70) | 10/11 | 10/11 | 4/11 |
The CDN results track the lab findings more closely:
Baseline and Q: nearly flawless on standard CDN
Both achieved perfect scores on hero images, standard JPEG delivery, and WebP conversion. Only the square crop (which removes 36% of pixels) and small thumbnails caused any losses.
Candidate P shines on crops — but pays for it elsewhere
Here is the most interesting result: Candidate P recovered 10 out of 11 on the square crop, versus 8 out of 11 for both baseline and Q. But it dropped to 4 out of 11 on small thumbnails and lost images on nearly every other transform. The trade-off profile is very clear: P is optimized for aggressive geometric changes, not for compression resilience.
Lab test vs. real world — what changed
| Metric | Lab test | Real-world test |
|---|---|---|
| Total tests | 363 | 366 |
| Baseline recovery | 114/121 (94%) | 109/122 (89%) |
| Candidate Q recovery | 119/121 (98%) | 111/122 (91%) |
| Candidate P recovery | 117/121 (97%) | 91/122 (75%) |
| Q advantage over baseline | +5 recoveries | +2 recoveries |
The gap narrowed. That is expected — real pipelines apply transformations that our synthetic tests did not perfectly replicate. The important finding is directional: Candidate Q still leads, and the overall robustness of both baseline and Q is solid enough for production use at sizes above thumbnail.
The biggest surprise was Candidate P. In the lab, it performed nearly as well as Q (97% vs. 98%). In the real world, it dropped to 75%. Real WordPress JPEG rendering and Cloudinary's compression algorithms hit P's encoding approach much harder than our synthetic transforms did. This is exactly why real-world testing matters.
What this means for our migration decision
We started this benchmark series with a clear question:
Is the experimental configuration better enough to justify changing the production default?
After both the lab test and the real-world test, the answer is: yes, but cautiously.
Candidate Q is the better all-rounder.
It never performed worse than the baseline on any transform. It fixed specific gaps (WordPress medium, some CDN edge cases). The improvement is modest but consistent.
Candidate P is not a production default.
The real-world results eliminated it from consideration as a general-purpose configuration. Its crop resilience is remarkable, but the compression weakness is too severe.
Thumbnails need a different strategy.
No watermark configuration reliably survives 150 × 150 pixels. For those sizes, the audit trail and perceptual fingerprint are the recovery path — not the watermark.
What we have not tested yet
Full transparency on scope:
- Social platforms. We have not yet pushed images through Instagram, Twitter/X, Facebook, or LinkedIn upload pipelines. Those are planned.
- WebP-native WordPress. Our test site delivered JPEG renditions. WordPress 5.8+ with WebP conversion enabled adds another layer of re-encoding. We want to test this separately.
- Image optimization plugins. Smush, ShortPixel, Imagify — these add yet another round of compression on top of WordPress defaults.
- Other CDNs. Cloudflare Image Resizing, imgix, AWS CloudFront — each has its own compression pipeline.
We are committed to running and publishing these tests. The methodology is documented and the scripts are reusable — adding new pipeline targets is straightforward.
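To make "adding new pipeline targets" concrete, here is the kind of minimal interface a new target could implement so the same decode harness covers it. The names are hypothetical, not our actual benchmark scripts:

```python
# Sketch: a minimal interface for one real-world pipeline target (a WordPress
# site, a CDN, a social platform). The decode harness only needs these two
# operations; names are illustrative.
from abc import ABC, abstractmethod

class PipelineTarget(ABC):
    """One real-world pipeline that transforms uploaded images."""

    name: str

    @abstractmethod
    def upload(self, image_path: str) -> str:
        """Push a marked image through the pipeline; return an asset id."""

    @abstractmethod
    def renditions(self, asset_id: str) -> list[bytes]:
        """Fetch every derivative the pipeline produced for this upload."""
```

A new target (say, another CDN) then only has to implement `upload` and `renditions`; the decode loop and the tallying stay unchanged.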
Where we go from here
Based on two benchmark rounds — 729 total decode tests across lab and real-world pipelines — we are confident enough to make a decision:
Our plan
Candidate Q will become the new production default for new marks. The current baseline remains valid and verifiable — existing marks are not affected. We are making this switch before onboarding paying customers specifically so that no one needs to worry about a mid-service migration later.
We will continue testing against additional pipelines (social platforms, more CDN providers, image optimizer plugins) and publish those results the same way — with full data, honest framing, and no marketing spin.
If you are evaluating MarkMyAI, this is the kind of transparency we believe the market needs. Watermark robustness is not a checkbox feature — it is an engineering problem that requires real data. We intend to keep showing ours.