Email A/B Testing Guide 2026: Optimisation Tips
What makes one subject line outperform another? Does send time genuinely matter? Should the CTA button be blue or green? Most marketers answer these questions with gut feeling; the ones who test systematically win. Email A/B testing is the practice of sending two variations of the same campaign to small subsets of your list, measuring which version performs better, and then sending the winner to the remainder. HubSpot’s 2025 data shows that businesses that conduct regular A/B tests generate 37% more revenue from their email channel than those that do not. That is not a marginal gain. It is the difference between a mediocre email programme and a genuinely profitable one.
Many UK and US businesses either skip A/B testing entirely or test only subject lines and stop there. There is far more to test, and each variable carries the potential to shift conversion rates in a meaningful way.
In This Guide
- A/B Testing Fundamentals and Rules
- Subject Line Tests
- Send Time Tests
- Content and Design Tests
- CTA Button Tests
- Understanding Statistical Significance
- Step-by-Step Testing Process
- 6 Testing Mistakes to Avoid
- A/B Testing Within Automation Flows
- Building a Testing Culture
A/B Testing Fundamentals and Rules
A/B testing appears straightforward, but producing reliable results requires discipline. Testing without rules leads to false conclusions and misguided decisions.
The Single Variable Rule
Change only one thing per test. If you are testing the subject line, keep the email body, send time, sender name, and preheader identical. If you change the subject line and the button colour simultaneously, you cannot attribute the result to either change. This is the principle of a controlled experiment and it forms the foundation of valid A/B testing.
In practice, this requires patience. The temptation to change the subject, the hero image, and the CTA all at once is understandable. But what you would be creating is not an A/B test. It is two entirely different emails. And you would have no way of knowing which change actually made the difference.
Minimum Sample Size
Sending version A to 100 people and version B to 100 people rarely produces a statistically reliable result. With small samples, random variation can dominate. Version A might achieve 25% open rate and version B 22%, but with only 100 recipients per group, this 3-point gap could easily be noise rather than signal. Repeat the same test tomorrow and the result might reverse.
General guideline: each test group should contain at least 500 recipients, preferably 1,000 or more. If your list has fewer than 5,000 subscribers, allocate 25% to each variant (on a 5,000-subscriber list, that is 1,250 per group) and send the winner to the remaining 50%. On larger lists, 10-15% per variant is sufficient, with the winner going to the remaining 70-80%.
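If you want to sanity-check these thresholds against your own open rates, the snippet below is a minimal sketch, assuming Python with SciPy, of the standard two-proportion sample size calculation. The 20% baseline, five-point uplift, 95% confidence, and 80% power figures are illustrative assumptions, not recommendations from any platform.

```python
# Minimal sketch: per-variant sample size for a two-proportion open-rate test.
# Assumes Python with SciPy installed; all example figures are illustrative.
from math import sqrt, ceil
from scipy.stats import norm

def sample_size_per_variant(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Recipients needed in EACH group to detect p1 vs p2 (two-sided test)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 at 95% confidence
    z_beta = norm.ppf(power)            # 0.84 at 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Detecting a lift from a 20% to a 25% open rate:
print(sample_size_per_variant(0.20, 0.25))  # ~1,094 recipients per variant
```

The smaller the uplift you hope to detect, the larger each group needs to be, which is why a one- or two-point difference on a small list is usually just noise.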
Sufficient Wait Time
Checking results two hours after sending and declaring a winner is premature. Early openers may have a different profile from evening openers. Allow at least 4 hours before evaluating, and ideally wait 24 hours. Mailchimp, Klaviyo, and other platforms offer “automatic winner” functionality that waits a defined period before selecting and sending the winning version.
Subject Line Tests
Subject lines should be your first testing priority. The logic is simple: if the email does not get opened, nothing inside it matters. The subject line is the gatekeeper.
Variables Worth Testing
Length. Short (3-5 words) versus long (8-12 words). “New Arrivals” against “This Season’s 5 Most Popular Styles.” Mobile favours shorter subject lines because they display fully on screen. But ultra-short lines sometimes lack enough information to compel a click. Test to find your audience’s sweet spot.
Personalisation. Name versus no name. “[Name], we picked these for you” against “We picked these for you.” Some audiences respond strongly to name personalisation (10-15% open rate lift). Others find it mechanical or even unsettling. You will not know until you test.
Emoji. With versus without. “Weekend sale starts now” against “Weekend sale starts now 🎉.” In consumer-facing industries targeting younger demographics, emojis can boost opens by 5-10%. In B2B and professional services, they may undermine credibility.
Question versus statement. “Have you planned your summer holiday?” against “Summer holiday deals are live.” Questions generate curiosity. Statements provide immediate clarity. Both approaches work, but their relative effectiveness varies by audience and topic.
Numbers. Subject lines containing specific numbers (“5 tips”, “30% off”, “3 hours left”) tend to outperform abstract phrasing. Numbers create specificity and set clear expectations.
Urgency. “Final day: your basket items are going fast” versus a neutral subject line. Time-based urgency works, but only when the deadline is genuine and the approach is not used every week.
Real-World Subject Line Test Results
| Test Variable | Version A | Version B | Winner (relative uplift) |
|---|---|---|---|
| Length | “New products” (19% open) | “This week’s 5 best sellers” (24% open) | B (+26%) |
| Name | “Weekly newsletter” (17% open) | “Sarah, here’s what’s new this week” (23% open) | B (+35%) |
| Emoji | “Spring sale” (21% open) | “Spring sale 🌸” (23% open) | B (+9%) |
| Format | “Is your wardrobe ready?” (26% open) | “Summer collection now live” (20% open) | A (+30%) |
These results reflect general trends, not universal rules. Your audience may respond differently. That is precisely why testing matters.
Send Time Tests
When you send an email can be as important as what it says. The same campaign sent on Tuesday at 10:00 might achieve a 28% open rate while the same content sent on Friday at 18:00 reaches only 16%.
Day of Week Tests
Tuesday, Wednesday, and Thursday generally produce the highest open rates for B2B audiences. Monday mornings are cluttered with weekend catch-up emails. Friday afternoons find people mentally checked out. But exceptions exist. B2B audiences in certain industries respond well to Monday morning sends (planning mode). E-commerce audiences sometimes respond best to Sunday evenings (browsing and shopping mindset).
To test effectively, split your list and send version A on one day, version B on another. Repeat for several weeks to identify consistent patterns.
Time of Day Tests
Strong send windows for UK audiences: 09:00-10:00 (work start), 12:00-13:00 (lunch break), 20:00-21:00 (evening browsing). For US audiences, adjust for time zones. EST and PST differ by three hours, so a 10:00 EST send arrives at 07:00 PST.
Test unconventional times as well: 07:30 (commuters checking phones), 14:00-15:00 (post-lunch lull), 22:00 (late-night browsers). If your platform supports send time optimisation (per-subscriber optimal time based on past engagement data), test it against a fixed time send. In some cases, per-subscriber optimisation outperforms static timing by 10-15%.
Want Data-Driven Email Optimisation?
The Bravery team runs systematic A/B testing programmes for every element of your email campaigns.
Content and Design Tests
Beyond subject lines and send times, the email body itself offers multiple testing opportunities.
Text Length
Short copy (50-100 words, single message with one CTA) versus long copy (200-400 words with detailed explanation). Promotional emails typically perform better with shorter copy. Educational newsletters and nurture content may benefit from longer form. Test the extremes first to establish a baseline, then refine.
Visual vs Text-Heavy
Beautifully designed, image-rich emails look professional but carry risks: some email clients block images by default, loading can be slow, and accessibility suffers. Text-heavy emails load instantly and display reliably everywhere but may lack visual appeal. A hybrid approach (one hero image, concise text, prominent CTA) often works best. But test it. Some industries (fashion, interior design, food) perform dramatically better with visual-heavy formats.
Sender Name
Most businesses never test this variable. Yet the sender name is one of the first things subscribers check when deciding whether to open. “Bravery” versus “Sarah from Bravery” versus “The Bravery Team.” Personal names tend to outperform brand names because people are inclined to open emails from other people. But brand-only sender names build consistent recognition over time.
Preheader Text
The preheader appears next to or below the subject line in the inbox. Many marketers leave it empty or let it default to “View in browser.” Testing an optimised preheader against a default one can lift open rates by 5-7%. Try benefit-focused (“3 new products added”) versus curiosity-focused (“You won’t want to miss this one”) approaches.
CTA Button Tests
The CTA button drives clicks, and clicks drive conversions. Small changes to button design can produce surprisingly large performance shifts.
Button text. Generic (“Click here”, “Learn more”) versus specific (“Download the guide”, “View the collection”, “Book your call”). Specific, action-oriented text almost always wins. Test different phrasing to find what resonates. First person (“Get my discount”) versus second person (“Get your discount”) is another variable worth exploring.
Button colour. High-contrast colours that stand out from the email background consistently outperform subtle ones. But the “best” colour depends on your brand palette and email design. Red, green, and orange are common top performers, but test rather than assume.
Button placement. Above the fold (visible without scrolling) versus below content. For promotional emails, above-the-fold CTA placement tends to perform better. For educational content, readers may need to read the full body before the CTA feels relevant.
Number of CTAs. Single CTA versus multiple CTAs. In most scenarios, a single CTA with one clear objective outperforms multiple competing calls to action. The exception is long-form newsletters where multiple articles each warrant their own link.
Understanding Statistical Significance
Statistical significance tells you whether your test result reflects a genuine difference or just random chance. If version A gets 24% opens and version B gets 23%, is A really better, or could the result easily reverse if you ran the test again?
A result is typically considered statistically significant when the confidence level reaches 95%. In practical terms, this means that if there were genuinely no difference between the two versions, a gap as large as the one you observed would occur by chance only around 5% of the time. At 90% confidence, you accept a higher risk of false positives. At 99%, you have very high certainty but need larger sample sizes to achieve it.
Most email platforms display confidence levels in their A/B test reports. If your tool does not, online calculators (like the one from Optimizely or VWO) can compute significance from your raw numbers.
When a test does not reach statistical significance, the answer is not “A wins” or “B wins.” The answer is “there is no meaningful difference.” In that case, choose whichever version aligns with your brand voice and test a different variable next time.
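If your platform reports raw opens but no confidence level, the calculation is simple to run yourself. Here is a minimal sketch, assuming Python with SciPy, of a standard two-proportion z-test, the same family of test many of those online calculators use; the open counts in the example are illustrative.

```python
# Minimal sketch: two-sided two-proportion z-test on raw open counts.
# Assumes Python with SciPy; the example counts are illustrative.
from math import sqrt
from scipy.stats import norm

def confidence_level(opens_a: int, sent_a: int, opens_b: int, sent_b: int) -> float:
    """Return the confidence (in %) that the two open rates genuinely differ."""
    p_a, p_b = opens_a / sent_a, opens_b / sent_b
    p_pool = (opens_a + opens_b) / (sent_a + sent_b)           # pooled open rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / sent_a + 1 / sent_b))
    z = abs(p_a - p_b) / se
    p_value = 2 * (1 - norm.cdf(z))                            # two-sided p-value
    return (1 - p_value) * 100

# 24% vs 23% opens on 1,000 recipients each:
print(f"{confidence_level(240, 1000, 230, 1000):.0f}% confidence")  # ~40%: inconclusive
```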
Step-by-Step Testing Process
Step 1: Define the hypothesis. “We believe that a question-format subject line will increase open rates compared to a statement format.” Every test should start with a clear hypothesis that connects the change to an expected outcome.
Step 2: Choose the variable. Subject line, send time, preheader, CTA, content length, or sender name. One variable per test, always.
Step 3: Create the variants. Version A (control) and version B (variation). The control should be your current approach. The variation introduces the single change you are testing.
Step 4: Set sample sizes. Minimum 500 per variant. Larger lists should use 10-15% per variant. Define how long the test will run before the winner is selected (4-24 hours).
Step 5: Send and wait. Resist checking results prematurely. Let the full test duration elapse.
Step 6: Analyse results. Compare the primary metric (open rate for subject line tests, CTR for content tests, conversion rate for CTA tests). Check statistical significance. If significance is reached, adopt the winner. If not, note the inconclusive result and move on.
Step 7: Document. Record every test in a shared document or spreadsheet: date, hypothesis, variable tested, sample sizes, results, and the decision made. Over time, this becomes your email marketing playbook, unique to your brand and audience.
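A shared spreadsheet is usually enough for this log. If you would rather append results programmatically, here is a minimal sketch in Python; the file name and column headings are illustrative, not a prescribed schema.

```python
# Minimal sketch: append one A/B test record to a shared CSV testing log.
# File name and field names are illustrative assumptions.
import csv
from pathlib import Path

LOG = Path("ab_test_log.csv")
FIELDS = ["date", "hypothesis", "variable", "sample_per_variant",
          "result_a", "result_b", "confidence", "decision"]

def record_test(row: dict) -> None:
    """Append one test to the log, writing the header row on first use."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)

record_test({
    "date": "2026-03-04",
    "hypothesis": "Question-format subject line lifts opens vs statement",
    "variable": "subject line",
    "sample_per_variant": 1000,
    "result_a": "21% open", "result_b": "26% open",
    "confidence": "97%", "decision": "Adopt question format for promos",
})
```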
A recommended testing cadence: subject line tests in month one, send time tests in month two, preheader tests in month three, CTA tests in month four. This four-month rotation gives you a data-backed understanding of what works for your specific list.
6 Testing Mistakes to Avoid
1. Changing multiple variables. The most common error. If you test subject line and button colour simultaneously, you learn nothing about either. Isolate variables rigorously.
2. Declaring winners too early. A 2-hour snapshot is unreliable. Wait at least 4 hours, preferably 24. Early and late openers behave differently, and premature conclusions miss half the picture.
3. Testing with tiny samples. 50 recipients per group will not produce meaningful data. Minimum 500, ideally 1,000+. Small samples generate noise, not insight.
4. Ignoring statistical significance. A 1-point difference in open rate between variants might mean nothing. If your tool reports less than 90% confidence, the result is inconclusive. Do not make strategic changes based on noise.
5. Testing for testing’s sake. Every test should connect to a business question. “Which colour button do our subscribers prefer?” is less useful than “Which CTA text drives more conversions to our pricing page?” Start with the business outcome and work backwards to the test.
6. Not documenting results. If you do not record your findings, you will repeat tests you have already run, forget what you have learned, and fail to build institutional knowledge. Keep a testing log. Share it with the team. Refer to it before designing new tests.
A/B Testing Within Automation Flows
Most businesses only A/B test broadcast campaigns. But automated flows (welcome series, cart recovery, post-purchase sequences) run continuously and often generate more revenue per email than one-off campaigns. Testing within these flows produces compounding returns because the improvements apply to every subscriber who enters the flow from that point forward.
Klaviyo, ActiveCampaign, and HubSpot all support A/B testing within automation flows. You can test subject lines, content variations, timing intervals, and incentive strategies within any automated sequence.
For a welcome series, test the timing between emails. Does a 2-day gap between email 1 and email 2 outperform a 3-day gap? For cart recovery, test whether the first reminder at 30 minutes outperforms one at 60 minutes. For post-purchase flows, test whether requesting a review at 7 days post-delivery performs better than at 10 days.
The key advantage of testing automations is sample size accumulation. A welcome series that processes 200 new subscribers per month will generate a usable sample within 5-10 months. A cart recovery flow processing 500 abandonments per month reaches statistical significance within 2-3 months. The results, once validated, continue to generate returns for as long as the flow runs.
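The arithmetic behind those timelines is simple to check. This illustrative snippet assumes a 50/50 split and the 500-per-variant minimum discussed earlier; the monthly volumes are the examples above.

```python
# Back-of-the-envelope sketch: months for an automated flow to accumulate
# a usable per-variant sample. Assumes a 50/50 split; figures are illustrative.
from math import ceil

def months_to_sample(monthly_volume: int, target_per_variant: int = 500,
                     split: float = 0.5) -> int:
    """Months until each variant has at least `target_per_variant` recipients."""
    per_variant_per_month = monthly_volume * split
    return ceil(target_per_variant / per_variant_per_month)

print(months_to_sample(200))  # welcome series at 200/month: 5 months
print(months_to_sample(500))  # cart recovery at 500/month: 2 months
```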
Building a Testing Culture
The businesses that extract the most value from email A/B testing are those where testing is embedded in the workflow, not treated as an occasional exercise.
Practical steps to build a testing habit:
1. Include a test in every campaign. Even if it is just a subject line test, make it standard practice. Over 12 months, that produces 50+ data points about your audience.
2. Hold a monthly review meeting where you examine what was tested, what was learned, and what will be tested next.
3. Maintain a shared testing log that the entire marketing team can access. Include the date, hypothesis, variable tested, sample size, results, confidence level, and the action taken.
4. Celebrate insights, not just wins. A test that produces an inconclusive result still tells you something: that variable does not meaningfully impact performance for your audience. That knowledge is valuable because it redirects effort toward variables that do matter.
Over time, a consistent testing programme transforms email from a guesswork-driven channel into a precision instrument. Each test removes an assumption and replaces it with evidence. After 12 months of systematic testing, you will know your audience better than any competitor who relies on intuition. That knowledge translates directly into higher open rates, higher click rates, higher conversion rates, and higher revenue per email. For an overview of how testing fits into broader email strategy, see our email marketing guide.
Data beats guesswork. Let us prove it.
The Bravery team runs systematic testing programmes across subject lines, send times, content, and CTAs. Full-service email marketing management.
Sources
- HubSpot. Email Marketing Research and A/B Testing Data 2025
- Mailchimp. A/B Testing Best Practices Guide
- Campaign Monitor. Subject Line Research 2025
- Litmus. State of Email Report 2025