When Generalizability Breaks: Understanding Data Quality in Market Research

Renato Silvestre
5 days ago

If it quacks like a bot, it is one. Rethinking trust, data quality, and fraud detection in modern survey research.

The Duck Test No Longer Works

In a city that never sleeps, even a moment can make a statement. STRATEGENCE was in Times Square, thanks to Verisoul.

Our billboard may have only been up for a moment, but the issue it represents is not new. We've been focused on sampling and generalizability since the early days of online research, long before data quality became the headline.

A quick rewind

From 2000 to 2004

It started during my time at Millward Brown Interactive in San Francisco in the early 2000s. Even then, many resisted, questioning the biases inherent in online research.

SSI was beginning to transition telephone-based sampling into online panels (SurveySpot), and they weren't even the dominant player. Feasibility was always the question.

E-Rewards came knocking, but there were immediate questions about how representative the panel really was. Recruiting through loyalty point programs seemed likely to skew the sample toward frequent travelers, particularly higher-income business professionals. Looking back, the better question might have been, "How do I invest?" It went on to become, in my view, one of the best-managed panels in the industry. Recruitment was controlled via email invitations, and participation was governed by clearly defined, actively managed rules.

Global Market Insite (GMI). As a market research practitioner, I made it a point to join panels to understand how they operated, from membership to survey practices. At one point, I forgot my password and couldn't get back in. After a few attempts to reset it, and even an attempt to create a new account, I found myself completely locked out, effectively blacklisted from rejoining or reactivating. Looking back, it's actually refreshing: a level of control and discipline that feels almost foreign by today's standards of convenience.

Greenfield Online filed for an IPO in 2000, at the height of the Internet bubble, then tabled it until 2004. In the interim, they quietly offered select customers the opportunity to purchase shares at a reduced price. I was seeing the writing on the wall. The bubble was already imploding.

From 2005 to 2016

SSI and E-Rewards were among my primary sources: SSI for consumer sample and E-Rewards for B2B.

Early Quality Metrics

From 2009 to 2013, as part of a large engagement, we conducted a study with over 100,000 completions across multiple panels. Because we were able to collect names and email addresses, we could observe patterns that are typically hidden. There was about a 3% overlap across panels. Roughly 5% of respondents made careless errors, such as mistyping or misspelling an email address or entering it in the wrong field. More notably, around 7-10% appeared to be fraudulent, manually attempting to participate more than once. Ah, the good ol' days.
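To make the overlap and repeat-entry checks concrete, here is a minimal sketch of how collected email addresses could be compared across panels. It is an illustration, not the method used in the study: the function names, the panel labels, the sample addresses, and the normalization rule (lowercasing, trimming, and dropping a "+tag" suffix) are all assumptions.

```python
from collections import defaultdict


def normalize_email(raw: str) -> str:
    """Lowercase and trim an address; drop a '+tag' suffix (assumed rule)."""
    email = raw.strip().lower()
    local, sep, domain = email.partition("@")
    if not sep:
        return email  # malformed entry; left as-is for manual review
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"


def summarize_duplicates(completes):
    """completes: iterable of (panel, email) pairs, one per completed interview."""
    panels_by_email = defaultdict(set)
    count_by_key = defaultdict(int)
    for panel, raw_email in completes:
        email = normalize_email(raw_email)
        panels_by_email[email].add(panel)
        count_by_key[(panel, email)] += 1
    # Same person showing up in more than one panel
    cross_panel = {e for e, panels in panels_by_email.items() if len(panels) > 1}
    # Same person completing more than once within a single panel
    repeats = {e for (_, e), n in count_by_key.items() if n > 1}
    return {
        "unique_emails": len(panels_by_email),
        "cross_panel": cross_panel,
        "repeat_within_panel": repeats,
    }


# Made-up records for illustration only
completes = [
    ("panel_a", "Pat@example.com"),
    ("panel_b", "pat@example.com "),
    ("panel_a", "sam+1@example.com"),
    ("panel_a", "sam+2@example.com"),
    ("panel_b", "lee@example.com"),
]
summary = summarize_duplicates(completes)
```

In practice, a check like this only surfaces candidates. Deciding whether a repeat is a careless error or a deliberate second attempt still takes human review.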

Mobile Usage

In 2012, about 10-12% of respondents were accessing surveys on their phones. We monitored this group closely and, in some cases, restricted access, especially when concept images were involved. The devices at the time were nowhere near what they are today, and they introduced a real layer of response bias tied to screen size and form factor. That made survey design and look-and-feel even more critical. By 2017, the share of respondents coming in via smartphone had more than tripled.

From 2017 to 2019

We conducted our own research on research (RoR), studying sampling, survey design, and data quality across both traditional panels and programmatic samples.

Overall Topline Learnings

The headline finding was sobering:

Regardless of sample type, 15% of respondents exerted questionable effort, failing multiple trap questions. The problem wasn't unique to any one source. It was systemic.
Age mattered more than expected. About 30% of 18-to-34-year-olds showed questionable effort.
Respondents aged 55 to 70 had the lowest failure rates and took three to four minutes longer to complete the same survey. All things considered, older respondents simply seemed more engaged, as suggested by the quality of their data.

Traditional Non-programmatic vs. Programmatic Sample

The programmatic sample presented its own challenges.

Traditional, non-programmatic panel respondents were twice as likely to complete a recontact survey.
Programmatic fielding was erratic. In one instance, 600 completes came in within 20 minutes, followed by inconsistency for the remainder of the field period.
This uncontrolled velocity suggested that once the valves were open, sources had little control over the sampling.
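A burst like the one described above can be caught during fielding with a simple trailing-window count of completes. The sketch below is one hypothetical way to do it. The function name, the 20-minute window, the threshold of 100 completes, and the simulated timestamps are all assumptions chosen for illustration, not values from our studies.

```python
from collections import deque
from datetime import datetime, timedelta


def find_velocity_spikes(timestamps, window=timedelta(minutes=20), max_completes=100):
    """Return completion times at which the trailing window holds more
    than max_completes completes (both limits are assumed values)."""
    recent = deque()
    spikes = []
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop completes that have fallen out of the trailing window
        while recent and ts - recent[0] > window:
            recent.popleft()
        if len(recent) > max_completes:
            spikes.append(ts)
    return spikes


# Simulated field period: a 600-complete burst, then a slow trickle
start = datetime(2018, 1, 1, 9, 0)  # arbitrary date for the example
burst = [start + timedelta(seconds=2 * i) for i in range(600)]
trickle = [start + timedelta(hours=1, minutes=30 * i) for i in range(10)]
spikes = find_velocity_spikes(burst + trickle)
```

Every flagged timestamp in this simulation falls inside the opening burst and none falls in the trickle, which is the pattern a fieldwork manager would want surfaced before quotas fill.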

Survey Design & Implementation

We tested a three-flag QC approach: speed, question traps, and human review for duplicate IPs and verbatim responses. We learned:

Depending on the speed threshold applied, between 16% and 28% of respondents were flagged across all three methods.
The standard threshold at the time was catching only the most egregious offenders. The data made a clear case for tightening controls and monitoring systems in survey design and throughout the interview flow.
One finding that tends to get overlooked: questionable effort was highest on Day 1 of fielding and declined over time. The respondents who rushed in first were the most problematic.
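The three flags described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the QC system we fielded. The Respondent fields, the speed cutoff expressed as a fraction of the median interview length, and the trap-failure limit are all hypothetical. The human-review step is represented only as a flag that routes a record to a reviewer, and verbatim review is left out.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class Respondent:
    rid: str
    seconds: float     # interview length in seconds
    traps_failed: int  # number of trap questions failed
    ip: str


def qc_flags(respondents, speed_fraction=0.33, max_trap_failures=1):
    """Return {respondent id: list of flag names}; thresholds are assumed."""
    med = median(r.seconds for r in respondents)
    ip_counts = Counter(r.ip for r in respondents)
    flags = {}
    for r in respondents:
        reasons = []
        if r.seconds < speed_fraction * med:
            reasons.append("speed")
        if r.traps_failed > max_trap_failures:
            reasons.append("traps")
        if ip_counts[r.ip] > 1:
            reasons.append("duplicate_ip_review")  # route to a human reviewer
        flags[r.rid] = reasons
    return flags


# Made-up respondents for illustration only
sample = [
    Respondent("r1", 600, 0, "10.0.0.1"),
    Respondent("r2", 150, 2, "10.0.0.2"),
    Respondent("r3", 580, 0, "10.0.0.3"),
    Respondent("r4", 620, 1, "10.0.0.3"),
    Respondent("r5", 640, 0, "10.0.0.4"),
]
flags = qc_flags(sample)
flagged_share = sum(1 for reasons in flags.values() if reasons) / len(sample)
```

Raising speed_fraction tightens the speed threshold and can only increase the flagged share, which is the kind of sensitivity behind the spread in flag rates we saw as the threshold moved.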

These practitioner-level insights had real implications for how we recruited sample, designed surveys, and managed fieldwork.

The Landscape Today

What we were seeing back then was a warning sign. The shift from managed panels to programmatic and exchange-based sampling has created a fragmented ecosystem that is, in many cases, opaque. Efficiency improved while control and transparency declined. That tradeoff happened quietly, at scale, and the industry is still dealing with the consequences.

The "get paid to" affiliate model only accelerated it. When respondents are recruited through cash or reward-based offers on third-party sites and apps, the primary incentive is earning, not answering. Traffic gets routed into surveys with little regard for intent or identity.

Panel consolidation added another layer. What was once a competitive landscape with distinct approaches has narrowed into a handful of scaled players. SSI and Research Now merged in 2017 and later rebranded as Dynata. By 2024, Dynata was restructuring under bankruptcy pressure. Opinions 4 Good, once a known panel provider, saw its executives federally indicted in 2025 for large-scale survey fraud. These are not isolated events. They are signals of a system that prioritized throughput over integrity for too long.

The numbers reflect it. Without robust detection systems, up to 35% of collected data can be biased, duplicated, or fabricated. Compare that to the 7-10% manual fraud we were seeing in the early 2010s, when this was still done by hand. Technology didn't solve the fraud problem. It scaled it.

At SampleCon, an annual sample industry conference, a speaker tried to map the current provider ecosystem on a single slide. It was so fragmented that even the organizers couldn't agree on how to categorize the players. That image says more about the state of the industry than any white paper.

The billboard in Times Square asks a simple question using a familiar line. If it looks like a duck and quacks like a duck, it's probably a bot in your survey. That line works because it used to be obvious. Today, it takes a lot more than common sense to tell the difference, and that's exactly why working with the right partner and the right systems matters more than ever.
