The Wild. The Zoo. The Synthetics.

It’s nearly impossible to escape AI these days – and burying your head in the sand like the mythical ostrich won’t make the challenge go away. I posted about the topic nearly two years ago, reminding readers that we should not lose sight of real intelligence and human imperfections. Two years on, I use AI more in my research, as a competent but still error-prone research assistant. It often thinks it knows what I want, but some of its suggestions still miss the mark.

The next frontier in this discussion is synthetic data. Within marketing and market research, synthetic data refers to simulated observations designed to mimic real-world behaviour within clearly defined boundaries. Its appeal is obvious: it’s faster, cheaper, and often easier to obtain than data collected from real people. In situations where data collection is costly, slow, or constrained by privacy and ethical requirements, synthetic data can appear to offer a viable alternative.

This raises an important question for researchers and practitioners: what role should synthetic data play in understanding markets? Is it a useful simulation tool, similar to a flight simulator or a crash-test dummy? Or can it serve as a substitute for observing real behaviour? As synthetic data providers gain greater visibility and access to senior decision-makers, these questions become increasingly difficult to ignore.

To me, the debate is reminiscent of the difference between observing animals in the wild, observing them in a zoo, and observing them through a simulation. Each has its place. Each can teach us something. But they are not the same thing. So, here are my thoughts.

Synthetics: What is it good for?

No, the answer is not ‘absolutely nothing’.

Because of its simulated nature, synthetic data can be useful for predicting the uptake or appeal of new product concepts. For example, how would another variant of ‘Hot and spicy’ potato chips affect brand buyers? How would it attract category buyers? What are the likely effects of the introduction on the sales of other products? What if the brand introduces a mid-sized pack to supplement a single pack and a family pack?

Synthetic panels may also be useful for gathering likely critiques of, or feedback on, a product concept, especially if the panel can provide qualitative feedback that helps the company refine the product formulation.

As you can appreciate, its usefulness remains firmly within the simulation environment. It may well be a useful step for new product introductions, or for simulating what happens if the brand sets up shop in a new geographical area or in a new retail chain. Synthetic data is similar to flight simulators for pilots or crash-test dummies for carmakers. These environments cannot perfectly recreate reality – and their usefulness lies only within set boundaries.

The Sims Don’t Behave Like Real People

Those who have played The Sims would know what I’m talking about. It is dangerous when companies treat synthetic data as an empirical substitute for observed behaviour.

Given the scarcity and cost of panel data, I once tried to create synthetic panel data. The dataset looked plausible because I had asked for variables such as brands, price, and units sold, complete with unique panelist IDs. However, when I subjected the dataset to the same battery of analyses that I routinely use to examine the Laws of Growth, the results were completely wrong.
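To give a sense of what such a check can involve, here is a minimal sketch in Python of one pattern named later in this post, Double Jeopardy: smaller brands tend to have fewer buyers, who also buy the brand slightly less often. This is an illustration only, not the actual battery of analyses I ran. The function name `brand_metrics`, the row layout, and the toy data are all hypothetical.

```python
from collections import defaultdict


def brand_metrics(purchases, n_panelists):
    """Return {brand: (penetration, purchases_per_buyer)}.

    purchases: iterable of (panelist_id, brand, units) rows (assumed layout).
    n_panelists: total panel size, including panelists who bought nothing.
    """
    buyers = defaultdict(set)     # brand -> panelists who bought at least once
    occasions = defaultdict(int)  # brand -> number of purchase occasions
    for panelist_id, brand, units in purchases:
        if units > 0:
            buyers[brand].add(panelist_id)
            occasions[brand] += 1
    return {
        brand: (len(ids) / n_panelists, occasions[brand] / len(ids))
        for brand, ids in buyers.items()
    }


# Hypothetical toy panel of 10 panelists; real panels are far larger.
rows = [
    (1, "A", 1), (1, "A", 2), (2, "A", 1),
    (3, "A", 1), (3, "A", 1), (4, "A", 1),
    (1, "B", 1), (5, "B", 1),
]
metrics = brand_metrics(rows, n_panelists=10)

# List brands from largest to smallest penetration.
for brand, (pen, freq) in sorted(metrics.items(), key=lambda kv: -kv[1][0]):
    print(f"{brand}: penetration={pen:.0%}, purchases per buyer={freq:.2f}")
```

In real panel data, purchase frequency typically declines only gently as penetration falls. A dataset in which the two measures are unrelated, or in which small brands enjoy unusually loyal buyers across the board, deserves a second look.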

No doubt, in corporate settings the creation of synthetic panels is likely to be more sophisticated than my attempt. However, the same real danger remains. Consumers are not wholly rational buyers. We are easily distracted by life’s challenges, and our decision-making can be irrational (like buying a new laptop on a whim and yet spending half an hour scrutinising shampoo bottles). Furthermore, our motive to buy certain brands or categories is affected by whichever Category Entry Points are triggered on that particular occasion. Sometimes you buy chocolate bars for your personal consumption; at other times you buy the fancy selection box because your in-laws are visiting.

This leads to the third caution.

Beware of ‘Beigification’

As an Editor of the International Journal of Market Research, I was asked to present on the role of AI in market research at an event organised by the Market Research Society in London in April this year. Together with the Editor-in-Chief and a fellow Editor, I presented our position and our advice on the usage of AI in market research.

I mentioned the danger of ‘beigification’ in synthetic data.

Synthetic data is built around probabilistic averages. Even if some randomness is introduced into the data, that randomness is itself bounded by preset parameters. As I pointed out in the previous section, this will not sufficiently capture the complexities of buyer behaviour. It will not show brands that actually deviate from the norms: for example, brands that have buyer deficits because they are only available in certain states or provinces, or brands that are purchased less frequently because they have an incomplete range of products.

This was why my attempt at creating synthetic panel data failed: everything was so beige, and so wrong.

Biases, there are Plenty

In such a confusing environment, businesses need to be aware not only of the strengths and weaknesses of synthetic data, but also of the incentives of those participating in the debate. I try to give a balanced perspective here, but even this post is influenced by my own empiricist background and preference for observing real-life behaviour. The Ehrenberg-Bass Institute’s independence allows me to approach the issue as objectively as possible, as we are not affiliated with data collection companies, panel providers, agencies, or firms offering synthetic data solutions.

Businesses should therefore be cautious whenever they encounter certainty. Some experts will confidently declare that synthetic data will revolutionise marketing decision-making. Others will dismiss it as entirely useless. Before accepting either position, it is worth asking a simple question: what incentives might be shaping that viewpoint?

This is not unique to synthetic data. Throughout marketing, technology, and consulting, influential voices often have commercial relationships with the products, services, or solutions they advocate. Equally, incumbents whose businesses depend on traditional approaches may have reasons to resist change. The solution is not cynicism, but curiosity. Listen to the arguments, examine the evidence, and understand the interests involved before making a decision. Do the necessary due diligence.

There will be other points to consider here as well, and the debate will likely continue into the future. However, consider this. If zoos are an acceptable substitute for the wild, why would people like David Attenborough visit far-flung places to observe animals in their natural habitat? Why would anthropologists travel to other countries to immerse themselves within certain ethnic groups rather than only spending time with migrants who have settled in big cities in the US, Europe, or Australia?

The answer lies in the need to observe real behaviour, in all its complexity and randomness.

Jono Hey from Sketchplanations perfectly captures George Box’s iconic statement that “All models are wrong, but some are useful.” George Box was a renowned British statistician who later worked and settled in the United States. Organic panel data and the models derived from it are not 100% accurate in capturing all the wonderful randomness out there in the wild, but they are useful for researchers and practitioners in observing patterns such as Double Jeopardy and Duplication of Purchase. These models have helped many brands and companies navigate the market and grow against their competitors.

Synthetic data has its uses, just as petri dishes, flight simulators, and crash-test dummies have theirs. Eventually, researchers must leave their laboratory and venture into the wild. Reality remains the ultimate test of whether our assumptions hold.
