
Is MR ready for synthetic data?

Posted 1st April 2025 | Category: MarketResearch

Researchers have used mock data for as long as I can remember to expedite the research process and prep the tables and charts before any real data comes back from the field. Though the mock data never made complete sense, it served a purpose. With the advent of AI in all its guises, this next-gen mock data has become very clever, mimicking the behaviours and attitudes of real people and generating vast amounts of data in little to no time. But is it going to replace old-school data collection and talking to actual people?

The short answer is no, at least not yet!

The potential benefits of synthetic data in market research are varied and widespread. At the ground level it could save time and cost compared with equivalent real-world data collection, and it could offer scalability, including boosting hard-to-reach samples or creating additional data where samples are naturally small. Because synthetic data doesn't contain any real personal information, compliance with strict data privacy regulations like GDPR becomes less onerous.

Synthetic data has the potential to limit or even eliminate the bias inherent in real-world data, because people are people after all and don't always know what they want (and certainly don't always respond to requests to complete a survey!). Slightly ironically, however, there is a significant risk that synthetic data could introduce its own biases, for example when mismatched source data or insufficient or low-quality real-world data lead to poorly trained algorithms.

For most SME research agencies like Phoenix MRC Ltd, the resources needed to train, test and validate synthetic data systems currently far outweigh the benefits. This is especially true when existing datasets are relatively small and often highly confidential. These challenges are compounded by concerns over using personal data to train synthetic data systems and over whether synthetic data can actually capture all the nuances of real-world behaviour. That is a real limitation when detailed and context-specific insights are required, as is often the case.

Until synthetic data software providers can truly overcome the issues around bias and ensure watertight data protection compliance, the best-case scenario is a small-scale study with real participants, which could then be supplemented with synthetic data to increase the data's validity.

While the ability of AI (and our interface with it) is growing, perhaps exponentially, it has a long way to go before it can replace traditional fieldwork entirely.
