Jul 19, 2023 / 12:34 PM IST
All Rights Reserved. FT and Financial Times are trademarks of the Financial Times Limited. Not to be redistributed, copied or modified in any way.

Data used to train these systems includes digitised books, news articles, blogs, search queries, Twitter and Reddit posts, YouTube videos and Flickr images, among other content.

Madhumita Murgia in London

Artificial intelligence companies are exploring a new avenue to obtain the massive amounts of data needed to develop powerful generative models: creating the information from scratch. Microsoft, OpenAI and Cohere are among the groups testing the use of so-called “synthetic data”, computer-generated information used to train their AI systems known as large language models (LLMs), as they reach the limits of human-made data that can further improve the cutting-edge technology. The launch of Microsoft-backed OpenAI’s ChatGPT last...