Apple researchers have unveiled a novel method to train a large language model (LLM) to produce high-quality SwiftUI code — and it essentially taught itself through an automated feedback loop.
The project, detailed in the study UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback, addresses a longstanding problem in AI coding: while LLMs have become adept at general programming and creative writing, they often fail to generate syntactically correct, well-structured user interface code. The reason, according to the researchers, is that examples of UI code are scarce in most training datasets, sometimes making up less than one percent of the data.
Starting from scratch with minimal SwiftUI exposure

The team began with StarChat-Beta, an open-source coding-focused LLM, and provided it with a list of UI descriptions. From there, the model generated a large synthetic dataset of SwiftUI programs based on those descriptions. Each generated program was then run through the Swift compiler, and only those that compiled without errors were kept.
The compiled interfaces were then analysed by GPT-4V, a vision-language model that compared them to the original descriptions. The output was refined over multiple iterations — with each improved version of the model producing cleaner and more accurate SwiftUI code than the last.
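The loop described above — generate candidates, keep only those that compile, then keep only those the vision model judges faithful to the description — can be sketched as a simple filter. This is an illustrative outline only, not the authors' actual code: the function names (`generate`, `compiles`, `matches`) are hypothetical stand-ins for calls to the LLM, the Swift compiler, and a vision-language judge such as GPT-4V.

```python
from typing import Callable, List, Tuple

def filter_round(
    descriptions: List[str],
    generate: Callable[[str], List[str]],   # LLM: UI description -> candidate SwiftUI programs
    compiles: Callable[[str], bool],        # Swift compiler check: does the program build?
    matches: Callable[[str, str], bool],    # VLM judge: does the rendered UI match the description?
) -> List[Tuple[str, str]]:
    """One round of the generate-and-filter loop: keep only
    (description, program) pairs that both compile and pass
    the vision-model comparison."""
    kept = []
    for desc in descriptions:
        for prog in generate(desc):
            if compiles(prog) and matches(desc, prog):
                kept.append((desc, prog))
    return kept
```

In the study's setup, each round's surviving pairs become fine-tuning data for the next version of the model, so the dataset and the model improve together over successive iterations.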
After five training rounds, the researchers had created nearly one million SwiftUI programs (996,000 to be exact) and a new model, UICoder, that consistently produced interfaces matching the given prompts far more closely than the original StarChat-Beta.
Testing showed that UICoder significantly outperformed the base StarChat-Beta on both automated metrics and human evaluations. It came close to GPT-4 in overall quality and even surpassed it in compilation success rates — a crucial measure for any coding model.
One surprising aspect of the study was the lack of SwiftUI examples in StarChat-Beta’s original training data. The model had been trained on three main datasets: TheStack (a massive collection of permissively licensed code), crawled web pages, and the OpenAssistant-Guanaco instruction-tuning dataset. Due to an oversight, Swift repositories were excluded from TheStack, and OpenAssistant-Guanaco contained just one Swift-related example in 10,000 responses. This meant that almost all SwiftUI knowledge in UICoder came from the self-generated, high-quality dataset built during Apple’s experiment — not from pre-existing examples.
The researchers believe their approach could be adapted for other languages and UI frameworks, potentially improving code generation across a wide range of platforms.
The full study is available on arXiv under the title UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback.