Traditional travel surveys are costly and time-consuming, and they face declining response rates, which motivates the exploration of synthetic data generation methods. In this research, we propose a novel Persona-Driven Method (PDM) for generating synthetic mobility survey data via large language models (LLMs). The method defines representative personas, each characterized by specific sociodemographic attributes, and prompts an LLM to emulate survey respondents with these personas. A guided prompting strategy is introduced to calibrate the synthetic data distributions so that they closely match real-world population statistics. We evaluate the approach on the German MiD 2017 (Mobilität in Deutschland 2017) dataset. The quality of the LLM-PDM-generated synthetic data is assessed against ground truth data via a comprehensive set of metrics, including the mean absolute error (MAE), root mean square error (RMSE), Jensen-Shannon distance (JSD), entropy, conditional entropy and the Earth Mover's distance (EMD). The empirical results demonstrate that the LLM-PDM approach produces high-fidelity synthetic populations that preserve key distributions and relationships present in real data. Across the case studies, the LLM-PDM method achieves low distributional errors (e.g., MAE < 3%) and captures important joint patterns, significantly outperforming a number of LLM baselines.
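To make the evaluation metrics concrete, the following is a minimal sketch of how several of them could be computed for a pair of discrete distributions. It is not the authors' implementation: the function names, the base-2 logarithm, the restriction of EMD to ordered one-dimensional bins with unit spacing, and the toy share vectors are all assumptions made for illustration, and the numbers are not taken from MiD 2017.

```python
import math

def mae(p, q):
    """Mean absolute error between two share vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)

def rmse(p, q):
    """Root mean square error between two share vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))

def entropy(p):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    return -sum(a * math.log2(a) for a in p if a > 0)

def _kl(p, q):
    """Kullback-Leibler divergence in bits (q must be > 0 wherever p > 0)."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return math.sqrt(0.5 * _kl(p, m) + 0.5 * _kl(q, m))

def emd_1d(p, q):
    """Earth Mover's distance for ordered bins with unit spacing (assumption)."""
    cumulative, total = 0.0, 0.0
    for a, b in zip(p, q):
        cumulative += a - b      # mass that must still be moved to the right
        total += abs(cumulative)
    return total

# Hypothetical shares over four ordered trip-distance bins (toy values).
real = [0.43, 0.22, 0.10, 0.25]
synthetic = [0.40, 0.25, 0.10, 0.25]

print("MAE :", mae(real, synthetic))
print("RMSE:", rmse(real, synthetic))
print("JSD :", js_distance(real, synthetic))
print("EMD :", emd_1d(real, synthetic))
print("Entropy (real, synthetic):", entropy(real), entropy(synthetic))
```

With shares expressed as fractions, an MAE of 0.03 corresponds to the 3% level mentioned above. Conditional entropy is omitted here because it requires a joint distribution over two variables rather than a single marginal.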

This is an open access article under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0, http://creativecommons.org/licenses/by/4.0/).