Simulating Consumer Decisions via Persona-based LLM Agents (Master's Thesis) | Dongjae Kim

Master’s Thesis: This research represents the core foundation of my work as a Market Simulation Architect, presented for my M.S. in Data Engineering at Pukyong National University.

Figure 1. Framework

Full Title: Consumer Decision-Making Simulation Using Persona-Based Large Language Model Agents

How can we predict consumer responses to a product or service that does not yet exist in the market? To model the dynamics of emerging markets, we need a “synthetic laboratory” that closely mirrors human decision-making.

📌 The Problem: The Limits of Traditional Consumer Research

Traditionally, predicting market demand has relied on stated-preference (SP) surveys, analyzed with Discrete Choice Models (DCMs) or fed into Agent-Based Models (ABMs). However, these methods face fundamental limitations:

Static and Costly: Conducting large-scale, repeated surveys for every new market scenario (such as the introduction of Mobility as a Service, MaaS) is prohibitively expensive and induces respondent fatigue.
Lack of Cognitive Depth: While DCMs calculate choice probabilities based on utility maximization, they do not capture the complex, contextual cognitive process behind a decision, that is, the “how” and “why” of a consumer’s choice. Traditional ABMs, meanwhile, rely on rigid, manually coded heuristics that lack psychological expressiveness.
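For context on the DCM baseline described above, the textbook multinomial logit model converts each alternative's systematic utility into a choice probability via a softmax. The sketch below assumes a linear utility and two hypothetical alternatives (`reduce_car`, `keep_car`) with made-up coefficients. It illustrates the general technique and is not the model estimated in the thesis.

```python
import math


def systematic_utility(beta: dict[str, float], x: dict[str, float]) -> float:
    """Linear-in-parameters utility: V = sum_k beta_k * x_k."""
    return sum(beta[k] * x[k] for k in beta)


def logit_choice_probabilities(utilities: dict[str, float]) -> dict[str, float]:
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    # Shift by the max utility for numerical stability; probabilities are unchanged.
    v_max = max(utilities.values())
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


# Hypothetical coefficients and attributes, for illustration only.
beta = {"extra_cost": -0.8, "travel_time": -0.05}
alternatives = {
    "reduce_car": {"extra_cost": 3.0, "travel_time": 40.0},
    "keep_car": {"extra_cost": 0.0, "travel_time": 30.0},
}
utilities = {alt: systematic_utility(beta, x) for alt, x in alternatives.items()}
probs = logit_choice_probabilities(utilities)
print(probs)
```

The output is a probability for each alternative and nothing more. The model gives no account of the reasoning behind a choice, which is the limitation noted above.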

⚙️ The Method: Persona-Based LLM Agent Framework

To build a dynamic simulation environment, I designed a framework that uses Large Language Models (LLMs) as synthetic human agents (simulacra) grounded in 646 real-world survey responses.

Structured Reasoning Templates: Rather than relying on generic prompts, I engineered specific zero-shot reasoning structures mimicking human psychology: Deliberative Reasoning (DR, top-down value assessment), Alternative-Based Reasoning (AR, bottom-up feasibility check), and a Hybrid Reasoning (HR) approach that integrates both.
Multidimensional Persona Injection: Agents were injected with diverse combinations of real consumer data. The study systematically tested how different subsets of information (demographics, lifestyle & attitudes, built environment, past experiences) influence the simulation’s fidelity.
Informational Interaction via Expert Agent: I introduced a multi-agent interaction in which a ‘Transport Analyst Agent’ analyzed a consumer’s profile and fed its expert opinion back into that consumer agent’s persona, enriching the context beyond the raw survey fields.
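One way to connect these three components is sketched below. Everything in the sketch is an assumption made for illustration:

- The block names in `PERSONA_BLOCKS` are hypothetical.
- The wording in `REASONING_TEMPLATES` is hypothetical and does not come from the thesis prompts.
- The helpers `build_persona`, `build_prompt`, `simulate`, and `all_block_subsets` are hypothetical.
- `llm` stands for any function that maps a prompt string to a completion string.

In the sketch, the analyst agent runs first and its note is appended to the consumer agent's persona before the final decision prompt.

```python
from itertools import combinations
from typing import Callable, Optional

# Hypothetical names for the four information blocks compared in the study.
PERSONA_BLOCKS = ("demographics", "lifestyle_attitudes", "built_environment", "past_experience")

# Hypothetical instruction wording for each reasoning structure (not the thesis prompts).
REASONING_TEMPLATES = {
    "DR": "Start from your values and goals, then judge whether the service fits them.",
    "AR": "Start from your current travel alternatives, then check whether the service is feasible for your trips.",
    "HR": "Assess your values top-down, check feasibility against your alternatives bottom-up, then reconcile both.",
}


def build_persona(profile: dict[str, str], blocks: tuple[str, ...],
                  expert_opinion: Optional[str] = None) -> str:
    """Render only the selected information blocks, plus an optional analyst note."""
    lines = [f"[{b}] {profile[b]}" for b in blocks if b in profile]
    if expert_opinion:
        lines.append(f"[analyst_note] {expert_opinion}")
    return "\n".join(lines)


def build_prompt(persona: str, scenario: str, template: str) -> str:
    """Combine persona, scenario, and one reasoning template into a zero-shot prompt."""
    return (
        f"You are the following person:\n{persona}\n\n"
        f"Scenario: {scenario}\n"
        f"Reasoning instructions: {REASONING_TEMPLATES[template]}\n"
        "Finally, answer YES or NO: would you reduce your private car use?"
    )


def simulate(profile: dict[str, str], scenario: str, template: str,
             blocks: tuple[str, ...], llm: Callable[[str], str],
             use_expert: bool = False) -> str:
    """Two-stage run: optional analyst agent first, then the consumer agent decides."""
    expert_opinion = None
    if use_expert:
        # Stage 1 (assumption): the analyst sees the same blocks as the consumer agent.
        expert_opinion = llm(
            "As a transport analyst, summarize this traveler's likely stance on MaaS:\n"
            + build_persona(profile, blocks)
        )
    # Stage 2: the consumer agent reasons with the (possibly enriched) persona.
    persona = build_persona(profile, blocks, expert_opinion)
    return llm(build_prompt(persona, scenario, template))


def all_block_subsets() -> list[tuple[str, ...]]:
    """Every non-empty combination of the four blocks (15 in total)."""
    return [combo for r in range(1, len(PERSONA_BLOCKS) + 1)
            for combo in combinations(PERSONA_BLOCKS, r)]
```

Passing `llm` as a plain callable keeps the sketch independent of any particular provider SDK. `all_block_subsets()` enumerates every non-empty combination of the four blocks. This is one possible way to run the kind of information-subset comparison described above.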

🚀 The Impact: A High-Fidelity Synthetic Market Laboratory

Evaluated with metrics that are robust to class imbalance (the Matthews Correlation Coefficient, MCC, and Balanced Accuracy), this thesis demonstrates that LLM agents can simulate complex behavioral shifts, specifically the intention to reduce private car usage in a MaaS ecosystem.
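Both metrics can be computed directly from the confusion matrix. The sketch below assumes a binary target (1 = intends to reduce car use, 0 = does not), and the function names are illustrative. In practice, `matthews_corrcoef` and `balanced_accuracy_score` in scikit-learn's `sklearn.metrics` compute the same quantities.

```python
import math


def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Mean of sensitivity (TPR) and specificity (TNR); assumes both classes occur in y_true."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def mcc(y_true: list[int], y_pred: list[int]) -> float:
    """Matthews Correlation Coefficient in [-1, 1]; 0 corresponds to chance-level agreement."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Consider a model that always predicts the majority class on the labels `[0]*8 + [1]*2`. Its plain accuracy is 0.8, yet its balanced accuracy is 0.5 and its MCC is 0. This gap is why these metrics suit an imbalanced target.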

The Power of Lifestyle over Demographics: The simulation revealed that injecting a persona’s lifestyle and intrinsic attitudes vastly outperforms surface-level socio-demographic data in replicating real human decisions.
Solving the Minority Prediction Gap: For hard-to-predict minority groups (e.g., ‘light car users’), simple data injection fell short. However, the informational interaction with the Expert Agent successfully bridged this gap, showing that synthesized insights can substitute for missing detailed data.
The Ultimate Decision Engine: This architecture offers a rapid, cost-effective synthetic laboratory for testing product-market fit, showing that AI agents constrained by structured reasoning can help anticipate how emerging markets will respond before a product launches.