The Impact of LLM Design on the Structural and Semantic Consistency of Generated Outputs

Master Theses Digi 2026

Conceptual illustration of consistency stabilization and drift in LLM architectures. Generated using OpenAI image generation tools.


Topic

This Master’s thesis investigates how variations in Large Language Model (LLM) design influence the structural and semantic consistency of generated outputs. The study compares a Dense architecture (Gemma 4 31B IT) with a Mixture-of-Experts (MoE) architecture (Gemma 4 26B A4B IT) to examine whether architectural differences affect how consistently LLMs organize information, preserve meaning, and produce stable outputs under repeated prompting. The research further explores how prompt structure and attentional guidance influence the consistency of AI-generated responses.

Relevance

As LLMs become increasingly integrated into decision-support systems, communication tools, and information-processing environments, consistency becomes critical for reliability and user trust. Inconsistent outputs across similar prompts may reduce confidence in AI-generated information, particularly in high-stakes fields such as healthcare, law, education, and business. While existing research has focused heavily on hallucinations and prompt engineering, limited attention has been given to how architectural design itself influences output consistency. Understanding these consistency mechanisms is important for practitioners seeking to implement trustworthy and predictable AI systems in professional environments.

Results

Both Dense and MoE architectures maintained high levels of structural and semantic consistency when explicit prompting instructions and formatting constraints were provided. Under these controlled prompting conditions, architectural differences became less visible because structured prompting acted as a stabilization mechanism. Variation was more noticeable during repeated unconstrained generations, particularly in section organization, heading formulation, and semantic framing. Attentional prompting reduced structural and semantic drift in both architectures without requiring explicit formatting instructions. The study concludes that consistency is influenced not only by architecture, but also by prompt structure, attentional guidance, and generation context.

Implications for Practitioners

  • Structured prompting and explicit instructions can substantially improve consistency in AI-generated outputs.
  • Prompt design itself may function as a stabilization mechanism that reduces unpredictable variation.
  • Researchers evaluating LLM consistency should consider not only model architecture, but also prompting strategies and interaction design.
  • Repeated unconstrained generations may introduce section-level drift and semantic variation that can influence reliability in professional environments.
  • Attentional guidance techniques may help improve output stability without requiring rigid formatting constraints.

Methods

The thesis applied an iterative quasi-experimental research design consisting of three interconnected phases. Two open-weight models from the same Gemma family were compared: the Dense model Gemma 4 31B IT and the Mixture-of-Experts model Gemma 4 26B A4B IT. The first phase examined structural and semantic consistency using systematic prompt variations across 25 prompts per model. The second phase explored recurring variation patterns through repeated unconstrained generations using a custom Replit-based prompting application. The third phase investigated whether attentional prompting improved consistency across repeated outputs. Generated outputs were assessed through human evaluation, rubric-based scoring, GPT-5 LLM-as-a-judge evaluation, and Gemini 2.5-supported pattern analysis.
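
As an illustration only, the idea of structural consistency across repeated generations can be approximated with a simple heading-overlap score. The sketch below is not the scoring procedure used in the thesis, which relied on human, rubric-based, and LLM-supported evaluation. It assumes that outputs use markdown-style headings, and the function names (extract_headings, jaccard, structural_consistency) are hypothetical.

```python
import re
from itertools import combinations


def extract_headings(text: str) -> list[str]:
    """Return lowercased markdown-style headings ('# Title') in order of appearance.

    Assumption: generated outputs mark sections with '#' headings.
    """
    headings = []
    for line in text.splitlines():
        match = re.match(r"^\s*#{1,6}\s+(.*\S)\s*$", line)
        if match:
            headings.append(match.group(1).lower())
    return headings


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def structural_consistency(outputs: list[str]) -> float:
    """Mean pairwise heading overlap across repeated generations (1.0 = identical)."""
    heading_sets = [set(extract_headings(o)) for o in outputs]
    pairs = list(combinations(heading_sets, 2))
    if not pairs:
        return 1.0  # fewer than two outputs: nothing to compare
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical repeated generations for one prompt.
    runs = [
        "# Intro\ntext\n## Risks\nmore text",
        "# Intro\nother text\n## Risks",
        "# Overview\n## Risks",
    ]
    print(round(structural_consistency(runs), 3))
```

A score near 1.0 indicates that repeated runs reuse the same section headings, while lower values point to the kind of section-level drift described for unconstrained generations. The score ignores heading order and wording variants, so it is at best a coarse complement to human or rubric-based judgment.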