Master's Theses BA 2026

Artificial Intelligence in Portfolio Management: An Empirical Comparison of Tactical and Strategic Asset Allocation in a CHF-Denominated Global ETF Portfolio

Yannic Rieder

07 Jun 2026

AI-generated visualisation of artificial intelligence, global financial markets, and Swiss-franc-based portfolio allocation.

Topic

This thesis investigates whether weekly tactical asset allocation recommendations elicited from a large language model can improve the risk-adjusted performance of a Swiss-franc-denominated global equity portfolio. A CHF 10 million portfolio of six regional iShares Core ETFs — Switzerland, the Eurozone, the United States, Pacific ex-Japan, Japan, and Emerging Markets — was rebalanced weekly based on zero-shot ChatGPT recommendations over an eight-week live window in early 2026 and compared with an equally weighted 1/N buy-and-hold benchmark. All returns, risk measures, and attribution results are evaluated from a Swiss-franc investor perspective.

Relevance

Large language models can now produce fully reasoned allocation recommendations within seconds, and financial practitioners are increasingly tempted to use them as investment decision aids. Yet whether this verbal fluency translates into portfolio value remains largely untested, especially for Swiss-franc-based investors exposed to both global equity markets and currency movements. This thesis provides a structured CHF-denominated evaluation of LLM-elicited tactical asset allocation under a realistic wealth-management mandate. It also examines how the model forms its recommendations, which is essential for responsible and controlled adoption in portfolio management practice.

Results

The model did not outperform the strategic benchmark. The LLM-based tactical allocation returned −1.60% in Swiss-franc terms, compared with −0.35% for the equally weighted benchmark, an underperformance of 1.25 percentage points. Its realised Sharpe ratio was −0.50 versus −0.01 for the benchmark, and its marginally lower volatility (17.53% versus 18.44%) did not compensate for the weaker return. The shortfall stemmed mainly from allocation decisions, particularly a persistent underweight in Europe, rather than from currency or cash effects. The model’s weekly weights were systematically aligned with the backward-looking performance figures supplied in the prompt, consistent with an anchoring-like output pattern.
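For readers who want to see how headline figures of this kind are typically derived from a daily return series, the following is a minimal Python sketch. It is not the thesis's own code: the 252-day annualisation, the zero risk-free rate, and the toy return series are assumptions made for illustration, so the sketch is not expected to reproduce the reported Sharpe ratios. Only the two period returns are taken from the text above.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: annualisation convention, not stated in the source


def cumulative_return(daily_returns):
    """Compound simple daily returns into a single period return."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualised_volatility(daily_returns):
    """Sample standard deviation of daily returns, scaled by sqrt(252)."""
    return statistics.stdev(daily_returns) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(daily_returns, daily_risk_free=0.0):
    """Annualised mean excess return divided by annualised volatility.

    The default risk-free rate of zero is an assumption of this sketch.
    """
    excess = [r - daily_risk_free for r in daily_returns]
    mean_annual = statistics.mean(excess) * TRADING_DAYS
    return mean_annual / annualised_volatility(excess)


# Period returns reported in the text, in percent.
tactical_pct, benchmark_pct = -1.60, -0.35
active_pp = round(tactical_pct - benchmark_pct, 2)  # active return in percentage points

# Made-up daily CHF returns, purely illustrative (not thesis data).
toy_returns = [0.004, -0.012, 0.007, -0.003, 0.001]

if __name__ == "__main__":
    print("active return (pp):", active_pp)
    print("toy cumulative return:", cumulative_return(toy_returns))
    print("toy annualised volatility:", annualised_volatility(toy_returns))
    print("toy Sharpe ratio:", sharpe_ratio(toy_returns))
```

With only 38 daily observations, any annualised figure computed this way carries wide sampling error, which is worth keeping in mind when comparing the two Sharpe ratios.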

Implications for Practitioners

Large language models can help structure investment reasoning and synthesise complex market information, but they should not be used as standalone portfolio allocators.
LLM-generated allocation recommendations require independent quantitative validation before implementation.
Prompt design matters: when supplied mainly with backward-looking performance figures, the model may strongly reflect these inputs in its allocation output.
Regime-break controls are essential, because reliance on backward-looking signals can be particularly problematic during market regime transitions.
Mandate-compliance checks should be enforced outside the model, not left to the model itself.
Verbal plausibility is not the same as allocation quality; coherent investment rationales should still be treated with disciplined scepticism.
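
The point about enforcing mandate compliance outside the model can be illustrated with a short Python sketch. It validates a set of proposed weights before anything is traded. The limits (`MAX_REGION_WEIGHT`, `MAX_CASH_WEIGHT`) and the function name `check_mandate` are hypothetical, since the thesis summary does not state the mandate's actual constraints. The sleeve names follow the six regions and the CHF money-market sleeve described in this post.

```python
REGIONS = [
    "Switzerland", "Eurozone", "United States",
    "Pacific ex-Japan", "Japan", "Emerging Markets",
]
CASH = "CHF money market"

# Hypothetical limits for illustration; the real mandate limits are not given.
MAX_REGION_WEIGHT = 0.40
MAX_CASH_WEIGHT = 0.20
TOLERANCE = 1e-6


def check_mandate(weights):
    """Return a list of violations; an empty list means the weights pass."""
    violations = []
    allowed = set(REGIONS) | {CASH}

    for name, weight in weights.items():
        if name not in allowed:
            violations.append(f"unknown sleeve: {name}")
        if weight < 0:
            violations.append(f"negative weight: {name}")

    if abs(sum(weights.values()) - 1.0) > TOLERANCE:
        violations.append("weights do not sum to 100%")

    for region in REGIONS:
        if weights.get(region, 0.0) > MAX_REGION_WEIGHT + TOLERANCE:
            violations.append(f"{region} exceeds regional cap")

    if weights.get(CASH, 0.0) > MAX_CASH_WEIGHT + TOLERANCE:
        violations.append("cash sleeve exceeds cap")

    return violations


# Made-up model output: weights sum to 100%, but one region breaches the cap.
proposed = {
    "Switzerland": 0.10, "Eurozone": 0.05, "United States": 0.45,
    "Pacific ex-Japan": 0.10, "Japan": 0.10, "Emerging Markets": 0.10,
    CASH: 0.10,
}

if __name__ == "__main__":
    for problem in check_mandate(proposed):
        print("REJECTED:", problem)
```

Because the check runs on the parsed weights rather than on the model's prose, a persuasive rationale cannot talk its way past a breached limit.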

Methods

The thesis applies an empirical case-study design based on a CHF 10 million global equity ETF portfolio. Six regional iShares Core ETFs form the investment universe, while an optional CHF money-market sleeve is available only to the tactical strategy. Each week, a zero-shot prompt supplied trailing performance data and market-context information to ChatGPT, which generated regional allocation weights. The tactical portfolio was rebalanced weekly and compared with a 1/N buy-and-hold benchmark over the period from 27 February to 24 April 2026, based on 38 daily CHF return observations. The analysis combines performance and risk metrics with Brinson allocation attribution, Karnosky-Singer currency decomposition, counterfactual cash analysis, a naive-momentum comparator, transaction-cost sensitivity, regime triangulation, and a Spearman rank-correlation test for the Sharpe-anchoring hypothesis.
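
Two of the tools named above, the Brinson allocation effect and the Spearman rank correlation, are compact enough to sketch in a few lines of Python. The sketch makes several assumptions: it uses the Brinson-Fachler form of the allocation effect (the summary does not say which Brinson variant the thesis applies), it treats the backward-looking figures as trailing Sharpe ratios, and all numbers are invented for illustration. The function names are likewise hypothetical.

```python
import statistics


def brinson_fachler_allocation(w_port, w_bench, r_bench):
    """Allocation effect per region: (w_p - w_b) * (r_b_region - r_b_total).

    Assumption: Brinson-Fachler variant; the source only says "Brinson".
    """
    total = sum(w_bench[k] * r_bench[k] for k in w_bench)
    return {k: (w_port[k] - w_bench[k]) * (r_bench[k] - total) for k in w_bench}


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented one-week example for six regions (not thesis data).
trailing_sharpe = [0.8, -0.2, 1.1, 0.3, 0.5, -0.6]
model_weights = [0.20, 0.08, 0.30, 0.12, 0.18, 0.05]

if __name__ == "__main__":
    print("Spearman rho:", spearman_rho(trailing_sharpe, model_weights))
    effects = brinson_fachler_allocation(
        {"Europe": 0.10, "Other": 0.90},   # made-up tactical weights
        {"Europe": 0.30, "Other": 0.70},   # made-up benchmark weights
        {"Europe": 0.02, "Other": -0.01},  # made-up regional returns
    )
    print("allocation effects:", effects)
```

A rank correlation close to +1 between the supplied figures and the resulting weights would be consistent with the anchoring-like pattern described in the results, although with six regions per week each individual estimate is noisy.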