Project
Expanding the HORIZON (MSc Thesis)

Overview
I investigated whether Horizon Europe’s climate-related research portfolio is undergoing a rhetorical shift toward “strategic autonomy” framing. I built an LLM-based pipeline over ~53k SEDIA project descriptions, introduced a funding-weighted Strategic Alignment Score computed on ~4k climate-aligned projects, and released sedia-api-fetchers for reproducible access to EU funding data.
Expanding the HORIZON: The Strategic Repurposing of Europe's Research Framework
Utrecht University · MSc Applied Data Science · July 7, 2025 · I wrote this thesis in cooperation with Open Future
I researched whether the EU's Horizon Europe programme, with its €95.5 billion budget, is undergoing a strategic rhetorical shift in its climate-related research portfolio, especially as policy talk moves toward “strategic autonomy” and security. I did not stop at headlines: I audited discourse at the level of individual funded projects, because that is where budget, transparency, and democratic oversight actually meet the text.
I designed a computational framework around large language models, combining semantic similarity (embeddings), transformer-based classification, and zero-shot inference to detect discursive drift from green language toward strategic framing. I introduced a new metric, the Strategic Alignment Score, and tracked it over time weighted by funding volume, so “green-to-strategic” drift is measurable in euros, not just in word counts. I built the corpus from roughly 53,000 project descriptions in the SEDIA database and concentrated the main analysis on 4,049 high-confidence climate-aligned projects, identified from the project objective text alone.
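A minimal sketch of funding-weighted tracking, assuming each project already carries a per-project alignment score in [0, 1] (for example from the classification step) and an EU contribution in euros. The thesis's exact score definition is not reproduced here; the `Project` fields and the `yearly_alignment` helper are hypothetical. The weighted mean, sum(w_i * s_i) / sum(w_i), is one natural reading of "weighted by funding volume".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Project:
    year: int           # hypothetical: e.g. the project start year
    funding_eur: float  # EU contribution in euros, used as the weight
    score: float        # per-project alignment score in [0, 1], assumed precomputed


def yearly_alignment(projects: Iterable[Project]) -> Dict[int, Tuple[float, float]]:
    """Return {year: (unweighted_mean, funding_weighted_mean)} for each year."""
    # Per-year accumulators: [sum of scores, count, sum of funding*score, sum of funding]
    acc = defaultdict(lambda: [0.0, 0, 0.0, 0.0])
    for p in projects:
        a = acc[p.year]
        a[0] += p.score
        a[1] += 1
        a[2] += p.funding_eur * p.score
        a[3] += p.funding_eur
    result = {}
    for year in sorted(acc):
        score_sum, n, weighted_sum, funding_sum = acc[year]
        weighted = weighted_sum / funding_sum if funding_sum > 0 else float("nan")
        result[year] = (score_sum / n, weighted)
    return result


if __name__ == "__main__":
    toy = [
        Project(2021, 5_000_000, 0.5),
        Project(2021, 5_000_000, 0.5),
        Project(2022, 1_000_000, 0.2),
        Project(2022, 9_000_000, 0.8),
    ]
    for year, (unweighted, weighted) in yearly_alignment(toy).items():
        print(year, round(unweighted, 3), round(weighted, 3))
    # prints:
    # 2021 0.5 0.5
    # 2022 0.5 0.74
```

In this toy data the unweighted mean stays flat at 0.5, while the funding-weighted mean rises because the larger grant carries the higher score: text-level scores can look stable even when the money moves.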
Here is what I found (consistent with my abstract and results chapters): I report a statistically significant increase in funding-weighted strategic alignment between Horizon 2020 (2014 to 2020) and Horizon Europe (2021 to 2027). Within Horizon Europe itself, I find no comparable shift, including around the 2022 Russian invasion of Ukraine. Unweighted strategic framing scores stay largely flat over time: the shift shows up in which projects receive funding, not in a uniform change of language across all project texts. I present the framework as a reproducible NLP method for discourse auditing of large funding programmes, with direct stakes for climate policy and transparency.
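The significance test used in the thesis is not specified here. As one illustration (a swapped-in technique, not necessarily the thesis's own), a one-sided permutation test can compare funding-weighted means between two programme periods by shuffling projects, with score and funding kept together, across the period labels. `weighted_mean`, `permutation_test`, and the `(funding, score)` pair layout are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (funding in euros, per-project score)


def weighted_mean(pairs: Sequence[Pair]) -> float:
    """Funding-weighted mean score: sum(w * s) / sum(w)."""
    total = sum(w for w, _ in pairs)
    return sum(w * s for w, s in pairs) / total


def permutation_test(before: Sequence[Pair], after: Sequence[Pair],
                     n_permutations: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """One-sided test that the weighted mean is higher in `after` than in `before`.

    Projects keep their (funding, score) pairing; only the period labels
    are shuffled. Returns (observed difference, p-value).
    """
    rng = random.Random(seed)
    observed = weighted_mean(after) - weighted_mean(before)
    pooled: List[Pair] = list(before) + list(after)
    n_before = len(before)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = weighted_mean(pooled[n_before:]) - weighted_mean(pooled[:n_before])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

Because the statistic is funding-weighted, a handful of very large grants can dominate it, which is worth checking alongside the p-value.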
I developed and released the open-source Python package sedia-api-fetchers so others can reproduce automated retrieval and processing of data from the EU funding APIs. It handles pagination, rate limits, temporal partitioning to get past hard result caps, change detection, and pandas-friendly normalization. I used it to support this thesis and to put EU funding data access on a firmer Open Science footing (FAIR-aligned, as far as the pipeline allows).
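A sketch of three of these retrieval concerns (temporal partitioning, pagination with a simple rate limit, and change detection), using only the standard library and caller-supplied fetch functions. All names here (`partition_by_cap`, `paginate`, `record_hash`, `detect_changes`) and the callables `count_fn` and `fetch_page` are hypothetical stand-ins, not the sedia-api-fetchers API; in practice they would wrap the SEDIA search endpoints.

```python
import hashlib
import json
import time
from datetime import date, timedelta
from typing import Any, Callable, Dict, Iterator, List, Tuple

DateRange = Tuple[date, date]  # inclusive start and end dates


def partition_by_cap(start: date, end: date,
                     count_fn: Callable[[date, date], int],
                     cap: int) -> List[DateRange]:
    """Bisect [start, end] until each sub-range returns at most `cap` hits.

    A single day that still exceeds the cap is returned as-is, because it
    cannot be split further by date alone.
    """
    if start == end or count_fn(start, end) <= cap:
        return [(start, end)]
    mid = start + (end - start) // 2
    return (partition_by_cap(start, mid, count_fn, cap)
            + partition_by_cap(mid + timedelta(days=1), end, count_fn, cap))


def paginate(fetch_page: Callable[[int, int], List[Any]], page_size: int,
             min_interval: float = 0.0) -> Iterator[Any]:
    """Yield items page by page; a short (or empty) page marks the end."""
    page = 1
    while True:
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            return
        page += 1
        time.sleep(min_interval)  # crude client-side rate limit between requests


def record_hash(record: Dict[str, Any]) -> str:
    """Content hash that ignores key order, so reordered JSON is not a change."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(previous: Dict[str, str], records: List[Dict[str, Any]],
                   key: str = "id") -> Dict[str, List[str]]:
    """Diff fresh records against {id: hash} stored from an earlier run."""
    current = {str(r[key]): record_hash(r) for r in records}
    return {
        "added": sorted(k for k in current if k not in previous),
        "removed": sorted(k for k in previous if k not in current),
        "modified": sorted(k for k in current
                           if k in previous and previous[k] != current[k]),
    }


if __name__ == "__main__":
    # Fake corpus with one record per day, standing in for a capped search API.
    days = [date(2021, 1, 1) + timedelta(days=i) for i in range(100)]

    def count(s: date, e: date) -> int:
        return sum(s <= d <= e for d in days)

    for s, e in partition_by_cap(days[0], days[-1], count, cap=30):
        print(s, e, count(s, e))
```

Bisecting the date range adapts partition size to result density: busy periods get split finer while quiet ones stay whole.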
Stack
- Python
- transformers
- embeddings
- NLI (MNLI)
- pandas
- SEDIA / EU APIs
- reproducible research
- LaTeX