
Back to Academia: A PhD at the Intersection of CS and Science of Science

#phd #llms #nlp #science of science

Well… here I am, and it feels like I have come full circle. After graduating from university and working in industry for a while, I have now returned to academia to pursue a PhD. This means moving from immediate product application back to the fundamentals. After spending time in both the corporate structures at Porsche and the fast-paced start-up ecosystem at Cargodaces, I became more and more interested in really digging deep into how we structure and understand knowledge.

I have joined the Hamburg University of Technology (TUHH) to pursue a PhD at the intersection of Computer Science and the Science of Science (SciSci). My research focuses on applying advanced Natural Language Processing (NLP) and Large Language Models (LLMs) to map the diffusion of concepts across the global scientific landscape.

The Research Domain: Science of Science

Science of Science is an interdisciplinary field that uses quantitative methods to understand how scientific research evolves. Historically, the field relied on bibliometric metadata, primarily citation counts and co-authorship networks, to evaluate impact and trends. However, metadata is a low-fidelity signal: it captures the connections between documents but ignores the content within them. With the exponential growth of scientific literature (now estimated at over 100 million scholarly documents), we face a data scale that demands automated, semantic analysis. This is (at least in my opinion) where the real value of Science of Science lies. But it also brings plenty of technical challenges along the way: data ingestion, text processing, and semantic analysis at huge scale.
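To illustrate why content beats metadata: two papers that never cite each other can still be about the same concept, and even a toy text-similarity measure can surface that link. The sketch below uses a stdlib-only bag-of-words cosine similarity; the real pipeline would of course use embedding models rather than word counts, and the function name and snippets are my own illustration, not part of any actual tooling.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of bag-of-words vectors.

    A toy stand-in for the embedding-based similarity an actual
    semantic-analysis pipeline would compute.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0

# Two hypothetical abstract snippets about the same concept:
# high similarity despite no citation link between the documents.
score = cosine_similarity(
    "large language models for scientific text analysis",
    "analysis of scientific text with large language models",
)
```

In practice word counts are far too crude (no synonyms, no word order), which is exactly why the research leans on modern NLP and LLM embeddings instead.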

Technical Challenges

As mentioned before, I see a lot of technical challenges ahead of me - mainly handling the entire body of scientific knowledge, whether in databases, graphs, or some other format. At some point I will almost certainly run into the limits of my current setup and will have to find a way to scale it up. On the plus side, scaling IT infrastructure is something I have done a lot of in the past, and I am confident I can handle it.

In general, I am very pleased with what I was given at TUHH. A nice, state-of-the-art MacBook Pro M1 with 16GB RAM and a 1TB SSD is a big plus. It is a starting point, but to be honest, my plan is not to run big computations on this laptop. It is more of a central hub from which I will orchestrate a much bigger setup. I envision a two-node setup consisting of the TUHH HPC cluster for heavy workloads and a local high-memory workstation with a medium-sized GPU for prototyping. I will also need a way to use LLMs for inference, either self-hosted or via API.

Data storage is another big point. The prototyping workstation should come with considerable storage, both fast (NVMe/SSD) and slow (HDD). I will need to keep a copy of the TUHH HPC cluster's data on the workstation, and also have a backup of the data on the HPC cluster. Maybe I will also find an old PC to use as a Linux backup server.

So where to go from here?

Well, this is just the start of a long journey, and I am sure there will be plenty of ups and downs. But I am excited to see where it leads, and I am sure it will be a lot of fun along the way. I am also looking forward to working with great people and contributing to the field of Science of Science. Meeting fellow researchers and collaborators at conferences and workshops is always a highlight of my academic career.
