(Tuesday, 13th May 2025)
Title: Mapping Knowledge: Text-Based Methods for Studying Innovation, Science and Technology
Text data from patents, academic papers, and other sources allow researchers to explore key questions, such as how breakthrough innovations emerge (novelty vs. impact), how knowledge recombines across fields, how technological shifts unfold over time, and how spillovers from R&D investments can be tracked. The course presents a set of recent studies that have advanced these approaches, such as Kelly et al. (2021) on measuring innovation over the long run, Arts et al. (2021) on patent novelty, and Carvalho et al. (2021) on modelling exploration versus exploitation. These studies draw on rich text datasets, including patent abstracts and claims, scientific publications, and firm reports, and offer insights into innovation that go beyond what citation-based measures can capture.
The course will offer an overview of the available data sources, many of which are freely available, and explain how to access them. It will also cover key techniques: cleaning and preprocessing text data, applying topic models to detect patterns in knowledge, and using advanced NLP methods such as BERT to measure semantic similarity. The course will conclude with a demonstration in a Jupyter notebook showing how text analysis can efficiently capture innovation trends in patent data.
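As a rough illustration of the text-similarity idea behind measures such as Kelly et al. (2021), the sketch below scores a patent by how similar its text is to later patents relative to earlier ones. It is a simplified sketch under stated assumptions, not the method taught in the course or the paper's exact specification. It swaps in plain TF-IDF weights with a tiny hand-written stopword list, uses cosine similarity, and sums similarities over a single hypothetical `window` of years on each side. The function names (`preprocess`, `tfidf_vectors`, `cosine`, `importance`) and the toy corpus are invented for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real pipelines use fuller lists).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "to", "in", "on", "with", "by", "is", "using"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenised document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def importance(patents, idx, window=5):
    """Simplified forward/backward similarity ratio for patents[idx].

    `patents` is a list of (year, text) pairs; `window` (in years) is a hypothetical
    parameter. Returns inf when the backward similarity is zero (e.g. no earlier
    patents inside the window).
    """
    vectors = tfidf_vectors([preprocess(text) for _, text in patents])
    year = patents[idx][0]
    # Sum of similarities to patents filed within `window` years before / after.
    backward = sum(cosine(vectors[idx], vectors[j])
                   for j, (y, _) in enumerate(patents) if year - window <= y < year)
    forward = sum(cosine(vectors[idx], vectors[j])
                  for j, (y, _) in enumerate(patents) if year < y <= year + window)
    return forward / backward if backward else math.inf


# Toy corpus (invented): the 2002 patent bridges steam engines and electric motors.
patents = [
    (2000, "steam engine boiler"),
    (2001, "steam engine piston"),
    (2002, "electric motor steam"),
    (2003, "electric motor winding"),
    (2004, "electric motor controller"),
]
print(round(importance(patents, 2), 2))  # → 2.18
```

A ratio above 1 means the patent's vocabulary resembles subsequent patents more than earlier ones. Because the IDF weights here are computed over the whole corpus, later documents influence how earlier ones are scored; a backward-looking weighting that only uses earlier documents would avoid this. In a BERT-based variant, the TF-IDF vectors would be replaced by sentence embeddings, while the cosine and ratio steps stay the same.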