The Sound of the City: Object Recognition of Sound Sources on Historical Photographs

Vastert Cuykens

Master of Heritage Studies (Master Erfgoedstudies)
2024-2025

Research

Supervisors
Hasan Baran Firat
Piraye Hacigüzeller

This thesis explores how machine learning can be used to detect sound sources in historical photographs, with the aim of developing a model that can support historical soundscape reconstruction. While machine learning, and object detection in particular, is increasingly used in heritage studies, its application to historical photographs remains largely unexplored.

The research is divided into four stages. First, three object detection models (YOLOv8, Faster R-CNN, and RetinaNet) were benchmarked on both modern and historical datasets to assess their baseline performance without any additional training. Although all models performed significantly worse on historical photographs, YOLOv8 showed the smallest drop in accuracy. In the second and third stages, YOLOv8 was fine-tuned on two classes of sound-producing objects: an existing COCO class, “train”, and a newly added class, “carriage”.
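
As an illustration of what such a baseline benchmark measures, the sketch below scores predicted bounding boxes against annotated ones using intersection over union (IoU). The abstract does not state which metrics or matching rules the thesis used, so the 0.5 threshold, the greedy matching, and all names and coordinates here are assumptions for illustration only.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Box], truths: List[Box], thresh: float = 0.5
) -> Tuple[float, float]:
    """Greedily match each prediction to at most one unmatched annotation."""
    unmatched = list(truths)
    true_positives = 0
    for p in preds:
        # Pick the remaining annotation that overlaps this prediction most.
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thresh:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


# Hypothetical photograph: one annotated "train", two predictions
# (one close match and one false alarm elsewhere in the image).
truth = [(10.0, 10.0, 110.0, 60.0)]
pred = [(12.0, 8.0, 108.0, 62.0), (200.0, 200.0, 240.0, 230.0)]
print(precision_recall(pred, truth))
```

Running the same scoring on a modern and a historical test set, with the same pretrained model, would give one way to quantify the drop in accuracy described above.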

In total, over 1,600 photographs were annotated for the various datasets. While the modest dataset size and limited computational resources constrained the performance of the custom models, the research nevertheless offers insights into the potential and limitations of applying object detection to historical photographs. In the fourth and final stage, a custom CycleGAN model was trained to transform modern images into a historical style and vice versa, helping to generate training data and reduce the need for manual annotation.
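
The idea behind CycleGAN's two-way translation can be sketched with its cycle-consistency loss: an image translated to the other style and back should match the original. The snippet below is a toy illustration only. The two "generators" are simple hypothetical contrast functions standing in for neural networks, and nothing here reflects the thesis's actual architecture or training setup.

```python
from typing import Callable, List

# A flattened grayscale image with pixel values in [0, 1].
Image = List[float]
Generator = Callable[[Image], Image]


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two images of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_loss(x: Image, y: Image, g: Generator, f: Generator) -> float:
    """L1 cycle-consistency term: x -> g -> f should return to x, and y -> f -> g to y."""
    return l1(f(g(x)), x) + l1(g(f(y)), y)


# Hypothetical stand-ins for the two generators (not neural networks).
def to_historical(img: Image) -> Image:
    """Compress contrast, loosely mimicking a faded print."""
    return [0.5 * v + 0.25 for v in img]


def to_modern(img: Image) -> Image:
    """Exact inverse of to_historical."""
    return [(v - 0.25) / 0.5 for v in img]


modern = [0.0, 0.5, 1.0]
historical = [0.25, 0.5, 0.75]

# A perfectly invertible pair gives zero cycle loss.
print(cycle_loss(modern, historical, to_historical, to_modern))

# If the reverse generator does nothing, the loss becomes positive.
print(cycle_loss(modern, historical, to_historical, lambda img: list(img)))
```

In a real CycleGAN this term is combined with adversarial losses, which push translated images to look like genuine examples of the target style.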

The thesis demonstrates the feasibility of using object detection on historical photographs for historical soundscape research and highlights promising directions for future work on machine learning and heritage studies.

Contact

Vastert Cuykens
vastert95@gmail.com 
