A web-based search interface for the Journal Digital Corpus, a collection of transcripts from Swedish historical newsreels (SF Veckorevy).
- Full-text search across ~6,800 transcript files
- Fuzzy search for finding matches despite OCR/ASR errors
- Filter by transcript type (speech/intertitle), collection, and year
- Side-by-side viewer showing speech and intertitle transcripts with timestamps
- Shareable URLs for bookmarking searches and specific videos
- Client-side only: loads the corpus directly from Zenodo, no backend required
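
As an illustration of how fuzzy search can tolerate OCR/ASR errors, here is a minimal sketch based on Levenshtein edit distance. It is not taken from this project's code: the function names `editDistance` and `fuzzyFindWords`, the word-level matching, and the default threshold of one edit are all assumptions made for the example.

```javascript
// Hypothetical sketch; the browser's actual search code may work differently.

// Levenshtein edit distance between two strings (two-row dynamic programming).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or exact match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the words in `text` that are within `maxDistance` edits of `query`.
// Assumption: matching is case-insensitive and done word by word.
function fuzzyFindWords(text, query, maxDistance = 1) {
  const needle = query.toLowerCase();
  const words = text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u) // split on anything that is not a letter or digit
    .filter(Boolean);
  return words.filter((word) => editDistance(word, needle) <= maxDistance);
}

// An OCR-style misreading ("kunqen" for "kungen") is still found.
console.log(fuzzyFindWords("Kunqen besöker Stockholm", "kungen"));
```

A small threshold keeps false positives down on short words. A real implementation would likely scale the allowed distance with the query length.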
Visit the hosted version at: https://[username].github.io/jdc_browser/
Or run locally:
git clone https://github.com/[username]/jdc_browser.git
cd jdc_browser
python3 -m http.server 8000
# Open http://localhost:8000
To deploy on GitHub Pages:
- Push the repository to GitHub
- Go to Settings > Pages
- Set source to "Deploy from a branch" and select main/root
- The site will be available at https://[username].github.io/jdc_browser/
The corpus is loaded directly from Zenodo at runtime (~13 MB download). It contains:
- Speech transcripts: Automatic speech recognition via SweScribe
- Intertitle transcripts: OCR from silent film text cards via stum
Source repository: Modern36/journal_digital_corpus
Developed for the Modern Times 1936 research project at Lund University, Sweden. The project investigates what software "sees," "hears," and "perceives" when pattern recognition technologies such as 'AI' are applied to media historical sources. The project is funded by Riksbankens Jubileumsfond.
The Journal Digital Corpus is licensed under the CC-BY-NC 4.0 International license.
@article{aspenskog2025journal,
title={Journal Digital Corpus: Swedish Newsreel Transcriptions},
author={Aspenskog, Robert and Johansson, Mathias and Snickars, Pelle},
journal={Journal of Open Humanities Data},
volume={11},
number={1},
year={2025}
}