Journal Digital Corpus Reader

A web-based search interface for the Journal Digital Corpus, a collection of transcripts from Swedish historical newsreels (SF Veckorevy).

Features

Full-text search across ~6,800 transcript files
Fuzzy search for finding matches despite OCR/ASR errors
Filter by transcript type (speech/intertitle), collection, and year
Side-by-side viewer showing speech and intertitle transcripts with timestamps
Shareable URLs for bookmarking searches and specific videos
Client-side only: loads the corpus directly from Zenodo, no backend required
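
To illustrate how fuzzy search can tolerate OCR/ASR errors, here is a minimal sketch based on Levenshtein edit distance, combined with a transcript-type filter. It is not the reader's actual search code: the record fields (`id`, `type`, `year`, `text`), the sample records, and the function names are all hypothetical, and the real implementation may use a different matching technique.

```javascript
// Hypothetical sketch; not taken from the reader's source code.

// Levenshtein edit distance between two strings (two-row dynamic programming).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// True if any word in `text` is within `maxDistance` edits of `term`.
function fuzzyContains(text, term, maxDistance = 1) {
  const needle = term.toLowerCase();
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u) // split on anything that is not a letter or digit
    .some((word) => word && levenshtein(word, needle) <= maxDistance);
}

// Made-up sample records; the field names are assumptions.
const transcripts = [
  { id: "a", type: "speech", year: 1936, text: "Kungen besöker Stockholm" },
  { id: "b", type: "intertitle", year: 1925, text: "STOCKHOIM I VINTERSKRUD" },
  { id: "c", type: "speech", year: 1940, text: "Skördetid i Skåne" },
];

// Filter by optional transcript type, then by fuzzy match on the text.
function search(records, term, { type, maxDistance = 1 } = {}) {
  return records.filter(
    (r) => (!type || r.type === type) && fuzzyContains(r.text, term, maxDistance)
  );
}

// Record "b" matches despite the simulated OCR error (l read as I).
console.log(search(transcripts, "Stockholm").map((r) => r.id));
```

Raising `maxDistance` finds more damaged words at the cost of more false positives, so a small default is a reasonable starting point for short search terms.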

Usage

Visit the hosted version at: https://[username].github.io/jdc_browser/

Or run locally:

git clone https://github.com/[username]/jdc_browser.git
cd jdc_browser
python3 -m http.server 8000
# Open http://localhost:8000
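
Once the reader is running, searches can be bookmarked through shareable URLs. The sketch below shows one way such a link could be encoded and decoded with the standard `URL` and `URLSearchParams` APIs. The parameter names (`q`, `type`, `year`) are hypothetical, and the reader's real URL scheme may differ.

```javascript
// Hypothetical sketch; the parameter names are assumptions, not the reader's scheme.

// Encode a search state object into the query string of `baseUrl`.
function toShareUrl(baseUrl, state) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(state)) {
    // Skip unset filters so the link stays short.
    if (value !== undefined && value !== null && value !== "") {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// Decode a shared link back into a plain object (all values are strings).
function fromShareUrl(href) {
  return Object.fromEntries(new URL(href).searchParams);
}

const link = toShareUrl("http://localhost:8000/", {
  q: "kungen",
  type: "speech",
  year: 1936,
});
console.log(link);
console.log(fromShareUrl(link));
```

Keeping the whole search state in the query string means a link can be restored without any server-side storage, which fits a client-side only application.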

Deployment to GitHub Pages

Push the repository to GitHub
Go to Settings > Pages
Set Source to "Deploy from a branch", then select the main branch and the / (root) folder
The site will be available at https://[username].github.io/jdc_browser/

Data Source

The corpus is loaded directly from Zenodo at runtime (~13 MB download). It contains:

Speech transcripts: Automatic speech recognition via SweScribe
Intertitle transcripts: OCR from silent film text cards via stum

DOI: 10.5281/zenodo.15596191

Source repository: Modern36/journal_digital_corpus
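
The runtime loading described above could start by resolving the DOI to a list of downloadable files. This is a sketch under stated assumptions, not the reader's actual loader: it assumes Zenodo's public REST API serves record metadata as JSON at `https://zenodo.org/api/records/<id>`, with a `files` array whose entries carry `key`, `size`, and `links.self`. The function names are hypothetical.

```javascript
// Hypothetical sketch; the response shape of the Zenodo records API is an
// assumption (files[].key, files[].size, files[].links.self).

// Extract the numeric record id from a Zenodo DOI such as 10.5281/zenodo.15596191.
function recordIdFromDoi(doi) {
  const match = /^10\.5281\/zenodo\.(\d+)$/.exec(doi.trim());
  if (!match) throw new Error(`Not a Zenodo DOI: ${doi}`);
  return match[1];
}

// Reduce a record's JSON to the fields needed to download its files.
function listFiles(record) {
  return (record.files ?? []).map((f) => ({
    name: f.key,
    bytes: f.size,
    url: f.links.self,
  }));
}

// Fetch the record metadata and return its file list.
// `fetchImpl` is injectable so the function can be exercised offline.
async function fetchCorpusFiles(doi, fetchImpl = globalThis.fetch) {
  const id = recordIdFromDoi(doi);
  const response = await fetchImpl(`https://zenodo.org/api/records/${id}`);
  if (!response.ok) {
    throw new Error(`Zenodo request failed: ${response.status}`);
  }
  return listFiles(await response.json());
}

// Example (performs a network request, so it is left commented out):
// fetchCorpusFiles("10.5281/zenodo.15596191").then(console.table);
```

Separating the pure helpers from the network call keeps the parsing logic easy to check without contacting Zenodo.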

Credits

Developed for the Modern Times 1936 research project at Lund University, Sweden. The project investigates what software "sees," "hears," and "perceives" when pattern-recognition technologies such as "AI" are applied to media-historical sources. The project is funded by Riksbankens Jubileumsfond.

License

The Journal Digital Corpus is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

References

@article{aspenskog2025journal,
  title={Journal Digital Corpus: Swedish Newsreel Transcriptions},
  author={Aspenskog, Robert and Johansson, Mathias and Snickars, Pelle},
  journal={Journal of Open Humanities Data},
  volume={11},
  number={1},
  year={2025}
}
