OD-1836 Convert Automated Archive output to working schema #85

brianna-dardin · 2025-03-18T03:02:21Z

This PR makes a few changes so that ODAP can process an Automated Archive like Unit B, and so that the end result is the same working schema that the eFiction repo outputs, which can then be fed into steps 3-6.

Conversion to working schema changes

Use the working SQL file from the eFiction repo as the starting point for the database. I didn't want to replicate the file in this repo, so the script fetches it with a GET request. If there are better ways to handle this, I can make updates.
Insert item_authors rows to record the relationships between stories and authors.
Insert unique tags into the tags table, then insert item_tags rows to record the relationships between stories and tags.
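
For readers who want a picture of the tag step, here is a minimal sketch of "insert each unique tag once, then link stories to tags". It is not the code from this PR: it uses an in-memory SQLite database as a stand-in, and the column names (`id`, `name`, `item_id`, `tag_id`), the `insert_tags` helper and the sample data are all assumptions made for illustration. Only the table names `tags` and `item_tags` come from the description above.

```python
import sqlite3


def insert_tags(conn, story_tags):
    """Insert each distinct tag once, then one item_tags row per story/tag pair.

    story_tags maps a story id to the list of tag names parsed for that story.
    Column names here are hypothetical, not taken from the working schema.
    """
    cur = conn.cursor()
    tag_ids = {}  # tag name -> id of the row already inserted into tags
    for story_id, names in story_tags.items():
        # dict.fromkeys drops repeated names within one story, keeping order
        for name in dict.fromkeys(names):
            if name not in tag_ids:
                cur.execute("INSERT INTO tags (name) VALUES (?)", (name,))
                tag_ids[name] = cur.lastrowid
            cur.execute(
                "INSERT INTO item_tags (item_id, tag_id) VALUES (?, ?)",
                (story_id, tag_ids[name]),
            )
    conn.commit()
    return tag_ids


def demo():
    """Build a throwaway schema and return (number of tags, number of links)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tags (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE);
        CREATE TABLE item_tags (item_id INTEGER, tag_id INTEGER);
        """
    )
    insert_tags(conn, {1: ["Angst", "Humor", "Angst"], 2: ["Angst"]})
    tags = conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
    links = conn.execute("SELECT COUNT(*) FROM item_tags").fetchone()[0]
    return tags, links
```

The same pattern would apply to item_authors: resolve each author to a single row, then insert one relationship row per story/author pair.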

Changes due to issues processing Unit B

Its ARCHIVE_DB.pl file was encoded as Latin-1 rather than UTF-8, so the script now prompts for the encoding of the file.
The values in the Date field were Unix timestamps rather than datetime strings, so the script now checks whether the date is in Unix timestamp format.
The chapter URLs (the Location field) included "/", for example "5/heatenough.html". This caused an issue because I was running the script on Windows and the files were downloaded locally, so the script now corrects for this when it is run on Windows.
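
The three fixes above could look roughly like the sketch below. This is an illustration under stated assumptions rather than the code in this PR: the helper names (`decode_archive`, `parse_date`, `local_chapter_path`) are hypothetical, the fallback date format `%m/%d/%Y` is a guess, and the real script may detect timestamps and rewrite paths differently.

```python
import os
from datetime import datetime, timezone


def decode_archive(raw, encoding="utf-8"):
    """Decode the raw bytes of ARCHIVE_DB.pl using the encoding the user chose."""
    return raw.decode(encoding)


def parse_date(value, fallback_format="%m/%d/%Y"):
    """Accept either a Unix timestamp or a date string.

    A value made up only of digits is treated as seconds since the epoch (UTC);
    anything else is parsed with fallback_format, which is an assumed format.
    """
    value = value.strip()
    if value.isdigit():
        stamp = datetime.fromtimestamp(int(value), tz=timezone.utc)
        return stamp.replace(tzinfo=None)
    return datetime.strptime(value, fallback_format)


def local_chapter_path(location, sep=os.sep):
    """Turn a Location value such as "5/heatenough.html" into a local path.

    On Windows os.sep is a backslash, so forward slashes are rewritten;
    on other platforms the value is returned unchanged.
    """
    return location.replace("/", sep)
```

For example, `decode_archive(b"caf\xe9", "latin-1")` succeeds where the default UTF-8 decode would raise `UnicodeDecodeError`, which is the kind of failure that motivates prompting for the encoding.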

Other

Since the other PR had an issue with the GitHub workflows, I tried to fix that here.


ariana-paris

A couple of comments but some may be because my Python is getting rusty or I'm misreading things!

.github/workflows/python-app-macos-windows.yml

automated_archive/aa.py

shared_python/Chapters.py

ariana-paris · 2025-03-18T08:50:45Z

automated_archive/aa.py

+            _extract_date(args, FILES[i], log),
            FILES[i].get("Location", "").replace("'", "\\'"),
            FILES[i]
            .get("LocationURL", FILES[i].get("StoryURL", ""))


I wonder why these lines weren't indented, or is that a Python thing I've forgotten?

brianna-dardin · Mar 19, 2025

All the items in this array have the same indentation, though it is kinda confusing that the ruff formatter broke up certain lines but not others (like the Location line vs the LocationURL line). So it may look odd but it is fine, unless you're talking about something else?

ariana-paris · Mar 21, 2025

It was just about it looking odd and being more difficult for a human to interpret, since continuation lines are usually indented relative to their first line. However, this file isn't a high priority, so if ruff is going to be run on the whole repo, we'll have to let it do its thing to avoid noisy diffs!


potatoeggy and others added 11 commits May 26, 2024 15:28

fix: specify database when fetching

d4f0b20

fix: correct args

f8c98b4

fix: correct string type

8ff4a68

chore: fix lint

4a08b7e

chore: fix format

b02e293

Updated step 1 to use working schema & other changes for Unit B

53415c0

Updated step 2b to insert unique tags and item_tags

cac604a

Ran ruff formatter

afb8e65

Changed macos github action from latest to 13

d5e4ab1

Prompt for encoding of ARCHIVE_DB.pl

75d8f65

Updated step 2a so it'll work on windows if chapter urls contain forw…

30733a7

…ard slashes

brianna-dardin requested a review from ariana-paris March 18, 2025 03:09

ariana-paris reviewed Mar 18, 2025


brianna-dardin and others added 2 commits March 18, 2025 18:15

Updated the wording for the ARCHIVE_DB.pl prompt

c6731cd

per Ariana's suggestion Co-authored-by: Ariana <ariana-paris@users.noreply.github.com>

Ran ruff formatter again to fix build checks

8dfed1f

ariana-paris approved these changes Mar 21, 2025


brianna-dardin merged commit cdb6bd6 into master Mar 22, 2025
3 checks passed

brianna-dardin deleted the fix/types branch March 22, 2025 03:29
