Feature/doc qty per source #10
…p version to 1.1.0
…rresponding models
Pull request overview
This PR introduces document quantity reporting features per corpus/source by adding new materialized views and expanding the corpus data model with a URL field.
- Creates two materialized views for tracking document counts per corpus (all documents and Qdrant-indexed documents)
- Adds a `main_url` field to the `corpus` table to store primary URLs for each corpus
- Updates project version from 1.0.0 to 1.1.1
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| welearn_database/alembic/versions/0e0bc0fca384_doc_qty_per_source.py | Creates materialized views for document quantity reporting |
| welearn_database/alembic/versions/4f5a188dd614_add_main_url_column.py | Adds nullable main_url column to corpus table |
| welearn_database/data/models/document_related.py | Defines ORM models for the new materialized views |
| welearn_database/data/models/corpus_related.py | Adds main_url field to Corpus model |
| pyproject.toml | Bumps version to 1.1.1 |
    __read_only__ = True

    source_name: Mapped[str] = mapped_column(primary_key=True)
    count: Mapped[int] = mapped_column(primary_key=True)
Copilot
AI
Jan 8, 2026
The count column should not be part of the primary key. A count value is an aggregate result that can change when the materialized view is refreshed, making it unsuitable as a primary key component. Consider using only source_name as the primary key, or add a composite key with source_name and a timestamp if uniqueness across refreshes is needed.
This pull request introduces new features and schema changes to support document quantity reporting per source and corpus, as well as the addition of a new column to the `corpus` table. The main changes are the creation of new materialized views for reporting, the corresponding data models, and the addition of a `main_url` field to corpus-related data.

Database schema and migration changes:
- Added a `main_url` column to the `corpus` table in the `corpus_related` schema, with an Alembic migration for upgrade and downgrade. (`welearn_database/alembic/versions/4f5a188dd614_add_main_url_column.py`, `welearn_database/data/models/corpus_related.py`)
- Added the materialized views `qty_document_in_qdrant_per_corpus` and `qty_document_per_corpus` in the `document_related` schema, with an Alembic migration to manage their lifecycle. (`welearn_database/alembic/versions/0e0bc0fca384_doc_qty_per_source.py`)

Data model updates:
- Added `QtyDocumentInQdrantPerCorpus` and `QtyDocumentPerCorpus` to represent the new materialized views for document quantity reporting, including relevant columns and primary keys. (`welearn_database/data/models/document_related.py`)

Project metadata:
- Updated the version in `pyproject.toml` to `1.1.1` to reflect these changes. (`pyproject.toml`)