Refactor project structure and enhance documentation #59
NathanGavenski wants to merge 5 commits into main
Pull request overview
This PR continues the repo split into prove-api, prove-processing, and prove-shared by introducing a standalone shared Python package, updating many imports to use it, adding CI coverage for the shared package, and adding new API/docs/dashboard assets.
Changes:
- Add `prove-shared` as an installable package (src-layout) with shared auth, Mongo handler, objects, logging, and utility modules, plus initial pytest coverage and a CI workflow.
- Refactor API/processing code to import shared components from `prove_shared` and standardize logging usage.
- Add/refresh API assets (Flask app, templates/static, Swagger/Flasgger docs) and remove legacy deployment/backup scripts and unused SQLAlchemy models.
Reviewed changes
Copilot reviewed 29 out of 78 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/restart.sh | Removed legacy deployment restart/copy script. |
| scripts/historical_backup.sh | Removed legacy weekly MongoDB backup script. |
| scripts/dr_backup.sh | Removed legacy disaster-recovery MongoDB backup script. |
| pyproject.toml | Add local path dependency on prove_shared. |
| prove-shared/tests/test_objects.py | Add tests for shared dataclasses behavior. |
| prove-shared/tests/test_mongo_handler.py | Add tests for Mongo handler helper logic. |
| prove-shared/tests/test_auth.py | Add tests for RSA auth helpers. |
| prove-shared/tests/conftest.py | Inject test secrets/module stubs for shared package tests. |
| prove-shared/src/prove_shared/wikidata_utils.py | Add cached Wikidata API helper. |
| prove-shared/src/prove_shared/queue_manager.py | Update imports to package-relative shared modules. |
| prove-shared/src/prove_shared/objects.py | Add shared dataclasses (Status, HtmlContent, Entailment). |
| prove-shared/src/prove_shared/mongo_handler.py | Update logger import to shared package logger. |
| prove-shared/src/prove_shared/logger.py | Adjust fallback log directory for shared package. |
| prove-shared/src/prove_shared/file_utils.py | Add AllenNLP-adapted cache/download helpers. |
| prove-shared/src/prove_shared/auth.py | Add RSA encrypt/decrypt + API key validation helpers. |
| prove-shared/src/prove_shared/\_\_init\_\_.py | Export shared package public API symbols. |
| prove-shared/pyproject.toml | Define prove_shared package metadata + deps. |
| prove-shared/config.yaml | Add shared runtime config defaults. |
| prove-shared/README.md | Document shared package installation and usage. |
| prove-processing/wikidata_parser.py | Switch to stdlib logger instance. |
| prove-processing/utils/verbalisation_module.py | Add verbalisation module implementation. |
| prove-processing/utils/utils_verbalisation_module.py | Add verbalisation utilities/helpers. |
| prove-processing/utils/utils_graph2text.py | Add graph-to-text normalization/eval helpers. |
| prove-processing/utils/textual_entailment_module.py | Add textual entailment model wrapper. |
| prove-processing/utils/sentence_retrieval_module.py | Switch to stdlib logger instance. |
| prove-processing/utils/sentence_retrieval_model.py | Add sentence retrieval model wrapper class. |
| prove-processing/utils/pagepileList.txt | Add pagepile QID list data file. |
| prove-processing/utils/lightning_base.py | Add Lightning base utilities for training. |
| prove-processing/utils/graph2text.py | Add graph2text placeholder module. |
| prove-processing/utils/finetune.py | Add finetuning/training module for graph2text. |
| prove-processing/utils/callbacks.py | Add Lightning callbacks for training. |
| prove-processing/utils/bert_model.py | Switch cached file helper import to prove_shared. |
| prove-processing/refs_html_to_evidences.py | Switch to stdlib logger instance. |
| prove-processing/refs_html_collection.py | Switch to stdlib logger instance. |
| prove-processing/properties_to_remove.json | Add property filter configuration file. |
| prove-processing/claim_entailment.py | Switch to stdlib logger instance. |
| prove-processing/background_processing.py | Update imports to shared Mongo handler + stdlib logger. |
| prove-processing/ProVe_main_service.py | Update imports to prove_shared modules + stdlib logger. |
| prove-processing/ProVe_main_process.py | Add pipeline orchestration entry point. |
| prove-processing/ProVe_heuristic_service.py | Switch to stdlib logger instance. |
| prove-api/test_functions.py | Add local performance comparison script for API functions. |
| prove-api/swagger.json | Add OpenAPI 3 spec JSON. |
| prove-api/info.py | Add script for aggregating usage stats from MongoDB. |
| prove-api/index.html | Add Swagger UI HTML entry. |
| prove-api/functions.py | Update imports to prove_shared + stdlib logger. |
| prove-api/dashboard.py | Add Dash dashboard for usage analytics. |
| prove-api/api/wsgi.py | Add WSGI entrypoint for deployment. |
| prove-api/api/utils_api.py | Add API logger + IP geolocation helper. |
| prove-api/api/templates/prove.html | Add ProVe landing page template. |
| prove-api/api/templates/hackathon.html | Add hackathon submission/auth template. |
| prove-api/api/static/style.css | Add shared styling for templates. |
| prove-api/api/queue_manager.py | Update to import shared Mongo handler. |
| prove-api/api/page/worklist/pagePileList.yml | Add Flasgger page doc for pagepile worklist view. |
| prove-api/api/page/worklist/generationBasics.yml | Add Flasgger page doc for basic worklist view. |
| prove-api/api/page/plot.yml | Add Flasgger page doc for plot view. |
| prove-api/api/index.html | Add placeholder API index HTML. |
| prove-api/api/hackathon/api_code.py | Add standalone hackathon Flask app prototype. |
| prove-api/api/docs/process_reference.yml | Add Flasgger doc for reference processing endpoint. |
| prove-api/api/docs/page/worklist/pagePileList.yml | Add Flasgger docs copy for pagepile worklist view. |
| prove-api/api/docs/page/worklist/generationBasics.yml | Add Flasgger docs copy for basic worklist view. |
| prove-api/api/docs/page/plot.yml | Add Flasgger docs copy for plot view. |
| prove-api/api/docs/api/worklist/generationBasics.yml | Add Flasgger doc for worklist generation API. |
| prove-api/api/docs/api/task/checkQueue.yml | Add Flasgger doc for queue endpoint. |
| prove-api/api/docs/api/task/checkErrors.yml | Add Flasgger doc for errors endpoint. |
| prove-api/api/docs/api/task/checkCompleted.yml | Add Flasgger doc for completed endpoint. |
| prove-api/api/docs/api/requests/requestItem.yml | Add Flasgger doc for requestItem endpoint. |
| prove-api/api/docs/api/items/summary.yml | Add Flasgger doc for item summary endpoint. |
| prove-api/api/docs/api/items/history.yml | Add Flasgger doc for item history endpoint. |
| prove-api/api/docs/api/items/getSimpleResult.yml | Add Flasgger doc for simple results endpoint. |
| prove-api/api/docs/api/items/comprehensiveResults.yml | Add Flasgger doc for comprehensive results endpoint. |
| prove-api/api/docs/api/items/checkItemStatus.yml | Add Flasgger doc for item status endpoint. |
| prove-api/api/docs/api/config.yml | Add Flasgger doc for config endpoint. |
| prove-api/api/custom_decorators.py | Update to import AsyncAuth/Mongo handler from prove_shared. |
| prove-api/api/app.py | Add/refresh Flask API app with routes and Flasgger wiring. |
| api/db/website.py | Remove legacy SQLAlchemy models file. |
| README.new.md | Add new “split repo structure” README draft. |
| .github/workflows/prove-shared-ci.yml | Add CI workflow for prove-shared tests. |
# ProVe (Provenance Verification for Wikidata claims)

## Overview

ProVe is a system designed to automatically verify claims and references in Wikidata. It extracts claims from Wikidata entities, fetches the referenced URLs, processes the HTML content, and uses NLP models to determine whether the claims are supported by the referenced content.

It:
1. extracts claims and references from a Wikidata item,
2. fetches reference content from external URLs,
3. selects evidence sentences,
4. runs textual entailment,
5. stores and serves results through API and background services.

## Current Repository Structure

The codebase is now organized into three top-level folders inside this workspace:

- prove-api: HTTP/API layer, dashboard, templates, docs, queue endpoint
- prove-processing: background workers, pipeline orchestration, ML/NLP models
- prove-shared: pip-installable shared package (MongoDB models/handlers, auth, utilities)

Root-level files still include global project metadata such as pyproject.toml, README.md, LICENSE, and project planning docs.

## Architecture Summary
The PR description says the README was replaced, but this change adds a new README.new.md while the existing README.md remains in the repo. If the intent is to update the main project documentation, consider renaming this to README.md (or updating tooling/config/docs to point at README.new.md) to avoid having two competing entrypoint READMEs.
from .auth import AsyncAuth
from .mongo_handler import MongoDBHandler, requestItemProcessing
from .objects import Entailment, HtmlContent, Status
from .queue_manager import QueueManager
from .wikidata_utils import CachedWikidataAPI
prove_shared/__init__.py eagerly imports modules that pull in heavy runtime dependencies (e.g., wikidata_utils imports numpy/pandas/requests and even pdb). This increases import time for any consumer doing import prove_shared and is the reason tests have to bypass __init__ in conftest.py. Consider keeping __init__ lightweight (export names via lazy imports or optional imports) so normal imports and tests don't need sys.modules hacks.
This pull request introduces a new continuous integration workflow for the `prove-shared` package, updates the project documentation with a comprehensive new README, removes unused SQLAlchemy models from the API codebase, and standardizes imports to use the shared package after a repository restructure. These changes help streamline development, clarify the architecture, and improve maintainability.

CI/CD and Documentation Updates:
- Added a GitHub Actions workflow (`.github/workflows/prove-shared-ci.yml`) to automatically run tests for the `prove-shared` package on pushes and pull requests, ensuring code quality and early detection of issues.
- Added `README.new.md`, which explains the architecture, setup instructions, repository structure, and usage examples, reflecting the recent codebase split into `prove-api`, `prove-processing`, and `prove-shared`.

Codebase Cleanup and Refactoring:
- Removed the `website.py` file from `api/db`, eliminating obsolete SQLAlchemy models no longer needed after the repository restructuring.
- Updated imports in `prove-api/api/custom_decorators.py` (formerly `api/custom_decorators.py`) to reference `prove_shared` modules instead of local `utils`, aligning with the new shared package structure.