Refactor project structure and enhance documentation #59

Open
NathanGavenski wants to merge 5 commits into main from CodeSplit

@NathanGavenski
Collaborator

This pull request introduces a new continuous integration workflow for the prove-shared package, updates the project documentation with a comprehensive new README, removes unused SQLAlchemy models from the API codebase, and standardizes imports to use the shared package after a repository restructure. These changes help streamline development, clarify the architecture, and improve maintainability.

CI/CD and Documentation Updates:

  • Added a new GitHub Actions workflow (.github/workflows/prove-shared-ci.yml) to automatically run tests for the prove-shared package on pushes and pull requests, ensuring code quality and early detection of issues.
  • Replaced the project README with a new, detailed README.new.md that explains the architecture, setup instructions, repository structure, and usage examples, reflecting the recent codebase split into prove-api, prove-processing, and prove-shared.

Codebase Cleanup and Refactoring:

  • Removed the unused website.py file from api/db, eliminating obsolete SQLAlchemy models no longer needed after the repository restructuring.
  • Updated imports in prove-api/api/custom_decorators.py (formerly api/custom_decorators.py) to reference prove_shared modules instead of local utils, aligning with the new shared package structure.

Copilot AI left a comment

Pull request overview

This PR continues the repo split into prove-api, prove-processing, and prove-shared by introducing a standalone shared Python package, updating many imports to use it, adding CI coverage for the shared package, and adding new API/docs/dashboard assets.

Changes:

  • Add prove-shared as an installable package (src-layout) with shared auth, Mongo handler, objects, logging, and utility modules, plus initial pytest coverage and CI workflow.
  • Refactor API/processing code to import shared components from prove_shared and standardize logging usage.
  • Add/refresh API assets (Flask app, templates/static, Swagger/Flasgger docs) and remove legacy deployment/backup scripts and unused SQLAlchemy models.

Reviewed changes

Copilot reviewed 29 out of 78 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `scripts/restart.sh` | Removed legacy deployment restart/copy script. |
| `scripts/historical_backup.sh` | Removed legacy weekly MongoDB backup script. |
| `scripts/dr_backup.sh` | Removed legacy disaster-recovery MongoDB backup script. |
| `pyproject.toml` | Add local path dependency on prove_shared. |
| `prove-shared/tests/test_objects.py` | Add tests for shared dataclasses behavior. |
| `prove-shared/tests/test_mongo_handler.py` | Add tests for Mongo handler helper logic. |
| `prove-shared/tests/test_auth.py` | Add tests for RSA auth helpers. |
| `prove-shared/tests/conftest.py` | Inject test secrets/module stubs for shared package tests. |
| `prove-shared/src/prove_shared/wikidata_utils.py` | Add cached Wikidata API helper. |
| `prove-shared/src/prove_shared/queue_manager.py` | Update imports to package-relative shared modules. |
| `prove-shared/src/prove_shared/objects.py` | Add shared dataclasses (Status, HtmlContent, Entailment). |
| `prove-shared/src/prove_shared/mongo_handler.py` | Update logger import to shared package logger. |
| `prove-shared/src/prove_shared/logger.py` | Adjust fallback log directory for shared package. |
| `prove-shared/src/prove_shared/file_utils.py` | Add AllenNLP-adapted cache/download helpers. |
| `prove-shared/src/prove_shared/auth.py` | Add RSA encrypt/decrypt + API key validation helpers. |
| `prove-shared/src/prove_shared/__init__.py` | Export shared package public API symbols. |
| `prove-shared/pyproject.toml` | Define prove_shared package metadata + deps. |
| `prove-shared/config.yaml` | Add shared runtime config defaults. |
| `prove-shared/README.md` | Document shared package installation and usage. |
| `prove-processing/wikidata_parser.py` | Switch to stdlib logger instance. |
| `prove-processing/utils/verbalisation_module.py` | Add verbalisation module implementation. |
| `prove-processing/utils/utils_verbalisation_module.py` | Add verbalisation utilities/helpers. |
| `prove-processing/utils/utils_graph2text.py` | Add graph-to-text normalization/eval helpers. |
| `prove-processing/utils/textual_entailment_module.py` | Add textual entailment model wrapper. |
| `prove-processing/utils/sentence_retrieval_module.py` | Switch to stdlib logger instance. |
| `prove-processing/utils/sentence_retrieval_model.py` | Add sentence retrieval model wrapper class. |
| `prove-processing/utils/pagepileList.txt` | Add pagepile QID list data file. |
| `prove-processing/utils/lightning_base.py` | Add Lightning base utilities for training. |
| `prove-processing/utils/graph2text.py` | Add graph2text placeholder module. |
| `prove-processing/utils/finetune.py` | Add finetuning/training module for graph2text. |
| `prove-processing/utils/callbacks.py` | Add Lightning callbacks for training. |
| `prove-processing/utils/bert_model.py` | Switch cached file helper import to prove_shared. |
| `prove-processing/refs_html_to_evidences.py` | Switch to stdlib logger instance. |
| `prove-processing/refs_html_collection.py` | Switch to stdlib logger instance. |
| `prove-processing/properties_to_remove.json` | Add property filter configuration file. |
| `prove-processing/claim_entailment.py` | Switch to stdlib logger instance. |
| `prove-processing/background_processing.py` | Update imports to shared Mongo handler + stdlib logger. |
| `prove-processing/ProVe_main_service.py` | Update imports to prove_shared modules + stdlib logger. |
| `prove-processing/ProVe_main_process.py` | Add pipeline orchestration entry point. |
| `prove-processing/ProVe_heuristic_service.py` | Switch to stdlib logger instance. |
| `prove-api/test_functions.py` | Add local performance comparison script for API functions. |
| `prove-api/swagger.json` | Add OpenAPI 3 spec JSON. |
| `prove-api/info.py` | Add script for aggregating usage stats from MongoDB. |
| `prove-api/index.html` | Add Swagger UI HTML entry. |
| `prove-api/functions.py` | Update imports to prove_shared + stdlib logger. |
| `prove-api/dashboard.py` | Add Dash dashboard for usage analytics. |
| `prove-api/api/wsgi.py` | Add WSGI entrypoint for deployment. |
| `prove-api/api/utils_api.py` | Add API logger + IP geolocation helper. |
| `prove-api/api/templates/prove.html` | Add ProVe landing page template. |
| `prove-api/api/templates/hackathon.html` | Add hackathon submission/auth template. |
| `prove-api/api/static/style.css` | Add shared styling for templates. |
| `prove-api/api/queue_manager.py` | Update to import shared Mongo handler. |
| `prove-api/api/page/worklist/pagePileList.yml` | Add Flasgger page doc for pagepile worklist view. |
| `prove-api/api/page/worklist/generationBasics.yml` | Add Flasgger page doc for basic worklist view. |
| `prove-api/api/page/plot.yml` | Add Flasgger page doc for plot view. |
| `prove-api/api/index.html` | Add placeholder API index HTML. |
| `prove-api/api/hackathon/api_code.py` | Add standalone hackathon Flask app prototype. |
| `prove-api/api/docs/process_reference.yml` | Add Flasgger doc for reference processing endpoint. |
| `prove-api/api/docs/page/worklist/pagePileList.yml` | Add Flasgger docs copy for pagepile worklist view. |
| `prove-api/api/docs/page/worklist/generationBasics.yml` | Add Flasgger docs copy for basic worklist view. |
| `prove-api/api/docs/page/plot.yml` | Add Flasgger docs copy for plot view. |
| `prove-api/api/docs/api/worklist/generationBasics.yml` | Add Flasgger doc for worklist generation API. |
| `prove-api/api/docs/api/task/checkQueue.yml` | Add Flasgger doc for queue endpoint. |
| `prove-api/api/docs/api/task/checkErrors.yml` | Add Flasgger doc for errors endpoint. |
| `prove-api/api/docs/api/task/checkCompleted.yml` | Add Flasgger doc for completed endpoint. |
| `prove-api/api/docs/api/requests/requestItem.yml` | Add Flasgger doc for requestItem endpoint. |
| `prove-api/api/docs/api/items/summary.yml` | Add Flasgger doc for item summary endpoint. |
| `prove-api/api/docs/api/items/history.yml` | Add Flasgger doc for item history endpoint. |
| `prove-api/api/docs/api/items/getSimpleResult.yml` | Add Flasgger doc for simple results endpoint. |
| `prove-api/api/docs/api/items/comprehensiveResults.yml` | Add Flasgger doc for comprehensive results endpoint. |
| `prove-api/api/docs/api/items/checkItemStatus.yml` | Add Flasgger doc for item status endpoint. |
| `prove-api/api/docs/api/config.yml` | Add Flasgger doc for config endpoint. |
| `prove-api/api/custom_decorators.py` | Update to import AsyncAuth/Mongo handler from prove_shared. |
| `prove-api/api/app.py` | Add/refresh Flask API app with routes and Flasgger wiring. |
| `api/db/website.py` | Remove legacy SQLAlchemy models file. |
| `README.new.md` | Add new “split repo structure” README draft. |
| `.github/workflows/prove-shared-ci.yml` | Add CI workflow for prove-shared tests. |

Comment thread README.new.md
Comment on lines +1 to +25
# ProVe (Provenance Verification for Wikidata claims)


## Overview

ProVe is a system designed to automatically verify claims and references in Wikidata. It extracts claims from Wikidata entities, fetches the referenced URLs, processes the HTML content, and uses NLP models to determine whether the claims are supported by the referenced content.

It:
1. extracts claims and references from a Wikidata item,
2. fetches reference content from external URLs,
3. selects evidence sentences,
4. runs textual entailment,
5. stores and serves results through API and background services.
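
The five steps above can be read as a simple data flow. The following is a minimal sketch of that flow only, not the ProVe implementation: `Claim`, `select_evidence`, `entails`, `verify_item`, and the result labels are all hypothetical names, the evidence ranking and entailment checks are toy word-overlap stand-ins for the NLP models, and a plain dict stands in for the database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Claim:
    """Hypothetical stand-in for one Wikidata claim and its reference URL."""
    text: str
    reference_url: str


def select_evidence(claim: Claim, page_text: str, top_k: int = 1) -> List[str]:
    """Step 3 (toy version): rank sentences by word overlap with the claim."""
    claim_words = set(claim.text.lower().split())
    sentences = [s.strip() for s in page_text.split(".") if s.strip()]
    ranked = sorted(
        sentences,
        key=lambda s: len(claim_words & set(s.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def entails(evidence: str, claim: Claim) -> bool:
    """Step 4 (toy version): every claim word must appear in the evidence."""
    return set(claim.text.lower().split()) <= set(evidence.lower().split())


def verify_item(
    claims: List[Claim],          # step 1 output, assumed already extracted
    fetch: Callable[[str], str],  # step 2, injected so no network is needed
    store: Dict[str, str],        # step 5, a dict standing in for the database
) -> Dict[str, str]:
    """Run steps 2 to 5 for each claim and record a label per claim."""
    for claim in claims:
        page_text = fetch(claim.reference_url)
        evidence = select_evidence(claim, page_text)
        supported = any(entails(sentence, claim) for sentence in evidence)
        store[claim.text] = "SUPPORTS" if supported else "NOT ENOUGH INFO"
    return store


# Usage with an in-memory "web" so the sketch runs offline.
pages = {"https://example.org/a": "Paris is the capital of France. It has many museums."}
claims = [
    Claim("Paris capital France", "https://example.org/a"),
    Claim("Berlin capital France", "https://example.org/a"),
]
results = verify_item(claims, pages.__getitem__, {})
```

Passing `fetch` and `store` in as arguments keeps each stage replaceable, which mirrors the split between processing code and shared storage described in this README.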

## Current Repository Structure

The codebase is now organized into three top-level folders inside this workspace:

- prove-api: HTTP/API layer, dashboard, templates, docs, queue endpoint
- prove-processing: background workers, pipeline orchestration, ML/NLP models
- prove-shared: pip-installable shared package (MongoDB models/handlers, auth, utilities)

Root-level files still include global project metadata such as pyproject.toml, README.md, LICENSE, and project planning docs.

## Architecture Summary

Copilot AI Apr 14, 2026

The PR description says the README was replaced, but this change adds a new README.new.md while the existing README.md remains in the repo. If the intent is to update the main project documentation, consider renaming this to README.md (or updating tooling/config/docs to point at README.new.md) to avoid having two competing entrypoint READMEs.

Comment on lines +1 to +6
from .auth import AsyncAuth
from .mongo_handler import MongoDBHandler, requestItemProcessing
from .objects import Entailment, HtmlContent, Status
from .queue_manager import QueueManager
from .wikidata_utils import CachedWikidataAPI

Copilot AI Apr 14, 2026

prove_shared/__init__.py eagerly imports modules that pull in heavy runtime dependencies (e.g., wikidata_utils imports numpy/pandas/requests and even pdb). This increases import time for any consumer doing import prove_shared and is the reason tests have to bypass __init__ in conftest.py. Consider keeping __init__ lightweight (export names via lazy imports or optional imports) so normal imports and tests don't need sys.modules hacks.
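
One way to keep `__init__` lightweight is a module-level `__getattr__` (PEP 562), which defers each import until the name is first used. The sketch below is illustrative rather than a drop-in patch: `make_lazy_package` and `demo_shared` are hypothetical names, and stdlib modules stand in for the shared submodules so the example runs on its own. In a real `__init__.py` the hook would be a plain module-level `def __getattr__(name)`, with entries such as `"AsyncAuth": (".auth", "AsyncAuth")` resolved through `importlib.import_module(module_name, __name__)`.

```python
import importlib
import types
from typing import Dict, Tuple


def make_lazy_package(name: str, exports: Dict[str, Tuple[str, str]]) -> types.ModuleType:
    """Build a module whose public names are imported on first access (PEP 562)."""
    package = types.ModuleType(name)

    def __getattr__(attr: str):
        # Python calls this hook only when normal lookup on the module fails.
        if attr not in exports:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_name, symbol = exports[attr]
        value = getattr(importlib.import_module(module_name), symbol)
        setattr(package, attr, value)  # cache so later lookups skip the hook
        return value

    package.__getattr__ = __getattr__
    package.__all__ = sorted(exports)
    return package


# Stdlib stand-in for a heavy submodule (assumption for illustration only).
demo = make_lazy_package("demo_shared", {"Fraction": ("fractions", "Fraction")})
loaded_before = "Fraction" in vars(demo)  # False: nothing resolved yet
half = demo.Fraction(1, 2)                # first access triggers the import
loaded_after = "Fraction" in vars(demo)   # True: the symbol is now cached
```

With this pattern, `import prove_shared` would no longer pull in numpy or pandas unless a caller actually touches `CachedWikidataAPI`, so tests could import light modules such as the dataclasses without stubbing `sys.modules`.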

3 participants