adding files related to NOA GFS poc #1824
balit-raibot wants to merge 41 commits into datacommonsorg:master from
Conversation
Summary of Changes

Hello @balit-raibot, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request establishes the foundational data and configuration for a Proof of Concept (PoC) to integrate NOAA Global Forecast System (GFS) data into the system. It includes the raw GFS data, the necessary metadata and mapping files to process this data, and the resulting transformed output, along with the corresponding statistical variable definitions and schema.
Code Review
This pull request adds new files for importing NOAA GFS data as a proof-of-concept for Data Commons. While this is a valuable addition, I've identified several critical and high-severity issues in the generated files and configurations that should be addressed to ensure the data is imported correctly and the process is robust. Key issues include hardcoded values leading to data precision loss, incomplete and fragile TMCF definitions, malformed data in the output CSV, and stylistic issues in the MCF files.
…to noa_gfs_poc
Force-pushed d93478f to f7a7bb4
# --- Updated 1. Runtime Installation ---
# Added libaec-dev to the list
$SUDO apt-get update && $SUDO apt-get install -y \
Please verify that apt-get can run in the auto-refresh setup on Cloud Batch jobs.
curl is already present. For the other tools, consider adding them to the Docker image:
    build-essential gfortran cmake git libg2c-dev libaec-dev curl

# Clone and build
rm -rf wgrib2
Should we try grib2io, a Python module developed by NOAA, instead of building this package locally?
--bucket_name="${BUCKET}" \
--input_local="./${FILE_NAME}.csv" \
--forecast_hour="${FHOUR}" \
--output_blob_name="noaa_gfs/${DATE_STAMP}/output/noaa_gfs_output.csv"
Let's make this an incremental build.
Please add a date suffix to the GCS file name so a new file gets created per day rather than overwriting an existing file.
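The reviewer's suggestion could be sketched in Python like this (a sketch only; the helper name `dated_output_blob` and the exact file-naming scheme are assumptions, not part of the script under review):

```python
from datetime import datetime, timezone

def dated_output_blob(date_stamp=None):
    """Build a per-day output blob name so each daily run writes a new object.

    `date_stamp` defaults to today's UTC date; the bucket layout mirrors the
    script's `noaa_gfs/${DATE_STAMP}/output/` convention.
    """
    if date_stamp is None:
        date_stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    # The date also appears in the file name itself, so a previous day's
    # output is never overwritten by today's run.
    return f"noaa_gfs/{date_stamp}/output/noaa_gfs_output_{date_stamp}.csv"
```

Re-running on the same day would still target the same object, which keeps same-day retries idempotent while giving each day its own file.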
--input_local="./${FILE_NAME}.csv" \
--forecast_hour="${FHOUR}" \
--output_blob_name="noaa_gfs/${DATE_STAMP}/output/noaa_gfs_output.csv"
If the python script fails, return a non-zero status code so the auto-refresh monitoring can detect the failure, e.g.:
(( $? > 0 )) && echo "Failed to run script for ${FILE_NAME}" && exit 1
if l == "mean sea level":
    return "0MetersAboveMeanSeaLevel"
if "m above mean sea level" in l:
    val = l.split(" ")[0].replace("-", "To")
    return f"{val}MetersAboveMeanSeaLevel"

if l == "surface": return "SurfaceLevel"
if "entire atmosphere" in l: return ""
if l == "planetary boundary layer": return "PlanetaryBoundaryLayer"
if "low cloud layer" in l: return "LowCloudLayer"
if "middle cloud layer" in l: return "MiddleCloudLayer"
if "high cloud layer" in l: return "HighCloudLayer"
if l == "0c isotherm": return "Isotherm0C"
if l == "highest tropospheric freezing level": return "HighestTroposphericFreezingLevel"
For the 5 conditions with '==', consider using a map.
# --- 6. Upload TMCF to GCS ---
echo "Uploading TMCF to gs://${BUCKET}/noaa_gfs/${DATE_STAMP}/output/..."
gsutil cp "gs://${BUCKET}/noaa_gfs/noaa_gfs_output.tmcf" "gs://${BUCKET}/noaa_gfs/${DATE_STAMP}/output/"
Please keep a local copy of the TMCF checked into GitHub and copy that to the GCS destination folder.
# --- 3. Convert GRIB2 to CSV ---
echo "Converting to CSV..."
# Call the binary using the confirmed install path
$WGRIB2_BIN "./${FILE_NAME}" -csv "./${FILE_NAME}.csv"
wgrib2 has a warning about converting entire files into CSV:
https://screenshot.googleplex.com/8hWbbRqivKw7kht
Have you considered other options such as pygrib?
https://screenshot.googleplex.com/8CcyzTLTYn3WGuF
Setting up auto-refresh of NOAA GFS Data for Data Commons ingestion.