Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
There was a problem hiding this comment.
Code Review
This pull request introduces a new data import for the WHO Tuberculosis Percentage dataset, including the necessary download script, configuration files, and test data. The review identified several issues regarding file path conventions, incorrect script references in the documentation, and a need for more robust error handling in the download script. These changes are necessary to ensure the import automation functions correctly.
statvar_imports/tuberculosis_percentage/download_who_tuberculosis.py
Outdated
Show resolved
Hide resolved
|
|
||
| **Download input file** | ||
| ```bash | ||
| python3 statvar_imports/tuberculosis_percentage/tuberculosisPercentage_input.py |
There was a problem hiding this comment.
The command to download the input file refers to a non-existent script tuberculosisPercentage_input.py. It should point to the download_who_tuberculosis.py script.
| python3 statvar_imports/tuberculosis_percentage/tuberculosisPercentage_input.py | |
| python3 statvar_imports/tuberculosis_percentage/download_who_tuberculosis.py |
| **For Main data run** | ||
| ```bash | ||
| python3 tools/statvar_importer/stat_var_processor.py \ | ||
| --input_data=statvar_imports/tuberculosis_percentage/tuberculosisPercentage_input.csv \ |
There was a problem hiding this comment.
The --input_data path for the main data run is incorrect. Based on standard project structure and where the download script should save the file, the input file will be located in the source_files subdirectory. Please update the path accordingly.
| --input_data=statvar_imports/tuberculosis_percentage/tuberculosisPercentage_input.csv \ | |
| --input_data=statvar_imports/tuberculosis_percentage/source_files/tuberculosisPercentage_input.csv \ |
| if api_response.status_code != 200: | ||
| print(f"Failed to fetch API data. HTTP {api_response.status_code}") | ||
| return |
There was a problem hiding this comment.
The error checking for the API request can be improved. Using response.ok is a more idiomatic way to check for successful requests with the requests library, and including response.reason can provide more context on failure.
| if api_response.status_code != 200: | |
| print(f"Failed to fetch API data. HTTP {api_response.status_code}") | |
| return | |
| if not api_response.ok: | |
| print(f"Failed to fetch API data. HTTP {api_response.status_code}: {api_response.reason}") | |
| return |
|
|
||
| # We only pull the 'country' (for matching) and 'iso3' columns | ||
| geo_columns = ['country', 'iso3'] | ||
| master_df = pd.read_csv(master_url, usecols=geo_columns).drop_duplicates() |
There was a problem hiding this comment.
The call to pd.read_csv with a URL can fail due to network issues or if the remote server is unavailable. It's good practice to wrap this in a try...except block to handle potential errors gracefully.
| master_df = pd.read_csv(master_url, usecols=geo_columns).drop_duplicates() | |
| try: | |
| master_df = pd.read_csv(master_url, usecols=geo_columns).drop_duplicates() | |
| except Exception as e: | |
| print(f"Failed to fetch country iso3 codes from WHO master database: {e}") | |
| return |
345605a to
46594cf
Compare
Tuberculosis: Percentage of people diagnosed with a new episode of pulmonary TB whose disease was bacteriologically confirmed
data source: https://data.who.int/indicators/i/1891124/449F55C