tuberculosis_percentage#1939

Open
shvngisingh wants to merge 2 commits into datacommonsorg:master from shvngisingh:tuberculosis_percentage

@shvngisingh shvngisingh commented Mar 30, 2026

Tuberculosis: Percentage of people diagnosed with a new episode of pulmonary TB whose disease was bacteriologically confirmed.
Data source: https://data.who.int/indicators/i/1891124/449F55C

google-cla bot commented Mar 30, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new data import for the WHO Tuberculosis Percentage dataset, including the necessary download script, configuration files, and test data. The review identified several issues regarding file path conventions, incorrect script references in the documentation, and a need for more robust error handling in the download script. These changes are necessary to ensure the import automation functions correctly.


**Download input file**
```bash
python3 statvar_imports/tuberculosis_percentage/tuberculosisPercentage_input.py
```
medium

The command to download the input file refers to a non-existent script tuberculosisPercentage_input.py. It should point to the download_who_tuberculosis.py script.

Suggested change, original:

    python3 statvar_imports/tuberculosis_percentage/tuberculosisPercentage_input.py

Replacement:

    python3 statvar_imports/tuberculosis_percentage/download_who_tuberculosis.py

**For Main data run**
```bash
python3 tools/statvar_importer/stat_var_processor.py \
--input_data=statvar_imports/tuberculosis_percentage/tuberculosisPercentage_input.csv \
```
medium

The --input_data path for the main data run is incorrect. Based on standard project structure and where the download script should save the file, the input file will be located in the source_files subdirectory. Please update the path accordingly.

Suggested change, original:

    --input_data=statvar_imports/tuberculosis_percentage/tuberculosisPercentage_input.csv \

Replacement:

    --input_data=statvar_imports/tuberculosis_percentage/source_files/tuberculosisPercentage_input.csv \

Comment on lines +18 to +20
    if api_response.status_code != 200:
        print(f"Failed to fetch API data. HTTP {api_response.status_code}")
        return
medium

The error checking for the API request can be improved. Using response.ok is a more idiomatic way to check for successful requests with the requests library, and including response.reason can provide more context on failure.

Suggested change, original:

    if api_response.status_code != 200:
        print(f"Failed to fetch API data. HTTP {api_response.status_code}")
        return

Replacement:

    if not api_response.ok:
        print(f"Failed to fetch API data. HTTP {api_response.status_code}: {api_response.reason}")
        return
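
One detail worth keeping in mind with this suggestion: in the requests library, `Response.ok` is true for any status code below 400, so it is slightly more permissive than the original `!= 200` comparison. The following is a minimal sketch of the suggested check, not the script's actual code. The helper name `fetch_failure_message` is hypothetical, and `SimpleNamespace` objects stand in for real `requests.Response` instances so the example runs without a network call.

```python
from types import SimpleNamespace


def fetch_failure_message(response):
    """Return None on success, otherwise a short failure message.

    Works with any object exposing .ok, .status_code and .reason,
    which is the subset of requests.Response the review comment uses.
    """
    if response.ok:
        return None
    return (f"Failed to fetch API data. "
            f"HTTP {response.status_code}: {response.reason}")


# Stand-ins for requests.Response (assumption: illustrative values only).
not_found = SimpleNamespace(ok=False, status_code=404, reason="Not Found")
success = SimpleNamespace(ok=True, status_code=200, reason="OK")

print(fetch_failure_message(not_found))
print(fetch_failure_message(success))
```

If the download must succeed with exactly HTTP 200, keeping the explicit status comparison and only adding `reason` to the message would be an equally reasonable choice.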


# We only pull the 'country' (for matching) and 'iso3' columns
geo_columns = ['country', 'iso3']
master_df = pd.read_csv(master_url, usecols=geo_columns).drop_duplicates()
medium

The call to pd.read_csv with a URL can fail due to network issues or if the remote server is unavailable. It's good practice to wrap this in a try...except block to handle potential errors gracefully.

Suggested change, original:

    master_df = pd.read_csv(master_url, usecols=geo_columns).drop_duplicates()

Replacement:

    try:
        master_df = pd.read_csv(master_url, usecols=geo_columns).drop_duplicates()
    except Exception as e:
        print(f"Failed to fetch country iso3 codes from WHO master database: {e}")
        return
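
The same guard can be factored into a small helper so the failure path is easy to exercise without touching the network. This is a sketch under stated assumptions: `load_geo_lookup` is a hypothetical name, the `reader` argument stands in for `pd.read_csv`, and the broad `except Exception` simply mirrors the suggestion above rather than recommending it over narrower exception types.

```python
import logging


def load_geo_lookup(reader, source, columns):
    """Call reader(source, usecols=columns); return None if it fails.

    `reader` is expected to behave like pandas.read_csv, but any
    callable with the same keyword argument works.
    """
    try:
        return reader(source, usecols=list(columns))
    except Exception as exc:  # broad on purpose, mirroring the review suggestion
        logging.error(
            "Failed to fetch country iso3 codes from %s: %s", source, exc)
        return None


# Intended use in the download script (assumes pandas imported as pd):
#   master_df = load_geo_lookup(pd.read_csv, master_url, ['country', 'iso3'])
#   if master_df is None:
#       return
#   master_df = master_df.drop_duplicates()
```

Returning `None` keeps the decision about how to abort with the caller, which matches the early `return` in the suggested change.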

shvngisingh force-pushed the tuberculosis_percentage branch from 345605a to 46594cf on March 31, 2026 07:01