Assignment 06: Descriptive Statistics

For your first milestone, you harmonized two datasets to find patients in need of screening for kidney disease. That was one example of how you can use Python to make an impact: applying a formula to data to derive insight.

Using the same data, we can generate descriptive statistics and visualizations about the population to look for patterns.

Disclaimer: The data we are analyzing is randomly generated and contains no real patient data or true data patterns.

Acceptance Criteria

Part One: Minor Refactor

Using your Milestone 01 script as a starting point, refactor your script in two ways:

Instead of pulling from your local filesystem, pull the data from the web.
- https://ils.unc.edu/courses/2024_fall/chip490_335/patient_demographics.csv
- https://ils.unc.edu/courses/2024_fall/chip490_335/cmp.json
Convert your final output to a pandas DataFrame.
The final output DataFrame should include the following for each patient:
- Patient Age
- Patient Height
- Patient Weight
- Patient BMI (see the BMI Formula section below)
- Patient Sex
- Patient eGFR
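
The two refactoring steps above could be sketched roughly as follows. This is only an illustration, not a reference solution: the function names and the output column names (age, height, weight, bmi, sex, egfr) are assumptions, and the layout of cmp.json is not described in this handout, so the JSON is returned unparsed for your existing Milestone 01 logic to handle.

```python
import json
from urllib.request import urlopen

import pandas as pd

DEMOGRAPHICS_URL = (
    "https://ils.unc.edu/courses/2024_fall/chip490_335/patient_demographics.csv"
)
CMP_URL = "https://ils.unc.edu/courses/2024_fall/chip490_335/cmp.json"

# Assumed column names for the final output; rename to match your own script.
OUTPUT_COLUMNS = ["age", "height", "weight", "bmi", "sex", "egfr"]


def load_demographics(url: str = DEMOGRAPHICS_URL) -> pd.DataFrame:
    """Read the demographics CSV from a URL into a DataFrame."""
    # pandas.read_csv accepts a URL as well as a local file path.
    return pd.read_csv(url)


def load_cmp(url: str = CMP_URL):
    """Download the CMP JSON and return it as plain Python objects."""
    with urlopen(url) as response:
        return json.load(response)


def to_output_frame(records: list[dict]) -> pd.DataFrame:
    """Turn a list of per-patient dicts into the final output DataFrame."""
    return pd.DataFrame(records, columns=OUTPUT_COLUMNS)
```

Keeping the URLs as default arguments means the same functions can be pointed at a local copy of the files while you are developing offline.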

Part Two: Descriptive Statistics

Create a Jupyter Notebook that provides the following analysis about the output population (with eGFR <= 65):

A table or tables listing descriptive statistics for age, height, and weight.
A bar chart showing the average eGFR for each BMI category (see the BMI Categories section below).
A scatter plot using patient age for the X axis and patient eGFR for the Y axis.
A pie chart using patient sex.
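
As a rough sketch of the data preparation behind these four items, the snippet below filters to the screening population and builds the summary table plus the aggregates each chart needs. The sample rows are made up for illustration, and the column names (including a precomputed bmi_category column) are assumptions rather than the real schema.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions.
patients = pd.DataFrame(
    {
        "age": [34, 58, 71, 46, 63],
        "height": [64, 70, 66, 68, 72],       # inches
        "weight": [120, 185, 160, 210, 150],  # pounds
        "sex": ["F", "M", "F", "M", "M"],
        "egfr": [88.0, 61.5, 47.0, 64.0, 58.5],
        "bmi_category": [
            "Normal Weight",
            "Overweight",
            "Overweight",
            "Obese",
            "Normal Weight",
        ],
    }
)

# Keep only the output population (eGFR <= 65).
screened = patients[patients["egfr"] <= 65]

# Count, mean, std, min, quartiles, and max for each selected column.
summary = screened[["age", "height", "weight"]].describe()

# Mean eGFR per BMI category: the values behind the bar chart.
mean_egfr = screened.groupby("bmi_category")["egfr"].mean()

# Number of patients per sex: the values behind the pie chart.
sex_counts = screened["sex"].value_counts()
```

In a notebook, and assuming matplotlib is installed, `mean_egfr.plot.bar()`, `screened.plot.scatter(x="age", y="egfr")`, and `sex_counts.plot.pie()` are one way to draw the three charts from these objects.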

Deliverable

Upload a .zip file named descriptive-stats-onyen.zip, where onyen is your onyen, to Canvas. The file should contain your script(s), notebooks, and requirements.txt file. Make sure you include a notebook named main.ipynb that I will run to see your results.

Alternatively, you can provide me with a link to a GitHub repository.

You should expect that I will do the following:

Create a virtual environment using the Python venv module
Run pip install -r requirements.txt to install the dependencies you've specified
Run your main.ipynb file

I'll grade your assignment on output correctness and on code readability, following the best practices we've discussed so far. Think about how you can best abstract your code using functions and/or classes, and how you can best organize your code using modules and packages.

Hints and Tips

As always, make use of Piazza to ask any questions and work with your fellow students. Feel free to reach out to me directly via Canvas or email to schedule office hours, or stay after class to talk through any questions you may have. I'm happy to be your sounding board.

BMI Categories
Category	Range
Underweight	< 18.5
Normal Weight	18.5 - 24.9
Overweight	25 - 29.9
Obese	≥ 30
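
One possible way to turn this table into code is sketched below. The table leaves small gaps between ranges (for example, between 24.9 and 25), so this sketch assumes each category runs from its lower bound up to, but not including, the next category's lower bound. The function name is hypothetical.

```python
from bisect import bisect_right

# Lower bounds of Normal Weight, Overweight, and Obese, in ascending order.
_LOWER_BOUNDS = [18.5, 25.0, 30.0]
_LABELS = ["Underweight", "Normal Weight", "Overweight", "Obese"]


def bmi_category(bmi: float) -> str:
    """Return the BMI category label for a BMI value.

    Assumes half-open ranges: a value equal to a lower bound
    belongs to the category that starts at that bound.
    """
    # bisect_right counts how many lower bounds are <= bmi,
    # which is exactly the index of the matching label.
    return _LABELS[bisect_right(_LOWER_BOUNDS, bmi)]
```

With this assumption, a BMI of 24.95 is still Normal Weight and a BMI of exactly 25 is Overweight.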

BMI Formula

BMI = (weight in lbs × 703) ÷ (height in inches)²
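
A minimal sketch of this formula as a function is shown below. The function and parameter names are hypothetical, and it assumes the dataset stores weight in pounds and height in inches, as the formula implies.

```python
def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """Compute BMI from weight in pounds and height in inches."""
    if height_in <= 0:
        raise ValueError("height_in must be positive")
    # 703 converts lb/in^2 to the metric kg/m^2 scale used for BMI.
    return weight_lb * 703 / height_in ** 2
```

For example, a patient weighing 160 lbs at 70 inches tall has a BMI of 160 × 703 ÷ 70², or about 22.96.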