Last spring, I completed a training course from the National Network of Libraries of Medicine called Research Data Management 102. The course explored topics from all stages of the RDM lifecycle:
- open science principles
- data literacy
- data wrangling with OpenRefine
- data visualization with Python
- data storytelling with Jupyter Notebooks
I enjoyed experimenting with basic statistics in Python and appreciated how easy it is to store code and text together in Jupyter. My final project was to replicate an analysis, originally reported by ProPublica, of FDA data on participation in clinical trials. The data show that African American patients are underrepresented in most cancer trials, even for cancers that disproportionately burden this population. The analysis is documented in my Jupyter notebook: