python

intro to data science programming

My aim with this course is to give students experience with Python in the context of Data Science.

[Link to course webpage in progress as IU migrates its online assets to an updated platform.]

tools

Tool	Note
Jupyter	Weekly labs are hosted in Jupyter notebooks. Students are encouraged to use them to test code for their projects, or simply to experiment with and learn from the code in each week’s notebook.
Miniconda	Because the full Anaconda distribution gets bloated easily, students use Miniconda to build their environments for this class, installing packages with pip.
GitHub	Gives students real-world experience with version control. This is useful for project group work and weekly exercises, and it helps me and the TAs track individual students’ progress.
Streamlit	Introduces students to end-to-end development for data science models.
Gradescope	Lets me autograde weekly exercises more easily. Students turn in their GitHub repositories, so each week they see a new way to incorporate what they learn into the “data science development pipeline”.
Docker	Right now, Docker just provides the framework needed for the exercise autograders (e.g., Gradescope). In the future, I intend to incorporate it into the course curriculum¹.
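
To make the autograding idea concrete, here is a minimal sketch of what a check for a weekly exercise might look like. It is not the course's actual grader: the exercise function `column_mean`, the test class, and the `grade` helper are all hypothetical names made up for illustration. The output shape (a `tests` list with `name`, `score`, and `max_score`) follows my understanding of the results format Gradescope autograders report, and it only uses the Python standard library.

```python
import json
import unittest


def column_mean(values):
    """Hypothetical weekly exercise: return the mean of a list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


class TestColumnMean(unittest.TestCase):
    """Hypothetical checks an autograder might run against a submission."""

    def test_mean_of_integers(self):
        self.assertEqual(column_mean([1, 2, 3]), 2)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            column_mean([])


def grade(test_case, points_per_test=1.0):
    """Run each test method on its own and collect per-test scores."""
    loader = unittest.TestLoader()
    tests = []
    for name in loader.getTestCaseNames(test_case):
        result = unittest.TestResult()
        test_case(name).run(result)
        passed = result.wasSuccessful()
        tests.append({
            "name": name,
            "score": points_per_test if passed else 0.0,
            "max_score": points_per_test,
        })
    return {"tests": tests}


# Inside a Gradescope container, this JSON would typically be written to
# /autograder/results/results.json; here it is only printed.
results = grade(TestColumnMean)
print(json.dumps(results, indent=2))
```

Running each test method separately keeps the scoring simple: one failed check costs only its own points, and students can see exactly which check they missed.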

¹ Docker is very widely used in the tech industry, which makes it an incredibly valuable skill for a data scientist. From what I can tell, though, it’s often undervalued in higher education.