python

intro to data science programming

My aim with this course is to give students experience with Python in the context of Data Science.

[Link to course webpage in progress as IU migrates its online assets to an updated platform.]

tools

| Tool | Note |
| --- | --- |
| Jupyter | Weekly labs are hosted in Jupyter notebooks. Students are encouraged to use them to test code for their projects, or simply to try out and learn from the code in the weekly notebook. |
| Miniconda | Since the Anaconda distribution gets bloated easily, students build pip environments using Miniconda for this class. |
| GitHub | Gives students real-world experience with version control. This is useful for project group work and weekly exercises, and it helps me and the TAs track individual students’ progress. |
| Streamlit | Introduces students to end-to-end development for data science models. |
| Gradescope | Lets me autograde weekly exercises more easily. Students turn in their GitHub repositories; each week, they see a new way to incorporate what they learn into the “data science development pipeline”. |
| Docker | Right now, this just provides the framework needed for the exercise autograders (e.g., Gradescope). In the future, though, I intend to incorporate it into the course curriculum.[1] |
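
To make the autograded weekly exercises a little more concrete, here is a minimal sketch of the kind of check an autograder could run against a student's repository. It uses only Python's standard `unittest` module, not Gradescope's own tooling, and the exercise itself (`column_mean`) and the helper `run_checks` are hypothetical names invented for illustration, not taken from the course.

```python
import unittest


def column_mean(rows, column):
    """Hypothetical student exercise: mean of one numeric column in a list of dicts."""
    values = [row[column] for row in rows]
    if not values:
        raise ValueError("no rows to average")
    return sum(values) / len(values)


class TestColumnMean(unittest.TestCase):
    """Checks of the sort an autograder could run against a submission (assumed, not the course's tests)."""

    def test_mean_of_known_values(self):
        rows = [{"score": 2.0}, {"score": 4.0}, {"score": 9.0}]
        self.assertAlmostEqual(column_mean(rows, "score"), 5.0)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            column_mean([], "score")


def run_checks():
    """Run the test case and return (tests_run, failures_plus_errors)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestColumnMean)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.testsRun, len(result.failures) + len(result.errors)
```

Calling `run_checks()` returns a pair of counts that a grading script could turn into a score. Keeping the tests in the same repository as the exercise would also let students run them locally before they submit.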

  1. Docker is a very widely used tool in the tech industry, and thus an incredibly valuable skill for a data scientist to have. From what I can tell, though, it’s often undervalued in higher ed.