intro to data science programming
My aim with this course is to give students hands-on experience with Python in the context of data science.
[Link to course webpage in progress as IU migrates its online assets to an updated platform.]
tools
Tool | Note |
---|---|
Jupyter | Weekly labs are hosted in Jupyter notebooks. Students are encouraged to use them to test code for their projects, or simply to experiment with and learn from the code in each week's notebook. |
Miniconda | Since the full Anaconda distribution easily becomes bloated, students in this class use Miniconda to build lightweight environments and install packages into them with pip. |
GitHub | Gives students real-world experience with version control. It is useful for group project work and weekly exercises, and it helps me and the TAs track individual students’ progress. |
Streamlit | Introduces students to end-to-end development, taking a data science model all the way to an interactive web app. |
Gradescope | Makes autograding the weekly exercises a bit easier for me. Students submit their GitHub repositories, and each week they see a new way to incorporate what they have learned into the “data science development pipeline”. |
Docker | Right now, this just provides the framework the exercise autograders need (e.g., on Gradescope). In the future, though, I intend to incorporate it into the course curriculum.[^1] |

[^1]: Docker is widely used in the tech industry, which makes it an incredibly valuable skill for a data scientist. From what I can tell, though, it is often undervalued in higher education.