Overview of ds385

This website aims to help students get going in the wide wide world of data science. The functions within are meant to showcase simpler (and thus less robust) implementations of more complex topics, which come up frequently throughout data science. Below you’ll find some references that should be useful at various times throughout the semester.


Before we get to any Python, you should first familiarize yourself with at least the following topics from MIT’s course The Missing Semester of Your CS Education:

  • Course overview + the shell

  • Shell Tools and Scripting

  • Editors

  • Data Wrangling

  • Command-line Environment

  • Version Control (Git)

We won’t cover all the subtopics, but it wouldn’t hurt to work through them all.

If you don’t have a working Python environment and IDE, then I suggest you use JupyterLab Desktop for DS385. This isn’t my recommendation for long-term use of Python, but it’s a good start.

Stanford’s CS231n course has a good and relatively quick introduction to the Python programming language.

For some help getting started with building a Python package, see Alex Kyllo’s post Easy Python Package Publishing with Poetry. The documentation for the Python package Poetry is also worth consulting, even if it takes a while to figure out.

Documentation on reStructuredText, the markup language used by Sphinx.

Tips for contour plots in Python’s Matplotlib from Jake Vanderplas.

Louis Tiao has a nice post about visualizing and animating optimization algorithms with Matplotlib.

The book Hands-On Machine Learning with R, by Bradley Boehmke & Brandon Greenwell, is a great resource.

Drawing again on Stanford’s CS231n course, I like their discussion of hyperparameter tuning.

Sebastian Ruder has a fairly popular blog post, An overview of gradient descent optimization algorithms, which surveys some of the more popular gradient descent optimization algorithms. John Chen wrote a follow-up post, An updated overview of recent gradient descent algorithms, which includes some more modern methods.
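
To give a flavor of what those posts cover, here is a minimal sketch comparing plain gradient descent with one common way of writing the classical momentum update. Everything in it is an assumption made for illustration: the toy objective f(x, y) = 0.5 * (x**2 + 10 * y**2), the function names, the starting point, and the values of lr, beta, and steps. None of it is taken from the posts or from the ds385 package.

```python
def grad(x, y):
    """Gradient of the toy objective f(x, y) = 0.5 * (x**2 + 10 * y**2)."""
    return x, 10.0 * y


def gradient_descent(start, lr=0.05, steps=200):
    """Plain gradient descent: step against the gradient at a fixed rate."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


def momentum(start, lr=0.05, beta=0.9, steps=200):
    """Gradient descent with classical momentum (illustrative form).

    The velocity is a decaying sum of past gradient steps, so the
    iterate keeps moving in directions the gradients agree on.
    """
    x, y = start
    vx, vy = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        vx, vy = beta * vx - lr * gx, beta * vy - lr * gy
        x, y = x + vx, y + vy
    return x, y


if __name__ == "__main__":
    start = (2.0, 1.0)  # arbitrary starting point, chosen for illustration
    print("gradient descent:", gradient_descent(start))
    print("momentum:        ", momentum(start))
```

Both routines should end up close to the minimizer at (0, 0) on this problem. The objective is deliberately stretched along y so that you can experiment with lr and beta and see how each method copes with an ill-conditioned bowl.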