Data Science

Kaggle Competition: Sberbank Russian Housing Market (Top 10%)

  • Implemented a weighted-average blend of regularized linear regression, random forest, and gradient-boosted decision tree models, with data cleanup, pre-processing, and missing-value imputation as needed.
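A weighted-average blend of this kind can be sketched in a few lines. This is a minimal illustration, not the competition code: the function name `blend`, the prediction values, and the weights are all hypothetical, and in practice the weights would be tuned on held-out data.

```python
def blend(predictions, weights):
    """Return the weighted average of several models' predictions.

    predictions: one list of predicted values per model, all the same length.
    weights: one non-negative weight per model (need not sum to 1).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of rows")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the weight total so the result stays on the target's scale.
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n)
    ]


# Hypothetical predictions from three models for two houses.
linear_preds = [10.0, 20.0]
forest_preds = [12.0, 22.0]
boosted_preds = [14.0, 28.0]
blended = blend([linear_preds, forest_preds, boosted_preds], [1, 1, 2])
```

Giving the gradient-boosted model the largest weight here is only an assumption for the example.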

Kaggle Competition: Zillow SoCal Housing Market (Top 15%)

  • Implemented a weighted-average blend of regularized linear regression, random forest, and gradient-boosted decision tree models, with data cleanup, pre-processing, and missing-value imputation as needed.
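Missing-value imputation can take many forms. As one simple illustration, the sketch below fills gaps in a numeric column with that column's median. The helper name `impute_median`, the column name `area`, and the sample rows are hypothetical, and median filling is an assumed strategy rather than the one used in the competition.

```python
from statistics import median


def impute_median(rows, column):
    """Return copies of rows with None in `column` replaced by the column median."""
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    fill = median(observed)
    # Build new dicts so the caller's data is left untouched.
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical rows; None marks a missing measurement.
rows = [{"area": 40.0}, {"area": None}, {"area": 60.0}, {"area": 50.0}]
filled = impute_median(rows, "area")
```

To avoid leaking information, the median would normally be computed on the training rows only and then reused for the test rows.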

Kaggle Competition: Jigsaw Toxic Comment Classification (Top 7%)

  • Implemented a weighted-average blend of logistic regression (on word-level and character-level TF-IDF plus engineered features), recurrent neural nets (with various embedding layers, architectures, data augmentations, and hyperparameters), and character-level very deep convolutional neural nets.
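Character-level TF-IDF features like those mentioned above can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the function names, the n-gram length, and the smoothed IDF formula are choices made for the example, not details taken from the competition code.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def char_tfidf(docs, n=3):
    """Return one {ngram: tf-idf weight} dict per document."""
    counts = [Counter(char_ngrams(doc, n)) for doc in docs]
    # Document frequency: in how many documents each n-gram appears.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            gram: (k / total) * (math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1)
            for gram, k in c.items()
        })
    return vectors


vectors = char_tfidf(["abc", "abd"], n=2)
```

N-grams shared by every document (here "ab") get the lowest IDF, so rarer n-grams carry more weight. The resulting sparse vectors are the kind of input a logistic regression would consume.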

Deep Dreaming Burning Man (100+ Followers)

  • An exploration of the artistic applications of neural networks, using Google's DeepDream algorithm and a PyTorch style-transfer implementation.

Burning Man Events

  • Performed an exploratory data analysis of 10,000+ events scheduled at Burning Man from 2015 to 2017.
  • Developed a classifier for categorizing event types (e.g. games, ceremonies, food, child-friendly).

DrudeLorentz.com

  • Developed a web app for visualizing the optical permittivity of common metals.
  • Entirely JavaScript-based software stack: Node.js, Express, MongoDB, Angular, D3, and math.js.
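The site's name refers to the Drude-Lorentz model, a standard way to describe the permittivity of metals. A common form, in the e^{-iωt} sign convention, is ε(ω) = ε∞ − ωp² / (ω² + iγω) + Σ fⱼ ωp² / (ωⱼ² − ω² − iγⱼω). The sketch below evaluates that formula. It is written in Python rather than the site's JavaScript, the function name is hypothetical, and the parameter values are made up for illustration, not fitted to any real metal.

```python
def drude_lorentz(omega, eps_inf, omega_p, gamma, oscillators=()):
    """Complex relative permittivity at angular frequency omega.

    eps_inf: high-frequency permittivity.
    omega_p, gamma: plasma frequency and damping of the Drude term.
    oscillators: iterable of (strength f_j, resonance omega_j, damping gamma_j).
    All frequencies must share one unit (for example eV).
    """
    # Drude term: response of the free conduction electrons.
    eps = eps_inf - omega_p**2 / (omega**2 + 1j * gamma * omega)
    # Lorentz terms: response of bound electrons (interband transitions).
    for f_j, omega_j, gamma_j in oscillators:
        eps += f_j * omega_p**2 / (omega_j**2 - omega**2 - 1j * gamma_j * omega)
    return eps


# Made-up parameters (in eV), chosen only to exercise the function.
eps = drude_lorentz(2.0, eps_inf=1.0, omega_p=9.0, gamma=0.05,
                    oscillators=[(0.5, 4.0, 0.5)])
```

With these values the real part is negative, as expected for a metal below its plasma frequency, and the imaginary part is positive, which corresponds to absorption in this sign convention.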

Recommended Books

  • Learning from Data by Abu-Mostafa, Magdon-Ismail, and Lin
  • An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani
  • The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
  • Deep Learning by Goodfellow, Bengio, and Courville
  • Data Science for Business by Provost and Fawcett
  • Superforecasting by Tetlock and Gardner
  • Dataclysm by Rudder
  • Thinking, Fast and Slow by Kahneman
  • The Undoing Project by Lewis