Multi-source NBA shot data engineering package with cleaning, schema validation, and rerunnable SQLite upserts that power GAM modeling and shot archetype analysis across five NBA seasons.
62.7%
xFG Holdout
1.1M+
Training Shots
~136K
Evaluation Shots
467
Players
5
Seasons
Key Features
- ESPN + NBA Stats ingestion with parallel collection and exponential backoff
- Schema validation and rerunnable SQLite upsert behavior
- xFG logistic regression + GAM tensor surface modeling
- Shot Difficulty Index feature engineering
- GMM player archetypes with BIC selection and PCA projection
R
Python
scikit-learn
PyGAM
SQLite
Streamlit
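A minimal sketch of what a rerunnable upsert with a validation gate could look like, using Python's built-in sqlite3 module. The table name, column names, and the `validate` helper are hypothetical and not taken from the package. `ON CONFLICT ... DO UPDATE` assumes SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical schema for illustration; the real package's tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS shots (
    game_id  TEXT    NOT NULL,
    shot_id  INTEGER NOT NULL,
    player   TEXT    NOT NULL,
    distance REAL    NOT NULL,
    made     INTEGER NOT NULL CHECK (made IN (0, 1)),
    PRIMARY KEY (game_id, shot_id)
)
"""

UPSERT = """
INSERT INTO shots (game_id, shot_id, player, distance, made)
VALUES (:game_id, :shot_id, :player, :distance, :made)
ON CONFLICT (game_id, shot_id) DO UPDATE SET
    player   = excluded.player,
    distance = excluded.distance,
    made     = excluded.made
"""

# Expected Python type per column (assumed, for the validation gate).
REQUIRED = {"game_id": str, "shot_id": int, "player": str, "distance": float, "made": int}


def validate(row):
    """Reject rows with missing columns or wrong types before they reach the database."""
    for col, typ in REQUIRED.items():
        if col not in row:
            raise ValueError(f"missing column: {col}")
        if not isinstance(row[col], typ):
            raise TypeError(f"{col} should be {typ.__name__}")
    return row


def upsert_shots(conn, rows):
    """Insert or update rows in one transaction and return the table's row count."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, [validate(r) for r in rows])
    return conn.execute("SELECT COUNT(*) FROM shots").fetchone()[0]
```

Because the primary key drives the conflict clause, running the same batch twice updates rows in place instead of duplicating them.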
Full-stack AI web application with chat, PDF Q&A, and real-time tools. Two-service architecture with JWT authentication, powered by Gemini 2.5 Flash through a LlamaIndex ReAct agent.
5
API Tools
7+
REST Endpoints
2-service
Architecture
Key Features
- Gemini 2.5 Flash + LlamaIndex ReAct orchestration
- RAG pipeline for PDF parsing, indexing, and querying
- Five real-time tools: stocks, crypto, weather, news, market
- JWT access and refresh token lifecycle
- User-scoped data isolation for notes, chat, and PDF indexes
Django REST
React 18
Vite
JWT
Gemini
LlamaIndex
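To illustrate the access and refresh token lifecycle in general terms, here is a standard-library-only sketch of HS256 tokens. It is not the application's code: the secret, the claim names, and the lifetimes are assumptions, and a real Django service would normally rely on an established JWT library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # hypothetical; load from configuration in practice


def _b64(data):
    """Base64url-encode bytes without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text):
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input):
    return _b64(hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest())


def issue(user_id, kind, ttl, now=None):
    """Create a signed token of the given kind ('access' or 'refresh')."""
    now = int(time.time()) if now is None else now
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"sub": user_id, "type": kind, "exp": now + ttl}).encode())
    return f"{header}.{payload}.{_sign(header + '.' + payload)}"


def verify(token, expected_kind, now=None):
    """Check signature, token kind, and expiry; return the claims if all pass."""
    now = int(time.time()) if now is None else now
    header, payload, sig = token.split(".")
    if not hmac.compare_digest(sig, _sign(header + "." + payload)):
        raise ValueError("bad signature")
    claims = json.loads(_unb64(payload))
    if claims["type"] != expected_kind:
        raise ValueError("wrong token type")
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims


def refresh(refresh_token, now=None):
    """Exchange a valid refresh token for a new short-lived access token."""
    claims = verify(refresh_token, "refresh", now)
    return issue(claims["sub"], "access", ttl=300, now=now)
```

Keeping the token kind inside the signed payload means an access token cannot be replayed at the refresh endpoint.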
R package for end-to-end ESPN NBA data collection, parsing, validation, and storage. Handles 1,000+ games per season with parallel collection and exponential backoff retry logic.
1,000+
Games/Season
13
Test Files
4
Core Tables
5
Retry Attempts
Key Features
- Phase-based pipeline: collection, inventory, parsing, quality
- Parallel collection via future::multisession
- Schema enforcement with explicit type casting
- SQLite upserts with schema versioning via nbadata_meta
- Fixture-based tests with optional live-gated integration tests
R
httr2
DBI
RSQLite
future
testthat
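The retry behavior described above can be sketched as follows. The package itself is written in R, so this Python version is only an illustration of the technique. The function name, the one-second base delay, and the set of retried exceptions are assumptions; the five-attempt cap mirrors the figure listed on this card.

```python
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure.

    `sleep` is injectable so the schedule can be inspected without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, 8s by default
```

Production collectors often add random jitter to each delay so that parallel workers do not retry in lockstep; that is left out here for clarity.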
Automated web scraping and AI curriculum analysis pipeline. Scraped 12,944 course records across Fall 2025 and Spring 2026 and analyzed them with a three-tier classification system.
12,944
Courses
150+
Prefixes
2
Terms
3
Analysis Tiers
Key Features
- PDF prefix extraction with pdfplumber
- Selenium browser automation with resumable crawl states
- Three analysis scripts: high-precision, broad recall, and ethics
- Fuzzy string matching for typo-tolerant detection
- CSV exports and R Markdown reporting for precision/recall tradeoffs
Python
Selenium
pdfplumber
pandas
thefuzz
R Markdown
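As a rough illustration of typo-tolerant detection, the sketch below scores sliding word windows of a course title against a keyword list. It uses the standard library's difflib as a stand-in for thefuzz, so scores will not match that library exactly. The keyword list and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical keyword list; the project's actual terms are not shown here.
KEYWORDS = ["artificial intelligence", "machine learning", "neural network"]


def similarity(a, b):
    """Similarity ratio in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_hits(title, keywords=KEYWORDS, threshold=0.85):
    """Return keywords that approximately appear somewhere in the title."""
    words = title.lower().split()
    hits = []
    for kw in keywords:
        n = len(kw.split())
        # Compare the keyword against every run of n consecutive title words.
        windows = [" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))]
        if max(similarity(w, kw) for w in windows) >= threshold:
            hits.append(kw)
    return hits
```

A lower threshold trades precision for recall, which is the same tradeoff the high-precision and broad-recall tiers expose.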
Leakage-aware predictive modeling for NBA home-team win probability using rolling window features across 2002-2025 seasons.
67%
Best Accuracy
0.684
Best AUC
59.2%
Baseline
2002-2025
Seasons
Key Features
- Rolling 3/5/10-game windows with lag-1 leakage prevention
- Advanced stats and matchup differential feature engineering
- Ridge, elastic net, logistic, and random forest model comparison
- Elo features with season-reset and all-time variants
- Chronological train/test split preserving temporal order
R
glmnet
pROC
randomForest
slider
hoopR
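The lag-1 idea can be shown in a few lines: a rolling feature for a given game is computed only from games that finished before it. The project uses R, so this Python sketch is illustrative. The function name is hypothetical, and averaging over a partial window early in the sequence is an assumption.

```python
def lagged_rolling_mean(values, window):
    """Mean of up to `window` previous values, never including the current one.

    Returns None where no prior game exists.
    """
    out = []
    for i in range(len(values)):
        prior = values[max(0, i - window):i]  # slice ends at i-1: game i never sees itself
        out.append(sum(prior) / len(prior) if prior else None)
    return out
```

For example, `lagged_rolling_mean([10, 20, 30, 40], 3)` yields `[None, 10.0, 15.0, 20.0]`: the fourth game's feature averages the first three results and ignores its own.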
Four progressive deep learning projects covering ANN baselines, hyperparameter optimization, CNN interpretation, and RNN sequence modeling.
4
Projects
PyTorch
Framework
W&B
Tracking
Key Features
- ANN project with custom training loops
- Grid/random hyperparameter optimization with W&B
- CNN filter visualization workflow
- RNN modeling with a custom tokenizer
- Reproducible shared infrastructure across project modules
PyTorch
TorchVision
scikit-learn
Weights & Biases
pytest
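Grid and random search differ only in how candidate configurations are generated, which the following standard-library sketch makes explicit. The search space, function names, and scoring callback are hypothetical, and experiment tracking with W&B is left out.

```python
import itertools
import random


def grid_candidates(space):
    """Yield every combination of the listed values (exhaustive grid)."""
    keys = list(space)
    for combo in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, combo))


def random_candidates(space, n, seed=0):
    """Yield n configurations sampled independently per hyperparameter."""
    rng = random.Random(seed)  # seeded for reproducibility
    for _ in range(n):
        yield {k: rng.choice(v) for k, v in space.items()}


def search(candidates, score):
    """Return the candidate with the highest score (e.g. validation accuracy)."""
    return max(candidates, key=score)
```

In a real run, `score` would train a model with the given configuration and return a validation metric; here it can be any function of the configuration dict.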
Freemium NBA analytics platform with dashboards, AI-style Q&A over NBA data, Stripe billing, and production-grade security hardening.
Monorepo
Architecture
Fastify API
Backend
Next.js 14
Frontend
PostgreSQL + Redis
Data Layer
Key Features
- Fastify + TypeScript backend with intent-driven Q&A
- Next.js 14 App Router frontend with feature gating
- Stripe checkout and webhook subscription flow
- Security stack: JWT, Zod safeParse, hashing, rate limiting
- Redis caching with Lua scripts and performance optimizations
TypeScript
Fastify
Next.js
Prisma
PostgreSQL
Redis
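One common shape for rate limiting is a fixed-window counter per client. The sketch below keeps the counters in an in-memory dict as a stand-in for Redis and is written in Python rather than the platform's TypeScript, so it only illustrates the logic. The class name, limit, and window size are hypothetical.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each `window_seconds` window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        self.counts = defaultdict(int)  # stand-in for Redis counters; never pruned here

    def allow(self, key):
        # Requests in the same window share one bucket; a new window starts at zero.
        bucket = (key, int(self.clock() // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True
```

With a shared store, the check and the increment have to run atomically, which is the usual reason to move this logic into a server-side script.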
Research capstone analyzing expected points/goals across NBA and NHL with shot-quality modeling, calibration diagnostics, and player value analysis.
467
Players Profiled
3
Data Sources
NBA + NHL
Sports
Key Features
- xFG logistic regression and POE computation pipeline
- GAM tensor surface modeling with partial dependence diagnostics
- GMM archetypes with BIC selection and PCA projection
- Shot Difficulty Index and residual analysis
- POE per million salary value rankings
Python
scikit-learn
PyGAM
Streamlit
plotly
matplotlib
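As a worked illustration of the value metric, the sketch below assumes POE means points over expected: for each shot, the shot's point value times the difference between the outcome (1 for a make, 0 for a miss) and the model's xFG probability. That definition, the function names, and the input layout are assumptions rather than the capstone's documented formulas.

```python
def points_over_expected(shots):
    """Sum of actual minus expected points.

    `shots` is an iterable of (xfg, shot_value, made) tuples, where xfg is the
    modeled make probability, shot_value is 2 or 3, and made is 0 or 1.
    """
    return sum(value * (made - xfg) for xfg, value, made in shots)


def poe_per_million(poe, salary_usd):
    """Scale POE by salary expressed in millions of dollars."""
    return poe / (salary_usd / 1_000_000)
```

A made two-pointer with xFG of 0.5 contributes +1.0, while a missed three-pointer with xFG of 0.4 contributes -1.2.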
Production-grade Python/Docker pipeline for high-volume NBA data ingestion into PostgreSQL with idempotent upsert behavior.
PostgreSQL 16
Database
Makefile
Automation
Dockerized
Architecture
Key Features
- concurrent.futures parallel HTTP fetching
- Chunked SQLAlchemy ON CONFLICT DO UPDATE upserts
- Complex ESPN JSON normalization to relational tables
- Failure-stage tracking through nba_ingest_failures
- Makefile automation for ingest/reset/up/down
Python
Docker
PostgreSQL
SQLAlchemy
pandas
Makefile
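A minimal sketch of parallel fetching with per-stage failure tracking, using concurrent.futures from the standard library. The `ingest` function, its `fetch` and `parse` callbacks, and the shape of the failure records are hypothetical. In the pipeline described above, such records would presumably be written to the nba_ingest_failures table rather than returned in a list.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def ingest(game_ids, fetch, parse, max_workers=8):
    """Fetch games in parallel, parse each result, and record where failures occur."""
    rows, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, gid): gid for gid in game_ids}
        for fut in as_completed(futures):
            gid = futures[fut]
            try:
                raw = fut.result()  # re-raises any exception from the worker thread
            except Exception as exc:
                failures.append({"game_id": gid, "stage": "fetch", "error": str(exc)})
                continue
            try:
                rows.append(parse(raw))
            except Exception as exc:
                failures.append({"game_id": gid, "stage": "parse", "error": str(exc)})
    return rows, failures
```

Recording the stage alongside the error lets a later run retry only the step that failed instead of refetching everything.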
Technical cybersecurity analysis of Stuxnet covering propagation, zero-days, PLC targeting, sensor spoofing, and critical infrastructure impact.
4
Zero-Days
Operation Olympic Games
Attribution
Rmd + PDF
Output
Key Features
- Analysis of four simultaneous zero-day exploits
- Code-signing abuse via stolen certificates
- Siemens S7-315 PLC hardware targeting and fingerprinting
- Man-in-the-middle PLC manipulation and sensor spoofing
- Impact comparison against WannaCry and NotPetya
R Markdown
LaTeX
ICS/SCADA security
Exploit analysis