01 — AI Engineer Track
RAG Document Q&A Agent
End-to-end retrieval-augmented generation pipeline supporting PDF and plain-text ingestion. Combines TF-IDF indexing and cosine-similarity retrieval with the Claude API to produce context-grounded, source-attributed answers.
LLM-powered
Chunk overlap strategy
Source attribution
Python
JavaScript
Claude API
TF-IDF
pdf.js
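A minimal sketch of the chunk-overlap and TF-IDF retrieval steps described for the RAG Document Q&A Agent, using only the Python standard library. The function names (`chunk_text`, `build_index`, `retrieve`), the word-based chunk sizes, and the smoothed IDF formula are illustrative assumptions, not the project's actual implementation.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=50, overlap=10):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms missing from the index are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_index(chunks):
    """Return (idf, vectors): IDF weights and one sparse TF-IDF vector per chunk."""
    token_lists = [tokenize(c) for c in chunks]
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    n = len(chunks)
    # Smoothed IDF (an assumed variant) so that no indexed term gets zero weight
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [to_vector(tokens, idf) for tokens in token_lists]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, idf, vectors, k=3):
    """Return the top-k (score, chunk_index, chunk) tuples for the query."""
    q = to_vector(tokenize(query), idf)
    scored = [(cosine(q, v), i, chunks[i]) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]
```

The chunk index returned with each hit is the kind of handle that would let a generated answer cite its source passage; the retrieved chunks would then be passed to the LLM as context.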
02 — AI Engineer / DS Track
EEG Seizure Detection
Compared four deep learning architectures (CNN, CNN+LSTM, CNN+GRU, TCN) on 23-channel pediatric EEG data. Addressed severe class imbalance (~0.23% positive rate) with weighted loss and threshold optimization. Used saliency maps for explainability and per-patient error analysis.
F1: 0.8537
AUC: 0.9924
CHB-MIT dataset
PyTorch
Python
MNE
CNN+LSTM
TCN
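One way to read "weighted loss and threshold optimization" at a ~0.23% positive rate, as a standard-library sketch. The inverse-frequency weight and the choice of F1 as the threshold criterion are assumptions; the description does not say which weighting scheme or objective the project used, and the function names are hypothetical.

```python
def positive_class_weight(positive_rate):
    """Inverse-frequency weight for the positive class: negatives per positive."""
    if not 0 < positive_rate < 1:
        raise ValueError("positive_rate must be in (0, 1)")
    return (1 - positive_rate) / positive_rate


def f1_at_threshold(scores, labels, threshold):
    """F1 when every score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, labels):
    """Try each distinct score as a cut-off and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))
```

At a 0.23% positive rate, `positive_class_weight(0.0023)` comes to roughly 434. In PyTorch, a weight like this could be supplied through the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`, and the threshold sweep would normally run on held-out validation scores instead of the default 0.5 cut-off.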
03 — DS Track
Fraud Detection System
Random forest model trained on a 108K-sample balanced dataset. Produced only 84 false positives against 2,016 correctly flagged fraud cases. SHAP explanations provide both global feature importance and individual prediction transparency.
F1: 97.1%
Recall: 98.2%
ROC-AUC: 0.999
Python
scikit-learn
SHAP
pandas
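The reported figures can be cross-checked with a few lines of arithmetic. Precision follows from the 84 false positives and 2,016 correctly flagged cases stated in the description; F1 is then the harmonic mean of that precision and the reported 98.2% recall. The variable names are illustrative.

```python
true_positives = 2016   # correctly flagged fraud cases (from the description)
false_positives = 84    # legitimate cases flagged as fraud (from the description)
recall = 0.982          # reported recall

# Precision: share of flagged cases that were actually fraud
precision = true_positives / (true_positives + false_positives)

# F1: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.3f}")  # 0.960
print(f"F1        = {f1:.3f}")         # 0.971
```

The derived F1 of 0.971 agrees with the reported 97.1%, so the three numbers are mutually consistent.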
04 — DS / AI Track
Amazon Review Semantic Search
Replaced keyword search with SBERT-based semantic retrieval over 210K+ product reviews. Found that cleaner, curated datasets outperformed larger unfiltered corpora. Deployed as a Flask API for real-time semantic queries.
+47.7% precision
−83% irrelevant results
210K reviews
Python
SBERT
NLP
Flask
05 — DS Track
Tennis Match Outcome Predictor
Logistic regression and random forest models trained on ATP historical data (2003–2024). Domain-specific feature engineering covers ranking deltas, surface type, and tournament level. Deployed as an interactive Shiny app for real-time match predictions.
63.7% accuracy
AUC: 0.696
20+ yr dataset
Python
R
scikit-learn
Shiny
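A small sketch of how the three feature families named for the Tennis Match Outcome Predictor (ranking deltas, surface type, tournament level) might be encoded for a logistic regression or random forest. The record keys, the surface list, and the ordinal tier encoding are hypothetical; the project's actual schema is not given.

```python
SURFACES = ("hard", "clay", "grass")

# Hypothetical ordinal encoding of tournament tiers; the project's encoding is not stated
LEVEL_RANK = {"ATP 250": 1, "ATP 500": 2, "Masters 1000": 3, "Grand Slam": 4}


def match_features(match):
    """Turn one match record (a dict with assumed keys) into a numeric feature list."""
    # Positive delta means player A holds the better (numerically lower) ranking
    ranking_delta = match["rank_b"] - match["rank_a"]
    # One-hot encode the surface so the model does not assume an ordering
    surface_one_hot = [1 if match["surface"] == s else 0 for s in SURFACES]
    level = LEVEL_RANK[match["level"]]
    return [ranking_delta, *surface_one_hot, level]


example = {"rank_a": 5, "rank_b": 40, "surface": "clay", "level": "Masters 1000"}
print(match_features(example))  # [35, 0, 1, 0, 3]
```

Rows built this way could be stacked into a matrix and passed to scikit-learn estimators such as `LogisticRegression` or `RandomForestClassifier`.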