Machine Learning
What is Machine Learning?
Machine Learning (ML) is a subfield of AI focused on building systems that learn patterns from data and improve their performance on a task without being explicitly programmed with rules.
In practice, an ML model maps inputs to outputs. Training adjusts the model’s parameters to minimize error on training data, with the goal of generalizing to unseen examples.
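As a minimal sketch of "optimizing parameters to minimize error", the snippet below fits a line to a handful of points by gradient descent on mean squared error. The data, the learning rate, and the step count are illustrative assumptions, not values from any particular system.

```python
# Toy data generated from y = 2x + 1 (hypothetical values for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit_line(xs, ys)
# Apply the learned parameters to an input that was not in the training data.
prediction = w * 5.0 + b
```

On this noise-free data the learned parameters approach w = 2 and b = 1, so the prediction for x = 5 lands close to 11. Real datasets are noisy, and generalization has to be checked on held-out data rather than assumed.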
Where ML is used
- Recommendations (shopping, streaming)
- Fraud detection and risk scoring
- Speech and image recognition
- Forecasting and optimization
History of ML
The field progressed from early perceptrons (1950s–60s) to statistical learning (1990s). It then accelerated in the 2010s, when big data and GPUs enabled deep learning breakthroughs across vision, speech, and language.
Types of ML
- Supervised: Learn from labeled examples (classification, regression).
- Unsupervised: Discover structure in unlabeled data (clustering, dimensionality reduction).
- Semi-supervised: Combine a small labeled dataset with a large unlabeled one.
- Self-supervised: Derive training labels from the data itself (pretext tasks).
- Reinforcement Learning: Learn via rewards by interacting with an environment.
Supervised vs Unsupervised Learning
Supervised learning maps inputs to known targets and is evaluated on predictive accuracy. Unsupervised learning seeks patterns and structure without labels, which makes it useful for exploration, compression, and feature discovery.
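The contrast can be sketched on the same toy data. In the snippet below, a nearest-centroid classifier uses the labels (supervised), while a two-cluster k-means ignores them and groups the points on its own (unsupervised). The data, the function names, and the choice of these two simple methods are assumptions made for illustration.

```python
# Hypothetical 1-D measurements; labels are only used in the supervised case.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["low", "low", "low", "high", "high", "high"]


def mean(values):
    return sum(values) / len(values)


# Supervised: use the labels to compute one centroid per class,
# then predict the class whose centroid is nearest to the input.
def fit_centroids(points, labels):
    classes = sorted(set(labels))
    return {c: mean([p for p, l in zip(points, labels) if l == c]) for c in classes}


def predict(centroids, x):
    return min(centroids, key=lambda c: abs(centroids[c] - x))


# Unsupervised: k-means with k=2 groups the points without ever seeing labels.
def kmeans_1d(points, iterations=10):
    centers = [min(points), max(points)]  # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


centroids = fit_centroids(points, labels)
predicted = predict(centroids, 4.2)   # a class name taken from the labels
cluster_centers = kmeans_1d(points)   # two group centers, with no names attached
```

The supervised model returns a named class because it was given names to learn from. The clustering returns only group centers, and interpreting what each group means is left to the analyst.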
Core Algorithms
- Linear/Logistic Regression
- Decision Trees, Random Forests, Gradient Boosting (e.g., XGBoost, LightGBM)
- k-NN, Naive Bayes, SVM
- Clustering & Dimensionality Reduction: k-means, DBSCAN, PCA, t-SNE, UMAP
Feature Engineering
Transform raw data into informative features: handling missing values, encoding categoricals, scaling, domain-derived features, interaction terms, and embeddings.
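A minimal sketch of three of these steps (mean imputation, standard scaling, and one-hot encoding) in plain Python follows. The records and helper names are hypothetical; production code would more likely rely on a library such as pandas or scikit-learn.

```python
import math

# Hypothetical raw records: "age" has a missing value, "color" is categorical.
rows = [
    {"age": 20.0, "color": "red"},
    {"age": None, "color": "blue"},
    {"age": 40.0, "color": "red"},
]


def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Scale to zero mean and unit (population) standard deviation.

    Assumes the values are not all identical, otherwise sigma would be zero.
    """
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Encode each category as a 0/1 vector, with columns in sorted order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


ages = standardize(impute_mean([r["age"] for r in rows]))
colors = one_hot([r["color"] for r in rows])
# One feature vector per record: [scaled_age, is_blue, is_red]
features = [[a] + c for a, c in zip(ages, colors)]
```

In a real workflow, the imputation value and the scaling statistics should be computed on the training split only and then reused on validation and test data. Computing them on the full dataset leaks information from the evaluation data into training.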
Model Evaluation & Metrics
Proper Splits & Validation: Use train/validation/test splits or cross-validation, and choose metrics that match the objective.
- Classification Metrics: Accuracy, Precision/Recall, F1, ROC-AUC, PR-AUC
- Regression Metrics: MAE, RMSE, R²
- Ranking Metrics: NDCG, MAP
Watch For: Data leakage, distribution shift, overfitting
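To make a few of the metrics above concrete, here is a small sketch that computes precision, recall, F1, and accuracy for a binary classifier, plus MAE and RMSE for a regressor. The labels and predictions are made-up values chosen only to show how the numbers can diverge.

```python
import math


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical binary labels and predictions: 2 TP, 1 FP, 2 FN, 3 TN.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical regression targets and predictions.
reg_mae = mae([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
reg_rmse = rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```

Here accuracy is 0.625 while recall is only 0.5, which illustrates why a single metric can hide the kind of error that matters for the task. RMSE also comes out larger than MAE on the regression example because it penalizes the larger error more heavily.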
ML Pipelines & MLOps
Automate data preparation, training, evaluation, and deployment. Track experiments, version data and models, and monitor drift, latency, and fairness in production.
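One very simple way to picture drift monitoring is to compare a feature's live mean against its training mean, measured in training standard deviations. The sketch below does exactly that. The function names, the data, and the threshold of 2.0 are hypothetical, and real monitoring systems typically use richer statistical tests over full distributions.

```python
import statistics


def mean_shift_score(train_values, live_values):
    """Absolute shift of the live mean, in units of the training standard deviation."""
    mu = statistics.mean(train_values)
    sigma = statistics.pstdev(train_values)  # assumes the training values are not constant
    return abs(statistics.mean(live_values) - mu) / sigma


def has_drifted(train_values, live_values, threshold=2.0):
    # threshold is an arbitrary illustrative cutoff, not a recommended default
    return mean_shift_score(train_values, live_values) > threshold


# Hypothetical feature values seen at training time and in two production windows.
train = [10.0, 12.0, 11.0, 9.0, 13.0]
stable_window = [11.0, 10.0, 12.0]
shifted_window = [18.0, 19.0, 20.0]

stable_flag = has_drifted(train, stable_window)
shifted_flag = has_drifted(train, shifted_window)
```

A check like this would normally run on a schedule for each monitored feature and raise an alert or trigger retraining when the flag is set.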
Deep Learning in ML
Neural networks (CNNs, RNNs, Transformers) excel at high-dimensional data like images, audio, and text. Transfer learning and foundation models enable strong performance with limited task-specific data.
Ethics in ML
- Bias & Fairness: Dataset imbalance, proxy variables
- Transparency & Explainability: Model interpretability
- Privacy: Data minimization, federated learning, differential privacy
- Accountability & Safety
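As one illustration of how a fairness concern can be quantified, the sketch below computes the demographic parity difference: the gap in positive-prediction rates between groups. This is only one of several fairness measures, and the predictions and group labels here are invented for the example.

```python
def positive_rate(predictions):
    """Fraction of predictions that are positive (1)."""
    return sum(predictions) / len(predictions)


def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    by_group = {}
    for pred, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(pred)
    rates = {g: positive_rate(p) for g, p in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical binary predictions and the group each example belongs to.
predictions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_difference(predictions, groups)
```

A gap of zero means every group receives positive predictions at the same rate. A large gap is a signal to investigate, not proof of unfairness on its own, since the appropriate fairness criterion depends on the application.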
Future of ML
Expect stronger multimodal models, better small-data learning, energy-efficient training, and tighter integration with MLOps and governance for responsible deployment at scale.