Regression Calculator
Perform least-squares regression on your data. Toggle between Linear Regression (y = a + bx) and Polynomial Regression (degree 2–4). Add up to 50 (x,y) points, inspect fitted coefficients, R², residuals, and visualize the fit. You can copy results, download CSV, or print the page.
| # | x | y | ŷ (pred) | residual |
|---|---|---|---|---|
Regression and Least Squares — Understanding Fit, Error and Interpretation
Regression analysis estimates relationships between variables. The simplest and most common is linear regression, which finds the best-fit straight line y = a + bx that minimizes the sum of squared residuals between observed values y_i and predicted values ŷ_i. Polynomial regression generalizes this idea to estimate a polynomial relationship. Least squares is robust, intuitive and computationally efficient, forming the core of many statistical and machine-learning pipelines.
Least squares in brief
Given n observations (x_i, y_i), the least-squares estimate chooses coefficients that minimize S = Σ (y_i − ŷ_i)². For linear regression (degree 1), closed-form solutions exist using simple summations: b = Cov(x,y)/Var(x) and a = ȳ − b x̄. For polynomial regression we solve the normal equations (XᵀX)β = Xᵀy for the coefficient vector β, where X is the Vandermonde (design) matrix.
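As a sketch (not the calculator's internal code), the degree-1 closed form above translates directly into JavaScript, the language the tool computes in; `linearFit` is an illustrative name:

```javascript
// Closed-form simple linear regression: y = a + b*x.
// b = Cov(x, y) / Var(x) and a = ȳ − b·x̄ (the 1/n factors cancel in b).
function linearFit(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((s, v) => s + v, 0) / n;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;
  let sxy = 0, sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const b = sxy / sxx;        // slope
  const a = meanY - b * meanX; // intercept
  return { a, b };
}
```

For instance, `linearFit([1, 2, 3], [2, 3, 5])` gives b = 1.5 and a = 1/3.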
Key statistics
- Slope / coefficients: parameters that define the fitted relationship.
- Residuals: e_i = y_i − ŷ_i, indicating errors of the fit per point.
- Sum of squared residuals (SSR): Σ e_i² — objective minimized by least squares.
- Correlation coefficient (r): measures linear association between x and y (−1 to 1). For linear regression, b = r (s_y/s_x).
- Coefficient of determination (R²): proportion of y-variance explained by the model: R² = 1 − SSR / SST where SST = Σ (y_i − ȳ)².
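Given any prediction function, the SSR and R² defined above can be computed in a few lines; this is an illustrative sketch, with `fitStats` a hypothetical helper name:

```javascript
// Compute SSR and R² for a fitted model, given a prediction function.
// R² = 1 − SSR/SST, where SST is the total variation of y about its mean.
function fitStats(xs, ys, predict) {
  const n = ys.length;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;
  let ssr = 0, sst = 0;
  for (let i = 0; i < n; i++) {
    ssr += (ys[i] - predict(xs[i])) ** 2; // squared residual
    sst += (ys[i] - meanY) ** 2;          // squared deviation from ȳ
  }
  return { ssr, r2: 1 - ssr / sst };
}
```

For the least-squares line through (1,2), (2,3), (3,5), i.e. ŷ = 1/3 + 1.5x, this yields SSR = 1/6 and R² = 27/28 ≈ 0.96.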
Computational details
Polynomial regression solves the normal equations, which can be ill-conditioned for high degrees or poorly scaled x. This tool supports up to degree 4 and up to 50 points; for larger or noisy datasets consider numerical libraries with regularization (ridge, orthogonal polynomials) or piecewise models (splines).
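A minimal sketch of the normal-equations approach for low degrees, assuming Gaussian elimination with partial pivoting is adequate (it is for degree ≤ 4 on reasonably scaled data); `polyFit` is an illustrative name, not the tool's API:

```javascript
// Polynomial least squares via the normal equations (XᵀX)β = Xᵀy.
// Entry (r, c) of XᵀX is the power sum Σ x^(r+c); entry r of Xᵀy is Σ y·x^r.
function polyFit(xs, ys, degree) {
  const m = degree + 1;
  // Augmented matrix [XᵀX | Xᵀy], built directly from power sums.
  const A = Array.from({ length: m }, () => new Array(m + 1).fill(0));
  for (let r = 0; r < m; r++) {
    for (let c = 0; c < m; c++) {
      A[r][c] = xs.reduce((s, x) => s + x ** (r + c), 0);
    }
    A[r][m] = xs.reduce((s, x, i) => s + ys[i] * x ** r, 0);
  }
  // Gaussian elimination with partial pivoting.
  for (let col = 0; col < m; col++) {
    let piv = col;
    for (let r = col + 1; r < m; r++)
      if (Math.abs(A[r][col]) > Math.abs(A[piv][col])) piv = r;
    [A[col], A[piv]] = [A[piv], A[col]];
    for (let r = col + 1; r < m; r++) {
      const f = A[r][col] / A[col][col];
      for (let c = col; c <= m; c++) A[r][c] -= f * A[col][c];
    }
  }
  // Back substitution.
  const beta = new Array(m).fill(0);
  for (let r = m - 1; r >= 0; r--) {
    let s = A[r][m];
    for (let c = r + 1; c < m; c++) s -= A[r][c] * beta[c];
    beta[r] = s / A[r][r];
  }
  return beta; // [a0, a1, ..., a_degree]
}
```

On noise-free data from y = 1 + 2x + 3x², a degree-2 fit recovers the coefficients [1, 2, 3] up to floating-point error.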
Interpreting results
High R² indicates the model explains much of the variance, but watch for overfitting with high-degree polynomials. Residual plots help detect non-random patterns: residuals should be roughly randomly scattered — patterns suggest model misspecification.
Examples
Example 1 (linear): Points (1,2),(2,3),(3,5) produce slope b = 1.5 and intercept a ≈ 0.33 (exactly 1/3), with R² ≈ 0.96 — the line closely tracks the observed trend. In general, R² tells you how tightly the points cluster around the fitted line.
Example 2 (polynomial): Quadratic fit to ballistic motion data yields coefficients that relate to initial position, velocity and acceleration.
Best practices
- Plot residuals to check fit quality.
- Center and scale x when fitting higher-degree polynomials to reduce numerical issues.
- Prefer lower-degree models unless there is strong theoretical justification.
- For predictive tasks validate models on held-out data.
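The centering-and-scaling advice above can be sketched as a small transform: fit on the standardized values z and map predictions back through the stored mean and standard deviation (`standardize` is a hypothetical helper):

```javascript
// Standardize x before a higher-degree polynomial fit:
// z_i = (x_i − x̄) / s_x keeps powers z², z³, z⁴ near unit magnitude,
// so XᵀX is far better conditioned than with raw powers of large x.
function standardize(xs) {
  const n = xs.length;
  const mean = xs.reduce((s, v) => s + v, 0) / n;
  const sd = Math.sqrt(xs.reduce((s, v) => s + (v - mean) ** 2, 0) / n);
  // Fit on zs; keep mean and sd to transform new x values the same way.
  return { zs: xs.map(x => (x - mean) / sd), mean, sd };
}
```

For example, x values near 100 become z values near 0 with unit spread, so z⁴ stays O(1) instead of O(10⁸).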
This calculator gives you the core tools to run exploratory regression quickly. Use the plot, residuals and summary statistics together to judge model appropriateness. For production analyses, complement this with cross-validation and diagnostic checks.
Frequently Asked Questions
How many points can I add?
Up to 50 points. Add/remove points with the controls above.
What does the residual visualization show?
It draws vertical bars between observed y values and predicted ŷ values to visualize pointwise errors.
Can I fit curves, not just straight lines?
Yes — choose Polynomial mode and select degree 2–4. Results include the expanded coefficients.
What input formats are accepted?
Inputs accept decimals and simple fractions like 3/4; results are computed with JavaScript numbers and rounded for display.
Which statistics are reported?
Slope/intercept (or polynomial coefficients), r, R², SSR, and standard error (for linear fits).
How do I export the results?
Use 'Download CSV' to save x, y, ŷ and residuals. 'Copy Result' copies a summary to the clipboard.
What happens if all my x values are identical?
Linear regression requires variance in x, and repeated x values make the design matrix degenerate for polynomial fits. The tool will alert you in either case.
Are the coefficients exact?
Coefficients are numeric decimals; if you require rational exactness, use symbolic tools with rational arithmetic.
Does a high R² mean my model is correct?
Not necessarily. A high R² doesn't guarantee appropriateness: check residual patterns and consider overfitting.
Can other regression methods be supported?
Yes — methods like ridge, lasso or robust M-estimators can be added on request.