Understanding Scatter Plots
A scatter plot (or scatter diagram) is a type of data visualization that uses Cartesian coordinates to display values for two variables. Each point represents one observation, with position determined by its x and y values. Scatter plots reveal relationships, patterns, and outliers in data.
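To show how each observation's x and y values set its position, here is a toy sketch that places points on a character grid instead of a real chart. The function `ascii_scatter`, its `width`/`height` parameters, and the `hours`/`scores` data are all hypothetical. A plotting library does this scaling internally, with more care over axes and ticks.

```python
def ascii_scatter(xs, ys, width=40, height=10):
    """Draw each (x, y) observation as '*' on a text grid (toy illustration)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale x linearly to a column index and y to a row index;
        # if every value is equal, put the points in the first column/row.
        col = round((x - x_min) / (x_max - x_min) * (width - 1)) if x_max > x_min else 0
        row = round((y - y_min) / (y_max - y_min) * (height - 1)) if y_max > y_min else 0
        # Rows are printed top to bottom, so flip y so larger values appear higher.
        grid[height - 1 - row][col] = "*"
    lines = ["|" + "".join(cells) for cells in grid]  # "|" marks the y-axis
    lines.append("+" + "-" * width)                   # bottom line is the x-axis
    return "\n".join(lines)

# Usage with hypothetical data: hours studied (x) vs. exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(ascii_scatter(hours, scores, width=20, height=8))
```

The points rise from lower left to upper right, which is the visual signature of a positive relationship.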
Key Concepts
Correlation Coefficient (r)
Also called Pearson's r. Measures the strength and direction of the linear relationship between two variables. Range: -1 to +1.
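$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
where $n$ is the number of observations and $\bar{x}$, $\bar{y}$ are the means of x and y.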
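The formula translates almost line for line into code. This is a minimal sketch: the function name `pearson_r` and the sample data are hypothetical, and it rejects inputs where either variable is constant, because the denominator is then zero and r is undefined.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r, computed directly from the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of products of each point's deviations from the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Usage with hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, scores), 3))
```

For this data r is about 0.99, a strong positive linear relationship.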
Least-Squares Regression
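Finds the best-fit straight line through the data by minimizing the sum of squared residuals, where a residual is the vertical distance between an observed y value and the value the line predicts for that x.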
$$ y = mx + b $$
where m is the slope and b is the y-intercept of the fitted line.
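For a single predictor, the minimizing line has a standard closed form: $m = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b = \bar{y} - m\bar{x}$. The sketch below computes both directly. The function name `least_squares_fit` and the data are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Return (m, b) for the line y = m*x + b that minimizes the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Closed-form ordinary least-squares solution for one predictor
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    m = sxy / sxx
    b = y_mean - m * x_mean  # the fitted line passes through (x_mean, y_mean)
    return m, b

# Usage with hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
m, b = least_squares_fit(hours, scores)
print(f"y = {m:.2f}x + {b:.2f}")  # prints: y = 4.50x + 46.90
```

Read the slope as "each extra hour is associated with about 4.5 more points" for this made-up data.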
R-Squared
The coefficient of determination. Represents the proportion of the variance in y that is explained by the linear regression on x, from 0 (none) to 1 (all).
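$$ R^2 = r^2 $$
This identity holds for simple linear regression, that is, a least-squares line with one predictor and an intercept.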
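R² is usually defined as $1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of y from its mean. The sketch below computes it that way and also as r², so you can check that the two agree for a simple least-squares line. The function name `r_squared_two_ways` and the data are hypothetical.

```python
import math

def r_squared_two_ways(xs, ys):
    """Return (1 - SS_res/SS_tot, r**2) for a simple least-squares line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("undefined when either variable is constant")
    m = sxy / sxx
    b = y_mean - m * x_mean
    # Residual sum of squares: variation left unexplained by the fitted line
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: variation of y around its mean (this equals syy)
    ss_tot = syy
    r = sxy / math.sqrt(sxx * syy)
    return 1 - ss_res / ss_tot, r ** 2

# Usage with hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(r_squared_two_ways(hours, scores))
```

Both values come out at about 0.987 here (up to floating-point rounding), meaning the line accounts for roughly 98.7% of the variance in the scores.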
Interpreting Correlation
- r = 1: Perfect positive linear relationship.
- r = -1: Perfect negative linear relationship.
- r = 0: No linear relationship (a nonlinear relationship may still exist).
- 0.7 to 1.0: Strong positive correlation (a common rule of thumb; cutoffs vary by field).
- -1.0 to -0.7: Strong negative correlation.
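The bands above can be written as a small lookup. This is a sketch under two assumptions: the 0.7 cutoff is the rule of thumb from the list, and the "weaker" label for values between the listed bands is a hypothetical catch-all the list itself does not name. The function name `describe_correlation` and the `tol` parameter are also hypothetical. A tolerance is used because a computed r may miss exactly 1, -1, or 0 by a tiny rounding error.

```python
import math

def describe_correlation(r, tol=1e-9):
    """Label r using the rule-of-thumb bands for interpreting correlation."""
    if not -1.0 - tol <= r <= 1.0 + tol:
        raise ValueError("r must lie in [-1, 1]")
    # Compare with a tolerance, since floating-point r may be off by rounding error
    if math.isclose(r, 1.0, abs_tol=tol):
        return "perfect positive linear relationship"
    if math.isclose(r, -1.0, abs_tol=tol):
        return "perfect negative linear relationship"
    if math.isclose(r, 0.0, abs_tol=tol):
        return "no linear relationship"
    if r >= 0.7:
        return "strong positive correlation"
    if r <= -0.7:
        return "strong negative correlation"
    # Hypothetical catch-all for values between the listed bands
    return "weaker positive correlation" if r > 0 else "weaker negative correlation"

print(describe_correlation(0.85))
print(describe_correlation(-0.3))
```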