Descriptive statistics summarize and describe features of a dataset using measures of central tendency, dispersion, and data distribution.
Measures of central tendency describe the center of the data:
Mean: The sum of all values divided by the number of values $n$.
Formula:
$$\bar{x} = \frac{\sum x_i}{n}$$
The mean is sensitive to outliers.
Median: The middle value of the data once it is sorted.
If $n$ is odd, the median is the middle value.
If $n$ is even, the median is the average of the two middle values.
The median is less sensitive to outliers than the mean.
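To illustrate the contrast, here is a minimal Python sketch of the mean and median as defined above. The helper names `mean` and `median` and the sample values are illustrative assumptions, not part of the original text.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [2, 3, 5, 7, 8]        # hypothetical sample
with_outlier = data + [100]   # same sample plus one extreme value

print(mean(data), median(data))                  # 5.0 5
print(mean(with_outlier), median(with_outlier))  # mean jumps to about 20.83, median is 6.0
```

A single extreme value pulls the mean from 5 to roughly 20.8, while the median only moves from 5 to 6.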
Dispersion measures the spread of data points around the center.
Variance ($\sigma^2$): Measures the average squared deviation from the mean.
Population Variance:
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$
Sample Variance:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$
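The two variance formulas differ only in the divisor. The sketch below implements both directly and cross-checks them against `pvariance` and `variance` from Python's standard `statistics` module; the function names and the dataset are assumptions chosen for illustration.

```python
import statistics


def population_variance(values):
    """Average squared deviation from the mean, dividing by n."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


def sample_variance(values):
    """Squared deviations from the mean, dividing by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; the mean is 5

print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32 / 7, about 4.571

# The standard library should agree with the hand-written versions.
assert abs(statistics.pvariance(data) - population_variance(data)) < 1e-12
assert abs(statistics.variance(data) - sample_variance(data)) < 1e-12
```

Dividing by $n-1$ always gives a slightly larger value than dividing by $n$, and the gap shrinks as the dataset grows.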
Standard Deviation ($\sigma$): The square root of the variance, representing data spread in the same unit as the data.
Formula:
$$\sigma = \sqrt{\sigma^2}$$
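As a small sketch of this relationship, the snippet below takes the square root of the variance and compares it with `pstdev` and `stdev` from Python's `statistics` module. The dataset is hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# Standard deviation is the square root of the corresponding variance.
population_sd = math.sqrt(statistics.pvariance(data))
sample_sd = math.sqrt(statistics.variance(data))

print(population_sd)  # 2.0
print(sample_sd)      # about 2.138

# The dedicated standard-library functions should give the same results.
assert abs(population_sd - statistics.pstdev(data)) < 1e-12
assert abs(sample_sd - statistics.stdev(data)) < 1e-12
```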
Range: The difference between the maximum and minimum values in the dataset.
Formula:
$$\text{Range} = \max(x) - \min(x)$$
Interquartile Range (IQR): The range of the middle 50% of data, calculated as:
$$\text{IQR} = Q_3 - Q_1$$
where $Q_1$ and $Q_3$ are the first and third quartiles.
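A rough sketch of the range and the IQR (third quartile minus first quartile) using Python's `statistics.quantiles`. Quartile conventions differ between textbooks and libraries; `method="inclusive"` is one choice assumed here, and the data values are hypothetical.

```python
import statistics

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical data

# Range: distance between the largest and smallest value.
data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(data_range)   # 14
print(q1, q3, iqr)  # 4.0 10.0 6.0
```

Other quartile methods (for example the module's default `"exclusive"`) can give slightly different cut points on small datasets, so the IQR is best reported together with the method used.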
Distributions describe how data is spread.
Normal Distribution: A bell-shaped, symmetric distribution defined by:
Mean ($\mu$)
Standard deviation ($\sigma$)
Empirical Rule (68-95-99.7 Rule):
68% of values fall within $\mu \pm \sigma$
95% of values fall within $\mu \pm 2\sigma$
99.7% of values fall within $\mu \pm 3\sigma$
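The percentages in the empirical rule are rounded. As a quick check, the sketch below uses `NormalDist` from Python's `statistics` module to compute the probability of falling within $k$ standard deviations of the mean; the helper name `coverage_within` and its default parameters are assumptions for illustration.

```python
from statistics import NormalDist


def coverage_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```

The result does not depend on the particular mean and standard deviation, which is why the rule applies to any normal distribution.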