Shapiro-Wilk Test, Anderson-Darling Test, D’Agostino’s K-squared Test

# Table of Contents

- Shapiro–Wilk Test
- Anderson-Darling Test
- D’Agostino’s K-squared Test

## 1. Shapiro–Wilk Test

The **Shapiro–Wilk test** is a test of normality in frequentist statistics. It was published in 1965 by Samuel Sanford Shapiro and Martin Wilk.

The Shapiro–Wilk test calculates a W statistic that tests whether a random sample, x1, x2, …, xn, comes from (specifically) a normal distribution. Small values of W are evidence of departure from normality, and percentage points for the W statistic are obtained via Monte Carlo simulations.

This test has performed very well in comparison studies with other goodness-of-fit tests.
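As an illustrative sketch (the sample sizes, seed, and exponential alternative are my own choices, not from the original text), running scipy's `shapiro` on clearly skewed data shows W dropping and the p-value shrinking, while roughly normal data typically passes:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)

# Normal samples: the test should usually fail to reject H0
normal_data = rng.normal(loc=0, scale=1, size=200)
stat_n, p_n = shapiro(normal_data)

# Exponential samples: strongly skewed, so W drops and p becomes tiny
skewed_data = rng.exponential(scale=1, size=200)
stat_s, p_s = shapiro(skewed_data)

print('normal: W=%.3f, p=%.3f' % (stat_n, p_n))
print('skewed: W=%.3f, p=%.3f' % (stat_s, p_s))
```
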

**Assumptions:**

- Observations in each sample are independent and identically distributed (iid).

**Hypothesis:**

H0: Data follows Normal Distribution.

H1: Data does not follow Normal Distribution.

```python
# Example of the Shapiro–Wilk test
from scipy.stats import shapiro

data = [1, 1.2, 0.2, 0.3, -1, -0.2, -0.6, -0.8, 0.8, 0.1]

stat, p = shapiro(data)
print('stat=%.3f, p=%.3f' % (stat, p))

if p > 0.05:
    print("Data follows Normal Distribution")
else:
    print("Data does not follow Normal Distribution")
```

OUTPUT:

```
Data follows Normal Distribution
```

## 2. Anderson-Darling Test

The **Anderson–Darling test** is a statistical test of whether a given sample of data is drawn from a given probability distribution. In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values is distribution-free.

It can be used to check whether a data sample is normal. The test is a modified version of a more sophisticated nonparametric goodness-of-fit statistical test called the Kolmogorov-Smirnov test.
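Beyond normality, scipy's `anderson` can test a sample against several other candidate distributions via its `dist` parameter (e.g. `'norm'`, `'expon'`, `'logistic'`, `'gumbel'`). A minimal sketch (the exponential sample and seed are illustrative choices, not from the original text):

```python
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=100)

# Test the same sample against two candidate distributions:
# the statistic should be much larger for the poorly fitting one
res_norm = anderson(sample, dist='norm')
res_expon = anderson(sample, dist='expon')

print('norm:  statistic=%.3f' % res_norm.statistic)
print('expon: statistic=%.3f' % res_expon.statistic)
```
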

**Assumptions:**

- Observations in each sample are independent and identically distributed (iid).

**Hypothesis:**

H0: Data follows Normal Distribution.

H1: Data does not follow Normal Distribution.

```python
# Example of the Anderson–Darling test
from scipy.stats import anderson

data = [1, 1.2, 0.2, 0.3, -1, -0.2, -0.6, -0.8, 0.8, 0.1]

result = anderson(data)
```

OUTPUT:

```
AndersonResult(statistic=0.19788206806788722, critical_values=array([0.501, 0.57 , 0.684, 0.798, 0.95 ]), significance_level=array([15. , 10. , 5. , 2.5, 1. ]))
```

The test statistic is **0.1979**. We can compare this value to each critical value that corresponds to each significance level to see if the test results are significant.

```python
print('stat=%.3f' % result.statistic)

for i in range(len(result.critical_values)):
    sl, cv = result.significance_level[i], result.critical_values[i]
    if result.statistic < cv:
        print('Data follows Normal at the %.1f%% level' % sl)
    else:
        print('Data does not follow Normal at the %.1f%% level' % sl)
```

OUTPUT:

```
Data follows Normal at the 15.0% level
Data follows Normal at the 10.0% level
Data follows Normal at the 5.0% level
Data follows Normal at the 2.5% level
Data follows Normal at the 1.0% level
```

## 3. D’Agostino’s K-squared Test

D’Agostino’s K² test, named for Ralph D’Agostino, calculates summary statistics from the data, namely skewness and kurtosis, to determine whether the data distribution departs from the normal distribution.

**Skew** is a quantification of how much a distribution is pushed left or right, a measure of asymmetry in the distribution. **Kurtosis** quantifies how much of the distribution is in the tails. The K² test is a simple and commonly used statistical test for normality.
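These two summary statistics can be computed directly with scipy's `skew` and `kurtosis` helpers (an illustrative sketch; these helper calls are not part of the original text, and the sample reuses the one from the `normaltest` example):

```python
import math
from scipy.stats import skew, kurtosis

data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]

# Skewness: 0 for a perfectly symmetric distribution
s = skew(data)
# Kurtosis (Fisher definition, scipy's default): 0 for a normal distribution
k = kurtosis(data)

print('skew=%.3f, kurtosis=%.3f' % (s, k))
```
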

**Assumptions:**

- Observations in each sample are independent and identically distributed (iid).

**Hypothesis:**

H0: Data follows Normal Distribution.

H1: Data does not follow Normal Distribution.

```python
# Example of D’Agostino’s K-squared test
from scipy.stats import normaltest

data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]

stat, p = normaltest(data)
print('stat=%.3f, p=%.3f' % (stat, p))

if p > 0.05:
    print('Data follows normal')
else:
    print('Data does not follow normal')
```

OUTPUT:

```
stat=3.392, p=0.183
Data follows normal
```