# Normality Test in Python

Shapiro-Wilk Test, Anderson-Darling Test, D’Agostino’s K-squared Test

# Table of Contents

1. Shapiro-Wilk Test
2. Anderson-Darling Test
3. D’Agostino’s K-squared Test

## 1. Shapiro-Wilk Test

The Shapiro–Wilk test is a test of normality in frequentist statistics. It was published in 1965 by Samuel Sanford Shapiro and Martin Wilk.

The Shapiro-Wilk test calculates a W statistic that tests whether a random sample x1, x2, …, xn comes from (specifically) a normal distribution. Small values of W are evidence of departure from normality, and percentage points for the W statistic are obtained via Monte Carlo simulations.

This test has done very well in comparison studies with other goodness of fit tests.

Assumptions:

• Observations in each sample are independent and identically distributed (iid).

Hypothesis:

H0: Data follows Normal Distribution.

H1: Data does not follow Normal Distribution.

```python
# Example of the Shapiro-Wilk test
from scipy.stats import shapiro

data = [1, 1.2, 0.2, 0.3, -1, -0.2, -0.6, -0.8, 0.8, 0.1]

stat, p = shapiro(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print("Data follows Normal Distribution")
else:
    print("Data does not follow Normal Distribution")
```

Output:

```
Data follows Normal Distribution
```

## 2. Anderson-Darling Test

The Anderson–Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution. In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values is distribution-free.

It can be used to check whether a data sample is normal. The test is a modified version of a more sophisticated nonparametric goodness-of-fit statistical test called the Kolmogorov-Smirnov test.
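Since the Kolmogorov-Smirnov test is mentioned above, here is a minimal sketch of using it directly for a normality check via `scipy.stats.kstest`. Note the assumption: the sample is compared against a standard normal (mean 0, standard deviation 1); estimating the mean and variance from the data first would change the critical values (the Lilliefors variant). The sample data is reused from the examples in this article.

```python
# Sketch: Kolmogorov-Smirnov test against a standard normal distribution.
# Assumes the data is roughly standardized; for estimated parameters the
# plain KS p-value is only approximate (Lilliefors correction would apply).
from scipy.stats import kstest

data = [1, 1.2, 0.2, 0.3, -1, -0.2, -0.6, -0.8, 0.8, 0.1]

stat, p = kstest(data, 'norm')
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print("Data follows Normal Distribution")
else:
    print("Data does not follow Normal Distribution")
```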

Assumptions:

• Observations in each sample are independent and identically distributed (iid).

Hypothesis:

H0: Data follows Normal Distribution.

H1: Data does not follow Normal Distribution.

```python
# Example of the Anderson-Darling test
from scipy.stats import anderson

data = [1, 1.2, 0.2, 0.3, -1, -0.2, -0.6, -0.8, 0.8, 0.1]

result = anderson(data)
print(result)
```

Output:

```
AndersonResult(statistic=0.19788206806788722, critical_values=array([0.501, 0.57 , 0.684, 0.798, 0.95 ]), significance_level=array([15. , 10. ,  5. ,  2.5,  1. ]))
```

The test statistic is 0.1979. We can compare this value to the critical value at each significance level to see whether the result is significant: if the statistic is below the critical value, we fail to reject normality at that level.

```python
print('stat=%.3f' % result.statistic)
for i in range(len(result.critical_values)):
    sl, cv = result.significance_level[i], result.critical_values[i]
    if result.statistic < cv:
        print('Data follows Normal at the %.1f%% level' % sl)
    else:
        print('Data does not follow Normal at the %.1f%% level' % sl)
```

Output:

```
stat=0.198
Data follows Normal at the 15.0% level
Data follows Normal at the 10.0% level
Data follows Normal at the 5.0% level
Data follows Normal at the 2.5% level
Data follows Normal at the 1.0% level
```

## 3. D’Agostino’s K-squared Test

D’Agostino’s K² test, named for Ralph D’Agostino, calculates summary statistics from the data, namely skewness and kurtosis, to determine whether the data distribution departs from the normal distribution.

• Skewness quantifies how much a distribution is pushed left or right; it is a measure of asymmetry in the distribution.
• Kurtosis quantifies how much of the distribution is in the tails.

The K² test is a simple and commonly used statistical test for normality.
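To see the two summary statistics the K² test is built on, we can compute them directly with `scipy.stats.skew` and `scipy.stats.kurtosis` (the sample below is reused from the K² example later in this article; note that scipy reports excess kurtosis, which is 0 for a normal distribution):

```python
# Sketch: the two summary statistics behind D'Agostino's K-squared test
from scipy.stats import skew, kurtosis

data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]

# Skewness: asymmetry of the distribution (0 for a symmetric distribution)
print('skew=%.3f' % skew(data))
# Excess kurtosis: tail weight relative to a normal distribution (0 for normal)
print('kurtosis=%.3f' % kurtosis(data))
```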

Assumptions:

• Observations in each sample are independent and identically distributed (iid).

Hypothesis:

H0: Data follows Normal Distribution.

H1: Data does not follow Normal Distribution.

```python
# Example of D'Agostino's K-squared test
from scipy.stats import normaltest

data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]

stat, p = normaltest(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Data follows Normal Distribution')
else:
    print('Data does not follow Normal Distribution')
```

Output:

```
stat=3.392, p=0.183
Data follows Normal Distribution
```