# Outlier Detection using Boxplot in Python

### Box plots and Outlier Detection

• A box plot draws a box from the lower quartile (LQ) to the upper quartile (UQ), with the median marked inside.
• It is a graphical five-number summary of the data: minimum, LQ, median, UQ, and maximum.
• It gives us an idea of the data distribution.
• It helps us identify outliers easily.
• 25% of the population is below the first quartile.
• 75% of the population is below the third quartile.
• If the box is pushed to one side and some values lie far away from the box, it is a clear indication of outliers.
• In this example the minimum is 5, the maximum is 120, and 75% of the values are less than 15.
• Still, some records reach 120, which is a clear indication of outliers.
• Sometimes the outliers are so extreme that the box appears as a horizontal line in the box plot.
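The points above can be checked numerically. The sketch below uses a small made-up sample (not the bank data) with a minimum of 5, a maximum of 120, and 75% of the values below 15. It assumes the common 1.5 × IQR convention for the whisker limits, which is what matplotlib's `boxplot` uses by default (`whis=1.5`); the variable names are only illustrative.

```python
import numpy as np

# Hypothetical sample, not the bank data: most values are small, two are very large
values = np.array([5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 90, 120])

# Five-number summary: minimum, LQ, median, UQ, maximum
minimum, lq, median, uq, maximum = np.percentile(values, [0, 25, 50, 75, 100])

# Whisker limits under the 1.5 * IQR convention (an assumption of this sketch)
iqr = uq - lq
lower_fence = lq - 1.5 * iqr
upper_fence = uq + 1.5 * iqr

# Values outside the fences are the points a box plot draws individually
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("five-number summary:", minimum, lq, median, uq, maximum)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

Here the box spans only 8 to 14 while the maximum is 120, so the two large values fall well outside the upper fence.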

### Box plots and outlier detection in Python

In [30]:

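```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

# Assumption: `bank` is the practice dataset, read with pandas' default settings
bank = pd.read_csv("./Bank Marketing/bank_market.csv")

# Box plot of the balance column
plt.boxplot(bank.balance)
```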

Out[30]:

```
{'boxes': [<matplotlib.lines.Line2D at 0xcbcd400>],
 'caps': [<matplotlib.lines.Line2D at 0xcbdde10>,
  <matplotlib.lines.Line2D at 0xcbddf28>],
 'fliers': [<matplotlib.lines.Line2D at 0xccc4f98>],
 'means': [],
 'medians': [<matplotlib.lines.Line2D at 0xccc4780>],
 'whiskers': [<matplotlib.lines.Line2D at 0xcbcdda0>,
  <matplotlib.lines.Line2D at 0xcbcdeb8>]}
```
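The dictionary returned by `boxplot` is more than display noise: each entry holds the `Line2D` objects that make up the plot, so the outlier points and whisker ends can be read back from it. The following is a minimal sketch on a made-up sample (standing in for a column such as `bank.balance`); the names `parts`, `flier_values`, and `whisker_tips` are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample standing in for a column such as bank.balance
values = np.array([5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 90, 120])

fig, ax = plt.subplots()
parts = ax.boxplot(values)  # same kind of dictionary as in the output above

# 'fliers' holds one Line2D per box; its y-data are the plotted outlier points
flier_values = parts["fliers"][0].get_ydata()

# 'whiskers' holds two Line2D objects per box; each line ends at a whisker tip
whisker_tips = [line.get_ydata()[1] for line in parts["whiskers"]]

print("outlier points:", flier_values)
print("whisker tips:", whisker_tips)
```

In a notebook, ending the plotting line with a semicolon suppresses the printed dictionary if only the figure is wanted.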

### Practice: Box plots and outlier detection

• Dataset: “./Bank Marketing/bank_market.csv”
• Draw a box plot for the balance variable.
• Do you suspect any outliers in balance?
• Get relevant percentiles and see their distribution.
• Draw a box plot for the age variable.
• Do you suspect any outliers in age?
• Get relevant percentiles and see their distribution.

In [31]:

```
# Draw a box plot for the balance variable
plt.boxplot(bank.balance)
```

Out[31]:

```
{'boxes': [<matplotlib.lines.Line2D at 0xcc78208>],
 'caps': [<matplotlib.lines.Line2D at 0xcc7fc18>,
  <matplotlib.lines.Line2D at 0xcc7fd30>],
 'fliers': [<matplotlib.lines.Line2D at 0xcc84da0>],
 'means': [],
 'medians': [<matplotlib.lines.Line2D at 0xcc84588>],
 'whiskers': [<matplotlib.lines.Line2D at 0xcc78ba8>,
  <matplotlib.lines.Line2D at 0xcc78cc0>]}
```

Outliers are present in the balance variable.

In [32]:

```
# Get relevant percentiles and see their distribution
bank['balance'].quantile([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])
```

Out[32]:

```
0.0     -8019.0
0.1         0.0
0.2        22.0
0.3       131.0
0.4       272.0
0.5       448.0
0.6       701.0
0.7      1126.0
0.8      1859.0
0.9      3574.0
1.0    102127.0
Name: balance, dtype: float64
```
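The deciles show that 90% of the balances are at most 3574 while the maximum is 102127, so the extreme values sit somewhere in the top 10%. One way to narrow this down is to ask for finer percentiles in the tail. The sketch below does this on a made-up series (not the bank data); the 99th-percentile cut-off is only an illustrative choice, not a rule from this post.

```python
import pandas as pd

# Hypothetical balances, not the bank data: 99 ordinary values and one extreme one
balance = pd.Series(list(range(1, 100)) + [10000], name="balance")

# Finer percentiles in the upper tail show where the values start to jump
upper_tail = balance.quantile([0.90, 0.95, 0.99, 0.995, 1.0])
print(upper_tail)

# Share of records above an illustrative cut-off (here the 99th percentile)
cutoff = balance.quantile(0.99)
share_above = (balance > cutoff).mean()
print(f"cut-off: {cutoff:.2f}, share above: {share_above:.2%}")
```

The same call on `bank['balance']` (and with low percentiles such as 0.01 and 0.05 for the negative balances) would show how many records are affected.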

In [33]:

```
# Draw a box plot for the age variable
plt.boxplot(bank.age)
```

Out[33]:

```
{'boxes': [<matplotlib.lines.Line2D at 0xcf54470>],
 'caps': [<matplotlib.lines.Line2D at 0xcf5be80>,
  <matplotlib.lines.Line2D at 0xcf5bf98>],
 'fliers': [<matplotlib.lines.Line2D at 0xcf65748>],
 'means': [],
 'medians': [<matplotlib.lines.Line2D at 0xcf617f0>],
 'whiskers': [<matplotlib.lines.Line2D at 0xcf54e10>,
  <matplotlib.lines.Line2D at 0xcf54f28>]}
```

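No extreme outliers are present in the age variable, although the highest ages may still be drawn as individual points beyond the upper whisker.

In [34]: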

```
# Get relevant percentiles and see their distribution
bank['age'].quantile([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])
```

Out[34]:

```
0.0    18.0
0.1    29.0
0.2    32.0
0.3    34.0
0.4    36.0
0.5    39.0
0.6    42.0
0.7    46.0
0.8    51.0
0.9    56.0
1.0    95.0
Name: age, dtype: float64
```

The next post is about creating graphs in Python.