Standard Deviation

Coding a stdev() Function in Python

To calculate the standard deviation of a dataset, we’re going to rely on our variance() function. We’re also going to use the sqrt() function from the math module of the Python standard library. Here’s a function called stdev() that takes the data from a population and returns its standard deviation:

>>> import math

>>> # We rely on our previous implementation of the variance
>>> def variance(data, ddof=0):
...     n = len(data)
...     mean = sum(data) / n
...     return sum((x - mean) ** 2 for x in data) / (n - ddof)
...

>>> def stdev(data):
...     var = variance(data)
...     std_dev = math.sqrt(var)
...     return std_dev

>>> stdev([4, 8, 6, 5, 3, 2, 8, 9, 2, 5])
2.4

Our stdev() function takes some data and returns the population standard deviation. To do that, we rely on our previous variance() function to calculate the variance and then we use math.sqrt() to take the square root of the variance.

If we want to use stdev() to estimate the population standard deviation using a sample of data, then we just need to calculate the variance with n – 1 degrees of freedom as we saw before. Here’s a more generic stdev() that allows us to pass in degrees of freedom as well:

>>> def stdev(data, ddof=0):
...     return math.sqrt(variance(data, ddof))

>>> stdev([4, 8, 6, 5, 3, 2, 8, 9, 2, 5])
2.4

>>> stdev([4, 8, 6, 5, 3, 2, 8, 9, 2, 5], ddof=1)
2.5298221281347035

With this new implementation, we can use ddof=0 to calculate the standard deviation of a population, or we can use ddof=1 to estimate the standard deviation of a population using a sample of data.

Using Python’s pstdev() and stdev()

The Python statistics module also provides functions to calculate the standard deviation: pstdev() and stdev(). The first function takes the data of an entire population and returns its standard deviation. The second function takes data from a sample and returns an estimate of the population standard deviation.

Here’s how these functions work:

>>> import statistics

>>> statistics.pstdev([4, 8, 6, 5, 3, 2, 8, 9, 2, 5])
2.4000000000000004

>>> statistics.stdev([4, 8, 6, 5, 3, 2, 8, 9, 2, 5])
2.5298221281347035

We first need to import the statistics module. Then, we can call statistics.pstdev() with data from a population to get its standard deviation.

If we don’t have the data for the entire population, which is a common scenario, then we can use a sample of data and use statistics.stdev() to estimate the population standard deviation.
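As a quick sanity check, we can compare the hand-written functions from the previous section with the ones in the statistics module. The sketch below redefines variance() and stdev() so that it is self-contained, and it compares results with math.isclose() rather than ==, since the last digit of a floating-point result can differ between implementations:

```python
import math
import statistics


def variance(data, ddof=0):
    """Variance with `ddof` delta degrees of freedom (0 = population, 1 = sample)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - ddof)


def stdev(data, ddof=0):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(variance(data, ddof))


data = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]

# Compare with a tolerance instead of ==, because of floating-point rounding
population_matches = math.isclose(stdev(data), statistics.pstdev(data))
sample_matches = math.isclose(stdev(data, ddof=1), statistics.stdev(data))

print(population_matches, sample_matches)
```

Here ddof=0 corresponds to statistics.pstdev() and ddof=1 corresponds to statistics.stdev().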


The statistics module in Python provides a function called stdev(), which calculates the standard deviation from a sample of data rather than from an entire population.

To calculate the standard deviation of an entire population, use pstdev() instead.

Standard deviation is a measure of spread in statistics: it quantifies how much a set of data values varies around its mean. It is closely related to the variance. The variance is the average squared deviation from the mean, and the standard deviation is its square root.
A low standard deviation indicates that the data points lie close to their mean, whereas a high standard deviation indicates that they are spread far from it. A useful property of the standard deviation is that, unlike the variance, it is expressed in the same units as the data.
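The point about units can be illustrated with a small sketch. The heights below are made-up values, used only for illustration. Converting them from metres to centimetres multiplies the standard deviation by 100, but it multiplies the variance by 100 squared, because the variance is measured in squared units:

```python
import math
import statistics

# Made-up heights, for illustration only
heights_m = [1.60, 1.72, 1.68, 1.81, 1.75]     # metres
heights_cm = [h * 100 for h in heights_m]      # the same data in centimetres

sd_m = statistics.stdev(heights_m)
sd_cm = statistics.stdev(heights_cm)
var_m = statistics.variance(heights_m)
var_cm = statistics.variance(heights_cm)

# The standard deviation scales with the unit conversion factor ...
print(math.isclose(sd_cm, 100 * sd_m))
# ... while the variance scales with the square of that factor
print(math.isclose(var_cm, 100 ** 2 * var_m))
```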

Standard Deviation is calculated by :

$$ s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \overline{x})^2}{N - 1}} $$

where $x_1, x_2, \ldots, x_N$ are the observed values in the sample data,
$\overline{x}$ is the mean value of the observations, and
$N$ is the number of observations in the sample.
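To make the formula concrete, here is a minimal sketch that evaluates it step by step for the sample [1, 2, 3, 4, 5]. The variable names are chosen for this illustration only:

```python
import math

sample = [1, 2, 3, 4, 5]

n = len(sample)                                    # N = 5
mean = sum(sample) / n                             # x-bar = 3.0
squared_devs = [(x - mean) ** 2 for x in sample]   # [4.0, 1.0, 0.0, 1.0, 4.0]
s = math.sqrt(sum(squared_devs) / (n - 1))         # sqrt(10 / 4) = sqrt(2.5)

print(s)
```

The squared deviations add up to 10, and dividing by N - 1 = 4 gives 2.5, so s is the square root of 2.5.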

Syntax : stdev(data, xbar=None)
Parameters : 
data : An iterable of real-valued numbers. 
xbar (Optional) : The mean of the data. If it is omitted, the mean is calculated automatically.
Return type : Returns the sample standard deviation of the values passed as parameter.
Exceptions : 
StatisticsError is raised if data contains fewer than two values. 
If the value passed as xbar is not the actual mean of the data, the result can be meaningless or imprecise. 
 

Code #1 :  

# Python code to demonstrate the stdev() function

# Importing the statistics module
import statistics

# Creating a simple data set
sample = [1, 2, 3, 4, 5]

# Print the sample standard deviation.
# xbar is not passed, so stdev() calculates the mean itself.
print("Standard Deviation of the sample is %s"
      % (statistics.stdev(sample)))

Output : 

Standard Deviation of the sample is 1.5811388300841898 

Code #2 : Demonstrate stdev() on a varying set of data types  

# Python code to demonstrate stdev()
# on a range of different data sets

# Importing stdev() from the statistics module
from statistics import stdev

# Creating a varying range of sample sets

# Tuple of positive integers:
# numbers are spread apart, but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)

# Tuple of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)

# Tuple of positive and negative numbers:
# data points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)

# Tuple of floating-point values
sample4 = (1.23, 1.45, 2.1, 2.2, 1.9)

# Print the standard deviation of
# each sample set of observations
print("The Standard Deviation of Sample1 is %s" % (stdev(sample1)))
print("The Standard Deviation of Sample2 is %s" % (stdev(sample2)))
print("The Standard Deviation of Sample3 is %s" % (stdev(sample3)))
print("The Standard Deviation of Sample4 is %s" % (stdev(sample4)))

Output : 

The Standard Deviation of Sample1 is 3.9761191895520196
The Standard Deviation of Sample2 is 1.8708286933869707
The Standard Deviation of Sample3 is 7.8182478855559445
The Standard Deviation of Sample4 is 0.41967844833872525
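The samples above contain only integers and floats. The statistics documentation states that Fraction and Decimal values are supported as well, so a minimal sketch with those types might look like the following. The sample values are chosen for illustration so that the arithmetic is easy to check by hand. The exact type and printed form of the stdev() result may vary between Python versions, so no output is shown:

```python
import math
import statistics
from decimal import Decimal
from fractions import Fraction

# Illustrative samples: the same three values as Fraction and as Decimal
fraction_sample = [Fraction(1, 2), Fraction(3, 2), Fraction(5, 2)]
decimal_sample = [Decimal("0.5"), Decimal("1.5"), Decimal("2.5")]

# The mean is 3/2 and the squared deviations are 1, 0, 1,
# so the sample variance is 2 / (3 - 1) = 1
print(statistics.variance(fraction_sample))

# The standard deviation is the square root of that variance
print(statistics.stdev(fraction_sample))
print(statistics.stdev(decimal_sample))
```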

Code #3 : Demonstrate the difference between results of variance() and stdev()  

# Python code to demonstrate the difference
# in results of stdev() and variance()

# Importing the statistics module
import statistics

# Creating a simple data set
sample = [1, 2, 3, 4, 5]

# Printing the standard deviation.
# xbar is not passed, so stdev() calculates the mean itself.
print("Standard Deviation of the sample is %s"
      % (statistics.stdev(sample)))

# The variance is the square of the standard
# deviation (up to floating-point rounding)
print("Variance of the sample is %s"
      % (statistics.variance(sample)))

Output : 

Standard Deviation of the sample is 1.5811388300841898 
Variance of the sample is 2.5

Code #4 : Demonstrate the use of xbar parameter  

# Python code to demonstrate use of the xbar
# parameter while using the stdev() function

# Importing the statistics module
import statistics

# Creating a sample tuple
sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)

# Calculating the mean of the sample set
m = statistics.mean(sample)

# xbar simply holds the precomputed mean of the
# sample set, so stdev() does not recalculate it.
# Calculating the standard deviation of the sample set
print("Standard Deviation of Sample set is %s"
      % (statistics.stdev(sample, xbar=m)))

Output : 

Standard Deviation of Sample set is 0.6047037842337906
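A typical reason to pass xbar is that the mean has already been calculated for another purpose. The sketch below reuses one precomputed mean for both variance() and stdev(), and it checks that the results agree with the calls that omit xbar. The variable names are chosen for this illustration only:

```python
import math
import statistics

sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)

# Calculate the mean once and reuse it
m = statistics.mean(sample)

var_with_xbar = statistics.variance(sample, xbar=m)
sd_with_xbar = statistics.stdev(sample, xbar=m)

# Passing the true mean should give the same results as omitting xbar
print(math.isclose(var_with_xbar, statistics.variance(sample)))
print(math.isclose(sd_with_xbar, statistics.stdev(sample)))
```

Only the true mean of the data should be passed as xbar. The result for any other value is not something to rely on.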

Code #5 : Demonstrates StatisticsError  

# Python code to demonstrate StatisticsError

# Importing the statistics module
import statistics

# Creating a data set with one element
sample = [1]

# Will raise StatisticsError
print(statistics.stdev(sample))

Output : 

Traceback (most recent call last):
  File "/home/f921f9269b061f1cc4e5fc74abf6ce10.py", line 12, in <module>
    print(statistics.stdev(sample))
  File "/usr/lib/python3.5/statistics.py", line 617, in stdev
    var = variance(data, xbar)
  File "/usr/lib/python3.5/statistics.py", line 555, in variance
    raise StatisticsError('variance requires at least two data points')
statistics.StatisticsError: variance requires at least two data points
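In a real program we usually want to handle this case instead of letting it crash. A minimal sketch, assuming that returning None is an acceptable fallback for the caller, could look like this. The helper name safe_stdev() is hypothetical and is not part of the statistics module:

```python
import statistics


def safe_stdev(data):
    """Return the sample standard deviation, or None for fewer than two values.

    Hypothetical helper for illustration; not part of the statistics module.
    """
    try:
        return statistics.stdev(data)
    except statistics.StatisticsError:
        return None


print(safe_stdev([1]))
print(safe_stdev([1, 2, 3, 4, 5]))
```

The first call prints None because a single value has no sample standard deviation, while the second call prints the same value as statistics.stdev() would.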

Applications :  

  • Standard deviation is essential in statistics, where it is commonly used to measure confidence in statistical conclusions. For example, the margin of error in exam marks can be determined by calculating the expected standard deviation of the results if the same exam were conducted multiple times.
  • It is also very useful in finance, where it helps to assess the risk of profit and loss: the standard deviation of the rate of return on an investment is a measure of the volatility of the investment.
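The volatility idea can be sketched with two hypothetical investments. The monthly returns below are invented for illustration and are not real market data. Both series have the same average return, but their standard deviations differ considerably:

```python
import statistics

# Hypothetical monthly returns in percent (invented values, not real data)
steady_fund = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
volatile_fund = [6.0, -4.0, 3.5, -2.5, 5.0, -2.0]

steady_mean = statistics.mean(steady_fund)
volatile_mean = statistics.mean(volatile_fund)
steady_sd = statistics.stdev(steady_fund)
volatile_sd = statistics.stdev(volatile_fund)

# Both funds average about 1% per month ...
print(steady_mean, volatile_mean)
# ... but the second one fluctuates far more from month to month
print(steady_sd, volatile_sd)
```

The larger standard deviation of the second series reflects its higher volatility, even though the average return is the same.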