Series and Dataframe in Python – Shishir Kant Singh

A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.

pandas.Series

A pandas Series can be created using the following constructor −

pandas.Series(data, index, dtype, copy)

The parameters of the constructor are as follows −

Sr.No	Parameter & Description
1	data – data takes various forms like ndarray, list, constants
2	index – Index values must be hashable and the same length as data; they do not have to be unique. Default np.arange(n) if no index is passed.
3	dtype – dtype is for data type. If None, data type will be inferred
4	copy – Copy data. Default False
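As a quick illustration of the index and dtype parameters, here is a minimal sketch (the labels 'x', 'y', 'z' and the values are arbitrary). It also shows that index labels may repeat: selecting a duplicated label returns every matching row as a Series rather than a single value.

```python
import pandas as pd

# data, an index of the same length, and a forced dtype
s = pd.Series([1, 2, 3], index=['x', 'y', 'z'], dtype='float64')
print(s)
print(s.dtype)      # float64

# index labels need not be unique
dup = pd.Series([10, 20], index=['a', 'a'])
print(dup['a'])     # a Series holding both rows labeled 'a'
```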

A series can be created using various inputs like −

Array
Dict
Scalar value or constant

Create an Empty Series

The most basic Series that can be created is an empty Series.

Example

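# import the pandas library and alias it as pd
import pandas as pd
# pass dtype explicitly: the default dtype of an empty Series depends on the pandas version
s = pd.Series(dtype='float64')
print(s)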

Its output is as follows −

Series([], dtype: float64)

Create a Series from ndarray

If data is an ndarray, then the index passed must be of the same length. If no index is passed, the index defaults to range(n), where n is the array length, i.e., [0, 1, 2, …, len(array)-1].

Example 1

# import the pandas library and alias it as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)

Its output is as follows −

0   a
1   b
2   c
3   d
dtype: object

We did not pass any index, so by default it assigned index labels ranging from 0 to len(data)-1, i.e., 0 to 3.

Example 2

# import the pandas library and alias it as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)

Its output is as follows −

100  a
101  b
102  c
103  d
dtype: object

We passed the index values here, so the custom index labels appear in the output.
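The requirement that the index have the same length as the ndarray is enforced by pandas. The sketch below (an illustrative example, not from the original) passes four values with only three labels and catches the resulting ValueError. The exact error message may vary between pandas versions.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])

raised = False
try:
    # four values but only three index labels
    pd.Series(data, index=[100, 101, 102])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```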

Create a Series from dict

A dict can be passed as input. If no index is specified, the dictionary keys are used to construct the index in insertion order (with pandas 0.23+ on Python 3.6+; older versions sorted the keys). If an index is passed, the values in data corresponding to the labels in the index are pulled out.

Example 1

# import the pandas library and alias it as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)

Its output is as follows −

a 0.0
b 1.0
c 2.0
dtype: float64

Observe − The dictionary keys are used to construct the index.
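Because the keys in the example above are already alphabetical, it does not reveal the ordering rule. A small sketch with deliberately unsorted keys shows that the index follows the dict's insertion order in current pandas; sort_index() can be used if sorted labels are wanted.

```python
import pandas as pd

# keys deliberately not in alphabetical order
s = pd.Series({'c': 2.0, 'a': 0.0, 'b': 1.0})
print(list(s.index))               # ['c', 'a', 'b'] : insertion order
print(list(s.sort_index().index))  # ['a', 'b', 'c'] : sorted on request
```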

Example 2

# import the pandas library and alias it as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)

Its output is as follows −

b 1.0
c 2.0
d NaN
a 0.0
dtype: float64

Observe − The order of the passed index is preserved, and the label 'd', which has no matching key in the dict, is filled with NaN (Not a Number).
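The NaN introduced for 'd' can be detected and handled with standard Series methods. This sketch uses isna(), dropna() and fillna(); the fill value 0.0 is an arbitrary choice for illustration.

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])

# isna() flags the label 'd', which had no key in the dict
print(s.isna())
# dropna() removes it; fillna() replaces it with a chosen value
print(s.dropna())
print(s.fillna(0.0))
```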

Create a Series from Scalar

If data is a scalar value, an index should be provided; the value will be repeated to match the length of the index.

# import the pandas library and alias it as pd
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)

Its output is as follows −

0  5
1  5
2  5
3  5
dtype: int64
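For comparison, here is a small sketch of what happens with and without an index: with an index the scalar is broadcast to every label, while without one pandas builds a single-element Series labeled 0.

```python
import pandas as pd

# with an index, the scalar is repeated for every label
s = pd.Series(5, index=['x', 'y', 'z'])
# without an index, the result has exactly one element
single = pd.Series(5)
print(s)
print(single)
```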

Accessing Data from Series with Position

Data in a Series can be accessed by position, similar to an ndarray.

Example 1

Retrieve the first element. As we already know, counting starts from zero, which means the first element is stored at the zeroth position, and so on.

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve the first element by position (.iloc is explicitly positional)
print(s.iloc[0])

Its output is as follows −

1

Example 2

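Retrieve the first three elements in the Series. If a : is placed before a position (as in s[:3]), all items up to, but not including, that position are extracted; if it is placed after a position (as in s[3:]), all items from that position onwards are extracted. If two positions are given with a : between them (as in s[1:3]), the items between the two positions are extracted, not including the stop position.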

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve the first three elements
print(s[:3])

Its output is as follows −

a  1
b  2
c  3
dtype: int64
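The other two slice forms described above can be sketched the same way. This example uses .iloc, which always interprets integers as positions, so the result does not depend on the index labels.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s.iloc[3:])   # from position 3 onwards: d, e
print(s.iloc[1:3])  # positions 1 and 2: b, c (stop position 3 excluded)
```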

Example 3

Retrieve the last three elements.

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve the last three elements
print(s[-3:])

Its output is as follows −

c  3
d  4
e  5
dtype: int64
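Positional access becomes ambiguous when the index itself holds integers. In the sketch below (labels chosen deliberately to disagree with positions), .iloc selects by position, while .loc and plain [] with a scalar integer select by label.

```python
import pandas as pd

# integer labels that do not match the positions
s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s.iloc[0])  # position 0 -> 10
print(s.loc[0])   # label 0    -> 30
print(s[0])       # [] with an integer index looks up the label -> 30
```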

Retrieve Data Using Label (Index)

A Series is like a fixed-size dict in that you can get and set values by index label.

Example 1

Retrieve a single element using index label value.

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve a single element
print(s['a'])

Its output is as follows −

1

Example 2

Retrieve multiple elements using a list of index label values.

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve multiple elements
print(s[['a','c','d']])

Its output is as follows −

a  1
c  3
d  4
dtype: int64

Example 3

If a label is not contained in the index, a KeyError exception is raised.

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve a label that does not exist
print(s['f'])

Its output is as follows −

…
KeyError: 'f'
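Since a Series behaves like a fixed-size dict, the usual dict idioms help avoid the KeyError above. This sketch shows a membership test on the index, get() with a default value (0 here, an arbitrary choice), and setting a value by label.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print('f' in s)        # False : membership checks the index labels
print(s.get('f', 0))   # 0 : default returned instead of raising KeyError
s['a'] = 100           # set a value by label
print(s['a'])          # 100
```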

Pandas Dataframe

The DataFrame is a primary data structure of pandas. It is a two-dimensional, size-mutable tabular structure with labeled rows (the index) and labeled columns, and different columns may hold different data types. In general, it is much like an Excel sheet or an SQL table. It can also be seen as a dict-like container of Series objects, one per column.
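The "dict-like container of Series" view can be seen directly. In this sketch (column names 'x', 'y', 'z' are arbitrary), selecting a column returns a Series, the column labels behave like dict keys, and adding a column grows the frame.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]})

col = df['x']            # one column is a Series
print(type(col))         # <class 'pandas.core.series.Series'>
print(list(df.keys()))   # ['x', 'y'] : column labels act like keys
print('y' in df)         # True : membership checks column labels
df['z'] = df['x'] * 10   # adding a column, like adding a dict entry
print(df.shape)          # (3, 3)
```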

Different ways of creating a Pandas Dataframe

A Pandas DataFrame can be created using the pandas.DataFrame() constructor:-

pd.DataFrame(data, index, columns, dtype, copy)

A Pandas Dataframe can be created from:-

Dict of 1D ndarrays, lists, dicts, or Series
2-D numpy.ndarray
Structured or record ndarray
A Series
Another DataFrame

The parameters of the Pandas DataFrame constructor are detailed below:-

Parameters	Remarks
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame	Dict can contain Series, arrays, constants, or list-like objects. Changed in version 0.23.0: If data is a dict, column order follows insertion-order for Python 3.6 and later. Changed in version 0.25.0: If data is a list of dicts, column order follows insertion-order for Python 3.6 and later.
index : Index or array-like	Index to use for the resulting frame. Will default to RangeIndex if no indexing information is part of the input data and no index is provided.
columns : Index or array-like	Column labels to use for the resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
dtype : dtype, default None	Data type to force. Only a single dtype is allowed. If None, infer.
copy : bool, default False	Copy data from inputs. Only affects DataFrame / 2d ndarray input
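To see several of these parameters together, here is a minimal sketch that builds a frame from a list of rows (an Iterable), with explicit row labels, column labels and a single forced dtype. The labels and values are arbitrary.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]],
                  index=['r1', 'r2'],      # row labels
                  columns=['a', 'b'],      # column labels
                  dtype='float64')         # one dtype for all columns
print(df)
print(df.dtypes)
```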

How to create an empty Pandas Dataframe in Python?

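You can create an empty Pandas DataFrame using pandas.DataFrame(). To start with named columns but no rows, pass the column names through the columns argument, e.g. pd.DataFrame(columns=['a', 'b']); rows can then be added later, for example with df.loc or pd.concat().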

>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>>
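A minimal sketch of the "columns first, rows later" approach, assuming illustrative column names 'name' and 'age'. It relies on setting with enlargement: assigning to a row label that does not exist yet through df.loc appends a new row.

```python
import pandas as pd

# empty DataFrame with column labels but no rows
df = pd.DataFrame(columns=['name', 'age'])
was_empty = df.empty
print(was_empty)   # True

# assigning to new row labels appends rows
df.loc[0] = ['Alice', 30]
df.loc[1] = ['Bob', 25]
print(df)
```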

How to create a Pandas Dataframe from a single Series object?

We can create a Pandas DataFrame from a single Pandas Series by passing the Series to pd.DataFrame(). The index of the Series becomes the index of the DataFrame, and pandas automatically sets 0 as the column name:-

import pandas as pd

population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}
population = pd.Series(population_dict)
df = pd.DataFrame(population)
print(df)

# Output

                   0
California  38332521
Texas       26448193
New York    19651127
Florida     19552860
Illinois    12882135

Since we did not pass the columns argument and the Series has no name, the single column is labeled 0 by default.
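To get a meaningful column label instead of 0, two common options are sketched below (using a shortened version of the population data): give the Series a name, which pd.DataFrame() uses as the column label, or convert it with Series.to_frame() and name the column there.

```python
import pandas as pd

population_dict = {'California': 38332521, 'Texas': 26448193}

# option 1: a named Series; the name becomes the column label
population = pd.Series(population_dict, name='population')
df1 = pd.DataFrame(population)

# option 2: to_frame() on an unnamed Series, naming the column there
df2 = pd.Series(population_dict).to_frame(name='population')

print(df1)
print(df2.columns.tolist())   # ['population']
```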

How to create a Pandas Dataframe from a dictionary of two or more (multiple) Pandas Series?

We can create a Pandas DataFrame from multiple Pandas Series by passing a dictionary of the Series to pd.DataFrame() as under. The keys of the dictionary become the columns of the DataFrame:-

import pandas as pd

area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}

area = pd.Series(area_dict)
population = pd.Series(population_dict)
states = pd.DataFrame({'population': population, 'area': area})

print(states)

# Output 

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995

The resulting index is the union of the indexes of the individual Series; if a label is missing from one of them, the corresponding value is filled with NaN (not a number). Here all Series share the same labels, so no NaN appears. You can optionally pass the index (row labels) and columns (column labels) arguments as well. If a dict of Series is passed along with a specific index, all data whose labels do not match the passed index is discarded.
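Because every Series in the example above has the same labels, neither the NaN filling nor the discarding is visible. The sketch below uses two deliberately mismatched Series (trimmed from the area and population data) to show both behaviors.

```python
import pandas as pd

area = pd.Series({'California': 423967, 'Texas': 695662})
population = pd.Series({'California': 38332521, 'Florida': 19552860})

# index is the union of both Series' labels; gaps become NaN
states = pd.DataFrame({'population': population, 'area': area})
print(states)

# with an explicit index, only those labels are kept ('Florida' is dropped)
subset = pd.DataFrame({'population': population, 'area': area},
                      index=['California', 'Texas'])
print(subset)
```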

How to create a Pandas Dataframe from a list of Python Dictionaries?

We can create a Pandas Dataframe from python dictionaries by passing the list of the dictionaries to pd.DataFrame():-

import pandas as pd
df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
print(df)

# Output
     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0

Here, the columns of the Pandas DataFrame are the union of the keys of the dictionaries, and missing values are filled with NaN.
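When the input is a list of dicts, the columns argument selects (and orders) which keys become columns instead of relabeling them. In this sketch the key 'c' is dropped and 'b' is placed first.

```python
import pandas as pd

records = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]

# keep only keys 'b' and 'a', in that order
df = pd.DataFrame(records, columns=['b', 'a'])
print(df)
```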

How to create a Pandas Dataframe from 2D Numpy array?

A Pandas DataFrame can also be created from a two-dimensional NumPy array using the following code:-

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(3, 2))
print(df)

# Output

          0         1
0  0.059926  0.119440
1  0.548637  0.232405
2  0.343573  0.809589

Since we did not pass the columns and index arguments, default integer labels are used for both. The values are random, so your output will differ. Alternatively, we can pass the columns and index in the constructor itself:-

df = pd.DataFrame(np.random.rand(3, 2), index = ['a','b','c'], columns = ['x', 'y'])
print(df)

# Output

          x         y
a  0.854185  0.871370
b  0.419274  0.123717
c  0.989986  0.811176
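Because random values make results hard to check, here is the same construction with a deterministic array from np.arange (an illustrative substitution). The number of index labels must match the number of rows, and the number of column labels must match the number of columns.

```python
import numpy as np
import pandas as pd

# [[0, 1], [2, 3], [4, 5]]
arr = np.arange(6).reshape(3, 2)
df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['x', 'y'])
print(df)
print(df.shape)           # (3, 2)
print(df.loc['b', 'y'])   # 3
```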

How to create a Pandas Dataframe from a Dictionary of Numpy arrays or list?

Alternatively, a Pandas DataFrame can be created from a dictionary of ndarrays or lists. The keys of the dictionary become the columns of the DataFrame, and the arrays or lists, which must all have the same length, become the column values. If no index is passed, the DataFrame gets the default integer index.

import pandas as pd
a_dict = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(a_dict)
print(df)

# Output

   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
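A short sketch of the two rules above: an explicit index (labels 'a' to 'd' chosen arbitrarily) must match the length of the lists, and lists of different lengths cannot be combined, so pandas raises a ValueError.

```python
import pandas as pd

a_dict = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}

# custom row labels, one per list element
df = pd.DataFrame(a_dict, index=['a', 'b', 'c', 'd'])
print(df)

raised = False
try:
    # lists of different lengths
    pd.DataFrame({'one': [1., 2., 3.], 'two': [4., 3.]})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```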

How to create Pandas Dataframe from a Numpy structured array?

We can create a Pandas Dataframe from a numpy structured array using the following code:-

import pandas as pd
import numpy as np

data = np.zeros((2, ), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])  # 'S10': byte string of up to 10 characters
data[:] = [(1, 2., 'Hello'), (2, 3., "World")]
df = pd.DataFrame(data)
print(df)

# Output

   A    B         C
0  1  2.0  b'Hello'
1  2  3.0  b'World'
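The b'...' prefix in column C shows that the values are Python bytes objects, because the structured field is a byte-string type. If ordinary text is wanted, one option is to decode the column with Series.str.decode, as in this sketch (assuming the data is UTF-8 or ASCII).

```python
import numpy as np
import pandas as pd

data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
data[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
df = pd.DataFrame(data)

# column C holds bytes; decode them to str
df['C'] = df['C'].str.decode('utf-8')
print(df)
```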

How to check the Index and columns of a Pandas Dataframe?

You can get the index and columns of a Pandas DataFrame (here, the states DataFrame created earlier) using the following code:-

print(states.index)
print(states.columns)

# Output

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')
Index(['population', 'area'], dtype='object')
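Both attributes are Index objects, so they can be converted to plain lists. The sketch below rebuilds a two-row version of the states data so it runs on its own, and also uses shape and axes (not covered above) to inspect the dimensions and both label sets at once.

```python
import pandas as pd

states = pd.DataFrame({'population': [38332521, 26448193],
                       'area': [423967, 695662]},
                      index=['California', 'Texas'])

print(states.index.tolist())    # ['California', 'Texas']
print(states.columns.tolist())  # ['population', 'area']
print(states.shape)             # (2, 2): (rows, columns)
print(states.axes)              # [row Index, column Index]
```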