Shishir Kant Singh – Page 48 – Shishir Kant Singh

Label Encoding in Python

On: August 24, 2021

An Introduction. Before we begin learning categorical variable encoding, let us first understand the basics of data types and their scales. Learners need to understand these topics before working with categorical variable encoding. As we all know, data is a…

Handling Error in Dataframe in Python

By: Shishir Kant Singh

On: August 24, 2021

In: Statistics, Probability & Analytics

Generating errors: we've already seen the raise keyword in passing. raise Exception is the simplest way to have your program stop when something goes wrong. In a notebook/console environment, it stops the current cell/function (it doesn't crash the session). You have to raise <something>: Exception is the most general case ("something happened"). Other possibilities: TypeError: some…
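As a minimal sketch of the idea in this excerpt, the function below raises a specific exception type instead of the generic Exception. The function name `mean` and the use of ValueError are illustrative assumptions, not taken from the original post.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers (hypothetical helper)."""
    if not isinstance(values, list):
        # TypeError: the argument has the wrong type
        raise TypeError("values must be a list")
    if len(values) == 0:
        # ValueError: the type is right but the value is unusable
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


# Catching the error keeps the rest of the program (or notebook cell) running.
try:
    mean([])
except ValueError as err:
    message = str(err)

print(message)
```

Raising a specific type lets callers catch only the failures they know how to handle, while unexpected errors still propagate.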

Handling Duplicacy Data Frame in Python

By: Shishir Kant Singh

On: August 24, 2021

In: Statistics, Probability & Analytics

Find duplicate rows: duplicated(); determine which duplicates to mark: keep; specify the columns in which to find duplicates: subset; count duplicate / non-duplicate rows; remove duplicate rows: drop_duplicates() (keep, subset, inplace); aggregate based on duplicate elements: groupby(). The following data is used as an example. Row #6 is a duplicate of row #3. The sample CSV file is linked below. Find…
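The methods listed in this excerpt can be sketched on a small made-up table. The column names and values below are assumptions for illustration; the post's own sample CSV is not reproduced here.

```python
import pandas as pd

# Hypothetical data: the last row repeats the second row exactly.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Bob"],
    "city": ["Paris", "Oslo", "Rome", "Oslo"],
    "score": [90, 85, 70, 85],
})

# duplicated() marks later repeats by default (keep="first").
dup_mask = df.duplicated()

# keep=False marks every member of a duplicate group.
dup_all = df.duplicated(keep=False)

# Counting duplicates: True counts as 1 when summed.
n_dup = int(dup_mask.sum())

# drop_duplicates() removes the marked rows and returns a new DataFrame.
deduped = df.drop_duplicates()

# subset restricts the comparison to the given columns; keep="last" retains the final repeat.
by_city = df.drop_duplicates(subset=["city"], keep="last")

print(dup_mask.tolist())  # [False, False, False, True]
print(n_dup)              # 1
```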

Handling Missing Values in Python

By: Shishir Kant Singh

On: August 24, 2021

In: Statistics, Probability & Analytics

Missing data is always a problem in real-life scenarios. Areas like machine learning and data mining face severe issues in the accuracy of their model predictions because of the poor data quality caused by missing values. In these areas, missing value treatment is a major point of focus to…

Sorting Data in Python

By: Shishir Kant Singh

On: August 24, 2021

In: Statistics, Probability & Analytics

There are two kinds of sorting available in pandas: by label and by actual value. Let us consider an example with an output.

import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame(np.random.randn(10, 2), index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7], columns=['col2', 'col1'])
print(unsorted_df)

Its output is as follows:

col2 col1
1 -2.063177 0.537527
4 0.142932 -0.684884
6…
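The two kinds of sorting mentioned above can be sketched as follows. The small fixed DataFrame is an assumption chosen so the results are reproducible, unlike the random data in the excerpt.

```python
import pandas as pd

# Hypothetical data with an unordered index and unordered column names.
df = pd.DataFrame(
    {"col2": [3, 1, 2], "col1": [0.5, 0.1, 0.9]},
    index=[2, 0, 1],
)

# Sorting by label: row labels ascending, then descending.
by_label = df.sort_index()
by_label_desc = df.sort_index(ascending=False)

# Sorting by label along the column axis orders the column names.
by_columns = df.sort_index(axis=1)

# Sorting by actual value: order rows by the values in col1.
by_value = df.sort_values(by="col1")

print(by_label.index.tolist())  # [0, 1, 2]
print(by_value.index.tolist())  # [0, 2, 1]
```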

Adding Removing Rows and Column in Python

By: Shishir Kant Singh

On: August 24, 2021

In: Statistics, Probability & Analytics

Inserting rows and columns. You can insert or delete rows and columns using the relevant worksheet methods: openpyxl.worksheet.worksheet.Worksheet.insert_rows(), openpyxl.worksheet.worksheet.Worksheet.insert_cols(), openpyxl.worksheet.worksheet.Worksheet.delete_rows(), openpyxl.worksheet.worksheet.Worksheet.delete_cols(). The default is one row or column. For example, to insert a row at 7 (before the existing row 7): >>> ws.insert_rows(7). Deleting rows and columns. To delete the columns F:H: >>>…
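A minimal sketch of the four worksheet methods named in the excerpt, run on an in-memory workbook. It assumes openpyxl is installed; the cell contents and the snapshot variable names are made up for illustration.

```python
from openpyxl import Workbook

# Build a small in-memory worksheet: five rows of (label, number).
wb = Workbook()
ws = wb.active
for i in range(1, 6):
    ws.append([f"row{i}", i])

# Insert one blank row before the existing row 2 (the default amount is 1).
ws.insert_rows(2)
after_insert = [ws.cell(row=r, column=1).value for r in range(1, 4)]

# Delete that blank row again; the rows below shift back up.
ws.delete_rows(2)
after_delete = [ws.cell(row=r, column=1).value for r in range(1, 4)]

# Insert one blank column before column A, then inspect the first row.
ws.insert_cols(1)
first_row = [ws.cell(row=1, column=c).value for c in range(1, 4)]

# Remove the blank column so the sheet returns to its original layout.
ws.delete_cols(1)

print(after_insert)  # ['row1', None, 'row2']
print(first_row)     # [None, 'row1', 1]
```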

Combining Data Frames in Python

By: Shishir Kant Singh

On: August 24, 2021

In: Statistics, Probability & Analytics

In many "real world" situations, the data that we want to use comes in multiple files. We often need to combine these files into a single DataFrame to analyze the data. The pandas package provides various methods for combining DataFrames, including merge and concat. To work through the examples below, we first need to load…
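As a rough illustration of concat and merge, the sketch below stacks two tables with the same columns and then joins the result to a lookup table. The table names, columns, and values are assumptions, not the dataset used in the post.

```python
import pandas as pd

# Two hypothetical order files with identical columns.
orders_jan = pd.DataFrame({"order_id": [1, 2], "customer_id": ["c1", "c2"]})
orders_feb = pd.DataFrame({"order_id": [3, 4], "customer_id": ["c2", "c3"]})

# A hypothetical lookup table that is missing customer c3.
customers = pd.DataFrame({"customer_id": ["c1", "c2"], "name": ["Ann", "Ben"]})

# concat stacks the rows; ignore_index=True renumbers them 0..n-1.
orders = pd.concat([orders_jan, orders_feb], ignore_index=True)

# merge joins on a shared key; how="left" keeps every order,
# filling unmatched customers with NaN.
merged = pd.merge(orders, customers, on="customer_id", how="left")

print(orders["order_id"].tolist())  # [1, 2, 3, 4]
print(merged)
```

A left join is used here so that no order is silently dropped; an inner join would discard the order whose customer is absent from the lookup table.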

map(), apply() in Python

By: Shishir Kant Singh

On: August 24, 2021

In: Statistics, Probability & Analytics

The applymap() method works only on a pandas DataFrame and applies the function to every element individually. The apply() method works on both Series and DataFrames; depending on the function provided, it operates on whole rows or columns or on individual elements. The map() method works only on a pandas Series, where type…
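The three methods can be compared side by side in a short sketch. The data is made up; also note that pandas 2.1 deprecated DataFrame.applymap in favor of DataFrame.map, so the snippet picks whichever element-wise method the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Element-wise on a DataFrame: DataFrame.map on pandas >= 2.1, applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
squared = elementwise(lambda x: x * x)

# apply() on a DataFrame passes each column (axis=0) or each row (axis=1) as a Series.
col_range = df.apply(lambda col: col.max() - col.min())
row_totals = df.apply(lambda row: row["a"] + row["b"], axis=1)

# map() on a Series substitutes each value, here through a dict lookup.
labels = df["a"].map({1: "one", 2: "two", 3: "three"})

# apply() on a Series calls the function once per element.
doubled = df["a"].apply(lambda x: x * 2)

print(squared["b"].tolist())  # [100, 400, 900]
print(row_totals.tolist())    # [11, 22, 33]
```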

crosstab() in Python

By: Shishir Kant Singh

On: August 24, 2021

In: Statistics, Probability & Analytics

A crosstab computes aggregated metrics among two or more columns in a dataset that contains categorical values. Import modules: import pandas as pd and import seaborn as sns. Get the tips dataset: let's get the tips dataset from the seaborn library and assign it to the DataFrame df_tips: df_tips = sns.load_dataset('tips'). Each row represents a unique meal at a…
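A minimal crosstab sketch follows. To stay self-contained it uses a tiny hand-written table instead of downloading the seaborn tips dataset; the column names loosely mirror that dataset, but the values are invented.

```python
import pandas as pd

# Hypothetical meals: two categorical columns and one numeric column.
meals = pd.DataFrame({
    "day": ["Thu", "Thu", "Fri", "Fri", "Fri"],
    "time": ["Lunch", "Dinner", "Dinner", "Lunch", "Dinner"],
    "total_bill": [10.0, 20.0, 30.0, 15.0, 25.0],
})

# Default crosstab: count how many rows fall in each (day, time) cell.
counts = pd.crosstab(meals["day"], meals["time"])

# With values and aggfunc, each cell aggregates a numeric column instead.
mean_bill = pd.crosstab(
    meals["day"], meals["time"],
    values=meals["total_bill"], aggfunc="mean",
)

print(counts)
print(mean_bill.loc["Fri", "Dinner"])  # 27.5
```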

groupby() in Python

By: Shishir Kant Singh

On: August 24, 2021

In: Statistics, Probability & Analytics

Any groupby operation involves one or more of the following operations on the original object: splitting the object, applying a function, and combining the results. In many situations, we split the data into sets and apply some functionality to each subset. In the apply step, we can perform the following operations: Aggregation: computing…
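The split, apply, and combine steps can be sketched on a small made-up table. The column names and values are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "B", "A", "B", "A"],
    "points": [10, 20, 30, 40, 50],
})

# Split: groupby partitions the rows by the value of "team".
grouped = df.groupby("team")

# Apply + combine: sum() aggregates each group and returns one Series.
totals = grouped["points"].sum()

# The same steps done by hand: iterate over (key, sub-DataFrame) pairs.
manual = {team: int(part["points"].sum()) for team, part in grouped}

print(totals.to_dict())  # {'A': 90, 'B': 60}
print(manual)            # {'A': 90, 'B': 60}
```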
