Pandas – Select Columns

In pandas, you can select columns from a DataFrame either by label (column name) or by integer position. Use loc[] to select one or more columns by label, and iloc[] to select them by position.

In this article, I will explain how to select one or more columns from a DataFrame using column labels, index positions, and ranges.

Key Points –

  • Pandas allows selecting columns from a DataFrame by their names using square bracket notation or the .loc[] accessor.
  • The .loc[] accessor allows for more explicit selection, accepting row and column labels or boolean arrays.
  • Alternatively, you can use the .iloc[] accessor to select columns by their integer index positions.
  • For selecting the last column, use df.iloc[:,-1:], and for the first column, use df.iloc[:,:1].
  • Understanding both column name and index-based selection is essential for efficient data manipulation with Pandas.
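As a rough illustration of the points above, the following sketch selects the same two columns three ways and compares the results. The small DataFrame and the variable names (by_brackets, by_loc, by_iloc, single) are made up for this example.

```python
import pandas as pd

# Small illustrative DataFrame (sample values only)
df = pd.DataFrame({
    "Courses": ["Shishir", "Pandas"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
})

by_brackets = df[["Courses", "Fee"]]    # list of names -> DataFrame
by_loc = df.loc[:, ["Courses", "Fee"]]  # same columns, selected by label
by_iloc = df.iloc[:, [0, 1]]            # same columns, selected by position

single = df["Fee"]                      # a single name (no list) -> Series

print(by_brackets.equals(by_loc) and by_loc.equals(by_iloc))
print(type(single).__name__)
```

Passing a list of names returns a DataFrame, while passing a single name returns a Series, which matters if later code expects one or the other.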

Quick Examples of Select Columns by Name or Index

If you are in a hurry, below are some quick examples of selecting columns by name or index in Pandas DataFrame.


# Quick examples of select columns by name or index

# Example 1: By using df[] notation
df2 = df[["Courses","Fee","Duration"]] # Select multiple columns

# Example 2: Using loc[] to select columns by label
df2 = df.loc[:, ["Courses","Fee","Duration"]] # Select multiple columns
df2 = df.loc[:, ["Courses","Fee","Discount"]] # Select non-adjacent columns
df2 = df.loc[:,'Fee':'Discount'] # Select columns between two columns (both inclusive)
df2 = df.loc[:,'Duration':]  # Select from 'Duration' to the last column
df2 = df.loc[:,:'Duration']  # Select from the first column through 'Duration'
df2 = df.loc[:,::2]          # Select every alternate column

# Example 3: Using iloc[] to select columns by position
df2 = df.iloc[:,[1,3,4]] # Select columns by position
df2 = df.iloc[:,1:4] # Select positions 1 through 3 (2nd, 3rd, and 4th columns)
df2 = df.iloc[:,2:] # Select from the 3rd column to the end
df2 = df.iloc[:,:2] # Select the first two columns

First, let’s create a pandas DataFrame.


import pandas as pd
technologies = {
    'Courses':["Shishir","Pandas"],
    'Fee' :[20000,25000],
    'Duration':['30days','40days'],
    'Discount':[1000,2300],
    'Tutor':['Michel','Sam']
              }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

Yields below output.

Create DataFrame:
   Courses    Fee Duration  Discount   Tutor
0  Shishir  20000   30days      1000  Michel
1   Pandas  25000   40days      2300     Sam

Using loc[] to Select Columns by Name

Both df[] and DataFrame.loc[] select columns by name (label). With loc[], use the syntax [:, start:stop:step] to select a range of columns, where start is the label at which the slice begins, stop is the label at which it ends, and step is the step size between columns. Unlike positional slicing, a label slice includes both the start and the stop columns. loc[] also accepts the syntax [:, [labels]], where you provide a list of column names.


# loc[] syntax to slice columns
df.loc[:,start:stop:step]
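A quick check of how slice endpoints behave may help here. The sketch below compares a label slice with loc[] against a position slice with iloc[] on a sample DataFrame; the names label_slice and position_slice are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Shishir", "Pandas"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
    "Tutor": ["Michel", "Sam"],
})

# Label slice: both 'Fee' and 'Discount' are included
label_slice = df.loc[:, "Fee":"Discount"]

# Position slice: position 3 ('Discount') is excluded
position_slice = df.iloc[:, 1:3]

print(list(label_slice.columns))     # ['Fee', 'Duration', 'Discount']
print(list(position_slice.columns))  # ['Fee', 'Duration']
```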

Select DataFrame Columns by Name

To select DataFrame columns by name, specify a list of column names within square brackets []. Here, df[['Courses', 'Fee', 'Duration']] selects only the Courses, Fee, and Duration columns from the DataFrame df.


# Select Columns by labels
df2 = df[["Courses","Fee","Duration"]]
print("Select columns by labels:\n", df2)

Yields below output.

Select columns by labels:
Courses Fee Duration
0 Shishir 20000 30days
1 Pandas 25000 40days
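Two details of bracket selection are worth trying out: the result follows the order of the names in the list, and a name that is not a column raises a KeyError. The sketch below uses a sample DataFrame, and "Price" is a hypothetical column name chosen because it does not exist.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Shishir", "Pandas"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
})

# The order of names in the list sets the column order of the result
reordered = df[["Duration", "Courses"]]
print(list(reordered.columns))  # ['Duration', 'Courses']

# Asking for a column that does not exist raises KeyError
try:
    df[["Courses", "Price"]]  # "Price" is not a column of df
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)  # True
```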

Select Multiple Columns Using loc[]

To select multiple columns using df.loc[], you specify both row and column labels. If you want to select all rows and specific columns, you can use : to select all rows and provide a list of column labels. Note that loc[] also supports multiple conditions when selecting rows based on column values.


# Select multiple columns
df2 = df.loc[:, ["Courses","Fee","Discount"]]
print("Select multiple columns by labels:\n", df2)

# Output:
# Select multiple columns by labels:
#   Courses    Fee  Discount
# 0  Shishir  20000      1000
# 1   Pandas  25000      2300

In the above example, df.loc[:, ["Courses", "Fee", "Discount"]] selects all rows (:) and the columns labeled Courses, Fee, and Discount from the DataFrame df.
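Because loc[] takes a row selector and a column selector together, a row condition can be combined with a list of column labels. The following is a minimal sketch on sample data; the thresholds and the names high_fee and both_conditions are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Shishir", "Pandas"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# Rows where Fee is above 20000, keeping only two columns
high_fee = df.loc[df["Fee"] > 20000, ["Courses", "Discount"]]

# Multiple row conditions are combined with & and wrapped in parentheses
both_conditions = df.loc[
    (df["Fee"] >= 20000) & (df["Discount"] > 1000),
    ["Courses", "Fee"],
]

print(high_fee)
print(both_conditions)
```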

Select Columns Based on Label Indexing

To select a range of columns by label, provide start and stop labels to loc[].

  • If you don’t specify a start label, loc[] selects from the first column.
  • If you don’t specify a stop label, loc[] selects all columns from the start label to the last column.
  • Specifying both start and stop labels selects all columns in between, including both the start and the stop columns.

# Select all columns between Fee and Discount columns
df2 = df.loc[:,'Fee':'Discount']
print("Select columns by labels:\n", df2)

# Output:
# Select columns by labels:
#     Fee Duration  Discount
# 0  20000   30days      1000
# 1  25000   40days      2300

# Select from 'Duration' column
df2 = df.loc[:,'Duration':]
print("Select columns by labels:\n", df2)

# Output:
# Select columns by labels:
#  Duration  Discount   Tutor
# 0   30days      1000  Michel
# 1   40days      2300     Sam

# Select from beginning and end at 'Duration' column
df2 = df.loc[:,:'Duration']
print("Select columns by labels:\n", df2)

# Output:
# Select columns by labels:
#   Courses    Fee Duration
# 0  Shishir  20000   30days
# 1   Pandas  25000   40days
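If you need to move between labels and positions, one possible approach is to look up a label's position with df.columns.get_loc() and pass it to iloc[]. This sketch assumes the column names are unique, so that get_loc() returns a single integer.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Shishir", "Pandas"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
    "Tutor": ["Michel", "Sam"],
})

# Look up integer positions for the boundary labels (unique names assumed)
start = df.columns.get_loc("Fee")
stop = df.columns.get_loc("Discount")

# iloc[] excludes the stop position, so add 1 to match the label slice
by_position = df.iloc[:, start:stop + 1]
by_label = df.loc[:, "Fee":"Discount"]

print(start, stop)                   # 1 3
print(by_position.equals(by_label))  # True
```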

Select Every Alternate Column

To select every alternate column from a DataFrame, use the loc[] accessor with a step size of 2.


# Select every alternate column
df2 = df.loc[:,::2]
print("Select columns by labels:\n", df2)

# Output:
# Select columns by labels:
#   Courses Duration   Tutor
# 0  Shishir   30days  Michel
# 1   Pandas   40days     Sam

This code selects every alternate column, starting from the first column, which results in selecting Courses, Duration, and Tutor.
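To pick up the other half of the columns, the slice can start at position 1 instead. The sketch below shows both variants on sample data; iloc[] is used for the offset version because a start position is more natural to express as an integer.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Shishir", "Pandas"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
    "Tutor": ["Michel", "Sam"],
})

# Every second column starting from the first one
from_first = df.loc[:, ::2]

# Every second column starting from the second one (position 1)
from_second = df.iloc[:, 1::2]

print(list(from_first.columns))   # ['Courses', 'Duration', 'Tutor']
print(list(from_second.columns))  # ['Fee', 'Discount']
```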

Pandas iloc[] to Select Column by Index or Position

Use pandas.DataFrame.iloc[] to select multiple columns from a DataFrame by their integer positions. The syntax [:, start:stop:step] defines a range of columns, where start is the position at which the slice begins (inclusive), stop is the position at which it ends (exclusive), and step is the step size between columns. Alternatively, use the syntax [:, [indices]], where you provide a list of integer positions.

Select Columns by Index Position

To select multiple columns from a DataFrame by their index positions, pass a list of positions to the iloc[] accessor. For instance, df.iloc[:,[1,3,4]] retrieves Fee, Discount, and Tutor and returns a new DataFrame with the selected columns.


# Select columns by position
df2 = df.iloc[:,[1,3,4]]
print("Select columns by position:\n", df2)

# Output:
# Select columns by position:
#     Fee  Discount   Tutor
# 0  20000      1000  Michel
# 1  25000      2300     Sam
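Positions passed as a list must all exist in the DataFrame. As a sketch of what happens otherwise, the example below requests position 4 from a four-column DataFrame (valid positions 0 to 3), and then shows that a position slice is clipped instead of failing. The names small and clipped are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Shishir", "Pandas"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
    "Tutor": ["Michel", "Sam"],
})

small = df.iloc[:, :4]  # same data without the fifth column

# A list containing an out-of-range position raises IndexError
try:
    small.iloc[:, [1, 3, 4]]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)  # True

# A slice that runs past the last column is clipped to what exists
clipped = small.iloc[:, 1:10]
print(list(clipped.columns))  # ['Fee', 'Duration', 'Discount']
```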

Select Columns by Position Range

You can also slice a DataFrame by a range of positions using the .iloc[] accessor. For instance, df.iloc[:,1:4] selects the columns at positions 1 through 3 (position 4 is excluded) and assigns them to df2.


# Select positions 1 through 3 (the 2nd, 3rd, and 4th columns)
df2 = df.iloc[:,1:4]
print("Select columns by position:\n", df2)

# Output:
# Select columns by position:
#     Fee Duration  Discount
# 0  20000   30days      1000
# 1  25000   40days      2300

# Select From 3rd to end
df2 = df.iloc[:,2:]
print("Select columns by position:\n", df2)

# Output:
# Select columns by position:
#  Duration  Discount   Tutor
# 0   30days      1000  Michel
# 1   40days      2300     Sam

# Select First Two Columns
df2 = df.iloc[:,:2]
print("Select columns by position:\n", df2)

# Output:
# Select columns by position:
#   Courses    Fee
# 0  Shishir  20000
# 1   Pandas  25000

To retrieve the last column of a DataFrame, you can use df.iloc[:,-1:], and to obtain just the first column, you can use df.iloc[:,:1].
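The trailing colon in these slices is what keeps the result a DataFrame. The sketch below contrasts the slice form with a plain integer position, which returns a Series instead, and also takes the last two columns with a negative start. Variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Shishir", "Pandas"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
    "Tutor": ["Michel", "Sam"],
})

last_as_frame = df.iloc[:, -1:]   # slice -> one-column DataFrame
last_as_series = df.iloc[:, -1]   # single integer -> Series
first_as_frame = df.iloc[:, :1]   # slice -> one-column DataFrame
last_two = df.iloc[:, -2:]        # negative start counts from the end

print(type(last_as_frame).__name__)   # DataFrame
print(type(last_as_series).__name__)  # Series
print(list(first_as_frame.columns))   # ['Courses']
print(list(last_two.columns))         # ['Discount', 'Tutor']
```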

Complete Example


import pandas as pd
technologies = {
    'Courses':["Shishir","Pandas"],
    'Fee' :[20000,25000],
    'Duration':['30days','40days'],
    'Discount':[1000,2300],
    'Tutor':['Michel','Sam']
              }
df = pd.DataFrame(technologies)
print(df)

# Select multiple columns
print(df[["Courses","Fee","Duration"]])

# Select Random columns
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select columns by range
print(df.loc[:,'Fee':'Discount']) 
print(df.loc[:,'Duration':])
print(df.loc[:,:'Duration'])

# Select every alternate column
print(df.loc[:,::2])

# Selected by column position
print(df.iloc[:,[1,3,4]])

# Select positions 1 through 3 (the 2nd, 3rd, and 4th columns)
print(df.iloc[:,1:4])

# Select From 3rd to end
print(df.iloc[:,2:])

# Select First Two Columns
print(df.iloc[:,:2])