Pandas – What is Series
Shishir Kant Singh, Mon, 27 Jan 2025

Pandas Series Introduction

This is a beginner’s guide to the Python pandas Series, where you will learn what a pandas Series is, its features and advantages, and how to use a Series, with sample examples.

Every example in this tutorial has been tested in our development environment and is available for reference.

All pandas Series examples provided in this tutorial are basic, simple, and easy to practice for beginners who are enthusiastic to learn pandas and advance their career in Data Science, analytics, and Machine Learning.

Note: If you can’t find the pandas Series example you are looking for on this page, use the Search option in the menu bar; there are hundreds of pandas tutorials on this website you can learn from.

What is a Pandas Series?

Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, etc.). It’s similar to a one-dimensional array or a list in Python, but with additional functionalities. Each element in a Pandas Series has a label associated with it, called an index. This index allows for fast and efficient data access and manipulation. Pandas Series can be created from various data structures like lists, dictionaries, NumPy arrays, etc.
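A minimal sketch of label-based access, using made-up values and labels: elements can be read by their index label, by integer position through .iloc, and the labels themselves are available through the index attribute.

```python
import pandas as pd

# A Series built from a list, with custom index labels (illustrative data)
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Access by label through the index
print(s["b"])          # 20
# Access by integer position with .iloc
print(s.iloc[0])       # 10
# The labels themselves live in s.index
print(list(s.index))   # ['a', 'b', 'c']
```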

Pandas Series vs DataFrame?

  • As I explained above, a pandas Series is a one-dimensional labeled array holding values of a single data type, whereas a DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
  • In a DataFrame, each column of data is represented as a pandas Series.
  • A DataFrame has named/labeled columns, whereas a Series has no columns; it can, however, carry an optional name (its name attribute), which becomes the column label when the Series is converted to a DataFrame.
  • A DataFrame can also be converted to a Series, and one or more Series can be combined into a DataFrame. Refer to the pandas DataFrame Tutorial for more details and examples on DataFrame.
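The relationship between the two structures can be checked in a few lines. This sketch uses a small, made-up DataFrame: selecting one column returns a Series whose name is the column label, and to_frame() turns that named Series back into a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})

# Selecting one column returns a Series whose name is the column label
fee = df["Fee"]
print(type(fee).__name__)  # Series
print(fee.name)            # Fee

# A named Series converts back to a one-column DataFrame with that label
print(fee.to_frame().columns.tolist())  # ['Fee']
```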

Syntax of pandas.Series()

Following is the syntax of the pandas.Series() constructor, which is used to create pandas Series objects.


# Pandas Series Constructor Syntax
pandas.Series(data=None, index=None, dtype=None, name=None, copy=False)
  • data – The data to be stored in the Series. It can be a list, ndarray, dictionary, scalar value (like an integer or string), etc.
  • index – Optional. The index labels for the Series. If not provided, default integer index labels (0, 1, 2, …) are used.
  • dtype – Optional. The data type of the Series. If not specified, it is inferred from the data.
  • name – Optional. A name for the Series, which becomes the column label if the Series is converted to a DataFrame.
  • copy – Optional. If True, it makes a copy of the input data. Default is False.
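To see the optional parameters in action, here is a short sketch with illustrative values: it sets index, dtype and name explicitly, and also shows that a scalar passed as data is repeated for every index label.

```python
import pandas as pd

# Explicit index, dtype and name (all optional)
s = pd.Series([1, 2, 3], index=["x", "y", "z"], dtype="float64", name="scores")
print(s)
# x    1.0
# y    2.0
# z    3.0
# Name: scores, dtype: float64

# A scalar is repeated for every label in the index
t = pd.Series(5, index=["a", "b", "c"])
print(t.tolist())  # [5, 5, 5]
```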

Create pandas Series

A pandas Series can be created in multiple ways: from an array, a list, a dict, or an existing DataFrame.

Create Series using array

Before creating a Series from an array, import the NumPy module and build the array with its array() function. If the data is an ndarray, any index you pass must have the same length as the data; if no index is passed, the default index is range(n), where n is the length of the data.


# Create Series from array
import pandas as pd 
import numpy as np
data = np.array(['python','php','java'])
series = pd.Series(data)
print (series)

# Output:
# 0    python
# 1       php
# 2      java
# dtype: object

Notice that the values don’t have a column name, and by default the Series uses an incremental integer sequence (0, 1, 2, …) as its index, shown in the first column of the output.

To customize the index of a Pandas Series, you can provide the index parameter when creating the Series using the pd.Series() constructor.


# Create pandas Series with custom index
s2 = pd.Series(data=data, index=['r1', 'r2', 'r3'])
print(s2)

# Output:
# r1    python
# r2       php
# r3      java
# dtype: object
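The length rule for ndarray input mentioned earlier can be checked directly. In this sketch, an index with fewer labels than values makes pandas raise a ValueError, while omitting the index yields the default RangeIndex.

```python
import numpy as np
import pandas as pd

data = np.array(["python", "php", "java"])

# Three values but only two labels: pandas refuses to build the Series
try:
    pd.Series(data, index=["r1", "r2"])
except ValueError as err:
    print("ValueError:", err)

# Without an index, pandas uses the default RangeIndex
print(pd.Series(data).index)  # RangeIndex(start=0, stop=3, step=1)

# With a matching custom index, elements can be read back by label
s2 = pd.Series(data, index=["r1", "r2", "r3"])
print(s2["r2"])  # php
```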

Create Series using Dict

A dict can also be used as input: its keys are used as the index labels and its values as the Series values.


# Create a Series from a dict
data = {'Courses' :"pandas", 'Fees' : 20000, 'Duration' : "30days"}
s2 = pd.Series(data)
print (s2)

# Output:
# Courses     pandas
# Fees         20000
# Duration    30days
# dtype: object

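Now let’s see what happens when you pass an index along with a dict. The index does not rename the dict keys; instead, it selects the values whose keys match the given labels, and any label not found in the dict gets NaN.


# Passing an index with a dict selects values by key
data = {'Courses' :"pandas", 'Fees' : 20000, 'Duration' : "30days"}
s2 = pd.Series(data, index=['Courses','Course_Fee','Course_Duration'])
print (s2)

# Output:
# Courses            pandas
# Course_Fee            NaN
# Course_Duration       NaN
# dtype: object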
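A short sketch contrasting selecting with relabeling, reusing the dict from above: the index picks and orders keys (unknown labels become NaN), while Series.rename() with a dict is one way to actually change the index labels after the Series is built.

```python
import pandas as pd

data = {"Courses": "pandas", "Fees": 20000, "Duration": "30days"}

# The index selects (and orders) dict keys; unknown labels become NaN
s = pd.Series(data, index=["Duration", "Courses", "Course_Fee"])
print(s.tolist())           # ['30days', 'pandas', nan]
print(s.isna().tolist())    # [False, False, True]

# To relabel instead of select, build the Series first, then rename its index
renamed = pd.Series(data).rename({"Fees": "Course_Fee", "Duration": "Course_Duration"})
print(renamed.index.tolist())  # ['Courses', 'Course_Fee', 'Course_Duration']
```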

Create Series using List

Below is an example of creating a Series from a list.


# Creating Series from List
data = ['python','php','java']
s2 = pd.Series(data, index=['r1', 'r2','r3'])
print(s2)

# Output:
# r1    python
# r2       php
# r3      java
# dtype: object

Create Empty Series

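Sometimes you need to create an empty Series. You can do so by calling the constructor without any data.


# Create empty Series
import pandas as pd
# Passing dtype explicitly avoids a default-dtype warning in older pandas versions
s = pd.Series(dtype=object)
print(s)

# Output:
# Series([], dtype: object)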

Convert a Series into a DataFrame

To combine Series into a DataFrame, you can use pandas.concat(), pandas.merge(), or DataFrame.join(). Below, I have used concat() with axis=1, which places each named Series side by side as a column. For the other approaches, please refer to pandas combine two Series to DataFrame.


# Convert series to dataframe
courses = pd.Series(["Spark","PySpark","Hadoop"], name='courses')
fees = pd.Series([22000,25000,23000], name='fees')
df=pd.concat([courses,fees],axis=1)
print(df)

# Output:
#   courses   fees
# 0    Spark  22000
# 1  PySpark  25000
# 2   Hadoop  23000
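When there is only one Series to convert, concat() is not needed. A minimal sketch with the courses Series from above: to_frame() gives a one-column DataFrame labeled with the Series name, and reset_index() additionally turns the index into a column.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"], name="courses")

# A single named Series becomes a one-column DataFrame
df1 = courses.to_frame()
print(df1.columns.tolist())  # ['courses']
print(df1.shape)             # (3, 1)

# Series.reset_index() is another option: the index becomes a column too
df2 = courses.reset_index()
print(df2.columns.tolist())  # ['index', 'courses']
```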

Convert pandas DataFrame to Series

In this section of the pandas Series tutorial, I will explain different ways to convert a DataFrame to a Series. As explained at the beginning, each column in a DataFrame is essentially a Series, so we can easily extract one or more columns from a DataFrame, each as a Series object.

  1. You can convert a single-column DataFrame into a Series by extracting that single column.
  2. To obtain a Series from a specific column in a multi-column DataFrame, simply access that column using its name.
  3. To convert a single row of a DataFrame into a Series, use indexing (for example, .iloc[]) to select the row; it is returned as a Series.

Convert a single DataFrame column into a series

To try converting a single DataFrame column into a Series, first create a DataFrame with one column, then use DataFrame.squeeze() to convert it into a Series:


# Create DataFrame with single column
data =  ["Python","PHP","Java"]
df = pd.DataFrame(data, columns = ['Courses'])
my_series = df.squeeze()
print(my_series)
print (type(my_series))

The single-column DataFrame is now converted into a Series:


# Output:
# 0    Python
# 1       PHP
# 2      Java
# Name: Courses, dtype: object
# <class 'pandas.core.series.Series'>

Convert a DataFrame column into a Series

Selecting a single column by its name, df['column'], already returns a Series; you can call .squeeze() on it, but it is not required.

For example, take a DataFrame with multiple columns:


# Create DataFrame with multiple columns
import pandas as pd
data = {'Courses': ['Spark', 'PySpark', 'Python'],
        'Duration':['30 days', '40 days', '50 days'],
        'Fee':[20000, 25000, 26000]
        }
df = pd.DataFrame(data, columns = ['Courses', 'Duration', 'Fee'])
print(df)
print (type(df))

Selecting the Fee column of df returns it as a Series, stored here in my_series. Calling .squeeze() on a Series with more than one element returns the Series unchanged, so it is optional here.


# Pandas DataFrame column to series
my_series = df['Fee']
print(type(my_series))
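A small sketch of when squeeze() actually changes anything, using a made-up two-column DataFrame: on a multi-element Series it is a no-op, on a one-element Series it returns a plain scalar, and double brackets return a DataFrame rather than a Series.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python"],
                   "Fee": [20000, 25000, 26000]})

# Column selection already yields a Series
fee = df["Fee"]
print(type(fee).__name__)  # Series

# squeeze() leaves a multi-element Series unchanged...
print(fee.squeeze().equals(fee))  # True
# ...but reduces a one-element Series to a plain scalar
print(df["Fee"].head(1).squeeze())  # 20000

# Double brackets return a DataFrame instead of a Series
print(type(df[["Fee"]]).__name__)  # DataFrame
```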

Convert DataFrame Row into a Series

Selecting a row by its integer position with .iloc[] already returns a Series, so no conversion is needed. Calling .squeeze() on that row would leave it unchanged; it only reduces a Series to a scalar when the Series has a single element.


# Convert dataframe row to series
my_series = df.iloc[2]
print(my_series)
print(type(my_series))

This returns the following Series:


# Output:
# Courses      Python
# Duration    50 days
# Fee           26000
# Name: 2, dtype: object
# <class 'pandas.core.series.Series'>
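Why is the row above of dtype object? A row Series must hold one value from every column, so its dtype is the common type of those columns. This sketch, with illustrative data, compares a mixed-type DataFrame with an all-integer one and also selects by label with .loc.

```python
import pandas as pd

mixed = pd.DataFrame({"Courses": ["Spark", "Python"], "Fee": [20000, 26000]})
numeric = pd.DataFrame({"Fee": [20000, 26000], "Discount": [1000, 1500]})

# A row from a mixed-type DataFrame must hold strings and ints -> object dtype
print(mixed.iloc[1].dtype)    # object
# A row from an all-integer DataFrame keeps an integer dtype
print(numeric.iloc[1].dtype)  # int64

# Rows and single cells can also be selected by index label with .loc
print(mixed.loc[1, "Fee"])    # 26000
```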

Merge DataFrame and Series?

  1. Construct a DataFrame from the Series: use the Series values as row data, repeated once for each row of the DataFrame, and use the Series index as the column labels.
  2. Merge it with the original DataFrame on the row index by setting both left_index and right_index to True.

# Syntax for merge with the DataFrame
df.merge(pd.DataFrame(data=[s.values] * len(df), columns=s.index, index=df.index), left_index=True, right_index=True)
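A runnable version of the merge recipe above, assuming a small made-up DataFrame and a hypothetical Series of extra attributes that should be attached to every row. Passing index=df.index keeps the helper DataFrame aligned with df even when df does not use the default integer index.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})
# A hypothetical Series of extra attributes to attach to every row
s = pd.Series({"Duration": "30days", "Discount": 1000})

# Repeat the Series values once per DataFrame row; s.index becomes the columns
extra = pd.DataFrame([s.values] * len(df), columns=s.index, index=df.index)
result = df.merge(extra, left_index=True, right_index=True)

print(result.columns.tolist())  # ['Courses', 'Fee', 'Duration', 'Discount']
print(result.shape)             # (2, 4)
```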

Pandas Series Attributes:

  • T – Return the transpose, which is by definition self.
  • array – The ExtensionArray of the data backing this Series or Index.
  • at – Access a single value for a row/column label pair.
  • attrs – Dictionary of global attributes of this dataset.
  • axes – Return a list of the row axis labels.
  • dtype – Return the dtype object of the underlying data.
  • dtypes – Return the dtype object of the underlying data.
  • flags – Get the properties associated with this pandas object.
  • hasnans – Return True if the Series has any NaNs; enables various performance speedups.
  • iat – Access a single value for a row/column pair by integer position.
  • iloc – Purely integer-location based indexing for selection by position.
  • index – The index (axis labels) of the Series.
  • is_monotonic – Return boolean if values in the object are monotonically increasing (deprecated in pandas 1.5 and removed in 2.0; use is_monotonic_increasing).
  • is_monotonic_decreasing – Return boolean if values in the object are monotonically decreasing.
  • is_monotonic_increasing – Return boolean if values in the object are monotonically increasing.
  • is_unique – Return boolean if values in the object are unique.
  • loc – Access a group of rows and columns by label(s) or a boolean array.
  • name – Return the name of the Series.
  • nbytes – Return the number of bytes in the underlying data.
  • ndim – Number of dimensions of the underlying data, by definition 1.
  • shape – Return a tuple of the shape of the underlying data.
  • size – Return the number of elements in the underlying data.
  • values – Return Series as ndarray or ndarray-like depending on the dtype.
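A quick tour of several of these attributes on one small Series with illustrative values, including a NaN and a duplicated value so that hasnans and is_unique have something to report.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, 1.0, np.nan, 1.0], index=["a", "b", "c", "d"], name="demo")

print(s.shape, s.size, s.ndim)  # (4,) 4 1
print(s.hasnans)                # True
print(s.is_unique)              # False
print(s.index.tolist())         # ['a', 'b', 'c', 'd']
print(s.at["a"], s.iat[1])      # 3.0 1.0
print(s.name, s.dtype)          # demo float64
```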

Pandas Series Methods:

  • abs() – Return a Series/DataFrame with absolute numeric value of each element.
  • add(other[, level, fill_value, axis]) – Return addition of series and other, element-wise (binary operator add).
  • add_prefix(prefix) – Prefix labels with string prefix.
  • add_suffix(suffix) – Suffix labels with string suffix.
  • agg([func, axis]) – Aggregate using one or more operations over the specified axis.
  • aggregate([func, axis]) – Aggregate using one or more operations over the specified axis.
  • align(other[, join, axis, level, copy, …]) – Align two objects on their axes with the specified join method.
  • all([axis, bool_only, skipna, level]) – Return whether all elements are True, potentially over an axis.
  • any([axis, bool_only, skipna, level]) – Return whether any element is True, potentially over an axis.
  • append(to_append[, ignore_index, …]) – Concatenate two or more Series (deprecated in pandas 1.4 and removed in 2.0; use pandas.concat()).
  • apply(func[, convert_dtype, args]) – Invoke function on values of Series.
  • argmax([axis, skipna]) – Return int position of the largest value in the Series.
  • argmin([axis, skipna]) – Return int position of the smallest value in the Series.
  • argsort([axis, kind, order]) – Return the integer indices that would sort the Series values.
  • asfreq(freq[, method, how, normalize, …]) – Convert time series to specified frequency.
  • asof(where[, subset]) – Return the last row(s) without any NaNs before where.
  • astype(dtype[, copy, errors]) – Cast a pandas object to a specified dtype.
  • at_time(time[, asof, axis]) – Select values at particular time of day (e.g., 9:30AM).
  • autocorr([lag]) – Compute the lag-N autocorrelation.
  • backfill([axis, inplace, limit, downcast]) – Synonym for DataFrame.fillna() with method='bfill'.
  • between(left, right[, inclusive]) – Return boolean Series equivalent to left <= series <= right.
  • between_time(start_time, end_time[, …]) – Select values between particular times of the day (e.g., 9:00-9:30 AM).


  • bfill([axis, inplace, limit, downcast]) – Synonym for DataFrame.fillna() with method='bfill'.
  • bool() – Return the bool of a single element Series or DataFrame.
  • cat – Alias of pandas.core.arrays.categorical.CategoricalAccessor.
  • clip([lower, upper, axis, inplace]) – Trim values at input threshold(s).
  • combine(other, func[, fill_value]) – Combine the Series with a Series or scalar according to func.
  • combine_first(other) – Update null elements with value in the same location in 'other'.
  • compare(other[, align_axis, keep_shape, …]) – Compare to another Series and show the differences.
  • convert_dtypes([infer_objects, …]) – Convert columns to best possible dtypes using dtypes supporting pd.NA.
  • copy([deep]) – Make a copy of this object’s indices and data.
  • corr(other[, method, min_periods]) – Compute correlation with other Series, excluding missing values.
  • count([level]) – Return number of non-NA/null observations in the Series.
  • cov(other[, min_periods, ddof]) – Compute covariance with Series, excluding missing values.
  • cummax([axis, skipna]) – Return cumulative maximum over a DataFrame or Series axis.
  • cummin([axis, skipna]) – Return cumulative minimum over a DataFrame or Series axis.
  • cumprod([axis, skipna]) – Return cumulative product over a DataFrame or Series axis.
  • cumsum([axis, skipna]) – Return cumulative sum over a DataFrame or Series axis.
  • describe([percentiles, include, exclude, …]) – Generate descriptive statistics.
  • diff([periods]) – First discrete difference of element.
  • div(other[, level, fill_value, axis]) – Return floating division of series and other, element-wise (binary operator truediv).
  • divide(other[, level, fill_value, axis]) – Return floating division of series and other, element-wise (binary operator truediv).
  • divmod(other[, level, fill_value, axis]) – Return integer division and modulo of series and other, element-wise (binary operator divmod).
  • dot(other) – Compute the dot product between the Series and the columns of other.
  • drop([labels, axis, index, columns, level, …]) – Return Series with specified index labels removed.
  • drop_duplicates([keep, inplace]) – Return Series with duplicate values removed.
  • droplevel(level[, axis]) – Return Series/DataFrame with requested index / column level(s) removed.
  • dropna([axis, inplace, how]) – Return a new Series with missing values removed.
  • dt – Alias of pandas.core.indexes.accessors.CombinedDatetimelikeProperties.
  • duplicated([keep]) – Indicate duplicate Series values.
  • eq(other[, level, fill_value, axis]) – Return equal to of series and other, element-wise (binary operator eq).
  • equals(other) – Test whether two objects contain the same elements.
  • ewm([com, span, halflife, alpha, …]) – Provide exponential weighted (EW) functions.
  • expanding([min_periods, center, axis, method]) – Provide expanding transformations.
  • explode([ignore_index]) – Transform each element of a list-like to a row.
  • factorize([sort, na_sentinel]) – Encode the object as an enumerated type or categorical variable.
  • ffill([axis, inplace, limit, downcast]) – Synonym for DataFrame.fillna() with method='ffill'.
  • fillna([value, method, axis, inplace, …]) – Fill NA/NaN values using the specified method.
  • filter([items, like, regex, axis]) – Subset the rows according to the specified index labels.
  • first(offset) – Select initial periods of time series data based on a date offset.
  • first_valid_index() – Return index for first non-NA value or None, if no non-NA value is found.
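A few of the methods listed above applied to one small, made-up integer Series. Because append() no longer exists in pandas 2.x, the sketch uses pd.concat() to add values instead.

```python
import pandas as pd

s = pd.Series([4, -2, 4, 7])

print(s.abs().tolist())              # [4, 2, 4, 7]
print(s.cumsum().tolist())           # [4, 2, 6, 13]
print(s.between(0, 5).tolist())      # [True, False, True, False]
print(s.drop_duplicates().tolist())  # [4, -2, 7]
print(s.argmax())                    # 3

# append() was removed in pandas 2.0; pd.concat() combines Series instead
combined = pd.concat([s, pd.Series([10])], ignore_index=True)
print(combined.tolist())             # [4, -2, 4, 7, 10]
```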

Conclusion

In this pandas Series tutorial, we have learned what a pandas Series is, how to create a pandas Series from different types of input, and how to convert a Series to a DataFrame and vice versa, with working examples.

Pandas Introduction
Mon, 27 Jan 2025

This is a beginner’s guide to the Python pandas DataFrame, where you will learn what a DataFrame is, its features and advantages, and how to use a DataFrame, with sample examples.

Every sample example explained in this tutorial is tested in our development environment and is available for reference.

All pandas DataFrame examples provided in this tutorial are basic, simple, and easy to practice for beginners who are enthusiastic to learn about Pandas and advance their careers in Data Science, Analytics, and Machine Learning.

Note: If you can’t find the pandas DataFrame example you are looking for on this page, use the Search option in the menu bar; there are hundreds of pandas tutorials on this website you can learn from.

2. What is Python Pandas?

Pandas is one of the most popular open-source libraries for the Python programming language and is widely used for data science/data analysis and machine learning applications. It is built on top of another popular package, NumPy, which provides scientific computing in Python and supports multi-dimensional arrays. Pandas was originally developed by Wes McKinney; check his GitHub for other projects he is working on.

Following are the main data structures supported by pandas.

  • pandas Series
  • pandas DataFrame
  • pandas Index

2.1 What is Pandas Series

In simple words, a pandas Series is a one-dimensional labeled array that holds any data type (integers, strings, floating-point numbers, None, Python objects, etc.). The axis labels are collectively referred to as the index. A later section of this pandas tutorial covers the Series in more detail with examples.

2.2 What is Pandas DataFrame

A pandas DataFrame is a 2-dimensional labeled data structure with rows and columns (columns of potentially different types like integers, strings, float, None, Python objects, etc.). You can think of it like a spreadsheet, an SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. A later section of this pandas tutorial covers DataFrame in more detail with examples.
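The "dict of Series" view can be made concrete with a small sketch (illustrative data): each dict key becomes a column label, the Series are aligned on their shared index, and selecting a column gives a Series back.

```python
import pandas as pd

# Two Series sharing the same index (illustrative data)
fees = pd.Series([20000, 25000], index=["Spark", "PySpark"])
days = pd.Series([30, 40], index=["Spark", "PySpark"])

# A dict of Series becomes a DataFrame: keys -> column labels, index aligned
df = pd.DataFrame({"Fee": fees, "Duration": days})
print(df.shape)                       # (2, 2)
print(df.loc["PySpark", "Fee"])       # 25000
print(type(df["Duration"]).__name__)  # Series
```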

3. Pandas Advantages

4. Pandas vs PySpark

In very simple words, pandas runs operations on a single machine, whereas PySpark can run them across multiple machines. If you are working on a machine learning application that deals with larger datasets, PySpark is usually the better choice, since it processes data in parallel and can be many times faster than pandas on large data.

PySpark is also widely used in the data science and machine learning community, as many widely used data science libraries are written in Python, including NumPy and TensorFlow. PySpark is also chosen for its efficient processing of large datasets, and it has been used by many organizations such as Walmart, Trivago, Sanofi, Runtastic, and many more.

PySpark is a Spark library written in Python to run Python applications using Apache Spark capabilities. Using PySpark, we can run applications in parallel on a distributed cluster (multiple nodes) or even on a single node.

Apache Spark is an analytical processing engine for large-scale, powerful distributed data processing and machine learning applications.

Spark was originally written in Scala; later, due to its industry adoption, its Python API, PySpark, was released, built on Py4J. Py4J is a Java library integrated within PySpark that allows Python to dynamically interface with JVM objects; hence, to run PySpark you also need Java installed along with Python and Apache Spark.

Additionally, for development you can use the Anaconda distribution (widely used in the machine learning community), which comes with many useful tools like the Spyder IDE and Jupyter Notebook to run PySpark applications.

4.1 How to Decide Between Pandas vs PySpark

Below are a few considerations when choosing PySpark over Pandas.

  • If your data is huge and grows significantly over the years, and you want to improve your processing time.
  • If you need fault tolerance.
  • If you need ANSI SQL compatibility.
  • The language you want to use (Spark supports Python, Scala, Java & R).
  • When you want machine learning capability.
  • If you would like to read Parquet, Avro, Hive, Cassandra, Snowflake, etc.
  • If you want to stream the data and process it in real time.

5. Installing Pandas

In this section of the Python pandas tutorial, let’s see how to install and upgrade pandas. To run pandas, you need Python installed first. You can install Python either by downloading it directly from the official Python website or by using the Anaconda distribution. Depending on your needs, follow the links below to install Python, Anaconda, and Jupyter Notebook to run pandas examples. I recommend installing Anaconda with Jupyter if you intend to learn pandas for data science, analytics, and machine learning.

  • Step-by-Step Instruction of Install Anaconda & Pandas
  • Run pandas from Anaconda & Jupyter Notebook
  • Install Python & Run pandas from Windows

Once you have either Python or Anaconda setup, you can install pandas on top of Python or Anaconda in simple steps.

5.1 Install Pandas using Python pip Command

pip (Python package manager) is used to install third-party packages from PyPI. Using pip you can install/uninstall/upgrade/downgrade any python library that is part of Python Package Index.

Since the pandas package is available on PyPI (Python Package Index), you can use pip to install the latest version of pandas on Windows.


# Install pandas using pip
pip install pandas
# or
pip3 install pandas

If your pip is not up to date, upgrade pip to the latest version first (for example, with python -m pip install --upgrade pip).

5.2 Install Pandas using Anaconda conda Command

Anaconda distribution comes with a conda tool that is used to install/upgrade/downgrade most of the python and other packages.


# Install pandas using conda
conda install pandas

6. Upgrade Pandas to Latest or Specific Version

In order to upgrade pandas to the latest or a specific version, you can use either the pip install command or conda install if you are using the Anaconda distribution. Before you start the upgrade, check which version of pandas is currently installed, for example with pip show pandas, or with pd.__version__ as shown in the Hello World section below.

Below are statements to upgrade pandas. Depending on how you want to update, use either the pip or the conda statements.


# Using pip to upgrade pandas
pip install --upgrade pandas

# Alternatively you can also try
python -m pip install --upgrade pandas

# Upgrade pandas to specific version
pip install pandas==<specific-higher-version>

# Use conda update
conda update pandas

# Upgrade to specific version (conda update accepts only package names, so use conda install)
conda install pandas=0.14.0


7. Run Pandas Hello World Example

7.1 Run Pandas From Command Line

If you installed Anaconda, open the Anaconda command line; otherwise, open the Python shell or a command prompt. Then enter the following lines to get the version of pandas.


>>> import pandas as pd
>>> pd.__version__
'1.3.2'
>>>

7.2 Run Pandas From Jupyter

Go to Anaconda Navigator -> Environments -> your environment (I have created pandas-tutorial) -> select Open With Jupyter Notebook

This opens Jupyter Notebook in the default browser.

Now select New -> PythonX, enter the same lines as in the command-line example above (import pandas as pd and pd.__version__), and select Run.

7.3 Run Pandas from IDE

You can also run pandas from any Python IDE, such as Spyder or PyCharm.

8. Pandas Series Introduction

A pandas Series is a one-dimensional array that can accommodate diverse data types, including integers, strings, floats, Python objects, and more. Using the pd.Series() constructor, we can convert lists, tuples, dictionaries, and NumPy arrays into Series objects. Within a pandas Series, the row labels are referred to as the index. Note that a Series consists of a single column of values and cannot hold multiple columns.

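8.1. pandas.Series() Constructor

Below is the syntax of the pandas Series constructor, which is used to create a Series object.


# Pandas Series Constructor Syntax
pandas.Series(data=None, index=None, dtype=None, name=None, copy=False)
  • data: The data to store, such as an ndarray, a list, a dict, or a constant (scalar).
  • index: The index labels. They must be hashable and match the length of the data, but they do not have to be unique. If no index is passed, np.arange(n) (a RangeIndex) is used.
  • dtype: The data type of the Series. If not specified, it is inferred from the data.
  • name: An optional name for the Series.
  • copy: Whether to copy the input data. Default is False.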

8.2 . Create Pandas Series

A pandas Series can be created in multiple ways: from an array, a list, a dict, or an existing DataFrame.

8.2.1 Creating Series from NumPy Array


# Create Series from array
import pandas as pd 
import numpy as np
data = np.array(['python','php','java'])
series = pd.Series(data)
print (series)

8.2.2 Creating Series from Dict


# Create a Series from a dict
data = {'Courses' :"pandas", 'Fees' : 20000, 'Duration' : "30days"}
s2 = pd.Series(data)
print (s2)

8.2.3 Creating Series from List


# Creating Series from List
data = ['python','php','java']
s2 = pd.Series(data, index=['r1', 'r2','r3'])
print(s2)

9. Pandas DataFrame

I have a dedicated tutorial for the Python pandas DataFrame, so in this section I will briefly explain what a DataFrame is. A DataFrame is a two-dimensional, size-mutable, heterogeneous tabular data structure with labeled axes (rows and columns). A pandas DataFrame consists of three principal components: the data, the rows, and the columns.

9.1 DataFrame Features

  • DataFrames support named rows and columns (you can also provide names to rows).
  • Pandas DataFrame size is mutable.
  • Supports heterogeneous collections of data.
  • DataFrame has labeled axes (rows and columns).
  • Can perform arithmetic operations on rows and columns.
  • Supports reading flat files like CSV, Excel, and JSON, and also reading SQL tables.
  • Handling of missing data.
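Several of the features above (reading a flat file, labeled columns, missing data handling) fit in one small sketch. To stay self-contained it reads CSV text from an in-memory buffer (io.StringIO) instead of a file on disk; the data is made up.

```python
import io
import pandas as pd

# In-memory CSV text standing in for a file on disk (illustrative data)
csv_text = """Courses,Fee,Duration
Spark,20000,30days
PySpark,,40days
Python,26000,
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.isna().sum().tolist())  # [0, 1, 1]

# Fill missing fees with 0, then drop rows that still have gaps
df["Fee"] = df["Fee"].fillna(0)
print(df.dropna().shape)         # (2, 3)

# Arithmetic on a whole column works element-wise
print(df["Fee"].sum())           # 46000.0
```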

10. Pandas Series vs DataFrame?

Here is a comparison between pandas Series and DataFrames.

  • Dimensionality – Series: one-dimensional. DataFrame: two-dimensional.
  • Structure – Series: labeled array. DataFrame: labeled data structure with rows and columns.
  • Components – Series: data and index. DataFrame: data, row index, and column index.
  • Data types – Series: homogeneous (same data type). DataFrame: heterogeneous (different data types per column).
  • Creation – Series: from lists, arrays, dictionaries, or scalars. DataFrame: from dictionaries, arrays, lists, or other DataFrames.
  • Operations – Series: indexing, slicing, and arithmetic operations. DataFrame: merging, joining, grouping, and reshaping.
  • Use cases – Series: representing a single column of data or simple data structures. DataFrame: tabular data with multiple columns and rows.
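A brief sketch of the dimensionality and data-type rows of this comparison, using made-up data: a Series reports one dtype and ndim 1, while a DataFrame reports ndim 2 and a dtype per column, and supports grouping.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"name": ["a", "b", "c"], "qty": [1, 2, 3], "price": [0.5, 1.0, 1.5]})

print(s.ndim, df.ndim)     # 1 2
print(s.dtype)             # int64: one dtype for the whole Series
print(df.dtypes)           # one dtype per column

# Grouping is a typical DataFrame operation
print(df.groupby("name")["qty"].sum().tolist())  # [1, 2, 3]
```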
Python CGI Programming
Thu, 28 Sep 2023

The Concept of CGI

CGI is an abbreviation for Common Gateway Interface. It is not a programming language but a set of rules (a specification) that defines how a web server runs an external program to produce a dynamic response for the client application (or the browser). CGI programs act as a bridge between the web server and the client: when the client browser makes a request, the web server passes it to the CGI program, which returns output to the web server based on the input the client provided, and the server sends that output back to the browser.

Common Gateway Interface (CGI) provides a standard for external gateway programs to interface with information servers such as an HTTP server.

CGI programs generate web pages dynamically, either in response to input from the user or by interacting with software on the server.

The Concept of Web Browsing

Have you ever wondered how the blue, underlined texts commonly known as hyperlinks are able to take you from one web page, or Uniform Resource Locator (URL), to another? What exactly happens when a user clicks on a hyperlink?

Let’s understand the very concept behind web browsing. Web browsing consists of some steps that are as follows:

STEP 1: First, the browser contacts the HTTP server and requests the URL (that is, a filename).

STEP 2: The web server parses the URL.

STEP 3: The server then looks for the specified filename.

STEP 4: If it finds that file, the server sends it back to the browser; otherwise, it sends an error message.

STEP 5: The web browser receives the response from the web server.

STEP 6: Depending on the server’s response, the browser displays either the requested file or the error message.

However, it is possible to set up the HTTP server so that whenever a file in a specific directory is requested, that file is executed as a program instead of being sent back, and the output of that program is sent back to the browser. This mechanism is known as the Common Gateway Interface, abbreviated as CGI. These programs are known as CGI scripts, and they can be C or C++ programs, shell scripts, Perl scripts, Python scripts, etc.

The working of CGI

Whenever a client sends a request to the web server, the Common Gateway Interface (CGI) handles the request using external script files. These scripts can be written in any language; the main idea is to retrieve the requested data efficiently and quickly. The scripts then format the retrieved data as an HTML page, which the web server sends back to the client.
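A minimal sketch of what such a script does, assuming a GET request whose parameters arrive in the QUERY_STRING environment variable. It deliberately uses urllib.parse instead of the cgi module (which was removed in Python 3.13), and handle_request is a hypothetical helper name chosen for illustration, not part of any library.

```python
import html
import os
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI response from the request environment (a sketch)."""
    # The web server passes GET parameters in the QUERY_STRING variable
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["World"])[0]
    # Escape user input before embedding it in HTML
    body = f"<html><body><h1>Hello, {html.escape(name)}!</h1></body></html>"
    # A CGI response is header lines, a blank line, then the body
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # Under a real web server, os.environ holds the CGI variables
    print(handle_request(os.environ), end="")
```

Keeping the logic in a function that takes the environment as a plain mapping makes it easy to try out without a web server, for example handle_request({"QUERY_STRING": "name=Ann"}).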

An architectural diagram representing the working of CGI is shown below:

Usage of cgi module

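Python provides the cgi module, which offers numerous useful functions and classes for CGI scripts. To use them, import the cgi module in your program as shown below. (The cgi and cgitb modules were deprecated in Python 3.11 and removed in Python 3.13, so the examples in this section require Python 3.12 or earlier.)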

import cgi

Now, we will use cgitb.enable() in our script to activate a special exception handler that displays a detailed report in the web browser for any errors that occur. The code looks as shown below:

import cgitb

cgitb.enable()

To save the reports to files instead of displaying them in the browser, use the following script.

import cgitb 
cgitb.enable(display=0, logdir="/path/to/logdir")

The cgitb functions shown above help throughout script development, because the reports let you debug the script efficiently. Once the script produces the expected result, you can remove these calls.

A user can submit information through an HTML form, but how can a CGI script obtain that information? To answer this question, let’s look at the FieldStorage class of the cgi module. If the form contains non-ASCII characters, pass the encoding keyword parameter, set to the encoding of the document; it is usually declared in the <META> tag inside the <HEAD> section of the HTML file.

The FieldStorage class is used to read the form data from the environment or the standard input.

A FieldStorage instance behaves like a Python dictionary: it supports the in operator, the keys() method, and the built-in len() function. Fields whose values are empty strings are ignored by default; to keep them, pass the optional keyword parameter keep_blank_values=True.
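The keep_blank_values behavior can be tried without a web server. This sketch uses urllib.parse.parse_qs, a standard-library function that takes a keep_blank_values parameter with the same meaning; it is a stand-in for FieldStorage, not FieldStorage itself, and the query string is made up.

```python
from urllib.parse import parse_qs

query = "name=Ann&add=&username=a&username=b"

# Like FieldStorage, parse_qs drops empty values unless asked to keep them
print(parse_qs(query))
# {'name': ['Ann'], 'username': ['a', 'b']}
print(parse_qs(query, keep_blank_values=True))
# {'name': ['Ann'], 'add': [''], 'username': ['a', 'b']}
```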

Let’s see an example:

import cgi

def main():
    print("Content-Type: text/html")
    print()  # A blank line ends the HTTP headers
    form = cgi.FieldStorage()
    # Both the name and add fields are required
    if "name" not in form or "add" not in form:
        print("<H1>Input Error!!</H1>")
        print("Please enter the details in the Name and Address fields!")
        return
    print("<p>Name:", form["name"].value)
    print("<p>Address:", form["add"].value)
    # The execution of next lines of code will start here...

main()

In the above snippet of code, we have utilized the form [“name”], where name is key, for extracting the value which the user enters.

To promptly fetch the string value, we can utilize the getvalue() method. This method also takes a second argument by default. And if the key is not available, it will return the value as default.

Moreover, if the information in the submitted form has multiple fields with the same name, we should take the help of the form.getlist() function. This function helps in returning the list of strings. Now let’s look at the following snippet of code; we have added some username fields and separate them by commas.

values = form.getlist("username")
f_username = ",".join(values)
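The cgi module was deprecated in Python 3.11 and removed in Python 3.13, so the following sketch does not import it. Instead, it assumes urllib.parse.parse_qs (which even accepts the same keep_blank_values parameter) as a stand-in for parsing the submitted data, and defines hypothetical helpers getvalue(), getfirst(), and getlist() that imitate the FieldStorage methods of the same names. It illustrates their behavior; it is not the FieldStorage implementation.

```python
from urllib.parse import parse_qs

# Hypothetical helpers imitating FieldStorage.getvalue/getfirst/getlist
# on top of a dict produced by urllib.parse.parse_qs (name -> list of values).

def getvalue(form, key, default=None):
    values = form.get(key)
    if values is None:
        return default  # missing key: return the default
    # One value -> a string; several values -> a list (like FieldStorage)
    return values[0] if len(values) == 1 else values

def getfirst(form, key, default=None):
    values = form.get(key)
    return values[0] if values else default

def getlist(form, key):
    # Always a list, empty when the field is absent
    return list(form.get(key, []))

form = parse_qs("name=Ann&username=ann&username=ann2")
print(getvalue(form, "name"))               # Ann
print(getvalue(form, "add", "unknown"))     # unknown
print(getfirst(form, "username"))           # ann
print(",".join(getlist(form, "username")))  # ann,ann2
```

With a single value, getvalue() returns a string, but with repeated fields it returns a list, which is why getlist() is the safer choice for fields such as username.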

If a field holds an uploaded file, the value attribute or the getvalue() method reads the entire file into memory as bytes. To process the file piece by piece instead, test the file attribute and read from it, as the following snippet does to count the lines of the uploaded file:

fileitem = form["userfile"]
if fileitem.file:
    # It represents the uploaded file; count its lines
    count_line = 0
    while True:
        line = fileitem.file.readline()
        if not line:
            break
        count_line = count_line + 1
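The same line-counting loop can be tried without a web server. In this sketch, count_lines() is a hypothetical helper, and an io.BytesIO object stands in for fileitem.file, which is assumed to be a binary file-like object:

```python
import io

def count_lines(fileobj):
    """Count lines by reading a binary file-like object one line at a time."""
    count_line = 0
    while True:
        line = fileobj.readline()
        if not line:  # b"" means end of file
            break
        count_line += 1
    return count_line

# io.BytesIO stands in for fileitem.file in this demo
upload = io.BytesIO(b"first line\nsecond line\nthird line")
print(count_lines(upload))  # 3
```

Reading line by line keeps only one line in memory at a time, unlike the value attribute, which loads the whole file.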

Reading the contents of an uploaded file can be interrupted, for example when the user clicks the Back or Cancel button during submission. In that case, the done attribute of the field's object is set to -1.

If the form is submitted in the "old" format (as a query string, or as a single data part of type application/x-www-form-urlencoded), the items are instances of the MiniFieldStorage class instead. In this class, the list, file, and filename attributes are always None.

A form submitted with the POST method that also has a query string contains both MiniFieldStorage and FieldStorage items.

The following table lists the main FieldStorage attributes.

FieldStorage Attributes:

S. No.  Attribute  Description
1       name       The name of the field.
2       file       A file(-like) object from which the data can be read as bytes.
3       filename   The filename on the client side.
4       type       The content type of the field.
5       value      The value of the field as a string; for an uploaded file, reading it returns the whole file contents as bytes.
6       headers    A dictionary-like object containing all the headers.

In addition to these attributes, a FieldStorage instance provides several methods for accessing the user's data. Some of them are listed below:

FieldStorage Methods:

S. No.  Method       Description
1       getfirst()   Returns the first value received for a field.
2       getvalue()   Works like the dictionary get() method.
3       getlist()    Returns the list of values received for a field.
4       keys()       Works like the dictionary keys() method.
5       make_file()  Returns a readable and writable file, used to store uploaded data.

CGI Program Structure in Python

Let’s understand the structure of a Python CGI Program:

  • The output of a Python CGI script consists of two sections separated by a blank line.
  • The first section contains headers that tell the client what type of data follows, and the second section contains the data itself, which is displayed when the script runs.

Let’s have a look at the Python code given below:

print("Content-type: text/html")
print()  # the blank line ends the header section
# now print the rest of the HTML document
print("<html>")
print("<head>")
print("<title> Welcome to CGI program </title>")
print("</head>")
print("<body>")
print("<h2> Hello World! This is my first CGI program. </h2>")
print("</body>")
print("</html>")

Now, save the above file as hello.py (naming it cgi.py would shadow the standard cgi module). When a web server runs the script and the page is opened in a browser, the output looks like this:

Hello World! This is my first CGI program.

The program above is a simple Python script that writes its output to STDOUT; the web server captures that output and sends it to the browser.

Understanding the HTTP Header

Several HTTP headers are frequently used in CGI programs. Some of them are listed below:

S. No.  HTTP Header          Description
1       Content-type         A MIME string that defines the format of the returned data, for example text/html.
2       Content-length: N    The length of the returned data in bytes; the browser can use it to estimate the download time.
3       Expires: Date        The date after which the information is no longer valid.
4       Last-modified: Date  The date on which the resource was last modified.
5       Location: URL        The URL returned instead of the requested one, used to redirect the request.
6       Set-Cookie: String   Sets a cookie using the given string.
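To see how these headers fit into a response, here is a minimal sketch with a hypothetical helper, build_cgi_response(), that returns the header lines, the blank line, and the body as one string. It assumes a UTF-8 body and computes Content-length from the encoded bytes, since the header counts bytes rather than characters:

```python
def build_cgi_response(body, content_type="text/html", extra_headers=()):
    """Return CGI output: header lines, a blank line, then the body (hypothetical helper)."""
    body_bytes = body.encode("utf-8")
    header_lines = [
        f"Content-type: {content_type}",
        # Content-length is the size of the body in bytes, not characters
        f"Content-length: {len(body_bytes)}",
        *extra_headers,
    ]
    # A blank line separates the header section from the data section
    return "\n".join(header_lines) + "\n\n" + body


response = build_cgi_response("<p>café</p>", extra_headers=["Set-Cookie: theme=dark"])
print(response)
```

Here "café" has 4 characters but 5 bytes in UTF-8, so the body "<p>café</p>" is reported as 12 bytes.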

The CGI Environment Variables

When a CGI script runs, the web server provides a set of predefined environment variables. Some of them are listed in the following table:

S. No.  Environment Variable  Description
1       CONTENT_TYPE          The media type of the data attached to the request, for example an uploaded file.
2       CONTENT_LENGTH        The length of the data sent with the request, such as the query or form data.
3       HTTP_COOKIE           The cookies sent by the client in the current session.
4       HTTP_USER_AGENT       Information about the browser (user agent) making the request.
5       REMOTE_HOST           The host name of the client.
6       PATH_INFO             Extra path information that follows the script's name in the URL.
7       REMOTE_ADDR           The IP address of the client.
8       REQUEST_METHOD        The method used to make the request, most commonly GET or POST.
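A script reads these variables through os.environ. The sketch below uses a hypothetical helper, describe_request(), and falls back to defaults because a variable may be missing (for example, CONTENT_LENGTH when no data is sent); defaulting the method to GET is an assumption. A hand-made dictionary stands in for the environment a web server would provide:

```python
import os

def describe_request(environ=None):
    """Collect a few standard CGI variables (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return {
        "method": env.get("REQUEST_METHOD", "GET"),
        # CONTENT_LENGTH may be missing or empty when no data is sent
        "content_length": int(env.get("CONTENT_LENGTH") or 0),
        "client_ip": env.get("REMOTE_ADDR", "unknown"),
        "browser": env.get("HTTP_USER_AGENT", "unknown"),
    }

# A hand-made dictionary stands in for the environment a web server would set
fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "27", "REMOTE_ADDR": "203.0.113.5"}
info = describe_request(fake_env)
print(info)
```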

How to Debug CGI Scripts?

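To debug a CGI script, the cgi module provides a test() function. It writes minimal HTTP headers and then formats all the information available to the script (such as the environment and the form data) as HTML. It can be called with a single statement:

import cgi
cgi.test()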

Pros and Cons of CGI Programming

Some Pros of CGI Programming:

CGI programming has numerous pros. Some of them are as follows:

  • CGI programs are language-independent: they can be written in any programming language.
  • CGI programs are portable and work on almost any web server.
  • CGI programs are quite scalable and can perform any task, whether simple or complex.
  • CGI programs can process simple requests quickly.
  • CGI can reduce the cost of development and maintenance, making it profitable.
  • CGI can be used to add dynamic, interactive behavior to web applications.

Some Cons of CGI Programming:

CGI programming also has a few cons. Some of them are as follows:

  • CGI programs can be quite complex, which makes them harder to debug.
  • The interpreter has to load and evaluate the CGI script every time it is invoked. As a result, many client requests create a heavy load on the server.
  • CGI programs can be vulnerable, since many are freely available scripts written without attention to server security.
  • CGI takes a lot of time to process requests.
  • Data is not stored in cache memory between page loads.
  • Many CGI code bases are large and mostly written in Perl.
Python Regex Functions (https://shishirkant.com/python-regex-functions/, Thu, 28 Sep 2023 15:02:27 +0000)

A regular expression is a set of characters with a highly specialized syntax that we can use to find or match other characters or groups of characters. Regular expressions, or regex, are widely used, especially in the UNIX world.

Import the re Module

# Importing re module
import re

The re module in Python provides Perl-style regular expression support. Whenever a pattern is invalid (for example, it contains unmatched parentheses) or another error occurs during compilation or matching, the re module raises the re.error exception.
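As a small illustration of re.error, the hypothetical helper below compiles a pattern and reports whether it is valid; the unmatched parenthesis in "(abc" makes compilation fail:

```python
import re

def is_valid_pattern(pattern):
    """Return True if the pattern compiles, False if re raises re.error."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # exc carries a message describing what is wrong with the pattern
        print(f"Invalid pattern {pattern!r}: {exc}")
        return False
    return True

print(is_valid_pattern(r"\d+"))  # True
print(is_valid_pattern("(abc"))  # prints the error message, then False
```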

We'll go over the key functions used to work with regular expressions.

But first, a minor point: many characters have a special meaning when used in a regular expression. These characters are called metacharacters.

Most letters and characters simply match themselves. For example, the regular expression 'check' matches exactly the string 'check'. (A case-insensitive mode can be enabled, which would let this RE also match 'Check' or 'CHECK'.)
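A quick check of both points, literal matching and the case-insensitive mode enabled by the re.IGNORECASE flag (also written re.I):

```python
import re

exact = re.fullmatch("check", "check")                  # literal characters match themselves
upper = re.fullmatch("check", "CHECK")                  # None: matching is case-sensitive by default
folded = re.fullmatch("check", "CHECK", re.IGNORECASE)  # re.IGNORECASE ignores case

print(exact)   # <re.Match object; span=(0, 5), match='check'>
print(upper)   # None
print(folded)  # <re.Match object; span=(0, 5), match='CHECK'>
```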

There are exceptions to this rule: some characters are special metacharacters and don't match themselves. Instead, they signal that something out of the ordinary should be matched, or they affect other parts of the RE by repeating them or changing their meaning.

Metacharacters or Special Characters

As the name suggests, there are some characters with special meanings:

Character  Meaning
.          Dot – It matches any character except the newline character.
^          Caret – It matches at the start of the string. (Starts with)
$          Dollar – It matches at the end of the string, or just before a newline at the end of the string. (Ends with)
*          Asterisk – It matches zero or more occurrences of the preceding pattern.
+          Plus – It matches one or more occurrences of the preceding pattern.
?          Question mark – It matches zero or one occurrence of the preceding pattern.
{}         Curly braces – {m} matches exactly m occurrences of the preceding pattern; {m,n} matches from m to n occurrences.
[]         Brackets – They define a set of characters, any one of which can match.
|          Pipe – It matches either of two patterns: A|B matches A or B.
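Each metacharacter in the table can be tried with re.findall() or re.search(). The sample strings below are arbitrary, chosen only to make each effect visible:

```python
import re

text = "cat cot cut coat"

print(re.findall(r"c.t", text))                # . any char: ['cat', 'cot', 'cut']
print(re.search(r"^cat", text) is not None)    # ^ starts with: True
print(re.search(r"coat$", text) is not None)   # $ ends with: True
print(re.findall(r"co*t", "ct cot coot"))      # * zero or more: ['ct', 'cot', 'coot']
print(re.findall(r"co+t", "ct cot coot"))      # + one or more: ['cot', 'coot']
print(re.findall(r"colou?r", "color colour"))  # ? zero or one: ['color', 'colour']
print(re.findall(r"\d{3}", "12 345 6789"))     # {3} exactly three: ['345', '678']
print(re.findall(r"[aeiou]", "python"))        # [] character set: ['o']
print(re.findall(r"cat|dog", "dog and cat"))   # | either pattern: ['dog', 'cat']
```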

Repetition and Special Sequences:

Being able to match different sets of characters is the first thing regular expressions can do that isn't already possible with string methods. However, if that were their only extra capability, regexes wouldn't be much of an improvement. Another capability is that we can specify that parts of the RE must be repeated a certain number of times.

The first metacharacter for repetition we'll examine is *. Rather than matching a literal '*', it specifies that the preceding character can be matched zero or more times instead of exactly once.

For example, ba*t matches 'bt' (zero 'a' characters), 'bat' (one 'a' character), 'baaat' (three 'a' characters), and so on.

Repetitions such as * are greedy: the matching engine repeats the preceding item as many times as possible. If later parts of the pattern then fail to match, the engine backs up and retries with fewer repetitions.
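Both behaviors can be checked directly. The second example shows the backtracking step: [bcd]* first consumes as much as it can, then gives characters back until the final b in the pattern can match:

```python
import re

# * : zero or more repetitions of the preceding 'a'
print(re.findall(r"ba*t", "bt bat baaat bet"))  # ['bt', 'bat', 'baaat']

# Greedy with backtracking: [bcd]* first grabs "bcbd", finds no 'b' left,
# then backs off one character at a time until the final 'b' matches
print(re.match(r"a[bcd]*b", "abcbd").group())   # abcb
```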

A special sequence consists of '\' followed by one of the characters listed below; each has its own meaning.

Sequence  Meaning
\d        It matches any digit and is equivalent to [0-9].
\D        It matches any non-digit character and is equivalent to [^0-9].
\s        It matches any whitespace character and is equivalent to [ \t\n\r\f\v].
\S        It matches any non-whitespace character and is equivalent to [^ \t\n\r\f\v].
\w        It matches any word character (letter, digit, or underscore) and is equivalent to [a-zA-Z0-9_].
\W        It matches any non-word character and is equivalent to [^a-zA-Z0-9_].
\A        It matches only at the start of the string.
\b        It matches at a word boundary: r"\bxt" matches "xt" at the beginning of a word, and r"xt\b" matches "xt" at the end of a word.
\B        It matches where \b does not, that is, not at a word boundary.
\Z        It matches only at the end of the string.

For str patterns, \d, \s, and \w also match the corresponding non-ASCII Unicode characters; the bracketed equivalents are exact when the re.ASCII flag is used.
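The following sketch tries each special sequence on sample strings made up for the illustration; note that \w includes the underscore in "room_12":

```python
import re

text = "Order 66 shipped to room_12 at 9am"

print(re.findall(r"\d+", text))        # ['66', '12', '9']
print(re.findall(r"\w+", text))        # ['Order', '66', 'shipped', 'to', 'room_12', 'at', '9am']
print(re.split(r"\s+", "a  b\tc\nd"))  # ['a', 'b', 'c', 'd']

words = "xtra text next"
print(re.findall(r"\bxt", words))       # ['xt']: only "xtra" starts with xt
print(len(re.findall(r"xt\b", words)))  # 2: "text" and "next" end with xt
print(len(re.findall(r"\Bxt", words)))  # 2: xt not at the start of a word

print(re.search(r"\AOrder", text) is not None)  # True
print(re.search(r"9am\Z", text) is not None)    # True
```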

RegEx Functions:

  • compile – It turns a regular expression pattern into a regular expression object that can be reused to match the pattern in strings in a number of ways.
  • search – It finds the first occurrence of a regex pattern anywhere in a given string.
  • match – It matches the pattern only at the beginning of the string.
  • fullmatch – It matches the whole string against a regex pattern.
  • split – It splits a string at each occurrence of the regex pattern.
  • findall – It finds all non-overlapping matches of the pattern in a string and returns them as a list.
  • finditer – It returns an iterator that yields match objects for all non-overlapping matches.
  • sub – It returns a string in which occurrences of the pattern are replaced by the replacement; by default, all occurrences are replaced.
  • subn – It works the same as sub, but returns a tuple (new_string, num_of_substitutions).
  • escape – It escapes special characters in a pattern.
  • purge – It clears the regular expression cache.

1. re.compile(pattern, flags=0)

It is used to create a regular expression object that can be used to match patterns in a string.

Example:

# Importing re module  
import re  

# Defining regEx pattern  
pattern = "amazing"  

# Creating a regEx object
regex_object = re.compile(pattern)  

# String  
text = "This tutorial is amazing!"   

# Searching for the pattern in the string  
match_object = regex_object.search(text)  

# Output  
print("Match Object:", match_object)  


Output:
Match Object: <re.Match object; span=(17, 24), match='amazing'>

The two-step form:

re_obj = re.compile(pattern)
result = re_obj.search(string)

is equivalent to the single call:

result = re.search(pattern, string)

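Note – When the same regular expression is used several times, compiling it once with re.compile() and reusing the resulting object can be more efficient, because the pattern does not have to be looked up or compiled again on each call.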
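A minimal sketch of the compile-once approach: the compiled object is created before the loop and reused for every line, and it gives the same result as the module-level function. The sample lines are made up for the illustration:

```python
import re

pattern = "amazing"
regex_object = re.compile(pattern)  # compiled once, outside the loop

lines = ["This tutorial is amazing!", "Nothing here", "amazing again"]
hits = [line for line in lines if regex_object.search(line)]
print(hits)  # ['This tutorial is amazing!', 'amazing again']

# The compiled object and re.search() find the same match
text = "This tutorial is amazing!"
print(regex_object.search(text).span() == re.search(pattern, text).span())  # True
```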

2. re.match(pattern, string, flags=0)

  • It starts matching the pattern from the beginning of the string.
  • Returns a match object if any match is found with information like start, end, span, etc.
  • Returns None if no match is found.

Parameters

  • pattern:- This is the expression to be matched. It must be a regular expression.
  • string:- This is the string whose beginning will be compared against the pattern.
  • flags:- Modifiers that change how the pattern is matched; multiple flags can be combined with bitwise OR (|).

Example:

# Importing re module  
import re  

# Our pattern  
pattern = "hello"  

# Returns a match object if found, else None
match = re.match(pattern, "hello world")  

print(match) # Printing the match object  
print("Span:", match.span()) # Return the tuple (start, end)  
print("Start:", match.start()) # Return the starting index  
print("End:", match.end()) # Returns the ending index  

Output:
<re.Match object; span=(0, 5), match='hello'>
Span: (0, 5)
Start: 0
End: 5

Another example of the implementation of the re.match() method in Python.

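  • In the pattern r'.w* (.w?) (.w*?)', the . matches any single character, while w*, w?, and w*? match the literal letter "w" (zero or more times, zero or one time, and zero or more times non-greedily). The pattern does not match whole words, because w is the letter w, not the word-character class \w.
  • Since re.match() only tries to match at the beginning of the string, . matches "L", w* matches nothing, and the pattern then requires a space, but the next character is "e". The match therefore fails, and the else branch runs.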

CODE:

import re    
line = "Learn Python through tutorials on shishirkant"  
match_object = re.match( r'.w* (.w?) (.w*?)', line, re.M|re.I)

if match_object:    
    print ("match object group : ", match_object.group())   
    print ("match object 1 group : ", match_object.group(1))
    print ("match object 2 group : ", match_object.group(2))  
else:    
    print ( "There isn't any match!!" )   

Output:
There isn't any match!!
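In that pattern, w is the literal letter w. If the intent is to capture words, the word-character class \w is needed instead. A minimal sketch, assuming the goal is to capture the first two words of the same line:

```python
import re

line = "Learn Python through tutorials on shishirkant"
# \w+ matches a run of word characters, i.e. a whole word
match_object = re.match(r"(\w+) (\w+)", line)

if match_object:
    print("match object group : ", match_object.group())    # Learn Python
    print("match object 1 group : ", match_object.group(1))  # Learn
    print("match object 2 group : ", match_object.group(2))  # Python
else:
    print("There isn't any match!!")
```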

3. re.search(pattern, string, flags=0)

The re.search() function looks for the first occurrence of a regular expression pattern and returns it. Unlike re.match(), which only checks the beginning of the string, re.search() scans the entire string. If the pattern is found, re.search() returns a match object; otherwise, it returns None.

To use the search() function, first import the re module, then pass re.search() the pattern and the string to search.

Here is the description of the parameters –

pattern:- This is the expression to be matched. It must be a regular expression.

string:- This is the string that will be searched for the pattern anywhere within it.

flags:- Modifiers such as re.I (ignore case) or re.M (multi-line) that change how the pattern is matched. Multiple flags can be combined with bitwise OR (|).

Code

import re  

line = "Learn Python through tutorials on shishirkant"

search_object = re.search( r' .*t? (.*t?) (.*t?)', line) 
if search_object:  
    print("search object group : ", search_object.group())  
    print("search object group 1 : ", search_object.group(1)) 
    print("search object group 2 : ", search_object.group(2)) 
else:  
    print("Nothing found!!")  

Output:
search object group : Python through tutorials on shishirkant 
search object group 1 : on 
search object group 2 : shishirkant

4. re.sub(pattern, repl, string, count=0, flags=0)

  • It replaces occurrences of the pattern in the string with repl.
  • pattern – the regex pattern to be matched.
  • repl – short for "replacement"; the string that replaces each match of the pattern.
  • count – the maximum number of substitutions; the default, 0, replaces all occurrences.

Example 1:

# Importing re module  
import re  

# Defining parameters  
pattern = "like" # to be replaced  
repl = "love" # Replacement  
text = "I like Shishirkant!" # String 

# Returns a new string with a substituted pattern 
new_text = re.sub(pattern, repl, text)  

# Output  
print("Original text:", text)  
print("Substituted text: ", new_text)  

Output:
Original text: I like Shishirkant! 
Substituted text: I love Shishirkant!

In the above example, the sub function replaces 'like' with 'love'.

Example 2 – Substituting 3 occurrences of a pattern.

# Importing re package  
import re  

# Defining parameters  
pattern = "l" # to be replaced  
repl = "L" # Replacement  
text = "I like Shishirkant! I also like tutorials!" # String  

# Returns a new string with the substituted pattern  
new_text = re.sub(pattern, repl, text, count=3)

# Output  
print("Original text:", text)  
print("Substituted text:", new_text)  

Output:
Original text: I like Shishirkant! I also like tutorials! 
Substituted text: I Like Shishirkant! I aLso Like tutorials!

Here, the first three occurrences of 'l' are replaced with 'L'.

5. re.subn(pattern, repl, string, count=0, flags=0)

  • subn() works the same way as sub().
  • It returns a tuple (new_string, num_of_substitutions).

Example:

# Importing re module  
import re  

# Defining parameters  
pattern = "l" # to be replaced  
repl = "L" # Replacement  
text = "I like Shishirkant! I also like tutorials!" # String  

# Returns a new string with the substituted pattern  
new_text = re.subn(pattern, repl, text, count=3)

# Output  
print("Original text:", text)  
print("Substituted text:", new_text) 

Output:
Original text: I like Shishirkant! I also like tutorials! 
Substituted text: ('I Like Shishirkant! I aLso Like tutorials!', 3)

In the above program, the subn function replaces the first three occurrences of 'l' with 'L' in the string and also returns the number of substitutions made (3).

6. re.fullmatch(pattern, string, flags=0)

  • It matches the whole string against the pattern.
  • Returns a corresponding match object.
  • Returns None in case no match is found.
  • By contrast, the search() function returns the first match found anywhere in the string, even if it covers only part of it.

Example:

# Importing re module
import re

# Sample string
line = "Hello world"

# Using re.fullmatch()
print(re.fullmatch("Hello", line))
print(re.fullmatch("Hello world", line))

Output:

None
<re.Match object; span=(0, 11), match='Hello world'>

In the above program, only 'Hello world' matches the whole string; 'Hello' covers only part of it, so fullmatch() returns None.

Q. When to use re.findall()?

Ans. Use re.findall() when you want every occurrence of a pattern in a string, not just the first one. It scans the entire string and returns all non-overlapping matches as a list of strings.
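A short sketch of re.findall() with sample strings. When the pattern contains groups, findall() returns the captured groups (as tuples when there are several) instead of the whole matches:

```python
import re

line = "Learn Python through tutorials on shishirkant"

# Every word that starts with 't'
print(re.findall(r"\bt\w*", line))            # ['through', 'tutorials']

# Every lowercase vowel, in order
print(re.findall(r"[aeiou]", "Hello world"))  # ['e', 'o', 'o']

# With groups, findall returns the groups instead of the whole match
emails = "ann@example.com, bob@test.com"
print(re.findall(r"(\w+)@(\w+)\.com", emails))  # [('ann', 'example'), ('bob', 'test')]
```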

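7. re.finditer(pattern, string, flags=0)

  • Returns an iterator that yields a match object for every non-overlapping match of the pattern in the string.
  • The string is scanned from left to right.
  • Matches are returned in the order they are found.

Example:

# Importing re module
import re

# Sample string
line = "Hello world. I am Here!"

# Regex pattern
pattern = r'[aeiou]'

# Using re.finditer()
iter_ = re.finditer(pattern, line)

# Iterating over iter_
for i in iter_:
    print(i)

Output:

<re.Match object; span=(1, 2), match='e'>
<re.Match object; span=(4, 5), match='o'>
<re.Match object; span=(7, 8), match='o'>
<re.Match object; span=(15, 16), match='a'>
<re.Match object; span=(19, 20), match='e'>
<re.Match object; span=(21, 22), match='e'>

Only lowercase vowels are in the set, so the uppercase 'I' is not matched.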

8. re.split(pattern, string, maxsplit=0, flags=0)

  • It splits the string at each occurrence of the pattern.
  • If maxsplit is zero (the default), the string is split at every occurrence.
  • If maxsplit is one, the string is split only at the first occurrence, and the rest of the string is returned as the final element of the list.

Example:

# Import re module
import re

# Pattern
pattern = ' '

# Sample string
line = "Learn Python through tutorials on shishirkant"

# Using split function to split the string at each ' '
result = re.split(pattern, line)

# Printing the result
print("When maxsplit = 0, result:", result)

# When maxsplit is one
result = re.split(pattern, line, maxsplit=1)
print("When maxsplit = 1, result =", result)

Output:
When maxsplit = 0, result: ['Learn', 'Python', 'through', 'tutorials', 'on', 'shishirkant']
When maxsplit = 1, result = ['Learn', 'Python through tutorials on shishirkant']
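Where re.split() goes beyond str.split() is that the separator can be a pattern. In this sketch, with made-up sample strings, any run of commas, semicolons, or whitespace acts as one separator, and a capturing group keeps the separators in the result:

```python
import re

line = "apples, pears;plums  figs"
print(re.split(r"[,;\s]+", line))    # ['apples', 'pears', 'plums', 'figs']

# A capturing group keeps the separators in the result
print(re.split(r"([,;])", "a,b;c"))  # ['a', ',', 'b', ';', 'c']
```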

9. re.escape(pattern)

  • It escapes the special characters in the pattern.
  • The escape function is especially useful when the string contains regular expression metacharacters.

Example:

# Import re module
import re

# Pattern
pattern = 'https://www.shishirkant.com/'

# Using escape function to escape metacharacters
result = re.escape(pattern)

# Printing the result
print("Result:", result)

Output:
Result: https://www\.shishirkant\.com/

The escape function escapes the metacharacter '.' in the pattern. This is useful when you want metacharacters to be treated as ordinary characters, so that the pattern matches them literally.
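To see why escaping matters, the sketch below searches a sample text that contains the real URL and a made-up look-alike in which one dot is replaced by "X". Used unescaped, each '.' matches any character, so the look-alike matches too:

```python
import re

url = "https://www.shishirkant.com/"
text = "Visit https://www.shishirkant.com/ or https://wwwXshishirkant.com/"

# Unescaped, each '.' matches any character, so the look-alike matches too
print(len(re.findall(url, text)))        # 2
# Escaped, '.' only matches a literal dot
print(re.findall(re.escape(url), text))  # ['https://www.shishirkant.com/']
```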

10. re.purge()

  • The purge() function takes no arguments; it simply clears the regular expression cache.

Example:

# Importing re module
import re

# Define some regular expressions
pattern1 = r'\d+'
pattern2 = r'[a-z]+'

# Use the regular expressions
print(re.search(pattern1, '123abc'))
print(re.search(pattern2, '123abc'))

# Clear the regular expression cache
re.purge()

# Use the regular expressions again
print(re.search(pattern1, '456def'))
print(re.search(pattern2, '456def'))

Output:

<re.Match object; span=(0, 3), match='123'>
<re.Match object; span=(3, 6), match='abc'>
<re.Match object; span=(0, 3), match='456'>
<re.Match object; span=(3, 6), match='def'>

  • First, pattern1 and pattern2 are used to search for matches in the string '123abc'.
  • Then the cache is cleared with re.purge().
  • Finally, pattern1 and pattern2 are used again to search for matches in the string '456def'.
  • Because the cache was cleared, the regular expressions are compiled again before '456def' is searched. The results are the same as they would have been without purge(); clearing the cache affects only performance, not matching.

Matching Versus Searching – re.match() vs. re.search()

Python has two primary regular expression functions: match and search. The match function looks for a match only at the start of the string, whereas the search function looks for a match anywhere in the string.

CODE:

# Import re module
import re

# Sample string
line = "Learn Python through tutorials on shishirkant"

# Using match function to match 'through'
match_object = re.match(r'through', line, re.M | re.I)
if match_object:
    print("match object group : ", match_object)
else:
    print("There isn't any match!!")

# Using search function to search for 'through'
search_object = re.search(r'through', line, re.M | re.I)
if search_object:
    print("Search object group : ", search_object)
else:
    print("Nothing found!!")

Output:
There isn't any match!!
Search object group : <re.Match object; span=(13, 20), match='through'>

The match function checks whether the string starts with 'through', while the search function checks whether 'through' appears anywhere in the string.

Python Regular Expressions – I (https://shishirkant.com/python-regular-expressions-i/, Thu, 28 Sep 2023 06:06:58 +0000)

Introduction to the Python regular expressions

Regular expressions (called regex or regexp) specify search patterns. Typical examples of regular expressions are the patterns for matching email addresses, phone numbers, and credit card numbers.

Regular expressions are essentially a specialized programming language embedded in Python, and you interact with them through the built-in re module.

The following shows an example of a simple regular expression:

r'\d'

In this example, the regular expression is a string that contains a search pattern. The \d is a digit character set that matches any single digit from 0 to 9. The r prefix makes it a raw string, which is explained in the "Regular expressions and raw strings" section below.

Note that you'll learn how to construct more complex and advanced patterns in the next tutorials. This tutorial focuses on the functions that deal with regular expressions.

To use this regular expression, follow these steps:

First, import the re module:

import re

Second, compile the regular expression into a Pattern object:

p = re.compile(r'\d')

Third, use one of the methods of the Pattern object to match a string:

s = "Python 3.10 was released on October 04, 2021"
result = p.findall(s)

print(result)

Output:

['3', '1', '0', '0', '4', '2', '0', '2', '1']

The findall() method returns a list of the single digits in the string s.

The following shows the complete program:

import re

p = re.compile(r'\d')
s = "Python 3.10 was released on October 04, 2021"

results = p.findall(s)
print(results)

Besides the findall() method, the Pattern object has other essential methods that allow you to match a string:

Method      Purpose
match()     Match the pattern at the beginning of a string
search()    Return the first match of a pattern in a string
findall()   Return all matches of a pattern in a string
finditer()  Return all matches of a pattern as an iterator of Match objects
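The four methods can be compared on the same compiled pattern. This sketch uses r'\d+' (one or more digits) rather than r'\d', so each number is reported as one match:

```python
import re

p = re.compile(r"\d+")
s = "Python 3.10 was released on October 04, 2021"

print(p.match(s))                          # None: s does not start with a digit
print(p.search(s).group())                 # 3
print(p.findall(s))                        # ['3', '10', '04', '2021']
print([m.group() for m in p.finditer(s)])  # ['3', '10', '04', '2021']
```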

Python regular expression functions

Besides the Pattern class, the re module has some functions that match a string against a pattern:

• match()
• search()
• findall()
• finditer()

These functions have the same names as the methods of the Pattern object. They take the pattern as their first argument, followed by the string (and optional flags), so you don't have to manually compile the regular expression before using it.

The following example shows the same program using the findall() function instead of the findall() method of a Pattern object:

import re

s = "Python 3.10 was released on October 04, 2021."
results = re.findall(r'\d', s)
print(results)

Using the functions in the re module is more concise than the methods of the Pattern object because you don't have to compile regular expressions manually.

Under the hood, these functions create a Pattern object and call the appropriate method on it. They also store the compiled regular expression in a cache for speed optimization.

This means that if you use the same regular expression a second time, these functions don't need to recompile it. Instead, they get the compiled regular expression from the cache.

Should you use the re functions or methods of the Pattern object?

If you use a regular expression within a loop, the Pattern object may save a few function calls. However, if you use it outside of loops, the difference is very small due to the internal cache.

The following sections discuss the most commonly used functions in the re module, including search(), match(), and fullmatch().

search() function

The search() function searches for a pattern within a string. If there is a match, it returns a Match object for the first match; otherwise, it returns None. For example:

import re

s = "Python 3.10 was released on October 04, 2021."
pattern = r'\d{2}'
match = re.search(pattern, s)
print(type(match))
print(match)

Output:

<class 're.Match'>
<re.Match object; span=(9, 11), match='10'>

In this example, the search() function returns the first occurrence of two consecutive digits in the string s ('10') as a Match object.

Match object

The Match object provides information about the matched string. It has the following important methods:

Method   Description
group()  Return the matched string
start()  Return the starting position of the match
end()    Return the ending position of the match
span()   Return a tuple (start, end) that specifies the positions of the match

The following example examines the Match object:

import re

s = "Python 3.10 was released on October 04, 2021."
result = re.search(r'\d', s)

print('Matched string:', result.group())
print('Starting position:', result.start())
print('Ending position:', result.end())
print('Positions:', result.span())

Output:

Matched string: 3
Starting position: 7
Ending position: 8
Positions: (7, 8)

match() function

The match() function returns a Match object if it finds the pattern at the beginning of a string. For example:

import re

texts = ['Python',
         'CPython is an implementation of Python written in C',
         'Jython is a Java implementation of Python',
         'IronPython is Python on .NET framework']

pattern = r'\wython'
for s in texts:
    result = re.match(pattern, s)
    print(result)

Output:

<re.Match object; span=(0, 6), match='Python'>
None
<re.Match object; span=(0, 6), match='Jython'>
None

In this example, \w is the word character set, which matches any single word character (a letter, a digit, or an underscore).

The pattern \wython matches any string that starts with a single word character followed by the literal string ython, for example, Python.

Since the match() function only finds the pattern at the beginning of a string, the following strings match the pattern:

'Python'
'Jython is a Java implementation of Python'

And the following strings don't match:

'CPython is an implementation of Python written in C'
'IronPython is Python on .NET framework'

fullmatch() function

The fullmatch() function returns a Match object if the whole string matches a pattern, or None otherwise. The following example uses the fullmatch() function to match a string with four digits:

import re

s = "2021"
pattern = r'\d{4}'
result = re.fullmatch(pattern, s)
print(result)

Output:

<re.Match object; span=(0, 4), match='2021'>

The pattern \d{4} matches exactly four digits. Since the whole string "2021" consists of four digits, the fullmatch() function returns a Match object for it.

If the number 2021 appears in the middle or at the end of a longer string, fullmatch() returns None. For example:

import re

s = "Python 3.10 released in 2021"
pattern = r'\d{4}'
result = re.fullmatch(pattern, s)
print(result)

Output:

None

          Regular expressions and raw strings

          It’s important to note that Python and regular expression are different programming languages. They have their own syntaxes.

          The re module is the interface between Python and regular expression programming languages. It behaves like an interpreter between them.

          To construct a pattern, regular expressions often use a backslash '\' for example \d and \w . But this collides with Python’s usage of the backslash for the same purpose in string literals.

          For example, suppose you need to match the following string:

          s = '\section'

          The backslash (\) is also a special character in regular expressions. To match a literal backslash, the regular expression must escape it by preceding it with another backslash:

          pattern = '\\section'

          In regular expressions, the pattern must be '\\section'. However, to express this pattern in a string literal in Python, you need to use two more backslashes to escape both backslashes again:

          pattern = '\\\\section'

          Simply put, to match a literal backslash ('\'), you have to write '\\\\' because the regular expression must be '\\' and each backslash must be expressed as '\\' inside a string literal in Python.
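          A quick check confirms this arithmetic: a regular literal with four backslashes and a raw literal with two produce the same regex text, and that text matches one literal backslash. A small sketch; the sample text is made up for illustration:

```python
import re

text = 'See \\section 2'      # contains ONE literal backslash: See \section 2

escaped = '\\\\section'         # regular literal: four backslashes typed
raw = r'\\section'              # raw literal: two backslashes typed

print(escaped == raw)           # both hold the same regex text: \\section
print(len('\\'), len('\\\\'))   # each \\ in a regular literal is a single backslash
print(re.search(escaped, text)) # matches \section, starting at index 4
```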

          The result is a pile of repeated backslashes that makes regular expressions difficult to read and understand.

          A solution is to write regular expressions as raw strings, because a raw string treats the backslash (\) as a literal character rather than the start of an escape sequence.

          To turn a regular string into a raw string, you prefix it with the letter r or R. For example:

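          import re

          s = '\\section'  # the text \section, with one literal backslash
          pattern = r'\\section'  # raw string: the regex \\section matches a literal backslash
          result = re.findall(pattern, s)

          print(result)  # the list repr shows the single backslash as \\

          Output:

          ['\\section']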

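          Note that in Python '\section' and '\\section' produce the same string. Because \s is not a recognized escape sequence, Python keeps the backslash as-is. Relying on this is fragile, though: Python 3.12 and later emit a SyntaxWarning for unrecognized escape sequences such as \s.

          p1 = '\\section'
          p2 = '\section'  # works, but triggers a SyntaxWarning in Python 3.12+

          print(p1 == p2)  # True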

          In practice, regular expressions in Python are almost always written as raw strings.
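          A classic case where the raw string prefix changes the result is \b. In a regular Python string literal, '\b' is the backspace character, while in a regular expression \b means a word boundary. A minimal sketch with a made-up sample sentence:

```python
import re

text = 'the cat scattered the catalog'

plain = '\bcat\b'   # Python turns each \b into a backspace character (\x08)
raw = r'\bcat\b'    # the regex engine receives \b, meaning "word boundary"

print(re.findall(plain, text))  # [] because the text contains no backspace characters
print(re.findall(raw, text))    # ['cat'] because only the standalone word matches
```

          No error or warning is raised for the plain version, since \b is a valid Python escape; the pattern simply matches nothing, which makes this bug easy to miss.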

          Summary

          • A regular expression is a string of ordinary and special characters that describes a pattern to match against text.
          • Use a Pattern object or the functions in the re module, such as match() and fullmatch(), to search for a pattern in a string.
          • Use raw strings to construct regular expressions so that you don't have to escape backslashes.
          ]]>