Regular Expression (Regex) in Python

regex in python

Regex(Regular Expression) is a sequence of characters that define a search pattern. Regex can be used to check if a string contains the specified search pattern or find all the occurance of a Search pattern. The Idea of the Regular Expression first was invented by the American mathematician Stephen Cole Kleene who described regular language.

Stephen Cole Kleene

in this tutorial I try to explain regex in a simple way be examples.

Suppose we want to find all hash tags and callouts from tweet below :

First Example : We want to Extract all the words from the Tweet above, The Text of previous tweet is :

Ever wanted to sail the #SeaOfThieves?
With custom @Xbox backgrounds for video conferences, now you can: https://msft.it/6005TaGYD

As you See Words are seprated by space. So, We Split words by space charactor.

but Don’t forget first import needed package to use Regular Expression. we use re package in all of our codes in this page.

import re

After importing re Package then use code below to split sentence by space.

To avoid any confusion while dealing with regular expressions, we would use Raw Strings as r’expression’.

text = '''Ever wanted to sail the #SeaOfThieves?
With custom @Xbox backgrounds for video conferences, now you can: https://msft.it/6005TaGYD'''

allwords = re.split(r' ', text)
print(allwords)

this code separates words by comma. Output:

['Ever', 'wanted', 'to', 'sail', 'the', '#SeaOfThieves?\nWith', 'custom', '@Xbox', 'backgrounds', 'for', 'video', 'conferences,', 'now', 'you', 'can:', 'https://msft.it/6005TaGYD']

in the output above you see one word is “#SeaOfThieves?\nWith”. Do you what is \n in that word?

That is newline character, Means word after \n will be in the new line. in text above the word “with” is in the new line.

If you want Each Line :

text = '''Ever wanted to sail the #SeaOfThieves?
With custom @Xbox backgrounds for video conferences, now you can: https://msft.it/6005TaGYD'''

allwords = re.split(r'\n', text)
print(allwords)

this code separates Lines by comma. Output:

['Ever wanted to sail the #SeaOfThieves?', 'With custom @Xbox backgrounds for video conferences, now you can: https://msft.it/6005TaGYD']

a perfect program to split words correctly must consider space and \n and other white spaces, too. Beneath is list of patterns that show white spaces :

\nNew line
\tTab
\rCarriage return
\fForm feed
\vVertical tab

Now, if we want to Split Words by any whitespace above. We can use [ ]

[] – Square brackets

Square brackets specifies a set of characters you wish to match. Examples :

The re.findall() function is used to find all the matches for the pattern in the string.

str = 'Welcome to Code tips Academy Web Site'
matches = re.findall(r'[abc]', str)
print(matches)

#Output: ['c', 'c', 'a', 'b']

matches = re.findall(r'[abc A]', str)
print(matches)

#Output: ['c', ' ', ' ', ' ', ' ', 'A', 'c', 'a', ' ', 'b', ' ']

Then, We use [ \t\n\r\f\v], to Split words in tweet above by any white Space:

text = '''Ever wanted to sail the #SeaOfThieves?
With custom @Xbox backgrounds for video conferences, now you can: https://msft.it/6005TaGYD'''

allwords = re.split(r'[ \t\n\r\f\v]', text)
print(allwords)

output :

['Ever', 'wanted', 'to', 'sail', 'the', '#SeaOfThieves?', 'With', 'custom', '@Xbox', 'backgrounds', 'for', 'video', 'conferences,', 'now', 'you', 'can:', 'https://msft.it/6005TaGYD']

As a matter of fact we can use \s instead of [ \t\n\r\f\v]

text = '''Ever wanted to sail the #SeaOfThieves?
With custom @Xbox backgrounds for video conferences, now you can: https://msft.it/6005TaGYD'''

allwords = re.split(r'\s', text)
print(allwords)

As It’s shown below result is the same as splitting by [ \t\n\r\f\v]

['Ever', 'wanted', 'to', 'sail', 'the', '#SeaOfThieves?', 'With', 'custom', '@Xbox', 'backgrounds', 'for', 'video', 'conferences,', 'now', 'you', 'can:', 'https://msft.it/6005TaGYD']

Popular Patterns in Regex

SymbolDescription
.dot matches any character except newline
\wmatches any word character i.e letters, alphanumeric, digits and underscore (_)
\Wmatches non word characters
\dmatches a single digit
\Dmatches a single character that is not a digit
\smatches any white-spaces character like \n\t, spaces
\Smatches single non white space character
[abc]matches single character in the set i.e either match ab or c
[^abc]match a single character other than ab and c
[a-z]match a single character in the range a to z.
[a-zA-Z]match a single character in the range a-z or A-Z
[0-9]match a single character in the range 09
^match start at beginning of the string
$match start at end of the string
+matches one or more of the preceding character (greedy match).
*matches zero or more of the preceding character (greedy match).
a|bMatches either a or b.
re{n,m}Matches at least n and at most m occurrences of preceding
expression.
re{n}Matches exactly n number of occurrences of preceding
expression.
re{ n,}Matches n or more occurrences of preceding expression.
re?Matches 0 or 1 occurrence of preceding expression.
re+Matches 1 or more occurrence of preceding expression.
re*Matches 0 or more occurrences of preceding expression.
(re)Groups regular expressions and remembers matched text (Only the regular Expression before and after parentheses will be matched but only the regular Expression inside the parentheses will be shown to us as input. ).
(?imx)Temporarily toggles on i, m, or x options within a regular
expression. If in parentheses, only that area is affected.
(?: re)Groups regular expressions without remembering matched text.

Now, We want to find all the hash tags from Tweet above

text = '''Ever wanted to sail the #SeaOfThieves?
With custom @Xbox backgrounds for video conferences, now you can: https://msft.it/6005TaGYD'''

hashTagResult = re.findall(r'#[a-zA-Z0-9_]+', text)
print(hashTagResult)
#[a-zA-Z0-9_]+ means find all the text that 
  • Starts with #
  • There is any alphabet in lower case from a to z, or any alphabet in uppercase from a to z, or a digit from 0 to 9 or _ after #
  • + means anything inside [ ] must be at least once.

output

['#SeaOfThieves']

If we want to extract all the callouts

text = '''Ever wanted to sail the #SeaOfThieves?
With custom @Xbox backgrounds for video conferences, now you can: https://msft.it/6005TaGYD'''

atSignResult = re.findall(r'@[a-zA-Z0-9_]+', text)
print(atSignResult)
@[a-zA-Z0-9_]+ means find all the text that 
  • Starts with @
  • There is any alphabet in lower case from a to z, or any alphabet in uppercase from a to z, or a digit from 0 to 9 or _ after @
  • + means anything inside [ ] must be at least once.

output

['@Xbox']

Example : Get All word with at least5 characters.

text = '''Ever wanted to sail the #SeaOfThieves?
With custom @Xbox backgrounds for video conferences, now you can: https://msft.it/6005TaGYD'''

allwords = re.findall(r'\w{5,}', text)
print(allwords)

output

['wanted', 'SeaOfThieves', 'custom', 'backgrounds', 'video', 'conferences', 'https', '6005TaGYD']

In next Example we extract all the Numbers from Text.

sampleText = 'there are 54 apples here. Tempertaure is -23. I have 2124 in my account.'
sampleTextResult = re.findall(r'[-+]?[0-9]+', sampleText)
print(sampleTextResult)

[-+]?[0-9]+ means start of number can be – or + sign ( ? means having – or + is optional and not more that one character is + or -). then I must have at least 1 digit.

Output

['54', '-23', '2124']

To Extract any Urls from Tweet above

urlRegex = '''http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'''
urlRegexResult = re.findall(urlRegex, text)
print(urlRegexResult)

(?:%[0-9a-fA-F][0-9a-fA-F]) this matches hexadecimal character codes in URLs e.g. %2f for the ‘/’ character.

[s]? means ‘s’ character is optional,  but that’s because of the ? not of the brackets.

Output

['https://msft.it/6005TaGYD']

Now, If we want to Extract all the hashtags and callouts of our sample Tweet.

text = '''Ever wanted to sail the #SeaOfThieves?
With custom @Xbox backgrounds for video conferences, now you can: https://msft.it/6005TaGYD'''

result = re.findall(r'(?:@[a-zA-Z0-9_]+|#[a-zA-Z0-9_]+)', text)
print(result)

Output

['#SeaOfThieves', '@Xbox']

You can also check if a input string has correct format or not by using Regular Expression.

text = 'codetipsacademy@gmail.com'

result = re.match(r'^[\w\.\+\-]+\@[\w]+\.[a-z]{2,3}$',text)
print(result)

Result :

<re.Match object; span=(0, 25), match='codetipsacademy@gmail.com'>

As you See I used ^ and $ in the pattern, this means that text must be start by pattern after ^ and before $, nothing else must be in the text string. if text Contains Some other string in addition to email matching will fail.

text = 'My email is codetipsacademy@gmail.com'

result = re.match(r'^[\w\.\+\-]+\@[\w]+\.[a-z]{2,3}$',text)
print(result)

Result

None

So, Use ^ and $ usually for text Matching not for Searching.

Send Email By C#

Sending Email in C#

Sending Automatic Email from Application to Users is a necessary task, in this post I want to show you how to send Email by using c#.

My Solution is Automatically Connect to a smtp mail server and use that to send email. To connect to a SMTP Server we need connection information from that server. In this post, I want to connect to gmail SMTP Server. below you see Gmail SMTP Server Connection Information :

  1. smtp Server : smtp.gmail.com
  2. smtp port : 587

now, I use my gmail Credential to send Email.

See Code below :

try
            {
                MailMessage message = new MailMessage();
                SmtpClient smtp = new SmtpClient();
                message.From = new MailAddress("codetipsacademy@gmail.com");
                message.To.Add(new MailAddress("codetips@codetipsacademy.com"));
                message.Subject = "Test Subject";
                message.BodyEncoding = Encoding.UTF8;                   // Support UTF-8 Encoding 
                message.IsBodyHtml = true;                              // Message Body is Html  
                message.Body = "<b>Test Email</b><br /><p style='color:red;'>This is for Test</p>";
                smtp.Port = 587;                                        // Gmail SMTP SSL Support Port    
                smtp.Host = "smtp.gmail.com";                           // Gmail SMTP Server 
                smtp.EnableSsl = true;
                smtp.UseDefaultCredentials = false;
                smtp.Credentials = new NetworkCredential("[Gmail Address]", "[Gmail Password]");                //Gmail Credential to Login
                smtp.DeliveryMethod = SmtpDeliveryMethod.Network;
                smtp.Send(message);
                Console.WriteLine("Success");
                Console.ReadLine();
            }
            catch (Exception ex) {
                Console.WriteLine("Error");
            }

In code above fill [Gmail Address] with your Gmail Address and [Gmail Password] with your Gmail Password.

Email Sent Automatically by c#
Recived Email

Due to the fact that Gmail block connection from apps by default. you should change this behaviour. So, in google account management setting, go to Security section, then, Allow “Less Secure App access”. see image below for help

Bulk insert in SQL Server from txt or csv file

Bulk insert in SQL Server from txt or csv file

Some Times that ‘s needed to bulk insert data from a text or csv file to sql server. So, as a simple example we have Products table below in our data base.

and we create a txt file and a csv file that fill the data in order of fields above field be field and row by row.

the txt file is below

File 1.txt

as you see above each row contains values for a record and values in each field of a record is seperated by , and the last field is id that the value in this field will be generated by sql server automatically. then, I entered the value 1 for id of all the records, due to the fact that id must have value to sql code could run correctly and read txt file, but after reading the text file when inserting the data the value of id will be set automatically by the sql server.

below is the sql code to insert the records in the txt file. FIELDTERMINATOR is the a character that seprates values for fields. LastRow is the last last row that must be inserted in the sql.

BULK INSERT dbo.products
FROM 'd:\1.txt'
WITH ( FIELDTERMINATOR = ',' , LastRow = 3  );

also, I create csv file to bulk Insert the data :

File 1.csv

Like txt file, each row contains values for a record and values in each field of a record is seperated by , and the last field is id that the value in this field will be generated by sql server automatically. then, I entered the value 1 for id of all the records, due to the fact that id must have value to sql code could run correctly and read txt file, but after reading the text file when inserting the data the value of id will be set automatically by the sql server.

below is the sql code to insert the records in csv file. firstrow is the first row that must be inserted in the table, as you see picture above that shows file “1.csv” first row is the column header, so rows that will be inserted in the sql starts from row 2 of the “1.csv”. FIELDTERMINATOR is the a character that seprates values for fields. LastRow is the last last row that must be inserted in the sql.

BULK INSERT dbo.products
FROM 'd:\1.csv'
WITH ( firstrow=2 , FIELDTERMINATOR = ',' , LastRow = 3  );

Real Senario with large amount of Rows

in real scenarios we do bulk insert for large number of rows, a problem that may be happen is if an error takes place in the process, SQL Server will rollback the whole bulk insert process. So, a solution is the to use batchsize parameter and devide the process in some parts, if an error happens only the part that rises error will rollback.

Therefore, We create Sales table below in our data base

Sales table

Now, we want to Insert 1.500.000 recordes from csv file that we downloaded from this website. See Image below to download the csv file.

As a matter of fact, csv file is very large (187 MB), and There is not any ID field there.

Our Table Has ID field, but The csv file does not have that. Consequently, If we run bulk insert query, It will fail. To Solve this issue, we Should Create a temporary table with columns like csv file (without ID Column), then we do bulk insert csv file on the temporary table. then we should insert the data in our Permanent sales table from the temporary table.

Accordingly, the code to create Sales temporary table is below.

CREATE TABLE #Sales(
   [Region] [varchar](50) ,
   [Country] [varchar](50) ,
   [Item Type] [varchar](50) NULL,
   [Sales Channel] [varchar](50) NULL,
   [Order Priority] [varchar](50) NULL,
   [Order  Date]  datetime,
   [Order ID] bigint NULL,
   [Ship Date] datetime,
   [Units Sold]  float,
   [Unit Price] float,
   [Unit Cost] float,
   [Total Revenue] float,
   [Total Cost]  float,
   [Total Profit] float
   )

The result of running code above is #sales creating temporary table

Sales temporary Table

Afterwards, I should run bulk insert on the #sales Temporary Table. But a Run that on batches of 300,000 records. Due to the fact that If an error Arise only one batch rollback not all the records. See code below :

BULK INSERT #Sales
FROM 'd:\1500000 Sales Records.csv'
WITH ( firstrow=2 , FIELDTERMINATOR = ',' , ROWTERMINATOR='\n' , batchsize=300000   );

The result of running query above is underneath

Now, by query beneath the records will be inserted in the sales Table.

insert into Sales
select * from #Sales

As The result, If you have look on the Sales Table (Picture Below), you see that ID Column got values for It ‘s records automatically by the system.

Sales Table with 1.500.000 records

Check Constraints on bulk Insert

By default, bulk insert process ignores any constraint check and foreign key constraints, but It has some exceptions. According to the Microsoft documentation “UNIQUE and PRIMARY KEY constraints are always enforced. So, any Other Constraints will not check.

For instance, We create goods Table with constraint ([price]<=(5000)), See Picture beneath

On the condition, that you run bulk insert, constraints above will be ignored.

See file 1.txt file that price for cheese is 12000, which is against constraint,

file 1.txt

see the result of bulk insert that constraint will be ignored.

bulk insert ran successfully. Have a look at a Goods Table and see the Price constraint is ignored.

By Query beneath you can see that CK_Goods is ignored.

SELECT is_not_trusted ,* FROM sys.check_constraints where name='CK_Goods'

To force bulk insert to check all the constraints we should use check_constraints attribute.

BULK INSERT dbo.goods
FROM 'd:\1.txt'
WITH ( FIELDTERMINATOR = ',' , LastRow = 3 , check_constraints );

The result of above code

As you see; Now, the constraint check is working.

BackUP a Sql Data Base by C# Code

Backup-Sql-Data-Base-by-C#

There some ways to back Up a Data base, in this project a get a full back up from a database by using C#. However; we can customize the code to only backup data or exclude some tables to backup.
This Back up will Contain all Schema and Data together and generate a sql Code for us that
by Executing that code all the Data and Schema will be created for us.
So, an Important tip is that to restore a data base be using the generated code first Create an empty Data base then Execute the Sql Code.

To write this code first download nuget Microsoft.SqlServer.SqlManagementObjects by typing code below in Package Manager Console of Visual Studio.

Install-Package Microsoft.SqlServer.SqlManagementObjects 

then, add namespace Microsoft.SqlServer.Management.Smo (Code below)

using Microsoft.SqlServer.Management.Smo;

The C# Code that will Back UP a data base for us is below.

string outputFileName = @"d:/4.sql";
            StringBuilder sb = new StringBuilder();
            Server srv = new Server(new Microsoft.SqlServer.Management.Common.ServerConnection("[Server IP]", "[Sql User]", "[Sql Password]"));
            Database dbs = srv.Databases["EShop"];                // EShop is The Database that I want to backUP
            ScriptingOptions options = new ScriptingOptions();
            options.ScriptData = true;
            options.ScriptDrops = false;
            options.FileName = outputFileName;
            options.EnforceScriptingOptions = true;
            options.ScriptSchema = true;
            options.IncludeHeaders = true;
            options.AppendToFile = true;
            options.Indexes = true;
            options.WithDependencies = true;
            options.DriAll = true;

            foreach (Table tbl in dbs.Tables)
            {
                tbl.EnumScript(options);
            }

in the code below Input the Sql Connection of Data Base that you want to create back UP Sql Code of that.

ServerConnection("[Server IP]", "[Sql User]", "[Sql Password]")

and in the code below choose the output Sql File

        string outputFileName = @"d:/4.sql";