More on strings¶

a = "That's"
b = "fine"

Operator	Description	Input	Output
+	Concatenation	a + b	That'sfine
*	Repetition	2 * b	finefine
[]	Slice	a[1]	h
[:]	Range Slice	a[1:4]	hat
in	Membership	'a' in a	True
not in	Membership	'a' not in b	True
r/R	Raw String suppresses escape chars	r'\n'	\n
%	Format: %s %d %o (octal) %x (hex) %f %e %E %g %G

x = 12.345
y = 12.3e10
print("f: %f, e: %e, g: %g" % (x, x, x) )
print("f: %f, e: %e, g: %g" % (y, y, y) )

f: 12.345000, e: 1.234500e+01, g: 12.345
f: 123000000000.000000, e: 1.230000e+11, g: 1.23e+11
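The table above also lists %s, %d, %o and %x. A minimal sketch of these conversions, with values chosen only for illustration (the names n, name, line and padded are not from the original notes):

```python
# Illustrative values, not part of the original notes.
n = 255
name = "byte"

# %s inserts str(value); %d, %o and %x print an integer in base 10, 8 and 16.
line = "%s: dec %d, oct %o, hex %x" % (name, n, n, n)
print(line)      # byte: dec 255, oct 377, hex ff

# A width and a precision may precede the conversion letter: 8 columns, 2 decimals.
padded = "%8.2f|" % 3.14159
print(padded)    #     3.14|
```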

Upper and lower case¶

hi = "how ARE you"

Method	Description	Input	Output
.capitalize()	first letter into upper case	hi.capitalize()	How are you
.lower()	lower case	hi.lower()	how are you
.upper()	upper case	hi.upper()	HOW ARE YOU
.title()	every word capitalized	hi.title()	How Are You

hi = "how ARE you"

hi.title()

'How Are You'

hi.upper()

'HOW ARE YOU'

Method	Boolean value
.isalnum()	alphanumeric characters (no symbols)?
.isalpha()	alphabetic characters (no symbols)?
.islower()	lower case?
.isnumeric()	numeric characters?
.isspace()	whitespace characters?
.istitle()	is in title case?
.isupper()	upper case?

alnum = "23 apples"
num = "-1234"
title = "Big Apple"
print(alnum.isalnum(), num.isnumeric(), title.istitle())

False False True

alnum = "23apples"
num = "1234"
white = "   \t   \n\n   "
print(alnum.isalnum(), num.isnumeric(), white.isspace())

True True True
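As the outputs above suggest, .isnumeric() rejects a sign or a decimal point. A rough sketch of how one might work around that for negative integers; the helper looks_like_integer is hypothetical and only meant as an illustration:

```python
samples = ["1234", "-1234", "12.5", "abc", "Abc1", " "]
for text in samples:
    # Columns: the string, then isnumeric, isalpha, isalnum, isspace
    print(repr(text), text.isnumeric(), text.isalpha(), text.isalnum(), text.isspace())

# Hypothetical helper: allow one optional leading minus sign before the digits.
# Note: isnumeric() also accepts some non-ASCII numeric characters,
# so this is only a rough check, not a replacement for int().
def looks_like_integer(text):
    digits = text[1:] if text.startswith("-") else text
    return digits.isnumeric()

print(looks_like_integer("-1234"), looks_like_integer("12.5"))
```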

join(), strip(), replace(), split() methods¶

hi = "how ARE you"
seq = ['h', 'o', 'w']  #OR seq = 'h', 'o', 'w'
wh = "\t this \n "

Method	Description	Input	Output
.join()	concatenates with separator string	" < ".join(seq)	h < o < w
.lstrip()	removes leading whitespace	wh.lstrip()	"this \n "
.rstrip()	removes trailing whitespace	wh.rstrip()	"\t this"
.strip()	performs lstrip() and rstrip()	wh.strip()	"this"
.replace(old, new [, m])	replaces old with new at most m times	hi.replace("o", "O")	hOw ARE yOu
.split(s[,m])	splits at s max m times, returns list	hi.split()	[ "how", "ARE", "you" ]

s = " < "
seq = "a", "b", "c"   # a sequence of strings (tuple, list)
print (s.join( seq ))

a < b < c

hi = "how ARE you today"
hi.replace("o", "O", 2)

'hOw ARE yOu today'

hi.split() # if no argument is given, separate at white spaces

['how', 'ARE', 'you', 'today']

hi.split("o", 2)   # separate at the first two occurrences

['h', 'w ARE y', 'u today']

s = '.,.dots or commas.,.,...'
print(s.strip('.,'))
print(s.rstrip('.,'))
print(s.lstrip('.,'))

dots or commas
.,.dots or commas
dots or commas.,.,...
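These methods are often combined. The sketch below (the sample strings messy and record are made up for illustration) normalizes whitespace with split() and join(), and cleans up comma-separated fields with split() and strip():

```python
# Made-up input with irregular whitespace.
messy = "  how \t ARE \n\n you   "

# split() without arguments cuts at any run of whitespace and drops empty pieces;
# join() then glues the words back together with single spaces.
words = messy.split()
clean = " ".join(words)
print(repr(clean))     # 'how ARE you'

# Split at commas, then strip the whitespace around each field.
record = "First row , -2,  -310 "
fields = [field.strip() for field in record.split(",")]
print(fields)          # ['First row', '-2', '-310']
```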

Formatting¶

Alignment*¶

s = "where"

print('0123456789'*3)
print(s.center(30))
print(s.rjust(30))
print(s.ljust(30))

012345678901234567890123456789
            where             
                         where
where

The argument of each method is the total width of the resulting string.
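The three alignment methods also accept an optional second argument, a single fill character used instead of spaces. A short sketch (the variable names are chosen only for this example):

```python
s = "where"

# The second argument is the padding character (default: a space).
centered = s.center(11, "*")
right = s.rjust(8, ".")
left = s.ljust(8, "-")
print(centered)   # ***where***
print(right)      # ...where
print(left)       # where---

# If the requested width is not larger than the string, it is returned unchanged.
short = s.center(3)
print(short)      # where
```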

You can print a table nicely:

tabular = [["First row", -2, -310], ["Second row", 3, 1], ["Third row",-321, 11]]
tabular_string = ""
for row in tabular:
    tabular_string += row[0].ljust(13)
    for i in range(1, len(row)):
        tabular_string += str(row[i]).rjust(7)
    tabular_string += "\n" 
print(tabular_string)

First row         -2   -310
Second row         3      1
Third row       -321     11

`format` method¶

The string on which format is called is the formatting template, and the arguments are the values to substitute into it.

The numbers in the curly braces are the indices of the arguments.

'{0}-{1}-{2} {0}, {1}, {2}, {0}{0}{0}'.format('X', 'Y', 'Z')

'X-Y-Z X, Y, Z, XXX'

The format marker "{ }" can have optional formatting instructions: {number:optional}

optional	Meaning
d	decimal
b	binary
o	octal
x, X	hex, capital HEX
f, F	float
e, E	exponential form: something times 10 to some power
<	left justified
>	right justified
^	centered
c^	centered but with a character `'c'` as padding

print("01234 01234 01234 0123456789")
print('{0:5} {1:5d} {2:>5} {3:*^10}'.format('0123', 1234, '|', 'center'))

01234 01234 01234 0123456789
0123   1234     | **center**

"int {0:d}, hex {0:x} {0:X}, oct {0:o}, bin {0:b}".format(42)

'int 42, hex 2a 2A, oct 52, bin 101010'

"{0}, {0:e}, {0:f}, {0:8.4f}, {0:15.1f}".format(-12.345)

'-12.345, -1.234500e+01, -12.345000, -12.3450,           -12.3'

You can also name the parameters, which is more convenient than using indices.

'The center is: ({x}, {y})'.format(x=3, y=5)

'The center is: (3, 5)'

x1 = 3; y1 = 4
print('The center is: ({x}, {y})'.format(x=x1, y=y1))

The center is: (3, 4)
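Named fields can also be filled from a dictionary and combined with the formatting instructions from the table above. A small sketch, where the dictionary point and the other names are illustrative only:

```python
# Illustrative data: the keys must match the field names in the template.
point = {"x": 3, "y": 4}

# ** unpacks the dictionary into keyword arguments x=3, y=4.
text = 'The center is: ({x}, {y})'.format(**point)
print(text)       # The center is: (3, 4)

# A name can be followed by ':' and a format specification, just like an index.
aligned = "[{value:>8.2f}]".format(value=2.5)
print(aligned)    # [    2.50]
```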

tabular = [["First row", -2, -310], ["Second row", 3, 1], ["Third row",-321, 11]]
table_string = ""
for row in tabular:
    table_string += "{0:_<13}".format(row[0])
    for i in range(1, len(row)):
        table_string += "{0:7d}".format(row[i])
    table_string += "\n"
print(table_string)

First row____     -2   -310
Second row___      3      1
Third row____   -321     11

Regular expressions¶

A regular expression (regex, regexp) is a sequence of characters that defines a search pattern. Regular expressions are available in many programming languages and text editors. The aim is to recognize strings with given properties (such as an email address, a date, a Roman numeral, an IP address, ...).

Metacharacters¶

The following characters have a special meaning: . ^ $ * + ? { } [ ] ( ) \ |

Character	Description	Example	Matches
[]	set of characters	"[abcd]"	a, b,...
[a-z]	a range of characters	"[0-9a-fA-F]"	B, 5,...
[^chars]	not the listed chars	"[^qx]"	a, b, c,...
\	to escape special characters	"\s"	space, tab,...
.	any character (except newline)	"Wh..."	Where, Whose,...
^	beginning	"^Once"	Once.....
$	end	"finished.$"	.....finished.
?	zero or one occurrences	"colou?r"	color, colour
*	zero or more occurrences (greedy)	"woo*w"	woooooow
+	one or more occurrences (greedy)	"wo+w"	wow, woow
*?	zero or more (lazy)	"w.*?w"
+?	one or more (lazy)	"w.+?w"
{n}	exactly n occurrences	"al{2}e{2}"	allee
{n,}	at least n occurrences	"oh{3,}"	ohhhhhh
{,n}	at most n occurrences	"woo{,3}w"	woooow, wooow, woow, wow
{n,m}	at least n at most m occurrences	"wo{1,3}w"	wow,...
|	either or	"H(a|ae|ä)ndel"	Handel, Haendel, Händel
()	capture and group
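A few rows of the table can be checked directly in Python. The sketch below uses re.fullmatch, which succeeds only if the whole string matches the pattern; the list checks and the chosen test strings are illustrative, not part of the original notes:

```python
import re

# (pattern, test string, should the whole string match?)
checks = [
    ("colou?r", "color", True),
    ("colou?r", "colouur", False),       # ? allows at most one 'u'
    ("woo*w", "ww", False),              # at least one 'o' is required
    ("wo+w", "woow", True),
    ("al{2}e{2}", "allee", True),
    ("wo{1,3}w", "woooow", False),       # four 'o' is one too many
    ("H(a|ae|ä)ndel", "Haendel", True),
    ("[0-9a-fA-F]", "B", True),
    ("[^qx]", "q", False),
    ("Wh...", "Where", True),
    ("Wh...", "Who", False),             # three more characters are required
]

results = []
for pattern, text, expected in checks:
    matched = re.fullmatch(pattern, text) is not None
    results.append(matched == expected)
    print(pattern, text, matched)
```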

Special sequences¶

Character	Description	Examples
\b	beginning or end of a word	r"\bis" r"st\b"
\B	NOT the beginning or the end of a word	r"\Bis" r"st\B"
\d	digits (0-9)	r"\d\d-\d\d"
\D	NOT a digit	r"\d\d-\D"
\s	white space character	r"for\sever"
\S	NOT a white space	r"\S"
\w	any word character (a-z, A-Z, 0-9, and _)	r"\s\w\w\w\s"
\W	NOT a word character	r"\Wword\W"
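A short sketch of some of these sequences in action, using re.findall and re.split from Python's re module. The sample sentence is made up for this example:

```python
import re

text = "This is his list, room 12-34."

# \b marks a word boundary, so only the standalone word "is" is found.
whole_word = re.findall(r"\bis\b", text)
print(whole_word)     # ['is']

# \B is the opposite: "is" that does NOT start a word (This, his, list).
inside = re.findall(r"\Bis", text)
print(inside)         # ['is', 'is', 'is']

# \d matches a single digit.
numbers = re.findall(r"\d\d-\d\d", text)
print(numbers)        # ['12-34']

# \s matches any whitespace character, including tabs and newlines.
parts = re.split(r"\s+", "a  b\tc\nd")
print(parts)          # ['a', 'b', 'c', 'd']
```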

Online regex testers with explanations: https://regex101.com/#python or https://extendsclass.com/regex-tester.html#python or https://www.regextester.com/ or, with a cheatsheet, https://pythex.org/.

Examples can be found in https://www.programiz.com/python-programming/regex

Examples¶

Two-digit numbers divisible by 4: [02468][048]|[13579][26]
String between matching ' or " characters: (['"]).*?\1
Any floating-point number: ^[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?$
Roman numerals with capital letters: M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})
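Three of the patterns above can be tried out with a few lines of Python. This is only a sketch: the variable names and test strings are chosen for illustration, and re.fullmatch is used so that the whole string has to match.

```python
import re

div4 = r"[02468][048]|[13579][26]"
floating = r"^[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?$"
roman = r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"

# Compare the pattern with real arithmetic on all strings "00" ... "99".
two_digit = ["%02d" % n for n in range(100)]
matched = [int(t) for t in two_digit if re.fullmatch(div4, t)]
expected = [n for n in range(100) if n % 4 == 0]
print(matched == expected)   # True

# Note that "1." is rejected: digits are required after the decimal point.
float_results = {t: bool(re.fullmatch(floating, t))
                 for t in ["-1.5e-3", ".5", "1e10", "1.", "abc"]}
print(float_results)

roman_results = {t: bool(re.fullmatch(roman, t))
                 for t in ["MCMXCIV", "XLII", "IIII", "VX"]}
print(roman_results)
```

Every part of the Roman numeral pattern is optional, so it also matches the empty string. That is why the sketch tests whole strings instead of searching inside a longer text.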

Tasks:¶

HTML color code of 6 hexadecimal digits like A34DC8
Date in the form yyyy-mm-dd

Regular expressions (RegEx) in python¶

The functions of the re module are not built in, so you have to import the module first. Put this line at the beginning of your code.

import re

The functions in the module re:

Function	Description
findall(p, s)	returns a list containing all matches of p in s
search(p, s)	returns a "match object" if there is a match of p in s
split(p, s)	split at each match of p in s and returns a list
sub(p, n, s[, m])	replaces all or m matches of p with a new string n in s
finditer(p, s)	returns an iterable object on the match objects of p in s

A match object records where the pattern was found in the string and what exactly it matched. This information can be read with the .span() and .group() methods.

s = "confirmation"
p = ".i"
print(re.findall(p, s))

['fi', 'ti']

print(re.search(p, s))

<re.Match object; span=(3, 5), match='fi'>

x = re.search(p, s)
print(x.span())
print(x.group())

(3, 5)
fi
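When the pattern contains parentheses, .group() also accepts a group number, and re.findall returns tuples of the captured groups. A minimal sketch with a made-up sentence and a simple hh:mm pattern:

```python
import re

text = "Train leaves at 09:45 and arrives at 13:05."
p = r"(\d\d):(\d\d)"      # two capture groups: hours and minutes

m = re.search(p, text)    # match object for the first occurrence
whole = m.group()         # the whole match, same as m.group(0)
hours = m.group(1)        # text captured by the first pair of parentheses
minutes = m.group(2)      # text captured by the second pair
print(whole, hours, minutes)   # 09:45 09 45

# With groups in the pattern, findall returns one tuple per match.
all_times = re.findall(p, text)
print(all_times)          # [('09', '45'), ('13', '05')]
```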

print(re.split(p, s))

['con', 'rma', 'on']

Since the backslash and other special characters can appear in a RegEx pattern, you have to be careful with them.

It is best to write the pattern as a so-called raw string. In this format one backslash really means one backslash, so you don't have to escape it.

Putting an r in front of the string literal makes it a raw string.
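The difference matters for sequences such as \b, which in an ordinary Python string is the backspace character. A small sketch comparing the two spellings (the variable names are illustrative):

```python
import re

plain = "\bis"     # ordinary string: "\b" is ONE character, the backspace
raw = r"\bis"      # raw string: a backslash followed by the letter b
print(len(plain), len(raw))       # 3 4

text = "This is it"
found_raw = re.findall(raw, text)      # \b reaches the regex engine as a word boundary
found_plain = re.findall(plain, text)  # looks for a literal backspace followed by "is"
print(found_raw)      # ['is']
print(found_plain)    # []

# Without the r prefix every backslash would have to be doubled.
print("\\bis" == raw)   # True
```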

x = re.finditer(p, s)
for y in x:
    print(y.group())

fi
ti

Exercise: Triple every quotation mark in a string, whether single or double.

st = """This "word" is an 'other' one."""
p = """(['"])"""
print(re.sub(p, r"\1\1\1", st))

This """word""" is an '''other''' one.
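The optional fourth argument of sub, listed as m in the function table, limits the number of replacements. A brief sketch with a made-up string, passing the limit by its keyword name count:

```python
import re

st = "a-b-c-d"

# Without a limit every match is replaced.
all_replaced = re.sub("-", "+", st)
print(all_replaced)      # a+b+c+d

# count=2 replaces only the first two matches, scanning from the left.
first_two = re.sub("-", "+", st, count=2)
print(first_two)         # a+b+c-d
```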

s = "This 'string' has two 'quoted' words"
p1 = "'.*'"    # greedy    
p2 = "'.*?'"   # lazy
print(re.findall(p1, s), "  --> greedy")
print(re.findall(p2, s), "  --> lazy")

["'string' has two 'quoted'"]   --> greedy
["'string'", "'quoted'"]   --> lazy