More on strings

Basics

a = "That's"
b = "fine"

Operator	Description	Input	Output
+	Concatenation	a + b	That'sfine
*	Repetition	2 * b	finefine
[]	Indexing	a[1]	h
[:]	Slicing	a[1:4]	hat
in	Membership	'a' in a	True
not in	Non-membership	'a' not in b	True
r/R	Raw string: escape sequences are not processed	r'\n'	\n (two characters: \ and n)
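The rows of the table can be checked in one cell. A small sketch that redefines a and b so it runs on its own; the two len() calls are an addition that makes the raw-string row concrete:

```python
a = "That's"
b = "fine"

print(a + b)                    # concatenation: That'sfine
print(2 * b)                    # repetition: finefine
print(a[1], a[1:4])             # indexing and slicing (end index excluded): h hat
print('a' in a, 'a' not in b)   # membership: True True

# A raw string keeps the backslash: r'\n' has two characters, '\n' is one newline.
print(len(r'\n'), len('\n'))    # 2 1
```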

String methods

Upper and lower case methods

hi = "how ARE you"

Method	Description	Input	Output
.capitalize()	first character upper case, the rest lower case	hi.capitalize()	How are you
.lower()	lower case	hi.lower()	how are you
.upper()	upper case	hi.upper()	HOW ARE YOU
.title()	every word capitalized	hi.title()	How Are You

hi = "how ARE you"

hi.title()

'How Are You'

hi.upper()

'HOW ARE YOU'
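One detail the table does not show (illustrated with a made-up sentence): .title() treats any non-letter as a word boundary, so a letter after an apostrophe is capitalized as well, whereas .capitalize() only touches the first character of the whole string:

```python
text = "that's fine"
print(text.title())       # That'S Fine  (the s after the apostrophe is capitalized too)
print(text.capitalize())  # That's fine
```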

Boolean string methods

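Method	Returns True if the string is not empty and ...
.isalnum()	all characters are alphanumeric (letters or digits, no spaces or symbols)
.isalpha()	all characters are letters
.islower()	it has at least one cased character and all cased characters are lower case
.isnumeric()	all characters are numeric
.isspace()	all characters are whitespace
.istitle()	it is in title case (each word starts with an upper-case letter, the rest is lower case)
.isupper()	it has at least one cased character and all cased characters are upper case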

alnum = "23 apples"
num = "-1234"
title = "Big Apple"
print(alnum.isalnum(), num.isnumeric(), title.istitle())

False False True

alnum = "23apples"
num = "1234"
white = "   \t   \n\n   "
print(alnum.isalnum(), num.isnumeric(), white.isspace())

True True True
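A few edge cases, checked on made-up inputs: only cased characters matter for .isupper() and .islower(), an empty string gives False, and a sign or a decimal point is not a numeric character:

```python
# Digits and spaces are ignored by isupper()/islower(); only letters count.
print("ABC 1".isupper(), "abc 1".islower())        # True True
# An empty string gives False for all of these methods.
print("".isalpha(), "".isspace(), "".isupper())    # False False False
# '-' and '.' are not numeric characters.
print("-12".isnumeric(), "1.5".isnumeric())        # False False
```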

String modification methods – join(), strip(), replace(), split()

hi = "how ARE you"
seq = ['h', 'o', 'w']  # or a tuple: seq = 'h', 'o', 'w'
wh = "\t this \n "     # whitespace: space, tab, newline, ...

Method	Description	Input	Output
.join(seq)	concatenates the strings of seq, with the separator string between them	" < ".join(seq)	h < o < w
.lstrip()	removes leading whitespace	wh.lstrip()	"this \n "
.rstrip()	removes trailing whitespace	wh.rstrip()	"\t this"
.strip()	performs lstrip() and rstrip()	wh.strip()	"this"
.replace(old, new[, m])	replaces old with new (at most m times, if m is given)	hi.replace("o", "O")	hOw ARE yOu
.split(s[, m])	splits at s (at whitespace if s is omitted), at most m times; returns a list	hi.split()	["how", "ARE", "you"]
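Despite the name of the section, these methods do not change the original string: strings are immutable in Python, so each method returns a new object that has to be stored if it is needed. A minimal check:

```python
hi = "how ARE you"
shouted = hi.upper()    # returns a new string, stored in shouted
hi.replace("o", "O")    # the result is discarded, hi is not changed

print(hi)       # how ARE you
print(shouted)  # HOW ARE YOU
```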

s = " < "
seq = "a", "b", "c"   # a sequence of strings (tuple, list)
print(s.join(seq))

a < b < c
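join() only accepts strings; numbers have to be converted first, for example with map(str, ...). A small sketch with a made-up list of numbers:

```python
numbers = [1, 2, 3]
try:
    " < ".join(numbers)            # items are int, not str
except TypeError as err:
    print("TypeError:", err)

joined = " < ".join(map(str, numbers))  # convert every item to str first
print(joined)  # 1 < 2 < 3
```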

hi = "how ARE you today"
hi.replace("o", "O", 2)

'hOw ARE yOu today'

hi.split()   # without an argument, split at runs of whitespace

['how', 'ARE', 'you', 'today']

hi.split("o", 2)   # split at the first two occurrences

['h', 'w ARE y', 'u today']

s = '.,.dots or commas.,.,...'
print(s.strip('.,'))
print(s.rstrip('.,'))
print(s.lstrip('.,'))

dots or commas
.,.dots or commas
dots or commas.,.,...
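Note that the argument of strip() is a set of characters, not a substring: any combination of those characters is removed from the ends. To remove an exact prefix or suffix, str.removeprefix() and str.removesuffix() can be used instead (available from Python 3.9). A sketch on a made-up address:

```python
url = "www.example.com"
print(url.strip("w.moc"))         # example      (the characters w . m o c are removed from both ends)
print(url.removeprefix("www."))   # example.com  (the exact prefix is removed once)
print(url.removesuffix(".com"))   # www.example
```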

Formatting

Old style formatting: the % operator

For positional formatting the % operator can be used easily:

name = "Lucy"
"Hi %s!" % name

The most common conversion specifiers are summarized in the following table:

Specifier	Meaning	Example	Output
%s	string	"Hi %s" % "Joe"	'Hi Joe'
%d	decimal integer	"%d is prime" % 7	'7 is prime'
%o	octal integer (digits 0-7)	"8 = octal %o" % 8	'8 = octal 10'
%x	hexadecimal, lower case (0-9, a-f)	"10 = hex %x" % 10	'10 = hex a'
%X	hexadecimal, upper case (0-9, A-F)	"10 = hex %X" % 10	'10 = hex A'
%f, %F	floating point	"1.2 = %f" % 1.2	'1.2 = 1.200000'
%e, %E	exponential notation	"12 = %e" % 12	'12 = 1.200000e+01'
%g, %G	general: fixed or exponential, depending on the magnitude	"1.2e2 = %g" % 1.2e2	'1.2e2 = 120'
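Between % and the conversion character an optional width and precision can be given, and %% produces a literal percent sign. A few illustrative cases (the values are arbitrary):

```python
pi = 3.14159
print("[%8.3f]" % pi)   # [   3.142]  width 8, 3 digits after the decimal point
print("[%-6d]" % 42)    # [42    ]    the '-' flag means left-justified
print("[%5s]" % "ab")   # [   ab]     right-justified by default
print("%d%%" % 50)      # 50%         %% gives a literal percent sign
```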

x = -12.345
y = 12.3e10
print("f: %f, e: %e, g: %g" % (x, x, x))
print("f: %f, e: %e, g: %g" % (y, y, y))

f: -12.345000, e: -1.234500e+01, g: -12.345
f: 123000000000.000000, e: 1.230000e+11, g: 1.23e+11

"Hi %s! You have got %d points!" % ("Joe", 5)

'Hi Joe! You have got 5 points!'

name = "Lucy"
points = 10
"Hi %s! You have got %d points!" % (name, points)

'Hi Lucy! You have got 10 points!'

The `format` method

The method is called on the format string, and its arguments are the values to substitute.

The numbers in the curly braces refer to the arguments by position, starting from 0.

'{0}-{1}-{2} {0}, {1}, {2}, {0}{0}'.format('X', 'Y', 'Z')

'X-Y-Z X, Y, Z, XX'

A replacement field "{}" can contain an optional format specification after a colon: {number:spec}

spec	Meaning
d	decimal integer
b	binary
o	octal
x, X	hex, upper-case HEX
f, F	fixed-point float
e, E	exponential notation, e.g. -1.234500e+01
a number, e.g. 5	minimum field width
<	left-justified
>	right-justified
^	centered
c^	centered, padded with the character c (c< and c> work the same way)
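The parts of a specification can be combined. In the simple cases used here the order is: fill character, alignment, width, then .precision and type (the full grammar is the format specification mini-language in the Python documentation). Illustrative values:

```python
stars = "[{0:*>10.3f}]".format(3.14159)   # fill *, right-aligned, width 10, 3 decimals
binary = "[{0:<8b}]".format(5)            # left-aligned binary, width 8
middle = "[{0:^9}]".format("mid")         # centered, width 9
print(stars)   # [*****3.142]
print(binary)  # [101     ]
print(middle)  # [   mid   ]
```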

print("01234 01234 01234 0123456789")
print('{0:5} {1:5d} {2:>5} {3:*^10}'.format('0123', 1234, '|', 'center'))

01234 01234 01234 0123456789
0123   1234     | **center**

"int {0:d}, hex {0:x} {0:X}, oct {0:o}, bin {0:b}".format(42)

'int 42, hex 2a 2A, oct 52, bin 101010'

"{0}, {0:e}, {0:f}, {0:8.4f}, {0:8.1f}".format(-12.345)

'-12.345, -1.234500e+01, -12.345000, -12.3450,    -12.3'

You can also name the fields; this is often more readable than numeric indices.

'The center is: ({x}, {y})'.format(x=3, y=5)

'The center is: (3, 5)'

x1 = 3; y1 = 4
print('The center is: ({x}, {y})'.format(x=x1, y=y1))

The center is: (3, 4)
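Named fields also work with a dictionary, because ** unpacks its key-value pairs into keyword arguments. A sketch with a made-up dictionary center:

```python
center = {"x": 3, "y": 4}
text = 'The center is: ({x}, {y})'.format(**center)  # same as .format(x=3, y=4)
print(text)  # The center is: (3, 4)
```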

tabular = [["1st row", -2, -31], ["Second", 3, 1], ["Third", -32, 11]]
table_string = ""
for row in tabular:
    table_string += "{0:_<11}".format(row[0])
    for value in row[1:]:
        table_string += "{0:5d}".format(value)
    table_string += "\n"
print(table_string)

1st row____   -2  -31
Second_____    3    1
Third______  -32   11

Formatted strings – f-strings (new in Python 3.6)

An f-string is a string literal prefixed with 'f' or 'F'. It may contain replacement fields: expressions delimited by curly braces {}. While other string literals always have a constant value, f-strings are expressions evaluated at run time!
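Being evaluated at run time means that an f-string uses the values the variables have when the line is executed, while an ordinary string can be kept as a template and filled in later with format(). A minimal illustration:

```python
template = "Hi {name}!"       # an ordinary string, used later as a template
name = "Lucy"
greeting = f"Hi {name}!"      # evaluated right now, while name == "Lucy"
name = "Joe"

print(greeting)                    # Hi Lucy!  (not affected by the new value)
print(template.format(name=name))  # Hi Joe!
```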

name = "Lucy"
points = 100
f"Hi {name}! You have {points} points!"

'Hi Lucy! You have 100 points!'

a, b = 3, 5
f"2({a} + {b}) = {2*(a+b)}"

'2(3 + 5) = 16'

width = 8
precision = 4
value = -123.4567
f"result: {value:{width}.{precision}}"  # nested fields; without a type, .4 means 4 significant digits

'result:   -123.5'
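In the nested example no presentation type is given, so the precision counts significant digits; with the f type it counts digits after the decimal point. A comparison using the same values (redefined so the cell runs on its own):

```python
width = 8
precision = 4
value = -123.4567

general = f"{value:{width}.{precision}}"   # 4 significant digits
fixed = f"{value:{width}.{precision}f}"    # 4 digits after the decimal point
print(f"[{general}] [{fixed}]")            # [  -123.5] [-123.4567]
```

The fixed version is 9 characters long, so the width of 8 adds no padding.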

num = 300
f'{num}, {num:x}, {num:o}, {num:b}, {num:10}, {num:10X}'

'300, 12c, 454, 100101100,        300,        12C'

num, numb, numo, numx = 12, 0b10010, 0o371, 0xabc
f"{num}, {numb}, {numo}, {numx}"

'12, 18, 249, 2748'

Be careful with the quotation marks inside the replacement fields!

d = {"one": 1, "two": 2}
f"{d['one']} is one"  # f"{d["one"]} is one" is a SyntaxError before Python 3.12

'1 is one'

Alignment methods*

s = "where"

print('0123456789'*3)
print(s.center(30))
print(s.rjust(30))
print(s.ljust(30))

The argument gives the total width of the result; the string is padded with spaces to that width.
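Each method also accepts an optional fill character as a second argument, and if the string is already longer than the requested width it is returned unchanged. For example:

```python
s = "where"
print(s.center(11, "*"))  # ***where***
print(s.rjust(8, "-"))    # ---where
print(s.ljust(3) + "|")   # where|  (width smaller than the string: no change)
```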

These methods can also print a table nicely (tabular is the list defined above):

tabular_string = ""
for row in tabular:
    tabular_string += row[0].ljust(11)
    for value in row[1:]:
        tabular_string += str(value).rjust(5)
    tabular_string += "\n"
print(tabular_string)

Regular expressions

A regular expression (regex, regexp) is a sequence of characters that defines a search pattern. Regular expressions are supported by many programming languages and text editors. They are used to recognize strings with given properties (e-mail addresses, dates, Roman numerals, IP addresses, ...).

Metacharacters

The following characters have a special meaning: . ^ $ * + ? { } [ ] ( ) \ |

Metacharacter	Description	Example	Matches
[]	set of characters	"[abcd]"	a, b,...
[a-z]	a range	"[0-9a-fA-F]"	B, 5,...
[^chars]	any character except the listed ones	"[^qx]"	a, b, c,...
\	escapes a special character (\. matches a dot) or starts a special sequence	"\s"	space, tab,...
.	any character (except newline)	"Wh..."	Where, Whose,...
^	start of the string	"^Once"	Once.....
$	end of the string	"finished\.$"	.....finished.
?	zero or one occurrence	"colou?r"	color, colour
*	zero or more occurrences (greedy)	"woo*w"	wow, woooooow
+	one or more occurrences (greedy)	"wo+w"	wow, woow
*?	zero or more (lazy: as few as possible)	"w.*?w"	wow (not wowow) in wowow
+?	one or more (lazy: as few as possible)	"w.+?w"	wow (not wowow) in wowow
{n}	exactly n occurrences	"al{2}e{2}"	allee
{n,}	at least n occurrences	"oh{3,}"	ohhh, ohhhhhh
{,n}	at most n occurrences	"woo{,3}w"	woooow, wooow, woow, wow
{n,m}	at least n, at most m occurrences	"wo{1,3}w"	wow, woow, wooow
|	either or	"H(a|ae|ä)ndel"	Handel, Haendel, Händel
()	capture and group	"(ha)+"	ha, haha,...
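A few rows of the table checked with re.findall (described in the RegEx in Python part below). Note that when the pattern contains a capturing group, findall returns the group contents instead of the whole match; the non-capturing group (?:...), which is not in the table, groups without capturing:

```python
import re

print(re.findall(r"colou?r", "color or colour"))   # ['color', 'colour']
print(re.findall(r"wo{1,3}w", "ww wow woooow"))    # ['wow']

names = "Handel, Haendel, Händel"
print(re.findall(r"H(a|ae|ä)ndel", names))     # ['a', 'ae', 'ä']  only the group content
print(re.findall(r"H(?:a|ae|ä)ndel", names))   # ['Handel', 'Haendel', 'Händel']
```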

Special sequences

Sequence	Description	Examples
\b	beginning or end of a word	r"\bis" r"st\b"
\B	NOT the beginning or end of a word	r"\Bis" r"st\B"
\d	a digit (0-9)	r"\d\d-\d\d"
\D	NOT a digit	r"\d\d-\D"
\s	a whitespace character	r"for\sever"
\S	NOT a whitespace character	r"\S"
\w	a word character (letters, digits and _; in str patterns also non-ASCII letters such as ä)	r"\s\w\w\w\s"
\W	NOT a word character	r"\Wword\W"
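The word-boundary and character-class sequences in action, on made-up sample strings:

```python
import re

text = "This is his island"
print(re.findall(r"\bis\b", text))   # ['is']        the whole word only
print(re.findall(r"\bis", text))     # ['is', 'is']  'is' and the start of 'island'
print(re.findall(r"\Bis", text))     # ['is', 'is']  inside 'This' and 'his'
print(re.findall(r"\d\d-\d\d", "Call 06-30 or 20-1"))  # ['06-30']
print(re.findall(r"\w+", "Händel_2, ok!"))             # ['Händel_2', 'ok']
```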

Online testers with explanations: https://regex101.com/#python, https://extendsclass.com/regex-tester.html#python or https://www.regextester.com/; with a cheat sheet: https://pythex.org/.

More examples can be found at https://www.programiz.com/python-programming/regex

Examples

Two-digit numbers divisible by 4: [02468][048]|[13579][26]
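String between ' or " characters: (['"]).*?\1 (a backreference such as \1 does not work inside a character class, so [^\1] cannot be used to stop at the closing quote)
Any floating-point number: ^[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?$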
Roman numerals with capital letters: M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})
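The example patterns can be tested in Python. A sketch that uses re.fullmatch (not listed in the function table below), which succeeds only if the whole string matches; the sample inputs are made up. Note that the Roman numeral pattern also accepts the empty string, since every part of it is optional:

```python
import re

# 1) Two-digit numbers divisible by 4: compare the pattern with arithmetic.
div4 = "[02468][048]|[13579][26]"
by_regex = [n for n in range(10, 100) if re.fullmatch(div4, str(n))]
by_arithmetic = [n for n in range(10, 100) if n % 4 == 0]
print(by_regex == by_arithmetic)  # True

# 2) Quoted strings: the lazy .*? stops at the first matching closing quote.
quoted = r"""(['"]).*?\1"""
text = """He said "it's fine" and 'bye'."""
quotes = [m.group(0) for m in re.finditer(quoted, text)]
for q in quotes:
    print(q)  # "it's fine"  and then  'bye'

# 3) Floating-point numbers ("1." is not accepted by this pattern).
float_re = r"^[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?$"
for t in ["3.14", "-2e10", ".5", "1.", "abc"]:
    print(t, bool(re.match(float_re, t)))

# 4) Roman numerals: fullmatch checks the whole string.
roman = r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
for t in ["MMXXIV", "XLII", "IIII", ""]:
    print(repr(t), bool(re.fullmatch(roman, t)))
```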

Tasks:

HTML color code of 6 hexadecimal numbers like A34DC8
Date in the form yyyy-mm-dd

Regular expressions (RegEx) in Python

The regex functions are in the module re of the standard library, which is not loaded by default, so it has to be imported. Put this line at the beginning of your code:

import re

The most important functions of the re module (p is the pattern, s the string):

Function	Description
findall(p, s)	returns a list of all matches of p in s (if p contains groups, the group contents instead)
search(p, s)	returns a match object for the first match of p in s, or None if there is no match
split(p, s)	splits s at each match of p and returns a list
sub(p, n, s, count=m)	replaces the matches of p in s with the new string n (all of them, or only the first m)
finditer(p, s)	returns an iterator over the match objects of all matches of p in s
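A short sketch of two details from the table on made-up strings: search() returns None when there is no match, and count limits the number of replacements in sub() (passing it as a keyword argument is the clearer choice):

```python
import re

s = "confirmation"
no_match = re.search("xy", s)
print(no_match)      # None

all_digits = re.sub(r"\d", "#", "a1b22c333")            # replace every digit
first_three = re.sub(r"\d", "#", "a1b22c333", count=3)  # only the first 3 digits
print(all_digits)    # a#b##c###
print(first_three)   # a#b##c333
```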

A match object tells where the pattern was found in the string and what exactly was matched. This information can be read with the .span() and .group() methods.

s = "confirmation"
p = ".i"
print(re.findall(p, s))

['fi', 'ti']

print(re.search(p, s))

<re.Match object; span=(3, 5), match='fi'>

x = re.search(p, s)
print(x.span())
print(x.group())

(3, 5)
fi

print(re.split(p, s))

['con', 'rma', 'on']

Backslashes have a special meaning both in Python string literals and in RegEx patterns, so you have to be careful with them.

It is best to use a so-called raw string as the pattern. In a raw string a backslash is just a backslash, so you don't have to escape it.

A string literal becomes a raw string if you put an r in front of it, e.g. r"\d\d-\d\d".
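Why the r matters: in an ordinary string literal "\b" is the backspace character, so the pattern no longer means "word boundary". A small comparison on a made-up sentence:

```python
import re

s = "a cat and a category"
plain = re.findall("\bcat\b", s)   # "\b" here is a backspace character
raw = re.findall(r"\bcat\b", s)    # r"\b" is a word boundary
print(plain)   # []
print(raw)     # ['cat']
print("\\d" == r"\d")  # True: both are the two characters \ and d
```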

x = re.finditer(p, s)
for y in x:
    print(y.group())

fi
ti

Exercise: triple every quotation mark in a string, whether it is ' or ".

st = """This "word" is an 'other' one."""
p = """(['"])"""
print(st)
print(re.sub(p, r"\1\1\1", st))

This "word" is an 'other' one.
This """word""" is an '''other''' one.
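re.sub() also accepts a function instead of a replacement string: it is called with each match object, and its return value is inserted. The same exercise solved that way, as an alternative sketch:

```python
import re

st = """This "word" is an 'other' one."""
# The function receives each match and returns the matched quote three times.
tripled = re.sub(r"""['"]""", lambda m: m.group(0) * 3, st)
print(tripled)  # This """word""" is an '''other''' one.
```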

s = "This 'string' has two 'quoted' words"
p1 = "'.*'"    # greedy    
p2 = "'.*?'"   # lazy
print(re.findall(p1, s), "  --> greedy")
print(re.findall(p2, s), "  --> lazy")

["'string' has two 'quoted'"]   --> greedy
["'string'", "'quoted'"]   --> lazy
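A negated character class is another way to avoid the greedy overmatch: '[^']*' can never run past the next quote, so here it gives the same result as the lazy pattern:

```python
import re

s = "This 'string' has two 'quoted' words"
p3 = "'[^']*'"   # a quote, any characters except a quote, a quote
found = re.findall(p3, s)
print(found)     # ["'string'", "'quoted'"]
```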