More on strings

a = "That's"
b = "fine"
Operator  Description                           Input         Output
+         Concatenation                         a + b         That'sfine
*         Repetition                            2 * b         finefine
[]        Slice                                 a[1]          h
[:]       Range slice                           a[1:4]        hat
in        Membership                            'a' in a      True
not in    Membership                            'a' not in b  True
r/R       Raw string (suppresses escape chars)  r'\n'         \n
%         Format                                %s %d %o (octal) %x (hex) %f %e %E %g %G
In [1]:
x = 12.345
y = 12.3e10
print("f: %f, e: %e, g: %g" % (x, x, x) )
print("f: %f, e: %e, g: %g" % (y, y, y) )
f: 12.345000, e: 1.234500e+01, g: 12.345
f: 123000000000.000000, e: 1.230000e+11, g: 1.23e+11
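
The integer and string conversions from the table work the same way as the float ones above. The following is a minimal sketch; the value 255 and the names n, line and padded are only illustrative.

```python
# Integer and string conversions with the % operator (n is an illustrative value).
n = 255
line = "s: %s, d: %d, o: %o, x: %x" % ("text", n, n, n)
print(line)      # s: text, d: 255, o: 377, x: ff

# A width and a precision can be written between % and the conversion letter.
padded = "[%8.2f]" % 3.14159
print(padded)    # [    3.14]
```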

Upper and lower case

hi = "how ARE you"
Method         Description                              Input            Output
.capitalize()  first letter upper case, the rest lower  hi.capitalize()  How are you
.lower()       lower case                               hi.lower()       how are you
.upper()       upper case                               hi.upper()       HOW ARE YOU
.title()       every word capitalized                   hi.title()       How Are You
In [2]:
hi = "how ARE you"
In [3]:
hi.title()
'How Are You'
In [4]:
Method        Boolean value
.isalnum()    alphanumeric characters (no symbols)?
.isalpha()    alphabetic characters (no symbols)?
.islower()    lower case?
.isnumeric()  numeric characters?
.isspace()    whitespace characters?
.istitle()    is in title case?
.isupper()    upper case?
In [5]:
alnum = "23 apples"
num = "-1234"
title = "Big Apple"
print(alnum.isalnum(), num.isnumeric(), title.istitle()) 
False False True
In [6]:
alnum = "23apples"
num = "1234"
white = "   \t   \n\n   "
print(alnum.isalnum(), num.isnumeric(), white.isspace()) 
True True True
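
The remaining checks from the table behave similarly. This is a small sketch with made-up sample words; note that .islower() and .isupper() only look at the cased characters, so digits do not affect them.

```python
# Illustrative sample words (not from the original notebook).
samples = ["apple", "Apple", "APPLE", "apple7"]

# For each word: (isalpha, islower, isupper)
results = {w: (w.isalpha(), w.islower(), w.isupper()) for w in samples}
for word, flags in results.items():
    print(word, flags)
# apple  (True, True, False)
# Apple  (True, False, False)
# APPLE  (True, False, True)
# apple7 (False, True, False)
```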

join(), strip(), replace(), split() methods

hi = "how ARE you"
seq = ['h', 'o', 'w']  #OR seq = 'h', 'o', 'w'
wh = "\t this \n "
Method                   Description                            Input                 Output
.join()                  concatenates with separator string     " < ".join(seq)       h < o < w
.lstrip()                removes leading whitespace             wh.lstrip()           "this \n "
.rstrip()                removes trailing whitespace            wh.rstrip()           "\t this"
.strip()                 performs lstrip() and rstrip()         wh.strip()            "this"
.replace(old, new[, m])  replaces old with new at most m times  hi.replace("o", "O")  hOw ARE yOu
.split(s[, m])           splits at s max m times, returns list  hi.split()            ["how", "ARE", "you"]
In [7]:
s = " < "
seq = "a", "b", "c"   # a sequence of strings (tuple, list)
print(s.join(seq))
a < b < c
In [8]:
hi = "how ARE you today"
hi.replace("o", "O", 2)
'hOw ARE yOu today'
In [9]:
hi.split() # if no argument is given, separate at white spaces
['how', 'ARE', 'you', 'today']
In [10]:
hi.split("o", 2)   # separate at the first two occurrences
['h', 'w ARE y', 'u today']
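
Since split() without an argument cuts at any run of whitespace and join() glues the pieces back together, the two can be combined to normalize spacing. A minimal sketch with an illustrative string:

```python
# Illustrative input with irregular whitespace.
messy = "  how   ARE \t you \n today "

words = messy.split()        # no argument: split at runs of whitespace
clean = " ".join(words)      # rejoin with single spaces
print(words)                 # ['how', 'ARE', 'you', 'today']
print(clean)                 # how ARE you today

# With an explicit separator, empty strings between neighbouring separators are kept.
fields = "a,,b".split(",")
print(fields)                # ['a', '', 'b']
```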
In [11]:
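s = '.,.dots or commas.,.,...'
print(s.strip('.,'))    # the argument lists the characters to remove
print(s.rstrip('.,'))
print(s.lstrip('.,'))
dots or commas
.,.dots or commas
dots or commas.,.,...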



In [12]:
s = "where"
In [13]:

The parameter of the padding methods (such as .ljust() and .rjust()) gives the final width of the string.
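
A minimal sketch of fixed-width padding, assuming the methods meant here are str.ljust(), str.rjust() and str.center(). The fill character "." is only there to make the padding visible; by default spaces are used.

```python
s = "where"

left = s.ljust(9, ".")     # pad on the right up to width 9
right = s.rjust(9, ".")    # pad on the left up to width 9
mid = s.center(9, ".")     # pad on both sides up to width 9
print(left)    # where....
print(right)   # ....where
print(mid)     # ..where..

# If the string is already longer than the width, it is returned unchanged.
unchanged = s.ljust(3)
print(unchanged)   # where
```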

You can print a table nicely:

In [14]:
tabular = [["First row", -2, -310], ["Second row", 3, 1], ["Third row",-321, 11]]
tabular_string = ""
for row in tabular:
    tabular_string += row[0].ljust(13)
    for i in range(1, len(row)):
        tabular_string += str(row[i]).rjust(7)
    tabular_string += "\n"
print(tabular_string)
First row         -2   -310
Second row         3      1
Third row       -321     11

format method

The string on which the method is called is the format string, and the arguments are the values to substitute.

The numbers in the brackets mark the parameters.

In [15]:
'{0}-{1}-{2} {0}, {1}, {2}, {0}{0}{0}'.format('X', 'Y', 'Z')
'X-Y-Z X, Y, Z, XXX'

The format marker "{ }" can have optional formatting instructions: {number:optional}

optional  Meaning
d         decimal
b         binary
o         octal
x, X      hex, capital HEX
f, F      float
e, E      exponential form: something times 10 to some power
<         left justified
>         right justified
^         centered
c^        centered but with a character 'c' as padding
In [16]:
print("01234 01234 01234 0123456789")
print('{0:5} {1:5d} {2:>5} {3:*^10}'.format('0123', 1234, '|', 'center'))
01234 01234 01234 0123456789
0123   1234     | **center**
In [17]:
"int {0:d}, hex {0:x} {0:X}, oct {0:o}, bin {0:b}".format(42)
'int 42, hex 2a 2A, oct 52, bin 101010'
In [18]:
"{0}, {0:e}, {0:f}, {0:8.4f}, {0:15.1f}".format(-12.345)
'-12.345, -1.234500e+01, -12.345000, -12.3450,           -12.3'
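
The options from the table can be combined in one marker, in the order fill character, alignment, width, precision, type. A short sketch with illustrative values:

```python
# Fill with '*', right-justify, width 10, two decimals.
cell = "{0:*>10.2f}".format(3.14159)
print(cell)    # ******3.14

# Left, centered and right justified columns of width 7.
row = "{0:<7}|{1:^7}|{2:>7}".format("left", "mid", "right")
print(row)     # left   |  mid  |  right
```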

You can also name the parameters, which is more convenient than using indices.

In [19]:
'The center is: ({x}, {y})'.format(x=3, y=5)
'The center is: (3, 5)'
In [20]:
x1 = 3; y1 = 4
print('The center is: ({x}, {y})'.format(x=x1, y=y1))
The center is: (3, 4)
In [21]:
tabular = [["First row", -2, -310], ["Second row", 3, 1], ["Third row",-321, 11]]
table_string = ""
for row in tabular:
    table_string += "{0:_<13}".format(row[0])
    for i in range(1, len(row)):
        table_string += "{0:7d}".format(row[i])
    table_string += "\n"
print(table_string)
First row____     -2   -310
Second row___      3      1
Third row____   -321     11

Regular expressions

A regular expression (regex, regexp) is a sequence of characters that defines a search pattern. Regular expressions are used in many programming languages and text editors. The aim is to recognize strings with given properties (such as an email address, a date, a Roman numeral, an IP address,...).


The following characters have a special meaning: . ^ $ * + ? { } [ ] ( ) \ |

Character Description                                       Example           Fits to
[]        set of characters                                 "[abcd]"          a, b,...
[a-z]     a range of characters                             "[0-9a-fA-F]"     B, 5,...
[^chars]  not the listed chars                              "[^qx]"           a, b, c,...
\         escapes special chars, starts special sequences   "\s"              space, tab,...
.         any character (except newline)                    "Wh..."           Where, Whose,...
^         beginning of the string                           "^Once"           Once.....
$         end of the string                                 "finished\.$"     .....finished.
?         zero or one occurrence                            "colou?r"         color, colour
*         zero or more occurrences (greedy)                 "woo*w"           woooooow
+         one or more occurrences (greedy)                  "wo+w"            wow, woow
*?        zero or more (lazy)                               "w.*?w"
+?        one or more (lazy)                                "w.+?w"
{n}       exactly n occurrences                             "al{2}e{2}"       allee
{n,}      at least n occurrences                            "oh{3,}"          ohhhhhh
{,n}      at most n occurrences                             "woo{,3}w"        woooow, wooow, woow, wow
{n,m}     at least n, at most m occurrences                 "wo{1,3}w"        wow,...
|         either or                                         "H(a|ae|ä)ndel"   Handel, Haendel, Händel
()        capture and group
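
A few rows of the table can be checked with Python's re module (described later in this document). This is only a sketch: the candidate strings are made up, and re.fullmatch() is used so that the whole string has to fit the pattern.

```python
import re

# pattern -> candidate strings (illustrative)
checks = {
    r"colou?r": ["color", "colour", "colouur"],
    r"wo+w": ["ww", "wow", "woow"],
    r"al{2}e{2}": ["allee", "alee"],
    r"H(a|ae|ä)ndel": ["Handel", "Haendel", "Händel", "Hendel"],
}

# Keep only the candidates that fit their pattern completely.
fits = {p: [c for c in cands if re.fullmatch(p, c)] for p, cands in checks.items()}
for pattern, matched in fits.items():
    print(pattern, matched)
# colou?r        ['color', 'colour']
# wo+w           ['wow', 'woow']
# al{2}e{2}      ['allee']
# H(a|ae|ä)ndel  ['Handel', 'Haendel', 'Händel']
```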

Special sequences

Character  Description                              Examples
\b         beginning or end of a word               r"\bis"  r"st\b"
\B         NOT the beginning or the end of a word   r"\Bis"  r"st\B"
\d         digit (0-9)                              r"\d\d-\d\d"
\D         NOT a digit                              r"\d\d-\D"
\s         whitespace character                     r"for\sever"
\S         NOT a whitespace character               r"\S"
\w         word character (a-z, A-Z, 0-9 and _)     r"\s\w\w\w\s"
\W         NOT a word character                     r"\Wword\W"
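
The difference between \b and \B is easiest to see on a sentence in which "is" occurs in several positions. A small sketch using re.findall() on an illustrative text:

```python
import re

text = "This is his list: 12-34, 56-xy."   # illustrative sample

word_start = re.findall(r"\bis", text)     # "is" at the beginning of a word
word_end = re.findall(r"is\b", text)       # "is" at the end of a word
inside = re.findall(r"\Bis", text)         # "is" NOT at the beginning of a word
digits = re.findall(r"\d\d-\d\d", text)    # two digits, hyphen, two digits
mixed = re.findall(r"\d\d-\D", text)       # two digits, hyphen, a non-digit

print(word_start)   # ['is']
print(word_end)     # ['is', 'is', 'is']
print(inside)       # ['is', 'is', 'is']
print(digits)       # ['12-34']
print(mixed)        # ['56-x']
```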


Examples:

  1. Two-digit numbers divisible by 4: [02468][048]|[13579][26]
  2. String between ' or " characters: (['"]).*?\1
  3. Any floating-point number: ^[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?$
  4. Roman numerals with capital letters: M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})
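
The four patterns can be tried out in Python as sketched below. The test strings are illustrative, and the quoted-string pattern is written here with a lazy .*? followed by the backreference \1, because a backreference does not work inside a character class.

```python
import re

div4 = r"[02468][048]|[13579][26]"
quoted = r"""(['"]).*?\1"""
number = r"^[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?$"
roman = r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"

by_four = [n for n in ("12", "36", "42", "96") if re.fullmatch(div4, n)]
first_quote = re.search(quoted, '''say "it's" now''').group()
is_number = [bool(re.fullmatch(number, t)) for t in ("-12.5", ".5e-3", "1e10", "12.", "abc")]
is_roman = [bool(re.fullmatch(roman, t)) for t in ("MCMXCIV", "IIII")]

print(by_four)       # ['12', '36', '96']
print(first_quote)   # "it's"
print(is_number)     # [True, True, True, False, False]
print(is_roman)      # [True, False]
```

Note that the Roman numeral pattern also fits the empty string, since every part of it is optional.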


Exercises (write a regular expression for each):

  1. HTML color code of 6 hexadecimal digits like A34DC8
  2. Date in the form yyyy-mm-dd
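
One possible approach to these two tasks is sketched below; other solutions are equally valid. The names color and date are illustrative, and the date pattern only checks the ranges 01-12 and 01-31, not the real length of each month.

```python
import re

color = r"[0-9a-fA-F]{6}"                                  # six hexadecimal digits
date = r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"      # yyyy-mm-dd

color_ok = [bool(re.fullmatch(color, t)) for t in ("A34DC8", "A34DG8", "A34DC")]
date_ok = [bool(re.fullmatch(date, t)) for t in ("2024-02-29", "2024-13-01", "2024-02-30")]

print(color_ok)   # [True, False, False]
print(date_ok)    # [True, False, True]  (February 30 passes: months are not checked)
```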

Regular expressions (RegEx) in Python

You have to import the module re, because its functions are not built in. Put this line at the beginning of your code.

In [22]:
import re

The functions in the module re:

Function           Description
findall(p, s)      returns a list containing all matches of p in s
search(p, s)       returns a "match object" for the first match of p in s (None if there is no match)
split(p, s)        splits s at each match of p and returns a list
sub(p, n, s[, m])  replaces all (or at most m) matches of p in s with the new string n
finditer(p, s)     returns an iterator over the match objects of p in s

A match object tells where the pattern was found in the string and what exactly it matched. This information can be read out with the .span() and .group() methods.

In [23]:
s = "confirmation"
p = ".i"
print(re.findall(p, s))
['fi', 'ti']
In [24]:
print(re.search(p, s))
<_sre.SRE_Match object; span=(3, 5), match='fi'>
In [25]:
x = re.search(p, s)
print(x.span())
(3, 5)
In [26]:
print(re.split(p, s))
['con', 'rma', 'on']
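
The sub() function from the table can be tried on the same string and pattern. A minimal sketch; the replacement text "__" is arbitrary, and the limit on the number of replacements is passed by keyword as count.

```python
import re

s = "confirmation"
p = ".i"

all_replaced = re.sub(p, "__", s)            # replace every match
first_only = re.sub(p, "__", s, count=1)     # replace at most one match
print(all_replaced)   # con__rma__on
print(first_only)     # con__rmation
```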

Since the backslash and other special characters are used in RegEx patterns, you have to be careful with them.

It is best to write the pattern as a so-called raw string. In this format one backslash really means one backslash, so you don't have to escape it.

A string is in raw format if you put an r in front of it.
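
The effect of the r prefix can be seen on a small example. This is a sketch with an illustrative text; without the prefix, Python turns "\b" into a backspace character before the pattern ever reaches the re module.

```python
import re

print(len("\n"), len(r"\n"))     # 1 2  (one newline vs. backslash + n)

same = "\\bis" == r"\bis"        # escaping the backslash gives the same string
print(same)                      # True

text = "this is"
with_raw = re.findall(r"\bis", text)     # \b is a word boundary
without_raw = re.findall("\bis", text)   # "\b" is a backspace character here
print(with_raw)      # ['is']
print(without_raw)   # []
```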

In [27]:
x = re.finditer(p, s)
for y in x:
    print(y.span(), y.group())   # position and text of each match

Exercise: Triple every quotation mark in a string, whether it is a single or a double one.

In [28]:
st = """This "word" is an 'other' one."""
p = """(['"])"""
print(re.sub(p, r"\1\1\1", st))
This """word""" is an '''other''' one.
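
The \1 above refers to the text captured by the first pair of parentheses. The same groups can be read from a match object, as in this sketch with an illustrative date string:

```python
import re

text = "Released on 2019-03-25."        # illustrative sample
p = r"(\d{4})-(\d{2})-(\d{2})"          # three capture groups

m = re.search(p, text)
whole = m.group()      # the whole match
year = m.group(1)      # text captured by the first group
parts = m.groups()     # all captured groups as a tuple
print(whole, year, parts)   # 2019-03-25 2019 ('2019', '03', '25')

# In the replacement string, \1, \2, \3 refer to the captured groups.
swapped = re.sub(p, r"\3.\2.\1", text)
print(swapped)         # Released on 25.03.2019.
```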
In [29]:
s = "This 'string' has two 'quoted' words"
p1 = "'.*'"    # greedy    
p2 = "'.*?'"   # lazy
print(re.findall(p1, s), "  --> greedy")
print(re.findall(p2, s), "  --> lazy")
["'string' has two 'quoted'"]   --> greedy
["'string'", "'quoted'"]   --> lazy