7b: Handling text - strings and files

1. Mid-term evaluation

  • Math focus is distracting
  • Exams stressful
  • help session
  • in-class notes also as files (not just as html)
  • Quick pace. Hard to keep up (others: not so challenging)
  • Have the homework on codegrade.
  • Difficulty class, homework, exam is not in line
  • Codegrade start could have been clearer
  • Notes before the lecture

2. Last time: sets and dictionaries

Unordered collection.

set1 = {1, 3, 3}
print(set1)

Dictionaries are very suitable for counting frequencies:

s = "how many times does each character occur in this string"

dict1 = {}

for i in s:
    if i in dict1:
        dict1[i] += 1
    else:
        dict1[i] = 1

dict1["jojojo"] = 5

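print(dict1)

s = "how many s's does this string have?"

dict1 = {}

# Count every character in the string
for i in s:
    if i in dict1:
        dict1[i] += 1
    else:
        dict1[i] = 1
print(dict1)

# The count for the character "s" answers the question in the string
print(dict1["s"])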
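
The counting loop above can also be written a little more compactly. This is a minimal sketch, assuming the dictionary method .get(), which is not used in the code above: it returns the value for a key, or a default (here 0) when the key is not in the dictionary yet. The names text and freq are made up for this illustration.

```python
text = "how many s's does this string have?"

freq = {}
for ch in text:
    # .get(ch, 0) gives the count so far, or 0 if ch has not been seen yet
    freq[ch] = freq.get(ch, 0) + 1

# Look up how often the character "s" occurs
print(freq["s"])
```

The if/else version and the .get() version build the same dictionary; which one you use is a matter of taste.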

3. Methods for strings

Earlier we saw methods for lists:

lst = [1, 2, 3, 5]
lst.append(4)
print(lst)

lst.sort()
print(lst)

There are also methods for strings.

3.1. lower()

Maybe some of you have already used this one: this turns the string to lowercase.

Is this a fruitful or void method?

s = "This string was written in Amsterdam"

var1 = s.lower()

print(s.lower())
print(var1)
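
To answer the question: lower() is a fruitful method. It returns a new string and leaves the original string as it was. The sketch below contrasts this with the list method sort() from the earlier example, which changes the list itself and returns None. The variable names lowered and result are chosen just for this illustration.

```python
s = "This string was written in Amsterdam"

lowered = s.lower()   # fruitful: lower() returns a new string
print(lowered)
print(s)              # the original string is unchanged

lst = [3, 1, 2]
result = lst.sort()   # void: sort() changes the list in place and returns None
print(result)
print(lst)
```

This is why you have to store the result of a string method in a variable (or print it directly) if you want to use it.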

The opposite is upper():

s = "This string was written in Amsterdam"

print(s.upper())

3.2. strip

.strip(): by default, it removes spaces, tabs, line breaks and other ‘space-like’ characters from the beginning and end of the string. You can also indicate which characters you would like to strip.

var1 = "how are you?"

print("hi all %s" % var1)

s = "   this sentence contains some space at the beginning and the end   "

# Uncomment the next line to strip the whitespace at both ends
# s = s.strip()

print("|%s|" % s)

s = "*-* this string starts with *-* and ends with *-*?"

# You can just list all the characters that you do not want to have at the beginning or end of the string
print(s.strip("*?!- "))
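
It may help to see that strip() only works on the two ends of the string. In the sketch below (the variable names stripped and reordered are made up for this illustration), the *-* in the middle survives, and the order in which you list the characters makes no difference.

```python
s = "*-* this string starts with *-* and ends with *-*?"

# Characters are removed from both ends until a character is found
# that is not in the given collection
stripped = s.strip("*?!- ")
print("|%s|" % stripped)

# The argument is a collection of characters, not a word: the order is irrelevant
reordered = s.strip(" -!?*")
print(stripped == reordered)
```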

3.3. replace

.replace(<old>, <new>). Replace all occurrences of <old> with <new>

.replace(<old>, <new>, <max>). Replace at most the first <max> occurrences of <old> with <new>

s = "the United States is losing its democracy"

news = s.replace("losing", "fighting for")

print(news)

You can also use this to remove characters. (In contrast to strip, this works across the entire string).

s = "I just can't stand the letter t"

s2 = s.replace("letter","")
print(s2)

You can indicate how many occurrences you want to replace:

s = "I just can't stand the letter t"

s2 = s.replace("t","",2)
print(s2)

3.4. split

Some of you know that you can turn a string into a list of its characters:

lst = list("this will be a list")
print(lst)

With .split(<delimiter>) you can split a string at every occurrence of <delimiter> into a list of substrings!

The delimiter itself is left out of the result. If you do not give a delimiter, the string is split on whitespace (spaces, tabs and line breaks).

s = "this will be a list"
print(s.split())
print(s.split(" "))
print(s.split("be"))

s = """Sometimes you want to cut up a string in a different way.
For example if it contains several sentences. Then you
could for example split it using the period closing a sentence.
"""

lst = s.split(".")

lst2 = []
for i in lst:
    lst2.append(i.replace("\n",""))
print(lst2)

The \n sequence is the way Python represents a "newline" (an "enter").
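
There is a small difference between split() without an argument and split(" "). The sketch below uses a made-up string with a double space and a tab to show it: without an argument, any run of whitespace counts as one delimiter, while split(" ") splits at every single space and nothing else.

```python
s = "two  spaces\tand a tab"

# No argument: split on any run of spaces, tabs or line breaks
default_split = s.split()
print(default_split)

# Explicit " ": split at every single space, so the double space
# produces an empty string and the tab is not treated as a delimiter
space_split = s.split(" ")
print(space_split)
```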

3.5. isnumeric

It is possible to write a number as a string (just like you can write it as a float or an integer).

With isnumeric() you can check whether a string consists only of numeric characters:

s = "100"

print(s.isnumeric())
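
Note that isnumeric() looks at the individual characters, so a decimal point or a minus sign makes it return False. A minimal sketch with some made-up test strings (the names candidates, results and number are chosen for this illustration):

```python
candidates = ["100", "1.5", "-1", "ten"]

results = {}
for text in candidates:
    # True only if every character in the string is numeric
    results[text] = text.isnumeric()
    print(text, results[text])

# A typical use: check first, then convert the string to an integer
s = "100"
if s.isnumeric():
    number = int(s)
    print(number + 1)
```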

4. Ways of including variables in print statements

var1 = "Nil"

var2 = [1, 2, 4]

print("Showing %s and %s how to include variables in print statements" % (var1, var2))

print(f"Showing {var1} and {var2} how to include bla bla ...")

5. Text files

A text file is a sequence of characters and it consists of text-only! There is no formatting included, like fonts, colours, and no special elements.

The only formatting that is allowed is spacing and line breaks.

Some examples are files with extension: .txt, .frt, .csv, etc.

A CSV (acronym for comma-separated values) file is a popular file format used to exchange data from databases. CSV files are so important that several libraries have been written for their processing and analysis. We will discuss one of them (Pandas) in the Introduction to Data Analysis class. For these two upcoming lectures, we will focus on reading plain text files, of which TXT files are the most representative.

  • Windows Users: Use Notepad to see the structure of a TXT file
  • Mac Users: Use TextEdit to see the structure of a TXT file
  • Linux Users: Use vim to see the structure of a TXT file

5.1. File handling in a block

When a limited number of statements requires a file to be open, it is elegant to do the file handling in a separate block of code. This construction implies automatic closing of the file after the block is executed.

This is the syntax:

with open(<file path>, <mode>) as finput:
   <statement>
   ...
   <statement>

If only a name is specified, Python will look for the file in the current working directory, which is usually the directory where the Python file is saved and executed. You can also give the full path if the text file is saved in another location than the Python file.

You can open a file in different modes, the most common are:

  • "r", reading (no change to the file will be done)
  • "w", writing (content will be written into the file and any existing content is overwritten; it is usually used when an output file is created)
  • "a", appending (start writing at the end of this file.)

By default, if we open a file and do NOT specify the mode, Python opens it in 'reading' mode.
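
The three modes can be tried out together. This is a minimal sketch, assuming a made-up file name (modes_demo.txt) that is placed in a temporary directory so that no existing file is overwritten; the names path, fout and fin are also chosen just for this illustration.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as fout:   # "w": create the file (or empty an existing one)
    fout.write("first line\n")

with open(path, "a") as fout:   # "a": add to the end of the existing content
    fout.write("second line\n")

with open(path) as fin:         # no mode given, so the file is opened for reading
    content = fin.read()

print(content)
```

If the second block used "w" instead of "a", the first line would be lost, because "w" starts with an empty file.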

After you have opened the file as <file>, that object is not yet the text itself: you first have to read the contents out of it.

Two often-used methods to work with the file are .read() and .readlines().

  • .read() outputs the entire file (as a string)
  • .readlines() outputs a list in which every item represents a line (as a string)

with open("example.txt") as file:
    output1 = file.readlines()

var1 = "lalalalalal"
# output1 is now a list of strings
for line in output1:
    print("%s %s" % (line,var1))
    print("----")

What happens here?

  1. I open the text file as "file".
  2. "file" is an object that we cannot use directly. We still have to unpack it.
  3. I use the method ".readlines()" to unpack the file as a list of lines (as strings).
  4. I then save that list in a new variable.
  5. That new variable is available outside of the with-block, whereas the "file" object is closed (and therefore unavailable) outside the code block.
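
The steps above can be checked with a small sketch. It assumes a made-up two-line example file, which is created first in a temporary directory so that the code runs on its own. The attribute .closed tells you whether a file object is closed. Every line from readlines() still ends with "\n", which you can remove with strip().

```python
import os
import tempfile

# Hypothetical example file, created here so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as fout:
    fout.write("line one\nline two\n")

with open(path) as file:
    lines = file.readlines()
    print(file.closed)   # False: inside the block the file is still open

print(file.closed)       # True: the file was closed when the block ended
print(lines)             # the list is still available outside the block

# Remove the newline at the end of every line
clean = []
for line in lines:
    clean.append(line.strip())
print(clean)
```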

I can also use the method .read()

Then the file is unpacked as one string.

with open("example.txt","r") as file:
    output = file.read()

# output is one string, so this loop prints one character per line
for i in output:
    print(i)
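
Because .read() gives you one string, you can use the string methods from this lecture on it. A minimal sketch, again with a made-up two-line file created in a temporary directory: split("\n") turns the string into a list of lines, much like readlines() but without the "\n" at the end of each item.

```python
import os
import tempfile

# Hypothetical example file, created here so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as fout:
    fout.write("line one\nline two\n")

with open(path, "r") as file:
    output = file.read()

# The whole file is one string; the newlines count as characters too
print(len(output))

# strip() removes the final newline, split("\n") cuts the string at the others
lines = output.strip().split("\n")
print(lines)
```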

Created: 2025-04-03 Thu 15:54