pracs-sp21: Exercises: Week 8

Setup

Open a new file and save it as week08_exercises.py or something along those lines.
Type your commands in the script and send them to the prompt in the Python interactive window by pressing Shift+Enter.

Problems with the keyboard shortcut?

If this doesn’t work, check your keyboard shortcut by right-clicking in the script and looking for “Run Selection/Line In Python Interactive Window”.

Also, you can open the Command Palette (Ctrl+Shift+P) and look for that shortcut there, and change it if you want.

Because these first two exercises have many small steps, I put the solutions right below each question, so you don’t have to scroll back and forth all the time. However, make sure you actually try the exercises before you look at the solutions!

Exercise 1: Variable types and strings

Print the type of the value 4.88.

Solution

We can use the function type():

type(4.88)

<class 'float'>

The result would be the same if we first assigned it as a variable:

num = 4.88
type(num)

<class 'float'>

Assign the variable n_samples = 658, and then extract the third character from n_samples.

Hints

You can’t index a number like n_samples[index], so you’ll first have to convert n_samples to a string. Also, recall that Python starts counting from 0!

Solution

n_samples = 658
str(n_samples)[2]

'8'
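The conversion step can be spelled out one line at a time. This is a minimal sketch; the names n_string, third_char and third_digit are made up for illustration and are not part of the exercise. It also shows that the extracted character is still a string until you convert it back.

```python
n_samples = 658

# Numbers cannot be indexed directly, so convert to a string first
n_string = str(n_samples)

# Index 2 is the third character, because Python starts counting at 0
third_char = n_string[2]

# The extracted character is a string; int() turns it back into a number
third_digit = int(third_char)

print(third_char, type(third_char))
print(third_digit, type(third_digit))
```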

Assign the string ‘CTTATGGAAT’ to a variable called adapter. Print the number of characters in adapter.

Solution

adapter = 'CTTATGGAAT'
len(adapter)

Replace all As by Ns in adapter and assign the resulting string to a new variable. Print the new variable.

Hints

Use the string method replace(), and recall that methods are called using the <object_name>.<method_name>() syntax.

Solution

bad_seq = adapter.replace('A', 'N')
bad_seq

'CTTNTGGNNT'

Find out what the third argument to the replace() method does by using the built-in help.

Hints

If you are typing your commands in a script rather than straight in the console, you will get some more information already when typing the opening parenthesis of the method (briefly pause if necessary).

To get more help, you can use the ? notation (which works in the Python interactive window), or help(object.method) (which works anywhere in Python).

Solution

help(adapter.replace)
# Or: "adapter.replace?"
# Or: "?adapter.replace"

Help on built-in function replace:

replace(old, new, count=-1, /) method of builtins.str instance
    Return a copy with all occurrences of substring old replaced by new.
    
      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.
    
    If the optional argument count is given, only the first count occurrences are
    replaced.

As it turns out, the third argument, count, determines how many instances of the substring will be replaced.

Using what you found out in the previous steps, replace just the first two As in adapter by Ns.

Solution

We specify 2 as the third argument, which is the number of instances of the substring that will be replaced:

adapter.replace('A', 'N', 2)

'CTTNTGGNAT'
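To compare the two calls side by side, here is a small sketch (the names masked_all and masked_two are only for illustration). Note that replace() returns a new string each time, so adapter itself is never changed.

```python
adapter = 'CTTATGGAAT'

# Without a third argument, every A is replaced
masked_all = adapter.replace('A', 'N')

# With count=2, only the first two As (reading left to right) are replaced
masked_two = adapter.replace('A', 'N', 2)

print(masked_all)
print(masked_two)

# Strings are immutable: the original is untouched
print(adapter)
```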

Convert the following strings and numbers to a Boolean value to see what the resulting Boolean is (True or False): "False" (with quotes), 0, 1, -1, "", None, and see if you can make sense of these results.

Solution

bool("False")

True

bool(1)

True

bool(0)

False

bool(-1)

True

As it turns out, among these numbers and non-empty strings, only 0 is interpreted as False, whereas everything else (including the string "False") is interpreted as True.

bool("")

False

bool()

False

bool(None)

False

But an empty string, nothing at all between the parentheses, and None (Python’s keyword to define a null value or the lack of a value) are also interpreted as False.

Note that as soon as you quote "None", it is a string again and will be interpreted as True:

bool("None")

True
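If you would rather see all of these results at once, one possible sketch is to loop over the values. The names values and truthiness are just for illustration; repr() is used for the labels so that the string "None" and the value None can be told apart.

```python
values = ["False", 0, 1, -1, "", None, "None"]

# Map a printable label for each value to its Boolean interpretation
truthiness = {repr(value): bool(value) for value in values}

for label, result in truthiness.items():
    print(label, "->", result)
```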

Have a look at the names of the methods that appear when you type adapter. (note the .). Can you find a method that will return the position (index) of the last occurrence of a T in adapter?

Hints

The method rfind will search from the right-hand side (hence r), and will therefore return the index of the last occurrence of the specified substring.

Solution

adapter.rfind("T")
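For comparison, here is a short sketch of rfind() next to its left-to-right counterpart find(). The variable names are illustrative only. Both methods return an index counted from 0, and both return -1 when the substring is not found.

```python
adapter = 'CTTATGGAAT'

# find() searches from the left and returns the index of the first match
first_t = adapter.find("T")

# rfind() searches from the right and returns the index of the last match
last_t = adapter.rfind("T")

# A substring that does not occur gives -1 rather than an error
missing = adapter.rfind("X")

print(first_t, last_t, missing)
```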

Split the sequence GAGTCCCTNNNAGCAACGTTNNTTCGTCATTAN by Ns.

Hints

Use the split() method for strings.

Solution

seq = "GAGTCCCTNNNAGCAACGTTNNTTCGTCATTAN"
split_seq = seq.split('N')
split_seq

['GAGTCCCT', '', '', 'AGCAACGTT', '', 'TTCGTCATTA', '']
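The empty strings appear because split() cuts at every single N, so consecutive Ns (and an N at the very end) leave nothing in between. If you only want the actual sequence fragments, one option is to filter the empty strings out. This is a sketch that goes a bit beyond what the exercise asks, and the name fragments is just for illustration.

```python
seq = "GAGTCCCTNNNAGCAACGTTNNTTCGTCATTAN"
split_seq = seq.split('N')

# Empty strings are interpreted as False, so "if frag" keeps only
# the non-empty pieces
fragments = [frag for frag in split_seq if frag]

print(fragments)
```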

Exercise 2: Lists

Assign a list diseases that contains the items fruit_rot, leaf_blight, leaf_spots, stem_blight, canker, wilt, root_knot and root_rot.

Solution

diseases = ['fruit_rot', 'leaf_blight', 'leaf_spots', 'stem_blight',
            'canker', 'wilt', 'root_knot', 'root_rot']

Extract stem_blight from diseases by its index (position).

Solution

stem_blight is the fourth item and because Python starts counting at 0, this is index number 3.

diseases[3]

'stem_blight'

Extract the first 5 items from diseases.

Hints

Recall that when using ranges, Python does not include the item corresponding to the last index.

Solution

While index 5 is the sixth item, it is not included, so we specify 0:5 or :5 to extract elements up to and including the fifth one:

diseases[0:5]

['fruit_rot', 'leaf_blight', 'leaf_spots', 'stem_blight', 'canker']

Or:

diseases[:5]

['fruit_rot', 'leaf_blight', 'leaf_spots', 'stem_blight', 'canker']

Extract the last item from diseases.

Hints

Recall that you can use negative numbers to start counting from the end. Also, while 0 is the first index, “-0” (or something along those lines) is not the last index.

Solution

diseases[-1]

'root_rot'

Extract the last 3 items from diseases.

Solution

Note that you’ll have to omit the number after the colon in this case, because [-3:-1] would not include the last item, and [-3:0] does not work either (it returns an empty list).

diseases[-3:]

['wilt', 'root_knot', 'root_rot']
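To see why the other two slices fall short, you can try all three variants next to each other. A minimal sketch, with variable names chosen only for illustration:

```python
diseases = ['fruit_rot', 'leaf_blight', 'leaf_spots', 'stem_blight',
            'canker', 'wilt', 'root_knot', 'root_rot']

# The end index is excluded, so -1 leaves out the last item
missing_last = diseases[-3:-1]

# Index 0 is the start of the list, which lies before -3: nothing is returned
empty = diseases[-3:0]

# Omitting the end index means "all the way to the end"
last_three = diseases[-3:]

print(missing_last)
print(empty)
print(last_three)
```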

Exercise 3: Dictionaries

Create and print a dictionary called yield_current with the following items:

{"plotA_1": 12, "plotA_2": 18, "plotA_3": 2,
 "plotB_1": 33, "plotB_2": 28, "plotB_3": 57}

Solution

yield_current = {"plotA_1": 12, "plotA_2": 18, "plotA_3": 2,
                 "plotB_1": 33, "plotB_2": 28, "plotB_3": 57}
                 
yield_current

{'plotA_1': 12, 'plotA_2': 18, 'plotA_3': 2, 'plotB_1': 33, 'plotB_2': 28, 'plotB_3': 57}

Print just the value for key plotA_3.

Solution

We can get the value for a specific key using the <dict>[<key>] notation:

yield_current["plotA_3"]
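Square brackets only work for keys that exist; asking for a missing key raises a KeyError. As a sketch of two ways around that (not required for the exercise), you can check with in first or use the get() method with a fallback value. The key plotC_1 and the default of 0 are arbitrary choices for this example.

```python
yield_current = {"plotA_1": 12, "plotA_2": 18, "plotA_3": 2,
                 "plotB_1": 33, "plotB_2": 28, "plotB_3": 57}

# Square brackets return the value for an existing key
value_a3 = yield_current["plotA_3"]

# `in` checks whether a key is present without raising an error
has_c1 = "plotC_1" in yield_current

# get() returns the given default (here 0) when the key is missing
value_c1 = yield_current.get("plotC_1", 0)

print(value_a3, has_c1, value_c1)
```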

Update the value for key plotB_2 to be 31 and check whether this worked.

Solution

We can simply assign a new value using =:

yield_current["plotB_2"] = 31
yield_current["plotB_2"]

Count the number of items (i.e. entries, key-value pairs) in your dictionary.

Hints

Use the len() function.

Solution

len(yield_current)

Bonus: Create a dictionary obs_20210305 with keys plotA_3 and plotC_1, and values 18 and 3, respectively.

Then, update the yield_current dictionary with the obs_20210305 dictionary, and check whether this worked.

Solution

obs_20210305 = {"plotA_3": 18, "plotC_1": 3}

We use the update() method as follows:

yield_current.update(obs_20210305)

yield_current

{'plotA_1': 12, 'plotA_2': 18, 'plotA_3': 18, 'plotB_1': 33, 'plotB_2': 31, 'plotB_3': 57, 'plotC_1': 3}

Now, our dictionary has an updated value for key “plotA_3”, and an entirely new item with key “plotC_1”.
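Keep in mind that update() changes yield_current in place. If you would rather keep the original dictionary untouched, one possible alternative is to build a new dictionary by unpacking both with **. This is only a sketch, and the name merged is made up for this example.

```python
yield_current = {"plotA_1": 12, "plotA_2": 18, "plotA_3": 2,
                 "plotB_1": 33, "plotB_2": 31, "plotB_3": 57}
obs_20210305 = {"plotA_3": 18, "plotC_1": 3}

# Later entries win, so values from obs_20210305 overwrite matching keys
merged = {**yield_current, **obs_20210305}

print(merged)

# The original dictionary still has its old value and no plotC_1
print(yield_current["plotA_3"], "plotC_1" in yield_current)
```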

Bonus: Get and count the number of unique values in your dictionary.

Hints

Extract the values with the values() method. Next, turn these values into a set to get the unique values. Finally, count the unique values with the len() function.

Solution

len(set(yield_current.values()))
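The one-liner packs three steps together. Here they are one at a time, as a sketch that assumes the dictionary is in the state it had after the update() step above; the intermediate names are only for illustration.

```python
yield_current = {"plotA_1": 12, "plotA_2": 18, "plotA_3": 18,
                 "plotB_1": 33, "plotB_2": 31, "plotB_3": 57, "plotC_1": 3}

# Step 1: extract all values (18 occurs twice)
all_values = list(yield_current.values())

# Step 2: a set keeps only one copy of each value
unique_values = set(all_values)

# Step 3: count the unique values
n_unique = len(unique_values)

print(len(all_values), n_unique)
```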

Exercise 4: Sets

Assign a set named dna with 4 items: each of the 4 bases (single-letter abbreviations) in DNA.

Hints

Recall the use of curly braces to assign a set.
The order of the bases doesn’t matter, because sets are unordered.

Solution

dna = {'A', 'G', 'C', 'T'}

Assign a set named rna with 4 items: each of the 4 bases (single-letter abbreviations) in RNA.

Solution

rna = {'A', 'G', 'C', 'U'}

Find the 3 bases that are shared between DNA and RNA (try both with an operator and a method, if you want).

Solution

dna & rna

{'A', 'C', 'G'}

Or:

dna.intersection(rna)

{'A', 'C', 'G'}

Find all 5 bases that are collectively found among DNA and RNA.

Solution

dna | rna

{'G', 'C', 'T', 'A', 'U'}

Or:

dna.union(rna)

{'G', 'C', 'T', 'A', 'U'}

Find the base that only occurs in DNA.

Solution

dna - rna

{'T'}

Or:

dna.difference(rna)

{'T'}

Assign a set named purines with the two purine bases and a set named pyrimidines with the three pyrimidine bases.

Solution

purines = {'A', 'G'}
pyrimidines = {'C', 'T', 'U'}

Find the pyrimidine that occurs both in RNA and DNA.

Solution

You can combine more than two sets either by chaining methods or adding another operator.

pyrimidines & dna & rna

{'C'}

Or:

pyrimidines.intersection(dna).intersection(rna)

{'C'}

Bonus: Find the pyrimidine that occurs in RNA but not DNA.

Solution

(rna - dna) & pyrimidines

{'U'}

Or:

rna - dna & pyrimidines

{'U'}

Or:

rna.difference(dna).intersection(pyrimidines)

{'U'}
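The version without parentheses works because Python evaluates - before &. The sketch below checks that, and also shows that putting the parentheses elsewhere answers a different question. The variable names are illustrative only.

```python
dna = {'A', 'G', 'C', 'T'}
rna = {'A', 'G', 'C', 'U'}
pyrimidines = {'C', 'T', 'U'}

# `-` binds more tightly than `&`, so these two expressions are equivalent
without_parens = rna - dna & pyrimidines
with_parens = (rna - dna) & pyrimidines

# Different grouping: RNA bases that are not pyrimidines found in DNA
other_grouping = rna - (dna & pyrimidines)

# sorted() gives a stable order for printing, since sets are unordered
print(sorted(without_parens), sorted(with_parens), sorted(other_grouping))
```

When in doubt, adding the parentheses explicitly makes the intent clearer to whoever reads the code later.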
