class:inverse middle center # *Week 10 - Writing good code* ---- # II: Running Python scripts, errors, <br> and beyond the basics <br> <br> <br> <br> <br> ### Jelmer Poelstra ### 2021/03/18 --- class: center middle inverse # Overview ---- ### - Running Python scripts (4.4) ### - Beyond the basics (4.9) ### - Errors and error handling (4.5) --- class: center middle inverse # Running Python scripts (4.4) ----- <br> <br> <br> <br> --- ## Running Python scripts So far, we have worked with Python: - By typing in the Python interactive window directly. And similarly, by interactively sending code from a script there. - By loading a Python script as a module using `import my_module`. But, like we did with Bash scripts, we can also save a script and then **run it non-interactively by calling it from the shell.** -- <br> ---- <br> **We will practice this with our `simulate_drift.py` script from Tuesday.** --- ## Running Python scripts (cont.) To be able to run the script smoothly and flexibly, we will: - Add a *shebang* line (inside the script). - Make the script executable (from the shell). - Parse command-line arguments passed to the script (inside the script & from the shell). - Add an `if __name__ == "__main__"` statement (inside the script). <br> Most of this is familiar from what we did running Bash scripts! --- ## Running Python scripts: *shebang* line We could write a *shebang* line just like we did for Bash scripts, directly providing the path to the interpreter, which in this case would be `/usr/bin/python3` (vs. `/bin/bash`).<sup>[1]</sup> .footnote[<sup>[1]</sup>This is also what CSB does in the example.] However, we will instead **call the `env` utility to determine the path of Python for us.** -- Let's see what difference that makes – open a new terminal and type: ```sh $ /usr/bin/env python3 #> ... #> Python 3.6.8 >>> # Ctrl-D or exit() to exit Python $ which python3 #> /usr/bin/python3 ``` That loaded OSC's default Python 3 installation at `/usr/bin/python3`. --- ## Running Python scripts: *shebang* line (cont.) If we now load our Conda environment with a new version of Python, and try again: ```sh $ module load python/3.6-conda5.2 $ source activate ipy-env $ /usr/bin/env python3 #> ... #> Python 3.9.2 >>> # Ctrl-D or exit() to exit Python $ which python3 # Check the current path to Python #> ~/.conda/envs/ipy-env/bin/python3 ``` Nice! `env` automatically tracked down the correct version of Python for us. <br> So, using `/usr/bin/env python3` in the *shebang* line means we don't need to change the path depending on which Python installation we want to use. --- ## Running Python scripts: *shebang* line (cont.) So, our *shebang* line, the very first line of the script, should be: ```sh #!/usr/bin/env python3 ``` --- ## Running Python scripts: Make the script executable Make the script executable in the shell – same as we've done with Bash scripts: ```sh $ chmod u+x simulate_drift.py ``` --- ## Running Python scripts: <br> Parse command-line arguments Again like we did with Bash scripts, we generally want our Python scripts to take *arguments*. Arguments that were passed to the script (as in `./my.py arg1 arg2`) are available inside the script in the list `sys.argv`. -- Just like `$0` contained the name of the script in Bash, and `$1` contained the first argument, and so on: - `sys.argv[0]` contains the name of the script. - `sys.argv[1]` contains the first argument. - `sys.argv[2]` contains the second argument. - And so on... .content-box-info[ For more sophisticated argument parsing, such as accepting flags and named options, look into the [`argparse` module](https://docs.python.org/3/howto/argparse.html). ] --- ## Running Python scripts: Parse command-line arguments Our `simulate_drift.py` script should take two arguments: the population size `N` and `p` which is essentially the starting allele frequency. Because the items in the `sys.argv` list are strings by default, we need to convert them to the right type: ```python import sys # PUT THIS AT THE TOP OF THE SCRIPT N = int(sys.argv[1]) # PUT THIS BELOW THE FUNCTION p = float(sys.argv[2]) ``` <br> -- Additionally, our script currently just contains a *function*. We therefore also need to *call* the function in order to run it, and we use `N` and `p` as parsed from the command-line arguments to do so: ```python # Run the simulation: simulate_drift(N, p) ``` --- ## Running Python scripts: `__name__` A Python script can be run in two quite different ways: - It can be **imported into any Python session using `import`**. When we do this, we typically just want to load the functions that are available in the script/module to use them in our own script. - It can be **run from the shell.** In this case, we want to directly run computations, e.g. *call* the functions in the script. --- ## Running Python scripts: `__name__` We could create separate scripts for both use cases, but Python makes it easy for **a single script to perform either of these functions depending on the context.** Using the special variable `__name__`, we can tell the use case: | If the script is being: | Then the value of `__name__` is: |-------------------------| ----- | Called from the shell | `__main__` | Imported into a Python session | *The name of the script* --- ## Running Python scripts: `__name__` Therefore, we can include an `if` statement to parse arguments and *run* the functions in our script only when our script is being called from the shell: ```python if __name__ == "__main__": N = int(sys.argv[1]) p = float(sys.argv[2]) simulate_drift(N, p) ``` <br> On the other hand, when we import the script (module), these commands won't be executed: its functions will just be made available to us. --- ## Finally, running our script Now, we can run our Python script from the shell: ```sh $ ./simulate_drift.py 1000 0.1 #> An allele reached fixation at generation 162 #> The genotype counts are #> {'AA': 0, 'aa': 1000, 'Aa': 0, 'aA': 0} ``` --- background-color: #f2f5eb ## Side note I: <br> Standard input and standard output in scripts Python makes it straightforward to let scripts interface with standard input and standard output. To **accept input from standard input**, we can use the `sys.stdin()` function: ```python #!/user/bin/env python import sys for line in sys.stdin(): # Process each line... ``` -- To **send output to standard out**, we can simply use the `print()` function as we did `simulate_drift.py`, or `sys.stdout()` (and `sys.stderr()`). If our output is processed data, printing to standard out rather than saving to file can be also useful as it allows you to flexibly redirect output into any file, possibly gzip it, and perhaps even pipe it into another program! ```sh grep "my_gene" my.GFF | ./process_gene.py | gzip > gene_output.gz ``` --- background-color: #f2f5eb ## Side note II: <br> Using Python to orchestrate analysis pipelines We can also run external command-line programs (and shell tools) from within Python scripts as we can **run any run shell command** in a Python script.<sup>[1]</sup> .footnote[<sup>[1]</sup>This can be done e.g. with the `subprocess.check_output()` function.] Combined with Python's advanced argument parsing capabilities such as with the `argparse` module, this means that you can **use Python to glue together more complex analysis pipelines** than would be advisable with Bash scripts. <br> -- However, we will learn about an "even more advanced" of managing workflows using Snakemake (which *is* a Python program!). --- class: center middle inverse # Beyond the basics (4.9) ----- <br> <br> <br> <br> --- ## Arithmetic of data structures - Recall that we can use operators like `+` with lists: ```python a = [1, 2, 3] b = [4, 5] a + b ``` --- ## Arithmetic of data structures - Recall that we can use operators like `+` with lists: ```python a = [1, 2, 3] b = [4, 5] a + b #> [1, 2, 3, 4, 5] ``` -- - ...As well as with tuples: ```python a = (1, 2) b = (4, 6) a + b #> (1, 2, 4, 6) ``` <br> **Specifically, the `+` operator will *concatenate* lists and tuples.** --- ## Arithmetic of data structures (cont.) - But we can't do the same for sets: ```python a = {1, 2} b = {4, 6} a + b #> TypeError: unsupported operand type(s) for +: 'set' and 'set' ``` - How could we combine `a` and `b` into a single set with all 4 numbers? -- ```python a | b {1, 2, 4, 6} ``` --- ## Arithmetic of data structures (cont.) - We also can't use `+` for dictionaries: ```python z1 = {1: "AAA", 2: "BBB"} z2 = {3: "CCC", 4: "DDD"} z1 + z2 #> TypeError: unsupported operand type(s) for +: 'dict' and 'dict' ``` - How can we combine two dictionaries again? -- ```python z1.update(z2) z1 #> {1: 'AAA', 2: 'BBB', 3: 'CCC', 4: 'DDD'} ``` --- ## Arithmetic of data structures (cont.) We can also "multiply" strings, lists, and tuples, though recall that this repeats the contents *n* times and does not operate element-wise: ```python 'a' * 3 #> 'aaa' (1, 2, 4) * 2 #> (1, 2, 4, 1, 2, 4) [1, 3, 5] * 4 #> [1, 3, 5, 1, 3, 5, 1, 3, 5, 1, 3, 5] ``` --- ## Mutable and immutable data types *Strings*, *sets*, and *tuples* are **immutable**. As a consequence: - They don't have methods that can modify them *in place*, so any "modification" in fact creates an entirely new object: ```python seq = 'AACT' seq.replace("T", "U") #> 'AACU' seq #> 'AACT' ``` -- - Assigning a new variable name will always create an entirely new, unlinked variable: ```python x = 5 y = x y = y + 10 y #> 15 x #> 5 ``` --- ## Mutable and immutable data types We have seen that *lists*, *dictionaries*, and *sets* are **mutable**. As a consequence: - They have methods that can modify them *in place*. ```python seq = list('AACG') reversed_seq_maybe = seq.reverse() reversed_seq_maybe #> print(reversed_seq_maybe) #> None seq #> ['G', 'C', 'A', 'A'] ``` --- ## Mutable and immutable data types We have seen that *lists*, *dictionaries*, and *sets* are **mutable**. As a consequence: - They have methods that can modify them *in place*. - When we assign them to a new variable name, all that happens is that we **create a new pointer to the *same variable*,**: ```python seq2 = seq seq2.append('T') seq2 #> ['A', 'A', 'C', 'G', 'T'] seq #> ['A', 'A', 'C', 'G', 'T'] ``` To actually create a new object with the same contents, we need to explicitly make a copy. --- ## Copying mutable objects - The `copy()` method will create a copy: ```python import copy seq2 = seq.copy() seq #> ['A', 'A', 'C', 'G', 'T'] ``` -- - Now, these lists can be modified independently: ```python seq2.clear() seq2 #> [] seq #> ['A', 'A', 'C', 'G', 'T'] ``` -- .content-box-info[ For lists specifically, we can also use `[:]` notation: ```python seq2 = seq[:] ``` ] --- ## Copying mutable objects **However**, for "nested" objects, such as lists containing lists or dictionaries containing dictionaries, the "shallow" copy made by `copy` does not suffice: ```python a = [[1, 2], [3, 4]] b = a.copy() a[0][1] = 10 a #> [[1, 10], [3, 4]] b #> [[1, 10], [3, 4]] # b also changed! ``` --- ## Copying mutable objects Instead, we need the `deepcopy()` method from the `copy` module: ```python import copy a = [[1, 2], [3, 4]] b = copy.deepcopy(a) a[0][1] = 10 a #> [[1, 10], [3, 4]] b #> [[1, 2], [3, 4]] # b has not changed ``` --- class: center middle inverse # Errors and error handling (4.5) ----- <br> <br> <br> <br> --- ## Errors and exceptions Python's error messages contain a lot of useful information, so it really pays off to pay attention to them. The two main error types that Python distinguishes between are: - **Syntax errors**: You have not followed the syntax (= grammar) of Python. - **Exceptions**: The code is gramatically correct but cannot be executed. --- ## Most common error messages | Error type | Explanation |------------------|-------------- | SyntaxError | **Code contains a syntax (i.e., grammar) error**. <br> Common problems are a missing colon, missing quotes, `=` instead of `==`. | IndentationError | **Incorrect amount of indentation**, <br> e.g. after `if`, loop, or function definition. | TypeError | **Variable type does not allow the operation**: <br> e.g. adding a number and a string. | NameError | **Variable does not exist** (typo?). --- ## Most common error messages | Error type | Explanation |------------------|-------------- | SyntaxError | **Code contains a syntax (i.e., grammar) error**. <br> Common problems are a missing colon, missing quotes, `=` instead of `==`. | IndentationError | **Incorrect amount of indentation**, <br> e.g. after `if`, loop, or function definition. | TypeError | **Variable type does not allow the operation**: <br> e.g. adding a number and a string. | NameError | **Variable does not exist** (typo?). | IndexError | **Item in list does not exist.** | KeyError | **Item in dictionary does not exist.** | IOError | **File** does not exist, or you tried to write to a file that is open only for reading (IO = Input/Output). | AttributeError | **Attribute or method does not exist**, <br> e.g. `.keys()` on a list. --- ## Handling exceptions Like most programming languages (but unlike Bash), Python will **stop execution** when it encounters an error. So, when an error is encountered in the middle of running your script or function, any code after the error will not be run. For instance, we may get an error when trying to divide by 0: ```python def div(x, y): print("I will divide!") print(x / y) print("I'm done") div(10, 5) #> I will divide! #> 2 #> I'm done div(10, 0) #> I will divide! #> [...] #> ZeroDivisionError: float division by zero ``` --- ## Handling exceptions (cont.) Sometimes, you want to allow for an error and carry on, in which case you are said to **"catch" the error**. For example: ```python def superdiv(x, y): print("I will divide!") try: print(x / y) except: print("Error: Cannot divide by 0") print("I'm done") superdiv(10, 0) #> I will divide! #> Error: Cannot divide by 0 #> I'm done ``` --- ## Handling exceptions (cont.) A less silly version of that, where you actually need to accommodate zero-divisions, could look something like this: ```python def superdiv(x, y): try: return(x / y) except: return(None) # Return None when trying to divide by 0 ``` -- <br> .content-box-info[ Note that this is doing the opposite of what we were doing in Bash scripts, where we made sure the program stops when it encounters an error, which it did not do by default. The distinction is that in *specific, anticipated cases*, we actually may want to carry on despite something that is technically an error. ] --- class: center middle inverse # Questions? ----- <br> <br> <br> <br> --- class: center middle inverse # Bonus material ----- <br> <br> <br> <br> --- class: center middle inverse # Writing style (4.3) ----- <br> <br> <br> <br> --- background-color: #f2f5eb ## Some style guidelines - Use four spaces per indentation level; not tabs. - Split long lines using parentheses: consider having your text editor insert a vertical line at 80 characters and try not to pass that line. - Separate functions using blank lines. - Use a new line for each operation: .pull-left[ Yes: ```python import scipy import re ``` ] .pull-right[ No: ```python import scipy, re ``` ] <br> <br> .content-box-info[ For more style guidelines, see [PEP8 - The style guide for Python code](https://pep8.org/) ] --- background-color: #f2f5eb ## Some style guidelines (cont.) - Put spaces around operators (unless using the : to “slice” a list). - Use parentheses to clarify the precedence of operators. .pull-left[ Yes: ```python a = b ** c z = [a, b, c, d] n = n * (n - 1) k = (a * b) + (c * d) my_list = a_list[1:n] ``` ] .pull-right[ No: ```python a=b**c z=[ a,b,c,d ] n =n*(n-1) k=a*b+c*d my_list = a_list[1 : n] ``` ] --- background-color: #f2f5eb ## Some style guidelines (cont.) - **Use lots of comments and "docstrings"!** Python allows you to write short comments (starting with # ) and docstrings (multiline comments flanked by triple double quotes, """ ): - Use docstrings to document *how to use the code*. What does this function do? What is the input/output? - Use short comments for explaining difficult passages in the code: that is, to document how the code works. <br> -- - **Naming conventions:** - Variables and functions in lowercase, "snake case": `my_var`, `my_fun`. - **Constants** in uppercase: `SOME_CONSTANT`. - Use descriptive names: `count_pairs`, `body_mass`, `cell_volume`. - Verbs for functions and nouns for objects.