diff --git a/_config.yml b/_config.yml index 2ec03d3a8..8d159fdfd 100644 --- a/_config.yml +++ b/_config.yml @@ -92,10 +92,12 @@ extras_order: - figures - guide - common-issues - - discuss + - refactor-1-software-design + - refactor-2-code-refactoring + - refactor-3-code-abstractions + - refactor-4-architecture-revisited - protect-main-branch - vscode - - functional-programming - persistence - databases - geopandas diff --git a/_extras/refactor-1-software-design.md b/_extras/refactor-1-software-design.md new file mode 100644 index 000000000..4941bb4a3 --- /dev/null +++ b/_extras/refactor-1-software-design.md @@ -0,0 +1,241 @@ +--- +title: "Refactor 1: Software Design" +teaching: 25 +exercises: 20 +questions: +- "Why should we invest time in software design?" +- "What should we consider when designing software?" +objectives: +- "Understand the goals and principles of designing 'good' software." +- "Understand code decoupling and code abstraction design techniques." +- "Understand what code refactoring is." +keypoints: +- "'Good' code is designed to be maintainable: readable by people who did not author the code, +testable through a set of automated tests, adaptable to new requirements." +- "The sooner you adopt a practice of designing your software in the lifecycle of your project, +the easier the development and maintenance process will." +--- + +## Introduction + +Ideally, we should have at least a rough design of our software sketched out +before we write a single line of code. +This design should be based around the requirements and the structure of the problem we are trying +to solve: what are the concepts we need to represent in our code +and what are the relationships between them. +And importantly, who will be using our software and how will they interact with it. + +As a piece of software grows, +it will reach a point where there is too much code for us to keep in mind at once. +At this point, it becomes particularly important to think of the overall design and +structure of our software, how should all the pieces of functionality fit together, +and how should we work towards fulfilling this overall design throughout development. +Even if you did not think about the design of your software from the very beginning - +it is not too late to start now. + +It is not easy to come up with a complete definition for the term **software design**, +but some of the common aspects are: + +- **Algorithm design** - + what method are we going to use to solve the core research/business problem? +- **Software architecture** - + what components will the software have and how will they cooperate? +- **System architecture** - + what other things will this software have to interact with and how will it do this? +- **UI/UX** (User Interface / User Experience) - + how will users interact with the software? + +There is literature on each of the above software design aspects - we will not go into details of +them all here. +Instead, we will learn some techniques to structure our code better to satisfy some of the +requirements of 'good' software and revisit +our software's [MVC architecture](/11-software-project/index.html#software-architecture) +in the context of software design. 
+
+## Good Software Design Goals
+Aspirationally, what makes good code can be summarised in the following quote from the
+[Intent HQ blog](https://intenthq.com/blog/it-audience/what-is-good-code-a-scientific-definition/):
+
+> *“Good code is written so that is readable, understandable,
+> covered by automated tests, not over complicated
+> and does well what is intended to do.”*
+
+Software has become a crucial aspect of reproducible research, as well as an asset that
+can be reused or repurposed.
+Thus, it is even more important to take time to design the software to be easily *modifiable* and
+*extensible*, to save ourselves and our team a lot of time later on when we have
+to fix a problem or the software's requirements change.
+
+Satisfying the above properties will lead to an overall software design
+goal of having *maintainable* code, which is:
+
+* *readable* (and understandable) by developers who did not write the code, e.g. by:
+  * following a consistent coding style and naming conventions
+  * using meaningful and descriptive names for variables, functions, and classes
+  * documenting code to describe what it does and how it may be used
+  * using simple control flow to make it easier to follow the code execution
+  * keeping functions and methods small and focused on a single task (also important for testing)
+* *testable* through a set of (preferably automated) tests, e.g. by:
+  * writing unit, functional, regression tests to verify the code produces
+    the expected outputs from controlled inputs and exhibits the expected behavior over time
+    as the code changes
+* *adaptable* (easily modifiable and extensible) to satisfy new requirements, e.g. by:
+  * writing low-coupled/decoupled code where each part of the code has a separate concern and
+    the lowest possible dependency on other parts of the code, making it
+    easier to test, update or replace - e.g. by separating the "business logic" and "presentation"
+    layers of the code on the architecture level (recall the [MVC architecture](/11-software-project/index.html#software-architecture)),
+    or separating "pure" (without side-effects) and "impure" (with side-effects) parts of the code on the
+    level of functions.
+
+Now that we know what goals we should aspire to, let us take a critical look at the code in our
+software project and try to identify ways in which it can be improved.
+
+Our software project contains a branch `full-data-analysis` with code for a new feature of our
+catchment analysis software. Recall that you can see all your branches as follows:
+~~~
+$ git branch --all
+~~~
+{: .language-bash}
+
+Let's checkout a new local branch from the `full-data-analysis` branch, making sure we
+have saved and committed all current changes before doing so.
+
+~~~
+git checkout -b full-data-analysis origin/full-data-analysis
+~~~
+{: .language-bash}
+
+This new feature enables the user to pass a new command-line parameter `--full-data-analysis` causing
+the software to find the directory containing the first input data file (provided via the command line
+parameter `infiles`) and invoke the data analysis over all the data files in that directory.
+This bit of functionality is handled by `catchment-analysis.py` in the project root, e.g.:
+```bash
+python catchment-analysis.py data/rain_data_small.csv --full-data-analysis
+```
+
+The new data analysis code is located in the `compute_data.py` file within the `catchment` directory
+in a function called `analyse_data()`.
+
+This function loads all the data files for a given directory path, then
+calculates and compares standard deviation across all the data by day and finally plots a graph.
+
+> ## Exercise: Identifying How Code Can be Improved?
+> Critically examine the code in the `analyse_data()` function in the `compute_data.py` file.
+>
+> In what ways does this code not live up to the ideal properties of 'good' code?
+> Think about ways in which you find it hard to understand.
+> Think about the kinds of changes you might want to make to it, and what would
+> make making those changes challenging.
+>> ## Solution
+>> You may have found others, but here are some of the things that make the code
+>> hard to read, test and maintain.
+>>
+>> * **Hard to read:** everything is implemented in a single function.
+>>   In order to understand it, you need to understand how file loading works at the same time as
+>>   the analysis itself.
+>> * **Hard to modify:** if you wanted to use the data for some other purpose and not just
+>>   plotting the graph you would have to change the `analyse_data()` function.
+>> * **Hard to modify or test:** it always analyses a fixed set of CSV data files
+>>   within whichever directory it accesses, not always the file that is given as an argument.
+>> * **Hard to modify:** it does not have any tests so we cannot be 100% confident the code does
+>>   what it claims to do; any changes to the code may break something and it would be harder and
+>>   more time-consuming to figure out what.
+>>
+>> Make sure to keep the list you have created in the exercise above.
+>> For the remainder of this section, we will work on improving this code.
+>> At the end, we will revisit your list to check that you have learnt ways to address each of the
+>> problems you had found.
+>>
+>> There may be other things to improve with the code on this branch, e.g. how command line
+>> parameters are being handled in `catchment-analysis.py`, but we are focussing on
+>> the `analyse_data()` function for the time being.
+> {: .solution}
+{: .challenge}
+
+## Poor Design Choices & Technical Debt
+
+When faced with a problem that you need to solve by writing code - it may be tempting to
+skip the design phase and dive straight into coding.
+What happens if you do not follow good software design and development best practices?
+It can lead to accumulated 'technical debt',
+which (according to [Wikipedia](https://en.wikipedia.org/wiki/Technical_debt)),
+is the "cost of additional rework caused by choosing an easy (limited) solution now
+instead of using a better approach that would take longer".
+The pressure to achieve project goals can sometimes lead to quick and easy solutions,
+which make the software messier, more complex, and more difficult to understand and maintain.
+The extra effort required to make changes in the future is the interest paid on the (technical) debt.
+It is natural for software to accrue some technical debt,
+but it is important to pay off that debt during a maintenance phase -
+simplifying, clarifying the code, making it easier to understand -
+to keep the interest payments on making changes manageable.
+
+There is only so much time available in a project.
+How much effort should we spend on designing our code properly
+and using good development practices?
+The following [XKCD comic](https://xkcd.com/844/) summarises this tension: + +![Writing good code comic](../fig/xkcd-good-code-comic.png){: .image-with-shadow width="400px" } + +At an intermediate level there are a wealth of practices that *could* be used, +and applying suitable design and coding practices is what separates +an *intermediate developer* from someone who has just started coding. +The key for an intermediate developer is to balance these concerns +for each software project appropriately, +and employ design and development practices *enough* so that progress can be made. +It is very easy to under-design software, +but remember it is also possible to over-design software too. + +## Techniques for Improving Code + +How code is structured is important for helping people who are developing and maintaining it +to understand and update it. +By breaking down our software into components with a single responsibility, +we avoid having to rewrite it all when requirements change. +Such components can be as small as a single function, or be a software package in their own right. +These smaller components can be understood individually without having to understand +the entire codebase at once. + +### Code Refactoring + +*Code refactoring* is the process of improving the design of an existing code - +changing the internal structure of code without changing its +external behavior, with the goal of making the code more readable, maintainable, efficient or easier +to test. +This can include things such as renaming variables, reorganising +functions to avoid code duplication and increase reuse, and simplifying conditional statements. + +### Code Decoupling + +*Code decoupling* is a code design technique that involves breaking a (complex) +software system into smaller, more manageable parts, and reducing the interdependence +between these different parts of the system. +This means that a change in one part of the code usually does not require a change in the other, +thereby making its development more efficient and less error prone. + +### Code Abstraction + +*Abstraction* is the process of hiding the implementation details of a piece of +code (typically behind an interface) - i.e. the details of *how* something works are hidden away, +leaving code developers to deal only with *what* it does. +This allows developers to work with the code at a higher level +of abstraction, without needing to understand fully (or keep in mind) all the underlying +details at any given time and thereby reducing the cognitive load when programming. + +Abstraction can be achieved through techniques such as *encapsulation*, *inheritance*, and +*polymorphism*, which we will explore in the next episodes. There are other [abstraction techniques](https://en.wikipedia.org/wiki/Abstraction_(computer_science)) +available too. + +## Improving Our Software Design + +Refactoring our code to make it more decoupled and to introduce abstractions to +hide all but the relevant information about parts of the code is important for creating more +maintainable code. +It will help to keep our codebase clean, modular and easier to understand. + +Writing good code is hard and takes practise. +You may also be faced with an existing piece of code that breaks some (or all) of the +good code principles, and your job will be to improve/refactor it so that it can evolve further. +We will now look into some examples of the techniques that can help us redesign our code +and incrementally improve its quality. 
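+
+As a small, generic illustration of where these techniques lead (the functions below are made up
+for illustration and are not taken from our project), consider a function that mixes file handling
+with a calculation:
+
+```python
+# Before: reading the file and computing the result are tangled together,
+# so the calculation cannot be tested without a real file on disk.
+def report_total(filename):
+    with open(filename) as file:
+        values = [float(line) for line in file]
+    print(sum(values))
+```
+
+After refactoring, the calculation lives in its own small "pure" function,
+and the file handling becomes a thin wrapper around it:
+
+```python
+# After: compute_total() is decoupled from any file handling and can be
+# tested with a plain list of numbers.
+def compute_total(values):
+    return sum(values)
+
+
+def report_total(filename):
+    with open(filename) as file:
+        values = [float(line) for line in file]
+    print(compute_total(values))
+```
+
+The refactored version is slightly longer, but the calculation can now be tested without touching
+the file system, and the file-reading code can change (or be replaced by a different data source)
+without affecting the calculation - exactly the kind of decoupling we will apply to
+`analyse_data()` in the following episodes.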
+
+{% include links.md %}
diff --git a/_extras/refactor-2-code-refactoring.md b/_extras/refactor-2-code-refactoring.md
new file mode 100644
index 000000000..7ad37673f
--- /dev/null
+++ b/_extras/refactor-2-code-refactoring.md
@@ -0,0 +1,409 @@
+---
+title: "Refactor 2: Code Refactoring"
+teaching: 30
+exercises: 20
+questions:
+- "How do you refactor code without breaking it?"
+- "What is decoupled code?"
+- "What are the benefits of using pure functions in our code?"
+objectives:
+- "Understand the benefits of code decoupling."
+- "Understand the use of regression tests to avoid breaking existing code when refactoring."
+- "Understand the use of pure functions in software design to make the code easier to test."
+- "Refactor a piece of code to separate out 'pure' from 'impure' code."
+keypoints:
+- "Implementing regression tests before refactoring gives you confidence that your changes have not
+broken the code."
+- "Decoupling code into pure functions that process data without side effects makes code easier
+to read, test and maintain."
+---
+
+## Introduction
+
+*Code refactoring* is the process of improving the design of existing code - for example
+to make it more decoupled.
+Recall that *code decoupling* means breaking the system into smaller components and reducing the
+interdependence between these components, so that they can be tested and maintained independently.
+Two components of code can be considered **decoupled** if a change in one does not
+necessitate a change in the other.
+While two connected units cannot always be totally decoupled, **loose coupling**
+is something we should aim for. Benefits of decoupled code include:
+
+* easier to read, as you do not need to understand the
+  details of the other component.
+* easier to test, as one of the components can be replaced
+  by a test or a mock version of it.
+* easier to maintain, as changes can be isolated
+  from other parts of the code.
+
+When faced with an existing piece of code that needs modifying, a good refactoring
+process to follow is:
+
+1. Make sure you have tests that verify the current behaviour
+2. Refactor the code
+3. Verify that the behaviour of the code is identical to that before refactoring.
+
+In this episode we will refactor the function `analyse_data()` in `compute_data.py`
+from our project in the following two ways:
+* add more tests so we can be more confident that future changes will have the
+intended effect and will not break the existing code.
+* split the monolithic `analyse_data()` function into a number of smaller and more decoupled functions
+making the code easier to understand and test.
+
+## Writing Tests Before Refactoring
+
+When refactoring, first we need to make sure there are tests that verify
+the code behaviour as it is now (or write them if they are missing),
+then refactor the code and, finally, check that the original tests still pass.
+This is to make sure we do not break the existing behaviour through refactoring.
+
+There is a bit of a "chicken and egg" problem here - if the refactoring is supposed to make it easier
+to write tests in the future, how can we write tests before doing the refactoring?
+The tricks to get around this trap are:
+
+ * Test at a higher level, with coarser accuracy
+ * Write tests that you intend to remove
+
+The best tests are ones that test single bits of functionality rigorously.
+However, with our current `analyse_data()` code that is not possible because it is a
+large function doing a little bit of everything.
+Instead we will make minimal changes to the code to make it a bit more testable. + +Firstly, +we will modify the function to return the data instead of visualising it because graphs are harder +to test automatically (i.e. they need to be viewed and inspected manually in order to determine +their correctness). +Next, we will make the assert statements verify what the outcome is +currently, rather than checking whether that is correct or not. +Such tests are meant to +verify that the behaviour does not *change* rather than checking the current behaviour is correct +(there should be another set of tests checking the correctness). +This kind of testing is called **regression testing** as we are testing for +regressions in existing behaviour. + +Refactoring code is not meant to change its behaviour, but sometimes to make it possible to verify +you not changing the important behaviour you have to make small tweaks to the code to write +the tests at all. + +> ## Exercise: Write Regression Tests +> Modify the `analyse_data()` function not to plot a graph and return the data instead. +> Then, add a new test file called `test_compute_data.py` in the `tests` folder and +> add a regression test to verify the current output of `analyse_data()`. We will use this test +> in the remainder of this section to verify the output `analyse_data()` is unchanged each time +> we refactor or change code in the future. +> +> Start from the skeleton test code below: +> +> ```python +> def test_analyse_data(): +> from catchment.compute_data import analyse_data +> path = Path.cwd() / "data" +> result = analyse_data(path) +> +> # TODO: add an assert for the value of result +> ``` +> Use `assert_array_almost_equal` from the `numpy.testing` library to +> compare arrays of floating point numbers. +> +> Remember to run the test using `python -m pytest` from the project base directory: +> ```bash +> python -m pytest tests/test_compute_data.py +> ``` +> +>> ## Hint +>> When determining the correct return data result to use in tests, it may be helpful to assert the +>> result equals some random made-up data, observe the test fail initially and then +>> copy and paste the correct result into the test. +>> +>> Remember also that NaN values can be defined using the numpy library (`numpy.nan`). +> {: .solution} +> +>> ## Solution +>> One approach we can take is to: +>> * comment out the visualize method on `analyse_data()` +>> (as this will cause our test to hang waiting for the result data) +>> * return the data instead, so we can write asserts on the data +>> * See what the calculated value is, and assert that it is the same as the expected value +>> +>> Putting this together, your test may look like: +>> +>> ```python +>> import numpy as np +>> import numpy.testing as npt +>> from pathlib import Path +>> +>> def test_analyse_data(): +>> from catchment.compute_data import analyse_data +>> path = Path.cwd() / "data" +>> result = analyse_data(path) +>> expected_output = [ [0. , 0.18801829], +>> [0.10978448, 0.43107373], +>> [0.06066156, 0.0699624 ], +>> [0. , 0.02041241], +>> [0. , 0. ], +>> [0. , 0.02871518], +>> [0. , 0.17227833], +>> [0. , 0.04866643], +>> [0. , 0.02041241], +>> [0.88952727, 0. ], +>> [0. , 0.02041241], +>> [0. , 0. ], +>> [0.02041241, 0. ], +>> [0. , 0. ], +>> [0. , 0. ], +>> [0. , 0. ], +>> [0. , 0. ], +>> [0.0349812 , 0.02041241], +>> [0.02871518, 0.02041241], +>> [0.02041241, 0. ], +>> [0.02041241, 0. ], +>> [0. , 0.02041241], +>> [0. , 0. ], +>> [0. , np.nan], +>> [0.02041241, 0. ], +>> [0. 
, 0.02041241], +>> [0. , 0.02041241], +>> [0.02041241, 0. ], +>> [0.13449059, 0. ], +>> [0.18285024, 0.19707288], +>> [0.19176008, 0.13915472]] +>> npt.assert_array_almost_equal(result, expected_output) +>> ``` +>> +>> Note that while the above test will detect if we accidentally break the analysis code and +>> change the output of the analysis, is not a good or complete test for the following reasons: +>> * It is not at all obvious why the `expected_output` is correct +>> * It does not test edge cases +>> * If the data files in the directory change - the test will fail +>> +>> We would need additional tests to check the above. +> {: .solution} +{: .challenge} + +## Separating Pure and Impure Code + +Now that we have our regression test for `analyse_data()` in place, we are ready to refactor the +function further. +We would like to separate out as much of its code as possible as *pure functions*. +Pure functions are very useful and much easier to test as they take input only from its input +parameters and output only via their return values. + +### Pure Functions + +A pure function in programming works like a mathematical function - +it takes in some input and produces an output and that output is +always the same for the same input. +That is, the output of a pure function does not depend on any information +which is not present in the input (such as global variables). +Furthermore, pure functions do not cause any *side effects* - they do not modify the input data +or data that exist outside the function (such as printing text, writing to a file or +changing a global variable). They perform actions that affect nothing but the value they return. + +### Benefits of Pure Functions + +Pure functions are easier to understand because they eliminate side effects. +The reader only needs to concern themselves with the input +parameters of the function and the function code itself, rather than +the overall context the function is operating in. +Similarly, a function that calls a pure function is also easier +to understand - we only need to understand what the function returns, which will probably +be clear from the context in which the function is called. +Finally, pure functions are easier to reuse as the caller +only needs to understand what parameters to provide, rather +than anything else that might need to be configured prior to the call. +For these reasons, you should try and have as much of the complex, analytical and mathematical +code are pure functions. + + +Some parts of a program are inevitably impure. +Programs need to read input from users, generate a graph, or write results to a file or a database. +Well designed programs separate complex logic from the necessary impure "glue" code that +interacts with users and other systems. +This way, you have easy-to-read and easy-to-test pure code that contains the complex logic +and simplified impure code that reads data from a file or gathers user input. Impure code may +be harder to test but, when simplified like this, may only require a handful of tests anyway. + +> ## Exercise: Refactoring To Use a Pure Function +> Refactor the `analyse_data()` function to delegate the data analysis to a new +> pure function `compute_standard_deviation_by_day()` and separate it +> from the impure code that handles the input and output. 
+> The pure function should take in the data, and return the analysis result, as follows: +> ```python +> def compute_standard_deviation_by_day(data): +> # TODO +> return daily_standard_deviation +> ``` +>> ## Solution +>> The analysis code will be refactored into a separate function that may look something like: +>> ```python +>>def compute_standard_deviation_by_day(data): +>> daily_std_list = [] +>> for dataset in data: +>> daily_std = dataset.groupby(dataset.index.date).std() +>> daily_std_list.append(daily_std) +>> +>> daily_standard_deviation = pd.concat(daily_std_list) +>> return daily_standard_deviation +>> ``` +>> The `analyse_data()` function now calls the `compute_standard_deviation_by_day()` function, +>> while keeping all the logic for reading the data, processing it and showing it in a graph: +>>```python +>>def analyse_data(data_dir): +>> """Calculate the standard deviation by day between datasets. +>> +>> Gets all the measurement data from the CSV files in the data directory, +>> works out the mean for each day, and then graphs the standard deviation +>> of these means. +>> """ +>> data_file_paths = glob.glob(os.path.join(data_dir, 'rain_data_2015*.csv')) +>> if len(data_file_paths) == 0: +>> raise ValueError('No CSV files found in the data directory') +>> data = map(models.read_variable_from_csv, data_file_paths) +>> daily_standard_deviation = compute_standard_deviation_by_day(data) +>> +>> graph_data = { +>> 'standard deviation by day': daily_standard_deviation, +>> } +>> # views.visualize(graph_data) +>> return daily_standard_deviation +>>``` +>> Make sure to re-run the regression test to check this refactoring has not +>> changed the output of `analyse_data()`. +> {: .solution} +{: .challenge} + +> ## Mapping +> `map(f, C)` is a function that takes another function `f()` +> and a collection `C` of data items as inputs. +> Calling `map(f, C)` applies the function `f(x)` to every data item `x` in a collection `C` +> and returns the resulting values as a new collection of the same size. +> +> This is a simple mapping that takes a list of names and +> returns a list of the lengths of those names using the built-in function `len()`: +> ```python +> name_lengths = map(len, ["Mary", "Isla", "Sam"]) +> print(list(name_lengths)) +> ``` +> ```output +> [4, 4, 3] +> ``` +> For more information on mapping functions, and how they can be combined with reduce +> functions, see the [Functional Programming](/34-functional-programming/index.html) episode. +{: .callout} + +> ## Exercise: Mapping +> Identify a line of code in the `analyse_data` function which uses the `map` function. +>> ## Solution +>> The `map` function is used with the `read_variables_from_csv` function in the `catchment/models.py` module. +>> It creates a collection of dataframes containing the data within files defined in the list `data_file_paths`: +>> ```python +>> data = map(models.read_variable_from_csv, data_file_paths) +>> ``` +> {: .solution} +> +> Now create a pure function, `daily_std`, to calculate the standard deviation by day for any dataframe. +> This can take a similar form to the `daily_mean` and `daily_max` functions in the `catchment/models.py` file. +> +> Then replace the `for` loop below, that is in your `compute_standard_deviation_by_day` function, +> with a `map()` function that uses the `daily_std` function to calculate the daily standard +> deviation. 
+> ```python +> daily_std_list = [] +> for dataset in data: +> daily_std = dataset.groupby(dataset.index.date).std() +> daily_std_list.append(daily_std) +> ``` +>> ## Solution +>> The final functions could look like: +>> ```python +>> def daily_std(data): +>> return data.groupby(data.index.date).std() +>> +>> +>> def compute_standard_deviation_by_day(data): +>> daily_std_list = map(daily_std, data) +>> +>> daily_standard_deviation = pd.concat(daily_std_list) +>> return daily_standard_deviation +>> ``` +>> +> {: .solution} +{: .challenge} + +### Testing Pure Functions + +Now we have our analysis implemented as a pure function, we can write tests that cover +all the things we would like to check without depending on CSVs files. +This is another advantage of pure functions - they are very well suited to automated testing, +i.e. their tests are: +* **easier to write** - we construct input and assert the output +without having to think about making sure the global state is correct before or after +* **easier to read** - the reader will not have to open a CSV file to understand why +the test is correct +* **easier to maintain** - if at some point the data format changes +from CSV to JSON, the bulk of the tests need not be updated + +> ## Exercise: Testing a Pure Function +> Add tests for `compute_standard_deviation_by_day()` that check for situations +> when there is only one file with multiple sites, +> multiple files with one site, and any other cases you can think of that should be tested. +>> ## Solution +>> You might have thought of more tests, but we can easily extend the test by parametrizing +>> with more inputs and expected outputs: +>> ```python +>>@pytest.mark.parametrize( +>> "data, expected_output", +>> [ +>> ( +>> [pd.DataFrame(data=[ [1.0, 0.0], [3.0, 4.0], [5.0, 8.0] ], +>> index=[ pd.to_datetime('2000-01-01 01:00'), +>> pd.to_datetime('2000-01-01 02:00'), +>> pd.to_datetime('2000-01-01 03:00') ], +>> columns=[ 'A', 'B' ])], +>> [ [2.0, 4.0] ] +>> ), +>> ( +>> [pd.DataFrame(data=[ 1.0, 3.0, 5.0 ], +>> index=[ pd.to_datetime('2000-01-01 01:00'), +>> pd.to_datetime('2000-01-01 02:00'), +>> pd.to_datetime('2000-01-01 03:00') ], +>> columns=['A']), +>> pd.DataFrame(data=[ 0.0, 4.0, 8.0 ], +>> index=[ pd.to_datetime('2000-01-01 01:00'), +>> pd.to_datetime('2000-01-01 02:00'), +>> pd.to_datetime('2000-01-01 03:00') ], +>> columns=['B']) ], +>> [ [2.0, 4.0] ] +>> ) +>> ], ids=["two datasets in same dataframe", "two datasets in two different dataframes"]) +>>def test_compute_standard_deviation_by_day(data, expected_output): +>> from catchment.compute_data import compute_standard_deviation_by_day +>> +>> result = compute_standard_deviation_by_day(data) +>> npt.assert_array_almost_equal(result, expected_output) +``` +> {: .solution} +{: .challenge} + +> ## Functional Programming +> **Functional programming** is a programming paradigm where programs are constructed by +> applying and composing/chaining pure functions. +> Some programming languages, such as Haskell or Lisp, support writing pure functional code only. +> Other languages, such as Python, Java, C++, allow mixing **functional** and **procedural** +> programming paradigms. +> Read more in the [extra episode on functional programming](/34-functional-programming/index.html) +> and when it can be very useful to switch to this paradigm +> (e.g. to employ MapReduce approach for data processing). 
+{: .callout} + + +There are no definite rules in software design but making your complex logic out of +composed pure functions is a great place to start when trying to make your code readable, +testable and maintainable. This is particularly useful for: + +* Data processing and analysis +(for example, using [Python Pandas library](https://pandas.pydata.org/) for data manipulation where most of functions appear pure) +* Doing simulations +* Translating data from one format to another + +{% include links.md %} diff --git a/_extras/refactor-3-code-abstractions.md b/_extras/refactor-3-code-abstractions.md new file mode 100644 index 000000000..4a3996256 --- /dev/null +++ b/_extras/refactor-3-code-abstractions.md @@ -0,0 +1,482 @@ +--- +title: "Refactor 3: Code Abstractions" +teaching: 30 +exercises: 45 +questions: +- "When is it useful to use classes to structure code?" +- "How can we make sure the components of our software are reusable?" +objectives: +- "Introduce appropriate abstractions to simplify code." +- "Understand the principles of encapsulation, polymorphism and interfaces." +- "Use mocks to replace a class in test code." +keypoints: +- "Classes and interfaces can help decouple code so it is easier to understand, test and maintain." +- "Encapsulation is bundling related data into a structured component, +along with the methods that operate on the data. It is also provides a mechanism for restricting +the access to that data, hiding the internal representation of the component." +- "Polymorphism describes the provision of a single interface to entities of different types, +or the use of a single symbol to represent different types." +--- + +## Introduction + +*Code abstraction* is the process of hiding the implementation details of a piece of +code behind an interface - i.e. the details of *how* something works are hidden away, +leaving us to deal only with *what* it does. +This allows developers to work with the code at a higher level +of abstraction, without needing to understand fully (or keep in mind) all the underlying +details and thereby reducing the cognitive load when programming. + +Abstractions can aid decoupling of code. +If one part of the code only uses another part through an appropriate abstraction +then it becomes easier for these parts to change independently. + +Let's start redesigning our code by introducing some of the abstraction techniques +to incrementally improve its design. + +You may have noticed that loading data from CSV files in a directory is "baked" into +(i.e. is part of) the `analyse_data()` function. +This is not strictly a functionality of the data analysis function, so firstly +let's decouple the data loading into a separate function. + +> ## Exercise: Decouple Data Loading from Data Analysis +> Separate out the data loading functionality from `analyse_data()` into a new function +> `load_catchment_data()` that returns all the files to load. 
+>> ## Solution
+>> The new function `load_catchment_data()` that reads all the data into the format needed
+>> for the analysis should look something like:
+>> ```python
+>> def load_catchment_data(dir_path):
+>>     data_file_paths = glob.glob(os.path.join(dir_path, 'rain_data_2015*.csv'))
+>>     if len(data_file_paths) == 0:
+>>         raise ValueError('No CSV files found in the data directory')
+>>     data = map(models.read_variable_from_csv, data_file_paths)
+>>     return list(data)
+>> ```
+>> This function can now be used in the analysis as follows:
+>> ```python
+>> def analyse_data(data_dir):
+>>     data = load_catchment_data(data_dir)
+>>     daily_standard_deviation = compute_standard_deviation_by_day(data)
+>>     ...
+>> ```
+>> The code is now easier to follow since we do not need to understand the data loading from
+>> files to read the statistical analysis, and vice versa - we do not have to understand the
+>> statistical analysis when looking at data loading.
+>> Ensure you re-run the regression tests to check this refactoring has not
+>> changed the output of `analyse_data()`.
+> {: .solution}
+{: .challenge}
+
+However, even with this change, the data loading is still coupled with the data analysis.
+For example, if we have to support loading data from different sources
+(e.g. JSON files and CSV files), we would have to pass some kind of a flag indicating
+what we want into `analyse_data()`. Instead, we would like to decouple the
+consideration of what data to load from the `analyse_data()` function entirely.
+One way we can do this is by using *encapsulation* and *classes*.
+
+## Encapsulation & Classes
+
+*Encapsulation* is the packing of "data" and "functions operating on that data" into a
+single component/object.
+It also provides a mechanism for restricting the access to that data.
+Encapsulation means that the internal representation of a component is generally hidden
+from view outside of the component's definition.
+
+Encapsulation allows developers to present a consistent interface to an object/component
+that is independent of its internal implementation.
+For example, encapsulation can be used to hide the values or
+state of a structured data object inside a **class**, preventing direct access to them
+that could violate the object's state maintained by the class' methods.
+Note that object-oriented programming (OOP) languages support encapsulation,
+but encapsulation is not unique to OOP.
+
+So, a class is a way of grouping together data with some methods that manipulate that data.
+In Python, you can *declare* a class as follows:
+
+```python
+class Circle:
+    pass
+```
+
+Classes are typically named using the "CapitalisedWords" naming convention - e.g. FileReader,
+OutputStream, Rectangle.
+
+You can *construct* an *instance* of a class elsewhere in the code by doing the following:
+
+```python
+my_circle = Circle()
+```
+
+When you construct a class in this way, the class' *constructor* method is called.
+It is also possible to pass values to the constructor in order to configure the class instance:
+
+```python
+class Circle:
+    def __init__(self, radius):
+        self.radius = radius
+
+my_circle = Circle(10)
+```
+
+The constructor has the special name `__init__`.
+Note it has a special first parameter called `self` by convention - it is
+used to access the current *instance* of the object being created.
+
+A class can be thought of as a cookie cutter template, and instances as the cookies themselves.
+That is, one class can have many instances.
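+
+For example, using the `Circle` class defined above, we can create as many instances as we need,
+each configured through the constructor:
+
+```python
+# Two independent instances created from the same Circle class 'template';
+# each instance stores its own radius value passed to the constructor
+small_circle = Circle(3)
+large_circle = Circle(300)
+```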
+
+Classes can also have other methods defined on them.
+Like constructors, they have the special parameter `self` that must come first.
+
+```python
+import math
+
+class Circle:
+    ...
+    def get_area(self):
+        return math.pi * self.radius * self.radius
+...
+print(my_circle.get_area())
+```
+
+On the last line of the code above, the instance of the class, `my_circle`, will be automatically
+passed as the first parameter (`self`) when calling the `get_area()` method.
+The `get_area()` method can then access the variable `radius` encapsulated within the object, which
+is otherwise invisible to the world outside of the object.
+The method `get_area()` itself can also be accessed via the object/instance only.
+
+As we can see, the internal representation of any instance of the class `Circle` is hidden
+outside of this class (encapsulation).
+In addition, the implementation of the method `get_area()` is hidden too (abstraction).
+
+> ## Encapsulation & Abstraction
+> Encapsulation provides **information hiding**. Abstraction provides **implementation hiding**.
+{: .callout}
+
+> ## Exercise: Use Classes to Abstract out Data Loading
+> Declare a new class `CSVDataSource` that contains the `load_catchment_data` function
+> we wrote in the previous exercise as a method of this class.
+> The directory path from which to load the files should be passed to the class' constructor method.
+> Finally, construct an instance of the class `CSVDataSource` outside the statistical
+> analysis and pass it to the `analyse_data()` function.
+>> ## Hint
+>> At the end of this exercise, the code in the `analyse_data()` function should look like:
+>> ```python
+>> def analyse_data(data_source):
+>>     data = data_source.load_catchment_data()
+>>     daily_standard_deviation = compute_standard_deviation_by_day(data)
+>>     ...
+>> ```
+>> The controller code should look like:
+>> ```python
+>> data_source = compute_data.CSVDataSource(os.path.dirname(InFiles[0]))
+>> compute_data.analyse_data(data_source)
+>> ```
+> {: .solution}
+>> ## Solution
+>> For example, we can declare the class `CSVDataSource` like this:
+>>
+>> ```python
+>> class CSVDataSource:
+>>     """
+>>     Loads all the catchment CSV files within a specified directory.
+>>     """
+>>     def __init__(self, dir_path):
+>>         self.dir_path = dir_path
+>>
+>>     def load_catchment_data(self):
+>>         data_file_paths = glob.glob(os.path.join(self.dir_path, 'rain_data_2015*.csv'))
+>>         if len(data_file_paths) == 0:
+>>             raise ValueError('No CSV files found in the data directory')
+>>         data = map(models.read_variable_from_csv, data_file_paths)
+>>         return list(data)
+>> ```
+>> In the controller, we create an instance of CSVDataSource and pass it
+>> into the statistical analysis function.
+>>
+>> ```python
+>> data_source = CSVDataSource(os.path.dirname(InFiles[0]))
+>> analyse_data(data_source)
+>> ```
+>> The `analyse_data()` function is modified to receive any data source object (that implements
+>> the `load_catchment_data()` method) as a parameter.
+>> ```python
+>> def analyse_data(data_source):
+>>     data = data_source.load_catchment_data()
+>>     daily_standard_deviation = compute_standard_deviation_by_day(data)
+>>     ...
+>> ```
+>> We have now fully decoupled the reading of the data from the statistical analysis and
+>> the analysis is not fixed to reading from a directory of CSV files. Indeed, we can pass various
+>> data sources to this function now, as long as they implement the `load_catchment_data()`
+>> method.
+>> +>> While the overall behaviour of the code and its results are unchanged, +>> the way we invoke data analysis has changed. +>> We must update our regression test to match this, to ensure we have not broken anything: +>> ```python +>> ... +>> def test_compute_data(): +>> from catchment.compute_data import analyse_data, CSVDataSource +>> path = Path.cwd() / "../data" +>> data_source = CSVDataSource(path) +>> result = analyse_data(data_source) +>> expected_output = [ [0. , 0.18801829], +>> ... +>> ``` +> {: .solution} +{: .challenge} + + +## Interfaces + +An interface is another important concept in software design related to abstraction and +encapsulation. For a software component, it declares the operations that can be invoked on +that component, along with input arguments and what it returns. By knowing these details, +we can communicate with this component without the need to know how it implements this interface. + +API (Application Programming Interface) is one example of an interface that allows separate +systems (external to one another) to communicate with each other. +For example, a request to Google Maps service API may get +you the latitude and longitude for a given address. +Twitter API may return all tweets that contain +a given keyword that have been posted within a certain date range. + +Internal interfaces within software dictate how +different parts of the system interact with each other. +Even when these are not explicitly documented or thought out, they still exist. + +For example, our `Circle` class implicitly has an interface - you can call `get_area()` method +on it and it will return a number representing its surface area. + +> ## Exercise: Identify an Interface Between `CSVDataSource` and `analyse_data` +> What is the interface between CSVDataSource class and `analyse_data()` function. +> Think about what functions `analyse_data()` needs to be able to call to perform its duty, +> what parameters they need and what they return. +>> ## Solution +>> The interface is the `load_catchment_data()` method, which takes no parameters and +>> returns a list where each entry is a 2D array of catchment measurement data (read from some +>> data source). +>> +>> Any object passed into `analyse_data()` should conform to this interface. +> {: .solution} +{: .challenge} + + +## Polymorphism + +In general, polymorphism is the idea of having multiple implementations/forms/shapes +of the same abstract concept. +It is the provision of a single interface to entities of different types, +or the use of a single symbol to represent multiple different types. + +There are [different versions of polymorphism](https://www.bmc.com/blogs/polymorphism-programming/). +For example, method or operator overloading is one +type of polymorphism enabling methods and operators to take parameters of different types. + +We will have a look at the interface-based polymorphism. +In OOP, it is possible to have different object classes that conform to the same interface. +For example, let's have a look at the following class representing a `Rectangle`: + +```python +class Rectangle: + def __init__(self, width, height): + self.width = width + self.height = height + def get_area(self): + return self.width * self.height +``` + +Like `Circle`, this class provides the `get_area()` method. +The method takes the same number of parameters (none), and returns a number. +However, the implementation is different. This is one type of *polymorphism*. 
+
+The word "polymorphism" means "many forms", and in programming it refers to
+methods/functions/operators with the same name that can be executed on many objects or classes.
+
+Using our `Circle` and `Rectangle` classes, we can create a list of different shapes and iterate
+through the list to find their total surface area as follows:
+
+```python
+my_circle = Circle(radius=10)
+my_rectangle = Rectangle(width=5, height=3)
+my_shapes = [my_circle, my_rectangle]
+total_area = sum(shape.get_area() for shape in my_shapes)
+```
+
+Note that we have not created a common superclass or linked the classes `Circle` and `Rectangle`
+together in any way. This is possible due to polymorphism.
+You could also say that, when we are calculating the total surface area,
+the method for calculating the area of each shape is abstracted away to the relevant class.
+
+How can polymorphism be useful in our software project?
+For example, we can replace our `CSVDataSource` with another class that reads a totally
+different file format (e.g. JSON instead of CSV), or reads from an external service or database.
+All of these changes can now be made without changing the analysis function as we have decoupled
+the process of data loading from the data analysis earlier.
+Conversely, if we wanted to write a new analysis function, we could support any of these
+data sources with no extra work.
+
+> ## Exercise: Add an Additional DataSource
+> Create another class that supports loading catchment data from JSON files, with the
+> appropriate `load_catchment_data()` method.
+> There is a function in `models.py` that loads from JSON in the following format:
+> ```json
+> [
+>   {
+>     "Site": "FP35",
+>     "Site Name": "Lower Wraxall Farm",
+>     "Date": "01/12/2008 23:00",
+>     "Rainfall (mm)": 0.0
+>   },
+>   {
+>     "Site": "FP35",
+>     "Site Name": "Lower Wraxall Farm",
+>     "Date": "01/12/2008 23:15",
+>     "Rainfall (mm)": 0.0
+>   }
+> ]
+> ```
+> Finally, at run time, construct an appropriate instance based on the file extension.
+>> ## Solution
+>> The new class could look something like:
+>> ```python
+>> class JSONDataSource:
+>>     """
+>>     Loads all the catchment data from JSON files within a specified directory.
+>>     """
+>>     def __init__(self, dir_path):
+>>         self.dir_path = dir_path
+>>
+>>     def load_catchment_data(self):
+>>         data_file_paths = glob.glob(os.path.join(self.dir_path, 'rain_data_2015*.json'))
+>>         if len(data_file_paths) == 0:
+>>             raise ValueError('No JSON files found in the data directory')
+>>         data = map(models.load_json, data_file_paths)
+>>         return list(data)
+>> ```
+>> Additionally, the controller will need to select the appropriate DataSource to
+>> provide to the analysis:
+>> ```python
+>> _, extension = os.path.splitext(InFiles[0])
+>> if extension == '.json':
+>>     data_source = JSONDataSource(os.path.dirname(InFiles[0]))
+>> elif extension == '.csv':
+>>     data_source = CSVDataSource(os.path.dirname(InFiles[0]))
+>> else:
+>>     raise ValueError(f'Unsupported file format: {extension}')
+>> analyse_data(data_source)
+>> ```
+>> As you can see, all the above changes have been made without modifying
+>> the analysis code itself.
+> {: .solution}
+{: .challenge}
+
+## Testing Using Mock Objects
+
+We can use this abstraction to also make testing more straightforward.
+Instead of having our tests use real file system data, we can provide
+a mock or dummy implementation in place of one of the real classes.
+Providing that what we use as a substitute conforms to the same interface, +the code we are testing should work just the same. +Such mock/dummy implementation could just returns some fixed example data. + +An convenient way to do this in Python is using Python's [mock object library](https://docs.python.org/3/library/unittest.mock.html). +This is a whole topic in itself - +but a basic mock can be constructed using a couple of lines of code: + +```python +from unittest.mock import Mock + +mock_version = Mock() +mock_version.method_to_mock.return_value = 42 +``` + +Here we construct a mock in the same way you would construct a class. +Then we specify a method that we want to behave a specific way. + +Now whenever you call `mock_version.method_to_mock()` the return value will be `42`. + + +> ## Exercise: Test Using a Mock Implementation +> Complete this test for `analyse_data()`, using a mock object in place of the +> `data_source`: +> ```python +> from unittest.mock import Mock +> +> def test_compute_data_mock_source(): +> from catchment.compute_data import analyse_data +> data_source = Mock() +> +> # TODO: configure data_source mock +> +> result = analyse_data(data_source) +> +> # TODO: add assert on the contents of result +> ``` +> Create a mock that returns some fixed data and to use as the `data_source` in order to test +> the `analyse_data` method. +> Use this mock in a test. +> +> Do not forget to import `Mock` from the `unittest.mock` package. +>> ## Solution +>> ```python +>> from unittest.mock import Mock +>> +>> def test_compute_data_mock_source(): +>> from catchment.compute_data import analyse_data +>> data_source = Mock() +>> +>> data_source.load_catchment_data.return_value = [pd.DataFrame( +>> data=[[1.0, 1.0], +>> [2.0, 1.0], +>> [4.0, 2.0]], +>> index=[pd.to_datetime('2000-01-01 01:00'), +>> pd.to_datetime('2000-01-01 02:00'), +>> pd.to_datetime('2000-01-01 03:00')], +>> columns=['A', 'B'] +>> )] +>> +>> result = analyse_data(data_source) +>> npt.assert_array_almost_equal(result, [[1.527525, 0.57735 ]]) +>> ``` +> {: .solution} +{: .challenge} + +## Programming Paradigms + +Until now, we have mainly been writing procedural code. +In the previous episode, we mentioned [pure functions](/33-code-refactoring/index.html#pure-functions) +and Functional Programming. +In this episode, we have touched a bit upon classes, encapsulation and polymorphism, +which are characteristics of (but not limited to) the Object Oriented Programming (OOP). +All these different programming paradigms provide varied approaches to structuring your code - +each with certain strengths and weaknesses when used to solve particular types of problems. +In many cases, particularly with modern languages, a single language can allow many different +structural approaches and mixing programming paradigms within your code. +Once your software begins to get more complex - it is common to use aspects of [different paradigm](/programming-paradigms/index.html) +to handle different subtasks. +Because of this, it is useful to know about the [major paradigms](/programming-paradigms/index.html), +so you can recognise where it might be useful to switch. +This is outside of scope of this course - we have some extra episodes on the topics of +[Procedural Programming](/programming-paradigms/index.html#procedural-programming), +[Functional Programming](/functional-programming/index.html) and +[Object Oriented Programming](/object-oriented-programming/index.html) if you want to know more. + +> ## So Which One is Python? 
+> Python is a multi-paradigm and multi-purpose programming language.
+> You can use it as a procedural language and you can use it in a more object oriented way.
+> It does tend to land more on the object oriented side as all its core data types
+> (strings, integers, floats, booleans, lists,
+> sets, arrays, tuples, dictionaries, files)
+> as well as functions, modules and classes are objects.
+>
+> Since functions in Python are also objects that can be passed around like any other object,
+> Python is also well suited to functional programming.
+> One of the most popular Python libraries for data manipulation,
+> [Pandas](https://pandas.pydata.org/) (built on top of NumPy),
+> supports a functional programming style
+> as most of its functions on data are not changing the data (no side effects)
+> but producing new data to reflect the result of the function.
+{: .callout}
diff --git a/_extras/refactor-4-architecture-revisited.md b/_extras/refactor-4-architecture-revisited.md
new file mode 100644
index 000000000..660ddda11
--- /dev/null
+++ b/_extras/refactor-4-architecture-revisited.md
@@ -0,0 +1,570 @@
+---
+title: "Refactor 4: Architecture Revisited: Extending Software"
+teaching: 15
+exercises: 0
+questions:
+- "How can we extend our software within the constraints of the MVC architecture?"
+objectives:
+- "Extend our software to add a view of a single measurement site and extend the software's command line interface to request a specific view."
+keypoints:
+- "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change.
+  Such components can be as small as a single function, or be a software package in their own right."
+---
+
+As we have seen, we have different programming paradigms that are suitable for different problems
+and affect the structure of our code.
+In programming languages that support multiple paradigms, such as Python,
+we have the luxury of using elements of different paradigms and we,
+as software designers and programmers,
+can decide how to use those elements in different architectural components of our software.
+Let's now circle back to the architecture of our software for one final look.
+
+## MVC Revisited
+
+We've been developing our software using the **Model-View-Controller** (MVC) architecture so far,
+but, as we have seen, MVC is just one of the common architectural patterns
+and is not the only choice we could have made.
+
+### Separation of Responsibilities
+
+Separation of responsibilities is important when designing software architectures
+in order to reduce the code's complexity and increase its maintainability.
+Note, however, that there are limits to everything -
+and MVC architecture is no exception.
+The Controller often transcends into the Model and View
+and a clear separation is sometimes difficult to maintain.
+For example, the Command Line Interface provides both the View
+(what the user sees and how they interact with the command line)
+and the Controller (invoking of a command) aspects of a CLI application.
+In Web applications, the Controller often manipulates the data (received from the Model)
+before displaying it to the user or passing it from the user to the Model.
+ +There are many variants of an MVC-like pattern (such as +[Model-View-Presenter](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93presenter) (MVP), +[Model-View-Viewmodel](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93viewmodel) (MVVM), etc.), +but in most cases, the distinction between these patterns isn't particularly important. +What really matters is that we are making decisions about the architecture of our software +that suit the way in which we expect to use it. +We should reuse these established ideas where we can, but we don't need to stick to them exactly. + +The key thing to take away is the distinction between the Model and the View code, while +the View and the Controller can be more or less coupled together (e.g. the code that specifies +there is a button on the screen, might be the same code that specifies what that button does). +The View may be hard to test, or use special libraries to draw the UI, but should not contain any +complex logic, and is really just a presentation layer on top of the Model. +The Model, conversely, should not care how the data is displayed. +For example, the View may present dates as "Monday 24th July 2023", +but the Model stores it using a `Date` object rather than its string representation. + +## Our Project's Architecture (Revisited) + +Recall that in our software project, the **Controller** module is in `catchment-analysis.py`, +and the View and Model modules are contained in +`catchment/views.py` and `catchment/models.py`, respectively. +Data underlying the Model is contained within the directory `data`. + +Looking at the code in the branch `full-data-analysis` (where we should be currently located), +we can notice that the new code was added in a separate script `catchment/compute_data.py` and +contains a mix of Model, View and Controller code. + +> ## Exercise: Identify Model, View and Controller Parts of the Code +> Looking at the code inside `compute_data.py`, what parts could be considered +> Model, View and Controller code? +> +>> ## Solution +>> * Computing the standard deviation belongs to Model. +>> * Reading the data from CSV files also belongs to Model. +>> * Displaying of the output as a graph is View. +>> * The logic that processes the supplied files is Controller. +> {: .solution} +{: .challenge} + +Within the Model further separations make sense. +For example, as we did in the before, separating out the impure code that interacts with +the file system from the pure calculations helps with readability and testability. +Nevertheless, the MVC architectural pattern is a great starting point when thinking about +how you should structure your code. + +> ## Exercise: Split out the Model, View and Controller Code +> Refactor `analyse_data()` function so that the Model, View and Controller code +> we identified in the previous exercise is moved to appropriate modules. +>> ## Solution +>> The idea here is for the `analyse_data()` function not to have any "view" considerations. +>> That is, it should just compute and return the data and +>> should be located in `catchment/models.py`. +>> +>> ```python +>> def analyse_data(data_source): +>> """Calculate the standard deviation by day between datasets +>> Gets all the measurement data from the CSV files in the data directory, +>> works out the mean for each day, and then graphs the standard deviation +>> of these means. 
+>>     """
+>>     data = data_source.load_catchment_data()
+>>     daily_standard_deviation = compute_standard_deviation_by_data(data)
+>>
+>>     return daily_standard_deviation
+>> ```
+>> There can be a separate bit of code in the Controller `catchment-analysis.py`
+>> that chooses how the data should be presented, e.g. as a graph:
+>>
+>> ```python
+>> if args.full_data_analysis:
+>>     _, extension = os.path.splitext(InFiles[0])
+>>     if extension == '.json':
+>>         data_source = JSONDataSource(os.path.dirname(InFiles[0]))
+>>     elif extension == '.csv':
+>>         data_source = CSVDataSource(os.path.dirname(InFiles[0]))
+>>     else:
+>>         raise ValueError(f'Unsupported file format: {extension}')
+>>     data_result = analyse_data(data_source)
+>>     graph_data = {
+>>         'daily standard deviation': data_result,
+>>     }
+>>     views.visualize(graph_data)
+>>     return
+>> ```
+>> Note that this is, more or less, the change we made when writing our regression test.
+>> This demonstrates that splitting up Model code from View code can
+>> immediately make your code much more testable.
+>> Ensure you re-run our regression test to check this refactoring has not
+>> changed the output of `analyse_data()`.
+> {: .solution}
+{: .challenge}
+
+At this point, you have refactored and tested all the code on branch `full-data-analysis`
+and it is working as expected. The branch is ready to be incorporated into `develop`
+and then, later on, `main`, which may also have been changed by other developers working on
+the code at the same time, so make sure to update accordingly and resolve any conflicts.
+
+~~~
+$ git switch develop
+$ git merge full-data-analysis
+~~~
+{: .language-bash}
+
+Let's now have a closer look at our Controller, and at how command line arguments are handled
+in Python (something you may find yourself doing often if your code needs to be run as a
+command line tool).
+
+
+### Controller file structure
+
+You will have noticed already that the structure of the `catchment-analysis.py` file
+follows this pattern:
+
+~~~
+# import modules
+
+def main():
+    # perform some actions
+
+if __name__ == "__main__":
+    # perform some actions before main()
+    main()
+~~~
+{: .language-python}
+
+In this pattern the actions performed by the script are contained within the `main` function
+(which does not need to be called `main`,
+but using this convention helps others in understanding your code).
+The `main` function is then called within the `if __name__ == "__main__":` statement,
+after some other actions have been performed
+(usually the parsing of command-line arguments, which will be explained below).
+`__name__` is a special dunder variable which is set,
+along with a number of other special dunder variables,
+by the Python interpreter before the execution of any code in the source file.
+The value the interpreter gives to `__name__` is determined by
+the manner in which the source file is loaded.
+
+If we run the source file directly using the Python interpreter, e.g.:
+
+~~~
+python catchment-analysis.py
+~~~
+{: .language-bash}
+then the interpreter will assign the hard-coded string `"__main__"` to the `__name__` variable:
+
+~~~
+__name__ = "__main__"
+...
+# rest of your code
+~~~
+{: .language-python}
+
+However, if your source file is loaded as a module by another Python script
+(note that, because the file name contains a hyphen, a plain `import` statement will not work here,
+so we have to use the standard `importlib` module instead), e.g.:
+
+~~~
+import importlib
+catchment_analysis = importlib.import_module("catchment-analysis")
+~~~
+{: .language-python}
+
+then the interpreter will assign the name of the imported module, `"catchment-analysis"`,
+to the `__name__` variable:
+
+~~~
+__name__ = "catchment-analysis"
+...
+# rest of your code
+~~~
+{: .language-python}
+
+Because of this behaviour of the interpreter,
+we can put any code that should only be executed when running the script
+directly within the `if __name__ == "__main__":` structure,
+allowing the rest of the code within the script to be
+safely imported by another script if we so wish.
+
+While it may not seem very useful to have your controller script importable by another script,
+there are a number of situations in which you would want to do this:
+
+- for testing your code, you can have your testing framework import the main script,
+  and run special test functions which then call the `main` function directly;
+- where you want to not only be able to run your script from the command line,
+  but also provide a programmer-friendly application programming interface (API) for advanced users.
+
+### Passing Command-line Options to Controller
+
+The standard Python library for reading command line arguments passed to a script is
+[`argparse`](https://docs.python.org/3/library/argparse.html).
+This module reads arguments passed by the system,
+and enables the automatic generation of help and usage messages.
+These include, as we saw at the start of this course,
+helpful error messages when users give the program invalid arguments.
+
+The basic usage of `argparse` can be seen in the `catchment-analysis.py` script.
+First we import the library:
+
+~~~
+import argparse
+~~~
+{: .language-python}
+
+We then initialise the argument parser class, passing an (optional) description of the program:
+
+~~~
+parser = argparse.ArgumentParser(
+    description='A basic environmental data management system')
+~~~
+{: .language-python}
+
+Once the parser has been initialised, we can add
+the arguments that we want `argparse` to look out for.
+In our basic case, we want only the names of the file(s) to process:
+
+~~~
+parser.add_argument(
+    'infiles',
+    nargs='+',
+    help='Input CSV(s) containing measurement data')
+~~~
+{: .language-python}
+
+Here we have defined what the argument will be called (`'infiles'`) when it is read in;
+the number of arguments to be expected
+(`nargs='+'`, where `'+'` indicates that there should be 1 or more arguments passed);
+and a help string for the user
+(`help='Input CSV(s) containing measurement data'`).
+
+You can add as many arguments as you wish,
+and these can be either mandatory (like the one above) or optional.
+Most of the complexity in using `argparse` is in adding the correct argument options,
+and we will explain how to do this in more detail below.
+
+Finally, we parse the arguments passed to the script using:
+
+~~~
+args = parser.parse_args()
+~~~
+{: .language-python}
+
+This returns an object (that we've called `args`) containing all the arguments requested.
+These can be accessed using the names that we have defined for each argument,
+e.g. `args.infiles` would return the filenames that have been input.
+
+The help for the script can be accessed using the `-h` or `--help` optional argument
+(which `argparse` includes by default):
+
+~~~
+python catchment-analysis.py --help
+~~~
+{: .language-bash}
+~~~
+usage: catchment-analysis.py [-h] infiles [infiles ...]
+
+A basic environmental data management system
+
+positional arguments:
+  infiles     Input CSV(s) containing measurement data
+
+optional arguments:
+  -h, --help  show this help message and exit
+~~~
+{: .output}
+
+The help page starts with the command line usage,
+illustrating what inputs can be given (any within `[]` brackets are optional).
+It then lists the **positional** and **optional** arguments,
+giving as detailed a description of each as you have added to the `add_argument()` command.
+Positional arguments are arguments that need to be included
+in the proper position or order when calling the script.
+
+Note that optional arguments are indicated by `-` or `--`, followed by the argument name.
+Positional arguments are simply inferred by their position.
+It is possible to have multiple positional arguments,
+but usually this is only practical where all (or all but one) of the positional arguments
+contain a clearly defined number of elements.
+If more than one option can have an indeterminate number of entries,
+then it is better to create them as 'optional' arguments.
+These can still be made required inputs, though,
+by setting `required = True` within the `add_argument()` command.
+
+> ## Positional and Optional Argument Order
+>
+> The usage section of the help page above shows
+> the optional arguments going before the positional arguments.
+> This is the customary way to present options, but is not mandatory.
+> Instead, there are two rules which must be followed for these arguments:
+>
+> 1. Positional and optional arguments must each be given all together, and not intermixed.
+>    For example, the order can be either `optional - positional` or `positional - optional`,
+>    but not `optional - positional - optional`.
+> 2. Positional arguments must be given in the order that they are shown
+>    in the usage section of the help page.
+{: .callout}
+
+Now that you have some familiarity with `argparse`,
+we will demonstrate below how you can use it to add extra functionality to your controller.
+
+### Choosing the Measurement Data Series
+
+Up until now we have only read the rainfall data from our `data/rain_data_2015-12.csv` file.
+But what if we want to read the river measurement data too?
+We can simply change the file that we are reading by passing a different file name.
+But when we do this with the river data we get the following error:
+~~~
+python catchment-analysis.py data/river_data_2015-12.csv
+~~~
+{: .language-bash}
+~~~
+Traceback (most recent call last):
+  File "/Users/mbessdl2/work/manchester/Course_Material/Intermediate_Programming_Skills/python-intermediate-rivercatchment-template/catchment-analysis.py", line 39, in <module>
+    main(args)
+  File "/Users/mbessdl2/work/manchester/Course_Material/Intermediate_Programming_Skills/python-intermediate-rivercatchment-template/catchment-analysis.py", line 22, in main
+    measurement_data = models.read_variable_from_csv(filename)
+  File "/Users/mbessdl2/work/manchester/Course_Material/Intermediate_Programming_Skills/python-intermediate-rivercatchment-template/catchment/models.py", line 22, in read_variable_from_csv
+    dataset = pd.read_csv(filename, usecols=['Date', 'Site', 'Rainfall (mm)'])
+...
+ValueError: Usecols do not match columns, columns expected but not found: ['Rainfall (mm)']
+~~~
+{: .output}
+
+This error message tells us that the pandas `read_csv` function
+has failed to find one of the columns it was asked to read.
+We would not expect a column called `'Rainfall (mm)'` in the river data file,
+so we need to make the `read_variable_from_csv` function more flexible,
+so that it can read any named measurement dataset.
+
+The first step is to add an argument to our command line interface,
+so that users can specify the measurement dataset.
+This can be done by adding the following argument to your `catchment-analysis.py` script:
+~~~
+    parser.add_argument(
+        '-m', '--measurements',
+        help = 'Name of measurement data series to load',
+        required = True)
+~~~
+{: .language-python}
+Here we have defined the full name of the argument (`--measurements`),
+as well as a short name (`-m`) for convenience.
+Note that the short name is preceded by a single dash (`-`),
+while the full name is preceded by two dashes (`--`).
+We provide a `help` string for the user,
+and finally we set `required = True`,
+so that the end user must define which data series they want to read.
+
+Once this is added, your help message should look like this:
+~~~
+python catchment-analysis.py --help
+~~~
+{: .language-bash}
+~~~
+usage: catchment-analysis.py [-h] -m MEASUREMENTS infiles [infiles ...]
+
+A basic environmental data management system
+
+positional arguments:
+  infiles               Input CSV(s) containing measurement data
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -m MEASUREMENTS, --measurements MEASUREMENTS
+                        Name of measurement data series to load
+~~~
+{: .output}
+
+> ## Optional vs Required Arguments, and Argument Groups
+> You will note that the `--measurements` argument is still listed as an optional argument.
+> This is because the two basic argument groups in `argparse` are
+> positional and optional.
+> In the usage section the `--measurements` option is listed without `[]` brackets,
+> indicating that it is a required argument,
+> but this is still not very clear for end users.
+>
+> To make the help clearer we can add an extra argument group,
+> and assign `--measurements` to this:
+> ~~~
+> ...
+> req_group = parser.add_argument_group('required arguments')
+> ...
+> req_group.add_argument(
+>     '-m', '--measurements',
+>     help = 'Name of measurement data series to load',
+>     required = True)
+> ...
+> ~~~
+> {: .language-python}
+> This will return the following help message:
+> ~~~
+> python catchment-analysis.py --help
+> ~~~
+> {: .language-bash}
+> ~~~
+> usage: catchment-analysis.py [-h] -m MEASUREMENTS infiles [infiles ...]
+>
+> A basic environmental data management system
+>
+> positional arguments:
+>   infiles               Input CSV(s) containing measurement data
+>
+> optional arguments:
+>   -h, --help            show this help message and exit
+>
+> required arguments:
+>   -m MEASUREMENTS, --measurements MEASUREMENTS
+>                         Name of measurement data series to load
+> ~~~
+> {: .output}
+> This solution is not perfect, because the positional arguments are also required,
+> but it will at least help end users distinguish between optional and required flagged arguments.
+{: .callout}
+
+> ## Default Argument Number and Type
+> `argparse` will, by default, assume that each argument added will take a single value,
+> and that it will be a string (`type = str`). If you want to change this for any argument you
+> should explicitly set `type` and `nargs`.
+>
+> Note also that the returned object will be a single item unless `nargs` has been set,
+> in which case a list of items is returned (even if `nargs = 1` is used).
+{: .callout}
+
+
+#### Controller and Model Adaptation
+
+The new measurement string needs to be passed to the `read_variable_from_csv` function,
+and applied appropriately within that function.
+First we add a `measurement` argument to the `read_variable_from_csv` function in `catchment/models.py`
+(remembering to update the function docstring at the same time):
+~~~
+# catchment/models.py
+...
+def read_variable_from_csv(filename, measurement):
+    """Reads a named variable from a CSV file, and returns a
+    pandas dataframe containing that variable. The CSV file must contain
+    a column of dates, a column of site IDs, and (one or more) columns
+    of data - only one of which will be read.
+
+    :param filename: Filename of CSV to load
+    :param measurement: Name of data column to be read
+    :return: 2D array of given variable. Index will be dates,
+             Columns will be the individual sites
+    """
+...
+~~~
+{: .language-python}
+Following this, we need to change two lines of code:
+the first being the CSV reading code,
+and the second being the code which reorganises the dataset before it is returned:
+~~~
+# catchment/models.py
+...
+def read_variable_from_csv(filename, measurement):
+...
+    dataset = pd.read_csv(filename, usecols=['Date', 'Site', measurement])
+...
+    for site in dataset['Site'].unique():
+        newdataset[site] = dataset[dataset['Site'] == site].set_index('Date')[measurement]
+...
+~~~
+{: .language-python}
+
+
+Finally, within the `main` function of the controller we should pass `args.measurements` as an argument:
+~~~
+# catchment-analysis.py
+...
+def main(args):
+...
+    for filename in in_files:
+        measurement_data = models.read_variable_from_csv(filename, args.measurements)
+...
+~~~
+{: .language-python}
+
+You can now test your new code to ensure it works as expected:
+~~~
+python catchment-analysis.py -m 'Rainfall (mm)' data/rain_data_2015-12.csv
+~~~
+{: .language-bash}
+![Rainfall daily metrics](../fig/rainfall_daily_metrics.png){: .image-with-shadow width="800px" }
+
+~~~
+python catchment-analysis.py -m 'pH continuous' data/river_data_2015-12.csv
+~~~
+{: .language-bash}
+![River pH daily metrics](../fig/pH_daily_metrics.png){: .image-with-shadow width="800px" }
+
+Note that we have to use quotation marks around any strings which contain spaces or special
+characters, so that the shell passes them to the script as a single argument.
+
+
+
+> ## Additional Material
+>
+> Now that we've covered the basics of different programming paradigms
+> and how we can integrate them into our multi-layer architecture,
+> there are two optional extra episodes which you may find interesting.
+>
+> Both episodes cover the persistence layer of software architectures
+> and methods of persistently storing data, but take different approaches.
+> The episode on [persistence with JSON](../persistence) covers
+> some more advanced concepts in Object Oriented Programming, while
+> the episode on [databases](../databases) starts to build towards a true multi-layer architecture,
+> which would allow our software to handle much larger quantities of data.
+{: .callout}
+
+
+## Towards Collaborative Software Development
+
+Having looked at some theoretical aspects of software design,
+we are now circling back to implementing our software design
+and developing our software to satisfy the requirements collaboratively in a team.
+At an intermediate level of software development,
+there is a wealth of practices that could be used,
+and applying suitable design and coding practices is what separates
+an intermediate developer from someone who has just started coding.
+The key for an intermediate developer is to balance these concerns
+appropriately for each software project,
+and to apply just enough design and development practice that steady progress can be made.
+
+One practice that should always be considered,
+and has been shown to be very effective in team-based software development,
+is that of *code review*.
+Code reviews help to ensure that 'good' coding standards are achieved
+and maintained within a team by having multiple people
+review and comment on key code changes and consider how they fit within the codebase.
+Such reviews check the correctness of the new code, its test coverage and any functionality changes,
+and confirm that the changes follow the team's coding guidelines and best practices.
+In the following episodes we will have a look at some of the code review techniques available to us.