From f3d36fb0a25f967d40378bae2ec61de264b679f5 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 9 Aug 2023 14:56:46 +0100 Subject: [PATCH 01/82] Remove old course content and add new pages The top sections are filled out to give an idea of content --- _episodes/32-software-design.md | 253 +----- _episodes/33-programming-paradigms.md | 175 ---- _episodes/33-refactoring-functions | 15 + _episodes/34-functional-programming.md | 825 ------------------ _episodes/34-refactoring-architecture | 15 + _episodes/35-object-oriented-programming.md | 904 -------------------- _episodes/35-refactoring-decoupled-units | 15 + _episodes/36-architecture-revisited.md | 444 ---------- 8 files changed, 49 insertions(+), 2597 deletions(-) delete mode 100644 _episodes/33-programming-paradigms.md create mode 100644 _episodes/33-refactoring-functions delete mode 100644 _episodes/34-functional-programming.md create mode 100644 _episodes/34-refactoring-architecture delete mode 100644 _episodes/35-object-oriented-programming.md create mode 100644 _episodes/35-refactoring-decoupled-units delete mode 100644 _episodes/36-architecture-revisited.md diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 18dbe2ae7..6020472a8 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -4,261 +4,16 @@ teaching: 15 exercises: 30 questions: - "What should we consider when designing software?" -- "How can we make sure the components of our software are reusable?" +- "What goals should we have when structuring our code" objectives: -- "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software." -- "Understand the components of multi-layer software architectures." +- "Understand what an abstraction is, and when you should use one" +- "Understand what refactoring is" keypoints: -- "Planning software projects in advance can save a lot of effort and reduce 'technical debt' later - even a partial plan is better than no plan at all." +- "How code is structured is important for helping future people understand and update it" - "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. Such components can be as small as a single function, or be a software package in their own right." - "When writing software used for research, requirements will almost *always* change." - "*'Good code is written so that is readable, understandable, covered by automated tests, not over complicated and does well what is intended to do.'*" --- -## Introduction - -In this episode, we'll be looking at how we can design our software -to ensure it meets the requirements, -but also retains the other qualities of good software. -As a piece of software grows, -it will reach a point where there's too much code for us to keep in mind at once. -At this point, it becomes particularly important that the software be designed sensibly. -What should be the overall structure of our software, -how should all the pieces of functionality fit together, -and how should we work towards fulfilling this overall design throughout development? - -It's not easy to come up with a complete definition for the term **software design**, -but some of the common aspects are: - -- **Algorithm design** - - what method are we going to use to solve the core business problem? -- **Software architecture** - - what components will the software have and how will they cooperate? 
-- **System architecture** - - what other things will this software have to interact with and how will it do this? -- **UI/UX** (User Interface / User Experience) - - how will users interact with the software? - -As usual, the sooner you adopt a practice in the lifecycle of your project, the easier it will be. -So we should think about the design of our software from the very beginning, -ideally even before we start writing code - -but if you didn't, it's never too late to start. - - -The answers to these questions will provide us with some **design constraints** -which any software we write must satisfy. -For example, a design constraint when writing a mobile app would be -that it needs to work with a touch screen interface - -we might have some software that works really well from the command line, -but on a typical mobile phone there isn't a command line interface that people can access. - - -## Software Architecture - -At the beginning of this episode we defined **software architecture** -as an answer to the question -"what components will the software have and how will they cooperate?". -Software engineering borrowed this term, and a few other terms, -from architects (of buildings) as many of the processes and techniques have some similarities. -One of the other important terms we borrowed is 'pattern', -such as in **design patterns** and **architecture patterns**. -This term is often attributed to the book -['A Pattern Language' by Christopher Alexander *et al.*](https://en.wikipedia.org/wiki/A_Pattern_Language) -published in 1977 -and refers to a template solution to a problem commonly encountered when building a system. - -Design patterns are relatively small-scale templates -which we can use to solve problems which affect a small part of our software. -For example, the **[adapter pattern](https://en.wikipedia.org/wiki/Adapter_pattern)** -(which allows a class that does not have the "right interface" to be reused) -may be useful if part of our software needs to consume data -from a number of different external data sources. -Using this pattern, -we can create a component whose responsibility is -transforming the calls for data to the expected format, -so the rest of our program doesn't have to worry about it. - -Architecture patterns are similar, -but larger scale templates which operate at the level of whole programs, -or collections or programs. -Model-View-Controller (which we chose for our project) is one of the best known architecture patterns. -Many patterns rely on concepts from Object Oriented Programming, -so we'll come back to the MVC pattern shortly -after we learn a bit more about Object Oriented Programming. - -There are many online sources of information about design and architecture patterns, -often giving concrete examples of cases where they may be useful. -One particularly good source is [Refactoring Guru](https://refactoring.guru/design-patterns). - - -### Multilayer Architecture - -One common architectural pattern for larger software projects is **Multilayer Architecture**. -Software designed using this architecture pattern is split into layers, -each of which is responsible for a different part of the process of manipulating data. 
- -Often, the software is split into three layers: - -- **Presentation Layer** - - This layer is responsible for managing the interaction between - our software and the people using it - - May include the **View** components if also using the MVC pattern -- **Application Layer / Business Logic Layer** - - This layer performs most of the data processing required by the presentation layer - - Likely to include the **Controller** components if also using an MVC pattern - - May also include the **Model** components -- **Persistence Layer / Data Access Layer** - - This layer handles data storage and provides data to the rest of the system - - May include the **Model** components of an MVC pattern - if they're not in the application layer - -Although we've drawn similarities here between the layers of a system and the components of MVC, -they're actually solutions to different scales of problem. -In a small application, a multilayer architecture is unlikely to be necessary, -whereas in a very large application, -the MVC pattern may be used just within the presentation layer, -to handle getting data to and from the people using the software. - -## Addressing New Requirements - -So, let's assume we now want to extend our application - -designed around an MVC architecture - with some new functionalities -(more statistical processing and a new view to see a patient's data). -Let's recall the solution requirements we discussed in the previous episode: - -- *Functional Requirements*: - - SR1.1.1 (from UR1.1): - add standard deviation to data model and include in graph visualisation view - - SR1.2.1 (from UR1.2): - add a new view to generate a textual representation of statistics, - which is invoked by an optional command line argument -- *Non-functional Requirements*: - - SR2.1.1 (from UR2.1): - generate graphical statistics report on clinical workstation configuration in under 30 seconds - -### How Should We Test These Requirements? - -Sometimes when we make changes to our code that we plan to test later, -we find the way we've implemented that change doesn't lend itself well to how it should be tested. -So what should we do? - -Consider requirement SR1.2.1 - -we have (at least) two things we should test in some way, -for which we could write unit tests. -For the textual representation of statistics, -in a unit test we could invoke our new view function directly -with known inflammation data and test the text output as a string against what is expected. -The second one, invoking this new view with an optional command line argument, -is more problematic since the code isn't structured in a way where -we can easily invoke the argument parsing portion to test it. -To make this more amenable to unit testing we could -move the command line parsing portion to a separate function, -and use that in our unit tests. -So in general, it's a good idea to make sure -your software's features are modularised and accessible via logical functions. - -We could also consider writing unit tests for SR2.1.1, -ensuring that the system meets our performance requirement, so should we? -We do need to verify it's being met with the modified implementation, -however it's generally considered bad practice to use unit tests for this purpose. -This is because unit tests test *if* a given aspect is behaving correctly, -whereas performance tests test *how efficiently* it does it. 
-Performance testing produces measurements of performance which require a different kind of analysis -(using techniques such as [*code profiling*](https://towardsdatascience.com/how-to-assess-your-code-performance-in-python-346a17880c9f)), -and require careful and specific configurations of operating environments to ensure fair testing. -In addition, unit testing frameworks are not typically designed for conducting such measurements, -and only test units of a system, -which doesn't give you an idea of performance of the system -as it is typically used by stakeholders. - -The key is to think about which kind of testing should be used -to check if the code satisfies a requirement, -but also what you can do to make that code amenable to that type of testing. - -> ## Exercise: Implementing Requirements -> Pick one of the requirements SR1.1.1 or SR1.2.1 above to implement -> and create an appropriate feature branch - -> e.g. `add-std-dev` or `add-view` from your most up-to-date `develop` branch. -> -> One aspect you should consider first is -> whether the new requirement can be implemented within the existing design. -> If not, how does the design need to be changed to accommodate the inclusion of this new feature? -> Also try to ensure that the changes you make are amenable to unit testing: -> is the code suitably modularised -> such that the aspect under test can be easily invoked -> with test input data and its output tested? -> -> If you have time, feel free to implement the other requirement, or invent your own! -> -> Also make sure you push changes to your new feature branch remotely -> to your software repository on GitHub. -> -> **Note: do not add the tests for the new feature just yet - -> even though you would normally add the tests along with the new code, -> we will do this in a later episode. -> Equally, do not merge your changes to the `develop` branch just yet.** -> -> **Note 2: we have intentionally left this exercise without a solution -> to give you more freedom in implementing it how you see fit. -> If you are struggling with adding a new view and command line parameter, -> you may find the standard deviation requirement easier. -> A later episode in this section will look at -> how to handle command line parameters in a scalable way.** -{: .challenge} - -## Best Practices for 'Good' Software Design - -Aspirationally, what makes good code can be summarised in the following quote from the -[Intent HG blog](https://intenthq.com/blog/it-audience/what-is-good-code-a-scientific-definition/): - -> *“Good code is written so that is readable, understandable, -> covered by automated tests, not over complicated -> and does well what is intended to do.”* - -By taking time to design our software to be easily modifiable and extensible, -we can save ourselves a lot of time later when requirements change. -The sooner we do this the better - -ideally we should have at least a rough design sketched out for our software -before we write a single line of code. -This design should be based around the structure of the problem we're trying to solve: -what are the concepts we need to represent -and what are the relationships between them. -And importantly, who will be using our software and how will they interact with it? - -Here's another way of looking at it. 
- -Not following good software design and development practices -can lead to accumulated 'technical debt', -which (according to [Wikipedia](https://en.wikipedia.org/wiki/Technical_debt)), -is the "cost of additional rework caused by choosing an easy (limited) solution now -instead of using a better approach that would take longer". -So, the pressure to achieve project goals can sometimes lead to quick and easy solutions, -which make the software become -more messy, more complex, and more difficult to understand and maintain. -The extra effort required to make changes in the future is the interest paid on the (technical) debt. -It's natural for software to accrue some technical debt, -but it's important to pay off that debt during a maintenance phase - -simplifying, clarifying the code, making it easier to understand - -to keep these interest payments on making changes manageable. -If this isn't done, the software may accrue too much technical debt, -and it can become too messy and prohibitive to maintain and develop, -and then it cannot evolve. - -Importantly, there is only so much time available. -How much effort should we spend on designing our code properly -and using good development practices? -The following [XKCD comic](https://xkcd.com/844/) summarises this tension: - -![Writing good code comic](../fig/xkcd-good-code-comic.png){: .image-with-shadow width="400px" } - -At an intermediate level there are a wealth of practices that *could* be used, -and applying suitable design and coding practices is what separates -an *intermediate developer* from someone who has just started coding. -The key for an intermediate developer is to balance these concerns -for each software project appropriately, -and employ design and development practices *enough* so that progress can be made. -It's very easy to under-design software, -but remember it's also possible to over-design software too. - {% include links.md %} diff --git a/_episodes/33-programming-paradigms.md b/_episodes/33-programming-paradigms.md deleted file mode 100644 index 520708b54..000000000 --- a/_episodes/33-programming-paradigms.md +++ /dev/null @@ -1,175 +0,0 @@ ---- -title: "Programming Paradigms" -start: false -teaching: 10 -exercises: 0 -questions: -- "How does the structure of a problem affect the structure of our code?" -- "How can we use common software paradigms to improve the quality of our software?" -objectives: -- "Describe some of the major software paradigms we can use to classify programming languages." -keypoints: -- "A software paradigm describes a way of structuring or reasoning about code." -- "Different programming languages are suited to different paradigms." -- "Different paradigms are suited to solving different classes of problems." -- "A single piece of software will often contain instances of multiple paradigms." ---- - -## Introduction - -As you become more experienced in software development it becomes increasingly important -to understand the wider landscape in which you operate, -particularly in terms of the software decisions the people around you made and why? -Today, there are a multitude of different programming languages, -with each supporting at least one way to approach a problem and structure your code. -In many cases, particularly with modern languages, -a single language can allow many different structural approaches within your code. - -One way to categorise these structural approaches is into **paradigms**. 
-Each paradigm represents a slightly different way of thinking about and structuring our code -and each has certain strengths and weaknesses when used to solve particular types of problems. -Once your software begins to get more complex -it's common to use aspects of different paradigms to handle different subtasks. -Because of this, it's useful to know about the major paradigms, -so you can recognise where it might be useful to switch. - -There are two major families that we can group the common programming paradigms into: -**Imperative** and **Declarative**. -An imperative program uses statements that change the program's state - -it consists of commands for the computer to perform -and focuses on describing **how** a program operates step by step. -A declarative program expresses the logic of a computation -to describe **what** should be accomplished -rather than describing its control flow as a sequence steps. - -We will look into three major paradigms -from the imperative and declarative families that may be useful to you - -**Procedural Programming**, **Functional Programming** and **Object-Oriented Programming**. -Note, however, that most of the languages can be used with multiple paradigms, -and it is common to see multiple paradigms within a single program - -so this classification of programming languages based on the paradigm they use isn't as strict. - -## Procedural Programming - -Procedural Programming comes from a family of paradigms known as the Imperative Family. -With paradigms in this family, we can think of our code as the instructions for processing data. - -Procedural Programming is probably the style you're most familiar with -and the one we used up to this point, -where we group code into -*procedures performing a single task, with exactly one entry and one exit point*. -In most modern languages we call these **functions**, instead of procedures - -so if you're grouping your code into functions, this might be the paradigm you're using. -By grouping code like this, we make it easier to reason about the overall structure, -since we should be able to tell roughly what a function does just by looking at its name. -These functions are also much easier to reuse than code outside of functions, -since we can call them from any part of our program. - -So far we have been using this technique in our code - -it contains a list of instructions that execute one after the other starting from the top. -This is an appropriate choice for smaller scripts and software -that we're writing just for a single use. -Aside from smaller scripts, Procedural Programming is also commonly seen -in code focused on high performance, with relatively simple data structures, -such as in High Performance Computing (HPC). -These programs tend to be written in C (which doesn't support Object Oriented Programming) -or Fortran (which didn't until recently). -HPC code is also often written in C++, -but C++ code would more commonly follow an Object Oriented style, -though it may have procedural sections. - -Note that you may sometimes hear people refer to this paradigm as "functional programming" -to contrast it with Object Oriented Programming, -because it uses functions rather than objects, -but this is incorrect. -Functional Programming is a separate paradigm that -places much stronger constraints on the behaviour of a function -and structures the code differently as we'll see soon. 
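
Before moving on, here is a minimal sketch of the procedural style described in this section (the functions and data are illustrative only, and not taken from the inflammation project): the program is a sequence of named function calls executed from top to bottom, each step producing data for the next.

~~~
# A minimal procedural sketch: each function performs one task and the
# program runs as a sequence of calls from top to bottom.
def read_numbers(text):
    """Split a comma-separated string into a list of floats."""
    return [float(value) for value in text.split(',')]


def mean(numbers):
    """Calculate the arithmetic mean of a list of numbers."""
    return sum(numbers) / len(numbers)


raw = '1.0, 2.0, 6.0'
values = read_numbers(raw)
print(mean(values))
~~~
{: .language-python}
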
- -## Functional Programming - -Functional Programming comes from a different family of paradigms - -known as the Declarative Family. -The Declarative Family is a distinct set of paradigms -which have a different outlook on what a program is - -here code describes *what* data processing should happen. -What we really care about here is the outcome - how this is achieved is less important. - -Functional Programming is built around -a more strict definition of the term **function** borrowed from mathematics. -A function in this context can be thought of as -a mapping that transforms its input data into output data. -Anything a function does other than produce an output is known as a **side effect** -and should be avoided wherever possible. - -Being strict about this definition allows us to -break down the distinction between **code** and **data**, -for example by writing a function which accepts and transforms other functions - -in Functional Programming *code is data*. - -The most common application of Functional Programming in research is in data processing, -especially when handling **Big Data**. -One popular definition of Big Data is -data which is too large to fit in the memory of a single computer, -with a single dataset sometimes being multiple terabytes or larger. -With datasets like this, we can't move the data around easily, -so we often want to send our code to where the data is instead. -By writing our code in a functional style, -we also gain the ability to run many operations in parallel -as it's guaranteed that each operation won't interact with any of the others - -this is essential if we want to process this much data in a reasonable amount of time. - -## Object Oriented Programming - -Object Oriented Programming focuses on the specific characteristics of each object -and what each object can do. -An object has two fundamental parts - properties (characteristics) and behaviours. -In Object Oriented Programming, -we first think about the data and the things that we're modelling - and represent these by objects. - -For example, if we're writing a simulation for our chemistry research, -we're probably going to need to represent atoms and molecules. -Each of these has a set of properties which we need to know about -in order for our code to perform the tasks we want - -in this case, for example, we often need to know the mass and electric charge of each atom. -So with Object Oriented Programming, -we'll have some **object** structure which represents an atom and all of its properties, -another structure to represent a molecule, -and a relationship between the two (a molecule contains atoms). -This structure also provides a way for us to associate code with an object, -representing any **behaviours** it may have. -In our chemistry example, this could be our code for calculating the force between a pair of atoms. - -Most people would classify Object Oriented Programming as an -[extension of the Imperative family of languages](https://www.digitalocean.com/community/tutorials/functional-imperative-object-oriented-programming-comparison) -(with the extra feature being the objects), but -[others disagree](https://stackoverflow.com/questions/38527078/what-is-the-difference-between-imperative-and-object-oriented-programming). - -> ## So Which one is Python? -> Python is a multi-paradigm and multi-purpose programming language. -> You can use it as a procedural language and you can use it in a more object oriented way. 
-> It does tend to land more on the object oriented side as all its core data types -> (strings, integers, floats, booleans, lists, -> sets, arrays, tuples, dictionaries, files) -> as well as functions, modules and classes are objects. -> -> Since functions in Python are also objects that can be passed around like any other object, -> Python is also well suited to functional programming. -> One of the most popular Python libraries for data manipulation, -> [Pandas](https://pandas.pydata.org/) (built on top of NumPy), -> supports a functional programming style -> as most of its functions on data are not changing the data (no side effects) -> but producing a new data to reflect the result of the function. -{: .callout} - -## Other Paradigms - -The three paradigms introduced here are some of the most common, -but there are many others which may be useful for addressing specific classes of problem - -for much more information see the Wikipedia's page on -[programming paradigms](https://en.wikipedia.org/wiki/Programming_paradigm). -Having mainly used Procedural Programming so far, -we will now have a closer look at Functional and Object Oriented Programming paradigms -and how they can affect our architectural design choices. - -{% include links.md %} diff --git a/_episodes/33-refactoring-functions b/_episodes/33-refactoring-functions new file mode 100644 index 000000000..aa240023c --- /dev/null +++ b/_episodes/33-refactoring-functions @@ -0,0 +1,15 @@ +--- +title: "Refactoring functions to do just one thing" +teaching: 0 +exercises: 0 +questions: +- "How do you refactor code without breaking it?" +- "How do you write code that is easy to test?" +objectives: +- "Understand how to refactor functions to be easier to test" +- "Be able to write regressions tests to avoid breaking existing code" +- "Understand what a pure function is." +keypoints: +- "By refactoring code into pure functions that act on data makes code easier to test." +- "Making tests before you refactor gives you confidence that your refactoring hasn't broken anything" +--- diff --git a/_episodes/34-functional-programming.md b/_episodes/34-functional-programming.md deleted file mode 100644 index 45431b994..000000000 --- a/_episodes/34-functional-programming.md +++ /dev/null @@ -1,825 +0,0 @@ ---- -title: "Functional Programming" -teaching: 30 -exercises: 30 -questions: -- What is functional programming? -- Which situations/problems is functional programming well suited for? -objectives: -- Describe the core concepts that define the functional programming paradigm -- Describe the main characteristics of code that is written in functional programming style -- Learn how to generate and process data collections efficiently using MapReduce and Python's comprehensions -keypoints: -- Functional programming is a programming paradigm where programs are constructed by applying and composing smaller and simple functions into more complex ones (which describe the flow of data within a program as a sequence of data transformations). -- In functional programming, functions tend to be *pure* - they do not exhibit *side-effects* (by not affecting anything other than the value they return or anything outside a function). Functions can also be named, passed as arguments, and returned from other functions, just as any other data type. -- MapReduce is an instance of a data generation and processing approach, in particular suited for functional programming and handling Big Data within parallel and distributed environments. 
-- Python provides comprehensions for lists, dictionaries, sets and generators - a concise (if not strictly functional) way to generate new data from existing data collections while performing sophisticated mapping, filtering and conditional logic on original dataset's members. ---- - -## Introduction - -Functional programming is a programming paradigm where -programs are constructed by applying and composing/chaining **functions**. -Functional programming is based on the -[mathematical definition of a function](https://en.wikipedia.org/wiki/Function_(mathematics)) -`f()`, -which applies a transformation to some input data giving us some other data as a result -(i.e. a mapping from input `x` to output `f(x)`). -Thus, a program written in a functional style becomes a series of transformations on data -which are performed to produce a desired output. -Each function (transformation) taken by itself is simple and straightforward to understand; -complexity is handled by composing functions in various ways. - -Often when we use the term function we are referring to -a construct containing a block of code which performs a particular task and can be reused. -We have already seen this in procedural programming - -so how are functions in functional programming different? -The key difference is that functional programming is focussed on -**what** transformations are done to the data, -rather than **how** these transformations are performed -(i.e. a detailed sequence of steps which update the state of the code to reach a desired state). -Let's compare and contrast examples of these two programming paradigms. - -## Functional vs Procedural Programming - -The following two code examples implement the calculation of a factorial -in procedural and functional styles, respectively. -Recall that the factorial of a number `n` (denoted by `n!`) is calculated as -the product of integer numbers from 1 to `n`. - -The first example provides a procedural style factorial function. - -~~~ -def factorial(n): - """Calculate the factorial of a given number. - - :param int n: The factorial to calculate - :return: The resultant factorial - """ - if n < 0: - raise ValueError('Only use non-negative integers.') - - factorial = 1 - for i in range(1, n + 1): # iterate from 1 to n - # save intermediate value to use in the next iteration - factorial = factorial * i - - return factorial -~~~ -{: .language-python} - -Functions in procedural programming are *procedures* that describe -a detailed list of instructions to tell the computer what to do step by step -and how to change the state of the program and advance towards the result. -They often use *iteration* to repeat a series of steps. -Functional programming, on the other hand, typically uses *recursion* - -an ability of a function to call/repeat itself until a particular condition is reached. -Let's see how it is used in the functional programming example below -to achieve a similar effect to that of iteration in procedural programming. - -~~~ -# Functional style factorial function -def factorial(n): - """Calculate the factorial of a given number. 
- - :param int n: The factorial to calculate - :return: The resultant factorial - """ - if n < 0: - raise ValueError('Only use non-negative integers.') - - if n == 0 or n == 1: - return 1 # exit from recursion, prevents infinite loops - else: - return n * factorial(n-1) # recursive call to the same function -~~~ -{: .language-python} - -Note: You may have noticed that both functions in the above code examples have the same signature -(i.e. they take an integer number as input and return its factorial as output). -You could easily swap these equivalent implementations -without changing the way that the function is invoked. -Remember, a single piece of software may well contain instances of multiple programming paradigms - -including procedural, functional and object-oriented - -it is up to you to decide which one to use and when to switch -based on the problem at hand and your personal coding style. - -Functional computations only rely on the values that are provided as inputs to a function -and not on the state of the program that precedes the function call. -They do not modify data that exists outside the current function, including the input data - -this property is referred to as the *immutability of data*. -This means that such functions do not create any *side effects*, -i.e. do not perform any action that affects anything other than the value they return. -For example: printing text, -writing to a file, -modifying the value of an input argument, -or changing the value of a global variable. -Functions without side affects -that return the same data each time the same input arguments are provided -are called *pure functions*. - -> ## Exercise: Pure Functions -> -> Which of these functions are pure? -> If you're not sure, explain your reasoning to someone else, do they agree? -> -> ~~~ -> def add_one(x): -> return x + 1 -> -> def say_hello(name): -> print('Hello', name) -> -> def append_item_1(a_list, item): -> a_list += [item] -> return a_list -> -> def append_item_2(a_list, item): -> result = a_list + [item] -> return result -> ~~~ -> {: .language-python} -> -> > ## Solution -> > -> > 1. `add_one` is pure - it has no effects other than to return a value and this value will always be the same when given the same inputs -> > 2. `say_hello` is not pure - printing text counts as a side effect, even though it is the clear purpose of the function -> > 3. `append_item_1` is not pure - the argument `a_list` gets modified as a side effect - try this yourself to prove it -> > 4. `append_item_2` is pure - the result is a new variable, so this time `a_list` does not get modified - again, try this yourself -> {: .solution} -{: .challenge} - -## Benefits of Functional Code - -There are a few benefits we get when working with pure functions: - -- Testability -- Composability -- Parallelisability - -**Testability** indicates how easy it is to test the function - usually meaning unit tests. -It is much easier to test a function if we can be certain that -a particular input will always produce the same output. -If a function we are testing might have different results each time it runs -(e.g. a function that generates random numbers drawn from a normal distribution), -we need to come up with a new way to test it. -Similarly, it can be more difficult to test a function with side effects -as it is not always obvious what the side effects will be, or how to measure them. 
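
As a small illustration of why this matters (a sketch, assuming a pytest-style test like those written earlier in the course), the pure `add_one` function from the exercise above can be tested with nothing more than an input and an expected output - no set-up of external state and no checking for hidden side effects:

~~~
# A minimal sketch of a unit test for the pure function add_one.
# Because the function has no side effects, the same input always
# produces the same output, so a single assertion is enough.
def add_one(x):
    return x + 1


def test_add_one():
    assert add_one(3) == 4
~~~
{: .language-python}
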
- -**Composability** refers to the ability to make a new function from a chain of other functions -by piping the output of one as the input to the next. -If a function does not have side effects or non-deterministic behaviour, -then all of its behaviour is reflected in the value it returns. -As a consequence of this, any chain of combined pure functions is itself pure, -so we keep all these benefits when we are combining functions into a larger program. -As an example of this, we could make a function called `add_two`, -using the `add_one` function we already have. - -~~~ -def add_two(x): - return add_one(add_one(x)) -~~~ -{: .language-python} - -**Parallelisability** is the ability for operations to be performed at the same time (independently). -If we know that a function is fully pure and we have got a lot of data, -we can often improve performance by -splitting data and distributing the computation across multiple processors. -The output of a pure function depends only on its input, -so we will get the right result regardless of when or where the code runs. - -> ## Everything in Moderation -> Despite the benefits that pure functions can bring, -> we should not be trying to use them everywhere. -> Any software we write needs to interact with the rest of the world somehow, -> which requires side effects. -> With pure functions you cannot read any input, write any output, -> or interact with the rest of the world in any way, -> so we cannot usually write useful software using just pure functions. -> Python programs or libraries written in functional style will usually not be -> as extreme as to completely avoid reading input, writing output, -> updating the state of internal local variables, etc.; -> instead, they will provide a functional-appearing interface -> but may use non-functional features internally. -> An example of this is the [Python Pandas library](https://pandas.pydata.org/) -> for data manipulation built on top of NumPy - -> most of its functions appear pure -> as they return new data objects instead of changing existing ones. -{: .callout} - -There are other advantageous properties that can be derived from the functional approach to coding. -In languages which support functional programming, -a function is a *first-class object* like any other object - -not only can you compose/chain functions together, -but functions can be used as inputs to, -passed around or returned as results from other functions -(remember, in functional programming *code is data*). -This is why functional programming is suitable for processing data efficiently - -in particular in the world of Big Data, where code is much smaller than the data, -sending the code to where data is located is cheaper and faster than the other way round. -Let's see how we can do data processing using functional programming. - -## MapReduce Data Processing Approach - -When working with data you will often find that you need to -apply a transformation to each datapoint of a dataset -and then perform some aggregation across the whole dataset. -One instance of this data processing approach is known as MapReduce -and is applied when processing (but not limited to) Big Data, -e.g. using tools such as [Spark](https://en.wikipedia.org/wiki/Apache_Spark) -or [Hadoop](https://hadoop.apache.org/). -The name MapReduce comes from applying an operation to (mapping) each value in a dataset, -then performing a reduction operation which -collects/aggregates all the individual results together to produce a single result. 
-MapReduce relies heavily on composability and parallelisability of functional programming - -both map and reduce can be done in parallel and on smaller subsets of data, -before aggregating all intermediate results into the final result. - -### Mapping -`map(f, C)` is a function takes another function `f()` and a collection `C` of data items as inputs. -Calling `map(f, L)` applies the function `f(x)` to every data item `x` in a collection `C` -and returns the resulting values as a new collection of the same size. - -This is a simple mapping that takes a list of names and -returns a list of the lengths of those names using the built-in function `len()`: - -~~~ -name_lengths = map(len, ["Mary", "Isla", "Sam"]) -print(list(name_lengths)) -~~~ -{: .language-python} -~~~ -[4, 4, 3] -~~~ -{: .output} - -This is a mapping that squares every number in the passed collection using anonymous, -inlined *lambda* expression (a simple one-line mathematical expression representing a function): - -~~~ -squares = map(lambda x: x * x, [0, 1, 2, 3, 4]) -print(list(squares)) -~~~ -{: .language-python} -~~~ -[0, 1, 4, 9, 16] -~~~ -{: .output} - -> ## Lambda -> Lambda expressions are used to create anonymous functions that can be used to -> write more compact programs by inlining function code. -> A lambda expression takes any number of input parameters and -> creates an anonymous function that returns the value of the expression. -> So, we can use the short, one-line `lambda x, y, z, ...: expression` code -> instead of defining and calling a named function `f()` as follows: -> ~~~ -> def f(x, y, z, ...): -> return expression -> ~~~ -> {: .language-python} -> The major distinction between lambda functions and ‘normal’ functions is that -> lambdas do not have names. -> We could give a name to a lambda expression if we really wanted to - -> but at that point we should be using a ‘normal’ Python function instead. -> -> ~~~ -> # Don't do this -> add_one = lambda x: x + 1 -> -> # Do this instead -> def add_one(x): -> return x + 1 -> ~~~ -> {: .language-python} -{: .callout} - -In addition to using built-in or inlining anonymous lambda functions, -we can also pass a named function that we have defined ourselves to the `map()` function. - -~~~ -def add_one(num): - return num + 1 - -result = map(add_one, [0, 1, 2]) -print(list(result)) -~~~ -{: .language-python} -~~~ -[1, 2, 3] -~~~ -{: .output} - -> ## Exercise: Check Inflammation Patient Data Against A Threshold Using Map -> Write a new function called `daily_above_threshold()` in our inflammation `models.py` that -> determines whether or not each daily inflammation value for a given patient -> exceeds a given threshold. -> -> Given a patient row number in our data, the patient dataset itself, and a given threshold, -> write the function to use `map()` to generate and return a list of booleans, -> with each value representing whether or not the daily inflammation value for that patient -> exceeded the given threshold. -> -> Ordinarily we would use Numpy's own `map` feature, -> but for this exercise, let's try a solution without it. -> -> > ## Solution -> > ~~~ -> > def daily_above_threshold(patient_num, data, threshold): -> > """Determine whether or not each daily inflammation value exceeds a given threshold for a given patient. 
-> > -> > :param patient_num: The patient row number -> > :param data: A 2D data array with inflammation data -> > :param threshold: An inflammation threshold to check each daily value against -> > :returns: A boolean list representing whether or not each patient's daily inflammation exceeded the threshold -> > """ -> > -> > return list(map(lambda x: x > threshold, data[patient_num])) -> > ~~~ -> > {: .language-python} -> > -> > Note: `map()` function returns a map iterator object -> > which needs to be converted to a collection object -> > (such as a list, dictionary, set, tuple) -> > using the corresponding "factory" function (in our case `list()`). -> {: .solution} -{: .challenge} - -#### Comprehensions for Mapping/Data Generation - -Another way you can generate new collections of data from existing collections in Python is -using *comprehensions*, -which are an elegant and concise way of creating data from -[iterable objects](https://www.w3schools.com/python/python_iterators.asp) using *for loops*. -While not a pure functional concept, -comprehensions provide data generation functionality -and can be used to achieve the same effect as the built-in "pure functional" function `map()`. -They are commonly used and actually recommended as a replacement of `map()` in modern Python. -Let's have a look at some examples. - -~~~ -integers = range(5) -double_ints = [2 * i for i in integers] - -print(double_ints) -~~~ -{: .language-python} -~~~ -[0, 2, 4, 6, 8] -~~~ -{: .output} - -The above example uses a *list comprehension* to double each number in a sequence. -Notice the similarity between the syntax for a list comprehension and a for loop - -in effect, this is a for loop compressed into a single line. -In this simple case, the code above is equivalent to using a map operation on a sequence, -as shown below: - -~~~ -integers = range(5) -double_ints = map(lambda i: 2 * i, integers) -print(list(double_ints)) -~~~ -{: .language-python} -~~~ -[0, 2, 4, 6, 8] -~~~ -{: .output} - -We can also use list comprehensions to filter data, by adding the filter condition to the end: - -~~~ -double_even_ints = [2 * i for i in integers if i % 2 == 0] -print(double_even_ints) -~~~ -{: .language-python} -~~~ -[0, 4, 8] -~~~ -{: .output} - -> ## Set and Dictionary Comprehensions and Generators -> We also have *set comprehensions* and *dictionary comprehensions*, -> which look similar to list comprehensions -> but use the set literal and dictionary literal syntax, respectively. -> ~~~ -> double_even_int_set = {2 * i for i in integers if i % 2 == 0} -> print(double_even_int_set) -> -> double_even_int_dict = {i: 2 * i for i in integers if i % 2 == 0} -> print(double_even_int_dict) -> ~~~ -> {: .language-python} -> ~~~ -> {0, 4, 8} -> {0: 0, 2: 4, 4: 8} -> ~~~ -> {: .output} -> -> Finally, there’s one last ‘comprehension’ in Python - a *generator expression* - -> a type of an iterable object which we can take values from and loop over, -> but does not actually compute any of the values until we need them. -> Iterable is the generic term for anything we can loop or iterate over - -> lists, sets and dictionaries are all iterables. -> ->The `range` function is an example of a generator - -> if we created a `range(1000000000)`, but didn’t iterate over it, -> we’d find that it takes almost no time to do. -> Creating a list containing a similar number of values would take much longer, -> and could be at risk of running out of memory. -> -> We can build our own generators using a generator expression. 
-> These look much like the comprehensions above, -> but act like a generator when we use them. -> Note the syntax difference for generator expressions - -> parenthesis are used in place of square or curly brackets. -> -> ~~~ -> doubles_generator = (2 * i for i in integers) -> for x in doubles_generator: -> print(x) -> ~~~ -> {: .language-python} -> ~~~ -> 0 -> 2 -> 4 -> 6 -> 8 -> ~~~ -> {: .output} -{: .callout} - - -Let's now have a look at reducing the elements of a data collection into a single result. - -### Reducing - -`reduce(f, C, initialiser)` function accepts a function `f()`, -a collection `C` of data items -and an optional `initialiser`, -and returns a single cumulative value which -aggregates (reduces) all the values from the collection into a single result. -The reduction function first applies the function `f()` to the first two values in the collection -(or to the `initialiser`, if present, and the first item from `C`). -Then for each remaining value in the collection, -it takes the result of the previous computation -and the next value from the collection as the new arguments to `f()` -until we have processed all of the data and reduced it to a single value. -For example, if collection `C` has 5 elements, the call `reduce(f, C)` calculates: - -~~~ -f(f(f(f(C[0], C[1]), C[2]), C[3]), C[4]) -~~~ - -One example of reducing would be to calculate the product of a sequence of numbers. - -~~~ -from functools import reduce - -l = [1, 2, 3, 4] - -def product(a, b): - return a * b - -print(reduce(product, l)) - -# The same reduction using a lambda function -print(reduce((lambda a, b: a * b), l)) -~~~ -{: .language-python} -~~~ -24 -24 -~~~ -{: .output} - -Note that `reduce()` is not a built-in function like `map()` - -you need to import it from library `functools`. - -> ## Exercise: Calculate the Sum of a Sequence of Numbers Using Reduce -> Using reduce calculate the sum of a sequence of numbers. -> Although in practice we would use the built-in `sum()` function for this - try doing it without it. -> -> > ## Solution -> > ~~~ -> > from functools import reduce -> > -> > l = [1, 2, 3, 4] -> > -> > def add(a, b): -> > return a + b -> > -> > print(reduce(add, l)) -> > -> > # The same reduction using a lambda function -> > print(reduce((lambda a, b: a + b), l)) -> > ~~~ -> > {: .language-python} -> > ~~~ -> > 10 -> > 10 -> > ~~~ -> > {: .output} -> {: .solution} -{: .challenge} - -### Putting It All Together -Let's now put together what we have learned about map and reduce so far -by writing a function that calculates the sum of the squares of the values in a list -using the MapReduce approach. - -~~~ -from functools import reduce - -def sum_of_squares(l): - squares = [x * x for x in l] # use list comprehension for mapping - return reduce(lambda a, b: a + b, squares) -~~~ -{: .language-python} - -We should see the following behaviour when we use it: - -~~~ -print(sum_of_squares([0])) -print(sum_of_squares([1])) -print(sum_of_squares([1, 2, 3])) -print(sum_of_squares([-1])) -print(sum_of_squares([-1, -2, -3])) -~~~ -{: .language-python} -~~~ -0 -1 -14 -1 -14 -~~~ -{: .output} - -Now let’s assume we’re reading in these numbers from an input file, -so they arrive as a list of strings. 
-We'll modify the function so that it passes the following tests: - -~~~ -print(sum_of_squares(['1', '2', '3'])) -print(sum_of_squares(['-1', '-2', '-3'])) -~~~ -{: .language-python} -~~~ -14 -14 -~~~ -{: .output} - -The code may look like: - -~~~ -from functools import reduce - -def sum_of_squares(l): - integers = [int(x) for x in l] - squares = [x * x for x in integers] - return reduce(lambda a, b: a + b, squares) -~~~ -{: .language-python} - -Finally, like comments in Python, we’d like it to be possible for users to -comment out numbers in the input file they give to our program. -We'll finally extend our function so that the following tests pass: - -~~~ -print(sum_of_squares(['1', '2', '3'])) -print(sum_of_squares(['-1', '-2', '-3'])) -print(sum_of_squares(['1', '2', '#100', '3'])) -~~~ -{: .language-python} -~~~ -14 -14 -14 -~~~ -{: .output} - -To do so, we may filter out certain elements and have: - -~~~ -from functools import reduce - -def sum_of_squares(l): - integers = [int(x) for x in l if x[0] != '#'] - squares = [x * x for x in integers] - return reduce(lambda a, b: a + b, squares) -~~~ -{: .language-python} - ->## Exercise: Extend Inflammation Threshold Function Using Reduce -> Extend the `daily_above_threshold()` function you wrote previously -> to return a count of the number of days a patient's inflammation is over the threshold. -> Use `reduce()` over the boolean array that was previously returned to generate the count, -> then return that value from the function. -> -> You may choose to define a separate function to pass to `reduce()`, -> or use an inline lambda expression to do it (which is a bit trickier!). -> -> Hints: -> - Remember that you can define an `initialiser` value with `reduce()` -> to help you start the counter -> - If defining a lambda expression, -> note that it can conditionally return different values using the syntax -> ` if else ` in the expression. -> -> > ## Solution -> > Using a separate function: -> > ~~~ -> > def daily_above_threshold(patient_num, data, threshold): -> > """Count how many days a given patient's inflammation exceeds a given threshold. -> > -> > :param patient_num: The patient row number -> > :param data: A 2D data array with inflammation data -> > :param threshold: An inflammation threshold to check each daily value against -> > :returns: An integer representing the number of days a patient's inflammation is over a given threshold -> > """ -> > def count_above_threshold(a, b): -> > if b: -> > return a + 1 -> > else: -> > return a -> > -> > # Use map to determine if each daily inflammation value exceeds a given threshold for a patient -> > above_threshold = map(lambda x: x > threshold, data[patient_num]) -> > # Use reduce to count on how many days inflammation was above the threshold for a patient -> > return reduce(count_above_threshold, above_threshold, 0) -> > ~~~ -> > {: .language-python} -> > -> > Note that the `count_above_threshold` function used by `reduce()` -> > was defined within the `daily_above_threshold()` function -> > to limit its scope and clarify its purpose -> > (i.e. it may only be useful as part of `daily_above_threshold()` -> > hence being defined as an inner function). -> > -> > The equivalent code using a lambda expression may look like: -> > -> > ~~~ -> > from functools import reduce -> > -> > ... -> > -> > def daily_above_threshold(patient_num, data, threshold): -> > """Count how many days a given patient's inflammation exceeds a given threshold. 
-> > -> > :param patient_num: The patient row number -> > :param data: A 2D data array with inflammation data -> > :param threshold: An inflammation threshold to check each daily value against -> > :returns: An integer representing the number of days a patient's inflammation is over a given threshold -> > """ -> > -> > above_threshold = map(lambda x: x > threshold, data[patient_num]) -> > return reduce(lambda a, b: a + 1 if b else a, above_threshold, 0) -> > ~~~ -> > {: .language-python} -> Where could this be useful? -> For example, you may want to define the success criteria for a trial if, say, -> 80% of patients do not exhibit inflammation in any of the trial days, or some similar metrics. ->{: .solution} -{: .challenge} - -## Decorators - -Finally, we will look at one last aspect of Python where functional programming is coming handy. -As we have seen in the -[episode on parametrising our unit tests](../22-scaling-up-unit-testing/index.html#parameterising-our-unit-tests), -a decorator can take a function, modify/decorate it, then return the resulting function. -This is possible because Python treats functions as first-class objects -that can be passed around as normal data. -Here, we discuss decorators in more detail and learn how to write our own. -Let's look at the following code for ways on how to "decorate" functions. - -~~~ -def with_logging(func): - - """A decorator which adds logging to a function.""" - def inner(*args, **kwargs): - print("Before function call") - result = func(*args, **kwargs) - print("After function call") - return result - - return inner - - -def add_one(n): - print("Adding one") - return n + 1 - -# Redefine function add_one by wrapping it within with_logging function -add_one = with_logging(add_one) - -# Another way to redefine a function - using a decorator -@with_logging -def add_two(n): - print("Adding two") - return n + 2 - -print(add_one(1)) -print(add_two(1)) -~~~ -{: .language-python} -~~~ -Before function call -Adding one -After function call -2 -Before function call -Adding two -After function call -3 -~~~ -{: .output} - -In this example, we see a decorator (`with_logging`) -and two different syntaxes for applying the decorator to a function. -The decorator is implemented here as a function which encloses another function. -Because the inner function (`inner()`) calls the function being decorated (`func()`) -and returns its result, -it still behaves like this original function. -Part of this is the use of `*args` and `**kwargs` - -these allow our decorated function to accept any arguments or keyword arguments -and pass them directly to the function being decorated. -Our decorator in this case does not need to modify any of the arguments, -so we do not need to know what they are. -Any additional behaviour we want to add as part of our decorated function, -we can put before or after the call to the original function. -Here we print some text both before and after the decorated function, -to show the order in which events happen. - -We also see in this example the two different ways in which a decorator can be applied. -The first of these is to use a normal function call (`with_logging(add_one)`), -where we then assign the resulting function back to a variable - -often using the original name of the function, so replacing it with the decorated version. -The second syntax is the one we have seen previously (`@with_logging`). 
-This syntax is equivalent to the previous one - -the result is that we have a decorated version of the function, -here with the name `add_two`. -Both of these syntaxes can be useful in different situations: -the `@` syntax is more concise if we never need to use the un-decorated version, -while the function-call syntax gives us more flexibility - -we can continue to use the un-decorated function -if we make sure to give the decorated one a different name, -and can even make multiple decorated versions using different decorators. - -> ## Exercise: Measuring Performance Using Decorators -> One small task you might find a useful case for a decorator is -> measuring the time taken to execute a particular function. -> This is an important part of performance profiling. -> -> Write a decorator which you can use to measure the execution time of the decorated function -> using the [time.process_time_ns()](https://docs.python.org/3/library/time.html#time.process_time_ns) function. -> There are several different timing functions each with slightly different use-cases, -> but we won’t worry about that here. -> -> For the function to measure, you may wish to use this as an example: -> ~~~ -> def measure_me(n): -> total = 0 -> for i in range(n): -> total += i * i -> -> return total -> ~~~ -> {: .language-python} -> > ## Solution -> > -> > ~~~ -> > import time -> > -> > def profile(func): -> > def inner(*args, **kwargs): -> > start = time.process_time_ns() -> > result = func(*args, **kwargs) -> > stop = time.process_time_ns() -> > -> > print("Took {0} seconds".format((stop - start) / 1e9)) -> > return result -> > -> > return inner -> > -> > @profile -> > def measure_me(n): -> > total = 0 -> > for i in range(n): -> > total += i * i -> > -> > return total -> > -> > print(measure_me(1000000)) -> > ~~~ -> > {: .language-python} -> > ~~~ -> > Took 0.124199753 seconds -> > 333332833333500000 -> > ~~~ -> > {: .output} -> {: .solution} -{: .challenge} diff --git a/_episodes/34-refactoring-architecture b/_episodes/34-refactoring-architecture new file mode 100644 index 000000000..aa240023c --- /dev/null +++ b/_episodes/34-refactoring-architecture @@ -0,0 +1,15 @@ +--- +title: "Refactoring functions to do just one thing" +teaching: 0 +exercises: 0 +questions: +- "How do you refactor code without breaking it?" +- "How do you write code that is easy to test?" +objectives: +- "Understand how to refactor functions to be easier to test" +- "Be able to write regressions tests to avoid breaking existing code" +- "Understand what a pure function is." +keypoints: +- "By refactoring code into pure functions that act on data makes code easier to test." +- "Making tests before you refactor gives you confidence that your refactoring hasn't broken anything" +--- diff --git a/_episodes/35-object-oriented-programming.md b/_episodes/35-object-oriented-programming.md deleted file mode 100644 index 01413497a..000000000 --- a/_episodes/35-object-oriented-programming.md +++ /dev/null @@ -1,904 +0,0 @@ ---- -title: "Object Oriented Programming" -teaching: 30 -exercises: 20 -questions: -- "How can we use code to describe the structure of data?" -- "How should the relationships between structures be described?" 
-objectives: -- "Describe the core concepts that define the object oriented paradigm" -- "Use classes to encapsulate data within a more complex program" -- "Structure concepts within a program in terms of sets of behaviour" -- "Identify different types of relationship between concepts within a program" -- "Structure data within a program using these relationships" -keypoints: -- "Object oriented programming is a programming paradigm based on the concept of classes, which encapsulate data and code." -- "Classes allow us to organise data into distinct concepts." -- "By breaking down our data into classes, we can reason about the behaviour of parts of our data." -- "Relationships between concepts can be described using inheritance (*is a*) and composition (*has a*)." ---- - -## Introduction - -Object oriented programming is a programming paradigm based on the concept of objects, -which are data structures that contain (encapsulate) data and code. -Data is encapsulated in the form of fields (attributes) of objects, -while code is encapsulated in the form of procedures (methods) -that manipulate objects' attributes and define "behaviour" of objects. -So, in object oriented programming, -we first think about the data and the things that we’re modelling - -and represent these by objects - -rather than define the logic of the program, -and code becomes a series of interactions between objects. - -## Structuring Data - -One of the main difficulties we encounter when building more complex software is -how to structure our data. -So far, we've been processing data from a single source and with a simple tabular structure, -but it would be useful to be able to combine data from a range of different sources -and with more data than just an array of numbers. - -~~~ -data = np.array([[1., 2., 3.], - [4., 5., 6.]]) -~~~ -{: .language-python} - -Using this data structure has the advantage of -being able to use NumPy operations to process the data -and Matplotlib to plot it, -but often we need to have more structure than this. -For example, we may need to attach more information about the patients -and store this alongside our measurements of inflammation. - -We can do this using the Python data structures we're already familiar with, -dictionaries and lists. -For instance, we could attach a name to each of our patients: - -~~~ -patients = [ - { - 'name': 'Alice', - 'data': [1., 2., 3.], - }, - { - 'name': 'Bob', - 'data': [4., 5., 6.], - }, -] -~~~ -{: .language-python} - -> ## Exercise: Structuring Data -> -> Write a function, called `attach_names`, -> which can be used to attach names to our patient dataset. -> When used as below, it should produce the expected output. -> -> If you're not sure where to begin, -> think about ways you might be able to effectively loop over two collections at once. -> Also, don't worry too much about the data type of the `data` value, -> it can be a Python list, or a NumPy array - either is fine. 
-> -> ~~~ -> data = np.array([[1., 2., 3.], -> [4., 5., 6.]]) -> -> output = attach_names(data, ['Alice', 'Bob']) -> print(output) -> ~~~ -> {: .language-python} -> -> ~~~ -> [ -> { -> 'name': 'Alice', -> 'data': [1., 2., 3.], -> }, -> { -> 'name': 'Bob', -> 'data': [4., 5., 6.], -> }, -> ] -> ~~~ -> {: .output} -> -> > ## Solution -> > -> > One possible solution, perhaps the most obvious, -> > is to use the `range` function to index into both lists at the same location: -> > -> > ~~~ -> > def attach_names(data, names): -> > """Create datastructure containing patient records.""" -> > output = [] -> > -> > for i in range(len(data)): -> > output.append({'name': names[i], -> > 'data': data[i]}) -> > -> > return output -> > ~~~ -> > {: .language-python} -> > -> > However, this solution has a potential problem that can occur sometimes, -> > depending on the input. -> > What might go wrong with this solution? -> > How could we fix it? -> > -> > > ## A Better Solution -> > > -> > > What would happen if the `data` and `names` inputs were different lengths? -> > > -> > > If `names` is longer, we'll loop through, until we run out of rows in the `data` input, -> > > at which point we'll stop processing the last few names. -> > > If `data` is longer, we'll loop through, but at some point we'll run out of names - -> > > but this time we try to access part of the list that doesn't exist, -> > > so we'll get an exception. -> > > -> > > A better solution would be to use the `zip` function, -> > > which allows us to iterate over multiple iterables without needing an index variable. -> > > The `zip` function also limits the iteration to whichever of the iterables is smaller, -> > > so we won't raise an exception here, -> > > but this might not quite be the behaviour we want, -> > > so we'll also explicitly `assert` that the inputs should be the same length. -> > > Checking that our inputs are valid in this way is an example of a precondition, -> > > which we introduced conceptually in an earlier episode. -> > > -> > > If you've not previously come across the `zip` function, -> > > read [this section](https://docs.python.org/3/library/functions.html#zip) -> > > of the Python documentation. -> > > -> > > ~~~ -> > > def attach_names(data, names): -> > > """Create datastructure containing patient records.""" -> > > assert len(data) == len(names) -> > > output = [] -> > > -> > > for data_row, name in zip(data, names): -> > > output.append({'name': name, -> > > 'data': data_row}) -> > > -> > > return output -> > > ~~~ -> > > {: .language-python} -> > {: .solution} -> {: .solution} -{: .challenge} - -## Classes in Python - -Using nested dictionaries and lists should work for some of the simpler cases -where we need to handle structured data, -but they get quite difficult to manage once the structure becomes a bit more complex. -For this reason, in the object oriented paradigm, -we use **classes** to help with managing this data -and the operations we would want to perform on it. -A class is a **template** (blueprint) for a structured piece of data, -so when we create some data using a class, -we can be certain that it has the same structure each time. - -With our list of dictionaries we had in the example above, -we have no real guarantee that each dictionary has the same structure, -e.g. the same keys (`name` and `data`) unless we check it manually. -With a class, if an object is an **instance** of that class -(i.e. it was made using that template), -we know it will have the structure defined by that class. 
-Different programming languages make slightly different guarantees -about how strictly the structure will match, -but in object oriented programming this is one of the core ideas - -all objects derived from the same class must follow the same behaviour. - -You may not have realised, but you should already be familiar with -some of the classes that come bundled as part of Python, for example: - -~~~ -my_list = [1, 2, 3] -my_dict = {1: '1', 2: '2', 3: '3'} -my_set = {1, 2, 3} - -print(type(my_list)) -print(type(my_dict)) -print(type(my_set)) -~~~ -{: .language-python} - -~~~ - - - -~~~ -{: .output} - -Lists, dictionaries and sets are a slightly special type of class, -but they behave in much the same way as a class we might define ourselves: - -- They each hold some data (**attributes** or **state**). -- They also provide some methods describing the behaviours of the data - - what can the data do and what can we do to the data? - -The behaviours we may have seen previously include: - -- Lists can be appended to -- Lists can be indexed -- Lists can be sliced -- Key-value pairs can be added to dictionaries -- The value at a key can be looked up in a dictionary -- The union of two sets can be found (the set of values present in any of the sets) -- The intersection of two sets can be found (the set of values present in all of the sets) - -## Encapsulating Data - -Let's start with a minimal example of a class representing our patients. - -~~~ -# file: inflammation/models.py - -class Patient: - def __init__(self, name): - self.name = name - self.observations = [] - -alice = Patient('Alice') -print(alice.name) -~~~ -{: .language-python} - -~~~ -Alice -~~~ -{: .output} - -Here we've defined a class with one method: `__init__`. -This method is the **initialiser** method, -which is responsible for setting up the initial values and structure of the data -inside a new instance of the class - -this is very similar to **constructors** in other languages, -so the term is often used in Python too. -The `__init__` method is called every time we create a new instance of the class, -as in `Patient('Alice')`. -The argument `self` refers to the instance on which we are calling the method -and gets filled in automatically by Python - -we do not need to provide a value for this when we call the method. - -Data encapsulated within our Patient class includes -the patient's name and a list of inflammation observations. -In the initialiser method, -we set a patient's name to the value provided, -and create a list of inflammation observations for the patient (initially empty). -Such data is also referred to as the attributes of a class -and holds the current state of an instance of the class. -Attributes are typically hidden (encapsulated) internal object details -ensuring that access to data is protected from unintended changes. -They are manipulated internally by the class, -which, in addition, can expose certain functionality as public behavior of the class -to allow other objects to interact with this class' instances. - -## Encapsulating Behaviour - -In addition to representing a piece of structured data -(e.g. a patient who has a name and a list of inflammation observations), -a class can also provide a set of functions, or **methods**, -which describe the **behaviours** of the data encapsulated in the instances of that class. -To define the behaviour of a class we add functions which operate on the data the class contains. -These functions are the member functions or methods. 
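The behaviours listed above for the built-in classes are themselves provided as methods of those classes - for example:

~~~
my_list = [1, 2, 3]
my_list.append(4)        # lists can be appended to
print(my_list[1:3])      # ...and sliced

my_set = {1, 2, 3}
print(my_set.union({3, 4}))         # the union of two sets
print(my_set.intersection({2, 3}))  # the intersection of two sets
~~~
{: .language-python}

~~~
[2, 3]
{1, 2, 3, 4}
{2, 3}
~~~
{: .output}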
- -Methods on classes are the same as normal functions, -except that they live inside a class and have an extra first parameter `self`. -Using the name `self` is not strictly necessary, but is a very strong convention - -it is extremely rare to see any other name chosen. -When we call a method on an object, -the value of `self` is automatically set to this object - hence the name. -As we saw with the `__init__` method previously, -we do not need to explicitly provide a value for the `self` argument, -this is done for us by Python. - -Let's add another method on our Patient class that adds a new observation to a Patient instance. - -~~~ -# file: inflammation/models.py - -class Patient: - """A patient in an inflammation study.""" - def __init__(self, name): - self.name = name - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - if self.observations: - day = self.observations[-1]['day'] + 1 - else: - day = 0 - - new_observation = { - 'day': day, - 'value': value, - } - - self.observations.append(new_observation) - return new_observation - -alice = Patient('Alice') -print(alice) - -observation = alice.add_observation(3) -print(observation) -print(alice.observations) -~~~ -{: .language-python} - -~~~ -<__main__.Patient object at 0x7fd7e61b73d0> -{'day': 0, 'value': 3} -[{'day': 0, 'value': 3}] -~~~ -{: .output} - -Note also how we used `day=None` in the parameter list of the `add_observation` method, -then initialise it if the value is indeed `None`. -This is one of the common ways to handle an optional argument in Python, -so we'll see this pattern quite a lot in real projects. - -> ## Class and Static Methods -> -> Sometimes, the function we're writing doesn't need access to -> any data belonging to a particular object. -> For these situations, we can instead use a **class method** or a **static method**. -> Class methods have access to the class that they're a part of, -> and can access data on that class - -> but do not belong to a specific instance of that class, -> whereas static methods have access to neither the class nor its instances. -> -> By convention, class methods use `cls` as their first argument instead of `self` - -> this is how we access the class and its data, -> just like `self` allows us to access the instance and its data. -> Static methods have neither `self` nor `cls` -> so the arguments look like a typical free function. -> These are the only common exceptions to using `self` for a method's first argument. -> -> Both of these method types are created using **decorators** - -> for more information see -> the [classmethod](https://docs.python.org/3/library/functions.html#classmethod) -> and [staticmethod](https://docs.python.org/3/library/functions.html#staticmethod) -> decorator sections of the Python documentation. -{: .callout} - -### Dunder Methods - -Why is the `__init__` method not called `init`? -There are a few special method names that we can use -which Python will use to provide a few common behaviours, -each of which begins and ends with a **d**ouble-**under**score, -hence the name **dunder method**. - -When writing your own Python classes, -you'll almost always want to write an `__init__` method, -but there are a few other common ones you might need sometimes. -You may have noticed in the code above that the method `print(alice)` -returned `<__main__.Patient object at 0x7fd7e61b73d0>`, -which is the string representation of the `alice` object. -We may want the print statement to display the object's name instead. 
-We can achieve this by overriding the `__str__` method of our class. - -~~~ -# file: inflammation/models.py - -class Patient: - """A patient in an inflammation study.""" - def __init__(self, name): - self.name = name - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - try: - day = self.observations[-1]['day'] + 1 - - except IndexError: - day = 0 - - - new_observation = { - 'day': day, - 'value': value, - } - - self.observations.append(new_observation) - return new_observation - - def __str__(self): - return self.name - - -alice = Patient('Alice') -print(alice) -~~~ -{: .language-python} - -~~~ -Alice -~~~ -{: .output} - -These dunder methods are not usually called directly, -but rather provide the implementation of some functionality we can use - -we didn't call `alice.__str__()`, -but it was called for us when we did `print(alice)`. -Some we see quite commonly are: - -- `__str__` - converts an object into its string representation, used when you call `str(object)` or `print(object)` -- `__getitem__` - Accesses an object by key, this is how `list[x]` and `dict[x]` are implemented -- `__len__` - gets the length of an object when we use `len(object)` - usually the number of items it contains - -There are many more described in the Python documentation, -but it’s also worth experimenting with built in Python objects to -see which methods provide which behaviour. -For a more complete list of these special methods, -see the [Special Method Names](https://docs.python.org/3/reference/datamodel.html#special-method-names) -section of the Python documentation. - -> ## Exercise: A Basic Class -> -> Implement a class to represent a book. -> Your class should: -> -> - Have a title -> - Have an author -> - When printed using `print(book)`, show text in the format "title by author" -> -> ~~~ -> book = Book('A Book', 'Me') -> -> print(book) -> ~~~ -> {: .language-python} -> -> ~~~ -> A Book by Me -> ~~~ -> {: .output} -> -> > ## Solution -> > -> > ~~~ -> > class Book: -> > def __init__(self, title, author): -> > self.title = title -> > self.author = author -> > -> > def __str__(self): -> > return self.title + ' by ' + self.author -> > ~~~ -> > {: .language-python} -> {: .solution} -{: .challenge} - -### Properties - -The final special type of method we will introduce is a **property**. -Properties are methods which behave like data - -when we want to access them, we do not need to use brackets to call the method manually. - -~~~ -# file: inflammation/models.py - -class Patient: - ... - - @property - def last_observation(self): - return self.observations[-1] - -alice = Patient('Alice') - -alice.add_observation(3) -alice.add_observation(4) - -obs = alice.last_observation -print(obs) -~~~ -{: .language-python} - -~~~ -{'day': 1, 'value': 4} -~~~ -{: .output} - -You may recognise the `@` syntax from episodes on -parameterising unit tests and functional programming - -`property` is another example of a **decorator**. -In this case the `property` decorator is taking the `last_observation` function -and modifying its behaviour, -so it can be accessed as if it were a normal attribute. -It is also possible to make your own decorators, but we won't cover it here. - -## Relationships Between Classes - -We now have a language construct for grouping data and behaviour -related to a single conceptual object. -The next step we need to take is to describe the relationships between the concepts in our code. 
- -There are two fundamental types of relationship between objects -which we need to be able to describe: - -1. Ownership - x **has a** y - this is **composition** -2. Identity - x **is a** y - this is **inheritance** - -### Composition - -You should hopefully have come across the term **composition** already - -in the novice Software Carpentry, we use composition of functions to reduce code duplication. -That time, we used a function which converted temperatures in Celsius to Kelvin -as a **component** of another function which converted temperatures in Fahrenheit to Kelvin. - -In the same way, in object oriented programming, we can make things components of other things. - -We often use composition where we can say 'x *has a* y' - -for example in our inflammation project, -we might want to say that a doctor *has* patients -or that a patient *has* observations. - -In the case of our example, we're already saying that patients have observations, -so we're already using composition here. -We're currently implementing an observation as a dictionary with a known set of keys though, -so maybe we should make an `Observation` class as well. - -~~~ -# file: inflammation/models.py - -class Observation: - def __init__(self, day, value): - self.day = day - self.value = value - - def __str__(self): - return str(self.value) - -class Patient: - """A patient in an inflammation study.""" - def __init__(self, name): - self.name = name - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - try: - day = self.observations[-1].day + 1 - - except IndexError: - day = 0 - - new_observation = Observation(day, value) - - self.observations.append(new_observation) - return new_observation - - def __str__(self): - return self.name - - -alice = Patient('Alice') -obs = alice.add_observation(3) - -print(obs) -~~~ -{: .language-python} - -~~~ -3 -~~~ -{: .output} - -Now we're using a composition of two custom classes to -describe the relationship between two types of entity in the system that we're modelling. - -### Inheritance - -The other type of relationship used in object oriented programming is **inheritance**. -Inheritance is about data and behaviour shared by classes, -because they have some shared identity - 'x *is a* y'. -If class `X` inherits from (*is a*) class `Y`, -we say that `Y` is the **superclass** or **parent class** of `X`, -or `X` is a **subclass** of `Y`. - -If we want to extend the previous example to also manage people who aren't patients -we can add another class `Person`. -But `Person` will share some data and behaviour with `Patient` - -in this case both have a name and show that name when you print them. -Since we expect all patients to be people (hopefully!), -it makes sense to implement the behaviour in `Person` and then reuse it in `Patient`. - -To write our class in Python, -we used the `class` keyword, the name of the class, -and then a block of the functions that belong to it. -If the class **inherits** from another class, -we include the parent class name in brackets. 
- -~~~ -# file: inflammation/models.py - -class Observation: - def __init__(self, day, value): - self.day = day - self.value = value - - def __str__(self): - return str(self.value) - -class Person: - def __init__(self, name): - self.name = name - - def __str__(self): - return self.name - -class Patient(Person): - """A patient in an inflammation study.""" - def __init__(self, name): - super().__init__(name) - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - try: - day = self.observations[-1].day + 1 - - except IndexError: - day = 0 - - new_observation = Observation(day, value) - - self.observations.append(new_observation) - return new_observation - -alice = Patient('Alice') -print(alice) - -obs = alice.add_observation(3) -print(obs) - -bob = Person('Bob') -print(bob) - -obs = bob.add_observation(4) -print(obs) -~~~ -{: .language-python} - -~~~ -Alice -3 -Bob -AttributeError: 'Person' object has no attribute 'add_observation' -~~~ -{: .output} - -As expected, an error is thrown because we cannot add an observation to `bob`, -who is a Person but not a Patient. - -We see in the example above that to say that a class inherits from another, -we put the **parent class** (or **superclass**) in brackets after the name of the **subclass**. - -There's something else we need to add as well - -Python doesn't automatically call the `__init__` method on the parent class -if we provide a new `__init__` for our subclass, -so we'll need to call it ourselves. -This makes sure that everything that needs to be initialised on the parent class has been, -before we need to use it. -If we don't define a new `__init__` method for our subclass, -Python will look for one on the parent class and use it automatically. -This is true of all methods - -if we call a method which doesn't exist directly on our class, -Python will search for it among the parent classes. -The order in which it does this search is known as the **method resolution order** - -a little more on this in the Multiple Inheritance callout below. - -The line `super().__init__(name)` gets the parent class, -then calls the `__init__` method, -providing the `name` variable that `Person.__init__` requires. -This is quite a common pattern, particularly for `__init__` methods, -where we need to make sure an object is initialised as a valid `X`, -before we can initialise it as a valid `Y` - -e.g. a valid `Person` must have a name, -before we can properly initialise a `Patient` model with their inflammation data. - - -> ## Composition vs Inheritance -> -> When deciding how to implement a model of a particular system, -> you often have a choice of either composition or inheritance, -> where there is no obviously correct choice. -> For example, it's not obvious whether a photocopier *is a* printer and *is a* scanner, -> or *has a* printer and *has a* scanner. -> -> ~~~ -> class Machine: -> pass -> -> class Printer(Machine): -> pass -> -> class Scanner(Machine): -> pass -> -> class Copier(Printer, Scanner): -> # Copier `is a` Printer and `is a` Scanner -> pass -> ~~~ -> {: .language-python} -> -> ~~~ -> class Machine: -> pass -> -> class Printer(Machine): -> pass -> -> class Scanner(Machine): -> pass -> -> class Copier(Machine): -> def __init__(self): -> # Copier `has a` Printer and `has a` Scanner -> self.printer = Printer() -> self.scanner = Scanner() -> ~~~ -> {: .language-python} -> -> Both of these would be perfectly valid models and would work for most purposes. 
-> However, unless there's something about how you need to use the model -> which would benefit from using a model based on inheritance, -> it's usually recommended to opt for **composition over inheritance**. -> This is a common design principle in the object oriented paradigm and is worth remembering, -> as it's very common for people to overuse inheritance once they've been introduced to it. -> -> For much more detail on this see the -> [Python Design Patterns guide](https://python-patterns.guide/gang-of-four/composition-over-inheritance/). -{: .callout} - -> ## Multiple Inheritance -> -> **Multiple Inheritance** is when a class inherits from more than one direct parent class. -> It exists in Python, but is often not present in other Object Oriented languages. -> Although this might seem useful, like in our inheritance-based model of the photocopier above, -> it's best to avoid it unless you're sure it's the right thing to do, -> due to the complexity of the inheritance heirarchy. -> Often using multiple inheritance is a sign you should instead be using composition - -> again like the photocopier model above. -{: .callout} - - -> ## Exercise: A Model Patient -> -> Let's use what we have learnt in this episode and combine it with what we have learnt on -> [software requirements](../31-software-requirements/index.html) -> to formulate and implement a -> [few new solution requirements](../31-software-requirements/index.html#exercise-new-solution-requirements) -> to extend the model layer of our clinical trial system. -> -> Let's start with extending the system such that there must be -> a `Doctor` class to hold the data representing a single doctor, which: -> -> - must have a `name` attribute -> - must have a list of patients that this doctor is responsible for. -> -> In addition to these, try to think of an extra feature you could add to the models -> which would be useful for managing a dataset like this - -> imagine we're running a clinical trial, what else might we want to know? -> Try using Test Driven Development for any features you add: -> write the tests first, then add the feature. -> The tests have been started for you in `tests/test_patient.py`, -> but you will probably want to add some more. -> -> Once you've finished the initial implementation, do you have much duplicated code? -> Is there anywhere you could make better use of composition or inheritance -> to improve your implementation? -> -> For any extra features you've added, -> explain them and how you implemented them to your neighbour. -> Would they have implemented that feature in the same way? -> -> > ## Solution -> > One example solution is shown below. -> > You may start by writing some tests (that will initially fail), -> > and then develop the code to satisfy the new requirements and pass the tests. 
-> > ~~~ -> > # file: tests/test_patient.py -> > """Tests for the Patient model.""" -> > -> > def test_create_patient(): -> > """Check a patient is created correctly given a name.""" -> > from inflammation.models import Patient -> > name = 'Alice' -> > p = Patient(name=name) -> > assert p.name == name -> > -> > def test_create_doctor(): -> > """Check a doctor is created correctly given a name.""" -> > from inflammation.models import Doctor -> > name = 'Sheila Wheels' -> > doc = Doctor(name=name) -> > assert doc.name == name -> > -> > def test_doctor_is_person(): -> > """Check if a doctor is a person.""" -> > from inflammation.models import Doctor, Person -> > doc = Doctor("Sheila Wheels") -> > assert isinstance(doc, Person) -> > -> > def test_patient_is_person(): -> > """Check if a patient is a person. """ -> > from inflammation.models import Patient, Person -> > alice = Patient("Alice") -> > assert isinstance(alice, Person) -> > -> > def test_patients_added_correctly(): -> > """Check patients are being added correctly by a doctor. """ -> > from inflammation.models import Doctor, Patient -> > doc = Doctor("Sheila Wheels") -> > alice = Patient("Alice") -> > doc.add_patient(alice) -> > assert doc.patients is not None -> > assert len(doc.patients) == 1 -> > -> > def test_no_duplicate_patients(): -> > """Check adding the same patient to the same doctor twice does not result in duplicates. """ -> > from inflammation.models import Doctor, Patient -> > doc = Doctor("Sheila Wheels") -> > alice = Patient("Alice") -> > doc.add_patient(alice) -> > doc.add_patient(alice) -> > assert len(doc.patients) == 1 -> > ... -> > ~~~ -> > {: .language-python} -> > -> > ~~~ -> > # file: inflammation/models.py -> > ... -> > class Person: -> > """A person.""" -> > def __init__(self, name): -> > self.name = name -> > -> > def __str__(self): -> > return self.name -> > -> > class Patient(Person): -> > """A patient in an inflammation study.""" -> > def __init__(self, name): -> > super().__init__(name) -> > self.observations = [] -> > -> > def add_observation(self, value, day=None): -> > if day is None: -> > try: -> > day = self.observations[-1].day + 1 -> > except IndexError: -> > day = 0 -> > new_observation = Observation(day, value) -> > self.observations.append(new_observation) -> return new_observation -> > -> > class Doctor(Person): -> > """A doctor in an inflammation study.""" -> > def __init__(self, name): -> > super().__init__(name) -> > self.patients = [] -> > -> > def add_patient(self, new_patient): -> > # A crude check by name if this patient is already looked after -> > # by this doctor before adding them -> > for patient in self.patients: -> > if patient.name == new_patient.name: -> > return -> > self.patients.append(new_patient) -> > ... -> > ~~~ -> {: .language-python} -> {: .solution} -{: .challenge} - -{% include links.md %} diff --git a/_episodes/35-refactoring-decoupled-units b/_episodes/35-refactoring-decoupled-units new file mode 100644 index 000000000..cba637c80 --- /dev/null +++ b/_episodes/35-refactoring-decoupled-units @@ -0,0 +1,15 @@ +--- +title: "Using classes to de-couple code." +teaching: 0 +exercises: 0 +questions: +- "What is de-coupled code?" +- "When is it useful to use classes to structure code?" +objectives: +- "Understand the object-oriented principle of polymorphism and interfaces." +- "Be able to introduce appropriate abstractions to simplify code." +- "Understand what decoupled code is, and why you would want it." 
+keypoints: +- "By using interfaces, code can become more decoupled." +- "Decoupled code is easier to test, and easier to maintain." +--- diff --git a/_episodes/36-architecture-revisited.md b/_episodes/36-architecture-revisited.md deleted file mode 100644 index 0b460211a..000000000 --- a/_episodes/36-architecture-revisited.md +++ /dev/null @@ -1,444 +0,0 @@ ---- -title: "Architecture Revisited: Extending Software" -teaching: 15 -exercises: 0 -questions: -- "How can we extend our software within the constraints of the MVC architecture?" -objectives: -- "Extend our software to add a view of a single patient in the study and the software's command line interface to request a specific view." -keypoints: -- "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. - Such components can be as small as a single function, or be a software package in their own right." ---- - -As we have seen, we have different programming paradigms that are suitable for different problems -and affect the structure of our code. -In programming languages that support multiple paradigms, such as Python, -we have the luxury of using elements of different paradigms paradigms and we, -as software designers and programmers, -can decide how to use those elements in different architectural components of our software. -Let's now circle back to the architecture of our software for one final look. - -## MVC Revisited - -We've been developing our software using the **Model-View-Controller** (MVC) architecture so far, -but, as we have seen, MVC is just one of the common architectural patterns -and is not the only choice we could have made. - -There are many variants of an MVC-like pattern (such as -[Model-View-Presenter](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93presenter) (MVP), -[Model-View-Viewmodel](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93viewmodel) (MVVM), etc.), -but in most cases, the distinction between these patterns isn't particularly important. -What really matters is that we are making decisions about the architecture of our software -that suit the way in which we expect to use it. -We should reuse these established ideas where we can, but we don't need to stick to them exactly. - -In this episode we'll be taking our Object Oriented code from the previous episode -and integrating it into our existing MVC pattern. -But first we will explain some features of -the Controller (`inflammation-analysis.py`) component of our architecture. - -### Controller Structure - -You will have noticed already that structure of the `inflammation-analysis.py` file -follows this pattern: - -~~~ -# import modules - -def main(): - # perform some actions - -if __name__ == "__main__": - # perform some actions before main() - main() -~~~ -{: .language-python} - -In this pattern the actions performed by the script are contained within the `main` function -(which does not need to be called `main`, -but using this convention helps others in understanding your code). -The `main` function is then called within the `if` statement `__name__ == "__main__"`, -after some other actions have been performed -(usually the parsing of command-line arguments, which will be explained below). -`__name__` is a special dunder variable which is set, -along with a number of other special dunder variables, -by the python interpreter before the execution of any code in the source file. 
-What value is given by the interpreter to `__name__` is determined by -the manner in which it is loaded. - -If we run the source file directly using the Python interpreter, e.g.: - -~~~ -$ python3 inflammation-analysis.py -~~~ -{: .language-bash} - -then the interpreter will assign the hard-coded string `"__main__"` to the `__name__` variable: - -~~~ -__name__ = "__main__" -... -# rest of your code -~~~ -{: .language-python} - -However, if your source file is imported by another Python script, e.g: - -~~~ -import inflammation-analysis -~~~ -{: .language-python} - -then the interpreter will assign the name `"inflammation-analysis"` -from the import statement to the `__name__` variable: - -~~~ -__name__ = "inflammation-analysis" -... -# rest of your code -~~~ -{: .language-python} - -Because of this behaviour of the interpreter, -we can put any code that should only be executed when running the script -directly within the `if __name__ == "__main__":` structure, -allowing the rest of the code within the script to be -safely imported by another script if we so wish. - -While it may not seem very useful to have your controller script importable by another script, -there are a number of situations in which you would want to do this: - -- for testing of your code, you can have your testing framework import the main script, - and run special test functions which then call the `main` function directly; -- where you want to not only be able to run your script from the command-line, - but also provide a programmer-friendly application programming interface (API) for advanced users. - -### Passing Command-line Options to Controller - -The standard Python library for reading command line arguments passed to a script is -[`argparse`](https://docs.python.org/3/library/argparse.html). -This module reads arguments passed by the system, -and enables the automatic generation of help and usage messages. -These include, as we saw at the start of this course, -the generation of helpful error messages when users give the program invalid arguments. - -The basic usage of `argparse` can be seen in the `inflammation-analysis.py` script. -First we import the library: - -~~~ -import argparse -~~~ -{: .language-python} - -We then initialise the argument parser class, passing an (optional) description of the program: - -~~~ -parser = argparse.ArgumentParser( - description='A basic patient inflammation data management system') -~~~ -{: .language-python} - -Once the parser has been initialised we can add -the arguments that we want argparse to look out for. -In our basic case, we want only the names of the file(s) to process: - -~~~ -parser.add_argument( - 'infiles', - nargs='+', - help='Input CSV(s) containing inflammation series for each patient') -~~~ -{: .language-python} - -Here we have defined what the argument will be called (`'infiles'`) when it is read in; -the number of arguments to be expected -(`nargs='+'`, where `'+'` indicates that there should be 1 or more arguments passed); -and a help string for the user -(`help='Input CSV(s) containing inflammation series for each patient'`). - -You can add as many arguments as you wish, -and these can be either mandatory (as the one above) or optional. -Most of the complexity in using `argparse` is in adding the correct argument options, -and we will explain how to do this in more detail below. 
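For example, an optional argument is declared with a leading `--` and can be given a type, a default value and a help string - this is a preview of the `--patient` option we will add to the controller later in this episode:

~~~
parser.add_argument(
    '--patient',
    type=int,
    default=0,
    help='Which patient should be displayed?')
~~~
{: .language-python}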
- -Finally we parse the arguments passed to the script using: - -~~~ -args = parser.parse_args() -~~~ -{: .language-python} - -This returns an object (that we've called `arg`) containing all the arguments requested. -These can be accessed using the names that we have defined for each argument, -e.g. `args.infiles` would return the filenames that have been input. - -The help for the script can be accessed using the `-h` or `--help` optional argument -(which `argparse` includes by default): - -~~~ -$ python3 inflammation-analysis.py --help -~~~ -{: .language-bash} - -~~~ -usage: inflammation-analysis.py [-h] infiles [infiles ...] - -A basic patient inflammation data management system - -positional arguments: - infiles Input CSV(s) containing inflammation series for each patient - -optional arguments: - -h, --help show this help message and exit -~~~ -{: .output} - -The help page starts with the command line usage, -illustrating what inputs can be given (any within `[]` brackets are optional). -It then lists the **positional** and **optional** arguments, -giving as detailed a description of each as you have added to the `add_argument()` command. -Positional arguments are arguments that need to be included -in the proper position or order when calling the script. - -Note that optional arguments are indicated by `-` or `--`, followed by the argument name. -Positional arguments are simply inferred by their position. -It is possible to have multiple positional arguments, -but usually this is only practical where all (or all but one) positional arguments -contains a clearly defined number of elements. -If more than one option can have an indeterminate number of entries, -then it is better to create them as 'optional' arguments. -These can be made a required input though, -by setting `required = True` within the `add_argument()` command. - -> ## Positional and Optional Argument Order -> -> The usage section of the help page above shows -> the optional arguments going before the positional arguments. -> This is the customary way to present options, but is not mandatory. -> Instead there are two rules which must be followed for these arguments: -> -> 1. Positional and optional arguments must each be given all together, and not inter-mixed. -> For example, the order can be either `optional - positional` or `positional - optional`, -> but not `optional - positional - optional`. -> 2. Positional arguments must be given in the order that they are shown -> in the usage section of the help page. -{: .callout} - -Now that you have some familiarity with `argparse`, -we will demonstrate below how you can use this to add extra functionality to your controller. - -### Adding a New View - -Let's start with adding a view that allows us to see the data for a single patient. -First, we need to add the code for the view itself -and make sure our `Patient` class has the necessary data - -including the ability to pass a list of measurements to the `__init__` method. -Note that your Patient class may look very different now, -so adapt this example to fit what you have. - -~~~ -# file: inflammation/views.py - -... - -def display_patient_record(patient): - """Display data for a single patient.""" - print(patient.name) - for obs in patient.observations: - print(obs.day, obs.value) -~~~ -{: .language-python} - -~~~ -# file: inflammation/models.py - -... 
- -class Observation: - def __init__(self, day, value): - self.day = day - self.value = value - - def __str__(self): - return self.value - -class Person: - def __init__(self, name): - self.name = name - - def __str__(self): - return self.name - -class Patient(Person): - """A patient in an inflammation study.""" - def __init__(self, name, observations=None): - super().__init__(name) - - self.observations = [] - if observations is not None: - self.observations = observations - - def add_observation(self, value, day=None): - if day is None: - try: - day = self.observations[-1].day + 1 - - except IndexError: - day = 0 - - new_observation = Observation(day, value) - - self.observations.append(new_observation) - return new_observation -~~~ -{: .language-python} - -Now we need to make sure people can call this view - -that means connecting it to the controller -and ensuring that there's a way to request this view when running the program. -The changes we need to make here are that the `main` function -needs to be able to direct us to the view we've requested - -and we need to add to the command line interface - the controller - -the necessary data to drive the new view. - -~~~ -# file: inflammation-analysis.py - -#!/usr/bin/env python3 -"""Software for managing patient data in our imaginary hospital.""" - -import argparse - -from inflammation import models, views - - -def main(args): - """The MVC Controller of the patient data system. - - The Controller is responsible for: - - selecting the necessary models and views for the current task - - passing data between models and views - """ - infiles = args.infiles - if not isinstance(infiles, list): - infiles = [args.infiles] - - for filename in infiles: - inflammation_data = models.load_csv(filename) - - if args.view == 'visualize': - view_data = { - 'average': models.daily_mean(inflammation_data), - 'max': models.daily_max(inflammation_data), - 'min': models.daily_min(inflammation_data), - } - - views.visualize(view_data) - - elif args.view == 'record': - patient_data = inflammation_data[args.patient] - observations = [models.Observation(day, value) for day, value in enumerate(patient_data)] - patient = models.Patient('UNKNOWN', observations) - - views.display_patient_record(patient) - - -if __name__ == "__main__": - parser = argparse.ArgumentParser( - description='A basic patient data management system') - - parser.add_argument( - 'infiles', - nargs='+', - help='Input CSV(s) containing inflammation series for each patient') - - parser.add_argument( - '--view', - default='visualize', - choices=['visualize', 'record'], - help='Which view should be used?') - - parser.add_argument( - '--patient', - type=int, - default=0, - help='Which patient should be displayed?') - - args = parser.parse_args() - - main(args) -~~~ -{: .language-python} - -We've added two options to our command line interface here: -one to request a specific view and one for the patient ID that we want to lookup. -For the full range of features that we have access to with `argparse` see the -[Python module documentation](https://docs.python.org/3/library/argparse.html?highlight=argparse#module-argparse). -Allowing the user to request a specific view like this is -a similar model to that used by the popular Python library Click - -if you find yourself needing to build more complex interfaces than this, -Click would be a good choice. -You can find more information in [Click's documentation](https://click.palletsprojects.com/). 
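As a rough sketch for comparison only (we will not use it in our project), the same interface might look something like this when written with Click, where decorators replace the explicit parser setup:

~~~
import click

@click.command()
@click.argument('infiles', nargs=-1, required=True)
@click.option('--view', default='visualize', type=click.Choice(['visualize', 'record']))
@click.option('--patient', default=0, type=int)
def main(infiles, view, patient):
    """A basic patient data management system."""
    # Click passes the parsed arguments in as normal function parameters
    ...


if __name__ == "__main__":
    main()
~~~
{: .language-python}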
- -For now, we also don't know the names of any of our patients, -so we've made it `'UNKNOWN'` until we get more data. - -We can now call our program with these extra arguments to see the record for a single patient: - -~~~ -$ python3 inflammation-analysis.py --view record --patient 1 data/inflammation-01.csv -~~~ -{: .language-bash} - -~~~ -UNKNOWN -0 0.0 -1 0.0 -2 1.0 -3 3.0 -4 1.0 -5 2.0 -6 4.0 -7 7.0 -... -~~~ -{: .output} - -> ## Additional Material -> -> Now that we've covered the basics of different programming paradigms -> and how we can integrate them into our multi-layer architecture, -> there are two optional extra episodes which you may find interesting. -> -> Both episodes cover the persistence layer of software architectures -> and methods of persistently storing data, but take different approaches. -> The episode on [persistence with JSON](/persistence) covers -> some more advanced concepts in Object Oriented Programming, while -> the episode on [databases](/databases) starts to build towards a true multilayer architecture, -> which would allow our software to handle much larger quantities of data. -{: .callout} - - -## Towards Collaborative Software Development - -Having looked at some theoretical aspects of software design, -we are now circling back to implementing our software design -and developing our software to satisfy the requirements collaboratively in a team. -At an intermediate level of software development, -there is a wealth of practices that could be used, -and applying suitable design and coding practices is what separates -an intermediate developer from someone who has just started coding. -The key for an intermediate developer is to balance these concerns -for each software project appropriately, -and employ design and development practices enough so that progress can be made. - -One practice that should always be considered, -and has been shown to be very effective in team-based software development, -is that of *code review*. -Code reviews help to ensure the 'good' coding standards are achieved -and maintained within a team by having multiple people -have a look and comment on key code changes to see how they fit within the codebase. -Such reviews check the correctness of the new code, test coverage, functionality changes, -and confirm that they follow the coding guides and best practices. -Let's have a look at some code review techniques available to us. From cf49f194752d61fd5c30ae4f3c2655dd8d7efd98 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 9 Aug 2023 16:22:33 +0100 Subject: [PATCH 02/82] Add a concluding section talking about YAGNI --- _episodes/36-yagni | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 _episodes/36-yagni diff --git a/_episodes/36-yagni b/_episodes/36-yagni new file mode 100644 index 000000000..82c724800 --- /dev/null +++ b/_episodes/36-yagni @@ -0,0 +1,13 @@ +--- +title: "When to abstract, and when not to." +teaching: 0 +exercises: 0 +questions: +- "How to tell what is and isn't an appropriate abstraction" +objectives: +- "Understand how to determine correct abstractions. " +- "How to design large changes to the codebase." +keypoints: +- "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful." +- "The best code is simple to understand and test, not the most clever or uses advanced language features." 
+--- From 22ef936a2c9aeb4188cf573306478f2c8b6a8afc Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 9 Aug 2023 16:59:52 +0100 Subject: [PATCH 03/82] Add headers for the various new pages --- _episodes/32-software-design.md | 22 ++++++++++++++-- _episodes/33-refactoring-functions | 22 ++++++++++++++++ _episodes/34-refactoring-architecture | 32 ++++++++++++++++++------ _episodes/35-refactoring-decoupled-units | 23 +++++++++++++++++ _episodes/36-yagni | 22 ++++++++++++++++ 5 files changed, 111 insertions(+), 10 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 6020472a8..352712900 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -1,7 +1,7 @@ --- title: "Software Architecture and Design" -teaching: 15 -exercises: 30 +teaching: 0 +exercises: 0 questions: - "What should we consider when designing software?" - "What goals should we have when structuring our code" @@ -12,8 +12,26 @@ keypoints: - "How code is structured is important for helping future people understand and update it" - "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. Such components can be as small as a single function, or be a software package in their own right." +- "These smaller components can be understood individually without having to understand the entire codebase at once." - "When writing software used for research, requirements will almost *always* change." - "*'Good code is written so that is readable, understandable, covered by automated tests, not over complicated and does well what is intended to do.'*" --- +## Introduction + +* Thoughts on software design + +## Abstractions + +* Introduce the idea of an abstraction + +## Refactoring + +* Define refactoring +* Discuss the advantages of refactoring before making changes + +## The code for this episode + +* Introduce the code that will be used for this episode + {% include links.md %} diff --git a/_episodes/33-refactoring-functions b/_episodes/33-refactoring-functions index aa240023c..6684396f7 100644 --- a/_episodes/33-refactoring-functions +++ b/_episodes/33-refactoring-functions @@ -13,3 +13,25 @@ keypoints: - "By refactoring code into pure functions that act on data makes code easier to test." - "Making tests before you refactor gives you confidence that your refactoring hasn't broken anything" --- + +## Introduction + +* What is going to happen in this episode - learn good code design by refactoring some poorly + structured code. + +## Writing tests before refactoring + +* Explain the benefits of writing tests before refactoring +* Explain techniques for writing tests for hard to test, existing code + +## Pure functions + +* Explain what a pure function is +* Explain the benefits of pure functions for testing + +## Functional Programming + +* Introduce that pure functions are a concept from functional programming +* Mention tools and techniques Python has for functional programming + +{% include links.md %} diff --git a/_episodes/34-refactoring-architecture b/_episodes/34-refactoring-architecture index aa240023c..835a5937f 100644 --- a/_episodes/34-refactoring-architecture +++ b/_episodes/34-refactoring-architecture @@ -1,15 +1,31 @@ --- -title: "Refactoring functions to do just one thing" +title: "Architecting code to separate responsibilities" teaching: 0 exercises: 0 questions: -- "How do you refactor code without breaking it?" -- "How do you write code that is easy to test?" 
+- "What is the point of the MVC architecture" +- "How should code be structured" objectives: -- "Understand how to refactor functions to be easier to test" -- "Be able to write regressions tests to avoid breaking existing code" -- "Understand what a pure function is." +- "Understand the MVC pattern and how to apply it." +- "Understand the benefits of using patterns" keypoints: -- "By refactoring code into pure functions that act on data makes code easier to test." -- "Making tests before you refactor gives you confidence that your refactoring hasn't broken anything" +- "By splitting up the "view" code from "model" code, you allow easier re-use of code." +- "Using coding patterns can be useful inspirations for how to structure your code." --- + +## Introduction + +* Refamiliarise with MVC + +## Separating out considerations + +* Talk about model and view as distinct parts of the code +* Model should be made up of pure functions as discussed + +## Programming patterns + +* Talk about how MVC is one pattern +* Mention a couple of others than might be useful +* Talk about how patterns can be useful for designing architecture + +{% include links.md %} diff --git a/_episodes/35-refactoring-decoupled-units b/_episodes/35-refactoring-decoupled-units index cba637c80..d9826988b 100644 --- a/_episodes/35-refactoring-decoupled-units +++ b/_episodes/35-refactoring-decoupled-units @@ -13,3 +13,26 @@ keypoints: - "By using interfaces, code can become more decoupled." - "Decoupled code is easier to test, and easier to maintain." --- + +## Introduction + +* What is coupled and decoupled code +* Why decoupled code is better + +## Polymorphism + +* Introduce what a class is +* Introduce what an interface is +* Introduce what polymorphism is +* Explain how we can use polymorphism to introduce abstractions + +## How polymorphism is useful + +* Introduce the idea of using a different implementation + without changing the code +* Explain how to test code that uses an interface + +## Object Oriented Programming + +* Polymorphism is a tool from object oriented programming +* Outline some other tools from OOP that might be useful diff --git a/_episodes/36-yagni b/_episodes/36-yagni index 82c724800..21169d629 100644 --- a/_episodes/36-yagni +++ b/_episodes/36-yagni @@ -11,3 +11,25 @@ keypoints: - "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful." - "The best code is simple to understand and test, not the most clever or uses advanced language features." --- + +## Introduction + +* Talk about the bigger picture of design having seen some techniques + +## Architecting larger changes + +* Talk about box diagrams + +## An abstraction too far + +* Drawbacks of abstraction +* Example showing too complex abstractions + +## You Ain't Gonna Need It + +* Introduce and explain YAGNI principle + +## Conclusion + +* Take care to think about software with the appropriate priorities and things will get better. 
+* Tips for getting better at architecture From 046677dea50adb5948dd98a5f06874cc800b286b Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 9 Aug 2023 17:51:35 +0100 Subject: [PATCH 04/82] Add in exercises for the new sections --- _episodes/32-software-design.md | 18 ++++++++ _episodes/33-refactoring-functions | 22 ++++++++++ _episodes/34-refactoring-architecture | 22 ++++++++++ _episodes/35-refactoring-decoupled-units | 55 ++++++++++++++++++++++++ _episodes/36-yagni | 17 ++++++++ 5 files changed, 134 insertions(+) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 352712900..58dbec5ff 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -25,6 +25,12 @@ Such components can be as small as a single function, or be a software package i * Introduce the idea of an abstraction +> ## Group Exercise: Think about examples of good and bad code +> Try to come up with examples of code that has been hard to understand - why? +> +> Try to come up with examples of code that was easy to understand and modify - why? +{: .challenge} + ## Refactoring * Define refactoring @@ -34,4 +40,16 @@ Such components can be as small as a single function, or be a software package i * Introduce the code that will be used for this episode +> ## Group Exercise: What is bad about this code? +> What about this code makes it hard to understand? +> What makes this code hard to change? +>> ## Solution +>> * Everything is in a single function +>> * If I want to use the data without using the graph I'd have to change it +>> * It is always analysing a fixed set of data +>> * It seems hard to write tests for it +>> * It doesn't have any tests +> {: .solution} +{: .challenge} + {% include links.md %} diff --git a/_episodes/33-refactoring-functions b/_episodes/33-refactoring-functions index 6684396f7..acbadcb79 100644 --- a/_episodes/33-refactoring-functions +++ b/_episodes/33-refactoring-functions @@ -24,11 +24,33 @@ keypoints: * Explain the benefits of writing tests before refactoring * Explain techniques for writing tests for hard to test, existing code +> ## Exercise: Write regression tests before refactoring +> Write a regression test to verify we don't break the code when refactoring +>> ## Solution +>> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/5000b6122e576d91c2acbc437184e00893483fdd +> {: .solution} +{: .challenge} + ## Pure functions * Explain what a pure function is + +> ## Exercise: Refactor the function into a pure function +> Refactor the function to call a pure function that just operates on and returns data. 
+>> ## Solution +>> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/4899b35aed854bdd67ef61cba6e50b3eeada0334 +> {: .solution} +{: .challenge} + * Explain the benefits of pure functions for testing +> ## Exercise: Write some tests for the pure function +> Now we have refactored our a pure function, we can more easily write comprehensive tests +>> ## Solution +>> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/4899b35aed854bdd67ef61cba6e50b3eeada0334 +> {: .solution} +{: .challenge} + ## Functional Programming * Introduce that pure functions are a concept from functional programming diff --git a/_episodes/34-refactoring-architecture b/_episodes/34-refactoring-architecture index 835a5937f..ea29f95a6 100644 --- a/_episodes/34-refactoring-architecture +++ b/_episodes/34-refactoring-architecture @@ -20,7 +20,29 @@ keypoints: ## Separating out considerations * Talk about model and view as distinct parts of the code + +> ## Exercise: Identify model and view parts of the code +> Looking at the code as it is, what parts should be considered "model" code +> and what parts should be considered "view" code? +>> ## Solution +>> The computation of the standard deviation is model code +>> The display of the output as a graph is the view code. +> {: .solution} +{: .challenge} + * Model should be made up of pure functions as discussed +TODO: Reading files is model code, but not pure + +> ## Exercise: Split out the model code from the view code +> Refactor the code to have the model code separated from +> the view code. +>> ## Solution +>> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/97fd04b747a6491c2590f34384eed44e83a8e73c +> {: .solution} +{: .challenge} + +TODO: did originally intend to add a new view - but I think it isn't necessary, isn't a great +example and the time could be better used. 
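As a rough illustration of the shape this refactoring might take (the function names and the exact computation here are placeholders, not necessarily the code in the linked commit), the model becomes a pure function and the view only handles presentation:

~~~
import numpy as np
from matplotlib import pyplot as plt


# Model code: a pure function - no file loading, no plotting, just data in, data out
def standard_deviation_by_day(data):
    """Per-day standard deviation across a list of 2D inflammation arrays."""
    return np.std(np.vstack(data), axis=0)


# View code: presentation only, so the model can be tested without drawing a graph
def display_standard_deviation(std_by_day):
    plt.plot(std_by_day, label='standard deviation by day')
    plt.legend()
    plt.show()
~~~
{: .language-python}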
## Programming patterns diff --git a/_episodes/35-refactoring-decoupled-units b/_episodes/35-refactoring-decoupled-units index d9826988b..efc0f92fc 100644 --- a/_episodes/35-refactoring-decoupled-units +++ b/_episodes/35-refactoring-decoupled-units @@ -19,19 +19,74 @@ keypoints: * What is coupled and decoupled code * Why decoupled code is better +> ## Exercise: Decouple the file loading from the computation +> Currently the function is hard coded to load all the files in a directory +> Decouple this into a separate function that returns all the files to load +>> ## Solution +>> TODO: This is breaking this down into more steps that I originally though, but I think +>> this is a good idea as otherwise this exercise is very hard, here's what we're aiming for: +>> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/7ccda313fda3a0b10ef5add83f5be50fe1d250fd +>> At the end of this exercise, perhaps just have a version of load_data written and called directly +> {: .solution} +{: .challenge} + ## Polymorphism * Introduce what a class is +* Explain member methods +* Explain constructors + +> ## Exercise: Use a class to configure loading +> Put your function as a member method of a class, separating out the configuration +> of where to load the files from in the constructor, from where it actually loads the data +>> ## Solution +>> TODO: This is breaking this down into more steps that I originally though, but I think +>> this is a good idea as otherwise this exercise is very hard, here's what we're aiming for: +>> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/7ccda313fda3a0b10ef5add83f5be50fe1d250fd +>> At the end of this exercise, they would have implemented `CSVDataSource`. +> {: .solution} +{: .challenge} + * Introduce what an interface is * Introduce what polymorphism is * Explain how we can use polymorphism to introduce abstractions +> ## Exercise: Define an interface for your class +> Create an interface class that defines the methods that a data source should provide +>> ## Solution +>> TODO: This is breaking this down into more steps that I originally though, but I think +>> this is a good idea as otherwise this exercise is very hard, here's what we're aiming for: +>> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/7ccda313fda3a0b10ef5add83f5be50fe1d250fd +>> At the end of this exercise, they would have the complete solution. +> {: .solution} +{: .challenge} + ## How polymorphism is useful * Introduce the idea of using a different implementation without changing the code + +> ## Exercise: Introduce an alternative implentation of DataSource +> Create another class that repeatedly asks the user for paths to CSVs to analyse. +> It should inherit from the interface and implement the load_data method. +> Finally, at run time provide an instance of the new implementation if the user hasn't +> put any files on the path. +>> ## Solution +>> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/045754a11221a269771de8648fc56a383136fdaf +>> TODO: this is kind of hard too +> {: .solution} +{: .challenge} + * Explain how to test code that uses an interface +> ## Exercise: Test using a mock or dummy implemenation +> It is now possible to test your original method by providing a dummy +> implementation of the `DataProvider`. Use this to test the method +>> ## Solution +>> TODO: I haven't done this - do we want it? 
+> {: .solution} +{: .challenge} + ## Object Oriented Programming * Polymorphism is a tool from object oriented programming diff --git a/_episodes/36-yagni b/_episodes/36-yagni index 21169d629..bd3332b5f 100644 --- a/_episodes/36-yagni +++ b/_episodes/36-yagni @@ -20,6 +20,16 @@ keypoints: * Talk about box diagrams +> ## Exercise: Design a high-level architecture +> Consider implementing a new feature +> TODO: suggest a more complex feature +> Using boxes and lines sketch out an architecture for the code. +> Discuss with your team +>> ## Solution +>> An example design for the hypothetical problem. +> {: .solution} +{: .challenge} + ## An abstraction too far * Drawbacks of abstraction @@ -29,6 +39,13 @@ keypoints: * Introduce and explain YAGNI principle +> ## Exercise: Applying to real world examples +> Thinking about the examples of good and bad code you identified at the start of the episode. +> Identify what kind of principles were and weren't being followed +> Identify some refactorings that could be performed that would improve the code +> Discuss the ideas as a group. +{: .challenge} + ## Conclusion * Take care to think about software with the appropriate priorities and things will get better. From 664afbc2d9f4a8c7c7d737a76cd251eaa3555b51 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 9 Aug 2023 17:52:58 +0100 Subject: [PATCH 05/82] Fix file name extensions --- .../{33-refactoring-functions => 33-refactoring-functions.md} | 0 ...34-refactoring-architecture => 34-refactoring-architecture.md} | 0 ...actoring-decoupled-units => 35-refactoring-decoupled-units.md} | 0 _episodes/{36-yagni => 36-yagni.md} | 0 4 files changed, 0 insertions(+), 0 deletions(-) rename _episodes/{33-refactoring-functions => 33-refactoring-functions.md} (100%) rename _episodes/{34-refactoring-architecture => 34-refactoring-architecture.md} (100%) rename _episodes/{35-refactoring-decoupled-units => 35-refactoring-decoupled-units.md} (100%) rename _episodes/{36-yagni => 36-yagni.md} (100%) diff --git a/_episodes/33-refactoring-functions b/_episodes/33-refactoring-functions.md similarity index 100% rename from _episodes/33-refactoring-functions rename to _episodes/33-refactoring-functions.md diff --git a/_episodes/34-refactoring-architecture b/_episodes/34-refactoring-architecture.md similarity index 100% rename from _episodes/34-refactoring-architecture rename to _episodes/34-refactoring-architecture.md diff --git a/_episodes/35-refactoring-decoupled-units b/_episodes/35-refactoring-decoupled-units.md similarity index 100% rename from _episodes/35-refactoring-decoupled-units rename to _episodes/35-refactoring-decoupled-units.md diff --git a/_episodes/36-yagni b/_episodes/36-yagni.md similarity index 100% rename from _episodes/36-yagni rename to _episodes/36-yagni.md From d39c621a4413756a3ee4586bce8ab63d5111879a Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Tue, 10 Oct 2023 13:41:01 +0100 Subject: [PATCH 06/82] Add first draft of the software design episode This section outlines the key ideas for the rest of the episode. --- _episodes/32-software-design.md | 108 ++++++++++++++++++++++++++++---- 1 file changed, 96 insertions(+), 12 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 58dbec5ff..c87491c90 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -4,10 +4,12 @@ teaching: 0 exercises: 0 questions: - "What should we consider when designing software?" 
-- "What goals should we have when structuring our code" +- "What goals should we have when structuring our code?" +- "What is refactoring?" objectives: -- "Understand what an abstraction is, and when you should use one" -- "Understand what refactoring is" +- "Know what goals we have when architecting and designing software." +- "Understand what an abstraction is, and when you should use one." +- "Understand what refactoring is." keypoints: - "How code is structured is important for helping future people understand and update it" - "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. @@ -19,11 +21,66 @@ Such components can be as small as a single function, or be a software package i ## Introduction -* Thoughts on software design +Typically when we start writing code, we write small scripts that +we intend to use. +We probably don't imagine we will need to change the code in the future. +We almost certainly don't expect other people will need to understand +and modify the code in the future. +However, as projects grow in complexity and the number of people involved grows, +it becomes important to think about how to structure code. +Software Architecture and Design is all about thinking about ways to make the +code be **maintainable** as projects grow. + +Maintainable code is: + + * Readable to people who didn't write the code. + * Testable through automated tests (like those from [episode 2](../21-automatically-testing-software/index.html)). + * Adaptable to new requirements. + +Writing code that meets these requirements is hard and takes practise. +Further, in most contexts you will already have a piece of code that breaks +some (or maybe all!) of these principles. + +In this episode we will explore techniques and processes that can help you +continuously improve the quality of code so, over time, it tends towards more +maintainable code. + +We will look at: + + * What abstractions are, and how to pick appropriate ones. + * How to take code that is in a bad shape and improve it. + * Best practises to write code in ways that facilitate achieving these goals. ## Abstractions -* Introduce the idea of an abstraction +An **abstraction**, at its most basic level, is a technique to hide the details +of one part of a system from another part of the system. +We deal with abstractions all the time - when you press the break pedal on the +car, you do not know how this manages both slowing down the engine and applying +pressure on the breaks. +The advantage of using this abstraction is, when something changes, for example +the introduction of anti-lock breaking or an electric engine, the driver does +not need to do anything differently - +the detail of how the car breaks is *abstracted* away from them. + +Abstractions are a fundamental part of software. +For example, when you write Python code, you are dealing with an +abstraction of the computer. +You don't need to understand how RAM functions. +Instead, you just need to understand how variables work in Python. + +In large projects it is vital to come up with good abstractions. +A good abstraction makes code easier to read, as the reader doesn't need to understand +all the details of the project to understand one part. +A good abstraction makes code easier to test, as it can be tested in isolation +from everything else. 
+Finally, a good abstraction makes code easier to adapt, as the details of
+how a subsystem *used* to work are hidden from the user, so when they change,
+the user doesn't need to know.
+
+In this episode we are going to look at some code and introduce various
+different kinds of abstraction.
+However, fundamentally any abstraction should be serving these goals.
 
 > ## Group Exercise: Think about examples of good and bad code
 > Try to come up with examples of code that has been hard to understand - why?
 >
 > Try to come up with examples of code that was easy to understand and modify - why?
 {: .challenge}
 
 ## Refactoring
 
-* Define refactoring
-* Discuss the advantages of refactoring before making changes
+Often we are not working on brand new projects, but instead maintaining an existing
+piece of software.
+Often, this piece of software will be hard to maintain, perhaps because it is hard to understand, or doesn't have any tests.
+In this situation, we want to adapt the code to make it more maintainable.
+This will give us greater confidence in the code, as well as making future development easier.
+
+**Refactoring** is a process where some code is modified, such that its external behaviour remains
+unchanged, but the code itself is easier to read, test and extend.
+
+When faced with an old piece of code that is hard to work with and that you need to modify, a good process to follow is:
+
+1. Refactor the code in such a way that the new change will slot in cleanly.
+2. Make the desired change, which now fits in easily.
+
+Notice that, after step 1, the *behaviour* of the code should be totally identical.
+This allows you to test rigorously that the refactoring hasn't changed/broken anything
+*before* making the intended change.
+
+In this episode, we will be making some changes to an existing bit of code that
+is in need of refactoring.
 
 ## The code for this episode
 
-* Introduce the code that will be used for this episode
+The code itself is a feature added to the inflammation tool we've been working on.
+
+In it, if the user adds `--full-data-analysis`, then the program will scan the directory
+of one of the provided files, compare standard deviations across the data by day and
+plot a graph.
+
+We are going to be refactoring and extending this over the remainder of this episode.
 
 > ## Group Exercise: What is bad about this code?
-> What about this code makes it hard to understand?
-> What makes this code hard to change?
+> In what ways does this code not live up to the ideal properties of maintainable code?
+> Think about ways in which you find it hard to understand.
+> Think about the kinds of changes you might want to make to it, and what would
+> make making those changes challenging.
 >> ## Solution
->> * Everything is in a single function
+>> * Everything is in a single function - reading it you have to understand how the file loading
+>>   works at the same time as the analysis itself.
>> * If I want to use the data without using the graph I'd have to change it >> * It is always analysing a fixed set of data ->> * It seems hard to write tests for it +>> * It seems hard to write tests for it as it always analyses a fixed set of files >> * It doesn't have any tests > {: .solution} {: .challenge} From e0e6da1818b7cbf0ba285fb2cee223c53e04f63d Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Tue, 10 Oct 2023 14:53:47 +0100 Subject: [PATCH 07/82] Add first draft of the pure functions section --- _episodes/33-refactoring-functions.md | 188 ++++++++++++++++++++++++-- 1 file changed, 178 insertions(+), 10 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index acbadcb79..1ec91a617 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -5,6 +5,8 @@ exercises: 0 questions: - "How do you refactor code without breaking it?" - "How do you write code that is easy to test?" +- "What is functional programming?" +- "Which situations/problems is functional programming well suited for?" objectives: - "Understand how to refactor functions to be easier to test" - "Be able to write regressions tests to avoid breaking existing code" @@ -12,48 +14,214 @@ objectives: keypoints: - "By refactoring code into pure functions that act on data makes code easier to test." - "Making tests before you refactor gives you confidence that your refactoring hasn't broken anything" +- "Functional programming is a programming paradigm where programs are constructed by applying and composing smaller and simple functions into more complex ones (which describe the flow of data within a program as a sequence of data transformations)." --- ## Introduction -* What is going to happen in this episode - learn good code design by refactoring some poorly - structured code. +In this episode we will take some code and refactor it in a way which is going to make it +easier to test. +By having more tests, we can more confident of future changes having their intended effect. +The change we will make will also end up making the code easier to understand. ## Writing tests before refactoring -* Explain the benefits of writing tests before refactoring +The process we are going to be following is: + +1. Write some tests that test the behaviour as it is now +2. Refactor the code to be more testable +3. Ensure that the original tests still pass + +By writing the tests *before* we refactor, we can be confident we haven't broken +existing behaviour through the refactoring. + +There is a bit of a chicken-and-the-egg problem here however. +If the refactoring is to make it easier to write tests, how can we write tests +before doing the refactoring? + +The tricks to get around this trap are: + + * Test at a higher level, with coarser accuracy + * Write tests that you intend to remove + +The best tests are ones that test single bits of code rigorously. +However, with this code it isn't possible to do that. +Instead we will make minimal changes to the code to make it a bit testable, +for example returning the data instead of visualising it. +We will also simply observe what the outcome is, rather than trying to +test the outcome is correct. +If the behaviour is currently broken, then we don't want to inadvertently fix it. + +As with everything in this episode, there isn't a hard and fast rule. 
+Refactoring doesn't change behaviour, but sometimes to make it possible to verify +you're not changing the important behaviour you have to make some small tweaks to write +the tests at all. + * Explain techniques for writing tests for hard to test, existing code > ## Exercise: Write regression tests before refactoring > Write a regression test to verify we don't break the code when refactoring >> ## Solution +>> One approach we can take is to: +>> * comment out the visualize (as this will cause our test to hang) +>> * return the data instead, so we can write asserts on the data +>> * See what the calculated value is, and assert that it is the same +>> Putting this together, you can write a test that looks something like: +>> +>> ```python +>> import numpy.testing as npt +>> +>> def test_compute_data(): +>> from inflammation.compute_data import analyse_data +>> path = 'data/' +>> result = analyse_data(path) +>> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211 +>> ,0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094 +>> ,1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312 +>> ,1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578 +>> ,0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417 +>> ,0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707 +>> ,0.50323031,0.47574665,0.45197398,0.22070227] +>> npt.assert_array_almost_equal(result, expected_output) +>> ``` +>> +>> This isn't a good test: +>> * It isn't at all obvious why these numbers are correct. +>> * It doesn't test edge cases. +>> * If the files change, the test will start failing. +>> +>> However, it allows us to guarantee we don't accidentally change the analysis output. >> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/5000b6122e576d91c2acbc437184e00893483fdd > {: .solution} {: .challenge} ## Pure functions -* Explain what a pure function is +A **pure function** is a function that works like a mathematical function. +That is, it takes in some inputs as parameters, and it produces an output. +That output should always be the same for the same input. +That is, it does not depend on any information not present in the inputs (such as global variables, databases, the time of day etc.) +Further, it should not cause any **side effects" such as writing to a file or changing a global variable. + +You should try and have as much of the complex, analytical and mathematical code in pure functions. + +Maybe something about cognitive load here? And maybe drop other two advantages til later. + +Pure functions have a number of advantages: + +* They are easy to test: you feed in inputs and get fixed outputs +* They are easy to understand: when you are reading them you have all + the information they depend on, you don't need to know what is likely to be in + a database, or what the state of a global variable is likely to be. +* They are easy to re-use: because they always behave the same, you can always use them + +Some parts of a program are inevitably impure. +Programs need to read input from the user, or write to a database. +Well designed programs separate complex logic from the necessary "glue" code that interacts with users and systems. +This way, you have easy-to-test, easy-to-read code that contains the complex logic. +And you have really simple code that just reads data from a file, or gathers user input etc, +that is maybe harder to test, but is so simple that it only needs a handful of tests anyway. 
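+
+As a rough sketch of this split (using made-up function names and a made-up file format,
+not the inflammation code itself), the complex logic lives in a pure function while the
+file handling stays in a thin, impure wrapper:
+
+```python
+import json
+
+
+def mean_of_positives(numbers):
+    """Pure: the result depends only on the argument, and nothing outside it is touched."""
+    positives = [n for n in numbers if n > 0]
+    return sum(positives) / len(positives) if positives else 0.0
+
+
+def report_mean(path):
+    """Impure 'glue': reads a file and prints - kept so thin it needs very few tests."""
+    with open(path) as file:
+        numbers = json.load(file)
+    print(f"Mean of positive values: {mean_of_positives(numbers)}")
+```
+
+All of the logic worth testing thoroughly sits in `mean_of_positives`, which can be exercised
+with plain lists; `report_mean` only needs a quick check that it reads a file and prints.
+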
> ## Exercise: Refactor the function into a pure function -> Refactor the function to call a pure function that just operates on and returns data. +> Refactor the `analyse_data` function into a pure function with the logic, and an impure function that handles the input and output. +> The pure function should take in the data, and return the analysis results. +> The "glue" function should maintain the behaviour of the original `analyse_data` +> but delegate all the calculations to the new pure function. >> ## Solution +>> You can move all of the code that does the analysis into a separate function that +>> might look something like this: +>> ```python +>> def compute_standard_deviation_by_data(all_loaded_data): +>> means_by_day = map(models.daily_mean, all_loaded_data) +>> means_by_day_matrix = np.stack(list(means_by_day)) +>> +>> daily_standard_deviation = np.std(means_by_day_matrix, axis=0) +>> return daily_standard_deviation +>> ``` +>> Then the glue function can use this function, whilst keeping all the logic +>> for reading the file and processing the data for showing in a graph: +>>```python +>>def analyse_data(data_dir): +>> """Calculate the standard deviation by day between datasets +>> Gets all the inflammation csvs within a directory, works out the mean +>> inflammation value for each day across all datasets, then graphs the +>> standard deviation of these means.""" +>> data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) +>> if len(data_file_paths) == 0: +>> raise ValueError(f"No inflammation csv's found in path {data_dir}") +>> data = map(models.load_csv, data_file_paths) +>> daily_standard_deviation = compute_standard_deviation_by_data(data) +>> +>> graph_data = { +>> 'standard deviation by day': daily_standard_deviation, +>> } +>> # views.visualize(graph_data) +>> return daily_standard_deviation +>>``` >> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/4899b35aed854bdd67ef61cba6e50b3eeada0334 > {: .solution} {: .challenge} -* Explain the benefits of pure functions for testing +Now we have a pure function for the analysis, we can write tests that cover +all the things we would like tests to cover without depending on the data +existing in CSVs. + +This will make tests easier to write, but it will also make them easier to read. +The reader will not have to open up a CSV file to understand why the test is correct. + +It will also make the tests easier to maintain. +If at some point the data format is changed from CSV to JSON, the bulk of the tests +won't need to be updated. > ## Exercise: Write some tests for the pure function -> Now we have refactored our a pure function, we can more easily write comprehensive tests +> Now we have refactored our a pure function, we can more easily write comprehensive tests. +> Add tests that check for when there is only one file with multiple rows, multiple files with one row +> and any other cases you can think of that should be tested. 
>> ## Solution ->> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/4899b35aed854bdd67ef61cba6e50b3eeada0334 +>> You might hev throught of more tests, but we can easily extend the test by parameterizing +>> with more inputs and expected outputs: +>> ```python +>>@pytest.mark.parametrize('data,expected_output', [ +>> ([[[0, 1, 0], [0, 2, 0]]], [0, 0, 0]), +>> ([[[0, 2, 0]], [[0, 1, 0]]], [0, math.sqrt(0.25), 0]), +>> ([[[0, 1, 0], [0, 2, 0]], [[0, 1, 0], [0, 2, 0]]], [0, 0, 0]) +>>], +>>ids=['Two patients in same file', 'Two patients in different files', 'Two identical patients in two different files']) +>>def test_compute_standard_deviation_by_data(data, expected_output): +>> from inflammation.compute_data import compute_standard_deviation_by_data +>> +>> result = compute_standard_deviation_by_data(data) +>> npt.assert_array_almost_equal(result, expected_output) +``` > {: .solution} {: .challenge} ## Functional Programming -* Introduce that pure functions are a concept from functional programming -* Mention tools and techniques Python has for functional programming +**Pure Functions** are a concept that is part of the idea of **Functional Programming**. +Functional programming is a style of programming that encourages using pure functions, +chained together. +Some programming languages, such as Haskell or Lisp just support writing functional code, +but it is more common for languages to allow using functional and **imperative** (the style +of code you have probably been writing thus far where you instruct the computer directly what to do). +Python, Java, C++ and many other languages allow for mixing these two styles. + +In Python, you can use the built-in functions `map`, `filter` and `reduce` to chain +pure functions together into pipelines. + +In the original code, we used `map` to "map" the file paths into the loaded data. +Extending this idea, you could then "map" the results of that through another process. + +You can read more about using these language features [here](https://www.learnpython.org/en/Map%2C_Filter%2C_Reduce). +Other programming languages will have similar features, and searching "functional style" + your programming language of choice +will help you find the features available. + +There are no hard and fast rules in software design but making your complex logic out of composed pure functions is a great place to start +when trying to make code readable, testable and maintainable. +This tends to be possible when: + +* Doing any kind of data analysis +* Simulations +* Translating data from one format to another {% include links.md %} From 3103a3cc77dd45aa1eb289ddd62969af57aed90d Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Tue, 10 Oct 2023 16:41:19 +0100 Subject: [PATCH 08/82] Adding first draft of the MVC section --- _episodes/34-refactoring-architecture.md | 105 ++++++++++++++++++++--- 1 file changed, 92 insertions(+), 13 deletions(-) diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index ea29f95a6..afe3398c1 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -6,48 +6,127 @@ questions: - "What is the point of the MVC architecture" - "How should code be structured" objectives: +- "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software." - "Understand the MVC pattern and how to apply it." 
- "Understand the benefits of using patterns" keypoints: -- "By splitting up the "view" code from "model" code, you allow easier re-use of code." +- "By splitting up the \"view\" code from \"model\" code, you allow easier re-use of code." - "Using coding patterns can be useful inspirations for how to structure your code." --- + ## Introduction -* Refamiliarise with MVC +Model-View-Controller (MVC) is a way of separating out different portions of a typical +application. Specifically we have: + +* The **model** which contains the internal data representations for the program, and the valid + operations that can be performed on it. +* The **view** is responsible for how this data is presented to the user (e.g. through a GUI or + by writing out to a file) +* The **controller** defines how the model can be interacted with. + +Separating out these different sections into different parts of the code will make +the code much more maintainable. +For example, if the view code is kept away from the model code, then testing the model code +can be done without having to worry about how it will be presented. + +It helps with readability, as it makes it easier to have each function doing +just one thing. + +It also helps with maintainability - if the UI requirements change, these changes +are easily isolated from the more complex logic. ## Separating out considerations -* Talk about model and view as distinct parts of the code +The key thing to take away from MVC is the distinction between model code and view code. + +> The view and the controller tend to be more tightly coupled and it isn't always sensible +> to draw a thick line dividing these two. Depending on how the user interacts with the software +> this distinction may not be possible (the code that specifies there is a button on the screen, +> might be the same code that specifies what that button does). In fact, the original proposer +> of MVC groups the views and the controller into a single element, called the tool. Other modern +> architectures like Model-ViewModel-View do away with the controller and instead separate out the +> layout code from a programmable view of the UI. +{: .callout} + +The view code might be hard to test, or use libraries to draw the UI, but should +not contain any complex logic, and is really just a presentation layer on top of the model. + +The model, conversely, should operate quite agonistically of how a specific tool might interact with it. +For example, perhaps there currently is no way > ## Exercise: Identify model and view parts of the code > Looking at the code as it is, what parts should be considered "model" code > and what parts should be considered "view" code? >> ## Solution ->> The computation of the standard deviation is model code ->> The display of the output as a graph is the view code. +>> * The computation of the standard deviation is model code +>> * Reading the data is also model code. +>> * The display of the output as a graph is the view code. +>> * The controller is the logic that processes what flags the user has provided. > {: .solution} {: .challenge} -* Model should be made up of pure functions as discussed -TODO: Reading files is model code, but not pure +Within the model there is further separation that makes sense. +For example, as discussed, separating out the code that interacts with file systems from +the calculations is sensible. +Nevertheless, the MVC approach is a great starting point when thinking about how you should structure your code. 
> ## Exercise: Split out the model code from the view code > Refactor the code to have the model code separated from > the view code. >> ## Solution +>> The idea here is to have `analyse_data` to not have any "view" considerations. +>> That is, it should just compute and return the data. +>> +>> ```python +>> def analyse_data(data_dir): +>> """Calculate the standard deviation by day between datasets +>> Gets all the inflammation csvs within a directory, works out the mean +>> inflammation value for each day across all datasets, then graphs the +>> standard deviation of these means.""" +>> data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) +>> if len(data_file_paths) == 0: +>> raise ValueError(f"No inflammation csv's found in path {data_dir}") +>> data = map(models.load_csv, data_file_paths) +>> daily_standard_deviation = compute_standard_deviation_by_data(data) +>> +>> return daily_standard_deviation +>> ``` +>> There can be a separate bit of code that chooses how that should be presented, e.g. as a graph: +>> +>> ```python +>> if args.full_data_analysis: +>> data_result = analyse_data(os.path.dirname(InFiles[0])) +>> graph_data = { +>> 'standard deviation by day': data_result, +>> } +>> views.visualize(graph_data) +>> return +>> ``` >> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/97fd04b747a6491c2590f34384eed44e83a8e73c > {: .solution} {: .challenge} -TODO: did originally intend to add a new view - but I think it isn't necessary, isn't a great -example and the time could be better used. - ## Programming patterns -* Talk about how MVC is one pattern -* Mention a couple of others than might be useful -* Talk about how patterns can be useful for designing architecture +MVC is a **programming pattern**, which is a template for structuring code. +Patterns are useful starting point for how to design your software. +They also work as a common vocabulary for discussing software designs with +other developers. + +The Refactoring Guru website has a [list of programming patterns](https://refactoring.guru/design-patterns/catalog). +They aren't all good design decisions, and can certainly be over-applied, but learning about them can be helpful +for thinking at a big picture level about software design. + +For example, the [visitor pattern](https://refactoring.guru/design-patterns/visitor) is +a good way of separating the problem of how to move through the data +from a specific action you want to perform on the data. + +By having a terminology for these approaches can facilitate discussions +where everyone is familiar with them. +However, they cannot replace a full design as most problems will require +a bespoke design that maps cleanly on to the specific problem you are +trying to solve. {% include links.md %} From 9c40c44fe4f0aae5965e4db3559663def62508ed Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Tue, 17 Oct 2023 16:19:13 +0100 Subject: [PATCH 09/82] First draft of the class section of the episode --- _episodes/35-refactoring-decoupled-units.md | 377 ++++++++++++++++++-- 1 file changed, 341 insertions(+), 36 deletions(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index efc0f92fc..3aae044d4 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -5,89 +5,394 @@ exercises: 0 questions: - "What is de-coupled code?" - "When is it useful to use classes to structure code?" 
+- "How can we make sure the components of our software are reusable?" objectives: - "Understand the object-oriented principle of polymorphism and interfaces." - "Be able to introduce appropriate abstractions to simplify code." - "Understand what decoupled code is, and why you would want it." +- "Be able to use mocks to replace a class in test code." keypoints: +- "Classes can help separate code so it is easier to understand." - "By using interfaces, code can become more decoupled." - "Decoupled code is easier to test, and easier to maintain." --- ## Introduction -* What is coupled and decoupled code -* Why decoupled code is better +When we're thinking about units of code, one important thing to consider is +whether the code is **decoupled** (as opposed to **coupled**). +Two units of code can be considered decoupled if changes in one don't +necessitate changes in the other. +While two connected units can't be totally decoupled, loose coupling +allows for more maintainable code: + +* Loosely coupled code is easier to read as you don't need to understand the + detail of the other unit. +* Loosely coupled code is easier to test, as one of the units can be replaced + by a test or mock version of it. +* Loose coupled code tends to be easier to maintain, as changes can be isolated + from other parts of the code. > ## Exercise: Decouple the file loading from the computation > Currently the function is hard coded to load all the files in a directory > Decouple this into a separate function that returns all the files to load >> ## Solution ->> TODO: This is breaking this down into more steps that I originally though, but I think ->> this is a good idea as otherwise this exercise is very hard, here's what we're aiming for: ->> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/7ccda313fda3a0b10ef5add83f5be50fe1d250fd ->> At the end of this exercise, perhaps just have a version of load_data written and called directly +>> You should have written a new function that reads all the data into the format needed +>> for the analysis: +>> ```python +>> def load_inflammation_data(dir_path): +>> data_file_paths = glob.glob(os.path.join(dir_path, 'inflammation*.csv')) +>> if len(data_file_paths) == 0: +>> raise ValueError(f"No inflammation csv's found in path {dir_path}") +>> data = map(models.load_csv, data_file_paths) +>> return list(data) +>> ``` +>> This can then be used in the analysis. +>> ```python +>> def analyse_data(data_dir): +>> ... +>> data = load_inflammation_data(data_dir) +>> ... +>> ``` +>> This is now easier to understand, as we don't need to understand the the file loading +>> to read the statistical analysis, and we don't have to understand the statistical analysis +>> when reading the data loading. > {: .solution} {: .challenge} -## Polymorphism +## Using classes to encapsulate data and behaviours + +Abstractedly, we can talk about units of code, where we are thinking of the unit doing one "thing". +In practise, in Python there are three ways we can create defined units of code. +The first is functions, which we have used. +The next level up is **classes**. +Finally, there are also modules and packages, which we won't cover. + +A class is a way of grouping together data with some specific methods. +In Python, you can declare a class as follows: + +```python +class MyClass: + pass +``` + +They are typically named using `UpperCase`. 
+ +You can then **construct** a class elsewhere in your code by doing the following: + +```python +my_class = MyClass() +``` + +When you construct a class in this ways, its **construtor** is called. It is possible +to pass in values to the constructor that configure the class: + +```python +class Circle: + def __init__(self, radius): + self.radius = radius + +my_circle = Circle(10) +``` + +The constructor has the special name `__init__` (one of the so called "dunder methods"). +Notice it also has a special first parameter called `self` (called this by convention). +This parameter can be used to access the current **instance** of the object being created. + +A class can be thought of as a cookie cutter template, +and the instances are the cookies themselves. +That is, one class can have many instances. + +Classes can also have methods defined on them. +Like constructors, they have an special `self` parameter that must come first. -* Introduce what a class is -* Explain member methods -* Explain constructors +```python +class Circle: + ... + def get_area(self): + return Math.PI * self.radius * self.radius +... +print(my_circle.get_area()) +``` + +Here the instance of the class, `my_circle` will be automatically +passed in as the first parameter when calling `get_area`. +Then the method can access the **member variable** `radius`. + +Classes have a number of uses. + +* Encapsulating data - such as grouping three numbers together into a Vector class +* Maintaining invariants - TODO an example here would be good +* Encapsulating behaviour - such as a class that csha > ## Exercise: Use a class to configure loading > Put your function as a member method of a class, separating out the configuration -> of where to load the files from in the constructor, from where it actually loads the data +> of where to load the files from in the constructor, from where it actually loads the data. +> Once this is done, you can construct this class outside the the statistical analysis +> and pass it in. >> ## Solution ->> TODO: This is breaking this down into more steps that I originally though, but I think ->> this is a good idea as otherwise this exercise is very hard, here's what we're aiming for: ->> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/7ccda313fda3a0b10ef5add83f5be50fe1d250fd ->> At the end of this exercise, they would have implemented `CSVDataSource`. +>> ```python +>> class CSVDataSource: +>> """ +>> Loads all the inflammation csvs within a specified folder. +>> """ +>> def __init__(self, dir_path): +>> self.dir_path = dir_path +>> super().__init__() +>> +>> def load_inflammation_data(self): +>> data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.csv')) +>> if len(data_file_paths) == 0: +>> raise ValueError(f"No inflammation csv's found in path {self.dir_path}") +>> data = map(models.load_csv, data_file_paths) +>> return list(data) +>> ``` +>> We can now pass an instance of this class into the the statistical analysis function, +>> constructing the object in the controller code. +>> This means that should we want to re-use the analysis it wouldn't be fixed to reading +>> from a directory of CSVs. +>> We have "decoupled" the reading of the data from the statistical analysis. +>> ```python +>> def analyse_data(data_source): +>> ... 
+>> data = data_source.load_inflammation_data() +>> ``` +>> +>> In the controller, you might have something like: +>> +>> ```python +>> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> data_result = analyse_data(data_source) +>> ``` +>> Note in all these refactorings the behaviour is unchanged, +>> so we can still run our original tests to ensure we've not +>> broken anything. > {: .solution} {: .challenge} -* Introduce what an interface is -* Introduce what polymorphism is -* Explain how we can use polymorphism to introduce abstractions +## Interfaces + +Another important concept in software design is the idea of **interfaces** between different units in the code. +One kind of interface you might have come across are APIs (Application Programming Interfaces). +These allow separate systems to communicate with each other - such as a making an API request +to Google Maps to find the latitude and longitude of an address. + +However, there are internal interfaces within our software that dictate how +different units of the system interact with each other. +Even if these aren't thought out or documented, they still exist! + +For example, there is an interface for how the statistical analysis in `analyse_data` +uses the class `CSVDataSource` - the method `load_inflammation_data`, how it should be called +and what it will return. + +Interfaces are important to get right - a messy interface will force tighter coupling between +two units in the system. +Unfortunately, it would be an entire course to cover everything to consider in interface design. + +In addition to the abstract notion of an interface, many programming languages +support creating interfaces as a special kind of class. +Python doesn't support this explicitly, but we can still use this feature with +regular classes. +An interface class will define some methods, but not provide an implementation: + +```python +class Shape: + def get_area(): + raise NotImplementedError +``` > ## Exercise: Define an interface for your class -> Create an interface class that defines the methods that a data source should provide +> As discussed, there is an interface between the CSVDataSource and the analysis. +> Write an interface(that is, a class that defines some empty methods) called `InflammationDataSource` +> that makes this interface explicit. +> Document the format the data will be returned in. >> ## Solution ->> TODO: This is breaking this down into more steps that I originally though, but I think ->> this is a good idea as otherwise this exercise is very hard, here's what we're aiming for: ->> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/7ccda313fda3a0b10ef5add83f5be50fe1d250fd ->> At the end of this exercise, they would have the complete solution. +>> ```python +>> class InflammationDataSource: +>> """ +>> An interface for providing a series of inflammation data. +>> """ +>> +>> def load_inflammation_data(self): +>> """ +>> Loads the data and returns it as a list, where each entry corresponds to one file, +>> and each entry is a 2D array with patients inflammation by day. +>> :returns: A list where each entry is a 2D array of patient inflammation results by day +>> """ +>> raise NotImplementedError +>> ``` > {: .solution} {: .challenge} -## How polymorphism is useful +An interface on its own is not useful - it cannot be instantiated. +The next step is to create a class that **implements** the interface. 
+That is, create a class that inherits from the interface and then provide +implementations of all the methods on the interface. +To return to our `Shape` interface, we can write classes that implement this +interface, with different implementations: + +```python +class Circle(Shape): + ... + def get_area(self): + return math.pi * self.radius * self.radius -* Introduce the idea of using a different implementation - without changing the code +class Rectangle(Shape): + ... + def get_area(self): + return self.width * self.height +``` -> ## Exercise: Introduce an alternative implentation of DataSource +As you can see, by putting `ShapeInterface`` in brackets after the class +we are saying a `Circle` **is a** `Shape`. + +> ## Exercise: Implement the interface +> Modify the existing class to implement the interface. +> Ensure the method matches up exactly to the interface. +>> ## Solution +>> We can create a class that implements `load_inflammation_data`. +>> We can lift the code into this new class. +>> +>> ```python +>> class CSVDataSource(InflammationDataSource): +>> ``` +> {: .solution} +{: .challenge} + +## Polymorphism + +Where this gets useful is by using a concept called **polymorphism** +which is a fancy way of saying we can use an instance of a class and treat +it as a `Shape`, without worrying about whether it is a `Circle` or a `Rectangle`. + + +```python +my_circle = Circle(radius=10) +my_rectangle = Rectangle(width=5, height=3) +my_shapes = [my_circle, my_rectangle] +total_area = sum(shape.get_area() for shape in my_shapes) +``` + +This is an example of **abstraction** - when we are calculating the total +area, the method for calculating the area of each shape is abstracted away +to the relevant class. + +### How polymorphism is useful + +As we saw with the `Circle` and `Square` examples, we can use interfaces and polymorphism +to provide different implementations of the same interface. + +For example, we could replace our `CSVReader` with a class that reads a totally different format, +or reads from an external service. +All of these can be added in without changing the analysis. +Further - if we want to write a new analysis, we can support any of these data sources +for free with no further work. +That is, we have decoupled the job of loading the data from the job of analysing the data. + +> ## Exercise: Introduce an alternative implementation of DataSource > Create another class that repeatedly asks the user for paths to CSVs to analyse. -> It should inherit from the interface and implement the load_data method. +> It should inherit from the interface and implement the `load_inflammation_data` method. > Finally, at run time provide an instance of the new implementation if the user hasn't > put any files on the path. 
>> ## Solution ->> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/045754a11221a269771de8648fc56a383136fdaf ->> TODO: this is kind of hard too +>> ```python +>> class UserProvidSpecificFilesDataSource(InflammationDataSource): +>> def load_inflammation_data(self): +>> paths = [] +>> while(True): +>> input_string = input('Enter path to CSV or press enter to process paths collected: ') +>> if(len(input_string) == 0): +>> print(f'Finished entering input - will process {len(paths)} CSVs') +>> break +>> if os.path.exists(input_string): +>> paths.append(input_string) +>> else: +>> print(f'Path {input_string} does not exist, please enter a valid path') +>> +>> data = map(models.load_csv, paths) +>> return list(data) +>> ``` > {: .solution} {: .challenge} -* Explain how to test code that uses an interface +We can use this abstraction to also make testing more straight forward. +Instead of having our tests use real file system data, we can instead provide +a mock or dummy implementation of the `InflammationDataSource` that just returns some example data. +Separately, we can test the file parsing class `CSVReader` without having to understand +the specifics of the statistical analysis. + +An convenient way to do this in Python is using Mocks. +These are a whole topic to themselves - but a basic mock can be constructed using a couple of lines of code: + +```python +mock_version = Mock() +mock_version.method_to_mock.return_value = 42 +``` + +Here we construct a mock in the same way you'd construct a class. +Then we specify a method that we want to behave a specific way. + +Now whenever you call `mock_version.method_to_mock()` the return value will be `42`. + > ## Exercise: Test using a mock or dummy implemenation -> It is now possible to test your original method by providing a dummy -> implementation of the `DataProvider`. Use this to test the method +> Create a mock for the `InflammationDataSource` that returns some fixed data to test +> the `analyse_data` method. +> Use this mock in a test. >> ## Solution ->> TODO: I haven't done this - do we want it? +>> ```python +>> def test_compute_data_mock_source(): +>> from inflammation.compute_data import analyse_data +>> data_source = Mock() +>> data_source.load_inflammation_data.return_value = [[[0, 2, 0]], +>> [[0, 1, 0]]] +>> +>> result = analyse_data(data_source) +>> npt.assert_array_almost_equal(result, [0, math.sqrt(0.25) ,0]) +>> ``` > {: .solution} {: .challenge} ## Object Oriented Programming -* Polymorphism is a tool from object oriented programming -* Outline some other tools from OOP that might be useful +Using classes, particularly when using polymorphism, are techniques that come from +**object oriented programming** (frequently abbreviated to OOP). +As with functional programming different programming languages will provide features to enable you +to write object oriented programming. +For example, in Python you can create classes, and use polymorphism to call the +correct method on an instance (e.g when we called `get_area` on a shape, the appropriate `get_area` was called.) + +Object oriented programming also includes **information hiding**. +In this, certain fields might be marked private to a class, +preventing them from being modified at will. + +This can be used to maintain invariants of a class (such as insisting that a circles radius is always non-negative). 
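+
+As a small sketch of what that can look like in Python (a variation on the earlier `Circle`,
+not code from the inflammation project), a property can guard the invariant:
+
+```python
+class Circle:
+    def __init__(self, radius):
+        self.radius = radius  # this assignment goes through the setter below
+
+    @property
+    def radius(self):
+        return self._radius  # the leading underscore marks the field as "private" by convention
+
+    @radius.setter
+    def radius(self, value):
+        if value < 0:
+            raise ValueError("radius must be non-negative")
+        self._radius = value
+```
+
+Python only hides information by convention (the underscore), but the invariant is still
+maintained: every way of setting the radius passes through the same check.
+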
+ +There is also inheritance, which allows classes to specialise the behaviour of other classes by **inheriting** from +another class and **overriding** certain methods. + +As with functional programming, there are times when object oriented programming is well suited, and times where it is not. + +Good uses: + + * Representing real world objects with invariants + * Providing alternative implementations such as we did with DataSource + * Representing something that has a state that will change over the programs lifetime (such as elements of a GUI) + +One downside of OOP is ending up with very large classes that contain complex methods. +As they are methods on the class, it can be hard to know up front what side effects it causes to the class. +This can make maintenance hard. + +Grouping data together into logical structures (such as three numbers into a vector) is a vital step in writing +readable and maintainable code. +However, when using classes in this way it is best for them to be immutable (can't be changed) +It is worth noting that you can use classes to group data together - a very useful feature that you should be using everywhere + - does not you can't be practising functional programming: + +You can still have classes, and these classes might have read-only methods on (such as the `get_area` we defined for shapes) +but then still have your complex logic operate on + +Don't use features for the sake of using features. +Code should be as simple as it can be, but not any simpler. +If you know your function only makes sense to operate on circles, then +don't accept shapes just to use polymorphism! From 860fbd8170b4271a31ca096bfa9254c486715e5c Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Tue, 17 Oct 2023 17:27:30 +0100 Subject: [PATCH 10/82] First draft of the episode conclusion --- _episodes/36-yagni.md | 121 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 108 insertions(+), 13 deletions(-) diff --git a/_episodes/36-yagni.md b/_episodes/36-yagni.md index bd3332b5f..fec77e860 100644 --- a/_episodes/36-yagni.md +++ b/_episodes/36-yagni.md @@ -3,41 +3,125 @@ title: "When to abstract, and when not to." teaching: 0 exercises: 0 questions: -- "How to tell what is and isn't an appropriate abstraction" +- "How to tell what is and isn't an appropriate abstraction." +- "How to design larger solutions." objectives: - "Understand how to determine correct abstractions. " - "How to design large changes to the codebase." keypoints: - "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful." - "The best code is simple to understand and test, not the most clever or uses advanced language features." +- "Sketching a diagram of the code can clarify how it is supposed to work, and troubleshoot problems early." --- ## Introduction -* Talk about the bigger picture of design having seen some techniques +In this episode we have explored a range of techniques for architecting code: + + * Using pure functions assembled into pipelines to perform analysis + * Using established patterns to discuss design + * Separating different considerations, such as how data is presented from how it is stored + * Using classes to create abstractions + +None of these techniques are always applicable, and they are not sufficient to design a good technical solution. 
## Architecting larger changes -* Talk about box diagrams +When creating a new application, or creating a substantial change to an existing one, +it can be really helpful to sketch out the intended architecture on a whiteboard +(pen and paper works too, though of course it might get messy as you iterate on the design!). + +The basic idea is you draw boxes that will represent different units of code, as well as +other components of the system (such as users, databases etc). +Then connect these boxes with lines where information or control will be exchanged. +These lines represent the interfaces in your system. + +As well as helping to visualise the work, doing this sketch can troubleshoot potential issues. +For example, if there is a circular dependency between two sections of the design. +It can also help with estimating how long the work will take, as it forces you to consider all the components that +need to be made. + +Diagrams aren't foolproof, and often the stuff we haven't considered won't make it on to the diagram +but they are a great starting point to break down the different responsibilities and think about +the kinds of information different parts of the system will need. + > ## Exercise: Design a high-level architecture -> Consider implementing a new feature -> TODO: suggest a more complex feature -> Using boxes and lines sketch out an architecture for the code. -> Discuss with your team +> Sketch out a design for a new feature requested by a user +> +> *"I want there to be a Google Drive folder that when I upload new inflammation data to +> the software automatically pulls it down and updates the analysis. +> The new result should be added to a database with a timestamp. +> An email should then be sent to a group email notifying them of the change."* +> +> TODO: this doesn't generate a very interesting diagram +> >> ## Solution ->> An example design for the hypothetical problem. +>> An example design for the hypothetical problem. (TODO: incomplete) +>> ```mermaid +graph TD + A[(GDrive Folder)] + B[(Database)] + C[GDrive Monitor] + C -- Checks periodically--> A + D[Download inflammation data] + C -- Trigger update --> D + E[Parse inflammation data] + D --> E + F[Perform analysis] + E --> F + G[Upload analysis] + F --> G + G --> B + H[Notify users] +>> ``` > {: .solution} {: .challenge} ## An abstraction too far -* Drawbacks of abstraction -* Example showing too complex abstractions +So far we have seen how abstractions are good for making code easier to read, maintain and test. +However, it is possible to introduce too many abstractions. + +> All problems in computer science can be solved by another level of indirection except the problem of too many levels of indirection + +When you introduce an abstraction, if the reader of the code needs to understand what is happening inside the abstraction, +it has actually made the code *harder* to read. +When code is just in the function, it can be clear to see what it is doing. +When the code is calling out to an instance of a class that, thanks to polymorphism, could be a range of possible implementations, +the only way to find out what is *actually* being called is to run the code and see. +This is much slower to understand, and actually obfuscates meaning. + +It is a judgement as to whether you have make the code too abstract. +If you have to jump around a lot when reading the code that is a clue that is too abstract. 
+Similarly, if there are two parts of the code that always need updating together, that is
+again an indication of an incorrect or over-zealous abstraction.
+
 ## You Ain't Gonna Need It
 
-* Introduce and explain YAGNI principle
+There are different approaches to designing software.
+One principle that is popular is called You Ain't Gonna Need It - "YAGNI" for short.
+The idea is that, since it is hard to predict the future needs of a piece of software,
+it is always best to design the simplest solution that solves the problem at hand.
+This is opposed to trying to imagine how you might want to adapt the software in future
+and designing the code with that in mind.
+
+Then, since you know the problem you are trying to solve, you can avoid making your solution unnecessarily complex or abstracted.
+
+In our example, it might be tempting to abstract how the `CSVDataSource` walks the file tree into a class.
+However, since we only have one strategy for exploring the file tree, this would just create indirection for the sake of it
+- now a reader of `CSVDataSource` would have to read a different class to find out how the tree is walked.
+Maybe in the future this is something that needs to be customised, but we haven't really made it any harder to do by *not* doing this prematurely,
+and once we have the concrete feature request, it will be easier to design it appropriately.
+
+> All of this is a judgement.
+> For example, in this case, perhaps it *would* make sense to at least pull the file parsing out into a separate
+> class, but not have the `CSVDataSource` be configurable.
+> That way, it is clear to see how the file tree is being walked (there's no polymorphism going on)
+> without mixing the *parsing* code in with the file finding code.
+> There are no right answers, just guidelines.
+{: .callout}
 
 > ## Exercise: Applying to real world examples
 > Thinking about the examples of good and bad code you identified at the start of the episode.
@@ -48,5 +132,16 @@ keypoints:
 
 ## Conclusion
 
-* Take care to think about software with the appropriate priorities and things will get better.
-* Tips for getting better at architecture
+Good architecture is not about applying rules blindly, but about practising and taking care over the important things:
+
+* Avoid duplication of code or data.
+* Keep how much a person has to understand at once to a minimum.
+* Think about how interfaces will work.
+* Separate different considerations into different sections of the code.
+* Don't try to design a future-proof solution; focus on the problem at hand.
+
+Practice makes perfect.
+One way to practise is to consider code that you already have and think how it might be redesigned.
+Another way is to always try to leave code in a better state than you found it.
+So when you're working on a less well structured part of the code, start by refactoring it so that your change fits in cleanly.
+Doing this, over time, with your colleagues, will improve your skills in software architecture as well as improving the code.
From fcd6aa1e9d45ad31432653ed17a0d39e548661b0 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 11:57:07 +0100 Subject: [PATCH 11/82] Update introduction to not mention paradigms --- _episodes/30-section3-intro.md | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/_episodes/30-section3-intro.md b/_episodes/30-section3-intro.md index 2bc022d39..4bd5bb742 100644 --- a/_episodes/30-section3-intro.md +++ b/_episodes/30-section3-intro.md @@ -131,15 +131,10 @@ within the context of the typical software development process: - How requirements inform and drive the **design of software**, the importance, role, and examples of **software architecture**, and the ways we can describe a software design. -- **Implementation choices** in terms of **programming paradigms**, - looking at **procedural**, **functional**, and **object oriented** paradigms of development. - Modern software will often contain instances of multiple paradigms, - so it is worthwhile being familiar with them and knowing when - to switch in order to make better code. -- How you can (and should) assess and update a software's architecture when - requirements change and complexity increases - - is the architecture still fit for purpose, - or are modifications and extensions becoming increasingly difficult to make? +- How to improve existing code to be more readable, maintainable and testable. +- Consider different strategies for writing well designed code, including + using **pure functions**, **classes** and **abstractions**. +- How to create, asses and improve software design. {% include links.md %} From 0d06342cd24826ba3cb64181bec17116a6f4581c Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 11:59:40 +0100 Subject: [PATCH 12/82] Move exercise about good and bad code before abstractions This relates more to the descriptions of good code, so we might as well have this discussion before introducing new concepts --- _episodes/32-software-design.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index c87491c90..63b81d730 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -41,6 +41,12 @@ Writing code that meets these requirements is hard and takes practise. Further, in most contexts you will already have a piece of code that breaks some (or maybe all!) of these principles. +> ## Group Exercise: Think about examples of good and bad code +> Try to come up with examples of code that has been hard to understand - why? +> +> Try to come up with examples of code that was easy to understand and modify - why? +{: .challenge} + In this episode we will explore techniques and processes that can help you continuously improve the quality of code so, over time, it tends towards more maintainable code. @@ -82,12 +88,6 @@ In this episode we are going to look at some code and introduce various different kinds of abstraction. However, fundamentally any abstraction should be serving these goals. -> ## Group Exercise: Think about examples of good and bad code -> Try to come up with examples of code that has been hard to understand - why? -> -> Try to come up with examples of code that was easy to understand and modify - why? 
-{: .challenge} - ## Refactoring Often we are not working on brand new projects, but instead maintaining an existing From 97f8590ae83443aae447f2938ede772903885799 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 12:11:40 +0100 Subject: [PATCH 13/82] Highlight where the code we are refactoring is --- _episodes/32-software-design.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 63b81d730..80a4b23db 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -119,6 +119,8 @@ In it, if the user adds `--full-data-analysis` then the program will scan the di of one of the provided files, compare standard deviations across the data by day and plot a graph. +The main body of it exists in `inflammation/compute_data.py` in a function called `analyse_data`. + We are going to be refactoring and extending this over the remainder of this episode. > ## Group Exercise: What is bad about this code? From 78943f8e68e80b76bc6e413ca44854f587576b91 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 12:12:24 +0100 Subject: [PATCH 14/82] Expand the solution of the find problems with the code exercise The section ends with revisiting this list, so explicitly request people keep hold of it. Add some glue text to make the list flow better --- _episodes/32-software-design.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 80a4b23db..2c20e4b74 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -129,12 +129,18 @@ We are going to be refactoring and extending this over the remainder of this epi > Think about the kinds of changes you might want to make to it, and what would > make making those changes challenging. >> ## Solution +>> You may have found others, but here are some of the things that make the code +>> hard to read, test and maintain: +>> >> * Everything is in a single function - reading it you have to understand how the file loading works at the same time as the analysis itself. >> * If I want to use the data without using the graph I'd have to change it >> * It is always analysing a fixed set of data >> * It seems hard to write tests for it as it always analyses a fixed set of files >> * It doesn't have any tests +>> +>> Keep the list you created - at the end of this section we will revisit this +>> and check that we have learnt ways to address the problems we found. > {: .solution} {: .challenge} From 77ffa14db8d8ded6357fef801e8014dbf5278ca4 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 12:15:28 +0100 Subject: [PATCH 15/82] Make the section explaining the tests clearer --- _episodes/33-refactoring-functions.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 1ec91a617..19aa456c8 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -46,19 +46,19 @@ The tricks to get around this trap are: The best tests are ones that test single bits of code rigorously. However, with this code it isn't possible to do that. + Instead we will make minimal changes to the code to make it a bit testable, for example returning the data instead of visualising it. -We will also simply observe what the outcome is, rather than trying to -test the outcome is correct. 
-If the behaviour is currently broken, then we don't want to inadvertently fix it. + +We will make the asserts verify whatever the outcome is currently, +rather than worrying whether that is correct. +These tests are to verify the behaviour doesn't *change* rather than to check the current behaviour is correct. As with everything in this episode, there isn't a hard and fast rule. Refactoring doesn't change behaviour, but sometimes to make it possible to verify you're not changing the important behaviour you have to make some small tweaks to write the tests at all. -* Explain techniques for writing tests for hard to test, existing code - > ## Exercise: Write regression tests before refactoring > Write a regression test to verify we don't break the code when refactoring >> ## Solution From 7f9b163e98d5cf0c29ea48db8ea09bd7b5984822 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 13:29:58 +0100 Subject: [PATCH 16/82] Add guidance to the regression test exericse --- _episodes/33-refactoring-functions.md | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 19aa456c8..96e230b83 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -60,7 +60,18 @@ you're not changing the important behaviour you have to make some small tweaks t the tests at all. > ## Exercise: Write regression tests before refactoring -> Write a regression test to verify we don't break the code when refactoring +> Write a regression test to verify we don't break the code when refactoring. +> You will need to modify `analyse_data` to not create a graph and instead +> return the data. +> +> Don't forget you can use the `numpy.testing` function `assert_array_equal` to +> compare arrays of floating point numbers. +> +>> ## Hint +>> You might find it helpful to assert the result, observe the test failing +>> and copy and paste the correct result into the test. +> {: .solution} +> >> ## Solution >> One approach we can take is to: >> * comment out the visualize (as this will cause our test to hang) @@ -85,13 +96,12 @@ the tests at all. >> npt.assert_array_almost_equal(result, expected_output) >> ``` >> ->> This isn't a good test: +>> Note - this isn't a good test: >> * It isn't at all obvious why these numbers are correct. >> * It doesn't test edge cases. >> * If the files change, the test will start failing. >> >> However, it allows us to guarantee we don't accidentally change the analysis output. ->> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/5000b6122e576d91c2acbc437184e00893483fdd > {: .solution} {: .challenge} From 253efda9d99f31ed73f064ad6202cccb6b368440 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 13:40:30 +0100 Subject: [PATCH 17/82] Define regression testing before using it as exercise name --- _episodes/33-refactoring-functions.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 96e230b83..c801561d1 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -53,6 +53,8 @@ for example returning the data instead of visualising it. We will make the asserts verify whatever the outcome is currently, rather than worrying whether that is correct. These tests are to verify the behaviour doesn't *change* rather than to check the current behaviour is correct. 
+This kind of testing is called **regression testing** as we are testing for +regressions in existing behaviour. As with everything in this episode, there isn't a hard and fast rule. Refactoring doesn't change behaviour, but sometimes to make it possible to verify From 81b414fb73a7ad9a15c099d3b2dabcaf874aeda0 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 13:57:47 +0100 Subject: [PATCH 18/82] Add paragraph introducing cognitive load --- _episodes/32-software-design.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 2c20e4b74..7128ac450 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -57,6 +57,23 @@ We will look at: * How to take code that is in a bad shape and improve it. * Best practises to write code in ways that facilitate achieving these goals. +### Cognitive Load + +When we are trying to understand a piece of code, in our heads we are storing +what the different variables mean and what the lines of code will do. +**Cognitive load** is a way of thinking about how much information we have to store in our +heads to understand a piece of code. + +The higher the cognitive load, the harder it is to understand the code. +If it is too high, we might have to create diagrams to help us hold it all in our head +or we might just decide we can't understand it. + +There are lots of ways to keep cognitive load down: + +* Good variable and function names +* Simple control flow +* Having each function do just one thing + ## Abstractions An **abstraction**, at its most basic level, is a technique to hide the details @@ -78,8 +95,12 @@ Instead, you just need to understand how variables work in Python. In large projects it is vital to come up with good abstractions. A good abstraction makes code easier to read, as the reader doesn't need to understand all the details of the project to understand one part. +An abstraction lowers the cognitive load of a bit of code, +as there is less to understand at once. + A good abstraction makes code easier to test, as it can be tested in isolation from everything else. + Finally, a good abstraction makes code easier to adapt, as the details of how a subsystem *used* to work are hidden from the user, so when they change, the user doesn't need to know. From a2c5f2e6ec1cc8019b9eb3b054008d22c1dbffd7 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:11:39 +0100 Subject: [PATCH 19/82] Fix type in introduction to pure functions --- _episodes/33-refactoring-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index c801561d1..3d063881e 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -113,7 +113,7 @@ A **pure function** is a function that works like a mathematical function. That is, it takes in some inputs as parameters, and it produces an output. That output should always be the same for the same input. That is, it does not depend on any information not present in the inputs (such as global variables, databases, the time of day etc.) -Further, it should not cause any **side effects" such as writing to a file or changing a global variable. +Further, it should not cause any **side effects**, such as writing to a file or changing a global variable. You should try and have as much of the complex, analytical and mathematical code in pure functions. 
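+
+As a minimal, illustrative sketch (these functions are invented for this point and
+are not part of the inflammation codebase), compare an impure function with a pure
+equivalent:
+
+```python
+# Impure: the result depends on a global variable, and calling the
+# function also modifies that global state (a side effect).
+total = 0
+
+def add_to_total(value):
+    global total
+    total += value
+    return total
+
+
+# Pure: the result depends only on the inputs, and nothing else is changed.
+def add(left, right):
+    return left + right
+```
+
+`add(2, 3)` always returns `5`, whereas what `add_to_total(3)` returns depends on
+every call that came before it - exactly the kind of hidden context that makes
+code harder to understand, test and re-use.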
From 6e698236b6ec731b991083ecb4dfd5c0797e0610 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:14:14 +0100 Subject: [PATCH 20/82] Add a bit about congitive load in advantages of pure functions --- _episodes/33-refactoring-functions.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 3d063881e..4c3a77027 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -117,14 +117,21 @@ Further, it should not cause any **side effects**, such as writing to a file or You should try and have as much of the complex, analytical and mathematical code in pure functions. -Maybe something about cognitive load here? And maybe drop other two advantages til later. +By eliminating dependency on external things such as global state, we +reduce the cognitive load to understand the function. +The reader only needs to concern themselves with the input +parameters of the function and the code itself, rather than +the overall context the function is operating in. + +Similarly, a function that *calls* a pure function is also easier +to understand. +Since the function won't have any side effects, the reader needs to +only understand what the function returns, which will probably +be clear from the context in which the function is called. Pure functions have a number of advantages: * They are easy to test: you feed in inputs and get fixed outputs -* They are easy to understand: when you are reading them you have all - the information they depend on, you don't need to know what is likely to be in - a database, or what the state of a global variable is likely to be. * They are easy to re-use: because they always behave the same, you can always use them Some parts of a program are inevitably impure. From 0006a8610efcaeda272de0dee96e51c337f4d627 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:16:30 +0100 Subject: [PATCH 21/82] Explain that pure functions are easier to test ine one place --- _episodes/33-refactoring-functions.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 4c3a77027..2239190c2 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -131,7 +131,6 @@ be clear from the context in which the function is called. Pure functions have a number of advantages: -* They are easy to test: you feed in inputs and get fixed outputs * They are easy to re-use: because they always behave the same, you can always use them Some parts of a program are inevitably impure. @@ -185,10 +184,16 @@ Now we have a pure function for the analysis, we can write tests that cover all the things we would like tests to cover without depending on the data existing in CSVs. -This will make tests easier to write, but it will also make them easier to read. -The reader will not have to open up a CSV file to understand why the test is correct. +This is another advantage of pure functions - they are very well suited to automated testing. -It will also make the tests easier to maintain. +They are **easier to write** - +we construct input and assert the output +without having to think about making sure the global state is correct before or after. + +Perhaps more important, they are **easier to read** - +the reader will not have to open up a CSV file to understand why the test is correct. 
+ +It will also make the tests **easier to maintain**. If at some point the data format is changed from CSV to JSON, the bulk of the tests won't need to be updated. From b0f48e987daa1eeed7f7e73729e820cf0172f055 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:29:09 +0100 Subject: [PATCH 22/82] Incorporate the point about reuse pure functions into main text --- _episodes/33-refactoring-functions.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 2239190c2..086e70fcb 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -129,9 +129,11 @@ Since the function won't have any side effects, the reader needs to only understand what the function returns, which will probably be clear from the context in which the function is called. -Pure functions have a number of advantages: - -* They are easy to re-use: because they always behave the same, you can always use them +This property also makes them easier to re-use as the caller +only needs to understand what parameters to provide, rather +than anything else that might need to be configured +or side effects for calling it at a time that is different +to when the original author intended. Some parts of a program are inevitably impure. Programs need to read input from the user, or write to a database. From a653eeb72b48e169a2bff6714f82f84581e58918 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:29:26 +0100 Subject: [PATCH 23/82] Highlight that the glue code is the non-pure code --- _episodes/33-refactoring-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 086e70fcb..cb9cf265c 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -137,7 +137,7 @@ to when the original author intended. Some parts of a program are inevitably impure. Programs need to read input from the user, or write to a database. -Well designed programs separate complex logic from the necessary "glue" code that interacts with users and systems. +Well designed programs separate complex logic from the necessary impure "glue" code that interacts with users and systems. This way, you have easy-to-test, easy-to-read code that contains the complex logic. And you have really simple code that just reads data from a file, or gathers user input etc, that is maybe harder to test, but is so simple that it only needs a handful of tests anyway. From cd19c7f324c5278f9684298a2dca4f06f07c19ac Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:36:07 +0100 Subject: [PATCH 24/82] USe model view presenter as alternative architecture Is more common and essentially the same as MVVM. --- _episodes/34-refactoring-architecture.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index afe3398c1..ea4fde03f 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -46,7 +46,7 @@ The key thing to take away from MVC is the distinction between model code and vi > this distinction may not be possible (the code that specifies there is a button on the screen, > might be the same code that specifies what that button does). 
In fact, the original proposer > of MVC groups the views and the controller into a single element, called the tool. Other modern -> architectures like Model-ViewModel-View do away with the controller and instead separate out the +> architectures like Model-View-Presenter do away with the controller and instead separate out the > layout code from a programmable view of the UI. {: .callout} From 5358305dbdd762c1204f9353ab62295e55fadc06 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:36:26 +0100 Subject: [PATCH 25/82] Add header to call out about the controller Makes the callout formatting work better --- _episodes/34-refactoring-architecture.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index ea4fde03f..3bda35284 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -41,6 +41,7 @@ are easily isolated from the more complex logic. The key thing to take away from MVC is the distinction between model code and view code. +> ## What about the controller > The view and the controller tend to be more tightly coupled and it isn't always sensible > to draw a thick line dividing these two. Depending on how the user interacts with the software > this distinction may not be possible (the code that specifies there is a button on the screen, From b0d520d2090673f43dba21222ff6f459f3eb5e0f Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:43:03 +0100 Subject: [PATCH 26/82] Provide an example for how the model should be agnostic about the view --- _episodes/34-refactoring-architecture.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index 3bda35284..713276f82 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -54,8 +54,9 @@ The key thing to take away from MVC is the distinction between model code and vi The view code might be hard to test, or use libraries to draw the UI, but should not contain any complex logic, and is really just a presentation layer on top of the model. -The model, conversely, should operate quite agonistically of how a specific tool might interact with it. -For example, perhaps there currently is no way +The model, conversely, should not really care how the data is displayed. +For example, perhaps the UI always presents dates as "Monday 24th July 2023", but the model +would still store this using a `Date` rather than just that string. > ## Exercise: Identify model and view parts of the code > Looking at the code as it is, what parts should be considered "model" code From 4f727f0cbfd2cd2b56bdfa96d09c4139570fb159 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:43:17 +0100 Subject: [PATCH 27/82] Improve formatting of model/view classification exercise --- _episodes/34-refactoring-architecture.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index 713276f82..0ba0c842d 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -62,10 +62,10 @@ would still store this using a `Date` rather than just that string. > Looking at the code as it is, what parts should be considered "model" code > and what parts should be considered "view" code? 
>> ## Solution
->> * The computation of the standard deviation is model code
->> * Reading the data is also model code.
->> * The display of the output as a graph is the view code.
->> * The controller is the logic that processes what flags the user has provided.
+>> * The computation of the standard deviation is **model** code
+>> * Reading the data from the CSV is also **model** code.
+>> * The display of the output as a graph is the **view** code.
+>> * The logic that processes the supplied flags is the **controller**.
 > {: .solution}
{: .challenge}

From 13b7df26d749efdbf43f3fe4b704024271f8329f Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 14:58:22 +0100
Subject: [PATCH 28/82] Emphasise the connection to the last episode

---
 _episodes/34-refactoring-architecture.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md
index 0ba0c842d..3ed4c120b 100644
--- a/_episodes/34-refactoring-architecture.md
+++ b/_episodes/34-refactoring-architecture.md
@@ -70,8 +70,8 @@ would still store this using a `Date` rather than just that string.
 {: .challenge}
 
 Within the model there is further separation that makes sense.
-For example, as discussed, separating out the code that interacts with file systems from
-the calculations is sensible.
+For example, as we did in the last episode, separating out the impure code that interacts with file systems from
+the pure calculations helps with readability and testability.
 Nevertheless, the MVC approach is a great starting point when thinking about how you should structure your code.
 
 > ## Exercise: Split out the model code from the view code

From a541c889a5caf9025fc8384af081c5093082a6f7 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 14:58:45 +0100
Subject: [PATCH 29/82] Improve clarity of first exercise in the MVC section

---
 _episodes/34-refactoring-architecture.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md
index 3ed4c120b..525f4b0ed 100644
--- a/_episodes/34-refactoring-architecture.md
+++ b/_episodes/34-refactoring-architecture.md
@@ -59,8 +59,12 @@ would still store this using a `Date` rather than just that string.
 
 > ## Exercise: Identify model and view parts of the code
-> Looking at the code as it is, what parts should be considered "model" code
-> and what parts should be considered "view" code?
+> Looking at the code inside `compute_data.py`,
+>
+> * What parts should be considered **model** code?
+> * What parts should be considered **view** code?
+> * What parts should be considered **controller** code?
+>
 >> ## Solution
 >> * The computation of the standard deviation is **model** code
 >> * Reading the data from the CSV is also **model** code.
From e8375d4cbda57fbd830a5e58cda1d1bb49d55a33 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 15:05:24 +0100 Subject: [PATCH 30/82] Improve readability of the second exercise from the MVC section --- _episodes/34-refactoring-architecture.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index 525f4b0ed..cc1392d38 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -79,8 +79,9 @@ the pure calculations is helps with readability and testability. Nevertheless, the MVC approach is a great starting point when thinking about how you should structure your code. > ## Exercise: Split out the model code from the view code -> Refactor the code to have the model code separated from -> the view code. +> Refactor `analyse_data` such the *view* code we identified in the last +> exercise is removed from the function, so the function contains only +> *model* code, and the *view* code is moved elsewhere. >> ## Solution >> The idea here is to have `analyse_data` to not have any "view" considerations. >> That is, it should just compute and return the data. @@ -110,7 +111,10 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> views.visualize(graph_data) >> return >> ``` ->> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/97fd04b747a6491c2590f34384eed44e83a8e73c +>> You might notice this is more-or-less the change we did to write our +>> regression test. +>> This demonstrates that splitting up model code from view code can +>> immediately make your code much more testable. > {: .solution} {: .challenge} From 17f03957dd0787ee0cb3b4991bb8c104acf88c41 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 15:05:34 +0100 Subject: [PATCH 31/82] Tightening up concluding paragraph --- _episodes/34-refactoring-architecture.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index cc1392d38..2f1e3d473 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -120,8 +120,8 @@ Nevertheless, the MVC approach is a great starting point when thinking about how ## Programming patterns -MVC is a **programming pattern**, which is a template for structuring code. -Patterns are useful starting point for how to design your software. +MVC is a **programming pattern**. Programming patterns are templates for structuring code. +Patterns are a useful starting point for how to design your software. They also work as a common vocabulary for discussing software designs with other developers. From 9c8ec84fb337239af5ccde9a29301b1e387cec38 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 15:53:01 +0100 Subject: [PATCH 32/82] Fix semantic break in section about constructors --- _episodes/35-refactoring-decoupled-units.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index 3aae044d4..2cf2710f1 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -84,8 +84,8 @@ You can then **construct** a class elsewhere in your code by doing the following my_class = MyClass() ``` -When you construct a class in this ways, its **construtor** is called. 
It is possible -to pass in values to the constructor that configure the class: +When you construct a class in this ways, the classes **construtor** is called. +It is possible to pass in values to the constructor that configure the class: ```python class Circle: From 19f07b341385d3842c7769c0e18067a6672800fe Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 15:53:18 +0100 Subject: [PATCH 33/82] Correct code sample to use write capitalisation for math.pi --- _episodes/35-refactoring-decoupled-units.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index 2cf2710f1..6a9c96b7d 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -107,10 +107,12 @@ Classes can also have methods defined on them. Like constructors, they have an special `self` parameter that must come first. ```python +import math + class Circle: ... def get_area(self): - return Math.PI * self.radius * self.radius + return math.pi * self.radius * self.radius ... print(my_circle.get_area()) ``` From 7f6023087be77810c84edd4881fa2c69c5fba623 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 15:53:33 +0100 Subject: [PATCH 34/82] Add examples for invariants and encapsulation --- _episodes/35-refactoring-decoupled-units.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index 6a9c96b7d..846279c92 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -124,8 +124,9 @@ Then the method can access the **member variable** `radius`. Classes have a number of uses. * Encapsulating data - such as grouping three numbers together into a Vector class -* Maintaining invariants - TODO an example here would be good -* Encapsulating behaviour - such as a class that csha +* Maintaining invariants - perhaps when storing a file path it only makes sense for that to resolve to a valid file - by storing the string in a class with a method for setting it (a **setter**), that method can validate the new value before updating the value. +* Encapsulating behaviour - such as a class representing a UI state, modifying some value will automatically + force the relevant portion of the UI to be updated. > ## Exercise: Use a class to configure loading > Put your function as a member method of a class, separating out the configuration From 66a904c1c1dfb03db68428257f3a60756409b247 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 15:54:01 +0100 Subject: [PATCH 35/82] Add a callout about why maintaining invariants is good This was too much text to include in the bullet point about using classes to maintain invariants, but might be useful context. --- _episodes/35-refactoring-decoupled-units.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index 846279c92..cc956fa4c 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -128,6 +128,15 @@ Classes have a number of uses. * Encapsulating behaviour - such as a class representing a UI state, modifying some value will automatically force the relevant portion of the UI to be updated. +> ## Maintaining Invariants +> Maintaining invariants can be a really powerful tool in debugging. 
+> Without invariants, you can find bugs where some data is in an invalid +> state, but the problem only appears when you try to use the data. +> This makes it hard to track down the cause of the bug. +> By using classes to maintain invariants, you can force the issue +> to appear when the invalid data is set, that is, the source of the bug. +{: .callout} + > ## Exercise: Use a class to configure loading > Put your function as a member method of a class, separating out the configuration > of where to load the files from in the constructor, from where it actually loads the data. From 0f68b3ea3b444b73dc4430d0597d698d6dcc47e8 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 16:14:56 +0100 Subject: [PATCH 36/82] Improve the class loading exercise content --- _episodes/35-refactoring-decoupled-units.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index cc956fa4c..bdbe87742 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -138,11 +138,14 @@ Classes have a number of uses. {: .callout} > ## Exercise: Use a class to configure loading -> Put your function as a member method of a class, separating out the configuration -> of where to load the files from in the constructor, from where it actually loads the data. +> Put the `load_inflammation_data` function we wrote in the last exercise as a member method +> of a new class called `CSVDataSource`. +> Put the configuration of where to load the files in the classes constructor. > Once this is done, you can construct this class outside the the statistical analysis -> and pass it in. +> and pass the instance in to `analyse_data`. >> ## Solution +>> You should have created a class that looks something like this: +>> >> ```python >> class CSVDataSource: >> """ @@ -150,7 +153,6 @@ Classes have a number of uses. >> """ >> def __init__(self, dir_path): >> self.dir_path = dir_path ->> super().__init__() >> >> def load_inflammation_data(self): >> data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.csv')) @@ -159,8 +161,7 @@ Classes have a number of uses. >> data = map(models.load_csv, data_file_paths) >> return list(data) >> ``` ->> We can now pass an instance of this class into the the statistical analysis function, ->> constructing the object in the controller code. +>> We can now pass an instance of this class into the the statistical analysis function. >> This means that should we want to re-use the analysis it wouldn't be fixed to reading >> from a directory of CSVs. >> We have "decoupled" the reading of the data from the statistical analysis. From 0c8817a91dff6b501f7bbbfb2cb06c931447d5a3 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 16:39:05 +0100 Subject: [PATCH 37/82] Add the controller modifications to the solution --- _episodes/35-refactoring-decoupled-units.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index bdbe87742..76c93ee66 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -308,6 +308,7 @@ That is, we have decoupled the job of loading the data from the job of analysing > Finally, at run time provide an instance of the new implementation if the user hasn't > put any files on the path. 
>> ## Solution +>> You should have created a class that looks something like: >> ```python >> class UserProvidSpecificFilesDataSource(InflammationDataSource): >> def load_inflammation_data(self): @@ -325,6 +326,17 @@ That is, we have decoupled the job of loading the data from the job of analysing >> data = map(models.load_csv, paths) >> return list(data) >> ``` +>> Additionally, in the controller will need to select the appropriate DataSource to +>> provide to the analysis: +>>```python +>> if len(InFiles) == 0: +>> data_source = UserProvidSpecificFilesDataSource() +>> else: +>> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> data_result = analyse_data(data_source) +>>``` +>> As you have seen, all these changes were made without modifying +>> the analysis code itself. > {: .solution} {: .challenge} From 79244621a6abd0a8be3f01c6506c6e125a0107c9 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 16:57:49 +0100 Subject: [PATCH 38/82] Fix spelling type in exercise title --- _episodes/35-refactoring-decoupled-units.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index 76c93ee66..b059c9910 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -360,7 +360,7 @@ Then we specify a method that we want to behave a specific way. Now whenever you call `mock_version.method_to_mock()` the return value will be `42`. -> ## Exercise: Test using a mock or dummy implemenation +> ## Exercise: Test using a mock or dummy implementation > Create a mock for the `InflammationDataSource` that returns some fixed data to test > the `analyse_data` method. > Use this mock in a test. From 13bab516341dcd52a75e9056ead6c33169b9881f Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 16:58:11 +0100 Subject: [PATCH 39/82] Small fixes to flow of text in oop section --- _episodes/35-refactoring-decoupled-units.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index b059c9910..4e75ef20a 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -383,9 +383,9 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be ` Using classes, particularly when using polymorphism, are techniques that come from **object oriented programming** (frequently abbreviated to OOP). As with functional programming different programming languages will provide features to enable you -to write object oriented programming. +to write object oriented code. For example, in Python you can create classes, and use polymorphism to call the -correct method on an instance (e.g when we called `get_area` on a shape, the appropriate `get_area` was called.) +correct method on an instance (e.g when we called `get_area` on a shape, the appropriate `get_area` was called). Object oriented programming also includes **information hiding**. In this, certain fields might be marked private to a class, @@ -393,10 +393,12 @@ preventing them from being modified at will. This can be used to maintain invariants of a class (such as insisting that a circles radius is always non-negative). 
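+
+As a minimal sketch of this idea (a hypothetical variant of the `Circle` example,
+not code from the project), the field can be kept private by convention and
+validated whenever it is set:
+
+```python
+class Circle:
+    def __init__(self, radius):
+        self.radius = radius  # goes through the property setter below
+
+    @property
+    def radius(self):
+        return self._radius
+
+    @radius.setter
+    def radius(self, value):
+        # Enforce the invariant at the point the data is set, so an invalid
+        # radius fails immediately rather than later when it is used.
+        if value < 0:
+            raise ValueError("radius must be non-negative")
+        self._radius = value
+```
+
+Code outside the class still reads and writes `circle.radius` as normal, but it can
+no longer put a `Circle` into an invalid state without getting an immediate error.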
-There is also inheritance, which allows classes to specialise the behaviour of other classes by **inheriting** from +There is also inheritance, which allows classes to specialise +the behaviour of other classes by **inheriting** from another class and **overriding** certain methods. -As with functional programming, there are times when object oriented programming is well suited, and times where it is not. +As with functional programming, there are times when +object oriented programming is well suited, and times where it is not. Good uses: From 3f3ecd38088f5084c900d7c369ec3080bb0fd361 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 16:58:32 +0100 Subject: [PATCH 40/82] Make the using classes in functional programming a callout --- _episodes/35-refactoring-decoupled-units.md | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index 4e75ef20a..f4dc532b6 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -410,14 +410,17 @@ One downside of OOP is ending up with very large classes that contain complex me As they are methods on the class, it can be hard to know up front what side effects it causes to the class. This can make maintenance hard. -Grouping data together into logical structures (such as three numbers into a vector) is a vital step in writing -readable and maintainable code. -However, when using classes in this way it is best for them to be immutable (can't be changed) -It is worth noting that you can use classes to group data together - a very useful feature that you should be using everywhere - - does not you can't be practising functional programming: - -You can still have classes, and these classes might have read-only methods on (such as the `get_area` we defined for shapes) -but then still have your complex logic operate on +> ## Classes and functional programming +> Using classes is compatible with functional programming. +> In fact, grouping data into logical structures (such as three numbers into a vector) +> is a vital step in writing readable and maintainable code with any approach. +> However, when writing in a functional style, classes should be immutable. +> That is, the methods they provide are read-only. +> If you require the class to be different, you'd create a new instance +> with the new values. +> (that is, the functions should not modify the state of the class). +{: .callout} + Don't use features for the sake of using features. Code should be as simple as it can be, but not any simpler. 
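+
+As a small, invented illustration (neither function is part of the codebase), a
+calculation that only ever applies to circles is clearer when it says so, rather
+than taking a generic shape purely so that polymorphism is involved:
+
+```python
+import math
+
+# Simple, and honest about what it needs: just a radius.
+def circle_circumference(radius):
+    return 2 * math.pi * radius
+
+# Over-general: accepting any shape object here adds a layer of indirection
+# without adding any capability the callers actually need.
+def shape_circumference(shape):
+    return shape.get_circumference()
+```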
From bd41da7b9aadcdcb5fd93276abb3326d62090803 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 17:16:32 +0100 Subject: [PATCH 41/82] Fixed incorrect usage of episode --- _episodes/36-yagni.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/36-yagni.md b/_episodes/36-yagni.md index fec77e860..733492667 100644 --- a/_episodes/36-yagni.md +++ b/_episodes/36-yagni.md @@ -16,7 +16,7 @@ keypoints: ## Introduction -In this episode we have explored a range of techniques for architecting code: +In this section we have explored a range of techniques for architecting code: * Using pure functions assembled into pipelines to perform analysis * Using established patterns to discuss design From 2bea011c53a203034aa02b102e9eed5be889cc29 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 17:17:01 +0100 Subject: [PATCH 42/82] Add diagram for solution to the architecture exercise --- _episodes/36-yagni.md | 23 ++------------------ fig/example-architecture-daigram.mermaid.txt | 18 +++++++++++++++ fig/example-architecture-diagram.svg | 1 + 3 files changed, 21 insertions(+), 21 deletions(-) create mode 100644 fig/example-architecture-daigram.mermaid.txt create mode 100644 fig/example-architecture-diagram.svg diff --git a/_episodes/36-yagni.md b/_episodes/36-yagni.md index 733492667..caecc596f 100644 --- a/_episodes/36-yagni.md +++ b/_episodes/36-yagni.md @@ -53,28 +53,9 @@ the kinds of information different parts of the system will need. > the software automatically pulls it down and updates the analysis. > The new result should be added to a database with a timestamp. > An email should then be sent to a group email notifying them of the change."* -> -> TODO: this doesn't generate a very interesting diagram -> >> ## Solution ->> An example design for the hypothetical problem. (TODO: incomplete) ->> ```mermaid -graph TD - A[(GDrive Folder)] - B[(Database)] - C[GDrive Monitor] - C -- Checks periodically--> A - D[Download inflammation data] - C -- Trigger update --> D - E[Parse inflammation data] - D --> E - F[Perform analysis] - E --> F - G[Upload analysis] - F --> G - G --> B - H[Notify users] ->> ``` +>> +>> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg) > {: .solution} {: .challenge} diff --git a/fig/example-architecture-daigram.mermaid.txt b/fig/example-architecture-daigram.mermaid.txt new file mode 100644 index 000000000..c3ab99112 --- /dev/null +++ b/fig/example-architecture-daigram.mermaid.txt @@ -0,0 +1,18 @@ +graph TD + A[(GDrive Folder)] + B[(Database)] + C[GDrive Monitor] + C -- Checks periodically--> A + D[Download inflammation data] + C -- Trigger update --> D + E[Parse inflammation data] + D --> E + F[Perform analysis] + E --> F + G[Upload analysis] + F --> G + G --> B + H[Notify users] + I[Monitor database] + I -- Check periodically --> B + I --> H diff --git a/fig/example-architecture-diagram.svg b/fig/example-architecture-diagram.svg new file mode 100644 index 000000000..02a7ecceb --- /dev/null +++ b/fig/example-architecture-diagram.svg @@ -0,0 +1 @@ +
Checks periodically
Trigger update
Check periodically
GDrive Folder
Database
GDrive Monitor
Download inflammation data
Parse inflammation data
Perform analysis
Upload analysis
Notify users
Monitor database
\ No newline at end of file From 10aa3fd8825a50d2c606e7a997bd8e782152f541 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 17:25:09 +0100 Subject: [PATCH 43/82] Remove redundant see this commit text The solution now contains the code --- _episodes/33-refactoring-functions.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index cb9cf265c..8d4cd01d7 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -178,7 +178,6 @@ that is maybe harder to test, but is so simple that it only needs a handful of t >> # views.visualize(graph_data) >> return daily_standard_deviation >>``` ->> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/4899b35aed854bdd67ef61cba6e50b3eeada0334 > {: .solution} {: .challenge} From 33620670f144d4b11dfb8b0fe678777424d8d721 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 17:27:15 +0100 Subject: [PATCH 44/82] Correct broken links in extras That said, the persistence one might depended on the code written in the original version of section 3. Need to decide what to do about that --- _extras/databases.md | 2 +- _extras/persistence.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/_extras/databases.md b/_extras/databases.md index b4bc67a65..2be010dbe 100644 --- a/_extras/databases.md +++ b/_extras/databases.md @@ -16,7 +16,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../36-architecture-revisited/index.html#additional-material). +> [Section 3 on software design and development](../36-yagni/index.html). {: .callout} A **database** is an organised collection of data, diff --git a/_extras/persistence.md b/_extras/persistence.md index ab0379062..6fa8fe449 100644 --- a/_extras/persistence.md +++ b/_extras/persistence.md @@ -25,7 +25,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../36-architecture-revisited/index.html#additional-material). +> [Section 3 on software design and development](../36-yagni/index.html). {: .callout} Our patient data system so far can read in some data, process it, and display it to people. From f766d7024a809afc8058eedeb5004add894dbc6e Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 13:44:58 +0100 Subject: [PATCH 45/82] Add initial timings for episodes --- _episodes/32-software-design.md | 4 ++-- _episodes/33-refactoring-functions.md | 4 ++-- _episodes/34-refactoring-architecture.md | 4 ++-- _episodes/35-refactoring-decoupled-units.md | 4 ++-- _episodes/36-yagni.md | 4 ++-- 5 files changed, 10 insertions(+), 10 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 7128ac450..3b6338758 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -1,7 +1,7 @@ --- title: "Software Architecture and Design" -teaching: 0 -exercises: 0 +teaching: 25 +exercises: 20 questions: - "What should we consider when designing software?" - "What goals should we have when structuring our code?" 
diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 8d4cd01d7..6e9f317e7 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -1,7 +1,7 @@ --- title: "Refactoring functions to do just one thing" -teaching: 0 -exercises: 0 +teaching: 30 +exercises: 20 questions: - "How do you refactor code without breaking it?" - "How do you write code that is easy to test?" diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index 2f1e3d473..a57c8541f 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -1,7 +1,7 @@ --- title: "Architecting code to separate responsibilities" -teaching: 0 -exercises: 0 +teaching: 4 +exercises: 25 questions: - "What is the point of the MVC architecture" - "How should code be structured" diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index f4dc532b6..d0bcca438 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -1,7 +1,7 @@ --- title: "Using classes to de-couple code." -teaching: 0 -exercises: 0 +teaching: 35 +exercises: 55 questions: - "What is de-coupled code?" - "When is it useful to use classes to structure code?" diff --git a/_episodes/36-yagni.md b/_episodes/36-yagni.md index caecc596f..9dff05f9a 100644 --- a/_episodes/36-yagni.md +++ b/_episodes/36-yagni.md @@ -1,7 +1,7 @@ --- title: "When to abstract, and when not to." -teaching: 0 -exercises: 0 +teaching: 10 +exercises: 25 questions: - "How to tell what is and isn't an appropriate abstraction." - "How to design larger solutions." From 4343b1d952d8b45b83f7812c5ba6f79bd8e76f39 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 13:50:40 +0100 Subject: [PATCH 46/82] Ensure model test is agnostic as to where it is run from --- _episodes/33-refactoring-functions.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 6e9f317e7..7da2d9e30 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -83,10 +83,11 @@ the tests at all. 
>> >> ```python >> import numpy.testing as npt +>> from pathlib import Path >> >> def test_compute_data(): >> from inflammation.compute_data import analyse_data ->> path = 'data/' +>> path = Path.cwd() / "../data" >> result = analyse_data(path) >> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211 >> ,0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094 From 515ed184fbc1a0aa26ba0825f1cb645c409970c1 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 13:51:27 +0100 Subject: [PATCH 47/82] Correct example function name --- _episodes/33-refactoring-functions.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 7da2d9e30..65429822e 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -152,7 +152,7 @@ that is maybe harder to test, but is so simple that it only needs a handful of t >> You can move all of the code that does the analysis into a separate function that >> might look something like this: >> ```python ->> def compute_standard_deviation_by_data(all_loaded_data): +>> def compute_standard_deviation_by_day(all_loaded_data): >> means_by_day = map(models.daily_mean, all_loaded_data) >> means_by_day_matrix = np.stack(list(means_by_day)) >> @@ -171,7 +171,7 @@ that is maybe harder to test, but is so simple that it only needs a handful of t >> if len(data_file_paths) == 0: >> raise ValueError(f"No inflammation csv's found in path {data_dir}") >> data = map(models.load_csv, data_file_paths) ->> daily_standard_deviation = compute_standard_deviation_by_data(data) +>> daily_standard_deviation = compute_standard_deviation_by_day(data) >> >> graph_data = { >> 'standard deviation by day': daily_standard_deviation, @@ -213,7 +213,7 @@ won't need to be updated. >> ([[[0, 1, 0], [0, 2, 0]], [[0, 1, 0], [0, 2, 0]]], [0, 0, 0]) >>], >>ids=['Two patients in same file', 'Two patients in different files', 'Two identical patients in two different files']) ->>def test_compute_standard_deviation_by_data(data, expected_output): +>>def test_compute_standard_deviation_by_day(data, expected_output): >> from inflammation.compute_data import compute_standard_deviation_by_data >> >> result = compute_standard_deviation_by_data(data) From 51cf100618a01b50f293ef654e5ff67022e083fe Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 13:51:59 +0100 Subject: [PATCH 48/82] Fixing spelling mistakes in refactoring functions exercise --- _episodes/33-refactoring-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 65429822e..069190b3b 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -204,7 +204,7 @@ won't need to be updated. > Add tests that check for when there is only one file with multiple rows, multiple files with one row > and any other cases you can think of that should be tested. 
>> ## Solution ->> You might hev throught of more tests, but we can easily extend the test by parameterizing +>> You might have thought of more tests, but we can easily extend the test by parametrizing >> with more inputs and expected outputs: >> ```python >>@pytest.mark.parametrize('data,expected_output', [ From a79d3940ed94acf22550624ce60433635a680058 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 13:55:44 +0100 Subject: [PATCH 49/82] Make sure there is an import for the Mock class --- _episodes/35-refactoring-decoupled-units.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index d0bcca438..098405e98 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -350,6 +350,8 @@ An convenient way to do this in Python is using Mocks. These are a whole topic to themselves - but a basic mock can be constructed using a couple of lines of code: ```python +from unittest.mock import Mock + mock_version = Mock() mock_version.method_to_mock.return_value = 42 ``` From c4e11463e4924938d9aabe398cfb5370e2f06e25 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:16:30 +0100 Subject: [PATCH 50/82] Cover the changes needed to the regression test with the class refactor --- _episodes/35-refactoring-decoupled-units.md | 24 ++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index 098405e98..358f8c0e5 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -177,9 +177,27 @@ Classes have a number of uses. >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) >> data_result = analyse_data(data_source) >> ``` ->> Note in all these refactorings the behaviour is unchanged, ->> so we can still run our original tests to ensure we've not ->> broken anything. +>> While the behaviour is unchanged, how we call `analyse_data` has changed. +>> We must update our regression test to match this, to ensure we haven't broken the code: +>> ```python +>> ... +>> def test_compute_data(): +>> from inflammation.compute_data import analyse_data +>> path = Path.cwd() / "../data" +>> data_source = CSVDataSource(path) +>> result = analyse_data(data_source) +>> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211 +>> ... +>> ``` +>> If this was a more complex refactoring, we could introduce an indirection to keep +>> the interface the same: +>> ```python +>> def analyse_data(dir_path): +>> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> return analyse_data_from_source(data_source) +>> ``` +>> This can be a really useful intermediate step if `analyse_data` is called +>> from lots of different places. 
> {: .solution} {: .challenge} From a77b9d62dfd5c197ca0e22910b7cdefc4dad59ac Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:18:10 +0100 Subject: [PATCH 51/82] Link decoupling to abstractions --- _episodes/35-refactoring-decoupled-units.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index 358f8c0e5..b26a1ef7f 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -33,6 +33,10 @@ allows for more maintainable code: * Loose coupled code tends to be easier to maintain, as changes can be isolated from other parts of the code. +Introducing **abstractions** is a way to decouple code. +If one part of the code only uses another part through an appropriate abstraction +then it becomes easier for these parts to change independently. + > ## Exercise: Decouple the file loading from the computation > Currently the function is hard coded to load all the files in a directory > Decouple this into a separate function that returns all the files to load From dfdb1a95248aac76a7a7db0fc319c600912c5471 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:21:06 +0100 Subject: [PATCH 52/82] Make consistent use of first/second person --- _episodes/32-software-design.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 3b6338758..f1ba7bfd9 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -155,7 +155,7 @@ We are going to be refactoring and extending this over the remainder of this epi >> >> * Everything is in a single function - reading it you have to understand how the file loading works at the same time as the analysis itself. ->> * If I want to use the data without using the graph I'd have to change it +>> * If you want to use the data without using the graph you'd have to change it >> * It is always analysing a fixed set of data >> * It seems hard to write tests for it as it always analyses a fixed set of files >> * It doesn't have any tests From aebc099ab00fcbd0aee84eeed2a980099aa8a1d6 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:24:58 +0100 Subject: [PATCH 53/82] Ensure each problem links to a specific part of maintainable code --- _episodes/32-software-design.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index f1ba7bfd9..bdc571c83 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -153,12 +153,11 @@ We are going to be refactoring and extending this over the remainder of this epi >> You may have found others, but here are some of the things that make the code >> hard to read, test and maintain: >> ->> * Everything is in a single function - reading it you have to understand how the file loading +>> * **Hard to read:** Everything is in a single function - reading it you have to understand how the file loading works at the same time as the analysis itself. 
->> * If you want to use the data without using the graph you'd have to change it ->> * It is always analysing a fixed set of data ->> * It seems hard to write tests for it as it always analyses a fixed set of files ->> * It doesn't have any tests +>> * **Hard to modify:** If you want to use the data without using the graph you'd have to change it +>> * **Hard to modify or test:** It is always analysing a fixed set of data stored on the disk +>> * **Hard to modify:** It doesn't have any tests meaning changes might break something >> >> Keep the list you created - at the end of this section we will revisit this >> and check that we have learnt ways to address the problems we found. From a6e502afd98497c3918bbb3826b1157cd1b58b8a Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:25:32 +0100 Subject: [PATCH 54/82] Improve grammar of exercise solution --- _episodes/32-software-design.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index bdc571c83..700813f5c 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -159,7 +159,8 @@ works at the same time as the analysis itself. >> * **Hard to modify or test:** It is always analysing a fixed set of data stored on the disk >> * **Hard to modify:** It doesn't have any tests meaning changes might break something >> ->> Keep the list you created - at the end of this section we will revisit this +>> Keep the list you have created. +>> At the end of this section we will revisit this >> and check that we have learnt ways to address the problems we found. > {: .solution} {: .challenge} From 35fd32d3bcc07d935b008e0ee0a3044cb01d4ee8 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:41:51 +0100 Subject: [PATCH 55/82] Move MVC stuff after the classes section --- ...oring-decoupled-units.md => 34-refactoring-decoupled-units.md} | 0 ...refactoring-architecture.md => 35-refactoring-architecture.md} | 0 2 files changed, 0 insertions(+), 0 deletions(-) rename _episodes/{35-refactoring-decoupled-units.md => 34-refactoring-decoupled-units.md} (100%) rename _episodes/{34-refactoring-architecture.md => 35-refactoring-architecture.md} (100%) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md similarity index 100% rename from _episodes/35-refactoring-decoupled-units.md rename to _episodes/34-refactoring-decoupled-units.md diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md similarity index 100% rename from _episodes/34-refactoring-architecture.md rename to _episodes/35-refactoring-architecture.md From d1c3491ad68cda9d60375784cab7d6295ae5b144 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:53:57 +0100 Subject: [PATCH 56/82] Combine YAGNI section into the MVC section Is really all about high level architecture --- _episodes/35-refactoring-architecture.md | 117 +++++++++++++++++++-- _episodes/36-yagni.md | 128 ----------------------- 2 files changed, 111 insertions(+), 134 deletions(-) delete mode 100644 _episodes/36-yagni.md diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md index a57c8541f..6c72b6fd9 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-refactoring-architecture.md @@ -1,17 +1,19 @@ --- title: "Architecting code to separate responsibilities" -teaching: 4 -exercises: 25 +teaching: 15 +exercises: 50 questions: - "What is 
the point of the MVC architecture" -- "How should code be structured" +- "How to design larger solutions." +- "How to tell what is and isn't an appropriate abstraction." objectives: - "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software." -- "Understand the MVC pattern and how to apply it." -- "Understand the benefits of using patterns" +- "How to design large changes to the codebase." +- "Understand how to determine correct abstractions. " keypoints: - "By splitting up the \"view\" code from \"model\" code, you allow easier re-use of code." -- "Using coding patterns can be useful inspirations for how to structure your code." +- "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful." +- "Sketching a diagram of the code can clarify how it is supposed to work, and troubleshoot problems early." --- @@ -139,4 +141,107 @@ However, they cannot replace a full design as most problems will require a bespoke design that maps cleanly on to the specific problem you are trying to solve. +## Architecting larger changes + +When creating a new application, or creating a substantial change to an existing one, +it can be really helpful to sketch out the intended architecture on a whiteboard +(pen and paper works too, though of course it might get messy as you iterate on the design!). + +The basic idea is you draw boxes that will represent different units of code, as well as +other components of the system (such as users, databases etc). +Then connect these boxes with lines where information or control will be exchanged. +These lines represent the interfaces in your system. + +As well as helping to visualise the work, doing this sketch can troubleshoot potential issues. +For example, if there is a circular dependency between two sections of the design. +It can also help with estimating how long the work will take, as it forces you to consider all the components that +need to be made. + +Diagrams aren't foolproof, and often the stuff we haven't considered won't make it on to the diagram +but they are a great starting point to break down the different responsibilities and think about +the kinds of information different parts of the system will need. + + +> ## Exercise: Design a high-level architecture +> Sketch out a design for a new feature requested by a user +> +> *"I want there to be a Google Drive folder that when I upload new inflammation data to +> the software automatically pulls it down and updates the analysis. +> The new result should be added to a database with a timestamp. +> An email should then be sent to a group email notifying them of the change."* +>> ## Solution +>> +>> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg) +> {: .solution} +{: .challenge} + +## An abstraction too far + +So far we have seen how abstractions are good for making code easier to read, maintain and test. +However, it is possible to introduce too many abstractions. + +> All problems in computer science can be solved by another level of indirection except the problem of too many levels of indirection + +When you introduce an abstraction, if the reader of the code needs to understand what is happening inside the abstraction, +it has actually made the code *harder* to read. +When code is just in the function, it can be clear to see what it is doing. 
+When the code is calling out to an instance of a class that, thanks to polymorphism, could be a range of possible implementations,
+the only way to find out what is *actually* being called is to run the code and see.
+This is much slower to understand, and it obscures the meaning of the code.
+
+It is a judgement as to whether you have made the code too abstract.
+If you have to jump around a lot when reading the code, that is a clue that it is too abstract.
+Similarly, if there are two parts of the code that always need updating together, that is
+again an indication of an incorrect or over-zealous abstraction.
+
+
+## You Ain't Gonna Need It
+
+There are different approaches to designing software.
+One popular principle is called "You Ain't Gonna Need It" - "YAGNI" for short.
+The idea is that, since it is hard to predict the future needs of a piece of software,
+it is always best to design the simplest solution that solves the problem at hand.
+This is opposed to trying to imagine how you might want to adapt the software in future
+and designing the code with that in mind.
+
+Then, since you know the problem you are trying to solve, you can avoid making your solution unnecessarily complex or abstracted.
+
+In our example, it might be tempting to abstract how the `CSVDataSource` walks the file tree into a class.
+However, since we only have one strategy for exploring the file tree, this would just create indirection for the sake of it:
+now a reader of `CSVDataSource` would have to read a different class to find out how the tree is walked.
+Maybe in the future this is something that needs to be customised, but we haven't made it any harder to do so by *not* doing it prematurely,
+and once we have a concrete feature request, it will be easier to design the abstraction appropriately.
+
+> All of this is a judgement.
+> For example, in this case, perhaps it *would* make sense to at least pull the file parsing out into a separate
+> class, but not make the `CSVDataSource` configurable.
+> That way, it is clear to see how the file tree is being walked (there is no polymorphism going on)
+> without mixing the *parsing* code in with the file-finding code.
+> There are no right answers, just guidelines.
+{: .callout}
+
+> ## Exercise: Applying to real world examples
+> Think about the examples of good and bad code you identified at the start of the episode.
+> Identify which principles were and weren't being followed.
+> Identify some refactorings that could be performed that would improve the code.
+> Discuss the ideas as a group.
+{: .challenge}
+
+## Conclusion
+
+Good architecture is not about applying any rules blindly; it comes from practice and from taking care over the important things:
+
+* Avoid duplication of code or data.
+* Keep how much a person has to understand at once to a minimum.
+* Think about how interfaces will work.
+* Separate different considerations into different sections of the code.
+* Don't try to design a future-proof solution; focus on the problem at hand.
+
+Practice makes perfect.
+One way to practise is to consider code that you already have and think about how it might be redesigned.
+Another way is to always try to leave code in a better state than you found it.
+So when you're working on a less well structured part of the code, start by refactoring it so that your change fits in cleanly.
+Doing this, over time, with your colleagues, will improve your skills in software architecture as well as improving the code.
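As a final illustration of the YAGNI point above, here is a minimal sketch. `FileTreeWalker` and `ConfigurableCSVDataSource` are hypothetical names used only for this illustration, not part of the lesson's codebase; they show the kind of indirection the principle warns against, compared with keeping the file-finding logic visible inside `CSVDataSource`:

```python
import glob
import os


# Over-abstracted version: the extra FileTreeWalker layer means a reader has to
# open a second class just to learn how the files are found.
class FileTreeWalker:
    def __init__(self, pattern):
        self.pattern = pattern

    def find(self, dir_path):
        return glob.glob(os.path.join(dir_path, self.pattern))


class ConfigurableCSVDataSource:
    def __init__(self, dir_path, walker):
        self.dir_path = dir_path
        self.walker = walker

    def find_files(self):
        return self.walker.find(self.dir_path)


# YAGNI version: with only one strategy for walking the tree, keeping the glob
# call inline is easier to read, and no harder to change if a real need appears.
class CSVDataSource:
    def __init__(self, dir_path):
        self.dir_path = dir_path

    def find_files(self):
        return glob.glob(os.path.join(self.dir_path, 'inflammation*.csv'))
```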
+ + {% include links.md %} diff --git a/_episodes/36-yagni.md b/_episodes/36-yagni.md deleted file mode 100644 index 9dff05f9a..000000000 --- a/_episodes/36-yagni.md +++ /dev/null @@ -1,128 +0,0 @@ ---- -title: "When to abstract, and when not to." -teaching: 10 -exercises: 25 -questions: -- "How to tell what is and isn't an appropriate abstraction." -- "How to design larger solutions." -objectives: -- "Understand how to determine correct abstractions. " -- "How to design large changes to the codebase." -keypoints: -- "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful." -- "The best code is simple to understand and test, not the most clever or uses advanced language features." -- "Sketching a diagram of the code can clarify how it is supposed to work, and troubleshoot problems early." ---- - -## Introduction - -In this section we have explored a range of techniques for architecting code: - - * Using pure functions assembled into pipelines to perform analysis - * Using established patterns to discuss design - * Separating different considerations, such as how data is presented from how it is stored - * Using classes to create abstractions - -None of these techniques are always applicable, and they are not sufficient to design a good technical solution. - -## Architecting larger changes - -When creating a new application, or creating a substantial change to an existing one, -it can be really helpful to sketch out the intended architecture on a whiteboard -(pen and paper works too, though of course it might get messy as you iterate on the design!). - -The basic idea is you draw boxes that will represent different units of code, as well as -other components of the system (such as users, databases etc). -Then connect these boxes with lines where information or control will be exchanged. -These lines represent the interfaces in your system. - -As well as helping to visualise the work, doing this sketch can troubleshoot potential issues. -For example, if there is a circular dependency between two sections of the design. -It can also help with estimating how long the work will take, as it forces you to consider all the components that -need to be made. - -Diagrams aren't foolproof, and often the stuff we haven't considered won't make it on to the diagram -but they are a great starting point to break down the different responsibilities and think about -the kinds of information different parts of the system will need. - - -> ## Exercise: Design a high-level architecture -> Sketch out a design for a new feature requested by a user -> -> *"I want there to be a Google Drive folder that when I upload new inflammation data to -> the software automatically pulls it down and updates the analysis. -> The new result should be added to a database with a timestamp. -> An email should then be sent to a group email notifying them of the change."* ->> ## Solution ->> ->> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg) -> {: .solution} -{: .challenge} - -## An abstraction too far - -So far we have seen how abstractions are good for making code easier to read, maintain and test. -However, it is possible to introduce too many abstractions. 
- -> All problems in computer science can be solved by another level of indirection except the problem of too many levels of indirection - -When you introduce an abstraction, if the reader of the code needs to understand what is happening inside the abstraction, -it has actually made the code *harder* to read. -When code is just in the function, it can be clear to see what it is doing. -When the code is calling out to an instance of a class that, thanks to polymorphism, could be a range of possible implementations, -the only way to find out what is *actually* being called is to run the code and see. -This is much slower to understand, and actually obfuscates meaning. - -It is a judgement as to whether you have make the code too abstract. -If you have to jump around a lot when reading the code that is a clue that is too abstract. -Similarly, if there are two parts of the code that always need updating together, that is -again an indication of an incorrect or over-zealous abstraction. - - -## You Ain't Gonna Need It - -There are different approaches to designing software. -One principle that is popular is called You Ain't Gonna Need it - "YAGNI" for short. -The idea is that, since it is hard to predict the future needs of a piece of software, -it is always best to design the simplest solution that solves the problem at hand. -This is opposed to trying to imagine how you might want to adapt the software in future -and designing the code with that in mind. - -Then, since you know the problem you are trying to solve, you can avoid making your solution unnecessarily complex or abstracted. - -In our example, it might be tempting to abstract how the `CSVDataSource` walks the file tree into a class. -However, since we only have one strategy for exploring the file tree, this would just create indirection for the sake of it -- now a reader of CSVDataSource would have to read a different class to find out how the tree is walked. -Maybe in the future this is something that needs to be customised, but we haven't really made it any harder to do by *not* doing this prematurely -and once we have the concrete feature request, it will be easier to design it appropriately. - -> All of this is a judgement. -> For example, in this case, perhaps it *would* make sense to at least pull the file parsing out into a separate -> class, but not have the CSVDataSource be configurable. -> That way, it is clear to see how the file tree is being walked (there's no polymorphism going on) -> without mixing the *parsing* code in with the file finding code. -> There are no right answers, just guidelines. -{: .callout} - -> ## Exercise: Applying to real world examples -> Thinking about the examples of good and bad code you identified at the start of the episode. -> Identify what kind of principles were and weren't being followed -> Identify some refactorings that could be performed that would improve the code -> Discuss the ideas as a group. -{: .challenge} - -## Conclusion - -Good architecture is not about applying any rules blindly, but instead practise and taking care around important things: - -* Avoid duplication of code or data. -* Keeping how much a person has to understand at once to a minimum. -* Think about how interfaces will work. -* Separate different considerations into different sections of the code. -* Don't try and design a future proof solution, focus on the problem at hand. - -Practise makes perfect. -One way to practise is to consider code that you already have and think how it might be redesigned. 
-Another way is to always try to leave code in a better state that you found it. -So when you're working on a less well structured part of the code, start by refactoring it so that your change fits in cleanly. -Doing this, over time, with your colleagues, will improve your skills as software architecture as well as improving the code. From 05fee5be7fae195326221612d8b6db96d42cf7ec Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 15:11:33 +0100 Subject: [PATCH 57/82] Use consistent langauge - responsibilties - when talking about parts of code --- _episodes/35-refactoring-architecture.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md index 6c72b6fd9..6d6087ddd 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-refactoring-architecture.md @@ -19,16 +19,16 @@ keypoints: ## Introduction -Model-View-Controller (MVC) is a way of separating out different portions of a typical +Model-View-Controller (MVC) is a way of separating out different responsibilities of a typical application. Specifically we have: -* The **model** which contains the internal data representations for the program, and the valid - operations that can be performed on it. +* The **model** which is responsible for the internal data representations for the program, + and the valid operations that can be performed on it. * The **view** is responsible for how this data is presented to the user (e.g. through a GUI or by writing out to a file) -* The **controller** defines how the model can be interacted with. +* The **controller** is responsible for how the model can be interacted with. -Separating out these different sections into different parts of the code will make +Separating out these different responsibilities into different parts of the code will make the code much more maintainable. For example, if the view code is kept away from the model code, then testing the model code can be done without having to worry about how it will be presented. @@ -39,7 +39,7 @@ just one thing. It also helps with maintainability - if the UI requirements change, these changes are easily isolated from the more complex logic. -## Separating out considerations +## Separating out responsibilities The key thing to take away from MVC is the distinction between model code and view code. From dfcf219b5aad5b2882626276e369fcf472eea7bb Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 15:21:22 +0100 Subject: [PATCH 58/82] Improve the flow of the start of the classes section Introduce a problem that classes will solve. Use consistent circle example all through. Make header more accurate. Remove benifits of using classes - we are introducing a big benifit, don't want to muddy the waters with other benifits. --- _episodes/34-refactoring-decoupled-units.md | 37 +++++++-------------- 1 file changed, 12 insertions(+), 25 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index b26a1ef7f..63d059178 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -1,6 +1,6 @@ --- title: "Using classes to de-couple code." -teaching: 35 +teaching: 30 exercises: 55 questions: - "What is de-coupled code?" @@ -64,19 +64,22 @@ then it becomes easier for these parts to change independently. 
> {: .solution} {: .challenge} -## Using classes to encapsulate data and behaviours +Even with this change, the file loading is coupled with the data analysis. +For example, if we wave to support reading JSON files or CSV files +we would have to pass into `analyse_data` some kind of flag indicating what we want. -Abstractedly, we can talk about units of code, where we are thinking of the unit doing one "thing". -In practise, in Python there are three ways we can create defined units of code. -The first is functions, which we have used. -The next level up is **classes**. -Finally, there are also modules and packages, which we won't cover. +Instead, we would like to decouple the consideration of what data to load +from the `analyse_data`` function entirely. + +One way we can do this is to use a language feature called a **class**. + +## Using Python Classes A class is a way of grouping together data with some specific methods. In Python, you can declare a class as follows: ```python -class MyClass: +class Circle: pass ``` @@ -85,7 +88,7 @@ They are typically named using `UpperCase`. You can then **construct** a class elsewhere in your code by doing the following: ```python -my_class = MyClass() +my_circle = Circle() ``` When you construct a class in this ways, the classes **construtor** is called. @@ -125,22 +128,6 @@ Here the instance of the class, `my_circle` will be automatically passed in as the first parameter when calling `get_area`. Then the method can access the **member variable** `radius`. -Classes have a number of uses. - -* Encapsulating data - such as grouping three numbers together into a Vector class -* Maintaining invariants - perhaps when storing a file path it only makes sense for that to resolve to a valid file - by storing the string in a class with a method for setting it (a **setter**), that method can validate the new value before updating the value. -* Encapsulating behaviour - such as a class representing a UI state, modifying some value will automatically - force the relevant portion of the UI to be updated. - -> ## Maintaining Invariants -> Maintaining invariants can be a really powerful tool in debugging. -> Without invariants, you can find bugs where some data is in an invalid -> state, but the problem only appears when you try to use the data. -> This makes it hard to track down the cause of the bug. -> By using classes to maintain invariants, you can force the issue -> to appear when the invalid data is set, that is, the source of the bug. -{: .callout} - > ## Exercise: Use a class to configure loading > Put the `load_inflammation_data` function we wrote in the last exercise as a member method > of a new class called `CSVDataSource`. From 1de7c868479bd0cb68979e6b1306967367b2c74e Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:16:35 +0100 Subject: [PATCH 59/82] Use correct name for CSVDataSource --- _episodes/34-refactoring-decoupled-units.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 63d059178..5514e3820 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -304,7 +304,7 @@ to the relevant class. As we saw with the `Circle` and `Square` examples, we can use interfaces and polymorphism to provide different implementations of the same interface. 
-For example, we could replace our `CSVReader` with a class that reads a totally different format, +For example, we could replace our `CSVDataSource` with a class that reads a totally different format, or reads from an external service. All of these can be added in without changing the analysis. Further - if we want to write a new analysis, we can support any of these data sources @@ -352,7 +352,7 @@ That is, we have decoupled the job of loading the data from the job of analysing We can use this abstraction to also make testing more straight forward. Instead of having our tests use real file system data, we can instead provide a mock or dummy implementation of the `InflammationDataSource` that just returns some example data. -Separately, we can test the file parsing class `CSVReader` without having to understand +Separately, we can test the file parsing class `CSVDataSource` without having to understand the specifics of the statistical analysis. An convenient way to do this in Python is using Mocks. From 1f03d9ea68d3bc25e3fa37eb83a83249b65a757b Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:17:52 +0100 Subject: [PATCH 60/82] Make the interfaces section not use real interfaces Since Python doesn't really have interfaces, and most of the benifits of having interfaces are not supported by Python, this needlessly complicates the lesson. Instead talking about common interfaces for different classes. --- _episodes/34-refactoring-decoupled-units.md | 120 +++++++------------- 1 file changed, 41 insertions(+), 79 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 5514e3820..6a20530ed 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -1,7 +1,7 @@ --- title: "Using classes to de-couple code." teaching: 30 -exercises: 55 +exercises: 45 questions: - "What is de-coupled code?" - "When is it useful to use classes to structure code?" @@ -200,93 +200,53 @@ These allow separate systems to communicate with each other - such as a making a to Google Maps to find the latitude and longitude of an address. However, there are internal interfaces within our software that dictate how -different units of the system interact with each other. +different parts of the system interact with each other. Even if these aren't thought out or documented, they still exist! -For example, there is an interface for how the statistical analysis in `analyse_data` -uses the class `CSVDataSource` - the method `load_inflammation_data`, how it should be called -and what it will return. +For example, our `Circle` class implicitly has an interface: +you can call `get_area` on it and it will return a number representing its area. -Interfaces are important to get right - a messy interface will force tighter coupling between -two units in the system. -Unfortunately, it would be an entire course to cover everything to consider in interface design. - -In addition to the abstract notion of an interface, many programming languages -support creating interfaces as a special kind of class. -Python doesn't support this explicitly, but we can still use this feature with -regular classes. -An interface class will define some methods, but not provide an implementation: - -```python -class Shape: - def get_area(): - raise NotImplementedError -``` - -> ## Exercise: Define an interface for your class -> As discussed, there is an interface between the CSVDataSource and the analysis. 
-> Write an interface(that is, a class that defines some empty methods) called `InflammationDataSource` -> that makes this interface explicit. -> Document the format the data will be returned in. +> ## Exercise: Identify the interface between `CSVDataSource` and `analyse_data` +> What is the interface that CSVDataSource has with `analyse_data`. +> Think about what functions `analyse_data` needs to be able to call, +> what parameters they need and what it will return. >> ## Solution ->> ```python ->> class InflammationDataSource: ->> """ ->> An interface for providing a series of inflammation data. ->> """ +>> The interface is the `load_inflammation_data` method. >> ->> def load_inflammation_data(self): ->> """ ->> Loads the data and returns it as a list, where each entry corresponds to one file, ->> and each entry is a 2D array with patients inflammation by day. ->> :returns: A list where each entry is a 2D array of patient inflammation results by day ->> """ ->> raise NotImplementedError ->> ``` +>> It takes no parameters. +>> +>> It returns a list where each entry is a 2D array of patient inflammation results by day +>> Any object we pass into `analyse_data` must conform to this interface. > {: .solution} {: .challenge} -An interface on its own is not useful - it cannot be instantiated. -The next step is to create a class that **implements** the interface. -That is, create a class that inherits from the interface and then provide -implementations of all the methods on the interface. -To return to our `Shape` interface, we can write classes that implement this -interface, with different implementations: +## Polymorphism -```python -class Circle(Shape): - ... - def get_area(self): - return math.pi * self.radius * self.radius +It is possible to design multiple classes that each conform to the same interface. +For example, we could provide a `Rectangle` class: + +```python class Rectangle(Shape): - ... + def __init__(self, width, height): + self.width = width + self.height = height def get_area(self): return self.width * self.height ``` -As you can see, by putting `ShapeInterface`` in brackets after the class -we are saying a `Circle` **is a** `Shape`. - -> ## Exercise: Implement the interface -> Modify the existing class to implement the interface. -> Ensure the method matches up exactly to the interface. ->> ## Solution ->> We can create a class that implements `load_inflammation_data`. ->> We can lift the code into this new class. ->> ->> ```python ->> class CSVDataSource(InflammationDataSource): ->> ``` -> {: .solution} -{: .challenge} - -## Polymorphism +Like `Circle`, this class provides a `get_area` method. +The method takes the same number of parameters (none), and returns a number. +However, the implementation is different. -Where this gets useful is by using a concept called **polymorphism** -which is a fancy way of saying we can use an instance of a class and treat -it as a `Shape`, without worrying about whether it is a `Circle` or a `Rectangle`. +When classes share an interface, then we can use an instance of a class without +knowing what specific class is being used. +When we do this, it is called **polymorphism**. +Here is an example where we create a list of shapes (either Circles or Rectangles) +and can then find the total area. +Note how we call `get_area` and Python is able to call the appropriate `get_area` +for each of the shapes. ```python my_circle = Circle(radius=10) @@ -301,8 +261,8 @@ to the relevant class. 
### How polymorphism is useful -As we saw with the `Circle` and `Square` examples, we can use interfaces and polymorphism -to provide different implementations of the same interface. +As we saw with the `Circle` and `Square` examples, we can use common interfaces and polymorphism +to abstract away the details of the implementation from the caller. For example, we could replace our `CSVDataSource` with a class that reads a totally different format, or reads from an external service. @@ -313,13 +273,13 @@ That is, we have decoupled the job of loading the data from the job of analysing > ## Exercise: Introduce an alternative implementation of DataSource > Create another class that repeatedly asks the user for paths to CSVs to analyse. -> It should inherit from the interface and implement the `load_inflammation_data` method. +> It should implement the `load_inflammation_data` method. > Finally, at run time provide an instance of the new implementation if the user hasn't > put any files on the path. >> ## Solution >> You should have created a class that looks something like: >> ```python ->> class UserProvidSpecificFilesDataSource(InflammationDataSource): +>> class UserProvidSpecificFilesDataSource: >> def load_inflammation_data(self): >> paths = [] >> while(True): @@ -351,12 +311,14 @@ That is, we have decoupled the job of loading the data from the job of analysing We can use this abstraction to also make testing more straight forward. Instead of having our tests use real file system data, we can instead provide -a mock or dummy implementation of the `InflammationDataSource` that just returns some example data. +a mock or dummy implementation instead of one of the DataSource classes. +This dummy implementation could just returns some fixed example data. Separately, we can test the file parsing class `CSVDataSource` without having to understand the specifics of the statistical analysis. -An convenient way to do this in Python is using Mocks. -These are a whole topic to themselves - but a basic mock can be constructed using a couple of lines of code: +An convenient way to do this in Python is using Python's [mock object library](https://docs.python.org/3/library/unittest.mock.html). +These are a whole topic to themselves - +but a basic mock can be constructed using a couple of lines of code: ```python from unittest.mock import Mock @@ -372,7 +334,7 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be ` > ## Exercise: Test using a mock or dummy implementation -> Create a mock for the `InflammationDataSource` that returns some fixed data to test +> Create a mock for to provide as the `data_source` that returns some fixed data to test > the `analyse_data` method. > Use this mock in a test. >> ## Solution From 0df2ed8706da25089604c78013eede3435a61383 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:20:51 +0100 Subject: [PATCH 61/82] Remove ... from example solution Since all we are omiting is the docstring, we can leave that as implict and make it clear there is no code before the load call --- _episodes/34-refactoring-decoupled-units.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 6a20530ed..fa07ae69d 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -54,8 +54,8 @@ then it becomes easier for these parts to change independently. >> This can then be used in the analysis. 
>> ```python >> def analyse_data(data_dir): ->> ... >> data = load_inflammation_data(data_dir) +>> daily_standard_deviation = compute_standard_deviation_by_data(data) >> ... >> ``` >> This is now easier to understand, as we don't need to understand the the file loading @@ -158,8 +158,9 @@ Then the method can access the **member variable** `radius`. >> We have "decoupled" the reading of the data from the statistical analysis. >> ```python >> def analyse_data(data_source): ->> ... >> data = data_source.load_inflammation_data() +>> daily_standard_deviation = compute_standard_deviation_by_data(data) +>> ... >> ``` >> >> In the controller, you might have something like: From 44883039434f3ff14e380992da73dbb55c6dbe35 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:29:24 +0100 Subject: [PATCH 62/82] Ensure code samples consistent with new order Now MVC comes after classes make sure the examples in classes do not contain the changes done as part of MVC, and that the classes changes are in the MVC examples --- _episodes/34-refactoring-decoupled-units.md | 4 ++-- _episodes/35-refactoring-architecture.md | 21 +++++++++++---------- 2 files changed, 13 insertions(+), 12 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index fa07ae69d..538c9ca5c 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -167,7 +167,7 @@ Then the method can access the **member variable** `radius`. >> >> ```python >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) ->> data_result = analyse_data(data_source) +>> analyse_data(data_source) >> ``` >> While the behaviour is unchanged, how we call `analyse_data` has changed. >> We must update our regression test to match this, to ensure we haven't broken the code: @@ -303,7 +303,7 @@ That is, we have decoupled the job of loading the data from the job of analysing >> data_source = UserProvidSpecificFilesDataSource() >> else: >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) ->> data_result = analyse_data(data_source) +>> analyse_data(data_source) >>``` >> As you have seen, all these changes were made without modifying >> the analysis code itself. 
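For reference, the decoupled pieces shown in the snippets above can be assembled into a single self-contained sketch. The class and function names follow the lesson (`CSVDataSource`, `load_inflammation_data`, `analyse_data`); using `numpy.loadtxt` in place of `models.load_csv`, computing the daily means inline rather than via `models.daily_mean`, and the `data` directory in the wiring example are simplifications and assumptions made only so the sketch runs on its own:

```python
import glob
import os

import numpy as np


class CSVDataSource:
    """Loads all the inflammation CSVs within a specified directory."""

    def __init__(self, dir_path):
        self.dir_path = dir_path

    def load_inflammation_data(self):
        data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.csv'))
        if len(data_file_paths) == 0:
            raise ValueError(f"No inflammation CSVs found in path {self.dir_path}")
        # One 2D array per file: rows are patients, columns are days
        return [np.loadtxt(path, delimiter=',') for path in data_file_paths]


def analyse_data(data_source):
    """Work out the standard deviation of the daily means across all loaded files."""
    data = data_source.load_inflammation_data()
    means_by_day = [np.mean(file_data, axis=0) for file_data in data]
    return np.std(np.stack(means_by_day), axis=0)


if __name__ == '__main__':
    # Controller-style wiring: the choice of data source lives here,
    # not inside the analysis code.
    data_source = CSVDataSource(os.path.join(os.path.dirname(__file__), 'data'))
    print(analyse_data(data_source))
```

The key design point is that `analyse_data` never touches the file system directly, so any object offering `load_inflammation_data` can be swapped in without changing the analysis.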
diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md index 6d6087ddd..fca83749e 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-refactoring-architecture.md @@ -94,10 +94,7 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> Gets all the inflammation csvs within a directory, works out the mean >> inflammation value for each day across all datasets, then graphs the >> standard deviation of these means.""" ->> data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) ->> if len(data_file_paths) == 0: ->> raise ValueError(f"No inflammation csv's found in path {data_dir}") ->> data = map(models.load_csv, data_file_paths) +>> data = data_source.load_inflammation_data() >> daily_standard_deviation = compute_standard_deviation_by_data(data) >> >> return daily_standard_deviation @@ -106,12 +103,16 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> >> ```python >> if args.full_data_analysis: ->> data_result = analyse_data(os.path.dirname(InFiles[0])) ->> graph_data = { ->> 'standard deviation by day': data_result, ->> } ->> views.visualize(graph_data) ->> return +>> if len(InFiles) == 0: +>> data_source = UserProvidSpecificFilesDataSource() +>> else: +>> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> data_result = analyse_data(data_source) +>> graph_data = { +>> 'standard deviation by day': data_result, +>> } +>> views.visualize(graph_data) +>> return >> ``` >> You might notice this is more-or-less the change we did to write our >> regression test. From 54f3c9c83f55554d9b886a3c90b4523ceca56967 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:42:37 +0100 Subject: [PATCH 63/82] Fix formatting of solution regression test --- _episodes/33-refactoring-functions.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 069190b3b..2d9ac2785 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -89,13 +89,13 @@ the tests at all. 
>> from inflammation.compute_data import analyse_data >> path = Path.cwd() / "../data" >> result = analyse_data(path) ->> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211 ->> ,0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094 ->> ,1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312 ->> ,1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578 ->> ,0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417 ->> ,0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707 ->> ,0.50323031,0.47574665,0.45197398,0.22070227] +>> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211, +>> 0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094, +>> 1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312, +>> 1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578, +>> 0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417, +>> 0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707, +>> 0.50323031,0.47574665,0.45197398,0.22070227] >> npt.assert_array_almost_equal(result, expected_output) >> ``` >> From b663efac69a65e4a24f03e15d2410d2040a879f5 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:42:57 +0100 Subject: [PATCH 64/82] Correct name of the regression test to match convention --- _episodes/33-refactoring-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 2d9ac2785..15790411d 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -85,7 +85,7 @@ the tests at all. >> import numpy.testing as npt >> from pathlib import Path >> ->> def test_compute_data(): +>> def test_analyse_data(): >> from inflammation.compute_data import analyse_data >> path = Path.cwd() / "../data" >> result = analyse_data(path) From 9eae6e2ea60bfaa98c40d9ddcb36ed0f5aeb8eeb Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:43:37 +0100 Subject: [PATCH 65/82] Provide a skeleton for the test to make the exercise a bit easier Instead allows the student to focus on observing and then testing current behaviour, rather than getting bogged down in implemenation details --- _episodes/33-refactoring-functions.md | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 15790411d..40a3d3959 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -62,15 +62,25 @@ you're not changing the important behaviour you have to make some small tweaks t the tests at all. > ## Exercise: Write regression tests before refactoring -> Write a regression test to verify we don't break the code when refactoring. -> You will need to modify `analyse_data` to not create a graph and instead -> return the data. +> Add a new test file called `test_compute_data.py` in the tests folder. 
+> Add and complete this regression test to verify the current output of `analyse_data` +> is unchanged by the refactorings we are going to do: +> ```python +> def test_analyse_data(): +> from inflammation.compute_data import analyse_data +> path = Path.cwd() / "../data" +> result = analyse_data(path) > -> Don't forget you can use the `numpy.testing` function `assert_array_equal` to +> # TODO: add an assert for the value of result +> ``` +> Use `assert_array_almost_equal` from the `numpy.testing` library to > compare arrays of floating point numbers. > +> You will need to modify `analyse_data` to not create a graph and instead +> return the data. +> >> ## Hint ->> You might find it helpful to assert the result, observe the test failing +>> You might find it helpful to assert the results equal some made up array, observe the test failing >> and copy and paste the correct result into the test. > {: .solution} > From c3d845c403e9bb383055a6d2eee6eb13ceedd4c7 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:47:36 +0100 Subject: [PATCH 66/82] Provide signature for pure function This should make the exercise clearer. --- _episodes/33-refactoring-functions.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 40a3d3959..f004f9b3a 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -155,7 +155,12 @@ that is maybe harder to test, but is so simple that it only needs a handful of t > ## Exercise: Refactor the function into a pure function > Refactor the `analyse_data` function into a pure function with the logic, and an impure function that handles the input and output. -> The pure function should take in the data, and return the analysis results. +> The pure function should take in the data, and return the analysis results: +> ```python +> def compute_standard_deviation_by_day(data): +> # TODO +> return daily_standard_deviation +> ``` > The "glue" function should maintain the behaviour of the original `analyse_data` > but delegate all the calculations to the new pure function. 
>> ## Solution From dc747dd1b88423b7ac9a1abf966ab0f6ed48e6b0 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:48:01 +0100 Subject: [PATCH 67/82] Use variable name data rather than all_loaded_data for example This matches up with the variable names in the original code, making the refactoring more obvious --- _episodes/33-refactoring-functions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index f004f9b3a..0670de3b7 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -167,8 +167,8 @@ that is maybe harder to test, but is so simple that it only needs a handful of t >> You can move all of the code that does the analysis into a separate function that >> might look something like this: >> ```python ->> def compute_standard_deviation_by_day(all_loaded_data): ->> means_by_day = map(models.daily_mean, all_loaded_data) +>> def compute_standard_deviation_by_day(data): +>> means_by_day = map(models.daily_mean, data) >> means_by_day_matrix = np.stack(list(means_by_day)) >> >> daily_standard_deviation = np.std(means_by_day_matrix, axis=0) From cafc20db3768a0380bb604e14fe3958245aae655 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:48:13 +0100 Subject: [PATCH 68/82] Introduce a header for the testing of pure functions section --- _episodes/33-refactoring-functions.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 0670de3b7..63a321492 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -197,6 +197,8 @@ that is maybe harder to test, but is so simple that it only needs a handful of t > {: .solution} {: .challenge} +### Testing Pure Functions + Now we have a pure function for the analysis, we can write tests that cover all the things we would like tests to cover without depending on the data existing in CSVs. From fa67fab5a3f94da2c28424c3ec4266d0be242b2a Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:55:27 +0100 Subject: [PATCH 69/82] Add a hint showing how the class will be use --- _episodes/34-refactoring-decoupled-units.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 538c9ca5c..95ca16c4e 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -134,6 +134,21 @@ Then the method can access the **member variable** `radius`. > Put the configuration of where to load the files in the classes constructor. > Once this is done, you can construct this class outside the the statistical analysis > and pass the instance in to `analyse_data`. +>> ## Hint +>> When we have completed the refactoring, the code in the `analyse_data` function +>> should look like: +>> ```python +>> def analyse_data(data_source): +>> data = data_source.load_inflammation_data() +>> daily_standard_deviation = compute_standard_deviation_by_data(data) +>> ... 
+>> ``` +>> The controller code should look like: +>> ```python +>> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> analyse_data(data_source) +>> ``` +> {: .solution} >> ## Solution >> You should have created a class that looks something like this: >> From c1eeb61c9dab37bf40816783f9fe89a8cdb23a9b Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:56:00 +0100 Subject: [PATCH 70/82] Remove bit about adding a layer of indirection This isn't really a solution, and I think just muddies the meaning of the section --- _episodes/34-refactoring-decoupled-units.md | 9 --------- 1 file changed, 9 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 95ca16c4e..aaaaa87f7 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -196,15 +196,6 @@ Then the method can access the **member variable** `radius`. >> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211 >> ... >> ``` ->> If this was a more complex refactoring, we could introduce an indirection to keep ->> the interface the same: ->> ```python ->> def analyse_data(dir_path): ->> data_source = CSVDataSource(os.path.dirname(InFiles[0])) ->> return analyse_data_from_source(data_source) ->> ``` ->> This can be a really useful intermediate step if `analyse_data` is called ->> from lots of different places. > {: .solution} {: .challenge} From 51a1f9973ff81e5201402a05ada07d52709fa688 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:58:56 +0100 Subject: [PATCH 71/82] Clarifying mocking section --- _episodes/34-refactoring-decoupled-units.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index aaaaa87f7..af859c675 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -316,12 +316,15 @@ That is, we have decoupled the job of loading the data from the job of analysing > {: .solution} {: .challenge} +## Testing using Mock Objects + We can use this abstraction to also make testing more straight forward. Instead of having our tests use real file system data, we can instead provide -a mock or dummy implementation instead of one of the DataSource classes. +a mock or dummy implementation instead of one of the real classes. +Providing what we substitute conforms to the same interface, the code we are testing will work +just the same. This dummy implementation could just returns some fixed example data. -Separately, we can test the file parsing class `CSVDataSource` without having to understand -the specifics of the statistical analysis. + An convenient way to do this in Python is using Python's [mock object library](https://docs.python.org/3/library/unittest.mock.html). 
These are a whole topic to themselves - From 84edea9a14cbf0c26c5216c682ac35f2d5a0a3dc Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 17:03:51 +0100 Subject: [PATCH 72/82] Add skeleton test for writing the mock test --- _episodes/34-refactoring-decoupled-units.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index af859c675..4072a357a 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -344,6 +344,21 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be ` > ## Exercise: Test using a mock or dummy implementation +> Complete this test for analyse_data, using a mock object in place of the +> `data_source`: +> ```python +> from unittest.mock import Mock +> +> def test_compute_data_mock_source(): +> from inflammation.compute_data import analyse_data +> data_source = Mock() +> +> # TODO: configure data_source mock +> +> result = analyse_data(data_source) +> +> # TODO: add assert on the contents of result +> ``` > Create a mock for to provide as the `data_source` that returns some fixed data to test > the `analyse_data` method. > Use this mock in a test. From 3185e1bb65b5f2b02ca565cd0cf5ff1e95d8a73c Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 17:04:04 +0100 Subject: [PATCH 73/82] Remind students to import the appropriate package --- _episodes/34-refactoring-decoupled-units.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 4072a357a..10048f9a1 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -362,8 +362,12 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be ` > Create a mock for to provide as the `data_source` that returns some fixed data to test > the `analyse_data` method. > Use this mock in a test. +> +> Don't forget you will need to import `Mock` from the `unittest.mock` package. >> ## Solution >> ```python +>> from unittest.mock import Mock +>> >> def test_compute_data_mock_source(): >> from inflammation.compute_data import analyse_data >> data_source = Mock() From f461a0591a60e7baa03ca4968a63b714212c6baa Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 17:43:23 +0100 Subject: [PATCH 74/82] Make example actually build a JSON reader --- _episodes/34-refactoring-decoupled-units.md | 49 +++++++++++++-------- _episodes/35-refactoring-architecture.md | 11 +++-- 2 files changed, 37 insertions(+), 23 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 10048f9a1..bcf271848 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -279,36 +279,47 @@ for free with no further work. That is, we have decoupled the job of loading the data from the job of analysing the data. > ## Exercise: Introduce an alternative implementation of DataSource -> Create another class that repeatedly asks the user for paths to CSVs to analyse. +> Create another class that supports loading JSON instead of CSV. +> There is a function in `models.py` that loads from JSON in the following format: +> ```json +> [ +> { +> "observations": [0, 1] +> }, +> { +> "observations": [0, 2] +> } +> ] +> ``` > It should implement the `load_inflammation_data` method. 
-> Finally, at run time provide an instance of the new implementation if the user hasn't
-> put any files on the path.
+> Finally, at run time construct an appropriate instance based on the file extension.
 >> ## Solution
 >> You should have created a class that looks something like:
 >> ```python
->> class UserProvidSpecificFilesDataSource:
->>     def load_inflammation_data(self):
->>         paths = []
->>         while(True):
->>             input_string = input('Enter path to CSV or press enter to process paths collected: ')
->>             if(len(input_string) == 0):
->>                 print(f'Finished entering input - will process {len(paths)} CSVs')
->>                 break
->>             if os.path.exists(input_string):
->>                 paths.append(input_string)
->>             else:
->>                 print(f'Path {input_string} does not exist, please enter a valid path')
->>
->>         data = map(models.load_csv, paths)
+>> class JSONDataSource:
+>>     """
+>>     Loads all the inflammation JSONs within a specified folder.
+>>     """
+>>     def __init__(self, dir_path):
+>>         self.dir_path = dir_path
+>>
+>>     def load_inflammation_data(self):
+>>         data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.json'))
+>>         if len(data_file_paths) == 0:
+>>             raise ValueError(f"No inflammation JSONs found in path {self.dir_path}")
+>>         data = map(models.load_json, data_file_paths)
 >>         return list(data)
 >> ```
 >> Additionally, in the controller will need to select the appropriate DataSource to
 >> provide to the analysis:
 >>```python
->> if len(InFiles) == 0:
->>     data_source = UserProvidSpecificFilesDataSource()
->> else:
+>> _, extension = os.path.splitext(InFiles[0])
+>> if extension == '.json':
+>>     data_source = JSONDataSource()
+>> elif extension == '.csv':
 >>     data_source = CSVDataSource(os.path.dirname(InFiles[0]))
+>> else:
+>>     raise ValueError(f'Unsupported file format: {extension}')
 >> analyse_data(data_source)
 >>```
 >> As you have seen, all these changes were made without modifying
diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md
index fca83749e..6da37e1bc 100644
--- a/_episodes/35-refactoring-architecture.md
+++ b/_episodes/35-refactoring-architecture.md
@@ -103,11 +103,14 @@ Nevertheless, the MVC approach is a great starting point when thinking about how
 >>
 >> ```python
 >> if args.full_data_analysis:
->>     if len(InFiles) == 0:
->>         data_source = UserProvidSpecificFilesDataSource()
->>     else:
+>>     _, extension = os.path.splitext(InFiles[0])
+>>     if extension == '.json':
+>>         data_source = JSONDataSource()
+>>     elif extension == '.csv':
 >>         data_source = CSVDataSource(os.path.dirname(InFiles[0]))
->>     data_result = analyse_data(data_source)
+>>     else:
+>>         raise ValueError(f'Unsupported file format: {extension}')
+>>     data_result = analyse_data(data_source)
 >>     graph_data = {
 >>         'standard deviation by day': data_result,
 >>     }

From 8ad11f7e2c3e1a96d71d875e8448450b9db9b51b Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Mon, 23 Oct 2023 17:48:17 +0100
Subject: [PATCH 75/82] Fix solutions based on testing

---
 _episodes/34-refactoring-decoupled-units.md | 2 +-
 _episodes/35-refactoring-architecture.md    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md
index bcf271848..8511e0f02 100644
--- a/_episodes/34-refactoring-decoupled-units.md
+++ b/_episodes/34-refactoring-decoupled-units.md
@@ -315,7 +315,7 @@ That is, we have decoupled the job of loading the data from the job of analysing
 >>```python
 >> _, extension = os.path.splitext(InFiles[0])
 >> if extension == '.json':
->>     data_source = 
JSONDataSource() +>> data_source = JSONDataSource(os.path.dirname(InFiles[0])) >> elif extension == '.csv': >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) >> else: diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md index 6da37e1bc..9a805fc41 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-refactoring-architecture.md @@ -105,7 +105,7 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> if args.full_data_analysis: >> _, extension = os.path.splitext(InFiles[0]) >> if extension == '.json': ->> data_source = JSONDataSource() +>> data_source = JSONDataSource(os.path.dirname(InFiles[0])) >> elif extension == '.csv': >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) >> else: From 47b3c81b4db0a1c7d1659e8893a16e8e4baebc14 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 18:00:25 +0100 Subject: [PATCH 76/82] Include the notion of writing tests before refactoring --- _episodes/32-software-design.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 700813f5c..fdf5f1d68 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -122,10 +122,11 @@ unchanged, but the code itself is easier to read, test and extend. When faced with a old piece of code that is hard to work with, that you need to modify, a good process to follow is: -1. Refactor the code in such a way that the new change will slot in cleanly. -2. Make the desired change, which now fits in easily. +1. Have tests that verify the current behaviour +2. Refactor the code in such a way that the new change will slot in cleanly. +3. Make the desired change, which now fits in easily. -Notice, after step 1, the *behaviour* of the code should be totally identical. +Notice, after step 2, the *behaviour* of the code should be totally identical. This allows you to test rigorously that the refactoring hasn't changed/broken anything *before* making the intended change. From a6726147dc6b3ecbee36a12b19b95ade93922997 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 18:02:43 +0100 Subject: [PATCH 77/82] Reiterate running the regression test after each refactor --- _episodes/33-refactoring-functions.md | 2 ++ _episodes/34-refactoring-decoupled-units.md | 2 ++ _episodes/35-refactoring-architecture.md | 2 ++ 3 files changed, 6 insertions(+) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 63a321492..544346128 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -194,6 +194,8 @@ that is maybe harder to test, but is so simple that it only needs a handful of t >> # views.visualize(graph_data) >> return daily_standard_deviation >>``` +>> Ensure you re-run our regression test to check this refactoring has not +>> changed the output of `analyse_data`. > {: .solution} {: .challenge} diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 8511e0f02..8017d1415 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -61,6 +61,8 @@ then it becomes easier for these parts to change independently. 
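For reference, the regression test these patches keep re-running is of roughly this shape (a sketch only: where `CSVDataSource` is imported from depends on where your refactoring placed it, and the expected values are the first few quoted in the solution earlier):

```python
import numpy.testing as npt

from inflammation.compute_data import analyse_data


def test_analyse_data_regression():
    # Assumed import location for the data source class; adjust to your layout.
    from inflammation.compute_data import CSVDataSource

    data_source = CSVDataSource('data')  # folder containing the inflammation*.csv files
    result = analyse_data(data_source)

    # Freezing the current output like this lets each refactoring step be
    # checked against the behaviour the code had before the change.
    npt.assert_allclose(result[:3], [0., 0.22510286, 0.18157299], rtol=1e-5)
```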
>> This is now easier to understand, as we don't need to understand the the file loading >> to read the statistical analysis, and we don't have to understand the statistical analysis >> when reading the data loading. +>> Ensure you re-run our regression test to check this refactoring has not +>> changed the output of `analyse_data`. > {: .solution} {: .challenge} diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md index 9a805fc41..637a04614 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-refactoring-architecture.md @@ -121,6 +121,8 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> regression test. >> This demonstrates that splitting up model code from view code can >> immediately make your code much more testable. +>> Ensure you re-run our regression test to check this refactoring has not +>> changed the output of `analyse_data`. > {: .solution} {: .challenge} From e53f787fc62682a133976db4b198bb45b4747e54 Mon Sep 17 00:00:00 2001 From: Thomas Kiley <138868636+thomaskileyukaea@users.noreply.github.com> Date: Fri, 3 Nov 2023 18:00:34 +0000 Subject: [PATCH 78/82] Correct capitalisation of files Co-authored-by: Matthew --- _episodes/33-refactoring-functions.md | 2 +- _episodes/34-refactoring-decoupled-units.md | 2 +- _episodes/35-refactoring-architecture.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 544346128..42eae41f7 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -1,5 +1,5 @@ --- -title: "Refactoring functions to do just one thing" +title: "Refactoring Functions to Do Just One Thing" teaching: 30 exercises: 20 questions: diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 8017d1415..694c8705f 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -1,5 +1,5 @@ --- -title: "Using classes to de-couple code." +title: "Using Classes to De-Couple Code" teaching: 30 exercises: 45 questions: diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md index 637a04614..a00390828 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-refactoring-architecture.md @@ -1,5 +1,5 @@ --- -title: "Architecting code to separate responsibilities" +title: "Architecting Code to Separate Responsibilities" teaching: 15 exercises: 50 questions: From e7a3e0e44f6170b935d2216ea562bd5b28a0046b Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Fri, 3 Nov 2023 17:57:56 +0000 Subject: [PATCH 79/82] Fix missing fullstop --- _episodes/34-refactoring-decoupled-units.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 694c8705f..a9e82d9a9 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -38,7 +38,7 @@ If one part of the code only uses another part through an appropriate abstractio then it becomes easier for these parts to change independently. > ## Exercise: Decouple the file loading from the computation -> Currently the function is hard coded to load all the files in a directory +> Currently the function is hard coded to load all the files in a directory. 
> Decouple this into a separate function that returns all the files to load >> ## Solution >> You should have written a new function that reads all the data into the format needed From 224ea65c11ef32f93e2482cdcc2bd5dd7fb85ccc Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Fri, 10 Nov 2023 13:26:02 +0000 Subject: [PATCH 80/82] Fix line numbers for solutions based on change to code Adding in the four lines for the "full data analysis" shifts these errors down by four lines. --- _episodes/15-coding-conventions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_episodes/15-coding-conventions.md b/_episodes/15-coding-conventions.md index 550e0feb6..e487dd91a 100644 --- a/_episodes/15-coding-conventions.md +++ b/_episodes/15-coding-conventions.md @@ -438,7 +438,7 @@ because an incorrect comment causes more confusion than no comment at all. >> which is helpfully marking inconsistencies with coding guidelines by underlying them. >> There are a few things to fix in `inflammation-analysis.py`, for example: >> ->> 1. Line 24 in `inflammation-analysis.py` is too long and not very readable. +>> 1. Line 30 in `inflammation-analysis.py` is too long and not very readable. >> A better style would be to use multiple lines and hanging indent, >> with the closing brace `}' aligned either with >> the first non-whitespace character of the last line of list @@ -487,7 +487,7 @@ because an incorrect comment causes more confusion than no comment at all. >> Note how PyCharm is warning us by underlying the whole line. >> >> 4. Only one blank line after the end of definition of function `main` ->> and the rest of the code on line 30 in `inflammation-analysis.py` - +>> and the rest of the code on line 33 in `inflammation-analysis.py` - >> should be two blank lines. >> Note how PyCharm is warning us by underlying the whole line. >> From 8f1937b945f4145fe00baa59def8b117f912f047 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Tue, 12 Dec 2023 13:57:51 +0000 Subject: [PATCH 81/82] Link and typo fixes --- _episodes/30-section3-intro.md | 2 +- _extras/databases.md | 2 +- _extras/persistence.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/_episodes/30-section3-intro.md b/_episodes/30-section3-intro.md index 4bd5bb742..5bfdb39f1 100644 --- a/_episodes/30-section3-intro.md +++ b/_episodes/30-section3-intro.md @@ -134,7 +134,7 @@ within the context of the typical software development process: - How to improve existing code to be more readable, maintainable and testable. - Consider different strategies for writing well designed code, including using **pure functions**, **classes** and **abstractions**. -- How to create, asses and improve software design. +- How to create, assess and improve software design. {% include links.md %} diff --git a/_extras/databases.md b/_extras/databases.md index eed05cd55..5fda791d9 100644 --- a/_extras/databases.md +++ b/_extras/databases.md @@ -16,7 +16,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../35-refactoring-architecture/index.html). +> [Section 3 on software design and development](../35-refactoring-architecture/index.html#additional-material). 
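As an illustration of the hanging-indent style mentioned in the coding-conventions fixes above (the dictionary and its keys are made up for the example, not taken from the project):

```python
view_data = {'average': [1.2, 3.4], 'max': [5.6, 7.8]}

# Hanging indent: each item on its own indented line, with the closing brace
# aligned with the first character of the line that opens the construct.
graph_data = {
    'average': view_data['average'],
    'max': view_data['max'],
}
print(graph_data)
```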
{: .callout} A **database** is an organised collection of data, diff --git a/_extras/persistence.md b/_extras/persistence.md index 340ef540d..b207e0458 100644 --- a/_extras/persistence.md +++ b/_extras/persistence.md @@ -25,7 +25,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../35-refactoring-architecture/index.html). +> [Section 3 on software design and development](../35-refactoring-architecture/index.html#additional-material). {: .callout} Our patient data system so far can read in some data, process it, and display it to people. From d393dc1ab6b0071260880aa9e768ff12d69ee547 Mon Sep 17 00:00:00 2001 From: Douglas Lowe <10961945+douglowe@users.noreply.github.com> Date: Mon, 22 Jan 2024 21:35:44 +0000 Subject: [PATCH 82/82] some spelling corrections --- _episodes/32-software-design.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index fdf5f1d68..145a69b8c 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -37,7 +37,7 @@ Maintainable code is: * Testable through automated tests (like those from [episode 2](../21-automatically-testing-software/index.html)). * Adaptable to new requirements. -Writing code that meets these requirements is hard and takes practise. +Writing code that meets these requirements is hard and takes practice. Further, in most contexts you will already have a piece of code that breaks some (or maybe all!) of these principles. @@ -55,7 +55,7 @@ We will look at: * What abstractions are, and how to pick appropriate ones. * How to take code that is in a bad shape and improve it. - * Best practises to write code in ways that facilitate achieving these goals. + * Best practices to write code in ways that facilitate achieving these goals. ### Cognitive Load @@ -78,13 +78,13 @@ There are lots of ways to keep cognitive load down: An **abstraction**, at its most basic level, is a technique to hide the details of one part of a system from another part of the system. -We deal with abstractions all the time - when you press the break pedal on the +We deal with abstractions all the time - when you press the brake pedal on the car, you do not know how this manages both slowing down the engine and applying -pressure on the breaks. +pressure on the brakes. The advantage of using this abstraction is, when something changes, for example -the introduction of anti-lock breaking or an electric engine, the driver does +the introduction of anti-lock braking or an electric engine, the driver does not need to do anything differently - -the detail of how the car breaks is *abstracted* away from them. +the detail of how the car brakes is *abstracted* away from them. Abstractions are a fundamental part of software. For example, when you write Python code, you are dealing with an
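To make the abstraction idea concrete in code, here is a small sketch in the spirit of the data-source classes from the earlier patches (the class bodies and the fixed data are illustrative stand-ins, not the lesson's real implementations):

```python
class CSVDataSource:
    """Illustrative stand-in: the real class reads inflammation*.csv files."""
    def load_inflammation_data(self):
        return [[0.0, 1.0], [0.0, 2.0]]


class JSONDataSource:
    """Illustrative stand-in: the real class reads inflammation*.json files."""
    def load_inflammation_data(self):
        return [[0.0, 1.0], [0.0, 2.0]]


def count_patients(data_source):
    """Code written against the abstraction: any object providing
    load_inflammation_data() will do - CSV, JSON, or a test mock."""
    return len(data_source.load_inflammation_data())


print(count_patients(CSVDataSource()))   # 2
print(count_patients(JSONDataSource()))  # 2
```

Because both classes expose the same `load_inflammation_data()` method, the consuming code never needs to know which file format sits behind it, which is the same reason `analyse_data` could be left untouched when JSON support was added above.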