From f3d36fb0a25f967d40378bae2ec61de264b679f5 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 9 Aug 2023 14:56:46 +0100 Subject: [PATCH 001/105] Remove old course content and add new pages The top sections are filled out to give an idea of content --- _episodes/32-software-design.md | 253 +----- _episodes/33-programming-paradigms.md | 175 ---- _episodes/33-refactoring-functions | 15 + _episodes/34-functional-programming.md | 825 ------------------ _episodes/34-refactoring-architecture | 15 + _episodes/35-object-oriented-programming.md | 904 -------------------- _episodes/35-refactoring-decoupled-units | 15 + _episodes/36-architecture-revisited.md | 444 ---------- 8 files changed, 49 insertions(+), 2597 deletions(-) delete mode 100644 _episodes/33-programming-paradigms.md create mode 100644 _episodes/33-refactoring-functions delete mode 100644 _episodes/34-functional-programming.md create mode 100644 _episodes/34-refactoring-architecture delete mode 100644 _episodes/35-object-oriented-programming.md create mode 100644 _episodes/35-refactoring-decoupled-units delete mode 100644 _episodes/36-architecture-revisited.md diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 18dbe2ae7..6020472a8 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -4,261 +4,16 @@ teaching: 15 exercises: 30 questions: - "What should we consider when designing software?" -- "How can we make sure the components of our software are reusable?" +- "What goals should we have when structuring our code?" objectives: -- "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software." -- "Understand the components of multi-layer software architectures."
+- "Understand what an abstraction is, and when you should use one" +- "Understand what refactoring is" keypoints: -- "Planning software projects in advance can save a lot of effort and reduce 'technical debt' later - even a partial plan is better than no plan at all." +- "How code is structured is important for helping future developers understand and update it" - "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. Such components can be as small as a single function, or be a software package in their own right." - "When writing software used for research, requirements will almost *always* change." - "*'Good code is written so that is readable, understandable, covered by automated tests, not over complicated and does well what is intended to do.'*" --- -## Introduction - -In this episode, we'll be looking at how we can design our software -to ensure it meets the requirements, -but also retains the other qualities of good software. -As a piece of software grows, -it will reach a point where there's too much code for us to keep in mind at once. -At this point, it becomes particularly important that the software be designed sensibly. -What should be the overall structure of our software, -how should all the pieces of functionality fit together, -and how should we work towards fulfilling this overall design throughout development? - -It's not easy to come up with a complete definition for the term **software design**, -but some of the common aspects are: - -- **Algorithm design** - - what method are we going to use to solve the core business problem? -- **Software architecture** - - what components will the software have and how will they cooperate? -- **System architecture** - - what other things will this software have to interact with and how will it do this? -- **UI/UX** (User Interface / User Experience) - - how will users interact with the software?
- -As usual, the sooner you adopt a practice in the lifecycle of your project, the easier it will be. -So we should think about the design of our software from the very beginning, -ideally even before we start writing code - -but if you didn't, it's never too late to start. - - -The answers to these questions will provide us with some **design constraints** -which any software we write must satisfy. -For example, a design constraint when writing a mobile app would be -that it needs to work with a touch screen interface - -we might have some software that works really well from the command line, -but on a typical mobile phone there isn't a command line interface that people can access. - - -## Software Architecture - -At the beginning of this episode we defined **software architecture** -as an answer to the question -"what components will the software have and how will they cooperate?". -Software engineering borrowed this term, and a few other terms, -from architects (of buildings) as many of the processes and techniques have some similarities. -One of the other important terms we borrowed is 'pattern', -such as in **design patterns** and **architecture patterns**. -This term is often attributed to the book -['A Pattern Language' by Christopher Alexander *et al.*](https://en.wikipedia.org/wiki/A_Pattern_Language) -published in 1977 -and refers to a template solution to a problem commonly encountered when building a system. - -Design patterns are relatively small-scale templates -which we can use to solve problems which affect a small part of our software. -For example, the **[adapter pattern](https://en.wikipedia.org/wiki/Adapter_pattern)** -(which allows a class that does not have the "right interface" to be reused) -may be useful if part of our software needs to consume data -from a number of different external data sources. 
-Using this pattern, -we can create a component whose responsibility is -transforming the calls for data to the expected format, -so the rest of our program doesn't have to worry about it. - -Architecture patterns are similar, -but larger scale templates which operate at the level of whole programs, -or collections of programs. -Model-View-Controller (which we chose for our project) is one of the best known architecture patterns. -Many patterns rely on concepts from Object Oriented Programming, -so we'll come back to the MVC pattern shortly -after we learn a bit more about Object Oriented Programming. - -There are many online sources of information about design and architecture patterns, -often giving concrete examples of cases where they may be useful. -One particularly good source is [Refactoring Guru](https://refactoring.guru/design-patterns). - - -### Multilayer Architecture - -One common architectural pattern for larger software projects is **Multilayer Architecture**. -Software designed using this architecture pattern is split into layers, -each of which is responsible for a different part of the process of manipulating data.
- -Often, the software is split into three layers: - -- **Presentation Layer** - - This layer is responsible for managing the interaction between - our software and the people using it - - May include the **View** components if also using the MVC pattern -- **Application Layer / Business Logic Layer** - - This layer performs most of the data processing required by the presentation layer - - Likely to include the **Controller** components if also using an MVC pattern - - May also include the **Model** components -- **Persistence Layer / Data Access Layer** - - This layer handles data storage and provides data to the rest of the system - - May include the **Model** components of an MVC pattern - if they're not in the application layer - -Although we've drawn similarities here between the layers of a system and the components of MVC, -they're actually solutions to different scales of problem. -In a small application, a multilayer architecture is unlikely to be necessary, -whereas in a very large application, -the MVC pattern may be used just within the presentation layer, -to handle getting data to and from the people using the software. - -## Addressing New Requirements - -So, let's assume we now want to extend our application - -designed around an MVC architecture - with some new functionalities -(more statistical processing and a new view to see a patient's data). -Let's recall the solution requirements we discussed in the previous episode: - -- *Functional Requirements*: - - SR1.1.1 (from UR1.1): - add standard deviation to data model and include in graph visualisation view - - SR1.2.1 (from UR1.2): - add a new view to generate a textual representation of statistics, - which is invoked by an optional command line argument -- *Non-functional Requirements*: - - SR2.1.1 (from UR2.1): - generate graphical statistics report on clinical workstation configuration in under 30 seconds - -### How Should We Test These Requirements? 
- -Sometimes when we make changes to our code that we plan to test later, -we find the way we've implemented that change doesn't lend itself well to how it should be tested. -So what should we do? - -Consider requirement SR1.2.1 - -we have (at least) two things we should test in some way, -for which we could write unit tests. -For the textual representation of statistics, -in a unit test we could invoke our new view function directly -with known inflammation data and test the text output as a string against what is expected. -The second one, invoking this new view with an optional command line argument, -is more problematic since the code isn't structured in a way where -we can easily invoke the argument parsing portion to test it. -To make this more amenable to unit testing we could -move the command line parsing portion to a separate function, -and use that in our unit tests. -So in general, it's a good idea to make sure -your software's features are modularised and accessible via logical functions. - -We could also consider writing unit tests for SR2.1.1, -ensuring that the system meets our performance requirement, so should we? -We do need to verify it's being met with the modified implementation, -however it's generally considered bad practice to use unit tests for this purpose. -This is because unit tests test *if* a given aspect is behaving correctly, -whereas performance tests test *how efficiently* it does it. -Performance testing produces measurements of performance which require a different kind of analysis -(using techniques such as [*code profiling*](https://towardsdatascience.com/how-to-assess-your-code-performance-in-python-346a17880c9f)), -and require careful and specific configurations of operating environments to ensure fair testing. 
-In addition, unit testing frameworks are not typically designed for conducting such measurements, -and only test units of a system, -which doesn't give you an idea of performance of the system -as it is typically used by stakeholders. - -The key is to think about which kind of testing should be used -to check if the code satisfies a requirement, -but also what you can do to make that code amenable to that type of testing. - -> ## Exercise: Implementing Requirements -> Pick one of the requirements SR1.1.1 or SR1.2.1 above to implement -> and create an appropriate feature branch - -> e.g. `add-std-dev` or `add-view` from your most up-to-date `develop` branch. -> -> One aspect you should consider first is -> whether the new requirement can be implemented within the existing design. -> If not, how does the design need to be changed to accommodate the inclusion of this new feature? -> Also try to ensure that the changes you make are amenable to unit testing: -> is the code suitably modularised -> such that the aspect under test can be easily invoked -> with test input data and its output tested? -> -> If you have time, feel free to implement the other requirement, or invent your own! -> -> Also make sure you push changes to your new feature branch remotely -> to your software repository on GitHub. -> -> **Note: do not add the tests for the new feature just yet - -> even though you would normally add the tests along with the new code, -> we will do this in a later episode. -> Equally, do not merge your changes to the `develop` branch just yet.** -> -> **Note 2: we have intentionally left this exercise without a solution -> to give you more freedom in implementing it how you see fit. -> If you are struggling with adding a new view and command line parameter, -> you may find the standard deviation requirement easier. 
-> A later episode in this section will look at -> how to handle command line parameters in a scalable way.** -{: .challenge} - -## Best Practices for 'Good' Software Design - -Aspirationally, what makes good code can be summarised in the following quote from the -[Intent HQ blog](https://intenthq.com/blog/it-audience/what-is-good-code-a-scientific-definition/): - -> *“Good code is written so that is readable, understandable, -> covered by automated tests, not over complicated -> and does well what is intended to do.”* - -By taking time to design our software to be easily modifiable and extensible, -we can save ourselves a lot of time later when requirements change. -The sooner we do this the better - -ideally we should have at least a rough design sketched out for our software -before we write a single line of code. -This design should be based around the structure of the problem we're trying to solve: -what are the concepts we need to represent -and what are the relationships between them. -And importantly, who will be using our software and how will they interact with it? - -Here's another way of looking at it. - -Not following good software design and development practices -can lead to accumulated 'technical debt', -which (according to [Wikipedia](https://en.wikipedia.org/wiki/Technical_debt)), -is the "cost of additional rework caused by choosing an easy (limited) solution now -instead of using a better approach that would take longer". -So, the pressure to achieve project goals can sometimes lead to quick and easy solutions, -which make the software become -more messy, more complex, and more difficult to understand and maintain. -The extra effort required to make changes in the future is the interest paid on the (technical) debt.
-It's natural for software to accrue some technical debt, -but it's important to pay off that debt during a maintenance phase - -simplifying, clarifying the code, making it easier to understand - -to keep these interest payments on making changes manageable. -If this isn't done, the software may accrue too much technical debt, -and it can become too messy and prohibitive to maintain and develop, -and then it cannot evolve. - -Importantly, there is only so much time available. -How much effort should we spend on designing our code properly -and using good development practices? -The following [XKCD comic](https://xkcd.com/844/) summarises this tension: - -![Writing good code comic](../fig/xkcd-good-code-comic.png){: .image-with-shadow width="400px" } - -At an intermediate level there are a wealth of practices that *could* be used, -and applying suitable design and coding practices is what separates -an *intermediate developer* from someone who has just started coding. -The key for an intermediate developer is to balance these concerns -for each software project appropriately, -and employ design and development practices *enough* so that progress can be made. -It's very easy to under-design software, -but remember it's also possible to over-design software too. - {% include links.md %} diff --git a/_episodes/33-programming-paradigms.md b/_episodes/33-programming-paradigms.md deleted file mode 100644 index 520708b54..000000000 --- a/_episodes/33-programming-paradigms.md +++ /dev/null @@ -1,175 +0,0 @@ ---- -title: "Programming Paradigms" -start: false -teaching: 10 -exercises: 0 -questions: -- "How does the structure of a problem affect the structure of our code?" -- "How can we use common software paradigms to improve the quality of our software?" -objectives: -- "Describe some of the major software paradigms we can use to classify programming languages." -keypoints: -- "A software paradigm describes a way of structuring or reasoning about code." 
-- "Different programming languages are suited to different paradigms." -- "Different paradigms are suited to solving different classes of problems." -- "A single piece of software will often contain instances of multiple paradigms." ---- - -## Introduction - -As you become more experienced in software development it becomes increasingly important -to understand the wider landscape in which you operate, -particularly in terms of the software decisions the people around you made and why. -Today, there are a multitude of different programming languages, -with each supporting at least one way to approach a problem and structure your code. -In many cases, particularly with modern languages, -a single language can allow many different structural approaches within your code. - -One way to categorise these structural approaches is into **paradigms**. -Each paradigm represents a slightly different way of thinking about and structuring our code -and each has certain strengths and weaknesses when used to solve particular types of problems. -Once your software begins to get more complex -it's common to use aspects of different paradigms to handle different subtasks. -Because of this, it's useful to know about the major paradigms, -so you can recognise where it might be useful to switch. - -There are two major families that we can group the common programming paradigms into: -**Imperative** and **Declarative**. -An imperative program uses statements that change the program's state - -it consists of commands for the computer to perform -and focuses on describing **how** a program operates step by step. -A declarative program expresses the logic of a computation -to describe **what** should be accomplished -rather than describing its control flow as a sequence of steps. - -We will look into three major paradigms -from the imperative and declarative families that may be useful to you - -**Procedural Programming**, **Functional Programming** and **Object-Oriented Programming**.
-Note, however, that most of the languages can be used with multiple paradigms, -and it is common to see multiple paradigms within a single program - -so this classification of programming languages based on the paradigm they use isn't as strict. - -## Procedural Programming - -Procedural Programming comes from a family of paradigms known as the Imperative Family. -With paradigms in this family, we can think of our code as the instructions for processing data. - -Procedural Programming is probably the style you're most familiar with -and the one we used up to this point, -where we group code into -*procedures performing a single task, with exactly one entry and one exit point*. -In most modern languages we call these **functions**, instead of procedures - -so if you're grouping your code into functions, this might be the paradigm you're using. -By grouping code like this, we make it easier to reason about the overall structure, -since we should be able to tell roughly what a function does just by looking at its name. -These functions are also much easier to reuse than code outside of functions, -since we can call them from any part of our program. - -So far we have been using this technique in our code - -it contains a list of instructions that execute one after the other starting from the top. -This is an appropriate choice for smaller scripts and software -that we're writing just for a single use. -Aside from smaller scripts, Procedural Programming is also commonly seen -in code focused on high performance, with relatively simple data structures, -such as in High Performance Computing (HPC). -These programs tend to be written in C (which doesn't support Object Oriented Programming) -or Fortran (which didn't until recently). -HPC code is also often written in C++, -but C++ code would more commonly follow an Object Oriented style, -though it may have procedural sections. 
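As a quick illustrative sketch of the procedural style just described (a hypothetical example of ours - the function names are not part of the lesson code), note how the program is simply a sequence of steps, with state updated inside small, single-purpose functions:

```python
# A minimal procedural-style sketch (hypothetical example):
# small procedures, explicit iteration, state updated step by step.

def read_measurements():
    # A real script might read these from a file; fixed data keeps the sketch runnable.
    return [2.0, 4.0, 6.0, 8.0]

def mean(values):
    total = 0.0
    for value in values:  # explicit loop, updating the local variable 'total'
        total += value
    return total / len(values)

measurements = read_measurements()
print(mean(measurements))  # prints 5.0
```

Each function does one task and can be called from anywhere in the program, which is exactly what makes this style easy to follow in small scripts.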
- -Note that you may sometimes hear people refer to this paradigm as "functional programming" -to contrast it with Object Oriented Programming, -because it uses functions rather than objects, -but this is incorrect. -Functional Programming is a separate paradigm that -places much stronger constraints on the behaviour of a function -and structures the code differently as we'll see soon. - -## Functional Programming - -Functional Programming comes from a different family of paradigms - -known as the Declarative Family. -The Declarative Family is a distinct set of paradigms -which have a different outlook on what a program is - -here code describes *what* data processing should happen. -What we really care about here is the outcome - how this is achieved is less important. - -Functional Programming is built around -a more strict definition of the term **function** borrowed from mathematics. -A function in this context can be thought of as -a mapping that transforms its input data into output data. -Anything a function does other than produce an output is known as a **side effect** -and should be avoided wherever possible. - -Being strict about this definition allows us to -break down the distinction between **code** and **data**, -for example by writing a function which accepts and transforms other functions - -in Functional Programming *code is data*. - -The most common application of Functional Programming in research is in data processing, -especially when handling **Big Data**. -One popular definition of Big Data is -data which is too large to fit in the memory of a single computer, -with a single dataset sometimes being multiple terabytes or larger. -With datasets like this, we can't move the data around easily, -so we often want to send our code to where the data is instead. 
-By writing our code in a functional style, -we also gain the ability to run many operations in parallel -as it's guaranteed that each operation won't interact with any of the others - -this is essential if we want to process this much data in a reasonable amount of time. - -## Object Oriented Programming - -Object Oriented Programming focuses on the specific characteristics of each object -and what each object can do. -An object has two fundamental parts - properties (characteristics) and behaviours. -In Object Oriented Programming, -we first think about the data and the things that we're modelling - and represent these by objects. - -For example, if we're writing a simulation for our chemistry research, -we're probably going to need to represent atoms and molecules. -Each of these has a set of properties which we need to know about -in order for our code to perform the tasks we want - -in this case, for example, we often need to know the mass and electric charge of each atom. -So with Object Oriented Programming, -we'll have some **object** structure which represents an atom and all of its properties, -another structure to represent a molecule, -and a relationship between the two (a molecule contains atoms). -This structure also provides a way for us to associate code with an object, -representing any **behaviours** it may have. -In our chemistry example, this could be our code for calculating the force between a pair of atoms. - -Most people would classify Object Oriented Programming as an -[extension of the Imperative family of languages](https://www.digitalocean.com/community/tutorials/functional-imperative-object-oriented-programming-comparison) -(with the extra feature being the objects), but -[others disagree](https://stackoverflow.com/questions/38527078/what-is-the-difference-between-imperative-and-object-oriented-programming). - -> ## So Which one is Python? -> Python is a multi-paradigm and multi-purpose programming language. 
-> You can use it as a procedural language and you can use it in a more object oriented way. -> It does tend to land more on the object oriented side as all its core data types -> (strings, integers, floats, booleans, lists, -> sets, arrays, tuples, dictionaries, files) -> as well as functions, modules and classes are objects. -> -> Since functions in Python are also objects that can be passed around like any other object, -> Python is also well suited to functional programming. -> One of the most popular Python libraries for data manipulation, -> [Pandas](https://pandas.pydata.org/) (built on top of NumPy), -> supports a functional programming style -> as most of its functions on data are not changing the data (no side effects) -> but producing new data to reflect the result of the function. -{: .callout} - -## Other Paradigms - -The three paradigms introduced here are some of the most common, -but there are many others which may be useful for addressing specific classes of problem - -for much more information see Wikipedia's page on -[programming paradigms](https://en.wikipedia.org/wiki/Programming_paradigm). -Having mainly used Procedural Programming so far, -we will now have a closer look at Functional and Object Oriented Programming paradigms -and how they can affect our architectural design choices. - -{% include links.md %} diff --git a/_episodes/33-refactoring-functions b/_episodes/33-refactoring-functions new file mode 100644 index 000000000..aa240023c --- /dev/null +++ b/_episodes/33-refactoring-functions @@ -0,0 +1,15 @@ +--- +title: "Refactoring functions to do just one thing" +teaching: 0 +exercises: 0 +questions: +- "How do you refactor code without breaking it?" +- "How do you write code that is easy to test?" +objectives: +- "Understand how to refactor functions to be easier to test" +- "Be able to write regression tests to avoid breaking existing code" +- "Understand what a pure function is."
+keypoints: +- "Refactoring code into pure functions that act on data makes the code easier to test." +- "Writing tests before you refactor gives you confidence that your refactoring hasn't broken anything." +--- diff --git a/_episodes/34-functional-programming.md b/_episodes/34-functional-programming.md deleted file mode 100644 index 45431b994..000000000 --- a/_episodes/34-functional-programming.md +++ /dev/null @@ -1,825 +0,0 @@ ---- -title: "Functional Programming" -teaching: 30 -exercises: 30 -questions: -- What is functional programming? -- Which situations/problems is functional programming well suited for? -objectives: -- Describe the core concepts that define the functional programming paradigm -- Describe the main characteristics of code that is written in functional programming style -- Learn how to generate and process data collections efficiently using MapReduce and Python's comprehensions -keypoints: -- Functional programming is a programming paradigm where programs are constructed by applying and composing smaller and simple functions into more complex ones (which describe the flow of data within a program as a sequence of data transformations). -- In functional programming, functions tend to be *pure* - they do not exhibit *side-effects* (by not affecting anything other than the value they return or anything outside a function). Functions can also be named, passed as arguments, and returned from other functions, just as any other data type. -- MapReduce is an instance of a data generation and processing approach, in particular suited for functional programming and handling Big Data within parallel and distributed environments. -- Python provides comprehensions for lists, dictionaries, sets and generators - a concise (if not strictly functional) way to generate new data from existing data collections while performing sophisticated mapping, filtering and conditional logic on original dataset's members.
---- - -## Introduction - -Functional programming is a programming paradigm where -programs are constructed by applying and composing/chaining **functions**. -Functional programming is based on the -[mathematical definition of a function](https://en.wikipedia.org/wiki/Function_(mathematics)) -`f()`, -which applies a transformation to some input data giving us some other data as a result -(i.e. a mapping from input `x` to output `f(x)`). -Thus, a program written in a functional style becomes a series of transformations on data -which are performed to produce a desired output. -Each function (transformation) taken by itself is simple and straightforward to understand; -complexity is handled by composing functions in various ways. - -Often when we use the term function we are referring to -a construct containing a block of code which performs a particular task and can be reused. -We have already seen this in procedural programming - -so how are functions in functional programming different? -The key difference is that functional programming is focussed on -**what** transformations are done to the data, -rather than **how** these transformations are performed -(i.e. a detailed sequence of steps which update the state of the code to reach a desired state). -Let's compare and contrast examples of these two programming paradigms. - -## Functional vs Procedural Programming - -The following two code examples implement the calculation of a factorial -in procedural and functional styles, respectively. -Recall that the factorial of a number `n` (denoted by `n!`) is calculated as -the product of integer numbers from 1 to `n`. - -The first example provides a procedural style factorial function. - -~~~ -def factorial(n): - """Calculate the factorial of a given number. 
- - :param int n: The number to calculate the factorial of - :return: The resultant factorial - """ - if n < 0: - raise ValueError('Only use non-negative integers.') - - factorial = 1 - for i in range(1, n + 1): # iterate from 1 to n - # save intermediate value to use in the next iteration - factorial = factorial * i - - return factorial -~~~ -{: .language-python} - -Functions in procedural programming are *procedures* that describe -a detailed list of instructions to tell the computer what to do step by step -and how to change the state of the program and advance towards the result. -They often use *iteration* to repeat a series of steps. -Functional programming, on the other hand, typically uses *recursion* - -an ability of a function to call/repeat itself until a particular condition is reached. -Let's see how it is used in the functional programming example below -to achieve a similar effect to that of iteration in procedural programming. - -~~~ -# Functional style factorial function -def factorial(n): - """Calculate the factorial of a given number. - - :param int n: The number to calculate the factorial of - :return: The resultant factorial - """ - if n < 0: - raise ValueError('Only use non-negative integers.') - - if n == 0 or n == 1: - return 1 # exit from recursion, prevents infinite loops - else: - return n * factorial(n-1) # recursive call to the same function -~~~ -{: .language-python} - -Note: You may have noticed that both functions in the above code examples have the same signature -(i.e. they take an integer number as input and return its factorial as output). -You could easily swap these equivalent implementations -without changing the way that the function is invoked. -Remember, a single piece of software may well contain instances of multiple programming paradigms - -including procedural, functional and object-oriented - -it is up to you to decide which one to use and when to switch -based on the problem at hand and your personal coding style.
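One way to convince yourself that the two implementations really are interchangeable is to compare their outputs over a range of inputs. A small sketch (the functions are renamed here, purely so both can coexist in one script):

```python
def factorial_iterative(n):
    """Procedural style: iterate, updating local state."""
    if n < 0:
        raise ValueError('Only use non-negative integers.')
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

def factorial_recursive(n):
    """Functional style: recurse until the base case is reached."""
    if n < 0:
        raise ValueError('Only use non-negative integers.')
    if n in (0, 1):
        return 1  # base case ends the recursion
    return n * factorial_recursive(n - 1)

# Same signature, same behaviour - a caller cannot tell them apart.
for n in range(10):
    assert factorial_iterative(n) == factorial_recursive(n)
```

If the loop completes without an assertion error, either implementation can be swapped in without changing any calling code.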
- -Functional computations only rely on the values that are provided as inputs to a function -and not on the state of the program that precedes the function call. -They do not modify data that exists outside the current function, including the input data - -this property is referred to as the *immutability of data*. -This means that such functions do not create any *side effects*, -i.e. do not perform any action that affects anything other than the value they return. -For example: printing text, -writing to a file, -modifying the value of an input argument, -or changing the value of a global variable. -Functions without side effects -that return the same data each time the same input arguments are provided -are called *pure functions*. - -> ## Exercise: Pure Functions -> -> Which of these functions are pure? -> If you're not sure, explain your reasoning to someone else - do they agree? -> -> ~~~ -> def add_one(x): -> return x + 1 -> -> def say_hello(name): -> print('Hello', name) -> -> def append_item_1(a_list, item): -> a_list += [item] -> return a_list -> -> def append_item_2(a_list, item): -> result = a_list + [item] -> return result -> ~~~ -> {: .language-python} -> -> > ## Solution -> > -> > 1. `add_one` is pure - it has no effects other than to return a value and this value will always be the same when given the same inputs -> > 2. `say_hello` is not pure - printing text counts as a side effect, even though it is the clear purpose of the function -> > 3. `append_item_1` is not pure - the argument `a_list` gets modified as a side effect - try this yourself to prove it -> > 4.
`append_item_2` is pure - the result is a new variable, so this time `a_list` does not get modified - again, try this yourself -> {: .solution} -{: .challenge} - -## Benefits of Functional Code - -There are a few benefits we get when working with pure functions: - -- Testability -- Composability -- Parallelisability - -**Testability** indicates how easy it is to test the function - usually meaning unit tests. -It is much easier to test a function if we can be certain that -a particular input will always produce the same output. -If a function we are testing might have different results each time it runs -(e.g. a function that generates random numbers drawn from a normal distribution), -we need to come up with a new way to test it. -Similarly, it can be more difficult to test a function with side effects -as it is not always obvious what the side effects will be, or how to measure them. - -**Composability** refers to the ability to make a new function from a chain of other functions -by piping the output of one as the input to the next. -If a function does not have side effects or non-deterministic behaviour, -then all of its behaviour is reflected in the value it returns. -As a consequence of this, any chain of combined pure functions is itself pure, -so we keep all these benefits when we are combining functions into a larger program. -As an example of this, we could make a function called `add_two`, -using the `add_one` function we already have. - -~~~ -def add_two(x): - return add_one(add_one(x)) -~~~ -{: .language-python} - -**Parallelisability** is the ability for operations to be performed at the same time (independently). -If we know that a function is fully pure and we have got a lot of data, -we can often improve performance by -splitting data and distributing the computation across multiple processors. -The output of a pure function depends only on its input, -so we will get the right result regardless of when or where the code runs. 
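As a small sketch of parallelisability using only the standard library (the `square` helper and the use of a thread pool here are our own illustrative choices, not part of the inflammation project):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x  # pure: output depends only on the input, no side effects

data = list(range(8))

# Sequential evaluation
sequential = list(map(square, data))

# The same computation distributed over a pool of workers.
# Because square() is pure, the results are identical and
# Executor.map() preserves the input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, data))

assert sequential == parallel == [0, 1, 4, 9, 16, 25, 36, 49]
```

For CPU-bound work you would typically distribute across processes rather than threads, but the key point is the same: a pure function gives the right answer regardless of where or when each call runs.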
- -> ## Everything in Moderation -> Despite the benefits that pure functions can bring, -> we should not be trying to use them everywhere. -> Any software we write needs to interact with the rest of the world somehow, -> which requires side effects. -> With pure functions you cannot read any input, write any output, -> or interact with the rest of the world in any way, -> so we cannot usually write useful software using just pure functions. -> Python programs or libraries written in functional style will usually not be -> as extreme as to completely avoid reading input, writing output, -> updating the state of internal local variables, etc.; -> instead, they will provide a functional-appearing interface -> but may use non-functional features internally. -> An example of this is the [Python Pandas library](https://pandas.pydata.org/) -> for data manipulation built on top of NumPy - -> most of its functions appear pure -> as they return new data objects instead of changing existing ones. -{: .callout} - -There are other advantageous properties that can be derived from the functional approach to coding. -In languages which support functional programming, -a function is a *first-class object* like any other object - -not only can you compose/chain functions together, -but functions can be used as inputs to, -passed around or returned as results from other functions -(remember, in functional programming *code is data*). -This is why functional programming is suitable for processing data efficiently - -in particular in the world of Big Data, where code is much smaller than the data, -sending the code to where data is located is cheaper and faster than the other way round. -Let's see how we can do data processing using functional programming. - -## MapReduce Data Processing Approach - -When working with data you will often find that you need to -apply a transformation to each datapoint of a dataset -and then perform some aggregation across the whole dataset. 
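The first-class nature of functions described above can be sketched in a few lines - the helper names (`twice`, `add_one`) are illustrative only:

```python
def add_one(x):
    return x + 1

def twice(f):
    # Takes a function as input and returns a new function as output
    def inner(x):
        return f(f(x))
    return inner

add_two = twice(add_one)          # a function built from another function
operations = [add_one, add_two]   # functions stored in a list, like any data

assert add_two(3) == 5
assert [op(10) for op in operations] == [11, 12]
```

Here `twice` treats `add_one` as data: it receives it as an argument, wraps it, and returns the result for later use.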
-One instance of this data processing approach is known as MapReduce
-and is often applied when processing (but not limited to) Big Data,
-e.g. using tools such as [Spark](https://en.wikipedia.org/wiki/Apache_Spark)
-or [Hadoop](https://hadoop.apache.org/).
-The name MapReduce comes from applying an operation to (mapping) each value in a dataset,
-then performing a reduction operation which
-collects/aggregates all the individual results together to produce a single result.
-MapReduce relies heavily on the composability and parallelisability of functional programming -
-both map and reduce can be done in parallel and on smaller subsets of data,
-before aggregating all intermediate results into the final result.
-
-### Mapping
-`map(f, C)` is a function that takes another function `f()` and a collection `C` of data items as inputs.
-Calling `map(f, C)` applies the function `f(x)` to every data item `x` in the collection `C`
-and returns the resulting values as a new collection of the same size.
-
-This is a simple mapping that takes a list of names and
-returns a list of the lengths of those names using the built-in function `len()`:
-
-~~~
-name_lengths = map(len, ["Mary", "Isla", "Sam"])
-print(list(name_lengths))
-~~~
-{: .language-python}
-~~~
-[4, 4, 3]
-~~~
-{: .output}
-
-This is a mapping that squares every number in the passed collection using an anonymous,
-inline *lambda* expression (a simple one-line expression representing a function):
-
-~~~
-squares = map(lambda x: x * x, [0, 1, 2, 3, 4])
-print(list(squares))
-~~~
-{: .language-python}
-~~~
-[0, 1, 4, 9, 16]
-~~~
-{: .output}
-
-> ## Lambda
-> Lambda expressions are used to create anonymous functions that can be used to
-> write more compact programs by inlining function code.
-> A lambda expression takes any number of input parameters and
-> creates an anonymous function that returns the value of the expression.
-> So, we can use the short, one-line `lambda x, y, z, ...: expression` code
-> instead of defining and calling a named function `f()` as follows:
-> ~~~
-> def f(x, y, z, ...):
->     return expression
-> ~~~
-> {: .language-python}
-> The major distinction between lambda functions and ‘normal’ functions is that
-> lambdas do not have names.
-> We could give a name to a lambda expression if we really wanted to -
-> but at that point we should be using a ‘normal’ Python function instead.
->
-> ~~~
-> # Don't do this
-> add_one = lambda x: x + 1
->
-> # Do this instead
-> def add_one(x):
->     return x + 1
-> ~~~
-> {: .language-python}
-{: .callout}
-
-In addition to using built-in functions or inline anonymous lambda functions,
-we can also pass a named function that we have defined ourselves to the `map()` function.
-
-~~~
-def add_one(num):
-    return num + 1
-
-result = map(add_one, [0, 1, 2])
-print(list(result))
-~~~
-{: .language-python}
-~~~
-[1, 2, 3]
-~~~
-{: .output}
-
-> ## Exercise: Check Inflammation Patient Data Against A Threshold Using Map
-> Write a new function called `daily_above_threshold()` in our inflammation `models.py` that
-> determines whether or not each daily inflammation value for a given patient
-> exceeds a given threshold.
->
-> Given a patient row number in our data, the patient dataset itself, and a given threshold,
-> write the function to use `map()` to generate and return a list of booleans,
-> with each value representing whether or not the daily inflammation value for that patient
-> exceeded the given threshold.
->
-> Ordinarily we would use NumPy's vectorised operations for this kind of task
-> (e.g. comparing a whole row against the threshold in a single expression),
-> but for this exercise, let's try a solution without them.
->
-> > ## Solution
-> > ~~~
-> > def daily_above_threshold(patient_num, data, threshold):
-> >     """Determine whether or not each daily inflammation value exceeds a given threshold for a given patient.
-> >
-> >     :param patient_num: The patient row number
-> >     :param data: A 2D data array with inflammation data
-> >     :param threshold: An inflammation threshold to check each daily value against
-> >     :returns: A boolean list representing whether or not the patient's daily inflammation values exceeded the threshold
-> >     """
-> >
-> >     return list(map(lambda x: x > threshold, data[patient_num]))
-> > ~~~
-> > {: .language-python}
-> >
-> > Note: the `map()` function returns a map iterator object
-> > which needs to be converted to a collection object
-> > (such as a list, dictionary, set, tuple)
-> > using the corresponding "factory" function (in our case `list()`).
-> {: .solution}
{: .challenge}
-
-#### Comprehensions for Mapping/Data Generation
-
-Another way you can generate new collections of data from existing collections in Python is
-using *comprehensions*,
-which are an elegant and concise way of creating data from
-[iterable objects](https://www.w3schools.com/python/python_iterators.asp) using *for loops*.
-While not a pure functional concept,
-comprehensions provide data generation functionality
-and can be used to achieve the same effect as the built-in "pure functional" function `map()`.
-They are commonly used and actually recommended as a replacement for `map()` in modern Python.
-Let's have a look at some examples.
-
-~~~
-integers = range(5)
-double_ints = [2 * i for i in integers]
-
-print(double_ints)
-~~~
-{: .language-python}
-~~~
-[0, 2, 4, 6, 8]
-~~~
-{: .output}
-
-The above example uses a *list comprehension* to double each number in a sequence.
-Notice the similarity between the syntax for a list comprehension and a for loop -
-in effect, this is a for loop compressed into a single line.
-In this simple case, the code above is equivalent to using a map operation on a sequence,
-as shown below:
-
-~~~
-integers = range(5)
-double_ints = map(lambda i: 2 * i, integers)
-print(list(double_ints))
-~~~
-{: .language-python}
-~~~
-[0, 2, 4, 6, 8]
-~~~
-{: .output}
-
-We can also use list comprehensions to filter data, by adding the filter condition to the end:
-
-~~~
-double_even_ints = [2 * i for i in integers if i % 2 == 0]
-print(double_even_ints)
-~~~
-{: .language-python}
-~~~
-[0, 4, 8]
-~~~
-{: .output}
-
-> ## Set and Dictionary Comprehensions and Generators
-> We also have *set comprehensions* and *dictionary comprehensions*,
-> which look similar to list comprehensions
-> but use the set literal and dictionary literal syntax, respectively.
-> ~~~
-> double_even_int_set = {2 * i for i in integers if i % 2 == 0}
-> print(double_even_int_set)
->
-> double_even_int_dict = {i: 2 * i for i in integers if i % 2 == 0}
-> print(double_even_int_dict)
-> ~~~
-> {: .language-python}
-> ~~~
-> {0, 4, 8}
-> {0: 0, 2: 4, 4: 8}
-> ~~~
-> {: .output}
->
-> Finally, there’s one last ‘comprehension’ in Python - a *generator expression* -
-> a type of iterable object which we can take values from and loop over,
-> but which does not actually compute any of the values until we need them.
-> Iterable is the generic term for anything we can loop or iterate over -
-> lists, sets and dictionaries are all iterables.
->
-> The `range` function behaves much like a generator (strictly speaking, it is a lazy sequence rather than a generator) -
-> if we created a `range(1000000000)`, but didn’t iterate over it,
-> we’d find that it takes almost no time to do.
-> Creating a list containing a similar number of values would take much longer,
-> and could be at risk of running out of memory.
->
-> We can build our own generators using a generator expression.
-> These look much like the comprehensions above,
-> but act like a generator when we use them.
-> Note the syntax difference for generator expressions -
-> parentheses are used in place of square or curly brackets.
->
-> ~~~
-> doubles_generator = (2 * i for i in integers)
-> for x in doubles_generator:
->     print(x)
-> ~~~
-> {: .language-python}
-> ~~~
-> 0
-> 2
-> 4
-> 6
-> 8
-> ~~~
-> {: .output}
-{: .callout}
-
-
-Let's now have a look at reducing the elements of a data collection into a single result.
-
-### Reducing
-
-The `reduce(f, C, initialiser)` function accepts a function `f()`,
-a collection `C` of data items
-and an optional `initialiser`,
-and returns a single cumulative value which
-aggregates (reduces) all the values from the collection into a single result.
-The reduction function first applies the function `f()` to the first two values in the collection
-(or to the `initialiser`, if present, and the first item from `C`).
-Then for each remaining value in the collection,
-it takes the result of the previous computation
-and the next value from the collection as the new arguments to `f()`
-until we have processed all of the data and reduced it to a single value.
-For example, if collection `C` has 5 elements, the call `reduce(f, C)` calculates:
-
-~~~
-f(f(f(f(C[0], C[1]), C[2]), C[3]), C[4])
-~~~
-
-One example of reducing would be to calculate the product of a sequence of numbers.
-
-~~~
-from functools import reduce
-
-l = [1, 2, 3, 4]
-
-def product(a, b):
-    return a * b
-
-print(reduce(product, l))
-
-# The same reduction using a lambda function
-print(reduce((lambda a, b: a * b), l))
-~~~
-{: .language-python}
-~~~
-24
-24
-~~~
-{: .output}
-
-Note that `reduce()` is not a built-in function like `map()` -
-you need to import it from the `functools` library.
-
-> ## Exercise: Calculate the Sum of a Sequence of Numbers Using Reduce
-> Using `reduce()`, calculate the sum of a sequence of numbers.
-> Although in practice we would use the built-in `sum()` function for this - try doing it without it.
-> -> > ## Solution -> > ~~~ -> > from functools import reduce -> > -> > l = [1, 2, 3, 4] -> > -> > def add(a, b): -> > return a + b -> > -> > print(reduce(add, l)) -> > -> > # The same reduction using a lambda function -> > print(reduce((lambda a, b: a + b), l)) -> > ~~~ -> > {: .language-python} -> > ~~~ -> > 10 -> > 10 -> > ~~~ -> > {: .output} -> {: .solution} -{: .challenge} - -### Putting It All Together -Let's now put together what we have learned about map and reduce so far -by writing a function that calculates the sum of the squares of the values in a list -using the MapReduce approach. - -~~~ -from functools import reduce - -def sum_of_squares(l): - squares = [x * x for x in l] # use list comprehension for mapping - return reduce(lambda a, b: a + b, squares) -~~~ -{: .language-python} - -We should see the following behaviour when we use it: - -~~~ -print(sum_of_squares([0])) -print(sum_of_squares([1])) -print(sum_of_squares([1, 2, 3])) -print(sum_of_squares([-1])) -print(sum_of_squares([-1, -2, -3])) -~~~ -{: .language-python} -~~~ -0 -1 -14 -1 -14 -~~~ -{: .output} - -Now let’s assume we’re reading in these numbers from an input file, -so they arrive as a list of strings. -We'll modify the function so that it passes the following tests: - -~~~ -print(sum_of_squares(['1', '2', '3'])) -print(sum_of_squares(['-1', '-2', '-3'])) -~~~ -{: .language-python} -~~~ -14 -14 -~~~ -{: .output} - -The code may look like: - -~~~ -from functools import reduce - -def sum_of_squares(l): - integers = [int(x) for x in l] - squares = [x * x for x in integers] - return reduce(lambda a, b: a + b, squares) -~~~ -{: .language-python} - -Finally, like comments in Python, we’d like it to be possible for users to -comment out numbers in the input file they give to our program. 
-We'll finally extend our function so that the following tests pass:
-
-~~~
-print(sum_of_squares(['1', '2', '3']))
-print(sum_of_squares(['-1', '-2', '-3']))
-print(sum_of_squares(['1', '2', '#100', '3']))
-~~~
-{: .language-python}
-~~~
-14
-14
-14
-~~~
-{: .output}
-
-To do so, we may filter out certain elements and have:
-
-~~~
-from functools import reduce
-
-def sum_of_squares(l):
-    integers = [int(x) for x in l if x[0] != '#']
-    squares = [x * x for x in integers]
-    return reduce(lambda a, b: a + b, squares)
-~~~
-{: .language-python}
-
-> ## Exercise: Extend Inflammation Threshold Function Using Reduce
-> Extend the `daily_above_threshold()` function you wrote previously
-> to return a count of the number of days a patient's inflammation is over the threshold.
-> Use `reduce()` over the boolean array that was previously returned to generate the count,
-> then return that value from the function.
->
-> You may choose to define a separate function to pass to `reduce()`,
-> or use an inline lambda expression to do it (which is a bit trickier!).
->
-> Hints:
-> - Remember that you can define an `initialiser` value with `reduce()`
->   to help you start the counter
-> - If defining a lambda expression,
->   note that it can conditionally return different values using the syntax
->   `value if condition else alternative` in the expression.
->
-> > ## Solution
-> > Using a separate function:
-> > ~~~
-> > def daily_above_threshold(patient_num, data, threshold):
-> >     """Count how many days a given patient's inflammation exceeds a given threshold.
-> > -> > :param patient_num: The patient row number -> > :param data: A 2D data array with inflammation data -> > :param threshold: An inflammation threshold to check each daily value against -> > :returns: An integer representing the number of days a patient's inflammation is over a given threshold -> > """ -> > def count_above_threshold(a, b): -> > if b: -> > return a + 1 -> > else: -> > return a -> > -> > # Use map to determine if each daily inflammation value exceeds a given threshold for a patient -> > above_threshold = map(lambda x: x > threshold, data[patient_num]) -> > # Use reduce to count on how many days inflammation was above the threshold for a patient -> > return reduce(count_above_threshold, above_threshold, 0) -> > ~~~ -> > {: .language-python} -> > -> > Note that the `count_above_threshold` function used by `reduce()` -> > was defined within the `daily_above_threshold()` function -> > to limit its scope and clarify its purpose -> > (i.e. it may only be useful as part of `daily_above_threshold()` -> > hence being defined as an inner function). -> > -> > The equivalent code using a lambda expression may look like: -> > -> > ~~~ -> > from functools import reduce -> > -> > ... -> > -> > def daily_above_threshold(patient_num, data, threshold): -> > """Count how many days a given patient's inflammation exceeds a given threshold. -> > -> > :param patient_num: The patient row number -> > :param data: A 2D data array with inflammation data -> > :param threshold: An inflammation threshold to check each daily value against -> > :returns: An integer representing the number of days a patient's inflammation is over a given threshold -> > """ -> > -> > above_threshold = map(lambda x: x > threshold, data[patient_num]) -> > return reduce(lambda a, b: a + 1 if b else a, above_threshold, 0) -> > ~~~ -> > {: .language-python} -> Where could this be useful? 
-> For example, you may want to define the success criteria for a trial as, say,
-> 80% of patients not exhibiting inflammation on any of the trial days, or some similar metric.
-> {: .solution}
-{: .challenge}
-
-## Decorators
-
-Finally, we will look at one last aspect of Python where functional programming comes in handy.
-As we have seen in the
-[episode on parametrising our unit tests](../22-scaling-up-unit-testing/index.html#parameterising-our-unit-tests),
-a decorator can take a function, modify/decorate it, then return the resulting function.
-This is possible because Python treats functions as first-class objects
-that can be passed around as normal data.
-Here, we discuss decorators in more detail and learn how to write our own.
-Let's look at the following example of how to "decorate" functions.
-
-~~~
-def with_logging(func):
-    """A decorator which adds logging to a function."""
-    def inner(*args, **kwargs):
-        print("Before function call")
-        result = func(*args, **kwargs)
-        print("After function call")
-        return result
-
-    return inner
-
-
-def add_one(n):
-    print("Adding one")
-    return n + 1
-
-# Redefine function add_one by wrapping it within with_logging function
-add_one = with_logging(add_one)
-
-# Another way to redefine a function - using a decorator
-@with_logging
-def add_two(n):
-    print("Adding two")
-    return n + 2
-
-print(add_one(1))
-print(add_two(1))
-~~~
-{: .language-python}
-~~~
-Before function call
-Adding one
-After function call
-2
-Before function call
-Adding two
-After function call
-3
-~~~
-{: .output}
-
-In this example, we see a decorator (`with_logging`)
-and two different syntaxes for applying the decorator to a function.
-The decorator is implemented here as a function which encloses another function.
-Because the inner function (`inner()`) calls the function being decorated (`func()`)
-and returns its result,
-it still behaves like the original function.
-Part of this is the use of `*args` and `**kwargs` - -these allow our decorated function to accept any arguments or keyword arguments -and pass them directly to the function being decorated. -Our decorator in this case does not need to modify any of the arguments, -so we do not need to know what they are. -Any additional behaviour we want to add as part of our decorated function, -we can put before or after the call to the original function. -Here we print some text both before and after the decorated function, -to show the order in which events happen. - -We also see in this example the two different ways in which a decorator can be applied. -The first of these is to use a normal function call (`with_logging(add_one)`), -where we then assign the resulting function back to a variable - -often using the original name of the function, so replacing it with the decorated version. -The second syntax is the one we have seen previously (`@with_logging`). -This syntax is equivalent to the previous one - -the result is that we have a decorated version of the function, -here with the name `add_two`. -Both of these syntaxes can be useful in different situations: -the `@` syntax is more concise if we never need to use the un-decorated version, -while the function-call syntax gives us more flexibility - -we can continue to use the un-decorated function -if we make sure to give the decorated one a different name, -and can even make multiple decorated versions using different decorators. - -> ## Exercise: Measuring Performance Using Decorators -> One small task you might find a useful case for a decorator is -> measuring the time taken to execute a particular function. -> This is an important part of performance profiling. -> -> Write a decorator which you can use to measure the execution time of the decorated function -> using the [time.process_time_ns()](https://docs.python.org/3/library/time.html#time.process_time_ns) function. 
-> There are several different timing functions, each with slightly different use-cases,
-> but we won’t worry about that here.
->
-> For the function to measure, you may wish to use this as an example:
-> ~~~
-> def measure_me(n):
->     total = 0
->     for i in range(n):
->         total += i * i
->
->     return total
-> ~~~
-> {: .language-python}
-> > ## Solution
-> >
-> > ~~~
-> > import time
-> >
-> > def profile(func):
-> >     def inner(*args, **kwargs):
-> >         start = time.process_time_ns()
-> >         result = func(*args, **kwargs)
-> >         stop = time.process_time_ns()
-> >
-> >         print("Took {0} seconds".format((stop - start) / 1e9))
-> >         return result
-> >
-> >     return inner
-> >
-> > @profile
-> > def measure_me(n):
-> >     total = 0
-> >     for i in range(n):
-> >         total += i * i
-> >
-> >     return total
-> >
-> > print(measure_me(1000000))
-> > ~~~
-> > {: .language-python}
-> > ~~~
-> > Took 0.124199753 seconds
-> > 333332833333500000
-> > ~~~
-> > {: .output}
-> {: .solution}
{: .challenge}
diff --git a/_episodes/34-refactoring-architecture b/_episodes/34-refactoring-architecture
new file mode 100644
index 000000000..aa240023c
--- /dev/null
+++ b/_episodes/34-refactoring-architecture
+---
+title: "Refactoring functions to do just one thing"
+teaching: 0
+exercises: 0
+questions:
+- "How do you refactor code without breaking it?"
+- "How do you write code that is easy to test?"
objectives:
+- "Understand how to refactor functions to be easier to test"
+- "Be able to write regression tests to avoid breaking existing code"
+- "Understand what a pure function is."
keypoints:
+- "Refactoring code into pure functions that act on data makes the code easier to test."
+- "Making tests before you refactor gives you confidence that your refactoring hasn't broken anything" +--- diff --git a/_episodes/35-object-oriented-programming.md b/_episodes/35-object-oriented-programming.md deleted file mode 100644 index 01413497a..000000000 --- a/_episodes/35-object-oriented-programming.md +++ /dev/null @@ -1,904 +0,0 @@ ---- -title: "Object Oriented Programming" -teaching: 30 -exercises: 20 -questions: -- "How can we use code to describe the structure of data?" -- "How should the relationships between structures be described?" -objectives: -- "Describe the core concepts that define the object oriented paradigm" -- "Use classes to encapsulate data within a more complex program" -- "Structure concepts within a program in terms of sets of behaviour" -- "Identify different types of relationship between concepts within a program" -- "Structure data within a program using these relationships" -keypoints: -- "Object oriented programming is a programming paradigm based on the concept of classes, which encapsulate data and code." -- "Classes allow us to organise data into distinct concepts." -- "By breaking down our data into classes, we can reason about the behaviour of parts of our data." -- "Relationships between concepts can be described using inheritance (*is a*) and composition (*has a*)." ---- - -## Introduction - -Object oriented programming is a programming paradigm based on the concept of objects, -which are data structures that contain (encapsulate) data and code. -Data is encapsulated in the form of fields (attributes) of objects, -while code is encapsulated in the form of procedures (methods) -that manipulate objects' attributes and define "behaviour" of objects. -So, in object oriented programming, -we first think about the data and the things that we’re modelling - -and represent these by objects - -rather than define the logic of the program, -and code becomes a series of interactions between objects. 
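As a minimal sketch of this idea - data and the code that manipulates it bundled together into an object - consider a toy class (purely illustrative, not part of the inflammation project):

```python
class Counter:
    """Encapsulates a count (data) and the operations on it (behaviour)."""
    def __init__(self):
        self.count = 0  # attribute: the object's state

    def increment(self):
        # method: behaviour that manipulates the object's own state
        self.count += 1
        return self.count

# The program becomes interactions with the object,
# rather than logic operating on loose variables
tally = Counter()
tally.increment()
tally.increment()
assert tally.count == 2
```

Notice that the calling code never touches `count` directly in an unstructured way - it asks the object to update itself.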
- -## Structuring Data - -One of the main difficulties we encounter when building more complex software is -how to structure our data. -So far, we've been processing data from a single source and with a simple tabular structure, -but it would be useful to be able to combine data from a range of different sources -and with more data than just an array of numbers. - -~~~ -data = np.array([[1., 2., 3.], - [4., 5., 6.]]) -~~~ -{: .language-python} - -Using this data structure has the advantage of -being able to use NumPy operations to process the data -and Matplotlib to plot it, -but often we need to have more structure than this. -For example, we may need to attach more information about the patients -and store this alongside our measurements of inflammation. - -We can do this using the Python data structures we're already familiar with, -dictionaries and lists. -For instance, we could attach a name to each of our patients: - -~~~ -patients = [ - { - 'name': 'Alice', - 'data': [1., 2., 3.], - }, - { - 'name': 'Bob', - 'data': [4., 5., 6.], - }, -] -~~~ -{: .language-python} - -> ## Exercise: Structuring Data -> -> Write a function, called `attach_names`, -> which can be used to attach names to our patient dataset. -> When used as below, it should produce the expected output. -> -> If you're not sure where to begin, -> think about ways you might be able to effectively loop over two collections at once. -> Also, don't worry too much about the data type of the `data` value, -> it can be a Python list, or a NumPy array - either is fine. 
-> -> ~~~ -> data = np.array([[1., 2., 3.], -> [4., 5., 6.]]) -> -> output = attach_names(data, ['Alice', 'Bob']) -> print(output) -> ~~~ -> {: .language-python} -> -> ~~~ -> [ -> { -> 'name': 'Alice', -> 'data': [1., 2., 3.], -> }, -> { -> 'name': 'Bob', -> 'data': [4., 5., 6.], -> }, -> ] -> ~~~ -> {: .output} -> -> > ## Solution -> > -> > One possible solution, perhaps the most obvious, -> > is to use the `range` function to index into both lists at the same location: -> > -> > ~~~ -> > def attach_names(data, names): -> > """Create datastructure containing patient records.""" -> > output = [] -> > -> > for i in range(len(data)): -> > output.append({'name': names[i], -> > 'data': data[i]}) -> > -> > return output -> > ~~~ -> > {: .language-python} -> > -> > However, this solution has a potential problem that can occur sometimes, -> > depending on the input. -> > What might go wrong with this solution? -> > How could we fix it? -> > -> > > ## A Better Solution -> > > -> > > What would happen if the `data` and `names` inputs were different lengths? -> > > -> > > If `names` is longer, we'll loop through, until we run out of rows in the `data` input, -> > > at which point we'll stop processing the last few names. -> > > If `data` is longer, we'll loop through, but at some point we'll run out of names - -> > > but this time we try to access part of the list that doesn't exist, -> > > so we'll get an exception. -> > > -> > > A better solution would be to use the `zip` function, -> > > which allows us to iterate over multiple iterables without needing an index variable. -> > > The `zip` function also limits the iteration to whichever of the iterables is smaller, -> > > so we won't raise an exception here, -> > > but this might not quite be the behaviour we want, -> > > so we'll also explicitly `assert` that the inputs should be the same length. 
-> > > Checking that our inputs are valid in this way is an example of a precondition, -> > > which we introduced conceptually in an earlier episode. -> > > -> > > If you've not previously come across the `zip` function, -> > > read [this section](https://docs.python.org/3/library/functions.html#zip) -> > > of the Python documentation. -> > > -> > > ~~~ -> > > def attach_names(data, names): -> > > """Create datastructure containing patient records.""" -> > > assert len(data) == len(names) -> > > output = [] -> > > -> > > for data_row, name in zip(data, names): -> > > output.append({'name': name, -> > > 'data': data_row}) -> > > -> > > return output -> > > ~~~ -> > > {: .language-python} -> > {: .solution} -> {: .solution} -{: .challenge} - -## Classes in Python - -Using nested dictionaries and lists should work for some of the simpler cases -where we need to handle structured data, -but they get quite difficult to manage once the structure becomes a bit more complex. -For this reason, in the object oriented paradigm, -we use **classes** to help with managing this data -and the operations we would want to perform on it. -A class is a **template** (blueprint) for a structured piece of data, -so when we create some data using a class, -we can be certain that it has the same structure each time. - -With our list of dictionaries we had in the example above, -we have no real guarantee that each dictionary has the same structure, -e.g. the same keys (`name` and `data`) unless we check it manually. -With a class, if an object is an **instance** of that class -(i.e. it was made using that template), -we know it will have the structure defined by that class. -Different programming languages make slightly different guarantees -about how strictly the structure will match, -but in object oriented programming this is one of the core ideas - -all objects derived from the same class must follow the same behaviour. 
-
-You may not have realised, but you should already be familiar with
-some of the classes that come bundled as part of Python, for example:
-
-~~~
-my_list = [1, 2, 3]
-my_dict = {1: '1', 2: '2', 3: '3'}
-my_set = {1, 2, 3}
-
-print(type(my_list))
-print(type(my_dict))
-print(type(my_set))
-~~~
-{: .language-python}
-
-~~~
-<class 'list'>
-<class 'dict'>
-<class 'set'>
-~~~
-{: .output}
-
-Lists, dictionaries and sets are a slightly special type of class,
-but they behave in much the same way as a class we might define ourselves:
-
-- They each hold some data (**attributes** or **state**).
-- They also provide some methods describing the behaviours of the data -
-  what can the data do and what can we do to the data?
-
-The behaviours we may have seen previously include:
-
-- Lists can be appended to
-- Lists can be indexed
-- Lists can be sliced
-- Key-value pairs can be added to dictionaries
-- The value at a key can be looked up in a dictionary
-- The union of two sets can be found (the set of values present in any of the sets)
-- The intersection of two sets can be found (the set of values present in all of the sets)
-
-## Encapsulating Data
-
-Let's start with a minimal example of a class representing our patients.
-
-~~~
-# file: inflammation/models.py
-
-class Patient:
-    def __init__(self, name):
-        self.name = name
-        self.observations = []
-
-alice = Patient('Alice')
-print(alice.name)
-~~~
-{: .language-python}
-
-~~~
-Alice
-~~~
-{: .output}
-
-Here we've defined a class with one method: `__init__`.
-This method is the **initialiser** method,
-which is responsible for setting up the initial values and structure of the data
-inside a new instance of the class -
-this is very similar to **constructors** in other languages,
-so the term is often used in Python too.
-The `__init__` method is called every time we create a new instance of the class,
-as in `Patient('Alice')`.
-The argument `self` refers to the instance on which we are calling the method
-and gets filled in automatically by Python -
-we do not need to provide a value for this when we call the method.
-
-Data encapsulated within our Patient class includes
-the patient's name and a list of inflammation observations.
-In the initialiser method,
-we set a patient's name to the value provided,
-and create a list of inflammation observations for the patient (initially empty).
-Such data is also referred to as the attributes of a class
-and holds the current state of an instance of the class.
-Attributes are typically hidden (encapsulated) internal object details,
-ensuring that access to data is protected from unintended changes.
-They are manipulated internally by the class,
-which, in addition, can expose certain functionality as public behaviour of the class
-to allow other objects to interact with this class' instances.
-
-## Encapsulating Behaviour
-
-In addition to representing a piece of structured data
-(e.g. a patient who has a name and a list of inflammation observations),
-a class can also provide a set of functions, or **methods**,
-which describe the **behaviours** of the data encapsulated in the instances of that class.
-To define the behaviour of a class we add functions which operate on the data the class contains.
-These functions are its member functions, or methods.
-
-Methods on classes are the same as normal functions,
-except that they live inside a class and have an extra first parameter `self`.
-Using the name `self` is not strictly necessary, but is a very strong convention -
-it is extremely rare to see any other name chosen.
-When we call a method on an object,
-the value of `self` is automatically set to this object - hence the name.
-As we saw with the `__init__` method previously,
-we do not need to explicitly provide a value for the `self` argument -
-this is done for us by Python.
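One way to demystify `self` is to note that calling a method on an instance is equivalent to calling the plain function on the class and passing the instance explicitly. A quick sketch, using a hypothetical `greeting` method rather than anything from the project code:

```python
class Patient:
    def __init__(self, name):
        self.name = name

    def greeting(self):
        # `self` is whichever instance the method was called on
        return 'Hello, ' + self.name

alice = Patient('Alice')

# These two calls are equivalent - Python fills in `self` for us
print(alice.greeting())         # Hello, Alice
print(Patient.greeting(alice))  # Hello, Alice
```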
- -Let's add another method on our Patient class that adds a new observation to a Patient instance. - -~~~ -# file: inflammation/models.py - -class Patient: - """A patient in an inflammation study.""" - def __init__(self, name): - self.name = name - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - if self.observations: - day = self.observations[-1]['day'] + 1 - else: - day = 0 - - new_observation = { - 'day': day, - 'value': value, - } - - self.observations.append(new_observation) - return new_observation - -alice = Patient('Alice') -print(alice) - -observation = alice.add_observation(3) -print(observation) -print(alice.observations) -~~~ -{: .language-python} - -~~~ -<__main__.Patient object at 0x7fd7e61b73d0> -{'day': 0, 'value': 3} -[{'day': 0, 'value': 3}] -~~~ -{: .output} - -Note also how we used `day=None` in the parameter list of the `add_observation` method, -then initialise it if the value is indeed `None`. -This is one of the common ways to handle an optional argument in Python, -so we'll see this pattern quite a lot in real projects. - -> ## Class and Static Methods -> -> Sometimes, the function we're writing doesn't need access to -> any data belonging to a particular object. -> For these situations, we can instead use a **class method** or a **static method**. -> Class methods have access to the class that they're a part of, -> and can access data on that class - -> but do not belong to a specific instance of that class, -> whereas static methods have access to neither the class nor its instances. -> -> By convention, class methods use `cls` as their first argument instead of `self` - -> this is how we access the class and its data, -> just like `self` allows us to access the instance and its data. -> Static methods have neither `self` nor `cls` -> so the arguments look like a typical free function. -> These are the only common exceptions to using `self` for a method's first argument. 
-> -> Both of these method types are created using **decorators** - -> for more information see -> the [classmethod](https://docs.python.org/3/library/functions.html#classmethod) -> and [staticmethod](https://docs.python.org/3/library/functions.html#staticmethod) -> decorator sections of the Python documentation. -{: .callout} - -### Dunder Methods - -Why is the `__init__` method not called `init`? -There are a few special method names that we can use -which Python will use to provide a few common behaviours, -each of which begins and ends with a **d**ouble-**under**score, -hence the name **dunder method**. - -When writing your own Python classes, -you'll almost always want to write an `__init__` method, -but there are a few other common ones you might need sometimes. -You may have noticed in the code above that the method `print(alice)` -returned `<__main__.Patient object at 0x7fd7e61b73d0>`, -which is the string representation of the `alice` object. -We may want the print statement to display the object's name instead. -We can achieve this by overriding the `__str__` method of our class. - -~~~ -# file: inflammation/models.py - -class Patient: - """A patient in an inflammation study.""" - def __init__(self, name): - self.name = name - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - try: - day = self.observations[-1]['day'] + 1 - - except IndexError: - day = 0 - - - new_observation = { - 'day': day, - 'value': value, - } - - self.observations.append(new_observation) - return new_observation - - def __str__(self): - return self.name - - -alice = Patient('Alice') -print(alice) -~~~ -{: .language-python} - -~~~ -Alice -~~~ -{: .output} - -These dunder methods are not usually called directly, -but rather provide the implementation of some functionality we can use - -we didn't call `alice.__str__()`, -but it was called for us when we did `print(alice)`. 
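Before listing more of these, here is a toy sketch of how dunder methods hook our own classes into built-in syntax (`ObservationSeries` is purely illustrative, not part of the project code):

```python
class ObservationSeries:
    """Toy container showing how dunder methods back built-in syntax."""
    def __init__(self, values):
        self.values = values

    def __len__(self):
        # Called when we write len(series)
        return len(self.values)

    def __getitem__(self, index):
        # Called when we write series[index]
        return self.values[index]

series = ObservationSeries([3, 4, 5])
print(len(series))   # 3
print(series[1])     # 4
```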
-Some we see quite commonly are:
-
-- `__str__` - converts an object into its string representation, used when you call `str(object)` or `print(object)`
-- `__getitem__` - accesses an item by key or index - this is how `list[x]` and `dict[x]` are implemented
-- `__len__` - gets the length of an object when we use `len(object)` - usually the number of items it contains
-
-There are many more described in the Python documentation,
-but it's also worth experimenting with built-in Python objects to
-see which methods provide which behaviour.
-For a more complete list of these special methods,
-see the [Special Method Names](https://docs.python.org/3/reference/datamodel.html#special-method-names)
-section of the Python documentation.
-
-> ## Exercise: A Basic Class
->
-> Implement a class to represent a book.
-> Your class should:
->
-> - Have a title
-> - Have an author
-> - When printed using `print(book)`, show text in the format "title by author"
->
-> ~~~
-> book = Book('A Book', 'Me')
->
-> print(book)
-> ~~~
-> {: .language-python}
->
-> ~~~
-> A Book by Me
-> ~~~
-> {: .output}
->
-> > ## Solution
-> >
-> > ~~~
-> > class Book:
-> >     def __init__(self, title, author):
-> >         self.title = title
-> >         self.author = author
-> >
-> >     def __str__(self):
-> >         return self.title + ' by ' + self.author
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-
-### Properties
-
-The final special type of method we will introduce is a **property**.
-Properties are methods which behave like data -
-when we want to access them, we do not need to use brackets to call the method manually.
-
-~~~
-# file: inflammation/models.py
-
-class Patient:
-    ...
- - @property - def last_observation(self): - return self.observations[-1] - -alice = Patient('Alice') - -alice.add_observation(3) -alice.add_observation(4) - -obs = alice.last_observation -print(obs) -~~~ -{: .language-python} - -~~~ -{'day': 1, 'value': 4} -~~~ -{: .output} - -You may recognise the `@` syntax from episodes on -parameterising unit tests and functional programming - -`property` is another example of a **decorator**. -In this case the `property` decorator is taking the `last_observation` function -and modifying its behaviour, -so it can be accessed as if it were a normal attribute. -It is also possible to make your own decorators, but we won't cover it here. - -## Relationships Between Classes - -We now have a language construct for grouping data and behaviour -related to a single conceptual object. -The next step we need to take is to describe the relationships between the concepts in our code. - -There are two fundamental types of relationship between objects -which we need to be able to describe: - -1. Ownership - x **has a** y - this is **composition** -2. Identity - x **is a** y - this is **inheritance** - -### Composition - -You should hopefully have come across the term **composition** already - -in the novice Software Carpentry, we use composition of functions to reduce code duplication. -That time, we used a function which converted temperatures in Celsius to Kelvin -as a **component** of another function which converted temperatures in Fahrenheit to Kelvin. - -In the same way, in object oriented programming, we can make things components of other things. - -We often use composition where we can say 'x *has a* y' - -for example in our inflammation project, -we might want to say that a doctor *has* patients -or that a patient *has* observations. - -In the case of our example, we're already saying that patients have observations, -so we're already using composition here. 
-We're currently implementing an observation as a dictionary with a known set of keys though, -so maybe we should make an `Observation` class as well. - -~~~ -# file: inflammation/models.py - -class Observation: - def __init__(self, day, value): - self.day = day - self.value = value - - def __str__(self): - return str(self.value) - -class Patient: - """A patient in an inflammation study.""" - def __init__(self, name): - self.name = name - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - try: - day = self.observations[-1].day + 1 - - except IndexError: - day = 0 - - new_observation = Observation(day, value) - - self.observations.append(new_observation) - return new_observation - - def __str__(self): - return self.name - - -alice = Patient('Alice') -obs = alice.add_observation(3) - -print(obs) -~~~ -{: .language-python} - -~~~ -3 -~~~ -{: .output} - -Now we're using a composition of two custom classes to -describe the relationship between two types of entity in the system that we're modelling. - -### Inheritance - -The other type of relationship used in object oriented programming is **inheritance**. -Inheritance is about data and behaviour shared by classes, -because they have some shared identity - 'x *is a* y'. -If class `X` inherits from (*is a*) class `Y`, -we say that `Y` is the **superclass** or **parent class** of `X`, -or `X` is a **subclass** of `Y`. - -If we want to extend the previous example to also manage people who aren't patients -we can add another class `Person`. -But `Person` will share some data and behaviour with `Patient` - -in this case both have a name and show that name when you print them. -Since we expect all patients to be people (hopefully!), -it makes sense to implement the behaviour in `Person` and then reuse it in `Patient`. - -To write our class in Python, -we used the `class` keyword, the name of the class, -and then a block of the functions that belong to it. 
-If the class **inherits** from another class, -we include the parent class name in brackets. - -~~~ -# file: inflammation/models.py - -class Observation: - def __init__(self, day, value): - self.day = day - self.value = value - - def __str__(self): - return str(self.value) - -class Person: - def __init__(self, name): - self.name = name - - def __str__(self): - return self.name - -class Patient(Person): - """A patient in an inflammation study.""" - def __init__(self, name): - super().__init__(name) - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - try: - day = self.observations[-1].day + 1 - - except IndexError: - day = 0 - - new_observation = Observation(day, value) - - self.observations.append(new_observation) - return new_observation - -alice = Patient('Alice') -print(alice) - -obs = alice.add_observation(3) -print(obs) - -bob = Person('Bob') -print(bob) - -obs = bob.add_observation(4) -print(obs) -~~~ -{: .language-python} - -~~~ -Alice -3 -Bob -AttributeError: 'Person' object has no attribute 'add_observation' -~~~ -{: .output} - -As expected, an error is thrown because we cannot add an observation to `bob`, -who is a Person but not a Patient. - -We see in the example above that to say that a class inherits from another, -we put the **parent class** (or **superclass**) in brackets after the name of the **subclass**. - -There's something else we need to add as well - -Python doesn't automatically call the `__init__` method on the parent class -if we provide a new `__init__` for our subclass, -so we'll need to call it ourselves. -This makes sure that everything that needs to be initialised on the parent class has been, -before we need to use it. -If we don't define a new `__init__` method for our subclass, -Python will look for one on the parent class and use it automatically. 
-This is true of all methods - -if we call a method which doesn't exist directly on our class, -Python will search for it among the parent classes. -The order in which it does this search is known as the **method resolution order** - -a little more on this in the Multiple Inheritance callout below. - -The line `super().__init__(name)` gets the parent class, -then calls the `__init__` method, -providing the `name` variable that `Person.__init__` requires. -This is quite a common pattern, particularly for `__init__` methods, -where we need to make sure an object is initialised as a valid `X`, -before we can initialise it as a valid `Y` - -e.g. a valid `Person` must have a name, -before we can properly initialise a `Patient` model with their inflammation data. - - -> ## Composition vs Inheritance -> -> When deciding how to implement a model of a particular system, -> you often have a choice of either composition or inheritance, -> where there is no obviously correct choice. -> For example, it's not obvious whether a photocopier *is a* printer and *is a* scanner, -> or *has a* printer and *has a* scanner. -> -> ~~~ -> class Machine: -> pass -> -> class Printer(Machine): -> pass -> -> class Scanner(Machine): -> pass -> -> class Copier(Printer, Scanner): -> # Copier `is a` Printer and `is a` Scanner -> pass -> ~~~ -> {: .language-python} -> -> ~~~ -> class Machine: -> pass -> -> class Printer(Machine): -> pass -> -> class Scanner(Machine): -> pass -> -> class Copier(Machine): -> def __init__(self): -> # Copier `has a` Printer and `has a` Scanner -> self.printer = Printer() -> self.scanner = Scanner() -> ~~~ -> {: .language-python} -> -> Both of these would be perfectly valid models and would work for most purposes. -> However, unless there's something about how you need to use the model -> which would benefit from using a model based on inheritance, -> it's usually recommended to opt for **composition over inheritance**. 
-> This is a common design principle in the object oriented paradigm and is worth remembering,
-> as it's very common for people to overuse inheritance once they've been introduced to it.
->
-> For much more detail on this see the
-> [Python Design Patterns guide](https://python-patterns.guide/gang-of-four/composition-over-inheritance/).
-{: .callout}
-
-> ## Multiple Inheritance
->
-> **Multiple Inheritance** is when a class inherits from more than one direct parent class.
-> It exists in Python, but is often not present in other Object Oriented languages.
-> Although this might seem useful, like in our inheritance-based model of the photocopier above,
-> it's best to avoid it unless you're sure it's the right thing to do,
-> due to the complexity of the inheritance hierarchy.
-> Often using multiple inheritance is a sign you should instead be using composition -
-> again like the photocopier model above.
-{: .callout}
-
-
-> ## Exercise: A Model Patient
->
-> Let's use what we have learnt in this episode and combine it with what we have learnt on
-> [software requirements](../31-software-requirements/index.html)
-> to formulate and implement a
-> [few new solution requirements](../31-software-requirements/index.html#exercise-new-solution-requirements)
-> to extend the model layer of our clinical trial system.
->
-> Let's start with extending the system such that there must be
-> a `Doctor` class to hold the data representing a single doctor, which:
->
-> - must have a `name` attribute
-> - must have a list of patients that this doctor is responsible for.
->
-> In addition to these, try to think of an extra feature you could add to the models
-> which would be useful for managing a dataset like this -
-> imagine we're running a clinical trial, what else might we want to know?
-> Try using Test Driven Development for any features you add:
-> write the tests first, then add the feature.
-> The tests have been started for you in `tests/test_patient.py`, -> but you will probably want to add some more. -> -> Once you've finished the initial implementation, do you have much duplicated code? -> Is there anywhere you could make better use of composition or inheritance -> to improve your implementation? -> -> For any extra features you've added, -> explain them and how you implemented them to your neighbour. -> Would they have implemented that feature in the same way? -> -> > ## Solution -> > One example solution is shown below. -> > You may start by writing some tests (that will initially fail), -> > and then develop the code to satisfy the new requirements and pass the tests. -> > ~~~ -> > # file: tests/test_patient.py -> > """Tests for the Patient model.""" -> > -> > def test_create_patient(): -> > """Check a patient is created correctly given a name.""" -> > from inflammation.models import Patient -> > name = 'Alice' -> > p = Patient(name=name) -> > assert p.name == name -> > -> > def test_create_doctor(): -> > """Check a doctor is created correctly given a name.""" -> > from inflammation.models import Doctor -> > name = 'Sheila Wheels' -> > doc = Doctor(name=name) -> > assert doc.name == name -> > -> > def test_doctor_is_person(): -> > """Check if a doctor is a person.""" -> > from inflammation.models import Doctor, Person -> > doc = Doctor("Sheila Wheels") -> > assert isinstance(doc, Person) -> > -> > def test_patient_is_person(): -> > """Check if a patient is a person. """ -> > from inflammation.models import Patient, Person -> > alice = Patient("Alice") -> > assert isinstance(alice, Person) -> > -> > def test_patients_added_correctly(): -> > """Check patients are being added correctly by a doctor. 
""" -> > from inflammation.models import Doctor, Patient -> > doc = Doctor("Sheila Wheels") -> > alice = Patient("Alice") -> > doc.add_patient(alice) -> > assert doc.patients is not None -> > assert len(doc.patients) == 1 -> > -> > def test_no_duplicate_patients(): -> > """Check adding the same patient to the same doctor twice does not result in duplicates. """ -> > from inflammation.models import Doctor, Patient -> > doc = Doctor("Sheila Wheels") -> > alice = Patient("Alice") -> > doc.add_patient(alice) -> > doc.add_patient(alice) -> > assert len(doc.patients) == 1 -> > ... -> > ~~~ -> > {: .language-python} -> > -> > ~~~ -> > # file: inflammation/models.py -> > ... -> > class Person: -> > """A person.""" -> > def __init__(self, name): -> > self.name = name -> > -> > def __str__(self): -> > return self.name -> > -> > class Patient(Person): -> > """A patient in an inflammation study.""" -> > def __init__(self, name): -> > super().__init__(name) -> > self.observations = [] -> > -> > def add_observation(self, value, day=None): -> > if day is None: -> > try: -> > day = self.observations[-1].day + 1 -> > except IndexError: -> > day = 0 -> > new_observation = Observation(day, value) -> > self.observations.append(new_observation) -> return new_observation -> > -> > class Doctor(Person): -> > """A doctor in an inflammation study.""" -> > def __init__(self, name): -> > super().__init__(name) -> > self.patients = [] -> > -> > def add_patient(self, new_patient): -> > # A crude check by name if this patient is already looked after -> > # by this doctor before adding them -> > for patient in self.patients: -> > if patient.name == new_patient.name: -> > return -> > self.patients.append(new_patient) -> > ... 
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-
-{% include links.md %}
diff --git a/_episodes/35-refactoring-decoupled-units b/_episodes/35-refactoring-decoupled-units
new file mode 100644
index 000000000..cba637c80
--- /dev/null
+++ b/_episodes/35-refactoring-decoupled-units
@@ -0,0 +1,15 @@
+---
+title: "Using classes to decouple code"
+teaching: 0
+exercises: 0
+questions:
+- "What is decoupled code?"
+- "When is it useful to use classes to structure code?"
+objectives:
+- "Understand the object-oriented principle of polymorphism and interfaces."
+- "Be able to introduce appropriate abstractions to simplify code."
+- "Understand what decoupled code is, and why you would want it."
+keypoints:
+- "By using interfaces, code can become more decoupled."
+- "Decoupled code is easier to test, and easier to maintain."
+---
diff --git a/_episodes/36-architecture-revisited.md b/_episodes/36-architecture-revisited.md
deleted file mode 100644
index 0b460211a..000000000
--- a/_episodes/36-architecture-revisited.md
+++ /dev/null
@@ -1,444 +0,0 @@
----
-title: "Architecture Revisited: Extending Software"
-teaching: 15
-exercises: 0
-questions:
-- "How can we extend our software within the constraints of the MVC architecture?"
-objectives:
-- "Extend our software to add a view of a single patient in the study and the software's command line interface to request a specific view."
-keypoints:
-- "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change.
-  Such components can be as small as a single function, or be a software package in their own right."
----
-
-As we have seen, we have different programming paradigms that are suitable for different problems
-and affect the structure of our code.
-In programming languages that support multiple paradigms, such as Python,
-we have the luxury of using elements of different paradigms and we,
-as software designers and programmers,
-can decide how to use those elements in different architectural components of our software.
-Let's now circle back to the architecture of our software for one final look.
-
-## MVC Revisited
-
-We've been developing our software using the **Model-View-Controller** (MVC) architecture so far,
-but, as we have seen, MVC is just one of the common architectural patterns
-and is not the only choice we could have made.
-
-There are many variants of an MVC-like pattern (such as
-[Model-View-Presenter](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93presenter) (MVP),
-[Model-View-Viewmodel](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93viewmodel) (MVVM), etc.),
-but in most cases, the distinction between these patterns isn't particularly important.
-What really matters is that we are making decisions about the architecture of our software
-that suit the way in which we expect to use it.
-We should reuse these established ideas where we can, but we don't need to stick to them exactly.
-
-In this episode we'll be taking our Object Oriented code from the previous episode
-and integrating it into our existing MVC pattern.
-But first we will explain some features of
-the Controller (`inflammation-analysis.py`) component of our architecture.
-
-### Controller Structure
-
-You will have noticed already that the structure of the `inflammation-analysis.py` file
-follows this pattern:
-
-~~~
-# import modules
-
-def main():
-    # perform some actions
-
-if __name__ == "__main__":
-    # perform some actions before main()
-    main()
-~~~
-{: .language-python}
-
-In this pattern the actions performed by the script are contained within the `main` function
-(which does not need to be called `main`,
-but using this convention helps others in understanding your code).
-The `main` function is then called within the `if __name__ == "__main__":` statement,
-after some other actions have been performed
-(usually the parsing of command-line arguments, which will be explained below).
-`__name__` is a special dunder variable which is set,
-along with a number of other special dunder variables,
-by the Python interpreter before the execution of any code in the source file.
-The value given by the interpreter to `__name__` is determined by
-the manner in which the source file is loaded.
-
-If we run the source file directly using the Python interpreter, e.g.:
-
-~~~
-$ python3 inflammation-analysis.py
-~~~
-{: .language-bash}
-
-then the interpreter will assign the hard-coded string `"__main__"` to the `__name__` variable:
-
-~~~
-__name__ = "__main__"
-...
-# rest of your code
-~~~
-{: .language-python}
-
-However, if your source file is imported by another Python script -
-note that because the file name contains a hyphen, it is not valid in a plain
-`import` statement, so we need `importlib` to import it - e.g.:
-
-~~~
-import importlib
-inflammation_analysis = importlib.import_module("inflammation-analysis")
-~~~
-{: .language-python}
-
-then the interpreter will assign the module's name, `"inflammation-analysis"`,
-to the `__name__` variable within that module:
-
-~~~
-__name__ = "inflammation-analysis"
-...
-# rest of your code
-~~~
-{: .language-python}
-
-Because of this behaviour of the interpreter,
-we can put any code that should only be executed when running the script
-directly within the `if __name__ == "__main__":` structure,
-allowing the rest of the code within the script to be
-safely imported by another script if we so wish.
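A minimal, self-contained sketch of this guard (a hypothetical stand-in script, not the project controller):

```python
def main():
    """Stand-in for the real controller logic."""
    return 'running analysis'

# This block runs only when the file is executed directly;
# importing the file as a module skips it entirely.
if __name__ == "__main__":
    print(main())
```

Importing this file gives access to `main` without triggering the `print` call, which is exactly what a test suite wants.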
- -While it may not seem very useful to have your controller script importable by another script, -there are a number of situations in which you would want to do this: - -- for testing of your code, you can have your testing framework import the main script, - and run special test functions which then call the `main` function directly; -- where you want to not only be able to run your script from the command-line, - but also provide a programmer-friendly application programming interface (API) for advanced users. - -### Passing Command-line Options to Controller - -The standard Python library for reading command line arguments passed to a script is -[`argparse`](https://docs.python.org/3/library/argparse.html). -This module reads arguments passed by the system, -and enables the automatic generation of help and usage messages. -These include, as we saw at the start of this course, -the generation of helpful error messages when users give the program invalid arguments. - -The basic usage of `argparse` can be seen in the `inflammation-analysis.py` script. -First we import the library: - -~~~ -import argparse -~~~ -{: .language-python} - -We then initialise the argument parser class, passing an (optional) description of the program: - -~~~ -parser = argparse.ArgumentParser( - description='A basic patient inflammation data management system') -~~~ -{: .language-python} - -Once the parser has been initialised we can add -the arguments that we want argparse to look out for. 
-In our basic case, we want only the names of the file(s) to process:
-
-~~~
-parser.add_argument(
-    'infiles',
-    nargs='+',
-    help='Input CSV(s) containing inflammation series for each patient')
-~~~
-{: .language-python}
-
-Here we have defined what the argument will be called (`'infiles'`) when it is read in;
-the number of arguments to be expected
-(`nargs='+'`, where `'+'` indicates that there should be 1 or more arguments passed);
-and a help string for the user
-(`help='Input CSV(s) containing inflammation series for each patient'`).
-
-You can add as many arguments as you wish,
-and these can be either mandatory (as the one above) or optional.
-Most of the complexity in using `argparse` is in adding the correct argument options,
-and we will explain how to do this in more detail below.
-
-Finally we parse the arguments passed to the script using:
-
-~~~
-args = parser.parse_args()
-~~~
-{: .language-python}
-
-This returns an object (that we've called `args`) containing all the arguments requested.
-These can be accessed using the names that we have defined for each argument,
-e.g. `args.infiles` would return the filenames that have been input.
-
-The help for the script can be accessed using the `-h` or `--help` optional argument
-(which `argparse` includes by default):
-
-~~~
-$ python3 inflammation-analysis.py --help
-~~~
-{: .language-bash}
-
-~~~
-usage: inflammation-analysis.py [-h] infiles [infiles ...]
-
-A basic patient inflammation data management system
-
-positional arguments:
-  infiles     Input CSV(s) containing inflammation series for each patient
-
-optional arguments:
-  -h, --help  show this help message and exit
-~~~
-{: .output}
-
-The help page starts with the command line usage,
-illustrating what inputs can be given (any within `[]` brackets are optional).
-It then lists the **positional** and **optional** arguments,
-giving as detailed a description of each as you have added to the `add_argument()` command.
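These pieces can also be exercised without touching the real command line - `parse_args` accepts an explicit list of strings in place of `sys.argv[1:]`, which is handy when testing a controller:

```python
import argparse

parser = argparse.ArgumentParser(
    description='A basic patient inflammation data management system')
parser.add_argument(
    'infiles',
    nargs='+',
    help='Input CSV(s) containing inflammation series for each patient')

# Passing a list here stands in for the real command-line arguments
args = parser.parse_args(['data/inflammation-01.csv', 'data/inflammation-02.csv'])
print(args.infiles)  # ['data/inflammation-01.csv', 'data/inflammation-02.csv']
```

The distinction between positional arguments like `infiles` and optional ones like `--help` also affects how users must order them on the command line.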
-Positional arguments are arguments that need to be included
-in the proper position or order when calling the script.
-
-Note that optional arguments are indicated by `-` or `--`, followed by the argument name.
-Positional arguments are simply inferred by their position.
-It is possible to have multiple positional arguments,
-but usually this is only practical where all (or all but one) positional arguments
-contain a clearly defined number of elements.
-If more than one option can have an indeterminate number of entries,
-then it is better to create them as 'optional' arguments.
-They can still be made required, though,
-by setting `required=True` within the `add_argument()` command.
-
-> ## Positional and Optional Argument Order
->
-> The usage section of the help page above shows
-> the optional arguments going before the positional arguments.
-> This is the customary way to present options, but is not mandatory.
-> Instead there are two rules which must be followed for these arguments:
->
-> 1. Positional and optional arguments must each be given all together, and not inter-mixed.
->    For example, the order can be either `optional - positional` or `positional - optional`,
->    but not `optional - positional - optional`.
-> 2. Positional arguments must be given in the order that they are shown
->    in the usage section of the help page.
-{: .callout}
-
-Now that you have some familiarity with `argparse`,
-we will demonstrate below how you can use this to add extra functionality to your controller.
-
-### Adding a New View
-
-Let's start with adding a view that allows us to see the data for a single patient.
-First, we need to add the code for the view itself
-and make sure our `Patient` class has the necessary data -
-including the ability to pass a list of measurements to the `__init__` method.
-Note that your Patient class may look very different now,
-so adapt this example to fit what you have.
-
-~~~
-# file: inflammation/views.py
-
-...
-
-def display_patient_record(patient):
-    """Display data for a single patient."""
-    print(patient.name)
-    for obs in patient.observations:
-        print(obs.day, obs.value)
-~~~
-{: .language-python}
-
-~~~
-# file: inflammation/models.py
-
-...
-
-class Observation:
-    def __init__(self, day, value):
-        self.day = day
-        self.value = value
-
-    def __str__(self):
-        return str(self.value)
-
-class Person:
-    def __init__(self, name):
-        self.name = name
-
-    def __str__(self):
-        return self.name
-
-class Patient(Person):
-    """A patient in an inflammation study."""
-    def __init__(self, name, observations=None):
-        super().__init__(name)
-
-        self.observations = []
-        if observations is not None:
-            self.observations = observations
-
-    def add_observation(self, value, day=None):
-        if day is None:
-            try:
-                day = self.observations[-1].day + 1
-
-            except IndexError:
-                day = 0
-
-        new_observation = Observation(day, value)
-
-        self.observations.append(new_observation)
-        return new_observation
-~~~
-{: .language-python}
-
-Now we need to make sure people can call this view -
-that means connecting it to the controller
-and ensuring that there's a way to request this view when running the program.
-The changes we need to make here are that the `main` function
-needs to be able to direct us to the view we've requested -
-and we need to add to the command line interface - the controller -
-the necessary data to drive the new view.
-
-~~~
-# file: inflammation-analysis.py
-
-#!/usr/bin/env python3
-"""Software for managing patient data in our imaginary hospital."""
-
-import argparse
-
-from inflammation import models, views
-
-
-def main(args):
-    """The MVC Controller of the patient data system.
- - The Controller is responsible for: - - selecting the necessary models and views for the current task - - passing data between models and views - """ - infiles = args.infiles - if not isinstance(infiles, list): - infiles = [args.infiles] - - for filename in infiles: - inflammation_data = models.load_csv(filename) - - if args.view == 'visualize': - view_data = { - 'average': models.daily_mean(inflammation_data), - 'max': models.daily_max(inflammation_data), - 'min': models.daily_min(inflammation_data), - } - - views.visualize(view_data) - - elif args.view == 'record': - patient_data = inflammation_data[args.patient] - observations = [models.Observation(day, value) for day, value in enumerate(patient_data)] - patient = models.Patient('UNKNOWN', observations) - - views.display_patient_record(patient) - - -if __name__ == "__main__": - parser = argparse.ArgumentParser( - description='A basic patient data management system') - - parser.add_argument( - 'infiles', - nargs='+', - help='Input CSV(s) containing inflammation series for each patient') - - parser.add_argument( - '--view', - default='visualize', - choices=['visualize', 'record'], - help='Which view should be used?') - - parser.add_argument( - '--patient', - type=int, - default=0, - help='Which patient should be displayed?') - - args = parser.parse_args() - - main(args) -~~~ -{: .language-python} - -We've added two options to our command line interface here: -one to request a specific view and one for the patient ID that we want to lookup. -For the full range of features that we have access to with `argparse` see the -[Python module documentation](https://docs.python.org/3/library/argparse.html?highlight=argparse#module-argparse). -Allowing the user to request a specific view like this is -a similar model to that used by the popular Python library Click - -if you find yourself needing to build more complex interfaces than this, -Click would be a good choice. 
-You can find more information in [Click's documentation](https://click.palletsprojects.com/). - -For now, we also don't know the names of any of our patients, -so we've made it `'UNKNOWN'` until we get more data. - -We can now call our program with these extra arguments to see the record for a single patient: - -~~~ -$ python3 inflammation-analysis.py --view record --patient 1 data/inflammation-01.csv -~~~ -{: .language-bash} - -~~~ -UNKNOWN -0 0.0 -1 0.0 -2 1.0 -3 3.0 -4 1.0 -5 2.0 -6 4.0 -7 7.0 -... -~~~ -{: .output} - -> ## Additional Material -> -> Now that we've covered the basics of different programming paradigms -> and how we can integrate them into our multi-layer architecture, -> there are two optional extra episodes which you may find interesting. -> -> Both episodes cover the persistence layer of software architectures -> and methods of persistently storing data, but take different approaches. -> The episode on [persistence with JSON](/persistence) covers -> some more advanced concepts in Object Oriented Programming, while -> the episode on [databases](/databases) starts to build towards a true multilayer architecture, -> which would allow our software to handle much larger quantities of data. -{: .callout} - - -## Towards Collaborative Software Development - -Having looked at some theoretical aspects of software design, -we are now circling back to implementing our software design -and developing our software to satisfy the requirements collaboratively in a team. -At an intermediate level of software development, -there is a wealth of practices that could be used, -and applying suitable design and coding practices is what separates -an intermediate developer from someone who has just started coding. -The key for an intermediate developer is to balance these concerns -for each software project appropriately, -and employ design and development practices enough so that progress can be made. 
- -One practice that should always be considered, -and has been shown to be very effective in team-based software development, -is that of *code review*. -Code reviews help to ensure the 'good' coding standards are achieved -and maintained within a team by having multiple people -have a look and comment on key code changes to see how they fit within the codebase. -Such reviews check the correctness of the new code, test coverage, functionality changes, -and confirm that they follow the coding guides and best practices. -Let's have a look at some code review techniques available to us. From cf49f194752d61fd5c30ae4f3c2655dd8d7efd98 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 9 Aug 2023 16:22:33 +0100 Subject: [PATCH 002/105] Add a concluding section talking about YAGNI --- _episodes/36-yagni | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 _episodes/36-yagni diff --git a/_episodes/36-yagni b/_episodes/36-yagni new file mode 100644 index 000000000..82c724800 --- /dev/null +++ b/_episodes/36-yagni @@ -0,0 +1,13 @@ +--- +title: "When to abstract, and when not to." +teaching: 0 +exercises: 0 +questions: +- "How to tell what is and isn't an appropriate abstraction" +objectives: +- "Understand how to determine correct abstractions. " +- "How to design large changes to the codebase." +keypoints: +- "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful." +- "The best code is simple to understand and test, not the most clever or uses advanced language features." 
+--- From 22ef936a2c9aeb4188cf573306478f2c8b6a8afc Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 9 Aug 2023 16:59:52 +0100 Subject: [PATCH 003/105] Add headers for the various new pages --- _episodes/32-software-design.md | 22 ++++++++++++++-- _episodes/33-refactoring-functions | 22 ++++++++++++++++ _episodes/34-refactoring-architecture | 32 ++++++++++++++++++------ _episodes/35-refactoring-decoupled-units | 23 +++++++++++++++++ _episodes/36-yagni | 22 ++++++++++++++++ 5 files changed, 111 insertions(+), 10 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 6020472a8..352712900 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -1,7 +1,7 @@ --- title: "Software Architecture and Design" -teaching: 15 -exercises: 30 +teaching: 0 +exercises: 0 questions: - "What should we consider when designing software?" - "What goals should we have when structuring our code" @@ -12,8 +12,26 @@ keypoints: - "How code is structured is important for helping future people understand and update it" - "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. Such components can be as small as a single function, or be a software package in their own right." +- "These smaller components can be understood individually without having to understand the entire codebase at once." - "When writing software used for research, requirements will almost *always* change." 
- "*'Good code is written so that is readable, understandable, covered by automated tests, not over complicated and does well what is intended to do.'*" --- +## Introduction + +* Thoughts on software design + +## Abstractions + +* Introduce the idea of an abstraction + +## Refactoring + +* Define refactoring +* Discuss the advantages of refactoring before making changes + +## The code for this episode + +* Introduce the code that will be used for this episode + {% include links.md %} diff --git a/_episodes/33-refactoring-functions b/_episodes/33-refactoring-functions index aa240023c..6684396f7 100644 --- a/_episodes/33-refactoring-functions +++ b/_episodes/33-refactoring-functions @@ -13,3 +13,25 @@ keypoints: - "By refactoring code into pure functions that act on data makes code easier to test." - "Making tests before you refactor gives you confidence that your refactoring hasn't broken anything" --- + +## Introduction + +* What is going to happen in this episode - learn good code design by refactoring some poorly + structured code. + +## Writing tests before refactoring + +* Explain the benefits of writing tests before refactoring +* Explain techniques for writing tests for hard to test, existing code + +## Pure functions + +* Explain what a pure function is +* Explain the benefits of pure functions for testing + +## Functional Programming + +* Introduce that pure functions are a concept from functional programming +* Mention tools and techniques Python has for functional programming + +{% include links.md %} diff --git a/_episodes/34-refactoring-architecture b/_episodes/34-refactoring-architecture index aa240023c..835a5937f 100644 --- a/_episodes/34-refactoring-architecture +++ b/_episodes/34-refactoring-architecture @@ -1,15 +1,31 @@ --- -title: "Refactoring functions to do just one thing" +title: "Architecting code to separate responsibilities" teaching: 0 exercises: 0 questions: -- "How do you refactor code without breaking it?" 
-- "How do you write code that is easy to test?" +- "What is the point of the MVC architecture" +- "How should code be structured" objectives: -- "Understand how to refactor functions to be easier to test" -- "Be able to write regressions tests to avoid breaking existing code" -- "Understand what a pure function is." +- "Understand the MVC pattern and how to apply it." +- "Understand the benefits of using patterns" keypoints: -- "By refactoring code into pure functions that act on data makes code easier to test." -- "Making tests before you refactor gives you confidence that your refactoring hasn't broken anything" +- "By splitting up the "view" code from "model" code, you allow easier re-use of code." +- "Using coding patterns can be useful inspirations for how to structure your code." --- + +## Introduction + +* Refamiliarise with MVC + +## Separating out considerations + +* Talk about model and view as distinct parts of the code +* Model should be made up of pure functions as discussed + +## Programming patterns + +* Talk about how MVC is one pattern +* Mention a couple of others than might be useful +* Talk about how patterns can be useful for designing architecture + +{% include links.md %} diff --git a/_episodes/35-refactoring-decoupled-units b/_episodes/35-refactoring-decoupled-units index cba637c80..d9826988b 100644 --- a/_episodes/35-refactoring-decoupled-units +++ b/_episodes/35-refactoring-decoupled-units @@ -13,3 +13,26 @@ keypoints: - "By using interfaces, code can become more decoupled." - "Decoupled code is easier to test, and easier to maintain." 
--- + +## Introduction + +* What is coupled and decoupled code +* Why decoupled code is better + +## Polymorphism + +* Introduce what a class is +* Introduce what an interface is +* Introduce what polymorphism is +* Explain how we can use polymorphism to introduce abstractions + +## How polymorphism is useful + +* Introduce the idea of using a different implementation + without changing the code +* Explain how to test code that uses an interface + +## Object Oriented Programming + +* Polymorphism is a tool from object oriented programming +* Outline some other tools from OOP that might be useful diff --git a/_episodes/36-yagni b/_episodes/36-yagni index 82c724800..21169d629 100644 --- a/_episodes/36-yagni +++ b/_episodes/36-yagni @@ -11,3 +11,25 @@ keypoints: - "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful." - "The best code is simple to understand and test, not the most clever or uses advanced language features." --- + +## Introduction + +* Talk about the bigger picture of design having seen some techniques + +## Architecting larger changes + +* Talk about box diagrams + +## An abstraction too far + +* Drawbacks of abstraction +* Example showing too complex abstractions + +## You Ain't Gonna Need It + +* Introduce and explain YAGNI principle + +## Conclusion + +* Take care to think about software with the appropriate priorities and things will get better. 
+* Tips for getting better at architecture From 046677dea50adb5948dd98a5f06874cc800b286b Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 9 Aug 2023 17:51:35 +0100 Subject: [PATCH 004/105] Add in exercises for the new sections --- _episodes/32-software-design.md | 18 ++++++++ _episodes/33-refactoring-functions | 22 ++++++++++ _episodes/34-refactoring-architecture | 22 ++++++++++ _episodes/35-refactoring-decoupled-units | 55 ++++++++++++++++++++++++ _episodes/36-yagni | 17 ++++++++ 5 files changed, 134 insertions(+) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 352712900..58dbec5ff 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -25,6 +25,12 @@ Such components can be as small as a single function, or be a software package i * Introduce the idea of an abstraction +> ## Group Exercise: Think about examples of good and bad code +> Try to come up with examples of code that has been hard to understand - why? +> +> Try to come up with examples of code that was easy to understand and modify - why? +{: .challenge} + ## Refactoring * Define refactoring @@ -34,4 +40,16 @@ Such components can be as small as a single function, or be a software package i * Introduce the code that will be used for this episode +> ## Group Exercise: What is bad about this code? +> What about this code makes it hard to understand? +> What makes this code hard to change? 
+>> ## Solution +>> * Everything is in a single function +>> * If I want to use the data without using the graph I'd have to change it +>> * It is always analysing a fixed set of data +>> * It seems hard to write tests for it +>> * It doesn't have any tests +> {: .solution} +{: .challenge} + {% include links.md %} diff --git a/_episodes/33-refactoring-functions b/_episodes/33-refactoring-functions index 6684396f7..acbadcb79 100644 --- a/_episodes/33-refactoring-functions +++ b/_episodes/33-refactoring-functions @@ -24,11 +24,33 @@ keypoints: * Explain the benefits of writing tests before refactoring * Explain techniques for writing tests for hard to test, existing code +> ## Exercise: Write regression tests before refactoring +> Write a regression test to verify we don't break the code when refactoring +>> ## Solution +>> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/5000b6122e576d91c2acbc437184e00893483fdd +> {: .solution} +{: .challenge} + ## Pure functions * Explain what a pure function is + +> ## Exercise: Refactor the function into a pure function +> Refactor the function to call a pure function that just operates on and returns data. 
+>> ## Solution
+>> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/4899b35aed854bdd67ef61cba6e50b3eeada0334
+> {: .solution}
+{: .challenge}
+
* Explain the benefits of pure functions for testing

+> ## Exercise: Write some tests for the pure function
+> Now we have refactored out a pure function, we can more easily write comprehensive tests
+>> ## Solution
+>> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/4899b35aed854bdd67ef61cba6e50b3eeada0334
+> {: .solution}
+{: .challenge}
+
## Functional Programming

* Introduce that pure functions are a concept from functional programming
diff --git a/_episodes/34-refactoring-architecture b/_episodes/34-refactoring-architecture
index 835a5937f..ea29f95a6 100644
--- a/_episodes/34-refactoring-architecture
+++ b/_episodes/34-refactoring-architecture
@@ -20,7 +20,29 @@ keypoints:
## Separating out considerations

* Talk about model and view as distinct parts of the code
+
+> ## Exercise: Identify model and view parts of the code
+> Looking at the code as it is, what parts should be considered "model" code
+> and what parts should be considered "view" code?
+>> ## Solution
+>> The computation of the standard deviation is model code.
+>> The display of the output as a graph is the view code.
+> {: .solution}
+{: .challenge}
+
* Model should be made up of pure functions as discussed
+TODO: Reading files is model code, but not pure
+
+> ## Exercise: Split out the model code from the view code
+> Refactor the code to have the model code separated from
+> the view code.
+>> ## Solution
+>> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/97fd04b747a6491c2590f34384eed44e83a8e73c
+> {: .solution}
+{: .challenge}
+
+TODO: did originally intend to add a new view - but I think it isn't necessary, isn't a great
## Programming patterns diff --git a/_episodes/35-refactoring-decoupled-units b/_episodes/35-refactoring-decoupled-units index d9826988b..efc0f92fc 100644 --- a/_episodes/35-refactoring-decoupled-units +++ b/_episodes/35-refactoring-decoupled-units @@ -19,19 +19,74 @@ keypoints: * What is coupled and decoupled code * Why decoupled code is better +> ## Exercise: Decouple the file loading from the computation +> Currently the function is hard coded to load all the files in a directory +> Decouple this into a separate function that returns all the files to load +>> ## Solution +>> TODO: This is breaking this down into more steps that I originally though, but I think +>> this is a good idea as otherwise this exercise is very hard, here's what we're aiming for: +>> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/7ccda313fda3a0b10ef5add83f5be50fe1d250fd +>> At the end of this exercise, perhaps just have a version of load_data written and called directly +> {: .solution} +{: .challenge} + ## Polymorphism * Introduce what a class is +* Explain member methods +* Explain constructors + +> ## Exercise: Use a class to configure loading +> Put your function as a member method of a class, separating out the configuration +> of where to load the files from in the constructor, from where it actually loads the data +>> ## Solution +>> TODO: This is breaking this down into more steps that I originally though, but I think +>> this is a good idea as otherwise this exercise is very hard, here's what we're aiming for: +>> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/7ccda313fda3a0b10ef5add83f5be50fe1d250fd +>> At the end of this exercise, they would have implemented `CSVDataSource`. 
+> {: .solution} +{: .challenge} + * Introduce what an interface is * Introduce what polymorphism is * Explain how we can use polymorphism to introduce abstractions +> ## Exercise: Define an interface for your class +> Create an interface class that defines the methods that a data source should provide +>> ## Solution +>> TODO: This is breaking this down into more steps that I originally though, but I think +>> this is a good idea as otherwise this exercise is very hard, here's what we're aiming for: +>> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/7ccda313fda3a0b10ef5add83f5be50fe1d250fd +>> At the end of this exercise, they would have the complete solution. +> {: .solution} +{: .challenge} + ## How polymorphism is useful * Introduce the idea of using a different implementation without changing the code + +> ## Exercise: Introduce an alternative implentation of DataSource +> Create another class that repeatedly asks the user for paths to CSVs to analyse. +> It should inherit from the interface and implement the load_data method. +> Finally, at run time provide an instance of the new implementation if the user hasn't +> put any files on the path. +>> ## Solution +>> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/045754a11221a269771de8648fc56a383136fdaf +>> TODO: this is kind of hard too +> {: .solution} +{: .challenge} + * Explain how to test code that uses an interface +> ## Exercise: Test using a mock or dummy implemenation +> It is now possible to test your original method by providing a dummy +> implementation of the `DataProvider`. Use this to test the method +>> ## Solution +>> TODO: I haven't done this - do we want it? 
+> {: .solution} +{: .challenge} + ## Object Oriented Programming * Polymorphism is a tool from object oriented programming diff --git a/_episodes/36-yagni b/_episodes/36-yagni index 21169d629..bd3332b5f 100644 --- a/_episodes/36-yagni +++ b/_episodes/36-yagni @@ -20,6 +20,16 @@ keypoints: * Talk about box diagrams +> ## Exercise: Design a high-level architecture +> Consider implementing a new feature +> TODO: suggest a more complex feature +> Using boxes and lines sketch out an architecture for the code. +> Discuss with your team +>> ## Solution +>> An example design for the hypothetical problem. +> {: .solution} +{: .challenge} + ## An abstraction too far * Drawbacks of abstraction @@ -29,6 +39,13 @@ keypoints: * Introduce and explain YAGNI principle +> ## Exercise: Applying to real world examples +> Thinking about the examples of good and bad code you identified at the start of the episode. +> Identify what kind of principles were and weren't being followed +> Identify some refactorings that could be performed that would improve the code +> Discuss the ideas as a group. +{: .challenge} + ## Conclusion * Take care to think about software with the appropriate priorities and things will get better. 
From 664afbc2d9f4a8c7c7d737a76cd251eaa3555b51 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 9 Aug 2023 17:52:58 +0100 Subject: [PATCH 005/105] Fix file name extensions --- .../{33-refactoring-functions => 33-refactoring-functions.md} | 0 ...34-refactoring-architecture => 34-refactoring-architecture.md} | 0 ...actoring-decoupled-units => 35-refactoring-decoupled-units.md} | 0 _episodes/{36-yagni => 36-yagni.md} | 0 4 files changed, 0 insertions(+), 0 deletions(-) rename _episodes/{33-refactoring-functions => 33-refactoring-functions.md} (100%) rename _episodes/{34-refactoring-architecture => 34-refactoring-architecture.md} (100%) rename _episodes/{35-refactoring-decoupled-units => 35-refactoring-decoupled-units.md} (100%) rename _episodes/{36-yagni => 36-yagni.md} (100%) diff --git a/_episodes/33-refactoring-functions b/_episodes/33-refactoring-functions.md similarity index 100% rename from _episodes/33-refactoring-functions rename to _episodes/33-refactoring-functions.md diff --git a/_episodes/34-refactoring-architecture b/_episodes/34-refactoring-architecture.md similarity index 100% rename from _episodes/34-refactoring-architecture rename to _episodes/34-refactoring-architecture.md diff --git a/_episodes/35-refactoring-decoupled-units b/_episodes/35-refactoring-decoupled-units.md similarity index 100% rename from _episodes/35-refactoring-decoupled-units rename to _episodes/35-refactoring-decoupled-units.md diff --git a/_episodes/36-yagni b/_episodes/36-yagni.md similarity index 100% rename from _episodes/36-yagni rename to _episodes/36-yagni.md From d39c621a4413756a3ee4586bce8ab63d5111879a Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Tue, 10 Oct 2023 13:41:01 +0100 Subject: [PATCH 006/105] Add first draft of the software design episode This section outlines the key ideas for the rest of the episode. 
--- _episodes/32-software-design.md | 108 ++++++++++++++++++++++++++++---- 1 file changed, 96 insertions(+), 12 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 58dbec5ff..c87491c90 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -4,10 +4,12 @@ teaching: 0 exercises: 0 questions: - "What should we consider when designing software?" -- "What goals should we have when structuring our code" +- "What goals should we have when structuring our code?" +- "What is refactoring?" objectives: -- "Understand what an abstraction is, and when you should use one" -- "Understand what refactoring is" +- "Know what goals we have when architecting and designing software." +- "Understand what an abstraction is, and when you should use one." +- "Understand what refactoring is." keypoints: - "How code is structured is important for helping future people understand and update it" - "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. @@ -19,11 +21,66 @@ Such components can be as small as a single function, or be a software package i ## Introduction -* Thoughts on software design +Typically when we start writing code, we write small scripts that +we intend to use. +We probably don't imagine we will need to change the code in the future. +We almost certainly don't expect other people will need to understand +and modify the code in the future. +However, as projects grow in complexity and the number of people involved grows, +it becomes important to think about how to structure code. +Software Architecture and Design is all about thinking about ways to make the +code be **maintainable** as projects grow. + +Maintainable code is: + + * Readable to people who didn't write the code. + * Testable through automated tests (like those from [episode 2](../21-automatically-testing-software/index.html)). 
+ * Adaptable to new requirements.
+
+Writing code that meets these requirements is hard and takes practice.
+Further, in most contexts you will already have a piece of code that breaks
+some (or maybe all!) of these principles.
+
+In this episode we will explore techniques and processes that can help you
+continuously improve the quality of code so, over time, it tends towards more
+maintainable code.
+
+We will look at:
+
+ * What abstractions are, and how to pick appropriate ones.
+ * How to take code that is in bad shape and improve it.
+ * Best practices to write code in ways that facilitate achieving these goals.

## Abstractions

-* Introduce the idea of an abstraction
+An **abstraction**, at its most basic level, is a technique to hide the details
+of one part of a system from another part of the system.
+We deal with abstractions all the time - when you press the brake pedal on the
+car, you do not know how this manages both slowing down the engine and applying
+pressure on the brakes.
+The advantage of using this abstraction is that, when something changes, for example
+the introduction of anti-lock braking or an electric engine, the driver does
+not need to do anything differently -
+the detail of how the car brakes is *abstracted* away from them.
+
+Abstractions are a fundamental part of software.
+For example, when you write Python code, you are dealing with an
+abstraction of the computer.
+You don't need to understand how RAM functions.
+Instead, you just need to understand how variables work in Python.
+
+In large projects it is vital to come up with good abstractions.
+A good abstraction makes code easier to read, as the reader doesn't need to understand
+all the details of the project to understand one part.
+A good abstraction makes code easier to test, as it can be tested in isolation
+from everything else.
+Finally, a good abstraction makes code easier to adapt, as the details of
+how a subsystem *used* to work are hidden from the user, so when they change,
+the user doesn't need to know.
+
+In this episode we are going to look at some code and introduce various
+different kinds of abstraction.
+However, fundamentally, any abstraction should serve these goals.

> ## Group Exercise: Think about examples of good and bad code
> Try to come up with examples of code that has been hard to understand - why?
@@ -33,21 +90,48 @@ Such components can be as small as a single function, or be a software package i

## Refactoring

-* Define refactoring
-* Discuss the advantages of refactoring before making changes
+Often we are not working on brand new projects, but instead maintaining an existing
+piece of software.
+Often, this piece of software will be hard to maintain, perhaps because it is hard to understand, or doesn't have any tests.
+In this situation, we want to adapt the code to make it more maintainable.
+This will allow greater confidence in the code, as well as making future development easier.
+
+**Refactoring** is a process where some code is modified, such that its external behaviour remains
+unchanged, but the code itself is easier to read, test and extend.
+
+When faced with an old piece of code that is hard to work with and that you need to modify, a good process to follow is:
+
+1. Refactor the code in such a way that the new change will slot in cleanly.
+2. Make the desired change, which now fits in easily.
+
+Notice that, after step 1, the *behaviour* of the code should be totally identical.
+This allows you to test rigorously that the refactoring hasn't changed or broken anything
+*before* making the intended change.
+
+In this episode, we will be making some changes to an existing bit of code that
+is in need of refactoring.
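The two-step process above can be sketched in a few lines of Python. This is a toy illustration with made-up names, not code from the inflammation project:

```python
# Toy illustration of a behaviour-preserving refactoring (hypothetical names,
# not from the inflammation codebase).

# Before: the calculation and the output are tangled together.
def report_mean_before(values):
    print(sum(values) / len(values))

# Step 1: extract the calculation into its own function.
# The printed output is identical, so the behaviour is unchanged.
def mean(values):
    return sum(values) / len(values)

def report_mean_after(values):
    print(mean(values))

# Step 2: a new requirement (reusing the statistic without printing it)
# now slots in cleanly, without touching the reporting code.
highest_mean = max(mean([1.0, 2.0]), mean([3.0, 4.0]))
```

After step 1 the program behaves identically, which is exactly what lets you run the same tests before and after the change.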
## The code for this episode -* Introduce the code that will be used for this episode +The code itself is a feature to the inflammation tool we've been working on. + +In it, if the user adds `--full-data-analysis` then the program will scan the directory +of one of the provided files, compare standard deviations across the data by day and +plot a graph. + +We are going to be refactoring and extending this over the remainder of this episode. > ## Group Exercise: What is bad about this code? -> What about this code makes it hard to understand? -> What makes this code hard to change? +> In what ways does this code not live up to the ideal properties of maintainable code? +> Think about ways in which you find it hard to understand. +> Think about the kinds of changes you might want to make to it, and what would +> make making those changes challenging. >> ## Solution ->> * Everything is in a single function +>> * Everything is in a single function - reading it you have to understand how the file loading +works at the same time as the analysis itself. 
>> * If I want to use the data without using the graph I'd have to change it >> * It is always analysing a fixed set of data ->> * It seems hard to write tests for it +>> * It seems hard to write tests for it as it always analyses a fixed set of files >> * It doesn't have any tests > {: .solution} {: .challenge} From e0e6da1818b7cbf0ba285fb2cee223c53e04f63d Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Tue, 10 Oct 2023 14:53:47 +0100 Subject: [PATCH 007/105] Add first draft of the pure functions section --- _episodes/33-refactoring-functions.md | 188 ++++++++++++++++++++++++-- 1 file changed, 178 insertions(+), 10 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index acbadcb79..1ec91a617 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -5,6 +5,8 @@ exercises: 0 questions: - "How do you refactor code without breaking it?" - "How do you write code that is easy to test?" +- "What is functional programming?" +- "Which situations/problems is functional programming well suited for?" objectives: - "Understand how to refactor functions to be easier to test" - "Be able to write regressions tests to avoid breaking existing code" @@ -12,48 +14,214 @@ objectives: keypoints: - "By refactoring code into pure functions that act on data makes code easier to test." - "Making tests before you refactor gives you confidence that your refactoring hasn't broken anything" +- "Functional programming is a programming paradigm where programs are constructed by applying and composing smaller and simple functions into more complex ones (which describe the flow of data within a program as a sequence of data transformations)." --- ## Introduction -* What is going to happen in this episode - learn good code design by refactoring some poorly - structured code. +In this episode we will take some code and refactor it in a way which is going to make it +easier to test. 
+By having more tests, we can be more confident that future changes have their intended effect.
+The change we will make will also end up making the code easier to understand.

## Writing tests before refactoring

-* Explain the benefits of writing tests before refactoring
+The process we are going to be following is:
+
+1. Write some tests that test the behaviour as it is now
+2. Refactor the code to be more testable
+3. Ensure that the original tests still pass
+
+By writing the tests *before* we refactor, we can be confident we haven't broken
+existing behaviour through the refactoring.
+
+There is a bit of a chicken-and-egg problem here, however.
+If the refactoring is to make it easier to write tests, how can we write tests
+before doing the refactoring?
+
+The tricks to get around this trap are:
+
+ * Test at a higher level, with coarser accuracy
+ * Write tests that you intend to remove
+
+The best tests are ones that test single bits of code rigorously.
+However, with this code it isn't possible to do that.
+Instead we will make minimal changes to the code to make it a bit more testable,
+for example returning the data instead of visualising it.
+We will also simply observe what the outcome is, rather than trying to
+test that the outcome is correct.
+If the behaviour is currently broken, then we don't want to inadvertently fix it.
+
+As with everything in this episode, there isn't a hard and fast rule.
+Refactoring doesn't change behaviour, but sometimes, to make it possible to verify that
+you're not changing the important behaviour, you have to make some small tweaks to write
+the tests at all.
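One common form of the "tests you intend to remove" trick above is a characterisation (sometimes called "golden master") test: pin down whatever the code currently produces, without claiming those values are correct. A toy sketch, using hypothetical names rather than the lesson's real code:

```python
# Characterisation-test sketch (hypothetical names, not the lesson's real code).
# We pin the current output of some hard-to-understand legacy code, without
# judging whether that output is correct.

def legacy_scale(data):
    # stand-in for an opaque legacy computation we dare not touch yet
    return [round(x * 1.5, 2) for x in data]

def test_legacy_scale_characterisation():
    # expected values were captured by running the code once, not derived
    # from any specification - this test exists only to catch accidental
    # behaviour changes during refactoring, and can be deleted afterwards
    assert legacy_scale([1, 2, 3]) == [1.5, 3.0, 4.5]

test_legacy_scale_characterisation()
```

Because the expected values are recorded rather than reasoned about, such a test says nothing about correctness - only about change.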
> ## Exercise: Write regression tests before refactoring
> Write a regression test to verify we don't break the code when refactoring.
>> ## Solution
+>> One approach we can take is to:
+>> * comment out the call to `visualize` (as this will cause our test to hang)
+>> * return the data instead, so we can write asserts on the data
+>> * see what the calculated value is, and assert that it is the same
+>>
+>> Putting this together, you can write a test that looks something like:
+>>
+>> ```python
+>> import numpy.testing as npt
+>>
+>> def test_compute_data():
+>>    from inflammation.compute_data import analyse_data
+>>    path = 'data/'
+>>    result = analyse_data(path)
+>>    expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211
+>>                      ,0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094
+>>                      ,1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312
+>>                      ,1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578
+>>                      ,0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417
+>>                      ,0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707
+>>                      ,0.50323031,0.47574665,0.45197398,0.22070227]
+>>    npt.assert_array_almost_equal(result, expected_output)
+>> ```
+>>
+>> This isn't a good test:
+>> * It isn't at all obvious why these numbers are correct.
+>> * It doesn't test edge cases.
+>> * If the files change, the test will start failing.
+>>
+>> However, it allows us to guarantee we don't accidentally change the analysis output.
>> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/5000b6122e576d91c2acbc437184e00893483fdd
> {: .solution}
{: .challenge}

## Pure functions

-* Explain what a pure function is
+A **pure function** is a function that works like a mathematical function.
+That is, it takes in some inputs as parameters, and it produces an output.
+That output should always be the same for the same input.
+That is, it does not depend on any information not present in the inputs (such as global variables, databases, or the time of day).
+Further, it should not cause any **side effects**, such as writing to a file or changing a global variable.
+
+You should try to have as much of the complex, analytical and mathematical code as possible in pure functions.
+
+Pure functions have a number of advantages:
+
+* They are easy to test: you feed in inputs and get fixed outputs
+* They are easy to understand: when you are reading them you have all
+  the information they depend on, you don't need to know what is likely to be in
+  a database, or what the state of a global variable is likely to be.
+* They are easy to re-use: because they always behave the same, you can use them anywhere
+
+Some parts of a program are inevitably impure.
+Programs need to read input from the user, or write to a database.
+Well-designed programs separate complex logic from the necessary "glue" code that interacts with users and systems.
+This way, you have easy-to-test, easy-to-read code that contains the complex logic.
+And you have really simple code that just reads data from a file, or gathers user input etc.,
+that is maybe harder to test, but is so simple that it only needs a handful of tests anyway.

> ## Exercise: Refactor the function into a pure function
-> Refactor the function to call a pure function that just operates on and returns data.
+> Refactor the `analyse_data` function into a pure function with the logic, and an impure function that handles the input and output.
+> The pure function should take in the data, and return the analysis results.
+> The "glue" function should maintain the behaviour of the original `analyse_data`
+> but delegate all the calculations to the new pure function.
>> ## Solution +>> You can move all of the code that does the analysis into a separate function that +>> might look something like this: +>> ```python +>> def compute_standard_deviation_by_data(all_loaded_data): +>> means_by_day = map(models.daily_mean, all_loaded_data) +>> means_by_day_matrix = np.stack(list(means_by_day)) +>> +>> daily_standard_deviation = np.std(means_by_day_matrix, axis=0) +>> return daily_standard_deviation +>> ``` +>> Then the glue function can use this function, whilst keeping all the logic +>> for reading the file and processing the data for showing in a graph: +>>```python +>>def analyse_data(data_dir): +>> """Calculate the standard deviation by day between datasets +>> Gets all the inflammation csvs within a directory, works out the mean +>> inflammation value for each day across all datasets, then graphs the +>> standard deviation of these means.""" +>> data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) +>> if len(data_file_paths) == 0: +>> raise ValueError(f"No inflammation csv's found in path {data_dir}") +>> data = map(models.load_csv, data_file_paths) +>> daily_standard_deviation = compute_standard_deviation_by_data(data) +>> +>> graph_data = { +>> 'standard deviation by day': daily_standard_deviation, +>> } +>> # views.visualize(graph_data) +>> return daily_standard_deviation +>>``` >> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/4899b35aed854bdd67ef61cba6e50b3eeada0334 > {: .solution} {: .challenge} -* Explain the benefits of pure functions for testing +Now we have a pure function for the analysis, we can write tests that cover +all the things we would like tests to cover without depending on the data +existing in CSVs. + +This will make tests easier to write, but it will also make them easier to read. +The reader will not have to open up a CSV file to understand why the test is correct. + +It will also make the tests easier to maintain. 
+If at some point the data format is changed from CSV to JSON, the bulk of the tests
+won't need to be updated.

> ## Exercise: Write some tests for the pure function
-> Now we have refactored our a pure function, we can more easily write comprehensive tests
+> Now we have refactored out a pure function, we can more easily write comprehensive tests.
+> Add tests that check for when there is only one file with multiple rows, multiple files with one row,
+> and any other cases you can think of that should be tested.
>> ## Solution
->> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/4899b35aed854bdd67ef61cba6e50b3eeada0334
+>> You might have thought of more tests, but we can easily extend the test by parameterizing it
+>> with more inputs and expected outputs:
+>> ```python
+>>@pytest.mark.parametrize('data,expected_output', [
+>>    ([[[0, 1, 0], [0, 2, 0]]], [0, 0, 0]),
+>>    ([[[0, 2, 0]], [[0, 1, 0]]], [0, math.sqrt(0.25), 0]),
+>>    ([[[0, 1, 0], [0, 2, 0]], [[0, 1, 0], [0, 2, 0]]], [0, 0, 0])
+>>],
+>>ids=['Two patients in same file', 'Two patients in different files', 'Two identical files with two patients each'])
+>>def test_compute_standard_deviation_by_data(data, expected_output):
+>>    from inflammation.compute_data import compute_standard_deviation_by_data
+>>
+>>    result = compute_standard_deviation_by_data(data)
+>>    npt.assert_array_almost_equal(result, expected_output)
+>> ```
> {: .solution}
{: .challenge}

## Functional Programming

-* Introduce that pure functions are a concept from functional programming
-* Mention tools and techniques Python has for functional programming
+**Pure functions** are a core concept of **functional programming**.
+Functional programming is a style of programming that encourages using pure functions,
+chained together.
+Some programming languages, such as Haskell or Lisp, are designed primarily for writing functional code,
+but it is more common for languages to support both the functional and the **imperative** style (the style
+of code you have probably been writing thus far, where you instruct the computer directly what to do).
+Python, Java, C++ and many other languages allow for mixing these two styles.
+
+In Python, you can use the built-in functions `map` and `filter`, along with `reduce` from the
+`functools` module, to chain pure functions together into pipelines.
+
+In the original code, we used `map` to "map" the file paths into the loaded data.
+Extending this idea, you could then "map" the results of that through another process.
+
+You can read more about using these language features [here](https://www.learnpython.org/en/Map%2C_Filter%2C_Reduce).
+Other programming languages will have similar features, and searching for "functional style" plus your programming language of choice
+will help you find the features available.
+
+There are no hard and fast rules in software design, but building your complex logic out of composed pure functions is a great place to start
+when trying to make code readable, testable and maintainable.
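For example, a small pipeline of pure functions might look like this (an illustrative sketch - the functions and data here are invented for the example, not part of the inflammation codebase):

```python
from functools import reduce

def is_even(value):
    return value % 2 == 0

def square(value):
    return value * value

# Chain pure functions into a pipeline: keep the even numbers,
# square each one, then combine the results into a single sum.
numbers = [1, 2, 3, 4, 5, 6]
squares_of_evens = map(square, filter(is_even, numbers))
total = reduce(lambda acc, value: acc + value, squares_of_evens, 0)
print(total)  # 4 + 16 + 36 = 56
```

Because each step is a pure function, each step can be tested on its own, and the pipeline reads as a description of the data flow.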
+This tends to be possible when:
+
+* Doing any kind of data analysis
+* Simulations
+* Translating data from one format to another

{% include links.md %}

From 3103a3cc77dd45aa1eb289ddd62969af57aed90d Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Tue, 10 Oct 2023 16:41:19 +0100
Subject: [PATCH 008/105] Adding first draft of the MVC section

---
 _episodes/34-refactoring-architecture.md | 105 ++++++++++++++++++++---
 1 file changed, 92 insertions(+), 13 deletions(-)

diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md
index ea29f95a6..afe3398c1 100644
--- a/_episodes/34-refactoring-architecture.md
+++ b/_episodes/34-refactoring-architecture.md
@@ -6,48 +6,127 @@ questions:
- "What is the point of the MVC architecture?"
- "How should code be structured?"
objectives:
+- "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software."
- "Understand the MVC pattern and how to apply it."
- "Understand the benefits of using patterns"
keypoints:
-- "By splitting up the "view" code from "model" code, you allow easier re-use of code."
+- "By splitting up the \"view\" code from \"model\" code, you allow easier re-use of code."
- "Using coding patterns can be useful inspirations for how to structure your code."
---

## Introduction

-* Refamiliarise with MVC
+Model-View-Controller (MVC) is a way of separating out different portions of a typical
+application. Specifically, we have:
+
+* The **model**, which contains the internal data representations for the program, and the valid
+  operations that can be performed on them.
+* The **view**, which is responsible for how this data is presented to the user (e.g. through a GUI or
+  by writing out to a file).
+* The **controller**, which defines how the model can be interacted with.
+
+Separating out these different sections into different parts of the code will make
+the code much more maintainable.
+For example, if the view code is kept away from the model code, then testing the model code
+can be done without having to worry about how it will be presented.
+
+It helps with readability, as it makes it easier to have each function doing
+just one thing.
+
+It also helps with maintainability - if the UI requirements change, these changes
+are easily isolated from the more complex logic.

## Separating out considerations

-* Talk about model and view as distinct parts of the code
+The key thing to take away from MVC is the distinction between model code and view code.
+
+> The view and the controller tend to be more tightly coupled, and it isn't always sensible
+> to draw a thick line dividing these two. Depending on how the user interacts with the software
+> this distinction may not be possible (the code that specifies there is a button on the screen
+> might be the same code that specifies what that button does). In fact, the original proposer
+> of MVC groups the view and the controller into a single element, called the tool. Other modern
+> architectures like Model-View-ViewModel (MVVM) do away with the controller and instead separate out the
+> layout code from a programmable view of the UI.
+{: .callout}
+
+The view code might be hard to test, or use libraries to draw the UI, but should
+not contain any complex logic, and is really just a presentation layer on top of the model.
+
+The model, conversely, should operate quite agnostically of how a specific tool might interact with it.

> ## Exercise: Identify model and view parts of the code
> Looking at the code as it is, what parts should be considered "model" code
> and what parts should be considered "view" code?
>> ## Solution
->> The computation of the standard deviation is model code
->> The display of the output as a graph is the view code.
+>> * The computation of the standard deviation is model code.
+>> * Reading the data is also model code.
+>> * The display of the output as a graph is the view code.
+>> * The controller is the logic that processes what flags the user has provided.
> {: .solution}
{: .challenge}

-* Model should be made up of pure functions as discussed
-TODO: Reading files is model code, but not pure
+Within the model there is further separation that makes sense.
+For example, as discussed, separating out the code that interacts with the file system from
+the calculations is sensible.
+Nevertheless, the MVC approach is a great starting point when thinking about how you should structure your code.

> ## Exercise: Split out the model code from the view code
> Refactor the code to have the model code separated from
> the view code.
>> ## Solution
+>> The idea here is to have `analyse_data` not have any "view" considerations.
+>> That is, it should just compute and return the data.
+>>
+>> ```python
+>> def analyse_data(data_dir):
+>>     """Calculate the standard deviation by day between datasets.
+>>     Gets all the inflammation csvs within a directory, works out the mean
+>>     inflammation value for each day across all datasets, then returns the
+>>     standard deviation of these means."""
+>>     data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv'))
+>>     if len(data_file_paths) == 0:
+>>         raise ValueError(f"No inflammation csv's found in path {data_dir}")
+>>     data = map(models.load_csv, data_file_paths)
+>>     daily_standard_deviation = compute_standard_deviation_by_data(data)
+>>
+>>     return daily_standard_deviation
+>> ```
+>> There can be a separate bit of code that chooses how that should be presented, e.g.
as a graph:
+>>
+>> ```python
+>> if args.full_data_analysis:
+>>     data_result = analyse_data(os.path.dirname(InFiles[0]))
+>>     graph_data = {
+>>         'standard deviation by day': data_result,
+>>     }
+>>     views.visualize(graph_data)
+>>     return
+>> ```
>> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/97fd04b747a6491c2590f34384eed44e83a8e73c
> {: .solution}
{: .challenge}

-TODO: did originally intend to add a new view - but I think it isn't necessary, isn't a great
-example and the time could be better used.
-
## Programming patterns

-* Talk about how MVC is one pattern
-* Mention a couple of others than might be useful
-* Talk about how patterns can be useful for designing architecture
+MVC is a **programming pattern**: a template for structuring code.
+Patterns are a useful starting point for how to design your software.
+They also work as a common vocabulary for discussing software designs with
+other developers.
+
+The Refactoring Guru website has a [list of programming patterns](https://refactoring.guru/design-patterns/catalog).
+They aren't all good design decisions, and they can certainly be over-applied, but learning about them can be helpful
+for thinking at a big-picture level about software design.
+
+For example, the [visitor pattern](https://refactoring.guru/design-patterns/visitor) is
+a good way of separating the problem of how to move through the data
+from the specific action you want to perform on the data.
+
+Having a shared terminology for these approaches facilitates discussions
+where everyone is familiar with them.
+However, patterns cannot replace a full design, as most problems will require
+a bespoke design that maps cleanly onto the specific problem you are
+trying to solve.
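To sketch the visitor idea (all the classes and names here are invented for the example, not part of the inflammation codebase), the code that walks a data structure can be kept separate from the actions performed on it:

```python
# A tiny tree structure: leaves hold values, nodes hold children.
class Leaf:
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        return visitor.visit_leaf(self)

class Node:
    def __init__(self, children):
        self.children = children

    def accept(self, visitor):
        return visitor.visit_node(self)

# Each visitor implements one action; the traversal stays in the data classes,
# so adding a new action doesn't change the data structure at all.
class SumVisitor:
    def visit_leaf(self, leaf):
        return leaf.value

    def visit_node(self, node):
        return sum(child.accept(self) for child in node.children)

class CountLeavesVisitor:
    def visit_leaf(self, leaf):
        return 1

    def visit_node(self, node):
        return sum(child.accept(self) for child in node.children)

tree = Node([Leaf(1), Node([Leaf(2), Leaf(3)])])
print(tree.accept(SumVisitor()))          # 6
print(tree.accept(CountLeavesVisitor()))  # 3
```

Whether this is worth the extra classes depends on the problem - which is exactly the point about patterns being inspiration rather than a substitute for design.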
{% include links.md %} From 9c40c44fe4f0aae5965e4db3559663def62508ed Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Tue, 17 Oct 2023 16:19:13 +0100 Subject: [PATCH 009/105] First draft of the class section of the episode --- _episodes/35-refactoring-decoupled-units.md | 377 ++++++++++++++++++-- 1 file changed, 341 insertions(+), 36 deletions(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index efc0f92fc..3aae044d4 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -5,89 +5,394 @@ exercises: 0 questions: - "What is de-coupled code?" - "When is it useful to use classes to structure code?" +- "How can we make sure the components of our software are reusable?" objectives: - "Understand the object-oriented principle of polymorphism and interfaces." - "Be able to introduce appropriate abstractions to simplify code." - "Understand what decoupled code is, and why you would want it." +- "Be able to use mocks to replace a class in test code." keypoints: +- "Classes can help separate code so it is easier to understand." - "By using interfaces, code can become more decoupled." - "Decoupled code is easier to test, and easier to maintain." --- ## Introduction -* What is coupled and decoupled code -* Why decoupled code is better +When we're thinking about units of code, one important thing to consider is +whether the code is **decoupled** (as opposed to **coupled**). +Two units of code can be considered decoupled if changes in one don't +necessitate changes in the other. +While two connected units can't be totally decoupled, loose coupling +allows for more maintainable code: + +* Loosely coupled code is easier to read as you don't need to understand the + detail of the other unit. +* Loosely coupled code is easier to test, as one of the units can be replaced + by a test or mock version of it. 
+* Loosely coupled code tends to be easier to maintain, as changes can be isolated
+  from other parts of the code.

> ## Exercise: Decouple the file loading from the computation
> Currently the function is hard-coded to load all the files in a directory.
> Decouple this into a separate function that returns all the loaded data.
>> ## Solution
+>> You should have written a new function that reads all the data into the format needed
+>> for the analysis:
+>> ```python
+>> def load_inflammation_data(dir_path):
+>>     data_file_paths = glob.glob(os.path.join(dir_path, 'inflammation*.csv'))
+>>     if len(data_file_paths) == 0:
+>>         raise ValueError(f"No inflammation csv's found in path {dir_path}")
+>>     data = map(models.load_csv, data_file_paths)
+>>     return list(data)
+>> ```
+>> This can then be used in the analysis.
+>> ```python
+>> def analyse_data(data_dir):
+>>     ...
+>>     data = load_inflammation_data(data_dir)
+>>     ...
+>> ```
+>> This is now easier to understand, as we don't need to understand the file loading
+>> to read the statistical analysis, and we don't have to understand the statistical analysis
+>> when reading the data loading.
> {: .solution}
{: .challenge}

## Using classes to encapsulate data and behaviours

+In the abstract, we can talk about units of code, where we think of each unit as doing one "thing".
+In practice, in Python there are three ways we can create defined units of code.
+The first is functions, which we have used.
+The next level up is **classes**.
+Finally, there are also modules and packages, which we won't cover.
+
+A class is a way of grouping together data with some specific methods.
+In Python, you can declare a class as follows:
+
+```python
+class MyClass:
+    pass
+```
+
+Classes are typically named using `UpperCamelCase`.
+
+You can then **construct** an instance of the class elsewhere in your code by doing the following:
+
+```python
+my_class = MyClass()
+```
+
+When you construct a class in this way, its **constructor** is called. It is possible
+to pass in values to the constructor that configure the class:
+
+```python
+class Circle:
+    def __init__(self, radius):
+        self.radius = radius
+
+my_circle = Circle(10)
+```
+
+The constructor has the special name `__init__` (one of the so-called "dunder methods").
+Notice it also has a special first parameter called `self` (named this by convention).
+This parameter can be used to access the current **instance** of the object being created.
+
+A class can be thought of as a cookie cutter template,
+and the instances are the cookies themselves.
+That is, one class can have many instances.
+
+Classes can also have methods defined on them.
+Like constructors, they have a special `self` parameter that must come first.

-* Introduce what a class is
-* Explain member methods
-* Explain constructors

+```python
+import math
+
+class Circle:
+    ...
+    def get_area(self):
+        return math.pi * self.radius * self.radius
+...
+print(my_circle.get_area())
+```
+
+Here the instance of the class, `my_circle`, will be automatically
+passed in as the first parameter when calling `get_area`.
+Then the method can access the **member variable** `radius`.
+
+Classes have a number of uses.
+
+* Encapsulating data - such as grouping three numbers together into a Vector class
+* Maintaining invariants - such as ensuring that a circle's radius is never negative
+* Encapsulating behaviour - such as a class that handles loading data from a particular file format

> ## Exercise: Use a class to configure loading
> Put your function as a member method of a class, separating out the configuration
-> of where to load the files from in the constructor, from where it actually loads the data
+> of where to load the files from in the constructor, from where it actually loads the data.
+> Once this is done, you can construct this class outside the statistical analysis
+> and pass it in.
>> ## Solution
+>> ```python
+>> class CSVDataSource:
+>>     """
+>>     Loads all the inflammation csvs within a specified folder.
+>>     """
+>>     def __init__(self, dir_path):
+>>         self.dir_path = dir_path
+>>
+>>     def load_inflammation_data(self):
+>>         data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.csv'))
+>>         if len(data_file_paths) == 0:
+>>             raise ValueError(f"No inflammation csv's found in path {self.dir_path}")
+>>         data = map(models.load_csv, data_file_paths)
+>>         return list(data)
+>> ```
+>> We can now pass an instance of this class into the statistical analysis function,
+>> constructing the object in the controller code.
+>> This means that should we want to re-use the analysis, it wouldn't be fixed to reading
+>> from a directory of CSVs.
+>> We have "decoupled" the reading of the data from the statistical analysis.
+>> ```python
+>> def analyse_data(data_source):
+>>     ...
+>>     data = data_source.load_inflammation_data()
+>> ```
+>>
+>> In the controller, you might have something like:
+>>
+>> ```python
+>> data_source = CSVDataSource(os.path.dirname(InFiles[0]))
+>> data_result = analyse_data(data_source)
+>> ```
+>> Note in all these refactorings the behaviour is unchanged,
+>> so we can still run our original tests to ensure we've not
+>> broken anything.
> {: .solution}
{: .challenge}

## Interfaces

Another important concept in software design is the idea of **interfaces** between different units in the code.
One kind of interface you might have come across is an API (Application Programming Interface).
APIs allow separate systems to communicate with each other - such as making an API request
to Google Maps to find the latitude and longitude of an address.

However, there are also internal interfaces within our software that dictate how
different units of the system interact with each other.
Even if these aren't thought out or documented, they still exist!

For example, there is an interface for how the statistical analysis in `analyse_data`
uses the class `CSVDataSource` - the method `load_inflammation_data`, how it should be called
and what it will return.

Interfaces are important to get right - a messy interface will force tighter coupling between
two units in the system.
Unfortunately, it would take an entire course to cover everything there is to consider in interface design.

In addition to the abstract notion of an interface, many programming languages
support creating interfaces as a special kind of class.
Python doesn't support this explicitly, but we can still use the idea with
regular classes.
+An interface class will define some methods, but not provide an implementation:

```python
class Shape:
    def get_area(self):
        raise NotImplementedError
```

> ## Exercise: Define an interface for your class
> As discussed, there is an interface between the CSVDataSource and the analysis.
> Write an interface (that is, a class that defines some empty methods) called `InflammationDataSource`
> that makes this interface explicit.
> Document the format the data will be returned in.
>> ## Solution
>> ```python
>> class InflammationDataSource:
>>     """
>>     An interface for providing a series of inflammation data.
>>     """
>>
>>     def load_inflammation_data(self):
>>         """
>>         Loads the data and returns it as a list, where each entry corresponds to one file,
>>         and each entry is a 2D array with patients' inflammation by day.
>>         :returns: A list where each entry is a 2D array of patient inflammation results by day
>>         """
>>         raise NotImplementedError
>> ```
> {: .solution}
{: .challenge}

An interface on its own is not useful - there is no point instantiating it, as its methods do nothing.
The next step is to create a class that **implements** the interface.
That is, create a class that inherits from the interface and then provide
implementations of all the methods on the interface.
To return to our `Shape` interface, we can write classes that implement this
interface, with different implementations:

```python
class Circle(Shape):
    ...
+    def get_area(self):
+        return math.pi * self.radius * self.radius

class Rectangle(Shape):
    ...
    def get_area(self):
        return self.width * self.height
```

As you can see, by putting `Shape` in brackets after the class name,
we are saying that a `Circle` **is a** `Shape`.

> ## Exercise: Implement the interface
> Modify the existing class to implement the interface.
> Ensure the method matches up exactly to the interface.
>> ## Solution
>> We can make `CSVDataSource` implement the interface by inheriting from it;
>> the existing `load_inflammation_data` method already provides the implementation.
>>
>> ```python
>> class CSVDataSource(InflammationDataSource):
>> ```
> {: .solution}
{: .challenge}

## Polymorphism

Where this gets useful is through a concept called **polymorphism**,
which is a fancy way of saying we can use an instance of a class and treat
it as a `Shape`, without worrying about whether it is a `Circle` or a `Rectangle`.


```python
my_circle = Circle(radius=10)
my_rectangle = Rectangle(width=5, height=3)
my_shapes = [my_circle, my_rectangle]
total_area = sum(shape.get_area() for shape in my_shapes)
```

This is an example of **abstraction** - when we are calculating the total
area, the method for calculating the area of each shape is abstracted away
to the relevant class.

### How polymorphism is useful

As we saw with the `Circle` and `Rectangle` examples, we can use interfaces and polymorphism
to provide different implementations of the same interface.

For example, we could replace our `CSVDataSource` with a class that reads a totally different format,
or reads from an external service.
All of these can be added in without changing the analysis.
Further - if we want to write a new analysis, we can support any of these data sources
for free with no further work.
+That is, we have decoupled the job of loading the data from the job of analysing the data.

> ## Exercise: Introduce an alternative implementation of DataSource
> Create another class that repeatedly asks the user for paths to CSVs to analyse.
> It should inherit from the interface and implement the `load_inflammation_data` method.
> Finally, at run time provide an instance of the new implementation if the user hasn't
> put any files on the path.
>> ## Solution
>> ```python
>> class UserProvidedSpecificFilesDataSource(InflammationDataSource):
>>     def load_inflammation_data(self):
>>         paths = []
>>         while True:
>>             input_string = input('Enter path to CSV or press enter to process paths collected: ')
>>             if len(input_string) == 0:
>>                 print(f'Finished entering input - will process {len(paths)} CSVs')
>>                 break
>>             if os.path.exists(input_string):
>>                 paths.append(input_string)
>>             else:
>>                 print(f'Path {input_string} does not exist, please enter a valid path')
>>
>>         data = map(models.load_csv, paths)
>>         return list(data)
>> ```
> {: .solution}
{: .challenge}

We can use this abstraction to also make testing more straightforward.
Instead of having our tests use real file system data, we can instead provide
a mock or dummy implementation of the `InflammationDataSource` that just returns some example data.
Separately, we can test the file parsing class `CSVDataSource` without having to understand
the specifics of the statistical analysis.

A convenient way to do this in Python is using mocks.
+These are a whole topic in themselves - but a basic mock can be constructed using a couple of lines of code:

```python
from unittest.mock import Mock

mock_version = Mock()
mock_version.method_to_mock.return_value = 42
```

Here we construct a mock in the same way you'd construct a class.
Then we specify a method that we want to behave a specific way.

Now whenever you call `mock_version.method_to_mock()` the return value will be `42`.


> ## Exercise: Test using a mock or dummy implementation
> Create a mock for the `InflammationDataSource` that returns some fixed data to test
> the `analyse_data` method.
> Use this mock in a test.
>> ## Solution
>> ```python
>> import math
>> from unittest.mock import Mock
>> import numpy.testing as npt
>>
>> def test_compute_data_mock_source():
>>     from inflammation.compute_data import analyse_data
>>     data_source = Mock()
>>     data_source.load_inflammation_data.return_value = [[[0, 2, 0]],
>>                                                        [[0, 1, 0]]]
>>
>>     result = analyse_data(data_source)
>>     npt.assert_array_almost_equal(result, [0, math.sqrt(0.25), 0])
>> ```
> {: .solution}
{: .challenge}

## Object Oriented Programming

Classes, and particularly polymorphism, are techniques that come from
**object oriented programming** (frequently abbreviated to OOP).
As with functional programming, different programming languages will provide features to enable you
to write object oriented code.
For example, in Python you can create classes, and use polymorphism to call the
correct method on an instance (e.g. when we called `get_area` on a shape, the appropriate `get_area` was called).

Object oriented programming also includes **information hiding**.
+In this, certain fields might be marked private to a class,
+preventing them from being modified at will.
+
+This can be used to maintain invariants of a class (such as insisting that a circle's radius is always non-negative).
+
+There is also inheritance, which allows classes to specialise the behaviour of other classes by **inheriting** from
+another class and **overriding** certain methods.
+
+As with functional programming, there are times when object oriented programming is well suited, and times where it is not.
+
+Good uses:
+
+ * Representing real world objects with invariants
+ * Providing alternative implementations such as we did with `DataSource`
+ * Representing something that has a state that will change over the program's lifetime (such as elements of a GUI)
+
+One downside of OOP is ending up with very large classes that contain complex methods.
+As they are methods on the class, it can be hard to know up front what side effects a call will have on the class.
+This can make maintenance hard.
+
+Grouping data together into logical structures (such as three numbers into a vector) is a vital step in writing
+readable and maintainable code.
+However, when using classes in this way it is best for them to be immutable (that is, unable to be changed after they are created).
+It is worth noting that using classes to group data together - a very useful feature that you should be using everywhere
+ - does not mean you can't be practising functional programming:
+you can still have classes, and these classes might have read-only methods on them (such as the `get_area` we defined for shapes),
+while your complex logic operates on them as pure functions.
+
+Don't use features for the sake of using features.
+Code should be as simple as it can be, but not any simpler.
+If you know your function only makes sense to operate on circles, then
+don't accept shapes just to use polymorphism!
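To make information hiding and invariants concrete, here is a small illustrative sketch (this `Circle` class is a toy example written for this explanation, not code from the inflammation project):

```python
import math

class Circle:
    def __init__(self, radius):
        # Enforce the invariant at construction time.
        if radius < 0:
            raise ValueError('radius must be non-negative')
        # The leading underscore marks the field as private by convention,
        # signalling that callers should not modify it directly.
        self._radius = radius

    @property
    def radius(self):
        # Read-only access: there is no setter, so callers cannot
        # break the invariant after construction.
        return self._radius

    def get_area(self):
        return math.pi * self._radius ** 2
```

Because the radius can only be set through the constructor, every `Circle` that exists is guaranteed to have a non-negative radius - the invariant is maintained in one place rather than checked everywhere the class is used.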
From 860fbd8170b4271a31ca096bfa9254c486715e5c Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Tue, 17 Oct 2023 17:27:30 +0100 Subject: [PATCH 010/105] First draft of the episode conclusion --- _episodes/36-yagni.md | 121 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 108 insertions(+), 13 deletions(-) diff --git a/_episodes/36-yagni.md b/_episodes/36-yagni.md index bd3332b5f..fec77e860 100644 --- a/_episodes/36-yagni.md +++ b/_episodes/36-yagni.md @@ -3,41 +3,125 @@ title: "When to abstract, and when not to." teaching: 0 exercises: 0 questions: -- "How to tell what is and isn't an appropriate abstraction" +- "How to tell what is and isn't an appropriate abstraction." +- "How to design larger solutions." objectives: - "Understand how to determine correct abstractions. " - "How to design large changes to the codebase." keypoints: - "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful." - "The best code is simple to understand and test, not the most clever or uses advanced language features." +- "Sketching a diagram of the code can clarify how it is supposed to work, and troubleshoot problems early." --- ## Introduction -* Talk about the bigger picture of design having seen some techniques +In this episode we have explored a range of techniques for architecting code: + + * Using pure functions assembled into pipelines to perform analysis + * Using established patterns to discuss design + * Separating different considerations, such as how data is presented from how it is stored + * Using classes to create abstractions + +None of these techniques are always applicable, and they are not sufficient to design a good technical solution. 
## Architecting larger changes

-* Talk about box diagrams
+When creating a new application, or creating a substantial change to an existing one,
+it can be really helpful to sketch out the intended architecture on a whiteboard
+(pen and paper works too, though of course it might get messy as you iterate on the design!).
+
+The basic idea is you draw boxes that will represent different units of code, as well as
+other components of the system (such as users, databases, etc.).
+Then connect these boxes with lines where information or control will be exchanged.
+These lines represent the interfaces in your system.
+
+As well as helping to visualise the work, doing this sketch can troubleshoot potential issues,
+for example revealing a circular dependency between two sections of the design.
+It can also help with estimating how long the work will take, as it forces you to consider all the components that
+need to be made.
+
+Diagrams aren't foolproof, and often the stuff we haven't considered won't make it on to the diagram,
+but they are a great starting point to break down the different responsibilities and think about
+the kinds of information different parts of the system will need.
+

> ## Exercise: Design a high-level architecture
-> Consider implementing a new feature
-> TODO: suggest a more complex feature
-> Using boxes and lines sketch out an architecture for the code.
-> Discuss with your team
+> Sketch out a design for a new feature requested by a user:
+>
+> *"I want there to be a Google Drive folder such that, when I upload new inflammation data to it,
+> the software automatically pulls it down and updates the analysis.
+> The new result should be added to a database with a timestamp.
+> An email should then be sent to a group email notifying them of the change."*
+>
+> TODO: this doesn't generate a very interesting diagram
+>
>> ## Solution
->> An example design for the hypothetical problem.
+>> An example design for the hypothetical problem.
(TODO: incomplete)
+>> ```mermaid
+>> graph TD
+>>     A[(GDrive Folder)]
+>>     B[(Database)]
+>>     C[GDrive Monitor]
+>>     C -- Checks periodically --> A
+>>     D[Download inflammation data]
+>>     C -- Trigger update --> D
+>>     E[Parse inflammation data]
+>>     D --> E
+>>     F[Perform analysis]
+>>     E --> F
+>>     G[Upload analysis]
+>>     F --> G
+>>     G --> B
+>>     H[Notify users]
+>> ```
> {: .solution}
{: .challenge}

## An abstraction too far

-* Drawbacks of abstraction
-* Example showing too complex abstractions
+So far we have seen how abstractions are good for making code easier to read, maintain and test.
+However, it is possible to introduce too many abstractions.
+
+> All problems in computer science can be solved by another level of indirection except the problem of too many levels of indirection
+
+When you introduce an abstraction, if the reader of the code needs to understand what is happening inside the abstraction,
+it has actually made the code *harder* to read.
+When code is just in the function, it can be clear to see what it is doing.
+When the code is calling out to an instance of a class that, thanks to polymorphism, could be a range of possible implementations,
+the only way to find out what is *actually* being called is to run the code and see.
+This is much slower to understand, and actually obfuscates meaning.
+
+It is a judgement as to whether you have made the code too abstract.
+If you have to jump around a lot when reading the code, that is a clue that it is too abstract.
+Similarly, if there are two parts of the code that always need updating together, that is
+again an indication of an incorrect or over-zealous abstraction.
+

## You Ain't Gonna Need It

-* Introduce and explain YAGNI principle
+There are different approaches to designing software.
+One principle that is popular is called You Ain't Gonna Need It - "YAGNI" for short.
+The idea is that, since it is hard to predict the future needs of a piece of software,
+it is always best to design the simplest solution that solves the problem at hand.
+This is opposed to trying to imagine how you might want to adapt the software in future
+and designing the code with that in mind.
+
+Then, since you know the problem you are trying to solve, you can avoid making your solution unnecessarily complex or abstracted.
+
+In our example, it might be tempting to abstract how the `CSVDataSource` walks the file tree into a class.
+However, since we only have one strategy for exploring the file tree, this would just create indirection for the sake of it
+- now a reader of `CSVDataSource` would have to read a different class to find out how the tree is walked.
+Maybe in the future this is something that needs to be customised, but we haven't really made it any harder to do by *not* doing this prematurely,
+and once we have the concrete feature request, it will be easier to design it appropriately.
+
+> ## A judgement call
+> All of this is a judgement.
+> For example, in this case, perhaps it *would* make sense to at least pull the file parsing out into a separate
+> class, but not have the `CSVDataSource` be configurable.
+> That way, it is clear to see how the file tree is being walked (there's no polymorphism going on)
+> without mixing the *parsing* code in with the file finding code.
+> There are no right answers, just guidelines.
{: .callout}

> ## Exercise: Applying to real world examples
> Thinking about the examples of good and bad code you identified at the start of the episode.
@@ -48,5 +132,16 @@ keypoints:

## Conclusion

-* Take care to think about software with the appropriate priorities and things will get better.
-* Tips for getting better at architecture
+Good architecture is not about applying any rules blindly, but comes instead from practice and taking care over important things:
+
+* Avoid duplication of code or data.
+* Keep how much a person has to understand at once to a minimum.
+* Think about how interfaces will work.
+* Separate different considerations into different sections of the code.
+* Don't try and design a future-proof solution - focus on the problem at hand.
+
+Practice makes perfect.
+One way to practise is to consider code that you already have and think how it might be redesigned.
+Another way is to always try to leave code in a better state than you found it.
+So when you're working on a less well structured part of the code, start by refactoring it so that your change fits in cleanly.
+Doing this, over time, with your colleagues, will improve your skills in software architecture as well as improving the code.

From fcd6aa1e9d45ad31432653ed17a0d39e548661b0 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 11:57:07 +0100
Subject: [PATCH 011/105] Update introduction to not mention paradigms

---
 _episodes/30-section3-intro.md | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/_episodes/30-section3-intro.md b/_episodes/30-section3-intro.md
index 2bc022d39..4bd5bb742 100644
--- a/_episodes/30-section3-intro.md
+++ b/_episodes/30-section3-intro.md
@@ -131,15 +131,10 @@ within the context of the typical software development process:
- How requirements inform and drive the **design of software**,
  the importance, role, and examples of **software architecture**,
  and the ways we can describe a software design.
-- **Implementation choices** in terms of **programming paradigms**,
-  looking at **procedural**, **functional**, and **object oriented** paradigms of development.
-  Modern software will often contain instances of multiple paradigms,
-  so it is worthwhile being familiar with them and knowing when
-  to switch in order to make better code.
-- How you can (and should) assess and update a software's architecture when
-  requirements change and complexity increases -
-  is the architecture still fit for purpose,
-  or are modifications and extensions becoming increasingly difficult to make?
+- How to improve existing code to be more readable, maintainable and testable.
+- Consider different strategies for writing well-designed code, including
+  using **pure functions**, **classes** and **abstractions**.
+- How to create, assess and improve software design.

{% include links.md %}

From 0d06342cd24826ba3cb64181bec17116a6f4581c Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 11:59:40 +0100
Subject: [PATCH 012/105] Move exercise about good and bad code before
 abstractions

This relates more to the descriptions of good code, so we might as well have
this discussion before introducing new concepts

---
 _episodes/32-software-design.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md
index c87491c90..63b81d730 100644
--- a/_episodes/32-software-design.md
+++ b/_episodes/32-software-design.md
@@ -41,6 +41,12 @@
Writing code that meets these requirements is hard and takes practise.
Further, in most contexts you will already have a piece of code that breaks
some (or maybe all!) of these principles.

+> ## Group Exercise: Think about examples of good and bad code
+> Try to come up with examples of code that has been hard to understand - why?
+>
+> Try to come up with examples of code that was easy to understand and modify - why?
-{: .challenge} - ## Refactoring Often we are not working on brand new projects, but instead maintaining an existing From 97f8590ae83443aae447f2938ede772903885799 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 12:11:40 +0100 Subject: [PATCH 013/105] Highlight where the code we are refactoring is --- _episodes/32-software-design.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 63b81d730..80a4b23db 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -119,6 +119,8 @@ In it, if the user adds `--full-data-analysis` then the program will scan the di of one of the provided files, compare standard deviations across the data by day and plot a graph. +The main body of it exists in `inflammation/compute_data.py` in a function called `analyse_data`. + We are going to be refactoring and extending this over the remainder of this episode. > ## Group Exercise: What is bad about this code? From 78943f8e68e80b76bc6e413ca44854f587576b91 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 12:12:24 +0100 Subject: [PATCH 014/105] Expand the solution of the find problems with the code exercise The section ends with revisiting this list, so explicitly request people keep hold of it. Add some glue text to make the list flow better --- _episodes/32-software-design.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 80a4b23db..2c20e4b74 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -129,12 +129,18 @@ We are going to be refactoring and extending this over the remainder of this epi > Think about the kinds of changes you might want to make to it, and what would > make making those changes challenging. 
>> ## Solution +>> You may have found others, but here are some of the things that make the code +>> hard to read, test and maintain: +>> >> * Everything is in a single function - reading it you have to understand how the file loading works at the same time as the analysis itself. >> * If I want to use the data without using the graph I'd have to change it >> * It is always analysing a fixed set of data >> * It seems hard to write tests for it as it always analyses a fixed set of files >> * It doesn't have any tests +>> +>> Keep the list you created - at the end of this section we will revisit this +>> and check that we have learnt ways to address the problems we found. > {: .solution} {: .challenge} From 77ffa14db8d8ded6357fef801e8014dbf5278ca4 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 12:15:28 +0100 Subject: [PATCH 015/105] Make the section explaining the tests clearer --- _episodes/33-refactoring-functions.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 1ec91a617..19aa456c8 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -46,19 +46,19 @@ The tricks to get around this trap are: The best tests are ones that test single bits of code rigorously. However, with this code it isn't possible to do that. + Instead we will make minimal changes to the code to make it a bit testable, for example returning the data instead of visualising it. -We will also simply observe what the outcome is, rather than trying to -test the outcome is correct. -If the behaviour is currently broken, then we don't want to inadvertently fix it. + +We will make the asserts verify whatever the outcome is currently, +rather than worrying whether that is correct. +These tests are to verify the behaviour doesn't *change* rather than to check the current behaviour is correct. 
As with everything in this episode, there isn't a hard and fast rule.
Refactoring doesn't change behaviour, but sometimes to make it possible to verify
you're not changing the important behaviour you have to make some small tweaks to write
the tests at all.

-* Explain techniques for writing tests for hard to test, existing code
-
> ## Exercise: Write regression tests before refactoring
> Write a regression test to verify we don't break the code when refactoring
From 7f9b163e98d5cf0c29ea48db8ea09bd7b5984822 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 13:29:58 +0100
Subject: [PATCH 016/105] Add guidance to the regression test exercise

---
 _episodes/33-refactoring-functions.md | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md
index 19aa456c8..96e230b83 100644
--- a/_episodes/33-refactoring-functions.md
+++ b/_episodes/33-refactoring-functions.md
@@ -60,7 +60,18 @@ the tests at all.

> ## Exercise: Write regression tests before refactoring
-> Write a regression test to verify we don't break the code when refactoring
+> Write a regression test to verify we don't break the code when refactoring.
+> You will need to modify `analyse_data` to not create a graph and instead
+> return the data.
+>
+> Don't forget you can use the `numpy.testing` function `assert_array_almost_equal` to
+> compare arrays of floating point numbers.
+>
+>> ## Hint
+>> You might find it helpful to assert the result, observe the test failing
+>> and copy and paste the correct result into the test.
+> {: .solution}
+>
>> ## Solution
>> One approach we can take is to:
>> * comment out the visualize (as this will cause our test to hang)
>> npt.assert_array_almost_equal(result, expected_output) >> ``` >> ->> This isn't a good test: +>> Note - this isn't a good test: >> * It isn't at all obvious why these numbers are correct. >> * It doesn't test edge cases. >> * If the files change, the test will start failing. >> >> However, it allows us to guarantee we don't accidentally change the analysis output. ->> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/5000b6122e576d91c2acbc437184e00893483fdd > {: .solution} {: .challenge} From 253efda9d99f31ed73f064ad6202cccb6b368440 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 13:40:30 +0100 Subject: [PATCH 017/105] Define regression testing before using it as exercise name --- _episodes/33-refactoring-functions.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 96e230b83..c801561d1 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -53,6 +53,8 @@ for example returning the data instead of visualising it. We will make the asserts verify whatever the outcome is currently, rather than worrying whether that is correct. These tests are to verify the behaviour doesn't *change* rather than to check the current behaviour is correct. +This kind of testing is called **regression testing** as we are testing for +regressions in existing behaviour. As with everything in this episode, there isn't a hard and fast rule. 
Refactoring doesn't change behaviour, but sometimes to make it possible to verify From 81b414fb73a7ad9a15c099d3b2dabcaf874aeda0 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 13:57:47 +0100 Subject: [PATCH 018/105] Add paragraph introducing cognitive load --- _episodes/32-software-design.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 2c20e4b74..7128ac450 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -57,6 +57,23 @@ We will look at: * How to take code that is in a bad shape and improve it. * Best practises to write code in ways that facilitate achieving these goals. +### Cognitive Load + +When we are trying to understand a piece of code, in our heads we are storing +what the different variables mean and what the lines of code will do. +**Cognitive load** is a way of thinking about how much information we have to store in our +heads to understand a piece of code. + +The higher the cognitive load, the harder it is to understand the code. +If it is too high, we might have to create diagrams to help us hold it all in our head +or we might just decide we can't understand it. + +There are lots of ways to keep cognitive load down: + +* Good variable and function names +* Simple control flow +* Having each function do just one thing + ## Abstractions An **abstraction**, at its most basic level, is a technique to hide the details @@ -78,8 +95,12 @@ Instead, you just need to understand how variables work in Python. In large projects it is vital to come up with good abstractions. A good abstraction makes code easier to read, as the reader doesn't need to understand all the details of the project to understand one part. +An abstraction lowers the cognitive load of a bit of code, +as there is less to understand at once. + A good abstraction makes code easier to test, as it can be tested in isolation from everything else. 
+ Finally, a good abstraction makes code easier to adapt, as the details of how a subsystem *used* to work are hidden from the user, so when they change, the user doesn't need to know. From a2c5f2e6ec1cc8019b9eb3b054008d22c1dbffd7 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:11:39 +0100 Subject: [PATCH 019/105] Fix type in introduction to pure functions --- _episodes/33-refactoring-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index c801561d1..3d063881e 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -113,7 +113,7 @@ A **pure function** is a function that works like a mathematical function. That is, it takes in some inputs as parameters, and it produces an output. That output should always be the same for the same input. That is, it does not depend on any information not present in the inputs (such as global variables, databases, the time of day etc.) -Further, it should not cause any **side effects" such as writing to a file or changing a global variable. +Further, it should not cause any **side effects**, such as writing to a file or changing a global variable. You should try and have as much of the complex, analytical and mathematical code in pure functions. 
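To illustrate the difference, here is a toy sketch (these functions are illustrative examples written for this explanation, not code from the inflammation project):

```python
THRESHOLD = 3  # global state

def count_high_readings_impure(data):
    # Impure: the result depends on the global THRESHOLD, so the same
    # input can produce different outputs if the global changes.
    return sum(1 for reading in data if reading > THRESHOLD)

def count_high_readings(data, threshold):
    # Pure: the result depends only on the parameters, and nothing
    # outside the function is read or modified.
    return sum(1 for reading in data if reading > threshold)
```

The pure version can be understood, tested and reused without knowing anything about the rest of the program.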
From 6e698236c731b991083ecb4dfd5c0797e0610 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 14:14:14 +0100
Subject: [PATCH 020/105] Add a bit about cognitive load in advantages of pure
 functions

---
 _episodes/33-refactoring-functions.md | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md
index 3d063881e..4c3a77027 100644
--- a/_episodes/33-refactoring-functions.md
+++ b/_episodes/33-refactoring-functions.md
@@ -117,14 +117,21 @@ Further, it should not cause any **side effects**, such as writing to a file or

 You should try and have as much of the complex, analytical and mathematical code in pure functions.

-Maybe something about cognitive load here? And maybe drop other two advantages til later.
+By eliminating dependency on external things such as global state, we
+reduce the cognitive load required to understand the function.
+The reader only needs to concern themselves with the input
+parameters of the function and the code itself, rather than
+the overall context the function is operating in.
+
+Similarly, a function that *calls* a pure function is also easier
+to understand.
+Since the function won't have any side effects, the reader only needs
+to understand what the function returns, which will probably
+be clear from the context in which the function is called.

 Pure functions have a number of advantages:

-* They are easy to test: you feed in inputs and get fixed outputs
-* They are easy to understand: when you are reading them you have all
-  the information they depend on, you don't need to know what is likely to be in
-  a database, or what the state of a global variable is likely to be.
* They are easy to re-use: because they always behave the same, you can always use them

Some parts of a program are inevitably impure.
From 0006a8610db2a269771de8648fc56a383136fdaf Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 14:16:30 +0100
Subject: [PATCH 021/105] Explain that pure functions are easier to test in one place

---
 _episodes/33-refactoring-functions.md | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md
index 4c3a77027..2239190c2 100644
--- a/_episodes/33-refactoring-functions.md
+++ b/_episodes/33-refactoring-functions.md
@@ -185,10 +184,16 @@
Now we have a pure function for the analysis, we can write tests that cover all
the things we would like tests to cover without depending on the data existing
in CSVs.

-This will make tests easier to write, but it will also make them easier to read.
-The reader will not have to open up a CSV file to understand why the test is correct.
+This is another advantage of pure functions - they are very well suited to automated testing.
+
+They are **easier to write** -
+we construct input and assert the output
+without having to think about making sure the global state is correct before or after.
+
+Perhaps more importantly, they are **easier to read** -
+the reader will not have to open up a CSV file to understand why the test is correct.
+
-It will also make the tests easier to maintain.
+It will also make the tests **easier to maintain**.
If at some point the data format is changed from CSV to JSON,
the bulk of the tests won't need to be updated.
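As a sketch of how little setup such a test needs (`daily_mean` here is an illustrative helper written for this explanation, not necessarily the function in the codebase):

```python
def daily_mean(data):
    """Return the mean across patients for each day (a pure function)."""
    return [sum(day) / len(day) for day in zip(*data)]

def test_daily_mean():
    # No files to read and no state to set up -
    # just inputs and the outputs we expect for them.
    assert daily_mean([[1, 2], [3, 4]]) == [2.0, 3.0]
    assert daily_mean([[0, 0], [0, 0]]) == [0.0, 0.0]

test_daily_mean()
```

The test is also its own documentation: the expected behaviour of the function is visible directly in the assertions, with no CSV file to open alongside it.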
From b0f48e987f055b729820c486715e5c Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 14:29:09 +0100
Subject: [PATCH 022/105] Incorporate the point about reusing pure functions into main text

---
 _episodes/33-refactoring-functions.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md
index 2239190c2..086e70fcb 100644
--- a/_episodes/33-refactoring-functions.md
+++ b/_episodes/33-refactoring-functions.md
@@ -129,9 +129,11 @@
be clear from the context in which the function is called.

-Pure functions have a number of advantages:
-
-* They are easy to re-use: because they always behave the same, you can always use them
+This property also makes them easier to re-use, as the caller
+only needs to understand what parameters to provide, rather
+than worrying about anything else that might need to be configured,
+or about side effects from calling it at a time different
+to the one the original author intended.

Some parts of a program are inevitably impure.
Programs need to read input from the user, or write to a database.
From a653eeb72b48e169a2bff6714f82f84581e58918 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 14:29:26 +0100
Subject: [PATCH 023/105] Highlight that the glue code is the non-pure code

---
 _episodes/33-refactoring-functions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md
index 086e70fcb..cb9cf265c 100644
--- a/_episodes/33-refactoring-functions.md
+++ b/_episodes/33-refactoring-functions.md
@@ -137,7 +137,7 @@ to when the original author intended.
Some parts of a program are inevitably impure.
Programs need to read input from the user, or write to a database.
-Well designed programs separate complex logic from the necessary "glue" code that interacts with users and systems.
+Well designed programs separate complex logic from the necessary impure "glue" code that interacts with users and systems.
This way, you have easy-to-test, easy-to-read code that contains the complex logic.
And you have really simple code that just reads data from a file, or gathers user input etc, that
is maybe harder to test, but is so simple that it only needs a handful of tests anyway.

From cd19c7f324c5278f9684298a2dca4f06f07c19ac Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 14:36:07 +0100
Subject: [PATCH 024/105] Use model view presenter as alternative architecture

Is more common and essentially the same as MVVM.

---
 _episodes/34-refactoring-architecture.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md
index afe3398c1..ea4fde03f 100644
--- a/_episodes/34-refactoring-architecture.md
+++ b/_episodes/34-refactoring-architecture.md
@@ -46,7 +46,7 @@ The key thing to take away from MVC is the distinction between model code and vi
> this distinction may not be possible (the code that specifies there is a button on the screen,
> might be the same code that specifies what that button does). In fact, the original proposer
> of MVC groups the views and the controller into a single element, called the tool. Other modern
-> architectures like Model-ViewModel-View do away with the controller and instead separate out the
+> architectures like Model-View-Presenter do away with the controller and instead separate out the
> layout code from a programmable view of the UI.
{: .callout} From 5358305dbdd762c1204f9353ab62295e55fadc06 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:36:26 +0100 Subject: [PATCH 025/105] Add header to call out about the controller Makes the callout formatting work better --- _episodes/34-refactoring-architecture.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index ea4fde03f..3bda35284 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -41,6 +41,7 @@ are easily isolated from the more complex logic. The key thing to take away from MVC is the distinction between model code and view code. +> ## What about the controller > The view and the controller tend to be more tightly coupled and it isn't always sensible > to draw a thick line dividing these two. Depending on how the user interacts with the software > this distinction may not be possible (the code that specifies there is a button on the screen, From b0d520d2090673f43dba21222ff6f459f3eb5e0f Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 14:43:03 +0100 Subject: [PATCH 026/105] Provide an example for how the model should be agnostic about the view --- _episodes/34-refactoring-architecture.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index 3bda35284..713276f82 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -54,8 +54,9 @@ The key thing to take away from MVC is the distinction between model code and vi The view code might be hard to test, or use libraries to draw the UI, but should not contain any complex logic, and is really just a presentation layer on top of the model. -The model, conversely, should operate quite agonistically of how a specific tool might interact with it. 
-For example, perhaps there currently is no way
+The model, conversely, should not really care how the data is displayed.
+For example, perhaps the UI always presents dates as "Monday 24th July 2023", but the model
+would still store this using a `Date` rather than just that string.
 
 > ## Exercise: Identify model and view parts of the code
 > Looking at the code as it is, what parts should be considered "model" code

From 4f727f0cbfd2cd2b56bdfa96d09c4139570fb159 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 14:43:17 +0100
Subject: [PATCH 027/105] Improve formatting of model/view classification
 exercise

---
 _episodes/34-refactoring-architecture.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md
index 713276f82..0ba0c842d 100644
--- a/_episodes/34-refactoring-architecture.md
+++ b/_episodes/34-refactoring-architecture.md
@@ -62,10 +62,10 @@ would still store this using a `Date` rather than just that string.
 > Looking at the code as it is, what parts should be considered "model" code
 > and what parts should be considered "view" code?
 >> ## Solution
->> * The computation of the standard deviation is model code
->> * Reading the data is also model code.
->> * The display of the output as a graph is the view code.
->> * The controller is the logic that processes what flags the user has provided.
+>> * The computation of the standard deviation is **model** code
+>> * Reading the data from the CSV is also **model** code.
+>> * The display of the output as a graph is the **view** code.
+>> * The logic that processes the supplied flags is the **controller**.
 > {: .solution}
{: .challenge}

From 13b7df26d749efdbf43f3fe4b704024271f8329f Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 14:58:22 +0100
Subject: [PATCH 028/105] Emphasise the connection to the last episode

---
 _episodes/34-refactoring-architecture.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md
index 0ba0c842d..3ed4c120b 100644
--- a/_episodes/34-refactoring-architecture.md
+++ b/_episodes/34-refactoring-architecture.md
@@ -70,8 +70,8 @@ would still store this using a `Date` rather than just that string.
 {: .challenge}
 
 Within the model there is further separation that makes sense.
-For example, as discussed, separating out the code that interacts with file systems from
-the calculations is sensible.
+For example, as we did in the last episode, separating out the impure code that interacts with file systems from
+the pure calculations helps with readability and testability.
 Nevertheless, the MVC approach is a great starting point when thinking about how you should structure your code.
 
 > ## Exercise: Split out the model code from the view code

From a541c889a5caf9025fc8384af081c5093082a6f7 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 14:58:45 +0100
Subject: [PATCH 029/105] Improve clarity of first exercise in the MVC section

---
 _episodes/34-refactoring-architecture.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md
index 3ed4c120b..525f4b0ed 100644
--- a/_episodes/34-refactoring-architecture.md
+++ b/_episodes/34-refactoring-architecture.md
@@ -59,8 +59,12 @@ For example, perhaps the UI always presents dates as "Monday 24th July 2023", bu
 would still store this using a `Date` rather than just that string.
 > ## Exercise: Identify model and view parts of the code
-> Looking at the code as it is, what parts should be considered "model" code
-> and what parts should be considered "view" code?
+> Looking at the code inside `compute_data.py`,
+>
+> * What parts should be considered **model** code?
+> * What parts should be considered **view** code?
+> * What parts should be considered **controller** code?
+>
 >> ## Solution
 >> * The computation of the standard deviation is **model** code
 >> * Reading the data from the CSV is also **model** code.
 >> * The display of the output as a graph is the **view** code.
 >> * The logic that processes the supplied flags is the **controller**.

From e8375d4cbda57fbd830a5e58cda1d1bb49d55a33 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 15:05:24 +0100
Subject: [PATCH 030/105] Improve readability of the second exercise from the
 MVC section

---
 _episodes/34-refactoring-architecture.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md
index 525f4b0ed..cc1392d38 100644
--- a/_episodes/34-refactoring-architecture.md
+++ b/_episodes/34-refactoring-architecture.md
@@ -79,8 +79,9 @@ the pure calculations helps with readability and testability.
 Nevertheless, the MVC approach is a great starting point when thinking about how you should structure your code.
 
 > ## Exercise: Split out the model code from the view code
-> Refactor the code to have the model code separated from
-> the view code.
+> Refactor `analyse_data` such that the *view* code we identified in the last
+> exercise is removed from the function, so the function contains only
+> *model* code, and the *view* code is moved elsewhere.
 >> ## Solution
 >> The idea here is to have `analyse_data` to not have any "view" considerations.
 >> That is, it should just compute and return the data.
@@ -110,7 +111,10 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> views.visualize(graph_data) >> return >> ``` ->> See commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/97fd04b747a6491c2590f34384eed44e83a8e73c +>> You might notice this is more-or-less the change we did to write our +>> regression test. +>> This demonstrates that splitting up model code from view code can +>> immediately make your code much more testable. > {: .solution} {: .challenge} From 17f03957dd0787ee0cb3b4991bb8c104acf88c41 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 15:05:34 +0100 Subject: [PATCH 031/105] Tightening up concluding paragraph --- _episodes/34-refactoring-architecture.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index cc1392d38..2f1e3d473 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -120,8 +120,8 @@ Nevertheless, the MVC approach is a great starting point when thinking about how ## Programming patterns -MVC is a **programming pattern**, which is a template for structuring code. -Patterns are useful starting point for how to design your software. +MVC is a **programming pattern**. Programming patterns are templates for structuring code. +Patterns are a useful starting point for how to design your software. They also work as a common vocabulary for discussing software designs with other developers. 
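The model/view split the patches above work towards can be sketched in a few lines of Python. This is an illustrative example only, not code from the lesson repository; all function names here are hypothetical:

```python
import statistics

def compute_standard_deviation_by_day(data):
    # Model: pure computation, with no file I/O and no presentation concerns.
    # Transpose rows of per-patient readings into per-day columns, then take
    # the population standard deviation of each day.
    by_day = list(zip(*data))
    return [statistics.pstdev(day) for day in by_day]

def render_as_text(label, values):
    # View: presentation only - it knows nothing about how values were computed.
    return f"{label}: " + ", ".join(f"{value:.2f}" for value in values)

def main(data):
    # Controller: wires the model's output into the view.
    deviations = compute_standard_deviation_by_day(data)
    return render_as_text("standard deviation by day", deviations)
```

Because the model function is pure, it can be unit-tested directly with in-memory data, while the view stays trivial enough to need almost no testing.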
From 9c8ec84fb337239af5ccde9a29301b1e387cec38 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 15:53:01 +0100
Subject: [PATCH 032/105] Fix semantic break in section about constructors

---
 _episodes/35-refactoring-decoupled-units.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md
index 3aae044d4..2cf2710f1 100644
--- a/_episodes/35-refactoring-decoupled-units.md
+++ b/_episodes/35-refactoring-decoupled-units.md
@@ -84,8 +84,8 @@ You can then **construct** a class elsewhere in your code by doing the following
 my_class = MyClass()
 ```
 
-When you construct a class in this ways, its **construtor** is called. It is possible
-to pass in values to the constructor that configure the class:
+When you construct a class this way, the class's **constructor** is called.
+It is possible to pass in values to the constructor that configure the class:
 
 ```python
 class Circle:

From 19f07b341385d3842c7769c0e18067a6672800fe Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 15:53:18 +0100
Subject: [PATCH 033/105] Correct code sample to use right capitalisation for
 math.pi

---
 _episodes/35-refactoring-decoupled-units.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md
index 2cf2710f1..6a9c96b7d 100644
--- a/_episodes/35-refactoring-decoupled-units.md
+++ b/_episodes/35-refactoring-decoupled-units.md
@@ -107,10 +107,12 @@ Classes can also have methods defined on them. Like constructors, they have a
 special `self` parameter that must come first.
 
 ```python
+import math
+
 class Circle:
     ...
     def get_area(self):
-        return Math.PI * self.radius * self.radius
+        return math.pi * self.radius * self.radius
     ...
 print(my_circle.get_area())
 ```

From 7f6023087be77810c84edd4881fa2c69c5fba623 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 15:53:33 +0100
Subject: [PATCH 034/105] Add examples for invariants and encapsulation

---
 _episodes/35-refactoring-decoupled-units.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md
index 6a9c96b7d..846279c92 100644
--- a/_episodes/35-refactoring-decoupled-units.md
+++ b/_episodes/35-refactoring-decoupled-units.md
@@ -124,8 +124,9 @@ Then the method can access the **member variable** `radius`.
 Classes have a number of uses.
 
 * Encapsulating data - such as grouping three numbers together into a Vector class
-* Maintaining invariants - TODO an example here would be good
-* Encapsulating behaviour - such as a class that csha
+* Maintaining invariants - perhaps when storing a file path it only makes sense for that to resolve to a valid file - by storing the string in a class with a method for setting it (a **setter**), that method can validate the new value before updating the value.
+* Encapsulating behaviour - such as a class representing a UI state, where modifying some value will automatically
+  force the relevant portion of the UI to be updated.
 
 > ## Exercise: Use a class to configure loading
 > Put your function as a member method of a class, separating out the configuration

From 66a904c1c1dfb03db68428257f3a60756409b247 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 15:54:01 +0100
Subject: [PATCH 035/105] Add a callout about why maintaining invariants is
 good

This was too much text to include in the bullet point about using classes
to maintain invariants, but might be useful context.
--- _episodes/35-refactoring-decoupled-units.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index 846279c92..cc956fa4c 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -128,6 +128,15 @@ Classes have a number of uses. * Encapsulating behaviour - such as a class representing a UI state, modifying some value will automatically force the relevant portion of the UI to be updated. +> ## Maintaining Invariants +> Maintaining invariants can be a really powerful tool in debugging. +> Without invariants, you can find bugs where some data is in an invalid +> state, but the problem only appears when you try to use the data. +> This makes it hard to track down the cause of the bug. +> By using classes to maintain invariants, you can force the issue +> to appear when the invalid data is set, that is, the source of the bug. +{: .callout} + > ## Exercise: Use a class to configure loading > Put your function as a member method of a class, separating out the configuration > of where to load the files from in the constructor, from where it actually loads the data. From 0f68b3ea3b444b73dc4430d0597d698d6dcc47e8 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 16:14:56 +0100 Subject: [PATCH 036/105] Improve the class loading exercise content --- _episodes/35-refactoring-decoupled-units.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index cc956fa4c..bdbe87742 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -138,11 +138,14 @@ Classes have a number of uses. 
 {: .callout}
 
 > ## Exercise: Use a class to configure loading
-> Put your function as a member method of a class, separating out the configuration
-> of where to load the files from in the constructor, from where it actually loads the data.
+> Put the `load_inflammation_data` function we wrote in the last exercise as a member method
+> of a new class called `CSVDataSource`.
+> Put the configuration of where to load the files in the class's constructor.
 > Once this is done, you can construct this class outside the statistical analysis
-> and pass it in.
+> and pass the instance in to `analyse_data`.
 >> ## Solution
+>> You should have created a class that looks something like this:
+>>
 >> ```python
 >> class CSVDataSource:
 >>     """
@@ -150,7 +153,6 @@ Classes have a number of uses.
 >>     """
 >>     def __init__(self, dir_path):
 >>         self.dir_path = dir_path
->>         super().__init__()
 >>
 >>     def load_inflammation_data(self):
 >>         data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.csv'))
@@ -159,8 +161,7 @@ Classes have a number of uses.
 >>         data = map(models.load_csv, data_file_paths)
 >>         return list(data)
 >> ```
->> We can now pass an instance of this class into the the statistical analysis function,
->> constructing the object in the controller code.
+>> We can now pass an instance of this class into the statistical analysis function.
 >> This means that should we want to re-use the analysis it wouldn't be fixed to reading
 >> from a directory of CSVs.
 >> We have "decoupled" the reading of the data from the statistical analysis.
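The decoupling the exercise above achieves can be demonstrated with a self-contained sketch. The class and function names below follow the lesson's naming, but the bodies are illustrative stand-ins (a list-backed data source instead of real CSV reading, and a trivial placeholder analysis):

```python
class InMemoryDataSource:
    """Stands in for CSVDataSource: same interface, no file system access."""
    def __init__(self, datasets):
        self.datasets = datasets

    def load_inflammation_data(self):
        return list(self.datasets)

def analyse_data(data_source):
    # The analysis depends only on the load_inflammation_data interface,
    # not on where the data actually comes from.
    data = data_source.load_inflammation_data()
    # Placeholder "analysis": count the measurements across all datasets.
    return sum(len(dataset) for dataset in data)
```

Because `analyse_data` only ever calls `load_inflammation_data`, any object providing that method can be passed in, which is what makes the mock-based testing later in this episode possible.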
From 0c8817a91dff6b501f7bbbfb2cb06c931447d5a3 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 16:39:05 +0100
Subject: [PATCH 037/105] Add the controller modifications to the solution

---
 _episodes/35-refactoring-decoupled-units.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md
index bdbe87742..76c93ee66 100644
--- a/_episodes/35-refactoring-decoupled-units.md
+++ b/_episodes/35-refactoring-decoupled-units.md
@@ -308,6 +308,7 @@ That is, we have decoupled the job of loading the data from the job of analysing
 > Finally, at run time provide an instance of the new implementation if the user hasn't
 > put any files on the path.
 >> ## Solution
+>> You should have created a class that looks something like:
 >> ```python
 >> class UserProvidSpecificFilesDataSource(InflammationDataSource):
 >>     def load_inflammation_data(self):
@@ -325,6 +326,17 @@ That is, we have decoupled the job of loading the data from the job of analysing
 >>         data = map(models.load_csv, paths)
 >>         return list(data)
 >> ```
+>> Additionally, the controller will need to select the appropriate DataSource to
+>> provide to the analysis:
+>>```python
+>> if len(InFiles) == 0:
+>>     data_source = UserProvidSpecificFilesDataSource()
+>> else:
+>>     data_source = CSVDataSource(os.path.dirname(InFiles[0]))
+>> data_result = analyse_data(data_source)
+>>```
+>> As you have seen, all these changes were made without modifying
+>> the analysis code itself.
> {: .solution} {: .challenge} From 79244621a6abd0a8be3f01c6506c6e125a0107c9 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 16:57:49 +0100 Subject: [PATCH 038/105] Fix spelling type in exercise title --- _episodes/35-refactoring-decoupled-units.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index 76c93ee66..b059c9910 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -360,7 +360,7 @@ Then we specify a method that we want to behave a specific way. Now whenever you call `mock_version.method_to_mock()` the return value will be `42`. -> ## Exercise: Test using a mock or dummy implemenation +> ## Exercise: Test using a mock or dummy implementation > Create a mock for the `InflammationDataSource` that returns some fixed data to test > the `analyse_data` method. > Use this mock in a test. From 13bab516341dcd52a75e9056ead6c33169b9881f Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 16:58:11 +0100 Subject: [PATCH 039/105] Small fixes to flow of text in oop section --- _episodes/35-refactoring-decoupled-units.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index b059c9910..4e75ef20a 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -383,9 +383,9 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be ` Using classes, particularly when using polymorphism, are techniques that come from **object oriented programming** (frequently abbreviated to OOP). As with functional programming different programming languages will provide features to enable you -to write object oriented programming. +to write object oriented code. 
 For example, in Python you can create classes, and use polymorphism to call the
-correct method on an instance (e.g when we called `get_area` on a shape, the appropriate `get_area` was called.)
+correct method on an instance (e.g. when we called `get_area` on a shape, the appropriate `get_area` was called).
 
 Object oriented programming also includes **information hiding**.
 In this, certain fields might be marked private to a class,
 preventing them from being modified at will.
 This can be used to maintain invariants of a class
 (such as insisting that a circle's radius is always non-negative).
 
-There is also inheritance, which allows classes to specialise the behaviour of other classes by **inheriting** from
+There is also inheritance, which allows classes to specialise
+the behaviour of other classes by **inheriting** from
 another class and **overriding** certain methods.
 
-As with functional programming, there are times when object oriented programming is well suited, and times where it is not.
+As with functional programming, there are times when
+object oriented programming is well suited, and times when it is not.
 
 Good uses:

From 3f3ecd38088f5084c900d7c369ec3080bb0fd361 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 16:58:32 +0100
Subject: [PATCH 040/105] Make the using classes in functional programming a
 callout

---
 _episodes/35-refactoring-decoupled-units.md | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md
index 4e75ef20a..f4dc532b6 100644
--- a/_episodes/35-refactoring-decoupled-units.md
+++ b/_episodes/35-refactoring-decoupled-units.md
@@ -410,14 +410,17 @@ One downside of OOP is ending up with very large classes that contain complex me
 As they are methods on the class, it can be hard to know up front what side effects
 it causes to the class. This can make maintenance hard.
-Grouping data together into logical structures (such as three numbers into a vector) is a vital step in writing -readable and maintainable code. -However, when using classes in this way it is best for them to be immutable (can't be changed) -It is worth noting that you can use classes to group data together - a very useful feature that you should be using everywhere - - does not you can't be practising functional programming: - -You can still have classes, and these classes might have read-only methods on (such as the `get_area` we defined for shapes) -but then still have your complex logic operate on +> ## Classes and functional programming +> Using classes is compatible with functional programming. +> In fact, grouping data into logical structures (such as three numbers into a vector) +> is a vital step in writing readable and maintainable code with any approach. +> However, when writing in a functional style, classes should be immutable. +> That is, the methods they provide are read-only. +> If you require the class to be different, you'd create a new instance +> with the new values. +> (that is, the functions should not modify the state of the class). +{: .callout} + Don't use features for the sake of using features. Code should be as simple as it can be, but not any simpler. 
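The immutable-class style described in the callout above can be sketched concretely. This example is illustrative rather than lesson code: it reuses the episode's `Circle`/`get_area` naming but adds a hypothetical `scaled` method to show "create a new instance instead of mutating":

```python
import math
from dataclasses import dataclass, replace

@dataclass(frozen=True)  # frozen=True makes instances read-only after construction
class Circle:
    radius: float

    def get_area(self):
        # Read-only method: computes a value without changing the instance.
        return math.pi * self.radius ** 2

    def scaled(self, factor):
        # Rather than mutating self.radius, return a brand-new Circle.
        return replace(self, radius=self.radius * factor)
```

Calling `scaled` leaves the original object untouched, so code holding a reference to it can never be surprised by a change made elsewhere.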
From bd41da7b9aadcdcb5fd93276abb3326d62090803 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 17:16:32 +0100 Subject: [PATCH 041/105] Fixed incorrect usage of episode --- _episodes/36-yagni.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/36-yagni.md b/_episodes/36-yagni.md index fec77e860..733492667 100644 --- a/_episodes/36-yagni.md +++ b/_episodes/36-yagni.md @@ -16,7 +16,7 @@ keypoints: ## Introduction -In this episode we have explored a range of techniques for architecting code: +In this section we have explored a range of techniques for architecting code: * Using pure functions assembled into pipelines to perform analysis * Using established patterns to discuss design From 2bea011c53a203034aa02b102e9eed5be889cc29 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Wed, 18 Oct 2023 17:17:01 +0100 Subject: [PATCH 042/105] Add diagram for solution to the architecture exercise --- _episodes/36-yagni.md | 23 ++------------------ fig/example-architecture-daigram.mermaid.txt | 18 +++++++++++++++ fig/example-architecture-diagram.svg | 1 + 3 files changed, 21 insertions(+), 21 deletions(-) create mode 100644 fig/example-architecture-daigram.mermaid.txt create mode 100644 fig/example-architecture-diagram.svg diff --git a/_episodes/36-yagni.md b/_episodes/36-yagni.md index 733492667..caecc596f 100644 --- a/_episodes/36-yagni.md +++ b/_episodes/36-yagni.md @@ -53,28 +53,9 @@ the kinds of information different parts of the system will need. > the software automatically pulls it down and updates the analysis. > The new result should be added to a database with a timestamp. > An email should then be sent to a group email notifying them of the change."* -> -> TODO: this doesn't generate a very interesting diagram -> >> ## Solution ->> An example design for the hypothetical problem. 
(TODO: incomplete) ->> ```mermaid -graph TD - A[(GDrive Folder)] - B[(Database)] - C[GDrive Monitor] - C -- Checks periodically--> A - D[Download inflammation data] - C -- Trigger update --> D - E[Parse inflammation data] - D --> E - F[Perform analysis] - E --> F - G[Upload analysis] - F --> G - G --> B - H[Notify users] ->> ``` +>> +>> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg) > {: .solution} {: .challenge} diff --git a/fig/example-architecture-daigram.mermaid.txt b/fig/example-architecture-daigram.mermaid.txt new file mode 100644 index 000000000..c3ab99112 --- /dev/null +++ b/fig/example-architecture-daigram.mermaid.txt @@ -0,0 +1,18 @@ +graph TD + A[(GDrive Folder)] + B[(Database)] + C[GDrive Monitor] + C -- Checks periodically--> A + D[Download inflammation data] + C -- Trigger update --> D + E[Parse inflammation data] + D --> E + F[Perform analysis] + E --> F + G[Upload analysis] + F --> G + G --> B + H[Notify users] + I[Monitor database] + I -- Check periodically --> B + I --> H diff --git a/fig/example-architecture-diagram.svg b/fig/example-architecture-diagram.svg new file mode 100644 index 000000000..02a7ecceb --- /dev/null +++ b/fig/example-architecture-diagram.svg @@ -0,0 +1 @@ +
+[single-line SVG markup lost in extraction: a rendered version of the Mermaid diagram above, with nodes "GDrive Folder", "Database", "GDrive Monitor", "Download inflammation data", "Parse inflammation data", "Perform analysis", "Upload analysis", "Notify users" and "Monitor database", and edge labels "Checks periodically", "Trigger update" and "Check periodically"]
\ No newline at end of file

From 10aa3fd8825a50d2c606e7a997bd8e782152f541 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 17:25:09 +0100
Subject: [PATCH 043/105] Remove redundant see this commit text

The solution now contains the code
---
 _episodes/33-refactoring-functions.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md
index cb9cf265c..8d4cd01d7 100644
--- a/_episodes/33-refactoring-functions.md
+++ b/_episodes/33-refactoring-functions.md
@@ -178,7 +178,6 @@ that is maybe harder to test, but is so simple that it only needs a handful of t
 >> # views.visualize(graph_data)
 >> return daily_standard_deviation
 >>```
->> * See this commit: https://github.com/thomaskileyukaea/python-intermediate-inflammation/commit/4899b35aed854bdd67ef61cba6e50b3eeada0334
 > {: .solution}
 {: .challenge}

From 33620670f144d4b11dfb8b0fe678777424d8d721 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Wed, 18 Oct 2023 17:27:15 +0100
Subject: [PATCH 044/105] Correct broken links in extras

That said, the persistence one might depend on the code written in the
original version of section 3. Need to decide what to do about that
---
 _extras/databases.md | 2 +-
 _extras/persistence.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/_extras/databases.md b/_extras/databases.md
index b4bc67a65..2be010dbe 100644
--- a/_extras/databases.md
+++ b/_extras/databases.md
@@ -16,7 +16,7 @@ keypoints:
 > ## Follow up from Section 3
 > This episode could be read as a follow up from the end of
-> [Section 3 on software design and development](../36-architecture-revisited/index.html#additional-material).
+> [Section 3 on software design and development](../36-yagni/index.html).
{: .callout} A **database** is an organised collection of data, diff --git a/_extras/persistence.md b/_extras/persistence.md index ab0379062..6fa8fe449 100644 --- a/_extras/persistence.md +++ b/_extras/persistence.md @@ -25,7 +25,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../36-architecture-revisited/index.html#additional-material). +> [Section 3 on software design and development](../36-yagni/index.html). {: .callout} Our patient data system so far can read in some data, process it, and display it to people. From f766d7024a809afc8058eedeb5004add894dbc6e Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 13:44:58 +0100 Subject: [PATCH 045/105] Add initial timings for episodes --- _episodes/32-software-design.md | 4 ++-- _episodes/33-refactoring-functions.md | 4 ++-- _episodes/34-refactoring-architecture.md | 4 ++-- _episodes/35-refactoring-decoupled-units.md | 4 ++-- _episodes/36-yagni.md | 4 ++-- 5 files changed, 10 insertions(+), 10 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 7128ac450..3b6338758 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -1,7 +1,7 @@ --- title: "Software Architecture and Design" -teaching: 0 -exercises: 0 +teaching: 25 +exercises: 20 questions: - "What should we consider when designing software?" - "What goals should we have when structuring our code?" diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 8d4cd01d7..6e9f317e7 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -1,7 +1,7 @@ --- title: "Refactoring functions to do just one thing" -teaching: 0 -exercises: 0 +teaching: 30 +exercises: 20 questions: - "How do you refactor code without breaking it?" - "How do you write code that is easy to test?" 
diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/34-refactoring-architecture.md index 2f1e3d473..a57c8541f 100644 --- a/_episodes/34-refactoring-architecture.md +++ b/_episodes/34-refactoring-architecture.md @@ -1,7 +1,7 @@ --- title: "Architecting code to separate responsibilities" -teaching: 0 -exercises: 0 +teaching: 4 +exercises: 25 questions: - "What is the point of the MVC architecture" - "How should code be structured" diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index f4dc532b6..d0bcca438 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -1,7 +1,7 @@ --- title: "Using classes to de-couple code." -teaching: 0 -exercises: 0 +teaching: 35 +exercises: 55 questions: - "What is de-coupled code?" - "When is it useful to use classes to structure code?" diff --git a/_episodes/36-yagni.md b/_episodes/36-yagni.md index caecc596f..9dff05f9a 100644 --- a/_episodes/36-yagni.md +++ b/_episodes/36-yagni.md @@ -1,7 +1,7 @@ --- title: "When to abstract, and when not to." -teaching: 0 -exercises: 0 +teaching: 10 +exercises: 25 questions: - "How to tell what is and isn't an appropriate abstraction." - "How to design larger solutions." From 4343b1d952d8b45b83f7812c5ba6f79bd8e76f39 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 13:50:40 +0100 Subject: [PATCH 046/105] Ensure model test is agnostic as to where it is run from --- _episodes/33-refactoring-functions.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 6e9f317e7..7da2d9e30 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -83,10 +83,11 @@ the tests at all. 
>> >> ```python >> import numpy.testing as npt +>> from pathlib import Path >> >> def test_compute_data(): >> from inflammation.compute_data import analyse_data ->> path = 'data/' +>> path = Path.cwd() / "../data" >> result = analyse_data(path) >> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211 >> ,0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094 From 515ed184fbc1a0aa26ba0825f1cb645c409970c1 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 13:51:27 +0100 Subject: [PATCH 047/105] Correct example function name --- _episodes/33-refactoring-functions.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 7da2d9e30..65429822e 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -152,7 +152,7 @@ that is maybe harder to test, but is so simple that it only needs a handful of t >> You can move all of the code that does the analysis into a separate function that >> might look something like this: >> ```python ->> def compute_standard_deviation_by_data(all_loaded_data): +>> def compute_standard_deviation_by_day(all_loaded_data): >> means_by_day = map(models.daily_mean, all_loaded_data) >> means_by_day_matrix = np.stack(list(means_by_day)) >> @@ -171,7 +171,7 @@ that is maybe harder to test, but is so simple that it only needs a handful of t >> if len(data_file_paths) == 0: >> raise ValueError(f"No inflammation csv's found in path {data_dir}") >> data = map(models.load_csv, data_file_paths) ->> daily_standard_deviation = compute_standard_deviation_by_data(data) +>> daily_standard_deviation = compute_standard_deviation_by_day(data) >> >> graph_data = { >> 'standard deviation by day': daily_standard_deviation, @@ -213,7 +213,7 @@ won't need to be updated. 
>> ([[[0, 1, 0], [0, 2, 0]], [[0, 1, 0], [0, 2, 0]]], [0, 0, 0])
>>],
>>ids=['Two patients in same file', 'Two patients in different files', 'Two identical patients in two different files'])
->>def test_compute_standard_deviation_by_data(data, expected_output):
->> from inflammation.compute_data import compute_standard_deviation_by_data
+>>def test_compute_standard_deviation_by_day(data, expected_output):
+>> from inflammation.compute_data import compute_standard_deviation_by_day
 >>
->> result = compute_standard_deviation_by_data(data)
+>> result = compute_standard_deviation_by_day(data)

From 51cf100618a01b50f293ef654e5ff67022e083fe Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Mon, 23 Oct 2023 13:51:59 +0100
Subject: [PATCH 048/105] Fixing spelling mistakes in refactoring functions
 exercise

---
 _episodes/33-refactoring-functions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md
index 65429822e..069190b3b 100644
--- a/_episodes/33-refactoring-functions.md
+++ b/_episodes/33-refactoring-functions.md
@@ -204,7 +204,7 @@ won't need to be updated.
 > Add tests that check for when there is only one file with multiple rows, multiple files with one row
 > and any other cases you can think of that should be tested.
>> ## Solution ->> You might hev throught of more tests, but we can easily extend the test by parameterizing +>> You might have thought of more tests, but we can easily extend the test by parametrizing >> with more inputs and expected outputs: >> ```python >>@pytest.mark.parametrize('data,expected_output', [ From a79d3940ed94acf22550624ce60433635a680058 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 13:55:44 +0100 Subject: [PATCH 049/105] Make sure there is an import for the Mock class --- _episodes/35-refactoring-decoupled-units.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index d0bcca438..098405e98 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -350,6 +350,8 @@ An convenient way to do this in Python is using Mocks. These are a whole topic to themselves - but a basic mock can be constructed using a couple of lines of code: ```python +from unittest.mock import Mock + mock_version = Mock() mock_version.method_to_mock.return_value = 42 ``` From c4e11463e4924938d9aabe398cfb5370e2f06e25 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:16:30 +0100 Subject: [PATCH 050/105] Cover the changes needed to the regression test with the class refactor --- _episodes/35-refactoring-decoupled-units.md | 24 ++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md index 098405e98..358f8c0e5 100644 --- a/_episodes/35-refactoring-decoupled-units.md +++ b/_episodes/35-refactoring-decoupled-units.md @@ -177,9 +177,27 @@ Classes have a number of uses. 
>> data_source = CSVDataSource(os.path.dirname(InFiles[0]))
>> data_result = analyse_data(data_source)
>> ```
->> Note in all these refactorings the behaviour is unchanged,
->> so we can still run our original tests to ensure we've not
->> broken anything.
+>> While the behaviour is unchanged, how we call `analyse_data` has changed.
+>> We must update our regression test to match this, to ensure we haven't broken the code:
+>> ```python
+>> ...
+>> def test_compute_data():
+>>     from inflammation.compute_data import analyse_data
+>>     path = Path.cwd() / "../data"
+>>     data_source = CSVDataSource(path)
+>>     result = analyse_data(data_source)
+>>     expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211
+>> ...
+>> ```
+>> If this were a more complex refactoring, we could introduce an indirection to keep
+>> the interface the same:
+>> ```python
+>> def analyse_data(dir_path):
+>>     data_source = CSVDataSource(dir_path)
+>>     return analyse_data_from_source(data_source)
+>> ```
+>> This can be a really useful intermediate step if `analyse_data` is called
+>> from lots of different places.
> {: .solution}
{: .challenge}

From a77b9d62c197ca0e22910b7cdefc4dad59ac Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Mon, 23 Oct 2023 14:18:10 +0100
Subject: [PATCH 051/105] Link decoupling to abstractions

---
 _episodes/35-refactoring-decoupled-units.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/35-refactoring-decoupled-units.md
index 358f8c0e5..b26a1ef7f 100644
--- a/_episodes/35-refactoring-decoupled-units.md
+++ b/_episodes/35-refactoring-decoupled-units.md
@@ -33,6 +33,10 @@ allows for more maintainable code:
* Loose coupled code tends to be easier to maintain, as changes can be
isolated from other parts of the code.

+Introducing **abstractions** is a way to decouple code.
+If one part of the code only uses another part through an appropriate abstraction +then it becomes easier for these parts to change independently. + > ## Exercise: Decouple the file loading from the computation > Currently the function is hard coded to load all the files in a directory > Decouple this into a separate function that returns all the files to load From dfdb1a95248aac76a7a7db0fc319c600912c5471 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:21:06 +0100 Subject: [PATCH 052/105] Make consistent use of first/second person --- _episodes/32-software-design.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 3b6338758..f1ba7bfd9 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -155,7 +155,7 @@ We are going to be refactoring and extending this over the remainder of this epi >> >> * Everything is in a single function - reading it you have to understand how the file loading works at the same time as the analysis itself. 
->> * If I want to use the data without using the graph I'd have to change it +>> * If you want to use the data without using the graph you'd have to change it >> * It is always analysing a fixed set of data >> * It seems hard to write tests for it as it always analyses a fixed set of files >> * It doesn't have any tests From aebc099ab00fcbd0aee84eeed2a980099aa8a1d6 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:24:58 +0100 Subject: [PATCH 053/105] Ensure each problem links to a specific part of maintainable code --- _episodes/32-software-design.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index f1ba7bfd9..bdc571c83 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -153,12 +153,11 @@ We are going to be refactoring and extending this over the remainder of this epi >> You may have found others, but here are some of the things that make the code >> hard to read, test and maintain: >> ->> * Everything is in a single function - reading it you have to understand how the file loading +>> * **Hard to read:** Everything is in a single function - reading it you have to understand how the file loading works at the same time as the analysis itself. 
->> * If you want to use the data without using the graph you'd have to change it ->> * It is always analysing a fixed set of data ->> * It seems hard to write tests for it as it always analyses a fixed set of files ->> * It doesn't have any tests +>> * **Hard to modify:** If you want to use the data without using the graph you'd have to change it +>> * **Hard to modify or test:** It is always analysing a fixed set of data stored on the disk +>> * **Hard to modify:** It doesn't have any tests meaning changes might break something >> >> Keep the list you created - at the end of this section we will revisit this >> and check that we have learnt ways to address the problems we found. From a6e502afd98497c3918bbb3826b1157cd1b58b8a Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:25:32 +0100 Subject: [PATCH 054/105] Improve grammar of exercise solution --- _episodes/32-software-design.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index bdc571c83..700813f5c 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -159,7 +159,8 @@ works at the same time as the analysis itself. >> * **Hard to modify or test:** It is always analysing a fixed set of data stored on the disk >> * **Hard to modify:** It doesn't have any tests meaning changes might break something >> ->> Keep the list you created - at the end of this section we will revisit this +>> Keep the list you have created. +>> At the end of this section we will revisit this >> and check that we have learnt ways to address the problems we found. 
> {: .solution} {: .challenge} From 35fd32d3bcc07d935b008e0ee0a3044cb01d4ee8 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:41:51 +0100 Subject: [PATCH 055/105] Move MVC stuff after the classes section --- ...oring-decoupled-units.md => 34-refactoring-decoupled-units.md} | 0 ...refactoring-architecture.md => 35-refactoring-architecture.md} | 0 2 files changed, 0 insertions(+), 0 deletions(-) rename _episodes/{35-refactoring-decoupled-units.md => 34-refactoring-decoupled-units.md} (100%) rename _episodes/{34-refactoring-architecture.md => 35-refactoring-architecture.md} (100%) diff --git a/_episodes/35-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md similarity index 100% rename from _episodes/35-refactoring-decoupled-units.md rename to _episodes/34-refactoring-decoupled-units.md diff --git a/_episodes/34-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md similarity index 100% rename from _episodes/34-refactoring-architecture.md rename to _episodes/35-refactoring-architecture.md From d1c3491ad68cda9d60375784cab7d6295ae5b144 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 14:53:57 +0100 Subject: [PATCH 056/105] Combine YAGNI section into the MVC section Is really all about high level architecture --- _episodes/35-refactoring-architecture.md | 117 +++++++++++++++++++-- _episodes/36-yagni.md | 128 ----------------------- 2 files changed, 111 insertions(+), 134 deletions(-) delete mode 100644 _episodes/36-yagni.md diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md index a57c8541f..6c72b6fd9 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-refactoring-architecture.md @@ -1,17 +1,19 @@ --- title: "Architecting code to separate responsibilities" -teaching: 4 -exercises: 25 +teaching: 15 +exercises: 50 questions: - "What is the point of the MVC architecture" -- "How should code be structured" +- "How to design larger 
solutions."
+- "How to tell what is and isn't an appropriate abstraction."
objectives:
- "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software."
-- "Understand the MVC pattern and how to apply it."
-- "Understand the benefits of using patterns"
+- "How to design large changes to the codebase."
+- "Understand how to determine correct abstractions."
keypoints:
- "By splitting up the \"view\" code from \"model\" code, you allow easier re-use of code."
-- "Using coding patterns can be useful inspirations for how to structure your code."
+- "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful."
+- "Sketching a diagram of the code can clarify how it is supposed to work, and troubleshoot problems early."
---


@@ -139,4 +141,107 @@ However, they cannot replace
a full design as most problems will require a bespoke design that maps cleanly
on to the specific problem you are trying to solve.
+
+## Architecting larger changes
+
+When creating a new application, or creating a substantial change to an existing one,
+it can be really helpful to sketch out the intended architecture on a whiteboard
+(pen and paper works too, though of course it might get messy as you iterate on the design!).
+
+The basic idea is you draw boxes that will represent different units of code, as well as
+other components of the system (such as users, databases etc).
+Then connect these boxes with lines where information or control will be exchanged.
+These lines represent the interfaces in your system.
+
+As well as helping to visualise the work, doing this sketch can troubleshoot potential issues.
+For example, it can reveal a circular dependency between two sections of the design.
+It can also help with estimating how long the work will take, as it forces you to consider all the components that
+need to be made.
+
+Diagrams aren't foolproof, and often the stuff we haven't considered won't make it onto the diagram,
+but they are a great starting point to break down the different responsibilities and think about
+the kinds of information different parts of the system will need.
+
+
+> ## Exercise: Design a high-level architecture
+> Sketch out a design for a new feature requested by a user
+>
+> *"I want there to be a Google Drive folder that when I upload new inflammation data to
+> the software automatically pulls it down and updates the analysis.
+> The new result should be added to a database with a timestamp.
+> An email should then be sent to a group email notifying them of the change."*
+>> ## Solution
+>>
+>> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg)
+> {: .solution}
+{: .challenge}
+
+## An abstraction too far
+
+So far we have seen how abstractions are good for making code easier to read, maintain and test.
+However, it is possible to introduce too many abstractions.
+
+> All problems in computer science can be solved by another level of indirection except the problem of too many levels of indirection
+
+When you introduce an abstraction, if the reader of the code needs to understand what is happening inside the abstraction,
+it has actually made the code *harder* to read.
+When code is just in the function, it can be clear to see what it is doing.
+When the code is calling out to an instance of a class that, thanks to polymorphism, could be a range of possible implementations,
+the only way to find out what is *actually* being called is to run the code and see.
+This is much slower to understand, and actually obfuscates meaning.
+
+It is a judgement as to whether you have made the code too abstract.
+If you have to jump around a lot when reading the code, that is a clue that it is too abstract.
+Similarly, if there are two parts of the code that always need updating together, that is
+again an indication of an incorrect or over-zealous abstraction.
+
+
+## You Ain't Gonna Need It
+
+There are different approaches to designing software.
+One principle that is popular is called You Ain't Gonna Need It - "YAGNI" for short.
+The idea is that, since it is hard to predict the future needs of a piece of software,
+it is always best to design the simplest solution that solves the problem at hand.
+This is opposed to trying to imagine how you might want to adapt the software in future
+and designing the code with that in mind.
+
+Then, since you know the problem you are trying to solve, you can avoid making your solution unnecessarily complex or abstracted.
+
+In our example, it might be tempting to abstract how the `CSVDataSource` walks the file tree into a class.
+However, since we only have one strategy for exploring the file tree, this would just create indirection for the sake of it
+- now a reader of `CSVDataSource` would have to read a different class to find out how the tree is walked.
+Maybe in the future this is something that needs to be customised, but we haven't really made it any harder to do by *not* doing this prematurely,
+and once we have the concrete feature request, it will be easier to design it appropriately.
+
+> All of this is a judgement.
+> For example, in this case, perhaps it *would* make sense to at least pull the file parsing out into a separate
+> class, but not have the `CSVDataSource` be configurable.
+> That way, it is clear to see how the file tree is being walked (there's no polymorphism going on)
+> without mixing the *parsing* code in with the file finding code.
+> There are no right answers, just guidelines.
{: .callout}
+
+> ## Exercise: Applying to real world examples
+> Think about the examples of good and bad code you identified at the start of the episode.
+> Identify which principles were and weren't being followed
+> Identify some refactorings that could be performed to improve the code
+> Discuss the ideas as a group.
{: .challenge}
+
+## Conclusion
+
+Good architecture is not about applying any rules blindly, but about practice and taking care over the important things:
+
+* Avoid duplication of code or data.
+* Keep how much a person has to understand at once to a minimum.
+* Think about how interfaces will work.
+* Separate different considerations into different sections of the code.
+* Don't try to design a future-proof solution, focus on the problem at hand.
+
+Practice makes perfect.
+One way to practise is to consider code that you already have and think how it might be redesigned.
+Another way is to always try to leave code in a better state than you found it.
+So when you're working on a less well-structured part of the code, start by refactoring it so that your change fits in cleanly.
+Doing this, over time, with your colleagues, will improve your skills in software architecture as well as improving the code.
+
+
 {% include links.md %}

diff --git a/_episodes/36-yagni.md b/_episodes/36-yagni.md
deleted file mode 100644
index 9dff05f9a..000000000
--- a/_episodes/36-yagni.md
+++ /dev/null
@@ -1,128 +0,0 @@
----
-title: "When to abstract, and when not to."
-teaching: 10
-exercises: 25
-questions:
-- "How to tell what is and isn't an appropriate abstraction."
-- "How to design larger solutions."
-objectives:
-- "Understand how to determine correct abstractions. "
-- "How to design large changes to the codebase."
-keypoints:
-- "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful."
-- "The best code is simple to understand and test, not the most clever or uses advanced language features."
-- "Sketching a diagram of the code can clarify how it is supposed to work, and troubleshoot problems early."
---- - -## Introduction - -In this section we have explored a range of techniques for architecting code: - - * Using pure functions assembled into pipelines to perform analysis - * Using established patterns to discuss design - * Separating different considerations, such as how data is presented from how it is stored - * Using classes to create abstractions - -None of these techniques are always applicable, and they are not sufficient to design a good technical solution. - -## Architecting larger changes - -When creating a new application, or creating a substantial change to an existing one, -it can be really helpful to sketch out the intended architecture on a whiteboard -(pen and paper works too, though of course it might get messy as you iterate on the design!). - -The basic idea is you draw boxes that will represent different units of code, as well as -other components of the system (such as users, databases etc). -Then connect these boxes with lines where information or control will be exchanged. -These lines represent the interfaces in your system. - -As well as helping to visualise the work, doing this sketch can troubleshoot potential issues. -For example, if there is a circular dependency between two sections of the design. -It can also help with estimating how long the work will take, as it forces you to consider all the components that -need to be made. - -Diagrams aren't foolproof, and often the stuff we haven't considered won't make it on to the diagram -but they are a great starting point to break down the different responsibilities and think about -the kinds of information different parts of the system will need. - - -> ## Exercise: Design a high-level architecture -> Sketch out a design for a new feature requested by a user -> -> *"I want there to be a Google Drive folder that when I upload new inflammation data to -> the software automatically pulls it down and updates the analysis. -> The new result should be added to a database with a timestamp. 
-> An email should then be sent to a group email notifying them of the change."* ->> ## Solution ->> ->> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg) -> {: .solution} -{: .challenge} - -## An abstraction too far - -So far we have seen how abstractions are good for making code easier to read, maintain and test. -However, it is possible to introduce too many abstractions. - -> All problems in computer science can be solved by another level of indirection except the problem of too many levels of indirection - -When you introduce an abstraction, if the reader of the code needs to understand what is happening inside the abstraction, -it has actually made the code *harder* to read. -When code is just in the function, it can be clear to see what it is doing. -When the code is calling out to an instance of a class that, thanks to polymorphism, could be a range of possible implementations, -the only way to find out what is *actually* being called is to run the code and see. -This is much slower to understand, and actually obfuscates meaning. - -It is a judgement as to whether you have make the code too abstract. -If you have to jump around a lot when reading the code that is a clue that is too abstract. -Similarly, if there are two parts of the code that always need updating together, that is -again an indication of an incorrect or over-zealous abstraction. - - -## You Ain't Gonna Need It - -There are different approaches to designing software. -One principle that is popular is called You Ain't Gonna Need it - "YAGNI" for short. -The idea is that, since it is hard to predict the future needs of a piece of software, -it is always best to design the simplest solution that solves the problem at hand. -This is opposed to trying to imagine how you might want to adapt the software in future -and designing the code with that in mind. 
- -Then, since you know the problem you are trying to solve, you can avoid making your solution unnecessarily complex or abstracted. - -In our example, it might be tempting to abstract how the `CSVDataSource` walks the file tree into a class. -However, since we only have one strategy for exploring the file tree, this would just create indirection for the sake of it -- now a reader of CSVDataSource would have to read a different class to find out how the tree is walked. -Maybe in the future this is something that needs to be customised, but we haven't really made it any harder to do by *not* doing this prematurely -and once we have the concrete feature request, it will be easier to design it appropriately. - -> All of this is a judgement. -> For example, in this case, perhaps it *would* make sense to at least pull the file parsing out into a separate -> class, but not have the CSVDataSource be configurable. -> That way, it is clear to see how the file tree is being walked (there's no polymorphism going on) -> without mixing the *parsing* code in with the file finding code. -> There are no right answers, just guidelines. -{: .callout} - -> ## Exercise: Applying to real world examples -> Thinking about the examples of good and bad code you identified at the start of the episode. -> Identify what kind of principles were and weren't being followed -> Identify some refactorings that could be performed that would improve the code -> Discuss the ideas as a group. -{: .challenge} - -## Conclusion - -Good architecture is not about applying any rules blindly, but instead practise and taking care around important things: - -* Avoid duplication of code or data. -* Keeping how much a person has to understand at once to a minimum. -* Think about how interfaces will work. -* Separate different considerations into different sections of the code. -* Don't try and design a future proof solution, focus on the problem at hand. - -Practise makes perfect. 
-One way to practise is to consider code that you already have and think how it might be redesigned.
-Another way is to always try to leave code in a better state that you found it.
-So when you're working on a less well structured part of the code, start by refactoring it so that your change fits in cleanly.
-Doing this, over time, with your colleagues, will improve your skills as software architecture as well as improving the code.

From 05fee5be7fae195326221612d8b6db96d42cf7ec Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Mon, 23 Oct 2023 15:11:33 +0100
Subject: [PATCH 057/105] Use consistent language - responsibilities - when
 talking about parts of code

---
 _episodes/35-refactoring-architecture.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md
index 6c72b6fd9..6d6087ddd 100644
--- a/_episodes/35-refactoring-architecture.md
+++ b/_episodes/35-refactoring-architecture.md
@@ -19,16 +19,16 @@ keypoints:

## Introduction

-Model-View-Controller (MVC) is a way of separating out different portions of a typical
+Model-View-Controller (MVC) is a way of separating out different responsibilities of a typical
application. Specifically we have:

-* The **model** which contains the internal data representations for the program, and the valid
- operations that can be performed on it.
+* The **model** which is responsible for the internal data representations for the program,
+ and the valid operations that can be performed on it.
* The **view** is responsible for how this data is presented to the user (e.g. through a GUI
or by writing out to a file)
-* The **controller** defines how the model can be interacted with.
-Separating out these different sections into different parts of the code will make +Separating out these different responsibilities into different parts of the code will make the code much more maintainable. For example, if the view code is kept away from the model code, then testing the model code can be done without having to worry about how it will be presented. @@ -39,7 +39,7 @@ just one thing. It also helps with maintainability - if the UI requirements change, these changes are easily isolated from the more complex logic. -## Separating out considerations +## Separating out responsibilities The key thing to take away from MVC is the distinction between model code and view code. From dfcf219b5aad5b2882626276e369fcf472eea7bb Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 15:21:22 +0100 Subject: [PATCH 058/105] Improve the flow of the start of the classes section Introduce a problem that classes will solve. Use consistent circle example all through. Make header more accurate. Remove benifits of using classes - we are introducing a big benifit, don't want to muddy the waters with other benifits. --- _episodes/34-refactoring-decoupled-units.md | 37 +++++++-------------- 1 file changed, 12 insertions(+), 25 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index b26a1ef7f..63d059178 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -1,6 +1,6 @@ --- title: "Using classes to de-couple code." -teaching: 35 +teaching: 30 exercises: 55 questions: - "What is de-coupled code?" @@ -64,19 +64,22 @@ then it becomes easier for these parts to change independently. > {: .solution} {: .challenge} -## Using classes to encapsulate data and behaviours +Even with this change, the file loading is coupled with the data analysis. 
+For example, if we want to support reading JSON files or CSV files
+we would have to pass into `analyse_data` some kind of flag indicating what we want.
+
+Instead, we would like to decouple the consideration of what data to load
+from the `analyse_data` function entirely.
+
+One way we can do this is to use a language feature called a **class**.
+
+## Using Python Classes

A class is a way of grouping together data with some specific methods.
In Python, you can declare a class as follows:

```python
-class MyClass:
+class Circle:
    pass
```

They are typically named using `UpperCase`.
You can then **construct** a class elsewhere in your code by doing the following:

```python
-my_class = MyClass()
+my_circle = Circle()
```

When you construct a class in this way, the class's **constructor** is called.
@@ -125,22 +128,6 @@ Here the instance of the class, `my_circle` will be automatically passed in
as the first parameter when calling `get_area`.
Then the method can access the **member variable** `radius`.
-Classes have a number of uses.
-
-* Encapsulating data - such as grouping three numbers together into a Vector class
-* Maintaining invariants - perhaps when storing a file path it only makes sense for that to resolve to a valid file
- by storing the string in a class with a method for setting it (a **setter**), that method can validate the new value before updating the value.
-* Encapsulating behaviour - such as a class representing a UI state, modifying some value will automatically
- force the relevant portion of the UI to be updated.
- -> ## Maintaining Invariants -> Maintaining invariants can be a really powerful tool in debugging. -> Without invariants, you can find bugs where some data is in an invalid -> state, but the problem only appears when you try to use the data. -> This makes it hard to track down the cause of the bug. -> By using classes to maintain invariants, you can force the issue -> to appear when the invalid data is set, that is, the source of the bug. -{: .callout} - > ## Exercise: Use a class to configure loading > Put the `load_inflammation_data` function we wrote in the last exercise as a member method > of a new class called `CSVDataSource`. From 1de7c868479bd0cb68979e6b1306967367b2c74e Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:16:35 +0100 Subject: [PATCH 059/105] Use correct name for CSVDataSource --- _episodes/34-refactoring-decoupled-units.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 63d059178..5514e3820 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -304,7 +304,7 @@ to the relevant class. As we saw with the `Circle` and `Square` examples, we can use interfaces and polymorphism to provide different implementations of the same interface. -For example, we could replace our `CSVReader` with a class that reads a totally different format, +For example, we could replace our `CSVDataSource` with a class that reads a totally different format, or reads from an external service. All of these can be added in without changing the analysis. Further - if we want to write a new analysis, we can support any of these data sources @@ -352,7 +352,7 @@ That is, we have decoupled the job of loading the data from the job of analysing We can use this abstraction to also make testing more straight forward. 
Instead of having our tests use real file system data, we can instead provide
a mock or dummy implementation of the `InflammationDataSource` that just returns some example data.
-Separately, we can test the file parsing class `CSVReader` without having to understand
+Separately, we can test the file parsing class `CSVDataSource` without having to understand
the specifics of the statistical analysis.

A convenient way to do this in Python is using Mocks.

From 1f03d9ea68d3bc25e3fa37eb83a83249b65a757b Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Mon, 23 Oct 2023 16:17:52 +0100
Subject: [PATCH 060/105] Make the interfaces section not use real interfaces

Since Python doesn't really have interfaces, and most of the benefits of
having interfaces are not supported by Python, this needlessly
complicates the lesson. Instead, talk about common interfaces for
different classes.
---
 _episodes/34-refactoring-decoupled-units.md | 120 +++++++------------
 1 file changed, 41 insertions(+), 79 deletions(-)

diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md
index 5514e3820..6a20530ed 100644
--- a/_episodes/34-refactoring-decoupled-units.md
+++ b/_episodes/34-refactoring-decoupled-units.md
@@ -1,7 +1,7 @@
---
title: "Using classes to de-couple code."
teaching: 30
-exercises: 55
+exercises: 45
questions:
- "What is de-coupled code?"
- "When is it useful to use classes to structure code?"
@@ -200,93 +200,53 @@ These allow separate systems to communicate with each other - such as a making a
to Google Maps to find the latitude and longitude of an address.

However, there are internal interfaces within our software that dictate how
-For example, there is an interface for how the statistical analysis in `analyse_data`
-uses the class `CSVDataSource` - the method `load_inflammation_data`, how it should be called
-and what it will return.
+For example, our `Circle` class implicitly has an interface:
+you can call `get_area` on it and it will return a number representing its area.

-Interfaces are important to get right - a messy interface will force tighter coupling between
-two units in the system.
-Unfortunately, it would be an entire course to cover everything to consider in interface design.
-
-In addition to the abstract notion of an interface, many programming languages
-support creating interfaces as a special kind of class.
-Python doesn't support this explicitly, but we can still use this feature with
-regular classes.
-An interface class will define some methods, but not provide an implementation:
-
-```python
-class Shape:
-    def get_area():
-        raise NotImplementedError
-```
-
-> ## Exercise: Define an interface for your class
-> As discussed, there is an interface between the CSVDataSource and the analysis.
-> Write an interface(that is, a class that defines some empty methods) called `InflammationDataSource`
-> that makes this interface explicit.
-> Document the format the data will be returned in.
+> ## Exercise: Identify the interface between `CSVDataSource` and `analyse_data`
+> What is the interface that `CSVDataSource` has with `analyse_data`?
+> Think about what functions `analyse_data` needs to be able to call,
+> what parameters they need and what it will return.
>> ## Solution
->> ```python
->> class InflammationDataSource:
->>     """
->>     An interface for providing a series of inflammation data.
->>     """
+>> The interface is the `load_inflammation_data` method.
>>
->>     def load_inflammation_data(self):
->>         """
->>         Loads the data and returns it as a list, where each entry corresponds to one file,
->>         and each entry is a 2D array with patients inflammation by day.
->> :returns: A list where each entry is a 2D array of patient inflammation results by day ->> """ ->> raise NotImplementedError ->> ``` +>> It takes no parameters. +>> +>> It returns a list where each entry is a 2D array of patient inflammation results by day +>> Any object we pass into `analyse_data` must conform to this interface. > {: .solution} {: .challenge} -An interface on its own is not useful - it cannot be instantiated. -The next step is to create a class that **implements** the interface. -That is, create a class that inherits from the interface and then provide -implementations of all the methods on the interface. -To return to our `Shape` interface, we can write classes that implement this -interface, with different implementations: +## Polymorphism -```python -class Circle(Shape): - ... - def get_area(self): - return math.pi * self.radius * self.radius +It is possible to design multiple classes that each conform to the same interface. +For example, we could provide a `Rectangle` class: + +```python class Rectangle(Shape): - ... + def __init__(self, width, height): + self.width = width + self.height = height def get_area(self): return self.width * self.height ``` -As you can see, by putting `ShapeInterface`` in brackets after the class -we are saying a `Circle` **is a** `Shape`. - -> ## Exercise: Implement the interface -> Modify the existing class to implement the interface. -> Ensure the method matches up exactly to the interface. ->> ## Solution ->> We can create a class that implements `load_inflammation_data`. ->> We can lift the code into this new class. ->> ->> ```python ->> class CSVDataSource(InflammationDataSource): ->> ``` -> {: .solution} -{: .challenge} - -## Polymorphism +Like `Circle`, this class provides a `get_area` method. +The method takes the same number of parameters (none), and returns a number. +However, the implementation is different. 
-Where this gets useful is by using a concept called **polymorphism**
-which is a fancy way of saying we can use an instance of a class and treat
-it as a `Shape`, without worrying about whether it is a `Circle` or a `Rectangle`.
+When classes share an interface, we can use an instance of a class without
+knowing what specific class is being used.
+When we do this, it is called **polymorphism**.
+Here is an example where we create a list of shapes (either Circles or Rectangles)
+and can then find the total area.
+Note how we call `get_area` and Python is able to call the appropriate `get_area`
+for each of the shapes.

```python
my_circle = Circle(radius=10)
@@ -301,8 +261,8 @@
to the relevant class.

### How polymorphism is useful

-As we saw with the `Circle` and `Square` examples, we can use interfaces and polymorphism
-to provide different implementations of the same interface.
+As we saw with the `Circle` and `Rectangle` examples, we can use common interfaces and polymorphism
+to abstract away the details of the implementation from the caller.

For example, we could replace our `CSVDataSource` with a class that reads a totally different format,
or reads from an external service.
@@ -313,13 +273,13 @@ That is, we have decoupled the job of loading the data from the job of analysing

> ## Exercise: Introduce an alternative implementation of DataSource
> Create another class that repeatedly asks the user for paths to CSVs to analyse.
-> It should inherit from the interface and implement the `load_inflammation_data` method.
+> It should implement the `load_inflammation_data` method.
> Finally, at run time provide an instance of the new implementation if the user hasn't
> put any files on the path.
>> ## Solution >> You should have created a class that looks something like: >> ```python ->> class UserProvidSpecificFilesDataSource(InflammationDataSource): +>> class UserProvidSpecificFilesDataSource: >> def load_inflammation_data(self): >> paths = [] >> while(True): @@ -351,12 +311,14 @@ That is, we have decoupled the job of loading the data from the job of analysing We can use this abstraction to also make testing more straight forward. Instead of having our tests use real file system data, we can instead provide -a mock or dummy implementation of the `InflammationDataSource` that just returns some example data. +a mock or dummy implementation instead of one of the DataSource classes. +This dummy implementation could just returns some fixed example data. Separately, we can test the file parsing class `CSVDataSource` without having to understand the specifics of the statistical analysis. -An convenient way to do this in Python is using Mocks. -These are a whole topic to themselves - but a basic mock can be constructed using a couple of lines of code: +An convenient way to do this in Python is using Python's [mock object library](https://docs.python.org/3/library/unittest.mock.html). +These are a whole topic to themselves - +but a basic mock can be constructed using a couple of lines of code: ```python from unittest.mock import Mock @@ -372,7 +334,7 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be ` > ## Exercise: Test using a mock or dummy implementation -> Create a mock for the `InflammationDataSource` that returns some fixed data to test +> Create a mock for to provide as the `data_source` that returns some fixed data to test > the `analyse_data` method. > Use this mock in a test. >> ## Solution From 0df2ed8706da25089604c78013eede3435a61383 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:20:51 +0100 Subject: [PATCH 061/105] Remove ... 
from example solution

Since all we are omitting is the docstring, we can leave that as implicit
and make it clear there is no code before the load call
---
 _episodes/34-refactoring-decoupled-units.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md
index 6a20530ed..fa07ae69d 100644
--- a/_episodes/34-refactoring-decoupled-units.md
+++ b/_episodes/34-refactoring-decoupled-units.md
@@ -54,8 +54,8 @@ then it becomes easier for these parts to change independently.
>> This can then be used in the analysis.
>> ```python
>> def analyse_data(data_dir):
->>     ...
>>     data = load_inflammation_data(data_dir)
+>>     daily_standard_deviation = compute_standard_deviation_by_data(data)
>>     ...
>> ```
>> This is now easier to understand, as we don't need to understand the file loading
@@ -158,8 +158,9 @@ Then the method can access the **member variable** `radius`.
>> We have "decoupled" the reading of the data from the statistical analysis.
>> ```python
>> def analyse_data(data_source):
->>     ...
>>     data = data_source.load_inflammation_data()
+>>     daily_standard_deviation = compute_standard_deviation_by_data(data)
+>>     ...
>> ``` >> >> In the controller, you might have something like: From 44883039434f3ff14e380992da73dbb55c6dbe35 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:29:24 +0100 Subject: [PATCH 062/105] Ensure code samples consistent with new order Now MVC comes after classes make sure the examples in classes do not contain the changes done as part of MVC, and that the classes changes are in the MVC examples --- _episodes/34-refactoring-decoupled-units.md | 4 ++-- _episodes/35-refactoring-architecture.md | 21 +++++++++++---------- 2 files changed, 13 insertions(+), 12 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index fa07ae69d..538c9ca5c 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -167,7 +167,7 @@ Then the method can access the **member variable** `radius`. >> >> ```python >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) ->> data_result = analyse_data(data_source) +>> analyse_data(data_source) >> ``` >> While the behaviour is unchanged, how we call `analyse_data` has changed. >> We must update our regression test to match this, to ensure we haven't broken the code: @@ -303,7 +303,7 @@ That is, we have decoupled the job of loading the data from the job of analysing >> data_source = UserProvidSpecificFilesDataSource() >> else: >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) ->> data_result = analyse_data(data_source) +>> analyse_data(data_source) >>``` >> As you have seen, all these changes were made without modifying >> the analysis code itself. 
diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md index 6d6087ddd..fca83749e 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-refactoring-architecture.md @@ -94,10 +94,7 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> Gets all the inflammation csvs within a directory, works out the mean >> inflammation value for each day across all datasets, then graphs the >> standard deviation of these means.""" ->> data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) ->> if len(data_file_paths) == 0: ->> raise ValueError(f"No inflammation csv's found in path {data_dir}") ->> data = map(models.load_csv, data_file_paths) +>> data = data_source.load_inflammation_data() >> daily_standard_deviation = compute_standard_deviation_by_data(data) >> >> return daily_standard_deviation @@ -106,12 +103,16 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> >> ```python >> if args.full_data_analysis: ->> data_result = analyse_data(os.path.dirname(InFiles[0])) ->> graph_data = { ->> 'standard deviation by day': data_result, ->> } ->> views.visualize(graph_data) ->> return +>> if len(InFiles) == 0: +>> data_source = UserProvidSpecificFilesDataSource() +>> else: +>> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> data_result = analyse_data(data_source) +>> graph_data = { +>> 'standard deviation by day': data_result, +>> } +>> views.visualize(graph_data) +>> return >> ``` >> You might notice this is more-or-less the change we did to write our >> regression test. 
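Taken together, the patches above converge on one decoupled shape: a data-source class that owns file loading, and an analysis function that only sees the `load_inflammation_data` interface. Here is a self-contained sketch of that shape; `CSVDataSource` and `analyse_data` follow the lesson's naming, while `InMemoryDataSource` and the tiny arrays are invented for illustration:

```python
import glob
import os

import numpy as np


class CSVDataSource:
    """Loads all inflammation CSVs within a specified directory (lesson-style sketch)."""

    def __init__(self, dir_path):
        self.dir_path = dir_path

    def load_inflammation_data(self):
        paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.csv'))
        if len(paths) == 0:
            raise ValueError(f"No inflammation CSVs found in path {self.dir_path}")
        return [np.loadtxt(p, delimiter=',') for p in paths]


class InMemoryDataSource:
    """Hypothetical source conforming to the same implicit interface; handy for tests."""

    def __init__(self, data):
        self.data = data

    def load_inflammation_data(self):
        return self.data


def analyse_data(data_source):
    # The analysis no longer knows, or cares, where the data came from.
    data = data_source.load_inflammation_data()
    means_by_day = [d.mean(axis=0) for d in data]
    return np.std(np.stack(means_by_day), axis=0)


source = InMemoryDataSource([np.array([[1.0, 2.0], [3.0, 4.0]]),
                             np.array([[5.0, 6.0], [7.0, 8.0]])])
print(analyse_data(source))  # [2. 2.]
```

Swapping `CSVDataSource` for the in-memory source, or for a reader of some other format, requires no change to `analyse_data`, which is the point of the decoupling.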
From 54f3c9c83f55554d9b886a3c90b4523ceca56967 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:42:37 +0100 Subject: [PATCH 063/105] Fix formatting of solution regression test --- _episodes/33-refactoring-functions.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 069190b3b..2d9ac2785 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -89,13 +89,13 @@ the tests at all. >> from inflammation.compute_data import analyse_data >> path = Path.cwd() / "../data" >> result = analyse_data(path) ->> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211 ->> ,0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094 ->> ,1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312 ->> ,1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578 ->> ,0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417 ->> ,0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707 ->> ,0.50323031,0.47574665,0.45197398,0.22070227] +>> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211, +>> 0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094, +>> 1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312, +>> 1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578, +>> 0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417, +>> 0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707, +>> 0.50323031,0.47574665,0.45197398,0.22070227] >> npt.assert_array_almost_equal(result, expected_output) >> ``` >> From b663efac69a65e4a24f03e15d2410d2040a879f5 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:42:57 +0100 Subject: [PATCH 064/105] Correct name of the regression test to match convention --- _episodes/33-refactoring-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 
deletion(-)

diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md
index 2d9ac2785..15790411d 100644
--- a/_episodes/33-refactoring-functions.md
+++ b/_episodes/33-refactoring-functions.md
@@ -85,7 +85,7 @@ the tests at all.
>> import numpy.testing as npt
>> from pathlib import Path
>>
->> def test_compute_data():
+>> def test_analyse_data():
>>     from inflammation.compute_data import analyse_data
>>     path = Path.cwd() / "../data"
>>     result = analyse_data(path)

From 9eae6e2ea60bfaa98c40d9ddcb36ed0f5aeb8eeb Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Mon, 23 Oct 2023 16:43:37 +0100
Subject: [PATCH 065/105] Provide a skeleton for the test to make the exercise
 a bit easier

This allows the student to focus on observing and then testing the
current behaviour, rather than getting bogged down in implementation
details.
---
 _episodes/33-refactoring-functions.md | 20 +++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md
index 15790411d..40a3d3959 100644
--- a/_episodes/33-refactoring-functions.md
+++ b/_episodes/33-refactoring-functions.md
@@ -62,15 +62,25 @@ you're not changing the important behaviour
you have to make some small tweaks to
the tests at all.

> ## Exercise: Write regression tests before refactoring
-> Write a regression test to verify we don't break the code when refactoring.
-> You will need to modify `analyse_data` to not create a graph and instead
-> return the data.
+> Add a new test file called `test_compute_data.py` in the tests folder.
+> Add and complete this regression test to verify the current output of `analyse_data` +> is unchanged by the refactorings we are going to do: +> ```python +> def test_analyse_data(): +> from inflammation.compute_data import analyse_data +> path = Path.cwd() / "../data" +> result = analyse_data(path) > -> Don't forget you can use the `numpy.testing` function `assert_array_equal` to +> # TODO: add an assert for the value of result +> ``` +> Use `assert_array_almost_equal` from the `numpy.testing` library to > compare arrays of floating point numbers. > +> You will need to modify `analyse_data` to not create a graph and instead +> return the data. +> >> ## Hint ->> You might find it helpful to assert the result, observe the test failing +>> You might find it helpful to assert the results equal some made up array, observe the test failing >> and copy and paste the correct result into the test. > {: .solution} > From c3d845c403e9bb383055a6d2eee6eb13ceedd4c7 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:47:36 +0100 Subject: [PATCH 066/105] Provide signature for pure function This should make the exercise clearer. --- _episodes/33-refactoring-functions.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 40a3d3959..f004f9b3a 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -155,7 +155,12 @@ that is maybe harder to test, but is so simple that it only needs a handful of t > ## Exercise: Refactor the function into a pure function > Refactor the `analyse_data` function into a pure function with the logic, and an impure function that handles the input and output. -> The pure function should take in the data, and return the analysis results. 
+> The pure function should take in the data, and return the analysis results: +> ```python +> def compute_standard_deviation_by_day(data): +> # TODO +> return daily_standard_deviation +> ``` > The "glue" function should maintain the behaviour of the original `analyse_data` > but delegate all the calculations to the new pure function. >> ## Solution From dc747dd1b88423b7ac9a1abf966ab0f6ed48e6b0 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:48:01 +0100 Subject: [PATCH 067/105] Use variable name data rather than all_loaded_data for example This matches up with the variable names in the original code, making the refactoring more obvious --- _episodes/33-refactoring-functions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index f004f9b3a..0670de3b7 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -167,8 +167,8 @@ that is maybe harder to test, but is so simple that it only needs a handful of t >> You can move all of the code that does the analysis into a separate function that >> might look something like this: >> ```python ->> def compute_standard_deviation_by_day(all_loaded_data): ->> means_by_day = map(models.daily_mean, all_loaded_data) +>> def compute_standard_deviation_by_day(data): +>> means_by_day = map(models.daily_mean, data) >> means_by_day_matrix = np.stack(list(means_by_day)) >> >> daily_standard_deviation = np.std(means_by_day_matrix, axis=0) From cafc20db3768a0380bb604e14fe3958245aae655 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:48:13 +0100 Subject: [PATCH 068/105] Introduce a header for the testing of pure functions section --- _episodes/33-refactoring-functions.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 0670de3b7..63a321492 100644 --- 
a/_episodes/33-refactoring-functions.md
+++ b/_episodes/33-refactoring-functions.md
@@ -197,6 +197,8 @@ that is maybe harder to test, but is so simple that it only needs a handful of t
> {: .solution}
{: .challenge}

+### Testing Pure Functions
+
Now we have a pure function for the analysis,
we can write tests that cover all the things
we would like tests to cover
without depending on the data existing in CSVs.

From fa67fab5a3f94da2c28424c3ec4266d0be242b2a Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Mon, 23 Oct 2023 16:55:27 +0100
Subject: [PATCH 069/105] Add a hint showing how the class will be used

---
 _episodes/34-refactoring-decoupled-units.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md
index 538c9ca5c..95ca16c4e 100644
--- a/_episodes/34-refactoring-decoupled-units.md
+++ b/_episodes/34-refactoring-decoupled-units.md
@@ -134,6 +134,21 @@ Then the method can access the **member variable** `radius`.
> Put the configuration of where to load the files in the class's constructor.
> Once this is done, you can construct this class outside the statistical analysis
> and pass the instance in to `analyse_data`.
+>> ## Hint
+>> When we have completed the refactoring, the code in the `analyse_data` function
+>> should look like:
+>> ```python
+>> def analyse_data(data_source):
+>>     data = data_source.load_inflammation_data()
+>>     daily_standard_deviation = compute_standard_deviation_by_data(data)
+>>     ...
+>> ``` +>> The controller code should look like: +>> ```python +>> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> analyse_data(data_source) +>> ``` +> {: .solution} >> ## Solution >> You should have created a class that looks something like this: >> From c1eeb61c9dab37bf40816783f9fe89a8cdb23a9b Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 16:56:00 +0100 Subject: [PATCH 070/105] Remove bit about adding a layer of indirection This isn't really a solution, and I think just muddies the meaning of the section --- _episodes/34-refactoring-decoupled-units.md | 9 --------- 1 file changed, 9 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 95ca16c4e..aaaaa87f7 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -196,15 +196,6 @@ Then the method can access the **member variable** `radius`. >> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211 >> ... >> ``` ->> If this was a more complex refactoring, we could introduce an indirection to keep ->> the interface the same: ->> ```python ->> def analyse_data(dir_path): ->> data_source = CSVDataSource(os.path.dirname(InFiles[0])) ->> return analyse_data_from_source(data_source) ->> ``` ->> This can be a really useful intermediate step if `analyse_data` is called ->> from lots of different places. 
> {: .solution}
{: .challenge}

From 51a1f9973ff81e5201402a05ada07d52709fa688 Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Mon, 23 Oct 2023 16:58:56 +0100
Subject: [PATCH 071/105] Clarifying mocking section

---
 _episodes/34-refactoring-decoupled-units.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md
index aaaaa87f7..af859c675 100644
--- a/_episodes/34-refactoring-decoupled-units.md
+++ b/_episodes/34-refactoring-decoupled-units.md
@@ -316,12 +316,15 @@ That is, we have decoupled the job of loading the data from the job of analysing

+## Testing using Mock Objects
+
We can use this abstraction to also make testing more straight forward.
Instead of having our tests use real file system data, we can instead provide
-a mock or dummy implementation instead of one of the DataSource classes.
+a mock or dummy implementation instead of one of the real classes.
+Provided that what we substitute conforms to the same interface, the code we are testing will work
+just the same.
This dummy implementation could just return some fixed example data.
-Separately, we can test the file parsing class `CSVDataSource` without having to understand
-the specifics of the statistical analysis.
+
A convenient way to do this in Python is using Python's [mock object library](https://docs.python.org/3/library/unittest.mock.html).
These are a whole topic to themselves -

From 84edea9a14cbf0c26c5216c682a35f2d5a0a3dc Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Mon, 23 Oct 2023 17:03:51 +0100
Subject: [PATCH 072/105] Add skeleton test for writing the mock test

---
 _episodes/34-refactoring-decoupled-units.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md
index af859c675..4072a357a 100644
--- a/_episodes/34-refactoring-decoupled-units.md
+++ b/_episodes/34-refactoring-decoupled-units.md
@@ -344,6 +344,21 @@

> ## Exercise: Test using a mock or dummy implementation
+> Complete this test for `analyse_data`, using a mock object in place of the
+> `data_source`:
+> ```python
+> from unittest.mock import Mock
+>
+> def test_compute_data_mock_source():
+>     from inflammation.compute_data import analyse_data
+>     data_source = Mock()
+>
+>     # TODO: configure data_source mock
+>
+>     result = analyse_data(data_source)
+>
+>     # TODO: add assert on the contents of result
+> ```
> Create a mock to provide as the `data_source` that returns some fixed data to test
> the `analyse_data` method.
> Use this mock in a test.
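Filled in, the exercise skeleton above might run end to end like the following sketch; the inline `analyse_data` is a stand-in for the lesson's `inflammation.compute_data.analyse_data`, and the fixture arrays are made up:

```python
import numpy as np
from unittest.mock import Mock


def analyse_data(data_source):
    # Stand-in for the lesson's decoupled analysis function.
    data = data_source.load_inflammation_data()
    means_by_day = [d.mean(axis=0) for d in data]
    return np.std(np.stack(means_by_day), axis=0)


def test_compute_data_mock_source():
    data_source = Mock()
    # Configure the mock: load_inflammation_data() now returns fixed data.
    data_source.load_inflammation_data.return_value = [
        np.array([[1.0, 2.0], [3.0, 4.0]]),
        np.array([[5.0, 6.0], [7.0, 8.0]]),
    ]

    result = analyse_data(data_source)

    np.testing.assert_array_almost_equal(result, [2.0, 2.0])
    # Mocks also record how they were used, so we can assert on that too.
    data_source.load_inflammation_data.assert_called_once()


test_compute_data_mock_source()
```

Because the mock conforms to the same implicit interface as the real data source, no file system access is needed at all.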
From 3185e1bb65b5f2b02ca565cd0cf5ff1e95d8a73c Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Mon, 23 Oct 2023 17:04:04 +0100
Subject: [PATCH 073/105] Remind students to import the appropriate package

---
 _episodes/34-refactoring-decoupled-units.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md
index 4072a357a..10048f9a1 100644
--- a/_episodes/34-refactoring-decoupled-units.md
+++ b/_episodes/34-refactoring-decoupled-units.md
@@ -362,8 +362,12 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be `
> Create a mock to provide as the `data_source` that returns some fixed data to test
> the `analyse_data` method.
> Use this mock in a test.
+>
+> Don't forget you will need to import `Mock` from the `unittest.mock` package.
>> ## Solution
>> ```python
+>> from unittest.mock import Mock
+>>
>> def test_compute_data_mock_source():
>>     from inflammation.compute_data import analyse_data
>>     data_source = Mock()

From f461a0591a60e7baa03ca4968a63b714212c6baa Mon Sep 17 00:00:00 2001
From: Thomas Kiley
Date: Mon, 23 Oct 2023 17:43:23 +0100
Subject: [PATCH 074/105] Make example actually build a JSON reader

---
 _episodes/34-refactoring-decoupled-units.md | 49 +++++++++++++--------
 _episodes/35-refactoring-architecture.md    | 11 +++--
 2 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md
index 10048f9a1..bcf271848 100644
--- a/_episodes/34-refactoring-decoupled-units.md
+++ b/_episodes/34-refactoring-decoupled-units.md
@@ -279,36 +279,47 @@ for free with no further work.
That is, we have decoupled the job of loading the data from the job of analysing the data.

> ## Exercise: Introduce an alternative implementation of DataSource
-> Create another class that repeatedly asks the user for paths to CSVs to analyse.
+> Create another class that supports loading JSON instead of CSV. +> There is a function in `models.py` that loads from JSON in the following format: +> ```json +> [ +> { +> "observations": [0, 1] +> }, +> { +> "observations": [0, 2] +> } +> ] +> ``` > It should implement the `load_inflammation_data` method. -> Finally, at run time provide an instance of the new implementation if the user hasn't -> put any files on the path. +> Finally, at run time construct an appropriate instance based on the file extension. >> ## Solution >> You should have created a class that looks something like: >> ```python ->> class UserProvidSpecificFilesDataSource: ->> def load_inflammation_data(self): ->> paths = [] ->> while(True): ->> input_string = input('Enter path to CSV or press enter to process paths collected: ') ->> if(len(input_string) == 0): ->> print(f'Finished entering input - will process {len(paths)} CSVs') ->> break ->> if os.path.exists(input_string): ->> paths.append(input_string) ->> else: ->> print(f'Path {input_string} does not exist, please enter a valid path') +>> class JSONDataSource: +>> """ +>> Loads all the inflammation JSON's within a specified folder. 
+>> """ +>> def __init__(self, dir_path): +>> self.dir_path = dir_path >> ->> data = map(models.load_csv, paths) +>> def load_inflammation_data(self): +>> data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.json')) +>> if len(data_file_paths) == 0: +>> raise ValueError(f"No inflammation JSON's found in path {self.dir_path}") +>> data = map(models.load_json, data_file_paths) >> return list(data) >> ``` >> Additionally, in the controller will need to select the appropriate DataSource to >> provide to the analysis: >>```python ->> if len(InFiles) == 0: ->> data_source = UserProvidSpecificFilesDataSource() ->> else: +>> _, extension = os.path.splitext(InFiles[0]) +>> if extension == '.json': +>> data_source = JSONDataSource() +>> elif extension == '.csv': >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> else: +>> raise ValueError(f'Unsupported file format: {extension}') >> analyse_data(data_source) >>``` >> As you have seen, all these changes were made without modifying diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md index fca83749e..6da37e1bc 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-refactoring-architecture.md @@ -103,11 +103,14 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> >> ```python >> if args.full_data_analysis: ->> if len(InFiles) == 0: ->> data_source = UserProvidSpecificFilesDataSource() ->> else: +>> _, extension = os.path.splitext(InFiles[0]) +>> if extension == '.json': +>> data_source = JSONDataSource() +>> elif extension == '.csv': >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) ->> data_result = analyse_data(data_source) +>> else: +>> raise ValueError(f'Unsupported file format: {extension}') +>> analyse_data(data_source) >> graph_data = { >> 'standard deviation by day': data_result, >> } From 8ad11f7e2c3e1a96d71d875e8448450b9db9b51b Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 
Oct 2023 17:48:17 +0100 Subject: [PATCH 075/105] Fix solutions based on testing --- _episodes/34-refactoring-decoupled-units.md | 2 +- _episodes/35-refactoring-architecture.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index bcf271848..8511e0f02 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -315,7 +315,7 @@ That is, we have decoupled the job of loading the data from the job of analysing >>```python >> _, extension = os.path.splitext(InFiles[0]) >> if extension == '.json': ->> data_source = JSONDataSource() +>> data_source = JSONDataSource(os.path.dirname(InFiles[0])) >> elif extension == '.csv': >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) >> else: diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md index 6da37e1bc..9a805fc41 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-refactoring-architecture.md @@ -105,7 +105,7 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> if args.full_data_analysis: >> _, extension = os.path.splitext(InFiles[0]) >> if extension == '.json': ->> data_source = JSONDataSource() +>> data_source = JSONDataSource(os.path.dirname(InFiles[0])) >> elif extension == '.csv': >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) >> else: From 47b3c81b4db0a1c7d1659e8893a16e8e4baebc14 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 18:00:25 +0100 Subject: [PATCH 076/105] Include the notion of writing tests before refactoring --- _episodes/32-software-design.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 700813f5c..fdf5f1d68 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -122,10 +122,11 @@ 
unchanged, but the code itself is easier to read, test and extend. When faced with a old piece of code that is hard to work with, that you need to modify, a good process to follow is: -1. Refactor the code in such a way that the new change will slot in cleanly. -2. Make the desired change, which now fits in easily. +1. Have tests that verify the current behaviour +2. Refactor the code in such a way that the new change will slot in cleanly. +3. Make the desired change, which now fits in easily. -Notice, after step 1, the *behaviour* of the code should be totally identical. +Notice, after step 2, the *behaviour* of the code should be totally identical. This allows you to test rigorously that the refactoring hasn't changed/broken anything *before* making the intended change. From a6726147dc6b3ecbee36a12b19b95ade93922997 Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Mon, 23 Oct 2023 18:02:43 +0100 Subject: [PATCH 077/105] Reiterate running the regression test after each refactor --- _episodes/33-refactoring-functions.md | 2 ++ _episodes/34-refactoring-decoupled-units.md | 2 ++ _episodes/35-refactoring-architecture.md | 2 ++ 3 files changed, 6 insertions(+) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 63a321492..544346128 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -194,6 +194,8 @@ that is maybe harder to test, but is so simple that it only needs a handful of t >> # views.visualize(graph_data) >> return daily_standard_deviation >>``` +>> Ensure you re-run our regression test to check this refactoring has not +>> changed the output of `analyse_data`. 
> {: .solution}
{: .challenge}
diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md
index 8511e0f02..8017d1415 100644
--- a/_episodes/34-refactoring-decoupled-units.md
+++ b/_episodes/34-refactoring-decoupled-units.md
@@ -61,6 +61,8 @@ then it becomes easier for these parts to change independently.
>> This is now easier to understand, as we don't need to understand the file loading
>> to read the statistical analysis, and we don't have to understand the statistical analysis
>> when reading the data loading.
+>> Ensure you re-run our regression test to check this refactoring has not
+>> changed the output of `analyse_data`.
> {: .solution}
{: .challenge}
diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md
index 9a805fc41..637a04614 100644
--- a/_episodes/35-refactoring-architecture.md
+++ b/_episodes/35-refactoring-architecture.md
@@ -121,6 +121,8 @@ Nevertheless, the MVC approach is a great starting point when thinking about how
>> regression test.
>> This demonstrates that splitting up model code from view code can
>> immediately make your code much more testable.
+>> Ensure you re-run our regression test to check this refactoring has not
+>> changed the output of `analyse_data`.
> {: .solution} {: .challenge} From e53f787fc62682a133976db4b198bb45b4747e54 Mon Sep 17 00:00:00 2001 From: Thomas Kiley <138868636+thomaskileyukaea@users.noreply.github.com> Date: Fri, 3 Nov 2023 18:00:34 +0000 Subject: [PATCH 078/105] Correct capitalisation of files Co-authored-by: Matthew --- _episodes/33-refactoring-functions.md | 2 +- _episodes/34-refactoring-decoupled-units.md | 2 +- _episodes/35-refactoring-architecture.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md index 544346128..42eae41f7 100644 --- a/_episodes/33-refactoring-functions.md +++ b/_episodes/33-refactoring-functions.md @@ -1,5 +1,5 @@ --- -title: "Refactoring functions to do just one thing" +title: "Refactoring Functions to Do Just One Thing" teaching: 30 exercises: 20 questions: diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 8017d1415..694c8705f 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -1,5 +1,5 @@ --- -title: "Using classes to de-couple code." 
+title: "Using Classes to De-Couple Code" teaching: 30 exercises: 45 questions: diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md index 637a04614..a00390828 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-refactoring-architecture.md @@ -1,5 +1,5 @@ --- -title: "Architecting code to separate responsibilities" +title: "Architecting Code to Separate Responsibilities" teaching: 15 exercises: 50 questions: From e7a3e0e44f6170b935d2216ea562bd5b28a0046b Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Fri, 3 Nov 2023 17:57:56 +0000 Subject: [PATCH 079/105] Fix missing fullstop --- _episodes/34-refactoring-decoupled-units.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md index 694c8705f..a9e82d9a9 100644 --- a/_episodes/34-refactoring-decoupled-units.md +++ b/_episodes/34-refactoring-decoupled-units.md @@ -38,7 +38,7 @@ If one part of the code only uses another part through an appropriate abstractio then it becomes easier for these parts to change independently. > ## Exercise: Decouple the file loading from the computation -> Currently the function is hard coded to load all the files in a directory +> Currently the function is hard coded to load all the files in a directory. > Decouple this into a separate function that returns all the files to load >> ## Solution >> You should have written a new function that reads all the data into the format needed From 224ea65c11ef32f93e2482cdcc2bd5dd7fb85ccc Mon Sep 17 00:00:00 2001 From: Thomas Kiley Date: Fri, 10 Nov 2023 13:26:02 +0000 Subject: [PATCH 080/105] Fix line numbers for solutions based on change to code Adding in the four lines for the "full data analysis" shifts these errors down by four lines. 
--- _episodes/15-coding-conventions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_episodes/15-coding-conventions.md b/_episodes/15-coding-conventions.md index 550e0feb6..e487dd91a 100644 --- a/_episodes/15-coding-conventions.md +++ b/_episodes/15-coding-conventions.md @@ -438,7 +438,7 @@ because an incorrect comment causes more confusion than no comment at all. >> which is helpfully marking inconsistencies with coding guidelines by underlying them. >> There are a few things to fix in `inflammation-analysis.py`, for example: >> ->> 1. Line 24 in `inflammation-analysis.py` is too long and not very readable. +>> 1. Line 30 in `inflammation-analysis.py` is too long and not very readable. >> A better style would be to use multiple lines and hanging indent, >> with the closing brace `}' aligned either with >> the first non-whitespace character of the last line of list @@ -487,7 +487,7 @@ because an incorrect comment causes more confusion than no comment at all. >> Note how PyCharm is warning us by underlying the whole line. >> >> 4. Only one blank line after the end of definition of function `main` ->> and the rest of the code on line 30 in `inflammation-analysis.py` - +>> and the rest of the code on line 33 in `inflammation-analysis.py` - >> should be two blank lines. >> Note how PyCharm is warning us by underlying the whole line. 
>> From 8f1937b945f4145fe00baa59def8b117f912f047 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Tue, 12 Dec 2023 13:57:51 +0000 Subject: [PATCH 081/105] Link and typo fixes --- _episodes/30-section3-intro.md | 2 +- _extras/databases.md | 2 +- _extras/persistence.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/_episodes/30-section3-intro.md b/_episodes/30-section3-intro.md index 4bd5bb742..5bfdb39f1 100644 --- a/_episodes/30-section3-intro.md +++ b/_episodes/30-section3-intro.md @@ -134,7 +134,7 @@ within the context of the typical software development process: - How to improve existing code to be more readable, maintainable and testable. - Consider different strategies for writing well designed code, including using **pure functions**, **classes** and **abstractions**. -- How to create, asses and improve software design. +- How to create, assess and improve software design. {% include links.md %} diff --git a/_extras/databases.md b/_extras/databases.md index eed05cd55..5fda791d9 100644 --- a/_extras/databases.md +++ b/_extras/databases.md @@ -16,7 +16,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../35-refactoring-architecture/index.html). +> [Section 3 on software design and development](../35-refactoring-architecture/index.html#additional-material). {: .callout} A **database** is an organised collection of data, diff --git a/_extras/persistence.md b/_extras/persistence.md index 340ef540d..b207e0458 100644 --- a/_extras/persistence.md +++ b/_extras/persistence.md @@ -25,7 +25,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../35-refactoring-architecture/index.html). +> [Section 3 on software design and development](../35-refactoring-architecture/index.html#additional-material). 
{: .callout} Our patient data system so far can read in some data, process it, and display it to people. From e3a994e811fba3f2a36c54121536e9405ee6de2c Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Thu, 14 Dec 2023 14:27:07 +0000 Subject: [PATCH 082/105] Initial pass over intro and first 2 episodes of section 3 --- _episodes/30-section3-intro.md | 25 +- _episodes/31-software-requirements.md | 5 +- _episodes/32-software-design.md | 336 +++++---- _extras/databases.md | 4 +- _extras/functional-programming.md | 826 +++++++++++++++++++++ _extras/persistence.md | 4 +- _extras/protect-main-branch.md | 2 +- _extras/software-architecture-paradigms.md | 281 +++++++ _extras/vscode.md | 2 +- 9 files changed, 1321 insertions(+), 164 deletions(-) create mode 100644 _extras/functional-programming.md create mode 100644 _extras/software-architecture-paradigms.md diff --git a/_episodes/30-section3-intro.md b/_episodes/30-section3-intro.md index 5bfdb39f1..e969de22a 100644 --- a/_episodes/30-section3-intro.md +++ b/_episodes/30-section3-intro.md @@ -13,7 +13,11 @@ objectives: keypoints: - "Software engineering takes a wider view of software development beyond programming (or coding)." - "Ensuring requirements are sufficiently captured is critical to the success of any project." -- "Following a process makes development predictable, can save time, and helps ensure each stage of development is given sufficient consideration before proceeding to the next." +- "Following a process makes software development predictable, saves time in the long run, + and helps ensure each stage of development is given sufficient consideration + before proceeding to the next." +- "Once you get the hang of a programming language, writing code to do what you want is relatively +easy. The hard part is writing code that is easy to adapt when your requirements change." 
--- In this section, we will take a step back from coding development practices and tools @@ -65,7 +69,7 @@ Someone who is engineering software takes a wider view: but there is an assumption that the software - or even just a part of it - could be reused in the future. -### The Software Development Process +### Software Development Process The typical stages of a software development process can be categorised as follows: @@ -99,7 +103,7 @@ these stages are followed implicitly or explicitly in every software project. What is required for a project (during requirements gathering) is always considered, for example, even if it isn't explored sufficiently or well understood. -Following a process of development offers some major benefits: +Following a **process** of development offers some major benefits: - **Stage gating:** a quality *gate* at the end of each stage, where stakeholders review the stage's outcomes to decide @@ -115,26 +119,27 @@ Following a process of development offers some major benefits: - **Transparency:** essentially, each stage generates output(s) into subsequent stages, which presents opportunities for them to be published as part of an open development process. -- **It saves time:** a well-known result from +- **Time saving:** a well-known result from [empirical software engineering studies](https://web.archive.org/web/20160731150816/http://superwebdeveloper.com/2009/11/25/the-incredible-rate-of-diminishing-returns-of-fixing-software-bugs/) - is that it becomes exponentially more expensive to fix mistakes in future stages. - For example, if a mistake takes 1 hour to fix in requirements, + is that fixing software mistakes is exponentially more expensive in later software development + stages. + For example, if a mistake takes 1 hour to fix in the requirements stage, it may take 5 times that during design, and perhaps as much as 20 times that to fix if discovered during testing. 
In this section we will place the actual writing of software (implementation) -within the context of the typical software development process: +within the context of a typical software development process: - Explore the **importance of software requirements**, - the different classes of requirements, + different classes of requirements, and how we can interpret and capture them. - How requirements inform and drive the **design of software**, the importance, role, and examples of **software architecture**, and the ways we can describe a software design. -- How to improve existing code to be more readable, maintainable and testable. +- How to **improve** existing code to be more **readable**, **testable** and **maintainable**. - Consider different strategies for writing well designed code, including using **pure functions**, **classes** and **abstractions**. -- How to create, assess and improve software design. +- How to create, assess and improve **software design**. {% include links.md %} diff --git a/_episodes/31-software-requirements.md b/_episodes/31-software-requirements.md index 78cca1e8f..917726df2 100644 --- a/_episodes/31-software-requirements.md +++ b/_episodes/31-software-requirements.md @@ -22,7 +22,7 @@ The requirements of our software are the basis on which the whole project rests if we get the requirements wrong, we'll build the wrong software. However, it's unlikely that we'll be able to determine all of the requirements upfront. Especially when working in a research context, -requirements are flexible and may change as we develop our software. +requirements are flexible and may change as we develop our software. ## Types of Requirements @@ -226,8 +226,7 @@ and these aspects should be considered as part of the software's non-functional > Think back to a piece of code or software (either small or large) you've written, > or which you have experience using. 
> First, try to formulate a few of its key business requirements, -> then derive these into user and then solution requirements -> (in a similar fashion to the ones above in *Types of Requirements*). +> then derive these into user and then solution requirements. {: .challenge} diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index fdf5f1d68..7cf76c767 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -1,152 +1,107 @@ --- -title: "Software Architecture and Design" +title: "Software Design" teaching: 25 exercises: 20 questions: +- "Why should we invest time in software design?" - "What should we consider when designing software?" -- "What goals should we have when structuring our code?" -- "What is refactoring?" objectives: -- "Know what goals we have when architecting and designing software." -- "Understand what an abstraction is, and when you should use one." -- "Understand what refactoring is." +- "Understand the goals and principles of designing 'good' software." +- "Understand what a code abstraction is, and when we should use it." +- "Understand what code refactoring is." keypoints: -- "How code is structured is important for helping future people understand and update it" -- "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. -Such components can be as small as a single function, or be a software package in their own right." -- "These smaller components can be understood individually without having to understand the entire codebase at once." - "When writing software used for research, requirements will almost *always* change." 
-- "*'Good code is written so that is readable, understandable, covered by automated tests, not over complicated and does well what is intended to do.'*" +- "'Good' code is designed to be maintainable: readable by people who did not author the code, +testable through a set of automated tests, adaptable to new requirements." +- "The sooner you adopt a practice of designing your software in the lifecycle of your project, +the easier the development and maintenance process will." --- ## Introduction -Typically when we start writing code, we write small scripts that -we intend to use. -We probably don't imagine we will need to change the code in the future. -We almost certainly don't expect other people will need to understand -and modify the code in the future. -However, as projects grow in complexity and the number of people involved grows, -it becomes important to think about how to structure code. -Software Architecture and Design is all about thinking about ways to make the -code be **maintainable** as projects grow. - -Maintainable code is: - - * Readable to people who didn't write the code. - * Testable through automated tests (like those from [episode 2](../21-automatically-testing-software/index.html)). - * Adaptable to new requirements. - -Writing code that meets these requirements is hard and takes practise. -Further, in most contexts you will already have a piece of code that breaks -some (or maybe all!) of these principles. - -> ## Group Exercise: Think about examples of good and bad code -> Try to come up with examples of code that has been hard to understand - why? +Ideally, we should have at least a rough design sketched out for our software before we write a +single line of code. +This design should be based around the requirements and the structure of the problem we are trying +to solve: what are the concepts we need to represent and what are the relationships between them. 
+And importantly, who will be using our software and how will they interact with it. + +As a piece of software grows, +it will reach a point where there's too much code for us to keep in mind at once. +At this point, it becomes particularly important to think of the overall design and +structure of our software, how should all the pieces of functionality fit together, +and how should we work towards fulfilling this overall design throughout development. +Even if you did not think about the design of your software from the very beginning - +it is not too late to start now. + +It's not easy to come up with a complete definition for the term **software design**, +but some of the common aspects are: + +- **Algorithm design** - + what method are we going to use to solve the core research/business problem? +- **Software architecture** - + what components will the software have and how will they cooperate? +- **System architecture** - + what other things will this software have to interact with and how will it do this? +- **UI/UX** (User Interface / User Experience) - + how will users interact with the software? + +There is literature on each of the above software design aspects - we will not go into details of +them all here. +Instead, we will learn some techniques to structure our code better to satisfy some of the +requirements of 'good' software and revisit +our software's [MVC architecture](/11-software-project/index.html#software-architecture) +in the context of software design. 
+
+## Good Software Design Goals
+Aspirationally, what makes good code can be summarised in the following quote from the
+[Intent HG blog](https://intenthq.com/blog/it-audience/what-is-good-code-a-scientific-definition/):
+
+> *“Good code is written so that is readable, understandable,
+> covered by automated tests, not over complicated
+> and does well what is intended to do.”*
+
+Software has become a crucial aspect of reproducible research, as well as an asset that
+can be reused or repurposed.
+Thus, it is even more important to take time to design the software to be easily *modifiable* and
+*extensible*, to save ourselves and our team a lot of time later on when we have
+to fix a problem or the software's requirements change.
+
+Satisfying the above properties will lead to an overall software design
+goal of having *maintainable* code, which is:
+
+* *readable* (and understandable) by developers who did not write the code, e.g. by:
+  * following a consistent coding style and naming conventions
+  * using meaningful and descriptive names for variables, functions, and classes
+  * documenting code to describe what it does and how it may be used
+  * using simple control flow to make it easier to follow the code execution
+  * keeping functions and methods small and focused on a single task and avoiding large functions
+    that do a little bit of everything (also important for testing)
+* *testable* through a set of (preferably automated) tests, e.g. by:
+  * writing unit, functional, regression tests to verify the code produces
+    the expected outputs from controlled inputs and exhibits the expected behaviour over time
+    as the code changes
+* *adaptable* (easily modifiable and extensible) to satisfy new requirements, e.g. by:
+  * writing low-coupled/decoupled code where each part of the code has a separate concern and
+    the lowest possible dependency on other parts of the code making it
+    easier to test, update or replace - e.g.
by separating the "business logic" and "presentation" + layers of the code on the architecture level (recall the [MVC architecture](/11-software-project/index.html#software-architecture)), + or separating "pure" (without side-effects) and "impure" (with side-effects) parts of the code on the + level of functions. + +Now that we know what goals we should aspire to, let's take a critical look at the code in our +software project and try to identify ways in which it can be improved. + +> ## Exercise: Identifying How Code Can be Improved? +> A team member has implemented a feature to our inflammation analysis software so that when a +> `--full-data-analysis` command line parameter parameter is passed to software, +> it scans the directory of one of the provided files, compares standard deviations across +> the data by day and plots a graph. +> The code is located in `compute_data.py` file within the `inflammation` project +> in a function called `analyse_data()`. > -> Try to come up with examples of code that was easy to understand and modify - why? -{: .challenge} - -In this episode we will explore techniques and processes that can help you -continuously improve the quality of code so, over time, it tends towards more -maintainable code. - -We will look at: - - * What abstractions are, and how to pick appropriate ones. - * How to take code that is in a bad shape and improve it. - * Best practises to write code in ways that facilitate achieving these goals. - -### Cognitive Load - -When we are trying to understand a piece of code, in our heads we are storing -what the different variables mean and what the lines of code will do. -**Cognitive load** is a way of thinking about how much information we have to store in our -heads to understand a piece of code. - -The higher the cognitive load, the harder it is to understand the code. -If it is too high, we might have to create diagrams to help us hold it all in our head -or we might just decide we can't understand it. 
- -There are lots of ways to keep cognitive load down: - -* Good variable and function names -* Simple control flow -* Having each function do just one thing - -## Abstractions - -An **abstraction**, at its most basic level, is a technique to hide the details -of one part of a system from another part of the system. -We deal with abstractions all the time - when you press the break pedal on the -car, you do not know how this manages both slowing down the engine and applying -pressure on the breaks. -The advantage of using this abstraction is, when something changes, for example -the introduction of anti-lock breaking or an electric engine, the driver does -not need to do anything differently - -the detail of how the car breaks is *abstracted* away from them. - -Abstractions are a fundamental part of software. -For example, when you write Python code, you are dealing with an -abstraction of the computer. -You don't need to understand how RAM functions. -Instead, you just need to understand how variables work in Python. - -In large projects it is vital to come up with good abstractions. -A good abstraction makes code easier to read, as the reader doesn't need to understand -all the details of the project to understand one part. -An abstraction lowers the cognitive load of a bit of code, -as there is less to understand at once. - -A good abstraction makes code easier to test, as it can be tested in isolation -from everything else. - -Finally, a good abstraction makes code easier to adapt, as the details of -how a subsystem *used* to work are hidden from the user, so when they change, -the user doesn't need to know. - -In this episode we are going to look at some code and introduce various -different kinds of abstraction. -However, fundamentally any abstraction should be serving these goals. - -## Refactoring - -Often we are not working on brand new projects, but instead maintaining an existing -piece of software. 
-Often, this piece of software will be hard to maintain, perhaps because it has hard to understand, or doesn't have any tests. -In this situation, we want to adapt the code to make it more maintainable. -This will allow greater confidence of the code, as well as making future development easier. - -**Refactoring** is a process where some code is modified, such that its external behaviour remains -unchanged, but the code itself is easier to read, test and extend. - -When faced with a old piece of code that is hard to work with, that you need to modify, a good process to follow is: - -1. Have tests that verify the current behaviour -2. Refactor the code in such a way that the new change will slot in cleanly. -3. Make the desired change, which now fits in easily. - -Notice, after step 2, the *behaviour* of the code should be totally identical. -This allows you to test rigorously that the refactoring hasn't changed/broken anything -*before* making the intended change. - -In this episode, we will be making some changes to an existing bit of code that -is in need of refactoring. - -## The code for this episode - -The code itself is a feature to the inflammation tool we've been working on. - -In it, if the user adds `--full-data-analysis` then the program will scan the directory -of one of the provided files, compare standard deviations across the data by day and -plot a graph. - -The main body of it exists in `inflammation/compute_data.py` in a function called `analyse_data`. - -We are going to be refactoring and extending this over the remainder of this episode. - -> ## Group Exercise: What is bad about this code? -> In what ways does this code not live up to the ideal properties of maintainable code? +> Critically examine this new code. +> In what ways does this code not live up to the ideal properties +> of maintainable code? > Think about ways in which you find it hard to understand. 
> Think about the kinds of changes you might want to make to it, and what would
> make making those changes challenging.
@@ -154,16 +109,107 @@ We are going to be refactoring and extending this over the remainder of this epi
>> You may have found others, but here are some of the things that make the code
>> hard to read, test and maintain:
>>
->> * **Hard to read:** Everything is in a single function - reading it you have to understand how the file loading
works at the same time as the analysis itself.
->> * **Hard to modify:** If you want to use the data without using the graph you'd have to change it
->> * **Hard to modify or test:** It is always analysing a fixed set of data stored on the disk
->> * **Hard to modify:** It doesn't have any tests meaning changes might break something
->>
->> Keep the list you have created.
->> At the end of this section we will revisit this
->> and check that we have learnt ways to address the problems we found.
+>> * **Hard to read:** everything is implemented in a single function.
+>> In order to understand it, you need to understand how file loading works at the same time as
+>> the analysis itself.
+>> * **Hard to modify:** if you want to use the data without using the graph you would have to
+>> change the function.
+>> * **Hard to modify or test:** it is always analysing a fixed set of data stored on the disk.
+>> * **Hard to modify:** it does not have any tests meaning changes may break something and it
+>> would be hard to know what.
+>>
+>> Make sure to keep the list you have created in the exercise above.
+>> For the remainder of this section, we will work on improving this code.
+>> At the end, we will revisit your list to check that you have learnt ways to address each of the
+>> problems you had found.
> {: .solution}
{: .challenge}

+## Technical Debt
+
+When faced with a problem that you need to solve by writing code - it may be tempting to
+skip the design phase and dive straight into coding.
+What happens if you do not follow the good software design and development best practices? +It can lead to accumulated 'technical debt', +which (according to [Wikipedia](https://en.wikipedia.org/wiki/Technical_debt)), +is the "cost of additional rework caused by choosing an easy (limited) solution now +instead of using a better approach that would take longer". +The pressure to achieve project goals can sometimes lead to quick and easy solutions, +which make the software become +more messy, more complex, and more difficult to understand and maintain. +The extra effort required to make changes in the future is the interest paid on the (technical) debt. +It is natural for software to accrue some technical debt, +but it is important to pay off that debt during a maintenance phase - +simplifying, clarifying the code, making it easier to understand - +to keep these interest payments on making changes manageable. + +There is only so much time available in a project. +How much effort should we spend on designing our code properly +and using good development practices? +The following [XKCD comic](https://xkcd.com/844/) summarises this tension: + +![Writing good code comic](../fig/xkcd-good-code-comic.png){: .image-with-shadow width="400px" } + +At an intermediate level there are a wealth of practices that *could* be used, +and applying suitable design and coding practices is what separates +an *intermediate developer* from someone who has just started coding. +The key for an intermediate developer is to balance these concerns +for each software project appropriately, +and employ design and development practices *enough* so that progress can be made. +It is very easy to under-design software, +but remember it's also possible to over-design software too. + +## Techniques for Improving Code + +How code is structured is important for helping people who are developing and maintaining it +to understand and update it. 
+By breaking down our software into components with a single responsibility, +we avoid having to rewrite it all when requirements change. +Such components can be as small as a single function, or be a software package in their own right. +These smaller components can be understood individually without having to understand +the entire codebase at once. + +### Code Refactoring + +*Refactoring* is the process of changing the internal structure of code without changing its +external behavior, with the goal of making the code more readable, maintainable, efficient or easier +to test. +This can include things such as renaming variables, reorganising +functions to avoid code duplication and increase reuse, and simplifying conditional statements. + +When faced with an existing piece of code that needs modifying a good refactoring +process to follow is: + +1. Make sure you have tests that verify the current behaviour +2. Refactor the code in such a way that the behaviour of the code is identical to that +before refactoring + +Another technique to use when improving code are *abstractions*. + +### Abstractions + +*Abstraction* is the process of hiding the implementation details of a piece of +code behind an interface - i.e. the details of *how* something works are hidden away, +leaving us to deal only with *what* it does. +This allows developers to work with the code at a higher level +of abstraction, without needing to understand the underlying details. +Abstraction is used to simplify complex systems by breaking them down into smaller, +more manageable parts. + +Abstraction can be +achieved through techniques like *encapsulation*, *inheritance*, and *polymorphism*, which we will +cover in the next episodes. + +## Improving Our Software Design + +Both refactoring and abstraction are important for creating maintainable code. 
+Refactoring helps to keep the codebase clean and easy to understand, while abstraction allows
+developers to work with the code in a more abstract and modular way.
+
+Writing good code is hard and takes practice.
+You may also be faced with an existing piece of code that breaks some (or all) of the
+good code principles, and your job will be to improve it so that the code can evolve further.
+In the rest of this section, we will use refactoring and abstraction techniques to
+help us redesign our code and incrementally improve its quality.
+
 {% include links.md %}
diff --git a/_extras/databases.md b/_extras/databases.md
index 5fda791d9..9f0267a27 100644
--- a/_extras/databases.md
+++ b/_extras/databases.md
@@ -1,5 +1,5 @@
 ---
-title: "Additional Material: Databases"
+title: "Databases"
 layout: episode
 teaching: 30
 exercises: 30
@@ -16,7 +16,7 @@ keypoints:
 > ## Follow up from Section 3
 > This episode could be read as a follow up from the end of
-> [Section 3 on software design and development](../35-refactoring-architecture/index.html#additional-material).
+> [Section 3 on software design and development](../35-refactoring-architecture/index.html#conclusion).
 {: .callout}
 
 A **database** is an organised collection of data,
diff --git a/_extras/functional-programming.md b/_extras/functional-programming.md
new file mode 100644
index 000000000..a9b5fb30d
--- /dev/null
+++ b/_extras/functional-programming.md
@@ -0,0 +1,826 @@
+---
+title: "Functional Programming"
+teaching: 30
+exercises: 30
+layout: episode
+questions:
+- What is functional programming?
+- Which situations/problems is functional programming well suited for?
+objectives: +- Describe the core concepts that define the functional programming paradigm +- Describe the main characteristics of code that is written in functional programming style +- Learn how to generate and process data collections efficiently using MapReduce and Python's comprehensions +keypoints: +- Functional programming is a programming paradigm where programs are constructed by applying and composing smaller and simple functions into more complex ones (which describe the flow of data within a program as a sequence of data transformations). +- In functional programming, functions tend to be *pure* - they do not exhibit *side-effects* (by not affecting anything other than the value they return or anything outside a function). Functions can also be named, passed as arguments, and returned from other functions, just as any other data type. +- MapReduce is an instance of a data generation and processing approach, in particular suited for functional programming and handling Big Data within parallel and distributed environments. +- Python provides comprehensions for lists, dictionaries, sets and generators - a concise (if not strictly functional) way to generate new data from existing data collections while performing sophisticated mapping, filtering and conditional logic on original dataset's members. +--- + +## Introduction + +Functional programming is a programming paradigm where +programs are constructed by applying and composing/chaining **functions**. +Functional programming is based on the +[mathematical definition of a function](https://en.wikipedia.org/wiki/Function_(mathematics)) +`f()`, +which applies a transformation to some input data giving us some other data as a result +(i.e. a mapping from input `x` to output `f(x)`). +Thus, a program written in a functional style becomes a series of transformations on data +which are performed to produce a desired output. 
+Each function (transformation) taken by itself is simple and straightforward to understand; +complexity is handled by composing functions in various ways. + +Often when we use the term function we are referring to +a construct containing a block of code which performs a particular task and can be reused. +We have already seen this in procedural programming - +so how are functions in functional programming different? +The key difference is that functional programming is focussed on +**what** transformations are done to the data, +rather than **how** these transformations are performed +(i.e. a detailed sequence of steps which update the state of the code to reach a desired state). +Let's compare and contrast examples of these two programming paradigms. + +## Functional vs Procedural Programming + +The following two code examples implement the calculation of a factorial +in procedural and functional styles, respectively. +Recall that the factorial of a number `n` (denoted by `n!`) is calculated as +the product of integer numbers from 1 to `n`. + +The first example provides a procedural style factorial function. + +~~~ +def factorial(n): + """Calculate the factorial of a given number. + + :param int n: The factorial to calculate + :return: The resultant factorial + """ + if n < 0: + raise ValueError('Only use non-negative integers.') + + factorial = 1 + for i in range(1, n + 1): # iterate from 1 to n + # save intermediate value to use in the next iteration + factorial = factorial * i + + return factorial +~~~ +{: .language-python} + +Functions in procedural programming are *procedures* that describe +a detailed list of instructions to tell the computer what to do step by step +and how to change the state of the program and advance towards the result. +They often use *iteration* to repeat a series of steps. +Functional programming, on the other hand, typically uses *recursion* - +an ability of a function to call/repeat itself until a particular condition is reached. 
+Let's see how it is used in the functional programming example below
+to achieve a similar effect to that of iteration in procedural programming.
+
+~~~
+# Functional style factorial function
+def factorial(n):
+    """Calculate the factorial of a given number.
+
+    :param int n: The factorial to calculate
+    :return: The resultant factorial
+    """
+    if n < 0:
+        raise ValueError('Only use non-negative integers.')
+
+    if n == 0 or n == 1:
+        return 1 # exit from recursion, prevents infinite loops
+    else:
+        return n * factorial(n-1) # recursive call to the same function
+~~~
+{: .language-python}
+
+***Note:** You may have noticed that both functions in the above code examples have the same signature
+(i.e. they take an integer number as input and return its factorial as output).
+You could easily swap these equivalent implementations
+without changing the way that the function is invoked.
+Remember, a single piece of software may well contain instances of multiple programming paradigms -
+including procedural, functional and object-oriented -
+it is up to you to decide which one to use and when to switch
+based on the problem at hand and your personal coding style.*
+
+Functional computations only rely on the values that are provided as inputs to a function
+and not on the state of the program that precedes the function call.
+They do not modify data that exists outside the current function, including the input data -
+this property is referred to as the *immutability of data*.
+This means that such functions do not create any *side effects*,
+i.e. do not perform any action that affects anything other than the value they return.
+For example: printing text,
+writing to a file,
+modifying the value of an input argument,
+or changing the value of a global variable.
+Functions without side effects
+that return the same data each time the same input arguments are provided
+are called *pure functions*.
+
+> ## Exercise: Pure Functions
+>
+> Which of these functions are pure?
+> If you're not sure, explain your reasoning to someone else; do they agree?
+>
+> ~~~
+> def add_one(x):
+>     return x + 1
+>
+> def say_hello(name):
+>     print('Hello', name)
+>
+> def append_item_1(a_list, item):
+>     a_list += [item]
+>     return a_list
+>
+> def append_item_2(a_list, item):
+>     result = a_list + [item]
+>     return result
+> ~~~
+> {: .language-python}
+>
+> > ## Solution
+> >
+> > 1. `add_one` is pure - it has no effects other than to return a value and this value will always be the same when given the same inputs
+> > 2. `say_hello` is not pure - printing text counts as a side effect, even though it is the clear purpose of the function
+> > 3. `append_item_1` is not pure - the argument `a_list` gets modified as a side effect - try this yourself to prove it
+> > 4. `append_item_2` is pure - the result is a new variable, so this time `a_list` does not get modified - again, try this yourself
+> {: .solution}
+{: .challenge}
+
+## Benefits of Functional Code
+
+There are a few benefits we get when working with pure functions:
+
+- Testability
+- Composability
+- Parallelisability
+
+**Testability** indicates how easy it is to test the function - usually meaning unit tests.
+It is much easier to test a function if we can be certain that
+a particular input will always produce the same output.
+If a function we are testing might have different results each time it runs
+(e.g. a function that generates random numbers drawn from a normal distribution),
+we need to come up with a new way to test it.
+Similarly, it can be more difficult to test a function with side effects
+as it is not always obvious what the side effects will be, or how to measure them.
+
+**Composability** refers to the ability to make a new function from a chain of other functions
+by piping the output of one as the input to the next.
+If a function does not have side effects or non-deterministic behaviour,
+then all of its behaviour is reflected in the value it returns.
+As a consequence of this, any chain of combined pure functions is itself pure, +so we keep all these benefits when we are combining functions into a larger program. +As an example of this, we could make a function called `add_two`, +using the `add_one` function we already have. + +~~~ +def add_two(x): + return add_one(add_one(x)) +~~~ +{: .language-python} + +**Parallelisability** is the ability for operations to be performed at the same time (independently). +If we know that a function is fully pure and we have got a lot of data, +we can often improve performance by +splitting data and distributing the computation across multiple processors. +The output of a pure function depends only on its input, +so we will get the right result regardless of when or where the code runs. + +> ## Everything in Moderation +> Despite the benefits that pure functions can bring, +> we should not be trying to use them everywhere. +> Any software we write needs to interact with the rest of the world somehow, +> which requires side effects. +> With pure functions you cannot read any input, write any output, +> or interact with the rest of the world in any way, +> so we cannot usually write useful software using just pure functions. +> Python programs or libraries written in functional style will usually not be +> as extreme as to completely avoid reading input, writing output, +> updating the state of internal local variables, etc.; +> instead, they will provide a functional-appearing interface +> but may use non-functional features internally. +> An example of this is the [Python Pandas library](https://pandas.pydata.org/) +> for data manipulation built on top of NumPy - +> most of its functions appear pure +> as they return new data objects instead of changing existing ones. +{: .callout} + +There are other advantageous properties that can be derived from the functional approach to coding. 
+In languages which support functional programming,
+a function is a *first-class object* like any other object -
+not only can you compose/chain functions together,
+but functions can also be passed as inputs to, and returned as results from, other functions
+(remember, in functional programming *code is data*).
+This is why functional programming is suitable for processing data efficiently -
+in particular in the world of Big Data, where code is much smaller than the data,
+so sending the code to where the data is located is cheaper and faster than the other way round.
+Let's see how we can do data processing using functional programming.
+
+## MapReduce Data Processing Approach
+
+When working with data you will often find that you need to
+apply a transformation to each datapoint of a dataset
+and then perform some aggregation across the whole dataset.
+One instance of this data processing approach is known as MapReduce
+and is applied when processing (but not limited to) Big Data,
+e.g. using tools such as [Spark](https://en.wikipedia.org/wiki/Apache_Spark)
+or [Hadoop](https://hadoop.apache.org/).
+The name MapReduce comes from applying an operation to (mapping) each value in a dataset,
+then performing a reduction operation which
+collects/aggregates all the individual results together to produce a single result.
+MapReduce relies heavily on composability and parallelisability of functional programming -
+both map and reduce can be done in parallel and on smaller subsets of data,
+before aggregating all intermediate results into the final result.
+
+### Mapping
+`map(f, C)` is a function that takes another function `f()` and a collection `C` of data items as inputs.
+Calling `map(f, C)` applies the function `f(x)` to every data item `x` in a collection `C`
+and returns the resulting values as a new collection of the same size.
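The behaviour described above can be sketched in plain Python (`my_map` below is our own simplified, eager illustration - the real built-in `map()` returns a lazy iterator rather than a list):

```python
def my_map(f, collection):
    """A simplified sketch of what map(f, C) does:
    apply f to every item in the collection and return
    the results as a new collection of the same size."""
    results = []
    for item in collection:
        results.append(f(item))
    return results

print(my_map(str.upper, ["mary", "isla", "sam"]))  # ['MARY', 'ISLA', 'SAM']
```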
+
+This is a simple mapping that takes a list of names and
+returns a list of the lengths of those names using the built-in function `len()`:
+
+~~~
+name_lengths = map(len, ["Mary", "Isla", "Sam"])
+print(list(name_lengths))
+~~~
+{: .language-python}
+~~~
+[4, 4, 3]
+~~~
+{: .output}
+
+This is a mapping that squares every number in the passed collection using an anonymous,
+inlined *lambda* expression (a simple one-line mathematical expression representing a function):
+
+~~~
+squares = map(lambda x: x * x, [0, 1, 2, 3, 4])
+print(list(squares))
+~~~
+{: .language-python}
+~~~
+[0, 1, 4, 9, 16]
+~~~
+{: .output}
+
+> ## Lambda
+> Lambda expressions are used to create anonymous functions that can be used to
+> write more compact programs by inlining function code.
+> A lambda expression takes any number of input parameters and
+> creates an anonymous function that returns the value of the expression.
+> So, we can use the short, one-line `lambda x, y, z, ...: expression` code
+> instead of defining and calling a named function `f()` as follows:
+> ~~~
+> def f(x, y, z, ...):
+>     return expression
+> ~~~
+> {: .language-python}
+> The major distinction between lambda functions and ‘normal’ functions is that
+> lambdas do not have names.
+> We could give a name to a lambda expression if we really wanted to -
+> but at that point we should be using a ‘normal’ Python function instead.
+>
+> ~~~
+> # Don't do this
+> add_one = lambda x: x + 1
+>
+> # Do this instead
+> def add_one(x):
+>     return x + 1
+> ~~~
+> {: .language-python}
+{: .callout}
+
+In addition to using built-in functions or inlining anonymous lambda functions,
+we can also pass a named function that we have defined ourselves to the `map()` function.
+
+~~~
+def add_one(num):
+    return num + 1
+
+result = map(add_one, [0, 1, 2])
+print(list(result))
+~~~
+{: .language-python}
+~~~
+[1, 2, 3]
+~~~
+{: .output}
+
+> ## Exercise: Check Inflammation Patient Data Against A Threshold Using Map
+> Write a new function called `daily_above_threshold()` in our inflammation `models.py` that
+> determines whether or not each daily inflammation value for a given patient
+> exceeds a given threshold.
+>
+> Given a patient row number in our data, the patient dataset itself, and a given threshold,
+> write the function to use `map()` to generate and return a list of booleans,
+> with each value representing whether or not the daily inflammation value for that patient
+> exceeded the given threshold.
+>
+> Ordinarily we would use NumPy's vectorised operations for this
+> (e.g. comparing a whole array against the threshold at once),
+> but for this exercise, let's try a solution without them.
+>
+> > ## Solution
+> > ~~~
+> > def daily_above_threshold(patient_num, data, threshold):
+> >     """Determine whether or not each daily inflammation value exceeds a given threshold for a given patient.
+> >
+> >     :param patient_num: The patient row number
+> >     :param data: A 2D data array with inflammation data
+> >     :param threshold: An inflammation threshold to check each daily value against
+> >     :returns: A boolean list representing whether or not each patient's daily inflammation exceeded the threshold
+> >     """
+> >
+> >     return list(map(lambda x: x > threshold, data[patient_num]))
+> > ~~~
+> > {: .language-python}
+> >
+> > ***Note:** the `map()` function returns a map iterator object
+> > which needs to be converted to a collection object
+> > (such as a list, dictionary, set, tuple)
+> > using the corresponding "factory" function (in our case `list()`).*
+> {: .solution}
+{: .challenge}
+
+#### Comprehensions for Mapping/Data Generation
+
+Another way you can generate new collections of data from existing collections in Python is
+using *comprehensions*,
+which are an elegant and concise way of creating data from
+[iterable objects](https://www.w3schools.com/python/python_iterators.asp) using *for loops*.
+While not a pure functional concept,
+comprehensions provide data generation functionality
+and can be used to achieve the same effect as the built-in "pure functional" function `map()`.
+They are commonly used and actually recommended as a replacement for `map()` in modern Python.
+Let's have a look at some examples.
+
+~~~
+integers = range(5)
+double_ints = [2 * i for i in integers]
+
+print(double_ints)
+~~~
+{: .language-python}
+~~~
+[0, 2, 4, 6, 8]
+~~~
+{: .output}
+
+The above example uses a *list comprehension* to double each number in a sequence.
+Notice the similarity between the syntax for a list comprehension and a for loop -
+in effect, this is a for loop compressed into a single line.
+In this simple case, the code above is equivalent to using a map operation on a sequence,
+as shown below:
+
+~~~
+integers = range(5)
+double_ints = map(lambda i: 2 * i, integers)
+print(list(double_ints))
+~~~
+{: .language-python}
+~~~
+[0, 2, 4, 6, 8]
+~~~
+{: .output}
+
+We can also use list comprehensions to filter data, by adding the filter condition to the end:
+
+~~~
+double_even_ints = [2 * i for i in integers if i % 2 == 0]
+print(double_even_ints)
+~~~
+{: .language-python}
+~~~
+[0, 4, 8]
+~~~
+{: .output}
+
+> ## Set and Dictionary Comprehensions and Generators
+> We also have *set comprehensions* and *dictionary comprehensions*,
+> which look similar to list comprehensions
+> but use the set literal and dictionary literal syntax, respectively.
+> ~~~
+> double_even_int_set = {2 * i for i in integers if i % 2 == 0}
+> print(double_even_int_set)
+>
+> double_even_int_dict = {i: 2 * i for i in integers if i % 2 == 0}
+> print(double_even_int_dict)
+> ~~~
+> {: .language-python}
+> ~~~
+> {0, 4, 8}
+> {0: 0, 2: 4, 4: 8}
+> ~~~
+> {: .output}
+>
+> Finally, there’s one last ‘comprehension’ in Python - a *generator expression* -
+> a type of iterable object which we can take values from and loop over,
+> but which does not actually compute any of the values until we need them.
+> Iterable is the generic term for anything we can loop or iterate over -
+> lists, sets and dictionaries are all iterables.
+>
+> The `range` function is an example of a generator -
+> if we created a `range(1000000000)`, but didn’t iterate over it,
+> we’d find that it takes almost no time to do.
+> Creating a list containing a similar number of values would take much longer,
+> and could be at risk of running out of memory.
+>
+> We can build our own generators using a generator expression.
+> These look much like the comprehensions above,
+> but act like a generator when we use them.
+> Note the syntax difference for generator expressions -
+> parentheses are used in place of square or curly brackets.
+>
+> ~~~
+> doubles_generator = (2 * i for i in integers)
+> for x in doubles_generator:
+>     print(x)
+> ~~~
+> {: .language-python}
+> ~~~
+> 0
+> 2
+> 4
+> 6
+> 8
+> ~~~
+> {: .output}
+{: .callout}
+
+
+Let's now have a look at reducing the elements of a data collection into a single result.
+
+### Reducing
+
+The `reduce(f, C, initialiser)` function accepts a function `f()`,
+a collection `C` of data items
+and an optional `initialiser`,
+and returns a single cumulative value which
+aggregates (reduces) all the values from the collection into a single result.
+The reduction function first applies the function `f()` to the first two values in the collection
+(or to the `initialiser`, if present, and the first item from `C`).
+Then for each remaining value in the collection,
+it takes the result of the previous computation
+and the next value from the collection as the new arguments to `f()`
+until we have processed all of the data and reduced it to a single value.
+For example, if collection `C` has 5 elements, the call `reduce(f, C)` calculates:
+
+~~~
+f(f(f(f(C[0], C[1]), C[2]), C[3]), C[4])
+~~~
+
+One example of reducing would be to calculate the product of a sequence of numbers.
+
+~~~
+from functools import reduce
+
+sequence = [1, 2, 3, 4]
+
+def product(a, b):
+    return a * b
+
+print(reduce(product, sequence))
+
+# The same reduction using a lambda function
+print(reduce((lambda a, b: a * b), sequence))
+~~~
+{: .language-python}
+~~~
+24
+24
+~~~
+{: .output}
+
+Note that `reduce()` is not a built-in function like `map()` -
+you need to import it from the `functools` library.
+
+> ## Exercise: Calculate the Sum of a Sequence of Numbers Using Reduce
+> Using `reduce()`, calculate the sum of a sequence of numbers.
+> Although in practice we would use the built-in `sum()` function for this - try doing it without it.
+> +> > ## Solution +> > ~~~ +> > from functools import reduce +> > +> > sequence = [1, 2, 3, 4] +> > +> > def add(a, b): +> > return a + b +> > +> > print(reduce(add, sequence)) +> > +> > # The same reduction using a lambda function +> > print(reduce((lambda a, b: a + b), sequence)) +> > ~~~ +> > {: .language-python} +> > ~~~ +> > 10 +> > 10 +> > ~~~ +> > {: .output} +> {: .solution} +{: .challenge} + +### Putting It All Together +Let's now put together what we have learned about map and reduce so far +by writing a function that calculates the sum of the squares of the values in a list +using the MapReduce approach. + +~~~ +from functools import reduce + +def sum_of_squares(sequence): + squares = [x * x for x in sequence] # use list comprehension for mapping + return reduce(lambda a, b: a + b, squares) +~~~ +{: .language-python} + +We should see the following behaviour when we use it: + +~~~ +print(sum_of_squares([0])) +print(sum_of_squares([1])) +print(sum_of_squares([1, 2, 3])) +print(sum_of_squares([-1])) +print(sum_of_squares([-1, -2, -3])) +~~~ +{: .language-python} +~~~ +0 +1 +14 +1 +14 +~~~ +{: .output} + +Now let’s assume we’re reading in these numbers from an input file, +so they arrive as a list of strings. +We'll modify the function so that it passes the following tests: + +~~~ +print(sum_of_squares(['1', '2', '3'])) +print(sum_of_squares(['-1', '-2', '-3'])) +~~~ +{: .language-python} +~~~ +14 +14 +~~~ +{: .output} + +The code may look like: + +~~~ +from functools import reduce + +def sum_of_squares(sequence): + integers = [int(x) for x in sequence] + squares = [x * x for x in integers] + return reduce(lambda a, b: a + b, squares) +~~~ +{: .language-python} + +Finally, like comments in Python, we’d like it to be possible for users to +comment out numbers in the input file they give to our program. 
+We'll finally extend our function so that the following tests pass:
+
+~~~
+print(sum_of_squares(['1', '2', '3']))
+print(sum_of_squares(['-1', '-2', '-3']))
+print(sum_of_squares(['1', '2', '#100', '3']))
+~~~
+{: .language-python}
+~~~
+14
+14
+14
+~~~
+{: .output}
+
+To do so, we may filter out certain elements and have:
+
+~~~
+from functools import reduce
+
+def sum_of_squares(sequence):
+    integers = [int(x) for x in sequence if x[0] != '#']
+    squares = [x * x for x in integers]
+    return reduce(lambda a, b: a + b, squares)
+~~~
+{: .language-python}
+
+> ## Exercise: Extend Inflammation Threshold Function Using Reduce
+> Extend the `daily_above_threshold()` function you wrote previously
+> to return a count of the number of days a patient's inflammation is over the threshold.
+> Use `reduce()` over the boolean array that was previously returned to generate the count,
+> then return that value from the function.
+>
+> You may choose to define a separate function to pass to `reduce()`,
+> or use an inline lambda expression to do it (which is a bit trickier!).
+>
+> Hints:
+> - Remember that you can define an `initialiser` value with `reduce()`
+>   to help you start the counter
+> - If defining a lambda expression,
+>   note that it can conditionally return different values using the syntax
+>   `value_if_true if condition else value_if_false` in the expression.
+>
+> > ## Solution
+> > Using a separate function:
+> > ~~~
+> > def daily_above_threshold(patient_num, data, threshold):
+> >     """Count how many days a given patient's inflammation exceeds a given threshold.
+> >
+> >     :param patient_num: The patient row number
+> >     :param data: A 2D data array with inflammation data
+> >     :param threshold: An inflammation threshold to check each daily value against
+> >     :returns: An integer representing the number of days a patient's inflammation is over a given threshold
+> >     """
+> >     def count_above_threshold(a, b):
+> >         if b:
+> >             return a + 1
+> >         else:
+> >             return a
+> >
+> >     # Use map to determine if each daily inflammation value exceeds a given threshold for a patient
+> >     above_threshold = map(lambda x: x > threshold, data[patient_num])
+> >     # Use reduce to count how many days inflammation was above the threshold for a patient
+> >     return reduce(count_above_threshold, above_threshold, 0)
+> > ~~~
+> > {: .language-python}
+> >
+> > Note that the `count_above_threshold` function used by `reduce()`
+> > was defined within the `daily_above_threshold()` function
+> > to limit its scope and clarify its purpose
+> > (i.e. it may only be useful as part of `daily_above_threshold()`
+> > hence being defined as an inner function).
+> >
+> > The equivalent code using a lambda expression may look like:
+> >
+> > ~~~
+> > from functools import reduce
+> >
+> > ...
+> >
+> > def daily_above_threshold(patient_num, data, threshold):
+> >     """Count how many days a given patient's inflammation exceeds a given threshold.
+> >
+> >     :param patient_num: The patient row number
+> >     :param data: A 2D data array with inflammation data
+> >     :param threshold: An inflammation threshold to check each daily value against
+> >     :returns: An integer representing the number of days a patient's inflammation is over a given threshold
+> >     """
+> >
+> >     above_threshold = map(lambda x: x > threshold, data[patient_num])
+> >     return reduce(lambda a, b: a + 1 if b else a, above_threshold, 0)
+> > ~~~
+> > {: .language-python}
+> > Where could this be useful?
+> > For example, you may want to define the success criteria for a trial as, say,
+> > 80% of patients not exhibiting inflammation on any of the trial days, or some similar metric.
+> {: .solution}
+{: .challenge}
+
+## Decorators
+
+Finally, we will look at one last aspect of Python where functional programming comes in handy.
+As we have seen in the
+[episode on parametrising our unit tests](../22-scaling-up-unit-testing/index.html#parameterising-our-unit-tests),
+a decorator can take a function, modify/decorate it, then return the resulting function.
+This is possible because Python treats functions as first-class objects
+that can be passed around as normal data.
+Here, we discuss decorators in more detail and learn how to write our own.
+Let's look at the following code for some ways to "decorate" functions.
+
+~~~
+def with_logging(func):
+    """A decorator which adds logging to a function."""
+    def inner(*args, **kwargs):
+        print("Before function call")
+        result = func(*args, **kwargs)
+        print("After function call")
+        return result
+
+    return inner
+
+
+def add_one(n):
+    print("Adding one")
+    return n + 1
+
+# Redefine function add_one by wrapping it within with_logging function
+add_one = with_logging(add_one)
+
+# Another way to redefine a function - using a decorator
+@with_logging
+def add_two(n):
+    print("Adding two")
+    return n + 2
+
+print(add_one(1))
+print(add_two(1))
+~~~
+{: .language-python}
+~~~
+Before function call
+Adding one
+After function call
+2
+Before function call
+Adding two
+After function call
+3
+~~~
+{: .output}
+
+In this example, we see a decorator (`with_logging`)
+and two different syntaxes for applying the decorator to a function.
+The decorator is implemented here as a function which encloses another function.
+Because the inner function (`inner()`) calls the function being decorated (`func()`)
+and returns its result,
+it still behaves like the original function.
+Part of this is the use of `*args` and `**kwargs` -
+these allow our decorated function to accept any arguments or keyword arguments
+and pass them directly to the function being decorated.
+Our decorator in this case does not need to modify any of the arguments,
+so we do not need to know what they are.
+Any additional behaviour we want our decorated function to have
+can be put before or after the call to the original function.
+Here we print some text both before and after the decorated function,
+to show the order in which events happen.
+
+We also see in this example the two different ways in which a decorator can be applied.
+The first of these is to use a normal function call (`with_logging(add_one)`),
+where we then assign the resulting function back to a variable -
+often using the original name of the function, so replacing it with the decorated version.
+The second syntax is the one we have seen previously (`@with_logging`).
+This syntax is equivalent to the previous one -
+the result is that we have a decorated version of the function,
+here with the name `add_two`.
+Both of these syntaxes can be useful in different situations:
+the `@` syntax is more concise if we never need to use the un-decorated version,
+while the function-call syntax gives us more flexibility -
+we can continue to use the un-decorated function
+if we make sure to give the decorated one a different name,
+and can even make multiple decorated versions using different decorators.
+
+> ## Exercise: Measuring Performance Using Decorators
+> One task for which you might find a decorator useful is
+> measuring the time taken to execute a particular function.
+> This is an important part of performance profiling.
+>
+> Write a decorator which you can use to measure the execution time of the decorated function
+> using the [time.process_time_ns()](https://docs.python.org/3/library/time.html#time.process_time_ns) function.
+> There are several different timing functions each with slightly different use-cases, +> but we won’t worry about that here. +> +> For the function to measure, you may wish to use this as an example: +> ~~~ +> def measure_me(n): +> total = 0 +> for i in range(n): +> total += i * i +> +> return total +> ~~~ +> {: .language-python} +> > ## Solution +> > +> > ~~~ +> > import time +> > +> > def profile(func): +> > def inner(*args, **kwargs): +> > start = time.process_time_ns() +> > result = func(*args, **kwargs) +> > stop = time.process_time_ns() +> > +> > print("Took {0} seconds".format((stop - start) / 1e9)) +> > return result +> > +> > return inner +> > +> > @profile +> > def measure_me(n): +> > total = 0 +> > for i in range(n): +> > total += i * i +> > +> > return total +> > +> > print(measure_me(1000000)) +> > ~~~ +> > {: .language-python} +> > ~~~ +> > Took 0.124199753 seconds +> > 333332833333500000 +> > ~~~ +> > {: .output} +> {: .solution} +{: .challenge} diff --git a/_extras/persistence.md b/_extras/persistence.md index b207e0458..47fe9cf43 100644 --- a/_extras/persistence.md +++ b/_extras/persistence.md @@ -1,5 +1,5 @@ --- -title: "Additional Material: Persistence" +title: "Persistence" layout: episode teaching: 25 exercises: 25 @@ -25,7 +25,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../35-refactoring-architecture/index.html#additional-material). +> [Section 3 on software design and development](../35-refactoring-architecture/index.html#conclusion). {: .callout} Our patient data system so far can read in some data, process it, and display it to people. 
diff --git a/_extras/protect-main-branch.md b/_extras/protect-main-branch.md index b358f726e..c9745fe86 100644 --- a/_extras/protect-main-branch.md +++ b/_extras/protect-main-branch.md @@ -1,5 +1,5 @@ --- -title: "Additional Material: Protecting the Main Branch on a Shared GitHub Repository" +title: "Protecting the Main Branch on a Shared GitHub Repository" --- ## Introduction diff --git a/_extras/software-architecture-paradigms.md b/_extras/software-architecture-paradigms.md new file mode 100644 index 000000000..7e8f99c2d --- /dev/null +++ b/_extras/software-architecture-paradigms.md @@ -0,0 +1,281 @@ +--- +title: "Software Architecture and Programming Paradigms" +teaching: 30 +exercises: 0 +layout: episode +questions: +- "What should we consider when designing software?" +objectives: +- "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software." +- "Understand the components of multi-layer software architectures." +- "Describe some of the major software paradigms we can use to classify programming languages." +keypoints: +- "A software paradigm describes a way of structuring or reasoning about code." +- "Different programming languages are suited to different paradigms." +- "Different paradigms are suited to solving different classes of problems." +- "A single piece of software will often contain instances of multiple paradigms." +--- + +## Introduction + +As a piece of software grows, +it will reach a point where there's too much code for us to keep in mind at once. +At this point, it becomes particularly important that the software be designed sensibly. +What should be the overall structure of our software, +how should all the pieces of functionality fit together, +and how should we work towards fulfilling this overall design throughout development? 
+ +It's not easy to come up with a complete definition for the term **software design**, +but some of the common aspects are: + +- **Algorithm design** - + what method are we going to use to solve the core business problem? +- **Software architecture** - + what components will the software have and how will they cooperate? +- **System architecture** - + what other things will this software have to interact with and how will it do this? +- **UI/UX** (User Interface / User Experience) - + how will users interact with the software? + +As usual, the sooner you adopt a practice in the lifecycle of your project, the easier it will be. +So we should think about the design of our software from the very beginning, +ideally even before we start writing code - +but if you didn't, it's never too late to start. + +The answers to these questions will provide us with some **design constraints** +which any software we write must satisfy. +For example, a design constraint when writing a mobile app would be +that it needs to work with a touch screen interface - +we might have some software that works really well from the command line, +but on a typical mobile phone there isn't a command line interface that people can access. + +## Software Architecture + +At the beginning of this episode we defined **software architecture** +as an answer to the question +"what components will the software have and how will they cooperate?". +Software engineering borrowed this term, and a few other terms, +from architects (of buildings) as many of the processes and techniques have some similarities. +One of the other important terms we borrowed is 'pattern', +such as in **design patterns** and **architecture patterns**. +This term is often attributed to the book +['A Pattern Language' by Christopher Alexander *et al.*](https://en.wikipedia.org/wiki/A_Pattern_Language) +published in 1977 +and refers to a template solution to a problem commonly encountered when building a system. 
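As a concrete taste of what a pattern looks like in code, here is a minimal sketch of one well-known design pattern, the adapter (all class and method names here are invented purely for illustration):

```python
# Hypothetical example: two data sources with incompatible interfaces.
class CsvSource:
    def read_rows(self):
        return [[1, 2], [3, 4]]

class JsonSource:
    # Different method name and return shape from CsvSource.
    def fetch(self):
        return {"rows": [[5, 6]]}

class JsonSourceAdapter:
    """Wrap JsonSource so it offers the read_rows() interface."""
    def __init__(self, source):
        self.source = source

    def read_rows(self):
        return self.source.fetch()["rows"]

# The rest of the program can now treat both sources uniformly.
sources = [CsvSource(), JsonSourceAdapter(JsonSource())]
all_rows = [row for source in sources for row in source.read_rows()]
print(all_rows)  # prints [[1, 2], [3, 4], [5, 6]]
```

The client code never needs to know which kind of source it is talking to - the adapter's only job is to translate one interface into another.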
+
+Design patterns are relatively small-scale templates
+which we can use to solve problems which affect a small part of our software.
+For example, the **[adapter pattern](https://en.wikipedia.org/wiki/Adapter_pattern)**
+(which allows a class that does not have the "right interface" to be reused)
+may be useful if part of our software needs to consume data
+from a number of different external data sources.
+Using this pattern,
+we can create a component whose responsibility is
+transforming the calls for data to the expected format,
+so the rest of our program doesn't have to worry about it.
+
+Architecture patterns are similar,
+but larger scale templates which operate at the level of whole programs,
+or collections of programs.
+Model-View-Controller (which we chose for our project) is one of the best known architecture patterns.
+Many patterns rely on concepts from Object Oriented Programming,
+so we'll come back to the MVC pattern shortly
+after we learn a bit more about Object Oriented Programming.
+
+There are many online sources of information about design and architecture patterns,
+often giving concrete examples of cases where they may be useful.
+One particularly good source is [Refactoring Guru](https://refactoring.guru/design-patterns).
+
+### Multilayer Architecture
+
+One common architectural pattern for larger software projects is **Multilayer Architecture**.
+Software designed using this architecture pattern is split into layers,
+each of which is responsible for a different part of the process of manipulating data.
+ +Often, the software is split into three layers: + +- **Presentation Layer** + - This layer is responsible for managing the interaction between + our software and the people using it + - May include the **View** components if also using the MVC pattern +- **Application Layer / Business Logic Layer** + - This layer performs most of the data processing required by the presentation layer + - Likely to include the **Controller** components if also using an MVC pattern + - May also include the **Model** components +- **Persistence Layer / Data Access Layer** + - This layer handles data storage and provides data to the rest of the system + - May include the **Model** components of an MVC pattern + if they're not in the application layer + +Although we've drawn similarities here between the layers of a system and the components of MVC, +they're actually solutions to different scales of problem. +In a small application, a multilayer architecture is unlikely to be necessary, +whereas in a very large application, +the MVC pattern may be used just within the presentation layer, +to handle getting data to and from the people using the software. + +## Programming Paradigms + +In addition to architectural decisions on bigger components of your code, it is important +to understand the wider landscape of programming paradigms and languages, +with each supporting at least one way to approach a problem and structure your code. +In many cases, particularly with modern languages, +a single language can allow many different structural approaches within your code. + +One way to categorise these structural approaches is into **paradigms**. +Each paradigm represents a slightly different way of thinking about and structuring our code +and each has certain strengths and weaknesses when used to solve particular types of problems. +Once your software begins to get more complex +it's common to use aspects of different paradigms to handle different subtasks. 
+Because of this, it's useful to know about the major paradigms,
+so you can recognise where it might be useful to switch.
+
+There are two major families that we can group the common programming paradigms into:
+**Imperative** and **Declarative**.
+An imperative program uses statements that change the program's state -
+it consists of commands for the computer to perform
+and focuses on describing **how** a program operates step by step.
+A declarative program expresses the logic of a computation
+to describe **what** should be accomplished
+rather than describing its control flow as a sequence of steps.
+
+We will look into three major paradigms
+from the imperative and declarative families that may be useful to you -
+**Procedural Programming**, **Functional Programming** and **Object-Oriented Programming**.
+Note, however, that most languages can be used with multiple paradigms,
+and it is common to see multiple paradigms within a single program -
+so this classification of programming languages based on the paradigm they use is not strict.
+
+### Procedural Programming
+
+Procedural Programming comes from a family of paradigms known as the Imperative Family.
+With paradigms in this family, we can think of our code as the instructions for processing data.
+
+Procedural Programming is probably the style you're most familiar with
+and the one we used up to this point,
+where we group code into
+*procedures performing a single task, with exactly one entry and one exit point*.
+In most modern languages we call these **functions**, instead of procedures -
+so if you're grouping your code into functions, this might be the paradigm you're using.
+By grouping code like this, we make it easier to reason about the overall structure,
+since we should be able to tell roughly what a function does just by looking at its name.
+These functions are also much easier to reuse than code outside of functions,
+since we can call them from any part of our program.
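For illustration, a minimal procedural sketch (this example is made up, not taken from our project code):

```python
def load_readings(text):
    """Parse a comma-separated string of numbers into a list of floats."""
    return [float(value) for value in text.split(",")]

def daily_mean(values):
    """Compute the mean of a list of numbers."""
    return sum(values) / len(values)

# The top-level code is a sequence of instructions executed from the top,
# each step delegating to a named procedure.
readings = load_readings("1.0,2.0,6.0")
print(daily_mean(readings))  # prints 3.0
```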
+ +So far we have been using this technique in our code - +it contains a list of instructions that execute one after the other starting from the top. +This is an appropriate choice for smaller scripts and software +that we're writing just for a single use. +Aside from smaller scripts, Procedural Programming is also commonly seen +in code focused on high performance, with relatively simple data structures, +such as in High Performance Computing (HPC). +These programs tend to be written in C (which doesn't support Object Oriented Programming) +or Fortran (which didn't until recently). +HPC code is also often written in C++, +but C++ code would more commonly follow an Object Oriented style, +though it may have procedural sections. + +Note that you may sometimes hear people refer to this paradigm as "functional programming" +to contrast it with Object Oriented Programming, +because it uses functions rather than objects, +but this is incorrect. +Functional Programming is a separate paradigm that +places much stronger constraints on the behaviour of a function +and structures the code differently as we'll see soon. + +### Functional Programming + +Functional Programming comes from a different family of paradigms - +known as the Declarative Family. +The Declarative Family is a distinct set of paradigms +which have a different outlook on what a program is - +here code describes *what* data processing should happen. +What we really care about here is the outcome - how this is achieved is less important. + +Functional Programming is built around +a more strict definition of the term **function** borrowed from mathematics. +A function in this context can be thought of as +a mapping that transforms its input data into output data. +Anything a function does other than produce an output is known as a **side effect** +and should be avoided wherever possible. 
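To make the distinction concrete, here is a small illustrative sketch (again made up, not project code) of a pure function next to an impure one:

```python
# Pure: the output depends only on the input, and nothing outside changes.
def add_one(numbers):
    return [n + 1 for n in numbers]

call_log = []

# Impure: mutating external state and printing are both side effects.
def add_one_logged(numbers):
    call_log.append(len(numbers))   # side effect: mutates a global list
    print("add_one_logged called")  # side effect: writes to the screen
    return [n + 1 for n in numbers]

print(add_one([1, 2, 3]))  # prints [2, 3, 4] - always, for this input
```

Calling `add_one` twice with the same argument always gives the same answer; calling `add_one_logged` changes `call_log` each time, so its behaviour depends on when and how often it has been called.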
+ +Being strict about this definition allows us to +break down the distinction between **code** and **data**, +for example by writing a function which accepts and transforms other functions - +in Functional Programming *code is data*. + +The most common application of Functional Programming in research is in data processing, +especially when handling **Big Data**. +One popular definition of Big Data is +data which is too large to fit in the memory of a single computer, +with a single dataset sometimes being multiple terabytes or larger. +With datasets like this, we can't move the data around easily, +so we often want to send our code to where the data is instead. +By writing our code in a functional style, +we also gain the ability to run many operations in parallel +as it's guaranteed that each operation won't interact with any of the others - +this is essential if we want to process this much data in a reasonable amount of time. + +You can read more in an [Extras episode on Functional Programming](/functional-programming/index.html). + +### Object Oriented Programming + +Object Oriented Programming focuses on the specific characteristics of each object +and what each object can do. +An object has two fundamental parts - properties (characteristics) and behaviours. +In Object Oriented Programming, +we first think about the data and the things that we're modelling - and represent these by objects. + +For example, if we're writing a simulation for our chemistry research, +we're probably going to need to represent atoms and molecules. +Each of these has a set of properties which we need to know about +in order for our code to perform the tasks we want - +in this case, for example, we often need to know the mass and electric charge of each atom. 
+So with Object Oriented Programming,
+we'll have some **object** structure which represents an atom and all of its properties,
+another structure to represent a molecule,
+and a relationship between the two (a molecule contains atoms).
+This structure also provides a way for us to associate code with an object,
+representing any **behaviours** it may have.
+In our chemistry example, this could be our code for calculating the force between a pair of atoms.
+
+Most people would classify Object Oriented Programming as an
+[extension of the Imperative family of languages](https://www.digitalocean.com/community/tutorials/functional-imperative-object-oriented-programming-comparison)
+(with the extra feature being the objects), but
+[others disagree](https://stackoverflow.com/questions/38527078/what-is-the-difference-between-imperative-and-object-oriented-programming).
+
+You can read more in an [Extras episode on Object Oriented Programming](/object-oriented-programming/index.html).
+
+> ## So Which one is Python?
+> Python is a multi-paradigm and multi-purpose programming language.
+> You can use it as a procedural language and you can use it in a more object oriented way.
+> It does tend to land more on the object oriented side as all its core data types
+> (strings, integers, floats, booleans, lists,
+> sets, arrays, tuples, dictionaries, files)
+> as well as functions, modules and classes are objects.
+>
+> Since functions in Python are also objects that can be passed around like any other object,
+> Python is also well suited to functional programming.
+> One of the most popular Python libraries for data manipulation,
+> [Pandas](https://pandas.pydata.org/) (built on top of NumPy),
+> supports a functional programming style
+> as most of its functions on data do not change the data (no side effects)
+> but produce new data to reflect the result of the function.
+{: .callout}
+
+## Other Paradigms
+
+The three paradigms introduced here are some of the most common,
+but there are many others which may be useful for addressing specific classes of problem -
+for much more information see Wikipedia's page on
+[programming paradigms](https://en.wikipedia.org/wiki/Programming_paradigm).
+
+We have mainly used Procedural Programming in this lesson, but you can
+have a closer look at [Functional](/functional-programming/index.html) and
+[Object Oriented Programming](/object-oriented-programming/index.html) paradigms
+in Extras episodes and how they can affect our architectural design choices.
+
+{% include links.md %}
diff --git a/_extras/vscode.md b/_extras/vscode.md
index 6796e7088..34b01b8a5 100644
--- a/_extras/vscode.md
+++ b/_extras/vscode.md
@@ -1,5 +1,5 @@
 ---
-title: "Additional Material: Using Microsoft Visual Studio Code"
+title: "Using Microsoft Visual Studio Code"
 ---
 
 [Visual Studio Code (VS Code)](https://code.visualstudio.com/), not to be confused with [Visual Studio](https://visualstudio.microsoft.com/),

From b62841647009f6f22dbf575645660aa036deda34 Mon Sep 17 00:00:00 2001
From: Aleksandra Nenadic
Date: Thu, 14 Dec 2023 14:27:47 +0000
Subject: [PATCH 083/105] Initial fix of config

---
 _config.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_config.yml b/_config.yml
index 434f152ab..96103624a 100644
--- a/_config.yml
+++ b/_config.yml
@@ -95,10 +95,10 @@ extras_order:
   - discuss
   - protect-main-branch
   - vscode
+  - software-architecture-paradigms
   - functional-programming
   - persistence
   - databases
-  - verifying-code-style-linters
   - quiz
 
 # Files and directories that are not to be copied.
exclude: From e1f4e558fa29aee63cdc81c6666ce7ccd32f6007 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Thu, 14 Dec 2023 19:11:10 +0000 Subject: [PATCH 084/105] Mainly review of episode on refactoring --- _episodes/30-section3-intro.md | 2 +- _episodes/31-software-requirements.md | 4 +- _episodes/32-software-design.md | 5 +- _episodes/33-refactoring-functions.md | 272 ----------------- _episodes/33-refactoring.md | 274 ++++++++++++++++++ ...ng-decoupled-units.md => 34-decoupling.md} | 87 +++--- ...tecture.md => 35-software-architecture.md} | 3 +- 7 files changed, 327 insertions(+), 320 deletions(-) delete mode 100644 _episodes/33-refactoring-functions.md create mode 100644 _episodes/33-refactoring.md rename _episodes/{34-refactoring-decoupled-units.md => 34-decoupling.md} (84%) rename _episodes/{35-refactoring-architecture.md => 35-software-architecture.md} (99%) diff --git a/_episodes/30-section3-intro.md b/_episodes/30-section3-intro.md index e969de22a..461d55f4c 100644 --- a/_episodes/30-section3-intro.md +++ b/_episodes/30-section3-intro.md @@ -2,7 +2,7 @@ title: "Section 3: Software Development as a Process" colour: "#fafac8" start: true -teaching: 5 +teaching: 10 exercises: 0 questions: - "How can we design and write 'good' software that meets its goals and requirements?" diff --git a/_episodes/31-software-requirements.md b/_episodes/31-software-requirements.md index 917726df2..87634a989 100644 --- a/_episodes/31-software-requirements.md +++ b/_episodes/31-software-requirements.md @@ -1,7 +1,7 @@ --- title: "Software Requirements" -teaching: 15 -exercises: 30 +teaching: 25 +exercises: 15 questions: - "Where do we start when beginning a new software project?" - "How can we capture and organise what is required for software to function as intended?" 
diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 7cf76c767..33c3822c2 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -74,8 +74,7 @@ goal of having *maintainable* code, which is: * using meaningful and descriptive names for variables, functions, and classes * documenting code to describe it does and how it may be used * using simple control flow to make it easier to follow the code execution - * keeping functions and methods small and focused on a single task and avoiding large functions - that do a little bit of everything (also important for testing) + * keeping functions and methods small and focused on a single task (also important for testing) * *testable* through a set of (preferably automated) tests, e.g. by: * writing unit, functional, regression tests to verify the code produces the expected outputs from controlled inputs and exhibits the expected behavior over time @@ -125,7 +124,7 @@ software project and try to identify ways in which it can be improved. > {: .solution} {: .challenge} -## Technical Debt +## Poor Design Choices & Technical Debt When faced with a problem that you need to solve by writing code - it may be tempted to skip the design phase and dive straight into coding. diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md deleted file mode 100644 index 42eae41f7..000000000 --- a/_episodes/33-refactoring-functions.md +++ /dev/null @@ -1,272 +0,0 @@ ---- -title: "Refactoring Functions to Do Just One Thing" -teaching: 30 -exercises: 20 -questions: -- "How do you refactor code without breaking it?" -- "How do you write code that is easy to test?" -- "What is functional programming?" -- "Which situations/problems is functional programming well suited for?" 
-objectives: -- "Understand how to refactor functions to be easier to test" -- "Be able to write regressions tests to avoid breaking existing code" -- "Understand what a pure function is." -keypoints: -- "By refactoring code into pure functions that act on data makes code easier to test." -- "Making tests before you refactor gives you confidence that your refactoring hasn't broken anything" -- "Functional programming is a programming paradigm where programs are constructed by applying and composing smaller and simple functions into more complex ones (which describe the flow of data within a program as a sequence of data transformations)." ---- - -## Introduction - -In this episode we will take some code and refactor it in a way which is going to make it -easier to test. -By having more tests, we can more confident of future changes having their intended effect. -The change we will make will also end up making the code easier to understand. - -## Writing tests before refactoring - -The process we are going to be following is: - -1. Write some tests that test the behaviour as it is now -2. Refactor the code to be more testable -3. Ensure that the original tests still pass - -By writing the tests *before* we refactor, we can be confident we haven't broken -existing behaviour through the refactoring. - -There is a bit of a chicken-and-the-egg problem here however. -If the refactoring is to make it easier to write tests, how can we write tests -before doing the refactoring? - -The tricks to get around this trap are: - - * Test at a higher level, with coarser accuracy - * Write tests that you intend to remove - -The best tests are ones that test single bits of code rigorously. -However, with this code it isn't possible to do that. - -Instead we will make minimal changes to the code to make it a bit testable, -for example returning the data instead of visualising it. 
- -We will make the asserts verify whatever the outcome is currently, -rather than worrying whether that is correct. -These tests are to verify the behaviour doesn't *change* rather than to check the current behaviour is correct. -This kind of testing is called **regression testing** as we are testing for -regressions in existing behaviour. - -As with everything in this episode, there isn't a hard and fast rule. -Refactoring doesn't change behaviour, but sometimes to make it possible to verify -you're not changing the important behaviour you have to make some small tweaks to write -the tests at all. - -> ## Exercise: Write regression tests before refactoring -> Add a new test file called `test_compute_data.py` in the tests folder. -> Add and complete this regression test to verify the current output of `analyse_data` -> is unchanged by the refactorings we are going to do: -> ```python -> def test_analyse_data(): -> from inflammation.compute_data import analyse_data -> path = Path.cwd() / "../data" -> result = analyse_data(path) -> -> # TODO: add an assert for the value of result -> ``` -> Use `assert_array_almost_equal` from the `numpy.testing` library to -> compare arrays of floating point numbers. -> -> You will need to modify `analyse_data` to not create a graph and instead -> return the data. -> ->> ## Hint ->> You might find it helpful to assert the results equal some made up array, observe the test failing ->> and copy and paste the correct result into the test. 
-> {: .solution} -> ->> ## Solution ->> One approach we can take is to: ->> * comment out the visualize (as this will cause our test to hang) ->> * return the data instead, so we can write asserts on the data ->> * See what the calculated value is, and assert that it is the same ->> Putting this together, you can write a test that looks something like: ->> ->> ```python ->> import numpy.testing as npt ->> from pathlib import Path ->> ->> def test_analyse_data(): ->> from inflammation.compute_data import analyse_data ->> path = Path.cwd() / "../data" ->> result = analyse_data(path) ->> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211, ->> 0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094, ->> 1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312, ->> 1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578, ->> 0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417, ->> 0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707, ->> 0.50323031,0.47574665,0.45197398,0.22070227] ->> npt.assert_array_almost_equal(result, expected_output) ->> ``` ->> ->> Note - this isn't a good test: ->> * It isn't at all obvious why these numbers are correct. ->> * It doesn't test edge cases. ->> * If the files change, the test will start failing. ->> ->> However, it allows us to guarantee we don't accidentally change the analysis output. -> {: .solution} -{: .challenge} - -## Pure functions - -A **pure function** is a function that works like a mathematical function. -That is, it takes in some inputs as parameters, and it produces an output. -That output should always be the same for the same input. -That is, it does not depend on any information not present in the inputs (such as global variables, databases, the time of day etc.) -Further, it should not cause any **side effects**, such as writing to a file or changing a global variable. 
- -You should try and have as much of the complex, analytical and mathematical code in pure functions. - -By eliminating dependency on external things such as global state, we -reduce the cognitive load to understand the function. -The reader only needs to concern themselves with the input -parameters of the function and the code itself, rather than -the overall context the function is operating in. - -Similarly, a function that *calls* a pure function is also easier -to understand. -Since the function won't have any side effects, the reader needs to -only understand what the function returns, which will probably -be clear from the context in which the function is called. - -This property also makes them easier to re-use as the caller -only needs to understand what parameters to provide, rather -than anything else that might need to be configured -or side effects for calling it at a time that is different -to when the original author intended. - -Some parts of a program are inevitably impure. -Programs need to read input from the user, or write to a database. -Well designed programs separate complex logic from the necessary impure "glue" code that interacts with users and systems. -This way, you have easy-to-test, easy-to-read code that contains the complex logic. -And you have really simple code that just reads data from a file, or gathers user input etc, -that is maybe harder to test, but is so simple that it only needs a handful of tests anyway. - -> ## Exercise: Refactor the function into a pure function -> Refactor the `analyse_data` function into a pure function with the logic, and an impure function that handles the input and output. 
-> The pure function should take in the data, and return the analysis results: -> ```python -> def compute_standard_deviation_by_day(data): -> # TODO -> return daily_standard_deviation -> ``` -> The "glue" function should maintain the behaviour of the original `analyse_data` -> but delegate all the calculations to the new pure function. ->> ## Solution ->> You can move all of the code that does the analysis into a separate function that ->> might look something like this: ->> ```python ->> def compute_standard_deviation_by_day(data): ->> means_by_day = map(models.daily_mean, data) ->> means_by_day_matrix = np.stack(list(means_by_day)) ->> ->> daily_standard_deviation = np.std(means_by_day_matrix, axis=0) ->> return daily_standard_deviation ->> ``` ->> Then the glue function can use this function, whilst keeping all the logic ->> for reading the file and processing the data for showing in a graph: ->>```python ->>def analyse_data(data_dir): ->> """Calculate the standard deviation by day between datasets ->> Gets all the inflammation csvs within a directory, works out the mean ->> inflammation value for each day across all datasets, then graphs the ->> standard deviation of these means.""" ->> data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) ->> if len(data_file_paths) == 0: ->> raise ValueError(f"No inflammation csv's found in path {data_dir}") ->> data = map(models.load_csv, data_file_paths) ->> daily_standard_deviation = compute_standard_deviation_by_day(data) ->> ->> graph_data = { ->> 'standard deviation by day': daily_standard_deviation, ->> } ->> # views.visualize(graph_data) ->> return daily_standard_deviation ->>``` ->> Ensure you re-run our regression test to check this refactoring has not ->> changed the output of `analyse_data`. 
-> {: .solution} -{: .challenge} - -### Testing Pure Functions - -Now we have a pure function for the analysis, we can write tests that cover -all the things we would like tests to cover without depending on the data -existing in CSVs. - -This is another advantage of pure functions - they are very well suited to automated testing. - -They are **easier to write** - -we construct input and assert the output -without having to think about making sure the global state is correct before or after. - -Perhaps more important, they are **easier to read** - -the reader will not have to open up a CSV file to understand why the test is correct. - -It will also make the tests **easier to maintain**. -If at some point the data format is changed from CSV to JSON, the bulk of the tests -won't need to be updated. - -> ## Exercise: Write some tests for the pure function -> Now we have refactored our a pure function, we can more easily write comprehensive tests. -> Add tests that check for when there is only one file with multiple rows, multiple files with one row -> and any other cases you can think of that should be tested. 
->> ## Solution ->> You might have thought of more tests, but we can easily extend the test by parametrizing ->> with more inputs and expected outputs: ->> ```python ->>@pytest.mark.parametrize('data,expected_output', [ ->> ([[[0, 1, 0], [0, 2, 0]]], [0, 0, 0]), ->> ([[[0, 2, 0]], [[0, 1, 0]]], [0, math.sqrt(0.25), 0]), ->> ([[[0, 1, 0], [0, 2, 0]], [[0, 1, 0], [0, 2, 0]]], [0, 0, 0]) ->>], ->>ids=['Two patients in same file', 'Two patients in different files', 'Two identical patients in two different files']) ->>def test_compute_standard_deviation_by_day(data, expected_output): ->> from inflammation.compute_data import compute_standard_deviation_by_data ->> ->> result = compute_standard_deviation_by_data(data) ->> npt.assert_array_almost_equal(result, expected_output) -``` -> {: .solution} -{: .challenge} - -## Functional Programming - -**Pure Functions** are a concept that is part of the idea of **Functional Programming**. -Functional programming is a style of programming that encourages using pure functions, -chained together. -Some programming languages, such as Haskell or Lisp just support writing functional code, -but it is more common for languages to allow using functional and **imperative** (the style -of code you have probably been writing thus far where you instruct the computer directly what to do). -Python, Java, C++ and many other languages allow for mixing these two styles. - -In Python, you can use the built-in functions `map`, `filter` and `reduce` to chain -pure functions together into pipelines. - -In the original code, we used `map` to "map" the file paths into the loaded data. -Extending this idea, you could then "map" the results of that through another process. - -You can read more about using these language features [here](https://www.learnpython.org/en/Map%2C_Filter%2C_Reduce). 
-Other programming languages will have similar features, and searching "functional style" + your programming language of choice -will help you find the features available. - -There are no hard and fast rules in software design but making your complex logic out of composed pure functions is a great place to start -when trying to make code readable, testable and maintainable. -This tends to be possible when: - -* Doing any kind of data analysis -* Simulations -* Translating data from one format to another - -{% include links.md %} diff --git a/_episodes/33-refactoring.md b/_episodes/33-refactoring.md new file mode 100644 index 000000000..0251b6ad3 --- /dev/null +++ b/_episodes/33-refactoring.md @@ -0,0 +1,274 @@ +--- +title: "Refactoring Code" +teaching: 30 +exercises: 20 +questions: +- "How do you refactor code without breaking it?" +- "What are benefits of pure functions?" +objectives: +- "Understand the use of regressions tests to avoid breaking existing code when refactoring." +- "Understand the use of pure functions in software design to make the code easier to test." +keypoints: +- "Implementing regression tests before you refactor the code gives you confidence that your changes have not +broken anything." +- "By refactoring code into pure functions that process data without side effects makes code easier +to read, test and maintain." +--- + +## Introduction + +In this episode we will refactor the function `analyse_data()` in `compute_data.py` +from our project in the following two ways: +* add more tests so we can be more confident that future changes will have the +intended effect and will not break the existing code. +* split the `analyse_data()` function into a number of smaller (functions) making the code +easier to understand and test. + +## Writing Tests Before Refactoring + +When refactoring, it is useful to apply the following process: + +1. Write some tests that test the behaviour as it is now +2. Refactor the code +3. 
Check that the original tests still pass
+
+By writing the tests *before* we refactor, we can be confident we have not broken
+existing behaviour through refactoring.
+
+There is a bit of a "chicken and egg" problem here - if the refactoring is supposed to make it easier
+to write tests in the future, how can we write tests before doing the refactoring?
+The tricks to get around this trap are:
+
+ * Test at a higher level, with coarser accuracy
+ * Write tests that you intend to remove
+
+The best tests are ones that test single bits of functionality rigorously.
+However, with our current `analyse_data()` code that is not possible because it is a
+large function doing a little bit of everything.
+Instead we will make minimal changes to the code to make it a bit more testable.
+
+Firstly,
+we will modify the function to return the data instead of visualising it because graphs are harder
+to test automatically (i.e. they need to be viewed and inspected manually in order to determine
+their correctness).
+Next, we will make the assert statements verify what the outcome is
+currently, rather than checking whether that is correct or not.
+Such tests are meant to
+verify that the behaviour does not *change* rather than checking the current behaviour is correct
+(there should be another set of tests checking the correctness).
+This kind of testing is called **regression testing** as we are testing for
+regressions in existing behaviour.
+
+Refactoring code is not meant to change its behaviour, but sometimes, in order to verify
+that you are not changing the important behaviour, you have to make small tweaks to the code
+so that the tests can be written at all.
+
+> ## Exercise: Write Regression Tests
+> Modify the `analyse_data()` function so that it does not plot a graph and returns the data instead.
+> Then, add a new test file called `test_compute_data.py` in the `tests` folder and
+> add a regression test to verify the current output of `analyse_data()`. 
We will use this test
+> in the remainder of this section to verify the output of `analyse_data()` is unchanged each time
+> we refactor or change code in the future.
+>
+> Start from the skeleton test code below:
+>
+> ```python
+> from pathlib import Path
+>
+> def test_analyse_data():
+>     from inflammation.compute_data import analyse_data
+>     path = Path.cwd() / "../data"
+>     result = analyse_data(path)
+>
+>     # TODO: add an assert for the value of result
+> ```
+> Use `assert_array_almost_equal` from the `numpy.testing` library to
+> compare arrays of floating point numbers.
+>
+>> ## Hint
+>> When determining the correct return data result to use in tests, it may be helpful to assert the
+>> result equals some random made-up data, observe the test fail initially and then
+>> copy and paste the correct result into the test.
+> {: .solution}
+>
+>> ## Solution
+>> One approach we can take is to:
+>> * comment out the call that visualises the data in `analyse_data()`
+>> (as this will cause our test to hang while the graph window is displayed)
+>> * return the data instead, so we can write asserts on the data
+>> * see what the calculated value is, and assert that it is the same as the expected value
+>>
+>> Putting this together, your test may look like:
+>>
+>> ```python
+>> import numpy.testing as npt
+>> from pathlib import Path
+>>
+>> def test_analyse_data():
+>>     from inflammation.compute_data import analyse_data
+>>     path = Path.cwd() / "../data"
+>>     result = analyse_data(path)
+>>     expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211,
+>>                        0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094,
+>>                        1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312,
+>>                        1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578,
+>>                        0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417,
+>>                        0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707,
+>>                        0.50323031,0.47574665,0.45197398,0.22070227]
+>>     npt.assert_array_almost_equal(result, expected_output)
+>> ```
+>>
+>> Note that while the above test will detect if we accidentally break the analysis code and
+>> change the output of the analysis, it is not a good or complete test for the following reasons:
+>> * It is not at all obvious why the `expected_output` is correct
+>> * It does not test edge cases
+>> * If the data files in the directory change - the test will fail
+>>
+>> We would need additional tests to check the above.
+> {: .solution}
{: .challenge}

## Separating Pure and Impure Code

Now that we have our regression test for `analyse_data()` in place, we are ready to refactor the
function further.
We would like to separate out as much of its code as possible as **pure functions**.
Pure functions are very useful and much easier to test as they take input only from their input
parameters and output only via their return values.

### Pure Functions

A pure function in programming works like a mathematical function -
it takes in some input and produces an output and that output is
always the same for the same input.
That is, the output of a pure function does not depend on any information
which is not present in the input (such as global variables).
Furthermore, pure functions do not cause any *side effects* - they do not modify the input data
or data that exist outside the function (such as printing text, writing to a file or
changing a global variable). They perform actions that affect nothing but the value they return.

### Benefits of Pure Functions

Pure functions are easier to understand because they eliminate side effects.
The reader only needs to concern themselves with the input
parameters of the function and the function code itself, rather than
the overall context the function is operating in.
Similarly, a function that calls a pure function is also easier
to understand - we only need to understand what the function returns, which will probably
be clear from the context in which the function is called.
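The contrast can be made concrete with a small standalone sketch (illustrative only - the function names here are made up and are not part of the project code):

```python
# Pure: the result depends only on the argument; nothing outside is read or changed.
def daily_total(readings):
    return sum(readings)

# Impure: reads a module-level variable and modifies its argument in place.
conversion_factor = 10

def rescale_in_place(readings):
    for i in range(len(readings)):
        readings[i] *= conversion_factor  # depends on hidden global state

values = [1, 2, 3]
print(daily_total(values))   # 6 - the same for every caller, every time
rescale_in_place(values)
print(values)                # [10, 20, 30] - the caller's data has been changed
```

To test `daily_total()` we only need an input list and an expected number; to test `rescale_in_place()` we must also set up `conversion_factor` and inspect the argument afterwards.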
+Finally, pure functions are easier to reuse as the caller
+only needs to understand what parameters to provide, rather
+than anything else that might need to be configured prior to the call.
+For these reasons, you should try to implement as much of the complex, analytical and
+mathematical code as pure functions.
+
+
+Some parts of a program are inevitably impure.
+Programs need to read input from users, generate a graph, or write results to a file or a database.
+Well designed programs separate complex logic from the necessary impure "glue" code that
+interacts with users and other systems.
+This way, you have easy-to-read and easy-to-test pure code that contains the complex logic
+and simplified impure code that reads data from a file or gathers user input. Impure code may
+be harder to test but, when simplified like this, may only require a handful of tests anyway.
+
+> ## Exercise: Refactoring To Use a Pure Function
+> Refactor the `analyse_data()` function to delegate the data analysis to a new
+> pure function `compute_standard_deviation_by_data()` and separate it
+> from the impure code that handles the input and output.
+> The pure function should take in the data, and return the analysis result, as follows:
+> ```python
+> def compute_standard_deviation_by_data(data):
+>     # TODO
+>     return daily_standard_deviation
+> ```
+>> ## Solution
+>> The analysis code will be refactored into a separate function that may look something like:
+>> ```python
+>> def compute_standard_deviation_by_data(data):
+>>     means_by_day = map(models.daily_mean, data)
+>>     means_by_day_matrix = np.stack(list(means_by_day))
+>>
+>>     daily_standard_deviation = np.std(means_by_day_matrix, axis=0)
+>>     return daily_standard_deviation
+>> ```
+>> The `analyse_data()` function now calls the `compute_standard_deviation_by_data()` function,
+>> while keeping all the logic for reading the data, processing it and showing it in a graph:
+>> ```python
+>> def analyse_data(data_dir):
+>>     """Calculate the standard deviation by day between datasets
+>>     Gets all the inflammation csvs within a directory, works out the mean
+>>     inflammation value for each day across all datasets, then graphs the
+>>     standard deviation of these means."""
+>>     data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv'))
+>>     if len(data_file_paths) == 0:
+>>         raise ValueError(f"No inflammation csv's found in path {data_dir}")
+>>     data = map(models.load_csv, data_file_paths)
+>>     daily_standard_deviation = compute_standard_deviation_by_data(data)
+>>
+>>     graph_data = {
+>>         'standard deviation by day': daily_standard_deviation,
+>>     }
+>>     # views.visualize(graph_data)
+>>     return daily_standard_deviation
+>> ```
+>> Make sure to re-run the regression test to check this refactoring has not
+>> changed the output of `analyse_data()`.
+> {: .solution}
+{: .challenge}
+
+### Testing Pure Functions
+
+Now we have our analysis implemented as a pure function, we can write tests that cover
+all the things we would like to check without depending on CSV files.
+This is another advantage of pure functions - they are very well suited to automated testing,
+i.e. 
their tests are:
+* **easier to write** - we construct input and assert the output
+without having to think about making sure the global state is correct before or after
+* **easier to read** - the reader will not have to open a CSV file to understand why
+the test is correct
+* **easier to maintain** - if at some point the data format changes
+from CSV to JSON, the bulk of the tests need not be updated
+
+> ## Exercise: Testing a Pure Function
+> Add tests for `compute_standard_deviation_by_data()` that check for situations
+> when there is only one file with multiple rows,
+> multiple files with one row, and any other cases you can think of that should be tested.
+>> ## Solution
+>> You might have thought of more tests, but we can easily extend the test by parametrizing
+>> with more inputs and expected outputs:
+>> ```python
+>> @pytest.mark.parametrize('data,expected_output', [
+>>     ([[[0, 1, 0], [0, 2, 0]]], [0, 0, 0]),
+>>     ([[[0, 2, 0]], [[0, 1, 0]]], [0, math.sqrt(0.25), 0]),
+>>     ([[[0, 1, 0], [0, 2, 0]], [[0, 1, 0], [0, 2, 0]]], [0, 0, 0])
+>> ],
+>> ids=['Two patients in same file', 'Two patients in different files', 'Two identical patients in two different files'])
+>> def test_compute_standard_deviation_by_day(data, expected_output):
+>>     from inflammation.compute_data import compute_standard_deviation_by_data
+>>
+>>     result = compute_standard_deviation_by_data(data)
+>>     npt.assert_array_almost_equal(result, expected_output)
+>> ```
> {: .solution}
{: .challenge}

> ## Functional Programming
> **Functional programming** is a programming paradigm where programs are constructed by
> applying and composing/chaining pure functions.
> Some programming languages, such as Haskell or Lisp, support writing pure functional code only.
> Other languages, such as Python, Java, C++, allow mixing **functional** and **procedural**
> programming paradigms.
+> Read more about functional programming, and when it can be very useful to switch to this
+> paradigm (e.g. to employ the MapReduce approach for data processing), in the
+> [extra episode on functional programming](/functional-programming/index.html).
{: .callout}
+
+
+There are no definite rules in software design but making your complex logic out of
+composed pure functions is a great place to start when trying to make your code readable,
+testable and maintainable. This is particularly useful for:
+
+* Data processing and analysis
+(for example, using [Python Pandas library](https://pandas.pydata.org/) for data manipulation where most of the functions appear pure)
+* Doing simulations
+* Translating data from one format to another
+
+{% include links.md %}
diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-decoupling.md
similarity index 84%
rename from _episodes/34-refactoring-decoupled-units.md
rename to _episodes/34-decoupling.md
index a9e82d9a9..02ab7044a 100644
--- a/_episodes/34-refactoring-decoupled-units.md
+++ b/_episodes/34-decoupling.md
@@ -1,5 +1,5 @@
 ---
-title: "Using Classes to De-Couple Code"
+title: "Decoupling Code"
 teaching: 30
 exercises: 45
 questions:
@@ -19,27 +19,33 @@ keypoints:
 
 ## Introduction
 
-When we're thinking about units of code, one important thing to consider is
-whether the code is **decoupled** (as opposed to **coupled**).
-Two units of code can be considered decoupled if changes in one don't
-necessitate changes in the other.
-While two connected units can't be totally decoupled, loose coupling
-allows for more maintainable code:
+In software design, an important aspect is the extent to which its components and smaller units
+are **coupled**.
+Two units of code can be considered **decoupled** if a change in one does not
+necessitate a change in the other.
+While two connected units cannot always be totally decoupled, **loose coupling**
+is something we should aim for. 
Benefits of decoupled code include: -* Loosely coupled code is easier to read as you don't need to understand the +* easier to read as you do not need to understand the detail of the other unit. -* Loosely coupled code is easier to test, as one of the units can be replaced - by a test or mock version of it. -* Loose coupled code tends to be easier to maintain, as changes can be isolated +* easier to test, as one of the units can be replaced + by a test or a mock version of it. +* code tends to be easier to maintain, as changes can be isolated from other parts of the code. -Introducing **abstractions** is a way to decouple code. +## Abstractions + +We have already mentioned abstractions as a principle that simplifies complexity by +hiding details and focusing on high-level view and efficiency. + + +Abstractions are a way of decoupling code. If one part of the code only uses another part through an appropriate abstraction then it becomes easier for these parts to change independently. -> ## Exercise: Decouple the file loading from the computation -> Currently the function is hard coded to load all the files in a directory. -> Decouple this into a separate function that returns all the files to load +> ## Exercise: Decouple Data Loading from Analysis +> Loading data from CSV files in a directory is baked into the `analyse_data()` function. +> Decouple this into a separate function that returns all the files to load. >> ## Solution >> You should have written a new function that reads all the data into the format needed >> for the analysis: @@ -56,45 +62,44 @@ then it becomes easier for these parts to change independently. >> def analyse_data(data_dir): >> data = load_inflammation_data(data_dir) >> daily_standard_deviation = compute_standard_deviation_by_data(data) ->> ... +>> ... 
>> ```
->> This is now easier to understand, as we don't need to understand the the file loading
->> to read the statistical analysis, and we don't have to understand the statistical analysis
->> when reading the data loading.
->> Ensure you re-run our regression test to check this refactoring has not
->> changed the output of `analyse_data`.
+>> The code is now easier to follow since we do not need to understand the data loading from
+>> files to read the statistical analysis, and vice versa - we do not have to understand the
+>> statistical analysis when looking at data loading.
+>> Ensure you re-run the regression tests to check this refactoring has not
+>> changed the output of `analyse_data()`.
> {: .solution}
{: .challenge}

-Even with this change, the file loading is coupled with the data analysis.
-For example, if we wave to support reading JSON files or CSV files
-we would have to pass into `analyse_data` some kind of flag indicating what we want.
-
-Instead, we would like to decouple the consideration of what data to load
-from the `analyse_data`` function entirely.
+However, even with this change, the data loading is still coupled with the data analysis.
+For example, if we have to support loading data from different sources
+(e.g. JSON files and CSV files), we would have to pass some kind of a flag indicating
+what we want into `analyse_data()`. Instead, we would like to decouple the
+consideration of what data to load from the `analyse_data()` function entirely.

-One way we can do this is to use a language feature called a **class**.
+One way we can do this is to use an object-oriented language feature called a *class*.

-## Using Python Classes
+## Classes

-A class is a way of grouping together data with some specific methods on that data.
+A class is a way of grouping together data with some specific methods on that data. 
+In Python, you can **declare** a class as follows:

```python
class Circle:
  pass
```

-They are typically named using `UpperCase`.
+They are typically named using "CapitalisedWords" naming convention.

-You can then **construct** a class elsewhere in your code by doing the following:
+You can then **construct** a class **instance** elsewhere in your code by doing the following:

```python
my_circle = Circle()
```

-When you construct a class in this ways, the classes **construtor** is called.
-It is possible to pass in values to the constructor that configure the class:
+When you construct a class in this way, the class' **constructor** is called.
+It is also possible to pass in values to the constructor to configure the class instance:

```python
class Circle:
@@ -104,15 +109,15 @@ class Circle:

my_circle = Circle(10)
```

-The constructor has the special name `__init__` (one of the so called "dunder methods").
-Notice it also has a special first parameter called `self` (called this by convention).
+The constructor has the special name `__init__`.
+Notice it has a special first parameter called `self` by convention.
This parameter can be used to access the current **instance** of the object being created.

A class can be thought of as a cookie cutter template,
and the instances are the cookies themselves.
That is, one class can have many instances.

-Classes can also have methods defined on them.
+Classes can also have other methods defined on them.
Like constructors, they have an special `self` parameter that must come first.

```python
import math

class Circle:
@@ -130,11 +135,11 @@ class Circle:

print(my_circle.get_area())
```

Here the instance of the class, `my_circle` will be automatically
passed in as the first parameter when calling `get_area`.
Then the method can access the **member variable** `radius`. 
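The cookie cutter idea above can be shown with a short, self-contained variation of the `Circle` example (a sketch for illustration, runnable on its own):

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def get_area(self):
        return math.pi * self.radius * self.radius

# One class (the "cookie cutter"), many independent instances (the "cookies"),
# each holding its own value of the member variable `radius`:
small = Circle(1)
large = Circle(10)
print(small.get_area())  # 3.141592653589793
print(large.get_area())  # 314.1592653589793
```

Changing `small.radius` has no effect on `large` - each instance carries its own state.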
-> ## Exercise: Use a class to configure loading
+> ## Exercise: Use Classes to Abstract out Data Loading
 > Put the `load_inflammation_data` function we wrote in the last exercise as a member method
 > of a new class called `CSVDataSource`.
 > Put the configuration of where to load the files in the classes constructor.
-> Once this is done, you can construct this class outside the the statistical analysis
+> Once this is done, you can construct this class outside the statistical analysis
 > and pass the instance in to `analyse_data`.
 >> ## Hint
 >> When we have completed the refactoring, the code in the `analyse_data` function
@@ -172,7 +177,7 @@ Then the method can access the **member variable** `radius`.
 >> We can now pass an instance of this class into the the statistical analysis function.
 >> This means that should we want to re-use the analysis it wouldn't be fixed to reading
 >> from a directory of CSVs.
->> We have "decoupled" the reading of the data from the statistical analysis.
+>> We have fully decoupled the reading of the data from the statistical analysis.
 >> ```python
 >> def analyse_data(data_source):
 >>     data = data_source.load_inflammation_data()
diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-software-architecture.md
similarity index 99%
rename from _episodes/35-refactoring-architecture.md
rename to _episodes/35-software-architecture.md
index a00390828..3fad1388d 100644
--- a/_episodes/35-refactoring-architecture.md
+++ b/_episodes/35-software-architecture.md
@@ -1,5 +1,5 @@
 ---
-title: "Architecting Code to Separate Responsibilities"
+title: "Software Architecture"
 teaching: 15
 exercises: 50
 questions:
@@ -18,6 +18,7 @@ keypoints:
 
 ## Introduction
 
+### Separating Responsibilities
+
 Model-View-Controller (MVC) is a way of separating
 out different responsibilities of a typical application. 
Specifically we have:

From 83ff771b7890caf1ee97c6760539a1e277f30b65 Mon Sep 17 00:00:00 2001
From: Aleksandra Nenadic
Date: Fri, 15 Dec 2023 10:42:38 +0000
Subject: [PATCH 085/105] Reworded the refactoring process a bit

---
 _episodes/32-software-design.md |  6 +++---
 _episodes/33-refactoring.md     | 12 ++++--------
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md
index 33c3822c2..b239dc64e 100644
--- a/_episodes/32-software-design.md
+++ b/_episodes/32-software-design.md
@@ -180,10 +180,10 @@ When faced with an existing piece of code that needs modifying a good refactorin
 process to follow is:
 
 1. Make sure you have tests that verify the current behaviour
-2. Refactor the code in such a way that the behaviour of the code is identical to that
-before refactoring
+2. Refactor the code
+3. Verify that the behaviour of the code is identical to that before refactoring.
 
-Another technique to use when improving code are *abstractions*.
+Another useful technique to use when improving code is *abstraction*.
 
 ### Abstractions
 
diff --git a/_episodes/33-refactoring.md b/_episodes/33-refactoring.md
index 0251b6ad3..b0f0c7bec 100644
--- a/_episodes/33-refactoring.md
+++ b/_episodes/33-refactoring.md
@@ -26,14 +26,10 @@ easier to understand and test.
 
 ## Writing Tests Before Refactoring
 
-When refactoring, it is useful to apply the following process:
-
-1. Write some tests that test the behaviour as it is now
-2. Refactor the code
-3. Check that the original tests still pass
-
-By writing the tests *before* we refactor, we can be confident we have not broken
-existing behaviour through refactoring.
+When refactoring, remember we should first make sure there are tests that verify
+the code behaviour as it is now (or write them if they are missing),
+then refactor the code and, finally, check that the original tests still pass. 
+This is to make sure we do not break the existing behaviour through refactoring.
 
 There is a bit of a "chicken and egg" problem here - if the refactoring is supposed to make it easier
 to write tests in the future, how can we write tests before doing the refactoring?

From 8af5e71f2b29f22d8a2e4fe44fa737c22e147847 Mon Sep 17 00:00:00 2001
From: Aleksandra Nenadic
Date: Tue, 27 Feb 2024 22:17:39 +0000
Subject: [PATCH 086/105] More review of Thomas' work

---
 _episodes/32-software-design.md |  41 ++--
 _episodes/33-refactoring.md     |   2 +-
 _episodes/34-decoupling.md      | 354 +++++++++++++++++---------------
 3 files changed, 212 insertions(+), 185 deletions(-)

diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md
index b239dc64e..7a4f3cc2e 100644
--- a/_episodes/32-software-design.md
+++ b/_episodes/32-software-design.md
@@ -19,8 +19,8 @@ the easier the development and maintenance process will."
 
 ## Introduction
 
-Ideally, we should have at least a rough design sketched out for our software before we write a
-single line of code.
+Ideally, we should have at least a rough design of our software sketched out
+before we write a single line of code.
 This design should be based around the requirements and the structure of the problem we are trying
 to solve: what are the concepts we need to represent and what are the relationships between them.
 And importantly, who will be using our software and how will they interact with it.
@@ -183,32 +183,39 @@ process to follow is:
 2. Refactor the code
 3. Verify that the behaviour of the code is identical to that before refactoring.
 
-Another useful technique to use when improving code is *abstraction*.
+### Code Decoupling
 
-### Abstractions
+*Code decoupling* is another technique for improving the code by breaking a (complex)
+software system into smaller, more manageable parts, and reducing the interdependence
+This means that a change in one part of the code usually does not require a change in the other, +thereby making its development more efficient. + + +### Code Abstraction *Abstraction* is the process of hiding the implementation details of a piece of -code behind an interface - i.e. the details of *how* something works are hidden away, -leaving us to deal only with *what* it does. +code (typically behind an interface) - i.e. the details of *how* something works are hidden away, +leaving code developers to deal only with *what* it does. This allows developers to work with the code at a higher level -of abstraction, without needing to understand the underlying details. -Abstraction is used to simplify complex systems by breaking them down into smaller, -more manageable parts. +of abstraction, without needing to understand fully (or keep in mind) all the underlying +details at any given time and thereby reducing the cognitive load when programming. -Abstraction can be -achieved through techniques like *encapsulation*, *inheritance*, and *polymorphism*, which we will -cover in the next episodes. +Abstraction can be achieved through techniques such as *encapsulation*, *inheritance*, and +*polymorphism*, which we will explore in the next episodes. There are other [abstraction techniques](https://en.wikipedia.org/wiki/Abstraction_(computer_science)) +available too. ## Improving Our Software Design -Both refactoring and abstraction are important for creating maintainable code. -Refactoring helps to keep the codebase clean and easy to understand, while abstraction allows -developers to work with the code in a more abstract and modular way. +Refactoring our code to make it more decoupled and to introduce abstractions to +hide all but the relevant information about parts of the code is important for creating more +maintainable code. +It will help to keep our codebase clean, modular and easier to understand. Writing good code is hard and takes practise. 
You may also be faced with an existing piece of code that breaks some (or all) of the good code principles, and your job will be to improve it so that the code can evolve further. -In the rest of this section, we will use the refactoring and abstraction techniques to -help us redesign our code to incrementally improve its quality. +We will now look into some examples of the techniques that can help us redesign our code +and incrementally improve its quality. {% include links.md %} diff --git a/_episodes/33-refactoring.md b/_episodes/33-refactoring.md index b0f0c7bec..28e920390 100644 --- a/_episodes/33-refactoring.md +++ b/_episodes/33-refactoring.md @@ -1,5 +1,5 @@ --- -title: "Refactoring Code" +title: "Code Refactoring" teaching: 30 exercises: 20 questions: diff --git a/_episodes/34-decoupling.md b/_episodes/34-decoupling.md index 02ab7044a..d6f31720b 100644 --- a/_episodes/34-decoupling.md +++ b/_episodes/34-decoupling.md @@ -1,5 +1,5 @@ --- -title: "Decoupling Code" +title: "Code Decoupling & Abstractions" teaching: 30 exercises: 45 questions: @@ -19,36 +19,45 @@ keypoints: ## Introduction -In software design, an important aspect is the extent its components and smaller units -as **coupled**. -Two units of code can be considered **decoupled** if a change in one does not +Decoupling means breaking the system into smaller components and reducing the interdependence +between these components, so that they can be tested and maintained independently. +Two components of code can be considered **decoupled** if a change in one does not necessitate a change in the other. While two connected units cannot always be totally decoupled, **loose coupling** is something we should aim for. Benefits of decoupled code include: * easier to read as you do not need to understand the - detail of the other unit. -* easier to test, as one of the units can be replaced + details of the other component. 
+* easier to test, as one of the components can be replaced by a test or a mock version of it. * code tends to be easier to maintain, as changes can be isolated from other parts of the code. -## Abstractions +*Abstraction* is the process of hiding the implementation details of a piece of +code behind an interface - i.e. the details of *how* something works are hidden away, +leaving us to deal only with *what* it does. +This allows developers to work with the code at a higher level +of abstraction, without needing to understand fully (or keep in mind) all the underlying +details and thereby reducing the cognitive load when programming. -We have already mentioned abstractions as a principle that simplifies complexity by -hiding details and focusing on high-level view and efficiency. - - -Abstractions are a way of decoupling code. +Abstractions can aid decoupling of code. If one part of the code only uses another part through an appropriate abstraction then it becomes easier for these parts to change independently. -> ## Exercise: Decouple Data Loading from Analysis -> Loading data from CSV files in a directory is baked into the `analyse_data()` function. -> Decouple this into a separate function that returns all the files to load. +Let's start redesigning our code by introducing some of the decoupling and abstraction techniques +to incrementally improve its design. + +You may have noticed that loading data from CSV files in a directory is "baked" into +(i.e. is part of) the `analyse_data()` function. +This is not strictly a functionality of the data analysis function, so let's decouple the date +loading this into a separate function. + +> ## Exercise: Decouple Data Loading from Data Analysis +> Separate out the data loading functionality from `analyse_data()` into a new function +> `load_inflammation_data()` that returns all the files to load. 
>> ## Solution ->> You should have written a new function that reads all the data into the format needed ->> for the analysis: +>> The new function `load_inflammation_data()` that reads all the data into the format needed +>> for the analysis should look something like: >> ```python >> def load_inflammation_data(dir_path): >> data_file_paths = glob.glob(os.path.join(dir_path, 'inflammation*.csv')) @@ -57,7 +66,7 @@ then it becomes easier for these parts to change independently. >> data = map(models.load_csv, data_file_paths) >> return list(data) >> ``` ->> This can then be used in the analysis. +>> This function can now be used in the analysis as follows: >> ```python >> def analyse_data(data_dir): >> data = load_inflammation_data(data_dir) @@ -77,29 +86,43 @@ For example, if we have to support loading data from different sources (e.g. JSON files and CSV files), we would have to pass some kind of a flag indicating what we want into `analyse_data()`. Instead, we would like to decouple the consideration of what data to load from the `analyse_data()` function entirely. +One way we can do this is by using *encapsulation* and *classes*. + +## Encapsulation & Classes -One way we can do this is to use an object-oriented language feature called a *class*. +*Encapsulation* is the packing of "data" and "functions operating on that data" into a +single component/object. +It is also provides a mechanism for restricting the access to that data. +Encapsulation means that the internal representation of a component is generally hidden +from view outside of the component's definition. -## Classes +Encapsulation allows developers to present a consistent interface to an object/component +that is independent of its internal implementation. +For example, encapsulation can be used to hide the values or +state of a structured data object inside a **class**, preventing direct access to them +that could violate the object's state maintained by the class' methods. 
+Note that object-oriented programming (OOP) languages support encapsulation, +but encapsulation is not unique to OOP. -A class is a way of grouping together data with some specific methods on that data. -In Python, you can **declare** a class as follows: +So, a class is a way of grouping together data with some methods that manipulate that data. +In Python, you can *declare* a class as follows: ```python class Circle: pass ``` -They are typically named using "CapitalisedWords" naming convention. +Classes are typically named using "CapitalisedWords" naming convention - e.g. FileReader, +OutputStream, Rectangle. -You can then **construct** a class **instance** elsewhere in your code by doing the following: +You can *construct* an *instance* of a class elsewhere in the code by doing the following: ```python my_circle = Circle() ``` -When you construct a class in this ways, the class' **constructor** is called. -It is also possible to pass in values to the constructor to configure the class instance: +When you construct a class in this ways, the class' *constructor* method is called. +It is also possible to pass values to the constructor in order to configure the class instance: ```python class Circle: @@ -110,15 +133,14 @@ my_circle = Circle(10) ``` The constructor has the special name `__init__`. -Notice it has a special first parameter called `self` by convention. -This parameter can be used to access the current **instance** of the object being created. +Note it has a special first parameter called `self` by convention - it is +used to access the current *instance* of the object being created. -A class can be thought of as a cookie cutter template, -and the instances are the cookies themselves. +A class can be thought of as a cookie cutter template, and instances as the cookies themselves. That is, one class can have many instances. Classes can also have other methods defined on them. -Like constructors, they have an special `self` parameter that must come first. 
+Like constructors, they have the special parameter `self` that must come first.

```python
import math

@@ -131,19 +153,28 @@ class Circle:
print(my_circle.get_area())
```

-Here the instance of the class, `my_circle` will be automatically
-passed in as the first parameter when calling `get_area`.
-Then the method can access the **member variable** `radius`.
+On the last line of the code above, the instance of the class, `my_circle`, will be automatically
+passed as the first parameter (`self`) when calling the `get_area()` method.
+The `get_area()` method can then access the variable `radius` encapsulated within the object, which
+is otherwise invisible to the world outside of the object.
+The method `get_area()` itself can also only be accessed via the object/instance.
+
+As we can see, the internal representation of any instance of the class `Circle` is hidden
+from the world outside of this class (encapsulation).
+In addition, the implementation of the method `get_area()` is hidden too (abstraction).
+
+> ## Encapsulation & Abstraction
+> Encapsulation provides **information hiding**. Abstraction provides **implementation hiding**.
+{: .callout}

> ## Exercise: Use Classes to Abstract out Data Loading
-> Put the `load_inflammation_data` function we wrote in the last exercise as a member method
-> of a new class called `CSVDataSource`.
-> Put the configuration of where to load the files in the classes constructor.
-> Once this is done, you can construct this class outside the statistical analysis
-> and pass the instance in to `analyse_data`.
+> Declare a new class `CSVDataSource` that contains the `load_inflammation_data` function
+> we wrote in the previous exercise as a method of this class.
+> The directory path to load the files from should be passed to the class' constructor method.
+> Finally, construct an instance of the class `CSVDataSource` outside the statistical
+> analysis and pass it to the `analyse_data()` function.
>> ## Hint
->> When we have completed the refactoring, the code in the `analyse_data` function
->> should look like:
+>> At the end of this exercise, the code in the `analyse_data()` function should look like:
>> ```python
>> def analyse_data(data_source):
>>     data = data_source.load_inflammation_data()
@@ -157,12 +188,12 @@ Then the method can access the **member variable** `radius`.
>> ```
> {: .solution}
>> ## Solution
->> You should have created a class that looks something like this:
+>> For example, we can declare the class `CSVDataSource` like this:
>>
>> ```python
>> class CSVDataSource:
>>     """
->>     Loads all the inflammation csvs within a specified folder.
+>>     Loads all the inflammation CSV files within a specified directory.
>>     """
>>     def __init__(self, dir_path):
>>         self.dir_path = dir_path
@@ -170,29 +201,33 @@ Then the method can access the **member variable** `radius`.
>>     def load_inflammation_data(self):
>>         data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.csv'))
>>         if len(data_file_paths) == 0:
->>             raise ValueError(f"No inflammation csv's found in path {self.dir_path}")
+>>             raise ValueError(f"No inflammation CSV files found in path {self.dir_path}")
>>         data = map(models.load_csv, data_file_paths)
>>         return list(data)
>> ```
->> We can now pass an instance of this class into the the statistical analysis function.
->> This means that should we want to re-use the analysis it wouldn't be fixed to reading
->> from a directory of CSVs.
->> We have fully decoupled the reading of the data from the statistical analysis.
+>> In the controller, we create an instance of `CSVDataSource` and pass it
+>> into the statistical analysis function.
+>>
+>> ```python
+>> data_source = CSVDataSource(os.path.dirname(InFiles[0]))
+>> analyse_data(data_source)
+>> ```
+>> The `analyse_data()` function is modified to receive any data source object (that implements
+>> the `load_inflammation_data()` method) as a parameter.
>> ```python >> def analyse_data(data_source): >> data = data_source.load_inflammation_data() >> daily_standard_deviation = compute_standard_deviation_by_data(data) >> ... >> ``` ->> ->> In the controller, you might have something like: ->> ->> ```python ->> data_source = CSVDataSource(os.path.dirname(InFiles[0])) ->> analyse_data(data_source) ->> ``` ->> While the behaviour is unchanged, how we call `analyse_data` has changed. ->> We must update our regression test to match this, to ensure we haven't broken the code: +>> We have now fully decoupled the reading of the data from the statistical analysis and +>> the analysis is not fixed to reading from a directory of CSV files. Indeed, we can pass various +>> data sources to this function now, as long as they implement the `load_inflammation_data()` +>> method. +>> +>> While the overall behaviour of the code and its results are unchanged, +>> the way we invoke data analysis has changed. +>> We must update our regression test to match this, to ensure we have not broken anything: >> ```python >> ... >> def test_compute_data(): @@ -206,42 +241,50 @@ Then the method can access the **member variable** `radius`. > {: .solution} {: .challenge} + ## Interfaces -Another important concept in software design is the idea of **interfaces** between different units in the code. -One kind of interface you might have come across are APIs (Application Programming Interfaces). -These allow separate systems to communicate with each other - such as a making an API request -to Google Maps to find the latitude and longitude of an address. +An interface is another important concept in software design related to abstraction and +encapsulation. For a software component, it declares the operations that can be invoked on +that component, along with input arguments and what it returns. By knowing these details, +we can communicate with this component without the need to know how it implements this interface. 
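To make this idea concrete, here is a small sketch in Python showing that an interface can be as informal as "any object providing this method". The `ListDataSource` class and `count_datasets` function are invented for this illustration and are not part of the project code:

```python
class ListDataSource:
    """Hypothetical data source that serves data already held in memory."""
    def __init__(self, data):
        self.data = data

    def load_inflammation_data(self):
        return self.data


def count_datasets(data_source):
    # This function depends only on the interface: any object with a
    # load_inflammation_data() method can be passed in.
    return len(data_source.load_inflammation_data())


source = ListDataSource([[1, 2], [3, 4]])
print(count_datasets(source))  # 2
```

Nothing about `count_datasets` ties it to this particular class; the interface is the whole of the agreement between the two units.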
+
+API (Application Programming Interface) is one example of an interface that allows separate
+systems (external to one another) to communicate with each other.
+For example, a request to the Google Maps service API may get
+you the latitude and longitude for a given address.
+The Twitter API may return all tweets that contain
+a given keyword that have been posted within a certain date range.

-However, there are internal interfaces within our software that dictate how
+Internal interfaces within software dictate how
different parts of the system interact with each other.
-Even if these aren't thought out or documented, they still exist!
+Even when these are not explicitly documented or thought out, they still exist.

-For example, our `Circle` class implicitly has an interface:
-you can call `get_area` on it and it will return a number representing its area.
+For example, our `Circle` class implicitly has an interface - you can call the `get_area()` method
+on it and it will return a number representing its surface area.

-> ## Exercise: Identify the interface between `CSVDataSource` and `analyse_data`
-> What is the interface that CSVDataSource has with `analyse_data`.
-> Think about what functions `analyse_data` needs to be able to call,
-> what parameters they need and what it will return.
+> ## Exercise: Identify an Interface Between `CSVDataSource` and `analyse_data`
+> What is the interface between the `CSVDataSource` class and the `analyse_data()` function?
+> Think about what functions `analyse_data()` needs to be able to call to perform its duty,
+> what parameters they need and what they return.
>> ## Solution
->> The interface is the `load_inflammation_data` method.
->>
->> It takes no parameters.
->>
->> It returns a list where each entry is a 2D array of patient inflammation results by day
->> Any object we pass into `analyse_data` must conform to this interface.
+>> The interface is the `load_inflammation_data()` method, which takes no parameters and
+>> returns a list where each entry is a 2D array of patient inflammation data (read from some
+>> data source).
+>>
+>> Any object passed into `analyse_data()` should conform to this interface.
> {: .solution}
{: .challenge}

+
## Polymorphism

-It is possible to design multiple classes that each conform to the same interface.
+In OOP, it is possible to have different object classes that conform to the same interface.

-For example, we could provide a `Rectangle` class:
+For example, let's have a look at the `Rectangle` class:

```python
-class Rectangle(Shape):
+class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height
@@ -249,18 +292,15 @@ class Rectangle(Shape):
        return self.width * self.height
```

-Like `Circle`, this class provides a `get_area` method.
+Like `Circle`, this class provides a `get_area()` method.
The method takes the same number of parameters (none), and returns a number.
-However, the implementation is different.
+However, the implementation is different. This is one type of *polymorphism*.

-When classes share an interface, then we can use an instance of a class without
-knowing what specific class is being used.
-When we do this, it is called **polymorphism**.
+The word "polymorphism" means "many forms", and in programming it refers to
+methods/functions/operators with the same name that can be executed on many objects or classes.

-Here is an example where we create a list of shapes (either Circles or Rectangles)
-and can then find the total area.
-Note how we call `get_area` and Python is able to call the appropriate `get_area`
-for each of the shapes.
+Using our `Circle` and `Rectangle` classes, we can create a list of different shapes and iterate
+through the list to find their total surface area as follows:

```python
my_circle = Circle(radius=10)
@@ -269,24 +309,22 @@ my_shapes = [my_circle, my_rectangle]
total_area = sum(shape.get_area() for shape in my_shapes)
```

-This is an example of **abstraction** - when we are calculating the total
-area, the method for calculating the area of each shape is abstracted away
-to the relevant class.
-
-### How polymorphism is useful
-
-As we saw with the `Circle` and `Square` examples, we can use common interfaces and polymorphism
-to abstract away the details of the implementation from the caller.
-
-For example, we could replace our `CSVDataSource` with a class that reads a totally different format,
-or reads from an external service.
-All of these can be added in without changing the analysis.
-Further - if we want to write a new analysis, we can support any of these data sources
-for free with no further work.
-That is, we have decoupled the job of loading the data from the job of analysing the data.
-
-> ## Exercise: Introduce an alternative implementation of DataSource
-> Create another class that supports loading JSON instead of CSV.
+Note that we have not created a common superclass or linked the classes `Circle` and `Rectangle`
+together in any way. This is possible due to polymorphism.
+You could also say that, when we are calculating the total surface area,
+the method for calculating the area of each shape is abstracted away to the relevant class.
+
+How can polymorphism be useful in our software project?
+For example, we can replace our `CSVDataSource` with another class that reads a totally
+different file format (e.g. JSON instead of CSV), or reads from an external service or database.
+All of these changes can now be made without changing the analysis function as we have decoupled
+the process of data loading from the data analysis earlier.
+Conversely, if we wanted to write a new analysis function, we could support any of these
+data sources with no extra work.
+
+> ## Exercise: Add an Additional DataSource
+> Create another class that supports loading patient data from JSON files, with the
+> appropriate `load_inflammation_data()` method.
> There is a function in `models.py` that loads from JSON in the following format:
> ```json
> [
>   {
@@ -298,14 +336,13 @@ That is, we have decoupled the job of loading the data from the job of analysing
>   }
> ]
> ```
-> It should implement the `load_inflammation_data` method.
> Finally, at run time construct an appropriate instance based on the file extension.
>> ## Solution
->> You should have created a class that looks something like:
+>> The new class could look something like:
>> ```python
>> class JSONDataSource:
>>     """
->>     Loads all the inflammation JSON's within a specified folder.
+>>     Loads patient data with inflammation values from JSON files within a specified folder.
>>     """
>>     def __init__(self, dir_path):
>>         self.dir_path = dir_path
@@ -329,23 +366,22 @@ That is, we have decoupled the job of loading the data from the job of analysing
>>         raise ValueError(f'Unsupported file format: {extension}')
>> analyse_data(data_source)
>>```
->> As you have seen, all these changes were made without modifying
+>> As you can see, all the above changes have been made without modifying
>> the analysis code itself.
> {: .solution}
{: .challenge}

-## Testing using Mock Objects
+## Testing Using Mock Objects

We can use this abstraction to also make testing more straightforward.
Instead of having our tests use real file system data,
we can instead provide a mock or dummy implementation instead of one of the real classes.
-Providing what we substitute conforms to the same interface, the code we are testing will work
-just the same.
-This dummy implementation could just returns some fixed example data.
-
+Providing that what we use as a substitute conforms to the same interface,
+the code we are testing should work just the same.
+Such a mock/dummy implementation could just return some fixed example data.

A convenient way to do this in Python is using Python's
[mock object library](https://docs.python.org/3/library/unittest.mock.html).
-These are a whole topic to themselves -
+This is a whole topic in itself -
but a basic mock can be constructed using a couple of lines of code:

```python
from unittest.mock import Mock

mock_version = Mock()
mock_version.method_to_mock.return_value = 42
```

-Here we construct a mock in the same way you'd construct a class.
+Here we construct a mock in the same way you would construct a class.
Then we specify a method that we want to behave a specific way.

Now whenever you call `mock_version.method_to_mock()` the return value will be `42`.

-> ## Exercise: Test using a mock or dummy implementation
-> Complete this test for analyse_data, using a mock object in place of the
+> ## Exercise: Test Using a Mock Implementation
+> Complete this test for `analyse_data()`, using a mock object in place of the
> `data_source`:
> ```python
> from unittest.mock import Mock
@@ -377,11 +413,11 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be `
>
> # TODO: add assert on the contents of result
> ```
-> Create a mock for to provide as the `data_source` that returns some fixed data to test
+> Create a mock that returns some fixed data to use as the `data_source` in order to test
> the `analyse_data` method.
> Use this mock in a test.
>
-> Don't forget you will need to import `Mock` from the `unittest.mock` package.
+> Do not forget to import `Mock` from the `unittest.mock` package.
>> ## Solution >> ```python >> from unittest.mock import Mock @@ -398,51 +434,35 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be ` > {: .solution} {: .challenge} -## Object Oriented Programming - -Using classes, particularly when using polymorphism, are techniques that come from -**object oriented programming** (frequently abbreviated to OOP). -As with functional programming different programming languages will provide features to enable you -to write object oriented code. -For example, in Python you can create classes, and use polymorphism to call the -correct method on an instance (e.g when we called `get_area` on a shape, the appropriate `get_area` was called). - -Object oriented programming also includes **information hiding**. -In this, certain fields might be marked private to a class, -preventing them from being modified at will. - -This can be used to maintain invariants of a class (such as insisting that a circles radius is always non-negative). - -There is also inheritance, which allows classes to specialise -the behaviour of other classes by **inheriting** from -another class and **overriding** certain methods. - -As with functional programming, there are times when -object oriented programming is well suited, and times where it is not. - -Good uses: - - * Representing real world objects with invariants - * Providing alternative implementations such as we did with DataSource - * Representing something that has a state that will change over the programs lifetime (such as elements of a GUI) - -One downside of OOP is ending up with very large classes that contain complex methods. -As they are methods on the class, it can be hard to know up front what side effects it causes to the class. -This can make maintenance hard. - -> ## Classes and functional programming -> Using classes is compatible with functional programming. 
-> In fact, grouping data into logical structures (such as three numbers into a vector)
-> is a vital step in writing readable and maintainable code with any approach.
-> However, when writing in a functional style, classes should be immutable.
-> That is, the methods they provide are read-only.
-> If you require the class to be different, you'd create a new instance
-> with the new values.
-> (that is, the functions should not modify the state of the class).
+## Programming Paradigms
+
+Until now, we have mainly written procedural code.
+In this episode, we have touched a bit upon classes, encapsulation and polymorphism,
+which are characteristics of (but not limited to) Object Oriented Programming (OOP).
+These different paradigms provide varied approaches to solving a problem and structuring
+your code - each with certain strengths and weaknesses when used to solve particular types of
+problems.
+In many cases, particularly with modern languages, a single language can allow many different
+structural approaches within your code.
+Once your software begins to get more complex, it is common to use aspects of different paradigms
+to handle different subtasks.
+Because of this, it is useful to know about the major paradigms,
+so you can recognise where it might be useful to switch.
+This is outside the scope of this course, so we will point you to some further reading.
+
+> ## So Which is Python?
+> Python is a multi-paradigm and multi-purpose programming language.
+> You can use it as a procedural language and you can use it in a more object oriented way.
+> It does tend to land more on the object oriented side as all its core data types
+> (strings, integers, floats, booleans, lists,
+> sets, arrays, tuples, dictionaries, files)
+> as well as functions, modules and classes are objects.
+>
+> Since functions in Python are also objects that can be passed around like any other object,
+> Python is also well suited to functional programming.
+One of the most popular Python libraries for data manipulation,
+[Pandas](https://pandas.pydata.org/) (built on top of NumPy),
+supports a functional programming style
+as most of its functions on data are not changing the data (no side effects)
+but producing new data to reflect the result of the function.
{: .callout}
-
-
-Don't use features for the sake of using features.
-Code should be as simple as it can be, but not any simpler.
-If you know your function only makes sense to operate on circles, then
-don't accept shapes just to use polymorphism!

From 5e135e09801f3cb64fb044b6b7f650de0157904a Mon Sep 17 00:00:00 2001
From: Aleksandra Nenadic
Date: Wed, 28 Feb 2024 09:27:24 +0000
Subject: [PATCH 087/105] More review of section 3

---
 _episodes/30-section3-intro.md        |  2 +-
 _episodes/31-software-requirements.md | 14 ++++++------
 _episodes/32-software-design.md       | 31 ++++++++++++++-------------
 3 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/_episodes/30-section3-intro.md b/_episodes/30-section3-intro.md
index 461d55f4c..5835ddc5d 100644
--- a/_episodes/30-section3-intro.md
+++ b/_episodes/30-section3-intro.md
@@ -79,7 +79,7 @@ The typical stages of a software development process can be categorised as follo
   This helps maintain a clear direction throughout development,
   and sets clear targets for what the software needs to do.
- **Design:** where the requirements are translated into an overall design for the software.
-  It covers what will be the basic software 'components' and how they'll fit together,
+  It covers what will be the basic software 'components' and how they will fit together,
  as well as the tools and technologies that will be used,
  which will together address the requirements identified in the first stage.
- **Implementation:** the software is developed according to the design, diff --git a/_episodes/31-software-requirements.md b/_episodes/31-software-requirements.md index 87634a989..9faf0ed08 100644 --- a/_episodes/31-software-requirements.md +++ b/_episodes/31-software-requirements.md @@ -223,7 +223,7 @@ and these aspects should be considered as part of the software's non-functional > ## Optional Exercise: Requirements for Your Software Project > -> Think back to a piece of code or software (either small or large) you've written, +> Think back to a piece of code or software (either small or large) you have written, > or which you have experience using. > First, try to formulate a few of its key business requirements, > then derive these into user and then solution requirements. @@ -232,7 +232,7 @@ and these aspects should be considered as part of the software's non-functional ### Long- or Short-Lived Code? -Along with requirements, here's something to consider early on. +Along with requirements, here is something to consider early on. You, perhaps with others, may be developing open-source software with the intent that it will live on after your project completes. It could be important to you that your software is adopted and used by other projects @@ -248,10 +248,10 @@ so be sure to consider these aspects. On the other hand, you might want to knock together some code to prove a concept or to perform a quick calculation and then just discard it. -But can you be sure you'll never want to use it again? -Maybe a few months from now you'll realise you need it after all, +But can you be sure you will never want to use it again? +Maybe a few months from now you will realise you need it after all, or you'll have a colleague say "I wish I had a..." -and realise you've already made one. +and realise you have already made one. A little effort now could save you a lot in the future. 
## From Requirements to Implementation, via Design

@@ -268,12 +268,12 @@ At each level, not only are the perspectives different,
but so are the nature of the objectives
and the language used to describe them,
since they each reflect the perspective and language of their stakeholder group.

-It's often tempting to go right ahead and implement requirements within existing software,
+It is often tempting to go right ahead and implement requirements within existing software,
but this neglects a crucial step:
do these new requirements fit within our existing design,
or does our design need to be revisited?
It may not need any changes at all,
-but if it doesn't fit logically our design will need a bigger rethink
+but if it does not fit logically our design will need a bigger rethink
so the new requirement can be implemented in a sensible way.
We'll look at this a bit later in this section,
but simply adding new code without considering
diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md
index 7a4f3cc2e..4db957407 100644
--- a/_episodes/32-software-design.md
+++ b/_episodes/32-software-design.md
@@ -7,10 +7,9 @@ questions:
- "What should we consider when designing software?"
objectives:
- "Understand the goals and principles of designing 'good' software."
-- "Understand what a code abstraction is, and when we should use it."
+- "Understand code decoupling and code abstraction design techniques."
- "Understand what code refactoring is."
keypoints:
-- "When writing software used for research, requirements will almost *always* change."
- "'Good' code is designed to be maintainable: readable by people who did not author the code,
testable through a set of automated tests, adaptable to new requirements."
- "The sooner you adopt a practice of designing your software in the lifecycle of your project,
the easier the development and maintenance process will be."
Ideally, we should have at least a rough design of our software sketched out before we write a single line of code. This design should be based around the requirements and the structure of the problem we are trying -to solve: what are the concepts we need to represent and what are the relationships between them. +to solve: what are the concepts we need to represent in our code +and what are the relationships between them. And importantly, who will be using our software and how will they interact with it. As a piece of software grows, -it will reach a point where there's too much code for us to keep in mind at once. +it will reach a point where there is too much code for us to keep in mind at once. At this point, it becomes particularly important to think of the overall design and structure of our software, how should all the pieces of functionality fit together, and how should we work towards fulfilling this overall design throughout development. Even if you did not think about the design of your software from the very beginning - it is not too late to start now. -It's not easy to come up with a complete definition for the term **software design**, +It is not easy to come up with a complete definition for the term **software design**, but some of the common aspects are: - **Algorithm design** - @@ -87,12 +87,12 @@ goal of having *maintainable* code, which is: or separating "pure" (without side-effects) and "impure" (with side-effects) parts of the code on the level of functions. -Now that we know what goals we should aspire to, let's take a critical look at the code in our +Now that we know what goals we should aspire to, let us take a critical look at the code in our software project and try to identify ways in which it can be improved. > ## Exercise: Identifying How Code Can be Improved? 
> A team member has implemented a feature to our inflammation analysis software so that when a
-> `--full-data-analysis` command line parameter parameter is passed to software,
+> `--full-data-analysis` command line parameter is passed,
> it scans the directory of one of the provided files, compares standard deviations across
> the data by day and plots a graph.
> The code is located in `compute_data.py` file within the `inflammation` project
@@ -106,7 +106,7 @@ software project and try to identify ways in which it can be improved.
> make making those changes challenging.
>> ## Solution
>> You may have found others, but here are some of the things that make the code
->> hard to read, test and maintain:
+>> hard to read, test and maintain.
>>
>> * **Hard to read:** everything is implemented in a single function.
>>   In order to understand it, you need to understand how file loading works at the same time as
@@ -156,7 +156,7 @@ The key for an intermediate developer is to balance these concerns
for each software project appropriately,
and employ design and development practices *enough* so that progress can be made.
It is very easy to under-design software,
-but remember it's also possible to over-design software too.
+but remember it is also possible to over-design software too.

## Techniques for Improving Code

@@ -170,7 +170,8 @@ the entire codebase at once.

### Code Refactoring

-*Refactoring* is the process of changing the internal structure of code without changing its
+*Code refactoring* is the process of improving the design of existing code -
+changing the internal structure of code without changing its
external behavior, with the goal of making the code more readable, maintainable, efficient or
easier to test.
This can include things such as renaming variables, reorganising @@ -185,11 +186,11 @@ process to follow is: ### Code Decoupling -*Code decoupling* is another technique of improving the code by breaking a (complex) -software system into smaller parts, more manageable parts, and reducing the interdependence -between these different components or modules of the system. +*Code decoupling* is a code design technique that involves breaking a (complex) +software system into smaller, more manageable parts, and reducing the interdependence +between these different parts of the system. This means that a change in one part of the code usually does not require a change in the other, -thereby making its development more efficient. +thereby making its development more efficient and less error prone. ### Code Abstraction @@ -214,7 +215,7 @@ It will help to keep our codebase clean, modular and easier to understand. Writing good code is hard and takes practise. You may also be faced with an existing piece of code that breaks some (or all) of the -good code principles, and your job will be to improve it so that the code can evolve further. +good code principles, and your job will be to improve/refactor it so that it can evolve further. We will now look into some examples of the techniques that can help us redesign our code and incrementally improve its quality. 
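The refactoring process above can be sketched with a toy example. The functions below are invented purely for illustration and are not part of the project code; the point is that a behaviour-pinning check passes both before and after the internal structure is changed:

```python
# Before refactoring: parsing and computation are tangled in one function.
def mean_of_row(line):
    values = [float(v) for v in line.split(',')]
    return sum(values) / len(values)


# After refactoring: the parsing step is extracted into its own function,
# without changing the external behaviour of the code.
def parse_row(line):
    return [float(v) for v in line.split(',')]


def mean_of_row_refactored(line):
    values = parse_row(line)
    return sum(values) / len(values)


# The same check passes before and after the change, which is what gives us
# confidence that the refactoring has not broken anything.
assert mean_of_row('1,2,3') == mean_of_row_refactored('1,2,3') == 2.0
```

The extracted `parse_row` is now a smaller, decoupled unit that can be tested and reused on its own.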
From 466b840831264bc3e706fed93c5d26108dd7b488 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Wed, 28 Feb 2024 10:11:50 +0000 Subject: [PATCH 088/105] Added extra episdoe on OOP --- _config.yml | 1 + ...-refactoring.md => 33-code-refactoring.md} | 30 +- ...-decoupling.md => 34-code-abstractions.md} | 63 +- _extras/object-oriented-programming.md | 908 ++++++++++++++++++ _extras/software-architecture-paradigms.md | 42 +- 5 files changed, 960 insertions(+), 84 deletions(-) rename _episodes/{33-refactoring.md => 33-code-refactoring.md} (90%) rename _episodes/{34-decoupling.md => 34-code-abstractions.md} (89%) create mode 100644 _extras/object-oriented-programming.md diff --git a/_config.yml b/_config.yml index 96103624a..6ab746107 100644 --- a/_config.yml +++ b/_config.yml @@ -97,6 +97,7 @@ extras_order: - vscode - software-architecture-paradigms - functional-programming + - object-oriented-programming - persistence - databases - quiz diff --git a/_episodes/33-refactoring.md b/_episodes/33-code-refactoring.md similarity index 90% rename from _episodes/33-refactoring.md rename to _episodes/33-code-refactoring.md index 28e920390..6bd74c057 100644 --- a/_episodes/33-refactoring.md +++ b/_episodes/33-code-refactoring.md @@ -4,29 +4,45 @@ teaching: 30 exercises: 20 questions: - "How do you refactor code without breaking it?" +- "What is decoupled code?" - "What are benefits of pure functions?" objectives: +- "Understand the benefits of code decoupling." - "Understand the use of regressions tests to avoid breaking existing code when refactoring." - "Understand the use of pure functions in software design to make the code easier to test." keypoints: -- "Implementing regression tests before you refactor the code gives you confidence that your changes have not -broken anything." 
-- "By refactoring code into pure functions that process data without side effects makes code easier +- "Implementing regression tests before refactoring gives you confidence that your changes have not +broken the code." +- "Decoupling code into pure functions that process data without side effects makes code easier to read, test and maintain." --- ## Introduction +Recall that *code decoupling* means breaking the system into smaller components and reducing the +interdependence between these components, so that they can be tested and maintained independently. +Two components of code can be considered **decoupled** if a change in one does not +necessitate a change in the other. +While two connected units cannot always be totally decoupled, **loose coupling** +is something we should aim for. Benefits of decoupled code include: + +* easier to read as you do not need to understand the + details of the other component. +* easier to test, as one of the components can be replaced + by a test or a mock version of it. +* code tends to be easier to maintain, as changes can be isolated + from other parts of the code. + In this episode we will refactor the function `analyse_data()` in `compute_data.py` from our project in the following two ways: * add more tests so we can be more confident that future changes will have the intended effect and will not break the existing code. -* split the `analyse_data()` function into a number of smaller (functions) making the code -easier to understand and test. +* split the monolithic `analyse_data()` function into a number of smaller and mode decoupled functions +making the code easier to understand and test. 
 ## Writing Tests Before Refactoring
 
-When refactoring, remember we should first make sure there are tests that verity
+When refactoring, first we need to make sure there are tests that verify
 the code behaviour as it is now (or write them if they are missing),
 then refactor the code and, finally, check that the original tests still pass.
 This is to make sure we do not break the existing behaviour through refactoring.
@@ -126,7 +142,7 @@ the tests at all.
 
 Now that we have our regression test for `analyse_data()` in place,
 we are ready to refactor the function further.
-We would like to separate out as much of its code as possible as **pure functions**.
+We would like to separate out as much of its code as possible as *pure functions*.
 
 Pure functions are very useful and much easier to test
 as they take input only from its input parameters and output only via their return values.
diff --git a/_episodes/34-decoupling.md b/_episodes/34-code-abstractions.md
similarity index 89%
rename from _episodes/34-decoupling.md
rename to _episodes/34-code-abstractions.md
index d6f31720b..fbd55d6e9 100644
--- a/_episodes/34-decoupling.md
+++ b/_episodes/34-code-abstractions.md
@@ -1,39 +1,21 @@
 ---
-title: "Code Decoupling & Abstractions"
+title: "Code Abstractions"
 teaching: 30
 exercises: 45
 questions:
-- "What is de-coupled code?"
 - "When is it useful to use classes to structure code?"
 - "How can we make sure the components of our software are reusable?"
 objectives:
-- "Understand the object-oriented principle of polymorphism and interfaces."
-- "Be able to introduce appropriate abstractions to simplify code."
-- "Understand what decoupled code is, and why you would want it."
+- "Introduce appropriate abstractions to simplify code."
+- "Understand the principles of polymorphism and interfaces."
+- "Be able to use mocks to replace a class in test code."
 keypoints:
-- "Classes can help separate code so it is easier to understand."
-- "By using interfaces, code can become more decoupled." -- "Decoupled code is easier to test, and easier to maintain." +- "Classes and interfaces can help decouple code so it is easier to understand, test and maintain." --- ## Introduction -Decoupling means breaking the system into smaller components and reducing the interdependence -between these components, so that they can be tested and maintained independently. -Two components of code can be considered **decoupled** if a change in one does not -necessitate a change in the other. -While two connected units cannot always be totally decoupled, **loose coupling** -is something we should aim for. Benefits of decoupled code include: - -* easier to read as you do not need to understand the - details of the other component. -* easier to test, as one of the components can be replaced - by a test or a mock version of it. -* code tends to be easier to maintain, as changes can be isolated - from other parts of the code. - -*Abstraction* is the process of hiding the implementation details of a piece of +*Code abstraction* is the process of hiding the implementation details of a piece of code behind an interface - i.e. the details of *how* something works are hidden away, leaving us to deal only with *what* it does. This allows developers to work with the code at a higher level @@ -44,13 +26,13 @@ Abstractions can aid decoupling of code. If one part of the code only uses another part through an appropriate abstraction then it becomes easier for these parts to change independently. -Let's start redesigning our code by introducing some of the decoupling and abstraction techniques +Let's start redesigning our code by introducing some of the abstraction techniques to incrementally improve its design. You may have noticed that loading data from CSV files in a directory is "baked" into (i.e. is part of) the `analyse_data()` function. 
-This is not strictly a functionality of the data analysis function, so let's decouple the date -loading this into a separate function. +This is not strictly a functionality of the data analysis function, so firstly +let's decouple the data loading into a separate function. > ## Exercise: Decouple Data Loading from Data Analysis > Separate out the data loading functionality from `analyse_data()` into a new function @@ -279,9 +261,8 @@ on it and it will return a number representing its surface area. ## Polymorphism -In OOP, it is possible to have different object classes that conform to the same interface. - -For example, let's have a look at the `Rectangle` class: +In OOP, it is possible to have different object classes that conform to the same interface. +For example, let's have a look at the following class representing a `Rectangle`: ```python class Rectangle: @@ -292,7 +273,7 @@ class Rectangle: return self.width * self.height ``` -Like `Circle`, this class provides a `get_area()` method. +Like `Circle`, this class provides the `get_area()` method. The method takes the same number of parameters (none), and returns a number. However, the implementation is different. This is one type of *polymorphism*. @@ -436,21 +417,25 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be ` ## Programming Paradigms -Until now, we have mainly written procedural code. +Until now, we have mainly been writing procedural code. +In the previous episode, we mentioned [pure functions](/33-code-refactoring/index.html#pure-functions) +and Functional Programming. In this episode, we have touched a bit upon classes, encapsulation and polymorphism, which are characteristics of (but not limited to) the Object Oriented Programming (OOP). -These different paradigms provide varied approaches to solving a problem and structuring -your code - each with certain strengths and weaknesses when used to solve particular types of -problems. 
+All these different programming paradigms provide varied approaches to structuring your code -
+each with certain strengths and weaknesses when used to solve particular types of problems.
 In many cases, particularly with modern languages, a single language can allow many different
-structural approaches within your code.
-Once your software begins to get more complex - it is common to use aspects of different paradigms
+structural approaches and mixing programming paradigms within your code.
+Once your software begins to get more complex - it is common to use aspects of [different paradigms](/software-architecture-paradigms/index.html)
 to handle different subtasks.
-Because of this, it is useful to know about the major paradigms,
+Because of this, it is useful to know about the [major paradigms](/software-architecture-paradigms/index.html),
 so you can recognise where it might be useful to switch.
-This is outside of scope of this course, so we will point you to some further reading.
+This is outside the scope of this course - we have some extra episodes on the topics of
+[Procedural Programming](/software-architecture-paradigms/index.html#procedural-programming),
+[Functional Programming](/functional-programming/index.html) and
+[Object Oriented Programming](/object-oriented-programming/index.html) if you want to know more.
 
-> ## So Which is Python?
+> ## So Which One is Python?
 > Python is a multi-paradigm and multi-purpose programming language.
 > You can use it as a procedural language and you can use it in a more object oriented way.
> It does tend to land more on the object oriented side as all its core data types diff --git a/_extras/object-oriented-programming.md b/_extras/object-oriented-programming.md new file mode 100644 index 000000000..2a882cebc --- /dev/null +++ b/_extras/object-oriented-programming.md @@ -0,0 +1,908 @@ +--- +title: "Object Oriented Programming" +teaching: 30 +exercises: 35 +questions: +- "How can we use code to describe the structure of data?" +- "How should the relationships between structures be described?" +objectives: +- "Describe the core concepts that define the object oriented paradigm" +- "Use classes to encapsulate data within a more complex program" +- "Structure concepts within a program in terms of sets of behaviour" +- "Identify different types of relationship between concepts within a program" +- "Structure data within a program using these relationships" +keypoints: +- "Object oriented programming is a programming paradigm based on the concept of classes, which encapsulate data and code." +- "Classes allow us to organise data into distinct concepts." +- "By breaking down our data into classes, we can reason about the behaviour of parts of our data." +- "Relationships between concepts can be described using inheritance (*is a*) and composition (*has a*)." +--- + +## Introduction + +Object oriented programming is a programming paradigm based on the concept of objects, +which are data structures that contain (encapsulate) data and code. +Data is encapsulated in the form of fields (attributes) of objects, +while code is encapsulated in the form of procedures (methods) +that manipulate objects' attributes and define "behaviour" of objects. +So, in object oriented programming, +we first think about the data and the things that we’re modelling - +and represent these by objects - +rather than define the logic of the program, +and code becomes a series of interactions between objects. 
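As a small taste of what "a series of interactions between objects" looks like in practice - the `Thermometer` and `Logger` classes below are invented purely for illustration and are not part of the project code:

```python
class Thermometer:
    """Encapsulates a piece of data (a reading) and code that uses it."""
    def __init__(self, reading):
        self.reading = reading

    def measure(self):
        return self.reading

class Logger:
    """Interacts with Thermometer objects rather than with raw data."""
    def __init__(self):
        self.records = []

    def record(self, thermometer):
        # The Logger asks the Thermometer for its data via a method call -
        # an interaction between two objects.
        self.records.append(thermometer.measure())

logger = Logger()
logger.record(Thermometer(21.5))
print(logger.records)  # [21.5]
```

Notice that the program's logic is expressed as objects sending each other messages, rather than as a sequence of operations on loose data.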
+ +## Structuring Data + +One of the main difficulties we encounter when building more complex software is +how to structure our data. +So far, we've been processing data from a single source and with a simple tabular structure, +but it would be useful to be able to combine data from a range of different sources +and with more data than just an array of numbers. + +~~~ +data = np.array([[1., 2., 3.], + [4., 5., 6.]]) +~~~ +{: .language-python} + +Using this data structure has the advantage of +being able to use NumPy operations to process the data +and Matplotlib to plot it, +but often we need to have more structure than this. +For example, we may need to attach more information about the patients +and store this alongside our measurements of inflammation. + +We can do this using the Python data structures we're already familiar with, +dictionaries and lists. +For instance, we could attach a name to each of our patients: + +~~~ +patients = [ + { + 'name': 'Alice', + 'data': [1., 2., 3.], + }, + { + 'name': 'Bob', + 'data': [4., 5., 6.], + }, +] +~~~ +{: .language-python} + +> ## Exercise: Structuring Data +> +> Write a function, called `attach_names`, +> which can be used to attach names to our patient dataset. +> When used as below, it should produce the expected output. +> +> If you are not sure where to begin, +> think about ways you might be able to effectively loop over two collections at once. +> Also, do not worry too much about the data type of the `data` value, +> it can be a Python list, or a NumPy array - either is fine. 
+> +> ~~~ +> data = np.array([[1., 2., 3.], +> [4., 5., 6.]]) +> +> output = attach_names(data, ['Alice', 'Bob']) +> print(output) +> ~~~ +> {: .language-python} +> +> ~~~ +> [ +> { +> 'name': 'Alice', +> 'data': [1., 2., 3.], +> }, +> { +> 'name': 'Bob', +> 'data': [4., 5., 6.], +> }, +> ] +> ~~~ +> {: .output} +> +> Time: 10 min +> > ## Solution +> > +> > One possible solution, perhaps the most obvious, +> > is to use the `range` function to index into both lists at the same location: +> > +> > ~~~ +> > def attach_names(data, names): +> > """Create datastructure containing patient records.""" +> > output = [] +> > +> > for i in range(len(data)): +> > output.append({'name': names[i], +> > 'data': data[i]}) +> > +> > return output +> > ~~~ +> > {: .language-python} +> > +> > However, this solution has a potential problem that can occur sometimes, +> > depending on the input. +> > What might go wrong with this solution? +> > How could we fix it? +> > +> > > ## A Better Solution +> > > +> > > What would happen if the `data` and `names` inputs were different lengths? +> > > +> > > If `names` is longer, we'll loop through, until we run out of rows in the `data` input, +> > > at which point we'll stop processing the last few names. +> > > If `data` is longer, we'll loop through, but at some point we'll run out of names - +> > > but this time we try to access part of the list that doesn't exist, +> > > so we'll get an exception. +> > > +> > > A better solution would be to use the `zip` function, +> > > which allows us to iterate over multiple iterables without needing an index variable. +> > > The `zip` function also limits the iteration to whichever of the iterables is smaller, +> > > so we won't raise an exception here, +> > > but this might not quite be the behaviour we want, +> > > so we'll also explicitly `assert` that the inputs should be the same length. 
+> > > Checking that our inputs are valid in this way is an example of a precondition, +> > > which we introduced conceptually in an earlier episode. +> > > +> > > If you've not previously come across the `zip` function, +> > > read [this section](https://docs.python.org/3/library/functions.html#zip) +> > > of the Python documentation. +> > > +> > > ~~~ +> > > def attach_names(data, names): +> > > """Create datastructure containing patient records.""" +> > > assert len(data) == len(names) +> > > output = [] +> > > +> > > for data_row, name in zip(data, names): +> > > output.append({'name': name, +> > > 'data': data_row}) +> > > +> > > return output +> > > ~~~ +> > > {: .language-python} +> > {: .solution} +> {: .solution} +{: .challenge} + +## Classes in Python + +Using nested dictionaries and lists should work for some of the simpler cases +where we need to handle structured data, +but they get quite difficult to manage once the structure becomes a bit more complex. +For this reason, in the object oriented paradigm, +we use **classes** to help with managing this data +and the operations we would want to perform on it. +A class is a **template** (blueprint) for a structured piece of data, +so when we create some data using a class, +we can be certain that it has the same structure each time. + +With our list of dictionaries we had in the example above, +we have no real guarantee that each dictionary has the same structure, +e.g. the same keys (`name` and `data`) unless we check it manually. +With a class, if an object is an **instance** of that class +(i.e. it was made using that template), +we know it will have the structure defined by that class. +Different programming languages make slightly different guarantees +about how strictly the structure will match, +but in object oriented programming this is one of the core ideas - +all objects derived from the same class must follow the same behaviour. 
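To see why this guarantee matters, compare the two approaches in a short sketch - the `PatientRecord` class here is a placeholder name for illustration (we will build a proper class for our data shortly):

```python
# With plain dictionaries nothing enforces a common structure -
# this misspelled key would only surface later, when 'name' is looked up:
records = [
    {'name': 'Alice', 'data': [1., 2., 3.]},
    {'nmae': 'Bob', 'data': [4., 5., 6.]},  # typo goes unnoticed at creation
]

class PatientRecord:
    """Every instance is built from the same template,
    so the structure is guaranteed at creation time."""
    def __init__(self, name, data):
        self.name = name
        self.data = data

patients = [PatientRecord('Alice', [1., 2., 3.]),
            PatientRecord('Bob', [4., 5., 6.])]
print([p.name for p in patients])  # ['Alice', 'Bob']
```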
+
+You may not have realised, but you should already be familiar with
+some of the classes that come bundled as part of Python, for example:
+
+~~~
+my_list = [1, 2, 3]
+my_dict = {1: '1', 2: '2', 3: '3'}
+my_set = {1, 2, 3}
+
+print(type(my_list))
+print(type(my_dict))
+print(type(my_set))
+~~~
+{: .language-python}
+
+~~~
+<class 'list'>
+<class 'dict'>
+<class 'set'>
+~~~
+{: .output}
+
+Lists, dictionaries and sets are a slightly special type of class,
+but they behave in much the same way as a class we might define ourselves:
+
+- They each hold some data (**attributes** or **state**).
+- They also provide some methods describing the behaviours of the data -
+  what can the data do and what can we do to the data?
+
+The behaviours we may have seen previously include:
+
+- Lists can be appended to
+- Lists can be indexed
+- Lists can be sliced
+- Key-value pairs can be added to dictionaries
+- The value at a key can be looked up in a dictionary
+- The union of two sets can be found (the set of values present in any of the sets)
+- The intersection of two sets can be found (the set of values present in all of the sets)
+
+## Encapsulating Data
+
+Let's start with a minimal example of a class representing our patients.
+
+~~~
+# file: inflammation/models.py
+
+class Patient:
+    def __init__(self, name):
+        self.name = name
+        self.observations = []
+
+alice = Patient('Alice')
+print(alice.name)
+~~~
+{: .language-python}
+
+~~~
+Alice
+~~~
+{: .output}
+
+Here we've defined a class with one method: `__init__`.
+This method is the **initialiser** method,
+which is responsible for setting up the initial values and structure of the data
+inside a new instance of the class -
+this is very similar to **constructors** in other languages,
+so the term is often used in Python too.
+The `__init__` method is called every time we create a new instance of the class,
+as in `Patient('Alice')`.
+The argument `self` refers to the instance on which we are calling the method
+and gets filled in automatically by Python -
+we do not need to provide a value for this when we call the method.
+
+Data encapsulated within our Patient class includes
+the patient's name and a list of inflammation observations.
+In the initialiser method,
+we set a patient's name to the value provided,
+and create a list of inflammation observations for the patient (initially empty).
+Such data is also referred to as the attributes of a class
+and holds the current state of an instance of the class.
+Attributes are typically hidden (encapsulated) internal object details
+ensuring that access to data is protected from unintended changes.
+They are manipulated internally by the class,
+which, in addition, can expose certain functionality as public behaviour of the class
+to allow other objects to interact with this class' instances.
+
+## Encapsulating Behaviour
+
+In addition to representing a piece of structured data
+(e.g. a patient who has a name and a list of inflammation observations),
+a class can also provide a set of functions, or **methods**,
+which describe the **behaviours** of the data encapsulated in the instances of that class.
+To define the behaviour of a class we add functions which operate on the data the class contains.
+These functions are the member functions or methods.
+
+Methods on classes are the same as normal functions,
+except that they live inside a class and have an extra first parameter `self`.
+Using the name `self` is not strictly necessary, but is a very strong convention -
+it is extremely rare to see any other name chosen.
+When we call a method on an object,
+the value of `self` is automatically set to this object - hence the name.
+As we saw with the `__init__` method previously,
+we do not need to explicitly provide a value for the `self` argument,
+this is done for us by Python.
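One way to convince yourself of what Python is doing with `self` is the following small sketch - the `greet` method is invented for illustration and is not part of the project's `Patient` class:

```python
class Patient:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return 'Hello, ' + self.name

alice = Patient('Alice')

# The usual call - Python fills in `self` with `alice` automatically...
print(alice.greet())         # Hello, Alice

# ...which is equivalent to passing the instance explicitly:
print(Patient.greet(alice))  # Hello, Alice
```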
+ +Let's add another method on our Patient class that adds a new observation to a Patient instance. + +~~~ +# file: inflammation/models.py + +class Patient: + """A patient in an inflammation study.""" + def __init__(self, name): + self.name = name + self.observations = [] + + def add_observation(self, value, day=None): + if day is None: + if self.observations: + day = self.observations[-1]['day'] + 1 + else: + day = 0 + + new_observation = { + 'day': day, + 'value': value, + } + + self.observations.append(new_observation) + return new_observation + +alice = Patient('Alice') +print(alice) + +observation = alice.add_observation(3) +print(observation) +print(alice.observations) +~~~ +{: .language-python} + +~~~ +<__main__.Patient object at 0x7fd7e61b73d0> +{'day': 0, 'value': 3} +[{'day': 0, 'value': 3}] +~~~ +{: .output} + +Note also how we used `day=None` in the parameter list of the `add_observation` method, +then initialise it if the value is indeed `None`. +This is one of the common ways to handle an optional argument in Python, +so we'll see this pattern quite a lot in real projects. + +> ## Class and Static Methods +> +> Sometimes, the function we're writing doesn't need access to +> any data belonging to a particular object. +> For these situations, we can instead use a **class method** or a **static method**. +> Class methods have access to the class that they're a part of, +> and can access data on that class - +> but do not belong to a specific instance of that class, +> whereas static methods have access to neither the class nor its instances. +> +> By convention, class methods use `cls` as their first argument instead of `self` - +> this is how we access the class and its data, +> just like `self` allows us to access the instance and its data. +> Static methods have neither `self` nor `cls` +> so the arguments look like a typical free function. +> These are the only common exceptions to using `self` for a method's first argument. 
+> +> Both of these method types are created using **decorators** - +> for more information see +> the [classmethod](https://docs.python.org/3/library/functions.html#classmethod) +> and [staticmethod](https://docs.python.org/3/library/functions.html#staticmethod) +> decorator sections of the Python documentation. +{: .callout} + +### Dunder Methods + +Why is the `__init__` method not called `init`? +There are a few special method names that we can use +which Python will use to provide a few common behaviours, +each of which begins and ends with a **d**ouble-**under**score, +hence the name **dunder method**. + +When writing your own Python classes, +you'll almost always want to write an `__init__` method, +but there are a few other common ones you might need sometimes. +You may have noticed in the code above that the method `print(alice)` +returned `<__main__.Patient object at 0x7fd7e61b73d0>`, +which is the string representation of the `alice` object. +We may want the print statement to display the object's name instead. +We can achieve this by overriding the `__str__` method of our class. + +~~~ +# file: inflammation/models.py + +class Patient: + """A patient in an inflammation study.""" + def __init__(self, name): + self.name = name + self.observations = [] + + def add_observation(self, value, day=None): + if day is None: + try: + day = self.observations[-1]['day'] + 1 + + except IndexError: + day = 0 + + + new_observation = { + 'day': day, + 'value': value, + } + + self.observations.append(new_observation) + return new_observation + + def __str__(self): + return self.name + + +alice = Patient('Alice') +print(alice) +~~~ +{: .language-python} + +~~~ +Alice +~~~ +{: .output} + +These dunder methods are not usually called directly, +but rather provide the implementation of some functionality we can use - +we didn't call `alice.__str__()`, +but it was called for us when we did `print(alice)`. 
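The same "called for us" behaviour applies to the other dunder methods. As a sketch - this simplified `Patient` stores observations as plain numbers for brevity, unlike the project version:

```python
class Patient:
    def __init__(self, name):
        self.name = name
        self.observations = []

    def add_observation(self, value):
        self.observations.append(value)

    def __str__(self):
        return self.name

    def __len__(self):
        # Called for us when we use len(patient)
        return len(self.observations)

    def __getitem__(self, index):
        # Called for us when we use patient[index]
        return self.observations[index]

alice = Patient('Alice')
alice.add_observation(3)
alice.add_observation(4)

print(len(alice))  # 2
print(alice[1])    # 4
```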
+Some we see quite commonly are: + +- `__str__` - converts an object into its string representation, used when you call `str(object)` or `print(object)` +- `__getitem__` - Accesses an object by key, this is how `list[x]` and `dict[x]` are implemented +- `__len__` - gets the length of an object when we use `len(object)` - usually the number of items it contains + +There are many more described in the Python documentation, +but it’s also worth experimenting with built in Python objects to +see which methods provide which behaviour. +For a more complete list of these special methods, +see the [Special Method Names](https://docs.python.org/3/reference/datamodel.html#special-method-names) +section of the Python documentation. + +> ## Exercise: A Basic Class +> +> Implement a class to represent a book. +> Your class should: +> +> - Have a title +> - Have an author +> - When printed using `print(book)`, show text in the format "title by author" +> +> ~~~ +> book = Book('A Book', 'Me') +> +> print(book) +> ~~~ +> {: .language-python} +> +> ~~~ +> A Book by Me +> ~~~ +> {: .output} +> +> Time: 5 min +> > ## Solution +> > +> > ~~~ +> > class Book: +> > def __init__(self, title, author): +> > self.title = title +> > self.author = author +> > +> > def __str__(self): +> > return self.title + ' by ' + self.author +> > ~~~ +> > {: .language-python} +> {: .solution} +{: .challenge} + +### Properties + +The final special type of method we will introduce is a **property**. +Properties are methods which behave like data - +when we want to access them, we do not need to use brackets to call the method manually. + +~~~ +# file: inflammation/models.py + +class Patient: + ... 
+ + @property + def last_observation(self): + return self.observations[-1] + +alice = Patient('Alice') + +alice.add_observation(3) +alice.add_observation(4) + +obs = alice.last_observation +print(obs) +~~~ +{: .language-python} + +~~~ +{'day': 1, 'value': 4} +~~~ +{: .output} + +You may recognise the `@` syntax from episodes on +parameterising unit tests and functional programming - +`property` is another example of a **decorator**. +In this case the `property` decorator is taking the `last_observation` function +and modifying its behaviour, +so it can be accessed as if it were a normal attribute. +It is also possible to make your own decorators, but we won't cover it here. + +## Relationships Between Classes + +We now have a language construct for grouping data and behaviour +related to a single conceptual object. +The next step we need to take is to describe the relationships between the concepts in our code. + +There are two fundamental types of relationship between objects +which we need to be able to describe: + +1. Ownership - x **has a** y - this is **composition** +2. Identity - x **is a** y - this is **inheritance** + +### Composition + +You should hopefully have come across the term **composition** already - +in the novice Software Carpentry, we use composition of functions to reduce code duplication. +That time, we used a function which converted temperatures in Celsius to Kelvin +as a **component** of another function which converted temperatures in Fahrenheit to Kelvin. + +In the same way, in object oriented programming, we can make things components of other things. + +We often use composition where we can say 'x *has a* y' - +for example in our inflammation project, +we might want to say that a doctor *has* patients +or that a patient *has* observations. + +In the case of our example, we're already saying that patients have observations, +so we're already using composition here. 
+We're currently implementing an observation as a dictionary with a known set of keys though, +so maybe we should make an `Observation` class as well. + +~~~ +# file: inflammation/models.py + +class Observation: + def __init__(self, day, value): + self.day = day + self.value = value + + def __str__(self): + return str(self.value) + +class Patient: + """A patient in an inflammation study.""" + def __init__(self, name): + self.name = name + self.observations = [] + + def add_observation(self, value, day=None): + if day is None: + try: + day = self.observations[-1].day + 1 + + except IndexError: + day = 0 + + new_observation = Observation(day, value) + + self.observations.append(new_observation) + return new_observation + + def __str__(self): + return self.name + + +alice = Patient('Alice') +obs = alice.add_observation(3) + +print(obs) +~~~ +{: .language-python} + +~~~ +3 +~~~ +{: .output} + +Now we're using a composition of two custom classes to +describe the relationship between two types of entity in the system that we're modelling. + +### Inheritance + +The other type of relationship used in object oriented programming is **inheritance**. +Inheritance is about data and behaviour shared by classes, +because they have some shared identity - 'x *is a* y'. +If class `X` inherits from (*is a*) class `Y`, +we say that `Y` is the **superclass** or **parent class** of `X`, +or `X` is a **subclass** of `Y`. + +If we want to extend the previous example to also manage people who aren't patients +we can add another class `Person`. +But `Person` will share some data and behaviour with `Patient` - +in this case both have a name and show that name when you print them. +Since we expect all patients to be people (hopefully!), +it makes sense to implement the behaviour in `Person` and then reuse it in `Patient`. + +To write our class in Python, +we used the `class` keyword, the name of the class, +and then a block of the functions that belong to it. 
+If the class **inherits** from another class, +we include the parent class name in brackets. + +~~~ +# file: inflammation/models.py + +class Observation: + def __init__(self, day, value): + self.day = day + self.value = value + + def __str__(self): + return str(self.value) + +class Person: + def __init__(self, name): + self.name = name + + def __str__(self): + return self.name + +class Patient(Person): + """A patient in an inflammation study.""" + def __init__(self, name): + super().__init__(name) + self.observations = [] + + def add_observation(self, value, day=None): + if day is None: + try: + day = self.observations[-1].day + 1 + + except IndexError: + day = 0 + + new_observation = Observation(day, value) + + self.observations.append(new_observation) + return new_observation + +alice = Patient('Alice') +print(alice) + +obs = alice.add_observation(3) +print(obs) + +bob = Person('Bob') +print(bob) + +obs = bob.add_observation(4) +print(obs) +~~~ +{: .language-python} + +~~~ +Alice +3 +Bob +AttributeError: 'Person' object has no attribute 'add_observation' +~~~ +{: .output} + +As expected, an error is thrown because we cannot add an observation to `bob`, +who is a Person but not a Patient. + +We see in the example above that to say that a class inherits from another, +we put the **parent class** (or **superclass**) in brackets after the name of the **subclass**. + +There's something else we need to add as well - +Python doesn't automatically call the `__init__` method on the parent class +if we provide a new `__init__` for our subclass, +so we'll need to call it ourselves. +This makes sure that everything that needs to be initialised on the parent class has been, +before we need to use it. +If we don't define a new `__init__` method for our subclass, +Python will look for one on the parent class and use it automatically. 
+This is true of all methods - +if we call a method which doesn't exist directly on our class, +Python will search for it among the parent classes. +The order in which it does this search is known as the **method resolution order** - +a little more on this in the Multiple Inheritance callout below. + +The line `super().__init__(name)` gets the parent class, +then calls the `__init__` method, +providing the `name` variable that `Person.__init__` requires. +This is quite a common pattern, particularly for `__init__` methods, +where we need to make sure an object is initialised as a valid `X`, +before we can initialise it as a valid `Y` - +e.g. a valid `Person` must have a name, +before we can properly initialise a `Patient` model with their inflammation data. + + +> ## Composition vs Inheritance +> +> When deciding how to implement a model of a particular system, +> you often have a choice of either composition or inheritance, +> where there is no obviously correct choice. +> For example, it's not obvious whether a photocopier *is a* printer and *is a* scanner, +> or *has a* printer and *has a* scanner. +> +> ~~~ +> class Machine: +> pass +> +> class Printer(Machine): +> pass +> +> class Scanner(Machine): +> pass +> +> class Copier(Printer, Scanner): +> # Copier `is a` Printer and `is a` Scanner +> pass +> ~~~ +> {: .language-python} +> +> ~~~ +> class Machine: +> pass +> +> class Printer(Machine): +> pass +> +> class Scanner(Machine): +> pass +> +> class Copier(Machine): +> def __init__(self): +> # Copier `has a` Printer and `has a` Scanner +> self.printer = Printer() +> self.scanner = Scanner() +> ~~~ +> {: .language-python} +> +> Both of these would be perfectly valid models and would work for most purposes. +> However, unless there's something about how you need to use the model +> which would benefit from using a model based on inheritance, +> it's usually recommended to opt for **composition over inheritance**. 
+> This is a common design principle in the object oriented paradigm and is worth remembering,
+> as it's very common for people to overuse inheritance once they've been introduced to it.
+>
+> For much more detail on this see the
+> [Python Design Patterns guide](https://python-patterns.guide/gang-of-four/composition-over-inheritance/).
+{: .callout}
+
+> ## Multiple Inheritance
+>
+> **Multiple Inheritance** is when a class inherits from more than one direct parent class.
+> It exists in Python, but is often not present in other Object Oriented languages.
+> Although this might seem useful, like in our inheritance-based model of the photocopier above,
+> it's best to avoid it unless you're sure it's the right thing to do,
+> due to the complexity of the inheritance hierarchy.
+> Often using multiple inheritance is a sign you should instead be using composition -
+> again like the photocopier model above.
+{: .callout}
+
+
+> ## Exercise: A Model Patient
+>
+> Let's use what we have learnt in this episode and combine it with what we have learnt on
+> [software requirements](../31-software-requirements/index.html)
+> to formulate and implement a
+> [few new solution requirements](../31-software-requirements/index.html#exercise-new-solution-requirements)
+> to extend the model layer of our clinical trial system.
+>
+> Let's start with extending the system such that there must be
+> a `Doctor` class to hold the data representing a single doctor, which:
+>
+> - must have a `name` attribute
+> - must have a list of patients that this doctor is responsible for.
+>
+> In addition to these, try to think of an extra feature you could add to the models
+> which would be useful for managing a dataset like this -
+> imagine we're running a clinical trial, what else might we want to know?
+> Try using Test Driven Development for any features you add:
+> write the tests first, then add the feature.
+> The tests have been started for you in `tests/test_patient.py`,
+> but you will probably want to add some more.
+>
+> Once you've finished the initial implementation, do you have much duplicated code?
+> Is there anywhere you could make better use of composition or inheritance
+> to improve your implementation?
+>
+> For any extra features you've added,
+> explain them and how you implemented them to your neighbour.
+> Would they have implemented that feature in the same way?
+>
+> Time: 20 min
+> > ## Solution
+> > One example solution is shown below.
+> > You may start by writing some tests (that will initially fail),
+> > and then develop the code to satisfy the new requirements and pass the tests.
+> > ~~~
+> > # file: tests/test_patient.py
+> > """Tests for the Patient model."""
+> >
+> > def test_create_patient():
+> >     """Check a patient is created correctly given a name."""
+> >     from inflammation.models import Patient
+> >     name = 'Alice'
+> >     p = Patient(name=name)
+> >     assert p.name == name
+> >
+> > def test_create_doctor():
+> >     """Check a doctor is created correctly given a name."""
+> >     from inflammation.models import Doctor
+> >     name = 'Sheila Wheels'
+> >     doc = Doctor(name=name)
+> >     assert doc.name == name
+> >
+> > def test_doctor_is_person():
+> >     """Check if a doctor is a person."""
+> >     from inflammation.models import Doctor, Person
+> >     doc = Doctor("Sheila Wheels")
+> >     assert isinstance(doc, Person)
+> >
+> > def test_patient_is_person():
+> >     """Check if a patient is a person."""
+> >     from inflammation.models import Patient, Person
+> >     alice = Patient("Alice")
+> >     assert isinstance(alice, Person)
+> >
+> > def test_patients_added_correctly():
+> >     """Check patients are being added correctly by a doctor."""
+> >     from inflammation.models import Doctor, Patient
+> >     doc = Doctor("Sheila Wheels")
+> >     alice = Patient("Alice")
+> >     doc.add_patient(alice)
+> >     assert doc.patients is not None
+> >     assert len(doc.patients) == 1
+> >
+> > def test_no_duplicate_patients():
+> >     """Check adding the same patient to the same doctor twice does not result in duplicates."""
+> >     from inflammation.models import Doctor, Patient
+> >     doc = Doctor("Sheila Wheels")
+> >     alice = Patient("Alice")
+> >     doc.add_patient(alice)
+> >     doc.add_patient(alice)
+> >     assert len(doc.patients) == 1
+> > ...
+> > ~~~
+> > {: .language-python}
+> >
+> > ~~~
+> > # file: inflammation/models.py
+> > ...
+> > class Person:
+> >     """A person."""
+> >     def __init__(self, name):
+> >         self.name = name
+> >
+> >     def __str__(self):
+> >         return self.name
+> >
+> > class Patient(Person):
+> >     """A patient in an inflammation study."""
+> >     def __init__(self, name):
+> >         super().__init__(name)
+> >         self.observations = []
+> >
+> >     def add_observation(self, value, day=None):
+> >         if day is None:
+> >             try:
+> >                 day = self.observations[-1].day + 1
+> >             except IndexError:
+> >                 day = 0
+> >         new_observation = Observation(day, value)
+> >         self.observations.append(new_observation)
+> >         return new_observation
+> >
+> > class Doctor(Person):
+> >     """A doctor in an inflammation study."""
+> >     def __init__(self, name):
+> >         super().__init__(name)
+> >         self.patients = []
+> >
+> >     def add_patient(self, new_patient):
+> >         # A crude check by name if this patient is already looked after
+> >         # by this doctor before adding them
+> >         for patient in self.patients:
+> >             if patient.name == new_patient.name:
+> >                 return
+> >         self.patients.append(new_patient)
+> > ...
+> > ~~~
+> > {: .language-python}
+> {: .solution}
+{: .challenge}
+
+{% include links.md %}
+
diff --git a/_extras/software-architecture-paradigms.md b/_extras/software-architecture-paradigms.md
index 7e8f99c2d..359bc25e6 100644
--- a/_extras/software-architecture-paradigms.md
+++ b/_extras/software-architecture-paradigms.md
@@ -16,43 +16,9 @@ keypoints:
 - "A single piece of software will often contain instances of multiple paradigms."
 ---
-## Introduction
-
-As a piece of software grows,
-it will reach a point where there's too much code for us to keep in mind at once.
-At this point, it becomes particularly important that the software be designed sensibly.
-What should be the overall structure of our software,
-how should all the pieces of functionality fit together,
-and how should we work towards fulfilling this overall design throughout development?
-
-It's not easy to come up with a complete definition for the term **software design**,
-but some of the common aspects are:
-
-- **Algorithm design**
-  - what method are we going to use to solve the core business problem?
-- **Software architecture**
-  - what components will the software have and how will they cooperate?
-- **System architecture**
-  - what other things will this software have to interact with and how will it do this?
-- **UI/UX** (User Interface / User Experience)
-  - how will users interact with the software?
-
-As usual, the sooner you adopt a practice in the lifecycle of your project, the easier it will be.
-So we should think about the design of our software from the very beginning,
-ideally even before we start writing code -
-but if you didn't, it's never too late to start.
-
-The answers to these questions will provide us with some **design constraints**
-which any software we write must satisfy.
-For example, a design constraint when writing a mobile app would be -that it needs to work with a touch screen interface - -we might have some software that works really well from the command line, -but on a typical mobile phone there isn't a command line interface that people can access. - ## Software Architecture -At the beginning of this episode we defined **software architecture** -as an answer to the question +**Software architecture** provides an answer to the question "what components will the software have and how will they cooperate?". Software engineering borrowed this term, and a few other terms, from architects (of buildings) as many of the processes and techniques have some similarities. @@ -216,7 +182,7 @@ we also gain the ability to run many operations in parallel as it's guaranteed that each operation won't interact with any of the others - this is essential if we want to process this much data in a reasonable amount of time. -You can read more in an [Extras episode on Functional Programming](/functional-programming/index.html). +You can read more in an [extra episode on Functional Programming](/functional-programming/index.html). ### Object Oriented Programming @@ -244,7 +210,7 @@ Most people would classify Object Oriented Programming as an (with the extra feature being the objects), but [others disagree](https://stackoverflow.com/questions/38527078/what-is-the-difference-between-imperative-and-object-oriented-programming). -You can read more in an [Extras episode on Object Oriented Programming](/object-oriented-programming/index.html). +You can read more in an [extra episode on Object Oriented Programming](/object-oriented-programming/index.html). > ## So Which one is Python? > Python is a multi-paradigm and multi-purpose programming language. 
@@ -273,7 +239,7 @@ for much more information see the Wikipedia's page on
 We have mainly used Procedural Programming in this lesson,
 but you can have a closer look at
 [Functional](/functional-programming/index.html)
 and [Object Oriented Programming](/object-oriented-programming/index.html) paradigms
-in Extras episodes and how they can affect our architectural design choices.
+in extra episodes and how they can affect our architectural design choices.
 
 {% include links.md %}

From e56693c311775514bd9112712579e16adb08616a Mon Sep 17 00:00:00 2001
From: Aleksandra Nenadic
Date: Thu, 29 Feb 2024 17:20:18 +0000
Subject: [PATCH 089/105] Review of episode on architecture

---
 _config.yml                                   |   3 +-
 _episodes/32-software-design.md               |  55 +-
 _episodes/34-code-abstractions.md             |   6 +-
 _episodes/35-software-architecture.md         | 577 ++++++++++++------
 ...-paradigms.md => programming-paradigms.md} |  79 +--
 _extras/software-architecture-extra.md        |  77 +++
 6 files changed, 524 insertions(+), 273 deletions(-)
 rename _extras/{software-architecture-paradigms.md => programming-paradigms.md} (71%)
 create mode 100644 _extras/software-architecture-extra.md

diff --git a/_config.yml b/_config.yml
index 6ab746107..330dba1b2 100644
--- a/_config.yml
+++ b/_config.yml
@@ -95,7 +95,8 @@ extras_order:
   - discuss
   - protect-main-branch
   - vscode
-  - software-architecture-paradigms
+  - software-architecture-extra
+  - programming-paradigms
   - functional-programming
   - object-oriented-programming
   - persistence
diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md
index 4db957407..03ded01d2 100644
--- a/_episodes/32-software-design.md
+++ b/_episodes/32-software-design.md
@@ -16,6 +16,7 @@
testable through a set of automated tests, adaptable to new requirements."
the easier the development and maintenance process will be."
 ---
+
 ## Introduction
 
 Ideally, we should have at least a rough design of our software sketched out
@@ -90,17 +91,35 @@ goal of having *maintainable* code, which is:
 
 Now that we know what goals we should aspire to, let us take a critical look at the code in our
 software project and try to identify ways in which it can be improved.
+Our software project contains a branch `full-data-analysis` with code for a new feature of our
+inflammation analysis software. Recall that you can see all your branches as follows:
+~~~
+$ git branch --all
+~~~
+{: .language-bash}
+
+Let's check out a new local branch from the `full-data-analysis` branch, making sure we
+have saved and committed all current changes before doing so.
+
+~~~
+$ git checkout -b full-data-analysis origin/full-data-analysis
+~~~
+{: .language-bash}
+
+This new feature enables users to pass a new command-line parameter `--full-data-analysis`, causing
+the software to find the directory containing the first input data file (provided via command line
+parameter `infiles`) and invoke the data analysis over all the data files in that directory.
+This bit of functionality is handled by `inflammation-analysis.py` in the project root.
+
+The new data analysis code is located in the `compute_data.py` file within the `inflammation` directory,
+in a function called `analyse_data()`.
+This function loads all the data files for a given directory path, then
+calculates and compares the standard deviation across all the data by day and finally plots a graph.
+
 > ## Exercise: Identifying How Code Can be Improved?
-> A team member has implemented a feature to our inflammation analysis software so that when a
-> `--full-data-analysis` command line parameter parameter is passed,
-> it scans the directory of one of the provided files, compares standard deviations across
-> the data by day and plots a graph.
-> The code is located in `compute_data.py` file within the `inflammation` project
-> in a function called `analyse_data()`.
->
-> Critically examine this new code.
-> In what ways does this code not live up to the ideal properties
-> of maintainable code?
+> Critically examine the code in the `analyse_data()` function in the `compute_data.py` file.
+>
+> In what ways does this code not live up to the ideal properties of maintainable code?
 > Think about ways in which you find it hard to understand.
 > Think about the kinds of changes you might want to make to it, and what would
 > make making those changes challenging.
@@ -111,16 +130,22 @@ software project and try to identify ways in which it can be improved.
 >> * **Hard to read:** everything is implemented in a single function.
 >> In order to understand it, you need to understand how file loading works at the same time as
 >> the analysis itself.
->> * **Hard to modify:** if you want to use the data without using the graph you would have to
->> change the function.
->> * **Hard to modify or test:** it is always analysing a fixed set of data stored on the disk.
->> * **Hard to modify:** it does not have any tests meaning changes may break something and it
->> would be hard to know what.
+>> * **Hard to modify:** if you wanted to use the data for some other purpose and not just
+>> plotting the graph you would have to change the `analyse_data()` function.
+>> * **Hard to modify or test:** it is always analysing a fixed set of CSV data files
+>> stored on a disk.
+>> * **Hard to modify:** it does not have any tests so we cannot be 100% confident the code does
+>> what it claims to do; any changes to the code may break something and it would be harder and
+>> more time-consuming to figure out what.
 >>
 >> Make sure to keep the list you have created in the exercise above.
 >> For the remainder of this section, we will work on improving this code.
 >> At the end, we will revisit your list to check that you have learnt ways to address each of the
 >> problems you had found.
+>>
+>> There may be other things to improve with the code on this branch, e.g. how command line
+>> parameters are being handled in `inflammation-analysis.py`, but we are focussing on
+>> the `analyse_data()` function for the time being.
 > {: .solution}
{: .challenge}

diff --git a/_episodes/34-code-abstractions.md b/_episodes/34-code-abstractions.md
index fbd55d6e9..f49320652 100644
--- a/_episodes/34-code-abstractions.md
+++ b/_episodes/34-code-abstractions.md
@@ -426,12 +426,12 @@
 All these different programming paradigms provide varied approaches to structuring your code,
 each with certain strengths and weaknesses when used to solve particular types of problems.
 In many cases, particularly with modern languages, a single language can allow many different
 structural approaches and mixing programming paradigms within your code.
-Once your software begins to get more complex - it is common to use aspects of [different paradigm](/software-architecture-paradigms/index.html)
+Once your software begins to get more complex - it is common to use aspects of [different paradigms](/programming-paradigms/index.html)
 to handle different subtasks.
-Because of this, it is useful to know about the [major paradigms](/software-architecture-paradigms/index.html),
+Because of this, it is useful to know about the [major paradigms](/programming-paradigms/index.html),
 so you can recognise where it might be useful to switch.
 This is outside of scope of this course - we have some extra episodes on the topics of
-[Procedural Programming](/software-architecture-paradigms/index.html#procedural-programming),
+[Procedural Programming](/programming-paradigms/index.html#procedural-programming),
 [Functional Programming](/functional-programming/index.html) and
 [Object Oriented Programming](/object-oriented-programming/index.html)
 if you want to know more.
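As a brief illustrative aside (hypothetical code, not part of the lesson's project), the point above about mixing paradigms can be seen in a few lines: Python lets the same small calculation be written procedurally or functionally, even within one file.

```python
# Total of all inflammation values above a threshold, written two ways.
data = [3, 7, 1, 9, 4]
threshold = 3

# Procedural style: an explicit loop with a mutable accumulator
total = 0
for value in data:
    if value > threshold:
        total += value
print(total)  # 20

# Functional style: compose expressions over the data, no mutation
print(sum(v for v in data if v > threshold))  # 20
```

Neither version is "more correct"; the procedural one makes the control flow explicit, while the functional one is shorter and has no mutable state to reason about.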
diff --git a/_episodes/35-software-architecture.md b/_episodes/35-software-architecture.md
index 3fad1388d..e96b0d8e0 100644
--- a/_episodes/35-software-architecture.md
+++ b/_episodes/35-software-architecture.md
@@ -3,94 +3,133 @@
 title: "Software Architecture"
 teaching: 15
 exercises: 50
 questions:
-- "What is the point of the MVC architecture"
-- "How to design larger solutions."
-- "How to tell what is and isn't an appropriate abstraction."
+- "What is software architecture?"
+- "What are the components of the Model-View-Controller (MVC) architecture?"
 objectives:
-- "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software."
-- "How to design large changes to the codebase."
-- "Understand how to determine correct abstractions."
+- "Understand the use of common design patterns to improve the extensibility, reusability and
+overall quality of software."
+- "List some best practices when designing software."
 keypoints:
-- "By splitting up the \"view\" code from \"model\" code, you allow easier re-use of code."
-- "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful."
-- "Sketching a diagram of the code can clarify how it is supposed to work, and troubleshoot problems early."
+- "Try to leave the code in a better state than you found it."
 ---
-## Introduction
-Separating Responsibilities
-
-Model-View-Controller (MVC) is a way of separating out different responsibilities of a typical
-application. Specifically we have:
-
-* The **model** which is responsible for the internal data representations for the program,
-  and the valid operations that can be performed on it.
-* The **view** is responsible for how this data is presented to the user (e.g. through a GUI or
-  by writing out to a file)
-* The **controller** is responsible for how the model can be interacted with.
- -Separating out these different responsibilities into different parts of the code will make -the code much more maintainable. -For example, if the view code is kept away from the model code, then testing the model code -can be done without having to worry about how it will be presented. - -It helps with readability, as it makes it easier to have each function doing -just one thing. - -It also helps with maintainability - if the UI requirements change, these changes -are easily isolated from the more complex logic. - -## Separating out responsibilities - -The key thing to take away from MVC is the distinction between model code and view code. - -> ## What about the controller -> The view and the controller tend to be more tightly coupled and it isn't always sensible -> to draw a thick line dividing these two. Depending on how the user interacts with the software -> this distinction may not be possible (the code that specifies there is a button on the screen, -> might be the same code that specifies what that button does). In fact, the original proposer -> of MVC groups the views and the controller into a single element, called the tool. Other modern -> architectures like Model-View-Presenter do away with the controller and instead separate out the -> layout code from a programmable view of the UI. -{: .callout} - -The view code might be hard to test, or use libraries to draw the UI, but should -not contain any complex logic, and is really just a presentation layer on top of the model. - -The model, conversely, should not really care how the data is displayed. -For example, perhaps the UI always presents dates as "Monday 24th July 2023", but the model -would still store this using a `Date` rather than just that string. - -> ## Exercise: Identify model and view parts of the code -> Looking at the code inside `compute_data.py`, -> -> * What parts should be considered **model** code -> * What parts should be considered **view** code? 
-> * What parts should be considered **controller** code?
->
->> ## Solution
->> * The computation of the standard deviation is **model** code
->> * Reading the data from the CSV is also **model** code.
->> * The display of the output as a graph is the **view** code.
->> * The logic that processes the supplied flats is the **controller**.
-> {: .solution}
-{: .challenge}
-
+## Software Architecture
+
+A software architecture is the fundamental structure of a software system
+that is typically decided at the beginning of project development
+based on its requirements and is not that easy to change once implemented.
+It refers to a "bigger picture" of a software system
+that describes high-level components (modules) of the system, what their functionality/roles are
+and how they interact.
+
+There are various [software architectures](/software-architecture-extra/index.html), each defining different ways of
+dividing the code into smaller modules with well-defined roles; covering them all is outside the scope of
+this course.
+We have been developing our software using the **Model-View-Controller** (MVC) architecture,
+but MVC is just one of the common architectural patterns
+and is not the only choice we could have made.
+
+### Model-View-Controller (MVC) Architecture
+
+MVC architecture divides the related program logic
+into three interconnected modules:
+
+- **Model** (data)
+- **View** (client interface), and
+- **Controller** (processes that handle input/output and manipulate the data).
+
+Model represents the data used by a program and also contains operations/rules
+for manipulating and changing the data in the model.
+This may be a database, a file, a single data object or a series of objects -
+for example a table representing patients' data.
+
+View is the means of displaying data to users/clients within an application
+(i.e. provides visualisation of the state of the model).
+For example, displaying a window with input fields and buttons (Graphical User Interface, GUI)
+or textual options within a command line (Command Line Interface, CLI) are examples of Views.
+They include anything that the user can see from the application.
+While building GUIs is not the topic of this course,
+we do cover building CLIs (handling command line arguments) in Python to a certain extent.
+ +Controller manipulates both the Model and the View. +It accepts input from the View +and performs the corresponding action on the Model (changing the state of the model) +and then updates the View accordingly. +For example, on user request, +Controller updates a picture on a user's GitHub profile +and then modifies the View by displaying the updated profile back to the user. + +### Separation of Responsibilities + +Separation of responsibilities is important when designing software architectures +in order to reduce the code's complexity and increase its maintainability. +Note, however, there are limits to everything - +and MVC architecture is no exception. +Controller often transcends into Model and View +and a clear separation is sometimes difficult to maintain. +For example, the Command Line Interface provides both the View +(what user sees and how they interact with the command line) +and the Controller (invoking of a command) aspects of a CLI application. +In Web applications, Controller often manipulates the data (received from the Model) +before displaying it to the user or passing it from the user to the Model. + +There are many variants of an MVC-like pattern +(such as [Model-View-Presenter](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93presenter) (MVP), +[Model-View-Viewmodel](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93viewmodel) (MVVM), etc.), +where the Controller role is handled slightly differently, +but in most cases, the distinction between these patterns is not particularly important. +What really matters is that we are making conscious decisions about the architecture of our software +that suit the way in which we expect to use it. +We should reuse these established ideas where we can, but we do not need to stick to them exactly. + +The key thing to take away is the distinction between the Model and the View code, while +the View and the Controller can be more or less coupled together (e.g. 
the code that specifies +there is a button on the screen, might be the same code that specifies what that button does). +The View may be hard to test, or use special libraries to draw the UI, but should not contain any +complex logic, and is really just a presentation layer on top of the Model. +The Model, conversely, should not care how the data is displayed. +For example, the View may present dates as "Monday 24th July 2023", +but the Model stores it using a `Date` object rather than its string representation. + +## Our Project's Architecture (Revisited) + +Recall that in our software project, the **Controller** module is in `inflammation-analysis.py`, +and the View and Model modules are contained in +`inflammation/views.py` and `inflammation/models.py`, respectively. +Data underlying the Model is contained within the directory `data`. + +Looking at the code in the branch `full-data-analysis` (where we should be currently located), +we can notice that the new code was added in a separate script `inflammation/compute_data.py` and +contains a mix of Model, View and Controller code. + +> ## Exercise: Identify Model, View and Controller Parts of the Code +> Looking at the code inside `compute_data.py`, what parts could be considered +> Model, View and Controller code? > >> ## Solution ->> * The computation of the standard deviation is **model** code ->> * Reading the data from the CSV is also **model** code. ->> * The display of the output as a graph is the **view** code. ->> * The logic that processes the supplied flats is the **controller**. +>> * Computing the standard deviation belongs to Model. +>> * Reading the data from CSV files also belongs to Model. +>> * Displaying of the output as a graph is View. +>> * The logic that processes the supplied files is Controller. > {: .solution} {: .challenge} -Within the model there is further separation that makes sense. 
-For example, as we did in the last episode, separating out the impure code that interacts with file systems from
-the pure calculations is helps with readability and testability.
-Nevertheless, the MVC approach is a great starting point when thinking about how you should structure your code.
+Within the Model, further separation makes sense.
+For example, as we did before, separating out the impure code that interacts with
+the file system from the pure calculations helps with readability and testability.
+Nevertheless, the MVC architectural pattern is a great starting point when thinking about
+how you should structure your code.
 
-> ## Exercise: Split out the model code from the view code
-> Refactor `analyse_data` such the *view* code we identified in the last
-> exercise is removed from the function, so the function contains only
-> *model* code, and the *view* code is moved elsewhere.
+> ## Exercise: Split out the Model, View and Controller Code
+> Refactor the `analyse_data()` function so that the Model, View and Controller code
+> we identified in the previous exercise is moved to appropriate modules.
 >> ## Solution
->> The idea here is to have `analyse_data` to not have any "view" considerations.
->> That is, it should just compute and return the data.
+>> The idea here is for the `analyse_data()` function not to have any "view" considerations.
+>> That is, it should just compute and return the data and
+>> should be located in `inflammation/models.py`.
>> >> ```python ->> def analyse_data(data_dir): +>> def analyse_data(data_source): >> """Calculate the standard deviation by day between datasets >> Gets all the inflammation csvs within a directory, works out the mean >> inflammation value for each day across all datasets, then graphs the @@ -100,7 +139,8 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> >> return daily_standard_deviation >> ``` ->> There can be a separate bit of code that chooses how that should be presented, e.g. as a graph: +>> There can be a separate bit of code in the Controller `inflammation-analysis.py` +>> that chooses how data should be presented, e.g. as a graph: >> >> ```python >> if args.full_data_analysis: @@ -111,144 +151,321 @@ Nevertheless, the MVC approach is a great starting point when thinking about how >> data_source = CSVDataSource(os.path.dirname(InFiles[0])) >> else: >> raise ValueError(f'Unsupported file format: {extension}') ->> analyse_data(data_source) +>> data_result = analyse_data(data_source) >> graph_data = { >> 'standard deviation by day': data_result, >> } >> views.visualize(graph_data) >> return >> ``` ->> You might notice this is more-or-less the change we did to write our ->> regression test. ->> This demonstrates that splitting up model code from view code can +>> Note that this is, more or less, the change we did to write our regression test. +>> This demonstrates that splitting up Model code from View code can >> immediately make your code much more testable. >> Ensure you re-run our regression test to check this refactoring has not ->> changed the output of `analyse_data`. +>> changed the output of `analyse_data()`. > {: .solution} {: .challenge} -## Programming patterns - -MVC is a **programming pattern**. Programming patterns are templates for structuring code. -Patterns are a useful starting point for how to design your software. 
-They also work as a common vocabulary for discussing software designs with
-other developers.
-
-The Refactoring Guru website has a [list of programming patterns](https://refactoring.guru/design-patterns/catalog).
-They aren't all good design decisions, and can certainly be over-applied, but learning about them can be helpful
-for thinking at a big picture level about software design.
-
-For example, the [visitor pattern](https://refactoring.guru/design-patterns/visitor) is
-a good way of separating the problem of how to move through the data
-from a specific action you want to perform on the data.
-
-By having a terminology for these approaches can facilitate discussions
-where everyone is familiar with them.
-However, they cannot replace a full design as most problems will require
-a bespoke design that maps cleanly on to the specific problem you are
-trying to solve.
-
-## Architecting larger changes
+At this point, you have refactored and tested all the code on branch `full-data-analysis`
+and it is working as expected. The branch is ready to be incorporated into `develop`
+and then, later on, `main`, which may also have been changed by other developers working on
+the code at the same time, so make sure to update accordingly or resolve any conflicts.
+
+~~~
+$ git switch develop
+$ git merge full-data-analysis
+~~~
+{: .language-bash}
+
+Let's now have a closer look at our Controller, and at how to handle command line arguments
+in Python (something you may find yourself doing often if you need to run your code from a
+command line tool).
+ +### Controller Structure + +You will have noticed already that structure of the `inflammation-analysis.py` file +follows this pattern: + +~~~ +# import modules + +def main(args): + # perform some actions + +if __name__ == "__main__": + # perform some actions before main() + main(args) +~~~ +{: .language-python} + +In this pattern the actions performed by the script are contained within the `main` function +(which does not need to be called `main`, +but using this convention helps others in understanding your code). +The `main` function is then called within the `if` statement `__name__ == "__main__"`, +after some other actions have been performed +(usually the parsing of command-line arguments, which will be explained below). +`__name__` is a special dunder variable which is set, +along with a number of other special dunder variables, +by the python interpreter before the execution of any code in the source file. +What value is given by the interpreter to `__name__` is determined by +the manner in which it is loaded. + +If we run the source file directly using the Python interpreter, e.g.: + +~~~ +$ python3 inflammation-analysis.py +~~~ +{: .language-bash} + +then the interpreter will assign the hard-coded string `"__main__"` to the `__name__` variable: + +~~~ +__name__ = "__main__" +... +# rest of your code +~~~ +{: .language-python} + +However, if your source file is imported by another Python script, e.g: + +~~~ +import inflammation-analysis +~~~ +{: .language-python} + +then the interpreter will assign the name `"inflammation-analysis"` +from the import statement to the `__name__` variable: + +~~~ +__name__ = "inflammation-analysis" +... 
+# rest of your code +~~~ +{: .language-python} + +Because of this behaviour of the interpreter, +we can put any code that should only be executed when running the script +directly within the `if __name__ == "__main__":` structure, +allowing the rest of the code within the script to be +safely imported by another script if we so wish. + +While it may not seem very useful to have your controller script importable by another script, +there are a number of situations in which you would want to do this: + +- for testing of your code, you can have your testing framework import the main script, + and run special test functions which then call the `main` function directly; +- where you want to not only be able to run your script from the command-line, + but also provide a programmer-friendly application programming interface (API) for advanced users. + +### Passing Command-line Options to Controller + +The standard Python library for reading command line arguments passed to a script is +[`argparse`](https://docs.python.org/3/library/argparse.html). +This module reads arguments passed by the system, +and enables the automatic generation of help and usage messages. +These include, as we saw at the start of this course, +the generation of helpful error messages when users give the program invalid arguments. + +The basic usage of `argparse` can be seen in the `inflammation-analysis.py` script. +First we import the library: + +~~~ +import argparse +~~~ +{: .language-python} + +We then initialise the argument parser class, passing an (optional) description of the program: + +~~~ +parser = argparse.ArgumentParser( + description='A basic patient inflammation data management system') +~~~ +{: .language-python} + +Once the parser has been initialised we can add +the arguments that we want argparse to look out for. 
+In our basic case, we want only the names of the file(s) to process: + +~~~ +parser.add_argument( + 'infiles', + nargs='+', + help='Input CSV(s) containing inflammation series for each patient') +~~~ +{: .language-python} + +Here we have defined what the argument will be called (`'infiles'`) when it is read in; +the number of arguments to be expected +(`nargs='+'`, where `'+'` indicates that there should be 1 or more arguments passed); +and a help string for the user +(`help='Input CSV(s) containing inflammation series for each patient'`). + +You can add as many arguments as you wish, +and these can be either mandatory (as the one above) or optional. +Most of the complexity in using `argparse` is in adding the correct argument options, +and we will explain how to do this in more detail below. + +Finally we parse the arguments passed to the script using: + +~~~ +args = parser.parse_args() +~~~ +{: .language-python} + +This returns an object (that we have called `args`) containing all the arguments requested. +These can be accessed using the names that we have defined for each argument, +e.g. `args.infiles` would return the filenames that have been input. + +The help for the script can be accessed using the `-h` or `--help` optional argument +(which `argparse` includes by default): + +~~~ +$ python3 inflammation-analysis.py --help +~~~ +{: .language-bash} + +~~~ +usage: inflammation-analysis.py [-h] infiles [infiles ...] + +A basic patient inflammation data management system + +positional arguments: + infiles Input CSV(s) containing inflammation series for each patient + +optional arguments: + -h, --help show this help message and exit +~~~ +{: .output} + +The help page starts with the command line usage, +illustrating what inputs can be given (any within `[]` brackets are optional). +It then lists the **positional** and **optional** arguments, +giving as detailed a description of each as you have added to the `add_argument()` command. 
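Optional arguments can be added alongside positional ones. As a brief, self-contained sketch of this (note that the `--plot` flag below is purely hypothetical and is not part of `inflammation-analysis.py`):

```python
# A minimal, self-contained sketch of mixing a positional argument with an
# optional flag (the '--plot' flag is a hypothetical example, not a real
# option of the project's script).
import argparse

parser = argparse.ArgumentParser(
    description='A basic patient inflammation data management system')
parser.add_argument(
    'infiles',
    nargs='+',
    help='Input CSV(s) containing inflammation series for each patient')
parser.add_argument(
    '--plot',
    action='store_true',
    help='Display plots of the analysis (hypothetical flag, for illustration)')

# passing an explicit list instead of reading sys.argv, so this sketch is testable
args = parser.parse_args(['data1.csv', 'data2.csv', '--plot'])
print(args.infiles)  # ['data1.csv', 'data2.csv']
print(args.plot)     # True
```

Here `nargs='+'` collects one or more positional values into a list, while `action='store_true'` makes the flag a simple boolean switch.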
+Positional arguments are arguments that need to be included
+in the proper position or order when calling the script.
+
+Note that optional arguments are indicated by `-` or `--`, followed by the argument name.
+Positional arguments are simply inferred by their position.
+It is possible to have multiple positional arguments,
+but usually this is only practical where all (or all but one) positional arguments
+contain a clearly defined number of elements.
+If more than one option can have an indeterminate number of entries,
+then it is better to create them as 'optional' arguments.
+These can be made a required input though,
+by setting `required = True` within the `add_argument()` command.
+
+> ## Positional and Optional Argument Order
+>
+> The usage section of the help page above shows
+> the optional arguments going before the positional arguments.
+> This is the customary way to present options, but is not mandatory.
+> Instead there are two rules which must be followed for these arguments:
+>
+> 1. Positional and optional arguments must each be given all together, and not inter-mixed.
+     For example, the order can be either "optional, positional" or "positional, optional",
+     but not "optional, positional, optional".
+> 2. Positional arguments must be given in the order that they are shown
+in the usage section of the help page.
+{: .callout}

-When creating a new application, or creating a substantial change to an existing one,
-it can be really helpful to sketch out the intended architecture on a whiteboard
-(pen and paper works too, though of course it might get messy as you iterate on the design!).
+## Architecting Software
+
+When designing a new software application, or making a substantial change to an existing one,
+it can be really helpful to sketch out the intended architecture.
 The basic idea is you draw boxes that will represent different units of code, as well as
-other components of the system (such as users, databases etc).
+other components of the system (such as users, databases, etc). Then connect these boxes with lines where information or control will be exchanged. These lines represent the interfaces in your system. As well as helping to visualise the work, doing this sketch can troubleshoot potential issues. For example, if there is a circular dependency between two sections of the design. -It can also help with estimating how long the work will take, as it forces you to consider all the components that -need to be made. - -Diagrams aren't foolproof, and often the stuff we haven't considered won't make it on to the diagram -but they are a great starting point to break down the different responsibilities and think about -the kinds of information different parts of the system will need. +It can also help with estimating how long the work will take, as it forces you to consider all +the components that need to be made. +Diagrams are not foolproof, but are a great starting point to break down the different +responsibilities and think about the kinds of information different parts of the system will need. -> ## Exercise: Design a high-level architecture -> Sketch out a design for a new feature requested by a user +> ## Exercise: Design a High-Level Architecture for a New Requirement +> Sketch out an architectural design for a new feature requested by a user. > -> *"I want there to be a Google Drive folder that when I upload new inflammation data to +> *"I want there to be a Google Drive folder such that when I upload new inflammation data to it, > the software automatically pulls it down and updates the analysis. > The new result should be added to a database with a timestamp. -> An email should then be sent to a group email notifying them of the change."* +> An email should then be sent to a group mailing list notifying them of the change."* +> +> You can draw by hand on a piece of paper or whiteboard, or use an online drawing tool +> such as [Excalidraw](https://excalidraw.com/). 
>> ## Solution
>>
->> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg)
+>> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg){: width="600px" }
> {: .solution}
{: .challenge}

-## An abstraction too far
-
-So far we have seen how abstractions are good for making code easier to read, maintain and test.
-However, it is possible to introduce too many abstractions.
-
-> All problems in computer science can be solved by another level of indirection except the problem of too many levels of indirection
-
-When you introduce an abstraction, if the reader of the code needs to understand what is happening inside the abstraction,
-it has actually made the code *harder* to read.
-When code is just in the function, it can be clear to see what it is doing.
-When the code is calling out to an instance of a class that, thanks to polymorphism, could be a range of possible implementations,
-the only way to find out what is *actually* being called is to run the code and see.
-This is much slower to understand, and actually obfuscates meaning.
-
-It is a judgement as to whether you have make the code too abstract.
-If you have to jump around a lot when reading the code that is a clue that is too abstract.
-Similarly, if there are two parts of the code that always need updating together, that is
-again an indication of an incorrect or over-zealous abstraction.
-
+### Architectural & Programming Patterns

-## You Ain't Gonna Need It
+[Architectural](https://www.redhat.com/architect/14-software-architecture-patterns) and
+[programming patterns](https://refactoring.guru/design-patterns/catalog) are reusable templates for
+software systems and code that provide solutions for some common software design challenges.
+MVC is one architectural pattern.
+Patterns are a useful starting point for how to design your software and also provide +a common vocabulary for discussing software designs with other developers. +They may not always provide a full design solution as some problems may require +a bespoke design that maps cleanly on to the specific problem you are trying to solve. -There are different approaches to designing software. -One principle that is popular is called You Ain't Gonna Need it - "YAGNI" for short. -The idea is that, since it is hard to predict the future needs of a piece of software, -it is always best to design the simplest solution that solves the problem at hand. -This is opposed to trying to imagine how you might want to adapt the software in future -and designing the code with that in mind. +### Design Guidelines -Then, since you know the problem you are trying to solve, you can avoid making your solution unnecessarily complex or abstracted. - -In our example, it might be tempting to abstract how the `CSVDataSource` walks the file tree into a class. -However, since we only have one strategy for exploring the file tree, this would just create indirection for the sake of it -- now a reader of CSVDataSource would have to read a different class to find out how the tree is walked. -Maybe in the future this is something that needs to be customised, but we haven't really made it any harder to do by *not* doing this prematurely -and once we have the concrete feature request, it will be easier to design it appropriately. - -> All of this is a judgement. -> For example, in this case, perhaps it *would* make sense to at least pull the file parsing out into a separate -> class, but not have the CSVDataSource be configurable. -> That way, it is clear to see how the file tree is being walked (there's no polymorphism going on) -> without mixing the *parsing* code in with the file finding code. -> There are no right answers, just guidelines. 
-{: .callout}
-
-> ## Exercise: Applying to real world examples
-> Thinking about the examples of good and bad code you identified at the start of the episode.
-> Identify what kind of principles were and weren't being followed
-> Identify some refactorings that could be performed that would improve the code
-> Discuss the ideas as a group.
-{: .challenge}
-
-## Conclusion
-
-Good architecture is not about applying any rules blindly, but instead practise and taking care around important things:
+Creating good software architecture is not about applying any rules or patterns blindly,
+but about practice and taking care to:
+* Discuss design with your colleagues before writing the code.
+* Separate different concerns into different sections of the code.
 * Avoid duplication of code or data.
-* Keeping how much a person has to understand at once to a minimum.
-* Think about how interfaces will work.
-* Separate different considerations into different sections of the code.
-* Don't try and design a future proof solution, focus on the problem at hand.
-
-Practise makes perfect.
-One way to practise is to consider code that you already have and think how it might be redesigned.
-Another way is to always try to leave code in a better state that you found it.
-So when you're working on a less well structured part of the code, start by refactoring it so that your change fits in cleanly.
-Doing this, over time, with your colleagues, will improve your skills as software architecture as well as improving the code.
-
+* Keep how much a person has to understand at once to a minimum.
+* Try not to have too many abstractions (if you have to jump around a lot when reading the
+code that is a clue that your code may be too abstract).
+* Think about how interfaces will work.
+* Not try to design a future-proof solution or to anticipate future requirements or adaptations
+of the software - design the simplest solution that solves the problem at hand.
+* When working on a less well-structured part of the code, start by refactoring it
+so that your change fits in cleanly.
+* Try to leave the code in a better state than you found it.
+
+
+### Additional Reading Material & References
+
+Now that we have covered the basics of [software architecture](/software-architecture-extra/index.html)
+and [different programming paradigms](/programming-paradigms/index.html)
+and how we can integrate them into our multi-layer architecture,
+there are two optional extra episodes which you may find interesting.
+
+Both episodes cover the persistence layer of software architectures
+and methods of persistently storing data, but take different approaches.
+The episode on [persistence with JSON](../persistence) covers
+some more advanced concepts in Object Oriented Programming, while
+the episode on [databases](../databases) starts to build towards a true multilayer architecture,
+which would allow our software to handle much larger quantities of data.
+
+## Towards Collaborative Software Development
+
+Having looked at some aspects of software design and architecture,
+we are now circling back to implementing our software design
+and developing our software to satisfy the requirements collaboratively in a team.
+At an intermediate level of software development,
+there is a wealth of practices that could be used,
+and applying suitable design and coding practices is what separates
+an intermediate developer from someone who has just started coding.
+The key for an intermediate developer is to balance these concerns
+for each software project appropriately,
+and employ design and development practices enough so that progress can be made.
+
+One practice that should always be considered,
+and has been shown to be very effective in team-based software development,
+is that of *code review*.
+Code reviews help to ensure the 'good' coding standards are achieved +and maintained within a team by having multiple people +have a look and comment on key code changes to see how they fit within the codebase. +Such reviews check the correctness of the new code, test coverage, functionality changes, +and confirm that they follow the coding guides and best practices. +Let's have a look at some code review techniques available to us. {% include links.md %} diff --git a/_extras/software-architecture-paradigms.md b/_extras/programming-paradigms.md similarity index 71% rename from _extras/software-architecture-paradigms.md rename to _extras/programming-paradigms.md index 359bc25e6..25c0175e0 100644 --- a/_extras/software-architecture-paradigms.md +++ b/_extras/programming-paradigms.md @@ -1,13 +1,11 @@ --- -title: "Software Architecture and Programming Paradigms" -teaching: 30 +title: "Programming Paradigms" +teaching: 20 exercises: 0 layout: episode questions: - "What should we consider when designing software?" objectives: -- "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software." -- "Understand the components of multi-layer software architectures." - "Describe some of the major software paradigms we can use to classify programming languages." keypoints: - "A software paradigm describes a way of structuring or reasoning about code." @@ -16,73 +14,9 @@ keypoints: - "A single piece of software will often contain instances of multiple paradigms." --- -## Software Architecture - -**Software architecture** provides an answer to the question -"what components will the software have and how will they cooperate?". -Software engineering borrowed this term, and a few other terms, -from architects (of buildings) as many of the processes and techniques have some similarities. -One of the other important terms we borrowed is 'pattern', -such as in **design patterns** and **architecture patterns**. 
-This term is often attributed to the book -['A Pattern Language' by Christopher Alexander *et al.*](https://en.wikipedia.org/wiki/A_Pattern_Language) -published in 1977 -and refers to a template solution to a problem commonly encountered when building a system. - -Design patterns are relatively small-scale templates -which we can use to solve problems which affect a small part of our software. -For example, the **[adapter pattern](https://en.wikipedia.org/wiki/Adapter_pattern)** -(which allows a class that does not have the "right interface" to be reused) -may be useful if part of our software needs to consume data -from a number of different external data sources. -Using this pattern, -we can create a component whose responsibility is -transforming the calls for data to the expected format, -so the rest of our program doesn't have to worry about it. - -Architecture patterns are similar, -but larger scale templates which operate at the level of whole programs, -or collections or programs. -Model-View-Controller (which we chose for our project) is one of the best known architecture patterns. -Many patterns rely on concepts from Object Oriented Programming, -so we'll come back to the MVC pattern shortly -after we learn a bit more about Object Oriented Programming. - -There are many online sources of information about design and architecture patterns, -often giving concrete examples of cases where they may be useful. -One particularly good source is [Refactoring Guru](https://refactoring.guru/design-patterns). - -### Multilayer Architecture - -One common architectural pattern for larger software projects is **Multilayer Architecture**. -Software designed using this architecture pattern is split into layers, -each of which is responsible for a different part of the process of manipulating data. 
- -Often, the software is split into three layers: - -- **Presentation Layer** - - This layer is responsible for managing the interaction between - our software and the people using it - - May include the **View** components if also using the MVC pattern -- **Application Layer / Business Logic Layer** - - This layer performs most of the data processing required by the presentation layer - - Likely to include the **Controller** components if also using an MVC pattern - - May also include the **Model** components -- **Persistence Layer / Data Access Layer** - - This layer handles data storage and provides data to the rest of the system - - May include the **Model** components of an MVC pattern - if they're not in the application layer - -Although we've drawn similarities here between the layers of a system and the components of MVC, -they're actually solutions to different scales of problem. -In a small application, a multilayer architecture is unlikely to be necessary, -whereas in a very large application, -the MVC pattern may be used just within the presentation layer, -to handle getting data to and from the people using the software. - ## Programming Paradigms -In addition to architectural decisions on bigger components of your code, it is important +In addition to [architectural decisions](/software-architecture-extra/index.html) on bigger components of your code, it is important to understand the wider landscape of programming paradigms and languages, with each supporting at least one way to approach a problem and structure your code. In many cases, particularly with modern languages, @@ -237,11 +171,8 @@ for much more information see the Wikipedia's page on [programming paradigms](https://en.wikipedia.org/wiki/Programming_paradigm). 
We have mainly used Procedural Programming in this lesson, but you can -have a closer look at [Functional](/functional-programming/index.html) and -[Object Oriented Programming](/object-oriented-programming/index.html) paradigms +have a closer look at [Functional](/functional-programming/index.html) and +[Object Oriented Programming](/object-oriented-programming/index.html) paradigms in extra episodes and how they can affect our architectural design choices. {% include links.md %} - - -{% include links.md %} diff --git a/_extras/software-architecture-extra.md b/_extras/software-architecture-extra.md new file mode 100644 index 000000000..5197d7d5b --- /dev/null +++ b/_extras/software-architecture-extra.md @@ -0,0 +1,77 @@ +--- +title: "Software Architecture" +teaching: 15 +exercises: 0 +layout: episode +questions: +- "What should we consider when designing software?" +objectives: +- "Understand the components of multi-layer software architectures." +keypoints: +- "Software architecture provides an answer to the question +'what components will the software have and how will they cooperate?'." +--- + +## Software Architecture + +**Software architecture** provides an answer to the question +"what components will the software have and how will they cooperate?". +Software engineering borrowed this term, and a few other terms, +from architects (of buildings) as many of the processes and techniques have some similarities. +One of the other important terms we borrowed is 'pattern', +such as in **design patterns** and **architecture patterns**. +This term is often attributed to the book +['A Pattern Language' by Christopher Alexander *et al.*](https://en.wikipedia.org/wiki/A_Pattern_Language) +published in 1977 +and refers to a template solution to a problem commonly encountered when building a system. + +Design patterns are relatively small-scale templates +which we can use to solve problems which affect a small part of our software. 
+For example, the **[adapter pattern](https://en.wikipedia.org/wiki/Adapter_pattern)**
+(which allows a class that does not have the "right interface" to be reused)
+may be useful if part of our software needs to consume data
+from a number of different external data sources.
+Using this pattern,
+we can create a component whose responsibility is
+transforming the calls for data to the expected format,
+so the rest of our program doesn't have to worry about it.
+
+Architecture patterns are similar,
+but larger scale templates which operate at the level of whole programs,
+or collections of programs.
+Model-View-Controller (which we chose for our project) is one of the best known architecture patterns.
+Many patterns rely on concepts from [Object Oriented Programming](/object-oriented-programming/index.html).
+
+There are many online sources of information about design and architecture patterns,
+often giving concrete examples of cases where they may be useful.
+One particularly good source is [Refactoring Guru](https://refactoring.guru/design-patterns).
+
+### Multilayer Architecture
+
+One common architectural pattern for larger software projects is **Multilayer Architecture**.
+Software designed using this architecture pattern is split into layers,
+each of which is responsible for a different part of the process of manipulating data.
+ +Often, the software is split into three layers: + +- **Presentation Layer** + - This layer is responsible for managing the interaction between + our software and the people using it + - May include the **View** components if also using the MVC pattern +- **Application Layer / Business Logic Layer** + - This layer performs most of the data processing required by the presentation layer + - Likely to include the **Controller** components if also using an MVC pattern + - May also include the **Model** components +- **Persistence Layer / Data Access Layer** + - This layer handles data storage and provides data to the rest of the system + - May include the **Model** components of an MVC pattern + if they're not in the application layer + +Although we've drawn similarities here between the layers of a system and the components of MVC, +they're actually solutions to different scales of problem. +In a small application, a multilayer architecture is unlikely to be necessary, +whereas in a very large application, +the MVC pattern may be used just within the presentation layer, +to handle getting data to and from the people using the software. + +{% include links.md %} From 1efc7916c926d5473dc927de92cfc8b739ae63f4 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Fri, 1 Mar 2024 10:28:09 +0000 Subject: [PATCH 090/105] Further updates the episode on architecture --- _episodes/32-software-design.md | 13 ++----------- _episodes/33-code-refactoring.md | 12 +++++++++++- _episodes/34-code-abstractions.md | 19 +++++++++++++++++-- 3 files changed, 30 insertions(+), 14 deletions(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 03ded01d2..8d91a74b0 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -16,7 +16,6 @@ testable through a set of automated tests, adaptable to new requirements." the easier the development and maintenance process will." 
---
-
## Introduction

Ideally, we should have at least a rough design of our software sketched out
@@ -119,7 +118,7 @@ calculates and compares standard deviation across all the data by day and finaly
> ## Exercise: Identifying How Code Can be Improved?
> Critically examine the code in `analyse_data()` function in `compute_data.py` file.
>
-> In what ways does this code not live up to the ideal properties of maintainable code?
+> In what ways does this code not live up to the ideal properties of 'good' code?
> Think about ways in which you find it hard to understand.
> Think about the kinds of changes you might want to make to it, and what would
> make making those changes challenging.
@@ -195,20 +194,13 @@ the entire codebase at once.

### Code Refactoring

-*Code refactoring* is the process of improving the design of existing code -
+*Code refactoring* is the process of improving the design of an existing codebase -
changing the internal structure of code without changing its external behavior,
with the goal of making the code more readable, maintainable, efficient or easier to test.
This can include things such as renaming variables,
reorganising functions to avoid code duplication and increase reuse,
and simplifying conditional statements.

-When faced with an existing piece of code that needs modifying a good refactoring
-process to follow is:
-
-1. Make sure you have tests that verify the current behaviour
-2. Refactor the code
-3. Verify that that the behaviour of the code is identical to that before refactoring.
-
### Code Decoupling

*Code decoupling* is a code design technique that involves breaking a (complex)
software system into smaller components and defining clear interfaces
between these different parts of the system.
This means that a change in one part of the code usually does not require
a change in the other, thereby making its development more efficient and less error prone.
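As a toy sketch of decoupling (the function names below are purely illustrative, not the project's real code), the analysis function here depends only on a small "reader" interface - text in, rows of numbers out - so the data format can change without touching the analysis, and vice versa:

```python
# An illustrative sketch of decoupled code (hypothetical names, not from the
# project). The analysis is coupled only to the reader interface, not to any
# particular file format.
import json

def read_csv_text(text):
    """Parse comma-separated rows of numbers into a list of lists."""
    return [[float(x) for x in line.split(',')] for line in text.strip().splitlines()]

def read_json_text(text):
    """Parse a JSON array of arrays of numbers."""
    return json.loads(text)

def daily_totals(reader, text):
    """Analysis code: it only assumes 'reader' turns text into rows of numbers."""
    return [sum(row) for row in reader(text)]

# the same analysis works with either data source, unchanged
print(daily_totals(read_csv_text, "1,2,3\n4,5,6"))             # [6.0, 15.0]
print(daily_totals(read_json_text, "[[1, 2, 3], [4, 5, 6]]"))  # [6, 15]
```

Swapping in a new data format only requires a new reader function; `daily_totals()` itself does not change.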
-
### Code Abstraction

*Abstraction* is the process of hiding the implementation details of a piece of
diff --git a/_episodes/33-code-refactoring.md b/_episodes/33-code-refactoring.md
index 6bd74c057..ae152dfed 100644
--- a/_episodes/33-code-refactoring.md
+++ b/_episodes/33-code-refactoring.md
@@ -5,11 +5,12 @@ exercises: 20
questions:
- "How do you refactor code without breaking it?"
- "What is decoupled code?"
-- "What are benefits of pure functions?"
+- "What are the benefits of using pure functions in our code?"
objectives:
- "Understand the benefits of code decoupling."
- "Understand the use of regressions tests to avoid breaking existing code when refactoring."
- "Understand the use of pure functions in software design to make the code easier to test."
+- "Refactor a piece of code to separate out 'pure' from 'impure' code."
keypoints:
- "Implementing regression tests before refactoring gives you confidence that your changes
have not broken the code."
@@ -19,6 +20,8 @@ to read, test and maintain."

## Introduction

+*Code refactoring* is the process of improving the design of an existing codebase - for example
+to make it more decoupled.
Recall that *code decoupling* means breaking the system into smaller components and
reducing the interdependence between these components, so that they can be tested and maintained
independently.
Two components of code can be considered **decoupled** if a change in one does not
@@ -33,6 +36,13 @@ is something we should aim for. Benefits of decoupled code include:
* code tends to be easier to maintain, as changes can be isolated from other parts of the code.

+When faced with an existing piece of code that needs modifying a good refactoring
+process to follow is:
+
+1. Make sure you have tests that verify the current behaviour
+2. Refactor the code
+3. Verify that the behaviour of the code is identical to that before refactoring.
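The first and third steps can be sketched as follows, using a toy stand-in for the real `analyse_data()` function (the implementation below is hypothetical, for illustration only):

```python
# A sketch of steps 1 and 3 of the refactoring process: pin down the current
# behaviour with a regression test *before* refactoring, then re-run the same
# test after each change. The analyse_data() here is a hypothetical toy, not
# the project's real function.

def analyse_data(data):
    # current implementation, about to be refactored
    return [sum(row) / len(row) for row in data]

def test_analyse_data_behaviour_unchanged():
    # regression test capturing the behaviour we must preserve
    assert analyse_data([[1, 2, 3], [4, 5, 6]]) == [2.0, 5.0]

test_analyse_data_behaviour_unchanged()  # run before and after refactoring
print("regression test passed")
```

If this test passes both before and after the refactoring, we have some confidence that the external behaviour of the code is unchanged.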
+
+In this episode we will refactor the function `analyse_data()` in `compute_data.py` from
our project in the following two ways:
* add more tests so we can be more confident that future changes will have the
diff --git a/_episodes/34-code-abstractions.md b/_episodes/34-code-abstractions.md
index f49320652..bb078e26e 100644
--- a/_episodes/34-code-abstractions.md
+++ b/_episodes/34-code-abstractions.md
@@ -7,10 +7,15 @@ questions:
- "How can we make sure the components of our software are reusable?"
objectives:
- "Introduce appropriate abstractions to simplify code."
-- "Understand the principles of polymorphism and interfaces."
-- "Be able to use mocks to replace a class in test code."
+- "Understand the principles of encapsulation, polymorphism and interfaces."
+- "Use mocks to replace a class in test code."
keypoints:
- "Classes and interfaces can help decouple code so it is easier to understand, test and maintain."
+- "Encapsulation is bundling related data into a structured component,
+along with the methods that operate on the data. It also provides a mechanism for restricting
+access to that data, hiding the internal representation of the component."
+- "Polymorphism describes the provision of a single interface to entities of different types,
+or the use of a single symbol to represent different types."
---

## Introduction
@@ -261,6 +266,16 @@ on it and it will return a number representing its surface area.

## Polymorphism

+In general, polymorphism is the idea of having multiple implementations/forms/shapes
+of the same abstract concept.
+It is the provision of a single interface to entities of different types,
+or the use of a single symbol to represent multiple different types.
+
+There are [different versions of polymorphism](https://www.bmc.com/blogs/polymorphism-programming/).
+For example, method or operator overloading is one
+type of polymorphism enabling methods and operators to take parameters of different types.
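For instance, in Python the `+` operator is already overloaded for the built-in types, and defining `__add__` overloads it for our own classes as well (the `Measurement` class below is purely illustrative, not part of the lesson's codebase):

```python
# A brief illustrative sketch of operator overloading: the single symbol '+'
# dispatches to different implementations depending on the operand types.
# The Measurement class is a hypothetical example.

class Measurement:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # overload '+' so two Measurement objects can be added together
        return Measurement(self.value + other.value)

print(1 + 2)        # integer addition: 3
print("ab" + "cd")  # string concatenation: abcd
print((Measurement(2) + Measurement(3)).value)  # our own overload: 5
```

The same symbol thus represents several different operations, chosen by the types involved.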
+
+We will have a look at interface-based polymorphism.
In OOP, it is possible to have different object classes that conform to the same interface.
For example, let's have a look at the following class representing a `Rectangle`:
From 0a36e7c851fb2448d658edb3fcefe619918af89f Mon Sep 17 00:00:00 2001
From: Aleksandra Nenadic
Date: Wed, 6 Mar 2024 14:22:31 +0000
Subject: [PATCH 091/105] Fix for extras episode name

---
 _config.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_config.yml b/_config.yml
index 330dba1b2..05fa64eb3 100644
--- a/_config.yml
+++ b/_config.yml
@@ -96,7 +96,7 @@ extras_order:
   - protect-main-branch
   - vscode
   - software-architecture-extra
-  - programing-paradigms
+  - programming-paradigms
   - functional-programming
   - object-oriented-programming
   - persistence
From 77169e24a962c756af452fc195bb32ccf63ab73f Mon Sep 17 00:00:00 2001
From: Steve Crouch
Date: Mon, 25 Mar 2024 11:39:55 +0000
Subject: [PATCH 092/105] #321 - Initial rework inc. section ordering and
 design goals

---
 _episodes/32-software-design.md | 102 ++++++++++++++++----------------
 1 file changed, 51 insertions(+), 51 deletions(-)

diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md
index 8d91a74b0..33b2189ec 100644
--- a/_episodes/32-software-design.md
+++ b/_episodes/32-software-design.md
@@ -52,7 +52,42 @@ requirements of 'good' software and revisit our software's
[MVC architecture](/11-software-project/index.html#software-architecture)
in the context of software design.

+## Poor Design Choices & Technical Debt
+
+When faced with a problem that you need to solve by writing code - it may be tempting to
+skip the design phase and dive straight into coding.
+What happens if you do not follow good software design and development best practices?
+It can lead to accumulated 'technical debt',
+which (according to [Wikipedia](https://en.wikipedia.org/wiki/Technical_debt))
+is the "cost of additional rework caused by choosing an easy (limited) solution now
+instead of using a better approach that would take longer".
+The pressure to achieve project goals can sometimes lead to quick and easy solutions,
+which make the software messier, more complex, and more difficult to understand and maintain.
+The extra effort required to make changes in the future is the interest paid on the (technical) debt.
+It is natural for software to accrue some technical debt,
+but it is important to pay off that debt during a maintenance phase -
+simplifying, clarifying the code, making it easier to understand -
+to keep these interest payments on making changes manageable.
+
+There is only so much time available in a project.
+How much effort should we spend on designing our code properly
+and using good development practices?
+The following [XKCD comic](https://xkcd.com/844/) summarises this tension:
+
+![Writing good code comic](../fig/xkcd-good-code-comic.png){: .image-with-shadow width="400px" }
+
+At an intermediate level there are a wealth of practices that *could* be used,
+and applying suitable design and coding practices is what separates
+an *intermediate developer* from someone who has just started coding.
+The key for an intermediate developer is to balance these concerns
+for each software project appropriately,
+and employ design and development practices *enough* so that progress can be made.
+It is very easy to under-design software,
+but remember it is also possible to over-design it.
+
+## Good Software Design Goals
+
Satisfying the above properties will lead to an overall software design goal of having *maintainable* code, which is: -* *readable* (and understandable) by developers who did not write the code, e.g. by: - * following a consistent coding style and naming conventions - * using meaningful and descriptive names for variables, functions, and classes - * documenting code to describe it does and how it may be used - * using simple control flow to make it easier to follow the code execution - * keeping functions and methods small and focused on a single task (also important for testing) -* *testable* through a set of (preferably automated) tests, e.g. by: - * writing unit, functional, regression tests to verify the code produces +* *understandable* by developers who did not develop the code, +by having a clear and well-considered high-level design (or *architecture*) that separates out the different components and aspects of its function logically +and in a modular way, and having the interactions between these different parts clear, simple, and sufficiently high-level that they do not contravene this design. + * Moving this forward into implementation, *understandable* would mean being consistent in coding style, using sensible naming conventions for functions, classes and variables, documenting and commenting code, simple control flow, and having small functions and methods focused on single tasks. +* *adaptable* by designing the code to be easily modifiable and extensible to satisfy new requirements, +by incorporating points in the modular design where new behaviour can be added in a clear and straightforward manner +(e.g. as individual functions in existing modules, or perhaps at a higher-level as plugins). + * In an implementation sense, this means writing low-coupled/decoupled code where each part of the code has a separate concern, and has the lowest possible dependency on other parts of the code. + This makes it easier to test, update or replace. 
+* *testable* by designing the code in a sufficiently modular way to make it easier to test the functionality within a modular design, +either as a whole or in terms of its individual functions. + * This would carry forward in an implementation sense in two ways. Firstly, having functions sufficiently small to be amenable to individual (ideally automated) test cases, e.g. by writing unit, regression tests to verify the code produces the expected outputs from controlled inputs and exhibits the expected behavior over time - as the code changes -* *adaptable* (easily modifiable and extensible) to satisfy new requirements, e.g. by: - * writing low-coupled/decoupled code where each part of the code has a separate concern and - the lowest possible dependency on other parts of the code making it - easier to test, update or replace - e.g. by separating the "business logic" and "presentation" - layers of the code on the architecture level (recall the [MVC architecture](/11-software-project/index.html#software-architecture)), - or separating "pure" (without side-effects) and "impure" (with side-effects) parts of the code on the - level of functions. + as the code changes. + Secondly, at a higher-level in implementation, this would allow functional tests to be written to create tests to verify entire pathways through the code, from initial software input to testing eventual output. Now that we know what goals we should aspire to, let us take a critical look at the code in our software project and try to identify ways in which it can be improved. Our software project contains a branch `full-data-analysis` with code for a new feature of our -inflammation analysis software. Recall that you can see all your branches as follows: +inflammation analysis software. 
Recall that you can see all your branches as follows: + ~~~ $ git branch --all ~~~ @@ -148,40 +182,6 @@ calculates and compares standard deviation across all the data by day and finaly > {: .solution} {: .challenge} -## Poor Design Choices & Technical Debt - -When faced with a problem that you need to solve by writing code - it may be tempted to -skip the design phase and dive straight into coding. -What happens if you do not follow the good software design and development best practices? -It can lead to accumulated 'technical debt', -which (according to [Wikipedia](https://en.wikipedia.org/wiki/Technical_debt)), -is the "cost of additional rework caused by choosing an easy (limited) solution now -instead of using a better approach that would take longer". -The pressure to achieve project goals can sometimes lead to quick and easy solutions, -which make the software become -more messy, more complex, and more difficult to understand and maintain. -The extra effort required to make changes in the future is the interest paid on the (technical) debt. -It is natural for software to accrue some technical debt, -but it is important to pay off that debt during a maintenance phase - -simplifying, clarifying the code, making it easier to understand - -to keep these interest payments on making changes manageable. - -There is only so much time available in a project. -How much effort should we spend on designing our code properly -and using good development practices? -The following [XKCD comic](https://xkcd.com/844/) summarises this tension: - -![Writing good code comic](../fig/xkcd-good-code-comic.png){: .image-with-shadow width="400px" } - -At an intermediate level there are a wealth of practices that *could* be used, -and applying suitable design and coding practices is what separates -an *intermediate developer* from someone who has just started coding. 
-The key for an intermediate developer is to balance these concerns -for each software project appropriately, -and employ design and development practices *enough* so that progress can be made. -It is very easy to under-design software, -but remember it is also possible to over-design software too. - ## Techniques for Improving Code How code is structured is important for helping people who are developing and maintaining it From c44c2f5bde132092a4649f5f3bcb9b98ebdb301b Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Mon, 25 Mar 2024 14:22:24 +0000 Subject: [PATCH 093/105] Initial swap of refactoring and decoupling/abstractions episodes --- _config.yml | 3 +- _episodes/32-software-design.md | 22 ++-- ....md => 33-code-decoupling-abstractions.md} | 102 +++++++----------- ...-refactoring.md => 34-code-refactoring.md} | 89 +++++++++------ _extras/object-oriented-programming.md | 3 +- _extras/procedural-programming.md | 38 +++++++ _extras/programming-paradigms.md | 40 +++---- 7 files changed, 165 insertions(+), 132 deletions(-) rename _episodes/{34-code-abstractions.md => 33-code-decoupling-abstractions.md} (81%) rename _episodes/{33-code-refactoring.md => 34-code-refactoring.md} (77%) create mode 100644 _extras/procedural-programming.md diff --git a/_config.yml b/_config.yml index 05fa64eb3..6c956248b 100644 --- a/_config.yml +++ b/_config.yml @@ -96,7 +96,8 @@ extras_order: - protect-main-branch - vscode - software-architecture-extra - - programming-paradigms.md + - programming-paradigms + - procedural-programming - functional-programming - object-oriented-programming - persistence diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 8d91a74b0..e22e7a612 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -182,7 +182,7 @@ and employ design and development practices *enough* so that progress can be mad It is very easy to under-design software, but remember it is also possible to over-design 
software too. -## Techniques for Improving Code +## Techniques for Good Software Design How code is structured is important for helping people who are developing and maintaining it to understand and update it. @@ -192,14 +192,7 @@ Such components can be as small as a single function, or be a software package i These smaller components can be understood individually without having to understand the entire codebase at once. -### Code Refactoring - -*Code refactoring* is the process of improving the design of an existing code - -changing the internal structure of code without changing its -external behavior, with the goal of making the code more readable, maintainable, efficient or easier -to test. -This can include things such as renaming variables, reorganising -functions to avoid code duplication and increase reuse, and simplifying conditional statements. +Most commonly used software design techniques are given below. ### Code Decoupling @@ -222,6 +215,17 @@ Abstraction can be achieved through techniques such as *encapsulation*, *inherit *polymorphism*, which we will explore in the next episodes. There are other [abstraction techniques](https://en.wikipedia.org/wiki/Abstraction_(computer_science)) available too. +### Code Refactoring + +*Code refactoring* is the process of improving the design of an *existing codebase* - +changing the internal structure of code without changing its +external behavior, with the goal of making the code more readable, maintainable, efficient or easier +to test. +This can include introducing things such as code decoupling and abstractions, but also +renaming variables, reorganising functions to avoid code duplication and increase reuse, +and simplifying conditional statements. 
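
To make this concrete, here is a minimal sketch of such a refactoring (the function names and data below are made up for illustration and are not from the lesson's codebase): a monolithic function is split into small, decoupled functions without changing its external behaviour.

```python
# Before: statistics and report formatting are tangled together in one
# function, so neither can be tested or changed independently.
def analyse_data_monolithic(readings):
    total = 0
    count = 0
    for row in readings:
        for value in row:
            total += value
            count += 1
    mean = total / count
    return f"mean inflammation: {mean:.2f}"


# After: each concern lives in its own small function. compute_mean() is
# now trivial to unit test, and format_report() can change without
# touching the statistics.
def compute_mean(readings):
    values = [value for row in readings for value in row]
    return sum(values) / len(values)


def format_report(mean):
    return f"mean inflammation: {mean:.2f}"


def analyse_data(readings):
    return format_report(compute_mean(readings))


readings = [[1, 2], [3, 4]]
# Both versions produce the same external behaviour.
print(analyse_data_monolithic(readings))  # mean inflammation: 2.50
print(analyse_data(readings))  # mean inflammation: 2.50
```

A regression test that compares the two versions on the same input is a quick way to check that a refactoring like this has not changed the behaviour.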
+
+
 ## Improving Our Software Design
 
 Refactoring our code to make it more decoupled and to introduce abstractions to
diff --git a/_episodes/34-code-abstractions.md b/_episodes/33-code-decoupling-abstractions.md
similarity index 81%
rename from _episodes/34-code-abstractions.md
rename to _episodes/33-code-decoupling-abstractions.md
index bb078e26e..4f26ee7f8 100644
--- a/_episodes/34-code-abstractions.md
+++ b/_episodes/33-code-decoupling-abstractions.md
@@ -1,16 +1,20 @@
 ---
-title: "Code Abstractions"
+title: "Code Decoupling & Abstractions"
 teaching: 30
 exercises: 45
 questions:
+- "What is decoupled code?"
+- "What are commonly used code abstractions?"
 - "When is it useful to use classes to structure code?"
 - "How can we make sure the components of our software are reusable?"
 objectives:
+- "Understand the benefits of code decoupling."
 - "Introduce appropriate abstractions to simplify code."
 - "Understand the principles of encapsulation, polymorphism and interfaces."
 - "Use mocks to replace a class in test code."
 keypoints:
-- "Classes and interfaces can help decouple code so it is easier to understand, test and maintain."
+- "Classes and interfaces can help decouple code by abstracting out/hiding certain details of the
+code so it is easier to understand, test and maintain."
 - "Encapsulation is bundling related data into a structured component,
 along with the methods that operate on the data.
 It also provides a mechanism for restricting the access to that data,
 hiding the internal representation of the component."
@@ -20,6 +24,20 @@ or the use of a single symbol to represent different types."
 
 ## Introduction
 
+*Code decoupling* means breaking the system into smaller components and reducing the
+interdependence between these components, so that they can be tested and maintained independently.
+Two components of code can be considered **decoupled** if a change in one does not
+necessitate a change in the other.
+While two connected units cannot always be totally decoupled, **loose coupling**
+is something we should aim for. Benefits of decoupled code include:
+
+* code is easier to read, as you do not need to understand the
+ details of the other component.
+* code is easier to test, as one of the components can be replaced
+ by a test or a mock version of it.
+* code tends to be easier to maintain, as changes can be isolated
+ from other parts of the code.
+
 *Code abstraction* is the process of hiding the implementation details of a piece of code
 behind an interface - i.e. the details of *how* something works are hidden away,
 leaving us to deal only with *what* it does.
@@ -34,24 +52,26 @@ then it becomes easier for these parts to change independently.
 Let's start redesigning our code by introducing some of the abstraction techniques
 to incrementally improve its design.
 
-You may have noticed that loading data from CSV files in a directory is "baked" into
+In the code from our current branch `full-data-analysis`,
+you may have noticed that loading data from CSV files in a directory is "baked" into
 (i.e. is part of) the `analyse_data()` function.
 This is not strictly a functionality of the data analysis function, so firstly
 let's decouple the data loading into a separate function.
 
 > ## Exercise: Decouple Data Loading from Data Analysis
 > Separate out the data loading functionality from `analyse_data()` into a new function
-> `load_inflammation_data()` that returns all the files to load.
+> `load_inflammation_data()` that returns a list of 2D NumPy arrays with inflammation data
+> loaded from all inflammation CSV files found in a specified directory path.
>> ## Solution
->> The new function `load_inflammation_data()` that reads all the data into the format needed
->> for the analysis should look something like:
+>> The new function `load_inflammation_data()` that reads all the inflammation data into the
+>> format needed for the analysis could look something like:
 >> ```python
 >> def load_inflammation_data(dir_path):
 >>     data_file_paths = glob.glob(os.path.join(dir_path, 'inflammation*.csv'))
 >>     if len(data_file_paths) == 0:
->>         raise ValueError(f"No inflammation csv's found in path {dir_path}")
->>     data = map(models.load_csv, data_file_paths)
->>     return list(data)
+>>         raise ValueError(f"No inflammation CSV files found in path {dir_path}")
+>>     data = map(models.load_csv, data_file_paths) # load inflammation data from CSV files
+>>     return list(data) # return the list of 2D NumPy arrays with inflammation data
 >> ```
 >> This function can now be used in the analysis as follows:
 >> ```python
@@ -63,8 +83,6 @@ let's decouple the data loading into a separate function.
 >> The code is now easier to follow since we do not need to understand the data loading from
 >> files to read the statistical analysis, and vice versa - we do not have to understand the
 >> statistical analysis when looking at data loading.
->> Ensure you re-run the regression tests to check this refactoring has not
->> changed the output of `analyse_data()`.
 > {: .solution}
 {: .challenge}
@@ -213,18 +231,7 @@ In addition, implementation of the method `get_area()` is hidden too (abstractio
 >> method.
 >>
 >> While the overall behaviour of the code and its results are unchanged,
->> the way we invoke data analysis has changed.
->> We must update our regression test to match this, to ensure we have not broken anything:
->> ```python
->> ...
->> def test_compute_data(): ->> from inflammation.compute_data import analyse_data ->> path = Path.cwd() / "../data" ->> data_source = CSVDataSource(path) ->> result = analyse_data(data_source) ->> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211 ->> ... ->> ``` +>> the way we invoke data analysis has changed. > {: .solution} {: .challenge} @@ -256,7 +263,7 @@ on it and it will return a number representing its surface area. > what parameters they need and what they return. >> ## Solution >> The interface is the `load_inflammation_data()` method, which takes no parameters and ->> returns a list where each entry is a 2D array of patient inflammation data (read from some +>> returns a list where each entry is a 2D NumPy array of patient inflammation data (read from some > data source). >> >> Any object passed into `analyse_data()` should conform to this interface. @@ -346,7 +353,7 @@ data sources with no extra work. >> def load_inflammation_data(self): >> data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.json')) >> if len(data_file_paths) == 0: ->> raise ValueError(f"No inflammation JSON's found in path {self.dir_path}") +>> raise ValueError(f"No inflammation JSON files found in path {self.dir_path}") >> data = map(models.load_json, data_file_paths) >> return list(data) >> ``` @@ -369,12 +376,12 @@ data sources with no extra work. ## Testing Using Mock Objects -We can use this abstraction to also make testing more straight forward. -Instead of having our tests use real file system data, we can instead provide +We can use a *mock object* abstraction to make testing more straightforward. +Instead of having our tests use real data stored on a file system, we can instead provide a mock or dummy implementation instead of one of the real classes. Providing that what we use as a substitute conforms to the same interface, the code we are testing should work just the same. 
-Such mock/dummy implementation could just returns some fixed example data.
+Such a mock/dummy implementation could just return some fixed example data.
 A convenient way to do this in Python is using Python's [mock object library](https://docs.python.org/3/library/unittest.mock.html).
 This is a whole topic in itself -
@@ -430,39 +437,6 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be `
 > {: .solution}
 {: .challenge}
 
-## Programming Paradigms
-
-Until now, we have mainly been writing procedural code.
-In the previous episode, we mentioned [pure functions](/33-code-refactoring/index.html#pure-functions)
-and Functional Programming.
-In this episode, we have touched a bit upon classes, encapsulation and polymorphism,
-which are characteristics of (but not limited to) the Object Oriented Programming (OOP).
-All these different programming paradigms provide varied approaches to structuring your code -
-each with certain strengths and weaknesses when used to solve particular types of problems.
-In many cases, particularly with modern languages, a single language can allow many different
-structural approaches and mixing programming paradigms within your code.
-Once your software begins to get more complex - it is common to use aspects of [different paradigm](/programming-paradigms/index.html)
-to handle different subtasks.
-Because of this, it is useful to know about the [major paradigms](/programming-paradigms/index.html),
-so you can recognise where it might be useful to switch.
-This is outside of scope of this course - we have some extra episodes on the topics of
-[Procedural Programming](/programming-paradigms/index.html#procedural-programming),
-[Functional Programming](/functional-programming/index.html) and
-[Object Oriented Programming](/object-oriented-programming/index.html) if you want to know more.
-
-> ## So Which One is Python?
-> Python is a multi-paradigm and multi-purpose programming language.
-> You can use it as a procedural language and you can use it in a more object oriented way. -> It does tend to land more on the object oriented side as all its core data types -> (strings, integers, floats, booleans, lists, -> sets, arrays, tuples, dictionaries, files) -> as well as functions, modules and classes are objects. -> -> Since functions in Python are also objects that can be passed around like any other object, -> Python is also well suited to functional programming. -> One of the most popular Python libraries for data manipulation, -> [Pandas](https://pandas.pydata.org/) (built on top of NumPy), -> supports a functional programming style -> as most of its functions on data are not changing the data (no side effects) -> but producing a new data to reflect the result of the function. -{: .callout} + +{% include links.md %} + diff --git a/_episodes/33-code-refactoring.md b/_episodes/34-code-refactoring.md similarity index 77% rename from _episodes/33-code-refactoring.md rename to _episodes/34-code-refactoring.md index ae152dfed..2d155c227 100644 --- a/_episodes/33-code-refactoring.md +++ b/_episodes/34-code-refactoring.md @@ -4,37 +4,25 @@ teaching: 30 exercises: 20 questions: - "How do you refactor code without breaking it?" -- "What is decoupled code?" -- "What are benefits of using pure functions in our code?" +- "What are benefits of using pure functions in code?" objectives: -- "Understand the benefits of code decoupling." - "Understand the use of regressions tests to avoid breaking existing code when refactoring." - "Understand the use of pure functions in software design to make the code easier to test." - "Refactor a piece of code to separate out 'pure' from 'impure' code." keypoints: - "Implementing regression tests before refactoring gives you confidence that your changes have not broken the code." 
-- "Decoupling code into pure functions that process data without side effects makes code easier +- "Refactoring code into pure functions that process data without side effects makes code easier to read, test and maintain." --- ## Introduction -*Code refactoring* is the process of improving the design of an existing code - for example -to make it more decoupled. -Recall that *code decoupling* means breaking the system into smaller components and reducing the -interdependence between these components, so that they can be tested and maintained independently. -Two components of code can be considered **decoupled** if a change in one does not -necessitate a change in the other. -While two connected units cannot always be totally decoupled, **loose coupling** -is something we should aim for. Benefits of decoupled code include: - -* easier to read as you do not need to understand the - details of the other component. -* easier to test, as one of the components can be replaced - by a test or a mock version of it. -* code tends to be easier to maintain, as changes can be isolated - from other parts of the code. +Code refactoring is the process of improving the design of an existing codebase - changing the +internal structure of code without changing its external behavior, with the goal of making the code +more readable, maintainable, efficient or easier to test. This can include introducing things such +as code decoupling and abstractions, but also renaming variables, reorganising functions to avoid +code duplication and increase reuse, and simplifying conditional statements. When faced with an existing piece of code that needs modifying a good refactoring process to follow is: @@ -47,8 +35,9 @@ In this episode we will refactor the function `analyse_data()` in `compute_data. from our project in the following two ways: * add more tests so we can be more confident that future changes will have the intended effect and will not break the existing code. 
-* split the monolithic `analyse_data()` function into a number of smaller and mode decoupled functions -making the code easier to understand and test. +* further split the monolithic `analyse_data()` function into a number of smaller and more +decoupled functions (continuing the work from the previous episode) making the code easier to +understand and test. ## Writing Tests Before Refactoring @@ -98,13 +87,13 @@ the tests at all. > def test_analyse_data(): > from inflammation.compute_data import analyse_data > path = Path.cwd() / "../data" -> result = analyse_data(path) -> -> # TODO: add an assert for the value of result +> data_source = CSVDataSource(path) +> result = analyse_data(data_source) +> +> # TODO: add assert statement(s) to test the result value is as expected > ``` > Use `assert_array_almost_equal` from the `numpy.testing` library to > compare arrays of floating point numbers. -> >> ## Hint >> When determining the correct return data result to use in tests, it may be helpful to assert the >> result equals some random made-up data, observe the test fail initially and then @@ -113,7 +102,7 @@ the tests at all. > >> ## Solution >> One approach we can take is to: ->> * comment out the visualize method on `analyse_data()` +>> * comment out the visualise method on `analyse_data()` >> (as this will cause our test to hang waiting for the result data) >> * return the data instead, so we can write asserts on the data >> * See what the calculated value is, and assert that it is the same as the expected value @@ -127,7 +116,8 @@ the tests at all. 
>> def test_analyse_data():
 >>     from inflammation.compute_data import analyse_data
 >>     path = Path.cwd() / "../data"
->>     result = analyse_data(path)
+>>     data_source = CSVDataSource(path)
+>>     result = analyse_data(data_source)
 >>     expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211,
 >>                        0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094,
 >>                        1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312,
@@ -144,7 +134,7 @@ the tests at all.
 >> * It does not test edge cases
 >> * If the data files in the directory change - the test will fail
 >>
->> We would need additional tests to check the above.
+>> We would need to add additional tests to check the above.
 > {: .solution}
 {: .challenge}
@@ -290,7 +280,46 @@ testable and maintainable.
 
 This is particularly useful for:
 
 * Data processing and analysis
 (for example, using [Python Pandas library](https://pandas.pydata.org/)
 for data manipulation where most of its functions appear pure)
-* Doing simulations
-* Translating data from one format to another
+* Doing simulations, since a simulation can be framed as repeatedly applying a pure function
+that maps the current state to the next state
+* Translating data from one format to another (for example, converting data read from
+CSV files into JSON records)
+
+## Programming Paradigms
+
+Until this section, we have mainly been writing procedural code.
+In the previous episode, we touched a bit upon classes, encapsulation and polymorphism,
+which are characteristics of (but not limited to) Object Oriented Programming (OOP).
+In this episode, we mentioned [pure functions](./index.html#pure-functions)
+and Functional Programming.
+
+These are examples of different [programming paradigms](/programming-paradigms/index.html)
+and provide varied approaches to structuring your code -
+each with certain strengths and weaknesses when used to solve particular types of problems.
+In many cases, particularly with modern languages, a single language can allow many different
+structural approaches and mixing programming paradigms within your code.
+Once your software begins to get more complex - it is common to use aspects of
+[different paradigms](/programming-paradigms/index.html) to handle different subtasks.
+Because of this, it is useful to know about the [major paradigms](/programming-paradigms/index.html),
+so you can recognise where it might be useful to switch.
+This is outside the scope of this course - we have some extra episodes on the topics of
+[Procedural Programming](/procedural-programming/index.html),
+[Functional Programming](/functional-programming/index.html) and
+[Object Oriented Programming](/object-oriented-programming/index.html) if you want to know more.
+
+> ## So Which One is Python?
+> Python is a multi-paradigm and multi-purpose programming language.
+> You can use it as a procedural language and you can use it in a more object oriented way.
+> It does tend to land more on the object oriented side as all its core data types
+> (strings, integers, floats, booleans, lists,
+> sets, arrays, tuples, dictionaries, files)
+> as well as functions, modules and classes are objects.
+>
+> Since functions in Python are also objects that can be passed around like any other object,
+> Python is also well suited to functional programming.
+> One of the most popular Python libraries for data manipulation,
+> [Pandas](https://pandas.pydata.org/) (built on top of NumPy),
+> supports a functional programming style
+> as most of its functions do not change the data (no side effects)
+> but instead produce new data to reflect the result of the function.
+{: .callout}
 
 {% include links.md %}
diff --git a/_extras/object-oriented-programming.md b/_extras/object-oriented-programming.md
index 2a882cebc..a23bd6305 100644
--- a/_extras/object-oriented-programming.md
+++ b/_extras/object-oriented-programming.md
@@ -1,5 +1,6 @@
 ---
 title: "Object Oriented Programming"
+layout: episode
 teaching: 30
 exercises: 35
 questions:
@@ -26,7 +27,7 @@ Data is encapsulated in the form of fields (attributes) of objects,
 while code is encapsulated in the form of procedures (methods)
 that manipulate objects' attributes and define "behaviour" of objects.
 So, in object oriented programming,
-we first think about the data and the things that we’re modelling -
+we first think about the data and the things that we are modelling -
 and represent these by objects -
 rather than define the logic of the program,
 and code becomes a series of interactions between objects.
diff --git a/_extras/procedural-programming.md b/_extras/procedural-programming.md
new file mode 100644
index 000000000..46bf4341a
--- /dev/null
+++ b/_extras/procedural-programming.md
@@ -0,0 +1,38 @@
+---
+title: "Procedural Programming"
+teaching: 10
+exercises: 0
+layout: episode
+questions:
+- "What is procedural programming?"
+- "Which situations/problems is procedural programming well suited for?"
+objectives:
+- "Describe the core concepts that define the procedural programming paradigm"
+- "Describe the main characteristics of code that is written in procedural programming style"
+keypoints:
+- "Procedural Programming emphasises a structured approach to coding, using a sequence of tasks and subroutines to create a well-organised program."
+---
+
+In procedural programming, code is grouped into
+procedures (also known as routines - reusable pieces of code that perform a specific action but
+have no return value) and functions (similar to procedures, but which return a value after execution).
+Procedures and functions both perform a single task, with exactly one entry and one exit point,
+and contain a series of logical steps (instructions) to be carried out.
+The primary concern is the *process* through which the input is transformed into the desired output.
+Procedural languages treat data and procedures as two different
+entities (unlike in functional programming, where code is also treated as data).
+
+Key features of procedural programming include:
+
+* Sequence control: the code execution process goes through the steps in a defined order, with clear starting and ending points.
+* Modularity: code can be divided into separate modules or procedures to perform specific tasks, making it easier to maintain and reuse.
+* Standard data structures: Procedural Programming makes use of standard data structures such as
+arrays, lists, and records to store and manipulate data efficiently.
+* Abstraction: procedures encapsulate complex operations and allow them to be represented as simple, high-level commands.
+* Execution control: various implementations of loops, branches, and jumps give more control over the flow of execution.
+
+To better understand procedural programming, it is useful to compare it with other prevalent
+programming paradigms such as
+[object-oriented programming](/object-oriented-programming/index.html) (OOP)
+and [functional programming](/functional-programming/index.html#functional-vs-procedural-programming)
+to shed light on their distinctions, advantages, and drawbacks.
diff --git a/_extras/programming-paradigms.md b/_extras/programming-paradigms.md
index 25c0175e0..b22d8e269 100644
--- a/_extras/programming-paradigms.md
+++ b/_extras/programming-paradigms.md
@@ -26,8 +26,8 @@ One way to categorise these structural approaches is into **paradigms**.
 Each paradigm represents a slightly different way of thinking about and structuring our code
 and each has certain strengths and weaknesses when used to solve particular types of problems.
Once your software begins to get more complex -it's common to use aspects of different paradigms to handle different subtasks. -Because of this, it's useful to know about the major paradigms, +it is common to use aspects of different paradigms to handle different subtasks. +Because of this, it is useful to know about the major paradigms, so you can recognise where it might be useful to switch. There are two major families that we can group the common programming paradigms into: @@ -51,12 +51,12 @@ so this classification of programming languages based on the paradigm they use i Procedural Programming comes from a family of paradigms known as the Imperative Family. With paradigms in this family, we can think of our code as the instructions for processing data. -Procedural Programming is probably the style you're most familiar with +Procedural Programming is probably the style you are most familiar with and the one we used up to this point, where we group code into *procedures performing a single task, with exactly one entry and one exit point*. In most modern languages we call these **functions**, instead of procedures - -so if you're grouping your code into functions, this might be the paradigm you're using. +so if you are grouping your code into functions, this might be the paradigm you're using. By grouping code like this, we make it easier to reason about the overall structure, since we should be able to tell roughly what a function does just by looking at its name. These functions are also much easier to reuse than code outside of functions, @@ -65,12 +65,12 @@ since we can call them from any part of our program. So far we have been using this technique in our code - it contains a list of instructions that execute one after the other starting from the top. This is an appropriate choice for smaller scripts and software -that we're writing just for a single use. +that we are writing just for a single use. 
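
As a minimal sketch of this procedural style (the function names and data below are made up for illustration), the program is just a sequence of instructions, grouped into functions that pass data explicitly between steps:

```python
# A small procedural program: each function performs a single task, and the
# "main" part of the script calls them in order, from top to bottom.
def read_measurements():
    # A real script might read these from a file; fixed data keeps this self-contained.
    return [1.5, 2.0, 2.5, 4.0]


def compute_average(measurements):
    return sum(measurements) / len(measurements)


def format_result(average):
    return f"average measurement: {average:.1f}"


measurements = read_measurements()
average = compute_average(measurements)
print(format_result(average))  # average measurement: 2.5
```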
Aside from smaller scripts, Procedural Programming is also commonly seen in code focused on high performance, with relatively simple data structures, such as in High Performance Computing (HPC). These programs tend to be written in C (which doesn't support Object Oriented Programming) -or Fortran (which didn't until recently). +or Fortran (which did not until recently). HPC code is also often written in C++, but C++ code would more commonly follow an Object Oriented style, though it may have procedural sections. @@ -81,7 +81,10 @@ because it uses functions rather than objects, but this is incorrect. Functional Programming is a separate paradigm that places much stronger constraints on the behaviour of a function -and structures the code differently as we'll see soon. +and structures the code differently as we will see soon. + +You can read more in an [extra episode on Procedural Programming](/procedural-programming/index.html). + ### Functional Programming @@ -113,7 +116,7 @@ With datasets like this, we can't move the data around easily, so we often want to send our code to where the data is instead. By writing our code in a functional style, we also gain the ability to run many operations in parallel -as it's guaranteed that each operation won't interact with any of the others - +as it is guaranteed that each operation won't interact with any of the others - this is essential if we want to process this much data in a reasonable amount of time. You can read more in an [extra episode on Functional Programming](/functional-programming/index.html). @@ -126,8 +129,8 @@ An object has two fundamental parts - properties (characteristics) and behaviour In Object Oriented Programming, we first think about the data and the things that we're modelling - and represent these by objects. -For example, if we're writing a simulation for our chemistry research, -we're probably going to need to represent atoms and molecules. 
+For example, if we are writing a simulation for our chemistry research, +we are probably going to need to represent atoms and molecules. Each of these has a set of properties which we need to know about in order for our code to perform the tasks we want - in this case, for example, we often need to know the mass and electric charge of each atom. @@ -146,23 +149,6 @@ Most people would classify Object Oriented Programming as an You can read more in an [extra episode on Object Oriented Programming](/object-oriented-programming/index.html). -> ## So Which one is Python? -> Python is a multi-paradigm and multi-purpose programming language. -> You can use it as a procedural language and you can use it in a more object oriented way. -> It does tend to land more on the object oriented side as all its core data types -> (strings, integers, floats, booleans, lists, -> sets, arrays, tuples, dictionaries, files) -> as well as functions, modules and classes are objects. -> -> Since functions in Python are also objects that can be passed around like any other object, -> Python is also well suited to functional programming. -> One of the most popular Python libraries for data manipulation, -> [Pandas](https://pandas.pydata.org/) (built on top of NumPy), -> supports a functional programming style -> as most of its functions on data are not changing the data (no side effects) -> but producing a new data to reflect the result of the function. 
-{: .callout} - ## Other Paradigms The three paradigms introduced here are some of the most common, From d08fe3de87d78e818339a6d300eb2858dd1cda85 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Mon, 25 Mar 2024 15:02:58 +0000 Subject: [PATCH 094/105] Added procedural programming episode --- _extras/procedural-programming.md | 39 ++++++++++++++++++++++++++++--- 1 file changed, 36 insertions(+), 3 deletions(-) diff --git a/_extras/procedural-programming.md b/_extras/procedural-programming.md index 46bf4341a..77e4141e2 100644 --- a/_extras/procedural-programming.md +++ b/_extras/procedural-programming.md @@ -19,8 +19,6 @@ have no return value) and functions (similar to procedures but return value afte Procedures and function both perform a single task, with exactly one entry and one exit point and containing a series of logical steps (instructions) to be carried out. The primary concern is the *process* through which the input is transformed into the desired output. -Procedural languages treat data and procedures as two different -entities (unlike in functional programming, where code is also treated as data). Key features of procedural programming include: @@ -34,5 +32,40 @@ arrays, lists, and records to store and manipulate data efficiently. To better understand procedural programming, it is useful to compare it with other prevalent programming paradigms such as [object-oriented programming](/object-oriented-programming/index.html) (OOP) -and [functional programming](/functional-programming/index.html#functional-vs-procedural-programming) +and [functional programming](/functional-programming/index.html) to shed light on their distinctions, advantages, and drawbacks. + +Procedural programming uses a very detailed list of instructions to tell the computer what to do +step by step. This approach uses iteration to repeat a series of steps as often as needed. 
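A minimal sketch of this step-by-step, iterative style (the rainfall data and function name are made up for illustration):

```python
def total_rainfall(daily_readings):
    """Add up a list of rainfall readings, one step at a time."""
    total = 0.0
    for reading in daily_readings:  # iteration: repeat the same step per reading
        total += reading            # each instruction updates the program state
    return total


print(total_rainfall([1.5, 0.25, 3.25]))  # 5.0
```

The computation proceeds as an explicit sequence of instructions with a clear start and end point, which is characteristic of the procedural approach.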
+Functional programming is an approach to problem solving that treats every computation as a
+mathematical function (an expression) and relies more heavily on recursion as a primary control
+structure (rather than iteration).
+Procedural languages treat data and procedures as two different
+entities whereas, in functional programming, code is also treated as data - functions
+can take other functions as arguments or return them as results.
+Compare and contrast [two different implementations](/functional-programming/index.html#functional-vs-procedural-programming)
+of the same functionality in procedural and functional programming styles
+to better grasp their differences.
+
+Procedural and [object-oriented programming](/object-oriented-programming/index.html) have fundamental differences in their approach to
+organising code and solving problems.
+In procedural programming, the code is structured around functions and procedures that execute a
+specific task or operation. Object-oriented programming is based around objects and classes,
+where data is encapsulated within objects and methods on those objects are used to manipulate that data.
+Both procedural and object-oriented programming paradigms support [abstraction and modularization](/33-code-decoupling-abstractions/index.html).
+Procedural programming achieves this through procedures and functions, while OOP uses classes and
+objects.
+However, OOP goes further by encapsulating related data and methods within objects,
+enabling a higher level of abstraction and separation between different components.
+Inheritance and polymorphism are two vital features provided by OOP, which are not intrinsically
+supported by procedural languages. [Inheritance](/object-oriented-programming/index.html#inheritance) allows the creation of classes that inherit
+properties and methods from existing classes – enabling code reusability and reducing redundancy.
+[Polymorphism](/33-code-decoupling-abstractions/index.html#polymorphism) permits a single function or method to operate on multiple data types or objects, +improving flexibility and adaptability. + +The choice between procedural, functional and object-oriented programming depends primarily on +the specific project requirements and personal preference. +Procedural programming may be more suitable for smaller projects, whereas OOP is typically +preferred for larger and more complex projects, especially when working in a team. +Functional programming can offer more elegant and scalable solutions for complex problems, +particularly in parallel computing. From cba2d0c0dfa82d13a294f2041f4a85f4d4485385 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Mon, 25 Mar 2024 20:33:06 +0000 Subject: [PATCH 095/105] Review of abstraction episode --- _episodes/33-code-decoupling-abstractions.md | 93 ++++++++++---------- _episodes/34-code-refactoring.md | 46 +++++----- 2 files changed, 74 insertions(+), 65 deletions(-) diff --git a/_episodes/33-code-decoupling-abstractions.md b/_episodes/33-code-decoupling-abstractions.md index 4f26ee7f8..cdfe244ed 100644 --- a/_episodes/33-code-decoupling-abstractions.md +++ b/_episodes/33-code-decoupling-abstractions.md @@ -13,32 +13,27 @@ objectives: - "Understand the principles of encapsulation, polymorphism and interfaces." - "Use mocks to replace a class in test code." keypoints: -- "Classes and interfaces can help decouple code by abstracting out/hiding certain details of the -code so it is easier to understand, test and maintain." -- "Encapsulation is bundling related data into a structured component, -along with the methods that operate on the data. It is also provides a mechanism for restricting -the access to that data, hiding the internal representation of the component." 
+- "Abstractions can hide certain details of the code behind classes and interfaces" +- "Encapsulation bundles data into a structured component along with the methods that operate +on the data, and provides a mechanism for restricting access to that data, +hiding the internal representation of the component." - "Polymorphism describes the provision of a single interface to entities of different types, or the use of a single symbol to represent different types." +- "Code decoupling is separating code into smaller components and reducing the interdependence +between them so that the code is easier to understand, test and maintain." + --- ## Introduction -*Code decoupling* means breaking the system into smaller components and reducing the -interdependence between these components, so that they can be tested and maintained independently. -Two components of code can be considered **decoupled** if a change in one does not +**Code decoupling** refers to breaking up the software into smaller components and reducing the +interdependence between these components so that they can be tested and maintained independently. +Two components of code can be considered *decoupled* if a change in one does not necessitate a change in the other. -While two connected units cannot always be totally decoupled, **loose coupling** -is something we should aim for. Benefits of decoupled code include: - -* easier to read as you do not need to understand the - details of the other component. -* easier to test, as one of the components can be replaced - by a test or a mock version of it. -* code tends to be easier to maintain, as changes can be isolated - from other parts of the code. +While two connected units cannot always be totally decoupled, *loose coupling* +is something we should aim for. -*Code abstraction* is the process of hiding the implementation details of a piece of +**Code abstraction** is the process of hiding the implementation details of a piece of code behind an interface - i.e. 
the details of *how* something works are hidden away,
leaving us to deal only with *what* it does.
This allows developers to work with the code at a higher level
@@ -48,15 +43,23 @@ details and thereby reducing the cognitive load when programming.
Abstractions can aid decoupling of code.
If one part of the code only uses another part through an appropriate abstraction
then it becomes easier for these parts to change independently.
+Benefits of using these techniques include having a codebase that is:
+
+* easier to read as you only need to understand the
+  details of the (smaller) component you are looking at and not the whole monolithic codebase.
+* easier to test, as one of the components can be replaced
+  by a test or a mock version of it.
+* easier to maintain, as changes can be isolated
+  from other parts of the code.

Let's start redesigning our code by introducing some of the abstraction techniques
-to incrementally improve its design.
+to incrementally decouple it into smaller components to improve its overall design.

In the code from our current branch `full-data-analysis`,
-you may have noticed that loading data from CSV files in a directory is "baked" into
+you may have noticed that loading data from CSV files in a `data` directory is "baked" into
(i.e. is part of) the `analyse_data()` function.
-This is not strictly a functionality of the data analysis function, so firstly
-let's decouple the data loading into a separate function.
+Data loading is a functionality separate from data analysis, so firstly
+let's decouple the data loading part into a separate component (function).

> ## Exercise: Decouple Data Loading from Data Analysis
> Separate out the data loading functionality from `analyse_data()` into a new function
@@ -70,8 +73,8 @@ let's decouple the data loading into a separate function.
>> data_file_paths = glob.glob(os.path.join(dir_path, 'inflammation*.csv'))
>> if len(data_file_paths) == 0:
>> raise ValueError(f"No inflammation CSV files found in path {dir_path}")
->> data = map(models.load_csv, data_file_paths) # load inflammation data from CSV files
->> return list(data) # return the list of 2D NumPy arrays with inflammation data
+>> data = map(models.load_csv, data_file_paths) # load inflammation data from each CSV file
+>> return list(data) # return the list of 2D NumPy arrays with inflammation data
>> ```
>> This function can now be used in the analysis as follows:
>> ```python
@@ -80,28 +83,28 @@ let's decouple the data loading into a separate function.
>> daily_standard_deviation = compute_standard_deviation_by_data(data)
>> ...
>> ```
->> The code is now easier to follow since we do not need to understand the the data loading from
->> files to read the statistical analysis, and vice versa - we do not have to understand the
->> statistical analysis when looking at data loading.
+>> The code is now easier to follow since we do not need to understand the data loading part
+>> to understand the statistical analysis part, and vice versa.
> {: .solution}
{: .challenge}

However, even with this change, the data loading is still coupled with the data analysis to a
large extent.
For example, if we have to support loading data from different sources
-(e.g. JSON files and CSV files), we would have to pass some kind of a flag indicating
-what we want into `analyse_data()`. Instead, we would like to decouple the
-consideration of what data to load from the `analyse_data()` function entirely.
+(e.g. JSON files or an SQL database), we would have to pass some kind of a flag into `analyse_data()`
+indicating the type of data source we want to read from. Instead, we would like to decouple the
+consideration of data source from the `analyse_data()` function entirely.
One way we can do this is by using *encapsulation* and *classes*.

## Encapsulation & Classes

-*Encapsulation* is the packing of "data" and "functions operating on that data" into a
+**Encapsulation** is the process of packing the "data" and "functions operating on that data" into a
single component/object. It also provides a mechanism for restricting
the access to that data.
Encapsulation means that the internal representation of a component is generally hidden
from view outside of the component's definition.
-Encapsulation allows developers to present a consistent interface to an object/component
+Encapsulation allows developers to present a consistent interface to the component/object
that is independent of its internal implementation.
For example, encapsulation can be used to hide the values or state of a structured data object
inside a **class**, preventing direct access to them
@@ -238,7 +241,7 @@ In addition, implementation of the method `get_area()` is hidden too (abstractio

## Interfaces

-An interface is another important concept in software design related to abstraction and
+An **interface** is another important concept in software design related to abstraction and
encapsulation. For a software component, it declares the operations that can be invoked on
that component, along with input arguments and what it returns. By knowing these details,
we can communicate with this component without the need to know how it implements this interface.
@@ -252,7 +255,7 @@ a given keyword that have been posted within a certain date range.

Internal interfaces within software dictate how
different parts of the system interact with each other.
-Even when these are not explicitly documented or thought out, they still exist.
+Even when these are not explicitly documented - they still exist.

For example, our `Circle` class implicitly has an interface - you can call the `get_area()` method
on it and it will return a number representing its surface area.
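As a sketch of such an implicit interface (this mirrors the `Circle` class discussed above, though the exact implementation in the lesson's code may differ):

```python
import math


class Circle:
    """Encapsulates a radius together with the behaviour that uses it."""

    def __init__(self, radius):
        self._radius = radius  # internal state, hidden by convention

    def get_area(self):
        """The interface: callers only rely on getting a number back."""
        return math.pi * self._radius ** 2


# Calling code depends only on the interface, not on how the area is
# computed or how the radius is stored internally.
area = Circle(2).get_area()
```

If the internal representation changed (say, storing a diameter instead of a radius), callers of `get_area()` would not need to change at all.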
@@ -273,7 +276,7 @@ on it and it will return a number representing its surface area. ## Polymorphism -In general, polymorphism is the idea of having multiple implementations/forms/shapes +In general, **polymorphism** is the idea of having multiple implementations/forms/shapes of the same abstract concept. It is the provision of a single interface to entities of different types, or the use of a single symbol to represent multiple different types. @@ -282,7 +285,7 @@ There are [different versions of polymorphism](https://www.bmc.com/blogs/polymor For example, method or operator overloading is one type of polymorphism enabling methods and operators to take parameters of different types. -We will have a look at the interface-based polymorphism. +We will have a look at the *interface-based polymorphism*. In OOP, it is possible to have different object classes that conform to the same interface. For example, let's have a look at the following class representing a `Rectangle`: @@ -297,7 +300,7 @@ class Rectangle: Like `Circle`, this class provides the `get_area()` method. The method takes the same number of parameters (none), and returns a number. -However, the implementation is different. This is one type of *polymorphism*. +However, the implementation is different. This is interface-based polymorphism. The word "polymorphism" means "many forms", and in programming it refers to methods/functions/operators with the same name that can be executed on many objects or classes. @@ -319,7 +322,7 @@ the method for calculating the area of each shape is abstracted away to the rele How can polymorphism be useful in our software project? For example, we can replace our `CSVDataSource` with another class that reads a totally -different file format (e.g. JSON instead of CSV), or reads from an external service or database +different file format (e.g. JSON), or reads from an external service or a database. 
All of these changes can now be made without changing the analysis function
as we have decoupled the process of data loading from the data analysis earlier.
Conversely, if we wanted to write a new analysis function, we could support any of these
data sources with no extra work.
@@ -339,9 +342,9 @@ data sources with no extra work.
> }
> ]
> ```
-> Finally, at run time construct an appropriate instance based on the file extension.
+> Finally, at run-time, construct an appropriate data source instance based on the file extension.
>> ## Solution
->> The new class could look something like:
+>> The class that reads inflammation data from JSON files could look something like:
>> ```python
>> class JSONDataSource:
>> """
@@ -357,7 +360,7 @@ data sources with no extra work.
>> data = map(models.load_json, data_file_paths)
>> return list(data)
>> ```
->> Additionally, in the controller will need to select the appropriate DataSource to
+>> Additionally, in the controller we will need to select an appropriate DataSource instance to
>> provide to the analysis:
>>```python
>> _, extension = os.path.splitext(InFiles[0])
>> if extension == '.json':
>> data_source = JSONDataSource(os.path.dirname(InFiles[0]))
>> elif extension == '.csv':
>> data_source = CSVDataSource(os.path.dirname(InFiles[0]))
>> else:
->> raise ValueError(f'Unsupported file format: {extension}')
+>> raise ValueError(f'Unsupported data file format: {extension}')
>> analyse_data(data_source)
>>```
>> As you can see, all the above changes have been made without modifying
@@ -376,8 +379,8 @@

## Testing Using Mock Objects

-We can use a *mock object* abstraction to make testing more straightforward.
-Instead of having our tests use real data stored on a file system, we can instead provide
+We can use a **mock object** abstraction to make testing more straightforward.
+Instead of having our tests use real data stored on a file system, we can provide
a mock or dummy implementation instead of one of the real classes.
Providing that what we use as a substitute conforms to the same interface,
the code we are testing should work just the same.
diff --git a/_episodes/34-code-refactoring.md b/_episodes/34-code-refactoring.md
index 2d155c227..060eaabf0 100644
--- a/_episodes/34-code-refactoring.md
+++ b/_episodes/34-code-refactoring.md
@@ -3,9 +3,10 @@ title: "Code Refactoring"
teaching: 30
exercises: 20
questions:
-- "How do you refactor code without breaking it?"
+- "How do you refactor existing code without breaking it?"
- "What are benefits of using pure functions in code?"
objectives:
+- "Employ code refactoring to improve the structure of existing code."
- "Understand the use of regression tests to avoid breaking existing code when refactoring."
- "Understand the use of pure functions in software design to make the code easier to test."
- "Refactor a piece of code to separate out 'pure' from 'impure' code."
keypoints:
@@ -24,6 +25,12 @@
more readable, maintainable, efficient or easier to test. This can include intro
as code decoupling and abstractions, but also renaming variables, reorganising
functions to avoid code duplication and increase reuse, and simplifying conditional statements.
+In the previous episode, we have already changed the structure of our code (i.e. refactored it
+to a certain extent)
+when we separated out data loading from data analysis, but we have not tested that the new code
+works as intended. This is particularly important with bigger code changes, but even a small change
+can easily break the codebase and introduce bugs.
+
When faced with an existing piece of code that needs modifying, a good refactoring
process to follow is:

1. Make sure you have tests that verify the current behaviour
2. Refactor the code
3. Verify that the behaviour of the code is identical to that before refactoring.
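The three-step process above can be sketched with a small regression test written before any changes are made (the function and expected values here are illustrative, not the lesson's actual `analyse_data()`):

```python
def daily_mean(data):
    """Existing behaviour we want to preserve while refactoring."""
    return [sum(day) / len(day) for day in zip(*data)]


# Step 1: pin down the current behaviour with a (possibly temporary) test.
assert daily_mean([[1, 2], [3, 4]]) == [2.0, 3.0]

# Step 2: refactor - rename variables, extract helpers, restructure.
# Step 3: re-run the same assertion; if it still passes, the observable
# behaviour has not changed.
```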
-In this episode we will refactor the function `analyse_data()` in `compute_data.py` -from our project in the following two ways: +In this episode we will further improve the code from our project in the following two ways: * add more tests so we can be more confident that future changes will have the intended effect and will not break the existing code. -* further split the monolithic `analyse_data()` function into a number of smaller and more -decoupled functions (continuing the work from the previous episode) making the code easier to -understand and test. +* further split `analyse_data()` function into a number of smaller and more +decoupled functions (continuing the work from the previous episode). ## Writing Tests Before Refactoring -When refactoring, first we need to make sure there are tests that verity +When refactoring, first we need to make sure there are tests in place that can verify the code behaviour as it is now (or write them if they are missing), then refactor the code and, finally, check that the original tests still pass. -This is to make sure we do not break the existing behaviour through refactoring. There is a bit of a "chicken and egg" problem here - if the refactoring is supposed to make it easier to write tests in the future, how can we write tests before doing the refactoring? The tricks to get around this trap are: - * Test at a higher level, with coarser accuracy - * Write tests that you intend to remove + * test at a higher level, with coarser accuracy, and + * write tests that you intend to remove. -The best tests are ones that test single bits of functionality rigorously. +The best tests are the ones that test a single bit of functionality rigorously. However, with our current `analyse_data()` code that is not possible because it is a large function doing a little bit of everything. Instead we will make minimal changes to the code to make it a bit more testable. 
@@ -62,8 +66,8 @@ Firstly, we will modify the function to return the data instead of visualising it because graphs are harder to test automatically (i.e. they need to be viewed and inspected manually in order to determine their correctness). -Next, we will make the assert statements verify what the outcome is -currently, rather than checking whether that is correct or not. +Next, we will make the assert statements verify what the current outcome is, rather than check +whether that is correct or not. Such tests are meant to verify that the behaviour does not *change* rather than checking the current behaviour is correct (there should be another set of tests checking the correctness). @@ -103,11 +107,12 @@ the tests at all. >> ## Solution >> One approach we can take is to: >> * comment out the visualise method on `analyse_data()` ->> (as this will cause our test to hang waiting for the result data) ->> * return the data instead, so we can write asserts on the data ->> * See what the calculated value is, and assert that it is the same as the expected value +>> (this will cause our test to hang waiting for the result data) +>> * return the data (instead of plotting it on a graph), so we can write assert statements +>> on the data +>> * see what the calculated result value is, and assert that it is the same as the expected value >> ->> Putting this together, your test may look like: +>> Putting this together, our test may look like: >> >> ```python >> import numpy.testing as npt @@ -129,8 +134,8 @@ the tests at all. 
>> ``` >> >> Note that while the above test will detect if we accidentally break the analysis code and ->> change the output of the analysis, is not a good or complete test for the following reasons: ->> * It is not at all obvious why the `expected_output` is correct +>> change the output of the analysis, it is still not a complete test for the following reasons: +>> * It is not obvious why the `expected_output` is correct >> * It does not test edge cases >> * If the data files in the directory change - the test will fail >> @@ -144,7 +149,8 @@ Now that we have our regression test for `analyse_data()` in place, we are ready function further. We would like to separate out as much of its code as possible as *pure functions*. Pure functions are very useful and much easier to test as they take input only from its input -parameters and output only via their return values. +parameters, do not modify input data and output only via their return value +(i.e. do not have any side effect of modifying global variables or writing to files). ### Pure Functions From 5f27aafd62d1bc3716c0349734ef407149974c48 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Mon, 25 Mar 2024 21:42:38 +0000 Subject: [PATCH 096/105] Review of refactoring episode --- _episodes/34-code-refactoring.md | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/_episodes/34-code-refactoring.md b/_episodes/34-code-refactoring.md index 060eaabf0..810594027 100644 --- a/_episodes/34-code-refactoring.md +++ b/_episodes/34-code-refactoring.md @@ -11,9 +11,10 @@ objectives: - "Understand the use of pure functions in software design to make the code easier to test." - "Refactor a piece of code to separate out 'pure' from 'impure' code." keypoints: +- "Code refactoring is a technique for improving the structure of existing code." - "Implementing regression tests before refactoring gives you confidence that your changes have not broken the code." 
-- "Refactoring code into pure functions that process data without side effects makes code easier +- "Using pure functions that process data without side effects whenever possible makes the code easier to read, test and maintain." --- @@ -147,10 +148,7 @@ the tests at all. Now that we have our regression test for `analyse_data()` in place, we are ready to refactor the function further. -We would like to separate out as much of its code as possible as *pure functions*. -Pure functions are very useful and much easier to test as they take input only from its input -parameters, do not modify input data and output only via their return value -(i.e. do not have any side effect of modifying global variables or writing to files). +We would like to separate out as much of its code as possible as *pure functions*. ### Pure Functions @@ -211,10 +209,10 @@ be harder to test but, when simplified like this, may only require a handful of >> while keeping all the logic for reading the data, processing it and showing it in a graph: >>```python >>def analyse_data(data_dir): ->> """Calculate the standard deviation by day between datasets ->> Gets all the inflammation csvs within a directory, works out the mean ->> inflammation value for each day across all datasets, then graphs the ->> standard deviation of these means.""" +>> """Calculates the standard deviation by day between datasets. +>> Gets all the inflammation data from CSV files within a directory, works out the mean +>> inflammation value for each day across all datasets, then visualises the +>> standard deviation of these means on a graph.""" >> data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) >> if len(data_file_paths) == 0: >> raise ValueError(f"No inflammation csv's found in path {data_dir}") @@ -293,7 +291,7 @@ testable and maintainable. This is particularly useful for: Until this section, we have mainly been writing procedural code. 
In the previous episode, we have touched a bit upon classes, encapsulation and polymorphism, -which are characteristics of (but not limited to) the Object Oriented Programming (OOP). +which are characteristics of (but not limited to) the object-oriented programming (OOP). In this episode, we mentioned [pure functions](./index.html#pure-functions) and Functional Programming. @@ -307,9 +305,9 @@ to handle different subtasks. Because of this, it is useful to know about the [major paradigms](/programming-paradigms/index.html), so you can recognise where it might be useful to switch. This is outside of scope of this course - we have some extra episodes on the topics of -[Procedural Programming](/procedural-programming/index.html), -[Functional Programming](/functional-programming/index.html) and -[Object Oriented Programming](/object-oriented-programming/index.html) if you want to know more. +[procedural programming](/procedural-programming/index.html), +[functional programming](/functional-programming/index.html) and +[object-oriented programming](/object-oriented-programming/index.html) if you want to know more. > ## So Which One is Python? > Python is a multi-paradigm and multi-purpose programming language. @@ -328,4 +326,11 @@ This is outside of scope of this course - we have some extra episodes on the top > but producing a new data to reflect the result of the function. {: .callout} +## Software Design and Architecture + +In this section so far we have been talking about **software design** - the individual modules and +components of the software. We are now doing to have a brief look into **software architecture** - +which is about the overall structure that these software components fit into, a *design pattern* +with a common successful use of software components. 
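The idea of components fitting into an overall architectural structure can be sketched, for example, as a minimal Model-View-Controller style split (all names and behaviour here are invented for illustration and do not come from the lesson's codebase):

```python
# Model: the data and the operations on it
def compute_mean(values):
    return sum(values) / len(values)


# View: how a result is presented to the user
def show_result(result):
    return f"Mean inflammation: {result}"


# Controller: wires the model and the view together
def run_analysis(values):
    return show_result(compute_mean(values))


print(run_analysis([2, 4, 6]))  # Mean inflammation: 4.0
```

Each layer can be developed, tested and replaced independently, which is exactly what a good architecture aims for.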
+ {% include links.md %} From ef4af2e6a49bf8cd24093f539057b04f7e26849a Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Tue, 26 Mar 2024 09:38:12 +0000 Subject: [PATCH 097/105] Connected episodes better --- _episodes/33-code-decoupling-abstractions.md | 15 +++++++++++---- _episodes/34-code-refactoring.md | 7 ++++--- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/_episodes/33-code-decoupling-abstractions.md b/_episodes/33-code-decoupling-abstractions.md index cdfe244ed..a20566619 100644 --- a/_episodes/33-code-decoupling-abstractions.md +++ b/_episodes/33-code-decoupling-abstractions.md @@ -13,14 +13,14 @@ objectives: - "Understand the principles of encapsulation, polymorphism and interfaces." - "Use mocks to replace a class in test code." keypoints: -- "Abstractions can hide certain details of the code behind classes and interfaces" -- "Encapsulation bundles data into a structured component along with the methods that operate +- "Code decoupling is separating code into smaller components and reducing the interdependence +between them so that the code is easier to understand, test and maintain." +- "Abstractions can hide certain details of the code behind classes and interfaces." +- "Encapsulation bundles data into a structured component along with methods that operate on the data, and provides a mechanism for restricting access to that data, hiding the internal representation of the component." - "Polymorphism describes the provision of a single interface to entities of different types, or the use of a single symbol to represent different types." -- "Code decoupling is separating code into smaller components and reducing the interdependence -between them so that the code is easier to understand, test and maintain." 
---

@@ -440,6 +440,13 @@ Now whenever you call `mock_version.method_to_mock()` the return value will be `42`.
> {: .solution}
{: .challenge}

+## Safe Code Structure Changes
+
+With the changes to the code structure we have made using code decoupling and abstractions, we have
+already refactored our code to a certain extent, but we have not tested that the changes work as
+intended.
+We will now look into how to properly refactor code to guarantee that the code still works
+as it did before any modifications.

{% include links.md %}

diff --git a/_episodes/34-code-refactoring.md b/_episodes/34-code-refactoring.md
index 810594027..b8148e9e0 100644
--- a/_episodes/34-code-refactoring.md
+++ b/_episodes/34-code-refactoring.md
@@ -8,14 +8,15 @@ questions:
objectives:
- "Employ code refactoring to improve the structure of existing code."
- "Understand the use of regression tests to avoid breaking existing code when refactoring."
-- "Understand the use of pure functions in software design to make the code easier to test."
+- "Understand the use of pure functions in software design to make the code easier to read,
+test and maintain."
- "Refactor a piece of code to separate out 'pure' from 'impure' code."
keypoints:
- "Code refactoring is a technique for improving the structure of existing code."
- "Implementing regression tests before refactoring gives you confidence that your changes have not broken the code."
- "Using pure functions that process data without side effects whenever possible makes the code easier
-to read, test and maintain."
+to understand, test and maintain."
---

## Introduction

@@ -329,7 +330,7 @@ This is outside of scope of this course - we have some extra episodes on the top

## Software Design and Architecture

In this section so far we have been talking about **software design** - the individual modules and
-components of the software. We are now doing to have a brief look into **software architecture** -
+components of the software.
We are now going to have a brief look into **software architecture** -
which is about the overall structure that these software components fit into, and about
*design patterns* - common, successful arrangements of software components.

From 4bc5198b7ab844020cf05878780704497fb5f109 Mon Sep 17 00:00:00 2001
From: Steve Crouch
Date: Wed, 27 Mar 2024 09:29:09 +0000
Subject: [PATCH 098/105] #321 - revise software design learning narrative

---
 _episodes/32-software-design.md | 55 ++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md
index 33b2189ec..05f1bc497 100644
--- a/_episodes/32-software-design.md
+++ b/_episodes/32-software-design.md
@@ -182,46 +182,45 @@ calculates and compares standard deviation across all the data by day and finally
> {: .solution}
{: .challenge}

-## Techniques for Improving Code
+## Techniques for Good Code Design

-How code is structured is important for helping people who are developing and maintaining it
+Once we have a good high-level architectural design,
+it's important to follow this philosophy through to the process of developing the code itself,
+and there are some key techniques to keep in mind that will help.
+
+As we've discussed,
+how code is structured is important for helping people who are developing and maintaining it
to understand and update it.
-By breaking down our software into components with a single responsibility,
+By breaking down our software into modular components with a single responsibility,
we avoid having to rewrite it all when requirements change.
-Such components can be as small as a single function, or be a software package in their own right.
-These smaller components can be understood individually without having to understand
+This also means that these smaller components can be understood individually without having to understand
the entire codebase at once.
+The following techniques build on this concept of modularity: -### Code Refactoring - -*Code refactoring* is the process of improving the design of an existing code - -changing the internal structure of code without changing its -external behavior, with the goal of making the code more readable, maintainable, efficient or easier -to test. -This can include things such as renaming variables, reorganising -functions to avoid code duplication and increase reuse, and simplifying conditional statements. - -### Code Decoupling - -*Code decoupling* is a code design technique that involves breaking a (complex) -software system into smaller, more manageable parts, and reducing the interdependence -between these different parts of the system. -This means that a change in one part of the code usually does not require a change in the other, -thereby making its development more efficient and less error prone. - -### Code Abstraction - -*Abstraction* is the process of hiding the implementation details of a piece of +- *Abstraction* is the process of hiding the implementation details of a piece of code (typically behind an interface) - i.e. the details of *how* something works are hidden away, leaving code developers to deal only with *what* it does. This allows developers to work with the code at a higher level of abstraction, without needing to understand fully (or keep in mind) all the underlying details at any given time and thereby reducing the cognitive load when programming. - -Abstraction can be achieved through techniques such as *encapsulation*, *inheritance*, and + Abstraction can be achieved through techniques such as *encapsulation*, *inheritance*, and *polymorphism*, which we will explore in the next episodes. There are other [abstraction techniques](https://en.wikipedia.org/wiki/Abstraction_(computer_science)) available too. 
+- *Code decoupling* is a code design technique that involves breaking a (complex)
+software system into smaller, more manageable parts, and reducing the interdependence
+between these different parts of the system.
+This means that a change in one part of the code usually does not require a change in the other,
+thereby making its development more efficient and less error prone.
+
+- *Code refactoring* is the process of improving the design of existing code -
+changing the internal structure of code without changing its
+external behavior, with the goal of making the code more readable, maintainable, efficient or easier
+to test.
+This can include things such as renaming variables, reorganising
+functions to avoid code duplication and increase reuse, and simplifying conditional statements.
+
+
## Improving Our Software Design

Refactoring our code to make it more decoupled and to introduce abstractions to
hide all but the relevant information about parts of the code is important for creating more
maintainable code.
It will help to keep our codebase clean, modular and easier to understand.

Writing good code is hard and takes practice.
You may also be faced with an existing piece of code that breaks some (or all) of the good code
principles, and your job will be to improve/refactor it so that it can evolve further.
-We will now look into some examples of the techniques that can help us redesign our code
+We will now look into some examples of these techniques that can help us redesign our code
and incrementally improve its quality.
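The abstraction techniques named above - *encapsulation* and *polymorphism* in particular - can be sketched in a few lines; the class and function names here are invented for illustration and are not the lesson's actual code:

```python
import json

class CSVSource:
    """Encapsulation: the parsing details are hidden behind a common interface."""
    def __init__(self, text):
        self._text = text  # internal detail that callers do not need to know about

    def load(self):
        return [line.split(",") for line in self._text.splitlines()]

class JSONSource:
    """Polymorphism: a different type offering the same load() interface."""
    def __init__(self, text):
        self._text = text

    def load(self):
        return json.loads(self._text)

def analyse(source):
    # Decoupling: this code depends only on the load() interface,
    # not on any particular data format or file layout.
    return source.load()

print(analyse(CSVSource("1,2\n3,4")))   # [['1', '2'], ['3', '4']]
print(analyse(JSONSource("[[1, 2]]")))  # [[1, 2]]
```

Because `analyse` only relies on the `load()` interface, a new data format means adding one new class rather than rewriting the analysis - the decoupling payoff described in the bullet above.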
{% include links.md %}

From f2b74022f6166f9158c5612e1e425e5e036ca070 Mon Sep 17 00:00:00 2001
From: Steve Crouch
Date: Wed, 27 Mar 2024 13:52:27 +0000
Subject: [PATCH 099/105] #321 - move software architecture episode to software design, update questions, objectives, keypoints

---
 _episodes/32-software-design.md       | 177 +++++++++++++++++++++++---
 _episodes/35-software-architecture.md | 164 +++--------------------
 2 files changed, 171 insertions(+), 170 deletions(-)

diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md
index 05f1bc497..366a9a41e 100644
--- a/_episodes/32-software-design.md
+++ b/_episodes/32-software-design.md
@@ -1,19 +1,26 @@
---
-title: "Software Design"
+title: "Software Architecture and Design"
teaching: 25
-exercises: 20
+exercises: 25
questions:
- "Why should we invest time in software design?"
- "What should we consider when designing software?"
+- "What is software architecture?"
objectives:
+- "List the common aspects of software design."
+- "Describe the term technical debt and how it impacts software."
- "Understand the goals and principles of designing 'good' software."
-- "Understand code decoupling and code abstraction design techniques."
-- "Understand what code refactoring is."
+- "Use a diagramming technique to describe a software architecture."
+- "Describe the components of Model-View-Controller (MVC) architecture."
+- "Understand the use of common design patterns to improve the extensibility, reusability and
+overall quality of software."
+- "List some best practices when designing software."
keypoints:
- "'Good' code is designed to be maintainable: readable by people who did not author the code,
testable through a set of automated tests, adaptable to new requirements."
-- "The sooner you adopt a practice of designing your software in the lifecycle of your project,
-the easier the development and maintenance process will."
+- "Use abstraction and decoupling to logically separate the different aspects of your software within design as well as implementation." +- "Use refactoring to improve existing code to improve it's consistency internally and within its overall architecture." +- "Include software design as a key stage in the lifecycle of your project so that development and maintenance becomes easier." --- ## Introduction @@ -36,14 +43,14 @@ it is not too late to start now. It is not easy to come up with a complete definition for the term **software design**, but some of the common aspects are: -- **Algorithm design** - - what method are we going to use to solve the core research/business problem? - **Software architecture** - what components will the software have and how will they cooperate? - **System architecture** - what other things will this software have to interact with and how will it do this? - **UI/UX** (User Interface / User Experience) - how will users interact with the software? +- **Algorithm design** - + what method are we going to use to solve the core research/business problem? There is literature on each of the above software design aspects - we will not go into details of them all here. @@ -104,16 +111,16 @@ to fix a problem or the software's requirements change. Satisfying the above properties will lead to an overall software design goal of having *maintainable* code, which is: -* *understandable* by developers who did not develop the code, +* **Understandable** by developers who did not develop the code, by having a clear and well-considered high-level design (or *architecture*) that separates out the different components and aspects of its function logically -and in a modular way, and having the interactions between these different parts clear, simple, and sufficiently high-level that they do not contravene this design. 
- * Moving this forward into implementation, *understandable* would mean being consistent in coding style, using sensible naming conventions for functions, classes and variables, documenting and commenting code, simple control flow, and having small functions and methods focused on single tasks. -* *adaptable* by designing the code to be easily modifiable and extensible to satisfy new requirements, +and in a modular way, and having the interactions between these different parts clear, simple, and sufficiently high-level that they do not contravene this design. This is known as *separation of concerns*, and is a key principle in good software design. + * Moving this forward into implementation, *understandable* would mean being consistent in coding style, using sensible naming conventions for functions, classes and variables, documenting and commenting code, having a simple control flow, and having small functions and methods focused on single tasks. +* **Adaptable** by designing the code to be easily modifiable and extensible to satisfy new requirements, by incorporating points in the modular design where new behaviour can be added in a clear and straightforward manner (e.g. as individual functions in existing modules, or perhaps at a higher-level as plugins). * In an implementation sense, this means writing low-coupled/decoupled code where each part of the code has a separate concern, and has the lowest possible dependency on other parts of the code. This makes it easier to test, update or replace. -* *testable* by designing the code in a sufficiently modular way to make it easier to test the functionality within a modular design, +* **Testable** by designing the code in a sufficiently modular way to make it easier to test the functionality within a modular design, either as a whole or in terms of its individual functions. * This would carry forward in an implementation sense in two ways. 
Firstly, having functions sufficiently small to be amenable to individual (ideally automated) test cases,
e.g. by writing unit, regression tests to verify the code produces the expected outputs from controlled inputs and exhibits the expected behavior over time
@@ -150,13 +157,16 @@ This function loads all the data files for a given directory path, then
calculates and compares standard deviation across all the data by day and finally plots a graph.

> ## Exercise: Identifying How Code Can be Improved?
+>
> Critically examine the code in the `analyse_data()` function in the `compute_data.py` file.
>
> In what ways does this code not live up to the ideal properties of 'good' code?
> Think about ways in which you find it hard to understand.
> Think about the kinds of changes you might want to make to it, and what would
> make making those changes challenging.
+>
>> ## Solution
+>
>> You may have found others, but here are some of the things that make the code
>> hard to read, test and maintain.
>>
@@ -182,6 +192,139 @@ calculates and compares standard deviation across all the data by day and finally
> {: .solution}
{: .challenge}

+## Software Architecture
+
+A software architecture is the fundamental structure of a software system
+that is typically decided at the beginning of project development
+based on its requirements and is not that easy to change once implemented.
+It refers to a "bigger picture" of a software system
+that describes high-level components (modules) of the system, what their functionality/roles are
+and how they interact.
+
+The basic idea is you draw boxes that will represent different units of code, as well as
+other components of the system (such as users, databases, etc).
+Then connect these boxes with lines where information or control will be exchanged.
+These lines represent the interfaces in your system.
+
+As well as helping to visualise the work, doing this sketch can help troubleshoot potential issues.
+For example, it can reveal a circular dependency between two sections of the design.
+It can also help with estimating how long the work will take, as it forces you to consider all
+the components that need to be made.
+
+Diagrams are not foolproof, but are a great starting point to break down the different
+responsibilities and think about the kinds of information different parts of the system will need.
+
+> ## Exercise: Design a High-Level Architecture for a New Requirement
+>
+> Sketch out an architectural design for a new feature requested by a user.
+>
+> *"I want there to be a Google Drive folder such that when I upload new inflammation data to it,
+> the software automatically pulls it down and updates the analysis.
+> The new result should be added to a database with a timestamp.
+> An email should then be sent to a group mailing list notifying them of the change."*
+>
+> You can draw by hand on a piece of paper or whiteboard, or use an online drawing tool
+> such as [Excalidraw](https://excalidraw.com/).
+>
+>> ## Solution
+>>
+>> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg){: width="600px" }
+> {: .solution}
+{: .challenge}

+We have been developing our software using the **Model-View-Controller** (MVC) architecture,
+but MVC is just one of the common [software architectural patterns](/software-architecture-extra/index.html)
+and is not the only choice we could have made.
+
+### Model-View-Controller (MVC) Architecture
+
+MVC architecture divides the related program logic into three interconnected components or modules:
+
+- **Model** (data)
+- **View** (client interface), and
+- **Controller** (processes that handle input/output and manipulate the data).
+
+The *Model* represents the data used by a program and also contains operations/rules
+for manipulating and changing the data in the model.
+This may be a database, a file, a single data object or a series of objects - +for example a table representing patients' data. + +The *View* is the means of displaying data to users/clients within an application +(i.e. provides visualisation of the state of the model). +For example, displaying a window with input fields and buttons (Graphical User Interface, GUI) +or textual options within a command line (Command Line Interface, CLI) are examples of Views. +They include anything that the user can see from the application. +While building GUIs is not the topic of this course, +we do cover building CLIs (handling command line arguments) in Python to a certain extent. + +The *Controller* manipulates both the Model and the View. +It accepts input from the View +and performs the corresponding action on the Model (changing the state of the model) +and then updates the View accordingly. +For example, on user request, +Controller updates a picture on a user's GitHub profile +and then modifies the View by displaying the updated profile back to the user. + +### Limitations to Architectural Design + +Note, however, there are limits to everything - and MVC architecture is no exception. +The Controller often transcends into the Model and View, +and a clear separation is sometimes difficult to maintain. +For example, the Command Line Interface provides both the View +(what user sees and how they interact with the command line) +and the Controller (invoking of a command) aspects of a CLI application. +In Web applications, Controller often manipulates the data (received from the Model) +before displaying it to the user or passing it from the user to the Model. 
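The Model/View/Controller roles described above can be reduced to a few lines of Python; the names below are purely illustrative and do not come from the inflammation project:

```python
# Model: holds the data and the rules for changing it; knows nothing about display.
class PatientModel:
    def __init__(self):
        self.readings = []

    def add_reading(self, value):
        self.readings.append(value)

# View: presentation only - formats the state of the Model for the user.
def show_readings(readings):
    return "Readings: " + ", ".join(str(r) for r in readings)

# Controller: accepts input, updates the Model, then refreshes the View.
def record_reading(model, value):
    model.add_reading(value)
    return show_readings(model.readings)

model = PatientModel()
print(record_reading(model, 3))  # Readings: 3
print(record_reading(model, 5))  # Readings: 3, 5
```

Note that the Model never formats anything and the View never changes any state - keeping that separation is the key point of the pattern, even when the View and Controller end up partly entangled in practice.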
+
+There are many variants of an MVC-like pattern
+(such as [Model-View-Presenter](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93presenter) (MVP),
+[Model-View-Viewmodel](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93viewmodel) (MVVM), etc.),
+where the Controller role is handled slightly differently,
+but in most cases, the distinction between these patterns is not particularly important.
+What really matters is that we are making conscious decisions about the architecture of our software
+that suit the way in which we expect to use it.
+We should reuse and be consistent with these established ideas where we can,
+but we do not need to stick to them exactly.
+
+The key thing to take away is the distinction between the Model and the View code, while
+the View and the Controller can be more or less coupled together (e.g. the code that specifies
+there is a button on the screen, might be the same code that specifies what that button does).
+The View may be hard to test, or use special libraries to draw the UI, but should not contain any
+complex logic, and is really just a presentation layer on top of the Model.
+The Model, conversely, should not care how the data is displayed.
+For example, the View may present dates as "Monday 24th July 2023",
+but the Model stores them using a `Date` object rather than a string representation.
+
+> ## Reusable "Patterns" of Architecture
+>
+> [Architectural](https://www.redhat.com/architect/14-software-architecture-patterns) and
+> [programming patterns](https://refactoring.guru/design-patterns/catalog) are reusable templates for
+> software systems and code that provide solutions for some common software design challenges.
+> MVC is one architectural pattern.
+> Patterns are a useful starting point for how to design your software and also provide
+> a common vocabulary for discussing software designs with other developers.
+> They may not always provide a full design solution as some problems may require
+> a bespoke design that maps cleanly on to the specific problem you are trying to solve.
+{: .callout}
+
+### Architectural Design Guidelines
+
+Creating good software architecture is not about applying any rules or patterns blindly,
+but instead about practising and taking care to:
+
+* Discuss design with your colleagues before writing the code.
+* Separate different concerns into different sections of the code.
+* Avoid duplication of code or data.
+* Keep how much a person has to understand at once to a minimum.
+* Try not to have too many abstractions (if you have to jump around a lot when reading the
+code that is a clue that your code may be too abstract).
+* Think about how interfaces will work (?).
+* Try not to design a future-proof solution or to anticipate future requirements or adaptations
+of the software - design the simplest solution that solves the problem at hand.
+* (When working on a less well-structured part of the code), start by refactoring it so that your
+change fits in cleanly.
+* Try to leave the code in a better state than you found it.
+
## Techniques for Good Code Design

Once we have a good high-level architectural design,
@@ -220,14 +363,6 @@ to test.
This can include things such as renaming variables, reorganising
functions to avoid code duplication and increase reuse, and simplifying conditional statements.

-
-## Improving Our Software Design
-
-Refactoring our code to make it more decoupled and to introduce abstractions to
-hide all but the relevant information about parts of the code is important for creating more
-maintainable code.
-It will help to keep our codebase clean, modular and easier to understand.
-
Writing good code is hard and takes practice.
You may also be faced with an existing piece of code that breaks some (or all) of the good code
principles, and your job will be to improve/refactor it so that it can evolve further.
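The refactoring technique summarised above - changing internal structure without changing external behaviour - can be sketched with a simplified conditional; both functions below are invented for illustration:

```python
# Before: deeply nested conditionals that are hard to follow.
def classify_before(value):
    if value is not None:
        if value >= 10:
            return "high"
        else:
            if value >= 5:
                return "medium"
            else:
                return "low"
    else:
        return "missing"

# After: the same external behaviour with a flatter control flow.
def classify_after(value):
    if value is None:
        return "missing"
    if value >= 10:
        return "high"
    if value >= 5:
        return "medium"
    return "low"

# A simple regression check confirms the refactoring changed nothing observable.
assert all(classify_before(v) == classify_after(v) for v in [None, 0, 4, 5, 9, 10, 42])
```

Running such a comparison over representative inputs before deleting the old version is a lightweight form of the regression testing that the following episodes rely on.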
diff --git a/_episodes/35-software-architecture.md b/_episodes/35-software-architecture.md
index e96b0d8e0..3a41acc94 100644
--- a/_episodes/35-software-architecture.md
+++ b/_episodes/35-software-architecture.md
@@ -1,97 +1,25 @@
---
-title: "Software Architecture"
+title: "Software Architecture Revisited"
teaching: 15
-exercises: 50
+exercises: 30
questions:
-- "What is software architecture?"
-- "What are components of Model-View-Controller (MVC) architecture?"
+- "How do we handle code contributions that don't fit within our existing architecture?"
objectives:
-- "Understand the use of common design patterns to improve the extensibility, reusability and
-overall quality of software."
-- "List some best practices when designing software."
+- "Analyse new code to identify Model, View, Controller aspects."
+- "Refactor new code to conform to an MVC architecture."
+- "Adapt our existing code to include the new re-architected code."
keypoints:
+- "Sometimes new, contributed code needs refactoring for it to fit within an existing codebase."
- "Try to leave the code in a better state than you found it."
---

+In the previous few episodes we've looked at the importance and principles of good software architecture and design,
+and how techniques such as code abstraction and refactoring fulfil that design within an implementation,
+and help us maintain and improve it as our code evolves.

-## Software Architecture
-
-A software architecture is the fundamental structure of a software system
-that is typically decided at the beginning of project development
-based on its requirements and is not that easy to change once implemented.
-It refers to a "bigger picture" of a software system
-that describes high-level components (modules) of the system, what their functionality/roles are
-and how they interact.
- -There are various [software architectures](/software-architecture-extra/index.html) around defining different ways of -dividing the code into smaller modules with well defined roles that are outside the scope of -this course. -We have been developing our software using the **Model-View-Controller** (MVC) architecture, -but, MVC is just one of the common architectural patterns -and is not the only choice we could have made. - -### Model-View-Controller (MVC) Architecture -MVC architecture divides the related program logic -into three interconnected modules: - -- **Model** (data) -- **View** (client interface), and -- **Controller** (processes that handle input/output and manipulate the data). - -Model represents the data used by a program and also contains operations/rules -for manipulating and changing the data in the model. -This may be a database, a file, a single data object or a series of objects - -for example a table representing patients' data. - -View is the means of displaying data to users/clients within an application -(i.e. provides visualisation of the state of the model). -For example, displaying a window with input fields and buttons (Graphical User Interface, GUI) -or textual options within a command line (Command Line Interface, CLI) are examples of Views. -They include anything that the user can see from the application. -While building GUIs is not the topic of this course, -we do cover building CLIs (handling command line arguments) in Python to a certain extent. - -Controller manipulates both the Model and the View. -It accepts input from the View -and performs the corresponding action on the Model (changing the state of the model) -and then updates the View accordingly. -For example, on user request, -Controller updates a picture on a user's GitHub profile -and then modifies the View by displaying the updated profile back to the user. 
- -### Separation of Responsibilities - -Separation of responsibilities is important when designing software architectures -in order to reduce the code's complexity and increase its maintainability. -Note, however, there are limits to everything - -and MVC architecture is no exception. -Controller often transcends into Model and View -and a clear separation is sometimes difficult to maintain. -For example, the Command Line Interface provides both the View -(what user sees and how they interact with the command line) -and the Controller (invoking of a command) aspects of a CLI application. -In Web applications, Controller often manipulates the data (received from the Model) -before displaying it to the user or passing it from the user to the Model. - -There are many variants of an MVC-like pattern -(such as [Model-View-Presenter](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93presenter) (MVP), -[Model-View-Viewmodel](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93viewmodel) (MVVM), etc.), -where the Controller role is handled slightly differently, -but in most cases, the distinction between these patterns is not particularly important. -What really matters is that we are making conscious decisions about the architecture of our software -that suit the way in which we expect to use it. -We should reuse these established ideas where we can, but we do not need to stick to them exactly. - -The key thing to take away is the distinction between the Model and the View code, while -the View and the Controller can be more or less coupled together (e.g. the code that specifies -there is a button on the screen, might be the same code that specifies what that button does). -The View may be hard to test, or use special libraries to draw the UI, but should not contain any -complex logic, and is really just a presentation layer on top of the Model. -The Model, conversely, should not care how the data is displayed. 
-For example, the View may present dates as "Monday 24th July 2023", -but the Model stores it using a `Date` object rather than its string representation. - -## Our Project's Architecture (Revisited) +Let us now return to software architecture and consider how we may refactor some new code to fit within our existing MVC architectural design using the techniques we have learnt so far. + +## Revisiting Our Software's Architecture Recall that in our software project, the **Controller** module is in `inflammation-analysis.py`, and the View and Model modules are contained in @@ -368,74 +296,12 @@ by setting `required = True` within the `add_argument()` command. in the usage section of the help page. {: .callout} -## Architecting Software - -When designing a new software application, or making a substantial change to an existing one, -it can be really helpful to sketch out the intended architecture. -The basic idea is you draw boxes that will represent different units of code, as well as -other components of the system (such as users, databases, etc). -Then connect these boxes with lines where information or control will be exchanged. -These lines represent the interfaces in your system. - -As well as helping to visualise the work, doing this sketch can troubleshoot potential issues. -For example, if there is a circular dependency between two sections of the design. -It can also help with estimating how long the work will take, as it forces you to consider all -the components that need to be made. - -Diagrams are not foolproof, but are a great starting point to break down the different -responsibilities and think about the kinds of information different parts of the system will need. - -> ## Exercise: Design a High-Level Architecture for a New Requirement -> Sketch out an architectural design for a new feature requested by a user. 
-> -> *"I want there to be a Google Drive folder such that when I upload new inflammation data to it, -> the software automatically pulls it down and updates the analysis. -> The new result should be added to a database with a timestamp. -> An email should then be sent to a group mailing list notifying them of the change."* -> -> You can draw by hand on a piece of paper or whiteboard, or use an online drawing tool -> such as [Excalidraw](https://excalidraw.com/). ->> ## Solution ->> ->> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg){: width="600px" } -> {: .solution} -{: .challenge} - -### Architectural & Programming Patterns - -[Architectural]((https://www.redhat.com/architect/14-software-architecture-patterns)) and -[programming patterns](https://refactoring.guru/design-patterns/catalog) are reusable templates for -software systems and code that provide solutions for some common software design challenges. -MVC is one architectural pattern. -Patterns are a useful starting point for how to design your software and also provide -a common vocabulary for discussing software designs with other developers. -They may not always provide a full design solution as some problems may require -a bespoke design that maps cleanly on to the specific problem you are trying to solve. - -### Design Guidelines - -Creating good software architecture is not about applying any rules or patterns blindly, -but instead practise and taking care to: - -* Discuss design with your colleagues before writing the code. -* Separate different concerns into different sections of the code. -* Avoid duplication of code or data. -* Keep how much a person has to understand at once to a minimum. -* Try not to have too many abstractions (if you have to jump around a lot when reading the -code that is a clue that your code may be too abstract). -* Think about how interfaces will work (?). 
-* Not try to design a future-proof solution or to anticipate future requirements or adaptations -of the software - design the simplest solution that solves the problem at hand. -* (When working on a less well-structured part of the code), start by refactoring it so that your -change fits in cleanly. -* Try to leave the code in a better state that you found it. - ### Additional Reading Material & References -Now that we have covered the basics of [software architecture](/software-architecture-extra/index.html) +Now that we have covered and revisited [software architecture](/software-architecture-extra/index.html) and [different programming paradigms](/programming-paradigms/index.html) -and how we can integrate them into our multi-layer architecture, +and how we can integrate them into our architecture, there are two optional extra episodes which you may find interesting. Both episodes cover the persistence layer of software architectures From a27bcb3fcaf3d84b8384c56e5e6026df5a010315 Mon Sep 17 00:00:00 2001 From: Steve Crouch Date: Wed, 27 Mar 2024 14:40:41 +0000 Subject: [PATCH 100/105] Update _episodes/32-software-design.md Co-authored-by: Aleksandra Nenadic --- _episodes/32-software-design.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 366a9a41e..706d6c9e9 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -7,7 +7,7 @@ questions: - "What should we consider when designing software?" - "What is software architecture?" objectives: -- "List the common aspects of software design." +- "List the common aspects of software architecture and design." - "Describe the term technical debt and how it impacts software." - "Understand the goals and principles of designing 'good' software." - "Use a diagramming technique to describe a software architecture." 
From f4d0bc29c79810e0a47e4504bbd4a2b03fc5e67e Mon Sep 17 00:00:00 2001 From: Steve Crouch Date: Wed, 27 Mar 2024 14:40:56 +0000 Subject: [PATCH 101/105] Update _episodes/32-software-design.md Co-authored-by: Aleksandra Nenadic --- _episodes/32-software-design.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 706d6c9e9..db88dc787 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -19,7 +19,7 @@ keypoints: - "'Good' code is designed to be maintainable: readable by people who did not author the code, testable through a set of automated tests, adaptable to new requirements." - "Use abstraction and decoupling to logically separate the different aspects of your software within design as well as implementation." -- "Use refactoring to improve existing code to improve it's consistency internally and within its overall architecture." +- "Use refactoring to improve existing code to improve its consistency internally and within its overall architecture." - "Include software design as a key stage in the lifecycle of your project so that development and maintenance becomes easier." --- From 2f4a1155f12c764b3a2b9ab31d3a7717fe478eec Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Wed, 27 Mar 2024 15:12:21 +0000 Subject: [PATCH 102/105] Update _episodes/34-code-refactoring.md Co-authored-by: Steve Crouch --- _episodes/34-code-refactoring.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/34-code-refactoring.md b/_episodes/34-code-refactoring.md index b8148e9e0..53e8865da 100644 --- a/_episodes/34-code-refactoring.md +++ b/_episodes/34-code-refactoring.md @@ -57,7 +57,7 @@ to write tests in the future, how can we write tests before doing the refactorin The tricks to get around this trap are: * test at a higher level, with coarser accuracy, and - * write tests that you intend to remove. 
+ * write tests that you intend to replace or remove. The best tests are the ones that test a single bit of functionality rigorously. However, with our current `analyse_data()` code that is not possible because it is a From b1345a62b8df66d405a7d05a05107d28ba529228 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Wed, 27 Mar 2024 15:13:02 +0000 Subject: [PATCH 103/105] Update _episodes/33-code-decoupling-abstractions.md Co-authored-by: Steve Crouch --- _episodes/33-code-decoupling-abstractions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/33-code-decoupling-abstractions.md b/_episodes/33-code-decoupling-abstractions.md index a20566619..0a49e8652 100644 --- a/_episodes/33-code-decoupling-abstractions.md +++ b/_episodes/33-code-decoupling-abstractions.md @@ -56,7 +56,7 @@ Let's start redesigning our code by introducing some of the abstraction techniqu to incrementally decouple it into smaller components to improve its overall design. In the code from our current branch `full-data-analysis`, -you may have noticed that loading data from CSV files from a `data` directory is "baked" into +you may have noticed that loading data from CSV files from a `data` directory is "hardcoded" into (i.e. is part of) the `analyse_data()` function. Data loading is a functionality separate from data analysis, so firstly let's decouple the data loading part into a separate component (function). 
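The decoupling step described in the patch above might look something like the following sketch. The names `load_inflammation_data()` and `analyse_data()` follow the lesson's naming, but the implementation is an assumption for illustration — the lesson's real code has its own loading and analysis logic.

```python
import csv
import glob
import os

def load_inflammation_data(dir_path):
    """Load each inflammation CSV in dir_path as a 2D list of floats."""
    paths = sorted(glob.glob(os.path.join(dir_path, 'inflammation*.csv')))
    if not paths:
        raise ValueError(f"No inflammation CSV files found in {dir_path}")
    data = []
    for path in paths:
        with open(path, newline='') as csv_file:
            data.append([[float(cell) for cell in row]
                         for row in csv.reader(csv_file)])
    return data

def analyse_data(data_dir):
    # Loading is now delegated rather than hardcoded here:
    # analyse_data() just orchestrates the steps.
    data = load_inflammation_data(data_dir)
    # ... the statistical analysis of `data` would follow here ...
    return data
```

With loading split out, the data-loading component can be tested (and later replaced) independently of the analysis.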
From 2c3b5acbaacc98427c8ea248fc291cbfe1115be1 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Wed, 27 Mar 2024 15:13:42 +0000 Subject: [PATCH 104/105] Update _episodes/33-code-decoupling-abstractions.md Co-authored-by: Steve Crouch --- _episodes/33-code-decoupling-abstractions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_episodes/33-code-decoupling-abstractions.md b/_episodes/33-code-decoupling-abstractions.md index 0a49e8652..e2244b6b9 100644 --- a/_episodes/33-code-decoupling-abstractions.md +++ b/_episodes/33-code-decoupling-abstractions.md @@ -57,7 +57,7 @@ to incrementally decouple it into smaller components to improve its overall desi In the code from our current branch `full-data-analysis`, you may have noticed that loading data from CSV files from a `data` directory is "hardcoded" into -(i.e. is part of) the `analyse_data()` function. +the `analyse_data()` function. Data loading is a functionality separate from data analysis, so firstly let's decouple the data loading part into a separate component (function). From db5837bd3ed8a2915bec612e26b38b26c8069663 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Wed, 27 Mar 2024 15:30:15 +0000 Subject: [PATCH 105/105] Fixes as per Steve's review on PR #327. --- _episodes/33-code-decoupling-abstractions.md | 7 ++++--- _episodes/34-code-refactoring.md | 2 +- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/_episodes/33-code-decoupling-abstractions.md b/_episodes/33-code-decoupling-abstractions.md index e2244b6b9..b3b89171d 100644 --- a/_episodes/33-code-decoupling-abstractions.md +++ b/_episodes/33-code-decoupling-abstractions.md @@ -176,8 +176,8 @@ In addition, implementation of the method `get_area()` is hidden too (abstractio {: .callout} > ## Exercise: Use Classes to Abstract out Data Loading -> Declare a new class `CSVDataSource` that contains the `load_inflammation_data` function -> we wrote in the previous exercise as a method of this class. 
+> Inside `compute_data.py`, declare a new class `CSVDataSource` that contains the +> `load_inflammation_data()` function we wrote in the previous exercise as a method of this class. > The directory path where to load the files from should be passed in the class' constructor method. > Finally, construct an instance of the class `CSVDataSource` outside the statistical > analysis and pass it to `analyse_data()` function. @@ -261,7 +261,8 @@ For example, our `Circle` class implicitly has an interface - you can call `get_ on it and it will return a number representing its surface area. > ## Exercise: Identify an Interface Between `CSVDataSource` and `analyse_data` -> What is the interface between CSVDataSource class and `analyse_data()` function. +> What would you say is the interface between the CSVDataSource class +> and `analyse_data()` function? > Think about what functions `analyse_data()` needs to be able to call to perform its duty, > what parameters they need and what they return. >> ## Solution diff --git a/_episodes/34-code-refactoring.md b/_episodes/34-code-refactoring.md index 53e8865da..cc6d25833 100644 --- a/_episodes/34-code-refactoring.md +++ b/_episodes/34-code-refactoring.md @@ -77,7 +77,7 @@ This kind of testing is called **regression testing** as we are testing for regressions in existing behaviour. Refactoring code is not meant to change its behaviour, but sometimes to make it possible to verify -you not changing the important behaviour you have to make small tweaks to the code to write +you are not changing the important behaviour you have to make small tweaks to the code to write the tests at all. > ## Exercise: Write Regression Tests
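The class-based abstraction the exercise above asks for might be sketched as below — an illustrative assumption, not the lesson's model solution. The sketch also makes the interface explicit: `analyse_data()` relies only on its data source providing a `load_inflammation_data()` method that takes no parameters and returns the loaded data.

```python
import csv
import glob
import os

class CSVDataSource:
    """Loads inflammation CSVs from the directory given to the constructor."""

    def __init__(self, dir_path):
        self.dir_path = dir_path

    def load_inflammation_data(self):
        paths = sorted(glob.glob(
            os.path.join(self.dir_path, 'inflammation*.csv')))
        if not paths:
            raise ValueError(
                f"No inflammation CSV files found in {self.dir_path}")
        data = []
        for path in paths:
            with open(path, newline='') as csv_file:
                data.append([[float(cell) for cell in row]
                             for row in csv.reader(csv_file)])
        return data

def analyse_data(data_source):
    # The interface: analyse_data() only calls load_inflammation_data(),
    # which takes no arguments and returns the data -- the analysis
    # neither knows nor cares that the data came from CSV files.
    data = data_source.load_inflammation_data()
    # ... the statistical analysis of `data` would follow here ...
    return data

# The data source is constructed outside the analysis and passed in,
# e.g.: analyse_data(CSVDataSource('data'))
```

Because `analyse_data()` depends only on that one-method interface, a regression test can swap in any object providing `load_inflammation_data()` — a stub returning fixed data, for instance — without touching the analysis code.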