diff --git a/_episodes/15-coding-conventions.md b/_episodes/15-coding-conventions.md
index 3b94d62ae..0404f1233 100644
--- a/_episodes/15-coding-conventions.md
+++ b/_episodes/15-coding-conventions.md
@@ -438,7 +438,7 @@ because an incorrect comment causes more confusion than no comment at all.
 >> which is helpfully marking inconsistencies with coding guidelines by underlying them.
 >> There are a few things to fix in `inflammation-analysis.py`, for example:
 >>
->> 1. Line 24 in `inflammation-analysis.py` is too long and not very readable.
+>> 1. Line 30 in `inflammation-analysis.py` is too long and not very readable.
 >> A better style would be to use multiple lines and hanging indent,
 >> with the closing brace `}' aligned either with
 >> the first non-whitespace character of the last line of list
@@ -487,7 +487,7 @@ because an incorrect comment causes more confusion than no comment at all.
 >> Note how PyCharm is warning us by underlying the whole line.
 >>
 >> 4. Only one blank line after the end of definition of function `main`
->> and the rest of the code on line 30 in `inflammation-analysis.py` -
+>> and the rest of the code on line 33 in `inflammation-analysis.py` -
 >> should be two blank lines.
 >> Note how PyCharm is warning us by underlying the whole line.
 >>
diff --git a/_episodes/30-section3-intro.md b/_episodes/30-section3-intro.md
index 2bc022d39..5bfdb39f1 100644
--- a/_episodes/30-section3-intro.md
+++ b/_episodes/30-section3-intro.md
@@ -131,15 +131,10 @@ within the context of the typical software development process:
 - How requirements inform and drive the **design of software**,
   the importance, role, and examples of **software architecture**,
   and the ways we can describe a software design.
-- **Implementation choices** in terms of **programming paradigms**,
-  looking at **procedural**, **functional**, and **object oriented** paradigms of development.
-  Modern software will often contain instances of multiple paradigms,
-  so it is worthwhile being familiar with them and knowing when
-  to switch in order to make better code.
-- How you can (and should) assess and update a software's architecture when
-  requirements change and complexity increases -
-  is the architecture still fit for purpose,
-  or are modifications and extensions becoming increasingly difficult to make?
+- How to improve existing code to be more readable, maintainable and testable.
+- Different strategies for writing well-designed code, including
+  using **pure functions**, **classes** and **abstractions**.
+- How to create, assess and improve software design.
 
 {% include links.md %}
diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md
index 18dbe2ae7..145a69b8c 100644
--- a/_episodes/32-software-design.md
+++ b/_episodes/32-software-design.md
@@ -1,264 +1,169 @@
 ---
 title: "Software Architecture and Design"
-teaching: 15
-exercises: 30
+teaching: 25
+exercises: 20
 questions:
 - "What should we consider when designing software?"
-- "How can we make sure the components of our software are reusable?"
+- "What goals should we have when structuring our code?"
+- "What is refactoring?"
 objectives:
-- "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software."
-- "Understand the components of multi-layer software architectures."
+- "Know what goals we have when architecting and designing software."
+- "Understand what an abstraction is, and when you should use one."
+- "Understand what refactoring is."
 keypoints:
-- "Planning software projects in advance can save a lot of effort and reduce 'technical debt' later - even a partial plan is better than no plan at all."
+- "How code is structured is important for helping people understand and update it in the future."
 - "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. Such components can be as small as a single function, or be a software package in their own right."
+- "These smaller components can be understood individually without having to understand the entire codebase at once."
 - "When writing software used for research, requirements will almost *always* change."
 - "*'Good code is written so that is readable, understandable, covered by automated tests, not over complicated and does well what is intended to do.'*"
 ---
 
 ## Introduction
 
-In this episode, we'll be looking at how we can design our software
-to ensure it meets the requirements,
-but also retains the other qualities of good software.
-As a piece of software grows,
-it will reach a point where there's too much code for us to keep in mind at once.
-At this point, it becomes particularly important that the software be designed sensibly.
-What should be the overall structure of our software,
-how should all the pieces of functionality fit together,
-and how should we work towards fulfilling this overall design throughout development?
-
-It's not easy to come up with a complete definition for the term **software design**,
-but some of the common aspects are:
-
-- **Algorithm design** -
-  what method are we going to use to solve the core business problem?
-- **Software architecture** -
-  what components will the software have and how will they cooperate?
-- **System architecture** -
-  what other things will this software have to interact with and how will it do this?
-- **UI/UX** (User Interface / User Experience) -
-  how will users interact with the software?
-
-As usual, the sooner you adopt a practice in the lifecycle of your project, the easier it will be.
-So we should think about the design of our software from the very beginning,
-ideally even before we start writing code -
-but if you didn't, it's never too late to start.
-
-
-The answers to these questions will provide us with some **design constraints**
-which any software we write must satisfy.
-For example, a design constraint when writing a mobile app would be
-that it needs to work with a touch screen interface -
-we might have some software that works really well from the command line,
-but on a typical mobile phone there isn't a command line interface that people can access.
-
-
-## Software Architecture
-
-At the beginning of this episode we defined **software architecture**
-as an answer to the question
-"what components will the software have and how will they cooperate?".
-Software engineering borrowed this term, and a few other terms,
-from architects (of buildings) as many of the processes and techniques have some similarities.
-One of the other important terms we borrowed is 'pattern',
-such as in **design patterns** and **architecture patterns**.
-This term is often attributed to the book
-['A Pattern Language' by Christopher Alexander *et al.*](https://en.wikipedia.org/wiki/A_Pattern_Language)
-published in 1977
-and refers to a template solution to a problem commonly encountered when building a system.
-
-Design patterns are relatively small-scale templates
-which we can use to solve problems which affect a small part of our software.
-For example, the **[adapter pattern](https://en.wikipedia.org/wiki/Adapter_pattern)**
-(which allows a class that does not have the "right interface" to be reused)
-may be useful if part of our software needs to consume data
-from a number of different external data sources.
-Using this pattern,
-we can create a component whose responsibility is
-transforming the calls for data to the expected format,
-so the rest of our program doesn't have to worry about it.
-
-Architecture patterns are similar,
-but larger scale templates which operate at the level of whole programs,
-or collections or programs.
-Model-View-Controller (which we chose for our project) is one of the best known architecture patterns.
-Many patterns rely on concepts from Object Oriented Programming,
-so we'll come back to the MVC pattern shortly
-after we learn a bit more about Object Oriented Programming.
-
-There are many online sources of information about design and architecture patterns,
-often giving concrete examples of cases where they may be useful.
-One particularly good source is [Refactoring Guru](https://refactoring.guru/design-patterns).
-
-
-### Multilayer Architecture
-
-One common architectural pattern for larger software projects is **Multilayer Architecture**.
-Software designed using this architecture pattern is split into layers,
-each of which is responsible for a different part of the process of manipulating data.
-
-Often, the software is split into three layers:
-
-- **Presentation Layer**
-  - This layer is responsible for managing the interaction between
-    our software and the people using it
-  - May include the **View** components if also using the MVC pattern
-- **Application Layer / Business Logic Layer**
-  - This layer performs most of the data processing required by the presentation layer
-  - Likely to include the **Controller** components if also using an MVC pattern
-  - May also include the **Model** components
-- **Persistence Layer / Data Access Layer**
-  - This layer handles data storage and provides data to the rest of the system
-  - May include the **Model** components of an MVC pattern
-    if they're not in the application layer
-
-Although we've drawn similarities here between the layers of a system and the components of MVC,
-they're actually solutions to different scales of problem.
-In a small application, a multilayer architecture is unlikely to be necessary,
-whereas in a very large application,
-the MVC pattern may be used just within the presentation layer,
-to handle getting data to and from the people using the software.
-
-## Addressing New Requirements
-
-So, let's assume we now want to extend our application -
-designed around an MVC architecture - with some new functionalities
-(more statistical processing and a new view to see a patient's data).
-Let's recall the solution requirements we discussed in the previous episode:
-
-- *Functional Requirements*:
-  - SR1.1.1 (from UR1.1):
-    add standard deviation to data model and include in graph visualisation view
-  - SR1.2.1 (from UR1.2):
-    add a new view to generate a textual representation of statistics,
-    which is invoked by an optional command line argument
-- *Non-functional Requirements*:
-  - SR2.1.1 (from UR2.1):
-    generate graphical statistics report on clinical workstation configuration in under 30 seconds
-
-### How Should We Test These Requirements?
-
-Sometimes when we make changes to our code that we plan to test later,
-we find the way we've implemented that change doesn't lend itself well to how it should be tested.
-So what should we do?
-
-Consider requirement SR1.2.1 -
-we have (at least) two things we should test in some way,
-for which we could write unit tests.
-For the textual representation of statistics,
-in a unit test we could invoke our new view function directly
-with known inflammation data and test the text output as a string against what is expected.
-The second one, invoking this new view with an optional command line argument,
-is more problematic since the code isn't structured in a way where
-we can easily invoke the argument parsing portion to test it.
-To make this more amenable to unit testing we could
-move the command line parsing portion to a separate function,
-and use that in our unit tests.
-So in general, it's a good idea to make sure
-your software's features are modularised and accessible via logical functions.
-
-We could also consider writing unit tests for SR2.1.1,
-ensuring that the system meets our performance requirement, so should we?
-We do need to verify it's being met with the modified implementation,
-however it's generally considered bad practice to use unit tests for this purpose.
-This is because unit tests test *if* a given aspect is behaving correctly,
-whereas performance tests test *how efficiently* it does it.
-Performance testing produces measurements of performance which require a different kind of analysis
-(using techniques such as [*code profiling*](https://towardsdatascience.com/how-to-assess-your-code-performance-in-python-346a17880c9f)),
-and require careful and specific configurations of operating environments to ensure fair testing.
-In addition, unit testing frameworks are not typically designed for conducting such measurements,
-and only test units of a system,
-which doesn't give you an idea of performance of the system
-as it is typically used by stakeholders.
-
-The key is to think about which kind of testing should be used
-to check if the code satisfies a requirement,
-but also what you can do to make that code amenable to that type of testing.
-
-> ## Exercise: Implementing Requirements
-> Pick one of the requirements SR1.1.1 or SR1.2.1 above to implement
-> and create an appropriate feature branch -
-> e.g. `add-std-dev` or `add-view` from your most up-to-date `develop` branch.
->
-> One aspect you should consider first is
-> whether the new requirement can be implemented within the existing design.
-> If not, how does the design need to be changed to accommodate the inclusion of this new feature?
-> Also try to ensure that the changes you make are amenable to unit testing:
-> is the code suitably modularised
-> such that the aspect under test can be easily invoked
-> with test input data and its output tested?
->
-> If you have time, feel free to implement the other requirement, or invent your own!
->
-> Also make sure you push changes to your new feature branch remotely
-> to your software repository on GitHub.
->
-> **Note: do not add the tests for the new feature just yet -
-> even though you would normally add the tests along with the new code,
-> we will do this in a later episode.
-> Equally, do not merge your changes to the `develop` branch just yet.**
+Typically when we start writing code, we write small scripts that
+we intend to use.
+We probably don't imagine we will need to change the code in the future.
+We almost certainly don't expect other people will need to understand
+and modify the code in the future.
+However, as projects grow in complexity and the number of people involved increases,
+it becomes important to think about how to structure code.
+Software Architecture and Design is all about thinking about ways to keep the
+code **maintainable** as projects grow.
+
+Maintainable code is:
+
+ * Readable to people who didn't write the code.
+ * Testable through automated tests (like those from [episode 2](../21-automatically-testing-software/index.html)).
+ * Adaptable to new requirements.
+
+Writing code with these properties is hard and takes practice.
+Further, in most contexts you will already have a piece of code that breaks
+some (or maybe all!) of these principles.
+
+> ## Group Exercise: Think about examples of good and bad code
+> Try to come up with examples of code that has been hard to understand - why?
>
-> **Note 2: we have intentionally left this exercise without a solution
-> to give you more freedom in implementing it how you see fit.
-> If you are struggling with adding a new view and command line parameter,
-> you may find the standard deviation requirement easier.
-> A later episode in this section will look at
-> how to handle command line parameters in a scalable way.**
+> Try to come up with examples of code that was easy to understand and modify - why?
 {: .challenge}
 
-## Best Practices for 'Good' Software Design
-
-Aspirationally, what makes good code can be summarised in the following quote from the
-[Intent HG blog](https://intenthq.com/blog/it-audience/what-is-good-code-a-scientific-definition/):
-
-> *“Good code is written so that is readable, understandable,
-> covered by automated tests, not over complicated
-> and does well what is intended to do.”*
-
-By taking time to design our software to be easily modifiable and extensible,
-we can save ourselves a lot of time later when requirements change.
-The sooner we do this the better -
-ideally we should have at least a rough design sketched out for our software
-before we write a single line of code.
-This design should be based around the structure of the problem we're trying to solve:
-what are the concepts we need to represent
-and what are the relationships between them.
-And importantly, who will be using our software and how will they interact with it?
-
-Here's another way of looking at it.
-
-Not following good software design and development practices
-can lead to accumulated 'technical debt',
-which (according to [Wikipedia](https://en.wikipedia.org/wiki/Technical_debt)),
-is the "cost of additional rework caused by choosing an easy (limited) solution now
-instead of using a better approach that would take longer".
-So, the pressure to achieve project goals can sometimes lead to quick and easy solutions,
-which make the software become
-more messy, more complex, and more difficult to understand and maintain.
-The extra effort required to make changes in the future is the interest paid on the (technical) debt.
-It's natural for software to accrue some technical debt,
-but it's important to pay off that debt during a maintenance phase -
-simplifying, clarifying the code, making it easier to understand -
-to keep these interest payments on making changes manageable.
-If this isn't done, the software may accrue too much technical debt,
-and it can become too messy and prohibitive to maintain and develop,
-and then it cannot evolve.
-
-Importantly, there is only so much time available.
-How much effort should we spend on designing our code properly
-and using good development practices?
-The following [XKCD comic](https://xkcd.com/844/) summarises this tension:
-
-![Writing good code comic](../fig/xkcd-good-code-comic.png){: .image-with-shadow width="400px" }
-
-At an intermediate level there are a wealth of practices that *could* be used,
-and applying suitable design and coding practices is what separates
-an *intermediate developer* from someone who has just started coding.
-The key for an intermediate developer is to balance these concerns
-for each software project appropriately,
-and employ design and development practices *enough* so that progress can be made.
-It's very easy to under-design software,
-but remember it's also possible to over-design software too.
+In this episode we will explore techniques and processes that can help you
+continuously improve the quality of your code so that, over time, it becomes
+more maintainable.
+
+We will look at:
+
+ * What abstractions are, and how to pick appropriate ones.
+ * How to take code that is in bad shape and improve it.
+ * Best practices for writing code in ways that facilitate achieving these goals.
+
+### Cognitive Load
+
+When we are trying to understand a piece of code, we have to hold in our heads
+what the different variables mean and what each line of code will do.
+**Cognitive load** is a way of thinking about how much information we have to store in our
+heads to understand a piece of code.
+
+The higher the cognitive load, the harder it is to understand the code.
+If it is too high, we might have to create diagrams to help us hold it all in our heads,
+or we might just decide we can't understand it.
+
+There are lots of ways to keep cognitive load down:
+
+* Good variable and function names
+* Simple control flow
+* Having each function do just one thing
+
+## Abstractions
+
+An **abstraction**, at its most basic level, is a technique to hide the details
+of one part of a system from another part of the system.
+We deal with abstractions all the time - when you press the brake pedal in a
+car, you do not need to know how this manages both slowing down the engine and applying
+pressure to the brakes.
+The advantage of using this abstraction is that, when something changes, for example
+the introduction of anti-lock braking or an electric engine, the driver does
+not need to do anything differently -
+the detail of how the car brakes is *abstracted* away from them.
+
+Abstractions are a fundamental part of software.
+For example, when you write Python code, you are dealing with an
+abstraction of the computer.
+You don't need to understand how RAM functions.
+Instead, you just need to understand how variables work in Python.
+
+In large projects it is vital to come up with good abstractions.
+A good abstraction makes code easier to read, as the reader doesn't need to understand
+all the details of the project to understand one part.
+An abstraction lowers the cognitive load of a piece of code,
+as there is less to understand at once.
+
+A good abstraction makes code easier to test, as it can be tested in isolation
+from everything else.
+
+Finally, a good abstraction makes code easier to adapt, as the details of
+how a subsystem *used* to work are hidden from the user, so when they change,
+the user doesn't need to know.
+
+In this episode we are going to look at some code and introduce various
+different kinds of abstraction.
+However, fundamentally any abstraction should be serving these goals.
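+
+To make this concrete, here is a minimal sketch of a small abstraction in Python.
+(The function names below are invented for illustration and are not part of the
+inflammation codebase.)
+
+```python
+import csv
+
+
+def load_patient_data(file_path):
+    """Return a list of patient records read from a CSV file.
+
+    How the data is stored is a detail hidden inside this function -
+    if we later switched from CSV to JSON, callers would not change.
+    """
+    with open(file_path, newline='') as csv_file:
+        return [[float(value) for value in row] for row in csv.reader(csv_file)]
+
+
+def mean_inflammation(file_path, patient_row):
+    """Compute one patient's mean inflammation, without knowing the file format."""
+    data = load_patient_data(file_path)
+    return sum(data[patient_row]) / len(data[patient_row])
+```
+
+Code that calls `load_patient_data` depends only on what it returns, not on how
+the file is read - which is exactly the property we want from an abstraction.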
+
+## Refactoring
+
+Often we are not working on brand new projects, but instead maintaining an existing
+piece of software.
+Often, this piece of software will be hard to maintain, perhaps because it is hard to understand, or doesn't have any tests.
+In this situation, we want to adapt the code to make it more maintainable.
+This will give us greater confidence in the code, as well as making future development easier.
+
+**Refactoring** is a process where some code is modified, such that its external behaviour remains
+unchanged, but the code itself is easier to read, test and extend.
+
+When faced with an old piece of code that is hard to work with, and that you need to modify, a good process to follow is:
+
+1. Have tests that verify the current behaviour.
+2. Refactor the code in such a way that the new change will slot in cleanly.
+3. Make the desired change, which now fits in easily.
+
+Notice that after step 2, the *behaviour* of the code should be totally identical.
+This allows you to test rigorously that the refactoring hasn't changed or broken anything
+*before* making the intended change.
+
+In this episode, we will be making some changes to an existing bit of code that
+is in need of refactoring.
+
+## The code for this episode
+
+The code implements a new feature of the inflammation tool we've been working on.
+
+If the user passes `--full-data-analysis` then the program will scan the directory
+containing one of the provided files, compare standard deviations across the data by day and
+plot a graph.
+
+The main body of it lives in `inflammation/compute_data.py`, in a function called `analyse_data`.
+
+We are going to be refactoring and extending this over the remainder of this episode.
+
+> ## Group Exercise: What is bad about this code?
+> In what ways does this code not live up to the ideal properties of maintainable code?
+> Think about ways in which you find it hard to understand.
+> Think about the kinds of changes you might want to make to it, and what would
+> make those changes challenging.
+>> ## Solution
+>> You may have found others, but here are some of the things that make the code
+>> hard to read, test and maintain:
+>>
+>> * **Hard to read:** Everything is in a single function - reading it, you have to
+>>   understand how the file loading works at the same time as the analysis itself.
+>> * **Hard to modify:** If you want to use the data without the graph, you'd have to change it.
+>> * **Hard to modify or test:** It is always analysing a fixed set of data stored on the disk.
+>> * **Hard to modify:** It doesn't have any tests, meaning changes might break something.
+>>
+>> Keep the list you have created.
+>> At the end of this section we will revisit this
+>> and check that we have learnt ways to address the problems we found.
+> {: .solution}
 {: .challenge}
 
 {% include links.md %}
diff --git a/_episodes/33-programming-paradigms.md b/_episodes/33-programming-paradigms.md
deleted file mode 100644
index 520708b54..000000000
--- a/_episodes/33-programming-paradigms.md
+++ /dev/null
@@ -1,175 +0,0 @@
----
-title: "Programming Paradigms"
-start: false
-teaching: 10
-exercises: 0
-questions:
-- "How does the structure of a problem affect the structure of our code?"
-- "How can we use common software paradigms to improve the quality of our software?"
-objectives:
-- "Describe some of the major software paradigms we can use to classify programming languages."
-keypoints:
-- "A software paradigm describes a way of structuring or reasoning about code."
-- "Different programming languages are suited to different paradigms." -- "Different paradigms are suited to solving different classes of problems." -- "A single piece of software will often contain instances of multiple paradigms." ---- - -## Introduction - -As you become more experienced in software development it becomes increasingly important -to understand the wider landscape in which you operate, -particularly in terms of the software decisions the people around you made and why? -Today, there are a multitude of different programming languages, -with each supporting at least one way to approach a problem and structure your code. -In many cases, particularly with modern languages, -a single language can allow many different structural approaches within your code. - -One way to categorise these structural approaches is into **paradigms**. -Each paradigm represents a slightly different way of thinking about and structuring our code -and each has certain strengths and weaknesses when used to solve particular types of problems. -Once your software begins to get more complex -it's common to use aspects of different paradigms to handle different subtasks. -Because of this, it's useful to know about the major paradigms, -so you can recognise where it might be useful to switch. - -There are two major families that we can group the common programming paradigms into: -**Imperative** and **Declarative**. -An imperative program uses statements that change the program's state - -it consists of commands for the computer to perform -and focuses on describing **how** a program operates step by step. -A declarative program expresses the logic of a computation -to describe **what** should be accomplished -rather than describing its control flow as a sequence steps. - -We will look into three major paradigms -from the imperative and declarative families that may be useful to you - -**Procedural Programming**, **Functional Programming** and **Object-Oriented Programming**. -Note, however, that most of the languages can be used with multiple paradigms, -and it is common to see multiple paradigms within a single program - -so this classification of programming languages based on the paradigm they use isn't as strict. - -## Procedural Programming - -Procedural Programming comes from a family of paradigms known as the Imperative Family. -With paradigms in this family, we can think of our code as the instructions for processing data. - -Procedural Programming is probably the style you're most familiar with -and the one we used up to this point, -where we group code into -*procedures performing a single task, with exactly one entry and one exit point*. -In most modern languages we call these **functions**, instead of procedures - -so if you're grouping your code into functions, this might be the paradigm you're using. -By grouping code like this, we make it easier to reason about the overall structure, -since we should be able to tell roughly what a function does just by looking at its name. -These functions are also much easier to reuse than code outside of functions, -since we can call them from any part of our program. - -So far we have been using this technique in our code - -it contains a list of instructions that execute one after the other starting from the top. -This is an appropriate choice for smaller scripts and software -that we're writing just for a single use. 
-Aside from smaller scripts, Procedural Programming is also commonly seen
-in code focused on high performance, with relatively simple data structures,
-such as in High Performance Computing (HPC).
-These programs tend to be written in C (which doesn't support Object Oriented Programming)
-or Fortran (which didn't until recently).
-HPC code is also often written in C++,
-but C++ code would more commonly follow an Object Oriented style,
-though it may have procedural sections.
-
-Note that you may sometimes hear people refer to this paradigm as "functional programming"
-to contrast it with Object Oriented Programming,
-because it uses functions rather than objects,
-but this is incorrect.
-Functional Programming is a separate paradigm that
-places much stronger constraints on the behaviour of a function
-and structures the code differently as we'll see soon.
-
-## Functional Programming
-
-Functional Programming comes from a different family of paradigms -
-known as the Declarative Family.
-The Declarative Family is a distinct set of paradigms
-which have a different outlook on what a program is -
-here code describes *what* data processing should happen.
-What we really care about here is the outcome - how this is achieved is less important.
-
-Functional Programming is built around
-a more strict definition of the term **function** borrowed from mathematics.
-A function in this context can be thought of as
-a mapping that transforms its input data into output data.
-Anything a function does other than produce an output is known as a **side effect**
-and should be avoided wherever possible.
-
-Being strict about this definition allows us to
-break down the distinction between **code** and **data**,
-for example by writing a function which accepts and transforms other functions -
-in Functional Programming *code is data*.
-
-The most common application of Functional Programming in research is in data processing,
-especially when handling **Big Data**.
-One popular definition of Big Data is
-data which is too large to fit in the memory of a single computer,
-with a single dataset sometimes being multiple terabytes or larger.
-With datasets like this, we can't move the data around easily,
-so we often want to send our code to where the data is instead.
-By writing our code in a functional style,
-we also gain the ability to run many operations in parallel
-as it's guaranteed that each operation won't interact with any of the others -
-this is essential if we want to process this much data in a reasonable amount of time.
-
-## Object Oriented Programming
-
-Object Oriented Programming focuses on the specific characteristics of each object
-and what each object can do.
-An object has two fundamental parts - properties (characteristics) and behaviours.
-In Object Oriented Programming,
-we first think about the data and the things that we're modelling - and represent these by objects.
-
-For example, if we're writing a simulation for our chemistry research,
-we're probably going to need to represent atoms and molecules.
-Each of these has a set of properties which we need to know about
-in order for our code to perform the tasks we want -
-in this case, for example, we often need to know the mass and electric charge of each atom.
-So with Object Oriented Programming,
-we'll have some **object** structure which represents an atom and all of its properties,
-another structure to represent a molecule,
-and a relationship between the two (a molecule contains atoms).
-This structure also provides a way for us to associate code with an object,
-representing any **behaviours** it may have.
-In our chemistry example, this could be our code for calculating the force between a pair of atoms.
-
-Most people would classify Object Oriented Programming as an
-[extension of the Imperative family of languages](https://www.digitalocean.com/community/tutorials/functional-imperative-object-oriented-programming-comparison)
-(with the extra feature being the objects), but
-[others disagree](https://stackoverflow.com/questions/38527078/what-is-the-difference-between-imperative-and-object-oriented-programming).
-
-> ## So Which one is Python?
-> Python is a multi-paradigm and multi-purpose programming language.
-> You can use it as a procedural language and you can use it in a more object oriented way.
-> It does tend to land more on the object oriented side as all its core data types
-> (strings, integers, floats, booleans, lists,
-> sets, arrays, tuples, dictionaries, files)
-> as well as functions, modules and classes are objects.
->
-> Since functions in Python are also objects that can be passed around like any other object,
-> Python is also well suited to functional programming.
-> One of the most popular Python libraries for data manipulation,
-> [Pandas](https://pandas.pydata.org/) (built on top of NumPy),
-> supports a functional programming style
-> as most of its functions on data are not changing the data (no side effects)
-> but producing a new data to reflect the result of the function.
-{: .callout}
-
-## Other Paradigms
-
-The three paradigms introduced here are some of the most common,
-but there are many others which may be useful for addressing specific classes of problem -
-for much more information see the Wikipedia's page on
-[programming paradigms](https://en.wikipedia.org/wiki/Programming_paradigm).
-Having mainly used Procedural Programming so far,
-we will now have a closer look at Functional and Object Oriented Programming paradigms
-and how they can affect our architectural design choices.
-
-{% include links.md %}
diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md
new file mode 100644
index 000000000..42eae41f7
--- /dev/null
+++ b/_episodes/33-refactoring-functions.md
@@ -0,0 +1,272 @@
+---
+title: "Refactoring Functions to Do Just One Thing"
+teaching: 30
+exercises: 20
+questions:
+- "How do you refactor code without breaking it?"
+- "How do you write code that is easy to test?"
+- "What is functional programming?"
+- "Which situations/problems is functional programming well suited for?"
+objectives:
+- "Understand how to refactor functions to be easier to test."
+- "Be able to write regression tests to avoid breaking existing code."
+- "Understand what a pure function is."
+keypoints:
+- "Refactoring code into pure functions that act on data makes it easier to test."
+- "Writing tests before you refactor gives you confidence that your refactoring hasn't broken anything."
+- "Functional programming is a programming paradigm where programs are constructed by applying and composing smaller, simpler functions into more complex ones (which describe the flow of data within a program as a sequence of data transformations)."
+---
+
+## Introduction
+
+In this episode we will take some code and refactor it in a way that is going to make it
+easier to test.
+By having more tests, we can be more confident that future changes will have their intended effect.
+The change we will make will also end up making the code easier to understand.
+
+## Writing tests before refactoring
+
+The process we are going to be following is:
+
+1. Write some tests that test the behaviour as it is now
+2. Refactor the code to be more testable
+3. Ensure that the original tests still pass
+
+By writing the tests *before* we refactor, we can be confident we haven't broken
+existing behaviour through the refactoring.
+
+There is a bit of a chicken-and-egg problem here, however.
+If the refactoring is to make it easier to write tests, how can we write tests
+before doing the refactoring?
+
+The tricks to get around this trap are:
+
+ * Test at a higher level, with coarser accuracy
+ * Write tests that you intend to remove
+
+The best tests are ones that test single bits of code rigorously.
+However, with this code it isn't possible to do that.
+
+Instead we will make minimal changes to the code to make it a bit more testable,
+for example returning the data instead of visualising it.
+
+We will make the asserts verify whatever the output currently is,
+rather than worrying about whether that output is correct.
+These tests are there to verify the behaviour doesn't *change*, rather than to check the current behaviour is correct.
+This kind of testing is called **regression testing**, as we are testing for
+regressions in existing behaviour.
+
+As with everything in this episode, there isn't a hard and fast rule.
+Refactoring shouldn't change behaviour, but sometimes, to make it possible to verify
+you're not changing the important behaviour, you have to make some small tweaks to write
+the tests at all.
+
+> ## Exercise: Write regression tests before refactoring
+> Add a new test file called `test_compute_data.py` in the tests folder.
+> Add and complete this regression test to verify the current output of `analyse_data`
+> is unchanged by the refactorings we are going to do:
+> ```python
+> from pathlib import Path
+>
+> def test_analyse_data():
+>     from inflammation.compute_data import analyse_data
+>     path = Path.cwd() / "../data"
+>     result = analyse_data(path)
+>
+>     # TODO: add an assert for the value of result
+> ```
+> Use `assert_array_almost_equal` from the `numpy.testing` library to
+> compare arrays of floating point numbers.
+>
+> You will need to modify `analyse_data` to not create a graph and instead
+> return the data.
+>
+>> ## Hint
+>> You might find it helpful to assert the results equal some made-up array, observe the test failing,
+>> and copy and paste the correct result into the test.
+> {: .solution}
+>
+>> ## Solution
+>> One approach we can take is to:
+>> * comment out the call to `visualize` (as this will cause our test to hang)
+>> * return the data instead, so we can write asserts on the data
+>> * see what the calculated value is, and assert that it is the same
+>>
+>> Putting this together, you can write a test that looks something like:
+>>
+>> ```python
+>> import numpy.testing as npt
+>> from pathlib import Path
+>>
+>> def test_analyse_data():
+>>     from inflammation.compute_data import analyse_data
+>>     path = Path.cwd() / "../data"
+>>     result = analyse_data(path)
+>>     expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211,
+>>                        0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094,
+>>                        1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312,
+>>                        1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578,
+>>                        0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417,
+>>                        0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707,
+>>                        0.50323031,0.47574665,0.45197398,0.22070227]
+>>     npt.assert_array_almost_equal(result, expected_output)
+>> ```
+>>
+>> Note - this isn't a good test:
+>>
+>> * It isn't at all obvious why these numbers are correct.
+>> * It doesn't test edge cases.
+>> * If the files change, the test will start failing.
+>>
+>> However, it allows us to guarantee we don't accidentally change the analysis output.
+> {: .solution}
+{: .challenge}
+
+## Pure functions
+
+A **pure function** is a function that works like a mathematical function.
+That is, it takes in some inputs as parameters, and it produces an output.
+That output should always be the same for the same input.
+In other words, it does not depend on any information not present in the inputs (such as global variables, databases, or the time of day).
+Further, it should not cause any **side effects**, such as writing to a file or changing a global variable.
+
+You should try to keep as much of the complex, analytical and mathematical code as possible in pure functions.
+
+By eliminating dependencies on external things such as global state, we
+reduce the cognitive load needed to understand the function.
+The reader only needs to concern themselves with the input
+parameters of the function and the code itself, rather than
+the overall context the function is operating in.
+
+Similarly, a function that *calls* a pure function is also easier
+to understand.
+Since the function won't have any side effects, the reader only needs to
+understand what the function returns, which will probably
+be clear from the context in which the function is called.
+
+This property also makes pure functions easier to reuse, as the caller
+only needs to understand what parameters to provide, rather
+than anything else that might need to be configured, or what
+side effects calling the function at an unexpected time might have.
+
+Some parts of a program are inevitably impure.
+Programs need to read input from the user, or write to a database.
+Well-designed programs separate the complex logic from the necessary impure "glue" code that interacts with users and systems.
+This way, you have easy-to-test, easy-to-read code that contains the complex logic.
+And you have really simple code that just reads data from a file, or gathers user input,
+that is maybe harder to test, but is so simple that it only needs a handful of tests anyway.
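+
+For example, here is a minimal sketch contrasting an impure function with a pure
+equivalent.
+(These functions are invented for illustration and are not part of the
+inflammation codebase.)
+
+```python
+total_readings = []  # global state
+
+
+def record_and_mean(readings):
+    """Impure: extends a global list and prints (both side effects),
+    so its result depends on what happened before the call."""
+    total_readings.extend(readings)
+    print('readings so far:', len(total_readings))
+    return sum(total_readings) / len(total_readings)
+
+
+def mean(readings):
+    """Pure: depends only on its input and only returns a value,
+    so mean([1, 2, 3]) is always 2.0 and is trivial to test."""
+    return sum(readings) / len(readings)
+```
+
+A test for `mean` is a single assert, whereas testing `record_and_mean` means
+setting up (and resetting) the global state around every call.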
+
+> ## Exercise: Refactor the function into a pure function
+> Refactor the `analyse_data` function into a pure function containing the logic, and an impure function that handles the input and output.
+> The pure function should take in the data, and return the analysis results:
+> ```python
+> def compute_standard_deviation_by_day(data):
+>     # TODO
+>     return daily_standard_deviation
+> ```
+> The "glue" function should maintain the behaviour of the original `analyse_data`
+> but delegate all the calculations to the new pure function.
+>> ## Solution
+>> You can move all of the code that does the analysis into a separate function that
+>> might look something like this:
+>> ```python
+>> def compute_standard_deviation_by_day(data):
+>>     means_by_day = map(models.daily_mean, data)
+>>     means_by_day_matrix = np.stack(list(means_by_day))
+>>
+>>     daily_standard_deviation = np.std(means_by_day_matrix, axis=0)
+>>     return daily_standard_deviation
+>> ```
+>> Then the glue function can use this function, whilst keeping all the logic
+>> for reading the files and preparing the data for showing in a graph:
+>> ```python
+>> def analyse_data(data_dir):
+>>     """Calculate the standard deviation by day between datasets.
+>>
+>>     Gets all the inflammation CSVs within a directory, works out the mean
+>>     inflammation value for each day across all datasets, then graphs the
+>>     standard deviation of these means."""
+>>     data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv'))
+>>     if len(data_file_paths) == 0:
+>>         raise ValueError(f"No inflammation CSVs found in path {data_dir}")
+>>     data = map(models.load_csv, data_file_paths)
+>>     daily_standard_deviation = compute_standard_deviation_by_day(data)
+>>
+>>     graph_data = {
+>>         'standard deviation by day': daily_standard_deviation,
+>>     }
+>>     # views.visualize(graph_data)
+>>     return daily_standard_deviation
+>> ```
+>> Ensure you re-run the regression test to check this refactoring has not
+>> changed the output of `analyse_data`.
+> {: .solution}
+{: .challenge}
+
+### Testing Pure Functions
+
+Now we have a pure function for the analysis, we can write tests that cover
+all the cases we care about, without depending on the data
+existing in CSV files.
+
+This is another advantage of pure functions - they are very well suited to automated testing.
+
+They are **easier to write** -
+we construct the input and assert on the output
+without having to think about making sure the global state is correct before or after.
+
+Perhaps more importantly, they are **easier to read** -
+the reader will not have to open up a CSV file to understand why the test is correct.
+
+It will also make the tests **easier to maintain**.
+If at some point the data format is changed from CSV to JSON, the bulk of the tests
+won't need to be updated.
+
+> ## Exercise: Write some tests for the pure function
+> Now we have refactored out a pure function, we can more easily write comprehensive tests.
+> Add tests that check for when there is only one file with multiple rows, multiple files with one row each,
+> and any other cases you can think of that should be tested.
+>> ## Solution
+>> You might have thought of more tests, but we can easily extend the test by parametrizing
+>> with more inputs and expected outputs:
+>> ```python
+>> import math
+>>
+>> import numpy.testing as npt
+>> import pytest
+>>
+>> @pytest.mark.parametrize('data,expected_output', [
+>>     ([[[0, 1, 0], [0, 2, 0]]], [0, 0, 0]),
+>>     ([[[0, 2, 0]], [[0, 1, 0]]], [0, math.sqrt(0.25), 0]),
+>>     ([[[0, 1, 0], [0, 2, 0]], [[0, 1, 0], [0, 2, 0]]], [0, 0, 0])
+>> ],
+>> ids=['Two patients in same file', 'Two patients in different files', 'Two identical patients in two different files'])
+>> def test_compute_standard_deviation_by_day(data, expected_output):
+>>     from inflammation.compute_data import compute_standard_deviation_by_day
+>>
+>>     result = compute_standard_deviation_by_day(data)
+>>     npt.assert_array_almost_equal(result, expected_output)
+>> ```
+> {: .solution}
+{: .challenge}
+
+## Functional Programming
+
+**Pure functions** are a concept that is part of the idea of **Functional Programming**.
+Functional programming is a style of programming that encourages using pure functions,
+chained together.
+Some programming languages, such as Haskell or Lisp, mainly support writing functional code,
+but it is more common for languages to allow both the functional and the **imperative** style
+(imperative being the style of code you have probably been writing thus far, where you instruct the computer directly what to do).
+Python, Java, C++ and many other languages allow these two styles to be mixed.
+
+In Python, you can use the built-in functions `map` and `filter`, together with
+`functools.reduce`, to chain pure functions together into pipelines.
+
+In the original code, we used `map` to "map" the file paths into the loaded data.
+Extending this idea, you could then "map" the results of that through another process.
+
+You can read more about using these language features [here](https://www.learnpython.org/en/Map%2C_Filter%2C_Reduce).
+Other programming languages will have similar features, and searching for "functional style" + your programming language of choice
+will help you find the features available.
+
+There are no hard and fast rules in software design, but making your complex logic out of composed pure functions is a great place to start
+when trying to make code readable, testable and maintainable.
+This tends to be possible when:
+
+* doing any kind of data analysis
+* running simulations
+* translating data from one format to another
+
+{% include links.md %}
diff --git a/_episodes/34-functional-programming.md b/_episodes/34-functional-programming.md
deleted file mode 100644
index 750a0e235..000000000
--- a/_episodes/34-functional-programming.md
+++ /dev/null
@@ -1,825 +0,0 @@
----
-title: "Functional Programming"
-teaching: 30
-exercises: 30
-questions:
-- What is functional programming?
-- Which situations/problems is functional programming well suited for?
-objectives:
-- Describe the core concepts that define the functional programming paradigm
-- Describe the main characteristics of code that is written in functional programming style
-- Learn how to generate and process data collections efficiently using MapReduce and Python's comprehensions
-keypoints:
-- Functional programming is a programming paradigm where programs are constructed by applying and composing smaller and simple functions into more complex ones (which describe the flow of data within a program as a sequence of data transformations).
-- In functional programming, functions tend to be *pure* - they do not exhibit *side-effects* (by not affecting anything other than the value they return or anything outside a function).
-  Functions can also be named, passed as arguments, and returned from other functions, just as any other data type.
-- MapReduce is an instance of a data generation and processing approach, in particular suited for functional programming and handling Big Data within parallel and distributed environments.
-- Python provides comprehensions for lists, dictionaries, sets and generators - a concise (if not strictly functional) way to generate new data from existing data collections while performing sophisticated mapping, filtering and conditional logic on original dataset's members.
----
-
-## Introduction
-
-Functional programming is a programming paradigm where
-programs are constructed by applying and composing/chaining **functions**.
-Functional programming is based on the
-[mathematical definition of a function](https://en.wikipedia.org/wiki/Function_(mathematics))
-`f()`,
-which applies a transformation to some input data giving us some other data as a result
-(i.e. a mapping from input `x` to output `f(x)`).
-Thus, a program written in a functional style becomes a series of transformations on data
-which are performed to produce a desired output.
-Each function (transformation) taken by itself is simple and straightforward to understand;
-complexity is handled by composing functions in various ways.
-
-Often when we use the term function we are referring to
-a construct containing a block of code which performs a particular task and can be reused.
-We have already seen this in procedural programming -
-so how are functions in functional programming different?
-The key difference is that functional programming is focussed on
-**what** transformations are done to the data,
-rather than **how** these transformations are performed
-(i.e. a detailed sequence of steps which update the state of the code to reach a desired state).
-Let's compare and contrast examples of these two programming paradigms.
-
-## Functional vs Procedural Programming
-
-The following two code examples implement the calculation of a factorial
-in procedural and functional styles, respectively.
-Recall that the factorial of a number `n` (denoted by `n!`) is calculated as
-the product of integer numbers from 1 to `n`.
-
-The first example provides a procedural style factorial function.
-
-~~~
-def factorial(n):
-    """Calculate the factorial of a given number.
-
-    :param int n: The factorial to calculate
-    :return: The resultant factorial
-    """
-    if n < 0:
-        raise ValueError('Only use non-negative integers.')
-
-    factorial = 1
-    for i in range(1, n + 1): # iterate from 1 to n
-        # save intermediate value to use in the next iteration
-        factorial = factorial * i
-
-    return factorial
-~~~
-{: .language-python}
-
-Functions in procedural programming are *procedures* that describe
-a detailed list of instructions to tell the computer what to do step by step
-and how to change the state of the program and advance towards the result.
-They often use *iteration* to repeat a series of steps.
-Functional programming, on the other hand, typically uses *recursion* -
-an ability of a function to call/repeat itself until a particular condition is reached.
-Let's see how it is used in the functional programming example below
-to achieve a similar effect to that of iteration in procedural programming.
-
-~~~
-# Functional style factorial function
-def factorial(n):
-    """Calculate the factorial of a given number.
-
-    :param int n: The factorial to calculate
-    :return: The resultant factorial
-    """
-    if n < 0:
-        raise ValueError('Only use non-negative integers.')
-
-    if n == 0 or n == 1:
-        return 1 # exit from recursion, prevents infinite loops
-    else:
-        return n * factorial(n-1) # recursive call to the same function
-~~~
-{: .language-python}
-
-Note: You may have noticed that both functions in the above code examples have the same signature
-(i.e. they take an integer number as input and return its factorial as output).
-You could easily swap these equivalent implementations
-without changing the way that the function is invoked.
-Remember, a single piece of software may well contain instances of multiple programming paradigms -
-including procedural, functional and object-oriented -
-it is up to you to decide which one to use and when to switch
-based on the problem at hand and your personal coding style.
-
-Functional computations only rely on the values that are provided as inputs to a function
-and not on the state of the program that precedes the function call.
-They do not modify data that exists outside the current function, including the input data -
-this property is referred to as the *immutability of data*.
-This means that such functions do not create any *side effects*,
-i.e. do not perform any action that affects anything other than the value they return.
-For example: printing text,
-writing to a file,
-modifying the value of an input argument,
-or changing the value of a global variable.
-Functions without side affects
-that return the same data each time the same input arguments are provided
-are called *pure functions*.
-
-> ## Exercise: Pure Functions
->
-> Which of these functions are pure?
-> If you're not sure, explain your reasoning to someone else, do they agree?
->
-> ~~~
-> def add_one(x):
->     return x + 1
->
-> def say_hello(name):
->     print('Hello', name)
->
-> def append_item_1(a_list, item):
->     a_list += [item]
->     return a_list
->
-> def append_item_2(a_list, item):
->     result = a_list + [item]
->     return result
-> ~~~
-> {: .language-python}
->
-> > ## Solution
-> >
-> > 1. `add_one` is pure - it has no effects other than to return a value and this value will always be the same when given the same inputs
-> > 2. `say_hello` is not pure - printing text counts as a side effect, even though it is the clear purpose of the function
-> > 3. `append_item_1` is not pure - the argument `a_list` gets modified as a side effect - try this yourself to prove it
-> > 4. `append_item_2` is pure - the result is a new variable, so this time `a_list` does not get modified - again, try this yourself
-> {: .solution}
-{: .challenge}
-
-## Benefits of Functional Code
-
-There are a few benefits we get when working with pure functions:
-
-- Testability
-- Composability
-- Parallelisability
-
-**Testability** indicates how easy it is to test the function - usually meaning unit tests.
-It is much easier to test a function if we can be certain that
-a particular input will always produce the same output.
-If a function we are testing might have different results each time it runs
-(e.g. a function that generates random numbers drawn from a normal distribution),
-we need to come up with a new way to test it.
-Similarly, it can be more difficult to test a function with side effects
-as it is not always obvious what the side effects will be, or how to measure them.
-
-**Composability** refers to the ability to make a new function from a chain of other functions
-by piping the output of one as the input to the next.
-If a function does not have side effects or non-deterministic behaviour,
-then all of its behaviour is reflected in the value it returns.
-As a consequence of this, any chain of combined pure functions is itself pure,
-so we keep all these benefits when we are combining functions into a larger program.
-As an example of this, we could make a function called `add_two`,
-using the `add_one` function we already have.
-
-~~~
-def add_two(x):
-    return add_one(add_one(x))
-~~~
-{: .language-python}
-
-**Parallelisability** is the ability for operations to be performed at the same time (independently).
-If we know that a function is fully pure and we have got a lot of data,
-we can often improve performance by
-splitting data and distributing the computation across multiple processors.
-The output of a pure function depends only on its input,
-so we will get the right result regardless of when or where the code runs.
-
-> ## Everything in Moderation
-> Despite the benefits that pure functions can bring,
-> we should not be trying to use them everywhere.
-> Any software we write needs to interact with the rest of the world somehow,
-> which requires side effects.
-> With pure functions you cannot read any input, write any output,
-> or interact with the rest of the world in any way,
-> so we cannot usually write useful software using just pure functions.
-> Python programs or libraries written in functional style will usually not be
-> as extreme as to completely avoid reading input, writing output,
-> updating the state of internal local variables, etc.;
-> instead, they will provide a functional-appearing interface
-> but may use non-functional features internally.
-> An example of this is the [Python Pandas library](https://pandas.pydata.org/)
-> for data manipulation built on top of NumPy -
-> most of its functions appear pure
-> as they return new data objects instead of changing existing ones.
-{: .callout}
-
-There are other advantageous properties that can be derived from the functional approach to coding.
-In languages which support functional programming,
-a function is a *first-class object* like any other object -
-not only can you compose/chain functions together,
-but functions can be used as inputs to,
-passed around or returned as results from other functions
-(remember, in functional programming *code is data*).
-This is why functional programming is suitable for processing data efficiently -
-in particular in the world of Big Data, where code is much smaller than the data,
-sending the code to where data is located is cheaper and faster than the other way round.
-Let's see how we can do data processing using functional programming.
-
-## MapReduce Data Processing Approach
-
-When working with data you will often find that you need to
-apply a transformation to each datapoint of a dataset
-and then perform some aggregation across the whole dataset.
-One instance of this data processing approach is known as MapReduce
-and is applied when processing (but not limited to) Big Data,
-e.g. using tools such as [Spark](https://en.wikipedia.org/wiki/Apache_Spark)
-or [Hadoop](https://hadoop.apache.org/).
-The name MapReduce comes from applying an operation to (mapping) each value in a dataset,
-then performing a reduction operation which
-collects/aggregates all the individual results together to produce a single result.
-MapReduce relies heavily on composability and parallelisability of functional programming -
-both map and reduce can be done in parallel and on smaller subsets of data,
-before aggregating all intermediate results into the final result.
-
-### Mapping
-`map(f, C)` is a function that takes another function `f()` and a collection `C` of data items as inputs.
-Calling `map(f, C)` applies the function `f(x)` to every data item `x` in the collection `C`
-and returns the resulting values as a new collection of the same size.
-
-This is a simple mapping that takes a list of names and
-returns a list of the lengths of those names using the built-in function `len()`:
-
-~~~
-name_lengths = map(len, ["Mary", "Isla", "Sam"])
-print(list(name_lengths))
-~~~
-{: .language-python}
-~~~
-[4, 4, 3]
-~~~
-{: .output}
-
-This is a mapping that squares every number in the passed collection using an anonymous,
-inlined *lambda* expression (a simple one-line mathematical expression representing a function):
-
-~~~
-squares = map(lambda x: x * x, [0, 1, 2, 3, 4])
-print(list(squares))
-~~~
-{: .language-python}
-~~~
-[0, 1, 4, 9, 16]
-~~~
-{: .output}
-
-> ## Lambda
-> Lambda expressions are used to create anonymous functions that can be used to
-> write more compact programs by inlining function code.
-> A lambda expression takes any number of input parameters and
-> creates an anonymous function that returns the value of the expression.
-> So, we can use the short, one-line `lambda x, y, z, ...: expression` code
-> instead of defining and calling a named function `f()` as follows:
-> ~~~
-> def f(x, y, z, ...):
->     return expression
-> ~~~
-> {: .language-python}
-> The major distinction between lambda functions and ‘normal’ functions is that
-> lambdas do not have names.
-> We could give a name to a lambda expression if we really wanted to -
-> but at that point we should be using a ‘normal’ Python function instead.
->
-> ~~~
-> # Don't do this
-> add_one = lambda x: x + 1
->
-> # Do this instead
-> def add_one(x):
->     return x + 1
-> ~~~
-> {: .language-python}
{: .callout}
-
-In addition to using built-in functions or inline anonymous lambda functions,
-we can also pass a named function that we have defined ourselves to the `map()` function.
-
-~~~
-def add_one(num):
-    return num + 1
-
-result = map(add_one, [0, 1, 2])
-print(list(result))
-~~~
-{: .language-python}
-~~~
-[1, 2, 3]
-~~~
-{: .output}
-
-> ## Exercise: Check Inflammation Patient Data Against A Threshold Using Map
-> Write a new function called `daily_above_threshold()` in our inflammation `models.py` that
-> determines whether or not each daily inflammation value for a given patient
-> exceeds a given threshold.
->
-> Given a patient row number in our data, the patient dataset itself, and a given threshold,
-> write the function to use `map()` to generate and return a list of booleans,
-> with each value representing whether or not the daily inflammation value for that patient
-> exceeded the given threshold.
->
-> Ordinarily we would use NumPy's vectorised operations for this kind of comparison,
-> but for this exercise, let's try a solution without them.
->
-> > ## Solution
-> > ~~~
-> > def daily_above_threshold(patient_num, data, threshold):
-> >     """Determine whether or not each daily inflammation value exceeds a given threshold for a given patient.
-> >
-> >     :param patient_num: The patient row number
-> >     :param data: A 2D data array with inflammation data
-> >     :param threshold: An inflammation threshold to check each daily value against
-> >     :returns: A boolean list representing whether or not each patient's daily inflammation exceeded the threshold
-> >     """
-> >
-> >     return list(map(lambda x: x > threshold, data[patient_num]))
-> > ~~~
-> > {: .language-python}
-> >
-> > Note: the `map()` function returns a map iterator object
-> > which needs to be converted to a collection object
-> > (such as a list, dictionary, set, tuple)
-> > using the corresponding "factory" function (in our case `list()`).
-> {: .solution}
-{: .challenge}
-
-#### Comprehensions for Mapping/Data Generation
-
-Another way you can generate new collections of data from existing collections in Python is
-using *comprehensions*,
-which are an elegant and concise way of creating data from
-[iterable objects](https://www.w3schools.com/python/python_iterators.asp) using *for loops*.
-While not a pure functional concept,
-comprehensions provide data generation functionality
-and can be used to achieve the same effect as the built-in "pure functional" function `map()`.
-They are commonly used and actually recommended as a replacement for `map()` in modern Python.
-Let's have a look at some examples.
-
-~~~
-integers = range(5)
-double_ints = [2 * i for i in integers]
-
-print(double_ints)
-~~~
-{: .language-python}
-~~~
-[0, 2, 4, 6, 8]
-~~~
-{: .output}
-
-The above example uses a *list comprehension* to double each number in a sequence.
-Notice the similarity between the syntax for a list comprehension and a for loop -
-in effect, this is a for loop compressed into a single line.
-In this simple case, the code above is equivalent to using a map operation on a sequence,
-as shown below:
-
-~~~
-integers = range(5)
-double_ints = map(lambda i: 2 * i, integers)
-print(list(double_ints))
-~~~
-{: .language-python}
-~~~
-[0, 2, 4, 6, 8]
-~~~
-{: .output}
-
-We can also use list comprehensions to filter data, by adding the filter condition to the end:
-
-~~~
-double_even_ints = [2 * i for i in integers if i % 2 == 0]
-print(double_even_ints)
-~~~
-{: .language-python}
-~~~
-[0, 4, 8]
-~~~
-{: .output}
-
-> ## Set and Dictionary Comprehensions and Generators
-> We also have *set comprehensions* and *dictionary comprehensions*,
-> which look similar to list comprehensions
-> but use the set literal and dictionary literal syntax, respectively.
-> ~~~
-> double_even_int_set = {2 * i for i in integers if i % 2 == 0}
-> print(double_even_int_set)
->
-> double_even_int_dict = {i: 2 * i for i in integers if i % 2 == 0}
-> print(double_even_int_dict)
-> ~~~
-> {: .language-python}
-> ~~~
-> {0, 4, 8}
-> {0: 0, 2: 4, 4: 8}
-> ~~~
-> {: .output}
->
-> Finally, there’s one last ‘comprehension’ in Python - a *generator expression* -
-> a type of iterable object which we can take values from and loop over,
-> but which does not actually compute any of the values until we need them.
-> Iterable is the generic term for anything we can loop or iterate over -
-> lists, sets and dictionaries are all iterables.
->
-> The `range` function is an example of this kind of lazy iterable -
-> if we created a `range(1000000000)`, but didn’t iterate over it,
-> we’d find that it takes almost no time to create.
-> Creating a list containing a similar number of values would take much longer,
-> and could be at risk of running out of memory.
->
-> We can build our own generators using a generator expression.
-> These look much like the comprehensions above,
-> but act like a generator when we use them.
-> Note the syntax difference for generator expressions -
-> parentheses are used in place of square or curly brackets.
->
-> ~~~
-> doubles_generator = (2 * i for i in integers)
-> for x in doubles_generator:
->     print(x)
-> ~~~
-> {: .language-python}
-> ~~~
-> 0
-> 2
-> 4
-> 6
-> 8
-> ~~~
-> {: .output}
-{: .callout}
-
-
-Let's now have a look at reducing the elements of a data collection into a single result.
-
-### Reducing
-
-The `reduce(f, C, initialiser)` function accepts a function `f()`,
-a collection `C` of data items
-and an optional `initialiser`,
-and returns a single cumulative value which
-aggregates (reduces) all the values from the collection into a single result.
-The reduction function first applies the function `f()` to the first two values in the collection
-(or to the `initialiser`, if present, and the first item from `C`).
-Then for each remaining value in the collection,
-it takes the result of the previous computation
-and the next value from the collection as the new arguments to `f()`,
-until we have processed all of the data and reduced it to a single value.
-For example, if collection `C` has 5 elements, the call `reduce(f, C)` calculates:
-
-~~~
-f(f(f(f(C[0], C[1]), C[2]), C[3]), C[4])
-~~~
-
-One example of reducing would be to calculate the product of a sequence of numbers.
-
-~~~
-from functools import reduce
-
-sequence = [1, 2, 3, 4]
-
-def product(a, b):
-    return a * b
-
-print(reduce(product, sequence))
-
-# The same reduction using a lambda function
-print(reduce((lambda a, b: a * b), sequence))
-~~~
-{: .language-python}
-~~~
-24
-24
-~~~
-{: .output}
-
-Note that `reduce()` is not a built-in function like `map()` -
-you need to import it from the `functools` library.
-
-> ## Exercise: Calculate the Sum of a Sequence of Numbers Using Reduce
-> Using `reduce()`, calculate the sum of a sequence of numbers.
-> Although in practice we would use the built-in `sum()` function for this - try doing it without it.
->
-> > ## Solution
-> > ~~~
-> > from functools import reduce
-> >
-> > sequence = [1, 2, 3, 4]
-> >
-> > def add(a, b):
-> >     return a + b
-> >
-> > print(reduce(add, sequence))
-> >
-> > # The same reduction using a lambda function
-> > print(reduce((lambda a, b: a + b), sequence))
-> > ~~~
-> > {: .language-python}
-> > ~~~
-> > 10
-> > 10
-> > ~~~
-> > {: .output}
-> {: .solution}
-{: .challenge}
-
-### Putting It All Together
-
-Let's now put together what we have learned about map and reduce so far
-by writing a function that calculates the sum of the squares of the values in a list
-using the MapReduce approach.
-
-~~~
-from functools import reduce
-
-def sum_of_squares(sequence):
-    squares = [x * x for x in sequence]  # use list comprehension for mapping
-    return reduce(lambda a, b: a + b, squares)
-~~~
-{: .language-python}
-
-We should see the following behaviour when we use it:
-
-~~~
-print(sum_of_squares([0]))
-print(sum_of_squares([1]))
-print(sum_of_squares([1, 2, 3]))
-print(sum_of_squares([-1]))
-print(sum_of_squares([-1, -2, -3]))
-~~~
-{: .language-python}
-~~~
-0
-1
-14
-1
-14
-~~~
-{: .output}
-
-Now let’s assume we’re reading in these numbers from an input file,
-so they arrive as a list of strings.
-We'll modify the function so that it passes the following tests:
-
-~~~
-print(sum_of_squares(['1', '2', '3']))
-print(sum_of_squares(['-1', '-2', '-3']))
-~~~
-{: .language-python}
-~~~
-14
-14
-~~~
-{: .output}
-
-The code may look like:
-
-~~~
-from functools import reduce
-
-def sum_of_squares(sequence):
-    integers = [int(x) for x in sequence]
-    squares = [x * x for x in integers]
-    return reduce(lambda a, b: a + b, squares)
-~~~
-{: .language-python}
-
-Finally, just like comments in Python, we’d like users to be able to
-comment out numbers in the input file they give to our program.
-We'll finally extend our function so that the following tests pass:
-
-~~~
-print(sum_of_squares(['1', '2', '3']))
-print(sum_of_squares(['-1', '-2', '-3']))
-print(sum_of_squares(['1', '2', '#100', '3']))
-~~~
-{: .language-python}
-~~~
-14
-14
-14
-~~~
-{: .output}
-
-To do so, we can filter out the commented-out elements:
-
-~~~
-from functools import reduce
-
-def sum_of_squares(sequence):
-    integers = [int(x) for x in sequence if x[0] != '#']
-    squares = [x * x for x in integers]
-    return reduce(lambda a, b: a + b, squares)
-~~~
-{: .language-python}
-
->## Exercise: Extend Inflammation Threshold Function Using Reduce
-> Extend the `daily_above_threshold()` function you wrote previously
-> to return a count of the number of days a patient's inflammation is over the threshold.
-> Use `reduce()` over the boolean array that was previously returned to generate the count,
-> then return that value from the function.
->
-> You may choose to define a separate function to pass to `reduce()`,
-> or use an inline lambda expression to do it (which is a bit trickier!).
->
-> Hints:
-> - Remember that you can define an `initialiser` value with `reduce()`
->   to help you start the counter
-> - If defining a lambda expression,
->   note that it can conditionally return different values using the syntax
->   `<value_if_true> if <condition> else <value_if_false>` in the expression.
->
-> > ## Solution
-> > Using a separate function:
-> > ~~~
-> > def daily_above_threshold(patient_num, data, threshold):
-> >     """Count how many days a given patient's inflammation exceeds a given threshold.
-> >
-> >     :param patient_num: The patient row number
-> >     :param data: A 2D data array with inflammation data
-> >     :param threshold: An inflammation threshold to check each daily value against
-> >     :returns: An integer representing the number of days a patient's inflammation is over a given threshold
-> >     """
-> >     def count_above_threshold(a, b):
-> >         if b:
-> >             return a + 1
-> >         else:
-> >             return a
-> >
-> >     # Use map to determine if each daily inflammation value exceeds a given threshold for a patient
-> >     above_threshold = map(lambda x: x > threshold, data[patient_num])
-> >     # Use reduce to count the number of days inflammation was above the threshold for a patient
-> >     return reduce(count_above_threshold, above_threshold, 0)
-> > ~~~
-> > {: .language-python}
-> >
-> > Note that the `count_above_threshold` function used by `reduce()`
-> > was defined within the `daily_above_threshold()` function
-> > to limit its scope and clarify its purpose
-> > (i.e. it may only be useful as part of `daily_above_threshold()`
-> > hence being defined as an inner function).
-> >
-> > The equivalent code using a lambda expression may look like:
-> >
-> > ~~~
-> > from functools import reduce
-> >
-> > ...
-> >
-> > def daily_above_threshold(patient_num, data, threshold):
-> >     """Count how many days a given patient's inflammation exceeds a given threshold.
-> >
-> >     :param patient_num: The patient row number
-> >     :param data: A 2D data array with inflammation data
-> >     :param threshold: An inflammation threshold to check each daily value against
-> >     :returns: An integer representing the number of days a patient's inflammation is over a given threshold
-> >     """
-> >
-> >     above_threshold = map(lambda x: x > threshold, data[patient_num])
-> >     return reduce(lambda a, b: a + 1 if b else a, above_threshold, 0)
-> > ~~~
-> > {: .language-python}
-> >
-> > Where could this be useful?
-> > For example, you may want to define the success criteria for a trial as, say,
-> > 80% of patients not exhibiting inflammation on any of the trial days, or some similar metric.
-> {: .solution}
-{: .challenge}
-
-## Decorators
-
-Finally, we will look at one last aspect of Python where functional programming comes in handy.
-As we have seen in the
-[episode on parametrising our unit tests](../22-scaling-up-unit-testing/index.html#parameterising-our-unit-tests),
-a decorator can take a function, modify/decorate it, then return the resulting function.
-This is possible because Python treats functions as first-class objects
-that can be passed around as normal data.
-Here, we discuss decorators in more detail and learn how to write our own.
-Let's look at the following code for two different ways to "decorate" functions.
-
-~~~
-def with_logging(func):
-    """A decorator which adds logging to a function."""
-    def inner(*args, **kwargs):
-        print("Before function call")
-        result = func(*args, **kwargs)
-        print("After function call")
-        return result
-
-    return inner
-
-
-def add_one(n):
-    print("Adding one")
-    return n + 1
-
-# Redefine function add_one by wrapping it within with_logging function
-add_one = with_logging(add_one)
-
-# Another way to redefine a function - using a decorator
-@with_logging
-def add_two(n):
-    print("Adding two")
-    return n + 2
-
-print(add_one(1))
-print(add_two(1))
-~~~
-{: .language-python}
-~~~
-Before function call
-Adding one
-After function call
-2
-Before function call
-Adding two
-After function call
-3
-~~~
-{: .output}
-
-In this example, we see a decorator (`with_logging`)
-and two different syntaxes for applying the decorator to a function.
-The decorator is implemented here as a function which encloses another function.
-Because the inner function (`inner()`) calls the function being decorated (`func()`)
-and returns its result,
-it still behaves like the original function.
-Part of this is the use of `*args` and `**kwargs` -
-these allow our decorated function to accept any arguments or keyword arguments
-and pass them directly to the function being decorated.
-Our decorator in this case does not need to modify any of the arguments,
-so we do not need to know what they are.
-Any additional behaviour we want to add as part of our decorated function,
-we can put before or after the call to the original function.
-Here we print some text both before and after the decorated function,
-to show the order in which events happen.
-
-We also see in this example the two different ways in which a decorator can be applied.
-The first of these is to use a normal function call (`with_logging(add_one)`),
-where we then assign the resulting function back to a variable -
-often using the original name of the function, so replacing it with the decorated version.
-The second syntax is the one we have seen previously (`@with_logging`).
-This syntax is equivalent to the previous one -
-the result is that we have a decorated version of the function,
-here with the name `add_two`.
-Both of these syntaxes can be useful in different situations:
-the `@` syntax is more concise if we never need to use the un-decorated version,
-while the function-call syntax gives us more flexibility -
-we can continue to use the un-decorated function
-if we make sure to give the decorated one a different name,
-and can even make multiple decorated versions using different decorators.
-
-> ## Exercise: Measuring Performance Using Decorators
-> One small task for which you might find a decorator useful is
-> measuring the time taken to execute a particular function.
-> This is an important part of performance profiling.
->
-> Write a decorator which you can use to measure the execution time of the decorated function
-> using the [time.process_time_ns()](https://docs.python.org/3/library/time.html#time.process_time_ns) function.
-> There are several different timing functions, each with slightly different use-cases,
-> but we won’t worry about that here.
->
-> For the function to measure, you may wish to use this as an example:
-> ~~~
-> def measure_me(n):
->     total = 0
->     for i in range(n):
->         total += i * i
->
->     return total
-> ~~~
-> {: .language-python}
-> > ## Solution
-> >
-> > ~~~
-> > import time
-> >
-> > def profile(func):
-> >     def inner(*args, **kwargs):
-> >         start = time.process_time_ns()
-> >         result = func(*args, **kwargs)
-> >         stop = time.process_time_ns()
-> >
-> >         print("Took {0} seconds".format((stop - start) / 1e9))
-> >         return result
-> >
-> >     return inner
-> >
-> > @profile
-> > def measure_me(n):
-> >     total = 0
-> >     for i in range(n):
-> >         total += i * i
-> >
-> >     return total
-> >
-> > print(measure_me(1000000))
-> > ~~~
-> > {: .language-python}
-> > ~~~
-> > Took 0.124199753 seconds
-> > 333332833333500000
-> > ~~~
-> > {: .output}
-> {: .solution}
-{: .challenge}
diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-refactoring-decoupled-units.md
new file mode 100644
index 000000000..a9e82d9a9
--- /dev/null
+++ b/_episodes/34-refactoring-decoupled-units.md
@@ -0,0 +1,443 @@
+---
+title: "Using Classes to De-Couple Code"
+teaching: 30
+exercises: 45
+questions:
+- "What is de-coupled code?"
+- "When is it useful to use classes to structure code?"
+- "How can we make sure the components of our software are reusable?"
+objectives:
+- "Understand the object-oriented principle of polymorphism and interfaces."
+- "Be able to introduce appropriate abstractions to simplify code."
+- "Understand what decoupled code is, and why you would want it."
+- "Be able to use mocks to replace a class in test code."
+keypoints:
+- "Classes can help separate code so it is easier to understand."
+- "By using interfaces, code can become more decoupled."
+- "Decoupled code is easier to test, and easier to maintain."
+---
+
+## Introduction
+
+When we're thinking about units of code, one important thing to consider is
+whether the code is **decoupled** (as opposed to **coupled**).
+Two units of code can be considered decoupled if changes in one don't
+necessitate changes in the other.
+While two connected units can't be totally decoupled, loose coupling
+allows for more maintainable code:
+
+* Loosely coupled code is easier to read, as you don't need to understand the
+  details of the other unit.
+* Loosely coupled code is easier to test, as one of the units can be replaced
+  by a test or mock version of it.
+* Loosely coupled code tends to be easier to maintain, as changes can be isolated
+  from other parts of the code.
+
+Introducing **abstractions** is a way to decouple code.
+If one part of the code only uses another part through an appropriate abstraction
+then it becomes easier for these parts to change independently.
+
+> ## Exercise: Decouple the file loading from the computation
+> Currently the function is hard-coded to load all the files in a directory.
+> Decouple this into a separate function that loads all the files and returns their data.
+>> ## Solution
+>> You should have written a new function that reads all the data into the format needed
+>> for the analysis:
+>> ```python
+>> def load_inflammation_data(dir_path):
+>>     data_file_paths = glob.glob(os.path.join(dir_path, 'inflammation*.csv'))
+>>     if len(data_file_paths) == 0:
+>>         raise ValueError(f"No inflammation CSV files found in path {dir_path}")
+>>     data = map(models.load_csv, data_file_paths)
+>>     return list(data)
+>> ```
+>> This can then be used in the analysis:
+>> ```python
+>> def analyse_data(data_dir):
+>>     data = load_inflammation_data(data_dir)
+>>     daily_standard_deviation = compute_standard_deviation_by_data(data)
+>>     ...
+>> ```
+>> This is now easier to understand, as we don't need to understand the file loading
+>> to read the statistical analysis, and we don't have to understand the statistical analysis
+>> when reading the data loading.
+>> Ensure you re-run our regression test to check this refactoring has not
+>> changed the output of `analyse_data`.
+> {: .solution}
+{: .challenge}
+
+Even with this change, the file loading is coupled with the data analysis.
+For example, if we want to support reading JSON files as well as CSV files,
+we would have to pass into `analyse_data` some kind of flag indicating what we want.
+
+Instead, we would like to decouple the consideration of what data to load
+from the `analyse_data` function entirely.
+
+One way we can do this is to use a language feature called a **class**.
+
+## Using Python Classes
+
+A class is a way of grouping together data with some specific methods.
+In Python, you can declare a class as follows:
+
+```python
+class Circle:
+    pass
+```
+
+Classes are typically named using `UpperCamelCase`.
+
+You can then **construct** a class elsewhere in your code by doing the following:
+
+```python
+my_circle = Circle()
+```
+
+When you construct a class in this way, the class's **constructor** is called.
+It is possible to pass in values to the constructor that configure the class:
+
+```python
+class Circle:
+    def __init__(self, radius):
+        self.radius = radius
+
+my_circle = Circle(10)
+```
+
+The constructor has the special name `__init__` (one of the so-called "dunder methods").
+Notice it also has a special first parameter called `self` (named this way by convention).
+This parameter can be used to access the current **instance** of the object being created.
+
+A class can be thought of as a cookie cutter template,
+and the instances are the cookies themselves.
+That is, one class can have many instances.
+
+Classes can also have methods defined on them.
+Like constructors, they have a special `self` parameter that must come first.
+
+```python
+import math
+
+class Circle:
+    ...
+    def get_area(self):
+        return math.pi * self.radius * self.radius
+...
+print(my_circle.get_area())
+```
+
+Here the instance of the class, `my_circle`, will automatically be
+passed in as the first parameter when calling `get_area`.
+The method can then access the **member variable** `radius`.
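+
+Putting these pieces together, a complete version of the class might look like the following
+(a minimal sketch - the radius value and the printed result are purely illustrative):
+
+```python
+import math
+
+class Circle:
+    def __init__(self, radius):
+        # The constructor stores the configuration passed in on the instance
+        self.radius = radius
+
+    def get_area(self):
+        # Methods can use the member variables set in the constructor
+        return math.pi * self.radius * self.radius
+
+my_circle = Circle(10)
+print(my_circle.get_area())  # prints 314.1592653589793
+```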
+
+> ## Exercise: Use a class to configure loading
+> Make the `load_inflammation_data` function we wrote in the last exercise a member method
+> of a new class called `CSVDataSource`.
+> Put the configuration of where to load the files from in the class's constructor.
+> Once this is done, you can construct this class outside the statistical analysis
+> and pass the instance in to `analyse_data`.
+>> ## Hint
+>> When we have completed the refactoring, the code in the `analyse_data` function
+>> should look like:
+>> ```python
+>> def analyse_data(data_source):
+>>     data = data_source.load_inflammation_data()
+>>     daily_standard_deviation = compute_standard_deviation_by_data(data)
+>>     ...
+>> ```
+>> The controller code should look like:
+>> ```python
+>> data_source = CSVDataSource(os.path.dirname(InFiles[0]))
+>> analyse_data(data_source)
+>> ```
+> {: .solution}
+>> ## Solution
+>> You should have created a class that looks something like this:
+>>
+>> ```python
+>> class CSVDataSource:
+>>     """
+>>     Loads all the inflammation CSV files within a specified folder.
+>>     """
+>>     def __init__(self, dir_path):
+>>         self.dir_path = dir_path
+>>
+>>     def load_inflammation_data(self):
+>>         data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.csv'))
+>>         if len(data_file_paths) == 0:
+>>             raise ValueError(f"No inflammation CSV files found in path {self.dir_path}")
+>>         data = map(models.load_csv, data_file_paths)
+>>         return list(data)
+>> ```
+>> We can now pass an instance of this class into the statistical analysis function.
+>> This means that, should we want to re-use the analysis, it wouldn't be fixed to reading
+>> from a directory of CSV files.
+>> We have "decoupled" the reading of the data from the statistical analysis.
+>> ```python
+>> def analyse_data(data_source):
+>>     data = data_source.load_inflammation_data()
+>>     daily_standard_deviation = compute_standard_deviation_by_data(data)
+>>     ...
+>> ```
+>>
+>> In the controller, you might have something like:
+>>
+>> ```python
+>> data_source = CSVDataSource(os.path.dirname(InFiles[0]))
+>> analyse_data(data_source)
+>> ```
+>> While the behaviour is unchanged, how we call `analyse_data` has changed.
+>> We must update our regression test to match this, to ensure we haven't broken the code:
+>> ```python
+>> ...
+>> def test_compute_data():
+>>     from inflammation.compute_data import analyse_data
+>>     path = Path.cwd() / "../data"
+>>     data_source = CSVDataSource(path)
+>>     result = analyse_data(data_source)
+>>     expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211
+>> ...
+>> ```
+> {: .solution}
+{: .challenge}
+
+## Interfaces
+
+Another important concept in software design is the idea of **interfaces** between different units in the code.
+One kind of interface you might have come across is an API (Application Programming Interface).
+APIs allow separate systems to communicate with each other - such as making an API request
+to Google Maps to find the latitude and longitude of an address.
+
+However, there are also internal interfaces within our software that dictate how
+different parts of the system interact with each other.
+Even if these aren't thought out or documented, they still exist!
+
+For example, our `Circle` class implicitly has an interface:
+you can call `get_area` on it and it will return a number representing its area.
+
+> ## Exercise: Identify the interface between `CSVDataSource` and `analyse_data`
+> What is the interface that `CSVDataSource` shares with `analyse_data`?
+> Think about what functions `analyse_data` needs to be able to call,
+> what parameters they need and what they will return.
+>> ## Solution
+>> The interface is the `load_inflammation_data` method.
+>>
+>> It takes no parameters.
+>>
+>> It returns a list where each entry is a 2D array of patient inflammation results by day.
+>> Any object we pass into `analyse_data` must conform to this interface.
+> {: .solution}
+{: .challenge}
+
+## Polymorphism
+
+It is possible to design multiple classes that each conform to the same interface.
+
+For example, we could provide a `Rectangle` class:
+
+```python
+class Rectangle:
+    def __init__(self, width, height):
+        self.width = width
+        self.height = height
+    def get_area(self):
+        return self.width * self.height
+```
+
+Like `Circle`, this class provides a `get_area` method.
+The method takes the same number of parameters (none), and returns a number.
+However, the implementation is different.
+
+When classes share an interface, we can use an instance of a class without
+knowing which specific class is being used.
+When we do this, it is called **polymorphism**.
+
+Here is an example where we create a list of shapes (either Circles or Rectangles)
+and can then find the total area.
+Note how we call `get_area` and Python is able to call the appropriate `get_area`
+for each of the shapes.
+
+```python
+my_circle = Circle(radius=10)
+my_rectangle = Rectangle(width=5, height=3)
+my_shapes = [my_circle, my_rectangle]
+total_area = sum(shape.get_area() for shape in my_shapes)
+```
+
+This is an example of **abstraction** - when we are calculating the total
+area, the method for calculating the area of each shape is abstracted away
+to the relevant class.
+
+### How polymorphism is useful
+
+As we saw with the `Circle` and `Rectangle` examples, we can use common interfaces and polymorphism
+to abstract away the details of the implementation from the caller.
+
+For example, we could replace our `CSVDataSource` with a class that reads a totally different format,
+or reads from an external service.
+All of these can be added in without changing the analysis.
+Further, if we want to write a new analysis, we can support any of these data sources
+for free with no further work.
+That is, we have decoupled the job of loading the data from the job of analysing the data.
+
+> ## Exercise: Introduce an alternative implementation of DataSource
+> Create another class that supports loading JSON instead of CSV.
+> There is a function in `models.py` that loads from JSON files in the following format:
+> ```json
+> [
+>   {
+>     "observations": [0, 1]
+>   },
+>   {
+>     "observations": [0, 2]
+>   }
+> ]
+> ```
+> It should implement the `load_inflammation_data` method.
+> Finally, at run time, construct an appropriate instance based on the file extension.
+>> ## Solution
+>> You should have created a class that looks something like:
+>> ```python
+>> class JSONDataSource:
+>>     """
+>>     Loads all the inflammation JSON files within a specified folder.
+>>     """
+>>     def __init__(self, dir_path):
+>>         self.dir_path = dir_path
+>>
+>>     def load_inflammation_data(self):
+>>         data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.json'))
+>>         if len(data_file_paths) == 0:
+>>             raise ValueError(f"No inflammation JSON files found in path {self.dir_path}")
+>>         data = map(models.load_json, data_file_paths)
+>>         return list(data)
+>> ```
+>> Additionally, the controller will need to select the appropriate DataSource to
+>> provide to the analysis:
+>> ```python
+>> _, extension = os.path.splitext(InFiles[0])
+>> if extension == '.json':
+>>     data_source = JSONDataSource(os.path.dirname(InFiles[0]))
+>> elif extension == '.csv':
+>>     data_source = CSVDataSource(os.path.dirname(InFiles[0]))
+>> else:
+>>     raise ValueError(f'Unsupported file format: {extension}')
+>> analyse_data(data_source)
+>> ```
+>> As you have seen, all these changes were made without modifying
+>> the analysis code itself.
+> {: .solution}
+{: .challenge}
+
+## Testing using Mock Objects
+
+We can use this abstraction to make testing more straightforward.
+Instead of having our tests use real file system data, we can provide
+a mock or dummy implementation in place of one of the real classes.
+Provided that what we substitute conforms to the same interface, the code we are testing will work
+just the same.
+This dummy implementation could just return some fixed example data.
+
+A convenient way to do this is to use Python's [mock object library](https://docs.python.org/3/library/unittest.mock.html).
+Mocks are a whole topic in themselves -
+but a basic mock can be constructed using a couple of lines of code:
+
+```python
+from unittest.mock import Mock
+
+mock_version = Mock()
+mock_version.method_to_mock.return_value = 42
+```
+
+Here we construct a mock in the same way you would construct an instance of a class.
+Then we specify a method that we want to behave in a specific way.
+
+Now whenever you call `mock_version.method_to_mock()` the return value will be `42`.
+
+
+> ## Exercise: Test using a mock or dummy implementation
+> Complete this test for `analyse_data`, using a mock object in place of the
+> `data_source`:
+> ```python
+> from unittest.mock import Mock
+>
+> def test_compute_data_mock_source():
+>     from inflammation.compute_data import analyse_data
+>     data_source = Mock()
+>
+>     # TODO: configure data_source mock
+>
+>     result = analyse_data(data_source)
+>
+>     # TODO: add assert on the contents of result
+> ```
+> Create a mock to provide as the `data_source` that returns some fixed data, and use it in a test of
+> the `analyse_data` method.
+>
+> Don't forget you will need to import `Mock` from the `unittest.mock` package.
+>> ## Solution
+>> ```python
+>> import math
+>>
+>> import numpy.testing as npt
+>> from unittest.mock import Mock
+>>
+>> def test_compute_data_mock_source():
+>>     from inflammation.compute_data import analyse_data
+>>     data_source = Mock()
+>>     data_source.load_inflammation_data.return_value = [[[0, 2, 0]],
+>>                                                        [[0, 1, 0]]]
+>>
+>>     result = analyse_data(data_source)
+>>     npt.assert_array_almost_equal(result, [0, math.sqrt(0.25), 0])
+>> ```
+> {: .solution}
+{: .challenge}
+
+## Object Oriented Programming
+
+Using classes, and particularly polymorphism, is a technique that comes from
+**object oriented programming** (frequently abbreviated to OOP).
+As with functional programming, different programming languages will provide features to enable you
+to write object oriented code.
+For example, in Python you can create classes, and use polymorphism to call the
+correct method on an instance (e.g. when we called `get_area` on a shape, the appropriate `get_area` was called).
+
+Object oriented programming also includes **information hiding**.
+Here, certain fields can be marked as private to a class,
+preventing them from being modified at will.
+
+This can be used to maintain the invariants of a class (such as insisting that a circle's radius is always non-negative).
+
+There is also **inheritance**, which allows one class to specialise
+the behaviour of another class by inheriting from it
+and **overriding** certain methods.
+
+As with functional programming, there are times when
+object oriented programming is well suited, and times when it is not.
+
+Good uses:
+
+ * Representing real world objects with invariants
+ * Providing alternative implementations, such as we did with DataSource
+ * Representing something that has state that will change over the program's lifetime (such as elements of a GUI)
+
+One downside of OOP is ending up with very large classes that contain complex methods.
+As these are methods on the class, it can be hard to know up front what side effects
+calling them will have on the class's state.
+This can make maintenance hard.
+
+> ## Classes and functional programming
+> Using classes is compatible with functional programming.
+> In fact, grouping data into logical structures (such as three numbers into a vector)
+> is a vital step in writing readable and maintainable code with any approach.
+> However, when writing in a functional style, classes should be immutable -
+> that is, the methods they provide should be read-only and should not modify the state of the class.
+> If you require the class to be different, you'd create a new instance
+> with the new values.
+{: .callout}
+
+
+Don't use features for the sake of using features.
+Code should be as simple as it can be, but not any simpler.
+If you know your function only makes sense to operate on circles, then
+don't accept shapes just to use polymorphism!
diff --git a/_episodes/35-object-oriented-programming.md b/_episodes/35-object-oriented-programming.md
deleted file mode 100644
index 01413497a..000000000
--- a/_episodes/35-object-oriented-programming.md
+++ /dev/null
@@ -1,904 +0,0 @@
----
-title: "Object Oriented Programming"
-teaching: 30
-exercises: 20
-questions:
-- "How can we use code to describe the structure of data?"
-- "How should the relationships between structures be described?"
-objectives:
-- "Describe the core concepts that define the object oriented paradigm"
-- "Use classes to encapsulate data within a more complex program"
-- "Structure concepts within a program in terms of sets of behaviour"
-- "Identify different types of relationship between concepts within a program"
-- "Structure data within a program using these relationships"
-keypoints:
-- "Object oriented programming is a programming paradigm based on the concept of classes, which encapsulate data and code."
-- "Classes allow us to organise data into distinct concepts."
-- "By breaking down our data into classes, we can reason about the behaviour of parts of our data."
-- "Relationships between concepts can be described using inheritance (*is a*) and composition (*has a*)."
----
-
-## Introduction
-
-Object oriented programming is a programming paradigm based on the concept of objects,
-which are data structures that contain (encapsulate) data and code.
-Data is encapsulated in the form of fields (attributes) of objects, -while code is encapsulated in the form of procedures (methods) -that manipulate objects' attributes and define "behaviour" of objects. -So, in object oriented programming, -we first think about the data and the things that we’re modelling - -and represent these by objects - -rather than define the logic of the program, -and code becomes a series of interactions between objects. - -## Structuring Data - -One of the main difficulties we encounter when building more complex software is -how to structure our data. -So far, we've been processing data from a single source and with a simple tabular structure, -but it would be useful to be able to combine data from a range of different sources -and with more data than just an array of numbers. - -~~~ -data = np.array([[1., 2., 3.], - [4., 5., 6.]]) -~~~ -{: .language-python} - -Using this data structure has the advantage of -being able to use NumPy operations to process the data -and Matplotlib to plot it, -but often we need to have more structure than this. -For example, we may need to attach more information about the patients -and store this alongside our measurements of inflammation. - -We can do this using the Python data structures we're already familiar with, -dictionaries and lists. -For instance, we could attach a name to each of our patients: - -~~~ -patients = [ - { - 'name': 'Alice', - 'data': [1., 2., 3.], - }, - { - 'name': 'Bob', - 'data': [4., 5., 6.], - }, -] -~~~ -{: .language-python} - -> ## Exercise: Structuring Data -> -> Write a function, called `attach_names`, -> which can be used to attach names to our patient dataset. -> When used as below, it should produce the expected output. -> -> If you're not sure where to begin, -> think about ways you might be able to effectively loop over two collections at once. -> Also, don't worry too much about the data type of the `data` value, -> it can be a Python list, or a NumPy array - either is fine. -> -> ~~~ -> data = np.array([[1., 2., 3.], -> [4., 5., 6.]]) -> -> output = attach_names(data, ['Alice', 'Bob']) -> print(output) -> ~~~ -> {: .language-python} -> -> ~~~ -> [ -> { -> 'name': 'Alice', -> 'data': [1., 2., 3.], -> }, -> { -> 'name': 'Bob', -> 'data': [4., 5., 6.], -> }, -> ] -> ~~~ -> {: .output} -> -> > ## Solution -> > -> > One possible solution, perhaps the most obvious, -> > is to use the `range` function to index into both lists at the same location: -> > -> > ~~~ -> > def attach_names(data, names): -> > """Create datastructure containing patient records.""" -> > output = [] -> > -> > for i in range(len(data)): -> > output.append({'name': names[i], -> > 'data': data[i]}) -> > -> > return output -> > ~~~ -> > {: .language-python} -> > -> > However, this solution has a potential problem that can occur sometimes, -> > depending on the input. -> > What might go wrong with this solution? -> > How could we fix it? -> > -> > > ## A Better Solution -> > > -> > > What would happen if the `data` and `names` inputs were different lengths? -> > > -> > > If `names` is longer, we'll loop through, until we run out of rows in the `data` input, -> > > at which point we'll stop processing the last few names. -> > > If `data` is longer, we'll loop through, but at some point we'll run out of names - -> > > but this time we try to access part of the list that doesn't exist, -> > > so we'll get an exception. 
-> > >
-> > > A better solution would be to use the `zip` function,
-> > > which allows us to iterate over multiple iterables without needing an index variable.
-> > > The `zip` function also limits the iteration to whichever of the iterables is shortest,
-> > > so we won't raise an exception here,
-> > > but this might not quite be the behaviour we want,
-> > > so we'll also explicitly `assert` that the inputs should be the same length.
-> > > Checking that our inputs are valid in this way is an example of a precondition,
-> > > which we introduced conceptually in an earlier episode.
-> > >
-> > > If you've not previously come across the `zip` function,
-> > > read [this section](https://docs.python.org/3/library/functions.html#zip)
-> > > of the Python documentation.
-> > >
-> > > ~~~
-> > > def attach_names(data, names):
-> > >     """Create datastructure containing patient records."""
-> > >     assert len(data) == len(names)
-> > >     output = []
-> > >
-> > >     for data_row, name in zip(data, names):
-> > >         output.append({'name': name,
-> > >                        'data': data_row})
-> > >
-> > >     return output
-> > > ~~~
-> > > {: .language-python}
-> > {: .solution}
-> {: .solution}
-{: .challenge}
-
-## Classes in Python
-
-Using nested dictionaries and lists should work for some of the simpler cases
-where we need to handle structured data,
-but they get quite difficult to manage once the structure becomes a bit more complex.
-For this reason, in the object oriented paradigm,
-we use **classes** to help with managing this data
-and the operations we would want to perform on it.
-A class is a **template** (blueprint) for a structured piece of data,
-so when we create some data using a class,
-we can be certain that it has the same structure each time.
-
-With our list of dictionaries we had in the example above,
-we have no real guarantee that each dictionary has the same structure,
-e.g. the same keys (`name` and `data`) unless we check it manually.
-With a class, if an object is an **instance** of that class
-(i.e. it was made using that template),
-we know it will have the structure defined by that class.
-Different programming languages make slightly different guarantees
-about how strictly the structure will match,
-but in object oriented programming this is one of the core ideas -
-all objects derived from the same class must follow the same behaviour.
-
-You may not have realised, but you should already be familiar with
-some of the classes that come bundled as part of Python, for example:
-
-~~~
-my_list = [1, 2, 3]
-my_dict = {1: '1', 2: '2', 3: '3'}
-my_set = {1, 2, 3}
-
-print(type(my_list))
-print(type(my_dict))
-print(type(my_set))
-~~~
-{: .language-python}
-
-~~~
-<class 'list'>
-<class 'dict'>
-<class 'set'>
-~~~
-{: .output}
-
-Lists, dictionaries and sets are a slightly special type of class,
-but they behave in much the same way as a class we might define ourselves:
-
-- They each hold some data (**attributes** or **state**).
-- They also provide some methods describing the behaviours of the data -
-  what can the data do and what can we do to the data?
- -The behaviours we may have seen previously include: - -- Lists can be appended to -- Lists can be indexed -- Lists can be sliced -- Key-value pairs can be added to dictionaries -- The value at a key can be looked up in a dictionary -- The union of two sets can be found (the set of values present in any of the sets) -- The intersection of two sets can be found (the set of values present in all of the sets) - -## Encapsulating Data - -Let's start with a minimal example of a class representing our patients. - -~~~ -# file: inflammation/models.py - -class Patient: - def __init__(self, name): - self.name = name - self.observations = [] - -alice = Patient('Alice') -print(alice.name) -~~~ -{: .language-python} - -~~~ -Alice -~~~ -{: .output} - -Here we've defined a class with one method: `__init__`. -This method is the **initialiser** method, -which is responsible for setting up the initial values and structure of the data -inside a new instance of the class - -this is very similar to **constructors** in other languages, -so the term is often used in Python too. -The `__init__` method is called every time we create a new instance of the class, -as in `Patient('Alice')`. -The argument `self` refers to the instance on which we are calling the method -and gets filled in automatically by Python - -we do not need to provide a value for this when we call the method. - -Data encapsulated within our Patient class includes -the patient's name and a list of inflammation observations. -In the initialiser method, -we set a patient's name to the value provided, -and create a list of inflammation observations for the patient (initially empty). -Such data is also referred to as the attributes of a class -and holds the current state of an instance of the class. -Attributes are typically hidden (encapsulated) internal object details -ensuring that access to data is protected from unintended changes. -They are manipulated internally by the class, -which, in addition, can expose certain functionality as public behavior of the class -to allow other objects to interact with this class' instances. - -## Encapsulating Behaviour - -In addition to representing a piece of structured data -(e.g. a patient who has a name and a list of inflammation observations), -a class can also provide a set of functions, or **methods**, -which describe the **behaviours** of the data encapsulated in the instances of that class. -To define the behaviour of a class we add functions which operate on the data the class contains. -These functions are the member functions or methods. - -Methods on classes are the same as normal functions, -except that they live inside a class and have an extra first parameter `self`. -Using the name `self` is not strictly necessary, but is a very strong convention - -it is extremely rare to see any other name chosen. -When we call a method on an object, -the value of `self` is automatically set to this object - hence the name. -As we saw with the `__init__` method previously, -we do not need to explicitly provide a value for the `self` argument, -this is done for us by Python. - -Let's add another method on our Patient class that adds a new observation to a Patient instance. 
- -~~~ -# file: inflammation/models.py - -class Patient: - """A patient in an inflammation study.""" - def __init__(self, name): - self.name = name - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - if self.observations: - day = self.observations[-1]['day'] + 1 - else: - day = 0 - - new_observation = { - 'day': day, - 'value': value, - } - - self.observations.append(new_observation) - return new_observation - -alice = Patient('Alice') -print(alice) - -observation = alice.add_observation(3) -print(observation) -print(alice.observations) -~~~ -{: .language-python} - -~~~ -<__main__.Patient object at 0x7fd7e61b73d0> -{'day': 0, 'value': 3} -[{'day': 0, 'value': 3}] -~~~ -{: .output} - -Note also how we used `day=None` in the parameter list of the `add_observation` method, -then initialise it if the value is indeed `None`. -This is one of the common ways to handle an optional argument in Python, -so we'll see this pattern quite a lot in real projects. - -> ## Class and Static Methods -> -> Sometimes, the function we're writing doesn't need access to -> any data belonging to a particular object. -> For these situations, we can instead use a **class method** or a **static method**. -> Class methods have access to the class that they're a part of, -> and can access data on that class - -> but do not belong to a specific instance of that class, -> whereas static methods have access to neither the class nor its instances. -> -> By convention, class methods use `cls` as their first argument instead of `self` - -> this is how we access the class and its data, -> just like `self` allows us to access the instance and its data. -> Static methods have neither `self` nor `cls` -> so the arguments look like a typical free function. -> These are the only common exceptions to using `self` for a method's first argument. -> -> Both of these method types are created using **decorators** - -> for more information see -> the [classmethod](https://docs.python.org/3/library/functions.html#classmethod) -> and [staticmethod](https://docs.python.org/3/library/functions.html#staticmethod) -> decorator sections of the Python documentation. -{: .callout} - -### Dunder Methods - -Why is the `__init__` method not called `init`? -There are a few special method names that we can use -which Python will use to provide a few common behaviours, -each of which begins and ends with a **d**ouble-**under**score, -hence the name **dunder method**. - -When writing your own Python classes, -you'll almost always want to write an `__init__` method, -but there are a few other common ones you might need sometimes. -You may have noticed in the code above that the method `print(alice)` -returned `<__main__.Patient object at 0x7fd7e61b73d0>`, -which is the string representation of the `alice` object. -We may want the print statement to display the object's name instead. -We can achieve this by overriding the `__str__` method of our class. 
- -~~~ -# file: inflammation/models.py - -class Patient: - """A patient in an inflammation study.""" - def __init__(self, name): - self.name = name - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - try: - day = self.observations[-1]['day'] + 1 - - except IndexError: - day = 0 - - - new_observation = { - 'day': day, - 'value': value, - } - - self.observations.append(new_observation) - return new_observation - - def __str__(self): - return self.name - - -alice = Patient('Alice') -print(alice) -~~~ -{: .language-python} - -~~~ -Alice -~~~ -{: .output} - -These dunder methods are not usually called directly, -but rather provide the implementation of some functionality we can use - -we didn't call `alice.__str__()`, -but it was called for us when we did `print(alice)`. -Some we see quite commonly are: - -- `__str__` - converts an object into its string representation, used when you call `str(object)` or `print(object)` -- `__getitem__` - Accesses an object by key, this is how `list[x]` and `dict[x]` are implemented -- `__len__` - gets the length of an object when we use `len(object)` - usually the number of items it contains - -There are many more described in the Python documentation, -but it’s also worth experimenting with built in Python objects to -see which methods provide which behaviour. -For a more complete list of these special methods, -see the [Special Method Names](https://docs.python.org/3/reference/datamodel.html#special-method-names) -section of the Python documentation. - -> ## Exercise: A Basic Class -> -> Implement a class to represent a book. -> Your class should: -> -> - Have a title -> - Have an author -> - When printed using `print(book)`, show text in the format "title by author" -> -> ~~~ -> book = Book('A Book', 'Me') -> -> print(book) -> ~~~ -> {: .language-python} -> -> ~~~ -> A Book by Me -> ~~~ -> {: .output} -> -> > ## Solution -> > -> > ~~~ -> > class Book: -> > def __init__(self, title, author): -> > self.title = title -> > self.author = author -> > -> > def __str__(self): -> > return self.title + ' by ' + self.author -> > ~~~ -> > {: .language-python} -> {: .solution} -{: .challenge} - -### Properties - -The final special type of method we will introduce is a **property**. -Properties are methods which behave like data - -when we want to access them, we do not need to use brackets to call the method manually. - -~~~ -# file: inflammation/models.py - -class Patient: - ... - - @property - def last_observation(self): - return self.observations[-1] - -alice = Patient('Alice') - -alice.add_observation(3) -alice.add_observation(4) - -obs = alice.last_observation -print(obs) -~~~ -{: .language-python} - -~~~ -{'day': 1, 'value': 4} -~~~ -{: .output} - -You may recognise the `@` syntax from episodes on -parameterising unit tests and functional programming - -`property` is another example of a **decorator**. -In this case the `property` decorator is taking the `last_observation` function -and modifying its behaviour, -so it can be accessed as if it were a normal attribute. -It is also possible to make your own decorators, but we won't cover it here. - -## Relationships Between Classes - -We now have a language construct for grouping data and behaviour -related to a single conceptual object. -The next step we need to take is to describe the relationships between the concepts in our code. - -There are two fundamental types of relationship between objects -which we need to be able to describe: - -1. 
Ownership - x **has a** y - this is **composition** -2. Identity - x **is a** y - this is **inheritance** - -### Composition - -You should hopefully have come across the term **composition** already - -in the novice Software Carpentry, we use composition of functions to reduce code duplication. -That time, we used a function which converted temperatures in Celsius to Kelvin -as a **component** of another function which converted temperatures in Fahrenheit to Kelvin. - -In the same way, in object oriented programming, we can make things components of other things. - -We often use composition where we can say 'x *has a* y' - -for example in our inflammation project, -we might want to say that a doctor *has* patients -or that a patient *has* observations. - -In the case of our example, we're already saying that patients have observations, -so we're already using composition here. -We're currently implementing an observation as a dictionary with a known set of keys though, -so maybe we should make an `Observation` class as well. - -~~~ -# file: inflammation/models.py - -class Observation: - def __init__(self, day, value): - self.day = day - self.value = value - - def __str__(self): - return str(self.value) - -class Patient: - """A patient in an inflammation study.""" - def __init__(self, name): - self.name = name - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - try: - day = self.observations[-1].day + 1 - - except IndexError: - day = 0 - - new_observation = Observation(day, value) - - self.observations.append(new_observation) - return new_observation - - def __str__(self): - return self.name - - -alice = Patient('Alice') -obs = alice.add_observation(3) - -print(obs) -~~~ -{: .language-python} - -~~~ -3 -~~~ -{: .output} - -Now we're using a composition of two custom classes to -describe the relationship between two types of entity in the system that we're modelling. - -### Inheritance - -The other type of relationship used in object oriented programming is **inheritance**. -Inheritance is about data and behaviour shared by classes, -because they have some shared identity - 'x *is a* y'. -If class `X` inherits from (*is a*) class `Y`, -we say that `Y` is the **superclass** or **parent class** of `X`, -or `X` is a **subclass** of `Y`. - -If we want to extend the previous example to also manage people who aren't patients -we can add another class `Person`. -But `Person` will share some data and behaviour with `Patient` - -in this case both have a name and show that name when you print them. -Since we expect all patients to be people (hopefully!), -it makes sense to implement the behaviour in `Person` and then reuse it in `Patient`. - -To write our class in Python, -we used the `class` keyword, the name of the class, -and then a block of the functions that belong to it. -If the class **inherits** from another class, -we include the parent class name in brackets. 
- -~~~ -# file: inflammation/models.py - -class Observation: - def __init__(self, day, value): - self.day = day - self.value = value - - def __str__(self): - return str(self.value) - -class Person: - def __init__(self, name): - self.name = name - - def __str__(self): - return self.name - -class Patient(Person): - """A patient in an inflammation study.""" - def __init__(self, name): - super().__init__(name) - self.observations = [] - - def add_observation(self, value, day=None): - if day is None: - try: - day = self.observations[-1].day + 1 - - except IndexError: - day = 0 - - new_observation = Observation(day, value) - - self.observations.append(new_observation) - return new_observation - -alice = Patient('Alice') -print(alice) - -obs = alice.add_observation(3) -print(obs) - -bob = Person('Bob') -print(bob) - -obs = bob.add_observation(4) -print(obs) -~~~ -{: .language-python} - -~~~ -Alice -3 -Bob -AttributeError: 'Person' object has no attribute 'add_observation' -~~~ -{: .output} - -As expected, an error is thrown because we cannot add an observation to `bob`, -who is a Person but not a Patient. - -We see in the example above that to say that a class inherits from another, -we put the **parent class** (or **superclass**) in brackets after the name of the **subclass**. - -There's something else we need to add as well - -Python doesn't automatically call the `__init__` method on the parent class -if we provide a new `__init__` for our subclass, -so we'll need to call it ourselves. -This makes sure that everything that needs to be initialised on the parent class has been, -before we need to use it. -If we don't define a new `__init__` method for our subclass, -Python will look for one on the parent class and use it automatically. -This is true of all methods - -if we call a method which doesn't exist directly on our class, -Python will search for it among the parent classes. -The order in which it does this search is known as the **method resolution order** - -a little more on this in the Multiple Inheritance callout below. - -The line `super().__init__(name)` gets the parent class, -then calls the `__init__` method, -providing the `name` variable that `Person.__init__` requires. -This is quite a common pattern, particularly for `__init__` methods, -where we need to make sure an object is initialised as a valid `X`, -before we can initialise it as a valid `Y` - -e.g. a valid `Person` must have a name, -before we can properly initialise a `Patient` model with their inflammation data. - - -> ## Composition vs Inheritance -> -> When deciding how to implement a model of a particular system, -> you often have a choice of either composition or inheritance, -> where there is no obviously correct choice. -> For example, it's not obvious whether a photocopier *is a* printer and *is a* scanner, -> or *has a* printer and *has a* scanner. -> -> ~~~ -> class Machine: -> pass -> -> class Printer(Machine): -> pass -> -> class Scanner(Machine): -> pass -> -> class Copier(Printer, Scanner): -> # Copier `is a` Printer and `is a` Scanner -> pass -> ~~~ -> {: .language-python} -> -> ~~~ -> class Machine: -> pass -> -> class Printer(Machine): -> pass -> -> class Scanner(Machine): -> pass -> -> class Copier(Machine): -> def __init__(self): -> # Copier `has a` Printer and `has a` Scanner -> self.printer = Printer() -> self.scanner = Scanner() -> ~~~ -> {: .language-python} -> -> Both of these would be perfectly valid models and would work for most purposes. 
-> However, unless there's something about how you need to use the model
-> which would benefit from using a model based on inheritance,
-> it's usually recommended to opt for **composition over inheritance**.
-> This is a common design principle in the object oriented paradigm and is worth remembering,
-> as it's very common for people to overuse inheritance once they've been introduced to it.
->
-> For much more detail on this see the
-> [Python Design Patterns guide](https://python-patterns.guide/gang-of-four/composition-over-inheritance/).
-{: .callout}
-
-> ## Multiple Inheritance
->
-> **Multiple Inheritance** is when a class inherits from more than one direct parent class.
-> It exists in Python, but is often not present in other Object Oriented languages.
-> Although this might seem useful, as in our inheritance-based model of the photocopier above,
-> it's best to avoid it unless you're sure it's the right thing to do,
-> due to the complexity of the inheritance hierarchy.
-> Often using multiple inheritance is a sign you should instead be using composition -
-> again like the photocopier model above.
-{: .callout}
-
-
-> ## Exercise: A Model Patient
->
-> Let's use what we have learnt in this episode and combine it with what we have learnt on
-> [software requirements](../31-software-requirements/index.html)
-> to formulate and implement a
-> [few new solution requirements](../31-software-requirements/index.html#exercise-new-solution-requirements)
-> to extend the model layer of our clinical trial system.
->
-> Let's start by extending the system such that there must be
-> a `Doctor` class to hold the data representing a single doctor, which:
->
-> - must have a `name` attribute
-> - must have a list of patients that this doctor is responsible for.
->
-> In addition to these, try to think of an extra feature you could add to the models
-> which would be useful for managing a dataset like this -
-> imagine we're running a clinical trial, what else might we want to know?
-> Try using Test Driven Development for any features you add:
-> write the tests first, then add the feature.
-> The tests have been started for you in `tests/test_patient.py`,
-> but you will probably want to add some more.
->
-> Once you've finished the initial implementation, do you have much duplicated code?
-> Is there anywhere you could make better use of composition or inheritance
-> to improve your implementation?
->
-> For any extra features you've added,
-> explain them and how you implemented them to your neighbour.
-> Would they have implemented that feature in the same way?
->
-> > ## Solution
-> > One example solution is shown below.
-> > You may start by writing some tests (that will initially fail),
-> > and then develop the code to satisfy the new requirements and pass the tests.
-> > ~~~
-> > # file: tests/test_patient.py
-> > """Tests for the Patient model."""
-> >
-> > def test_create_patient():
-> >     """Check a patient is created correctly given a name."""
-> >     from inflammation.models import Patient
-> >     name = 'Alice'
-> >     p = Patient(name=name)
-> >     assert p.name == name
-> >
-> > def test_create_doctor():
-> >     """Check a doctor is created correctly given a name."""
-> >     from inflammation.models import Doctor
-> >     name = 'Sheila Wheels'
-> >     doc = Doctor(name=name)
-> >     assert doc.name == name
-> >
-> > def test_doctor_is_person():
-> >     """Check if a doctor is a person."""
-> >     from inflammation.models import Doctor, Person
-> >     doc = Doctor("Sheila Wheels")
-> >     assert isinstance(doc, Person)
-> >
-> > def test_patient_is_person():
-> >     """Check if a patient is a person."""
-> >     from inflammation.models import Patient, Person
-> >     alice = Patient("Alice")
-> >     assert isinstance(alice, Person)
-> >
-> > def test_patients_added_correctly():
-> >     """Check patients are being added correctly by a doctor."""
-> >     from inflammation.models import Doctor, Patient
-> >     doc = Doctor("Sheila Wheels")
-> >     alice = Patient("Alice")
-> >     doc.add_patient(alice)
-> >     assert doc.patients is not None
-> >     assert len(doc.patients) == 1
-> >
-> > def test_no_duplicate_patients():
-> >     """Check adding the same patient to the same doctor twice does not result in duplicates."""
-> >     from inflammation.models import Doctor, Patient
-> >     doc = Doctor("Sheila Wheels")
-> >     alice = Patient("Alice")
-> >     doc.add_patient(alice)
-> >     doc.add_patient(alice)
-> >     assert len(doc.patients) == 1
-> > ...
-> > ~~~
-> > {: .language-python}
-> >
-> > ~~~
-> > # file: inflammation/models.py
-> > ...
-> > class Person:
-> >     """A person."""
-> >     def __init__(self, name):
-> >         self.name = name
-> >
-> >     def __str__(self):
-> >         return self.name
-> >
-> > class Patient(Person):
-> >     """A patient in an inflammation study."""
-> >     def __init__(self, name):
-> >         super().__init__(name)
-> >         self.observations = []
-> >
-> >     def add_observation(self, value, day=None):
-> >         if day is None:
-> >             try:
-> >                 day = self.observations[-1].day + 1
-> >             except IndexError:
-> >                 day = 0
-> >         new_observation = Observation(day, value)
-> >         self.observations.append(new_observation)
-> >         return new_observation
-> >
-> > class Doctor(Person):
-> >     """A doctor in an inflammation study."""
-> >     def __init__(self, name):
-> >         super().__init__(name)
-> >         self.patients = []
-> >
-> >     def add_patient(self, new_patient):
-> >         # A crude check by name if this patient is already looked after
-> >         # by this doctor before adding them
-> >         for patient in self.patients:
-> >             if patient.name == new_patient.name:
-> >                 return
-> >         self.patients.append(new_patient)
-> > ...
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-
-{% include links.md %}
diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-refactoring-architecture.md
new file mode 100644
index 000000000..a00390828
--- /dev/null
+++ b/_episodes/35-refactoring-architecture.md
@@ -0,0 +1,253 @@
+---
+title: "Architecting Code to Separate Responsibilities"
+teaching: 15
+exercises: 50
+questions:
+- "What is the point of the MVC architecture?"
+- "How do we design larger solutions?"
+- "How can we tell what is and isn't an appropriate abstraction?"
+objectives:
+- "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software."
+- "Know how to design large changes to the codebase."
+- "Understand how to determine correct abstractions. " +keypoints: +- "By splitting up the \"view\" code from \"model\" code, you allow easier re-use of code." +- "YAGNI - you ain't gonna need it - don't create abstractions that aren't useful." +- "Sketching a diagram of the code can clarify how it is supposed to work, and troubleshoot problems early." +--- + + +## Introduction + +Model-View-Controller (MVC) is a way of separating out different responsibilities of a typical +application. Specifically we have: + +* The **model** which is responsible for the internal data representations for the program, + and the valid operations that can be performed on it. +* The **view** is responsible for how this data is presented to the user (e.g. through a GUI or + by writing out to a file) +* The **controller** is responsible for how the model can be interacted with. + +Separating out these different responsibilities into different parts of the code will make +the code much more maintainable. +For example, if the view code is kept away from the model code, then testing the model code +can be done without having to worry about how it will be presented. + +It helps with readability, as it makes it easier to have each function doing +just one thing. + +It also helps with maintainability - if the UI requirements change, these changes +are easily isolated from the more complex logic. + +## Separating out responsibilities + +The key thing to take away from MVC is the distinction between model code and view code. + +> ## What about the controller +> The view and the controller tend to be more tightly coupled and it isn't always sensible +> to draw a thick line dividing these two. Depending on how the user interacts with the software +> this distinction may not be possible (the code that specifies there is a button on the screen, +> might be the same code that specifies what that button does). In fact, the original proposer +> of MVC groups the views and the controller into a single element, called the tool. Other modern +> architectures like Model-View-Presenter do away with the controller and instead separate out the +> layout code from a programmable view of the UI. +{: .callout} + +The view code might be hard to test, or use libraries to draw the UI, but should +not contain any complex logic, and is really just a presentation layer on top of the model. + +The model, conversely, should not really care how the data is displayed. +For example, perhaps the UI always presents dates as "Monday 24th July 2023", but the model +would still store this using a `Date` rather than just that string. + +> ## Exercise: Identify model and view parts of the code +> Looking at the code inside `compute_data.py`, +> +> * What parts should be considered **model** code +> * What parts should be considered **view** code? +> * What parts should be considered **controller** code? +> +>> ## Solution +>> * The computation of the standard deviation is **model** code +>> * Reading the data from the CSV is also **model** code. +>> * The display of the output as a graph is the **view** code. +>> * The logic that processes the supplied flats is the **controller**. +> {: .solution} +{: .challenge} + +Within the model there is further separation that makes sense. +For example, as we did in the last episode, separating out the impure code that interacts with file systems from +the pure calculations is helps with readability and testability. 
+Nevertheless, the MVC approach is a great starting point when thinking about how you should structure your code.
+
+> ## Exercise: Split out the model code from the view code
+> Refactor `analyse_data` such that the *view* code we identified in the last
+> exercise is removed from the function, so the function contains only
+> *model* code, and the *view* code is moved elsewhere.
+>> ## Solution
+>> The idea here is for `analyse_data` not to have any "view" considerations.
+>> That is, it should just compute and return the data.
+>>
+>> ```python
+>> def analyse_data(data_source):
+>>     """Calculate the standard deviation by day between datasets.
+>>     Gets all the inflammation data from the supplied data source, works out the mean
+>>     inflammation value for each day across all datasets, then returns the
+>>     daily standard deviation of these means."""
+>>     data = data_source.load_inflammation_data()
+>>     daily_standard_deviation = compute_standard_deviation_by_data(data)
+>>
+>>     return daily_standard_deviation
+>> ```
+>> There can be a separate bit of code that chooses how that should be presented, e.g. as a graph:
+>>
+>> ```python
+>> if args.full_data_analysis:
+>>     _, extension = os.path.splitext(InFiles[0])
+>>     if extension == '.json':
+>>         data_source = JSONDataSource(os.path.dirname(InFiles[0]))
+>>     elif extension == '.csv':
+>>         data_source = CSVDataSource(os.path.dirname(InFiles[0]))
+>>     else:
+>>         raise ValueError(f'Unsupported file format: {extension}')
+>>     data_result = analyse_data(data_source)
+>>     graph_data = {
+>>         'standard deviation by day': data_result,
+>>     }
+>>     views.visualize(graph_data)
+>>     return
+>> ```
+>> You might notice this is more-or-less the change we made to write our
+>> regression test.
+>> This demonstrates that splitting up model code from view code can
+>> immediately make your code much more testable.
+>> Ensure you re-run our regression test to check this refactoring has not
+>> changed the output of `analyse_data`.
+> {: .solution}
+{: .challenge}
+
+## Programming patterns
+
+MVC is a **programming pattern**. Programming patterns are templates for structuring code.
+Patterns are a useful starting point for thinking about how to design your software.
+They also work as a common vocabulary for discussing software designs with
+other developers.
+
+The Refactoring Guru website has a [list of programming patterns](https://refactoring.guru/design-patterns/catalog).
+They aren't all good design decisions, and they can certainly be over-applied, but learning about them can be helpful
+for thinking at a big-picture level about software design.
+
+For example, the [visitor pattern](https://refactoring.guru/design-patterns/visitor) is
+a good way of separating the problem of how to move through the data
+from a specific action you want to perform on the data.
+
+Having a shared terminology for these approaches facilitates discussions
+where everyone is familiar with them.
+However, patterns cannot replace a full design, as most problems will require
+a bespoke design that maps cleanly onto the specific problem you are
+trying to solve.
+
+## Architecting larger changes
+
+When creating a new application, or making a substantial change to an existing one,
+it can be really helpful to sketch out the intended architecture on a whiteboard
+(pen and paper works too, though of course it might get messy as you iterate on the design!).
+
+The basic idea is that you draw boxes representing the different units of code, as well as
+other components of the system (such as users, databases, etc.).
+Then connect these boxes with lines where information or control will be exchanged.
+These lines represent the interfaces in your system.
+
+As well as helping to visualise the work, doing this sketch can surface potential issues early -
+for example, a circular dependency between two sections of the design.
+It can also help with estimating how long the work will take, as it forces you to consider all the components that
+need to be made.
+
+Diagrams aren't foolproof, and often the things we haven't considered won't make it onto the diagram,
+but they are a great starting point to break down the different responsibilities and think about
+the kinds of information different parts of the system will need.
+
+
+> ## Exercise: Design a high-level architecture
+> Sketch out a design for a new feature requested by a user:
+>
+> *"I want there to be a Google Drive folder such that, when I upload new inflammation data to it,
+> the software automatically pulls it down and updates the analysis.
+> The new result should be added to a database with a timestamp.
+> An email should then be sent to a group email notifying them of the change."*
+>> ## Solution
+>>
+>> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg)
+> {: .solution}
+{: .challenge}
+
+## An abstraction too far
+
+So far we have seen how abstractions are good for making code easier to read, maintain and test.
+However, it is possible to introduce too many abstractions.
+
+> All problems in computer science can be solved by another level of indirection,
+> except the problem of too many levels of indirection.
+
+When you introduce an abstraction, if the reader of the code needs to understand what is happening inside the abstraction,
+then it has actually made the code *harder* to read.
+When the code is just in the function, it is easy to see what it is doing.
+When the code calls out to an instance of a class that, thanks to polymorphism, could have a range of possible implementations,
+the only way to find out what is *actually* being called is to run the code and see.
+This is much slower to understand, and actually obfuscates meaning.
+
+It is a judgement as to whether you have made the code too abstract.
+If you have to jump around a lot when reading the code, that is a clue that it is too abstract.
+Similarly, if there are two parts of the code that always need updating together, that is
+again an indication of an incorrect or over-zealous abstraction.
+
+
+## You Ain't Gonna Need It
+
+There are different approaches to designing software.
+One popular principle is called "You Ain't Gonna Need It" - "YAGNI" for short.
+The idea is that, since it is hard to predict the future needs of a piece of software,
+it is always best to design the simplest solution that solves the problem at hand.
+This is opposed to trying to imagine how you might want to adapt the software in future
+and designing the code with that in mind.
+
+Then, since you know the problem you are trying to solve, you can avoid making your solution unnecessarily complex or abstracted.
+
+In our example, it might be tempting to abstract how `CSVDataSource` walks the file tree into a separate class.
+However, since we only have one strategy for exploring the file tree, this would just create indirection for the sake of it -
+now a reader of `CSVDataSource` would have to read a different class to find out how the tree is walked.
+A sketch of what this unnecessary indirection might look like follows.
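+
+For illustration only, the over-abstracted version might look something like the sketch below
+(the class and method names are invented for this example, not taken from the project):
+
+```python
+import glob
+import os
+
+import numpy as np
+
+
+class FileWalkStrategy:
+    """Hypothetical wrapper around how data files are found.
+    With only one strategy in existence, this is indirection for its own sake."""
+
+    def find_files(self, data_dir):
+        return glob.glob(os.path.join(data_dir, 'inflammation*.csv'))
+
+
+class CSVDataSource:
+    """Sketch of an over-abstracted data source."""
+
+    def __init__(self, data_dir, walk_strategy=None):
+        self.data_dir = data_dir
+        # A reader must now open another class just to learn how files are found
+        self.walk_strategy = walk_strategy if walk_strategy is not None else FileWalkStrategy()
+
+    def load_inflammation_data(self):
+        data_file_paths = self.walk_strategy.find_files(self.data_dir)
+        return [np.loadtxt(path, delimiter=',') for path in data_file_paths]
+```
+
+Compared with simply calling `glob.glob` inside `load_inflammation_data`,
+nothing is gained until a second file-finding strategy actually exists -
+there is just one more place to look.
+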
+Maybe in the future this is something that needs to be customised,
+but we haven't really made it any harder to do by *not* doing this prematurely,
+and once we have the concrete feature request, it will be easier to design it appropriately.
+
+> ## A Judgement Call
+> All of this is a judgement.
+> For example, in this case, perhaps it *would* make sense to at least pull the file parsing out into a separate
+> class, but not have `CSVDataSource` be configurable.
+> That way, it is easy to see how the file tree is being walked (there's no polymorphism going on)
+> without mixing the *parsing* code in with the file-finding code.
+> There are no right answers, just guidelines.
+{: .callout}
+
+> ## Exercise: Applying to real world examples
+> Think about the examples of good and bad code you identified at the start of the episode.
+>
+> * Identify which of these principles were and weren't being followed.
+> * Identify some refactorings that could be performed to improve the code.
+> * Discuss your ideas as a group.
+{: .challenge}
+
+## Conclusion
+
+Good architecture is not about applying any rules blindly; it comes from practice and from taking care over the important things:
+
+* Avoid duplication of code or data.
+* Keep how much a person has to understand at once to a minimum.
+* Think about how the interfaces will work.
+* Separate different considerations into different sections of the code.
+* Don't try to design a future-proof solution; focus on the problem at hand.
+
+Practice makes perfect.
+One way to practise is to consider code that you already have and think about how it might be redesigned.
+Another way is to always try to leave code in a better state than you found it,
+so when you're working on a less well structured part of the code, start by refactoring it so that your change fits in cleanly.
+Doing this, over time, with your colleagues, will improve your skills in software architecture as well as improving the code.
+
+
+{% include links.md %}
diff --git a/_episodes/36-architecture-revisited.md b/_episodes/36-architecture-revisited.md
deleted file mode 100644
index 0e6ed0186..000000000
--- a/_episodes/36-architecture-revisited.md
+++ /dev/null
@@ -1,450 +0,0 @@
----
-title: "Architecture Revisited: Extending Software"
-teaching: 15
-exercises: 0
-questions:
-- "How can we extend our software within the constraints of the MVC architecture?"
-objectives:
-- "Extend our software to add a view of a single patient in the study and the software's command line interface to request a specific view."
-keypoints:
-- "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change.
-  Such components can be as small as a single function, or be a software package in their own right."
----
-
-As we have seen, there are different programming paradigms that are suitable for different problems
-and affect the structure of our code.
-In programming languages that support multiple paradigms, such as Python,
-we have the luxury of using elements of different paradigms and we,
-as software designers and programmers,
-can decide how to use those elements in different architectural components of our software.
-Let's now circle back to the architecture of our software for one final look.
-
-## MVC Revisited
-
-We've been developing our software using the **Model-View-Controller** (MVC) architecture so far,
-but, as we have seen, MVC is just one of the common architectural patterns
-and is not the only choice we could have made.
-
-There are many variants of an MVC-like pattern (such as
-[Model-View-Presenter](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93presenter) (MVP),
-[Model-View-Viewmodel](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93viewmodel) (MVVM), etc.),
-but in most cases, the distinction between these patterns isn't particularly important.
-What really matters is that we are making decisions about the architecture of our software
-that suit the way in which we expect to use it.
-We should reuse these established ideas where we can, but we don't need to stick to them exactly.
-
-In this episode we'll be taking our Object Oriented code from the previous episode
-and integrating it into our existing MVC pattern.
-But first we will explain some features of
-the Controller (`inflammation-analysis.py`) component of our architecture.
-
-### Controller Structure
-
-You will have noticed already that the structure of the `inflammation-analysis.py` file
-follows this pattern:
-
-~~~
-# import modules
-
-def main():
-    # perform some actions
-
-if __name__ == "__main__":
-    # perform some actions before main()
-    main()
-~~~
-{: .language-python}
-
-In this pattern the actions performed by the script are contained within the `main` function
-(which does not need to be called `main`,
-but using this convention helps others in understanding your code).
-The `main` function is then called within the `if` statement `__name__ == "__main__"`,
-after some other actions have been performed
-(usually the parsing of command-line arguments, which will be explained below).
-`__name__` is a special dunder variable which is set,
-along with a number of other special dunder variables,
-by the Python interpreter before the execution of any code in the source file.
-The value the interpreter gives to `__name__` is determined by
-the manner in which the source file is loaded.
-
-If we run the source file directly using the Python interpreter, e.g.:
-
-~~~
-$ python3 inflammation-analysis.py
-~~~
-{: .language-bash}
-
-then the interpreter will assign the hard-coded string `"__main__"` to the `__name__` variable:
-
-~~~
-__name__ = "__main__"
-...
-# rest of your code
-~~~
-{: .language-python}
-
-However, if your source file is imported by another Python script, e.g.:
-
-~~~
-import inflammation-analysis
-~~~
-{: .language-python}
-
-then the interpreter will assign the name `"inflammation-analysis"`
-from the import statement to the `__name__` variable:
-
-~~~
-__name__ = "inflammation-analysis"
-...
-# rest of your code
-~~~
-{: .language-python}
-
-(Strictly speaking, a hyphenated name like this is not a valid Python identifier,
-so a plain `import` statement would fail in practice - a module intended for importing
-would normally be named with underscores, e.g. `inflammation_analysis.py`,
-or be loaded with `importlib.import_module()`.
-The hyphenated name is kept here for illustration.)
-
-Because of this behaviour of the interpreter,
-we can put any code that should only be executed when running the script
-directly within the `if __name__ == "__main__":` structure,
-allowing the rest of the code within the script to be
-safely imported by another script if we so wish.
-
-While it may not seem very useful to have your controller script importable by another script,
-there are a number of situations in which you would want to do this:
-
-- for testing of your code, you can have your testing framework import the main script,
-  and run special test functions which then call the `main` function directly
-  (see the sketch below);
-- where you want to not only be able to run your script from the command-line,
-  but also provide a programmer-friendly application programming interface (API) for advanced users.
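-
-For testing, a minimal sketch might look like the following.
-This is illustrative rather than part of the project: the test and the data file path are
-made up, it assumes the `main(args)` signature introduced later in this episode,
-and `importlib` is only needed because of the hyphen in the file name.
-
-~~~
-# file: tests/test_controller.py
-import argparse
-import importlib
-
-# A plain `import` statement cannot handle the hyphenated file name,
-# but importlib can load the module as long as it is on the Python path.
-inflammation_analysis = importlib.import_module('inflammation-analysis')
-
-
-def test_main_runs_on_sample_data():
-    """Check the controller runs end-to-end on a known input file."""
-    args = argparse.Namespace(infiles=['data/inflammation-01.csv'])
-    inflammation_analysis.main(args)
-~~~
-{: .language-python}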
-
-### Passing Command-line Options to Controller
-
-The standard Python library for reading command line arguments passed to a script is
-[`argparse`](https://docs.python.org/3/library/argparse.html).
-This module reads arguments passed by the system,
-and enables the automatic generation of help and usage messages.
-These include, as we saw at the start of this course,
-the generation of helpful error messages when users give the program invalid arguments.
-
-The basic usage of `argparse` can be seen in the `inflammation-analysis.py` script.
-First we import the library:
-
-~~~
-import argparse
-~~~
-{: .language-python}
-
-We then initialise the argument parser class, passing an (optional) description of the program:
-
-~~~
-parser = argparse.ArgumentParser(
-    description='A basic patient inflammation data management system')
-~~~
-{: .language-python}
-
-Once the parser has been initialised we can add
-the arguments that we want argparse to look out for.
-In our basic case, we want only the names of the file(s) to process:
-
-~~~
-parser.add_argument(
-    'infiles',
-    nargs='+',
-    help='Input CSV(s) containing inflammation series for each patient')
-~~~
-{: .language-python}
-
-Here we have defined what the argument will be called (`'infiles'`) when it is read in;
-the number of arguments to be expected
-(`nargs='+'`, where `'+'` indicates that there should be 1 or more arguments passed);
-and a help string for the user
-(`help='Input CSV(s) containing inflammation series for each patient'`).
-
-You can add as many arguments as you wish,
-and these can be either mandatory (as the one above) or optional.
-Most of the complexity in using `argparse` is in adding the correct argument options,
-and we will explain how to do this in more detail below.
-
-Finally we parse the arguments passed to the script using:
-
-~~~
-args = parser.parse_args()
-~~~
-{: .language-python}
-
-This returns an object (that we've called `args`) containing all the arguments requested.
-These can be accessed using the names that we have defined for each argument,
-e.g. `args.infiles` would return the filenames that have been input.
-
-The help for the script can be accessed using the `-h` or `--help` optional argument
-(which `argparse` includes by default):
-
-~~~
-$ python3 inflammation-analysis.py --help
-~~~
-{: .language-bash}
-
-~~~
-usage: inflammation-analysis.py [-h] infiles [infiles ...]
-
-A basic patient inflammation data management system
-
-positional arguments:
-  infiles     Input CSV(s) containing inflammation series for each patient
-
-optional arguments:
-  -h, --help  show this help message and exit
-~~~
-{: .output}
-
-The help page starts with the command line usage,
-illustrating what inputs can be given (any within `[]` brackets are optional).
-It then lists the **positional** and **optional** arguments,
-giving as detailed a description of each as you have added to the `add_argument()` command.
-Positional arguments are arguments that need to be included
-in the proper position or order when calling the script.
-
-Note that optional arguments are indicated by `-` or `--`, followed by the argument name.
-Positional arguments are simply inferred by their position.
-It is possible to have multiple positional arguments,
-but usually this is only practical where all (or all but one) positional arguments
-contain a clearly defined number of elements.
-If more than one option can have an indeterminate number of entries,
-then it is better to create them as 'optional' arguments.
-These can be made a required input though,
-by setting `required = True` within the `add_argument()` command.
-
-> ## Positional and Optional Argument Order
->
-> The usage section of the help page above shows
-> the optional arguments going before the positional arguments.
-> This is the customary way to present options, but is not mandatory.
-> Instead there are two rules which must be followed for these arguments:
->
-> 1. Positional and optional arguments must each be given all together, and not inter-mixed.
->    For example, the order can be either `optional - positional` or `positional - optional`,
->    but not `optional - positional - optional`.
-> 2. Positional arguments must be given in the order that they are shown
->    in the usage section of the help page.
-{: .callout}
-
-Now that you have some familiarity with `argparse`,
-we will demonstrate below how you can use this to add extra functionality to your controller.
-
-### Adding a New View
-
-Let's start by adding a view that allows us to see the data for a single patient.
-First, we need to add the code for the view itself
-and make sure our `Patient` class has the necessary data -
-including the ability to pass a list of measurements to the `__init__` method.
-Note that your Patient class may look very different now,
-so adapt this example to fit what you have.
-
-~~~
-# file: inflammation/views.py
-
-...
-
-def display_patient_record(patient):
-    """Display data for a single patient."""
-    print(patient.name)
-    for obs in patient.observations:
-        print(obs.day, obs.value)
-~~~
-{: .language-python}
-
-~~~
-# file: inflammation/models.py
-
-...
-
-class Observation:
-    def __init__(self, day, value):
-        self.day = day
-        self.value = value
-
-    def __str__(self):
-        return str(self.value)
-
-class Person:
-    def __init__(self, name):
-        self.name = name
-
-    def __str__(self):
-        return self.name
-
-class Patient(Person):
-    """A patient in an inflammation study."""
-    def __init__(self, name, observations=None):
-        super().__init__(name)
-
-        self.observations = []
-        ### MODIFIED START ###
-        if observations is not None:
-            self.observations = observations
-        ### MODIFIED END ###
-
-    def add_observation(self, value, day=None):
-        if day is None:
-            try:
-                day = self.observations[-1].day + 1
-
-            except IndexError:
-                day = 0
-
-        new_observation = Observation(day, value)
-
-        self.observations.append(new_observation)
-        return new_observation
-~~~
-{: .language-python}
-
-Now we need to make sure people can call this view -
-that means connecting it to the controller
-and ensuring that there's a way to request this view when running the program.
-The changes we need to make here are that the `main` function
-needs to be able to direct us to the view we've requested -
-and we need to add to the command line interface - the controller -
-the necessary data to drive the new view.
-
-~~~
-# file: inflammation-analysis.py
-
-#!/usr/bin/env python3
-"""Software for managing patient data in our imaginary hospital."""
-
-import argparse
-
-from inflammation import models, views
-
-
-def main(args):
-    """The MVC Controller of the patient data system.
- - The Controller is responsible for: - - selecting the necessary models and views for the current task - - passing data between models and views - """ - infiles = args.infiles - if not isinstance(infiles, list): - infiles = [args.infiles] - - for filename in infiles: - inflammation_data = models.load_csv(filename) - - ### MODIFIED START ### - if args.view == 'visualize': - view_data = { - 'average': models.daily_mean(inflammation_data), - 'max': models.daily_max(inflammation_data), - 'min': models.daily_min(inflammation_data), - } - - views.visualize(view_data) - - elif args.view == 'record': - patient_data = inflammation_data[args.patient] - observations = [models.Observation(day, value) for day, value in enumerate(patient_data)] - patient = models.Patient('UNKNOWN', observations) - - views.display_patient_record(patient) - ### MODIFIED END ### - - -if __name__ == "__main__": - parser = argparse.ArgumentParser( - description='A basic patient data management system') - - parser.add_argument( - 'infiles', - nargs='+', - help='Input CSV(s) containing inflammation series for each patient') - - ### MODIFIED START ### - parser.add_argument( - '--view', - default='visualize', - choices=['visualize', 'record'], - help='Which view should be used?') - - parser.add_argument( - '--patient', - type=int, - default=0, - help='Which patient should be displayed?') - ### MODIFIED END ### - - args = parser.parse_args() - - main(args) -~~~ -{: .language-python} - -We've added two options to our command line interface here: -one to request a specific view and one for the patient ID that we want to lookup. -For the full range of features that we have access to with `argparse` see the -[Python module documentation](https://docs.python.org/3/library/argparse.html?highlight=argparse#module-argparse). -Allowing the user to request a specific view like this is -a similar model to that used by the popular Python library Click - -if you find yourself needing to build more complex interfaces than this, -Click would be a good choice. -You can find more information in [Click's documentation](https://click.palletsprojects.com/). - -For now, we also don't know the names of any of our patients, -so we've made it `'UNKNOWN'` until we get more data. - -We can now call our program with these extra arguments to see the record for a single patient: - -~~~ -$ python3 inflammation-analysis.py --view record --patient 1 data/inflammation-01.csv -~~~ -{: .language-bash} - -~~~ -UNKNOWN -0 0.0 -1 0.0 -2 1.0 -3 3.0 -4 1.0 -5 2.0 -6 4.0 -7 7.0 -... -~~~ -{: .output} - -> ## Additional Material -> -> Now that we've covered the basics of different programming paradigms -> and how we can integrate them into our multi-layer architecture, -> there are two optional extra episodes which you may find interesting. -> -> Both episodes cover the persistence layer of software architectures -> and methods of persistently storing data, but take different approaches. -> The episode on [persistence with JSON](/persistence) covers -> some more advanced concepts in Object Oriented Programming, while -> the episode on [databases](/databases) starts to build towards a true multilayer architecture, -> which would allow our software to handle much larger quantities of data. -{: .callout} - - -## Towards Collaborative Software Development - -Having looked at some theoretical aspects of software design, -we are now circling back to implementing our software design -and developing our software to satisfy the requirements collaboratively in a team. 
-At an intermediate level of software development, -there is a wealth of practices that could be used, -and applying suitable design and coding practices is what separates -an intermediate developer from someone who has just started coding. -The key for an intermediate developer is to balance these concerns -for each software project appropriately, -and employ design and development practices enough so that progress can be made. - -One practice that should always be considered, -and has been shown to be very effective in team-based software development, -is that of *code review*. -Code reviews help to ensure the 'good' coding standards are achieved -and maintained within a team by having multiple people -have a look and comment on key code changes to see how they fit within the codebase. -Such reviews check the correctness of the new code, test coverage, functionality changes, -and confirm that they follow the coding guides and best practices. -Let's have a look at some code review techniques available to us. diff --git a/_extras/databases.md b/_extras/databases.md index b4bc67a65..5fda791d9 100644 --- a/_extras/databases.md +++ b/_extras/databases.md @@ -16,7 +16,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../36-architecture-revisited/index.html#additional-material). +> [Section 3 on software design and development](../35-refactoring-architecture/index.html#additional-material). {: .callout} A **database** is an organised collection of data, diff --git a/_extras/persistence.md b/_extras/persistence.md index f071c82ff..b207e0458 100644 --- a/_extras/persistence.md +++ b/_extras/persistence.md @@ -25,7 +25,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../36-architecture-revisited/index.html#additional-material). +> [Section 3 on software design and development](../35-refactoring-architecture/index.html#additional-material). {: .callout} Our patient data system so far can read in some data, process it, and display it to people. diff --git a/fig/example-architecture-daigram.mermaid.txt b/fig/example-architecture-daigram.mermaid.txt new file mode 100644 index 000000000..c3ab99112 --- /dev/null +++ b/fig/example-architecture-daigram.mermaid.txt @@ -0,0 +1,18 @@ +graph TD + A[(GDrive Folder)] + B[(Database)] + C[GDrive Monitor] + C -- Checks periodically--> A + D[Download inflammation data] + C -- Trigger update --> D + E[Parse inflammation data] + D --> E + F[Perform analysis] + E --> F + G[Upload analysis] + F --> G + G --> B + H[Notify users] + I[Monitor database] + I -- Check periodically --> B + I --> H diff --git a/fig/example-architecture-diagram.svg b/fig/example-architecture-diagram.svg new file mode 100644 index 000000000..02a7ecceb --- /dev/null +++ b/fig/example-architecture-diagram.svg @@ -0,0 +1 @@ +
+[SVG markup not reproduced here: a rendered version of the architecture diagram described by the mermaid source above, with nodes "GDrive Folder", "Database", "GDrive Monitor", "Download inflammation data", "Parse inflammation data", "Perform analysis", "Upload analysis", "Notify users" and "Monitor database", and edge labels "Checks periodically", "Trigger update" and "Check periodically".]
\ No newline at end of file