diff --git a/_config.yml b/_config.yml index 434f152ab..6c956248b 100644 --- a/_config.yml +++ b/_config.yml @@ -95,10 +95,13 @@ extras_order: - discuss - protect-main-branch - vscode + - software-architecture-extra + - programming-paradigms + - procedural-programming - functional-programming + - object-oriented-programming - persistence - databases - - verifying-code-style-linters - quiz # Files and directories that are not to be copied. exclude: diff --git a/_episodes/15-coding-conventions.md b/_episodes/15-coding-conventions.md index 3b94d62ae..0404f1233 100644 --- a/_episodes/15-coding-conventions.md +++ b/_episodes/15-coding-conventions.md @@ -438,7 +438,7 @@ because an incorrect comment causes more confusion than no comment at all. >> which is helpfully marking inconsistencies with coding guidelines by underlying them. >> There are a few things to fix in `inflammation-analysis.py`, for example: >> ->> 1. Line 24 in `inflammation-analysis.py` is too long and not very readable. +>> 1. Line 30 in `inflammation-analysis.py` is too long and not very readable. >> A better style would be to use multiple lines and hanging indent, >> with the closing brace `}' aligned either with >> the first non-whitespace character of the last line of list @@ -487,7 +487,7 @@ because an incorrect comment causes more confusion than no comment at all. >> Note how PyCharm is warning us by underlying the whole line. >> >> 4. Only one blank line after the end of definition of function `main` ->> and the rest of the code on line 30 in `inflammation-analysis.py` - +>> and the rest of the code on line 33 in `inflammation-analysis.py` - >> should be two blank lines. >> Note how PyCharm is warning us by underlying the whole line. 
>> diff --git a/_episodes/30-section3-intro.md b/_episodes/30-section3-intro.md index 2bc022d39..5835ddc5d 100644 --- a/_episodes/30-section3-intro.md +++ b/_episodes/30-section3-intro.md @@ -2,7 +2,7 @@ title: "Section 3: Software Development as a Process" colour: "#fafac8" start: true -teaching: 5 +teaching: 10 exercises: 0 questions: - "How can we design and write 'good' software that meets its goals and requirements?" @@ -13,7 +13,11 @@ objectives: keypoints: - "Software engineering takes a wider view of software development beyond programming (or coding)." - "Ensuring requirements are sufficiently captured is critical to the success of any project." -- "Following a process makes development predictable, can save time, and helps ensure each stage of development is given sufficient consideration before proceeding to the next." +- "Following a process makes software development predictable, saves time in the long run, + and helps ensure each stage of development is given sufficient consideration + before proceeding to the next." +- "Once you get the hang of a programming language, writing code to do what you want is relatively +easy. The hard part is writing code that is easy to adapt when your requirements change." --- In this section, we will take a step back from coding development practices and tools @@ -65,7 +69,7 @@ Someone who is engineering software takes a wider view: but there is an assumption that the software - or even just a part of it - could be reused in the future. -### The Software Development Process +### Software Development Process The typical stages of a software development process can be categorised as follows: @@ -75,7 +79,7 @@ The typical stages of a software development process can be categorised as follo This helps maintain a clear direction throughout development, and sets clear targets for what the software needs to do. - **Design:** where the requirements are translated into an overall design for the software. 
- It covers what will be the basic software 'components' and how they'll fit together, + It covers what will be the basic software 'components' and how they will fit together, as well as the tools and technologies that will be used, which will together address the requirements identified in the first stage. - **Implementation:** the software is developed according to the design, @@ -99,7 +103,7 @@ these stages are followed implicitly or explicitly in every software project. What is required for a project (during requirements gathering) is always considered, for example, even if it isn't explored sufficiently or well understood. -Following a process of development offers some major benefits: +Following a **process** of development offers some major benefits: - **Stage gating:** a quality *gate* at the end of each stage, where stakeholders review the stage's outcomes to decide @@ -115,31 +119,27 @@ Following a process of development offers some major benefits: - **Transparency:** essentially, each stage generates output(s) into subsequent stages, which presents opportunities for them to be published as part of an open development process. -- **It saves time:** a well-known result from +- **Time saving:** a well-known result from [empirical software engineering studies](https://web.archive.org/web/20160731150816/http://superwebdeveloper.com/2009/11/25/the-incredible-rate-of-diminishing-returns-of-fixing-software-bugs/) - is that it becomes exponentially more expensive to fix mistakes in future stages. - For example, if a mistake takes 1 hour to fix in requirements, + is that fixing software mistakes is exponentially more expensive in later software development + stages. + For example, if a mistake takes 1 hour to fix in the requirements stage, it may take 5 times that during design, and perhaps as much as 20 times that to fix if discovered during testing. 
In this section we will place the actual writing of software (implementation) -within the context of the typical software development process: +within the context of a typical software development process: - Explore the **importance of software requirements**, - the different classes of requirements, + different classes of requirements, and how we can interpret and capture them. - How requirements inform and drive the **design of software**, the importance, role, and examples of **software architecture**, and the ways we can describe a software design. -- **Implementation choices** in terms of **programming paradigms**, - looking at **procedural**, **functional**, and **object oriented** paradigms of development. - Modern software will often contain instances of multiple paradigms, - so it is worthwhile being familiar with them and knowing when - to switch in order to make better code. -- How you can (and should) assess and update a software's architecture when - requirements change and complexity increases - - is the architecture still fit for purpose, - or are modifications and extensions becoming increasingly difficult to make? +- How to **improve** existing code to be more **readable**, **testable** and **maintainable**. +- Consider different strategies for writing well designed code, including + using **pure functions**, **classes** and **abstractions**. +- How to create, assess and improve **software design**. {% include links.md %} diff --git a/_episodes/31-software-requirements.md b/_episodes/31-software-requirements.md index 78cca1e8f..9faf0ed08 100644 --- a/_episodes/31-software-requirements.md +++ b/_episodes/31-software-requirements.md @@ -1,7 +1,7 @@ --- title: "Software Requirements" -teaching: 15 -exercises: 30 +teaching: 25 +exercises: 15 questions: - "Where do we start when beginning a new software project?" - "How can we capture and organise what is required for software to function as intended?" 
@@ -22,7 +22,7 @@ The requirements of our software are the basis on which the whole project rests if we get the requirements wrong, we'll build the wrong software. However, it's unlikely that we'll be able to determine all of the requirements upfront. Especially when working in a research context, -requirements are flexible and may change as we develop our software. +requirements are flexible and may change as we develop our software. ## Types of Requirements @@ -223,17 +223,16 @@ and these aspects should be considered as part of the software's non-functional > ## Optional Exercise: Requirements for Your Software Project > -> Think back to a piece of code or software (either small or large) you've written, +> Think back to a piece of code or software (either small or large) you have written, > or which you have experience using. > First, try to formulate a few of its key business requirements, -> then derive these into user and then solution requirements -> (in a similar fashion to the ones above in *Types of Requirements*). +> then derive these into user and then solution requirements. {: .challenge} ### Long- or Short-Lived Code? -Along with requirements, here's something to consider early on. +Along with requirements, here is something to consider early on. You, perhaps with others, may be developing open-source software with the intent that it will live on after your project completes. It could be important to you that your software is adopted and used by other projects @@ -249,10 +248,10 @@ so be sure to consider these aspects. On the other hand, you might want to knock together some code to prove a concept or to perform a quick calculation and then just discard it. -But can you be sure you'll never want to use it again? -Maybe a few months from now you'll realise you need it after all, +But can you be sure you will never want to use it again? 
+Maybe a few months from now you will realise you need it after all, or you'll have a colleague say "I wish I had a..." -and realise you've already made one. +and realise you have already made one. A little effort now could save you a lot in the future. ## From Requirements to Implementation, via Design @@ -269,12 +268,12 @@ At each level, not only are the perspectives different, but so are the nature of the objectives and the language used to describe them, since they each reflect the perspective and language of their stakeholder group. -It's often tempting to go right ahead and implement requirements within existing software, +It is often tempting to go right ahead and implement requirements within existing software, but this neglects a crucial step: do these new requirements fit within our existing design, or does our design need to be revisited? It may not need any changes at all, -but if it doesn't fit logically our design will need a bigger rethink +but if it does not fit logically our design will need a bigger rethink so the new requirement can be implemented in a sensible way. We'll look at this a bit later in this section, but simply adding new code without considering diff --git a/_episodes/32-software-architecture-design.md b/_episodes/32-software-architecture-design.md new file mode 100644 index 000000000..37d475414 --- /dev/null +++ b/_episodes/32-software-architecture-design.md @@ -0,0 +1,370 @@ +--- +title: "Software Architecture and Design" +teaching: 25 +exercises: 25 +questions: +- "Why should we invest time in software design?" +- "What should we consider when designing software?" +- "What is software architecture?" +objectives: +- "List the common aspects of software architecture and design." +- "Describe the term technical debt and how it impacts software." +- "Understand the goals and principles of designing 'good' software." +- "Use a diagramming technique to describe a software architecture." 
+- "List the components of the Model-View-Controller (MVC) architecture." +- "Understand the use of common design patterns to improve the extensibility, reusability and +overall quality of software." +- "List some best practices when designing software." +keypoints: +- "'Good' code is designed to be maintainable: readable by people who did not author the code, +testable through a set of automated tests, adaptable to new requirements." +- "Use abstraction and decoupling to logically separate the different aspects of your software within design as well as implementation." +- "Use refactoring to improve the consistency of existing code, both internally and within its overall architecture." +- "Include software design as a key stage in the lifecycle of your project so that development and maintenance become easier." +--- + +## Introduction + +Ideally, we should have at least a rough design of our software sketched out +before we write a single line of code. +This design should be based around the requirements and the structure of the problem we are trying +to solve: what are the concepts we need to represent in our code +and what are the relationships between them. +And importantly, who will be using our software and how will they interact with it. + +As a piece of software grows, +it will reach a point where there is too much code for us to keep in mind at once. +At this point, it becomes particularly important to think of the overall design and +structure of our software: how all the pieces of functionality should fit together, +and how we should work towards fulfilling this overall design throughout development. +Even if you did not think about the design of your software from the very beginning - +it is not too late to start now. + +It is not easy to come up with a complete definition for the term **software design**, +but some of the common aspects are: + +- **Software architecture** - + what components will the software have and how will they cooperate?
+- **System architecture** - + what other things will this software have to interact with and how will it do this? +- **UI/UX** (User Interface / User Experience) - + how will users interact with the software? +- **Algorithm design** - + what method are we going to use to solve the core research/business problem? + +There is literature on each of the above software design aspects - we will not go into details of +them all here. +Instead, we will learn some techniques to structure our code better to satisfy some of the +requirements of 'good' software and revisit +our software's [MVC architecture](/11-software-project/index.html#software-architecture) +in the context of software design. + +## Poor Design Choices & Technical Debt + +When faced with a problem that you need to solve by writing code, it may be tempting to +skip the design phase and dive straight into coding. +What happens if you do not follow good software design and development best practices? +It can lead to accumulated 'technical debt', +which, according to [Wikipedia](https://en.wikipedia.org/wiki/Technical_debt), +is the "cost of additional rework caused by choosing an easy (limited) solution now +instead of using a better approach that would take longer". +The pressure to achieve project goals can sometimes lead to quick and easy solutions, +which make the software +messier, more complex, and more difficult to understand and maintain. +The extra effort required to make changes in the future is the interest paid on the (technical) debt. +It is natural for software to accrue some technical debt, +but it is important to pay off that debt during a maintenance phase - +simplifying, clarifying the code, making it easier to understand - +to keep these interest payments on making changes manageable. + +There is only so much time available in a project. +How much effort should we spend on designing our code properly +and using good development practices?
+The following [XKCD comic](https://xkcd.com/844/) summarises this tension: + +![Writing good code comic](../fig/xkcd-good-code-comic.png){: .image-with-shadow width="400px" } + +At an intermediate level there is a wealth of practices that *could* be used, +and applying suitable design and coding practices is what separates +an *intermediate developer* from someone who has just started coding. +The key for an intermediate developer is to balance these concerns +for each software project appropriately, +and employ design and development practices *enough* so that progress can be made. +It is very easy to under-design software, +but remember it is also possible to over-design it. + +## Good Software Design Goals + +Aspirationally, what makes good code can be summarised in the following quote from the +[Intent HQ blog](https://intenthq.com/blog/it-audience/what-is-good-code-a-scientific-definition/): + +> *“Good code is written so that is readable, understandable, +> covered by automated tests, not over complicated +> and does well what is intended to do.”* + +Software has become a crucial aspect of reproducible research, as well as an asset that +can be reused or repurposed. +Thus, it is even more important to take time to design the software to be easily *modifiable* and +*extensible*, to save ourselves and our team a lot of time later on when we have +to fix a problem or the software's requirements change. + +Satisfying the above properties will lead to an overall software design +goal of having *maintainable* code, which is: + +* **Understandable** by developers who did not develop the code, +by having a clear and well-considered high-level design (or *architecture*) that separates out the different components and aspects of its function logically +and in a modular way, and having the interactions between these different parts clear, simple, and sufficiently high-level that they do not contravene this design.
This is known as *separation of concerns*, and is a key principle in good software design. + * Moving this forward into implementation, *understandable* would mean being consistent in coding style, using sensible naming conventions for functions, classes and variables, documenting and commenting code, having a simple control flow, and having small functions and methods focused on single tasks. +* **Adaptable** by designing the code to be easily modifiable and extensible to satisfy new requirements, +by incorporating points in the modular design where new behaviour can be added in a clear and straightforward manner +(e.g. as individual functions in existing modules, or perhaps at a higher level as plugins). + * In an implementation sense, this means writing loosely coupled/decoupled code where each part of the code has a separate concern, and has the lowest possible dependency on other parts of the code. + This makes it easier to test, update or replace. +* **Testable** by designing the code in a sufficiently modular way to make it easier to test the functionality within a modular design, +either as a whole or in terms of its individual functions. + * This would carry forward in an implementation sense in two ways. Firstly, having functions sufficiently small to be amenable to individual (ideally automated) test cases, e.g. by writing unit and regression tests to verify the code produces + the expected outputs from controlled inputs and exhibits the expected behaviour over time + as the code changes. + Secondly, at a higher level in implementation, this would allow functional tests to be written to verify entire pathways through the code, from initial software input to eventual output. + +Now that we know what goals we should aspire to, let us take a critical look at the code in our +software project and try to identify ways in which it can be improved.
+ +Our software project contains a branch `full-data-analysis` with code for a new feature of our +inflammation analysis software. Recall that you can see all your branches as follows: + +~~~ +$ git branch --all +~~~ +{: .language-bash} + +Let's check out a new local branch from the `full-data-analysis` branch, making sure we +have saved and committed all current changes before doing so. + +~~~ +git checkout -b full-data-analysis +~~~ +{: .language-bash} + +This new feature enables the user to pass a new command-line parameter `--full-data-analysis`, causing +the software to find the directory containing the first input data file (provided via command line +parameter `infiles`) and invoke the data analysis over all the data files in that directory. +This bit of functionality is handled by `inflammation-analysis.py` in the project root. + +The new data analysis code is located in the `compute_data.py` file within the `inflammation` directory +in a function called `analyse_data()`. +This function loads all the data files for a given directory path, then +calculates and compares standard deviation across all the data by day and finally plots a graph. + +> ## Exercise: Identify How Code Can be Improved +> +> Critically examine the code in the `analyse_data()` function in the `compute_data.py` file. +> +> In what ways does this code not live up to the ideal properties of 'good' code? +> Think about ways in which you find it hard to understand. +> Think about the kinds of changes you might want to make to it, and what would +> make those changes challenging. +>> ## Solution +>> You may have found others, but here are some of the things that make the code +>> hard to read, test and maintain. +>> +>> * **Hard to read:** everything is implemented in a single function. +>> In order to understand it, you need to understand how file loading works at the same time as +>> the analysis itself.
+>> * **Hard to modify:** if you wanted to use the data for some other purpose and not just +>> plotting the graph you would have to change the `analyse_data()` function. +>> * **Hard to modify or test:** it is always analysing a fixed set of CSV data files +>> stored on disk. +>> * **Hard to modify:** it does not have any tests so we cannot be 100% confident the code does +>> what it claims to do; any changes to the code may break something and it would be harder and +>> more time-consuming to figure out what. +>> +>> Make sure to keep the list you have created in the exercise above. +>> For the remainder of this section, we will work on improving this code. +>> At the end, we will revisit your list to check that you have learnt ways to address each of the +>> problems you had found. +>> +>> There may be other things to improve with the code on this branch, e.g. how command line +>> parameters are being handled in `inflammation-analysis.py`, but we are focussing on +>> the `analyse_data()` function for the time being. +> {: .solution} +{: .challenge} + +## Software Architecture + +A software architecture is the fundamental structure of a software system +that is typically decided at the beginning of project development +based on its requirements and is not that easy to change once implemented. +It refers to a "bigger picture" of a software system +that describes high-level components (modules) of the system, what their functionality/roles are +and how they interact. + +The basic idea is you draw boxes that will represent different units of code, as well as +other components of the system (such as users, databases, etc.). +Then connect these boxes with lines where information or control will be exchanged. +These lines represent the interfaces in your system. + +As well as helping to visualise the work, doing this sketch can help reveal potential issues, +for example a circular dependency between two sections of the design.
+It can also help with estimating how long the work will take, as it forces you to consider all +the components that need to be made. + +Diagrams are not foolproof, but are a great starting point to break down the different +responsibilities and think about the kinds of information different parts of the system will need. + +> ## Exercise: Design a High-Level Architecture for a New Requirement +> +> Sketch out an architectural design for a new feature requested by a user. +> +> *"I want there to be a Google Drive folder such that when I upload new inflammation data to it, +> the software automatically pulls it down and updates the analysis. +> The new result should be added to a database with a timestamp. +> An email should then be sent to a group mailing list notifying them of the change."* +> +> You can draw by hand on a piece of paper or whiteboard, or use an online drawing tool +> such as [Excalidraw](https://excalidraw.com/). +> +>> ## Solution +>> +>> ![Diagram showing proposed architecture of the problem](../fig/example-architecture-diagram.svg){: width="600px" } +> {: .solution} +{: .challenge} + +We have been developing our software using the **Model-View-Controller** (MVC) architecture, +but MVC is just one of the common [software architectural patterns](/software-architecture-extra/index.html) +and is not the only choice we could have made. + +### Model-View-Controller (MVC) Architecture + +MVC architecture divides the related program logic into three interconnected components or modules: + +- **Model** (data) +- **View** (client interface), and +- **Controller** (processes that handle input/output and manipulate the data). + +The *Model* represents the data used by a program and also contains operations/rules +for manipulating and changing the data in the model. +This may be a database, a file, a single data object or a series of objects - +for example a table representing patients' data. 
+ +The *View* is the means of displaying data to users/clients within an application +(i.e. provides visualisation of the state of the model). +For example, a window with input fields and buttons (Graphical User Interface, GUI) +or textual options within a command line (Command Line Interface, CLI) are both Views. +They include anything that the user can see from the application. +While building GUIs is not the topic of this course, +we do cover building CLIs (handling command line arguments) in Python to a certain extent. + +The *Controller* manipulates both the Model and the View. +It accepts input from the View +and performs the corresponding action on the Model (changing the state of the model) +and then updates the View accordingly. +For example, on user request, +the Controller updates a picture on a user's GitHub profile +and then modifies the View by displaying the updated profile back to the user. + +### Limitations to Architectural Design + +Note, however, there are limits to everything - and MVC architecture is no exception. +The Controller often spills over into the Model and View, +and a clear separation is sometimes difficult to maintain. +For example, the Command Line Interface provides both the View +(what the user sees and how they interact with the command line) +and the Controller (invoking a command) aspects of a CLI application. +In Web applications, the Controller often manipulates the data (received from the Model) +before displaying it to the user or passing it from the user to the Model. + +There are many variants of an MVC-like pattern +(such as [Model-View-Presenter](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93presenter) (MVP), +[Model-View-Viewmodel](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93viewmodel) (MVVM), etc.), +where the Controller role is handled slightly differently, +but in most cases, the distinction between these patterns is not particularly important.
+What really matters is that we are making conscious decisions about the architecture of our software +that suit the way in which we expect to use it. +We should reuse and be consistent with these established ideas where we can, +but we do not need to stick to them exactly. + +The key thing to take away is the distinction between the Model and the View code, while +the View and the Controller can be more or less coupled together (e.g. the code that specifies +there is a button on the screen might be the same code that specifies what that button does). +The View may be hard to test, or use special libraries to draw the UI, but should not contain any +complex logic, and is really just a presentation layer on top of the Model. +The Model, conversely, should not care how the data is displayed. +For example, the View may present a date as "Monday 24th July 2023", +but the Model stores it using a `Date` object rather than its string representation. + +> ## Reusable "Patterns" of Architecture +> +> [Architectural](https://www.redhat.com/architect/14-software-architecture-patterns) and +> [programming patterns](https://refactoring.guru/design-patterns/catalog) are reusable templates for +> software systems and code that provide solutions for some common software design challenges. +> MVC is one architectural pattern. +> Patterns are a useful starting point for how to design your software and also provide +> a common vocabulary for discussing software designs with other developers. +> They may not always provide a full design solution as some problems may require +> a bespoke design that maps cleanly on to the specific problem you are trying to solve. +{: .callout} + +### Architectural Design Guidelines + +Creating good software architecture is not about applying any rules or patterns blindly, +but instead about practice and taking care to: + +* Discuss design with your colleagues before writing the code. +* Separate different concerns into different sections of the code.
+* Avoid duplication of code or data. +* Keep how much a person has to understand at once to a minimum. +* Try not to have too many abstractions (if you have to jump around a lot when reading the +code, that is a clue that your code may be too abstract). +* Think about how your components will interface with other components and external systems. +* Avoid trying to design a future-proof solution or to anticipate future requirements or adaptations +of the software - design the simplest solution that solves the problem at hand. +* When working on a less well-structured part of the code, start by refactoring it so that your +change fits in cleanly. +* Try to leave the code in a better state than you found it. + +## Techniques for Good Software Design + +Once we have a good high-level architectural design, +it is important to follow this philosophy through to the process of developing the code itself, +and there are some key techniques to keep in mind that will help. + +As we have discussed, +how code is structured is important for helping people who are developing and maintaining it +to understand and update it. +By breaking down our software into modular components with a single responsibility, +we avoid having to rewrite it all when requirements change. +This also means that these smaller components can be understood individually without having to understand +the entire codebase at once. +The following techniques build on this concept of modularity: + +- *Abstraction* is the process of hiding the implementation details of a piece of +code (typically behind an interface) - i.e. the details of *how* something works are hidden away, +leaving code developers to deal only with *what* it does. +This allows developers to work with the code at a higher level +of abstraction, without needing to understand fully (or keep in mind) all the underlying +details at any given time, thereby reducing the cognitive load when programming.
+ Abstraction can be achieved through techniques such as *encapsulation*, *inheritance*, and +*polymorphism*, which we will explore in the next episodes. There are other [abstraction techniques](https://en.wikipedia.org/wiki/Abstraction_(computer_science)) +available too. + +- *Code decoupling* is a code design technique that involves breaking a (complex) +software system into smaller, more manageable parts, and reducing the interdependence +between these different parts of the system. +This means that a change in one part of the code usually does not require a change in the others, +thereby making its development more efficient and less error-prone. + +- *Code refactoring* is the process of improving the design of existing code - +changing the internal structure of code without changing its +external behaviour, with the goal of making the code more readable, maintainable, efficient or easier +to test. +This can include things such as renaming variables, reorganising +functions to avoid code duplication and increase reuse, and simplifying conditional statements. + +Writing good code is hard and takes practice. +You may also be faced with an existing piece of code that breaks some (or all) of the +good code principles, and your job will be to improve/refactor it so that it can evolve further. +We will now look into some examples of these techniques that can help us redesign our code +and incrementally improve its quality. + +{% include links.md %} diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md deleted file mode 100644 index 18dbe2ae7..000000000 --- a/_episodes/32-software-design.md +++ /dev/null @@ -1,264 +0,0 @@ ---- -title: "Software Architecture and Design" -teaching: 15 -exercises: 30 -questions: -- "What should we consider when designing software?" -- "How can we make sure the components of our software are reusable?"
-objectives: -- "Understand the use of common design patterns to improve the extensibility, reusability and overall quality of software." -- "Understand the components of multi-layer software architectures." -keypoints: -- "Planning software projects in advance can save a lot of effort and reduce 'technical debt' later - even a partial plan is better than no plan at all." -- "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. -Such components can be as small as a single function, or be a software package in their own right." -- "When writing software used for research, requirements will almost *always* change." -- "*'Good code is written so that is readable, understandable, covered by automated tests, not over complicated and does well what is intended to do.'*" ---- - -## Introduction - -In this episode, we'll be looking at how we can design our software -to ensure it meets the requirements, -but also retains the other qualities of good software. -As a piece of software grows, -it will reach a point where there's too much code for us to keep in mind at once. -At this point, it becomes particularly important that the software be designed sensibly. -What should be the overall structure of our software, -how should all the pieces of functionality fit together, -and how should we work towards fulfilling this overall design throughout development? - -It's not easy to come up with a complete definition for the term **software design**, -but some of the common aspects are: - -- **Algorithm design** - - what method are we going to use to solve the core business problem? -- **Software architecture** - - what components will the software have and how will they cooperate? -- **System architecture** - - what other things will this software have to interact with and how will it do this? -- **UI/UX** (User Interface / User Experience) - - how will users interact with the software? 
- -As usual, the sooner you adopt a practice in the lifecycle of your project, the easier it will be. -So we should think about the design of our software from the very beginning, -ideally even before we start writing code - -but if you didn't, it's never too late to start. - - -The answers to these questions will provide us with some **design constraints** -which any software we write must satisfy. -For example, a design constraint when writing a mobile app would be -that it needs to work with a touch screen interface - -we might have some software that works really well from the command line, -but on a typical mobile phone there isn't a command line interface that people can access. - - -## Software Architecture - -At the beginning of this episode we defined **software architecture** -as an answer to the question -"what components will the software have and how will they cooperate?". -Software engineering borrowed this term, and a few other terms, -from architects (of buildings) as many of the processes and techniques have some similarities. -One of the other important terms we borrowed is 'pattern', -such as in **design patterns** and **architecture patterns**. -This term is often attributed to the book -['A Pattern Language' by Christopher Alexander *et al.*](https://en.wikipedia.org/wiki/A_Pattern_Language) -published in 1977 -and refers to a template solution to a problem commonly encountered when building a system. - -Design patterns are relatively small-scale templates -which we can use to solve problems which affect a small part of our software. -For example, the **[adapter pattern](https://en.wikipedia.org/wiki/Adapter_pattern)** -(which allows a class that does not have the "right interface" to be reused) -may be useful if part of our software needs to consume data -from a number of different external data sources. 
-Using this pattern, -we can create a component whose responsibility is -transforming the calls for data to the expected format, -so the rest of our program doesn't have to worry about it. - -Architecture patterns are similar, -but larger scale templates which operate at the level of whole programs, -or collections or programs. -Model-View-Controller (which we chose for our project) is one of the best known architecture patterns. -Many patterns rely on concepts from Object Oriented Programming, -so we'll come back to the MVC pattern shortly -after we learn a bit more about Object Oriented Programming. - -There are many online sources of information about design and architecture patterns, -often giving concrete examples of cases where they may be useful. -One particularly good source is [Refactoring Guru](https://refactoring.guru/design-patterns). - - -### Multilayer Architecture - -One common architectural pattern for larger software projects is **Multilayer Architecture**. -Software designed using this architecture pattern is split into layers, -each of which is responsible for a different part of the process of manipulating data. 
- -Often, the software is split into three layers: - -- **Presentation Layer** - - This layer is responsible for managing the interaction between - our software and the people using it - - May include the **View** components if also using the MVC pattern -- **Application Layer / Business Logic Layer** - - This layer performs most of the data processing required by the presentation layer - - Likely to include the **Controller** components if also using an MVC pattern - - May also include the **Model** components -- **Persistence Layer / Data Access Layer** - - This layer handles data storage and provides data to the rest of the system - - May include the **Model** components of an MVC pattern - if they're not in the application layer - -Although we've drawn similarities here between the layers of a system and the components of MVC, -they're actually solutions to different scales of problem. -In a small application, a multilayer architecture is unlikely to be necessary, -whereas in a very large application, -the MVC pattern may be used just within the presentation layer, -to handle getting data to and from the people using the software. - -## Addressing New Requirements - -So, let's assume we now want to extend our application - -designed around an MVC architecture - with some new functionalities -(more statistical processing and a new view to see a patient's data). -Let's recall the solution requirements we discussed in the previous episode: - -- *Functional Requirements*: - - SR1.1.1 (from UR1.1): - add standard deviation to data model and include in graph visualisation view - - SR1.2.1 (from UR1.2): - add a new view to generate a textual representation of statistics, - which is invoked by an optional command line argument -- *Non-functional Requirements*: - - SR2.1.1 (from UR2.1): - generate graphical statistics report on clinical workstation configuration in under 30 seconds - -### How Should We Test These Requirements? 
- -Sometimes when we make changes to our code that we plan to test later, -we find the way we've implemented that change doesn't lend itself well to how it should be tested. -So what should we do? - -Consider requirement SR1.2.1 - -we have (at least) two things we should test in some way, -for which we could write unit tests. -For the textual representation of statistics, -in a unit test we could invoke our new view function directly -with known inflammation data and test the text output as a string against what is expected. -The second one, invoking this new view with an optional command line argument, -is more problematic since the code isn't structured in a way where -we can easily invoke the argument parsing portion to test it. -To make this more amenable to unit testing we could -move the command line parsing portion to a separate function, -and use that in our unit tests. -So in general, it's a good idea to make sure -your software's features are modularised and accessible via logical functions. - -We could also consider writing unit tests for SR2.1.1, -ensuring that the system meets our performance requirement, so should we? -We do need to verify it's being met with the modified implementation, -however it's generally considered bad practice to use unit tests for this purpose. -This is because unit tests test *if* a given aspect is behaving correctly, -whereas performance tests test *how efficiently* it does it. -Performance testing produces measurements of performance which require a different kind of analysis -(using techniques such as [*code profiling*](https://towardsdatascience.com/how-to-assess-your-code-performance-in-python-346a17880c9f)), -and require careful and specific configurations of operating environments to ensure fair testing. 
-In addition, unit testing frameworks are not typically designed for conducting such measurements, -and only test units of a system, -which doesn't give you an idea of performance of the system -as it is typically used by stakeholders. - -The key is to think about which kind of testing should be used -to check if the code satisfies a requirement, -but also what you can do to make that code amenable to that type of testing. - -> ## Exercise: Implementing Requirements -> Pick one of the requirements SR1.1.1 or SR1.2.1 above to implement -> and create an appropriate feature branch - -> e.g. `add-std-dev` or `add-view` from your most up-to-date `develop` branch. -> -> One aspect you should consider first is -> whether the new requirement can be implemented within the existing design. -> If not, how does the design need to be changed to accommodate the inclusion of this new feature? -> Also try to ensure that the changes you make are amenable to unit testing: -> is the code suitably modularised -> such that the aspect under test can be easily invoked -> with test input data and its output tested? -> -> If you have time, feel free to implement the other requirement, or invent your own! -> -> Also make sure you push changes to your new feature branch remotely -> to your software repository on GitHub. -> -> **Note: do not add the tests for the new feature just yet - -> even though you would normally add the tests along with the new code, -> we will do this in a later episode. -> Equally, do not merge your changes to the `develop` branch just yet.** -> -> **Note 2: we have intentionally left this exercise without a solution -> to give you more freedom in implementing it how you see fit. -> If you are struggling with adding a new view and command line parameter, -> you may find the standard deviation requirement easier. 
-> A later episode in this section will look at -> how to handle command line parameters in a scalable way.** -{: .challenge} - -## Best Practices for 'Good' Software Design - -Aspirationally, what makes good code can be summarised in the following quote from the -[Intent HG blog](https://intenthq.com/blog/it-audience/what-is-good-code-a-scientific-definition/): - -> *“Good code is written so that is readable, understandable, -> covered by automated tests, not over complicated -> and does well what is intended to do.”* - -By taking time to design our software to be easily modifiable and extensible, -we can save ourselves a lot of time later when requirements change. -The sooner we do this the better - -ideally we should have at least a rough design sketched out for our software -before we write a single line of code. -This design should be based around the structure of the problem we're trying to solve: -what are the concepts we need to represent -and what are the relationships between them. -And importantly, who will be using our software and how will they interact with it? - -Here's another way of looking at it. - -Not following good software design and development practices -can lead to accumulated 'technical debt', -which (according to [Wikipedia](https://en.wikipedia.org/wiki/Technical_debt)), -is the "cost of additional rework caused by choosing an easy (limited) solution now -instead of using a better approach that would take longer". -So, the pressure to achieve project goals can sometimes lead to quick and easy solutions, -which make the software become -more messy, more complex, and more difficult to understand and maintain. -The extra effort required to make changes in the future is the interest paid on the (technical) debt. 
-It's natural for software to accrue some technical debt, -but it's important to pay off that debt during a maintenance phase - -simplifying, clarifying the code, making it easier to understand - -to keep these interest payments on making changes manageable. -If this isn't done, the software may accrue too much technical debt, -and it can become too messy and prohibitive to maintain and develop, -and then it cannot evolve. - -Importantly, there is only so much time available. -How much effort should we spend on designing our code properly -and using good development practices? -The following [XKCD comic](https://xkcd.com/844/) summarises this tension: - -![Writing good code comic](../fig/xkcd-good-code-comic.png){: .image-with-shadow width="400px" } - -At an intermediate level there are a wealth of practices that *could* be used, -and applying suitable design and coding practices is what separates -an *intermediate developer* from someone who has just started coding. -The key for an intermediate developer is to balance these concerns -for each software project appropriately, -and employ design and development practices *enough* so that progress can be made. -It's very easy to under-design software, -but remember it's also possible to over-design software too. - -{% include links.md %} diff --git a/_episodes/33-code-decoupling-abstractions.md b/_episodes/33-code-decoupling-abstractions.md new file mode 100644 index 000000000..b3b89171d --- /dev/null +++ b/_episodes/33-code-decoupling-abstractions.md @@ -0,0 +1,453 @@ +--- +title: "Code Decoupling & Abstractions" +teaching: 30 +exercises: 45 +questions: +- "What is decoupled code?" +- "What are commonly used code abstractions?" +- "When is it useful to use classes to structure code?" +- "How can we make sure the components of our software are reusable?" +objectives: +- "Understand the benefits of code decoupling." +- "Introduce appropriate abstractions to simplify code." 
+- "Understand the principles of encapsulation, polymorphism and interfaces." +- "Use mocks to replace a class in test code." +keypoints: +- "Code decoupling is separating code into smaller components and reducing the interdependence +between them so that the code is easier to understand, test and maintain." +- "Abstractions can hide certain details of the code behind classes and interfaces." +- "Encapsulation bundles data into a structured component along with methods that operate +on the data, and provides a mechanism for restricting access to that data, +hiding the internal representation of the component." +- "Polymorphism describes the provision of a single interface to entities of different types, +or the use of a single symbol to represent different types." + +--- + +## Introduction + +**Code decoupling** refers to breaking up the software into smaller components and reducing the +interdependence between these components so that they can be tested and maintained independently. +Two components of code can be considered *decoupled* if a change in one does not +necessitate a change in the other. +While two connected units cannot always be totally decoupled, *loose coupling* +is something we should aim for. + +**Code abstraction** is the process of hiding the implementation details of a piece of +code behind an interface - i.e. the details of *how* something works are hidden away, +leaving us to deal only with *what* it does. +This allows developers to work with the code at a higher level +of abstraction, without needing to understand fully (or keep in mind) all the underlying +details and thereby reducing the cognitive load when programming. + +Abstractions can aid decoupling of code. +If one part of the code only uses another part through an appropriate abstraction +then it becomes easier for these parts to change independently. 
+Benefits of using these techniques include having a codebase that is: + +* easier to read, as you only need to understand the + details of the (smaller) component you are looking at and not the whole monolithic codebase. +* easier to test, as one of the components can be replaced + by a test or a mock version of it. +* easier to maintain, as changes can be isolated + from other parts of the code. + +Let's start redesigning our code by introducing some of the abstraction techniques +to incrementally decouple it into smaller components to improve its overall design. + +In the code from our current branch `full-data-analysis`, +you may have noticed that loading data from CSV files from a `data` directory is "hardcoded" into +the `analyse_data()` function. +Data loading is a functionality separate from data analysis, so firstly +let's decouple the data loading part into a separate component (function). + +> ## Exercise: Decouple Data Loading from Data Analysis +> Separate out the data loading functionality from `analyse_data()` into a new function +> `load_inflammation_data()` that returns a list of 2D NumPy arrays with inflammation data +> loaded from all inflammation CSV files found in a specified directory path.
+>> ## Solution +>> The new function `load_inflammation_data()` that reads all the inflammation data into the +>> format needed for the analysis could look something like: +>> ```python +>> def load_inflammation_data(dir_path): +>> data_file_paths = glob.glob(os.path.join(dir_path, 'inflammation*.csv')) +>> if len(data_file_paths) == 0: +>> raise ValueError(f"No inflammation CSV files found in path {dir_path}") +>> data = map(models.load_csv, data_file_paths) # Load inflammation data from each CSV file +>> return list(data) # Return the list of 2D NumPy arrays with inflammation data +>> ``` +>> This function can now be used in the analysis as follows: +>> ```python +>> def analyse_data(data_dir): +>> data = load_inflammation_data(data_dir) +>> daily_standard_deviation = compute_standard_deviation_by_data(data) +>> ... +>> ``` +>> The code is now easier to follow since we do not need to understand the data loading part +>> to understand the statistical analysis part, and vice versa. +> {: .solution} +{: .challenge} + +However, even with this change, the data loading is still coupled with the data analysis to a +large extent. +For example, if we have to support loading data from different sources +(e.g. JSON files or an SQL database), we would have to pass some kind of a flag into `analyse_data()` +indicating the type of data source we want to read from. Instead, we would like to decouple the +consideration of data source from the `analyse_data()` function entirely. +One way we can do this is by using *encapsulation* and *classes*. + +## Encapsulation & Classes + +**Encapsulation** is the process of packing the "data" and "functions operating on that data" into a +single component/object. +It also provides a mechanism for restricting the access to that data. +Encapsulation means that the internal representation of a component is generally hidden +from view outside of the component's definition.
+ +Encapsulation allows developers to present a consistent interface to the component/object +that is independent of its internal implementation. +For example, encapsulation can be used to hide the values or +state of a structured data object inside a **class**, preventing direct access to them +that could violate the object's state maintained by the class' methods. +Note that object-oriented programming (OOP) languages support encapsulation, +but encapsulation is not unique to OOP. + +So, a class is a way of grouping together data with some methods that manipulate that data. +In Python, you can *declare* a class as follows: + +```python +class Circle: + pass +``` + +Classes are typically named using the "CapitalisedWords" naming convention - e.g. FileReader, +OutputStream, Rectangle. + +You can *construct* an *instance* of a class elsewhere in the code by doing the following: + +```python +my_circle = Circle() +``` + +When you construct an instance of a class in this way, the class' *constructor* method is called. +It is also possible to pass values to the constructor in order to configure the class instance: + +```python +class Circle: + def __init__(self, radius): + self.radius = radius + +my_circle = Circle(10) +``` + +The constructor has the special name `__init__`. +Note it has a special first parameter called `self` by convention - it is +used to access the current *instance* of the object being created. + +A class can be thought of as a cookie cutter template, and instances as the cookies themselves. +That is, one class can have many instances. + +Classes can also have other methods defined on them. +Like constructors, they have the special parameter `self` that must come first. + +```python +import math + +class Circle: + ... + def get_area(self): + return math.pi * self.radius * self.radius +...
+print(my_circle.get_area()) +``` + +On the last line of the code above, the instance of the class, `my_circle`, will be automatically +passed as the first parameter (`self`) when calling the `get_area()` method. +The `get_area()` method can then access the variable `radius` encapsulated within the object, so +code outside the object does not need to know how the radius is stored. +Note that Python does not enforce this hiding: attributes are public by default, and a leading +underscore (e.g. `_radius`) is the usual convention for marking one as internal. +The method `get_area()` itself can also be accessed via the object/instance only. + +As we can see, the internal representation of any instance of class `Circle` is hidden +outside of this class (encapsulation). +In addition, the implementation of the method `get_area()` is hidden too (abstraction). + +> ## Encapsulation & Abstraction +> Encapsulation provides **information hiding**. Abstraction provides **implementation hiding**. +{: .callout} + +> ## Exercise: Use Classes to Abstract out Data Loading +> Inside `compute_data.py`, declare a new class `CSVDataSource` that contains the +> `load_inflammation_data()` function we wrote in the previous exercise as a method of this class. +> The directory path to load the files from should be passed to the class' constructor method. +> Finally, construct an instance of the class `CSVDataSource` outside the statistical +> analysis and pass it to the `analyse_data()` function. +>> ## Hint +>> At the end of this exercise, the code in the `analyse_data()` function should look like: +>> ```python +>> def analyse_data(data_source): +>> data = data_source.load_inflammation_data() +>> daily_standard_deviation = compute_standard_deviation_by_data(data) +>> ... +>> ``` +>> The controller code should look like: +>> ```python +>> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> analyse_data(data_source) +>> ``` +> {: .solution} +>> ## Solution +>> For example, we can declare class `CSVDataSource` like this: +>> +>> ```python +>> class CSVDataSource: +>> """ +>> Loads all the inflammation CSV files within a specified directory.
+>> """ +>> def __init__(self, dir_path): +>> self.dir_path = dir_path +>> +>> def load_inflammation_data(self): +>> data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.csv')) +>> if len(data_file_paths) == 0: +>> raise ValueError(f"No inflammation CSV files found in path {self.dir_path}") +>> data = map(models.load_csv, data_file_paths) +>> return list(data) +>> ``` +>> In the controller, we create an instance of `CSVDataSource` and pass it +>> into the statistical analysis function. +>> +>> ```python +>> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> analyse_data(data_source) +>> ``` +>> The `analyse_data()` function is modified to receive any data source object (that implements +>> the `load_inflammation_data()` method) as a parameter. +>> ```python +>> def analyse_data(data_source): +>> data = data_source.load_inflammation_data() +>> daily_standard_deviation = compute_standard_deviation_by_data(data) +>> ... +>> ``` +>> We have now fully decoupled the reading of the data from the statistical analysis and +>> the analysis is not fixed to reading from a directory of CSV files. Indeed, we can pass various +>> data sources to this function now, as long as they implement the `load_inflammation_data()` +>> method. +>> +>> While the overall behaviour of the code and its results are unchanged, +>> the way we invoke data analysis has changed. +> {: .solution} +{: .challenge} + + +## Interfaces + +An **interface** is another important concept in software design related to abstraction and +encapsulation. For a software component, it declares the operations that can be invoked on +that component, along with their input arguments and what they return. By knowing these details, +we can communicate with this component without the need to know how it implements this interface. + +An API (Application Programming Interface) is one example of an interface that allows separate +systems (external to one another) to communicate with each other.
+For example, a request to the Google Maps service API may get +you the latitude and longitude for a given address. +The Twitter API may return all tweets that contain +a given keyword that have been posted within a certain date range. + +Internal interfaces within software dictate how +different parts of the system interact with each other. +Even when these are not explicitly documented, they still exist. + +For example, our `Circle` class implicitly has an interface - you can call the `get_area()` method +on it and it will return a number representing its surface area. + +> ## Exercise: Identify an Interface Between `CSVDataSource` and `analyse_data` +> What would you say is the interface between the `CSVDataSource` class +> and the `analyse_data()` function? +> Think about what functions `analyse_data()` needs to be able to call to perform its duty, +> what parameters they need and what they return. +>> ## Solution +>> The interface is the `load_inflammation_data()` method, which takes no parameters and +>> returns a list where each entry is a 2D NumPy array of patient inflammation data (read from some +>> data source). +>> +>> Any object passed into `analyse_data()` should conform to this interface. +> {: .solution} +{: .challenge} + + +## Polymorphism + +In general, **polymorphism** is the idea of having multiple implementations/forms/shapes +of the same abstract concept. +It is the provision of a single interface to entities of different types, +or the use of a single symbol to represent multiple different types. + +There are [different versions of polymorphism](https://www.bmc.com/blogs/polymorphism-programming/). +For example, method or operator overloading is one +type of polymorphism enabling methods and operators to take parameters of different types. + +We will have a look at *interface-based polymorphism*. +In OOP, it is possible to have different object classes that conform to the same interface.
+For example, let's have a look at the following class representing a `Rectangle`: + +```python +class Rectangle: + def __init__(self, width, height): + self.width = width + self.height = height + + def get_area(self): + return self.width * self.height +``` + +Like `Circle`, this class provides the `get_area()` method. +The method takes the same number of parameters (none), and returns a number. +However, the implementation is different. This is interface-based polymorphism. + +The word "polymorphism" means "many forms", and in programming it refers to +methods/functions/operators with the same name that can be executed on many objects or classes. + +Using our `Circle` and `Rectangle` classes, we can create a list of different shapes and iterate +through the list to find their total surface area as follows: + +```python +my_circle = Circle(radius=10) +my_rectangle = Rectangle(width=5, height=3) +my_shapes = [my_circle, my_rectangle] +total_area = sum(shape.get_area() for shape in my_shapes) +``` + +Note that we have not created a common superclass or linked the classes `Circle` and `Rectangle` +together in any way. This is possible due to polymorphism. +You could also say that, when we are calculating the total surface area, +the method for calculating the area of each shape is abstracted away to the relevant class. + +How can polymorphism be useful in our software project? +For example, we can replace our `CSVDataSource` with another class that reads a totally +different file format (e.g. JSON), or reads from an external service or a database. +All of these changes can now be made without changing the analysis function as we have decoupled +the process of data loading from the data analysis earlier. +Conversely, if we wanted to write a new analysis function, we could support any of these +data sources with no extra work.
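The interface that every data source shares can also be written down explicitly. The following is only a minimal sketch under stated assumptions: the names `DataSource`, `InMemoryDataSource` and `count_datasets()` are hypothetical and not part of the lesson code, and `count_datasets()` merely stands in for an analysis function such as `analyse_data()`. It uses `typing.Protocol` (available from Python 3.8) to describe what any data source is expected to provide.

```python
from typing import Protocol


class DataSource(Protocol):
    """Hypothetical name for the implicit interface a data source conforms to."""

    def load_inflammation_data(self) -> list:
        ...


class InMemoryDataSource:
    """Illustrative data source that serves the data it was constructed with."""

    def __init__(self, data):
        self.data = data

    def load_inflammation_data(self):
        # Return a new list so callers cannot modify the stored one
        return list(self.data)


def count_datasets(data_source: DataSource) -> int:
    """Stand-in for an analysis function: it relies only on the interface."""
    return len(data_source.load_inflammation_data())


# Two small made-up datasets, each a 2D table of inflammation values
source = InMemoryDataSource([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])
print(count_datasets(source))
```

Note that `InMemoryDataSource` does not inherit from `DataSource`: it conforms to the interface simply by providing a `load_inflammation_data()` method, which is the same interface-based polymorphism that `Circle` and `Rectangle` rely on.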
+ +> ## Exercise: Add an Additional DataSource +> Create another class that supports loading patient data from JSON files, with the +> appropriate `load_inflammation_data()` method. +> There is a function in `models.py` that loads from JSON in the following format: +> ```json +> [ +> { +> "observations": [0, 1] +> }, +> { +> "observations": [0, 2] +> } +> ] +> ``` +> Finally, at run-time, construct an appropriate data source instance based on the file extension. +>> ## Solution +>> The class that reads inflammation data from JSON files could look something like: +>> ```python +>> class JSONDataSource: +>> """ +>> Loads patient data with inflammation values from JSON files within a specified folder. +>> """ +>> def __init__(self, dir_path): +>> self.dir_path = dir_path +>> +>> def load_inflammation_data(self): +>> data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.json')) +>> if len(data_file_paths) == 0: +>> raise ValueError(f"No inflammation JSON files found in path {self.dir_path}") +>> data = map(models.load_json, data_file_paths) +>> return list(data) +>> ``` +>> Additionally, in the controller we will need to select an appropriate DataSource instance to +>> provide to the analysis: +>> ```python +>> _, extension = os.path.splitext(InFiles[0]) +>> if extension == '.json': +>> data_source = JSONDataSource(os.path.dirname(InFiles[0])) +>> elif extension == '.csv': +>> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> else: +>> raise ValueError(f'Unsupported data file format: {extension}') +>> analyse_data(data_source) +>> ``` +>> As you can see, all the above changes have been made without modifying +>> the analysis code itself. +> {: .solution} +{: .challenge} + +## Testing Using Mock Objects + +We can use a **mock object** abstraction to make testing more straightforward. +Instead of having our tests use real data stored on a file system, we can provide +a mock or dummy implementation instead of one of the real classes.
+Provided that what we use as a substitute conforms to the same interface, +the code we are testing should work just the same. +Such a mock/dummy implementation could just return some fixed example data. + +A convenient way to do this in Python is using Python's [mock object library](https://docs.python.org/3/library/unittest.mock.html). +This is a whole topic in itself - +but a basic mock can be constructed using a couple of lines of code: + +```python +from unittest.mock import Mock + +mock_version = Mock() +mock_version.method_to_mock.return_value = 42 +``` + +Here we construct a mock in the same way you would construct an instance of a class. +Then we specify a method that we want to behave in a specific way. + +Now whenever you call `mock_version.method_to_mock()` the return value will be `42`. + + +> ## Exercise: Test Using a Mock Implementation +> Complete this test for `analyse_data()`, using a mock object in place of the +> `data_source`: +> ```python +> from unittest.mock import Mock +> +> def test_compute_data_mock_source(): +> from inflammation.compute_data import analyse_data +> data_source = Mock() +> +> # TODO: configure data_source mock +> +> result = analyse_data(data_source) +> +> # TODO: add assert on the contents of result +> ``` +> Create a mock that returns some fixed data and use it as the `data_source` in order to test +> the `analyse_data()` function. +> Use this mock in a test. +> +> Do not forget to import `Mock` from the `unittest.mock` module.
+>> ## Solution +>> ```python +>> from unittest.mock import Mock +>> +>> def test_compute_data_mock_source(): +>> from inflammation.compute_data import analyse_data +>> data_source = Mock() +>> data_source.load_inflammation_data.return_value = [[[0, 2, 0]], +>> [[0, 1, 0]]] +>> +>> result = analyse_data(data_source) +>> npt.assert_array_almost_equal(result, [0, math.sqrt(0.25), 0]) +>> ``` +> {: .solution} +{: .challenge} + +## Safe Code Structure Changes + +With the changes to the code structure we have made using code decoupling and abstractions we have +already refactored our code to a certain extent but we have not tested that the changes work as +intended. +We will now look into how to properly refactor code to guarantee that the code still works +as it did before any modifications. + +{% include links.md %} + diff --git a/_episodes/34-code-refactoring.md b/_episodes/34-code-refactoring.md new file mode 100644 index 000000000..cc6d25833 --- /dev/null +++ b/_episodes/34-code-refactoring.md @@ -0,0 +1,337 @@ +--- +title: "Code Refactoring" +teaching: 30 +exercises: 20 +questions: +- "How do you refactor existing code without breaking it?" +- "What are the benefits of using pure functions in code?" +objectives: +- "Employ code refactoring to improve the structure of existing code." +- "Understand the use of regression tests to avoid breaking existing code when refactoring." +- "Understand the use of pure functions in software design to make the code easier to read, +test and maintain." +- "Refactor a piece of code to separate out 'pure' from 'impure' code." +keypoints: +- "Code refactoring is a technique for improving the structure of existing code." +- "Implementing regression tests before refactoring gives you confidence that your changes have not +broken the code." +- "Using pure functions that process data without side effects whenever possible makes the code easier +to understand, test and maintain."
+--- + +## Introduction + +Code refactoring is the process of improving the design of an existing codebase - changing the +internal structure of code without changing its external behaviour, with the goal of making the code +more readable, maintainable, efficient or easier to test. This can include introducing things such +as code decoupling and abstractions, but also renaming variables, reorganising functions to avoid +code duplication and increase reuse, and simplifying conditional statements. + +In the previous episode, we already changed the structure of our code (i.e. refactored it +to a certain extent) +when we separated out data loading from data analysis but we have not tested that the new code +works as intended. Testing is particularly important with bigger code changes but even a small change +can easily break the codebase and introduce bugs. + +When faced with an existing piece of code that needs modifying, a good refactoring +process to follow is: + +1. Make sure you have tests that verify the current behaviour +2. Refactor the code +3. Verify that the behaviour of the code is identical to that before refactoring. + +In this episode we will further improve the code from our project in the following two ways: +* add more tests so we can be more confident that future changes will have the +intended effect and will not break the existing code. +* further split the `analyse_data()` function into a number of smaller and more +decoupled functions (continuing the work from the previous episode). + +## Writing Tests Before Refactoring + +When refactoring, first we need to make sure there are tests in place that can verify +the code behaviour as it is now (or write them if they are missing), +then refactor the code and, finally, check that the original tests still pass. + +There is a bit of a "chicken and egg" problem here - if the refactoring is supposed to make it easier +to write tests in the future, how can we write tests before doing the refactoring?
+The tricks to get around this trap are: + + * test at a higher level, with coarser accuracy, and + * write tests that you intend to replace or remove. + +The best tests are the ones that test a single bit of functionality rigorously. +However, with our current `analyse_data()` code that is not possible because it is a +large function doing a little bit of everything. +Instead we will make minimal changes to the code to make it a bit more testable. + +Firstly, +we will modify the function to return the data instead of visualising it because graphs are harder +to test automatically (i.e. they need to be viewed and inspected manually in order to determine +their correctness). +Next, we will make the assert statements verify what the current outcome is, rather than check +whether that is correct or not. +Such tests are meant to +verify that the behaviour does not *change* rather than checking the current behaviour is correct +(there should be another set of tests checking the correctness). +This kind of testing is called **regression testing** as we are testing for +regressions in existing behaviour. + +Refactoring code is not meant to change its behaviour, but sometimes, in order to verify that +you are not changing the important behaviour, you have to make small tweaks to the code to be able to write +the tests at all. + +> ## Exercise: Write Regression Tests +> Modify the `analyse_data()` function not to plot a graph and return the data instead. +> Then, add a new test file called `test_compute_data.py` in the `tests` folder and +> add a regression test to verify the current output of `analyse_data()`. We will use this test +> in the remainder of this section to verify the output of `analyse_data()` is unchanged each time +> we refactor or change code in the future.
+> +> Start from the skeleton test code below: +> +> ```python +> def test_analyse_data(): +> from inflammation.compute_data import analyse_data +> path = Path.cwd() / "../data" +> data_source = CSVDataSource(path) +> result = analyse_data(data_source) +> +> # TODO: add assert statement(s) to test the result value is as expected +> ``` +> Use `assert_array_almost_equal` from the `numpy.testing` library to +> compare arrays of floating point numbers. +>> ## Hint +>> When determining the correct return data result to use in tests, it may be helpful to assert the +>> result equals some random made-up data, observe the test fail initially and then +>> copy and paste the correct result into the test. +> {: .solution} +> +>> ## Solution +>> One approach we can take is to: +>> * comment out the visualise method on `analyse_data()` +>> (this will cause our test to hang waiting for the result data) +>> * return the data (instead of plotting it on a graph), so we can write assert statements +>> on the data +>> * see what the calculated result value is, and assert that it is the same as the expected value +>> +>> Putting this together, our test may look like: +>> +>> ```python +>> import numpy.testing as npt +>> from pathlib import Path +>> +>> def test_analyse_data(): +>> from inflammation.compute_data import analyse_data +>> path = Path.cwd() / "../data" +>> data_source = CSVDataSource(path) +>> result = analyse_data(data_source) +>> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211, +>> 0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094, +>> 1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312, +>> 1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578, +>> 0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417, +>> 0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707, +>> 0.50323031,0.47574665,0.45197398,0.22070227] +>> npt.assert_array_almost_equal(result, expected_output) +>> ``` +>> 
+>> Note that while the above test will detect if we accidentally break the analysis code and +>> change the output of the analysis, it is still not a complete test for the following reasons: +>> * It is not obvious why the `expected_output` is correct +>> * It does not test edge cases +>> * If the data files in the directory change - the test will fail +>> +>> We would need to add additional tests to check the above. +> {: .solution} +{: .challenge} + +## Separating Pure and Impure Code + +Now that we have our regression test for `analyse_data()` in place, we are ready to refactor the +function further. +We would like to separate out as much of its code as possible as *pure functions*. + +### Pure Functions + +A pure function in programming works like a mathematical function - +it takes in some input and produces an output and that output is +always the same for the same input. +That is, the output of a pure function does not depend on any information +which is not present in the input (such as global variables). +Furthermore, pure functions do not cause any *side effects* - they do not modify the input data +or data that exist outside the function (such as printing text, writing to a file or +changing a global variable). They perform actions that affect nothing but the value they return. + +### Benefits of Pure Functions + +Pure functions are easier to understand because they eliminate side effects. +The reader only needs to concern themselves with the input +parameters of the function and the function code itself, rather than +the overall context the function is operating in. +Similarly, a function that calls a pure function is also easier +to understand - we only need to understand what the function returns, which will probably +be clear from the context in which the function is called. 
+Finally, pure functions are easier to reuse as the caller +only needs to understand what parameters to provide, rather +than anything else that might need to be configured prior to the call. +For these reasons, you should try to have as much of the complex, analytical and mathematical +code as possible written as pure functions. + + +Some parts of a program are inevitably impure. +Programs need to read input from users, generate a graph, or write results to a file or a database. +Well-designed programs separate complex logic from the necessary impure "glue" code that +interacts with users and other systems. +This way, you have easy-to-read and easy-to-test pure code that contains the complex logic +and simplified impure code that reads data from a file or gathers user input. Impure code may +be harder to test but, when simplified like this, may only require a handful of tests anyway. + +> ## Exercise: Refactoring To Use a Pure Function +> Refactor the `analyse_data()` function to delegate the data analysis to a new +> pure function `compute_standard_deviation_by_day()` and separate it +> from the impure code that handles the input and output.
+> The pure function should take in the data, and return the analysis result, as follows: +> ```python +> def compute_standard_deviation_by_day(data): +> # TODO +> return daily_standard_deviation +> ``` +>> ## Solution +>> The analysis code will be refactored into a separate function that may look something like: +>> ```python +>> def compute_standard_deviation_by_day(data): +>> means_by_day = map(models.daily_mean, data) +>> means_by_day_matrix = np.stack(list(means_by_day)) +>> +>> daily_standard_deviation = np.std(means_by_day_matrix, axis=0) +>> return daily_standard_deviation +>> ``` +>> The `analyse_data()` function now calls the `compute_standard_deviation_by_day()` function, +>> while keeping the logic for reading the data and showing the result in a graph: +>>```python +>>def analyse_data(data_dir): +>> """Calculates the standard deviation by day between datasets. +>> Gets all the inflammation data from CSV files within a directory, works out the mean +>> inflammation value for each day across all datasets, then visualises the +>> standard deviation of these means on a graph.""" +>> data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) +>> if len(data_file_paths) == 0: +>> raise ValueError(f"No inflammation csv's found in path {data_dir}") +>> data = map(models.load_csv, data_file_paths) +>> daily_standard_deviation = compute_standard_deviation_by_day(data) +>> +>> graph_data = { +>> 'standard deviation by day': daily_standard_deviation, +>> } +>> # views.visualize(graph_data) +>> return daily_standard_deviation +>>``` +>> Make sure to re-run the regression test to check this refactoring has not +>> changed the output of `analyse_data()`. +> {: .solution} +{: .challenge} + +### Testing Pure Functions + +Now that we have our analysis implemented as a pure function, we can write tests that cover +all the things we would like to check without depending on CSV files.
+This is another advantage of pure functions - they are very well suited to automated testing, +i.e. their tests are: +* **easier to write** - we construct input and assert the output +without having to think about making sure the global state is correct before or after +* **easier to read** - the reader will not have to open a CSV file to understand why +the test is correct +* **easier to maintain** - if at some point the data format changes +from CSV to JSON, the bulk of the tests need not be updated + +> ## Exercise: Testing a Pure Function +> Add tests for `compute_standard_deviation_by_day()` that check for situations +> when there is only one file with multiple rows, +> multiple files with one row, and any other cases you can think of that should be tested. +>> ## Solution +>> You might have thought of more tests, but we can easily extend the test by parametrizing +>> with more inputs and expected outputs: +>> ```python +>>@pytest.mark.parametrize('data,expected_output', [ +>> ([[[0, 1, 0], [0, 2, 0]]], [0, 0, 0]), +>> ([[[0, 2, 0]], [[0, 1, 0]]], [0, math.sqrt(0.25), 0]), +>> ([[[0, 1, 0], [0, 2, 0]], [[0, 1, 0], [0, 2, 0]]], [0, 0, 0]) +>>], +>>ids=['Two patients in same file', 'Two patients in different files', 'Two identical patients in two different files']) +>>def test_compute_standard_deviation_by_day(data, expected_output): +>> from inflammation.compute_data import compute_standard_deviation_by_day +>> +>> result = compute_standard_deviation_by_day(data) +>> npt.assert_array_almost_equal(result, expected_output) +>> ``` +> {: .solution} +{: .challenge} + +> ## Functional Programming +> **Functional programming** is a programming paradigm where programs are constructed by +> applying and composing/chaining pure functions. +> Some programming languages, such as Haskell or Lisp, support writing pure functional code only. +> Other languages, such as Python, Java, C++, allow mixing **functional** and **procedural** +> programming paradigms.
+> Read more in the [extra episode on functional programming](/functional-programming/index.html) +> about when it can be very useful to switch to this paradigm +> (e.g. to employ the MapReduce approach for data processing). +{: .callout} + + +There are no definite rules in software design but making your complex logic out of +composed pure functions is a great place to start when trying to make your code readable, +testable and maintainable. This is particularly useful for: + +* Data processing and analysis +(for example, using the [Python Pandas library](https://pandas.pydata.org/) for data manipulation where most functions appear pure) +* Doing simulations +* Translating data from one format to another + +## Programming Paradigms + +Until this section, we have mainly been writing procedural code. +In the previous episode, we touched a bit upon classes, encapsulation and polymorphism, +which are characteristics of (but not limited to) object-oriented programming (OOP). +In this episode, we mentioned [pure functions](./index.html#pure-functions) +and functional programming. + +These are examples of different [programming paradigms](/programming-paradigms/index.html) +and provide varied approaches to structuring your code - +each with certain strengths and weaknesses when used to solve particular types of problems. +In many cases, particularly with modern languages, a single language can allow many different +structural approaches and mixing programming paradigms within your code. +Once your software begins to get more complex - it is common to use aspects of [different paradigms](/programming-paradigms/index.html) +to handle different subtasks. +Because of this, it is useful to know about the [major paradigms](/programming-paradigms/index.html), +so you can recognise where it might be useful to switch.
+This is outside the scope of this course - we have some extra episodes on the topics of +[procedural programming](/procedural-programming/index.html), +[functional programming](/functional-programming/index.html) and +[object-oriented programming](/object-oriented-programming/index.html) if you want to know more. + +> ## So Which One is Python? +> Python is a multi-paradigm and multi-purpose programming language. +> You can use it as a procedural language and you can use it in a more object-oriented way. +> It does tend to land more on the object-oriented side as all its core data types +> (strings, integers, floats, booleans, lists, +> sets, arrays, tuples, dictionaries, files) +> as well as functions, modules and classes are objects. +> +> Since functions in Python are also objects that can be passed around like any other object, +> Python is also well suited to functional programming. +> One of the most popular Python libraries for data manipulation, +> [Pandas](https://pandas.pydata.org/) (built on top of NumPy), +> supports a functional programming style +> as most of its functions on data do not change the data (no side effects) +> but produce new data to reflect the result of the function. +{: .callout} + +## Software Design and Architecture + +In this section so far we have been talking about **software design** - the individual modules and +components of the software. We are now going to have a brief look into **software architecture** - +which is about the overall structure that these software components fit into, a *design pattern* +with a common successful use of software components.
+ +{% include links.md %} diff --git a/_episodes/35-software-architecture-revisited.md b/_episodes/35-software-architecture-revisited.md new file mode 100644 index 000000000..3a41acc94 --- /dev/null +++ b/_episodes/35-software-architecture-revisited.md @@ -0,0 +1,337 @@ +--- +title: "Software Architecture Revisited" +teaching: 15 +exercises: 30 +questions: +- "How do we handle code contributions that don't fit within our existing architecture?" +objectives: +- "Analyse new code to identify Model, View, Controller aspects." +- "Refactor new code to conform to an MVC architecture." +- "Adapt our existing code to include the new re-architected code." +keypoints: +- "Sometimes new, contributed code needs refactoring for it to fit within an existing codebase." +- "Try to leave the code in a better state than you found it." +--- + +In the previous few episodes we've looked at the importance and principles of good software architecture and design, +and how techniques such as code abstraction and refactoring fulfil that design within an implementation, +and help us maintain and improve it as our code evolves. + +Let us now return to software architecture and consider how we may refactor some new code to fit within our existing MVC architectural design using the techniques we have learnt so far. + +## Revisiting Our Software's Architecture + +Recall that in our software project, the **Controller** module is in `inflammation-analysis.py`, +and the View and Model modules are contained in +`inflammation/views.py` and `inflammation/models.py`, respectively. +Data underlying the Model is contained within the directory `data`. + +Looking at the code in the branch `full-data-analysis` (where we should be currently located), +we can see that the new code was added in a separate script `inflammation/compute_data.py` and +contains a mix of Model, View and Controller code.
+ +> ## Exercise: Identify Model, View and Controller Parts of the Code +> Looking at the code inside `compute_data.py`, what parts could be considered +> Model, View and Controller code? +> +>> ## Solution +>> * Computing the standard deviation belongs to Model. +>> * Reading the data from CSV files also belongs to Model. +>> * Displaying the output as a graph is View. +>> * The logic that processes the supplied files is Controller. +> {: .solution} +{: .challenge} + +Within the Model further separations make sense. +For example, as we did before, separating out the impure code that interacts with +the file system from the pure calculations helps with readability and testability. +Nevertheless, the MVC architectural pattern is a great starting point when thinking about +how you should structure your code. + +> ## Exercise: Split out the Model, View and Controller Code +> Refactor the `analyse_data()` function so that the Model, View and Controller code +> we identified in the previous exercise is moved to appropriate modules. +>> ## Solution +>> The idea here is for the `analyse_data()` function not to have any "view" considerations. +>> That is, it should just compute and return the data and +>> should be located in `inflammation/models.py`. +>> +>> ```python +>> def analyse_data(data_source): +>> """Calculate the standard deviation by day between datasets +>> Gets all the inflammation data from the data source, works out the mean +>> inflammation value for each day across all datasets, then returns the +>> standard deviation of these means.""" +>> data = data_source.load_inflammation_data() +>> daily_standard_deviation = compute_standard_deviation_by_day(data) +>> +>> return daily_standard_deviation +>> ``` +>> There can be a separate bit of code in the Controller `inflammation-analysis.py` +>> that chooses how data should be presented, e.g.
as a graph: +>> +>> ```python +>> if args.full_data_analysis: +>> _, extension = os.path.splitext(InFiles[0]) +>> if extension == '.json': +>> data_source = JSONDataSource(os.path.dirname(InFiles[0])) +>> elif extension == '.csv': +>> data_source = CSVDataSource(os.path.dirname(InFiles[0])) +>> else: +>> raise ValueError(f'Unsupported file format: {extension}') +>> data_result = analyse_data(data_source) +>> graph_data = { +>> 'standard deviation by day': data_result, +>> } +>> views.visualize(graph_data) +>> return +>> ``` +>> Note that this is, more or less, the change we made to write our regression test. +>> This demonstrates that splitting up Model code from View code can +>> immediately make your code much more testable. +>> Ensure you re-run our regression test to check this refactoring has not +>> changed the output of `analyse_data()`. +> {: .solution} +{: .challenge} + +At this point, you have refactored and tested all the code on branch `full-data-analysis` +and it is working as expected. The branch is ready to be incorporated into `develop` +and then, later on, `main`, which may also have been changed by other developers working on +the code at the same time so make sure to update accordingly or resolve any conflicts. + +~~~ +$ git switch develop +$ git merge full-data-analysis +~~~ +{: .language-bash} + +Let's now have a closer look at our Controller, and at how we can handle command line arguments in Python +(which is something you may find yourself doing often if you need to run the code from a +command line tool).
+ +### Controller Structure + +You will have noticed already that the structure of the `inflammation-analysis.py` file +follows this pattern: + +~~~ +# import modules + +def main(args): + # perform some actions + +if __name__ == "__main__": + # perform some actions before main() + main(args) +~~~ +{: .language-python} + +In this pattern the actions performed by the script are contained within the `main` function +(which does not need to be called `main`, +but using this convention helps others in understanding your code). +The `main` function is then called within the `if` statement `__name__ == "__main__"`, +after some other actions have been performed +(usually the parsing of command-line arguments, which will be explained below). +`__name__` is a special "dunder" (double underscore) variable which is set, +along with a number of other special dunder variables, +by the Python interpreter before the execution of any code in the source file. +What value is given by the interpreter to `__name__` is determined by +the manner in which the source file is loaded. + +If we run the source file directly using the Python interpreter, e.g.: + +~~~ +$ python3 inflammation-analysis.py +~~~ +{: .language-bash} + +then the interpreter will assign the hard-coded string `"__main__"` to the `__name__` variable: + +~~~ +__name__ = "__main__" +... +# rest of your code +~~~ +{: .language-python} + +However, if your source file is imported by another Python script, e.g.: + +~~~ +import inflammation-analysis +~~~ +{: .language-python} + +then the interpreter will assign the name `"inflammation-analysis"` +from the import statement to the `__name__` variable: + +~~~ +__name__ = "inflammation-analysis" +...
+# rest of your code +~~~ +{: .language-python} + +Because of this behaviour of the interpreter, +we can put any code that should only be executed when running the script +directly within the `if __name__ == "__main__":` structure, +allowing the rest of the code within the script to be +safely imported by another script if we so wish. + +While it may not seem very useful to have your controller script importable by another script, +there are a number of situations in which you would want to do this: + +- for testing of your code, you can have your testing framework import the main script, + and run special test functions which then call the `main` function directly; +- where you want to not only be able to run your script from the command-line, + but also provide a programmer-friendly application programming interface (API) for advanced users. + +### Passing Command-line Options to Controller + +The standard Python library for reading command line arguments passed to a script is +[`argparse`](https://docs.python.org/3/library/argparse.html). +This module reads arguments passed by the system, +and enables the automatic generation of help and usage messages. +These include, as we saw at the start of this course, +the generation of helpful error messages when users give the program invalid arguments. + +The basic usage of `argparse` can be seen in the `inflammation-analysis.py` script. +First we import the library: + +~~~ +import argparse +~~~ +{: .language-python} + +We then initialise the argument parser class, passing an (optional) description of the program: + +~~~ +parser = argparse.ArgumentParser( + description='A basic patient inflammation data management system') +~~~ +{: .language-python} + +Once the parser has been initialised we can add +the arguments that we want argparse to look out for. 
+In our basic case, we want only the names of the file(s) to process: + +~~~ +parser.add_argument( + 'infiles', + nargs='+', + help='Input CSV(s) containing inflammation series for each patient') +~~~ +{: .language-python} + +Here we have defined what the argument will be called (`'infiles'`) when it is read in; +the number of arguments to be expected +(`nargs='+'`, where `'+'` indicates that there should be 1 or more arguments passed); +and a help string for the user +(`help='Input CSV(s) containing inflammation series for each patient'`). + +You can add as many arguments as you wish, +and these can be either mandatory (as the one above) or optional. +Most of the complexity in using `argparse` is in adding the correct argument options, +and we will explain how to do this in more detail below. + +Finally we parse the arguments passed to the script using: + +~~~ +args = parser.parse_args() +~~~ +{: .language-python} + +This returns an object (that we have called `args`) containing all the arguments requested. +These can be accessed using the names that we have defined for each argument, +e.g. `args.infiles` would return the filenames that have been input. + +The help for the script can be accessed using the `-h` or `--help` optional argument +(which `argparse` includes by default): + +~~~ +$ python3 inflammation-analysis.py --help +~~~ +{: .language-bash} + +~~~ +usage: inflammation-analysis.py [-h] infiles [infiles ...] + +A basic patient inflammation data management system + +positional arguments: + infiles Input CSV(s) containing inflammation series for each patient + +optional arguments: + -h, --help show this help message and exit +~~~ +{: .output} + +The help page starts with the command line usage, +illustrating what inputs can be given (any within `[]` brackets are optional). +It then lists the **positional** and **optional** arguments, +giving as detailed a description of each as you have added to the `add_argument()` command. 
+Positional arguments are arguments that need to be included +in the proper position or order when calling the script. + +Note that optional arguments are indicated by `-` or `--`, followed by the argument name. +Positional arguments are simply inferred by their position. +It is possible to have multiple positional arguments, +but usually this is only practical where all (or all but one) positional arguments +contain a clearly defined number of elements. +If more than one option can have an indeterminate number of entries, +then it is better to create them as 'optional' arguments. +These can be made a required input though, +by setting `required=True` within the `add_argument()` call. + +> ## Positional and Optional Argument Order +> +> The usage section of the help page above shows +> the optional arguments going before the positional arguments. +> This is the customary way to present options, but is not mandatory. +> Instead there are two rules which must be followed for these arguments: +> +> 1. Positional and optional arguments must each be given all together, and not inter-mixed. + For example, the order can be either "optional, positional" or "positional, optional", + but not "optional, positional, optional". +> 2. Positional arguments must be given in the order that they are shown +in the usage section of the help page. +{: .callout} + + +### Additional Reading Material & References + +Now that we have covered and revisited [software architecture](/software-architecture-extra/index.html) +and [different programming paradigms](/programming-paradigms/index.html) +and how we can integrate them into our architecture, +there are two optional extra episodes which you may find interesting. + +Both episodes cover the persistence layer of software architectures +and methods of persistently storing data, but take different approaches.
+The episode on [persistence with JSON](../persistence) covers +some more advanced concepts in Object Oriented Programming, while +the episode on [databases](../databases) starts to build towards a true multilayer architecture, +which would allow our software to handle much larger quantities of data. + +## Towards Collaborative Software Development + +Having looked at some aspects of software design and architecture, +we are now circling back to implementing our software design +and developing our software to satisfy the requirements collaboratively in a team. +At an intermediate level of software development, +there is a wealth of practices that could be used, +and applying suitable design and coding practices is what separates +an intermediate developer from someone who has just started coding. +The key for an intermediate developer is to balance these concerns +for each software project appropriately, +and employ design and development practices enough so that progress can be made. + +One practice that should always be considered, +and has been shown to be very effective in team-based software development, +is that of *code review*. +Code reviews help to ensure the 'good' coding standards are achieved +and maintained within a team by having multiple people +have a look and comment on key code changes to see how they fit within the codebase. +Such reviews check the correctness of the new code, test coverage, functionality changes, +and confirm that they follow the coding guides and best practices. +Let's have a look at some code review techniques available to us. 
+ +{% include links.md %} diff --git a/_episodes/36-architecture-revisited.md b/_episodes/36-architecture-revisited.md deleted file mode 100644 index 0e6ed0186..000000000 --- a/_episodes/36-architecture-revisited.md +++ /dev/null @@ -1,450 +0,0 @@ ---- -title: "Architecture Revisited: Extending Software" -teaching: 15 -exercises: 0 -questions: -- "How can we extend our software within the constraints of the MVC architecture?" -objectives: -- "Extend our software to add a view of a single patient in the study and the software's command line interface to request a specific view." -keypoints: -- "By breaking down our software into components with a single responsibility, we avoid having to rewrite it all when requirements change. - Such components can be as small as a single function, or be a software package in their own right." ---- - -As we have seen, we have different programming paradigms that are suitable for different problems -and affect the structure of our code. -In programming languages that support multiple paradigms, such as Python, -we have the luxury of using elements of different paradigms paradigms and we, -as software designers and programmers, -can decide how to use those elements in different architectural components of our software. -Let's now circle back to the architecture of our software for one final look. - -## MVC Revisited - -We've been developing our software using the **Model-View-Controller** (MVC) architecture so far, -but, as we have seen, MVC is just one of the common architectural patterns -and is not the only choice we could have made. - -There are many variants of an MVC-like pattern (such as -[Model-View-Presenter](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93presenter) (MVP), -[Model-View-Viewmodel](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93viewmodel) (MVVM), etc.), -but in most cases, the distinction between these patterns isn't particularly important. 
-What really matters is that we are making decisions about the architecture of our software -that suit the way in which we expect to use it. -We should reuse these established ideas where we can, but we don't need to stick to them exactly. - -In this episode we'll be taking our Object Oriented code from the previous episode -and integrating it into our existing MVC pattern. -But first we will explain some features of -the Controller (`inflammation-analysis.py`) component of our architecture. - -### Controller Structure - -You will have noticed already that structure of the `inflammation-analysis.py` file -follows this pattern: - -~~~ -# import modules - -def main(): - # perform some actions - -if __name__ == "__main__": - # perform some actions before main() - main() -~~~ -{: .language-python} - -In this pattern the actions performed by the script are contained within the `main` function -(which does not need to be called `main`, -but using this convention helps others in understanding your code). -The `main` function is then called within the `if` statement `__name__ == "__main__"`, -after some other actions have been performed -(usually the parsing of command-line arguments, which will be explained below). -`__name__` is a special dunder variable which is set, -along with a number of other special dunder variables, -by the python interpreter before the execution of any code in the source file. -What value is given by the interpreter to `__name__` is determined by -the manner in which it is loaded. - -If we run the source file directly using the Python interpreter, e.g.: - -~~~ -$ python3 inflammation-analysis.py -~~~ -{: .language-bash} - -then the interpreter will assign the hard-coded string `"__main__"` to the `__name__` variable: - -~~~ -__name__ = "__main__" -... 
-# rest of your code -~~~ -{: .language-python} - -However, if your source file is imported by another Python script, e.g: - -~~~ -import inflammation-analysis -~~~ -{: .language-python} - -then the interpreter will assign the name `"inflammation-analysis"` -from the import statement to the `__name__` variable: - -~~~ -__name__ = "inflammation-analysis" -... -# rest of your code -~~~ -{: .language-python} - -Because of this behaviour of the interpreter, -we can put any code that should only be executed when running the script -directly within the `if __name__ == "__main__":` structure, -allowing the rest of the code within the script to be -safely imported by another script if we so wish. - -While it may not seem very useful to have your controller script importable by another script, -there are a number of situations in which you would want to do this: - -- for testing of your code, you can have your testing framework import the main script, - and run special test functions which then call the `main` function directly; -- where you want to not only be able to run your script from the command-line, - but also provide a programmer-friendly application programming interface (API) for advanced users. - -### Passing Command-line Options to Controller - -The standard Python library for reading command line arguments passed to a script is -[`argparse`](https://docs.python.org/3/library/argparse.html). -This module reads arguments passed by the system, -and enables the automatic generation of help and usage messages. -These include, as we saw at the start of this course, -the generation of helpful error messages when users give the program invalid arguments. - -The basic usage of `argparse` can be seen in the `inflammation-analysis.py` script. 
-First we import the library: - -~~~ -import argparse -~~~ -{: .language-python} - -We then initialise the argument parser class, passing an (optional) description of the program: - -~~~ -parser = argparse.ArgumentParser( - description='A basic patient inflammation data management system') -~~~ -{: .language-python} - -Once the parser has been initialised we can add -the arguments that we want argparse to look out for. -In our basic case, we want only the names of the file(s) to process: - -~~~ -parser.add_argument( - 'infiles', - nargs='+', - help='Input CSV(s) containing inflammation series for each patient') -~~~ -{: .language-python} - -Here we have defined what the argument will be called (`'infiles'`) when it is read in; -the number of arguments to be expected -(`nargs='+'`, where `'+'` indicates that there should be 1 or more arguments passed); -and a help string for the user -(`help='Input CSV(s) containing inflammation series for each patient'`). - -You can add as many arguments as you wish, -and these can be either mandatory (as the one above) or optional. -Most of the complexity in using `argparse` is in adding the correct argument options, -and we will explain how to do this in more detail below. - -Finally we parse the arguments passed to the script using: - -~~~ -args = parser.parse_args() -~~~ -{: .language-python} - -This returns an object (that we've called `arg`) containing all the arguments requested. -These can be accessed using the names that we have defined for each argument, -e.g. `args.infiles` would return the filenames that have been input. - -The help for the script can be accessed using the `-h` or `--help` optional argument -(which `argparse` includes by default): - -~~~ -$ python3 inflammation-analysis.py --help -~~~ -{: .language-bash} - -~~~ -usage: inflammation-analysis.py [-h] infiles [infiles ...] 
- -A basic patient inflammation data management system - -positional arguments: - infiles Input CSV(s) containing inflammation series for each patient - -optional arguments: - -h, --help show this help message and exit -~~~ -{: .output} - -The help page starts with the command line usage, -illustrating what inputs can be given (any within `[]` brackets are optional). -It then lists the **positional** and **optional** arguments, -giving as detailed a description of each as you have added to the `add_argument()` command. -Positional arguments are arguments that need to be included -in the proper position or order when calling the script. - -Note that optional arguments are indicated by `-` or `--`, followed by the argument name. -Positional arguments are simply inferred by their position. -It is possible to have multiple positional arguments, -but usually this is only practical where all (or all but one) positional arguments -contains a clearly defined number of elements. -If more than one option can have an indeterminate number of entries, -then it is better to create them as 'optional' arguments. -These can be made a required input though, -by setting `required = True` within the `add_argument()` command. - -> ## Positional and Optional Argument Order -> -> The usage section of the help page above shows -> the optional arguments going before the positional arguments. -> This is the customary way to present options, but is not mandatory. -> Instead there are two rules which must be followed for these arguments: -> -> 1. Positional and optional arguments must each be given all together, and not inter-mixed. -> For example, the order can be either `optional - positional` or `positional - optional`, -> but not `optional - positional - optional`. -> 2. Positional arguments must be given in the order that they are shown -> in the usage section of the help page. 
-{: .callout} - -Now that you have some familiarity with `argparse`, -we will demonstrate below how you can use this to add extra functionality to your controller. - -### Adding a New View - -Let's start with adding a view that allows us to see the data for a single patient. -First, we need to add the code for the view itself -and make sure our `Patient` class has the necessary data - -including the ability to pass a list of measurements to the `__init__` method. -Note that your Patient class may look very different now, -so adapt this example to fit what you have. - -~~~ -# file: inflammation/views.py - -... - -def display_patient_record(patient): - """Display data for a single patient.""" - print(patient.name) - for obs in patient.observations: - print(obs.day, obs.value) -~~~ -{: .language-python} - -~~~ -# file: inflammation/models.py - -... - -class Observation: - def __init__(self, day, value): - self.day = day - self.value = value - - def __str__(self): - return self.value - -class Person: - def __init__(self, name): - self.name = name - - def __str__(self): - return self.name - -class Patient(Person): - """A patient in an inflammation study.""" - def __init__(self, name, observations=None): - super().__init__(name) - - self.observations = [] - ### MODIFIED START ### - if observations is not None: - self.observations = observations - ### MODIFIED END ### - - def add_observation(self, value, day=None): - if day is None: - try: - day = self.observations[-1].day + 1 - - except IndexError: - day = 0 - - new_observation = Observation(day, value) - - self.observations.append(new_observation) - return new_observation -~~~ -{: .language-python} - -Now we need to make sure people can call this view - -that means connecting it to the controller -and ensuring that there's a way to request this view when running the program. 
-The changes we need to make here are that the `main` function -needs to be able to direct us to the view we've requested - -and we need to add to the command line interface - the controller - -the necessary data to drive the new view. - -~~~ -# file: inflammation-analysis.py - -#!/usr/bin/env python3 -"""Software for managing patient data in our imaginary hospital.""" - -import argparse - -from inflammation import models, views - - -def main(args): - """The MVC Controller of the patient data system. - - The Controller is responsible for: - - selecting the necessary models and views for the current task - - passing data between models and views - """ - infiles = args.infiles - if not isinstance(infiles, list): - infiles = [args.infiles] - - for filename in infiles: - inflammation_data = models.load_csv(filename) - - ### MODIFIED START ### - if args.view == 'visualize': - view_data = { - 'average': models.daily_mean(inflammation_data), - 'max': models.daily_max(inflammation_data), - 'min': models.daily_min(inflammation_data), - } - - views.visualize(view_data) - - elif args.view == 'record': - patient_data = inflammation_data[args.patient] - observations = [models.Observation(day, value) for day, value in enumerate(patient_data)] - patient = models.Patient('UNKNOWN', observations) - - views.display_patient_record(patient) - ### MODIFIED END ### - - -if __name__ == "__main__": - parser = argparse.ArgumentParser( - description='A basic patient data management system') - - parser.add_argument( - 'infiles', - nargs='+', - help='Input CSV(s) containing inflammation series for each patient') - - ### MODIFIED START ### - parser.add_argument( - '--view', - default='visualize', - choices=['visualize', 'record'], - help='Which view should be used?') - - parser.add_argument( - '--patient', - type=int, - default=0, - help='Which patient should be displayed?') - ### MODIFIED END ### - - args = parser.parse_args() - - main(args) -~~~ -{: .language-python} - -We've added two 
options to our command line interface here: -one to request a specific view and one for the patient ID that we want to lookup. -For the full range of features that we have access to with `argparse` see the -[Python module documentation](https://docs.python.org/3/library/argparse.html?highlight=argparse#module-argparse). -Allowing the user to request a specific view like this is -a similar model to that used by the popular Python library Click - -if you find yourself needing to build more complex interfaces than this, -Click would be a good choice. -You can find more information in [Click's documentation](https://click.palletsprojects.com/). - -For now, we also don't know the names of any of our patients, -so we've made it `'UNKNOWN'` until we get more data. - -We can now call our program with these extra arguments to see the record for a single patient: - -~~~ -$ python3 inflammation-analysis.py --view record --patient 1 data/inflammation-01.csv -~~~ -{: .language-bash} - -~~~ -UNKNOWN -0 0.0 -1 0.0 -2 1.0 -3 3.0 -4 1.0 -5 2.0 -6 4.0 -7 7.0 -... -~~~ -{: .output} - -> ## Additional Material -> -> Now that we've covered the basics of different programming paradigms -> and how we can integrate them into our multi-layer architecture, -> there are two optional extra episodes which you may find interesting. -> -> Both episodes cover the persistence layer of software architectures -> and methods of persistently storing data, but take different approaches. -> The episode on [persistence with JSON](/persistence) covers -> some more advanced concepts in Object Oriented Programming, while -> the episode on [databases](/databases) starts to build towards a true multilayer architecture, -> which would allow our software to handle much larger quantities of data. 
-{: .callout} - - -## Towards Collaborative Software Development - -Having looked at some theoretical aspects of software design, -we are now circling back to implementing our software design -and developing our software to satisfy the requirements collaboratively in a team. -At an intermediate level of software development, -there is a wealth of practices that could be used, -and applying suitable design and coding practices is what separates -an intermediate developer from someone who has just started coding. -The key for an intermediate developer is to balance these concerns -for each software project appropriately, -and employ design and development practices enough so that progress can be made. - -One practice that should always be considered, -and has been shown to be very effective in team-based software development, -is that of *code review*. -Code reviews help to ensure the 'good' coding standards are achieved -and maintained within a team by having multiple people -have a look and comment on key code changes to see how they fit within the codebase. -Such reviews check the correctness of the new code, test coverage, functionality changes, -and confirm that they follow the coding guides and best practices. -Let's have a look at some code review techniques available to us. diff --git a/_extras/databases.md b/_extras/databases.md index b4bc67a65..9f0267a27 100644 --- a/_extras/databases.md +++ b/_extras/databases.md @@ -1,5 +1,5 @@ --- -title: "Additional Material: Databases" +title: "Databases" layout: episode teaching: 30 exercises: 30 @@ -16,7 +16,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../36-architecture-revisited/index.html#additional-material). +> [Section 3 on software design and development](../35-refactoring-architecture/index.html#conclusion). 
{: .callout} A **database** is an organised collection of data, diff --git a/_episodes/34-functional-programming.md b/_extras/functional-programming.md similarity index 98% rename from _episodes/34-functional-programming.md rename to _extras/functional-programming.md index 750a0e235..a9b5fb30d 100644 --- a/_episodes/34-functional-programming.md +++ b/_extras/functional-programming.md @@ -2,6 +2,7 @@ title: "Functional Programming" teaching: 30 exercises: 30 +layout: episode questions: - What is functional programming? - Which situations/problems is functional programming well suited for? @@ -95,14 +96,14 @@ def factorial(n): ~~~ {: .language-python} -Note: You may have noticed that both functions in the above code examples have the same signature +***Note:** You may have noticed that both functions in the above code examples have the same signature (i.e. they take an integer number as input and return its factorial as output). You could easily swap these equivalent implementations without changing the way that the function is invoked. Remember, a single piece of software may well contain instances of multiple programming paradigms - including procedural, functional and object-oriented - it is up to you to decide which one to use and when to switch -based on the problem at hand and your personal coding style. +based on the problem at hand and your personal coding style.* Functional computations only rely on the values that are provided as inputs to a function and not on the state of the program that precedes the function call. @@ -237,7 +238,7 @@ before aggregating all intermediate results into the final result. ### Mapping `map(f, C)` is a function takes another function `f()` and a collection `C` of data items as inputs. 
-Calling `map(f, L)` applies the function `f(x)` to every data item `x` in a collection `C` +Calling `map(f, C)` applies the function `f(x)` to every data item `x` in a collection `C` and returns the resulting values as a new collection of the same size. This is a simple mapping that takes a list of names and @@ -338,10 +339,10 @@ print(list(result)) > > ~~~ > > {: .language-python} > > -> > Note: `map()` function returns a map iterator object +> > ***Note:** `map()` function returns a map iterator object > > which needs to be converted to a collection object > > (such as a list, dictionary, set, tuple) -> > using the corresponding "factory" function (in our case `list()`). +> > using the corresponding "factory" function (in our case `list()`).* > {: .solution} {: .challenge} @@ -624,10 +625,10 @@ def sum_of_squares(sequence): > > Hints: > - Remember that you can define an `initialiser` value with `reduce()` -> to help you start the counter + > to help you start the counter > - If defining a lambda expression, -> note that it can conditionally return different values using the syntax -> ` if else ` in the expression. + > note that it can conditionally return different values using the syntax + > ` if else ` in the expression. > > > ## Solution > > Using a separate function: diff --git a/_episodes/35-object-oriented-programming.md b/_extras/object-oriented-programming.md similarity index 99% rename from _episodes/35-object-oriented-programming.md rename to _extras/object-oriented-programming.md index 01413497a..a23bd6305 100644 --- a/_episodes/35-object-oriented-programming.md +++ b/_extras/object-oriented-programming.md @@ -1,7 +1,8 @@ --- title: "Object Oriented Programming" +layout: episode teaching: 30 -exercises: 20 +exercises: 35 questions: - "How can we use code to describe the structure of data?" - "How should the relationships between structures be described?" 
@@ -26,7 +27,7 @@ Data is encapsulated in the form of fields (attributes) of objects, while code is encapsulated in the form of procedures (methods) that manipulate objects' attributes and define "behaviour" of objects. So, in object oriented programming, -we first think about the data and the things that we’re modelling - +we first think about the data and the things that we are modelling - and represent these by objects - rather than define the logic of the program, and code becomes a series of interactions between objects. @@ -76,9 +77,9 @@ patients = [ > which can be used to attach names to our patient dataset. > When used as below, it should produce the expected output. > -> If you're not sure where to begin, +> If you are not sure where to begin, > think about ways you might be able to effectively loop over two collections at once. -> Also, don't worry too much about the data type of the `data` value, +> Also, do not worry too much about the data type of the `data` value, > it can be a Python list, or a NumPy array - either is fine. > > ~~~ @@ -104,6 +105,7 @@ patients = [ > ~~~ > {: .output} > +> Time: 10 min > > ## Solution > > > > One possible solution, perhaps the most obvious, @@ -460,6 +462,7 @@ section of the Python documentation. > ~~~ > {: .output} > +> Time: 5 min > > ## Solution > > > > ~~~ @@ -800,6 +803,7 @@ before we can properly initialise a `Patient` model with their inflammation data > explain them and how you implemented them to your neighbour. > Would they have implemented that feature in the same way? > +> Time: 20 min > > ## Solution > > One example solution is shown below. 
> > You may start by writing some tests (that will initially fail), @@ -902,3 +906,4 @@ before we can properly initialise a `Patient` model with their inflammation data {: .challenge} {% include links.md %} + diff --git a/_extras/persistence.md b/_extras/persistence.md index f071c82ff..47fe9cf43 100644 --- a/_extras/persistence.md +++ b/_extras/persistence.md @@ -1,5 +1,5 @@ --- -title: "Additional Material: Persistence" +title: "Persistence" layout: episode teaching: 25 exercises: 25 @@ -25,7 +25,7 @@ keypoints: > ## Follow up from Section 3 > This episode could be read as a follow up from the end of -> [Section 3 on software design and development](../36-architecture-revisited/index.html#additional-material). +> [Section 3 on software design and development](../35-refactoring-architecture/index.html#conclusion). {: .callout} Our patient data system so far can read in some data, process it, and display it to people. diff --git a/_extras/procedural-programming.md b/_extras/procedural-programming.md new file mode 100644 index 000000000..77e4141e2 --- /dev/null +++ b/_extras/procedural-programming.md @@ -0,0 +1,71 @@ +--- +title: "Procedural Programming" +teaching: 10 +exercises: 0 +layout: episode +questions: +- "What is procedural programming?" +- "Which situations/problems is procedural programming well suited for?" +objectives: +- "Describe the core concepts that define the procedural programming paradigm" +- "Describe the main characteristics of code that is written in procedural programming style" +keypoints: +- "Procedural Programming emphasises a structured approach to coding, using a sequence of tasks and subroutines to create a well-organised program." +--- + +In procedural programming, code is grouped into +procedures (also known as routines - reusable pieces of code that perform a specific action but +have no return value) and functions (similar to procedures, but returning a value after execution).
+Procedures and functions both perform a single task, with exactly one entry and one exit point, and +contain a series of logical steps (instructions) to be carried out. +The primary concern is the *process* through which the input is transformed into the desired output. + +Key features of procedural programming include: + +* Sequence control: the code execution process goes through the steps in a defined order, with clear starting and ending points. +* Modularity: code can be divided into separate modules or procedures to perform specific tasks, making it easier to maintain and reuse. +* Standard data structures: Procedural Programming makes use of standard data structures such as +arrays, lists, and records to store and manipulate data efficiently. +* Abstraction: procedures encapsulate complex operations and allow them to be represented as simple, high-level commands. +* Execution control: various forms of loops, branches, and jumps give more control over the flow of execution. + +To better understand procedural programming, it is useful to compare it with other prevalent +programming paradigms such as +[object-oriented programming](/object-oriented-programming/index.html) (OOP) +and [functional programming](/functional-programming/index.html) +to shed light on their distinctions, advantages, and drawbacks. + +Procedural programming uses a very detailed list of instructions to tell the computer what to do +step by step. This approach uses iteration to repeat a series of steps as often as needed. +Functional programming is an approach to problem solving that treats every computation as a +mathematical function (an expression) and relies more heavily on recursion as a primary control +structure (rather than iteration). +Procedural languages treat data and procedures as two different +entities whereas, in functional programming, code is also treated as data - functions +can take other functions as arguments or return them as results.
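The contrast described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`total_procedural`, `total_recursive`, `apply_twice`) and the sample readings are hypothetical and do not come from the lesson's codebase.

```python
from typing import Callable, Sequence


def total_procedural(values: Sequence[float]) -> float:
    """Procedural style: step through the values, updating a running total."""
    total = 0.0
    for value in values:
        total += value
    return total


def total_recursive(values: Sequence[float]) -> float:
    """Functional style: the total is a recursive expression with no mutable state."""
    if not values:
        return 0.0
    return values[0] + total_recursive(values[1:])


def apply_twice(func: Callable[[float], float], x: float) -> float:
    """Code treated as data: this function takes another function as an argument."""
    return func(func(x))


# Hypothetical sample data, used only to exercise the functions above.
readings = [1.0, 2.0, 3.0]
print(total_procedural(readings))
print(total_recursive(readings))
print(apply_twice(lambda v: v * 2, 3.0))
```

Both `total_*` functions have the same signature, so either could be swapped in without changing the calling code. The recursive version is written for clarity rather than efficiency, since each slice copies part of the list.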
+Compare and contrast [two different implementations](/functional-programming/index.html#functional-vs-procedural-programming) +of the same functionality in procedural and functional programming styles +to better grasp their differences. + +Procedural and [object-oriented programming](/object-oriented-programming/index.html) have fundamental differences in their approach to +organising code and solving problems. +In procedural programming, the code is structured around functions and procedures that execute +specific tasks or operations. Object-oriented programming is based around objects and classes, +where data is encapsulated within objects, and methods on those objects are used to manipulate that data. +Both procedural and object-oriented programming paradigms support [abstraction and modularization](/33-code-decoupling-abstractions/index.html). +Procedural programming achieves this through procedures and functions, while OOP uses classes and +objects. +However, OOP goes further by encapsulating related data and methods within objects, +enabling a higher level of abstraction and separation between different components. +Inheritance and polymorphism are two vital features provided by OOP, which are not intrinsically +supported by procedural languages. [Inheritance](/object-oriented-programming/index.html#inheritance) allows the creation of classes that inherit +properties and methods from existing classes – enabling code reusability and reducing redundancy. +[Polymorphism](/33-code-decoupling-abstractions/index.html#polymorphism) permits a single function or method to operate on multiple data types or objects, +improving flexibility and adaptability. + +The choice between procedural, functional and object-oriented programming depends primarily on +the specific project requirements and personal preference.
+Procedural programming may be more suitable for smaller projects, whereas OOP is typically +preferred for larger and more complex projects, especially when working in a team. +Functional programming can offer more elegant and scalable solutions for complex problems, +particularly in parallel computing. diff --git a/_episodes/33-programming-paradigms.md b/_extras/programming-paradigms.md similarity index 74% rename from _episodes/33-programming-paradigms.md rename to _extras/programming-paradigms.md index 520708b54..b22d8e269 100644 --- a/_episodes/33-programming-paradigms.md +++ b/_extras/programming-paradigms.md @@ -1,11 +1,10 @@ --- title: "Programming Paradigms" -start: false -teaching: 10 +teaching: 20 exercises: 0 +layout: episode questions: -- "How does the structure of a problem affect the structure of our code?" -- "How can we use common software paradigms to improve the quality of our software?" +- "What should we consider when designing software?" objectives: - "Describe some of the major software paradigms we can use to classify programming languages." keypoints: @@ -15,12 +14,10 @@ keypoints: - "A single piece of software will often contain instances of multiple paradigms." --- -## Introduction +## Programming Paradigms -As you become more experienced in software development it becomes increasingly important -to understand the wider landscape in which you operate, -particularly in terms of the software decisions the people around you made and why? -Today, there are a multitude of different programming languages, +In addition to [architectural decisions](/software-architecture-extra/index.html) on bigger components of your code, it is important +to understand the wider landscape of programming paradigms and languages, with each supporting at least one way to approach a problem and structure your code. In many cases, particularly with modern languages, a single language can allow many different structural approaches within your code. 
@@ -29,8 +26,8 @@ One way to categorise these structural approaches is into **paradigms**. Each paradigm represents a slightly different way of thinking about and structuring our code and each has certain strengths and weaknesses when used to solve particular types of problems. Once your software begins to get more complex -it's common to use aspects of different paradigms to handle different subtasks. -Because of this, it's useful to know about the major paradigms, +it is common to use aspects of different paradigms to handle different subtasks. +Because of this, it is useful to know about the major paradigms, so you can recognise where it might be useful to switch. There are two major families that we can group the common programming paradigms into: @@ -49,17 +46,17 @@ Note, however, that most of the languages can be used with multiple paradigms, and it is common to see multiple paradigms within a single program - so this classification of programming languages based on the paradigm they use isn't as strict. -## Procedural Programming +### Procedural Programming Procedural Programming comes from a family of paradigms known as the Imperative Family. With paradigms in this family, we can think of our code as the instructions for processing data. -Procedural Programming is probably the style you're most familiar with +Procedural Programming is probably the style you are most familiar with and the one we used up to this point, where we group code into *procedures performing a single task, with exactly one entry and one exit point*. In most modern languages we call these **functions**, instead of procedures - -so if you're grouping your code into functions, this might be the paradigm you're using. +so if you are grouping your code into functions, this might be the paradigm you're using. By grouping code like this, we make it easier to reason about the overall structure, since we should be able to tell roughly what a function does just by looking at its name. 
These functions are also much easier to reuse than code outside of functions, @@ -68,12 +65,12 @@ since we can call them from any part of our program. So far we have been using this technique in our code - it contains a list of instructions that execute one after the other starting from the top. This is an appropriate choice for smaller scripts and software -that we're writing just for a single use. +that we are writing just for a single use. Aside from smaller scripts, Procedural Programming is also commonly seen in code focused on high performance, with relatively simple data structures, such as in High Performance Computing (HPC). These programs tend to be written in C (which doesn't support Object Oriented Programming) -or Fortran (which didn't until recently). +or Fortran (which did not until recently). HPC code is also often written in C++, but C++ code would more commonly follow an Object Oriented style, though it may have procedural sections. @@ -84,9 +81,12 @@ because it uses functions rather than objects, but this is incorrect. Functional Programming is a separate paradigm that places much stronger constraints on the behaviour of a function -and structures the code differently as we'll see soon. +and structures the code differently as we will see soon. -## Functional Programming +You can read more in an [extra episode on Procedural Programming](/procedural-programming/index.html). + + +### Functional Programming Functional Programming comes from a different family of paradigms - known as the Declarative Family. @@ -116,10 +116,12 @@ With datasets like this, we can't move the data around easily, so we often want to send our code to where the data is instead. 
By writing our code in a functional style, we also gain the ability to run many operations in parallel -as it's guaranteed that each operation won't interact with any of the others - +as it is guaranteed that each operation won't interact with any of the others - this is essential if we want to process this much data in a reasonable amount of time. -## Object Oriented Programming +You can read more in an [extra episode on Functional Programming](/functional-programming/index.html). + +### Object Oriented Programming Object Oriented Programming focuses on the specific characteristics of each object and what each object can do. @@ -127,8 +129,8 @@ An object has two fundamental parts - properties (characteristics) and behaviour In Object Oriented Programming, we first think about the data and the things that we're modelling - and represent these by objects. -For example, if we're writing a simulation for our chemistry research, -we're probably going to need to represent atoms and molecules. +For example, if we are writing a simulation for our chemistry research, +we are probably going to need to represent atoms and molecules. Each of these has a set of properties which we need to know about in order for our code to perform the tasks we want - in this case, for example, we often need to know the mass and electric charge of each atom. @@ -145,22 +147,7 @@ Most people would classify Object Oriented Programming as an (with the extra feature being the objects), but [others disagree](https://stackoverflow.com/questions/38527078/what-is-the-difference-between-imperative-and-object-oriented-programming). -> ## So Which one is Python? -> Python is a multi-paradigm and multi-purpose programming language. -> You can use it as a procedural language and you can use it in a more object oriented way. 
-> It does tend to land more on the object oriented side as all its core data types -> (strings, integers, floats, booleans, lists, -> sets, arrays, tuples, dictionaries, files) -> as well as functions, modules and classes are objects. -> -> Since functions in Python are also objects that can be passed around like any other object, -> Python is also well suited to functional programming. -> One of the most popular Python libraries for data manipulation, -> [Pandas](https://pandas.pydata.org/) (built on top of NumPy), -> supports a functional programming style -> as most of its functions on data are not changing the data (no side effects) -> but producing a new data to reflect the result of the function. -{: .callout} +You can read more in an [extra episode on Object Oriented Programming](/object-oriented-programming/index.html). ## Other Paradigms @@ -168,8 +155,10 @@ The three paradigms introduced here are some of the most common, but there are many others which may be useful for addressing specific classes of problem - for much more information see the Wikipedia's page on [programming paradigms](https://en.wikipedia.org/wiki/Programming_paradigm). -Having mainly used Procedural Programming so far, -we will now have a closer look at Functional and Object Oriented Programming paradigms -and how they can affect our architectural design choices. + +We have mainly used Procedural Programming in this lesson, but you can +have a closer look at the [Functional](/functional-programming/index.html) and +[Object Oriented Programming](/object-oriented-programming/index.html) paradigms, +and at how they can affect our architectural design choices, in the extra episodes. 
{% include links.md %} diff --git a/_extras/protect-main-branch.md b/_extras/protect-main-branch.md index b358f726e..c9745fe86 100644 --- a/_extras/protect-main-branch.md +++ b/_extras/protect-main-branch.md @@ -1,5 +1,5 @@ --- -title: "Additional Material: Protecting the Main Branch on a Shared GitHub Repository" +title: "Protecting the Main Branch on a Shared GitHub Repository" --- ## Introduction diff --git a/_extras/software-architecture-extra.md b/_extras/software-architecture-extra.md new file mode 100644 index 000000000..550205d2c --- /dev/null +++ b/_extras/software-architecture-extra.md @@ -0,0 +1,77 @@ +--- +title: "Software Architecture" +teaching: 15 +exercises: 0 +layout: episode +questions: +- "What should we consider when designing software?" +objectives: +- "Understand the components of multi-layer software architectures." +keypoints: +- "Software architecture provides an answer to the question +'what components will the software have and how will they cooperate?'." +--- + +## Software Architecture + +**Software architecture** provides an answer to the question +"what components will the software have and how will they cooperate?". +Software engineering borrowed this term, and a few other terms, +from architects (of buildings) as many of the processes and techniques have some similarities. +One of the other important terms we borrowed is 'pattern', +such as in **design patterns** and **architecture patterns**. +This term is often attributed to the book +['A Pattern Language' by Christopher Alexander *et al.*](https://en.wikipedia.org/wiki/A_Pattern_Language) +published in 1977 +and refers to a template solution to a problem commonly encountered when building a system. + +Design patterns are relatively small-scale templates +which we can use to solve problems which affect a small part of our software. 
+For example, the **[adapter pattern](https://en.wikipedia.org/wiki/Adapter_pattern)** +(which allows a class that does not have the "right interface" to be reused) +may be useful if part of our software needs to consume data +from a number of different external data sources. +Using this pattern, +we can create a component whose responsibility is +transforming the calls for data to the expected format, +so the rest of our program doesn't have to worry about it. + +Architecture patterns are similar, +but larger scale templates which operate at the level of whole programs, +or collections of programs. +Model-View-Controller (which we chose for our project) is one of the best known architecture patterns. +Many patterns rely on concepts from [Object Oriented Programming](/object-oriented-programming/index.html). + +There are many online sources of information about design and architecture patterns, +often giving concrete examples of cases where they may be useful. +One particularly good source is [Refactoring Guru](https://refactoring.guru/design-patterns). + +### Multilayer Architecture + +One common architectural pattern for larger software projects is **Multilayer Architecture**. +Software designed using this architecture pattern is split into layers, +each of which is responsible for a different part of the process of manipulating data. 
+ +Often, the software is split into three layers: + +- **Presentation Layer** + - This layer is responsible for managing the interaction between + our software and the people using it + - May include the **View** components if also using the MVC pattern +- **Application Layer / Business Logic Layer** + - This layer performs most of the data processing required by the presentation layer + - Likely to include the **Controller** components if also using an MVC pattern + - May also include the **Model** components +- **Persistence Layer / Data Access Layer** + - This layer handles data storage and provides data to the rest of the system + - May include the **Model** components of an MVC pattern + if they're not in the application layer + +Although we have drawn similarities here between the layers of a system and the components of MVC, +they are actually solutions to different scales of problem. +In a small application, a multilayer architecture is unlikely to be necessary, +whereas in a very large application, +the MVC pattern may be used just within the presentation layer, +to handle getting data to and from the people using the software. 
+ +{% include links.md %} diff --git a/_extras/vscode.md b/_extras/vscode.md index 6796e7088..34b01b8a5 100644 --- a/_extras/vscode.md +++ b/_extras/vscode.md @@ -1,5 +1,5 @@ --- -title: "Additional Material: Using Microsoft Visual Studio Code" +title: "Using Microsoft Visual Studio Code" --- [Visual Studio Code (VS Code)](https://code.visualstudio.com/), not to be confused with [Visual Studio](https://visualstudio.microsoft.com/), diff --git a/fig/example-architecture-daigram.mermaid.txt b/fig/example-architecture-daigram.mermaid.txt new file mode 100644 index 000000000..c3ab99112 --- /dev/null +++ b/fig/example-architecture-daigram.mermaid.txt @@ -0,0 +1,18 @@ +graph TD + A[(GDrive Folder)] + B[(Database)] + C[GDrive Monitor] + C -- Checks periodically--> A + D[Download inflammation data] + C -- Trigger update --> D + E[Parse inflammation data] + D --> E + F[Perform analysis] + E --> F + G[Upload analysis] + F --> G + G --> B + H[Notify users] + I[Monitor database] + I -- Check periodically --> B + I --> H diff --git a/fig/example-architecture-diagram.svg b/fig/example-architecture-diagram.svg new file mode 100644 index 000000000..02a7ecceb --- /dev/null +++ b/fig/example-architecture-diagram.svg @@ -0,0 +1 @@ +
[SVG markup omitted: example architecture diagram. Nodes: GDrive Folder, Database, GDrive Monitor, Download inflammation data, Parse inflammation data, Perform analysis, Upload analysis, Notify users, Monitor database. Edge labels: Checks periodically, Trigger update, Check periodically.]
\ No newline at end of file