1 | 1 | { |
2 | 2 | "cells": [ |
3 | | - { |
4 | | - "cell_type": "markdown", |
5 | | - "metadata": {}, |
6 | | - "source": [ |
7 | | - "[](https://mybinder.org/v2/gh/treehouse-projects/python-introducing-pandas/master?filepath=s2n4-combining-dataframes.ipynb)" |
8 | | - ] |
9 | | - }, |
10 | 3 | { |
11 | 4 | "cell_type": "markdown", |
12 | 5 | "metadata": {}, |
|
39 | 32 | "from datetime import datetime\n", |
40 | 33 | "import os\n", |
41 | 34 | "\n", |
42 | | - "import numpy as np\n", |
43 | 35 | "import pandas as pd\n", |
44 | 36 | "\n", |
45 | 37 | "from utils import render\n", |
|
194 | 186 | "cell_type": "markdown", |
195 | 187 | "metadata": {}, |
196 | 188 | "source": [ |
197 | | - "Ideally, I'd like to see all the requests that have a matching transaction based on the users and the amount involved.\n", |
| 189 | + "I'd like to see all the requests that have a matching transaction based on the users and the amount involved.\n", |
198 | 190 | "\n", |
199 | 191 | "In order to do this we will merge both of our datasets together. \n", |
200 | 192 | "\n", |
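Note on the cell above: matching requests to transactions "based on the users and the amount involved" could be done with a multi-column merge. The following is a minimal sketch, not the notebook's actual code. The column names (`from_user`, `to_user`, `sender`, `receiver`, `amount`) and the sample rows are assumptions made up for illustration; only the `successful_requests` name appears in the notebook.

```python
import pandas as pd

# Hypothetical data: the real CashBox column names are not shown in this diff.
requests = pd.DataFrame({
    "from_user": ["alice", "bob"],   # user asking for money (assumed name)
    "to_user": ["carol", "dave"],    # user being asked (assumed name)
    "amount": [20.0, 15.0],
    "request_date": ["2018-05-01", "2018-05-02"],
})
transactions = pd.DataFrame({
    "sender": ["carol", "erin"],     # user paying (assumed name)
    "receiver": ["alice", "bob"],    # user being paid (assumed name)
    "amount": [20.0, 15.0],
    "sent_date": ["2018-05-04", "2018-05-05"],
})

# An inner merge (the default) keeps only the requests that have a
# transaction with the same pair of users and the same amount.
successful_requests = requests.merge(
    transactions,
    left_on=["from_user", "to_user", "amount"],
    right_on=["receiver", "sender", "amount"],
)
print(successful_requests)
```

Only the alice/carol request survives here, because the second request has no transaction between the same two users.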
|
338 | 330 | "source": [ |
339 | 331 | "## Gather Insights\n", |
340 | 332 | "\n", |
341 | | - "So looking at this data merged together, I'd like to see the difference between when the request was made, and when the money was actually received.\n", |
| 333 | + "So looking at this data merged together, I'd like to see the time difference between when the request was made, and when the money was actually received.\n", |
342 | 334 | "\n", |
343 | 335 | "Good news for us, pandas has very powerful date/time functionality, but in order to get there we're going to need to convert our columns. As you can see, the CSV import did not recognize our date field. **`sent_date`** and **`request_date`** are just plain old objects." |
344 | 336 | ] |
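Note on the cell above: the conversion it describes might look like the sketch below. It assumes the dates are stored as ISO-formatted strings; the sample rows and the `time_to_pay` column name are hypothetical, while **`sent_date`** and **`request_date`** come from the notebook text.

```python
import pandas as pd

# Hypothetical rows standing in for the merged data.
df = pd.DataFrame({
    "request_date": ["2018-05-01", "2018-05-03"],
    "sent_date": ["2018-05-04", "2018-05-03"],
})

# Fresh from a CSV import, both columns hold plain strings, not dates.
print(df.dtypes)

# Convert each column to pandas datetime values.
df["request_date"] = pd.to_datetime(df["request_date"])
df["sent_date"] = pd.to_datetime(df["sent_date"])

# Subtracting two datetime columns gives a timedelta column.
df["time_to_pay"] = df["sent_date"] - df["request_date"]
print(df["time_to_pay"].dt.days.tolist())
```

Passing `parse_dates=["request_date", "sent_date"]` to `pd.read_csv` is another way to get datetime columns at import time.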
|
588 | 580 | "## Further research\n", |
589 | 581 | "I saw something a little strange as I was looking through those **`successful_requests`**: I noticed a couple of what seemed like duplicated requests. I called my contact at CashBox and asked about possible duplicate requests. Sure enough, the application allows you to send multiple requests for the same amount. \n", |
590 | 582 | "\n", |
591 | | - "So this means there are probably duplicates in our **`successful_requests`** `DataFrame` because there are duplicates in the **`requests`**. There most like is only one transaction that fulfills the request, but there could be multiple requests that match. Our merge created duplication as well.\n", |
| 583 | + "So this means there are probably duplicates in our **`successful_requests`** `DataFrame` because there are duplicates in the **`requests`**. There is most likely only one transaction that fulfills the request, but there could be multiple requests that match. Our merge brought that duplication across as well.\n", |
592 | 584 | "\n", |
593 | 585 | "Let's explore the possible duplicates in the **`requests`** `DataFrame`. There is a method [`DataFrame.duplicated`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html) that will return a boolean `Series` which we can use as an index. A `keep` parameter is available which is used to choose which of the duplicated rows to mark as a duplicate. You can mark the first, last or all of them." |
594 | 586 | ] |
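Note on the cell above: a small sketch of how `DataFrame.duplicated` and its `keep` parameter behave. The column names and rows are invented for illustration and are not the CashBox data. One detail worth spelling out: `keep` names the occurrence that is left *unmarked*, so `keep="first"` flags every repeat except the first one, and `keep=False` flags all of them.

```python
import pandas as pd

# Hypothetical requests; rows 0, 1 and 3 are identical.
requests = pd.DataFrame({
    "from_user": ["a", "a", "b", "a"],
    "to_user": ["x", "x", "y", "x"],
    "amount": [10.0, 10.0, 5.0, 10.0],
})
cols = ["from_user", "to_user", "amount"]

# keep="first": the first occurrence is not marked, later repeats are.
marked_after_first = requests.duplicated(subset=cols, keep="first")

# keep="last": the last occurrence is not marked, earlier repeats are.
marked_before_last = requests.duplicated(subset=cols, keep="last")

# keep=False: every row that has a twin is marked.
marked_all = requests.duplicated(subset=cols, keep=False)

# The boolean Series can be used as an index to pull out the marked rows.
dupes = requests[marked_all]
print(dupes)
```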
|