|
1 | 1 | { |
2 | 2 | "cells": [ |
3 | | - { |
4 | | - "cell_type": "markdown", |
5 | | - "metadata": {}, |
6 | | - "source": [ |
7 | | - "[](https://mybinder.org/v2/gh/treehouse-projects/python-introducing-pandas/master?filepath=s2n5-handling-missing-and-duplicated-data.ipynb)" |
8 | | - ] |
9 | | - }, |
10 | 3 | { |
11 | 4 | "cell_type": "markdown", |
12 | 5 | "metadata": {}, |
|
44 | 37 | "transactions = pd.read_csv(os.path.join('data', 'transactions.csv'), index_col=0)\n", |
45 | 38 | "requests = pd.read_csv(os.path.join('data', 'requests.csv'), index_col=0)\n", |
46 | 39 | "\n", |
47 | | - "# Perform the merge from the previous notebook (s2n4-combining-dataframes.ipynb)\n", |
| 40 | + "# Perform the merge from the previous notebook (s2n6-combining-dataframes.ipynb)\n", |
48 | 41 | "successful_requests = requests.merge(\n", |
49 | 42 | " transactions,\n", |
50 | 43 | " left_on=['from_user', 'to_user', 'amount'], \n", |
|
64 | 57 | "source": [ |
65 | 58 | "## Duplicated Data\n", |
66 | 59 | "\n", |
67 | | - "We realized in our the previous notebook (s2n4-combining-dataframes.ipynb) that the **`requests`** `DataFrame` had duplicates. Unfortunately this means that our **`successful_requests`** also contains duplicates because we merged those same values with a transaction, even though in actuality, only one of those duplicated requests should be deemed \"successful\".\n", |
| 60 | + "We realized in our the previous notebook (s2n6-combining-dataframes.ipynb) that the **`requests`** `DataFrame` had duplicates. Unfortunately this means that our **`successful_requests`** also contains duplicates because we merged those same values with a transaction, even though in actuality, only one of those duplicated requests should be deemed \"successful\".\n", |
68 | 61 | "\n", |
69 | | - "We should correct our `DataFrame` by removing the duplicate requests, keeping only the last one, as that is really the one that triggered the actual transaction. The great news is that there is a method named [`drop_duplicates`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html) that does just that. Like `duplicated` there is a `keep` parameter that works similarly, you tell it which of the duplicates to keep. " |
| 62 | + "We should correct our `DataFrame` by removing the duplicate requests, keeping only the last one, as that is really the one that triggered the actual transaction. The great news is that there is a method named [`drop_duplicates`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html) that does just that. \n", |
| 63 | + "\n", |
| 64 | + "Like `duplicated` there is a `keep` parameter that works similarly, you tell it which of the duplicates to keep. " |
70 | 65 | ] |
71 | 66 | }, |
72 | 67 | { |
|
88 | 83 | "source": [ |
89 | 84 | "# Let's get our records sorted chronologically\n", |
90 | 85 | "successful_requests.sort_values('request_date', inplace=True) \n", |
| 86 | + "\n", |
91 | 87 | "# And then we'll drop dupes keeping only the last one. Note the call to inplace \n", |
92 | 88 | "successful_requests.drop_duplicates(('from_user', 'to_user', 'amount'), keep='last', inplace=True)\n", |
| 89 | + "\n", |
93 | 90 | "# Statement from previous notebook\n", |
94 | 91 | "\"Wow! ${:,.2f} has passed through the request system in {} transactions!!!\".format(\n", |
95 | 92 | " successful_requests.amount.sum(),\n", |
|
363 | 360 | "source": [ |
364 | 361 | "## Locating Missing Data\n", |
365 | 362 | "\n", |
366 | | - "As I was looking at these people who hadn't made requests I noticed that a few of them had a Not A Number (`np.nan`) for a **`last_name`**.\n", |
| 363 | + "As I was looking at these people who hadn't made requests I noticed that a few of them had a NaN (Not A Number) for a **`last_name`**.\n", |
367 | 364 | "\n", |
368 | 365 | "We can get a quick overview of how many blank values we have by using the [`DataFrame.count`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.count.html)\n" |
369 | 366 | ] |
|
529 | 526 | }, |
530 | 527 | { |
531 | 528 | "cell_type": "code", |
532 | | - "execution_count": 7, |
| 529 | + "execution_count": 9, |
533 | 530 | "metadata": {}, |
534 | 531 | "outputs": [ |
535 | 532 | { |
|
573 | 570 | "Index: []" |
574 | 571 | ] |
575 | 572 | }, |
576 | | - "execution_count": 7, |
| 573 | + "execution_count": 9, |
577 | 574 | "metadata": {}, |
578 | 575 | "output_type": "execute_result" |
579 | 576 | } |
580 | 577 | ], |
581 | 578 | "source": [ |
582 | 579 | "# Make a copy of the DataFrame with \"Unknown\" as the last name where it is missing\n", |
583 | 580 | "users_with_unknown = users.fillna('Unknown')\n", |
| 581 | + "\n", |
584 | 582 | "# Make sure we got 'em all\n", |
585 | 583 | "users_with_unknown[users_with_unknown.last_name.isna()]" |
586 | 584 | ] |
|
598 | 596 | }, |
599 | 597 | { |
600 | 598 | "cell_type": "code", |
601 | | - "execution_count": 9, |
| 599 | + "execution_count": 10, |
602 | 600 | "metadata": {}, |
603 | 601 | "outputs": [ |
604 | 602 | { |
|
607 | 605 | "(475, 430)" |
608 | 606 | ] |
609 | 607 | }, |
610 | | - "execution_count": 9, |
| 608 | + "execution_count": 10, |
611 | 609 | "metadata": {}, |
612 | 610 | "output_type": "execute_result" |
613 | 611 | } |
614 | 612 | ], |
615 | 613 | "source": [ |
616 | 614 | "users_with_last_names = users.dropna()\n", |
| 615 | + "\n", |
617 | 616 | "# Row counts of the original \n", |
618 | 617 | "(len(users), len(users_with_last_names))" |
619 | 618 | ] |
|