- Load the CSV file
  - Result is a Pandas dataframe
- Input:
  - Critic Score (`Float` with `NaN`s)
  - Publisher (`Category`)
  - User Score (`Float` with `NaN`s)
  - Genre (`Category`)
  - Name (`String`)
  - User Count (`Float` with `NaN`s)
  - Critic Count (`Float` with `NaN`s)
- Output: Global Sales (`Float`)
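As a rough sketch of the loading step, the snippet below reads a tiny in-memory CSV instead of the real file. The rows are made up, and the snake_case column names (`critic_score`, `global_sales`, and so on) are assumptions about how the real file is laid out.

```python
import io

import pandas as pd

# Made-up stand-in for the real CSV file; column names are assumptions.
csv_text = """name,genre,publisher,critic_score,critic_count,user_score,user_count,global_sales
Kart Legends,Racing,Acme,76,51,8.0,322,1.50
Space Quest,Puzzle,Globex,,,,,0.40
Super Kart Racing,Racing,Acme,90,40,7.5,120,3.10
"""

# read_csv accepts a file path or any file-like object; empty fields become NaN.
df = pd.read_csv(io.StringIO(csv_text))

# Split the dataframe into the input columns and the output column.
input_cols = ["critic_score", "publisher", "user_score", "genre",
              "name", "user_count", "critic_count"]
X = df[input_cols]
y = df["global_sales"]

print(X.dtypes)        # numeric columns with gaps are loaded as floats
print(X.isna().sum())  # number of NaNs per input column
```

With the real data, `io.StringIO(csv_text)` would be replaced by the path to the CSV file.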
- Input:
  - Tokenize Names - Bag of Words
    - A loop builds the vocab, then counts how many times each word appears
    - Put the Bag-of-Words code here later
    - Result: a list of `Int`s; the number of `Int`s in that list is $n_b$ (the vocab size)
      - You will often see something like this: max vocab: 200 words (drop infrequent)
      - This keeps the 200 most common words and drops the ones that don't show up a lot
    - NOTE: Sometimes you may see an `error` that looks something like this: `Mat1 * Mat2, Shapes (1007, 2) and (4, 97)`. The inner dimensions (2 and 4) have to match, which is why the shape of your data matters
    - Collect results from all rows: (16720, $n_b$)
      - Save as a tensor and call it `t_names`
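One possible shape for the Bag-of-Words step is sketched below. It is not the project's code: the names are made up, the whitespace tokenizer is an assumption, and `MAX_VOCAB` is set to 3 rather than 200 so the effect of dropping infrequent words is visible.

```python
from collections import Counter

import torch

# Made-up game names standing in for the Name column.
names = ["Super Kart Racing", "Super Kart Racing 2", "Super Space Quest", "Kart Legends"]
MAX_VOCAB = 3  # the notes mention 200; kept tiny here on purpose


def tokenize(name):
    # Assumed tokenizer: lowercase, then split on whitespace.
    return name.lower().split()


# Pass 1: count every word across all names and keep the most common ones.
word_counts = Counter(tok for name in names for tok in tokenize(name))
vocab = [word for word, _ in word_counts.most_common(MAX_VOCAB)]
word_to_idx = {word: i for i, word in enumerate(vocab)}

# Pass 2: for each name, count how often each vocab word appears.
rows = []
for name in names:
    counts = [0] * len(vocab)
    for tok in tokenize(name):
        if tok in word_to_idx:  # words outside the vocab are dropped
            counts[word_to_idx[tok]] += 1
    rows.append(counts)

# One row per name, one column per vocab word: shape (len(names), n_b).
t_names = torch.tensor(rows, dtype=torch.float32)
print(vocab, t_names.shape)
```

Every row has the same length $n_b$ no matter how long the name is, which is what lets the rows stack into a single tensor.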
  - Take `user_score` column
    - Options:
      - Remove rows with `NaN`s - this Pandas function is `dropna()`
      - Imputation: inject missing values
        - Use the average value to fill in all missing values - the Pandas function is `fillna()`
        - Make it zero
        - Train a model on the other columns to predict the `user_score` (do not use `global_sales`)
    - Result: tensor (16720, 1)
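The first three options can be sketched on a small made-up dataframe as follows. The values are invented, and the model-based imputation option is left out because it needs a trained model.

```python
import numpy as np
import pandas as pd
import torch

# Made-up data; only the column names come from the notes above.
df = pd.DataFrame({
    "user_score": [8.0, np.nan, 6.0, np.nan],
    "global_sales": [1.2, 0.4, 3.1, 0.9],
})

# Option 1: drop rows whose user_score is missing (this shrinks the dataset).
dropped = df.dropna(subset=["user_score"])

# Option 2a: fill with the column average; mean() skips NaNs by default.
mean_filled = df["user_score"].fillna(df["user_score"].mean())

# Option 2b: fill with zero.
zero_filled = df["user_score"].fillna(0.0)

# Turn the chosen column into a (rows, 1) tensor.
t_user_score = torch.tensor(mean_filled.to_numpy(), dtype=torch.float32).unsqueeze(1)
print(len(dropped), mean_filled.tolist(), t_user_score.shape)
```

Only the imputation options keep the row count unchanged; dropping rows means every other feature tensor has to drop the same rows.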
  - Take `publisher` column
    - One-Hot encoding without having to tokenize it - this Pandas function is `get_dummies()`
    - Result: (16720, $n_p$) - $p$ for publisher
  - Take `genre`
    - Result: (16720, $n_g$) - $g$ for genre
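A minimal sketch of one-hot encoding both category columns with `get_dummies()`. The publisher and genre values are made up, and applying the same call to `genre` is an assumption, since the notes only name the function for `publisher`.

```python
import pandas as pd
import torch

# Made-up categories standing in for the real columns.
df = pd.DataFrame({
    "publisher": ["Acme", "Globex", "Acme", "Initech"],
    "genre": ["Racing", "Puzzle", "Puzzle", "Racing"],
})

# One column per distinct value; dtype makes the result numeric instead of bool.
publisher_onehot = pd.get_dummies(df["publisher"], dtype="float32")
genre_onehot = pd.get_dummies(df["genre"], dtype="float32")

t_publisher = torch.tensor(publisher_onehot.to_numpy())  # shape (rows, n_p)
t_genre = torch.tensor(genre_onehot.to_numpy())          # shape (rows, n_g)
print(list(publisher_onehot.columns), t_publisher.shape, t_genre.shape)
```

Each row contains exactly one 1, in the column that matches its category.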
  - Figure out the total size: `torch.cat([t_names, t_user_score, ...], dim=1)`
    - Result: (16720, $n_b + 1 + 1 + 1 + n_p + n_g$)
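To illustrate how the widths add up, the sketch below concatenates placeholder tensors along the feature dimension. The sizes are invented, `t_publisher` and `t_genre` are assumed names, and only some of the feature tensors are included.

```python
import torch

n_rows = 5  # stands in for 16720

# Placeholder feature tensors; the widths are made up for illustration.
t_names = torch.zeros(n_rows, 4)       # n_b = 4
t_user_score = torch.zeros(n_rows, 1)  # a single numeric column
t_publisher = torch.zeros(n_rows, 3)   # n_p = 3
t_genre = torch.zeros(n_rows, 2)       # n_g = 2

# dim=1 joins the tensors side by side; the row counts must all match.
X = torch.cat([t_names, t_user_score, t_publisher, t_genre], dim=1)

input_dim = X.shape[1]  # 4 + 1 + 3 + 2
print(X.shape, input_dim)
```

Without `dim=1`, `torch.cat` joins along the rows and fails here because the tensors have different widths.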
- Build a neuron: `model = nn.Linear(input_dim, 1)`
  - Where `input_dim` refers to the number of input features that the model expects to receive for each individual sample in the dataset, here the width of the concatenated input tensor.
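A small sketch of the single neuron on random data, assuming an `input_dim` of 10. It also reproduces the kind of shape error mentioned in the NOTE earlier by passing a batch with the wrong number of features.

```python
import torch
from torch import nn

input_dim = 10                  # assumed feature count for this sketch
X = torch.randn(5, input_dim)   # random stand-in for the real input tensor

# One linear neuron: input_dim weights plus one bias, producing one output.
model = nn.Linear(input_dim, 1)
pred = model(X)                 # shape (5, 1): one prediction per row

# Feeding the wrong number of features raises a shape-mismatch RuntimeError.
try:
    model(torch.randn(5, input_dim + 1))
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = str(err)

print(pred.shape)
print(mismatch_error)
```

The number of rows can vary freely between calls; only the feature count has to equal `input_dim`.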
Using a Neural Net to predict the sales output based on data from a CSV file.