This project analyzes three separate datasets with the following objectives:
- CMSU Student Survey: Use probability and conditional probability analysis to examine gender distribution, major choices, graduation intentions, and other factors among CMSU students.
- Shingle Moisture Analysis: Conduct hypothesis testing on moisture content in two types of asphalt shingles to ensure product quality.
- Salary Data Study: Use ANOVA to understand salary variations by education level and occupation type and test for interaction effects.
The project uses the following datasets:
- A+%26+B+shingles.csv: Contains moisture content data for two types of shingles (A and B).
- SalaryData.csv: Records salary information, educational qualifications, and occupation levels.
- Survey.csv: Responses from 62 CMSU undergraduate students to a 14-question survey.
- CMSU Student Survey
  - Objective: Calculate probabilities for student characteristics, such as:
    - Gender distribution and major choices among male and female students.
    - Graduation intentions, GPA distribution, and employment status.
  - Key Questions:
    - What is the probability that a randomly selected CMSU student is male or female?
    - What is the conditional probability of each major, given that the student is male or female?
    - What are the probabilities associated with GPA, graduation intentions, and computer ownership?
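
The survey questions above come down to marginal and conditional probabilities, which can be sketched with a pandas cross-tabulation. This is a minimal illustration, not the notebook's code: the rows are toy data, and the column names `Gender` and `Major` are assumptions about how `Survey.csv` is laid out.

```python
import pandas as pd

# Toy rows standing in for Survey.csv (illustrative only, not the real data).
# The column names "Gender" and "Major" are assumptions.
survey = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Major": ["Accounting", "Accounting", "Management",
              "Management", "Management", "Accounting"],
})

# Marginal probability P(Gender): share of each gender among all respondents.
p_gender = survey["Gender"].value_counts(normalize=True)

# Conditional probability P(Major | Gender): normalize="index" makes each
# gender row sum to 1, so each cell is the share of that major within a gender.
p_major_given_gender = pd.crosstab(
    survey["Gender"], survey["Major"], normalize="index"
)

print(p_gender)
print(p_major_given_gender)
```

Switching to `normalize="columns"` would instead give P(Gender | Major), so the choice of axis determines which event is being conditioned on.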
- Shingle Moisture Analysis
  - Objective: Determine whether the moisture content in shingles A and B meets quality standards.
  - Key Questions:
    - Is the mean moisture content of each shingle type below the permissible limit of 0.35 pounds per 100 square feet?
    - Are the mean moisture levels in shingles A and B equal?
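
The two shingle questions above map onto a one-sample t-test against the limit and a two-sample t-test between types. The sketch below uses SciPy on toy readings, not the values in the shingles file. The direction of the one-sided alternative (mean below 0.35) and the equal-variance setting are assumptions made for illustration; the notebook may frame the hypotheses differently.

```python
from scipy import stats

# Toy moisture readings in pounds per 100 square feet (illustrative only).
shingle_a = [0.31, 0.29, 0.33, 0.30, 0.32, 0.28]
shingle_b = [0.27, 0.30, 0.26, 0.29, 0.25, 0.28]

LIMIT = 0.35  # permissible moisture limit stated in the project description

# One-sample, one-sided test for shingle A.
# Assumed hypotheses: H0: mean >= LIMIT, H1: mean < LIMIT.
t_a, p_a = stats.ttest_1samp(shingle_a, popmean=LIMIT, alternative="less")

# Two-sample, two-sided test. H0: mean(A) == mean(B).
# equal_var=True (pooled variance) is an assumption; use equal_var=False
# for Welch's test if the variances may differ.
t_ab, p_ab = stats.ttest_ind(shingle_a, shingle_b, equal_var=True)

print(f"A vs limit: t={t_a:.3f}, p={p_a:.4f}")
print(f"A vs B:     t={t_ab:.3f}, p={p_ab:.4f}")
```

A p-value below the chosen significance level (commonly 0.05) would lead to rejecting the corresponding null hypothesis.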
- Salary Data Study
  - Objective: Assess the effect of educational qualification and occupation on salary using statistical tests.
  - Key Questions:
    - Does mean salary differ significantly across education levels?
    - Does mean salary differ significantly across occupations?
    - Is there an interaction between education and occupation in their effect on salary?
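
The three salary questions above correspond to the two main effects and the interaction term of a two-way ANOVA. A minimal sketch with statsmodels follows. The rows are toy data, and the column names `Education`, `Occupation`, and `Salary`, as well as the category labels, are assumptions rather than confirmed contents of `SalaryData.csv`.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Toy rows standing in for SalaryData.csv (illustrative only).
# Column names and category labels are assumptions.
salary = pd.DataFrame({
    "Education": ["Bachelors"] * 4 + ["Doctorate"] * 4,
    "Occupation": ["Sales", "Sales", "Exec", "Exec"] * 2,
    "Salary": [50, 52, 60, 63, 70, 71, 95, 99],  # in thousands
})

# "A * B" in the formula expands to both main effects plus the A:B interaction.
model = ols("Salary ~ C(Education) * C(Occupation)", data=salary).fit()

# Type II sums of squares. The table has one row per effect plus "Residual".
anova_table = anova_lm(model, typ=2)
print(anova_table)
```

The `PR(>F)` column holds the p-value for each row. A small value in the `C(Education):C(Occupation)` row would suggest that the effect of education on salary depends on occupation.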
The project employs several statistical methods:
- Probability & Conditional Probability: To explore gender, major choices, and other student characteristics.
- Hypothesis Testing (t-tests): To test mean moisture content against the permissible limit and to compare the means of shingles A and B.
- ANOVA (Analysis of Variance): To analyze salary differences by education and occupation levels, and to assess any interaction effects.
- Gender Distribution: Calculated probabilities for selecting a male or female student.
- Major Preferences: Analyzed conditional probabilities for majors among male and female students.
- Other Insights: Computed probabilities related to graduation intentions, GPA, and employment status among students.
- Moisture Content: Tested whether shingles A and B met the moisture standard using hypothesis testing.
- Comparison: Tested for equal means between shingles A and B to ensure quality consistency.
- Salary by Education Level: Found significant differences in salary based on education.
- Salary by Occupation: Observed variation in salaries across occupation types.
- Interaction Effect: Discovered an interaction between education and occupation affecting salary outcomes, highlighting how education level impacts salary differently across occupations.
- A+%26+B+shingles.csv: Moisture content data for shingles A and B.
- SalaryData.csv: Data on salary, educational qualifications, and occupation.
- Survey.csv: Survey responses from CMSU students.
- AS_Extended_Project_Guided+_Template_Notebook+solution.ipynb: Jupyter notebook containing the data analysis and solution code.
- AS_EXTENDED+PROJECT (1).pdf: Business report summarizing the analysis, results, and business implications.