Syllabus

Schedule a time to visit with me, if needed.
You might enjoy reading my course and teaching paradigm.

Overview

By the end of the semester, each student will be able to:

Explore and interpret distributed data at scale in business contexts, building upon previously learned data science methodologies (e.g., Databricks and PySpark).

Implement the data engineering pipeline from API ingestion through feature engineering to containerized applications.

Present data-informed arguments for technical team decision-making.

Identify the differences and benefits of current industry technologies for big data storage and analysis.

The course follows these principles of teaching data science:

Organize the course around a set of diverse case studies

Integrate computing into every aspect of the course

Teach abstraction, but minimize reliance on mathematical notation

Structure course activities to realistically mimic a data scientist’s experience

Demonstrate the importance of critical thinking/skepticism through examples

Nice general outcomes. What tools are we using?

We use Polars, PySpark, and Spark SQL within Docker, on the Google Cloud Platform (GCP), and with Databricks. We will also rely heavily on Git and GitHub in our class and team collaborations.

What are the assignments?

The semester is relatively open-ended regarding the work you submit for evaluation. We will generally work in groups, with each class participant submitting individual work at various times throughout the semester.

Review the current projects here.

Often, the following challenges occur.

Your tools coding challenge (data munging and visualization)
Spark Team Feature Build Challenge (with team)
Spark Coding Challenges (data munging and feature creation)
In-class written code challenges (yes, with paper and pencil)
Analytics Deployment App (Streamlit or Marimo, Docker, and GCP)
Team 30-120 minute tools and code training

Is this course a data engineering course?

It is a 'big data programming and analytics' course for data scientists. We use the same tools as data engineers; however, we focus on how data scientists apply these big data tools for management and business decision-making.

Data engineering is ‘big client’ work (building pipelines and tools that touch thousands of people) with small daily changes (refining systems and delivering quicker results). In contrast, data science is ‘small client’ work (addressing the needs of tens of people in management) with ‘big change’ in modeling and data sources (proposing the latest methods and demoing the data munging and value).

A data engineer would spend more time talking with IT and CS partners. Also, they would interact heavily with the data scientists. The data engineer would translate for the data scientist into the IT and CS domain, and the data scientist would translate for the data engineer into the business and business needs domain. A data scientist would spend less time talking with IT and CS than a data engineer.

With that said, the course is sufficiently open-ended to propose additional data engineering applications if they meet the needs of the larger project we are tackling.

Competency Assumptions

We assume that you have experience with data science programming in Python as practiced in DS 250. You will also need a background in data science programming in R as practiced in DS 350 or experience with machine learning as practiced in CSE 450. You can see all the prerequisites at the BYU-I Catalog.

Course Format

This course assumes that you are capable of guided learning and working in teams.

What does guided learning and working in teams mean?

The class runs like a start-up. We will work together to solve big-data problems as a ‘company.’ We are mandated to learn ‘big data programming’ and tackle complex data science problems. At the end of the semester, we should all feel more comfortable with Polars, visualization, PySpark, Spark SQL, Docker, GCP, and Databricks. Our team will choose how we get from week 1 to week 13. You should not expect anything about this class to be ‘traditional’ in the context of academia. For example:

We will all take turns providing guides on how to use the tools
We will work in smaller teams, but as a class, to make decisions about our projects and work
The class will be treated as working group meetings to mimic my industry experience as much as possible
If you need someone to give you due dates and precisely what you should read or do each night, this class will push you into a new paradigm for education. You can read student feedback to see how some have responded to the process of this course.

Hopefully, you will see how your previous data science and design thinking courses provide a foundation for building, learning, and developing with big data tools, gaining empathy for our data and clients, ideating proposed solutions, and prototyping our end products.

You can read more about the design thinking process to better understand what will occur this semester.

Preparation

You are not assigned weekly readings. However, you are expected to spend 6 hours each week outside class improving your Spark skills. You are welcome to find your own resources if you don't want to use our curated list of resources. You can also work in your teams to create a study timeline to manage your resources. You are expected to pace yourself and set a learning timeline.

Class Time

The goal is to avoid traditional in-class lectures. We will use class time for the following team activities.

Presentation development: Almost all work will be done with a partner or in teams. Each project done during the class will require a presentation.
Decision point presentations: Generally, these presentations will focus on class decision points where all groups will agree on a joint approach moving forward.
Individual development projects:
Programming training: As we decide on the learning proposals, the smaller student teams will take responsibility for developing a short activity for the class.

Presentations

These presentations are not expected to be high-impact proposals with highly polished slides. However, they should be organized and clear, because your slides need to persuade the class to move with your group's decision.

You can read more about small group presentations to ensure your team is prepared.

Learning and Training

Each partner/group will provide one 15-120 minute training on the class-selected learning topics. These presentations should include a hands-on coding activity and be self-contained within our devotional GitHub repository within our DS 460 GitHub organization.

Grading

The grading system's influence on our thinking is a side effect of mass learning and academia. We are in a class at an accredited university and will have to manage this side effect. However, we don’t have to let it control our learning, thinking, or work. Discovering and practicing pertinent industry skills should motivate each activity.

Class performance is tracked in four areas: impact, involvement, hours, and understanding. These areas generally map to how your future employer will value you. Each area is essential to maximizing perceived performance, but not all areas need to be exceptional to earn the highest marks in this course or to succeed in the industry.

Impact

If your team doesn't understand why they need your services, they will eventually not need you.

Concept: Your team will make decisions and assignments. Ensure your team feels that you are an equal contributor. Contributors are measured by the extent to which they assume responsibilities and deliver on them. It is ok to contribute more to some projects and less to others, but your team should feel like you typically make significant contributions.
Class: A primary contributor is defined by providing at least as much material and results for the project as 50% of the group members. An active contributor is a team member who makes some impact and is involved in the project life cycle.

Involvement

If your team and manager don't see and hear your ideas and work, they will question your leadership and interest.

Concept: Do your work before the class meetings and come prepared to listen and direct the planning. Class meetings are not a time to remain silent out of politeness or to avoid appearing foolish. Get involved, ask questions, and provide answers.
Class: The specifics of this element are harder to define in advance. I will contact you directly if you are not meeting expectations in your group involvement.

Hours

Putting in the time is the best predictor of success.

Concept: Most employers expect you to work many hours each week. If they only wanted specified products, they would hire consultants to deliver the product. As a full-time employee, you will be given the space to explore new domains and then guide the group in their implementation. However, you must guide your work. As a data scientist, each day will have new and unique challenges.
Class: Full-time employment for a 3-credit class at BYU-I is 9 hours a week (6 outside & 3 inside class). Putting in full hours all semester will be a crucial element in defining your final grade. Excellent performance in the other three areas could help you achieve the highest marks without meeting the total hours (Generally, you will need to put in hours to do well on the other three).
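
As a rough check on the hours arithmetic above, here is a minimal Python sketch. The 6 and 3 hour figures come from this syllabus; the 13-week semester length is an assumption drawn from the "week 1 to week 13" phrasing under Course Format, and the names are illustrative only.

```python
# Weekly hours stated in this syllabus for a 3-credit class.
HOURS_OUTSIDE_CLASS = 6
HOURS_IN_CLASS = 3

# Assumption: a 13-week semester, based on the "week 1 to week 13" phrasing.
ASSUMED_WEEKS = 13


def expected_total_hours(weeks=ASSUMED_WEEKS):
    """Total hours if you put in the full weekly load for every week."""
    return (HOURS_OUTSIDE_CLASS + HOURS_IN_CLASS) * weeks


print(expected_total_hours())  # 117
```

Under that assumption, working the full weekly load all semester lands a little above the 110 hours listed for an A in the Competency Scale.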

Understanding

You should know how to do things. But not everything.

Concept: When you are on a team, you should earn a reputation for knowledge in a few specific areas. You want to be the person that everyone knows they can ask to get the correct answer. You can find your niche and hone your skills. You should find moments to offer your help in these areas.
Class: We will have coding challenges during the semester. Some will take multiple days, the entire class period, or a few minutes before we start class. All challenges will be announced at least 24 hours before the class period they occur, along with a programming topic.
Class: We will choose assignments from DS 350 and CSE 450 to replicate using the medium- and big-data APIs we are learning in this class. You will need to complete all assigned replication projects.
Class: Training devotionals must be provided by each student.

Competency Scale

The tables below summarize the specifications-based grading for the course. You should read the details below for further understanding.

Grade	Hours	Challenges	Oral/Written Challenge	Replication	Involvement	Impact
A	110	4 key* & 3 or higher	pass 3	All complete	< 3 warnings & < 3.1 hours class missing	Active all & primary > 2
B	90	3 key* & 3's on most	pass 2	< 2 missing	< 9.1 hours class missing or write-up	Active most & primary > 1
C	70	3 anytime	pass 1	< 3 missing	< 4 warnings	Active often & primary > 0
D	50	--	--	--	--	--

*Key challenges are any PySpark challenges and the app challenge at the end of the semester. Replication projects may or may not happen during the semester; if none happen, you have completed them all.

A Details:

Hours: 110
Challenges: A satisfactory score (3) on all the challenges and at least a near perfect score (3.7 or higher) on the key challenges. All challenges must be completed.
Replication: All replication assignments completed with full credit.
Involvement: Two or fewer conversations from me or the TA about your lack of participation or preparedness. Missing class fewer than three times.
Impact: Multiple projects where you were recognized as a primary contributor (making more impact than half of your team). All projects include your fingerprints.

B Details:

Hours: 90
Challenges: A satisfactory score on more than half of the challenges and all key challenges.
Replication: All but one replication assignment completed with full credit.
Involvement: Missing no more than 9 hours of class, or receiving a write-up for low engagement.
Impact: At least one project where you were recognized as a primary contributor (making more impact than half of your team), and most projects include your fingerprints.

C Details:

Hours: 70
Challenges: A satisfactory score on at least one coding challenge.
Replication: All but two replication assignments completed with full credit.
Involvement: Three or fewer conversations from me or the TA about your lack of participation or preparedness.
Impact: At least one project where you were recognized as a primary contributor (making more impact than half of your team). Significant participation in at least half of the projects.

D Details:

Hours: 50
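
The Hours column of the Competency Scale can be read as a simple lookup. The sketch below is one way to express it in Python, assuming each listed number is an inclusive minimum; the function name is hypothetical. It covers only the Hours area, so it does not determine a final grade, which also depends on the other areas and on any negotiation in your grade request letter.

```python
# Hour thresholds copied from the Competency Scale, highest grade first.
HOUR_THRESHOLDS = [("A", 110), ("B", 90), ("C", 70), ("D", 50)]


def hours_tier(hours):
    """Return the highest grade tier whose hour threshold is met.

    Assumption: thresholds are inclusive minimums. Returns None when the
    hours fall below every listed tier.
    """
    for grade, minimum in HOUR_THRESHOLDS:
        if hours >= minimum:
            return grade
    return None


# Hour totals taken from the example arguments in this syllabus.
print(hours_tier(119))  # A
print(hours_tier(107))  # B
print(hours_tier(50))   # D
```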

Coding Challenges & Replication Projects

The coding challenges and replication projects will be graded on a four-point scale:

1. Submitted work.
2. Some code aligns with the challenge.
3. Strong performance with satisfactory code.
4. Near flawless performance with clean and concise code.
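
For reference, the scale above can be written as a small Python lookup. This is a sketch under two assumptions: the levels are numbered 1 through 4 in the order listed, and a score of 3 or higher counts as satisfactory, as the A Details describe. The helper names are hypothetical.

```python
# Four-point scale from this syllabus, assumed to be numbered in listed order.
SCALE = {
    1: "Submitted work.",
    2: "Some code aligns with the challenge.",
    3: "Strong performance with satisfactory code.",
    4: "Near flawless performance with clean and concise code.",
}

# Assumption: "satisfactory" means a score of 3 or higher.
SATISFACTORY = 3


def is_satisfactory(score):
    """True when a challenge score meets the satisfactory level."""
    return score >= SATISFACTORY


def count_satisfactory(scores):
    """Count how many scores in a list are satisfactory."""
    return sum(1 for s in scores if is_satisfactory(s))


print(SCALE[3])                               # Strong performance with satisfactory code.
print(count_satisfactory([2, 3, 3.7, 4, 1]))  # 3
```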

Grade Request Letter

At the end of the semester, you will need to submit a completed grade request letter. This may be a new concept to some of you. Please review the example of a poorly worded letter with a discussion.

Negotiating Competency Grade

If you feel you have greatly exceeded one of the competency areas, you can use that excess to negotiate a shortcoming in a different competency. Here are a few examples you could argue (These are example arguments and are not intended to signify a path to the grade requested).

I achieved only a satisfactory score on my final coding challenge, but I completed 119 hours and was a key contributor to 5 projects. As such, I request an A.

I was only recognized as a key contributor on one project. However, I worked 107 hours and stayed involved in all work during class. As such, I request a B.

I worked only 50 hours in this class. However, I got all 3s on my coding challenges and a 4 on the final coding challenge. Also, I was a key contributor on 5 projects and never missed class. I request an A-.
