gkdevops/python-data-engineer

Python Data Engineer Learning Repository

Welcome to the Python Data Engineer learning repository! This repo contains a structured, practical set of Jupyter notebooks and example projects for learning core Python concepts with a focus on data engineering.

Note: This summary is based on the top-level files; for a full list of all tutorials and scripts, check the GitHub repository contents.


📚 Topics Covered

  • Overview: Introduction to Python, variables, data types, and basic operations.
  • Key Concepts:
    • Printing and string manipulation
    • Variable assignment and naming
    • Numeric, string, and boolean data types
    • Type conversion, built-in functions, and string methods
    • List basics and common list operations
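
A minimal sketch of these basics, using made-up sample values (the names `raw_count`, `city`, and `temps` are hypothetical and not taken from the notebooks):

```python
# Hypothetical sample values, chosen only for illustration.
raw_count = "42"                    # values read from files often arrive as strings
count = int(raw_count)              # type conversion: str -> int

city = "  new york "
clean_city = city.strip().title()   # string methods: trim whitespace, then title-case

is_large = count > 10               # a comparison produces a boolean

temps = [21.5, 19.0, 23.1]
temps.append(20.4)                  # add an item to the end of the list
temps.sort()                        # sort the list in place

print(clean_city, count, is_large, temps[0], len(temps))
```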

  • Overview: Mastering conditional statements for decision making.
  • Key Concepts: if, elif, else, comparison and logical operators, nested conditions.
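
As an illustration, the sketch below combines `if`, `elif`, `else`, and a logical operator in one function. The function name `classify_row_count` and the default threshold are assumptions made for this example.

```python
def classify_row_count(rows, threshold=1000):
    """Label a dataset by size. The threshold is an arbitrary example value."""
    if not isinstance(rows, int) or rows < 0:   # logical operator: or
        raise ValueError("rows must be a non-negative integer")
    if rows == 0:
        return "empty"
    elif rows < threshold:
        return "small"
    else:
        return "large"


print(classify_row_count(0), classify_row_count(250), classify_row_count(5000))
```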

  • Overview: Using loops to automate repetitive tasks.
  • Key Concepts: for and while loops, loop control (break, continue, pass), iterating collections.
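
A short sketch of loop control, assuming a hypothetical list of raw string values in which an empty string means "skip" and `"STOP"` marks the end of the useful data:

```python
records = ["10", "", "25", "STOP", "99"]   # hypothetical raw input

total = 0
for value in records:
    if value == "":
        continue            # skip blank entries
    if value == "STOP":
        break               # stop at the sentinel; "99" is never read
    total += int(value)

# A while loop runs until its condition becomes false.
attempts = 0
while attempts < 3:
    attempts += 1           # stand-in for one retry of some operation

print(total, attempts)
```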

  • Overview: Writing reusable blocks of code with functions.
  • Key Concepts: Defining and calling functions, parameters, return values, scope, lambda functions.
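
The following sketch shows a default parameter, a return value, local versus module-level scope, and a lambda used as a sort key. The names `DEFAULT_SCALE` and `normalize` are invented for illustration.

```python
DEFAULT_SCALE = 1.0   # module-level (global) name, readable inside functions


def normalize(values, scale=DEFAULT_SCALE):
    """Return each value divided by the maximum, multiplied by scale."""
    peak = max(values)                      # 'peak' exists only inside this call
    return [v / peak * scale for v in values]


# A lambda is a small anonymous function; here it is the sort key.
formats_by_length = sorted(["parquet", "csv", "json"], key=lambda name: len(name))

print(normalize([2, 4]), formats_by_length)
```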

  • Overview: Using operators to manipulate data.
  • Key Concepts: Arithmetic, assignment, comparison, logical, bitwise, and membership operators.
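
One way to see several operator families side by side is the sketch below; the batch sizes and the `READ`/`WRITE` flags are hypothetical values chosen for the example.

```python
total_rows = 17
batch_size = 5

full_batches = total_rows // batch_size   # floor division
leftover = total_rows % batch_size        # remainder

counter = 0
counter += 2                              # augmented assignment

READ, WRITE = 0b01, 0b10                  # hypothetical permission flags
perms = READ | WRITE                      # bitwise OR combines flags
can_write = bool(perms & WRITE)           # bitwise AND tests a flag

is_supported = "csv" in {"csv", "json"}   # membership test
in_range = 0 < leftover <= batch_size     # chained comparison

print(full_batches, leftover, counter, perms, can_write, is_supported, in_range)
```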

  • Overview: Mastering data structures for efficient storage and retrieval.
  • Key Concepts: Lists, tuples, sets, dictionaries, and real-world examples.
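
To make the differences concrete, here is a small sketch with invented sample data showing what each structure is typically used for:

```python
columns = ["id", "name", "id"]        # list: ordered, mutable, allows duplicates
point = (40.7, -74.0)                 # tuple: ordered, immutable
unique_columns = set(columns)         # set: unordered, no duplicates
row = {"id": 1, "name": "Ada"}        # dict: key -> value lookup

latitude, longitude = point           # tuple unpacking
row["email"] = row.get("email", "unknown")   # .get() supplies a default for a missing key

print(len(columns), len(unique_columns), latitude, row)
```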

  • Overview: Organizing and reusing code with modules and packages.
  • Key Concepts:
    • Difference between modules, packages, and libraries
    • Importing and using built-in and external libraries (e.g., Pandas, NumPy, Matplotlib, Requests)
    • Creating custom modules and packages
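
The common import styles can be sketched with standard-library modules only, so the example runs without installing anything. External libraries such as Pandas are imported the same way once installed, and a custom module is simply a `.py` file imported by its name.

```python
import math                     # import a whole module
from statistics import mean     # import one name from a module
import datetime as dt           # import a module under an alias

circle_area = math.pi * 2 ** 2  # area of a circle with radius 2
average = mean([1, 2, 3])
next_day = dt.date(2024, 1, 31) + dt.timedelta(days=1)

print(round(circle_area, 2), average, next_day)
```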

  • Overview: File handling (text & CSV) and JSON management for configuration and data exchange. (JSON content previously listed separately has been merged into this section.)
  • Key Concepts:
    • Reading and writing text and CSV files using built-in modules and pandas
    • Using os and shutil for file and directory operations
    • Reading, writing, parsing, and serializing JSON with Python’s json module
    • Data extraction and ingestion from files and JSON API responses
    • Error handling and path management
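
A minimal round trip through CSV and JSON might look like the sketch below. It uses only the standard library (`csv`, `json`, `pathlib`, `tempfile`); the file names and the sample row are hypothetical, and everything is written to a temporary directory so that nothing is left behind.

```python
import csv
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "people.csv"          # hypothetical file name

    # Write a CSV file with a header row.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "age"])
        writer.writeheader()
        writer.writerow({"name": "Ada", "age": 36})

    # Read it back; the csv module returns every value as a string.
    with csv_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # Serialize the rows to JSON, then parse them again.
    json_path = Path(tmp) / "people.json"
    json_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    loaded = json.loads(json_path.read_text(encoding="utf-8"))

    # Error handling: a missing file raises FileNotFoundError.
    try:
        (Path(tmp) / "missing.csv").read_text(encoding="utf-8")
        missing_handled = False
    except FileNotFoundError:
        missing_handled = True

print(rows, loaded == rows, missing_handled)
```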

  • Overview: Object-oriented programming in Python.
  • Key Concepts:
    • Defining classes and creating objects
    • Constructors (__init__)
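
As a sketch, a class with a constructor could look like this. The `Dataset` class and its attributes are invented for the example and are not taken from the repository.

```python
class Dataset:
    """A tiny container for rows of data (hypothetical example class)."""

    def __init__(self, name, rows=None):
        # __init__ runs when an object is created and sets its initial state.
        self.name = name
        self.rows = list(rows) if rows is not None else []

    def add_row(self, row):
        self.rows.append(row)

    def __len__(self):
        return len(self.rows)


sales = Dataset("sales")              # create an object (an instance)
sales.add_row({"id": 1, "amount": 9.5})
print(sales.name, len(sales))
```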

  • Overview: Working with randomness, generating random numbers and data for testing and simulations.
  • Key Concepts: random module, faker library, random sampling and anonymization.
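
The sketch below sticks to the standard `random` module, since `faker` is a third-party package that has to be installed separately. A seeded `random.Random` instance makes the run repeatable; the seed, the value ranges, and the `user_` prefix are arbitrary choices for illustration.

```python
import random

rng = random.Random(42)   # seeded generator: the same seed gives the same sequence

# Random test data: five ages between 18 and 90 inclusive.
ages = [rng.randint(18, 90) for _ in range(5)]

# Random sampling without replacement.
population = list(range(100))
sample = rng.sample(population, k=10)

# Very simple pseudonymization: swap real names for random identifiers.
names = ["Ada", "Grace"]
pseudonyms = {name: f"user_{rng.randrange(10**6):06d}" for name in names}

print(ages, sample, pseudonyms)
```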

  • Overview: Reusable scripts and code blocks for modular data engineering workflows.
  • Key Concepts: Encapsulating logic in functions and scripts, templates for batch processing.
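
One possible shape for such a batch-processing template is sketched below. The function names (`chunked`, `process_batch`, `main`) and the batch size are assumptions, and summing each batch stands in for real processing logic.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_batch(batch):
    """Placeholder step: a real script would transform or load the batch."""
    return sum(batch)


def main(items, size=3):
    return [process_batch(batch) for batch in chunked(items, size)]


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when it is imported.
    print(main(list(range(10))))
```

Keeping the logic in functions and the entry point under the `__name__` guard lets the same file be imported from a notebook or run from the command line.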

  • Overview: Logging and monitoring data engineering processes.
  • Key Concepts: Python’s logging module, log formats, levels, handlers, and best practices.
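
A small sketch of levels, handlers, and formats with the standard `logging` module follows. The logger name `etl.demo` and the message texts are hypothetical, and the handler writes to an in-memory buffer so the output can be inspected.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)          # handler: where records go
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("etl.demo")           # hypothetical logger name
logger.setLevel(logging.INFO)                    # level: drop anything below INFO
logger.addHandler(handler)
logger.propagate = False                         # keep records out of the root logger

logger.debug("not recorded at INFO level")
logger.info("loaded %d rows", 3)
logger.warning("missing column: %s", "email")

log_text = stream.getvalue()
print(log_text, end="")
```

Passing arguments separately (`"loaded %d rows", 3`) defers string formatting until a record is actually emitted.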

  • Overview: Common Python interview questions and concise answers, focusing on practical explanations.

  • Overview: Example Streamlit apps and instructions to run them.
  • Key Concepts: Installing Streamlit, building and running simple apps, basic visualization with Plotly.

  • Overview: End-to-end projects to apply learned concepts.
  • Contents: Project ideas, example implementations, and deployment notes.

📎 How to Use This Repo

  1. Browse Notebooks: Start with the Jupyter notebooks in the main directory for a structured learning path.
  2. Explore Directories: Check out the additional folders for sample scripts, data, and projects.
  3. Try the Code: Run the notebooks locally or in an online Jupyter environment.
  4. Contribute: Pull requests to add new topics or improve examples are welcome!


About

Learn Python language for beginners in Data Analytics and Big Data
