This repo uses Databricks Lakeflow Declarative Pipelines for end-to-end, production-grade ETL with minimal operational overhead. The project is perfect for data engineers, analysts, and healthcare professionals looking to ramp up on both modern lakehouse technology and payer-specific analytics.
> [!IMPORTANT]
> This bundle uses Serverless compute, so make sure it is enabled for your workspace (it also works on Databricks Free Edition). If it is not, you need to adjust the compute settings of the job and the DLT pipelines.
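If Serverless compute cannot be enabled, the pipelines in the bundle need an explicit compute configuration instead. A minimal sketch of the relevant setting, assuming the pipelines are declared as bundle resources (the resource name and layout below are assumptions, not this repo's actual config):

```yaml
# Sketch only - resource name is hypothetical; check the bundle's resource files.
resources:
  pipelines:
    ingest_payer_bronze_data:
      name: "DLT Payer Demo: Ingest Bronze data"
      serverless: true  # set to false and add a `clusters:` block if Serverless is disabled
```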
You can install the project in two ways:
- using Databricks Asset Bundles (DABs) inside the Databricks workspace (recommended)
- using DABs from the command line of your computer
1. Create a Git Folder inside your Databricks workspace by cloning this repository.
2. Open `payer_dlt/databricks.yaml` inside the created Git Folder.
3. Adjust the following parameters inside `databricks.yaml` (create the necessary objects before use):
   - `catalog_name` - the name of the existing UC Catalog used in the configuration.
   - `bronze_schema_name` - the name of an existing UC Schema for raw data.
   - `silver_schema_name` - the name of an existing UC Schema for tables with transformed data.
   - `gold_schema_name` - the name of an existing UC Schema for tables with reporting data.
4. Click the Deploy button in the Deployments tab on the left - this creates the necessary jobs and pipelines.
5. Click the Run button next to the `DLT Payer Demo: Setup` job.
6. Click Start pipeline for each DLT pipeline to process data and run detections, in the following order:
   1. `DLT Payer Demo: Ingest Bronze data`
   2. `DLT Payer Demo: Ingest Silver data`
   3. `DLT Payer Demo: Ingest Gold data`
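The parameter edits in step 3 might look like the following in `databricks.yaml`. The values here are placeholders, and the exact layout (e.g. whether the names are declared as bundle variables) may differ in this repo:

```yaml
# Example values only - replace with a catalog and schemas that already exist in Unity Catalog
variables:
  catalog_name:
    default: main
  bronze_schema_name:
    default: payer_bronze
  silver_schema_name:
    default: payer_silver
  gold_schema_name:
    default: payer_gold
```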
1. Install the latest version of the Databricks CLI.
2. Authenticate to your Databricks workspace, if you have not done so already:
   ```
   databricks configure
   ```
3. Set the environment variable `DATABRICKS_CONFIG_PROFILE` to the name of the Databricks CLI profile you configured, and configure the necessary variables in the `dev` profile of the `databricks.yml` file. You need to specify the following (create the necessary objects before use):
   - `catalog_name` - the name of the existing UC Catalog used in the configuration.
   - `bronze_schema_name` - the name of an existing UC Schema for raw data.
   - `silver_schema_name` - the name of an existing UC Schema for tables with transformed data.
   - `gold_schema_name` - the name of an existing UC Schema for tables with reporting data.
4. To deploy a development copy of this project, type:
   ```
   databricks bundle deploy
   ```
5. Run a job to set up the normalized tables and download sample log files:
   ```
   databricks bundle run dlt_payer_demo_setup
   ```
6. Run the DLT pipelines to ingest data into the bronze, silver, and gold tiers:
   ```
   databricks bundle run ingest_payer_bronze_data
   databricks bundle run ingest_payer_bronze_data_silver
   databricks bundle run ingest_payer_silver_data_gold
   ```
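If you manage several CLI profiles, the environment variable from step 3 can be set for the current shell session before running the bundle commands. The profile name below is an example, not one defined by this repo:

```shell
# Use the profile created earlier with `databricks configure`
# ("DEFAULT" is an example - substitute your own profile name)
export DATABRICKS_CONFIG_PROFILE=DEFAULT
```

You can also check the bundle configuration before deploying with `databricks bundle validate`, and tear down the deployed jobs and pipelines with `databricks bundle destroy` when you are done.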