A targets pipeline for building SSURGO databases with DuckDB.
This R project provides a reproducible pipeline for processing and building SSURGO (Soil Survey Geographic Database) databases using the targets package and DuckDB.
- Reproducible Workflows: Built on the targets package for reliable, efficient, scalable data pipelines
- DuckDB Integration: Leverages DuckDB for columnar data storage and querying of spatial and tabular data
- R-based Pipeline: Written entirely in R, the project leverages the soilDB package for downloading data and creating the database
To get started, ensure you have R installed, then clone this repository to a folder of your choice.
Be sure to work from a local clone of the repository; by default, downloaded data is stored in the "./data/" folder.
The repository is set up as an R package, mainly to manage dependencies. You can install dependencies using remotes::install_deps(). You do not actually need to install the 'SSURGO' R package to run the pipeline, but the dependencies must be present.
if (!requireNamespace("remotes")) install.packages("remotes")
setwd("path/to/SSURGO")
remotes::install_deps()
Once you have dependencies installed, run SSURGO.R to generate a fresh _targets.R file.
You can modify the soil survey areas to include in the database in the first four targets. The default setup assumes you are creating a database with all US States, but you can choose any subset of one or more states, or any alternative method to create the ssas target (a character vector of area symbols).
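As a minimal sketch of one "alternative method" for building the ssas vector, the base R snippet below filters a list of area symbols by state prefix. It relies on the convention that a SSURGO area symbol starts with the two-letter state abbreviation (for example, "CA630"). The names `all_ssas` and `states`, and the sample values, are hypothetical and used only for illustration; in practice the candidate list would come from a survey area catalog rather than being typed by hand.

```r
# Hypothetical candidate area symbols (illustrative values only)
all_ssas <- c("CA630", "CA067", "NV628", "OR001", "CA077")

# Hypothetical choice of states to include in the database
states <- c("CA", "NV")

# Keep area symbols whose first two characters match a chosen state
ssas <- sort(all_ssas[substr(all_ssas, 1, 2) %in% states])

ssas
```

The resulting character vector has the shape the ssas target expects; how it is wired into the generated _targets.R depends on the contents of SSURGO.R.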
source("SSURGO.R")
This project uses the targets package to manage the pipeline.
To run the workflow, be sure your working directory is the ./SSURGO/ folder containing _targets.R.
# Load the targets library
library(targets)
# View the pipeline
tar_visnetwork() # Visualize the pipeline DAG
# Run the pipeline
tar_make()
SSURGO/
|-- _targets.R # Main targets pipeline configuration (generated by SSURGO.R)
|-- SSURGO.R # Entry point for `tar_script()` _targets.R generation
|-- R/ # Core R functions and wrappers
|-- man/ # Documentation files
|-- DESCRIPTION # Package metadata
|-- NAMESPACE # Package namespace
|-- README.md # This file
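To illustrate how a script can generate a _targets.R file with `tar_script()`, here is a small sketch. It is not the content of SSURGO.R: the single `ssas` target and its area symbols are hypothetical placeholders, and the file is written to a temporary directory so that it cannot overwrite the real pipeline configuration.

```r
# Write the generated script to a temporary location (assumption: we do not
# want to touch the project's real _targets.R in this illustration)
script_path <- file.path(tempdir(), "_targets.R")

targets::tar_script(
  {
    # Hypothetical one-target pipeline; the real pipeline defines many more
    list(
      tar_target(ssas, c("CA630", "CA067"))
    )
  },
  ask = FALSE,          # overwrite without prompting
  script = script_path  # destination of the generated file
)

# Inspect the generated file
cat(readLines(script_path), sep = "\n")
```

By default `tar_script()` also prepends `library(targets)` to the generated file, so the written script can be used directly by `tar_make()`.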
- targets: Workflow orchestration
- duckdb: In-process SQL database engine
- R (>= 4.0.0 recommended)
See the DESCRIPTION file for complete dependency information.
The pipeline follows a structured approach:
- Data Ingestion: Download and prepare SSURGO data sources
- Database Building: Construct optimized DuckDB databases
- Output: Generate final database artifacts
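Once the pipeline has produced a DuckDB file, it can be inspected from R with the DBI and duckdb packages. The sketch below assumes a hypothetical output path; the actual file name and location are determined by the pipeline and are not specified here. The helper name `list_ssurgo_tables` is likewise illustrative.

```r
# List the tables in a DuckDB database file.
# `db_path` is a hypothetical argument: point it at the file the pipeline wrote.
list_ssurgo_tables <- function(db_path) {
  # Open read-only so that inspection cannot modify the database
  con <- DBI::dbConnect(duckdb::duckdb(), dbdir = db_path, read_only = TRUE)
  # Always close the connection and shut down the embedded engine
  on.exit(DBI::dbDisconnect(con, shutdown = TRUE), add = TRUE)
  DBI::dbListTables(con)
}

# Example call (path is an assumption, adjust to your output):
# list_ssurgo_tables("path/to/ssurgo.duckdb")
```

Opening the file read-only is a cautious default for exploration, since the build step is the only part of the workflow that should write to the database.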
Please raise any issues on the Issue Tracker.
This project is licensed under the terms specified in LICENSE.md.
Andrew G. Brown (@brownag)