[OSD] parse soil color sample prep and multiple colors #82

@brownag

While correcting the default color state for many series in #81, it was discovered that BERN gives broken and crushed moist colors that are not parsed. For example:

When broken and crushed are the same: "Ap2--3 to 9 inches; brown (10YR 5/3) silt loam, dark brown (7.5YR 3/2) broken and crushed moist; "

When broken and crushed are different: "Bk2--34 to 47 inches; pale brown (10YR 6/3) silty clay loam, brown (7.5YR 4/2) broken and brown (10YR 4/3) crushed moist;"

OSD JSON for BERN: BERN.json

About 400-500 horizons have similar descriptions that are not parsed because of the extra words before dry/moist.

Dealing with the first case is relatively easy. The second case is more complex because there are multiple moist colors. We currently don't have a method to store both, so by convention the first color given should be used, but the current patterns fail to pull either color because of the formatting. My preference would be to isolate the crushed colors and store them separately, with a preference for broken colors in the current color fields. Building this out would tie in well with attempts to isolate the colors of horizon features, such as redox concentrations, which is greatly needed.
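
To make the two cases concrete, here is a rough sketch of how broken and crushed colors might be pulled apart in base R. It is not the existing parser: the function name `parse_prep_colors()` and the output columns `broken`, `crushed`, and `state` are hypothetical, and the Munsell pattern is an assumption that covers only chromatic hues written like "10YR 4/3" (neutral colors and wording such as "when moist" would need more work).

```r
# Hypothetical helper for illustration only; not part of any existing package.
parse_prep_colors <- function(x) {
  # Assumed Munsell notation in parentheses, e.g. "(7.5YR 4/2)"
  hvc <- "\\(([0-9.]+[A-Z]+ [0-9.]+/[0-9.]+)\\)"

  # Case 1: one color shared by both preparations
  same_pat <- paste0(hvc, " broken and crushed (moist|dry)")
  # Case 2: separate colors, with a color name between "and" and the second Munsell code
  diff_pat <- paste0(hvc, " broken and [a-z ]*", hvc, " crushed (moist|dry)")

  same <- regmatches(x, regexec(same_pat, x))
  diff <- regmatches(x, regexec(diff_pat, x))

  rows <- lapply(seq_along(x), function(i) {
    if (length(diff[[i]]) == 4) {
      m <- diff[[i]]  # full match, broken color, crushed color, moisture state
      data.frame(broken = m[2], crushed = m[3], state = m[4],
                 stringsAsFactors = FALSE)
    } else if (length(same[[i]]) == 3) {
      m <- same[[i]]  # full match, shared color, moisture state
      data.frame(broken = m[2], crushed = m[2], state = m[3],
                 stringsAsFactors = FALSE)
    } else {
      # No sample-prep wording found
      data.frame(broken = NA_character_, crushed = NA_character_,
                 state = NA_character_, stringsAsFactors = FALSE)
    }
  })
  do.call(rbind, rows)
}

# The two BERN horizons quoted above, plus one horizon without prep wording
narrative <- c(
  "Ap2--3 to 9 inches; brown (10YR 5/3) silt loam, dark brown (7.5YR 3/2) broken and crushed moist; ",
  "Bk2--34 to 47 inches; pale brown (10YR 6/3) silty clay loam, brown (7.5YR 4/2) broken and brown (10YR 4/3) crushed moist;",
  "A--0 to 3 inches; brown (10YR 5/3) loam, dark brown (10YR 3/3) moist;"
)
res <- parse_prep_colors(narrative)
```

Keeping `broken` and `crushed` as separate columns would let the broken color feed the current color fields while the crushed color is stored on its own, which is the preference described above.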

> res2 <- gsub(".*\\).*(broken and crushed|crushed|broken|smoothed|mixed)[ ,;.when]*(?:moist|dry).*[.;]*|.*", "\\1", y$narrative)
> table(res2) |> sort(decreasing = TRUE) |> head()
res2
          crushed    mixed smoothed 
  142637      445        1        1 
> 
> res3 <- gsub(".*\\).*(?:broken and crushed|crushed|broken|smoothed|mixed)[ ,;.when]*(moist|dry).*[.;]*|.*", "\\1", y$narrative)
> table(res3) |> sort(decreasing = TRUE) |> head()
res3
        moist    dry 
142637    383     64 
> 
> res4 <- gsub(".*\\).*[ ,;.when]*(moist|dry).*[.;]*|.*", "\\1", y$narrative)
> table(res4) |> sort(decreasing = TRUE) |> head()
res4
moist         dry 
70017 52022 21045 
