Description
While correcting the default color state for many series in #81, it was discovered that some OSDs, BERN for example, give broken and crushed moist colors that are not parsed.
Examples:
When broken and crushed are the same: "Ap2--3 to 9 inches; brown (10YR 5/3) silt loam, dark brown (7.5YR 3/2) broken and crushed moist; "
When broken and crushed are different: "Bk2--34 to 47 inches; pale brown (10YR 6/3) silty clay loam, brown (7.5YR 4/2) broken and brown (10YR 4/3) crushed moist;"
| OSD | JSON |
|---|---|
| BERN | BERN.json |
About 400-500 horizons have similar descriptions that fail to parse because of the extra words before dry/moist.
Dealing with the first case is relatively easy; the second case is complex because there are multiple moist colors. We currently don't have a method to store both, so by convention the first color given should be used, but the current patterns fail to pull either color because of the formatting. My preference would be to isolate the crushed colors and store them separately, with a preference for broken colors in the current color fields. Building this out would tie in well with attempts to isolate the colors of horizon features, such as redox concentrations, which is greatly needed.
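As a rough illustration of storing the two states separately, here is a minimal base R sketch. It is not the parser's current pattern set: `extract_state_color()` is a hypothetical helper, and it assumes the color of interest is the parenthesized Munsell notation immediately before "broken" or "crushed". It ignores the moist/dry state entirely.

```r
# Hypothetical helper (not part of the existing parser): return the Munsell
# color attached to a given state ("broken" or "crushed") in each narrative,
# or NA when that state is not mentioned.
extract_state_color <- function(x, state) {
  pat <- switch(
    state,
    # "(color) broken ..."
    broken  = "\\(([^)]+)\\)\\s+broken\\b",
    # "(color) crushed" or the shared form "(color) broken and crushed"
    crushed = "\\(([^)]+)\\)\\s+(?:broken and\\s+)?crushed\\b"
  )
  m <- regmatches(x, regexec(pat, x, perl = TRUE))
  # each match is c(full match, captured color); no match gives character(0)
  vapply(m, function(i) if (length(i) == 2) i[2] else NA_character_, character(1))
}

x <- c(
  "Ap2--3 to 9 inches; brown (10YR 5/3) silt loam, dark brown (7.5YR 3/2) broken and crushed moist; ",
  "Bk2--34 to 47 inches; pale brown (10YR 6/3) silty clay loam, brown (7.5YR 4/2) broken and brown (10YR 4/3) crushed moist;",
  "A--0 to 5 inches; brown (10YR 5/3) loam, dark brown (10YR 3/3) moist;"
)

broken  <- extract_state_color(x, "broken")   # "7.5YR 3/2" "7.5YR 4/2" NA
crushed <- extract_state_color(x, "crushed")  # "7.5YR 3/2" "10YR 4/3"  NA
```

Under this sketch the broken color would populate the existing moist color fields and the crushed color would go to a separate field; the third narrative (an invented plain example) shows that horizons without either keyword are left untouched.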
```r
> res2 <- gsub(".*\\).*(broken and crushed|crushed|broken|smoothed|mixed)[ ,;.when]*(?:moist|dry).*[.;]*|.*", "\\1", y$narrative)
> table(res2) |> sort(decreasing = TRUE) |> head()
res2
        crushed mixed smoothed
142637      445     1        1
>
> res3 <- gsub(".*\\).*(?:broken and crushed|crushed|broken|smoothed|mixed)[ ,;.when]*(moist|dry).*[.;]*|.*", "\\1", y$narrative)
> table(res3) |> sort(decreasing = TRUE) |> head()
res3
       moist dry
142637   383  64
>
> res4 <- gsub(".*\\).*[ ,;.when]*(moist|dry).*[.;]*|.*", "\\1", y$narrative)
> table(res4) |> sort(decreasing = TRUE) |> head()
res4
      moist   dry
70017 52022 21045
```