-
Notifications
You must be signed in to change notification settings - Fork 0
FUND-2189 DRC: added validation on document taal which should be 3-char iso639 format #126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
FUND-2189 DRC: added validation on document taal which should be 3-char iso639 format #126
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds ISO 639-3 language code validation for the document "Taal" (language) field in the DRC (Documenten Registration Component). The validation replaces simple length constraints with proper format checking against the ISO 639-3 standard, ensuring that language codes are valid three-letter ISO codes.
Key changes:
- Implemented ISO 639-3 language code validation using an embedded resource file
- Replaced basic length validation with format and standard compliance validation
- Added automatic lowercase normalization for language codes during mapping
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
EnkelvoudigInformatieObjectRequestValidator.cs (3 versions) |
Updated validation rule to use ISO 639 validation instead of length check |
RequestToDomainProfile.cs (3 versions) |
Added lowercase transformation for language codes during mapping |
ZGW.Common.Web.csproj |
Configured ISO 639-3 tab file as embedded resource |
Iso639Helper.cs |
New helper class to load and validate ISO 639-3 language codes |
IRuleBuilderExtensions.cs |
New validation extension method for ISO 639 language codes |
iso-639-3.tab |
ISO 639-3 language code reference data (548 entries) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/OneGround.ZGW.Documenten.Web/MappingProfiles/v1/RequestToDomainProfile.cs
Outdated
Show resolved
Hide resolved
src/OneGround.ZGW.Documenten.Web/MappingProfiles/v1/5/RequestToDomainProfile.cs
Outdated
Show resolved
Hide resolved
src/OneGround.ZGW.Documenten.Web/MappingProfiles/v1/1/RequestToDomainProfile.cs
Outdated
Show resolved
Hide resolved
src/OneGround.ZGW.Documenten.Web/MappingProfiles/v1/1/RequestToDomainProfile.cs
Outdated
Show resolved
Hide resolved
src/OneGround.ZGW.Documenten.Web/MappingProfiles/v1/RequestToDomainProfile.cs
Outdated
Show resolved
Hide resolved
src/OneGround.ZGW.Documenten.Web/MappingProfiles/v1/5/RequestToDomainProfile.cs
Outdated
Show resolved
Hide resolved
…ith-2-letters-accepted
…rs-accepted' of https://github.com/OneGround/ZGW-APIs into bugfix/FUND-2189-DRC-taal-in-UPPERCASE-and-with-2-letters-accepted
…ith-2-letters-accepted
…ith-2-letters-accepted
| _logger.LogWarning( | ||
| "Language code mismatch: Request has {RequestLength}-character code '{RequestTaal}', but mapped to {MappedLength}-character code '{MappedTaal}' for ClientId: {ClientId}", | ||
| taalRequested.Length, | ||
| taalRequested, |
Check failure
Code scanning / CodeQL
Log entries created from user input High
user-provided value
This log entry depends on a
user-provided value
This log entry depends on a
user-provided value
This log entry depends on a
user-provided value
This log entry depends on a
user-provided value
This log entry depends on a
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 18 days ago
General approach: sanitize or normalize user-provided values before including them in log entries, particularly those that may contain control characters (newline, carriage return, tabs, etc.). For plain-text logs, a simple, robust approach is to remove or replace line breaks and other control characters so that even if an attacker supplies them, they cannot break log boundaries or inject misleading content.
Concrete fix here:
-
Introduce a small helper in
ZGWControllerBaseto sanitize strings for logging:- Replace or remove
\rand\n. - Optionally strip other non-printable control characters.
- Return
nullunchanged (so nullability semantics remain the same).
- Replace or remove
-
Use this helper in
LogInvalidTaalCodewhen loggingtaalRequestedandtaalMapped(both may ultimately be derived from user input):- Compute
var safeTaalRequested = SanitizeForLog(taalRequested); - Compute
var safeTaalMapped = SanitizeForLog(taalMapped); - Use these sanitized values in
_logger.LogWarning(...).
- Compute
-
Do not alter the existing logic determining when to log (the length comparison), nor the format of the message template or other log fields such as
clientId. This ensures existing functionality and log semantics are preserved.
All changes are confined to src/OneGround.ZGW.Common.Web/Controllers/ZGWControllerBase.cs in the shown region. No external dependencies are needed; the sanitization uses standard string.Replace and char APIs.
-
Copy modified lines R80-R102 -
Copy modified lines R109-R111 -
Copy modified line R115 -
Copy modified line R117
| @@ -77,18 +77,44 @@ | ||
| protected static HashSet<string> ExpandLookup(string expand) => | ||
| expand != null ? expand.Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries).ToHashSet() : []; | ||
|
|
||
| /// <summary> | ||
| /// Sanitize a string value so it is safe to include in log messages. | ||
| /// Removes CR/LF characters and other control characters that could be used | ||
| /// for log forging or to break log formatting. | ||
| /// </summary> | ||
| /// <param name="value">The original string value.</param> | ||
| /// <returns>A sanitized string safe for logging, or null if the input was null.</returns> | ||
| protected static string SanitizeForLog(string value) | ||
| { | ||
| if (value is null) | ||
| { | ||
| return null; | ||
| } | ||
|
|
||
| // Remove carriage return and line feed characters to prevent log forging. | ||
| var sanitized = value.Replace("\r", string.Empty).Replace("\n", string.Empty); | ||
|
|
||
| // Optionally strip any remaining non-printable control characters. | ||
| sanitized = new string(sanitized.Where(c => !char.IsControl(c)).ToArray()); | ||
|
|
||
| return sanitized; | ||
| } | ||
|
|
||
| // Note: Temporary log method. We should investigate who send the 2-letter language code so we log for these situations | ||
| protected void LogInvalidTaalCode(string taalRequested, string taalMapped) | ||
| { | ||
| if (taalRequested?.Length != taalMapped?.Length) | ||
| { | ||
| var clientId = User.Claims.FirstOrDefault(c => c.Type == "client_id")?.Value ?? "unknown"; | ||
| var safeTaalRequested = SanitizeForLog(taalRequested); | ||
| var safeTaalMapped = SanitizeForLog(taalMapped); | ||
|
|
||
| _logger.LogWarning( | ||
| "Language code mismatch: Request has {RequestLength}-character code '{RequestTaal}', but mapped to {MappedLength}-character code '{MappedTaal}' for ClientId: {ClientId}", | ||
| taalRequested.Length, | ||
| taalRequested, | ||
| safeTaalRequested, | ||
| taalMapped.Length, | ||
| taalMapped, | ||
| safeTaalMapped, | ||
| clientId | ||
| ); | ||
| } |
| var clientId = User.Claims.FirstOrDefault(c => c.Type == "client_id")?.Value ?? "unknown"; | ||
| _logger.LogWarning( | ||
| "Language code mismatch: Request has {RequestLength}-character code '{RequestTaal}', but mapped to {MappedLength}-character code '{MappedTaal}' for ClientId: {ClientId}", | ||
| taalRequested.Length, |
| "Language code mismatch: Request has {RequestLength}-character code '{RequestTaal}', but mapped to {MappedLength}-character code '{MappedTaal}' for ClientId: {ClientId}", | ||
| taalRequested.Length, | ||
| taalRequested, | ||
| taalMapped.Length, |
| foreach (var line in lines) | ||
| { | ||
| var cols = line.Split('\t'); | ||
|
|
||
| result.Add( | ||
| new Iso639Language | ||
| { | ||
| ISO6392Code = cols.ElementAtOrDefault(0), | ||
| ISO6391Code = cols.ElementAtOrDefault(1), | ||
| EnglishNameOfLanguage = cols.ElementAtOrDefault(2), | ||
| FrenchNameOfLanguage = cols.ElementAtOrDefault(3), | ||
| GermanNameOfLanguage = cols.ElementAtOrDefault(4), | ||
| } | ||
| ); | ||
| } |
| if (dictionary.TryGetValue(taal.ToLower(), out var result)) | ||
| { | ||
| return result; | ||
| } | ||
| else | ||
| { | ||
| return taal.ToLower(); | ||
| } |
Pull Request
Description
Type of Change
Related Issues
Testing
Checklist