Skip to content

Conversation

@heuvea
Copy link
Contributor

@heuvea heuvea commented Dec 15, 2025

Pull Request

Description

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation

Related Issues

Testing

  • Tests pass
  • Manual testing completed

Checklist

  • Self-review completed
  • Documentation updated (if needed)

@heuvea heuvea added the Bug Something isn't working label Dec 15, 2025
Copilot AI review requested due to automatic review settings December 15, 2025 12:41
@heuvea heuvea added the Documenten API Related to Documenten (DRC) component label Dec 15, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds ISO 639-3 language code validation for the document "Taal" (language) field in the DRC (Documenten Registration Component). The validation replaces simple length constraints with proper format checking against the ISO 639-3 standard, ensuring that language codes are valid three-letter ISO codes.

Key changes:

  • Implemented ISO 639-3 language code validation using an embedded resource file
  • Replaced basic length validation with format and standard compliance validation
  • Added automatic lowercase normalization for language codes during mapping

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
EnkelvoudigInformatieObjectRequestValidator.cs (3 versions) Updated validation rule to use ISO 639 validation instead of length check
RequestToDomainProfile.cs (3 versions) Added lowercase transformation for language codes during mapping
ZGW.Common.Web.csproj Configured ISO 639-3 tab file as embedded resource
Iso639Helper.cs New helper class to load and validate ISO 639-3 language codes
IRuleBuilderExtensions.cs New validation extension method for ISO 639 language codes
iso-639-3.tab ISO 639-3 language code reference data (548 entries)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

_logger.LogWarning(
"Language code mismatch: Request has {RequestLength}-character code '{RequestTaal}', but mapped to {MappedLength}-character code '{MappedTaal}' for ClientId: {ClientId}",
taalRequested.Length,
taalRequested,

Check failure

Code scanning / CodeQL

Log entries created from user input High

This log entry depends on a
user-provided value
.
This log entry depends on a
user-provided value
.
This log entry depends on a
user-provided value
.
This log entry depends on a
user-provided value
.
This log entry depends on a
user-provided value
.
This log entry depends on a
user-provided value
.

Copilot Autofix

AI 18 days ago

General approach: sanitize or normalize user-provided values before including them in log entries, particularly those that may contain control characters (newline, carriage return, tabs, etc.). For plain-text logs, a simple, robust approach is to remove or replace line breaks and other control characters so that even if an attacker supplies them, they cannot break log boundaries or inject misleading content.

Concrete fix here:

  1. Introduce a small helper in ZGWControllerBase to sanitize strings for logging:

    • Replace or remove \r and \n.
    • Optionally strip other non-printable control characters.
    • Return null unchanged (so nullability semantics remain the same).
  2. Use this helper in LogInvalidTaalCode when logging taalRequested and taalMapped (both may ultimately be derived from user input):

    • Compute var safeTaalRequested = SanitizeForLog(taalRequested);
    • Compute var safeTaalMapped = SanitizeForLog(taalMapped);
    • Use these sanitized values in _logger.LogWarning(...).
  3. Do not alter the existing logic determining when to log (the length comparison), nor the format of the message template or other log fields such as clientId. This ensures existing functionality and log semantics are preserved.

All changes are confined to src/OneGround.ZGW.Common.Web/Controllers/ZGWControllerBase.cs in the shown region. No external dependencies are needed; the sanitization uses standard string.Replace and char APIs.


Suggested changeset 1
src/OneGround.ZGW.Common.Web/Controllers/ZGWControllerBase.cs

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/OneGround.ZGW.Common.Web/Controllers/ZGWControllerBase.cs b/src/OneGround.ZGW.Common.Web/Controllers/ZGWControllerBase.cs
--- a/src/OneGround.ZGW.Common.Web/Controllers/ZGWControllerBase.cs
+++ b/src/OneGround.ZGW.Common.Web/Controllers/ZGWControllerBase.cs
@@ -77,18 +77,44 @@
     protected static HashSet<string> ExpandLookup(string expand) =>
         expand != null ? expand.Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries).ToHashSet() : [];
 
+    /// <summary>
+    /// Sanitize a string value so it is safe to include in log messages.
+    /// Removes CR/LF characters and other control characters that could be used
+    /// for log forging or to break log formatting.
+    /// </summary>
+    /// <param name="value">The original string value.</param>
+    /// <returns>A sanitized string safe for logging, or null if the input was null.</returns>
+    protected static string SanitizeForLog(string value)
+    {
+        if (value is null)
+        {
+            return null;
+        }
+
+        // Remove carriage return and line feed characters to prevent log forging.
+        var sanitized = value.Replace("\r", string.Empty).Replace("\n", string.Empty);
+
+        // Optionally strip any remaining non-printable control characters.
+        sanitized = new string(sanitized.Where(c => !char.IsControl(c)).ToArray());
+
+        return sanitized;
+    }
+
     // Note: Temporary log method. We should investigate who send the 2-letter language code so we log for these situations
     protected void LogInvalidTaalCode(string taalRequested, string taalMapped)
     {
         if (taalRequested?.Length != taalMapped?.Length)
         {
             var clientId = User.Claims.FirstOrDefault(c => c.Type == "client_id")?.Value ?? "unknown";
+            var safeTaalRequested = SanitizeForLog(taalRequested);
+            var safeTaalMapped = SanitizeForLog(taalMapped);
+
             _logger.LogWarning(
                 "Language code mismatch: Request has {RequestLength}-character code '{RequestTaal}', but mapped to {MappedLength}-character code '{MappedTaal}' for ClientId: {ClientId}",
                 taalRequested.Length,
-                taalRequested,
+                safeTaalRequested,
                 taalMapped.Length,
-                taalMapped,
+                safeTaalMapped,
                 clientId
             );
         }
EOF
@@ -77,18 +77,44 @@
protected static HashSet<string> ExpandLookup(string expand) =>
expand != null ? expand.Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries).ToHashSet() : [];

/// <summary>
/// Sanitize a string value so it is safe to include in log messages.
/// Removes CR/LF characters and other control characters that could be used
/// for log forging or to break log formatting.
/// </summary>
/// <param name="value">The original string value.</param>
/// <returns>A sanitized string safe for logging, or null if the input was null.</returns>
protected static string SanitizeForLog(string value)
{
if (value is null)
{
return null;
}

// Remove carriage return and line feed characters to prevent log forging.
var sanitized = value.Replace("\r", string.Empty).Replace("\n", string.Empty);

// Optionally strip any remaining non-printable control characters.
sanitized = new string(sanitized.Where(c => !char.IsControl(c)).ToArray());

return sanitized;
}

// Note: Temporary log method. We should investigate who send the 2-letter language code so we log for these situations
protected void LogInvalidTaalCode(string taalRequested, string taalMapped)
{
if (taalRequested?.Length != taalMapped?.Length)
{
var clientId = User.Claims.FirstOrDefault(c => c.Type == "client_id")?.Value ?? "unknown";
var safeTaalRequested = SanitizeForLog(taalRequested);
var safeTaalMapped = SanitizeForLog(taalMapped);

_logger.LogWarning(
"Language code mismatch: Request has {RequestLength}-character code '{RequestTaal}', but mapped to {MappedLength}-character code '{MappedTaal}' for ClientId: {ClientId}",
taalRequested.Length,
taalRequested,
safeTaalRequested,
taalMapped.Length,
taalMapped,
safeTaalMapped,
clientId
);
}
Copilot is powered by AI and may make mistakes. Always verify output.
var clientId = User.Claims.FirstOrDefault(c => c.Type == "client_id")?.Value ?? "unknown";
_logger.LogWarning(
"Language code mismatch: Request has {RequestLength}-character code '{RequestTaal}', but mapped to {MappedLength}-character code '{MappedTaal}' for ClientId: {ClientId}",
taalRequested.Length,
"Language code mismatch: Request has {RequestLength}-character code '{RequestTaal}', but mapped to {MappedLength}-character code '{MappedTaal}' for ClientId: {ClientId}",
taalRequested.Length,
taalRequested,
taalMapped.Length,
Comment on lines +39 to +53
foreach (var line in lines)
{
var cols = line.Split('\t');

result.Add(
new Iso639Language
{
ISO6392Code = cols.ElementAtOrDefault(0),
ISO6391Code = cols.ElementAtOrDefault(1),
EnglishNameOfLanguage = cols.ElementAtOrDefault(2),
FrenchNameOfLanguage = cols.ElementAtOrDefault(3),
GermanNameOfLanguage = cols.ElementAtOrDefault(4),
}
);
}
Comment on lines +139 to +146
if (dictionary.TryGetValue(taal.ToLower(), out var result))
{
return result;
}
else
{
return taal.ToLower();
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Something isn't working Documenten API Related to Documenten (DRC) component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants