Skip to content

Conversation

@azkrishpy
Copy link
Contributor

@azkrishpy azkrishpy commented Oct 29, 2025

Issue #, if available:
#806
Description of changes:
Exposes S3 Request Metrics from the CRT S3 Client.

  • S3RequestMetrics: Captures timing, request/response info, and error details for each S3 request attempt
  • onTelemetry() callback: New method in S3MetaRequestResponseHandler invoked after each request completes
  • ErrorType classification: Categorizes errors (SUCCESS, THROTTLING, SERVER_ERROR, CONFIGURED_TIMEOUT, IO, OTHER)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@azkrishpy
Copy link
Contributor Author

Java SDK users can use a custom MetricPublisher that bridges CRT metrics to SDK's MetricCollection:

public class CrtMetricPublisher implements MetricPublisher {
    private final MetricPublisher delegate;

    public CrtMetricPublisher(MetricPublisher delegate) {
        this.delegate = delegate;
    }

    public void publishCrtMetrics(S3RequestMetrics crtMetrics) {
        MetricCollection collection = MetricCollection.builder()
            .name("S3Request")
            .creationTime(Instant.now())
            .putMetric(CoreMetric.API_CALL_DURATION, 
                Duration.ofNanos(crtMetrics.getApiCallDurationNs()))
            .putMetric(CoreMetric.SERVICE_ID, crtMetrics.getServiceId())
            .putMetric(CoreMetric.OPERATION_NAME, crtMetrics.getOperationName())
            .putMetric(CoreMetric.RETRY_COUNT, crtMetrics.getRetryCount())
            .putMetric(CoreMetric.AWS_REQUEST_ID, crtMetrics.getAwsRequestId())
            .putMetric(CoreMetric.AWS_EXTENDED_REQUEST_ID, crtMetrics.getAwsExtendedRequestId())
            .putMetric(CoreMetric.BACKOFF_DELAY_DURATION, 
                Duration.ofNanos(crtMetrics.getBackoffDelayDurationNs()))
            .putMetric(CoreMetric.SERVICE_CALL_DURATION, 
                Duration.ofNanos(crtMetrics.getServiceCallDurationNs()))
            .putMetric(CoreMetric.SIGNING_DURATION, 
                Duration.ofNanos(crtMetrics.getSigningDurationNs()))
            .build();

        delegate.publish(collection);
    }

    @Override
    public void publish(MetricCollection metricCollection) {
        delegate.publish(metricCollection);
    }

    @Override
    public void close() {
        delegate.close();
    }
}

Usage

CrtMetricPublisher metricPublisher = new CrtMetricPublisher(
    LoggingMetricPublisher.create());

S3MetaRequestResponseHandler responseHandler = new S3MetaRequestResponseHandler() {
    @Override
    public void onTelemetry(S3RequestMetrics metrics) {
        metricPublisher.publishCrtMetrics(metrics);
    }

    @Override
    public void onFinished(S3FinishedResponseContext context) {
        // Handle completion
    }
};

S3MetaRequestOptions options = new S3MetaRequestOptions()
    .withMetaRequestType(MetaRequestType.GET_OBJECT)
    .withHttpRequest(httpRequest)
    .withResponseHandler(responseHandler);

try (S3MetaRequest request = client.makeMetaRequest(options)) {
    // Request executes, metrics published via onTelemetry callback
}

@azkrishpy
Copy link
Contributor Author

The following metrics are not available from CRT for java since sdk has the information:

  • CredentialsFetchDuration
  • EndpointResolveDuration
  • MarshallingDuration
  • ServiceEndpoint

}

/**
* Invoked to report telemetry of partial execution of meta request.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by partial execution? When is this invoked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there are multiple parallelized request created for a single meta request, each request gets its own response with metrics. Although the meta request might not have been fully executed, the request might be complete and hence customers receive the metric data for that request. This is what I meant by partial execution of meta request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, can we update the documentation to make it more clear?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

@azkrishpy azkrishpy force-pushed the expose-metrics branch 2 times, most recently from 02be5a6 to b61830a Compare December 5, 2025 01:53
Copy link
Contributor

@TingDaoK TingDaoK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All trivial comments, look good to me as long as Java SDK team align with the interface provided!

ErrorType(String name) {
this.name = name;
}
} No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line at the end of the file

* @return true if the error that generated the exception makes sense for a retry, and
* false otherwise.
*/
@Deprecated // use CRT.awsGetErrorType() instead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: is this deprecation intended? CRT.awsGetErrorType() doesn't exactly return if an error should be retryable or not though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed that java uses isErrorRetryable to figure out if the error needs to be pushed as an IOException or not. Since isErrorRetryable(HttpException exception) was always a temporary method to convey errors and we now provide the correct error type, we are deprecating that method.

Copy link
Contributor

@zoewangg zoewangg Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, we use it in Java https://github.com/aws/aws-sdk-java-v2/blob/master/http-clients/aws-crt-client/src/main/java/software/amazon/awssdk/http/crt/internal/CrtUtils.java#L47

I suppose we should change it to if(CRT.awsGetErrorType(httpException.getErrorCode()) == IO) in Java. Is there any non-I/O code that we should retry? I'd expect no, but wanted to double check

private int requestType;

// CRT info metrics
private String ipAddress;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems useful, can we expose the getter?

// connection's and S3 request's addresses for this request attempt. This does not need to be exposed,
// but it might prove useful for logs.
private long connectionId;
private long requestPtr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's requestPtr?

public static native String awsErrorName(int errorCode);

/**
* Given an error code, get a boolean to check if an error is transient or not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update the doc to add the definition of transient error?

@azkrishpy azkrishpy merged commit bdc3664 into main Dec 30, 2025
106 of 107 checks passed
@azkrishpy azkrishpy deleted the expose-metrics branch December 30, 2025 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants