Skip to content

Question: How do we ensure TDigest::merge never causes numCentroids over centroidsCapacity and thus index out of bound? #703

@tisonkun

Description

@tisonkun

numCentroids_ = 0;
Sort.stableSort(values, weights, num);
if (reverseMerge_) { // this might be avoidable if stableSort could be implemented with a boolean parameter to invert the logic
Sort.reverse(values, num);
Sort.reverse(weights, num);
}
centroidMeans_[0] = values[0];
centroidWeights_[0] = weights[0];
numCentroids_++;
int current = 1;
double weightSoFar = 0;
while (current != num) {
final double proposedWeight = centroidWeights_[numCentroids_ - 1] + weights[current];
boolean addThis = false;
if ((current != 1) && (current != (num - 1))) {
final double q0 = weightSoFar / centroidsWeight_;
final double q2 = (weightSoFar + proposedWeight) / centroidsWeight_;
final double normalizer = ScaleFunction.normalizer(k_ * 2, centroidsWeight_);
addThis = proposedWeight <= (centroidsWeight_ * Math.min(ScaleFunction.max(q0, normalizer), ScaleFunction.max(q2, normalizer)));
}
if (addThis) { // merge into existing centroid
centroidWeights_[numCentroids_ - 1] += weights[current];
centroidMeans_[numCentroids_ - 1] += ((values[current] - centroidMeans_[numCentroids_ - 1])
* weights[current]) / centroidWeights_[numCentroids_ - 1];
} else { // copy to a new centroid
weightSoFar += centroidWeights_[numCentroids_ - 1];
centroidMeans_[numCentroids_] = values[current];
centroidWeights_[numCentroids_] = weights[current];
numCentroids_++;
}
current++;
}

I don't find any constraint that numCentroids will never exceed centroidsCapacity here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions