Compressed cover tree
The compressed cover tree is a type of data structure in computer science that is specifically designed to speed up the k-nearest neighbors algorithm in finite metric spaces.[1] The compressed cover tree is a simplified version of the explicit representation of the cover tree, motivated by past issues in the proofs of the time complexity results[2] for cover trees. It was designed to achieve the claimed time complexities of the cover tree[3] in a mathematically rigorous way.
Problem statement
In the modern formulation, the k-nearest neighbor problem is to find, for every point of a given query set Q, its k nearest neighbors in a given reference set R. Both sets belong to a common ambient space X with a distance metric d satisfying all metric axioms.
Definitions
Compressed cover tree
Let (R, d) be a finite metric space. A compressed cover tree has the vertex set R with a root r ∈ R and a level function l : R → ℤ satisfying the conditions below:
- Root condition: the level of the root node r satisfies l(r) ≥ 1 + max{ l(q) : q ∈ R ∖ {r} }.
- Covering condition: for every node q ∈ R ∖ {r}, we select a unique parent p and a level l(q) such that d(q, p) ≤ 2^(l(q)+1) and l(q) < l(p); this parent node p has a single link to its child node q.
- Separation condition: for every i ∈ ℤ, the cover set C_i = { p ∈ R : l(p) ≥ i } has minimum separation d_min(C_i) = min{ d(p, q) : p ≠ q ∈ C_i } > 2^i.
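The three conditions above can be checked directly on a small finite metric space. The sketch below is illustrative only (the function name and the dictionary-based representation are assumptions, not taken from the cited paper); it verifies a candidate level function and parent assignment:

```python
def is_compressed_cover_tree(points, dist, root, level, parent):
    """Check the compressed cover tree conditions on a finite set.

    points: list of hashable points; dist: metric d(p, q);
    level: dict point -> integer level l(p);
    parent: dict non-root point -> its parent.
    Illustrative representation, not from the cited paper.
    """
    others = [p for p in points if p != root]
    # Root condition: l(r) >= 1 + max level over all other nodes.
    if others and level[root] < 1 + max(level[q] for q in others):
        return False
    # Covering condition: every non-root q has a parent p with
    # l(q) < l(p) and d(q, p) <= 2^(l(q) + 1).
    for q in others:
        p = parent[q]
        if not (level[q] < level[p] and dist(q, p) <= 2 ** (level[q] + 1)):
            return False
    # Separation condition: every cover set C_i = {p : l(p) >= i}
    # has pairwise distances > 2^i.  Checking only the attained
    # levels suffices: between attained levels the cover set is
    # unchanged while 2^i only shrinks.
    for i in sorted(set(level.values())):
        cover = [p for p in points if level[p] >= i]
        for a in range(len(cover)):
            for b in range(a + 1, len(cover)):
                if dist(cover[a], cover[b]) <= 2 ** i:
                    return False
    return True
```

For example, on the line R = {0, 1, 3} with d(a, b) = |a − b|, root 0, levels l(0) = 2, l(3) = 1, l(1) = −1 and both non-root points parented to 0, all three conditions hold.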
Expansion constants
In a metric space X, let B̄(p, t) be the closed ball with center p and radius t ≥ 0. The notation |B̄(p, t)| denotes the number (if finite) of points in the closed ball.
The expansion constant[3] c(R) is the smallest c ≥ 2 such that |B̄(p, 2t)| ≤ c · |B̄(p, t)| for any point p ∈ R and any radius t ≥ 0.
The newer minimized expansion constant[1] c_m(R) is a discrete analog of the doubling dimension used for navigating nets[4]; it is obtained by minimizing the expansion constant c(A) over all locally finite sets A with R ⊆ A ⊆ X that cover R.
Note that c_m(R) ≤ c(R) for any finite metric space (R, d).
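For a finite set, the expansion constant can be computed by brute force: ball sizes only change at pairwise distances (and, for the doubled ball, at half-distances), so only those radii need to be tested. A minimal sketch, with names chosen here for illustration:

```python
def expansion_constant(points, dist):
    """Brute-force expansion constant c(R) of a finite metric space:
    the smallest c >= 2 with |B(p, 2t)| <= c * |B(p, t)| for every
    p in R and t >= 0.  Illustrative O(|R|^3) sketch."""
    def ball(p, t):
        # Number of points in the closed ball of radius t around p.
        return sum(1 for q in points if dist(p, q) <= t)
    # Ball counts only jump at pairwise distances; for the doubled
    # ball |B(p, 2t)|, also at half-distances.
    radii = {dist(p, q) for p in points for q in points if p != q}
    radii |= {d / 2 for d in radii}
    c = 2.0
    for p in points:
        for t in radii:
            c = max(c, ball(p, 2 * t) / ball(p, t))
    return c
```

On the line R = {0, 1, 2, 4} with d(a, b) = |a − b|, the worst case is p = 1 with t = 0.5: the ball of radius 0.5 contains only {1} while the doubled ball contains {0, 1, 2}, so c(R) = 3.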
Aspect ratio
For any finite set R with a metric d, the diameter is diam(R) = max{ d(p, q) : p, q ∈ R }. The aspect ratio is Δ(R) = diam(R) / d_min(R), where d_min(R) is the shortest distance between distinct points of R.
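The aspect ratio is straightforward to compute for a finite set; a short illustrative helper (the name is an assumption):

```python
from itertools import combinations

def aspect_ratio(points, dist):
    """Aspect ratio diam(R) / d_min(R) of a finite set: the largest
    pairwise distance divided by the smallest one."""
    pairwise = [dist(p, q) for p, q in combinations(points, 2)]
    return max(pairwise) / min(pairwise)
```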
Complexity
[edit]Insert
Although cover trees provide faster queries than the naive approach, this advantage must be weighed against the additional cost of maintaining the data structure. In a naive approach, adding a new point to the dataset is trivial because order does not need to be preserved, but in a compressed cover tree the insertion time can be bounded:
- using the expansion constant: O(c(R)^O(1) · log |R|) time per inserted point;
- using the minimized expansion constant / doubling dimension: O(c_m(R)^O(1) · log Δ(R)) time per inserted point.
K-nearest neighborhood search
Let Q and R be finite subsets of a metric space (X, d). Once all points of R have been inserted into a compressed cover tree, it can be used to answer find-queries for points of the query set Q. The following time complexities have been proven for finding the k nearest neighbors of a query point q in the reference set R:
- using the expansion constant: O(c(R)^O(1) · log k · (log |R| + k));
- using the minimized expansion constant / doubling dimension: O(c_m(R)^O(1) · (log Δ(R) + |B̄(q, 5·d_k(q, R))|)), where |B̄(q, 5·d_k(q, R))| is the number of points inside a closed ball around q whose radius is 5 times the distance from q to its k-nearest neighbor.
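For comparison, the naive baseline that these bounds improve upon scans every reference point and therefore needs O(|R|) distance computations per query. A hypothetical helper sketching that baseline:

```python
import heapq

def knn_brute_force(query, reference, dist, k):
    """Exact k-nearest neighbors of `query` by scanning all of R:
    the O(|R|)-distance-computations baseline that the compressed
    cover tree is designed to outperform."""
    return heapq.nsmallest(k, reference, key=lambda p: dist(query, p))
```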
Space
The compressed cover tree constructed on a finite metric space R requires O(|R|) space, both during its construction and during the execution of the find algorithm.
Compared to other similar data structures
[edit]Using doubling dimension as hidden factor
Tables below show time complexity estimates which use the minimized expansion constant c_m[1] or the dimensionality constant 2^dim related to the doubling dimension[4]. Note that Δ denotes the aspect ratio.
- Results for building data structures

Name of data structure, source | Claimed time complexity | Claimed space complexity | Proof of result
---|---|---|---
Navigating nets[4] | | | Theorem 2.5[4]
Compressed cover tree[1] | | O(|R|) | Theorem 3.6[1]
Results for exact k-nearest neighbors of one query point in the reference set R, assuming that all data structures are already built. Below, the distance between a query point q and the reference set R is denoted by d(q, R), and the distance from q to its k-nearest neighbor in R by d_k(q, R):

Name of data structure, source | Claimed time complexity | Claimed space complexity | Proof of result
---|---|---|---
Navigating nets[4] | | | Proof outline in Theorem 2.3[4]
Compressed cover tree[1] | | O(|R|) | Corollary 3.7[1]
Using expansion constant as hidden factor
Tables below show time complexity estimates which use the expansion constant c[3] or the KR-type constant[4] as a hidden factor. Note that the dimensionality factor 2^dim_KR is equivalent to the expansion constant c.
- Results for building data structures

Name of data structure, source | Claimed time complexity | Claimed space complexity | Proof of result
---|---|---|---
Navigating nets[4] | | | Not available
Cover tree[3] | O(c^6 · |R| · log |R|) | O(|R|) | Counterexample 4.2[2] shows that the past proof is incorrect.
Compressed cover tree[1] | | O(|R|) | Corollary 3.10[1]
Results for exact k-nearest neighbors of one query point, assuming that all data structures are already built.

Name of data structure, source | Claimed time complexity | Claimed space complexity | Proof of result
---|---|---|---
Navigating nets[4] | | | Not available
Cover tree[3] | O(c^12 · log |R|) for k = 1 | O(|R|) | Counterexample 5.2[2] shows that the past proof is incorrect.
Compressed cover tree[1] | | O(|R|) | Theorem 4.9[1]
References
- ^ Elkin, Yury; Kurlin, Vitaliy (2021). "A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree". arXiv:2111.15478 [cs.CG].
- ^ Elkin, Yury; Kurlin, Vitaliy (2022). "Counterexamples expose gaps in the proof of time complexity for cover trees introduced in 2006". arXiv:2208.09447 [cs.CG].
- ^ Beygelzimer, Alina; Kakade, Sham; Langford, John (2006). "Cover Trees for Nearest Neighbor". Proceedings of the International Conference on Machine Learning (ICML).
- ^ Clarkson, Kenneth (2006). "Nearest-neighbor searching and metric space dimensions". In Shakhnarovich, G.; Darrell, T.; Indyk, P. (eds.), Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, pp. 15–59. MIT Press.