FLANN (Fast Library for Approximate Nearest Neighbors) -- Official Documentation

Tags: algorithms, experience

by Oxidane Lin

This article is a translation of the official FLANN documentation. The original is written in LaTeX and has been converted to Markdown for easier reading. Due to limited time and ability, only the C++ part has been translated; please refer to the original for the C/MATLAB/Python sections.

Introduction

We can define the nearest neighbor search (NNS) problem in the
following way: given a set of points $P = \{p_1, p_2, \dots, p_n\}$ in a
metric space $X$, these points must be preprocessed in such a way that
given a new query point $q \in X$, finding the point in $P$ that is
nearest to $q$ can be done quickly.

The problem of nearest neighbor search is one of major importance in a
variety of applications such as image recognition, data compression,
pattern recognition and classification, machine learning, document
retrieval systems, statistics and data analysis. However, solving this
problem in high dimensional spaces seems to be a very difficult task and
there is no algorithm that performs significantly better than the
standard brute-force search. This has led to an increasing interest in
a class of algorithms that perform approximate nearest neighbor
searches, which have proven to be a good-enough approximation in most
practical applications and are, in most cases, orders of magnitude
faster than the algorithms performing exact searches.

FLANN (Fast Library for Approximate Nearest Neighbors) is a library for
performing fast approximate nearest neighbor searches. FLANN is written
in the C++ programming language. FLANN can be easily used in many
contexts through the C, MATLAB, Python, and Ruby bindings provided with
the library.

Quick Start

This section contains small examples of how to use the FLANN library
from different programming languages (C++, C, MATLAB, Python, and Ruby).

  • C++
        // file flann_example.cpp
        // a minimal example
        #include <flann/flann.hpp>
        #include <flann/io/hdf5.h>
        #include <stdio.h>
        int main(int argc, char** argv)
        {
    
            int nn = 3;
            flann::Matrix<float> dataset;
            flann::Matrix<float> query;
            flann::load_from_file(dataset, "dataset.hdf5","dataset");
            flann::load_from_file(query, "dataset.hdf5","query");
            flann::Matrix<int> indices(new int[query.rows*nn], query.rows, nn);
            flann::Matrix<float> dists(new float[query.rows*nn], query.rows, nn);
            // construct a randomized kd-tree index using 4 kd-trees
            flann::Index<flann::L2<float> > index(dataset, flann::KDTreeIndexParams(4));
            index.buildIndex();                                                                                               
            // do a knn search, using 128 checks
            index.knnSearch(query, indices, dists, nn, flann::SearchParams(128));
            flann::save_to_file(indices,"result.hdf5","result");
            delete[] dataset.ptr();
            delete[] query.ptr();
            delete[] indices.ptr();
            delete[] dists.ptr();
            
            return 0;
        }

Downloading and compiling FLANN

FLANN can be downloaded from the following address:

http://www.cs.ubc.ca/~mariusm/flann

After downloading and unpacking, the following files and directories
should be present:

  • bin: directory for various scripts and binary files

  • doc: directory containing this documentation

  • examples: directory containing examples of using FLANN

  • src: directory containing the source files

  • test: directory containing unit tests for FLANN

To compile the FLANN library the CMake[^1] build system is required.
Below is an example of how FLANN can be compiled on Linux (replace x.y.z
with the corresponding version number).

$ cd flann-x.y.z-src
$ mkdir build
$ cd build
$ cmake ..
$ make

On Windows the steps are very similar:

> "C:\Program Files\Microsoft Visual Studio 9.0\VC\vcvarsall.bat"
> cd flann-x.y.z-src
> mkdir build
> cd build
> cmake ..
> nmake

There are several compile options that can be configured before FLANN is
compiled, for example the build type (Release, RelWithDebInfo, Debug) or
whether to compile the C, Python or the MATLAB bindings. To change any
of these options use the cmake-gui application after cmake has
finished (see the figure below).

> cmake-gui .

[Figure: Configuring the FLANN compile options]

Upgrading from a previous version

This section contains changes that you need to be aware of when
upgrading from a previous version of FLANN.

Version 1.8.0

Due to changes in the library, the on-disk format of the saved
indexes has changed and it is not possible to load indexes
previously saved with an older version of the library.

Version 1.7.0

A small breaking API change in flann::Matrix requires client code
to be updated. In order to release the memory used by a matrix, use:

  delete[] matrix.ptr();

instead of:

  delete[] matrix.data;

or:

  matrix.free();

The member data of flann::Matrix is not publicly accessible any
more; use the ptr() method to obtain the pointer to the memory
buffer.
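
As a small illustration of this API (the matrix dimensions here are arbitrary), a matrix is typically allocated by the caller and later released through ptr():

    // Minimal sketch: allocating and releasing a flann::Matrix with the
    // current (1.7.0+) API. The dimensions used here are arbitrary.
    #include <flann/flann.hpp>

    int main()
    {
        const size_t rows = 100, cols = 3;
        flann::Matrix<float> points(new float[rows*cols], rows, cols);

        // ... fill the matrix, build an index, run searches ...

        delete[] points.ptr();   // the former `data` member is no longer public
        return 0;
    }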

Compiling FLANN with multithreading support

To take advantage of multithreaded search, the project that uses FLANN needs to be compiled with a compiler that supports the OpenMP standard, and OpenMP support must be enabled. The number of cores to be used can be selected with the cores field in the SearchParams structure. By default a single core will be used. Setting the cores field to zero will automatically use as many threads as there are cores available on the machine.
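
As a sketch of how this looks in code (the index and matrices are set up as in the Quick Start example above; the parameter values are illustrative), the number of threads is selected through the cores field of SearchParams:

    // Sketch: multithreaded knnSearch by setting SearchParams::cores.
    // Requires FLANN compiled with OpenMP support; values are illustrative.
    #include <flann/flann.hpp>

    void parallel_knn(flann::Index<flann::L2<float> >& index,
                      const flann::Matrix<float>& query,
                      flann::Matrix<int>& indices,
                      flann::Matrix<float>& dists,
                      size_t nn)
    {
        flann::SearchParams params(128);  // 128 checks, as in the Quick Start example
        params.cores = 0;                 // 0 = use as many threads as there are cores
        index.knnSearch(query, indices, dists, nn, params);
    }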

Using FLANN

Using FLANN from C++

The core of the FLANN library is written in C++. To make use of the full power and flexibility of the templated code one should use the C++ bindings if possible. To use the C++ bindings you only need to include the library header file flann.hpp. An example of the compile command that must be used will look something like this:

g++ flann_example.cpp -I $FLANN_ROOT/include -o flann_example_cpp

where $FLANN_ROOT is the library main directory.

The following sections describe the public C++ API.

flann::Index

The FLANN nearest neighbor index class. This class is used to abstract different types of nearest neighbor search indexes. The class is templated on the distance functor to be used for computing distances between pairs of features.

    namespace flann
    {
        template<typename Distance>
        class Index
        {
            typedef typename Distance::ElementType ElementType;
            typedef typename Distance::ResultType DistanceType;
        public:
            Index(const IndexParams& params, Distance distance = Distance() );

            Index(const Matrix<ElementType>& points, const IndexParams& params,
                  Distance distance = Distance() );

            ~Index();

            void buildIndex();

            void buildIndex(const Matrix<ElementType>& points);

            void addPoints(const Matrix<ElementType>& points,
                           float rebuild_threshold = 2);

            void removePoint(size_t point_id);

            ElementType* getPoint(size_t point_id);

            int knnSearch(const Matrix<ElementType>& queries,
                          Matrix<int>& indices,
                          Matrix<DistanceType>& dists,
                          size_t knn,
                          const SearchParams& params);

            int knnSearch(const Matrix<ElementType>& queries,
                          std::vector< std::vector<int> >& indices,
                          std::vector<std::vector<DistanceType> >& dists,
                          size_t knn,
                          const SearchParams& params);

            int radiusSearch(const Matrix<ElementType>& queries,
                             Matrix<int>& indices,
                             Matrix<DistanceType>& dists,
                             float radius,
                             const SearchParams& params);

            int radiusSearch(const Matrix<ElementType>& queries,
                             std::vector< std::vector<int> >& indices,
                             std::vector<std::vector<DistanceType> >& dists,
                             float radius,
                             const SearchParams& params);

            void save(std::string filename);

            int veclen() const;

            int size() const;

            IndexParams getParameters() const;

            flann_algorithm_t getType() const;
        };
    }

The Distance functor

The distance functor is a class whose operator() computes the distance between two features. If the distance is also a kd-tree compatible distance it should also provide an accum_dist() method that computes the distance between individual feature dimensions. A typical distance functor looks like this (see the dist.h file for more examples):

    template<class T>
    struct L2
    {
        typedef bool is_kdtree_distance;   // marks this as a kd-tree compatible distance
        typedef T ElementType;             // type of the feature elements
        typedef typename Accumulator<T>::Type ResultType;  // type of the accumulated distance

        template <typename Iterator1, typename Iterator2>
        ResultType operator()(Iterator1 a, Iterator2 b, size_t size,
                              ResultType /*worst_dist*/ = -1) const
        {
            ResultType result = ResultType();
            ResultType diff;
            for(size_t i = 0; i < size; ++i ) {
                diff = *a++ - *b++;     // difference of the corresponding components
                result += diff*diff;    // accumulate the squared differences
            }
            return result;
        }

        template <typename U, typename V>
        inline ResultType accum_dist(const U& a, const V& b, int) const
        {
            return (a-b)*(a-b);
        }
    };

In addition to operator() and accum_dist(), a distance functor should
also define the ElementType and the ResultType as the types of the
elements it operates on and the type of the result it computes.

If a distance functor can be used as a kd-tree distance (meaning that the full distance between a pair of features can be accumulated from the partial distances between the individual dimensions) a typedef is_kdtree_distance should be present inside the distance functor. If the distance is not a kd-tree distance, but it is a distance in a vector space (the individual dimensions of the elements it operates on can be accessed independently) a typedef is_vector_space_distance should be defined inside the functor. If neither typedef is defined, the distance is assumed to be a metric distance and will only be used with indexes operating on generic metric distances.
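
To illustrate these requirements, below is a sketch of a user-defined Manhattan (L1) distance written in the same style as the L2 functor above. FLANN already ships its own L1 functor in dist.h, so this is purely illustrative; it assumes the Accumulator trait used by the built-in functors is available through flann.hpp.

    // Sketch: a kd-tree compatible Manhattan (L1) distance functor, purely
    // illustrative (FLANN already provides one in dist.h).
    #include <flann/flann.hpp>
    #include <cstddef>

    template<class T>
    struct MyManhattanDistance
    {
        typedef bool is_kdtree_distance;   // total distance is a sum of per-dimension parts
        typedef T ElementType;
        typedef typename flann::Accumulator<T>::Type ResultType;

        // Full distance between two features of length `size`.
        template <typename Iterator1, typename Iterator2>
        ResultType operator()(Iterator1 a, Iterator2 b, size_t size,
                              ResultType /*worst_dist*/ = -1) const
        {
            ResultType result = ResultType();
            for (size_t i = 0; i < size; ++i) {
                ResultType diff = ResultType(*a++) - ResultType(*b++);
                result += diff < 0 ? -diff : diff;   // accumulate |a_i - b_i|
            }
            return result;
        }

        // Distance along a single dimension, used when descending the kd-tree.
        template <typename U, typename V>
        inline ResultType accum_dist(const U& a, const V& b, int) const
        {
            return a > b ? ResultType(a - b) : ResultType(b - a);
        }
    };

Such a functor is then used as the template argument of the index, e.g. flann::Index<MyManhattanDistance<float> >.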

flann::Index::Index

Constructs a nearest neighbor search index for a given dataset.

    Index(const IndexParams& params, Distance distance = Distance() );
    Index(const Matrix<ElementType>& points, const IndexParams& params,
            Distance distance = Distance() );
points
:   Matrix containing the features (points) that should be indexed,
    stored in row-major order (one point on each row of the matrix).
    The size of the matrix is $num\_features \times dimensionality$.

params
:   Structure containing the index parameters. The type of index that
    will be constructed depends on the type of this parameter. The
    possible parameter types are:

LinearIndexParams
When passing an object of this type, the index will perform a linear, brute-force search.

    struct LinearIndexParams : public IndexParams 
    {
    
    };

KDTreeIndexParams
When passing an object of this type the index constructed will consist of a set of randomized kd-trees which will be searched in parallel.

    struct KDTreeIndexParams : public IndexParams
    {
    
            KDTreeIndexParams( int trees = 4 );
    };
trees
:   The number of parallel kd-trees to use. Good values are in the
    range [1..16].

KMeansIndexParams
When passing an object of this type the index constructed will be a hierarchical k-means tree.
(Translator's note: k-means clustering picks k centers in the space, assigns each object to its nearest center, and iteratively updates the centers until a good clustering is obtained.)

    struct KMeansIndexParams : public IndexParams
    {
    
        KMeansIndexParams( int branching = 32,
                int iterations = 11,
                flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM,
                float cb_index = 0.2 );
    };
branching
:   The branching factor to use for the hierarchical k-means tree.

iterations
:   The maximum number of iterations to use in the k-means clustering
    stage when building the k-means tree. A value of -1 here means that
    the k-means clustering should be iterated until convergence.

centers_init
:   The algorithm to use for selecting the initial centers when
    performing a k-means clustering step. The possible values are
    CENTERS_RANDOM (picks the initial cluster centers randomly),
    CENTERS_GONZALES (picks the initial centers using Gonzales'
    algorithm) and CENTERS_KMEANSPP (picks the initial centers using
    the algorithm suggested in @arthur_kmeanspp_2007).

cb_index
:   This parameter (cluster boundary index) influences the way
    exploration is performed in the hierarchical k-means tree. When
    `cb_index` is zero the next k-means domain to be explored is chosen
    to be the one with the closest center. A value greater than zero
    also takes into account the size of the domain.
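
As a small sketch (the dataset setup is omitted and the numeric values are only illustrative), a hierarchical k-means index is created by passing a KMeansIndexParams object to the Index constructor:

    // Sketch: building a hierarchical k-means index.
    // branching = 32, iterations = 11; centers_init and cb_index keep their defaults.
    #include <flann/flann.hpp>

    void build_kmeans_index(const flann::Matrix<float>& dataset)
    {
        flann::KMeansIndexParams params(32, 11);
        flann::Index<flann::L2<float> > index(dataset, params);
        index.buildIndex();
    }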

CompositeIndexParams
When using a parameters object of this type the index created combines the randomized kd-trees and the hierarchical k-means tree.

    struct CompositeIndexParams : public IndexParams
    {
    
            CompositeIndexParams( int trees = 4,
                                int branching = 32,
                                int iterations = 11,
                                flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM, 
                                float cb_index = 0.2 );
    };

KDTreeSingleIndexParams
When passing an object of this type the index will contain a single kd-tree optimized for searching lower dimensionality data (for example 3D point clouds).

    struct KDTreeSingleIndexParams : public IndexParams
    {
    
            KDTreeSingleIndexParams( int leaf_max_size = 10 );
    };
leaf_max_size
:   The maximum number of points to have in a leaf for not branching
    the tree any more.

KDTreeCuda3dIndexParams
When passing an object of this type the index will be a single kd-tree that is built and performs searches on a CUDA compatible GPU. Search performance is best for large numbers of search and query points. For more information, see the flann::KdTreeCuda3dIndex section below.

    struct KDTreeCuda3dIndexParams : public IndexParams
    {
    
        KDTreeCuda3dIndexParams( int leaf_max_size = 64 );
    };
leaf_max_size
:   The maximum number of points to have in a leaf for not branching
    the tree any more.

HierarchicalClusteringIndexParams
When passing an object of this type the index constructed will be a hierarchical clustering index. This type of index works with any metric distance and can be used for matching binary features using Hamming distances.

    struct HierarchicalClusteringIndexParams : public IndexParams
    {
        HierarchicalClusteringIndexParams(int branching = 32,
                    flann_centers_init_t centers_init = FLANN_CENTERS_RANDOM,
                    int trees = 4, int leaf_max_size = 100);
    };
branching
:   The branching factor to use for the hierarchical clustering tree.

centers_init
:   The algorithm to use for selecting the initial centers when
    performing a k-means clustering step. The possible values are
    CENTERS_RANDOM (picks the initial cluster centers randomly),
    CENTERS_GONZALES (picks the initial centers using Gonzales'
    algorithm) and CENTERS_KMEANSPP (picks the initial centers using
    the algorithm suggested in @arthur_kmeanspp_2007).

trees
:   The number of parallel trees to use. Good values are in the
    range [3..8].

leaf_max_size
:   The maximum number of points a leaf node should contain.

LshIndexParams
When passing an object of this type the index constructed will be a multi-probe LSH (Locality-Sensitive Hashing) index. This type of index can only be used for matching binary features using Hamming distances.

    struct LshIndexParams : public IndexParams
    {
    
        LshIndexParams(unsigned int table_number = 12, 
                        unsigned int key_size = 20, 
                        unsigned int multi_probe_level = 2);
    };
table_number
:   The number of hash tables to use.

key_size
:   The length of the key in the hash tables.

multi_probe_level
:   Number of levels to use in multi-probe LSH (0 for standard LSH).
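
As a sketch of how such an index might be used (this assumes the flann::Hamming functor from dist.h, whose result type is an unsigned integer, and binary descriptors packed into rows of unsigned char; the descriptor data itself is not shown):

    // Sketch: matching packed binary descriptors with a multi-probe LSH index
    // and Hamming distance. Assumes flann::Hamming from dist.h.
    #include <flann/flann.hpp>

    void match_binary(const flann::Matrix<unsigned char>& descriptors,
                      const flann::Matrix<unsigned char>& queries)
    {
        // 12 hash tables, 20-bit keys, multi-probe level 2 (the defaults above)
        flann::Index<flann::Hamming<unsigned char> > index(
            descriptors, flann::LshIndexParams(12, 20, 2));
        index.buildIndex();

        const size_t knn = 2;
        flann::Matrix<int> indices(new int[queries.rows*knn], queries.rows, knn);
        flann::Matrix<unsigned int> dists(new unsigned int[queries.rows*knn], queries.rows, knn);

        index.knnSearch(queries, indices, dists, knn, flann::SearchParams(32));

        delete[] indices.ptr();
        delete[] dists.ptr();
    }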

AutotunedIndexParams
When passing an object of this type the index created is automatically tuned to offer the best performance, by choosing the optimal index type (randomized kd-trees, hierarchical k-means, linear) and parameters for the dataset provided.

    struct AutotunedIndexParams : public IndexParams
    {
    
        AutotunedIndexParams( float target_precision = 0.9,
                    float build_weight = 0.01,
                    float memory_weight = 0,
                    float sample_fraction = 0.1 );
    };
target_precision
:   A number between 0 and 1 specifying the percentage of the
    approximate nearest-neighbor searches that return the exact
    nearest neighbor. Using a higher value for this parameter gives
    more accurate results, but the search takes longer. The optimum
    value usually depends on the application.

build_weight
:   Specifies the importance of the index build time relative to the
    nearest-neighbor search time. In some applications it's acceptable
    for the index build step to take a long time if the subsequent
    searches in the index can be performed very fast. In other
    applications it's required that the index be built as fast as
    possible even if that leads to slightly longer search times.

memory_weight
:   Used to specify the tradeoff between time (index build time and
    search time) and memory used by the index. A value less than 1
    gives more importance to the time spent and a value greater than 1
    gives more importance to the memory usage.

sample_fraction
:   A number between 0 and 1 indicating what fraction of the dataset
    to use in the automatic parameter configuration algorithm. Running
    the algorithm on the full dataset gives the most accurate results,
    but for very large datasets it can take longer than desired. In
    such a case, using just a fraction of the data helps speed up this
    algorithm while still giving good approximations of the optimum
    parameters.
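
For example (using the default values listed above; the dataset setup is omitted), asking FLANN to pick the index type and parameters automatically looks roughly like this; the tuning itself happens in buildIndex() and can take a while:

    // Sketch: automatically tuned index. The parameter values are the
    // defaults shown above; tuning runs inside buildIndex().
    #include <flann/flann.hpp>

    void build_autotuned(const flann::Matrix<float>& dataset)
    {
        flann::AutotunedIndexParams params(0.9f,   // target_precision
                                           0.01f,  // build_weight
                                           0.0f,   // memory_weight
                                           0.1f);  // sample_fraction
        flann::Index<flann::L2<float> > index(dataset, params);
        index.buildIndex();

        // The tuned parameters can be inspected afterwards.
        flann::IndexParams chosen = index.getParameters();
        (void)chosen;
    }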

SavedIndexParams
This object type is used for loading a previously saved index from the disk.

    struct SavedIndexParams : public IndexParams
    {
    
            SavedIndexParams( std::string filename );
    };
filename
:   The filename in which the index was saved.
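
A short sketch of the save/load round trip (the filename is arbitrary); when reloading, the dataset is supplied to the constructor together with SavedIndexParams, matching the constructor signature shown earlier:

    // Sketch: saving an index and reloading it with SavedIndexParams.
    #include <flann/flann.hpp>

    void save_and_reload(const flann::Matrix<float>& dataset)
    {
        flann::Index<flann::L2<float> > index(dataset, flann::KDTreeIndexParams(4));
        index.buildIndex();
        index.save("kdtree.idx");                       // arbitrary filename

        // Later, e.g. in another run of the program:
        flann::Index<flann::L2<float> > loaded(
            dataset, flann::SavedIndexParams("kdtree.idx"));
    }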

flann::Index::buildIndex

Builds the nearest neighbor search index. There are two versions of the
buildIndex method, one that uses the points provided as an argument and
one that uses the points provided to the constructor when the object
was constructed.

    void buildIndex();        
    void buildIndex(const Matrix<ElementType>& points);

flann::Index::addPoints

The addPoints method can be used to incrementally add points to the
index after the index has been built. To avoid the index getting
unbalanced, the addPoints method has the option of rebuilding the index
after a large number of points have been added. The rebuild_threshold
parameter controls when the index is rebuilt; by default this is done
when it doubles in size (rebuild_threshold = 2).

    void addPoints(const Matrix<ElementType>& points, float rebuild_threshold = 2);
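
A brief sketch of an incremental update (the sizes and ids are illustrative; the new points must have the same dimensionality as the original dataset):

    // Sketch: incrementally growing an index and removing a point again.
    #include <flann/flann.hpp>

    void update_index(flann::Index<flann::L2<float> >& index,
                      const flann::Matrix<float>& new_points)
    {
        // Add points; the index is rebuilt automatically once it reaches
        // twice its original size (rebuild_threshold = 2, the default).
        index.addPoints(new_points, 2.0f);

        // Remove a single point by id; ids of the remaining points do not change.
        index.removePoint(0);
    }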

flann::Index::removePoint

The removePoint method removes the point with the specified point_id
from the index. The indices (of the remaining points) returned by the
nearest neighbor operations do not change when points are removed from
the index.

    void removePoint(size_t point_id);

flann::Index::getPoint

The getPoint method returns a pointer to the data point with the specified point_id.

    ElementType* getPoint(size_t point_id);

flann::Index::knnSearch

Performs a K-nearest neighbor search for a set of query points. There
are two signatures for this method, one that takes pre-allocated
flann::Matrix objects for returning the indices of and distances to
the neighbors found, and one that takes std::vector<std::vector> that
will be resized automatically as needed.

    int Index::knnSearch(const Matrix<ElementType>& queries,
            Matrix<int>& indices, 
            Matrix<DistanceType>& dists,
            size_t knn,
            const SearchParams& params);
    int Index::knnSearch(const Matrix<ElementType>& queries,
                    std::vector< std::vector<int> >& indices,
                    std::vector<std::vector<DistanceType> >& dists,
                    size_t knn,
                    const SearchParams& params);
queries
:   Matrix containing the query points. Size of matrix is
    $num\_queries \times dimensionality$.

indices
:   Matrix that will contain the indices of the K-nearest neighbors
    found (size should be at least $num\_queries \times knn$ for the
    pre-allocated version).

dists
:   Matrix that will contain the distances to the K-nearest neighbors
    found (size should be at least $num\_queries \times knn$ for the
    pre-allocated version).
    *Note:* For Euclidean distances, the `flann::L2` functor computes
    squared distances, so the value passed here needs to be a squared
    distance as well.

knn
:   Number of nearest neighbors to search for.

params
:   Search parameters. Structure containing parameters used during
    search.
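
As a sketch of the second overload (data setup omitted), the std::vector variant allocates and sizes the output containers itself:

    // Sketch: knnSearch with the std::vector overload; the output containers
    // are resized automatically. DistanceType is float for flann::L2<float>.
    #include <flann/flann.hpp>
    #include <vector>

    void knn_with_vectors(flann::Index<flann::L2<float> >& index,
                          const flann::Matrix<float>& queries)
    {
        std::vector<std::vector<int> > indices;
        std::vector<std::vector<float> > dists;

        // 5 nearest neighbors per query, visiting at most 64 leaves (checks = 64)
        index.knnSearch(queries, indices, dists, 5, flann::SearchParams(64));

        // indices[i][j] / dists[i][j] hold the j-th neighbor of query i;
        // dists are squared Euclidean distances for flann::L2.
    }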

SearchParams

    struct SearchParams
    {
    
        SearchParams(int checks = 32,
            float eps = 0,
            bool sorted = true);
        int checks;
        float eps;
        bool sorted;
        int max_neighbors;
        tri_type use_heap;
        int cores;
        bool matrices_in_gpu_ram;
    };
checks
:   Specifies the maximum number of leafs to visit when searching for
    neighbours. A higher value for this parameter gives better search
    precision, but also takes more time. For all leafs to be checked
    use the value `FLANN_CHECKS_UNLIMITED`. If automatic configuration
    was used when the index was created, the number of checks required
    to achieve the specified precision was also computed; to use that
    value specify `FLANN_CHECKS_AUTOTUNED`.

eps
:   Search for eps-approximate neighbors (only used by
    KDTreeSingleIndex and KDTreeCuda3dIndex).

sorted
:   Used only by radius search; specifies whether the neighbors
    returned should be sorted by distance.

max_neighbors
:   Specifies the maximum number of neighbors radius search should
    return (default: -1 = unlimited). Only used for radius search.

use_heap
:   Use a heap data structure to manage the list of neighbors
    internally (possible values: FLANN_False, FLANN_True,
    FLANN_Undefined). A heap is more efficient for a large number of
    neighbors and less efficient for a small number of neighbors. The
    default value is FLANN_Undefined, which lets the code choose the
    best option depending on the number of neighbors requested. Only
    used for KNN search.

cores
:   How many cores to assign to the search (specify 0 for automatic
    core selection).

matrices_in_gpu_ram
:   For GPU search, indicates whether the matrices are already in GPU
    RAM.

flann::Index::radiusSearch

Performs a radius nearest neighbor search for a set of query points.
There are two signatures for this method, one that takes pre-allocated
flann::Matrix objects for returning the indices of and distances to
the neighbors found, and one that takes std::vector<std::vector> that
will be resized automatically as needed.

    int Index::radiusSearch(const Matrix<ElementType>& queries,
              Matrix<int>& indices,
              Matrix<DistanceType>& dists,
              float radius,
              const SearchParams& params); 
    int Index::radiusSearch(const Matrix<ElementType>& queries,
                      std::vector< std::vector<int> >& indices,
                      std::vector<std::vector<DistanceType> >& dists,
                      float radius,
                      const SearchParams& params);
queries
:   Matrix containing the query points. Size of matrix is
    $num\_queries \times dimensionality$.

indices
:   Matrix that will contain the indices of the neighbors found within
    the search radius. For the pre-allocated version, only as many
    neighbors are returned as there are columns in this matrix. If
    fewer neighbors are found than there are columns in this matrix,
    the element after the last index returned is -1. In case of the
    std::vector version, the rows will be resized as needed to fit all
    the neighbors to be returned, except if the `max_neighbors` search
    parameter is set.

dists
:   Matrix that will contain the distances to the neighbors found. The
    same number of values are returned here as for the indices matrix.
    *Note:* For Euclidean distances, the `flann::L2` functor computes
    squared distances, so the value passed here needs to be a squared
    distance as well.

radius
:   The search radius.
    *Note:* For Euclidean distances, the `flann::L2` functor computes
    squared distances, so the value passed here needs to be a squared
    distance as well.

params
:   Search parameters.

The method returns the number of nearest neighbors found.
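
A short sketch of a radius search with the std::vector overload (the values are illustrative); because flann::L2 works with squared distances, a Euclidean radius of 0.5 is passed as 0.25:

    // Sketch: radius search around each query point, using the std::vector
    // overload. flann::L2 uses squared distances, so the radius is squared too.
    #include <flann/flann.hpp>
    #include <vector>

    void search_in_radius(flann::Index<flann::L2<float> >& index,
                          const flann::Matrix<float>& queries)
    {
        std::vector<std::vector<int> > indices;
        std::vector<std::vector<float> > dists;

        flann::SearchParams params(128);
        params.max_neighbors = -1;   // return every neighbor inside the radius
        params.sorted = true;        // sort results by distance

        int found = index.radiusSearch(queries, indices, dists, 0.25f, params);
        (void)found;                 // total number of neighbors found
    }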

flann::Index::save

Saves the index to a file.

    void Index::save(std::string filename);
filename
:   The file to save the index to.

flann::hierarchicalClustering

Clusters the given points by constructing a hierarchical k-means tree
and choosing a cut in the tree that minimizes the clusters' variance.

    template <typename Distance>
    int hierarchicalClustering(const Matrix<typename Distance::ElementType>& points,
                               Matrix<typename Distance::ResultType>& centers,
                               const KMeansIndexParams& params,
                               Distance d = Distance());
points
:   The points to be clustered.

centers
:   The centers of the clusters obtained. The number of rows in this
    matrix represents the number of clusters desired. However, because
    of the way the cut in the hierarchical tree is chosen, the number
    of clusters computed will be the highest number of the form
    $(branching-1)*k+1$ that is lower than the number of clusters
    desired, where $branching$ is the tree's branching factor (see the
    description of the KMeansIndexParams).

params
:   Parameters used in the construction of the hierarchical k-means
    tree.

The function returns the number of clusters computed.
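
An illustrative call (the desired number of clusters is arbitrary); note that fewer clusters than requested may actually be computed, as explained above:

    // Sketch: clustering a dataset with the hierarchical k-means tree.
    #include <flann/flann.hpp>

    int cluster_dataset(const flann::Matrix<float>& dataset)
    {
        const size_t clusters_wanted = 100;           // arbitrary
        flann::Matrix<float> centers(new float[clusters_wanted*dataset.cols],
                                     clusters_wanted, dataset.cols);

        // branching = 32, iterations = 11; the other parameters keep their defaults
        int clusters_found = flann::hierarchicalClustering<flann::L2<float> >(
            dataset, centers, flann::KMeansIndexParams(32, 11));

        // Only the first `clusters_found` rows of `centers` are valid.
        delete[] centers.ptr();
        return clusters_found;
    }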

flann::KdTreeCuda3dIndex

FLANN provides a CUDA implementation of the kd-tree build and search
algorithms to improve the build and query speed for large 3D data sets.
This section provides all the necessary information to use the
KdTreeCuda3dIndex index type.

Building: If CMake detects a CUDA install during the build (see the
"Downloading and compiling FLANN" section), a library libflann_cuda.so
will be built.

Basic Usage: To be able to use the new index type, you have to include the FLANN header this way:

    #define FLANN_USE_CUDA
    #include <flann/flann.hpp>

If you define the symbol FLANN_USE_CUDA before including the FLANN
header, you will have to link libflann_cuda.so or libflann_cuda_s.a
with your project. However, you will not have to compile your source
code with nvcc, except if you use other CUDA code, of course.

You can then create your index by using the KDTreeCuda3dIndexParams to
create the index. The index will take care of copying all the data from
and to the GPU for you, both for index creation and search.

A word of caution: A general guideline for deciding whether to use the
CUDA kd-tree or a (multi-threaded) CPU implementation is hard to give,
since it depends on the combination of CPU and GPU in each system and
on the data sets. For example, on a system with a Phenom II 955 CPU and
a GeForce GTX 260 GPU, the maximum search speedup on a synthetic
(random) data set is a factor of about 8-9 vs the single-core CPU
search, and starts to be reached at about 100k search and query points.
(This includes transfer times.) Build time does not profit as much from
the GPU acceleration; here the benefit is about 2x at 200k points, but
this largely depends on the data set. The only way to know which
implementation is best suited is to try it.

Advanced Usage: In some cases, you might already have your data in a
buffer on the GPU. In this case, you can re-use these buffers instead
of copying them back to system RAM for index creation and search. The
way to do this is to pass GPU pointers via the flann::Matrix inputs and
tell the index via the index and search params to treat the pointers as
GPU pointers.

    thrust::device_vector<float4>  points_vector( n_points );
    // ... fill vector here...
    float* gpu_pointer = (float*)thrust::raw_pointer_cast(&points_vector[0]);
    flann::Matrix<float> matrix_gpu(gpu_pointer,n_points,3, 4);
    flann::KDTreeCuda3dIndexParams params;
    params["input_is_gpu_float4"]=true;
    flann::Index<flann::L2<float> > flannindex( matrix_gpu, params  );
    flannindex.buildIndex();

First, a GPU buffer of float4 values is created and filled with points.[^2]

Then, a GPU pointer to the buffer is stored in a flann matrix with 3 columns and a stride of 4 (the last element in the float4 should be zero).

Last, the index is created. The input_is_gpu_float4 flag tells the index to treat the matrix as a GPU buffer.

Similarly, you can specify GPU buffers for the search routines that
return the result via flann matrices (but not for those that return
them via std::vectors). To do this, the pointers in the indices and
dists matrices have to point to GPU buffers and the cols value has to
be set to 3 (as we are dealing with 3D points). Here, any value for
stride can be used.

    flann::Matrix<int> indices_gpu(gpu_pointer_indices,n_points, knn, stride);
    flann::Matrix<float> dists_gpu(gpu_pointer_dists,n_points, knn, stride);
    flann::SearchParams params;
    params.matrices_in_gpu_ram = true;
    flannindex.knnSearch( queries_gpu ,indices_gpu,dists_gpu,knn, params);

Note that you cannot mix matrices in system and GPU RAM here!

Search Parameters:
The search routines support the following parameters:

  • eps - used for approximate knn search. The maximum possible error
    is $e = d_{best} \cdot eps$, i.e. the distance of the returned
    neighbor is at most $eps$ times larger than the distance to the
    real best neighbor.

  • use_heap - used in knn and radius search. If set to true, a heap
    structure will be used in the search to keep track of the distance
    to the farthest neighbor. Beneficial with about $k > 64$ elements.

  • sorted - if set to true, the results of the radius and knn
    searches will be sorted in ascending order by their distance to
    the query point.

  • matrices_in_gpu_ram - set to true to indicate that all (input and
    output) matrices are located in GPU RAM.

Copyright notice: This is an original post by the author, released under the CC 4.0 BY-SA license. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/weixin_45687825/article/details/110881552
