
Comparison of Application Scenarios for Image and Video Hash Algorithms: pHash, PDQ, and TMK+PDQF

Comparative analysis of the technical characteristics and application scenarios of three mainstream hash algorithms: pHash, PDQ, and TMK+PDQF.

Conclusion first: pHash is best suited to image-level deduplication and similarity retrieval; PDQ is a 256-bit image fingerprint; TMK+PDQF (and the later vPDQ) are video-oriented fingerprint algorithms that incorporate temporal information.

pHash's official documentation also describes audio and video interfaces. The video interface, however, returns a frame-level hash sequence, which is essentially a per-frame image hash: it lacks explicit temporal encoding and is typically used for lightweight screening rather than robust video matching.
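To illustrate why a frame-level hash sequence only supports lightweight screening, here is a minimal Python sketch. The function name `frame_match_ratio` and the `max_distance` value are hypothetical and not part of any library; frame hashes are assumed to be plain integers.

```python
def frame_match_ratio(query, reference, max_distance=8):
    """Fraction of query frame hashes with a near match anywhere in reference.

    Frame order is ignored, so this carries no temporal information.
    max_distance is an illustrative Hamming cutoff, not a recommended value.
    """
    if not query:
        return 0.0
    hits = sum(
        any(bin(q ^ r).count("1") <= max_distance for r in reference)
        for q in query
    )
    return hits / len(query)
```

Because the score is order-insensitive, a clip with the same frames shuffled or reversed scores exactly like the original, which is the gap that explicit temporal encoding is meant to close.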

pHash

pHash is an open-source perceptual hash library that has long been used for image similarity retrieval. It provides image hash implementations such as the DCT-based hash and the Marr wavelet hash, and it suits image deduplication and near-duplicate retrieval.
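The idea behind a DCT-based image hash can be sketched in pure Python as follows. This is a simplified illustration, not the pHash library's implementation: the function names are hypothetical, the input is assumed to be a grayscale image already resized by the caller (for example to 32x32), and the block size and median threshold are choices made for this sketch.

```python
import math

def dct_1d(vec):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(vec)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(vec))
        for k in range(n)
    ]

def dct_2d(matrix):
    """2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def dct_hash(gray, low=8):
    """64-bit hash of a grayscale image given as a list of rows of numbers.

    Assumes the image is at least (low + 1) pixels on each side.
    """
    freq = dct_2d(gray)
    # Keep a low x low block of low frequencies, offset by one so the
    # zero-frequency row and column (overall brightness) are skipped.
    coeffs = [freq[r][c] for r in range(1, low + 1) for c in range(1, low + 1)]
    median = sorted(coeffs)[len(coeffs) // 2]
    bits = 0
    for value in coeffs:
        bits = (bits << 1) | (value > median)  # one bit per coefficient
    return bits

def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")
```

Two images are then treated as near-duplicates when `hamming(dct_hash(x), dct_hash(y))` falls below a cutoff chosen for the dataset. Skipping the zero-frequency terms means a uniform brightness shift leaves the hash unchanged.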

PDQ

PDQ is a "photo-hashing" algorithm open-sourced by Meta that outputs a 256-bit image fingerprint. It is designed for fast, threshold-based matching of image content and is widely used in content moderation and in hash databases shared across platforms.
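Threshold-based matching of 256-bit fingerprints reduces to a Hamming distance check. The sketch below assumes hashes are exchanged as 64-character hex strings; the function names are hypothetical, and the cutoff of 31 bits is an illustrative default that should be tuned for each deployment.

```python
PDQ_BITS = 256
MATCH_THRESHOLD = 31  # assumed cutoff for this sketch; tune per deployment

def pdq_distance(hex_a, hex_b):
    """Hamming distance between two 256-bit hashes given as 64 hex characters."""
    expected = PDQ_BITS // 4
    if len(hex_a) != expected or len(hex_b) != expected:
        raise ValueError("expected 64 hex characters per hash")
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

def is_match(hex_a, hex_b, threshold=MATCH_THRESHOLD):
    """True when the two hashes differ in at most `threshold` bits."""
    return pdq_distance(hex_a, hex_b) <= threshold
```

A lower threshold reduces false positives at the cost of missing more heavily edited copies.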

In video scenarios, PDQ can serve as the per-frame fingerprint extraction component, but the official video solutions combine it with temporal modeling (see TMK+PDQF/vPDQ).

TMK+PDQF (including vPDQ)

TMK+PDQF is a video similarity algorithm. It computes PDQF features (a floating-point version of PDQ) for each frame, then uses temporal kernels to build two layers of fixed-length descriptors. Retrieval compares the global layer first and the temporal layer second, which improves the robustness of video-level matching.
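The two-layer idea can be sketched in Python under heavy simplification. This is not the TMK+PDQF implementation: plain float lists stand in for PDQF features, the periods and thresholds are made-up values, all function names are hypothetical, and the real algorithm's kernel weights, normalization, and time-offset search are omitted.

```python
import math

def describe(frames, periods=(4.0, 16.0)):
    """Build (global_layer, temporal_layer) from (timestamp, feature) pairs.

    The descriptor size depends only on the feature dimension and the
    number of periods, not on how many frames the video has.
    """
    dim = len(frames[0][1])
    global_layer = [sum(f[i] for _, f in frames) / len(frames) for i in range(dim)]
    temporal_layer = []
    for period in periods:
        cos_part = [0.0] * dim
        sin_part = [0.0] * dim
        for t, feat in frames:
            angle = 2 * math.pi * t / period
            for i, x in enumerate(feat):
                cos_part[i] += x * math.cos(angle)  # time-weighted sums
                sin_part[i] += x * math.sin(angle)
        temporal_layer.append((cos_part, sin_part))
    return global_layer, temporal_layer

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def temporal_score(layer_a, layer_b):
    """Cosine similarity of the flattened temporal layers (no offset search)."""
    flat_a = [v for cos_p, sin_p in layer_a for v in cos_p + sin_p]
    flat_b = [v for cos_p, sin_p in layer_b for v in cos_p + sin_p]
    return cosine(flat_a, flat_b)

def two_stage_match(desc_a, desc_b, global_min=0.7, temporal_min=0.7):
    """Cheap global check first; temporal check only if the first one passes."""
    if cosine(desc_a[0], desc_b[0]) < global_min:
        return False
    return temporal_score(desc_a[1], desc_b[1]) >= temporal_min
```

In this sketch, a video and its time-reversed copy share the same global layer, so the first stage cannot tell them apart; only the temporal layer separates them. That ordering, cheap global comparison first and temporal comparison second, is what keeps retrieval fast while still using time information.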

Industry benchmarks and documentation position TMK+PDQF (and the subsequent vPDQ) as open-source video fingerprint solutions for near-duplicate and segment-level video matching that remain robust to re-encoding and to changes in resolution and bitrate.