Reverse Video Search

Reverse video search allows us to use a video clip as an input for a query against videos that have been indexed in a vector store.

You may have used some kind of reverse image search before. Put simply, instead of searching using text: australian shepherds running, you can use an image: australian_shepherd_running.png. The search engine will then find all similar images based on that input.

But have you used reverse video search? The approach is the same: use your video as a query to find other videos.

Reverse video search enables users to find similar videos by using a video clip as the search input, rather than traditional text-based queries. This technology leverages advanced computer vision and machine learning to analyze and match visual content across video databases.

| Component | Description |
| --- | --- |
| Feature Extraction | Processing videos to identify and encode visual elements, scenes, and patterns |
| Vector Embeddings | Converting visual features into numerical representations for efficient comparison |
| Similarity Matching | Algorithms that compare video embeddings to find similar content |
| Temporal Analysis | Processing that considers the sequential nature of video content |
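
Under the hood, "similar" usually means the embedding vectors point in nearly the same direction. Here's a minimal illustration (not Mixpeek's internals) of comparing two toy 4-dimensional embeddings with cosine similarity; real models produce hundreds or thousands of dimensions:

import numpy as np

def cosine_similarity(a, b):
    # 1.0 = identical direction, ~0.0 = unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_embedding = np.array([0.12, 0.85, 0.33, 0.02])
indexed_embedding = np.array([0.10, 0.80, 0.40, 0.05])

print(cosine_similarity(query_embedding, indexed_embedding))  # ~0.99, i.e. very similar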

Understanding Through Image Search First

Before diving into video search, it's helpful to understand reverse image search, which follows similar principles but with still images.

How Reverse Image Search Works

  1. Input Processing: The system takes an image as input
  2. Feature Extraction: Analyzes visual elements like colors, shapes, and patterns
  3. Similarity Matching: Compares these features against a database of images
  4. Result Ranking: Returns similar images ranked by relevance
Input Processing (image as input) → Feature Extraction (colors, shapes, patterns) → Similarity Matching (compare features) → Result Ranking (rank by relevance)
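
Those four steps can be sketched in a few lines of code. This is purely illustrative: embed_image is a stand-in for whatever embedding model you use, and a real system would query a vector database instead of looping over a Python dict:

import numpy as np

def embed_image(path):
    # Steps 1-2: input processing + feature extraction.
    # Stand-in for a real model: a deterministic fake 512-d vector per path.
    rng = np.random.default_rng(abs(hash(path)) % (2**32))
    return rng.random(512)

def reverse_image_search(query_path, index, top_k=3):
    query = embed_image(query_path)
    scored = []
    for image_id, emb in index.items():
        # Step 3: similarity matching via cosine similarity
        score = float(np.dot(query, emb) / (np.linalg.norm(query) * np.linalg.norm(emb)))
        scored.append((image_id, score))
    # Step 4: result ranking, highest similarity first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

index = {p: embed_image(p) for p in ["aussie_1.jpg", "aussie_2.jpg", "corgi_1.jpg"]}
print(reverse_image_search("australian_shepherd_running.png", index))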

Try it on Google Images: https://images.google.com/

In the example below, I'll upload a picture of an Australian Shepherd dog, and Google's reverse image search will find all similar pictures of Australian Shepherds.

| Use Case | Description | Business Impact |
| --- | --- | --- |
| E-commerce | Finding similar products from product images | Increased sales through visual discovery |
| Content Verification | Identifying original sources of images | Enhanced content authenticity |
| Brand Protection | Detecting unauthorized use of logos/images | Better intellectual property protection |
| Real Estate | Finding similar properties from photographs | Improved property matching |

Image Feature Extraction

To perform a search, we first need to extract features from the image. Below we just leave the default options, but you can go much further with how many features you pull out.

import requests

url = "https://api.mixpeek.com/ingest/images/url"

payload = {
    "url": "https://www.akc.org/wp-content/uploads/2017/11/Australian-Shepherd.1.jpg",
    "collection": "sample_dogs",
    "feature_extractors": {
        "embed": [
            {
                "type": "url",
                "embedding_model": "image"
            }
        ]
    }
}
headers = {
    "Authorization": "Bearer API_KEY",  # your Mixpeek API key; headers are omitted from later snippets for brevity
    "Content-Type": "application/json"
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.text)

Now we can use the same image as a query to search the collection we just indexed:

import requests

url = "https://api.mixpeek.com/features/search"

payload = {
    "queries": [
        {
            "type": "url",
            "value": "https://www.akc.org/wp-content/uploads/2017/11/Australian-Shepherd.1.jpg",
            "embedding_model": "image"
        }
    ],
    "collections": ["sample_dogs"]
}

# reuse the headers (API key + content type) defined above
response = requests.request("POST", url, json=payload, headers=headers)

print(response.text)

Reverse video search works the same way: we first embed a few videos, then provide a sample video as the search query.

For our index, we'll use the official trailer for the 1949 classic, The Third Man:

Prepare the video(s)

We'll split the video into 5-second intervals, then embed each interval using the multimodal embedding model. We'll also generate a description for each interval.

import requests
import json

url = "https://api.mixpeek.com/ingest/videos/url"

payload = json.dumps({
  "url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/The+Third+Man++Official+Trailer.mp4",
  "collection": "my_video_collection",
  "feature_extractors": [
    {
      "interval_sec": 5,
      "describe": {
        "enabled": True
      },
      "embed": [
        {
          "type": "url",
          "embedding_model": "multimodal"
        }
      ]
    }
  ]
})

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Embed the query video and run the search

Now we have a grainy video clip from some CCTV that we'll use for our reverse video search:

We'll do the same thing; the only difference is that this time we embed the query video and search it against the videos we've already indexed and embedded:

import requests

url = "https://api.mixpeek.com/features/search"

payload = {
    "queries": [
        {
            "type": "url",
            "value": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/media-analysis/video_queries/exiting_sewer.mp4",
            "embedding_model": "multimodal",
        },
    ],
    "collections": ["my_video_collection"],
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.text)

Compare results

Now that we have our embeddings, we can run a KNN (k-nearest-neighbors) search:
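
Conceptually, this amounts to comparing the query clip's embedding against every stored interval embedding and keeping the top k. The sketch below is an illustration only (Mixpeek runs this server-side, and the vectors here are fabricated), but it produces results in the same shape as the API output shown next:

import numpy as np

# Fabricated interval embeddings, one per 5-second segment of the indexed trailer
interval_embeddings = {
    (0.0, 5.0): np.random.default_rng(0).random(512),
    (5.0, 10.0): np.random.default_rng(1).random(512),
    (25.0, 30.0): np.random.default_rng(2).random(512),
}
# Reuse seed 2 so the query clip "matches" the 25-30s interval in this toy example
query_embedding = np.random.default_rng(2).random(512)

def knn(query, intervals, k=3):
    scored = []
    for (start, end), emb in intervals.items():
        score = float(np.dot(query, emb) / (np.linalg.norm(query) * np.linalg.norm(emb)))
        scored.append({"start_time": start, "end_time": end, "score": score})
    # Highest similarity first, keep the top k
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:k]

print(knn(query_embedding, interval_embeddings))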

The search returns an array of objects that we can render in our application, indicating the most similar video timestamps based on the video embedding we used as the query:

results = [
    {"start_time": 25.0, "end_time": 30.0, "score": 0.6265061},
    {"start_time": 5.0, "end_time": 10.0, "score": 0.6025797},
    {"start_time": 30.0, "end_time": 35.0, "score": 0.59880114},
]
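
For example, a small (hypothetical) helper to turn those results into something displayable, with timestamps formatted as mm:ss:

def format_result(result):
    # Convert seconds to an mm:ss label for display in a UI
    def mmss(seconds):
        return f"{int(seconds) // 60:02d}:{int(seconds) % 60:02d}"
    return f"{mmss(result['start_time'])}-{mmss(result['end_time'])} (similarity {result['score']:.2f})"

for r in sorted(results, key=lambda r: r["score"], reverse=True):
    print(format_result(r))
# 00:25-00:30 (similarity 0.63)
# 00:05-00:10 (similarity 0.60)
# 00:30-00:35 (similarity 0.60)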

Now if we look at the original video @ 25 seconds in:

Amazing: using a video clip as the query, we found a scene that would be hard to describe in text. Now imagine doing that across billions of videos 🤯

Using this template, we set it up so that whenever a new object is added to our S3 bucket, it's automatically processed and inserted into our database (the connection was established beforehand). Additionally, if a video is ever deleted from our S3 bucket, its embeddings are deleted from our database as well.
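
A rough sketch of what that automation could look like as an AWS Lambda handler wired to S3 event notifications. The ingest endpoint matches the one used above, but the delete endpoint and its payload are hypothetical placeholders, not a documented Mixpeek API:

import urllib.parse
import requests

HEADERS = {"Authorization": "Bearer API_KEY", "Content-Type": "application/json"}

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        object_url = f"https://{bucket}.s3.amazonaws.com/{key}"

        if record["eventName"].startswith("ObjectCreated"):
            # New video uploaded: ingest and embed it, as in the earlier snippets
            requests.post(
                "https://api.mixpeek.com/ingest/videos/url",
                headers=HEADERS,
                json={
                    "url": object_url,
                    "collection": "my_video_collection",
                    "feature_extractors": [
                        {"interval_sec": 5, "embed": [{"type": "url", "embedding_model": "multimodal"}]}
                    ],
                },
            )
        elif record["eventName"].startswith("ObjectRemoved"):
            # Video deleted: remove its embeddings (hypothetical endpoint and payload)
            requests.post(
                "https://api.mixpeek.com/assets/delete",
                headers=HEADERS,
                json={"collection": "my_video_collection", "url": object_url},
            )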

Applications and Use Cases

| Industry | Use Case | Benefits |
| --- | --- | --- |
| Content Creation | Finding specific scenes or clips | Streamlined editing process |
| Media Monitoring | Tracking content reuse across platforms | Better copyright enforcement |
| Security | Analyzing surveillance footage | Enhanced threat detection |
| E-commerce | Product discovery through video | Improved shopping experience |


About the author
Ethan Steininger

Probably outside.
