Build with Valossa Video AI

The Valossa AI REST API lets you analyze and understand video content.
It extracts time-coded, descriptive multimodal metadata that is structured and machine-readable, based on AI that understands video like a human.

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "api_key": "YOUR_API_KEY",
    "media": {
      "video": {
        "url": "https://example.com/video.mp4"
      }
    }
  }' \
  https://api-eu.valossa.com/core/1.0/new_job

Extract complete JSON metadata from any video for search, analytics, large language model processing and general enrichment of audiovisual assets.

Obtain speech to text, captions, summaries, scene descriptions, keywords, topics, faces, objects, content structure, moods and more. 
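
The curl request at the top of this page can also be written in Python. The sketch below is a minimal illustration that reuses only the endpoint and JSON body shown in that curl command. The helper names `build_new_job_request` and `submit_job` are hypothetical, and the shape of the reply is not described on this page, so the code simply returns whatever JSON comes back.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
NEW_JOB_URL = "https://api-eu.valossa.com/core/1.0/new_job"


def build_new_job_request(api_key: str, video_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "api_key": api_key,
        "media": {"video": {"url": video_url}},
    }
    return urllib.request.Request(
        NEW_JOB_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_job(api_key: str, video_url: str) -> dict:
    """Send the request and return the decoded JSON reply.

    Assumption: the reply is JSON. Its fields are not documented here.
    """
    request = build_new_job_request(api_key, video_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Inspect the request without touching the network.
req = build_new_job_request("YOUR_API_KEY", "https://example.com/video.mp4")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect and test before any video is actually submitted.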

How it Works

Three steps from video file to structured metadata

1. Upload Your Video

Send a video URL or file to the API. We accept MP4, MOV, AVI, MKV, and 50+ other formats. Up to 3 hours per video.

2. AI Analyzes Content

Valossa AI processes the video with a broad set of detection types: speech, faces, objects, text, scenes, topics, brands, and more.

3. Get Metadata

Retrieve rich JSON metadata via the API. Timestamps, confidence scores, bounding boxes, and transcripts are ready for your application.
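
As a rough illustration of what an application might do with the retrieved metadata, the sketch below keeps only detections above a confidence threshold. The field names (`label`, `start`, `end`, `confidence`) and the sample values are hypothetical placeholders chosen for readability, not the actual Valossa metadata schema. Consult the API Reference for the real structure.

```python
# Hypothetical, simplified detections. NOT the real Valossa schema.
SAMPLE_DETECTIONS = [
    {"label": "dog", "start": 12.0, "end": 15.5, "confidence": 0.93},
    {"label": "car", "start": 40.2, "end": 41.0, "confidence": 0.41},
    {"label": "guitar", "start": 58.0, "end": 62.0, "confidence": 0.78},
]


def confident_detections(detections: list, threshold: float = 0.5) -> list:
    """Return the detections whose confidence is at or above the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]


for detection in confident_detections(SAMPLE_DETECTIONS):
    print(detection["label"], detection["start"], detection["end"])
```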

Simple REST API

One API call to analyze video speech, audio and visual content. Get results as structured JSON with timestamps, confidence scores, and detection metadata.

Works with any programming language. We provide examples in Python, JavaScript, Java, and curl.

curl "https://api-eu.valossa.com/core/1.0/job_results?api_key=YOUR_API_KEY&job_id=JOB_ID" -o results.json
# get AI analysis metadata for every modality in a single JSON file
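
The same results request could be made from Python along the following lines. This is a sketch that assumes only the `job_results` URL and the two query parameters shown in the curl command. `build_results_url` and `download_results` are hypothetical helper names, and the code assumes the reply body is JSON, as the page states.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example on this page.
JOB_RESULTS_URL = "https://api-eu.valossa.com/core/1.0/job_results"


def build_results_url(api_key: str, job_id: str) -> str:
    """Compose the results URL with properly encoded query parameters."""
    query = urllib.parse.urlencode({"api_key": api_key, "job_id": job_id})
    return f"{JOB_RESULTS_URL}?{query}"


def download_results(api_key: str, job_id: str, path: str = "results.json") -> dict:
    """Fetch the results, save them to a file, and return the parsed JSON."""
    with urllib.request.urlopen(build_results_url(api_key, job_id), timeout=60) as response:
        raw = response.read()
    with open(path, "wb") as handle:
        handle.write(raw)
    return json.loads(raw)


# Show the URL that would be requested, without touching the network.
print(build_results_url("YOUR_API_KEY", "JOB_ID"))
```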

Video AI Suited for Scale, Breadth and Volume

Getting Started Guide

From zero to your first API call in 15 minutes. Covers authentication, your first video upload, and reading the results.

API Reference

Complete endpoint documentation with parameters, response schemas, error codes, and examples. OpenAPI 3.1 spec available.

Tools and Code Examples

Python, JavaScript, and curl code snippets. A Metadata Reader CLI for parsing. A Postman collection for quick testing.

Prompt: "Find a scene with suspense and intrigue"

Response:
00:00:58 - 00:01:02: Threatening with guns, ominous dark setting, harmful dialogue, suspenseful music, expressions of anger and displeasure

Looking for Video Analysis for Your LLM Media Application or Agent?

Modern video content management requires large language and vision model analysis of media content. With Valossa multimodal video AI, you can build powerful video understanding capabilities into your application. Unleash video intelligence with powerful, time-coded metadata.
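
One way time-coded metadata might be handed to a language model is as plain text lines in the `HH:MM:SS - HH:MM:SS: description` style used in the prompt example above. The sketch below is only an illustration: the `(start_seconds, end_seconds, description)` tuple layout and both function names are assumptions, not part of the Valossa API.

```python
def format_timecode(seconds: float) -> str:
    """Convert a number of seconds into an HH:MM:SS string."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_llm_context(segments: list) -> str:
    """Render (start, end, description) tuples as one time-coded line each."""
    lines = []
    for start, end, description in segments:
        lines.append(f"{format_timecode(start)} - {format_timecode(end)}: {description}")
    return "\n".join(lines)


# Hypothetical segment, mirroring the example response shown on this page.
segments = [(58, 62, "Threatening with guns, ominous dark setting")]
print(to_llm_context(segments))
# prints: 00:00:58 - 00:01:02: Threatening with guns, ominous dark setting
```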

Valossa Assistant — An Agentic Service for Video Analysis Tasks

Not a developer? Try Valossa Assistant, a conversational agentic AI application for video analysis, including RAG, clipping and prompt-based video task assignments. Ask questions about your videos, search transcripts, extract clips, and generate reports using natural language.

No code required. Upload a video and start asking. Assistant watches and listens to video content using Valossa AI and turns it into answers, insights, meta assets, text descriptions and clips, automating complex tasks.

Me: Are there any sensitive scenes in this video?

Assistant: 
Here are segments from the video containing violence, weapons, threats, explicit language, substance use, injury, death, and other sensitive material:

Dark injury scene
00:00:58 - 00:01:02

Very dark outdoor scene; ground is uneven; content involves blood, open wound, bruise (injury and violence present).

Ski resort gun threat
00:03:36 - 00:03:41

Outdoor dining at a ski resort; gloved hand holding a black revolver, very tense moment (weapons and threat present).
....

Work smarter with multimodal gen AI that understands video like a human.

Valossa AI accelerates video content analytics with advanced speech recognition, visual analysis and contextual intelligence.

Gain deeper metadata for your next-generation video intelligence applications.

Valossa AI is More Than Speech to Text. Generative AI with Vision and Hearing Converts Video to Text and Structured Metadata

Accurate Speech Transcripts and Video Scene Logs

Perfect alignment of text and visuals with full scene breakdowns — no more manual time-code sync

Instant Captions, Subtitles and Translations

Create multilingual captions seamlessly and make content accessible to global audiences

Broad Multimodal Metadata in a Single Pass

Obtain key metadata. Find moments in seconds with audio-visual scene search over people, objects, speech, sounds and emotions

Designed for Professional Workflow Automation at Scale

Send multiple files at once via Valossa API to process videos at scale. 
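
A batch of videos could be submitted concurrently along these lines. This sketch assumes one job per video, using the same JSON body as the curl example at the top of this page. The page does not describe a dedicated batch endpoint, so none is used here. `submit_batch` and `fake_submit` are hypothetical names, and the sending function is passed in so the batching logic can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def build_payload(api_key: str, video_url: str) -> dict:
    """Build the JSON body used in the new_job curl example."""
    return {"api_key": api_key, "media": {"video": {"url": video_url}}}


def submit_batch(
    api_key: str,
    video_urls: Iterable[str],
    submit: Callable[[dict], dict],
    max_workers: int = 4,
) -> list:
    """Submit one job per video in parallel; results keep the input order."""
    payloads = [build_payload(api_key, url) for url in video_urls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit, payloads))


def fake_submit(payload: dict) -> dict:
    """Stand-in for a real HTTP call, so the example runs offline."""
    return {"submitted_url": payload["media"]["video"]["url"]}


results = submit_batch(
    "YOUR_API_KEY",
    ["https://example.com/a.mp4", "https://example.com/b.mp4"],
    fake_submit,
)
print(results)
```

In real use, `fake_submit` would be replaced by a function that posts each payload to the `new_job` endpoint.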

Transform Hours of Video to Text and Structured Metadata

Frequently Asked Questions

With Valossa AI, you can automate transcripts, captions, visual descriptions, translations, highlights, clips, metadata generation, content analytics, and video search. We analyze your videos to unlock valuable insights.

Video Analysis AI software uses artificial intelligence to automatically interpret video content. It can detect objects, people, activities, and even emotions within footage. Instead of manually reviewing hours of video, AI video analysis identifies meaningful patterns, making it faster and more accurate to gain insights. Businesses use it for media productivity, video monetization, advertising, media management, compliance, cognitive insights and content intelligence.

We offer both an API for seamless integration and a user-friendly Valossa Portal for manual use and easy inspection of analysis results.

The Valossa AI API is a REST API that allows users to upload, analyze and understand video file content and obtain detailed metadata from audiovisual content streams.
It extracts time-coded, descriptive multimodal JSON metadata that is well-structured, documented and machine-readable.

Multimodal AI Video Analysis is applied across industries to automate real-world cognitive work. In media and broadcasting, it automates content production, monetization and management. Modern conversational AI systems use multimodal video analysis to resolve complex questions about the content. In compliance and monitoring, it flags sensitive or restricted material from speech or visual information. The versatility of AI-powered video analysis helps organizations save time, reduce costs, and automate processes with actionable insights.

Valossa AI has been trained to extract detailed multimodal information from video, balancing breadth, accuracy, speed and cost. With good-quality video content, accuracy can exceed 98% for speech and face recognition. Valossa AI is a multimodal video intelligence AI.

Many industries benefit, including:

  • Media & Broadcasting: content indexing, search and management, metadata generation, contextual advertising, transcribing, automatic subtitling, highlights clipping, compliance

  • Security & Surveillance: anomaly detection, threat monitoring

  • Retail and smart spaces: flow measurement of people and objects, customer journey tracking, sentiment analysis

  • Healthcare: patient monitoring, behavior analysis

  • Education & Research: lecture transcription, presentation analysis, metadata and topic extraction, video data mining, search

  • Creators, podcasting and marketing: speeding up clipping and repurposing of content, social media promotions, video analytics

  • Advertising and marketing research: video analytics, competitor analysis, ad effectiveness, contextual targeting  

On-premises setups are available. Initial costs and IPR protection precautions may vary case by case.

Our expert team is available for custom AI solutions tailored to your needs, ensuring the perfect fit for your project.

AI Video Analysis is applied across industries to solve real-world problems. In media and broadcasting, it automates content tagging and transcription and generates additional media assets such as transcripts, captions, metadata, summaries and categories. In retail, it measures customer behavior and store traffic. In compliance, it flags sensitive or restricted material. The versatility of AI-powered video analysis helps organizations save time, reduce costs, and improve decision-making.