Why build this now?
Teams record everything, but finding the one minute that matters can be painful. By transcribing and indexing, you unlock fast search, skim-ready summaries, and reusable content for blogs, docs, and FAQs.
What you will build
- A simple pipeline: Video → Whisper transcription → MySQL storage → Search UI.
- Chunked text with timestamps, so results can jump back to the right moment in the video.
- A tiny REST API in PHP that your existing site can call.
Prerequisites
- MySQL 8 or MariaDB.
- FFmpeg installed on your machine or server.
- Python 3.10+ with faster-whisper or OpenAI Whisper.
- PHP 8+ for the minimal API examples.
Database design
We will create two tables: one for the video asset and one for transcription chunks.
```sql
CREATE TABLE videos (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(200) NOT NULL,
  source_url VARCHAR(255) NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE TABLE video_chunks (
  id INT PRIMARY KEY AUTO_INCREMENT,
  video_id INT NOT NULL,
  t_start_sec DECIMAL(10,2) NOT NULL,
  t_end_sec DECIMAL(10,2) NOT NULL,
  content MEDIUMTEXT NOT NULL,
  FULLTEXT KEY ft_content (content),
  FOREIGN KEY (video_id) REFERENCES videos(id) ON DELETE CASCADE
) ENGINE=InnoDB;
```
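Before rows reach these tables, a tiny sanity check can catch bad data early. The sketch below only assumes the column semantics above (ordered, non-negative timestamps and non-empty text); `valid_chunk` is an illustrative helper name, not part of any library.

```python
def valid_chunk(video_id, t_start, t_end, content):
    """Sanity-check a chunk row before insertion: positive video id,
    ordered non-negative timestamps, and non-empty text."""
    return (
        isinstance(video_id, int) and video_id > 0
        and 0 <= t_start < t_end
        and bool(content.strip())
    )
```

Running this in the ingestion loop costs nothing and prevents zero-length or reversed chunks from polluting search results.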
Step 1: Extract audio
```bash
# Save audio as 16 kHz mono WAV
ffmpeg -i input.mp4 -ac 1 -ar 16000 audio.wav
```
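If you process many recordings, it helps to wrap that ffmpeg invocation in a small Python helper. This is only a sketch: `extract_audio_cmd` is a hypothetical helper name, and actually running the command assumes ffmpeg is on your PATH.

```python
import subprocess

def extract_audio_cmd(video_path, audio_path, rate=16000, channels=1):
    """Build the ffmpeg argument list for mono 16 kHz WAV extraction."""
    return [
        "ffmpeg", "-y",        # -y overwrites an existing output file
        "-i", video_path,
        "-ac", str(channels),  # downmix to mono
        "-ar", str(rate),      # resample to 16 kHz
        audio_path,
    ]

# To actually run it (requires ffmpeg on your PATH):
# subprocess.run(extract_audio_cmd("input.mp4", "audio.wav"), check=True)
```

Building the argument list as a Python list (rather than a shell string) avoids quoting problems with filenames that contain spaces.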
Step 2: Transcribe with Whisper
Use Python and the high-performance faster-whisper package. It streams segments with start and end times, which is exactly what we need for chunking.
```bash
pip install faster-whisper mysql-connector-python
```
```python
from faster_whisper import WhisperModel
import mysql.connector

model = WhisperModel("base")  # use "small" or "medium" for higher quality
segments, info = model.transcribe("audio.wav")

db = mysql.connector.connect(
    host="localhost", user="your_user", password="your_pass",
    database="your_db", charset="utf8mb4"
)
cur = db.cursor()

# 1) Insert the video row.
cur.execute(
    "INSERT INTO videos (title, source_url) VALUES (%s, %s)",
    ("Weekly Sync 2025-09-01", "input.mp4"),
)
video_id = cur.lastrowid

# 2) Insert each segment as a searchable chunk.
for s in segments:
    cur.execute(
        "INSERT INTO video_chunks (video_id, t_start_sec, t_end_sec, content) "
        "VALUES (%s, %s, %s, %s)",
        (video_id, s.start, s.end, s.text.strip()),
    )

db.commit()
cur.close()
db.close()
```
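Whisper segments are often only a few seconds long, which makes search hits choppy. A common refinement is to merge consecutive segments into chunks of roughly 30 seconds before inserting them. A sketch, assuming segments arrive in order as (start, end, text) triples; `merge_segments` is an illustrative name:

```python
def merge_segments(segments, max_len=30.0):
    """Merge consecutive (start, end, text) segments into chunks
    no longer than max_len seconds."""
    chunks = []
    cur_start, cur_end, parts = None, None, []
    for start, end, text in segments:
        if cur_start is None:
            cur_start, cur_end, parts = start, end, [text.strip()]
        elif end - cur_start <= max_len:
            # Still fits in the current chunk: extend it.
            cur_end = end
            parts.append(text.strip())
        else:
            # Flush the current chunk and start a new one.
            chunks.append((cur_start, cur_end, " ".join(parts)))
            cur_start, cur_end, parts = start, end, [text.strip()]
    if parts:
        chunks.append((cur_start, cur_end, " ".join(parts)))
    return chunks
```

Larger chunks give the full-text index more context per row, at the cost of slightly coarser jump targets in the player.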
Step 3: Add a tiny PHP search API
This endpoint returns matching chunks with timestamps you can link back to in your player.
```php
<?php
header("Content-Type: application/json; charset=utf-8");

$q = trim($_GET["q"] ?? "");
if ($q === "") {
    // Avoid running MATCH ... AGAINST on an empty string.
    echo json_encode([]);
    exit;
}

$pdo = new PDO(
    "mysql:host=localhost;dbname=your_db;charset=utf8mb4",
    "your_user",
    "your_pass",
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);

$sql = "SELECT v.title, c.t_start_sec, c.t_end_sec, c.content
        FROM video_chunks c
        JOIN videos v ON v.id = c.video_id
        WHERE MATCH(c.content) AGAINST(:q IN NATURAL LANGUAGE MODE)
        ORDER BY c.id DESC
        LIMIT 50";
$stmt = $pdo->prepare($sql);
$stmt->execute([":q" => $q]);
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC), JSON_UNESCAPED_UNICODE);
```
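Even with a prepared statement, it is cheap insurance to normalize the raw query before it reaches MATCH ... AGAINST, for example stripping operator characters and clamping the length. A sketch of the idea in Python (the same logic ports to PHP in a few lines); `sanitize_query` is an illustrative name:

```python
def sanitize_query(q, max_len=100):
    """Keep only letters, digits, and spaces; collapse whitespace;
    clamp the query to max_len characters."""
    cleaned = "".join(ch for ch in q if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())[:max_len]
```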
Step 4: Simple front end
```html
<form id="s">
  <input name="q" placeholder="Search meetings and lectures">
  <button>Search</button>
</form>
<ul id="results"></ul>
<script>
// Escape user-generated text before injecting it into innerHTML.
const esc = (s) => String(s).replace(/[&<>"']/g, (c) => ({
  "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"
}[c]));

document.getElementById("s").onsubmit = async (e) => {
  e.preventDefault();
  const q = new FormData(e.target).get("q");
  const res = await fetch("/api/search.php?q=" + encodeURIComponent(q));
  const data = await res.json();
  const ul = document.getElementById("results");
  // PDO returns DECIMAL columns as strings, so coerce with Number() before toFixed().
  ul.innerHTML = data.map(r => (
    `<li><strong>${esc(r.title)}</strong> — ${esc(r.content)}
     <br><small>${Number(r.t_start_sec).toFixed(2)}s to ${Number(r.t_end_sec).toFixed(2)}s</small></li>`
  )).join("");
};
</script>
```
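To turn a hit into a deep link, most players accept a time offset in the URL (for example the `#t=` media fragment for HTML5 video). If you render results server-side, a small helper can also format `t_start_sec` for display; this is a sketch, and `format_timestamp` is an illustrative name:

```python
def format_timestamp(seconds):
    """Format a duration in seconds as H:MM:SS for display and deep links."""
    s = int(round(seconds))
    h, rem = divmod(s, 3600)
    m, sec = divmod(rem, 60)
    return f"{h}:{m:02d}:{sec:02d}"
```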
Quality tips
- For the clearest text, record with a dedicated mic and reduce background noise.
- Try model sizes base, small, and medium to balance speed and accuracy.
- Adjust MySQL's stopword list and enable the ngram parser if you search languages that are not space-separated, such as Chinese or Japanese.
- Add a summary field per video if you want quick previews for your blog list.
Optional: automatic blog generation
After transcription you can turn the best parts into a blog post automatically. Save a cleaned summary into your existing blog table and schedule it for review before publishing.
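As a zero-dependency starting point, a naive extractive approach works: score each chunk by how many of the transcript's most frequent words it contains and keep the top few in original order. This is only a sketch for illustration; for real posts you would likely use a proper summarization model, and `extractive_summary` is a hypothetical name.

```python
from collections import Counter

def extractive_summary(chunks, n=2):
    """Return the n chunks that contain the most high-frequency words,
    preserving their original order."""
    words = [w.lower().strip(".,!?") for c in chunks for w in c.split()]
    common = {w for w, _ in Counter(words).most_common(10)}
    # Rank chunk indices by how many of their words are "common".
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: -sum(w.lower().strip(".,!?") in common for w in chunks[i].split()),
    )
    return [chunks[i] for i in sorted(ranked[:n])]
```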
Security and cost notes
- Run transcription on a machine you control when dealing with private meetings.
- Limit your API endpoint with a simple token or session check to avoid abuse.
- Rotate database credentials and back up your transcripts regularly.
Conclusion
You now have a lean pipeline that converts any video into searchable knowledge. Whisper extracts accurate text, MySQL stores it in tidy chunks with full text indexes, and a tiny PHP API powers fast search. Start with one meeting or lecture, ship the basics, and iterate. The payoff is huge: faster answers, reusable content, and fewer hours lost scrubbing through timelines.