Turn Any Video Into a Searchable Knowledge Base: Whisper + MySQL (Step by Step)

Why build this now?

Teams record everything, but finding the one minute that matters can be painful. By transcribing and indexing, you unlock fast search, skim-ready summaries, and reusable content for blogs, docs, and FAQs.

What you will build

A small end-to-end pipeline: ffmpeg pulls the audio out of a video, faster-whisper turns that audio into timestamped text, MySQL stores the text as full-text indexed chunks, and a tiny PHP endpoint plus a simple HTML form let you search everything and jump back to the exact moment.

Prerequisites

You will need ffmpeg on your PATH, Python 3 with the faster-whisper and mysql-connector-python packages, a MySQL server whose InnoDB engine supports FULLTEXT indexes (5.6 or later), and PHP with PDO for the search endpoint.

Database design

We will create two tables: one for video metadata and one for timestamped transcript chunks.

CREATE TABLE videos (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(200) NOT NULL,
  source_url VARCHAR(255) NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE video_chunks (
  id INT PRIMARY KEY AUTO_INCREMENT,
  video_id INT NOT NULL,
  t_start_sec DECIMAL(10,2) NOT NULL,
  t_end_sec DECIMAL(10,2) NOT NULL,
  content MEDIUMTEXT NOT NULL,
  FULLTEXT KEY ft_content (content),
  FOREIGN KEY (video_id) REFERENCES videos(id) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
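
Later, once Step 2 has loaded some chunks, you can sanity-check the full-text index directly from Python before building the search API. A rough sketch, using placeholder credentials and an arbitrary query term:

# Rough sketch: confirm the FULLTEXT index answers a natural-language query.
# Connection values and the search term are placeholders.
import mysql.connector

db = mysql.connector.connect(host="localhost", user="your_user",
                             password="your_pass", database="your_db",
                             charset="utf8mb4")
cur = db.cursor()
cur.execute(
    "SELECT video_id, t_start_sec, content FROM video_chunks "
    "WHERE MATCH(content) AGAINST(%s IN NATURAL LANGUAGE MODE) LIMIT 5",
    ("quarterly roadmap",),
)
for video_id, t_start, text in cur.fetchall():
    print(video_id, t_start, text[:80])
cur.close()
db.close()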

Step 1: Extract audio

# Save audio as 16 kHz mono WAV
ffmpeg -i input.mp4 -ac 1 -ar 16000 audio.wav
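
If you have a whole folder of recordings rather than a single file, the same ffmpeg invocation can be driven from Python. A small sketch, assuming the videos sit in a local videos/ directory and ffmpeg is on your PATH:

# Sketch: batch-convert every .mp4 in ./videos to 16 kHz mono WAV.
# The directory layout is an assumption; adjust the paths to match yours.
import subprocess
from pathlib import Path

for video in Path("videos").glob("*.mp4"):
    wav = video.with_suffix(".wav")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video), "-ac", "1", "-ar", "16000", str(wav)],
        check=True,
    )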

Step 2: Transcribe with Whisper

Use Python with the high-performance faster-whisper package. It streams segments with start and end times, which makes chunking straightforward.

pip install faster-whisper mysql-connector-python

from faster_whisper import WhisperModel
import mysql.connector

model = WhisperModel("base")  # use "small" or "medium" for higher quality
segments, info = model.transcribe("audio.wav")

db = mysql.connector.connect(
    host="localhost", user="your_user", password="your_pass", database="your_db", charset="utf8mb4"
)
cur = db.cursor()

# 1) insert the video row
cur.execute("INSERT INTO videos (title, source_url) VALUES (%s, %s)", ("Weekly Sync 2025-09-01", "input.mp4"))
video_id = cur.lastrowid

# 2) insert each segment as a searchable chunk
for s in segments:
    cur.execute(
        "INSERT INTO video_chunks (video_id, t_start_sec, t_end_sec, content) VALUES (%s, %s, %s, %s)",
        (video_id, s.start, s.end, s.text.strip())
    )

db.commit()
cur.close()
db.close()
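
Whisper segments are often only a sentence or two long, which can make search results feel choppy. One option is to merge consecutive segments into larger chunks before inserting them. A sketch of that idea; the 30-second window is an arbitrary choice, so tune it to your content:

# Sketch: merge consecutive Whisper segments into roughly 30-second chunks.
def merge_segments(segments, max_window=30.0):
    chunk_start, chunk_end, parts = None, None, []
    for s in segments:
        if chunk_start is None:
            chunk_start, chunk_end, parts = s.start, s.end, [s.text.strip()]
        elif s.end - chunk_start <= max_window:
            chunk_end = s.end
            parts.append(s.text.strip())
        else:
            yield chunk_start, chunk_end, " ".join(parts)
            chunk_start, chunk_end, parts = s.start, s.end, [s.text.strip()]
    if parts:
        yield chunk_start, chunk_end, " ".join(parts)

# Usage: swap the per-segment loop above for
# for start, end, text in merge_segments(segments):
#     cur.execute("INSERT INTO video_chunks (video_id, t_start_sec, t_end_sec, content) VALUES (%s, %s, %s, %s)",
#                 (video_id, start, end, text))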

Step 3: Add a tiny PHP search API

This endpoint returns matching chunks with timestamps you can link back to in your player.

<?php
header("Content-Type: application/json; charset=utf-8");
$q = $_GET["q"] ?? "";
$pdo = new PDO("mysql:host=localhost;dbname=your_db;charset=utf8mb4","your_user","your_pass",[
  PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
]);
$sql = "SELECT v.title, c.t_start_sec, c.t_end_sec, c.content
        FROM video_chunks c
        JOIN videos v ON v.id = c.video_id
        WHERE MATCH(c.content) AGAINST(:q IN NATURAL LANGUAGE MODE)
        ORDER BY c.id DESC
        LIMIT 50";
$stmt = $pdo->prepare($sql);
$stmt->execute([":q" => $q]);
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC), JSON_UNESCAPED_UNICODE);
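
A quick way to test the endpoint before touching the front end is to call it from Python. A minimal sketch, assuming search.php is served at /api/search.php on localhost:

# Minimal sketch: query the search endpoint and print timestamped hits.
# The base URL is an assumption; point it at wherever search.php actually lives.
import json
import urllib.parse
import urllib.request

query = urllib.parse.urlencode({"q": "quarterly roadmap"})
with urllib.request.urlopen(f"http://localhost/api/search.php?{query}") as resp:
    results = json.loads(resp.read().decode("utf-8"))

for r in results:
    print(f'{r["title"]} [{r["t_start_sec"]}s to {r["t_end_sec"]}s]: {r["content"][:80]}')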

Step 4: Simple front end

<form id="s">
  <input name="q" placeholder="Search meetings and lectures">
  <button>Search</button>
</form>
<ul id="results"></ul>

<script>
document.getElementById("s").onsubmit = async (e) => {
  e.preventDefault();
  const q = new FormData(e.target).get("q");
  const res = await fetch("/api/search.php?q=" + encodeURIComponent(q));
  const data = await res.json();
  const ul = document.getElementById("results");
  // escape text before injecting it into innerHTML, and convert the DECIMAL
  // timestamps (returned as strings by PDO) to numbers before formatting
  const esc = (s) => String(s).replace(/[&<>"']/g,
    (c) => ({"&":"&amp;","<":"&lt;",">":"&gt;",'"':"&quot;","'":"&#39;"}[c]));
  ul.innerHTML = data.map(r => (
    `<li><strong>${esc(r.title)}</strong> — ${esc(r.content)}
      <br><small>${Number(r.t_start_sec).toFixed(2)}s to ${Number(r.t_end_sec).toFixed(2)}s</small></li>`
  )).join("");
};
</script>

Quality tips
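
A few things reliably improve transcript quality: feed Whisper clean 16 kHz mono audio (as in Step 1), use a larger model when you can afford the compute, and pass a language hint plus voice-activity filtering so long silences do not produce spurious text. With faster-whisper that looks roughly like the sketch below; the model size and options are tuning choices, not requirements.

# Sketch: quality-oriented faster-whisper settings. Model size, beam_size and
# vad_filter are optional knobs; larger models are slower but more accurate.
from faster_whisper import WhisperModel

model = WhisperModel("medium", compute_type="int8")  # or "small" on modest hardware
segments, info = model.transcribe(
    "audio.wav",
    language="en",    # skip auto-detection when you already know the language
    beam_size=5,      # wider beam search for slightly better accuracy
    vad_filter=True,  # drop long non-speech stretches before transcribing
)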

Optional: automatic blog generation

After transcription you can turn the best parts into a blog post automatically. Save a cleaned summary into your existing blog table and schedule it for review before publishing.
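
There is no single right way to do this, but a rough sketch might pull a video's chunks in order, stitch them into a draft body, and park the result in a drafts table for a human to review. The blog_posts table and its columns below are hypothetical placeholders; map them to whatever your blog schema actually uses.

# Rough sketch: turn a video's transcript chunks into a draft blog post.
# The blog_posts table and its columns are hypothetical placeholders.
import mysql.connector

db = mysql.connector.connect(host="localhost", user="your_user",
                             password="your_pass", database="your_db",
                             charset="utf8mb4")
cur = db.cursor()

video_id = 1  # the video you want to turn into a post
cur.execute(
    "SELECT content FROM video_chunks WHERE video_id = %s ORDER BY t_start_sec",
    (video_id,),
)
draft_body = "\n\n".join(row[0] for row in cur.fetchall())

cur.execute(
    "INSERT INTO blog_posts (title, body, status) VALUES (%s, %s, %s)",
    ("Draft: Weekly Sync 2025-09-01", draft_body, "draft"),
)
db.commit()
cur.close()
db.close()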

Security and cost notes

Keep database credentials out of the web root, stick with the prepared statements shown in the search endpoint, and remember that faster-whisper runs locally, so the main cost is CPU or GPU time rather than a per-minute transcription fee.

Conclusion

You now have a lean pipeline that converts any video into searchable knowledge. Whisper extracts accurate text, MySQL stores it in tidy chunks with full-text indexes, and a tiny PHP API powers fast search. Start with one meeting or lecture, ship the basics, and iterate. The payoff is huge: faster answers, reusable content, and fewer hours lost scrubbing through timelines.