XenoEngineer: /* wav2short.sh – turn a .wav into a YouTube-Short with scrolling word captions */

2025-12-18T17:49:30Z

wav2short.sh – turn a .wav into a YouTube-Short with scrolling word captions

← Older revision		Revision as of 17:49, 18 December 2025
Line 5:		Line 5:
	<div style="background-color:azure; border:1px outset azure; padding:0 20px; max-width:860px; margin:0 auto; ">		<div style="background-color:azure; border:1px outset azure; padding:0 20px; max-width:860px; margin:0 auto; ">
	= wav2short.sh – turn a .wav into a YouTube-Short with scrolling word captions =		= wav2short.sh – turn a .wav into a YouTube-Short with scrolling word captions =
			<div style="float:right; width:425px;">__TOC__</div>

	A retired Hypergeek’s minimal, fully-scriptable pipeline (Debian/Ubuntu, no GUI).		A retired Hypergeek’s minimal, fully-scriptable pipeline (Debian/Ubuntu, no GUI).

XenoEngineer: Created page with "Category:YouTube Category:UAP News Category:how to
= wav2short.sh – turn a .wav into a YouTube-Short with scrolling word captions = A retired Hypergeek’s minimal, fully-scriptable pipeline (Debian/Ubuntu, no GUI). == Prerequisites == * Debian/Ubuntu box (headless OK) * ffmpeg (repo version is fine) * Python ≥ 3.10 * pip packages: ** whisper..."

2025-12-18T16:46:42Z

Created page with "Category:YouTube Category:UAP News Category:how to <div style="background-color:azure; border:1px outset azure; padding:0 20px; max-width:860px; margin:0 auto; "> = wav2short.sh – turn a .wav into a YouTube-Short with scrolling word captions = A retired Hypergeek’s minimal, fully-scriptable pipeline (Debian/Ubuntu, no GUI). == Prerequisites == * Debian/Ubuntu box (headless OK) * ffmpeg (repo version is fine) * Python ≥ 3.10 * pip packages: ** whisper..."

New page

[[Category:YouTube]]
[[Category:UAP News]]
[[Category:how to]]

<div style="background-color:azure; border:1px outset azure; padding:0 20px; max-width:860px; margin:0 auto; ">
= wav2short.sh – turn a .wav into a YouTube-Short with scrolling word captions =

A retired Hypergeek’s minimal, fully-scriptable pipeline (Debian/Ubuntu, no GUI).

== Prerequisites ==
* Debian/Ubuntu box (headless OK)
* ffmpeg (repo version is fine)
* Python ≥ 3.10
* pip packages:
** whisper-timestamped (word-level SRT)
** srt, Pillow, numpy, opencv-python

== One-file script ==
Create <code>~/bin/wav2short.sh</code> and <code>chmod +x</code> it.

<source lang="bash">
#!/usr/bin/env bash
# wav2short.sh <input.wav> [duration=60] [out.mp4]
WAV=$1
DUR=${2:-60} # seconds of final clip
OUT=${3:-output.mp4}
W=1080 # vertical short: 1080x1920
H=1920
FONT=~/fonts/Inconsolata-Regular.ttf
FONTSIZE=75
LINES=7 # how many lines visible at once
FPS=30

# 1. Whisper → word-level SRT (takes ~20 s on a 1-min clip)
python -m whisper_timestamped "$WAV" --language en --model base \
--output_format srt --output_dir /tmp > /dev/null
SRT=/tmp/"$(basename "$WAV" .wav)".srt

# 2. Python burns the frames
python3 << EOF
import srt, cv2, numpy as np, subprocess, math
from PIL import ImageFont, ImageDraw, Image
W, H, FPS, LINES, FONTSIZE = $W, $H, $FPS, $LINES, $FONTSIZE
FONT = ImageFont.truetype("$FONT", FONTSIZE)
sub = list(srt.parse(open("$SRT").read()))
dur = $DUR
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('/tmp/caps.mp4', fourcc, FPS, (W, H))

def draw_frame(idx, active):
img = Image.new('RGB', (W, H), 'black')
draw = ImageDraw.Draw(img)
y0 = (H - LINES*FONTSIZE) // 2
for i, w in enumerate(active[-LINES:], start=1-max(LINES,0)):
y = y0 + (i+LINES-1)*FONTSIZE
color = 'yellow' if i==0 else 'white'
draw.text((W//2, y), w, font=FONT, fill=color, anchor='ma')
return np.array(img)

active = [] # rolling buffer of words on screen
for f in range(dur*FPS):
t = f/FPS
# add words whose start <= t
while sub and sub[0].start.total_seconds() <= t:
active.append(sub.pop(0).content.strip())
# drop words that are off-screen
active = active[-LINES:]
out.write(draw_frame(f, active))
out.release()
EOF

# 3. Mux audio and burn final
ffmpeg -y -i /tmp/caps.mp4 -i "$WAV" -t "$DUR" \
-c:v libx264 -crf 18 -pix_fmt yuv420p -shortest "$OUT"
echo "Done → $OUT"
</source>

== Usage ==
<source lang="bash">
wav2short.sh mybook.wav 60 mybook_short.mp4
</source>

== Quick tweaks ==
* Longer clip: change <code>DUR</code> argument
* Karaoke highlight: color the exact word inside <code>draw_frame()</code>
* 4K vertical: set <code>W=2160 H=3840</code> and bump <code>FONTSIZE</code>
* Add waveform: insert ffmpeg <code>showwaves</code> filter
* Stream live: replace last ffmpeg line with rtmp(s) url

== Summary ==
100 % scriptable, < 100 lines, runs on CPU, perfect for a hammock-deployed Hypergeek.
</div>

<pre style="margin-left:3em; font:normal 14px terminal;">
Prefers to stream straight to YouTube? Swap the last ffmpeg line for one that pushes to
rtmps://a.rtmp.youtube.com/...

Why this satisfies the Hypergeek
Zero GUI, 100 % scriptable over SSH from his hammock.
Whisper-timestamped is small (74 MB) and runs on CPU; no GPU needed.
Every stage is a file (/tmp/caps.mp4, /tmp/foo.srt) so he can inspect, debug, or re-use.
One dependency install:
pip install whisper-timestamped srt Pillow numpy opencv-python

.
Entire pipeline is < 100 lines—easy to memory-fit even on a 10-year-old ThinkPad.
Copy, tweak, upload, done.

user
Enter user message...
Latency:
5,376 ms
Tokens/s:
228

Clear

Submit
Ctrl

Python

Copy
from groq import Groq

client = Groq()
completion = client.chat.completions.create(
model="moonshotai/kimi-k2-instruct-0905",
messages=[
{
"role": "user",
"content": "How does a smart, retired Ol' Hypergeek approach creating shorts for YouTube that are built from a wave file (audio only) and convert to a video of the words scrolling over the screen in sequence as read?\n"
},
{
"role": "assistant",
"content": "A smart, retired Ol’ Hypergeek doesn’t fight the toolchain—he glues it together with the lightest, most scriptable parts he already knows. \nBelow is the exact “one-cup-of-coffee” pipeline he keeps in a single Bash script on the living-room NUC. It takes a .wav (or any audio), turns it into word-level timed text, then burns a 60-second (or any length) vertical-captioned video that YouTube Shorts will swallow whole.\n\n-------------------------------------------------\n1. What he already has installed\n-------------------------------------------------\n- Debian/Ubuntu box (headless is fine) \n- ffmpeg (stock repo version is enough) \n- Python 3.10+ \n- pip packages: \n whisper-timestamped (git) – gives word-level SRT \n srt – tiny util to parse SRT inside Python \n Pillow – draw the text \n numpy – frame buffer math \n- one monospace TTF he likes (Inconsolata-Regular.ttf lives in ~/fonts)\n\n-------------------------------------------------\n2. The 30-line “do-it-now” script\n-------------------------------------------------\nSave as ~/bin/wav2short.sh and chmod +x it.\n\n```bash\n#!/usr/bin/env bash\n# wav2short.sh <input.wav> [duration=60] [out.mp4]\nWAV=$1\nDUR=${2:-60} # seconds of final clip\nOUT=${3:-output.mp4}\nW=1080 # vertical short: 1080x1920\nH=1920\nFONT=~/fonts/Inconsolata-Regular.ttf\nFONTSIZE=75\nLINES=7 # how many lines visible at once\nFPS=30\n\n# 1. Whisper → word-level SRT (takes ~20 s on a 1-min clip)\npython -m whisper_timestamped \"$WAV\" --language en --model base \\\n --output_format srt --output_dir /tmp > /dev/null\nSRT=/tmp/\"$(basename \"$WAV\" .wav)\".srt\n\n# 2. Python burns the frames\npython3 << EOF\nimport srt, cv2, numpy as np, subprocess, math\nfrom PIL import ImageFont, ImageDraw, Image\nW, H, FPS, LINES, FONTSIZE = $W, $H, $FPS, $LINES, $FONTSIZE\nFONT = ImageFont.truetype(\"$FONT\", FONTSIZE)\nsub = list(srt.parse(open(\"$SRT\").read()))\ndur = $DUR\nfourcc = cv2.VideoWriter_fourcc(*'mp4v')\nout = cv2.VideoWriter('/tmp/caps.mp4', fourcc, FPS, (W, H))\n\ndef draw_frame(idx, active):\n img = Image.new('RGB', (W, H), 'black')\n draw = ImageDraw.Draw(img)\n y0 = (H - LINES*FONTSIZE) // 2\n for i, w in enumerate(active[-LINES:], start=1-max(LINES,0)):\n y = y0 + (i+LINES-1)*FONTSIZE\n color = 'yellow' if i==0 else 'white'\n draw.text((W//2, y), w, font=FONT, fill=color, anchor='ma')\n return np.array(img)\n\nactive = [] # rolling buffer of words on screen\nfor f in range(dur*FPS):\n t = f/FPS\n # add words whose start <= t\n while sub and sub[0].start.total_seconds() <= t:\n active.append(sub.pop(0).content.strip())\n # drop words that are off-screen\n active = active[-LINES:]\n out.write(draw_frame(f, active))\nout.release()\nEOF\n\n# 3. Mux audio and burn final\nffmpeg -y -i /tmp/caps.mp4 -i \"$WAV\" -t \"$DUR\" \\\n -c:v libx264 -crf 18 -pix_fmt yuv420p -shortest \"$OUT\"\necho \"Done → $OUT\"\n```\n\nRun:\n```\nwav2short.sh mybook.wav 60 mybook_short.mp4\n```\nThree minutes later you have a 60-second vertical video with perfectly scrolling word-captions.\n\n-------------------------------------------------\n3. Tweaks the Hypergeek keeps in his back pocket\n-------------------------------------------------\n- Longer than 60 s? Change the DUR argument; everything scales. \n- Want highlighted karaoke word? Inside draw_frame() color the exact word instead of the whole line. \n- Need 4K? Change W=2160, H=3840, bump FONTSIZE. \n- Hates Python OpenCV? Replace the frame-writer with plain Pillow + ffmpeg’s rawvideo pipe (one line change). \n- Wants burnt-in waveform? Add ffmpeg filter `showwaves=size=1080x200:colors=cyan` and overlay it at the bottom. \n- Prefers to stream straight to YouTube? Swap the last ffmpeg line for one that pushes to `rtmps://a.rtmp.youtube.com/...` \n\n-------------------------------------------------\n4. Why this satisfies the Hypergeek\n-------------------------------------------------\n- Zero GUI, 100 % scriptable over SSH from his hammock. \n- Whisper-timestamped is small (74 MB) and runs on CPU; no GPU needed. \n- Every stage is a file (/tmp/caps.mp4, /tmp/foo.srt) so he can inspect, debug, or re-use. \n- One dependency install: `pip install whisper-timestamped srt Pillow numpy opencv-python`. \n- Entire pipeline is < 100 lines—easy to memory-fit even on a 10-year-old ThinkPad.\n\nCopy, tweak, upload, done."
},
{
"role": "user",
"content": ""
}
],
temperature=0.6,
max_completion_tokens=4096,
top_p=1,
stream=True,
stop=None
)

for chunk in completion:
print(chunk.choices[0].delta.content or "", end="")

</pre>

YouTube UAP News Shorts How-To - Revision history

XenoEngineer: /* wav2short.sh – turn a .wav into a YouTube-Short with scrolling word captions */