[Whisper] Trying out transcription of Japanese speech

Environment

Ubuntu 22.04 on WSL2
CUDA 11.8
Python 3.10

pip install torch==2.0.1+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install git+https://github.com/openai/whisper.git

ffmpeg was installed on Ubuntu beforehand.

sudo apt install ffmpeg

There is no need to separately install the Python package "ffmpeg-python".
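Whisper relies on the ffmpeg command to decode input files, so it can be worth confirming that the executable is actually on PATH before running a transcription. The following is a minimal sketch using only the standard library; the helper name `ffmpeg_available` is made up for this example and is not part of Whisper.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    # shutil.which returns the full path of the command, or None if it is missing
    return shutil.which("ffmpeg") is not None


# Report the result; the install hint matches the apt command used in this post
if ffmpeg_available():
    print("ffmpeg found")
else:
    print("ffmpeg not found; try: sudo apt install ffmpeg")
```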

import whisper

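# Load the "large" model (downloaded automatically on first use)
model = whisper.load_model("large")
# Transcribe the audio file; the spoken language is detected automatically
result = model.transcribe("audio.wav")
# Print the full transcribed text
print(result["text"])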

That was all it took. The model is also downloaded automatically.
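As a possible variation, `transcribe` accepts a `language` argument, so Japanese can be specified explicitly instead of relying on automatic detection, and the result also contains a `segments` list with start and end times. The sketch below assumes the same "large" model; the helper names `format_segments` and `transcribe_japanese` are hypothetical and only serve this illustration.

```python
def format_segments(segments):
    """Turn Whisper-style segments into lines such as '[   0.00 ->    1.50] text'."""
    lines = []
    for seg in segments:
        # Each segment is a dict with "start" and "end" (seconds) and "text"
        lines.append(f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text'].strip()}")
    return lines


def transcribe_japanese(path, model_name="large"):
    """Transcribe `path` with the language fixed to Japanese (sketch)."""
    import whisper  # imported here so format_segments works without Whisper installed

    model = whisper.load_model(model_name)
    # language="ja" skips automatic language detection
    return model.transcribe(path, language="ja")


# Example usage (requires Whisper and an actual audio file):
# result = transcribe_japanese("audio.wav")
# print("\n".join(format_segments(result["segments"])))
```

Fixing the language may help when a recording starts with silence or music, where automatic detection could guess wrong.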

The example above specifies the audio file "audio.wav", but passing a video file (extensions such as mp4 or mov) works without any problem as well.
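Since audio and video files are handled the same way, a folder containing both could be processed in one loop. The following is a rough sketch under that assumption; the extension list and the helper names `find_media_files` and `transcribe_folder` are choices made for this example, not something Whisper provides.

```python
from pathlib import Path

# Extensions to pick up; an arbitrary list chosen for this example
MEDIA_EXTENSIONS = {".wav", ".mp3", ".mp4", ".mov"}


def find_media_files(folder):
    """Return the audio/video files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS
    )


def transcribe_folder(folder, model_name="large"):
    """Transcribe every media file in `folder`; returns {file name: text}."""
    import whisper  # imported here so find_media_files works without Whisper installed

    model = whisper.load_model(model_name)
    results = {}
    for path in find_media_files(folder):
        # transcribe takes a file path; video files are passed exactly like audio files
        results[path.name] = model.transcribe(str(path))["text"]
    return results


# Example usage (requires Whisper and real media files):
# for name, text in transcribe_folder("recordings").items():
#     print(name, text)
```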

I wrote a follow-up article.
touch-sp.hatenablog.com