I recently ran speaker verification with Kaldi's aishell/v1 recipe. A full pass took nearly 10 days; the traditional i-vector pipeline really suffers from having no GPU acceleration.

Back to the point. I started by swapping my own dataset in for AIShell and running the recipe as-is, splitting the data into train, dev, and test. The data I actually needed to evaluate was not part of this dataset, though. My plan was to wait for the recipe to finish, then use the resulting model to extract an i-vector for each real test utterance, score it, and compute the EER myself, so I never replaced the test set with the data I actually cared about. But Kaldi already packages the whole evaluation for you, and the recipe ends by printing the EER and threshold. Ten days later the obvious sank in: the EER reported on test says nothing about my real evaluation data, so the new test set had to be swapped in.

Fortunately, training was finished, so only the evaluation part of the recipe was needed, with the test path pointed at my own data: just strip out the GMM-UBM and PLDA training stages. The modified run.sh:
#!/usr/bin/env bash
# Copyright 2017 Beijing Shell Shell Tech. Co. Ltd. (Authors: Hui Bu)
# 2017 Jiayu Du
# 2017 Chao Li
# 2017 Xingyu Na
# 2017 Bengu Wu
# 2017 Hao Zheng
# Apache 2.0
# This shell script demonstrates speaker verification using AIShell-1 data.
# It's recommended that you run the commands one by one by copying and
# pasting them into the shell.
# See README.txt for more info on data required.
# Results (EER) are inline in comments below
data=/data/source/voiceprint_asr_train_20200403
data_url=www.openslr.org/resources/33
. ./cmd.sh
. ./path.sh
set -e # exit on error
# local/download_and_untar.sh $data $data_url data_aishell
# local/download_and_untar.sh $data $data_url resource_aishell
# Data Preparation
local/aishell_data_prep.sh $data/wav
# Now make MFCC features.
# mfccdir should be some place with a largish disk where you
# want to store MFCC features.
mfccdir=mfcc
for x in test; do
  steps/make_mfcc.sh --cmd "$train_cmd" --nj 10 data/$x exp/make_mfcc/$x $mfccdir
  sid/compute_vad_decision.sh --nj 10 --cmd "$train_cmd" data/$x exp/make_mfcc/$x $mfccdir
  utils/fix_data_dir.sh data/$x
done
#split the test to enroll and eval
mkdir -p data/test/enroll data/test/eval
cp data/test/{spk2utt,feats.scp,vad.scp} data/test/enroll
cp data/test/{spk2utt,feats.scp,vad.scp} data/test/eval
local/split_data_enroll_eval.py data/test/utt2spk data/test/enroll/utt2spk data/test/eval/utt2spk
trials=data/test/aishell_speaker_ver.lst
local/produce_trials.py data/test/eval/utt2spk $trials
utils/fix_data_dir.sh data/test/enroll
utils/fix_data_dir.sh data/test/eval
#extract enroll ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
exp/extractor_1024 data/test/enroll exp/ivector_enroll_1024
#extract eval ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
exp/extractor_1024 data/test/eval exp/ivector_eval_1024
#compute plda score
$train_cmd exp/ivector_eval_1024/log/plda_score.log \
ivector-plda-scoring --num-utts=ark:exp/ivector_enroll_1024/num_utts.ark \
exp/ivector_train_1024/plda \
ark:exp/ivector_enroll_1024/spk_ivector.ark \
"ark:ivector-normalize-length scp:exp/ivector_eval_1024/ivector.scp ark:- |" \
"cat '$trials' | awk '{print \\\$2, \\\$1}' |" exp/trials_out
#compute eer
awk '{print $3}' exp/trials_out | paste - $trials | awk '{print $1, $4}' | compute-eer -
# Result
# Scoring against data/test/aishell_speaker_ver.lst
# Equal error rate is 0.140528%, at threshold -12.018
exit 0
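As an aside, the compute-eer step at the end works on pairs of a score and a target/nontarget label: it finds the operating point where the false-alarm rate on nontarget trials equals the miss rate on target trials. A minimal pure-Python sketch of the idea (not Kaldi's actual implementation, which interpolates more carefully):

```python
def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate: the threshold where the
    false-alarm rate (nontargets accepted) equals the miss rate
    (targets rejected). Returns (eer, threshold)."""
    best_gap, eer, best_thr = float("inf"), 1.0, None
    # Sweep every distinct score as a candidate threshold.
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer, best_thr = abs(far - frr), (far + frr) / 2, thr
    return eer, best_thr
```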
That done, I waited eagerly for results on the new test data, but every run ended in an error.
Opening the log it pointed to revealed the cause (Kaldi's logs really are detailed):
Read():wave-reader.cc:202) Unexpected byte rate 16 vs. 16000 * 2 *
which seemed to say the test files' sample rate wasn't 16000 Hz. But when I spot-checked a few test files with librosa, every one of them reported 16000 Hz. Odd; for a while I was completely stuck (╯‵□′)╯︵┴─┴
It turned out the problem was the WAV header of the test files. The offending field is the byte rate: the average data transfer rate in bytes per second, equal to channels × sample rate × bits per sample / 8. For a 16000 Hz, 16-bit mono file this should be
1 * 16000 * 16 / 8 = 32000
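To see exactly what Kaldi was complaining about, the header fields can be read directly with the standard library. A small sketch, assuming the canonical 44-byte PCM layout (channels at offset 22, sample rate at 24, byte rate at 28, bits per sample at 34):

```python
import struct

def read_fmt_fields(wav_path):
    """Return (channels, sample_rate, byte_rate, bits_per_sample)
    from a canonical 44-byte PCM WAV header."""
    with open(wav_path, "rb") as f:
        header = f.read(44)
    channels, sample_rate, byte_rate, _block_align, bits = \
        struct.unpack_from("<HIIHH", header, 22)
    return channels, sample_rate, byte_rate, bits
```

On my broken files this would report a sample rate of 16000 (which is why librosa looked fine) but a byte rate of 16 instead of 32000.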
There are two ways to patch the field to the correct value. The first copies the file byte by byte and overwrites the field by hand:
def modify_wav(wav_file_path):
    # Copy the file byte by byte, overwriting the byte-rate field by hand.
    # The byte rate is stored little-endian at offsets 28-31; for 32000
    # (0x7D00) the two low bytes are 0x00 and 0x7D (b'}').
    with open(wav_file_path, 'rb') as wav_file:
        with open(wav_file_path.replace(".wav", "") + "_modify.wav", 'wb') as new_wav_file:
            index = 0
            while True:
                data = wav_file.read(1)
                if not data:
                    break
                if index == 28:
                    new_wav_file.write(b'\x00')  # low byte of 32000
                elif index == 29:
                    new_wav_file.write(b'}')     # 0x7D, next byte of 32000
                else:
                    new_wav_file.write(data)
                index += 1
The second way is much simpler: read the file with soundfile and write it straight back, which regenerates a clean header:
import soundfile as sf
sig, sr = sf.read(wav_file_path)
sf.write(wav_file_path, sig, sr)
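If soundfile isn't available, the standard-library wave module can do the same round trip for PCM files: it ignores the stored byte rate when reading and recomputes a correct header when writing. A sketch, with placeholder paths:

```python
import wave

def rewrite_wav(src_path, dst_path):
    """Copy a PCM WAV file; the wave module regenerates the header on
    write, which repairs a wrong byte-rate field."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames)
```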
Original article: https://alphalrx.cn/index.php/archives/136/
Copyright: unless otherwise noted, articles on "LRX's Blog" are original; please keep this attribution when reposting.