Kaldi error: Read():wave-reader.cc:202) Unexpected byte rate 16 vs. 16000 * 2 *

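I have recently been using Kaldi's aishell/v1 recipe for speaker recognition. A full run took me nearly 10 days, and I have to say it is painful that the traditional i-vector method cannot use GPU acceleration.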

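Back to the point. I started by replacing the aishell dataset with my own and running the recipe, with the data split into train, dev and test. The data I really needed to evaluate was not part of that dataset. My plan was to wait for the recipe to finish, then use the resulting model to extract an i-vector for each file of the real evaluation data, score them, and compute the EER and so on, so I did not replace the test split with the data I actually wanted to test. In fact, Kaldi already wraps the enroll/eval scoring step for you, and the final output is the EER and the threshold. Predictably, 10 days later I found that the EER reported for test was not the result for the data I cared about, so I had to swap in the new test data.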

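Fortunately, the model was already trained, so I only needed the test part of the script, with the test path changed to the data I wanted. Only the GMM-UBM and PLDA training parts need to be removed. The modified run.sh: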

#!/usr/bin/env bash
# Copyright 2017 Beijing Shell Shell Tech. Co. Ltd. (Authors: Hui Bu)
#           2017 Jiayu Du
#           2017 Chao Li
#           2017 Xingyu Na
#           2017 Bengu Wu
#           2017 Hao Zheng
# Apache 2.0

# This is a shell script that we demonstrate speech recognition using AIShell-1 data.
# it's recommended that you run the commands one by one by copying and pasting into the shell.
# See README.txt for more info on data required.
# Results (EER) are inline in comments below

data=/data/source/voiceprint_asr_train_20200403
data_url=www.openslr.org/resources/33

. ./cmd.sh
. ./path.sh

set -e # exit on error

# local/download_and_untar.sh $data $data_url data_aishell
# local/download_and_untar.sh $data $data_url resource_aishell

# Data Preparation
local/aishell_data_prep.sh $data/wav

# Now make MFCC  features.
# mfccdir should be some place with a largish disk where you
# want to store MFCC features.
mfccdir=mfcc
for x in test; do
  steps/make_mfcc.sh --cmd "$train_cmd" --nj 10 data/$x exp/make_mfcc/$x $mfccdir
  sid/compute_vad_decision.sh --nj 10 --cmd "$train_cmd" data/$x exp/make_mfcc/$x $mfccdir
  utils/fix_data_dir.sh data/$x
done

#split the test to enroll and eval
mkdir -p data/test/enroll data/test/eval
cp data/test/{spk2utt,feats.scp,vad.scp} data/test/enroll
cp data/test/{spk2utt,feats.scp,vad.scp} data/test/eval
local/split_data_enroll_eval.py data/test/utt2spk  data/test/enroll/utt2spk  data/test/eval/utt2spk
trials=data/test/aishell_speaker_ver.lst
local/produce_trials.py data/test/eval/utt2spk $trials
utils/fix_data_dir.sh data/test/enroll
utils/fix_data_dir.sh data/test/eval

#extract enroll ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
  exp/extractor_1024 data/test/enroll  exp/ivector_enroll_1024
#extract eval ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
  exp/extractor_1024 data/test/eval  exp/ivector_eval_1024

#compute plda score
$train_cmd exp/ivector_eval_1024/log/plda_score.log \
  ivector-plda-scoring --num-utts=ark:exp/ivector_enroll_1024/num_utts.ark \
  exp/ivector_train_1024/plda \
  ark:exp/ivector_enroll_1024/spk_ivector.ark \
  "ark:ivector-normalize-length scp:exp/ivector_eval_1024/ivector.scp ark:- |" \
  "cat '$trials' | awk '{print \\\$2, \\\$1}' |" exp/trials_out

#compute eer
awk '{print $3}' exp/trials_out | paste - $trials | awk '{print $1, $4}' | compute-eer -

# Result
# Scoring against data/test/aishell_speaker_ver.lst
# Equal error rate is 0.140528%, at threshold -12.018

exit 0
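The last step of the script reports an equal error rate (EER) and a threshold. As a rough illustration of what that number means, here is a minimal sketch that sweeps a threshold over all observed scores and picks the point where the false-accept and false-reject rates are closest. This is a brute-force illustration, not Kaldi's compute-eer implementation; the function name `compute_eer_sketch` and the scores are made up for the example.

```python
def compute_eer_sketch(target_scores, nontarget_scores):
    """Return (eer, threshold) from a brute-force threshold sweep.

    A trial is accepted when its score is >= threshold.
    Hypothetical helper for illustration only; not Kaldi's algorithm.
    """
    best = None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False reject: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False accept: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores: higher means "more likely the same speaker".
targets = [2.0, 3.0, 4.0, 0.5]
nontargets = [-1.0, 0.0, 1.0, 2.5]
eer, threshold = compute_eer_sketch(targets, nontargets)
print(eer, threshold)
```

With real trials, the target and nontarget scores would come from the third column of exp/trials_out paired with the labels in the trials file, as the awk pipeline in the script does.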

With that done, I was looking forward to the results on the new test data, but no matter how I ran it, it failed with an error:

(Screenshot: error output)

Opening the log file named in the message showed the cause (Kaldi's logs are quite detailed):

(Screenshot: detailed error in the log)

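The error turned out to be Read():wave-reader.cc:202) Unexpected byte rate 16 vs. 16000 * 2 *, which I took to mean that the sample rate of the test files was not 16000 Hz. I spot-checked a few test files and looked at their sample rate with the librosa module, and every one of them was 16000 Hz. That was odd, and for a while I was stuck (╯‵□′)╯︵┴─┴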

I later learned that the problem lies in the header of the test WAV files. The WAV header format is as follows:

(Figure: WAV file header fields)

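The header stores a byte rate (the transfer rate of the waveform audio data), whose value is channels × samples per second × bits per sample / 8. For a mono, 16-bit, 16000 Hz file this gives: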

1 * 16000 * 16 / 8 = 32000
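To make the arithmetic concrete, here is a minimal sketch that decodes the format fields of a header and compares the stored byte rate with the expected one. It assumes the canonical 44-byte PCM layout (format fields at offsets 20 to 35, little-endian); files with extra chunks before the data would need a real chunk parser. The helper names `read_fmt_fields` and `expected_byte_rate` are made up for illustration, and the header below is synthetic, built to carry the value 16 seen in the error message.

```python
import struct


def read_fmt_fields(header: bytes) -> dict:
    """Decode the format fields of a canonical 44-byte PCM WAV header."""
    if len(header) < 36 or header[0:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a canonical RIFF/WAVE header")
    # Offsets 20-35, little-endian: format tag, channels, sample rate,
    # byte rate, block align, bits per sample.
    _fmt_tag, channels, sample_rate, byte_rate, _block_align, bits = (
        struct.unpack_from("<HHIIHH", header, 20)
    )
    return {
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "bits_per_sample": bits,
    }


def expected_byte_rate(channels: int, sample_rate: int, bits_per_sample: int) -> int:
    """channels x samples per second x bits per sample / 8."""
    return channels * sample_rate * bits_per_sample // 8


# Synthetic mono 16-bit 16000 Hz header whose byte-rate field is wrongly 16.
bad_header = (
    b"RIFF" + struct.pack("<I", 36) + b"WAVE"
    + b"fmt " + struct.pack("<IHHIIHH", 16, 1, 1, 16000, 16, 2, 16)
    + b"data" + struct.pack("<I", 0)
)

fields = read_fmt_fields(bad_header)
want = expected_byte_rate(
    fields["channels"], fields["sample_rate"], fields["bits_per_sample"]
)
print(fields["sample_rate"], fields["byte_rate"], want)  # prints: 16000 16 32000
```

The sample rate and the byte rate are separate header fields, so a tool that only reads the sample rate can report 16000 Hz while the byte-rate field is still wrong.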

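The error message shows that my files stored 16 in this field instead. There are two ways to set it to the correct value. The first rewrites the header bytes directly: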

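import struct

def modify_wav(wav_file_path, byte_rate=32000):
    """Write a copy of the file with a corrected byte-rate field."""
    # Assumes the canonical 44-byte PCM header, where the byte rate is a
    # 4-byte little-endian integer at offsets 28-31.
    with open(wav_file_path, 'rb') as wav_file:
        data = bytearray(wav_file.read())
    # 32000 packs to b'\x00}\x00\x00' in little-endian order.
    data[28:32] = struct.pack('<I', byte_rate)
    # The original file is left untouched; the fix goes to <name>_modify.wav.
    new_path = wav_file_path.replace(".wav", "") + "_modify.wav"
    with open(new_path, 'wb') as new_wav_file:
        new_wav_file.write(data)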

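The other method is simpler: read the audio with soundfile and write it back, which regenerates the header. Note that this overwrites the file in place:

import soundfile as sf

# wav_file_path is the path of the WAV file to fix
sig, sr = sf.read(wav_file_path)
sf.write(wav_file_path, sig, sr)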
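Since spot-checking a few files is easy to get wrong, it may be worth scanning the whole test set before rerunning the recipe. The sketch below uses only the standard library and lists the WAV files whose stored byte rate disagrees with channels × sample rate × bits per sample / 8. It assumes the canonical 44-byte PCM header, and the names `has_bad_byte_rate` and `find_bad_wavs` are hypothetical helpers, not part of Kaldi or soundfile.

```python
import struct
from pathlib import Path


def has_bad_byte_rate(path) -> bool:
    """True if the header's byte rate disagrees with its other format fields."""
    with open(path, "rb") as f:
        header = f.read(36)
    if len(header) < 36 or header[0:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError(f"{path}: not a canonical RIFF/WAVE header")
    # Offsets 22-35, little-endian: channels, sample rate, byte rate,
    # block align, bits per sample.
    channels, sample_rate, byte_rate, _block_align, bits = struct.unpack_from(
        "<HIIHH", header, 22
    )
    return byte_rate != channels * sample_rate * bits // 8


def find_bad_wavs(root):
    """Return a sorted list of .wav paths under root with a wrong byte rate."""
    return sorted(p for p in Path(root).rglob("*.wav") if has_bad_byte_rate(p))
```

Calling `find_bad_wavs` on the directory holding the test audio returns the files that still need fixing; an empty list suggests the headers are consistent.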

This post draws on https://hupeng.me/

Title: Kaldi error: Read():wave-reader.cc:202) Unexpected byte rate 16 vs. 16000 * 2 *
Article URL: https://alphalrx.cn/index.php/archives/136/
Copyright: Unless otherwise noted, all posts are original to "LRX's Blog"; please keep the source attribution when reposting.
Last modification:May 9th, 2020 at 10:55 pm