联系客服
客服二维码

联系客服获取更多资料

微信号:LingLab1

客服电话:010-82185409

意见反馈
关注我们
关注公众号

关注公众号

linglab语言实验室

回到顶部
AISHELL-2 中文语音数据库

1747 阅读 2020-08-22 11:56:18 上传 0KB

希尔贝壳中文普通话语音数据库AISHELL-2的语音时长为1000小时,其中718小时来自AISHELL-ASR0009-[ZH-CN],282小时来自AISHELL-ASR0010-[ZH-CN]。

AISHELL-2 中文语音数据库

AISHELL-2 Open Source Mandarin Speech Corpus




       希尔贝壳中文普通话语音数据库AISHELL-2的语音时长为1000小时,其中718小时来自AISHELL-ASR0009-[ZH-CN],282小时来自AISHELL-ASR0010-[ZH-CN]。录音文本涉及唤醒词、语音控制词、智能家居、无人驾驶、工业生产等12个领域。录制过程在安静室内环境中, 同时使用3种不同设备: 高保真麦克风(44.1kHz,16bit);Android系统手机(16kHz,16bit);iOS系统手机(16kHz,16bit)。AISHELL-2采用iOS系统手机录制的语音数据。1991名来自中国不同口音区域的发言人参与录制。经过专业语音校对人员转写标注,并通过严格质量检验,此数据库文本正确率在96%以上。支持学术研究,未经允许禁止商用。


AISHELL-2 is a 1000-hour Mandarin Chinese Speech Corpus. 718 hours are from AISHELL-ASR0009-[ZH-CN] and 282 hours are from AISHELL-ARS0010-[ZH-CN]. The speech utterance contains 12 domains, including keywords, voice command, smart home, autonomous driving, industrial production, etc.The recording was put in quiet indoor environment, using 3 different devices in parallel: high fidelity microphone (44.1kHz, 16-bit); Android-system mobile phone (16kHz, 16-bit), iOS-system mobile phone (16kHz, 16-bit). AISHELL-2 choose audio data record by iOS-system.1991 speakers from different accent areas in China were participate in this recording. The manual transcription accuracy rate is above 96%, through professional speech annotation and strict quality inspection.This database is free for academic research, not in the commerce, if without permission. )

点赞
收藏
表情
图片
附件