[1-5]
[1-5]
2019 End-of-Year Release
2019 年尾版
2 down, keep it up!<playIcon></playIcon>
完成兩筆喇 <playIcon></playIcon>,繼續加油!
≥ 2 No votes
≥ 2 票 「錯」
≥ 2 Yes votes
≥ 2 票 「啱」
Abkhaz
阿布哈茲語
About
介紹
Accent
口音
Account
帳戶
Acehnese
亞齊語
{ $actionType }<playIcon></playIcon> did they accurately speak the sentence?
{ $actionType }<playIcon></playIcon>佢哋講得準唔準?
{ $actionType }<recordIcon></recordIcon> then read the sentence aloud
{ $actionType }<recordIcon></recordIcon>然後大聲讀出句子。
{ $actionType }<stopIcon></stopIcon> when done
{ $actionType } 完成後撳 <stopIcon></stopIcon>
{ $actionType } submit when ready
{ $actionType } 準備好就提交
Add
添加
Adding languages to work with
增加使用語言
Additional Language
其他語言
Add Language
新增語言
Add new custom accent "{ $inputValue }"
添加新嘅自定義口音 "{ $inputValue }"
Add new sentences
添加新句子
Add Sentences
添加句子
Add Your Voice
加入你嘅聲音
Adyghe
阿迪格語
A few bytes
幾個字節
Afrikaans
南非語
Age
年齡
Albanian
阿爾巴尼亞語
All
全部
All sentences you submit must be under <wikipediaLink>Public Domain (CC-0) license</wikipediaLink>. To support the inclusion of work not under public licence, we have created a <cc0WaiverLink>Contributions Agreement template</cc0WaiverLink> for works where the copyright owner would like to contribute the material to Common Voice.
所有語句必須符合<wikipediaLink>公共領域 (CC-0) 許可證</wikipediaLink>嘅規範。對於包含非公共領域牌照規範嘅內容,我哋會以<cc0WaiverLink>貢獻者協議模板</cc0WaiverLink>嚟接受由版權持有者貢獻嘅內容。
All voice clips in the dataset are scrubbed of personally identifying information. When a contributor provides demographic data via their profile, that information is de-identified from their voice clips before being bundled for download in the dataset and is never made public on their profile page.
數據集中所有嘅錄音片段都唔會包含可識別個人嘅資訊。如果貢獻者喺個人檔案中提供咗人口統計資料,嗰啲資訊將會喺錄音片段合併成數據集以供下載之前去識別化,而且唔會喺佢哋嘅個人檔案頁面中公開。
Amharic
阿姆哈拉語
Anonymized user data like age, gender, and accent helps improve the audio data used to train the accuracy of speech recognition engines. Your username and email will never be associated with your submitted data, and you can choose whether to make your username public or anonymous.
匿名化嘅使用者資料,如年齡、性別、口音等,可以幫助我哋改善用嚟提升訓練語音識別精確性嘅語音資料。你嘅帳號同電郵唔會同你提交嘅數據相關聯,你亦都可以決定公開你嘅帳號或者保持匿名。
Approve
通過
A quiet background hubbub is OK, but we don’t want additional voices that may cause a machine algorithm to identify words that are not in the written text. If you can hear distinct words apart from those of the text, the clip should be rejected. Typically this happens where the TV has been left on, or where there is a conversation going on nearby.
背景有安靜嘅人聲雜音都可以接受,但係唔可以有一把聲太突出,令機器演算法認出一啲原文冇嘅字。如果你聽到原文冇嘅字句,嗰段錄音就要拒批。一般有呢個情況就係背景開住咗電視,或者附近有其他人喺度傾偈。
Arabic
阿拉伯語
Aragonese
阿拉貢語
Are you coming? <strong>[called by another]</strong>
你嚟唔嚟㗎?<strong>[背景度有另一把聲嗌佢]</strong>
Armenian
亞美尼亞語
Armenian Western
亞美尼亞語(西)
Artificial intelligence
人工智能
Ask about a new language
申請加入新語言
Assamese
阿薩姆語
Asturian
阿斯圖里亞斯語
Audio Format
格式
Avatar
個人資料照片
Avatar uploaded
肖像已上載
Average
一般
A voice clip is marked "valid" when a user gives it a Yes vote.
當用户畀咗一票「啱」,段錄音會標做「有效」
Awards
獎勵
Azerbaijani
阿塞拜疆語
Back
返去
Background Noise
背景嘈音
Background Voices
背景聲音
Back to Top
返上最頂
Bambara
班巴拉語
Basaa
巴沙語
Bashkir
巴什基爾語
Basque
巴斯克語
Be cautious before rejecting a clip on the ground that the reader has mispronounced a word, has put the stress in the wrong place, or has apparently ignored a question mark. There is a wide variety of pronunciations in use around the world, some of which you may not have heard in your local community. Please provide a margin of appreciation for those who may speak differently from you.
拒批錄音嗰陣要審慎一啲,尤其係因為讀錯,文白異讀,變調,漏咗個問號拉高等等問題。世界上有好多唔同嘅發音,有啲人嘅習慣同你可能有啲啲唔同。請理解同包容一啲講嘢方式同你有少少唔同嘅朋友。
Become a partner
成爲合作夥伴
Belarusian
白羅斯語
Benefits
益處
Bengali
孟加拉語
[‘Beret’ is OK whether with stress on the first syllable (UK) or the second (US)]
[「銀行」第二個字讀第4調陽平,或者變調第2調陰上都得]
<bold>{ $count }</bold> Clips
<bold>{ $count }</bold> 片段
<bold>Help us</bold> find more voices
<bold>幫我哋</bold>揾多啲聲音
Bosnian
波斯尼亞語
Both
兩樣都做
Both of these projects are part of our efforts to bridge the digital speech divide. Voice recognition technologies bring a human dimension to our devices, but developers need an enormous amount of voice data to build them. Currently, most of that data is expensive and proprietary.
呢兩個計劃都係我哋努力糾正電子語音落差嘅一部份。語音識別技術可以令我哋嘅電子裝置更加人性化,但係開發者需要十分大量嘅語音數據,先能夠打造出噉樣嘅系統。目前大部分語音數據都相當昂貴,而且受專有權限制。
Both (Speak and Listen)
兩樣都做 (又聽又講)
Breton
不列塔尼語
Browse Languages
瀏覽語言
Build a custom goal
度身訂做目標
Build Profile
建立個人檔案
Bulgarian
保加利亞語
Burmese
緬甸語
Buryat
布里亞特語
<b>Why an email?</b> We may need to contact you in the future about changes to the dataset; an email provides us a point of contact.
<b>點解需要電郵地址?</b>我哋可能會喺未來聯絡閣下,提供與數據集相關嘅新資訊。電郵可作為我哋聯絡閣下嘅方式。
By editing your goal, you may lose your existing progress.
編輯目標後,可能會失去現有進度。
By opting in to receive emails you state that you are okay with Mozilla handling this info as explained in Mozilla’s <privacyLink>Privacy Policy</privacyLink>.
選擇接收電郵意味住閣下同意 Mozilla 根據<privacyLink>私隱政策</privacyLink>嚟處理呢啲個人數據。
<b>You agree</b> to not attempt to determine the identity of speakers in the Common Voice dataset
<b>你同意</b>唔去試圖識別 Common Voice 數據集内講話人嘅個人身份
By providing some information about yourself, the audio data you submit to Common Voice will be more useful to Speech Recognition engines that use this data to improve their accuracy.
語音引擎可以使用你提供嘅一啲資訊,令你嘅 Common Voice 語音資料被善用以提升語音識別引擎嘅精確度。
By using Common Voice, you agree to our <termsLink>Terms</termsLink> and <privacyLink>Privacy Notice</privacyLink>
使用 Common Voice,即代表閣下同意我哋嘅<termsLink>條款</termsLink>同埋<privacyLink>私隱聲明</privacyLink>
CANCEL
取消
Cancel Re-recording
取消重新錄音
Cancel Submission
取消提交
Can't decide?
決定唔到?
Cantonese
粵語
Catalan
加泰羅尼亞語
Central Kurdish
中庫爾德語
Change your email via Settings under Login Identity
要更改電郵,請先撳登入身份,然後再撳設定
<chevron></chevron>See less
<chevron></chevron>睇少啲
<chevron></chevron>See more
<chevron></chevron>睇多啲
Chinese (China)
中文(中國大陸)
Chinese (Hong Kong)
中文(香港)
Chinese (Taiwan)
中文(台灣)
Chuvash
楚瓦士語
Click
撳
{ $clipCount } voice clips, total archive size { $size }. Expires { $expires }.
{ $clipCount } 條語音片段,總共 { $size }。{ $expires } 到期。
Clip Graveyard
錄音垃圾桶
Clips recorded
段新錄音
Clips Uploaded
已上載嘅錄音
Clips validated
段錄音已驗證
Clips You've Recorded
閣下錄製嘅片段
Clips You've Validated
閣下驗證咗嘅錄音
Close
關閉
Close
關閉
Collecting sentences
收集句子
Collecting sentences from the public domain, or writing new ones for the public domain.
收集公共領域嘅語句,或者撰寫新語句並以公共領域授權發佈。
Collect sentences
收集句子
Comment
註解
Common Voice data plus all other voice datasets above.
Common Voice 數據,以及上面列出嘅所有其他語音數據集。
Common Voice Dataset
Common Voice 數據集
Common Voice is a collaborative project, and we're depending on our community of partners and contributors to build the largest open-source dataset of voices ever.
We would like to thank the following people and organizations for their help with the project:
Common Voice 係一個合作計劃,目標係憑住一班合作伙伴同貢獻者嘅力量,建立一個史上最大嘅開源語音資料集。
以下嘅個人同組織喺計劃入面嘅貢獻良多,我哋想向佢哋表示感謝:
Common Voice is a crowdsourcing platform, and the languages were all added by volunteers.
We would love for you to add your language! <languageRequestLink>Ask about adding your language.</languageRequestLink>
Common Voice 係一個群眾判(大眾外判)平台,啲語言都係由義工加嘅。我哋好希望你可以加埋你嘅語言入嚟!<languageRequestLink>申請加入你嘅語言。</languageRequestLink>
Common Voice is Mozilla's initiative to help teach machines how real people speak.
Common Voice 係 Mozilla 發起嘅一個教識機器「真人點樣講嘢」嘅項目。
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
Common Voice 係 Mozilla 幫助教機器「真人點樣講嘢」嘅倡議嘅一部分。
Common Voice is part of Mozilla's initiative to help teach machines how real people speak. In addition to the Common Voice dataset, we’re also building an open source speech recognition engine called Deep Speech.
Common Voice 係 Mozilla 幫助教機器真人點樣講嘢嘅倡議嘅一部分。除咗 Common Voice 數據集之外,我哋仲建構緊一個名為Deep Speech嘅開源語音識別引擎。
Common Voice is the world’s largest publicly available, multi-language voice dataset.
Common Voice 係全球最大型嘅公開多語言語音數據集。
Common Voice recordings are used by academics, small businesses, and voice recognition enthusiasts to help train and grow publicly available resources like voice models.
Can you let us know why you would like your recordings deleted?
學術界、小企業與語音識別愛好者會使用 Common Voice 嘅錄音片段嚟幫助訓練、發展語音模型等公共資源。
可唔可以話畀我哋知你點解想刪除錄音片段?
Community participation and decision making.
社群參與同決策。
Community Playbook
社群手冊
Confirm Goal
確認目標
Connect with Gravatar
連結 Gravatar
Contact
聯絡
Contact Form
聯絡表格
Content available under a <licenseLink>Creative Commons license</licenseLink>
內容依照 <licenseLink>Creative Commons 授權條款</licenseLink>授權大眾使用
Continue
繼續
Contribute
貢獻
Contribute
貢獻
Contribute sentences
貢獻句子
Contribute to { $lang }
貢獻到{ $lang }
Contribute Your Voice
貢獻你把聲
Contribution Activity
貢獻記錄
Contribution Criteria
貢獻準則
Contribution Experience
參與經驗
Contributors record voice clips by reading from a bank of donated sentences.
貢獻者會錄低由句庫抽出嚟嘅句子。
Cookies
Cookies
Cookies
Cookies
<coquiLink>Coqui</coquiLink> is dedicated to open speech technology. Their projects include deep learning based STT and TTS engines.
<coquiLink>Coqui</coquiLink> 致力於開放語音技術。佢哋嘅專案包含使用深度學習技術嘅 STT 與 TTS 引擎。
Cornish
康沃爾語
Corsican
科西嘉語
{ $count } clips
{ $count }錄音片段
{ $count }mo
{ $count } 月
{ $count }wk
{ $count } 週
{ $count }y
{ $count } 年
Create a Custom Goal
訂立個人目標
{ $created }
{ $created }
Criteria
準則
Croatian
克羅地亞語
Czech
捷克語
Daily Goal
每日目標
Danish
丹麥語
Dashboard
儀表板
Dataset Release
數據集發佈
Datasets
數據集
Datasets
數據集
Date
數據庫日期
Days
日
De-identified
去識別化
DELETE
刪除
Delete my recordings
刪除我嘅錄音
Delete Profile
刪除個人檔案
Dhivehi
馬爾代夫語
Different language
另一種語言
Difficult
難
Difficult to pronounce
好難發音
Discard ongoing recording
捨棄當前錄音
Discourse
Discourse
Donate your voice
捐出你把聲
Don't see your language on Common Voice yet?
喺 Common Voice 度見唔到你嘅語言?
Don’t see your language reflected in the Dataset? To request a language head over to our Languages page.
數據集入面冇你嘅語言?請去語言版要求增添語言。
Download
下載
Download
下載
Download Common Voice Data
下載 Common Voice 語音數據
Download Data
下載數據
Download Dataset Bundle
下載打包數據集
Download { $language }
下載{ $language }
Download Links
下載連結
Download My Data
下載我嘅數據
Download profile data
下載檔案數據
Download the Dataset
下載數據集
Download the Single Word Target Segment
下載單字目標分段
Do you have ideas on how we can make the Common Voice dataset better? Let us know on Discourse
閣下有冇任何可以令 Common Voice 資料集變得更好嘅諗法?歡迎到 Discourse 討論區話畀我哋知
Do you want to continue?
你想唔想繼續?
Do you want to Speak, Listen or both?
閣下想淨係講嘢、聽嘢,定係想又聽又講?
Drag and drop or <browseWrap>Browse</browseWrap>
拖落呢度,或<browseWrap>瀏覽</browseWrap>
During contribution submission feedback will be skipped after clicking 'Submit'. Contribution will continue directly with the next set of 5 recordings or validations.
貢獻期間,單擊「提交」可略過提供意見,直接去到下一組5段錄音或驗證。
Dutch
荷蘭語
Each entry in the dataset consists of a unique MP3 and corresponding text file. Many of the <b>{ $total }</b> recorded hours in the dataset also include demographic metadata like age, sex, and accent that can help train the accuracy of speech recognition engines.
The dataset currently consists of <b>{ $valid }</b> validated hours in <b>{ $languages }</b> languages, but we’re always adding more voices and languages. Take a look at our <languagesLink>Languages page</languagesLink> to request a language or start contributing.
數據集中嘅每筆資料包含一組獨特嘅 MP3 錄音檔同埋對應文字檔案。資料集中包含 <b>{ $total }</b> 個鐘嘅錄音片段,當中亦包含好多唔同年齡層、性別、口音等,可以幫助訓練語音識別引擎嘅人口統計資料。
數據集中目前包含 <b>{ $valid }</b> 個鐘 <b>{ $languages }</b> 種語言嘅已驗證資料,但我哋希望可以持續加入更多語音同埋語言。歡迎到我哋嘅<languagesLink>語言頁面</languagesLink>要求增添語言,或者隨時開始加入貢獻。
Easy
容易
Edit
編輯
Edit Profile
編輯個人檔案
Email
電郵地址
Email
電郵地址
<emailFragment>Sign up</emailFragment> to our mailing list to learn how you can take part in campaigns, events and co-design features on Common Voice.
<emailFragment>註冊</emailFragment>我哋嘅電郵通知,瞭解可以點參與Common Voice啲活動、節目同一齊設計功能。
Email is already used for a different account
呢個電郵已經畀另一個帳戶用咗
Email Subscriptions
電郵訂閱
English
英語
English
英文
Enter Email to Download
寫低電郵嚟下載
Enter email to send a sign up link
輸入你嘅電郵地址嚟轉送註冊鏈結
Enter your email
輸入你嘅電郵
Error { $code }
錯誤 { $code }
Erzya
厄爾茲亞語
ESC
退出
Esperanto
世界語
Estonian
愛沙尼亞語
Events
活動
Everyone
所有人
Exit & Delete clips
退出並刪除錄音片段
FAQ
常見問題
Faroese
法羅語
Female
女
Finish editing first?
編輯完先?
Finish recording
完成錄音
Finish recording first?
不如完成咗啲錄音先啦?
Finish Review
完成審核
Finnish
芬蘭語
For every voice clip donated, and every audio clip validated, your account dashboards are updated to reflect your latest progress in each language you contribute to. Yes, you can contribute to more than one!<br/><br/> Use dashboards to track your stats, see how you're doing alongside others in the community, and set daily or weekly contribution goals.
每次你提供、核實語音,你嘅進度都會喺該語言嘅面板度展示出嚟。係啊,你可以貢獻多過一隻語言!<br/><br/>用面板去睇實你嘅戰績,睇下自己同埋其他人做成點樣,再定返個每日或者每週嘅目標吖。
for example
譬如
For these launched languages the website has been successfully <localizationGlossaryLink>localized</localizationGlossaryLink>, and has enough <sentenceCollectionGlossaryLink>sentences collected</sentenceCollectionGlossaryLink> to allow for ongoing <speakLink>Speak</speakLink> and <listenLink>Listen</listenLink> contributions.
以下「已上線」嘅語言,代表網站已經成功被<localizationGlossaryLink>本地化</localizationGlossaryLink>,而且都已經<sentenceCollectionGlossaryLink>收集咗足夠多嘅句子</sentenceCollectionGlossaryLink>令大家可以用<speakLink>講話</speakLink>同<listenLink>聆聽</listenLink>嘅方式嚟貢獻。
French
法語
Frequently Asked Questions
常見問題
Frisian
菲士蘭語(荷蘭)
Fulah
富拉語
Galician
加利西亞語
GB
千兆字節
Gender
性別
Georgian
格魯吉亞語
German
德語
Get involved
參與
Get involved
參與
Get Involved
參與
Get started with goals
訂立目標,開始貢獻
Get Started with Speech Recognition
語音識辨新手入門
<githubLink>NVIDIA NeMo</githubLink>™ is an <docsLink>open-source toolkit</docsLink> for researchers developing state-of-the-art conversational AI models.
<githubLink>NVIDIA NeMo</githubLink>™ 係一套可以畀研究者開發最先進嘅對話式 AI 模型嘅<docsLink>開源工具包</docsLink>。
Glossary
術語表
Goal reached
目標已經達到
Goals
目標
Go to Discourse
去 Discourse 討論區
Go to Languages Page
去語言版
Go to { $name }
往 { $name }
<governanceLink>Read more about how we're governed</governanceLink>
<governanceLink>了解更多我哋係點管治嘅</governanceLink>
Grammatical / spelling error
文法 / 串法錯誤
Great! How many clips a week?
好啊!每個禮拜要錄幾多段音?
Great! How many clips per day?
好啊!每日錄幾多段音?
Great!<recordIcon></recordIcon> Record your next clip
好!撳<recordIcon></recordIcon>錄下一個片段
Great work!<playIcon></playIcon> Listen again when you're ready
好嘢!<playIcon></playIcon> 準備好就可以再聽多次
Greek
希臘語
Guarani
瓜拉尼語
Haitian
海地語
Hakha Chin
喀哈阡語
Hausa
豪薩語
Have Feedback?
有意見反饋?
Have questions about Common Voice? Ideas for improvements or feedback about a specific language? Join us on our <discourseLink>Discourse forum</discourseLink> and let us know.
對 Common Voice 有問題?有針對某種語言嘅新諗法或者改善嘅意見?歡迎加入 <discourseLink>Discourse 討論區</discourseLink>留言畀我哋知。
Have you read our Terms?
睇咗我哋嘅條款未?
Having an account is not required to contribute, though it is helpful.
唔建立帳戶亦可貢獻,但如果有嘅話會更有幫助。
Having a profile is not required to contribute though it is helpful, see why below.
唔成立個人檔案都可以貢獻,但如果有嘅話會更有幫助,下面話你知點解。
Hebrew
希伯來語
Help
幫助
Help Common Voice reach { $hours } hours in a language with a personal goal
建立一個幫 Common Voice 嘅任何一種語言達到{ $hours } 小時嘅個人目標
Help create Common Voice’s first target segment in { $locale }
幫 Common Voice 創立 { $locale } 嘅第一個目標細分群體
Help reach { $hours } hours in { $language } with a personal goal
建立一個幫{ $language }達到{ $hours } 小時嘅個人目標
Help teach machines how real people speak, donate your voice at { $link }
去 { $link } 貢獻你嘅聲線,等機器學識真人點樣講嘢
Help us build a community around voice technology, stay in touch via email.
幫我哋打造一個使用語音技術嘅社群,並透過電郵保持聯繫。
Help us build a high quality, publicly open dataset
幫我哋建立一個高質素又開放畀公眾使用嘅數據集
Help us by reviewing sentences for correctness according to the guidelines.
幫助我哋審核啲語句,以確保其準確並符合要求。
Help us by writing or collecting Public Domain sentences.
幫我哋收集或者創作公共領域嘅句子
Help us find more voices, share your goal
幫我哋揾更多人參與錄音,分享閣下嘅目標
Help us find others to donate their voice!
幫我哋揾其他人一齊獻聲!
Help us get to { $goal }
幫我哋達到 { $goal } 嘅目標
Help us validate sentences!
幫我哋手驗證句子啦!
Help us validate voices
幫我哋手驗證錄音啦
Here are the links to download your ZIP files.
下列係你嘅 ZIP 檔下載鏈結。
Hidden
隱藏
Hill Mari
馬里語(山地)
Hindi
印地語
His hand was rais-ed.
我有一支<strong>千 (cin1) </strong>筆。
Home
主頁
Hours
個鐘
Hours Recorded
已錄製時數
Hours Validated
已核對時數
{ $hours } validated hours so far!
{ $hours } 個鐘頭已經驗證!
How?
點樣?
How are project decisions made?
項目決定係點做嘅?
How can I get the Common Voice data?
我可以點樣可以攞到 Common Voice 嘅數據?
How can we effectively grow a language on Common Voice?
點樣有效發展 Common Voice 上面嘅語言?
How does Common Voice calculate hours?
Common Voice 係點計啲鐘數嘅呢?
How does Common Voice work?
Common Voice 係點運作嘅?
How does Common Voice work?
Common Voice 係點運作嘅?
How does it work?
佢係點運作嘅?
How do I access and use the dataset?
點樣攞到數據集嚟用?
How do I add a language?
我點樣新加一隻語言?
How do I add sentences?
點樣添加句子?
How do I stay in touch?
點樣保持聯繫?
How do you ensure anonymity and privacy of the people who donated their voices?
你係點同貢獻者保證佢哋貢獻嘅錄音片段都係匿名兼保密嘅呢?
How to
點樣
How-to
點樣
How to Cite
點樣引用
How would you describe your accent?
你會點描述自己嘅口音?
Hungarian
匈牙利語
I agree
我同意
I am a non-native speaker and I speak with an accent, do you still want my voice?
如果我唔係母語人士而且講嘢有口音,咁你哋仲要唔要我把聲?
Icelandic
冰島語
I'd like to receive emails such as goal reminders, my progress updates and newsletters about Common Voice.
我想收到有關目標提醒、個人進度同 Common Voice 新聞嘅電郵。
I do not agree
我唔同意
If the recording breaks up, or has crackles, reject unless the entirety of the text can still be heard.
如果個錄音斷開咗或者有沙沙聲,除非啲文字可以完整聽得到,否則就唔好批。
If the recording breaks up, or has crackles, reject unless the entirety of the text can still be heard.
如果段錄音斷開咗,或者沙沙聲,除非啲字聽得清楚,否則唔好批。
If you come across something that these guidelines don’t cover, please vote according to your best judgement. If you really can’t decide, use the skip button and go on to the next recording.
如果你遇到呢份指引冇列出嘅情況,請用你嘅判斷去投票。如果真係決定唔到,可以用跳過掣去聽下一段錄音。
Igbo
伊博語
I just created a personal goal for voice donation to #CommonVoice -- join me and help teach machines how real people speak { $link }
我啱啱訂立咗貢獻錄音畀 #CommonVoice 嘅個人目標,你都一齊嚟加入,幫機器學識真人點樣講嘢啦 { $link }
I'm okay with you handling this info as you explain in Mozilla's <privacyLink>Privacy Policy</privacyLink>
我同意你依照 Mozilla 嘅<privacyLink>私隱保護政策</privacyLink>中描述嘅方式來處理呢啲資料
Includes email, username & demographic info, available right away
包括電郵地址、用户名同人口統計資訊,即時可以下載
Includes mp3s and related sentences, may take some time to prepare
包含 MP3 錄音檔同相關語句,可能需要啲時間嚟準備
* Indicates required field
*表示必填
Indonesian
印尼語
Information about the language
呢種語言嘅資料
In Progress
準備緊
Interested in learning more and contributing to the project?
你有冇興趣學多啲嘢,為呢一個計劃貢獻?
Interface Language
介面語言
Interlingua
國際語
Interlingue
西方國際語
Irish
愛爾蘭語
Is my account information public?
我嘅賬戶資料係咪公開嘅?
<isoCodeLink>ISO Codes</isoCodeLink> if known
<isoCodeLink>ISO 代碼</isoCodeLink>(如有)
Is the clip valid?
段錄音有冇效?
Is the goal of Common Voice to build a voice assistant?
Common Voice 係咪志在建立一個語音助手?
Italian
意大利語
It contains words or phrases that are hard to read or pronounce.
句嘢含有一啲好難閲讀同發音嘅字。
It is written in a language different than what I’m speaking.
呢句係另一種語言嘅句子。
Izhorian
伊喬里亞語
Japanese
日語
Join the Common Voice mailing list
接收 Common Voice 電郵通知
Just Unsure?
單係唔確定?
Kabardian
卡巴爾達語
Kabyle
卡拜列語
Kaqchikel
吉志高語
Karakalpak
卡拉卡爾帕克語
Kazakh
哈薩克語
{ $kb }kb max
{ $kb } kb 上限
Keep
保留
Keep it up, record again <recordIcon></recordIcon>
繼續加油,再錄多次!<recordIcon></recordIcon>
Keep the recordings
保存錄音
Keep track of your progress and metrics across multiple languages.
追蹤你喺所有語言嘅進度同指標。
Keep track of your progress with a profile
建立個人檔案,紀錄閣下嘅進度
Keep track of your progress with a profile and help our voice data be more accurate.
使用你嘅個人檔案可以保留你嘅進展,仲可以幫我哋提高語音數據嘅準確度。
Khmer
高棉語
K'iche'
頡車語
Kikuyu
基庫尤語
Kinyarwanda
盧旺達語
Komi-Zyrian
科密語
Korean
韓語
Kurmanji Kurdish
庫爾德語(北)
Kyrgyz
吉爾吉斯語
Language
語言
Language
語言
Language
語言
Language Request
語言申請
Language request successfully submitted, thank you.
已成功提交語言申請,多謝。
Languages
所有語言
Lao
老撾語(寮語)
Latvian
拉脱維亞語
Launched
已啟動
Leaderboard Visibility
排行榜可見性
Learn how to request it here!
撳呢度瞭解點樣請求加埋佢!
Learn how to take part
瞭解點樣參與
Learn More
了解更多
Leaving now means you’ll lose your changes
而家離開將唔會儲存你嘅變更
Leaving now means you'll lose your progress
如果而家離開,你會失去而家嘅進展
Let's Get Started
我哋開始啦
LibriSpeech is a corpus of approximately 1000 hours of 16 kHz read English speech derived from audiobooks from the LibriVox project.
LibriSpeech 係一個語料庫,收錄咗大概一千個鐘、取自 LibriVox 計劃有聲書嘅 16 kHz 英語朗讀錄音。
License
授權條款
License: <licenseLink>CC-0</licenseLink>
授權條款: <licenseLink>CC-0</licenseLink>
License: <licenseLink>{ $license }</licenseLink>
授權條款:<licenseLink>{ $license }</licenseLink>
Ligurian
利古里亞語
Link Copied
連結已複製
Links to websites that can help us understand the language
有助了解呢門語言嘅網站網址
Listen
聽聲
Listen
聽聲
Listening
聆聽
Listen-Queue
聆聽隊列
Lithuanian
立陶宛語
Loading…
加載緊⋯⋯
Loading sentences…
加載緊句子……
Localization
本地化
Localized
已被本地化
Log In
登入
Login Identity
登入身份
Log in or sign up to get started
登入或註冊嚟開始
Login / Signup
登入/註冊
Log In / Sign Up
登入 / 註冊
Log In / Sign Up with { $company } email
用 { $company } 電郵登入 / 註冊
Logout
登出
Log Out
登出
Lojban
邏輯語
Looks like there aren't any clips to listen to in this language. Help us fill the queue by recording some now.
睇來而家冇錄音可聽。不如幫我哋錄返啲?
Luganda
盧干達語
Luxembourgish
盧森堡語
Macedonian
馬其頓語
Maithili
邁蒂利語
Make sure the platform is recording before you start speaking, and that it only stops once you’re finished.
開始講嘢之前檢查有冇錄緊音,以及記得講完先好撳停止錄音。
Make your submitted data as rich as possible by providing some anonymous demographic data. We de-identify all demographic data before making it public.
透過提供匿名嘅人口統計特徵,令你提交嘅數據更豐富。公開之前,我哋會先去除人口統計特徵資料入面嘅個人識別資料。
Malagasy
馬拉加斯語
Malay
馬來語
Malayalam
馬拉也藍語
Male
男
Maltese
馬耳他語
Manage Email Subscriptions
管理電郵訂閱
Manage Subscriptions
管理訂閱
Mapudungun
馬普切語
Marathi
馬拉地語
Maybe our <homepageLink>homepage</homepageLink> will help? To ask a question, please join the <matrixLink>Matrix community chat</matrixLink>, monitor site issues via <githubLink>GitHub</githubLink> or visit <discourseLink>our Discourse forums</discourseLink>.
我哋嘅<homepageLink>首頁</homepageLink>可能會幫到你?想問問題,請加入<matrixLink>Matrix 群組聊天室</matrixLink>,網站問題可以交畀<githubLink>GitHub</githubLink>,或去一去我哋嘅<discourseLink>Discourse 論壇</discourseLink>話畀我哋知。
MB
兆字節
Meadow Mari
馬里語(東部)
Meetei Lon
曼尼普爾語
Message
訊息
[Mismatched content]
[唔關事嘅內容]
[Mismatched content]
[唔關事嘅內容]
Misreadings
錯讀
Missing an <strong>'s'</strong> at the end of a word.
漏咗詞尾嘅<strong>咗</strong>或者句尾嘅<strong>㗎、喇、咋、啊、喎</strong>。
Missing <strong>'A'</strong> or <strong>'The'</strong> at the beginning of the recording.
漏咗詞頭嘅量詞,例如「<strong>個</strong>」或者「<strong>啲</strong> 」。
Missing the end of the last word by cutting off the recording too quickly.
太早停咗個錄音,搞到最後一隻字斷咗錄唔到
Mixed
混合
Moksha
莫克沙語
Mongolian
蒙古語
More
更多
[More has been recorded than the required text]
[錄埋啲原句以外嘅字入去]
Mossi
莫西語
Most of the data used by large companies isn’t available to the majority of people. We think that stifles innovation. So we’ve launched Common Voice, a project to help make voice recognition open and accessible to everyone.
大部分現成嘅數據都係由大公司擁有,冇開放畀大眾使用。我哋覺得噉樣會阻礙創新,所以創立咗 Common Voice 計劃,等大家都可以使用開放同易用嘅語音識別技術。
Most of the data used by large companies isn’t available to the majority of people. We think that stifles innovation. So we’ve launched Project Common Voice, a project to help make voice recognition open to everyone.
大部分現成嘅數據都係由大公司擁有,冇開放畀大眾使用。我哋覺得噉樣會阻礙創新,所以創立咗每個人都可以來自由建造語音識別嘅 Common Voice 計劃。
Most recordings are of people talking in their natural voice. You can accept the occasional non-standard recording that is shouted, whispered, or obviously delivered in a ‘dramatic’ voice. Please reject sung recordings and those using a computer-synthesized voice.
大部份錄音都係用正常聲線去錄。如果間中有啲唔標準嘅錄法,例如係大叫,細聲講嘢,或者好有「戲劇效果」噉讀,都可以接受嘅。唱歌或者電腦合成嘅錄音,就請你拒批。
Most speech databases are trained with an overrepresentation of certain demographics which results in a bias towards <articleLink>male and middle class</articleLink>. Accents and dialects that tend to be under-represented in training data sets are typically associated with groups of people who are already marginalised. Many machines also struggle to understand female voices.
This is why in our voice database we want variety!
大多數嘅語音數據庫嘅語音來源偏重某啲特定人口,令到結果偏向於<articleLink>男性以及中產階級</articleLink>。而喺呢啲訓練機械嘅數據之中,各種弱勢嘅口音以及方言往往同邊緣化嘅人群有關,同時好多機器亦好難理解女性嘅聲音。
呢個就係點解我哋嘅語音數據庫應該海納百川,收集各種聲音!
Mozilla Common Voice is an initiative to help teach machines how real people speak.
Mozilla Common Voice 係一個教識機器「人類點樣講嘢」嘅倡議。
Mozilla doesn’t pick or favor any one language over another. Instead, Common Voice is a purely community-driven initiative, but it takes <multilangLink>several steps to add a new language</multilangLink> and begin collecting voice donations. First, the Common Voice website needs to be translated so community members can access the contributor experience in their own language. Next, we need a large collection of copyright-free sentences for people to read out loud. Once both of those requirements are satisfied, a language is “launched” on Common Voice for people to start recording their voice and validating others' donations. If you want to help launch a new language, head over to our <sentenceCollectorLink>sentence collection tool</sentenceCollectorLink> to get started.
Mozilla 唔會特別偏好邊種語言。相反,Common Voice 係一個純粹由社群推動嘅計劃,不過要經過<multilangLink>幾個步驟先可以新增語言</multilangLink>並開始收集語音片段。首先,需要完成翻譯 Common Voice 網站,噉樣社群成員先得用自己嘅語言進行貢獻。其次,我哋需要大量嘅無版權語句,畀大家可以朗讀出嚟。當兩個條件都滿足之後,呢種語言就會喺 Common Voice「上線」,畀大家開始錄音,同埋驗證其他人所錄低嘅片段。如果你想協助準備畀新語言上線,歡迎到<sentenceCollectorLink>語句收集工具</sentenceCollectorLink>開始幫手。
Mozilla is dedicated to keeping the web open and accessible for everyone. To do that we need to empower web creators through projects like Common Voice. As voice technologies proliferate beyond niche applications, we believe they must serve all users equally. That means investing in more languages and accommodating diverse accents and demographics when building and testing voice technologies. Common Voice is a public resource available to everyone and Mozilla teams and developers around the world are already using it on our own projects as well.
Mozilla 致力於保持網絡開放,令任何人都可以使用。為咗達到呢個目標,我哋要透過 Common Voice 噉樣嘅計劃嚟幫助網絡創作者。隨住語音技術嘅應用越嚟越普及,我哋相信呢啲技術應該公平噉服務所有使用者。即係話,喺建設同測試語音科技嗰陣,需要投放資源支援更多語言,並且照顧唔同口音同人口背景嘅需要。Common Voice 係一套人人可用嘅公共資源,Mozilla 團隊同全球各地嘅開發者亦已經將佢用喺自己嘅專案入面。
Mozilla’s open source voice recognition engine Deep Speech can be used to build speech recognition applications. Read our <githubLink>GitHub overview</githubLink> or join the <discourseLink>DeepSpeech Discourse</discourseLink> to learn how to get started.
Mozilla 嘅開放原始碼語音識別引擎 Deep Speech,可以用來打造語音識別應用程式。你可以睇下我哋嘅 <githubLink>GitHub 概觀</githubLink>或者加入 <discourseLink>DeepSpeech Discourse</discourseLink> 了解點樣入門。
Mutual accountability.
互相問責。
Mycroft Ai
Mycroft Ai
Mycroft is the world’s first open source assistant.
Mycroft runs anywhere - on a desktop computer, inside an automobile, or on a Raspberry Pi.
Mycroft 係全球第一套開放原始碼嘅語音助理,無論喺電腦、汽車、Raspberry Pi 定任何地方都用得到。
My Sentences
我嘅句子
My Sentences
我嘅句子
n
n
N
N
N/A
唔適用
Name
名稱
Names of your language
語言名稱
Native Language
母語
Native Language
母語
Need some help with accent?
需要口音方面嘅更多定義?
Need some help with variants?
喺方音方面使唔使幫手?
Need to download your data?
需要下載你嘅數據?
Nepali
尼泊爾語
New Language Launch
新語言發佈
Next
下一個
Next Goals: { $goal }
下一個目標:{ $goal }
Nias
尼亞斯語
No
唔係
No
唔係
[No ‘a’ in the original text]
[原本冇「啊」]
No gravatar found for your email
揾唔到你電郵所屬嘅 Gravatar
No microphone found.
未發現咪高峰。
Norwegian Bokmål
書面挪威語
Norwegian Nynorsk
新挪威語
Note: When set to 'Visible', this setting can be changed from the <profileLink>Profile page</profileLink>
注意:設定做「可見」之後,以後可以喺<profileLink>個人檔案</profileLink>修改
Note: You will still need to select between Speak or Listen to change contribution type.
注意:你仲需要揀「錄音」或「聽聲」嚟轉換貢獻模式。
No Thanks
唔使嘞
No variant selected
未揀方音
Now that you know a little bit more about Common Voice, why not try it out? Click on the microphone icon to start reading sentences aloud. <br/><br/>If you prefer to review other people's voice contributions, click on the play icon. You’ll help confirm that recordings match the sentences written on screen.
而家你知多咗 Common Voice 嘅資訊喇,有冇興趣來試下?請你撳一撳個咪來大聲朗讀句子。<br/><br/>如果你想幫手審核其他人嘅錄音,請你撳一撳個播放圖示來確認段錄音同段文字係咪相同。
Now you can donate your voice to help us build an open-source voice database that anyone can use to make innovative apps for devices and the web. Read a sentence to help machines learn how real people speak. Check the work of other contributors to improve the quality. It’s that simple!
而家你可以將自己嘅聲音捐畀我哋,以幫助建立開放嘅語音資料庫,等任何人都可以為裝置同互聯網發行嶄新嘅應用程式。<lineBreak></lineBreak>
只要朗讀一段文字,你就可以幫助機器了解我哋點樣講嘢。你亦都可以驗證其他貢獻者嘅錄音,協助改善品質。就係咁簡單!
Number of Voices
錄音人數
Numbers. There should be no digits in the source text because they can cause problems when read aloud. The way a number is read depends on context and might introduce confusion in the dataset. For example, the number “2409” could be accurately read as both “twenty-four zero nine” and “two thousand four hundred nine”.
數字嘅問題:源文本中唔應該出現數字,因為數字可能會導致朗讀方面出問題。 數字嘅讀法會因上下文而有所不同,可能會導致數據集出現混淆。例如,數字「2409」可以被讀作「二四零九」或者「二千四百零九」。
Occitan
奧克語
Odia
奧里亞語
Off
閂
Offensive language
冒犯性言語
Offensive speech
冒犯性言論
On
開
Once logged in you can select your languages from the profile section.
登入後,你可以喺個人資料中揀自己嘅語言。
On desktop devices you can contribute by downloading…
喺桌上型電腦,閣下可以下載:
One sentence per line
每行一句
On his head he wore a beret.
佢去咗銀行。
On iOS please continue with Safari to enable recording…
iOS 用戶,請使用 Safari 瀏覽器以繼續錄製…
On the other hand, if you think that the reader has probably never come across the word before, and is simply making an incorrect guess at the pronunciation, please reject. If you are unsure, use the skip button.
另一方面,如果你覺得個朗讀者可能唔識隻字,又估錯隻字點讀,請拒批。唔肯定嘅話,就撳跳過掣。
Optionally join our email list for updates and new information about the project.
你可以選擇接收有關項目更新同新資訊嘅電子郵件。
Optionally submitted demographic data (e.g. age, gender, language, and accent) will never be made public on your profile, and will not be linked to your account in the dataset. Individual audio clips will be associated with demographic data for the purpose of more accurate analysis - for example, a researcher might want to target a training model to a specific demographic segment.
可以自由提供嘅人口背景資料(例如年齡、性別、語言、口音)絕對唔會喺你嘅 profile 公開,都唔會同你嘅户口㨢埋一齊。獨立嘅錄音片段會同背景資料褦埋一齊,令到分析更加精準,因為有時研究人員會想訓練啲語音模型去專門處理某種人口背景嘅語音。
Other
其他
Other Language
其他語言
Other people validate those voice clips.
其他人驗證呢啲錄音片段。
Other voice datasets…
其他語音數據集…
Other Voice Datasets
其他語音數據集
Our governance is founded on the pillars of:
我哋嘅管治係建基於:
Our Partners
我哋嘅合作伙伴
Our source text is made up of original contributor donations as well as dialogue from public domain movie scripts like <italic>It’s a Wonderful Life</italic>.
You can view our source sentences in this <githubLink>GitHub folder</githubLink>.
我哋嘅源文本有來自貢獻者嘅原始貢獻,仲有公共領域嘅影片劇本,例如 <italic>莫負少年頭</italic> 嘅對白腳本。
閣下可到呢個 <githubLink>GitHub 資料夾</githubLink>來檢視我哋嘅來源文本。
Overall Accuracy
整體準確度
Overall Hr. Total
錄音時數 (小時)
Overall project status: see how far we’ve come!
總體項目狀態:睇下我哋已經行咗幾遠!
p
p
Papiamento (Aruba)
帕皮阿門托語(阿魯巴)
Partner
合作
Partners
合作夥伴
Partners
合作夥伴
[Part of the text can’t be heard]
[部份文字聽唔到]
[Part of the text can’t be heard]
[部份文字聽唔到]
Pashto
普什圖語
Past recordings download requests
過往嘅下載錄音請求
<p>Common Voice is a publicly available voice dataset, powered by the voices of volunteer contributors around the world. People who want to build voice applications can use the dataset to train machine learning models.</p>
<p>At present, most voice datasets are owned by companies, which stifles innovation. Voice datasets also underrepresent: non-English speakers, people of colour, disabled people, women and LGBTQIA+ people. This means that voice-enabled technology doesn’t work at all for many languages, and where it does work, it may not perform equally well for everyone. We want to change that by mobilising people everywhere to share their voice.</p>
<p>Common Voice 係一個公開嘅語音數據集,佢係靠世界各地嘅志願者貢獻出自己把聲嚟構成嘅。想自己整語音應用嘅個人同團體都可以用呢份數據集嚟訓練啲機械學習模型。</p>
<p>目前大部分語音數據集都係得大公司先擁有,噉樣會阻礙創新。而且好多語音數據集都忽視咗非英語人士、有色人種、殘障人士、女性、LGBTQIA+ 等人羣嘅數據。噉導致咗有好多語言啲語音技術都唔支援,或者就算係支援咗都唔係對所有人嘅效果都一樣咁好。我哋想等大家都可以好方便噉貢獻出自己把聲,嚟改變呢個現狀。</p>
People come and contribute their voices.
個個人都嚟貢獻佢哋自己把聲。
Persian
波斯語
Personal dashboards keep you up-to-date with individual and community progress.
個人儀表板會準時更新你同埋大衆嘅進度。
<playbookLink>Find helpful guidance</playbookLink> on the entire Common Voice journey, from localisation to dataset usage, as well as how to connect with our community.
無論係在地化、數據使用,定係社群交際方面嘅資訊,都可以喺使用 Common Voice 嘅過程中,隨時<playbookLink>造訪實用指南</playbookLink>獲取。
<playIcon></playIcon>Last one!
<playIcon></playIcon> 最後一個!
Play/Stop
播放/停止
Polish
波蘭語
Portuguese
葡萄牙語
Press play, listen & tell us: did they accurately speak the sentence below?
撳播放、聽完、再回覆:呢句話讀得啱唔啱?
Press { shortcut-play-toggle } to toggle play mode
撳 { shortcut-play-toggle } 即可切換播放模式
Privacy
私隱
Privacy
私隱
Privacy Policy
私隱政策
Privacy, security and transparency.
私隱、安全同透明度。
Pro
專業
Profile
個人檔案
Profile
個人檔案
Profile
個人檔案
Profile information improves the audio data used in training speech recognition accuracy.
個人檔案信息可以改良用於訓練語音識別準確度嘅音頻數據。
Progress
進展
Punctuation
標點
Punjabi
旁遮普語
Quechua Chanka
奇楚華昌卡語
r
r
[‘Raised’ in English is always pronounced as one syllable, not two]
[喺邊一度嘅粵語入面,「鉛」同「簽」都係唔同音]
Reader Effects
朗讀效果
Reading contractions that aren't actually there, such as "We're" instead of "We are", or vice versa.
讀咗啲冇寫出嚟嘅縮略,例如寫住「唔係」但係讀咗做「咪」。
Read More
了解更多
Read more on our About page
去關於我哋嘅頁面瞭解更多
Read the sentence carefully - don’t miss, change or add words.
讀句子要小心,唔好漏字、改字或者加字。
Ready to add your voice or lend your ear?
準備好參與錄音,或者⋯⋯借一借對耳來用未?
Ready to contribute?
準備好貢獻未?
Ready to do { $count } more?
準備好做多{ $count } 個未?
Ready to donate your voice?
準備好貢獻你把聲未?
Ready to help validate sentences?
準備好幫手驗證句子未?
reCAPTCHA is required if you want to proceed
想繼續嘅話要完成 reCAPTCHA 驗證
Receive emails such as challenge and goal reminders, progress updates, and newsletters about Common Voice.
接收挑戰及目標提醒、進度更新同關於 Common Voice 嘅新聞簡報。
Recorded Clips
已錄片段
Recorded Hours
錄音時數
<recordIcon></recordIcon> Last one!
<recordIcon></recordIcon>最尾一個!
[Recording cut off before the end of the last word]
[未講完最後嗰隻字就斷咗錄音]
Recordings
錄音
Recordings
錄音
Recording voice clips is an integral part of building our open dataset; some would say it's the fun part too.
錄音係我哋建立開放數據集不可或缺嘅一部份,亦都係好多人覺得最好玩嘅嗰部份!
Record/Stop
錄音/停止
Record your voice
錄低你把聲
Refresh
刷新
Reject
否決
Rejected Sentences
否決咗嘅句子
Remove
移除
Remove Avatar
刪除肖像
Report
報告
Report Bugs
報告 bug
Report copyright issues
報告版權問題
Report was passed successfully
成功送出報告
Request a Language
申請增加一種語言
Request recordings
請求獲取錄音
*required
*必填欄位
Re-record
重新錄音
RE-RECORD
再錄過
Re-record clip
重新錄音
RETRY
重試
Return
返回
Return here to edit your goal anytime.
你可以隨時返返嚟呢度編輯目標。
Return to Common Voice
返回 Common Voice
Return to Common Voice Datasets
返回 Common Voice 數據庫
Return to Languages
返回語言列表
Review
覆核
Review
審核
Review
審核
Review & re-record clips here as you go
喺度確認或者重錄片段
Review & re-record clips if needed
有需要時覆核並重新錄音
Review sentences
審核句子
Review Sentences
審核句子
Review Sentences
審核句子
Review & Submit
覆核並提交
Romanian
羅馬尼亞語
Romansh Sursilvan
舒蕭凡羅馬什語
Romansh Vallader
羅曼什語
Runyankole
尼揚科勒語
Russian
俄語
s
s
S
S
Sakha
薩哈語
Santali (Ol Chiki)
桑塔利語
Sardinian
薩丁尼亞語
Save
儲存
Saved
已儲存
Search
揾
Search for answers
揾答案
Search for them on the Internet
上網搵下
See how your progress compares to other contributors all over the world.
對比你同世界各地嘅貢獻者嘅進度。
See Less
睇少啲
See More
睇多啲
Select a Language...
揀一種語言…
Selected
已揀
Select Language
選擇語言
Send sign up link
寄送註冊連結
Sentence Collection
收集句子
Sentence Collection
句子收集
Sentences
句子
Sentences are collected for people to read aloud.
啲句子已經收好攞去畀人讀喇。
Sentence should not contain abbreviations
句子唔應該含有縮寫
Sentence text
句子文本
Serbian
塞爾維亞語
Set a goal
訂立目標
Set my visibility
公開/隱藏個人檔案
Settings
設定
Settings
設定
Share Common Voice
分享Common Voice
Share my goal
分享我嘅目標
Share your clip
分享你段錄音
Share your { $count } Clip Daily Goal for { $type }
分享閣下嘅每日 { $type } 目標: { $count } 片段
Share your { $count } Clip Weekly Goal for { $type }
分享閣下嘅每週 { $type } 目標: { $count } 片段
Shilha
施盧赫語
Shortcuts
捷徑
[Should be ‘dinosaurs’]
[應該係「嗰啲」唔係「啲」,讀漏咗字所以唔啱]
[Should be “We are”]
[應該要係「唔係」]
Show my ranking
顯示我嘅排名
Sicilian
西西里語
Sign up
註冊
Sign up for an account
註冊帳户
Sign up for Common Voice newsletters, goal reminders and progress updates
留低你嘅電郵地址,收取 Common Voice 電子報、目標提醒、同進度更新。
sign up for email updates
訂閲最新消息電子報
Sign up for { $lang } updates:
登記接收{ $lang }嘅最新消息:
Single ZIP file containing
包含單個zip檔
Sinhala
僧伽羅語
Size
數據庫大細
Size
大細
Skip
跳過
Skip
跳過
Skip Submission Feedback
跳過提交反饋
Slovak
斯洛伐克語
Slovenian
斯洛文尼亞語
Social media
社交媒體
Somali
索馬里語
Someone asks for a language to be added.
有人請求新加一門語言。
Something went wrong with reCAPTCHA. Please try again.
reCAPTCHA 出咗錯,請試多次。
Sorbian, Lower
索布語(低地)
Sorbian, Upper
索布語(高地)
Sorry, Common Voice is running slowly. Thanks for your interest.
對唔住,Common Voice 運行得好慢。多謝你嘅耐心等待。
Sorry, something went wrong
對唔住,出咗啲問題
Spanish
西班牙語
Speak
講
Speak
錄音
Speakers
獻聲人數
Speaking
講
Speaking and Listening
又聽又講
Speak now
請開聲講
Speak up, contribute here!
喺度開聲貢獻!
<speechBlogLink>Get Started with Speech Recognition</speechBlogLink>
<speechBlogLink>語音辨識新手上路</speechBlogLink>
Speech is often the most natural way we communicate with each other and voice technologies are bringing that convenience to our computers and mobile devices. We want to empower developers to build amazing voice recognition applications like real-time translators and voice-enabled digital assistants. But right now most of the voice data required to build these kinds of apps is expensive and proprietary. We hope the Common Voice dataset will give developers what they need to innovate and make speech technology available in their own language.
To make voice recognition even more universal, we're collecting voice samples in widely spoken languages as well as those with a smaller population of speakers often underserved by commercial speech recognition services. Publishing a diverse dataset of voices will empower developers, entrepreneurs, and entire speech communities to address this gap themselves.
講嘢通常係我哋同其他人最自然嘅溝通方式,語音技術亦都令電腦同流動裝置更加方便。我哋希望開發者打造令人驚嘆嘅語音識別程式,例如即時翻譯機、有語音功能嘅數位助理等等。但係而家用嚟建設呢啲軟件要用嘅語音資料,大部分都係好貴嘅,所以我哋希望可以提供Common Voice數據集畀開發者進行創新,亦令我哋可以用自己嘅語言打造語音技術。
為咗令語音識別技術可以更加普遍,我哋收集無論係有廣大使用者,定係較少使用者嘅語言(大部分商業語音識別技術對呢啲語言都冇乜支援)嘅語音片段,並發佈一組有多元語言同腔調嘅語音資料集,希望可以提供畀開發者、創業家、以及成個語音技術社群跨越呢道鴻溝。
Speech-to-text (STT)
語音轉文字
Speech-to-text (STT) technologies convert voice data into text.
語音轉文字技術係將聲音數據轉化成文字。
Split into { $archiveCount } ZIP files containing
包含 { $archiveCount } 個分開嘅 zip 檔
Splits
語音特徵概況
Start recording
開始錄音
Start typing to describe your accent
開始寫閣下嘅口音描述
Statistics
統計數據
Stats
統計資料
Status Page
狀態頁面
Streaks
Streaks
<strong>[Crackle]</strong> giant dinosaurs of <strong>[crackle]</strong> -riassic.
大<strong>[嘞嘞聲]</strong> 恐 <strong>[嘞嘞聲]</strong> 龍。
<strong>{ Crackle }</strong> giant dinosaurs of <strong>{ crackle }</strong> -riassic.
<strong>{ Crackle }</strong>嗰啲三疊紀嘅<strong>{ crackle }</strong>龍
<strong>[Sneeze]</strong> The giant dinosaurs of the <strong>[cough]</strong> Triassic.
<strong>[乞嚏]</strong> 三疊紀嘅 <strong>[咳]</strong> 大恐龍。
<strong>{ Sneeze }</strong> The giant dinosaurs of the <strong>{ cough }</strong> Triassic.
<strong>{ Sneeze }</strong>嗰啲三疊紀嘅<strong>{ cough }</strong>巨型恐龍
Submit
遞交
Submit
遞交
Submit
遞交
Submit a report
提交一個報告
Submit clips
提交錄音
Submit clips
提交錄音
Subscribe
訂閲
Success
成功
Success, profile created!
成功建立個人檔案!
Swahili
斯瓦希里語
Swedish
瑞典語
Syriac
敘利亞語
Tagalog
他加祿語
Tahitian
大溪地語
Taiwanese (Minnan)
台灣閩南語
Tajik
塔吉克語
Taking several attempts to read a word.
錄咗幾次先至錄得到個字
Tamil
泰米爾語
Tap
敲
Tatar
韃靼語
Tatoeba is a large database of sentences, translations, and spoken audio for use in language learning. This download contains spoken English recorded by their community.
Tatoeba 係一套用於語言學習嘅大型數據庫,當中包含咗各種句子、翻譯、以及錄音。呢個下載項目包含咗其社群所錄低嘅英語語音。
TED-LIUM Corpus
TED-LIUM 語料庫
Tell us what you'd like to download:
話畀我哋知你想下載啲咩:
Telugu
泰盧固語
Terms
條款
Terms
條款
Thai
泰語
Thanks for confirming your account, now let's build your profile.
感謝你確認你嘅帳戶,而家我哋一齊建立你嘅個人資料啦
Thanks to contributions from over 259k people in over 50 languages, this data is being used to train speech-enabled applications to better respond to the human voice.
受惠於全球超過25.9萬人為50幾種唔同語言嘅無私貢獻,我哋將會利用呢啲資料嚟訓練有語音功能嘅應用程式,令佢哋可以更準確噉理解人聲。
Thank you for recording!<lineBreak></lineBreak>Now review and submit your clips below.
多謝你嘅錄音!<lineBreak></lineBreak>請喺下面審核同提交你嘅錄音。
Thank you for your interest in contributing to { $lang }. We work hard to get every language ready for launch and keep
the teams updated via email. If you want to contribute, please provide your email below.
多謝你有意向為{ $lang }貢獻內容。我哋會盡力令大家用到每種語言並會以電郵通知團隊。如果你想幫手貢獻,請喺下面留低你嘅電郵地址。
Thank you! You’ve sent a new language enquiry
唔該晒!新語言申請已經提交。
The bumblebee sped by.
噷……
The Clip Graveyard consists of voice clips that didn't make it into the Common Voice dataset. Just like the dataset, the Clip Graveyard is available for download.
垃圾桶度有無法進入 Common Voice 數據集嘅語音片段。同數據集一樣,垃圾桶嘅內容亦可下載。
The clip has disrespectful or offensive language.
呢個片段中有唔尊重人哋或冒犯性語言。
The Common Voice dataset complements Mozilla’s open source voice recognition engine Deep Speech. The first version of Deep Speech was released in November 2017 and has continued to evolve ever since. Together with the Common Voice dataset, we believe this open source voice recognition technology should be available to everybody. It’s our hope these technologies will enable developers to build a wave of innovative products and services.
Common Voice 能夠同 Mozilla 嘅開放原始碼語音識別引擎 Deep Speech 互補。初版嘅 Deep Speech 喺 2017 年 11 月發行,並持續發展。加埋 Common Voice 數據集,我哋相信呢套開放原始碼語音辨識技術應該開放畀所有人使用,亦希望呢啲技術可以令開發者建設到新一輪嘅產品同埋服務。
The Common Voice dataset complements Mozilla’s open source voice recognition engine Deep Speech, which you can use to build speech recognition applications. Read our <githubLink>GitHub overview</githubLink> or join the <discourseLink>DeepSpeech Discourse</discourseLink> to learn how to get started.
Common Voice 數據集可同 Mozilla 嘅開放原始碼語音識別引擎 Deep Speech 互補,畀閣下用來製作語音識別應用程式。閣下可閱讀我哋嘅 <githubLink>GitHub 概觀</githubLink>或加入 <discourseLink>DeepSpeech Discourse</discourseLink> 了解點樣入門。
The Common Voice Dataset contains hundreds of thousands of voice samples that help developers build voice recognition tools.
Common Voice 數據集有幾十萬條語音樣本,可以用嚟幫開發者建造語音識別工具。
The Common Voice dataset is an open and publicly available resource that can be used to train a wide variety of speech-enabled applications. To protect the security of our contributors, we ask everyone who downloads the Common Voice dataset to respect contributors’ privacy.
All voice clips in the dataset are scrubbed of personally identifying information. When you download the dataset, you agree to not attempt to determine the identity of any contributor. That means you cannot try to link information in the dataset to a contributor’s personal information. You may, however, use the dataset to train speech recognition, speaker recognition, or other applications, by, for instance, linking information in the dataset to other information already in the dataset.
Common Voice 數據集係一份開放,可公開使用嘅資源。含有語音功能嘅應用程式可使用呢份資料嚟訓練程式。為咗保護貢獻者嘅安全,我哋要求所有下載 Common Voice 資料集嘅人確保貢獻者嘅私隱安全。
所有語音片段中嘅個人識別資料已經被清除。當你下載數據集嗰陣,你須要同意唔會嘗試識別數據集當中嘅任何貢獻者。呢個代表你唔可以嘗試將數據集中嘅資訊,同貢獻者嘅個人資訊聯繫起嚟。但你可以將數據集中嘅唔同資訊互相連結埋,用嚟訓練語音識別、講話人識別等功能,或其他應用程式。
The Common Voice dataset is available for download under the <licenseLink>CC0</licenseLink> license on <datasetLink>our Datasets page</datasetLink>. You can also download several other publicly available datasets from the same page.
我哋嘅資料集可到<datasetLink>Common Voice 數據集頁面</datasetLink>下載,本數據集使用<licenseLink>CC0</licenseLink>授權。閣下仲可以喺該頁面中下載其它幾套嘅數據集。
The Common Voice Sentence Collector has collected { $sentenceCount } sentences in { $languageCount } languages!
Common Voice 句子收集器已經收集咗 { $languageCount } 門語言嘅 { $sentenceCount } 個句子喇!
The count of voice recording hours that have been validated by 2 out of 3 users with a vote of “Yes”. These mark progress toward the overall project 10k hours goal.
每3位使用者當中,有2位使用者投下「啱」嘅錄音時數。呢個就係成個計劃一萬小時目標嘅進度。
The count of voice recording hours we have collected so far.
到目前為止我哋收集到嘅錄音時數。
The giant dinosaur of the Triassic.
啲三疊紀嘅巨型恐龍
The giant dinosaurs of the Triassi-.
嗰啲三疊紀嘅巨型恐-。
The giant dinosaurs of the Triassic.
嗰啲三疊紀嘅巨型恐龍
The giant dinosaurs of the Triassic. <strong>[read by one voice]</strong>
嗰啲三疊紀嘅巨型恐龍。<strong>[由一把聲音讀出]</strong>
The giant dinosaurs of the Triassic. Yes.
嗰啲三疊紀嘅巨型恐龍。好。
The giant dino <strong>[cough]</strong> the Triassic.
三疊紀嘅 <strong>[咳]</strong> 大恐龍。
The giant dino <strong>{ cough }</strong> the Triassic.
嗰啲三疊紀嘅<strong>{ cough }</strong>恐龍
The goal of the Common Voice dataset is to enable anyone in the world to build speech recognition, speaker recognition, or any other type of application that requires voice data. A voice assistant is just one of many types of applications you could use the dataset to build.
Common Voice 數據集嘅目標係令任何人都可以建造語音識別、説話者識別,或其他任何需要語音資料嘅應用程式。語音助理就係呢個數據集可以用嚟建造嘅應用之一。
The multi-language version of the Common Voice dataset is currently undergoing community supported bundling and cleaning. If you would like to help us bring Common Voice to new languages, go check out the <sentenceCollectorLink>Sentence Collection Tool</sentenceCollectorLink> for adding new sentences to the dataset, and Mozilla <pontoonLink>Pontoon</pontoonLink> for translating the website itself. New languages are added to Common Voice for voice contribution when 5000 approved sentences have been collected.
多語言版本嘅 Common Voice 數據集,目前正交由社群進行清理同埋打包。若閣下想幫我哋新添語言到 Common Voice,請使用 <sentenceCollectorLink>語句收集工具</sentenceCollectorLink>來將語句加入到數據集,並到 <pontoonLink>Mozilla Pontoon</pontoonLink> 度將網站翻譯成該語言。當每種語言有超過 5000 條語句並獲審批後,就會正式加入 Common Voice。
The number of recordings and which languages you contribute to will be public.
你貢獻嘅錄音數量,同埋你貢獻咗邊幾種語言,都會係公開嘅。
The process by which a contributor’s profile information is obscured from their donated voice clips when packaged for download as a part of the dataset.
喺打包做下載資料集時,貢獻者嘅個人資料會從其所貢獻嘅語音片段隱藏。
There are lots of ways to think about language. For the purposes of speech recognition models, Common Voice suggests focussing on ‘mutual intelligibility’, or ‘can speakers of this language mostly understand one another if they try to?’
我哋可以用唔同角度去諗咩係語言。對於語音識別模型,Common Voice 建議將重點放喺「互通度」上面,即係話「講呢種語言嘅人互相對話嘅話,係咪大部分會理解到對方講緊啲咩?」
The recording was too long.
段錄音太長喇。
The recording was too quiet.
段錄音太靜喇。
The recording was too short.
段錄音太短喇。
There will be natural variations in volume between readers. Reject only if the volume is so high that the recording breaks up, or (more commonly) if it is so low that you can’t hear what is being said without reference to the written text.
唔同嘅朗讀者自然會有聲量嘅偏差。淨係聲量大到個錄音會斷開,或者(更常見)係聲量細到冇字幕就聽唔清嗰陣,先至好唔批
These languages are currently under community development. The progress bars indicate how far each language is in the process of <localizationGlossaryLink>website localization</localizationGlossaryLink> and <sentenceCollectionGlossaryLink>sentence collection</sentenceCollectionGlossaryLink>.
呢啲語言而家仲處於社群開發進程中,進度條展示咗每種語言<localizationGlossaryLink>網站本地化</localizationGlossaryLink>同<sentenceCollectionGlossaryLink>搜集語句</sentenceCollectionGlossaryLink>嘅進度。
The selected file is too large
檔案過大
The Sentence Collector is part of <commonVoiceLink>Common Voice</commonVoiceLink>. It allows contributors to collect and validate sentences created by the community. You can use this tool also to import and clean-up small-to-medium-sized public domain corpus you have found or collected. All sentences need to be Public Domain. Approved sentences are exported every week to the Common Voice repository and are released on the Common Voice website on every new deployment.
語句收集工具係 <commonVoiceLink>Common Voice</commonVoiceLink> 嘅一部份。佢可以畀參與者收集社群提供嘅語句及判斷其正確與否。閣下亦可用此工具載入及整理喺公共區域中取得嘅中小型語料。所有語句必須來自公共領域。被核准嘅語句每個星期都會被導入至 Common Voice 嘅語音庫中,並喺每次 Common Voice 網站部署嗰陣發佈到上面。
The sentence has a grammatical or spelling error.
呢句有文法或串法錯誤。
The sentence has disrespectful or offensive language.
呢句唔尊重人或者語帶冒犯。
The sentence must be grammatically correct.
句子要符合語法。
The sentence must be speakable.
句子要係讀得出嘅。
The sentence must be spelled correctly.
句子寫法要正確。
The site is ready to be launched when it reaches 75% completion.
內容翻譯咗75%之後,網站就可以準備發佈。
The site will be back up as soon as possible. For the latest information, please join the <matrixLink>Matrix community chat</matrixLink> or visit <githubLink>GitHub</githubLink> or <discourseLink>our Discourse forums</discourseLink> to submit and monitor site experience issues.
網站將會盡快恢復作業。請到我哋嘅 <matrixLink>Matrix 社群聊天頻道</matrixLink>、<githubLink>GitHub</githubLink> 上嘅網站報告問題,或到 <discourseLink>Discourse 討論區</discourseLink>報畀我哋,或者瀏覽最新資訊。
The TED-LIUM corpus was made from audio talks and their transcriptions available on the TED website.
TED-LIUM 語料庫係由 TED 網站上提供嘅講座對話語音同埋演講文字抄稿一齊製成嘅語料庫。
The website text is translated into that language.
網站文本翻譯咗成嗰門語言喇。
This is approximately the number of hours required to train a production speech-to-text system.
呢個大約係訓練一個語音轉文字系統嘅所需時數。
This is a use case driven segment containing data to power spoken digit recognition and yes / no detection.
呢個係按照實際使用需要拆出嚟嘅部份,入面嘅資料可以用喺數字識別同埋是/否檢測。
This is open source software which can be freely remixed, extended, and improved. Mycroft may be used in anything from a science project to an enterprise software application.
呢個係一套可以自由混搭、擴展、改進嘅開放原始碼軟件。Mycroft可以用於各種情景,譬如科學專案、企業應用程式等。
This is our process for translating and adapting our content for many locales (languages).
呢個係我哋翻譯同套用去唔同嘅本地環境(語言)嘅過程。
This project is an effort to bridge the digital speech divide. Voice recognition technologies bring a human dimension to our devices, but developers need an enormous amount of voice data to build them. Currently, most of that data is expensive and proprietary.
We want to make voice data freely and publicly available, and make sure the data represents the diversity of real people. Together we can make voice recognition better for everyone.
呢個計劃係想縮短數碼語音嘅技術分歧。語音識別科技可以令我哋嘅設備更加人性化,但係開發者要靠大量嘅語音資料先至可以建立到。目前可用嘅資料價格昂貴,又係專有技術。
我哋想令語音數據可以公開自由畀人使用,並且確保呢啲數據反映出我哋大衆嘅多樣性。合衆人之力,我哋可以幫大家將語音識別技術變得更好!
This setting controls your leaderboard visibility. When hidden, your progress will be private. This means your image, user name and progress will not appear on the leaderboard. Note that leaderboard refresh takes ~{ $minutes }min to populate changes.
用該選項喺排行榜上高公開/隱藏個人檔案。喺「隱藏」狀態時,閣下嘅進度得自己睇到,照片、用戶名稱、貢獻進度等均唔會出現喺排行榜上高。注意喺改變設定 { $minutes } 分鐘之後,排行榜設定先會生效。
Three to go!
仲有三個!
Tibetan
藏語
Tigre
提古利語
Tigrinya
提格利尼亞語
Today
今日
Today's Common Voice progress on clips recorded
今日 Common Voice 錄音片段嘅進度
Today's Common Voice progress on clips validated
今日 Common Voice 驗證片段嘅進度
Today's Progress
今日進度
To make it into the Common Voice dataset, a voice clip must be validated by two separate users.
一個錄音片段必須先通過兩個唔同嘅用户驗證,先可以進入 Common Voice 數據集。
To make the Common Voice dataset as useful as possible we have decided to only allow source text that is available under a Creative Commons (CC0) license. Using the CC0 standard means it’s more difficult to find and collect source text, but allows anyone to use the resulting voice data without usage restrictions or authorization from Mozilla. Ultimately, we want to make the multi-language dataset as useful as possible to everyone, including researchers, universities, startups, governments, social purpose organizations, and hobbyists.
為咗令 Common Voice 數據集發揮最大效益,我哋決定僅允許以 Creative Commons (CC0) 授權條款提供嘅源文本。用 CC0 條款標準會導致比較難揾到源文本,但係噉可以令所有人都用到出品嘅語音數據,亦唔會受到 Mozilla 嘅限制或要求授權。我哋嘅最終目標係令呢個多語言數據集對所有人都有幫助,包括研究者、大學、創業公司、政府、社團組織、興趣愛好者等等。
Top Contributors
主要貢獻者
Total
總共
Total Approved
總批准數
{ $totalHours } hours is achievable in just over { $periodMonths } months if { $people } people record { $clipsPerDay } clips a day.
假如有 { $people } 個人每日都錄到{ $clipsPerDay } 條片,就可以喺 { $periodMonths } 個月內達到{ $totalHours } 個鐘嘅錄音目標。
To the right we outline the benefits and clarify what information we make public. Use the links below to get started with a Common Voice account on your own device.
喺右手邊,我哋列明咗各項成效同我哋會公開嘅資訊。請用下面嘅連結嚟喺你自己嘅裝置上開始使用 Common Voice 賬戶。
Toward next goal
向住下一個目標
Track progress here and on your stats page.
可以在此追蹤進度,或前往統計頁面。
Translate this page
翻譯呢個頁面
<translateVideoLink>Watch our guide on how to use Pontoon.</translateVideoLink>
<translateVideoLink>睇下 Pontoon 嘅使用教學。</translateVideoLink>
Translating the site
繙譯本站
Turkish
土耳其語
Turkmen
土庫曼語
Twi
契維語
Typically megabytes
通常幾 MB 大
Ubykh
尤比克語
Udmurt
烏德穆爾特語
Ukrainian
烏克蘭語
Understand contribution criteria
了解貢獻準則
Understand what to look for when listening to voice clips and help make your voice recordings richer too!
了解聽錄音嘅時候要注意啲乜,同埋令你嘅錄音片段更加豐富!
Upload aborted. Do you want to delete your recordings?
上載中斷咗,你想唔想刪除你嘅錄音?
Upload an image file
上載圖片
Urdu
烏爾都語
User Name
用户名
Users validate the accuracy of donated clips, checking that the speaker read the sentence correctly.
用户會核實錄音嘅準確度,睇下朗讀者有冇正確讀出句子。
Using Common Voice
使用 Common Voice
Uyghur
回鶻語(維吾爾語)
Uzbek
烏茲別克語
Validated Clips
已驗證片段
Validated Hours
已驗證時數
Validated Hrs
驗證時數
Validated Hr. Total
已驗證錄音(小時)
Validating donated clips is equally important to the Common Voice mission. Take a listen and help us create quality open source voice data.
驗證人哋錄低嘅錄音片段,對 Common Voice 嘅使命都非常重要。聽下其他人貢獻嘅錄音,就可以幫我哋建立高質素嘅開放源碼語音數據集。
Validation Progress
驗證進度
Validations
驗證
Variants are a specific form of a language - for example shared by those living in a geography or community. Sometimes these are called dialects.
方音/變體係一種語言嘅具體形式,佢由生活喺某一個地域或者社區嘅人共享,有時被稱之為方言。
Varying Pronunciations
發音差異
Venetian
威尼斯語
Version
數據庫版本
Version
版本
Vietnamese
越南語
View your progress against personal and project goals.
喺個人同項目嘅目標下底,檢視你嘅進度。
Visible
可見
Voice clips are entered into a submission queue that readies them for listening.
錄音片段會拎去排隊準備畀人去聽。
Voice Contribution
錄音捐聲
Voice Dataset, Ready for Download
語音數據集,已準備好下載
Voice is natural, voice is human. That’s why we’re excited about creating usable voice technology
for our machines. But to create voice systems, developers need an extremely large amount of voice
data.
語音係自然、有人性嘅。所以我哋希望建立一套畀機器用到嘅語音技術。但建立呢一個語音系統嘅過程,需要超多嘅語音數據。
Voice is natural, voice is human. That’s why we’re fascinated with creating usable voice
technology for our machines. But to create voice systems, an extremely large amount of voice
data is required.
語音係自然、有人性嘅。所以我哋非常希望為機器建造可用嘅語音技術,但建造語音系統需要非常大量嘅語音數據。
Voice recognition technology is revolutionizing the way we interact with machines, but the currently available systems are expensive and proprietary. Common Voice is part of Mozilla’s initiative to make voice recognition technologies better and more accessible for everyone. Common Voice is a massive global database of donated voices that lets anyone quickly and easily train voice-enabled apps in potentially every language.
語音識別技術喺度改變緊我哋同機器互動嘅方法,但目前可用嘅系統唔單止貴,而且係專有技術。Mozilla 提出 Common Voice 作為改進語音識別技術,並將之普及到大眾嘅計畫嘅一部分。Common Voice 都係一套收集咗世界各地人所捐贈語音嘅數據庫,希望有助所有人來又快又易噉訓練出可以識別任何語音功能嘅應用程式。
Voice recognition technology is revolutionizing the way we interact with machines, but the currently available systems are expensive and proprietary. Mozilla Common Voice is an initiative to make voice recognition technologies better and more accessible for everyone. Common Voice is a massive global database of donated voices that lets anyone quickly and easily train voice-enabled apps in potentially every language.
We're not only collecting voice samples in widely spoken languages but also in those with a smaller population of speakers. Publishing a diverse dataset of voices will empower developers, entrepreneurs, and communities to address this gap themselves.
語音識別技術喺度改變緊我哋同機器互動嘅方法,但目前可用嘅系統唔單止貴,而且係專有技術。Mozilla 提出 Common Voice 作為改進語音識別技術,並將之普及到大眾嘅計畫嘅一部分。Common Voice 都係一套收集咗世界各地人所捐贈語音嘅數據庫,希望有助所有人來又快又易噉訓練出可以識別任何語音功能嘅應用程式。
我哋唔止想收集被廣泛使用嘅語言,亦都想收集少有人講嘅語音樣本。一套多元語音資料集,用來幫助開發者、創業家,以及唔同社群縮窄科技上個鴻溝。
Voices Online Now
在線人數
Voice Validation
驗證錄音
Volume
聲量
Votic
瓦佳語
VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines.
VoxForge 嘅成立,係用來收集被抄寫嘅對話內容,畀自由與開放源碼嘅語音辨識引擎使用。
Want to help make Common Voice even better?
Great! Get in touch via email or <discourseLink>Discourse</discourseLink>
forums, submit site issues via <githubLink>GitHub</githubLink>, or join the
<matrixLink>Matrix</matrixLink> community chat.
想幫手令 Common Voice 計劃變得更好?
太好喇!請用電郵或者 <discourseLink>Discourse</discourseLink> 論壇聯絡我哋,或者喺 <githubLink>GitHub</githubLink> 上面提交網站問題,或者加入
<matrixLink>Matrix</matrixLink> 群組傾偈。
Want to stay in touch with Common Voice?
想及時跟進 Common Voice ?
Want updates when we release a new version of the Common Voice dataset? Subscribe to our newsletter.
想喺新 Common Voice 數據集推出時收到通知?請訂閱我哋嘅電子報。
Watch our video explainer to help
睇睇我哋嘅影片解說
We are building an open and publicly available dataset of voices that everyone can use to train speech-enabled applications.
我哋整緊套公開而人人用得嘅語音數據集,人人都可以用佢來訓練認得到聲嘅應用程式。
We are going out to get a coffee.
我哋唔係出去飲咖啡啊。
We are going out to get coffee.
我哋唔係出去飲咖啡。
We at Mozilla are building a community around voice technology. We would like to stay in touch with updates, new data sources and to hear more about how you're using this data.
Mozilla 正喺度構建緊一個圍繞聲音技術嘅社群。我哋好樂意持續更新各種數據及其來源,並了解閣下會點樣使用呢啲數據。
We at Mozilla are building a community around voice technology. We would like to stay in touch with updates, new data sources and to hear more about how you're using this data.
Mozilla 喺度構建緊一個以聲音技術為主題嘅社群。我哋好樂意持續發佈新消息同數據源,並了解閣下會點樣使用呢啲數據。
We believe that large and publicly available voice datasets foster innovation and healthy commercial competition in machine-learning based speech technology. This is a global effort and we invite everyone to participate. Our aim is to help speech technology be more inclusive, reflecting the diversity of voices from around the world.
我哋相信,大型而公開可用嘅語音數據集能夠促進語音機器學習科技嘅創新,以及健康嘅商業競爭。呢個係一個全球運動,我哋邀請任何人士參與。我哋嘅目標係令語音技術能夠更有包容性,反映出世界各地語音嘅多樣性。
We believe that large, publicly available voice datasets will foster innovation and healthy commercial competition in machine-learning based speech technology.
Common Voice’s multi-language dataset is already the largest publicly available voice dataset of its kind, but it’s not the only one.
Look to this page as a reference hub for other open source voice datasets and, as Common Voice continues to grow, a home for our release updates.
我哋相信如果有一組大規模、公開嘅語音數據集,會奠定以機器學習為基礎嘅語音技術上嘅創新,同埋健康嘅商業競爭。
Common Voice 嘅多語言數據集已經成為咗最大嘅開放語音數據集,但唔係唯一一套。
閣下可於該頁面揾到其他開放原始碼嘅語音數據集。隨住 Common Voice 持續成長,我哋都會喺呢處張貼更新資訊。
Website Localization
網站本地化
We calculate hours by estimating the average length of each recording, and then multiplying that number by the total number of recordings across all languages.
我哋靠估計每段錄音嘅平均長度來計算時數,然後乘以所有語言嘅總錄音數量。
We couldn’t find that page for you
我哋揾唔到你想去嘅頁面
We couldn’t get any audio clips for you to listen to.
Please try again later.
我哋冇晒錄音畀你聽嘞,遲啲再試啦。
Weekly
每週
Weekly Goal
每周目標
We launch the Common Voice site in this language.
我哋發佈呢種語言嘅 Common Voice 頁面。
Welcome { $company } staff!
歡迎 { $company } 嘅員工!
Welcome to Common Voice
歡迎使用 Common Voice
Welcome to the Common Voice Sentence Collector
歡迎嚟到 Common Voice 句子收集器
Welsh
威爾斯語
We promise to handle your information with care. Read more in our <privacyLink>Privacy Notice</privacyLink>.
我哋承諾會謹慎處理閣下嘅資料。詳見<privacyLink>私隱通告</privacyLink>。
We promise to handle your information with care. Read more in our <privacyLink>Privacy Notice</privacyLink>.
我哋保證會謹慎噉處理閣下嘅數據。敬請閲讀<privacyLink>私隱公告</privacyLink>。
We’re building an open source, multi-language dataset of voices that anyone can use to train speech-enabled applications.
我哋想建立一套開放原始碼、多重語言嘅語音數據集,令到任何人都可以用來開發同語音相關嘅應用。
We’re crowdsourcing an open-source dataset of voices. Donate your voice, validate the accuracy of other people’s clips, make the dataset better for everyone.
我哋整緊一個開源嘅聲音資料集。一齊幫手,貢獻你嘅聲音,核實錄音嘅準確度,令資料集變得更加好。
We’re experiencing unexpected downtime
我哋遇上意外嘅系統停機時段
We’re going out to get coffee.
我哋咪出去飲咖啡。
We release the dataset every 3 months.
我哋每 3 個月發佈一次數據集。
We're receiving a lot of traffic and are currently investigating the issues.
我哋有太多流量湧入嚟喇,而家仲調查緊發生咗啲咩事。
We’re sorry, your platform is not currently supported.
唔好意思,你嘅平台暫時尚未支援。
We’ve made some changes. Delta Segments just contain the most recent clips since the last release. <deltaLink>Read more about this work</deltaLink>.
我哋有少少調整,新增部分(Delta Segments)剩係包含上次發佈之後新加嘅錄音。<deltaLink>了解更多呢部分嘅調整</deltaLink>。
We've run out of clips to validate in this language...
呢個語言嘅錄音都已經驗證晒喇……
We've run out of sentences to record in this language...
呢個語言可以錄嘅句子已經錄晒啦⋯⋯
We want the Common Voice dataset to reflect the audio quality a speech-to-text engine will hear in the wild, so we’re looking for variety. In addition to a diverse community of speakers, a dataset with varying audio quality will teach the speech-to-text engine to handle various real-world situations, from background talking to car noise. As long as your voice clip is intelligible, it should be good enough for the dataset.
我哋希望 Common Voice 數據集能夠反映出語音轉文字引擎會喺現實環境入面聽到嘅聲音,所以我哋希望能夠收集各種環境下同埋唔同錄音品質嘅片段。除咗多元嘅講者群體,如果呢個數據集包含到各種語音品質嘅片段,就可以令語音轉文字引擎處理到各種現實環境下嘅狀況,例如背景中有人喺度講嘢,或者有車輛嘅噪音。只要閣下嘅片段可以足夠俾人聽得明,即可收錄到數據集入面。
We want the machine learning algorithms to be able to handle a variety of background noise, and even relatively loud noises can be accepted provided that they don’t prevent you from hearing the entirety of the text. Quiet background music is OK; music loud enough to prevent you from hearing each and every word is not.
我哋想啲機械學習演算法可以處理到唔同嘅背景雜音,甚至係有少少大聲嘅噪音都可以接受。前提係啲聲唔會影響到你聽清楚錄音嘅文字。靜靜地嘅背景音樂都可以。但係音樂聲大到聽唔清講咩嘅話就唔得。
We will be in touch with more information about how to add your language to Common Voice very soon.
我哋會盡快同你聯絡,同你提供更多有關點樣增添語言到 Common Voice 入邊嘅資訊。
We will be in touch with more information as it becomes available.
當此項目可供使用之時,我哋會向閣下提供更多資訊。
We will not make your email public.
我哋唔會公開你嘅電郵地址。
We will review your request to remove your voice recordings from the dataset. If your request is approved, we will contact those who have downloaded the dataset and request they remove your voice recordings as well.
我哋將會審核你從數據集中刪除錄音嘅請求。如果你嘅請求獲得批准,我哋將會聯絡已下載數據集嘅使用者,並請佢哋都刪除你嘅錄音。
What does it mean that I can’t “determine the identity” of speakers in the Common Voice dataset?
佢話我無法喺Common Voice 數據集入邊“識別講者嘅身份”係咩解究呢?
What is a language on Common Voice?
Common Voice 度嘅一門語言係指乜?
What is Common Voice?
Common Voice 係乜嘢?
What is Common Voice?
Common Voice 係乜嘢?
What issues are you experiencing with this sentence?
呢句有咩問題?
What kind of goal do you want to build?
閣下想要建立點樣嘅目標?
What level of audio quality is required for a voice clip to be used in the dataset?
錄音品質要到咩等級,先用得喺數據集入面?
What’s inside the Common Voice dataset?
Common Voice 數據庫入面有啲咩?
What's Public?
邊啲資料會公開?
What’s the difference between Common Voice and Deep Speech?
Common Voice 同 Deep Speech 有咩分別?
When a user rejects a voice clip it returns to the Queue. If rejected a second time, the voice clip is moved to the Clip Graveyard.
如果個用户投咗「錯」票,錄音片段就會返到隊列再畀其他人驗證。如果第二次都係畀人打成錯嘅,個片段就會進入垃圾桶。
When listening, check very carefully that what has been recorded is exactly what has been written; reject if there are even minor errors. <br />Very common mistakes include:
聽緊錄音嘅時候,認真睇下啲字同錄音係咪完全一致;有少少錯就唔可以批准通過。<br />常見嘅錯誤包括:
When will you release Common Voice data in other languages?
Common Voice 幾時會發放其他語言嘅數據?
When you request your recordings, we compile them into one or multiple ZIP files. Here are your past requests:
喺你提取錄音嗰陣,我哋會將啲音檔整合成一個或者幾個 zip 檔。下面係你之前嘅提取記錄:
Where does the source text come from?
呢段源文本係邊度來嘅?
Which variant of { $language } do you speak?
你講緊嘅係 { $language } 嘅邊種方音?
Why ?
點解?
Why a profile?
點解要建立個人檔案?
Why Common Voice?
點解要做 Common Voice?
Why does this matter?
點解呢個好重要?
Why don’t you ask people to read from books or Wikipedia articles in different languages?
不如嗌多啲人來用各種語言朗讀啲書或者維基百科入面嘅文章?
Why do you need so many different speakers per language?
點解每種語言需要咁多貢獻者呢?
Why is 10,000 validated hours the per language goal for capturing audio?
點解每個語言以收集10,000個驗證時數為目標?
Why is Common Voice part of the Mozilla mission?
點解Common Voice 係 Mozilla 嘅重點任務之一?
Why is it important?
點解呢個好重要?
Why is my language not included yet?
點解仲未有我嘅語言嘅?
Why should I sign up for an account?
我點解要註冊帳戶呢?
Would you like to request your voice recordings be deleted too, or do you prefer to keep them in the Common Voice dataset?
你想刪除埋你嘅所有錄音,定係想將錄音保留喺 Common Voice 嘅數據集中?
y
y
Y
Y
Yes
係
Yes
係
Yes, send me emails. I’d like to stay informed about the Common Voice Project.
好,傳送電郵畀我。我想及時獲取 Common Voice 嘅最新消息。
Yes, send me emails. I'd like to stay informed about the progress of this language on Common Voice.
係,請寄電郵畀我。我希望接收呢隻語言喺 Common Voice 嘅進度通知。
Yes, we especially want your voice! Part of the aim of Common Voice is to gather as many different accents as possible so that voice recognition services work equally well for everyone. This means donations from non-native speakers are particularly important.
當然,我哋特別想要閣下把聲音!Common Voice其中一個目標係盡可能收集各種口音,令到語音識別服務能夠適用到每個人。意味住非母語人士嘅貢獻尤其重要。
Yiddish
意第緒語
Yoruba
約魯巴語
You
閣下
You are about to initiate a download of <size>{ $size }GB</size>, proceed?
閣下將會下載 <size>{ $size } GB</size> 嘅檔案,要下載嗎?
You are prepared to initiate a download of <b>{ $size }</b>
你準備要下載<b>{ $size }</b>嘢
You can add sentences on the <writePageLink>Write page</writePageLink> or review sentences on the <reviewPageLink>Review page</reviewPageLink>.
<strong>語句收集工具</strong>係一套用嚟收集同驗證公有領域句子嘅工具。首先你要<scAccountLink>註冊帳户</scAccountLink>,跟住喺<strong>個人檔案</strong>加入你嘅語言。噉你就可以<strong>加入</strong>句子或者<strong>審核</strong>之前加咗嘅句子。
You can also copy and paste the direct URLs into your favorite download manager. They will expire in 12 hours, but you can come back to this page to generate new ones any time.
閣下亦可以喺自己想用嘅下載工具中貼上 direct URLs 以下載檔案。呢啲 URLs 嘅有效期為12個鐘。閣下可隨時再次訪問本頁面以獲取新嘅 URLs。
You can also use Keyboard Shortcuts: Y to Approve, N to Reject, S to Skip
你都可以用鍵盤快捷掣:撳 Y 通過、N 否決、S 跳過
You can choose to make your username public or anonymous.
你可以選擇公開你嘅賬户名稱,或者保持匿名。
You can help build a diverse, open-source dataset by creating a Common Voice profile and contributing your voice.
你可以透過創立一個 Common Voice 帳户並貢獻錄音,來協助我哋建立一個多元、開放源碼嘅數據集。
You can meet others in the Mozilla language communities by joining <discourseLink>Discourse</discourseLink> for topical conversations, or <matrixLink>Matrix</matrixLink> for quick advice.
想認識Mozilla語言社群入面嘅其他人呢,你可以加入 <discourseLink>Discourse</discourseLink> 嚟討論個別主題,或者加入 <matrixLink>Matrix</matrixLink> 去攞啲建議。
You cannot request your recordings while another request is already in progress.
已有處理緊嘅請求,無法再提取錄音。
You can request a new takeout of your recordings every { $days } days.
你可以每 { $days } 日請求一次新嘅錄音數據。
You must allow microphone access.
你必須容許咪高峰存取權。
Your accent is the way you pronounce words. It can be shaped by where you have lived, which other languages you speak and lots of other factors. You can share any information you feel is relevant here.
口音係指你對一個詞彙發音嘅方式。口音通常受成長地、講開嘅其它語言以及其它因素影響而成。你可以喺呢度寫低有關詳情。
Your anonymous voice recordings will remain in the Common Voice dataset. Once you delete your profile you will no longer be able to submit a request to remove your recordings from the dataset.
閣下嘅錄音會以匿名嘅形式保留喺 Common Voice 嘅數據集。當閣下刪除咗個人檔案後,就唔能夠再從數據集中刪除錄音。
Your daily goal has been created
成功訂立咗閣下嘅每日目標
Your download has started.
你嘅下載已經開始咗
You’re contributing to a target segment
你現正貢獻緊錄音畀一個目標細分群體
You’re contributing to our first target segment
閣下正為我哋第一條目標細分群體貢獻錄音
You're currently set to <bold>NOT</bold> receive emails such as goal reminders, my progress updates and newsletters about Common Voice
你目前選擇 <bold>唔接收</bold> 包括目標提醒、進度更新、及 Common Voice 電子報嘅電郵。
You're currently set to receive emails such as goal reminders, my progress updates and newsletters about Common Voice
你目前選擇接收包括目標提醒、進度更新、及 Common Voice 電子報嘅電郵。
Your email address
閣下電郵地址
Your files are being assembled. Please check again later.
你嘅文件喺度整理緊,請稍後再試。
Your Languages
你嘅語言
Your username and email will not be associated with the published data.
你嘅用户名同埋電郵地址,唔會連結落去公開發佈嘅數據。
Your weekly goal has been created
成功訂立咗閣下嘅每週目標
You've helped Common Voice reach <goalPercentage></goalPercentage> of our daily { $goalValue } recording goal!
閣下已幫助 Common Voice 完成每日 { $goalValue } 錄音目標嘅<goalPercentage></goalPercentage>!
You've helped Common Voice reach <goalPercentage></goalPercentage> of our daily { $goalValue } validation goal!
閣下已幫助 Common Voice 達到我哋每日 { $goalValue } 驗證目標嘅 <goalPercentage></goalPercentage>!
You've successfully signed up for contributing to { $language }. Thank you.
閣下經已成功登記為{ $language }貢獻。多謝。
Zip #{ $offset } of { $total }
{ $total } 個 zip 檔中嘅第 { $offset } 個
Zulu
祖魯語
abnormality
異常
account recovery key
帳户復原密匙
add-on
附加元件
aggregate
整合
aggregate
整體
All Hands
All Hands
alternate text
替代文字
appear
出現
appearance
外觀
attack
攻擊
attacker
攻擊者
Attack Site
有害網站
attendee
參加者
authenticate
驗證
authenticated
已驗證
autoplay
自動播放
back out
回退
backup authentication code
備份驗證編碼
black box
黑盒
bookmark
加書籤
bookmark
書籤
boolean
布林值
boot
開機
breakpoint
中斷點
Bug report
錯誤報告
certificate authority
數碼證書認證機構
channel
頻道
cipher
密碼
clipboard
剪貼簿
clockwise
順時針
colorway
配色
compact
壓縮
compact - recommend deprioritize
壓縮 - 推薦降低優先級
context
環境
contribute
貢獻出
contribute
貢獻
corrupt
損毀
corrupt
損毀
corrupted
已損毀
counterclockwise
逆時針
cryptominer
加密貨幣挖礦程式
cryptomining
加密貨幣挖礦
debug
偵錯
deceptive site
詐騙網站
decode
解碼
decryption
解密
disinformation
假消息
distrust
唔信任
eavesdropping
竊聽
enable
啟用
encode
編碼
encrypt
加密
encryption
加密
end user
終端用户
Enhanced Tracking Protection
加強追蹤保護
executable file
可執行檔案
extension
擴充
external
外置
Fellow
夥伴
fingerprinter
數碼指紋追蹤程式
fingerprinting
指紋跟蹤
gear
紀念品
grassroots
草根
home page
首頁
idle
閒置
import
匯入
innovations
創新
insecure
唔安全
installation
安裝
L10n
本地化
Legacy extension
遺留擴展
MDN
MDN
misinformation
錯誤資訊
native
原生
open source
開源
override
覆蓋
participation
參與
Patch
補丁
pop-up
快顯
preference
喜好設定
release
發佈
report
報告
report
報告
revert
還原
search suggestion
搜尋建議
security keys
安全匙
sensitive
敏感
sidebar
側欄
sign
簽署
signer
簽署人
studies
研究
track
跟蹤
trackers
追蹤器
Tracking Content
追蹤性內容
Turbo Mode
渦輪模式
unencrypted
未加密
unresponsive
冇反應
unsafe
唔安全
validate
驗證
validity
有效性
version
版本
Web Authentication
網絡身份驗證