Alsatian (gsw) · Common Voice

Contribution Guidelines
Understand how to contribute and validate sentences and audio clips to the Common Voice dataset
Voice Collection
Sentence Collection
Varying Pronunciations
Misreadings
Offensive Content
Background Noise
Background Voices
Volume
Reader Effects
Just Unsure?
Example
We welcome different accents! Be very cautious before rejecting a clip on the grounds that you think the reader has mispronounced a word, has put the stress in the wrong place, or has ignored punctuation. There is a wide variety of pronunciations in use around the world, some of which you may not have heard in your local community. Please provide a generous margin of appreciation for those who may speak differently from you.
On the other hand, if you think that the reader has never come across the word before, and is making an incorrect guess at the pronunciation, please reject. If you are unsure, use the skip button.
The route was unclear.
[Canadian English might make "route" sound like "rowt"]
[British English might make it sound like "root"]
Sentences are vetted through a community-moderation process, but this process is not perfect. If you see or hear a sentence that offends or upsets you, for example because it violates our <participationGuidelines>community participation guidelines</participationGuidelines>, please use the flag button in the UI. You can also reach out to us at <emailFragment>commonvoice@mozilla.com</emailFragment>.
Reading all the words on the page correctly does matter. When listening, check very carefully that what has been recorded is exactly what has been written; reject the clip if the reader has added, contracted or missed words.
Very common mistakes include:
Missing 'A' or 'The' at the beginning of the recording.
Missing an 's' at the end of a word.
Reading contractions that aren't actually there, such as "We're" instead of "We are", or vice versa.
Missing the end of the last word by cutting off the recording too quickly.
Taking several attempts to read a word.
We are going out to get coffee.
We're going out to get coffee. [Should be “We are”]
We are going out to get a coffee. [No ‘a’ in the original text]
The bumblebee sped by. [Mismatched content]
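The comparison described above can be sketched in code. This is only an illustration, not part of the Common Voice tooling: it assumes you already have a text transcript of what was heard (validators normally judge this by ear), the function names `tokenize` and `word_differences` are hypothetical, and the tokenizer is an English-only simplification.

```python
from __future__ import annotations

import difflib
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words (ASCII letters only).

    Apostrophes inside a word are kept, so "We're" stays one token.
    """
    normalised = text.lower().replace("’", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)*", normalised)


def word_differences(written: str, heard: str) -> list[tuple[str, list[str], list[str]]]:
    """Return (kind, written_words, heard_words) for every mismatch.

    kind is "replace", "delete" (words missing from the recording)
    or "insert" (words added by the reader).
    """
    a, b = tokenize(written), tokenize(heard)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


written = "We are going out to get coffee."
print(word_differences(written, "We're going out to get coffee."))
print(word_differences(written, "We are going out to get a coffee."))
```

An empty result means the word sequences match; any other result corresponds to a clip that should be rejected under the rules above.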
You need to be able to hear every word of the recording. We want machine learning algorithms to be able to handle a variety of background noise, and even relatively loud noises or quiet background music can be accepted provided that they don’t prevent you from hearing the entirety of the text. If crackles or ‘breaking up’ prevent you from hearing the text, reject the clip.
The giant dinosaurs of the Triassic.
[Sneeze] The giant dinosaurs of the [cough] Triassic.
The giant dino [cough] the Triassic. [interrupted by background noise]
[Crackle] giant dinosaurs of [crackle] -riassic. [Part of the text can’t be heard]
A little background noise is okay, but if you can hear another person speaking distinct words, the clip should be rejected. Typically this happens where the TV has been left on, or where there is a conversation going on nearby.
The giant dinosaurs of the Triassic. [read by one voice]
Are you coming? [called by another]
There will be natural variations in volume between readers. Reject only if the volume is so high that the recording breaks up, or (more commonly) if it is so low that you can’t hear what is being said without reference to the written text.
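As a rough numeric companion to the volume rule, the sketch below flags clips that may be clipping or very quiet. It is an assumption-laden illustration, not how Common Voice validates clips: the decision in these guidelines is made by listening. The function name `volume_report` and both thresholds (`clip_level`, `quiet_dbfs`) are hypothetical, and the samples are assumed to be already decoded and normalised to the range -1.0 to 1.0.

```python
from __future__ import annotations

import math


def volume_report(
    samples: list[float],
    clip_level: float = 0.999,   # hypothetical: magnitude treated as clipped
    quiet_dbfs: float = -40.0,   # hypothetical: RMS level treated as very quiet
) -> dict[str, float | bool]:
    """Summarise loudness for samples normalised to [-1.0, 1.0]."""
    if not samples:
        raise ValueError("samples must not be empty")

    # Root-mean-square level, converted to decibels relative to full scale.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    rms_dbfs = 20 * math.log10(rms) if rms > 0 else float("-inf")

    # Share of samples at or beyond the clipping level.
    clipped = sum(1 for s in samples if abs(s) >= clip_level) / len(samples)

    return {
        "rms_dbfs": rms_dbfs,
        "clipped_fraction": clipped,
        "maybe_too_quiet": rms_dbfs < quiet_dbfs,
        "maybe_clipping": clipped > 0.01,
    }


print(volume_report([0.5, -0.5] * 100))
```

A flagged clip is only a candidate for closer listening; a quiet recording that can still be understood without reading the text should be accepted.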
Most recordings are of people talking in their natural voice. You can accept the occasional non-standard recording that is shouted, whispered, or obviously delivered in a ‘dramatic’ voice. Please reject sung recordings and those using a computer-synthesized voice.
If you come across something that these guidelines don’t cover, please vote according to your best judgement. If you really can’t decide, use the skip button and go on to the next recording.
Still have questions?
Contact the Common Voice team
Public Domain
Citing Sentences
Common Voice is Mozilla's initiative to help teach machines how real people speak.
Su Common nakeuh Inisiatif Mozilla jibantu peurunoe meusen kiban cara ureueng geupeugah haba.
Mozilla Common Voice is an initiative to help teach machines how real people speak.
Speak up, contribute here!
Peugah haba, tuléh hinoe!
Voice is natural, voice is human. That’s why we’re fascinated with creating usable voice technology for our machines. But to create voice systems, an extremely large amount of voice data is required.
Most of the data used by large companies isn’t available to the majority of people. We think that stifles innovation. So we’ve launched Project Common Voice, a project to help make voice recognition open to everyone.
Now you can donate your voice to help us build an open-source voice database that anyone can use to make innovative apps for devices and the web. Read a sentence to help machines learn how real people speak. Check the work of other contributors to improve the quality. It’s that simple!
Voice is natural, voice is human. That’s why we’re excited about creating usable voice technology for our machines. But to create voice systems, developers need an extremely large amount of voice data.
Most of the data used by large companies isn’t available to the majority of people. We think that stifles innovation. So we’ve launched Common Voice, a project to help make voice recognition open and accessible to everyone.
Read More
Beuet Le Lom
Help us validate sentences!
Neutulông kamoe peusahèh kalimat!
Press play, listen & tell us: did they accurately speak the sentence below?
Looks like there aren't any clips to listen to in this language. Help us fill the queue by recording some now.
Press { shortcut-play-toggle } to toggle play mode
Recording voice clips is an integral part of building our open dataset; some would say it's the fun part too.
Clips recorded
Klip teureukam
Validating donated clips is equally important to the Common Voice mission. Take a listen and help us create quality open source voice data.
Clips validated
Hours Recorded
Hours Validated
Voices Online Now
Today's Progress
Help us get to { $goal }
Have you read our Terms?
Ready to donate your voice?
All
Today
Uroë Nyoë
{ $count }wk
{ $count }mo
{ $count }y
Help us build a high quality, publicly open dataset
Sign up for an account
sign up for email updates
Sign up for Common Voice newsletters, goal reminders and progress updates
Benefits
Make your submitted data as rich as possible by providing some anonymous demographic data. We de-identify all demographic data before making it public.
Profile information makes the audio data more useful for training accurate speech recognition.
Keep track of your progress and metrics across multiple languages.
See how your progress compares to other contributors all over the world.
View your progress against personal and project goals.
Optionally join our email list for updates and new information about the project.
What's Public?
We will not make your email public.
The number of recordings and which languages you contribute to will be public.
You can choose to make your username public or anonymous.
Optionally submitted demographic data (e.g. age, gender, language, and accent) will never be made public on your profile, and will not be linked to your account in the dataset. Individual audio clips will be associated with demographic data for the purpose of more accurate analysis - for example, a researcher might want to target a training model to a specific demographic segment.
Your username and email will not be associated with the published data.
Welcome { $company } staff!
You can help build a diverse, open-source dataset by creating a Common Voice profile and contributing your voice.
Log In / Sign Up with { $company } email
Having a profile is not required to contribute, though it is helpful; see why below.
Let's Get Started
Welcome to Common Voice
Interested in learning more and contributing to the project?
Common Voice is the world’s largest publicly available, multi-language voice dataset.
Thanks to contributions from over 259k people in over 50 languages, this data is being used to train speech-enabled applications to better respond to the human voice.
Next
Back
Browse Languages
2019 End-of-Year Release
Voice Dataset, Ready for Download
Account
Having an account is not required to contribute, though it is helpful.
To the right we outline the benefits and clarify what information we make public. Use the links below to get started with a Common Voice account on your own device.
Enter email to send a sign up link
Send sign up link
Ready to add your voice or lend your ear?
Now that you know a little bit more about Common Voice, why not try it out? Click on the microphone icon to start reading sentences aloud. <br/><br/>If you prefer to review other people's voice contributions, click on the play icon. You’ll help confirm that recordings match the sentences written on screen.
Ready to contribute?
Personal dashboards keep you up-to-date with individual and community progress.
For every voice clip donated, and every audio clip validated, your account dashboards are updated to reflect your latest progress in each language you contribute to. Yes, you can contribute to more than one!<br/><br/> Use dashboards to track your stats, see how you're doing alongside others in the community, and set daily or weekly contribution goals.

Contribution Guidelines

COMMENT GUIDELINES PAGE

CONTEXT guidelines-header•web/locales/common-voice/en/pages/guidelines.ftl•Common Voice

CREATED November 8, 2024 01:11:41 PM
