About Tatoeba

Tatoeba is more than just an idea. It's already something tangible: http://tatoeba.org.
The first lines of code were actually written four years ago, by someone who clearly had no idea what she was doing, but knew that she couldn't be the only one thinking "This would be so useful, why hasn't anyone tried to do this?"
The project initially intended to build a new kind of tool: a multilingual dictionary of sentences. A tool where you can search for certain words, and it returns example sentences containing those words, with their translations in the desired languages. The name "Tatoeba" came from this concept, because "tatoeba" means "for example" in Japanese.
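The concept of a dictionary of sentences can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Tatoeba's actual code or data model: the names `Sentence`, `SentenceDictionary`, `add`, `link` and `search` are hypothetical, the language codes are illustrative, and the search is a naive substring match rather than a real search engine.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sentence:
    """One sentence in one language (hypothetical model, not Tatoeba's schema)."""
    id: int
    lang: str   # illustrative language code, e.g. "eng", "jpn", "fra"
    text: str
    translation_ids: set[int] = field(default_factory=set)


class SentenceDictionary:
    """Stores sentences and the translation links between them."""

    def __init__(self) -> None:
        self.sentences: dict[int, Sentence] = {}

    def add(self, id: int, lang: str, text: str) -> Sentence:
        sentence = Sentence(id, lang, text)
        self.sentences[id] = sentence
        return sentence

    def link(self, a: int, b: int) -> None:
        # A translation link works in both directions.
        self.sentences[a].translation_ids.add(b)
        self.sentences[b].translation_ids.add(a)

    def search(self, word: str, source: str, target: str) -> list[tuple[str, list[str]]]:
        """Return (sentence, translations) pairs for sentences in `source`
        that contain `word`, with their direct translations in `target`."""
        results = []
        for sentence in sorted(self.sentences.values(), key=lambda s: s.id):
            # Naive case-insensitive substring match (an assumption made for
            # brevity; it also works for languages written without spaces).
            if sentence.lang != source or word.lower() not in sentence.text.lower():
                continue
            translations = [
                self.sentences[i].text
                for i in sorted(sentence.translation_ids)
                if self.sentences[i].lang == target
            ]
            results.append((sentence.text, translations))
        return results


# Small made-up data set, for illustration only.
demo = SentenceDictionary()
demo.add(1, "eng", "For example, this is a sentence.")
demo.add(2, "jpn", "例えば、これは文です。")
demo.add(3, "fra", "Par exemple, ceci est une phrase.")
demo.link(1, 2)
demo.link(1, 3)

for text, translations in demo.search("example", source="eng", target="jpn"):
    print(text, "->", translations)
```

In this sketch only direct links count as translations, so searching from French to Japanese finds the French sentence but no Japanese translation until someone adds that link. That gap is exactly what contributors would fill.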
The dream behind Tatoeba was that you could search for any word, and it would always return a result. Always. If not today, then tomorrow. And most importantly, it would return a result no matter what language you are searching from and what language you are searching to. This is obviously the kind of dream that can only be achieved through the contributions of thousands and thousands of people.
It was around this idea that the project started, but as it evolved, one thing became clear: it wasn't simply about building a new language tool; it was actually about building a whole ecosystem.

How can Tatoeba make the Web better?

The Web is filled with invaluable resources for language learning but...
    • They're mostly limited to English speakers and learners. It's very difficult (or even impossible) to find good language material for combinations of languages that don't involve English.
    • They're not structured. You can find a lot of information on language forums and blogs, but they don't provide data that is directly "reusable". You would have to spend a lot of time filtering out the pieces you want and editing them.
    • They're not open. Building a language website/app (or even a textbook) would require you to either create your own content more or less from scratch, or pay for licensed content.
What Tatoeba will bring for sure is this:
    • Language learning material from any language, to any language. In the process we may see the emergence of language resources in endangered languages, and perhaps even save these languages.
    • A whole new generation of language learning tools and applications. Developers will eventually be able to stop wasting time on building content. Instead, they can let their creativity go wild on what can be done with all the nice, free and organized language data that will be available on the Web.
    • And, why not, a scientific breakthrough in linguistics-related fields.
    • Eventually, we're expecting to see changes that even we didn't expect :) But it all starts with changing people's mindset and getting them involved in opening up the data.

Immediate goals & roadmap

Strong community
    • We obviously want more people to use and contribute to Tatoeba, but not just that. We also want more and more developers to start re-using our data. And we want everyone to understand and spread the values that the project is based on.
    • We are planning to...
      • Contact "high profile" language learning bloggers and webmasters.
      • Contact developers who are building language tools.
      • Provide an API.
      • Organize events.
      • Reach out to communities of translators.
      • Get language teachers to use Tatoeba with their students.
      • Find more people to code with us.
Scalable system
    • We want Tatoeba to be able to handle thousands of visitors a day, thousands of new contributions a day, and support hundreds of languages.
    • We are planning to...
      • Review the interface.
      • Review the architecture.
      • Optimize, a lot.
Quality of the data
    • This is always a big problem with user-generated content: you never really know how reliable the content is. We want to have a system where mistakes can be easily found and quickly corrected. We want to be able to tell people that they can safely use our data for learning purposes, without being afraid of learning something wrong.
    • We are planning to...
      • Code better tools to filter the flow of activity.
      • Integrate some game mechanics to give people an incentive to contribute and report mistakes, and to make contributing more enjoyable.

Information about the team

Several people have participated in coding Tatoeba, but there are currently two main active developers.
Trang
    • French student, graduating in a few months, majoring in Computer Science and Engineering.
    • Started Tatoeba back in 2006 because she wanted a sentence database for French-Japanese.
    • Wants to start a revolution in education someday.
    • Loves languages, especially Japanese.
Allan
    • French student, also in Computer Science (3rd year).
    • Joined Tatoeba in June 2009.
    • Linux geek, open source and free software activist.
    • Loves languages, especially Shanghainese.