Here is a step-by-step guide to publishing your language data on the TAUS Data Marketplace.

In November 2020, TAUS launched the Data Marketplace, an open data market where translators, language service providers, technology developers and enterprises come together to sell and buy language data for machine translation and other machine learning applications.

We’ve realized that there is not necessarily a lack of language data out there, but rather that the data is locked in silos or sits unused. Have you created or collected language data over time that’s been waiting to be put to use? Now you can easily monetize it, and here is how.

Step 1: Confirm that You Have Rights to Sell Data

Whether you are the customer (who paid for the translations), the LSP (the agency in the middle) or the translator (who produced the translation), be aware of the rights that you have when it comes to sharing and selling the translation memories or any other type of language data. We lay this out for you in the Data Marketplace FAQs. In general, the problems around ownership of translation data exist mostly in a theoretical setting. There are no records of legal cases around the ‘unlawful’ use of translation memories. If a problem arises, a ‘notice and take-down’ will settle it. 

For more insights and general guidance, read our article on Language Data Ownership and Copyright in Translation or download a free copy of the Who Owns My Language Data White Paper. If you are still in doubt, you may want to check in with your customers before publishing the data in the Data Marketplace.

Step 2: Make Your Data MT-Ready

The Data Marketplace prepares your data for sale by cleaning it. Navigate to the Sell Data section of the Data Marketplace and upload a TMX file (more file formats will be available soon), either by dragging and dropping it or by clicking the browse button and selecting a file from your computer. This starts the data analysis process.

As a result, you’ll see the counts and the quality of the uploaded data, broken down into three categories: High-quality (unique segments that passed the filtering), Replica (segments that already exist in the Data Marketplace), and Low-quality (segments that did not pass the filtering). You’ll also be able to see a sample of the segments that were marked as low-quality and won’t be included in the published file.
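If you want a rough idea of what is in a TMX file before uploading it, a quick local pre-check can help. The sketch below is only an illustration in Python: the function name `precheck_tmx` and the categories "usable", "duplicate" and "incomplete" are assumptions made for this example, and they do not reproduce the Data Marketplace's own filtering or its High-quality, Replica and Low-quality counts. It also assumes a TMX file that marks languages with the standard `xml:lang` attribute.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def precheck_tmx(tmx_text):
    """Count usable, duplicate and incomplete translation units in a TMX string.

    Hypothetical local check; not the Data Marketplace's actual analysis.
    """
    root = ET.fromstring(tmx_text)
    counts = Counter()
    seen = set()
    for tu in root.iter("tu"):
        # Collect one (language, text) pair per language variant in the unit.
        pairs = []
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            text = "".join(seg.itertext()).strip() if seg is not None else ""
            pairs.append((tuv.get(XML_LANG, ""), text))
        key = tuple(pairs)
        if len(pairs) < 2 or any(not text for _, text in pairs):
            counts["incomplete"] += 1  # missing a variant or an empty segment
        elif key in seen:
            counts["duplicate"] += 1   # exact repeat of an earlier unit
        else:
            seen.add(key)
            counts["usable"] += 1
    return counts


SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Thanks</seg></tuv><tuv xml:lang="fr"><seg></seg></tuv></tu>
</body></tmx>"""

print(precheck_tmx(SAMPLE))
```

A check like this only catches exact repeats within your own file and empty segments. It cannot tell you which segments already exist in the Data Marketplace, so the counts you see after uploading may differ.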

Step 3: Define Metadata and Price

If you’re happy with the result of the analysis, you can move on to the Publishing step, where you add metadata such as domain and content type to your dataset, and a description to make your dataset stand out from the rest.


Step 4: Create an Account and Publish

In order to publish a dataset, you need to have an account with the Data Marketplace. That way you can edit your published datasets and we can make sure that you receive your payment when someone purchases the data.

Created an account? The very last thing you need to do is click on the Publish button.

Your dataset will now appear in the Data Marketplace, both under the Data Sellers section and when a potential buyer selects that language pair under the Explore Data tab.

The Data Marketplace has set out to become a vibrant and secure global language data trading platform. You are in the right place if you want to share your language datasets with the wider community of AI and ML service providers. The Data Marketplace team is on call to support you at every step, from listing your datasets to getting paid. Have any questions or concerns? See the complete publishing guidelines for sellers, or reach out to us at any time.

