Request For Proposal: Machine translation evaluation in the context of scholarly communication


PROPOSALS DUE BY: 23 December 2022

Company Background

OPERAS is the Research Infrastructure supporting open scholarly communication in the social sciences and humanities (SSH) in the European Research Area. Its mission is to coordinate and federate resources in Europe to efficiently address the scholarly communication needs of European researchers in the field of SSH.

Project Overview

In 2020, the French Ministry of Higher Education and Research (MESR) launched the Translations and Open Science project with the aim to explore the opportunities offered by translation technologies to foster multilingualism in scholarly communication and thus help to remove language barriers according to Open Science principles.

During the initial phase of the project (2020), a first working group, made up of experts in natural language processing and translation, published a report suggesting recommendations and avenues for experimentation with a view to establishing a scientific translation service combining relevant technologies, resources and human skills.

Once developed, the scientific translation service is intended to:

  • address the needs of different users, including researchers (authors and readers), readers outside the academic community, publishers of scientific texts, dissemination platforms or open archives;
  • combine specialised language technologies and human skills, in particular adapted machine translation engines and in-domain language resources to support the translation process;
  • be founded on the principles of open science, hence based on open-source software as well as shareable resources, and used to produce open access translations.

Project Goals

In order to follow up on recommendations and lay the foundation of the translation service, the OPERAS Research Infrastructure was commissioned by the MESR to coordinate a series of preparatory studies in the following areas:

  1. Mapping and collection of scientific bilingual corpora: identifying and defining the conditions for collecting and preparing corpora of bilingual scientific texts, which will serve as training datasets for specialised translation engines, as source data for terminology extraction, and as a basis for translation memory creation.
  2. Use case study for a technology-based scientific translation service: drafting an overview of the current translation practices in scholarly communication and defining the use cases of a technology-based scientific translation service (associated features, expected quality, editorial and technical workflows, and involved human experts).
  3. Machine translation evaluation in the context of scholarly communication: evaluating a set of translation engines to translate specialised texts.
  4. Roadmap and budget projections: making budget projections to anticipate the costs to develop and run the service.

The four preparatory studies are planned over a one-year period starting in September 2022.

The present call for tenders covers only study (3), Machine translation evaluation in the context of scholarly communication.

One last call will be released in the coming months for the following study: (4) Roadmap and budget projections. 

The (1) Mapping and collection of corpora and (2) Use case study calls are now closed.

Scope of Work

In the neural age, machine translation has dramatically improved both in terms of performance and availability. In the light of such developments, this technology is often seen as the solution to break language barriers and promote multilingualism in many fields, from institutional communication to cultural or entertainment content. However, while experience to date certainly highlights the benefits of this technology, it also calls for some caution: the use of machine translation should be considered critically and requires preliminary testing.

When it comes to the choice of a machine translation tool, one of the crucial steps of the selection process is evaluation. As the number of available solutions increases, such evaluation provides information on the performance and the relevance of a given machine translation engine in a specific domain. In order to avoid any bias, evaluation must be based on objective and accurate criteria – such as quality expectations, error classification and severity – defined according to the context in which machine translation is intended to be used.

Therefore, OPERAS welcomes proposals from public and private entities to perform machine translation evaluation in the context of scholarly communication. The evaluation will cover a maximum of three translation engines in the English-French language pair and in three scientific disciplines, one from each of the following macro-domains: 1) Life Sciences (EN → FR); 2) Physical Sciences and Engineering (EN → FR); 3) Social Sciences and Humanities (FR → EN). The final selection of the three scientific disciplines to be evaluated will be notified by early December 2022 at the latest. For more information about the disciplines considered, please refer to the ERC panel structure available here.

For each discipline, the selected contractor will be provided with specialised datasets (bilingual parallel corpora and terminology databases) in order to fine-tune the machine translation engines prior to evaluation. Bidders are therefore expected to suggest in their proposal three machine translation engines to be fine-tuned and evaluated as part of this task. The selection committee will attach the greatest importance to software openness: open-source technologies should be given preference, and each proposal should include at least one open-source machine translation engine to be evaluated. The final choice of the engines to be included in the task will be made in collaboration with the steering and scientific committees of the project.

The selected contractor will also be asked to suggest relevant scientific publications that can be used for the purpose of the evaluation. These should include full texts of scholarly publications as well as their metadata – in particular titles, abstracts and keywords. The final selection of the texts to be used for the evaluation will be made in collaboration with the steering and scientific committees of the project. 

In particular, proposals will be considered according to the following specifications:

  • The suggested fine-tuning and evaluation methodology should be replicable at different stages of the project, including production.
  • Evaluation criteria should be clearly defined and relevant to the scholarly communication context. 
  • The evaluation should provide information about the usability of the raw machine translation output for different purposes, from writing assistance to post-editing or gisting. To this aim, at least the following personas* should be taken into account in the evaluation:
    • A researcher using machine translation as a support to write a paper in a foreign language or to translate a paper into a foreign language;
    • A translator using machine translation to perform post-editing in a computer-assisted translation environment;
    • A reader using machine translation to get an idea of the content of a scientific publication.
  • A measure of post-editing effort should be provided, preferably based on the following three dimensions: productivity (time to post-edit), technical effort (number of keystrokes), and cognitive effort (perceived effort or eye-tracking).
  • The evaluation should be performed at a document level rather than a sentence level. 
  • Even if the selected contractor plans to use automatic metrics (to be specified in the proposal), the evaluation should also involve human translators who are native speakers of the target language and specialise in the disciplines that are the object of the evaluation task.
  • Reference human translations should be taken into account in the evaluation of the machine translation output. In case human translations are not available for the texts selected for the evaluation, the bidders may allocate a part of the budget proposed to pay for professional human translations.
  • If several translators are involved in the evaluation (preferred scenario), Inter-Annotator Agreement should be measured.
  • Information on the usability of the machine translation engine for the different personas* identified above would be an asset. In order to provide such information, the evaluation should answer questions such as: how can different kinds of users interact with the machine translation engine? How smoothly can its user interface be used by a researcher or a non-professional translator? Can the engine be used within a computer-assisted translation environment via an API? Can it be integrated into websites?
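As an illustration of the Inter-Annotator Agreement requirement above, agreement between two evaluators can be quantified with Cohen's kappa. The sketch below is a minimal Python implementation; the label set and function names are illustrative, not prescribed by this call.

```python
from collections import Counter

def cohens_kappa(annotator_a, annotator_b):
    """Cohen's kappa for two annotators labelling the same items.

    Returns 1.0 for perfect agreement and 0.0 for chance-level agreement.
    """
    assert len(annotator_a) == len(annotator_b)
    n = len(annotator_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n
    # Expected agreement: probability of matching by chance, from label frequencies.
    freq_a, freq_b = Counter(annotator_a), Counter(annotator_b)
    expected = sum((freq_a[lab] / n) * (freq_b[lab] / n)
                   for lab in set(freq_a) | set(freq_b))
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Example: two annotators judging four MT segments as "ok" or "error".
a = ["ok", "ok", "error", "ok"]
b = ["ok", "error", "error", "ok"]
```

For the preferred multi-annotator scenario, a generalisation such as Fleiss' kappa or Krippendorff's alpha would be more appropriate.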

*More information about these personas and their practices will be provided by the use-case study planned as part of the Translations and Open Science project. Please see details here.
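The technical dimension of post-editing effort listed in the specifications can also be approximated from the texts alone, for example as a word-level edit distance between the raw MT output and its post-edited version, normalised by the length of the post-edited text (in the spirit of HTER). The sketch below is a simplified illustration, not a reference implementation of TER:

```python
def word_edit_distance(src, tgt):
    """Levenshtein distance over word sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(tgt) + 1))
    for i, s in enumerate(src, 1):
        curr = [i]
        for j, t in enumerate(tgt, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (s != t)))  # substitution
        prev = curr
    return prev[-1]

def edit_effort(mt_output, post_edited):
    """Edit operations per post-edited word (a rough HTER-style proxy; lower is better)."""
    hyp, ref = mt_output.split(), post_edited.split()
    return word_edit_distance(hyp, ref) / max(len(ref), 1)
```

For example, if a post-editor only inserts one missing word into a six-word segment, the effort score is 1/6; an untouched segment scores 0.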

Target Deliverables and Schedule

  • A clear description of the methodology used to fine-tune and evaluate the machine translation engines, so as to allow replication in the future (for example, with a larger number of disciplines or with production constraints).
  • One report for each of the three disciplines covered by the evaluation task, containing details of the evaluation methodology, the evaluation outcomes, and recommendations on machine translation deployment (best practices, suggested engines, etc.). Each of the three reports should not exceed five single-spaced pages.
  • MQM-based annotation of each machine-translated publication included in the task.
  • Post-edited version of each machine-translated publication included in the task with post-editing effort measures.
  • A final report including a financial statement and a description and justification of any deviations that occurred during the course of the completion of the Work in terms of: work organisation, resource allocations, content of the deliverables. 
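For the MQM-based annotation deliverable, error annotations are typically recorded as (category, severity, span) triples and aggregated into a weighted penalty per fixed number of words. The sketch below assumes the commonly used minor/major/critical weighting of 1/5/10; the actual error typology and weights would be agreed with the steering and scientific committees:

```python
from dataclasses import dataclass

# Illustrative severity weights; the final MQM scoring model is to be agreed on.
SEVERITY_WEIGHTS = {"neutral": 0, "minor": 1, "major": 5, "critical": 10}

@dataclass
class MQMError:
    category: str   # e.g. "accuracy/mistranslation", "fluency/grammar"
    severity: str   # one of the keys of SEVERITY_WEIGHTS
    span: str       # offending span in the machine-translated text

def mqm_penalty(errors, word_count, per_words=100):
    """Weighted error penalty per `per_words` words of MT output (lower is better)."""
    total = sum(SEVERITY_WEIGHTS[e.severity] for e in errors)
    return total * per_words / word_count

# Example: one major and one minor error in a 300-word machine-translated text.
errors = [
    MQMError("accuracy/mistranslation", "major", "span A"),
    MQMError("fluency/grammar", "minor", "span B"),
]
```

With these example weights, the annotation above yields a penalty of (5 + 1) × 100 / 300 = 2.0 per 100 words.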

Final Project Due: 30 June 2023

Bid period: 1 November to 23 December 2022

Result notification: 19 January 2023 EOD

Service starting date: 1 March 2023

Expected turnaround time: 4 months

Existing Roadblocks Or Technical Issues

The time frame is strict, as it is calculated to comply with the planning of the four preparatory studies.

Budget Constraints

Budget range: €80,000-€100,000.

The budget amount proposed by the bidders must include all taxes, in particular 21% Belgian VAT (reverse charge mechanism, Articles 44 and 196 of Directive 2006/112/EC).

Evaluation Metrics

OPERAS will evaluate bidders and proposals based on the following criteria:

  1. Experience in machine translation training and evaluation, especially for scholarly content
  2. Experience in translation workflow and quality management
  3. Accuracy of the process and methodology
  4. Achievability of deliverables
  5. Adequacy of requested resources and expected results

Questions Bidders Must Answer To Be Considered

Bidders are asked to submit a service proposal describing the tasks that they will be able to perform in relation to the present call for tenders during a four-month period starting from 1 March 2023.

In particular, bidders are asked to include in their response the following information:

  • Detailed description of the machine translation training process, including information on the translation engines that the bidder expects to fine-tune and evaluate
  • Detailed description of the methodology and expertise that will be used to perform the machine translation evaluation
  • Generic description of the team involved in the project (profiles and level of expertise)
  • Provisional planning of the service tasks (Gantt chart required)
  • Detailed budget

Submission Requirements

Bidders must adhere to the following guidelines to be considered:

  • Only bidders who meet all five criteria listed in the Evaluation Metrics section should submit a proposal.
  • Proposals must be sent by e-mail to Susanna Fiorini (susanna.fiorini[a]) before 23 December 2022. Bidders who are interested in submitting a proposal should inform Susanna Fiorini no later than 2 December 2022.
  • Include samples and references as an appendix to the proposal.
  • Proposals (excluding appendix) should not exceed 6 pages. 
  • A proposed schedule must also be included and clearly expressed in the proposal.

What We’re Looking For in Potential Vendors

The call is open to public and private vendors, regardless of their country of establishment.

We are particularly interested in receiving proposals from organisations with experience or interest in scholarly research and communication.

We attach great value to sustainable and ethical business models.

Vendors should be able to ensure smooth communication with the steering committee throughout the duration of the project.

Contact Information

For questions or concerns connected to this RFP, we can be reached at:

Susanna Fiorini


