MTQE: Automated Quality Estimation

Predictive Translation Quality Scoring for AI Localization Pipelines

MTQE (Machine Translation Quality Estimation) in Custom.MT helps localization teams evaluate MT output at scale before human review. It scores every translated segment on a 0–100 range, allowing your workflow to automatically accept high-quality segments and focus human effort only where it’s needed.

Result

Less manual review, lower costs, higher consistency, and faster delivery across all languages.

Why Quality Estimation Matters

Traditional quality verification requires linguists to open, read, and judge every segment manually — a process that doesn’t scale.

With QE, your AI platform automatically answers:

“Is this segment good enough to publish?”

QE helps you:

  • Control budgets by limiting human editing to low-scoring segments
  • Reduce manual review by skipping high-quality MT output
  • Build scalable, GenAI-augmented localization workflows
  • Improve linguistic consistency across vendors, teams, and markets

How QE Works Inside the Template Workflow

Step 1

Add QE as a workflow step

Appears alongside Machine Translation, Post-Editing, and Terminology.

Step 2

Choose your provider

Custom.MT supports multiple QE engines, including:

✔ TAUS – Available Now!
✔ ModernMT – Coming Soon
✔ OpenAI LLM QE – Coming Soon
✔ Claude LLM QE – Coming Soon

Step 3

Set your quality threshold

Example: 85/100 as the minimum acceptable quality.

  • 70–80 → Suitable for user-generated content, FAQs, and support articles
  • 90–95 → For high-visibility content requiring near-perfect quality

You can adjust thresholds per client, language pair, or content type.
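Per-client or per-content-type thresholds like these can live in a simple lookup. A minimal sketch — the names (`THRESHOLDS`, `threshold_for`) and content-type keys are illustrative, not part of Custom.MT:

```python
# Illustrative threshold lookup; the values mirror the guidance above.
THRESHOLDS = {
    "support":   75,  # user-generated content, FAQs (70–80 band)
    "marketing": 92,  # high-visibility content (90–95 band)
}

DEFAULT_THRESHOLD = 85  # the example minimum acceptable quality


def threshold_for(content_type: str) -> int:
    """Return the minimum acceptable QE score for a content type."""
    return THRESHOLDS.get(content_type, DEFAULT_THRESHOLD)
```

The same lookup could be keyed by client or language pair instead, depending on how your workflow segments content.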

Step 4

During processing

Each segment receives a 0–100 score.

  • ≥ threshold → automatically approved (locked/confirmed in your CAT tool)
  • < threshold → sent to Automatic Post-Editing or human review


Confirm segments to mark them as finished while still allowing your team the freedom to make a quick, optional edit.

Lock segments to freeze the content entirely so that no human time or budget is spent on those segments.
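The routing rule above can be sketched in a few lines. This is an assumption-level illustration — the `Segment` class, `route` function, and status names are hypothetical, not a Custom.MT API:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    score: int           # QE score on the 0–100 scale
    status: str = "new"  # becomes "locked", "confirmed", or "review"


def route(segment: Segment, threshold: int, lock: bool = False) -> Segment:
    """Route one segment by its QE score.

    >= threshold: approve it (lock to freeze entirely, or confirm
                  to allow a quick optional edit).
    <  threshold: flag it for Automatic Post-Editing or human review.
    """
    if segment.score >= threshold:
        segment.status = "locked" if lock else "confirmed"
    else:
        segment.status = "review"
    return segment
```

For example, with a threshold of 85, a segment scoring 92 is confirmed (or locked, if you opt to freeze it), while one scoring 60 is routed to review.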

Step 5

Real-time Translation Quality Estimation (TQE) Scoring in your CAT tool

As you navigate your document, the Custom.MT pane at the bottom of your screen provides instant “Traffic Light” guidance for the active segment.

  • Green (Above Threshold): The status bar turns green when the score meets your requirements, confirming the translation is safe to approve.
  • Red (Below Threshold): The status bar turns red if the score falls below your threshold, alerting you that the segment requires human review or automatic post-editing.

Step 5+

Combine MTQE + APE

Use the QE score to act as a smart filter for your Automatic Post-Editing engine. This ensures you only use AI rewriting resources where they are actually needed.

  • Low-scoring segments: Automatically routed to APE for correction and improvement.
  • High-scoring segments: Accepted as-is to bypass the APE step, saving both processing costs and time.
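The QE-as-filter idea amounts to gating each APE call on the score. A minimal sketch, assuming a caller supplies the APE step as a function (`ape_fn` is a stand-in, not a real Custom.MT call):

```python
def maybe_post_edit(segment_text: str, qe_score: int, threshold: int,
                    ape_fn) -> tuple[str, bool]:
    """Invoke the APE engine only when the QE score falls below threshold.

    Returns the final text and a flag for whether an APE call was spent,
    so you can track how much AI rewriting the filter avoided.
    """
    if qe_score >= threshold:
        return segment_text, False      # accepted as-is: no LLM cost
    return ape_fn(segment_text), True   # flagged: rewrite with APE
```

Counting the `True` flags across a job gives a direct measure of how many LLM calls the QE filter saved.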

Step 6

Evaluate quality threshold over time

Update your quality threshold based on:

  • Risk Management: Set high QE thresholds (95+) for critical content and flexible scores (75–80) for low-risk internal docs.
  • Quality Audits: Compare QE scores against actual human effort to ensure your automation triggers remain accurate.
  • Engine Benchmarking: Use Custom.MT QE tools to measure performance and set optimal thresholds when switching MT providers or upgrading LLMs to maintain stable translation quality.

QE Workflows: Standard vs. Dual-Pass QE

Quality Estimation in Custom.MT is flexible enough to support different levels of automation — from fast, cost-efficient review to enterprise-grade quality assurance. Below are two recommended workflows depending on your content complexity, risk level, and required turnaround.

Standard QE (Recommended for Most Teams)

MT → QE → APE → Human (only if needed)

MT generates the initial output.
QE (first pass) scores every segment and filters out high-quality results.

High-score segments → auto-approved
Low-score segments → routed to APE or human editors

APE (optional) improves only the flagged segments.
Human review happens only if required by content type.

Best for:

  • Large-scale localization
  • Teams optimizing cost and speed
  • Marketing, UI, product content
  • Automated GenAI translation pipelines

Dual-Pass QE (For High-Risk or Regulated Content)

MT → QE → APE → QE → Human (only if needed)

MT produces the initial translation.
QE (first pass) identifies low-quality segments.
Automatic Post-Editing (APE) automatically refines only the low-score segments using LLM-based post-editing.
QE (second pass) re-scores the improved output.

Segments now above the threshold → auto-approved
Segments still below the threshold → sent to linguists

Human editors work only on the most problematic parts.
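The dual-pass flow above can be sketched end to end. The `qe_fn` and `ape_fn` parameters stand in for the QE engine and the LLM-based APE step; both names and the return shape are assumptions for illustration, not Custom.MT APIs:

```python
def dual_pass(segments, qe_fn, ape_fn, threshold):
    """Dual-pass QE: score, post-edit only the low scorers, re-score.

    Returns (auto_approved, needs_human): segments that cleared the
    threshold on either pass, and those still below it after APE.
    """
    auto_approved, needs_human = [], []
    for text in segments:
        if qe_fn(text) >= threshold:         # first QE pass
            auto_approved.append(text)
            continue
        edited = ape_fn(text)                # APE only on flagged segments
        if qe_fn(edited) >= threshold:       # second QE pass
            auto_approved.append(edited)
        else:
            needs_human.append(edited)       # still low: send to linguists
    return auto_approved, needs_human
```

With real engines plugged in, `needs_human` is the only bucket that consumes linguist time, which is what makes the dual-pass setup attractive for regulated content.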

Best for:

  • Life sciences, medical devices, pharma
  • Legal, financial, compliance content
  • Hardware safety instructions
  • Localization teams requiring audit trails or ISO compliance

Who Benefits From QE?

  • LSPs & Internal Linguists

    ○ Edit only the segments that need attention
    ○ Faster turnaround
    ○ Clear segmentation of "safe to publish" vs. "requires review"

  • Localization Managers

    ○ Predictable quality
    ○ Lower vendor costs
    ○ Fewer review loops
    ○ Faster time-to-market
    ○ Risk mitigation

  • Enterprise Teams

    ○ Scalable quality control
    ○ Consistent standards across large content volumes
    ○ Integrates into any GenAI-driven localization strategy

Choosing the Right QE System

Custom.MT supports several scoring engines, each with its own strengths:

| Model    | Description          | Best Use Case    |
|----------|----------------------|------------------|
| ModernMT | Stable, fast scoring | General workflow |

Frequently Asked Questions

What is QE (Quality Estimation)?

Quality Estimation, often called MTQE in the translation world, is an AI-powered process that predicts the quality of a machine-translated segment instantly without needing a human to check it. It assigns a score to each sentence to help teams decide which parts are safe to use immediately and which require a professional editor. This technology essentially acts as an automated “first look” that saves time by filtering out high-quality translations from the risky ones.

What is the difference between MTQE and AI LQA?

While MTQE (Machine Translation Quality Estimation) focuses on predicting accuracy during the translation process, AI LQA (Language Quality Assurance) is an automated “audit” that evaluates the final output against specific style guides and error categories. Think of MTQE as an instant health check for a translation, whereas AI LQA is a more detailed post-exam that mimics a human editor’s feedback. Essentially, MTQE decides if a translation can pass through the workflow, while AI LQA explains exactly why a segment might be failing.

Does QE replace human reviewers?

No, but it reduces their workload. Humans focus only on segments below your quality threshold.

Which QE provider should I pick?

According to our benchmark, there is no single “best” provider, as performance varies significantly by language pair and business goals. General-purpose models like OpenAI and Claude often achieve higher raw scores across many languages, but specialized providers like ModernMT, TAUS, and Widn.AI are often easier to integrate into professional localization workflows. 

The benchmark reveals that a “one-size-fits-all” provider doesn’t exist, so the most effective strategy is to test multiple systems against your own language pairs and content types. You can use the Custom.MT threshold optimization tool to get a personalized summary of which models perform best for your specific needs.

Can I use QE across multiple CAT tools?

Right now, QE works in Trados; support for memoQ, XTM, Smartcat, and other connected tools is on the way.

Can QE trigger APE automatically?

Yes, you can auto-refine low-quality segments using APE prompts.

What is the best workflow: should I run QE before or after APE (Automatic Post-Editing)?

The optimal QE workflow depends on your content type and quality goals, but the most common setup is:
MT → QE → Post-editing (only for low-score segments)

This flow ensures that high-quality machine-translated content is approved automatically, while linguists focus only on segments that fall below the threshold.

Which QE threshold should I use?

It depends on your quality expectations:

  • 85–90 → Recommended default for most enterprise content
  • 70–80 → Suitable for user-generated content, FAQs, and support articles
  • 90–95 → For high-visibility content requiring near-perfect quality

You can adjust thresholds per client, language pair, or content type.

Does QE work better with rule-based or LLM-based APE?

QE works well with both, but:

  • LLM-based APE benefits the most, because QE helps minimize unnecessary LLM calls.
  • Rule-based APE is faster and cheaper but may not improve low-quality segments enough.