Improve YouTube Translation

INFO

This is a time-sensitive article. Some screenshots and examples may be different from the latest version, but the principle of improvement described has not changed.

In the latest version, the "Sentence Segmentation Mode" setting mentioned in this article has been removed and replaced by two working modes:

  • Standard mode, equivalent to "Sentence Segmentation Mode" set to "Enhance".
  • Simple mode, equivalent to "Sentence Segmentation Mode" set to "Normal".

Overview

YouTube offers a native translation feature, as well as subtitles generated automatically through speech recognition, referred to here as "auto subtitles".

Therefore, the usual way to implement dual-language subtitles is to request two subtitle files from YouTube, the original and the translation. The time values in the two files usually correspond exactly, so the original and translated lines can be concatenated wherever the time values match.
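The merge by time value described above can be sketched roughly as follows. This is a minimal illustration, not Dualsub's actual code: the `Cue` type and the `merge_dual` function are hypothetical names, and the sketch assumes both files have already been parsed into cues.

```python
from typing import NamedTuple


class Cue(NamedTuple):
    start: str  # e.g. "00:04:25.699"
    end: str    # e.g. "00:04:27.449"
    text: str


def merge_dual(original: list[Cue], translated: list[Cue]) -> list[Cue]:
    """Pair each original cue with the translated cue that has the same time values."""
    by_time = {(c.start, c.end): c.text for c in translated}
    merged = []
    for cue in original:
        translation = by_time.get((cue.start, cue.end))
        # Keep the original line alone when no translated cue matches its time values.
        text = cue.text if translation is None else f"{cue.text}\n{translation}"
        merged.append(Cue(cue.start, cue.end, text))
    return merged


original = [
    Cue("00:04:25.699", "00:04:27.449", "and from sugary Turkish Rize tea,"),
    Cue("00:04:27.449", "00:04:29.440", "to salty Tibetan butter tea,"),
]
translated = [
    Cue("00:04:25.699", "00:04:27.449", "还有含糖的土耳其里兹茶,"),
    Cue("00:04:27.449", "00:04:29.440", "咸西藏黄油茶"),
]
dual = merge_dual(original, translated)
```

Each merged cue keeps the original time values and carries the original line followed by its translation.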

You may have heard that Google's translation technology has made breakthroughs, but if you are a heavy YouTube user, you have probably found that YouTube's translation quality is often unsatisfactory. It can be improved, however. The following explains how Dualsub improves it.

Take the video "The history of tea" as an example. The video's original audio is in English, and we translate it into Chinese, improving the human subtitles and the auto subtitles in turn.

Improve Human Subtitle

Here is part of the YouTube translation results:

00:04:21.389 --> 00:04:25.699
Today, tea is the second most consumed beverage in the world after water,
今天,茶是仅次于水的世界上消费量第二大的饮料,

00:04:25.699 --> 00:04:27.449
and from sugary Turkish Rize tea,
还有含糖的土耳其里兹茶,

00:04:27.449 --> 00:04:29.440
to salty Tibetan butter tea,
咸西藏黄油茶

00:04:29.440 --> 00:04:32.410
there are almost as many ways of preparing the beverage
几乎有多种方法来准备饮料

00:04:32.410 --> 00:04:34.299
as there are cultures on the globe.
因为全球都有文化。

Let's evaluate these translations:

  • Entry 1: Looks good.
  • Entry 2-3: Still acceptable.
  • Entry 4-5: Unsatisfactory.

If you paste the original lines into https://translate.google.com/ and translate them, you will get the same result:

What's interesting is that if you join these lines into a single line separated by spaces, the translation result becomes:

The translation quality magically rises a level and becomes as good as human translation.

Perhaps you have already spotted the issue. YouTube translation is contextless: each subtitle dialogue is translated in isolation.

To get better translation quality, the translation should be contextual: multiple subtitle dialogues should be translated together in a single context.

Therefore, Dualsub introduces a "translation mode" setting:

  • Normal: Translates each dialogue without context, the same as YouTube.
  • Enhance: Translates with context, along with other improvements.
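The contextual idea can be sketched as below. This is only an illustration under stated assumptions: cues are grouped until one ends with sentence-ending punctuation, the group is joined with spaces and translated once, and every cue in the group receives the whole translation. The names `group_by_sentence` and `translate_contextual` are hypothetical, the `translate` callable stands in for whatever translation service is used, and how Dualsub actually groups cues and maps translations back onto them is not described in this article.

```python
from typing import Callable

SENTENCE_END = (".", "!", "?")


def group_by_sentence(lines: list[str]) -> list[list[int]]:
    """Group consecutive cue indices so each group ends at sentence-ending punctuation."""
    groups: list[list[int]] = []
    current: list[int] = []
    for i, line in enumerate(lines):
        current.append(i)
        if line.rstrip().endswith(SENTENCE_END):
            groups.append(current)
            current = []
    if current:  # trailing cues without final punctuation still form a group
        groups.append(current)
    return groups


def translate_contextual(lines: list[str], translate: Callable[[str], str]) -> list[str]:
    """Translate each sentence group in one request and share the result across its cues."""
    result = [""] * len(lines)
    for group in group_by_sentence(lines):
        joined = " ".join(lines[i].strip() for i in group)
        translation = translate(joined)  # one request per group, so context is kept
        for i in group:
            result[i] = translation
    return result
```

With a real translator passed as `translate`, the two cues "there are almost as many ways of preparing the beverage" and "as there are cultures on the globe." would be sent as a single sentence rather than as two fragments.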

When you use the "Enhance" mode, the translation result will become:

00:04:21.389 --> 00:04:25.699
Today, tea is the second most consumed beverage in the world after water,
如今,茶已成为仅次于水的世界上消费量第二大的饮料,

00:04:25.699 --> 00:04:27.449
and from sugary Turkish Rize tea,
从含糖的土耳其里兹茶到咸味的西藏黄油茶,

00:04:27.449 --> 00:04:29.440
to salty Tibetan butter tea,
从含糖的土耳其里兹茶到咸味的西藏黄油茶,

00:04:29.440 --> 00:04:32.410
there are almost as many ways of preparing the beverage
制备饮料的方法几乎与全球文化一样多。

00:04:32.410 --> 00:04:34.299
as there are cultures on the globe.
制备饮料的方法几乎与全球文化一样多。

In this translation mode, the translation may not correspond one-to-one with the original. One translation may correspond to multiple original lines, and vice versa, and the order of subtitle dialogues may even be swapped:

00:02:14.855 --> 00:02:17.806
This gave China a great deal of power and economic influence
随着饮茶在世界范围内的传播,

00:02:17.806 --> 00:02:20.585
as tea drinking spread around the world.
这给了中国很大的力量和经济影响力。

In addition, some subtitles are not suited to the "Enhance" mode. Lyrics are one example: a lyric line may not be a complete sentence, may lack punctuation, and is often deliberately segmented. In such cases the "Enhance" mode may not improve the translation quality.

Improve Auto Subtitle

The term "auto subtitle" refers to the subtitle with the suffix "(auto-generated)" in the YouTube native subtitle menu.

Note that subtitles use one of two time value formats.

One is the "sentence-based" format, for example:

00:00:00.000 --> 00:00:03.000 aaa bbb ccc

And the other is the "vocabulary-based" format, for example:

00:00:00.000 xxx
00:00:01.000 yyy
00:00:02.000 zzz

Human subtitles are "sentence-based", while auto subtitles are "vocabulary-based", that is, timed word by word. This is why auto subtitles natively render by scrolling up word by word.

However, after being translated by YouTube, auto subtitles become "sentence-based". Here is part of the translation result:

00:01:42.640 --> 00:01:46.880
in the 9th century during the tang dynasty a japanese monk brought the
在9世纪唐朝期间,日本和尚带来了

00:01:46.880 --> 00:01:50.880
first tea plant to japan the japanese eventually developed their
日本最早的茶厂日本人最终发展了他们的

00:01:50.880 --> 00:01:54.880
own unique rituals around tea leading to the creation of the japanese
围绕茶的独特仪式导致日本人的创作

00:01:54.880 --> 00:01:58.240
tea ceremony and in the 14th century during the ming
茶道与明朝的14世纪

00:01:58.240 --> 00:02:01.840
dynasty the chinese emperor shifted the standard from tea
王朝将中国皇帝的标准从茶叶转移到了中国

From this we can guess how YouTube translates auto subtitles:

  1. Concatenate words into sentences, with each sentence no longer than 80 characters.
  2. Translate each sentence individually, without context.

Both steps are simple and crude, so the translation quality is rather poor.
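The first guessed step can be modelled with a greedy packer like the one below. This is a sketch of the guess only, not YouTube's real algorithm: the function name `segment_by_length` and the representation of words as `(start_seconds, word)` pairs are assumptions made for illustration.

```python
def segment_by_length(
    words: list[tuple[float, str]], max_chars: int = 80
) -> list[tuple[float, str]]:
    """Greedily pack timed words into lines of at most max_chars characters.

    Each returned item is (start time of the first word, joined text).
    """
    lines: list[tuple[float, str]] = []
    line_start = 0.0
    line_words: list[str] = []
    length = 0
    for start, word in words:
        # +1 accounts for the joining space when the line already has words.
        added = len(word) + (1 if line_words else 0)
        if line_words and length + added > max_chars:
            # The word does not fit: close the current line and start a new one.
            lines.append((line_start, " ".join(line_words)))
            line_words, length, added = [], 0, len(word)
        if not line_words:
            line_start = start
        line_words.append(word)
        length += added
    if line_words:
        lines.append((line_start, " ".join(line_words)))
    return lines


# Tiny limit so the cut is easy to see: "aaa bbb" is exactly 7 characters.
packed = segment_by_length([(0.0, "aaa"), (1.0, "bbb"), (2.0, "ccc")], max_chars=7)
```

Because the cut depends only on character count, a line can end in the middle of a sentence, which is exactly what the sample above shows.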

Before translating, the words must be concatenated into sentences, a step also called "sentence segmentation". However, the original text is all lowercase with no punctuation, which makes sentence segmentation difficult.

Since punctuation is unavailable, another approach is "evaluating tempo": when people finish speaking a sentence, they usually pause briefly before starting the next one.

So Dualsub introduces a "Sentence Segmentation Mode" setting:

  • Normal: Segments by the character count of a sentence, the same as YouTube.
  • Enhance: Segments by the pause time between two words, along with other improvements.
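Pause-based segmentation can be sketched as follows. This is a minimal illustration rather than Dualsub's implementation: it assumes each word carries only a start time in seconds, treats a large gap between consecutive start times as a pause, and uses a hypothetical `max_gap` threshold of one second. The "other improvements" mentioned above are not covered.

```python
def segment_by_pause(
    words: list[tuple[float, str]], max_gap: float = 1.0
) -> list[tuple[float, str]]:
    """Start a new segment whenever the gap to the previous word exceeds max_gap seconds.

    Each returned item is (start time of the first word, joined text).
    """
    segments: list[tuple[float, str]] = []
    current: list[tuple[float, str]] = []
    for start, word in words:
        if current and start - current[-1][0] > max_gap:
            # Long silence before this word: close the current segment.
            segments.append((current[0][0], " ".join(w for _, w in current)))
            current = []
        current.append((start, word))
    if current:
        segments.append((current[0][0], " ".join(w for _, w in current)))
    return segments


# Illustrative timings only; they are not taken from the real video.
timed_words = [
    (0.0, "in"), (0.3, "the"), (0.6, "9th"), (0.9, "century"),
    (2.5, "a"), (2.8, "japanese"), (3.2, "monk"),
]
segments = segment_by_pause(timed_words)
```

Unlike character counting, the cut point here follows the speaker's rhythm, so segments are more likely to line up with clause or sentence boundaries.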

When you use the "Enhance" mode for sentence segmentation and then the "Enhance" mode for translation, the results are as follows:

00:01:42.630 --> 00:01:45.520
in the 9th century during the tang dynasty,
在唐朝的9世纪,

00:01:45.520 --> 00:01:49.30
a japanese monk brought the first tea plant to japan,
一位日本僧侣将第一棵茶树带到了日本,

00:01:49.30 --> 00:01:52.950
the japanese eventually developed their own unique rituals around tea
日本人最终围绕茶制定了自己独特的礼节,

00:01:52.950 --> 00:01:56.149
leading to the creation of the japanese tea ceremony,
从而创立了日本茶道,

00:01:56.149 --> 00:01:59.119
and in the 14th century during the ming dynasty,
在明朝的14世纪,

00:01:59.119 --> 00:02:04.560
the chinese emperor shifted the standard from tea pressed into cakes to loose-leaf tea,
中国皇帝将标准从茶压成饼转变为活叶茶,

The translation quality is significantly improved. This approach does not suit every situation, but it is still much better than the "counting characters" approach.