Breaking down language barriers with Amazon Translate

This blog post discusses how to use Amazon Translate to publish a blog post in a different language.

I will show a hypothetical architecture diagram that borrows the Publish button of Hashnode.

Screenshot 2022-01-13 at 09.09.32.png

Amazon Translate

Amazon Translate translates text on demand using machine learning.

Besides on-demand translation, Amazon Translate can also integrate with other AWS services:

  • Amazon Comprehend.
  • Amazon Transcribe.
  • Amazon Polly.
  • Amazon S3.
  • Amazon DynamoDB, Amazon Aurora.
  • AWS Lambda.

Max length of request text allowed is 5000 bytes

Amazon Translate service limits are based on bytes instead of characters. In most cases, the number of bytes and the number of characters are the same. However, in some languages, a single character is represented by multiple bytes.
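
To make the difference between bytes and characters concrete, here is a small Python sketch. It assumes the limit is measured on the UTF-8 encoding of the text; `utf8_size` is a hypothetical helper name, not part of any AWS SDK.

```python
def utf8_size(text: str) -> int:
    """Return the size of text in bytes when encoded as UTF-8 (assumed encoding)."""
    return len(text.encode("utf-8"))


print(utf8_size("hello"))   # 5 characters, 5 bytes
print(utf8_size("ciao è"))  # 6 characters, 7 bytes
print(utf8_size("日本語"))    # 3 characters, 9 bytes
```

An article that looks short in characters can therefore still exceed the limit if it contains many accented or non-Latin characters.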

If I send even a small article like this one to real-time translation, I get an exception:

{
  "error": "Translate.TextSizeLimitExceededException",
  "cause": "Input text size exceeds limit. Max length of request text allowed is 5000 bytes while in this request the text size is 5568 bytes"
}

Workaround

You can split a large document into smaller parts, each below the size limit, translate them one by one, and stitch the results together at the end.
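
A minimal sketch of the split-and-stitch idea in Python, assuming the limit is measured in UTF-8 bytes and splitting only at whitespace. The names `split_by_bytes`, `translate_in_chunks`, and the `translate_chunk` callable are hypothetical and not part of any AWS SDK; in a real flow `translate_chunk` would wrap a call to Amazon Translate.

```python
import re
from typing import Callable, List

MAX_BYTES = 5000  # limit quoted in the error message above


def split_by_bytes(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Split text into chunks of at most max_bytes UTF-8 bytes.

    Splits happen only at whitespace, so words are never cut in half,
    and "".join(chunks) gives back the original text.
    """
    chunks: List[str] = []
    current = ""
    current_size = 0
    # The capturing group keeps whitespace runs as separate tokens.
    for token in re.split(r"(\s+)", text):
        size = len(token.encode("utf-8"))
        if size > max_bytes:
            raise ValueError("a single token is larger than max_bytes")
        if current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = "", 0
        current += token
        current_size += size
    if current:
        chunks.append(current)
    return chunks


def translate_in_chunks(
    text: str,
    translate_chunk: Callable[[str], str],  # hypothetical: wraps the real translation call
    max_bytes: int = MAX_BYTES,
) -> str:
    """Translate each chunk separately and stitch the results together."""
    return "".join(translate_chunk(chunk) for chunk in split_by_bytes(text, max_bytes))


# Stand-in "translator" so the sketch runs without AWS credentials.
print(translate_in_chunks("my super article", str.upper))
```

Splitting at whitespace can break a sentence across two chunks, which may hurt translation quality; splitting at sentence or paragraph boundaries is likely a better choice for real articles.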

Alternatively, when the user clicks Publish, I can set a flag, based on the size of the article, that selects asynchronous batch processing. AWS Step Functions uses the flag to route to a different flow, where the size limit per document is 1 MB (material for a separate article).

Architecture

This design is far from complete, and it does not consider many things. However, it is just an example, and you should read it as a possible action flow.

blog.png

When the user clicks Publish, the request goes to an AWS AppSync endpoint. There, a Lambda resolver saves the article in DynamoDB and emits an event, such as ArticlePublished, that starts an AWS Step Functions execution.

The event details could be:

{
  "blogId": "16c28bb45c28d2ac6284fa9bda3a17aba355d020",
  "sourceLanguageCode": "en",
  "targetLanguageCode": "it",
  "onFly": true
}

This event considers the translation only for one language, but if I want to target multiple languages, the options are:

  • One event for each language
  • AWS Step Functions could load your settings, loop through your configuration, and translate into each language set on your profile.

To keep this blog post simple, I use the "One event for each language" option.
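
As a rough illustration of the "One event for each language" option, the resolver could build the events like this. This is only a sketch: `build_events` is a hypothetical helper, the 5000-byte threshold is taken from the error message shown earlier, and how the events are actually emitted is left out.

```python
import json
from typing import Dict, List

REAL_TIME_LIMIT_BYTES = 5000  # assumed threshold, from the error message above


def build_events(
    blog_id: str,
    text: str,
    source_language: str,
    target_languages: List[str],
    limit: int = REAL_TIME_LIMIT_BYTES,
) -> List[Dict[str, object]]:
    """Build one event per target language, setting onFly from the article size."""
    on_fly = len(text.encode("utf-8")) <= limit
    return [
        {
            "blogId": blog_id,
            "sourceLanguageCode": source_language,
            "targetLanguageCode": target,
            "onFly": on_fly,
        }
        for target in target_languages
    ]


events = build_events(
    "16c28bb45c28d2ac6284fa9bda3a17aba355d020",
    "my super article",
    "en",
    ["it", "fr"],
)
print(json.dumps(events, indent=2))
```

Each event has the same shape as the example above, so the state machine does not need to know how many languages were requested.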

Amazon Translate in action

I will use AWS Step Functions to implement the translation flow. For example, I could have:

sf-design.png

You can try it yourself using the following definition:

{
  "Comment": "A description of my state machine",
  "StartAt": "OnFly required",
  "States": {
    "OnFly required": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.onFly",
          "BooleanEquals": false,
          "Next": "Asynchronous Batch Processing with Amazon Translate"
        }
      ],
      "Default": "GetArticle"
    },
    "GetArticle": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:getItem",
      "Parameters": {
        "TableName": "article",
        "Key": {
          "blogId": {
            "S.$": "$.blogId"
          }
        }
      },
      "Next": "Map Input",
      "ResultPath": "$.article"
    },
    "Map Input": {
      "Type": "Pass",
      "Next": "TranslateText",
      "Parameters": {
        "blogId.$": "$.blogId",
        "sourceLanguageCode.$": "$.sourceLanguageCode",
        "targetLanguageCode.$": "$.targetLanguageCode",
        "text.$": "$.article.Item.text.S"
      }
    },
    "Asynchronous Batch Processing with Amazon Translate": {
      "Type": "Succeed"
    },
    "TranslateText": {
      "Type": "Task",
      "Parameters": {
        "SourceLanguageCode.$": "$.sourceLanguageCode",
        "TargetLanguageCode.$": "$.targetLanguageCode",
        "Text.$": "$.text"
      },
      "Resource": "arn:aws:states:::aws-sdk:translate:translateText",
      "ResultPath": "$.translation",
      "Next": "Map Translation"
    },
    "Map Translation": {
      "Type": "Pass",
      "Next": "Save translation",
      "Parameters": {
        "blogId.$": "$.blogId",
        "sourceLanguageCode.$": "$.sourceLanguageCode",
        "targetLanguageCode.$": "$.targetLanguageCode",
        "translation.$": "$.translation.TranslatedText"
      }
    },
    "Save translation": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:updateItem",
      "Parameters": {
        "TableName": "article_translated",
        "Key": {
          "blogId": {
            "S.$": "$.blogId"
          },
          "targetLanguageCode": {
            "S.$": "$.targetLanguageCode"
          }
        },
        "UpdateExpression": "SET article = :myValueRef",
        "ExpressionAttributeValues": {
          ":myValueRef": {
            "S.$": "$.translation"
          }
        }
      },
      "Next": "Up to the imagination from now on"
    },
    "Up to the imagination from now on": {
      "Type": "Succeed"
    }
  }
}

Because the state machine calls Amazon DynamoDB and Amazon Translate directly, I need to grant its role the following permissions:

"Action": "dynamodb:UpdateItem",
"Resource": "arn:aws:dynamodb:eu-central-1:xxxx:table/article_translated"

"Action": "dynamodb:GetItem",
"Resource": "arn:aws:dynamodb:eu-central-1:xxxx:table/article"

"Action": "translate:TranslateText",
"Resource": "*"

Once the article is retrieved from Amazon DynamoDB, the Map Input state reshapes the data into the input expected by the TranslateText state:

{
  "sourceLanguageCode": "en",
  "text": "my super article",
  "blogId": "16c28bb45c28d2ac6284fa9bda3a17aba355d020",
  "targetLanguageCode": "it"
}

The output of the TranslateText state will be:

{
  "sourceLanguageCode": "en",
  "text": "my super article",
  "blogId": "16c28bb45c28d2ac6284fa9bda3a17aba355d020",
  "targetLanguageCode": "it",
  "translation": {
    "SourceLanguageCode": "en",
    "TargetLanguageCode": "it",
    "TranslatedText": "il mio super articolo"
  }
}

And before saving into Amazon DynamoDB, I will do another transformation into:

{
  "translation": "il mio super articolo",
  "sourceLanguageCode": "en",
  "blogId": "16c28bb45c28d2ac6284fa9bda3a17aba355d020",
  "targetLanguageCode": "it"
}
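
The two Pass states are plain data reshaping. For readers who find code easier to follow than JSONPath, here is a rough Python equivalent of what "Map Input" and "Map Translation" do. The functions `map_input` and `map_translation` are hypothetical; they only mirror the Parameters blocks of the definition above and are not needed in the actual workflow.

```python
from typing import Any, Dict


def map_input(state: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror of the "Map Input" Pass state: unwrap the DynamoDB attribute value."""
    return {
        "blogId": state["blogId"],
        "sourceLanguageCode": state["sourceLanguageCode"],
        "targetLanguageCode": state["targetLanguageCode"],
        "text": state["article"]["Item"]["text"]["S"],
    }


def map_translation(state: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror of the "Map Translation" Pass state: keep only the translated text."""
    return {
        "blogId": state["blogId"],
        "sourceLanguageCode": state["sourceLanguageCode"],
        "targetLanguageCode": state["targetLanguageCode"],
        "translation": state["translation"]["TranslatedText"],
    }


# State after GetArticle, with the DynamoDB result stored under $.article.
after_get_article = {
    "blogId": "16c28bb45c28d2ac6284fa9bda3a17aba355d020",
    "sourceLanguageCode": "en",
    "targetLanguageCode": "it",
    "onFly": True,
    "article": {"Item": {"text": {"S": "my super article"}}},
}
translate_input = map_input(after_get_article)

# State after TranslateText, with the result stored under $.translation.
after_translate = dict(
    translate_input,
    translation={
        "SourceLanguageCode": "en",
        "TargetLanguageCode": "it",
        "TranslatedText": "il mio super articolo",
    },
)
print(map_translation(after_translate))
```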

Errors to look for

If I run it with a real article, I will hit the "Input text size exceeds limit" error. Other limits to watch for when using Amazon DynamoDB and AWS Step Functions are:

  • The state payload size limit of AWS Step Functions is 256 KB.
  • The DynamoDB item size limit is 400 KB.

For example, the size of this article is 8 KB.
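
A quick way to check an article against these limits before starting the workflow could look like the sketch below. It is only an approximation: it assumes 1 KB = 1024 bytes, measures the Step Functions payload as UTF-8 encoded JSON, and estimates the DynamoDB item size as the sum of attribute names and string values. `check_limits` is a hypothetical helper.

```python
import json
from typing import Dict, List

STEP_FUNCTIONS_PAYLOAD_LIMIT = 256 * 1024  # bytes, assuming 1 KB = 1024 bytes
DYNAMODB_ITEM_LIMIT = 400 * 1024           # bytes, assuming 1 KB = 1024 bytes


def check_limits(item: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no limit was hit."""
    problems: List[str] = []

    # Approximate the state payload as the UTF-8 size of the JSON document.
    payload_size = len(json.dumps(item, ensure_ascii=False).encode("utf-8"))
    if payload_size > STEP_FUNCTIONS_PAYLOAD_LIMIT:
        problems.append(f"state payload is {payload_size} bytes")

    # Approximate the item size as attribute names plus string values.
    item_size = sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in item.items()
    )
    if item_size > DYNAMODB_ITEM_LIMIT:
        problems.append(f"DynamoDB item is {item_size} bytes")

    return problems


small = {"blogId": "16c28bb45c28d2ac6284fa9bda3a17aba355d020", "text": "x" * 8 * 1024}
print(check_limits(small))  # an 8 KB article fits within both limits
```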

Conclusion

As you can see, I built the Hello World of Amazon Translate without writing any code or any Lambda function. AWS Step Functions lets me turn a potentially complex flow into simple steps, helped by the visualization in Workflow Studio. Moreover, thanks to its direct integrations with other AWS services, I could build everything without coding knowledge. Adding Amazon Translate to an application is therefore very easy. Of course, as pointed out, I could hit some limits, but I have options to choose from.