
AWS Transcribe and demo
We're going to talk a little bit about AWS Transcribe now. What is AWS Transcribe?
Amazon Transcribe makes it easy for developers to add speech-to-text capabilities to their applications. Audio data is virtually impossible for computers to search and analyze, so recorded speech needs to be converted to text before it can be used in applications. Historically, to accomplish this, customers had to work with transcription providers that required them to sign expensive contracts and were hard to integrate into their technology stacks. Many of these providers use outdated technology that does not adapt well to different scenarios, like the low-fidelity phone audio common in contact centers, which results in poor accuracy.
You can take a look at it here.

In simple terms, if we want to do speech-to-text recognition, we can do it easily with AWS Transcribe. There are a few things to keep in mind. First, the file can be video or audio only. Second, the file format must be one that AWS supports.
Some of the services and requirements needed:
- AWS S3
- AWS Transcribe
- Video/audio files (in the formats AWS supports)
It should be noted that good audio will produce good recognition as well. So the final result depends on the quality of the audio and how clearly the speech is articulated.
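By the way, if you prefer doing all of this from code instead of the console, these pieces map to two boto3 clients. This is just a rough sketch, assuming Python with boto3 installed and AWS credentials already configured; the region here matches the Sydney region I use later.

```python
import boto3

REGION = "ap-southeast-2"  # Sydney, the region used in this walkthrough

# The two services we need: S3 for storing the files, Transcribe for the jobs
s3 = boto3.client("s3", region_name=REGION)
transcribe = boto3.client("transcribe", region_name=REGION)
```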
Here I deliberately use a speech by the Indonesian Minister of Education, Nadiem Makarim.
Because it is a state speech, the result should be good. Note that this video is in Indonesian, since English is too common. It also demonstrates that AWS Transcribe already supports Indonesian.

Alright, after downloading the video, we will create an S3 bucket first.
The first step is to prepare a bucket name for the input. Also pay attention to the selected region, and make sure AWS Transcribe is available in that region. For example, here I will use the Sydney region, which is not too far from Indonesia.
Please remember the name of the bucket, because we will need it when we set up AWS Transcribe.
After that, we can upload the video that was downloaded earlier.
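For reference, the same console steps look roughly like this with boto3. The bucket name and file name below are only placeholders, not the actual ones I used.

```python
import boto3

REGION = "ap-southeast-2"  # Sydney
INPUT_BUCKET = "my-transcribe-input-demo"  # placeholder; bucket names must be globally unique

s3 = boto3.client("s3", region_name=REGION)

# Create the input bucket in the chosen region
# (outside us-east-1 you must pass a LocationConstraint)
s3.create_bucket(
    Bucket=INPUT_BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# Upload the video that was downloaded earlier
s3.upload_file("nadiem-makarim-speech.mp4", INPUT_BUCKET, "nadiem-makarim-speech.mp4")
```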

The next part is optional, because with AWS Transcribe we can choose whether or not to save the results ourselves. If we want to save the result, we have to provide one more bucket for the output. In this case, the output is a JSON file.
On the other hand, if we don't save the results, AWS will store them for us under AWS Transcribe's transcription jobs. Note that this output is not permanent; if I'm not mistaken, AWS only keeps it for 90 days. That method is fine for testing, but if we want to use it in production, we have to save the result, right?
Alright, we're going to make an alternate ending for this.
For now, the assumption is that we will create a bucket for the output. After everything is done, we can move on to the AWS Transcribe section.
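Assuming we go with saving the output, the extra bucket is created the same way. Again, the name is just a placeholder.

```python
import boto3

REGION = "ap-southeast-2"
OUTPUT_BUCKET = "my-transcribe-output-demo"  # placeholder name

s3 = boto3.client("s3", region_name=REGION)

# One extra bucket to hold the JSON result that Transcribe will write for us
s3.create_bucket(
    Bucket=OUTPUT_BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)
```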

Quite confusing, lots of buttons here. OK, let's keep it simple. We go straight to the transcription jobs, then we create a job name. The job name here functions like an ID, an identifier that indicates we have created a transcription job.
Fill in the required data.
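If you'd rather do it from code, filling in that form corresponds roughly to a single StartTranscriptionJob call. The job name, bucket names, and file key below are placeholders; id-ID is the language code for Indonesian.

```python
import boto3

transcribe = boto3.client("transcribe", region_name="ap-southeast-2")

transcribe.start_transcription_job(
    TranscriptionJobName="nadiem-speech-demo",  # placeholder job name (acts as the ID)
    LanguageCode="id-ID",                       # Indonesian
    MediaFormat="mp4",
    Media={"MediaFileUri": "s3://my-transcribe-input-demo/nadiem-makarim-speech.mp4"},
    OutputBucketName="my-transcribe-output-demo",  # omit this to let AWS keep the result itself
)
```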




We will see the results.
We return to S3, then download the JSON file and open it.
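Here is a small sketch of that step with boto3, using the same placeholder names. By default Transcribe writes the result to the output bucket as <job-name>.json, and the full text sits under results.transcripts.

```python
import json
import boto3

s3 = boto3.client("s3", region_name="ap-southeast-2")

# By default the result is written as <job-name>.json in the output bucket
s3.download_file("my-transcribe-output-demo", "nadiem-speech-demo.json", "result.json")

with open("result.json") as f:
    data = json.load(f)

# The full transcript text lives under results.transcripts
print(data["results"]["transcripts"][0]["transcript"])
```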

We can see here that the results are very good. All the articulations are read well.

Alternate ending.
What if we don't use the output bucket?

Then we'll see what the results are.
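In that case the transcript stays with the service itself, and GetTranscriptionJob returns a temporary pre-signed URL for it. A rough sketch, with the same placeholder job name:

```python
import json
import urllib.request
import boto3

transcribe = boto3.client("transcribe", region_name="ap-southeast-2")

job = transcribe.get_transcription_job(TranscriptionJobName="nadiem-speech-demo")
status = job["TranscriptionJob"]["TranscriptionJobStatus"]

if status == "COMPLETED":
    # With no output bucket, this URI is a pre-signed URL into an AWS-managed bucket
    uri = job["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]
    with urllib.request.urlopen(uri) as resp:
        data = json.loads(resp.read())
    print(data["results"]["transcripts"][0]["transcript"])
```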

Post-credits scene.
