Speech-to-Text

As technology modernizes, almost every human-to-system touchpoint has become more natural and intuitive. Speech-to-text conversion has been around for a while, but it has recently been made more accessible by the large cloud platforms such as Microsoft Azure, AWS and Google.

Speech recognition has become seamless: transcription is faster, more accurate, more affordable and easier to incorporate into existing systems. The cloud platforms each provide slightly different features to their customers, and some offer industry-specific or business-specific features such as medical transcription and noise cancellation, but they all cover general speech-to-text conversion. In this article, we will discuss Microsoft Azure's service in detail, as we have recently used it on a live client project.

Case Study - Practitioner Notes

We recently added a “speech to text” feature for a client in the health care services space, so that patient treatment notes can be stored against a booking. The few minutes between treatments are crucial for practitioners, and typing up and submitting all their notes at the end of a treatment does not fit comfortably into that window, yet the notes are essential for audit purposes. The Speech-to-Text service made note-taking consistently achievable and ultimately made the practitioner’s life easier. Azure’s service detects the pauses between words and sentences, converts each segment of speech to text and responds within milliseconds. Microsoft Azure provides an API (application programming interface) that can be integrated with web applications, and an SDK (software development kit) for many popular programming languages.
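As a rough illustration of the pause-based behaviour described above, the sketch below uses the Azure Speech SDK for Python (the `azure-cognitiveservices-speech` package). It is not the code from the client project. `NoteBuffer` and `dictate_note` are hypothetical names, the key and region are placeholders you would supply, and listening for a fixed number of seconds is a simplification; a real application would more likely stop when the practitioner presses a button.

```python
from dataclasses import dataclass, field


@dataclass
class NoteBuffer:
    """Collects finalized phrases into one treatment note (hypothetical helper)."""

    phrases: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        # Ignore empty results, which can occur when only silence was captured.
        cleaned = text.strip()
        if cleaned:
            self.phrases.append(cleaned)

    def as_note(self) -> str:
        return " ".join(self.phrases)


def dictate_note(key: str, region: str, seconds: float = 30.0) -> str:
    """Listen on the default microphone and return the transcribed note."""
    # Imported here so NoteBuffer can be used without the SDK installed.
    import time
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    buffer = NoteBuffer()
    # The "recognized" event fires once per finalized phrase,
    # that is, after the service detects a pause in the speech.
    recognizer.recognized.connect(lambda evt: buffer.add(evt.result.text))

    recognizer.start_continuous_recognition()
    try:
        time.sleep(seconds)  # assumption: fixed listening window
    finally:
        recognizer.stop_continuous_recognition()
    return buffer.as_note()
```

Calling `dictate_note("<your-key>", "<your-region>")` would return the dictated text as a single string, which the application could then save against the booking.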

Cost Benefits

On all major cloud platforms, you pay only for the service time you consume. There is no upfront or termination cost involved. Businesses that need to handle large volumes of data can use this service and scale up easily.

Reports

Azure currently doesn’t provide a report detailed enough to review usage of this service under specific measures (e.g. clinic level, practitioner level), but we were able to store the usage data and specific usage metrics ourselves and report on them using tailor-made scripts.
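A tailor-made usage report of this kind can be quite small. The sketch below is an illustration only, not the script used on the project: it assumes the application stores one record per dictation with the clinic, the practitioner and the audio duration, and all field names and sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One stored dictation (hypothetical schema)."""

    clinic: str
    practitioner: str
    audio_seconds: float


def usage_by(records, field_name):
    """Total the audio seconds for each value of the given field."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, field_name)] += record.audio_seconds
    return dict(totals)


# Made-up sample data, purely for illustration.
records = [
    UsageRecord("North", "Dr A", 90.0),
    UsageRecord("North", "Dr B", 60.0),
    UsageRecord("South", "Dr A", 45.0),
]

for clinic, seconds in sorted(usage_by(records, "clinic").items()):
    print(f"{clinic}: {seconds / 60:.1f} minutes")
```

The same function reports at practitioner level with `usage_by(records, "practitioner")`, and because billing is based on service time, the totals can also be used to estimate cost per clinic.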

Security

This Speech-to-Text service is covered by certifications and compliance standards including SOC, FedRAMP, PCI DSS, HIPAA, HITECH and ISO. The audio input and transcription data aren’t logged during audio processing, and the data is encrypted while it’s in storage. The service offers enterprise-grade security, availability, compliance and manageability.

About the Author

Prakash Annamalai is a Senior Solution Architect and Technical Project Manager at Intergy Consulting, with more than a decade of experience in designing and delivering complex business systems.

If you wish to talk to Prakash Annamalai or one of our other consultants to explore how Speech-to-Text can help your business, please contact us today!