Request Payload
This sub-page helps integrators understand the different request payloads, depending on whether they want to run an individual task or a combination of tasks in a specific sequence.
Request Payload for Individual Task
{
  "pipelineTasks": [
    {
      "taskType": "asr",
      "config": {
        "language": {
          "sourceLanguage": "xx"
        },
        "serviceId": "xxxxx--ssssss-d-ddd--dddd",
        "audioFormat": "wav",
        "samplingRate": 16000,
        "preProcessors": [
          "vad"
        ],
        "postProcessors": [
          "itn"
        ]
      }
    }
  ],
  "inputData": {
    "input": [
      {
        "source": null
      }
    ],
    "audio": [
      {
        "audioContent": "{{generated_base64_content}}"
      }
    ]
  }
}
This request payload contains two major parameters, listed below and detailed further down the section:
pipelineTasks
inputData
Parameter: pipelineTasks
Type: Array
This parameter takes an array of the tasks that the integrator wants to perform. Each task is a dictionary containing a taskType and a config.
In the above example, pipelineTasks contains only one dictionary (lines 3 to 19) because the integrator wants to perform only ASR.
The taskType parameter takes a String; for ASR, its value is asr.
The config parameter takes a Dictionary that contains the following parameters:
For ASR, the language parameter takes only sourceLanguage, which accepts the ISO-639 series code of the language.
The serviceId parameter is obtained from the Pipeline Config Call response as described here.
The audioFormat parameter accepts the format of the audio recorded by the application.
For Android, wav is preferred.
For iOS, wav or flac is preferred.
However, the server also accepts other well-known formats such as mp3.
The samplingRate parameter is the rate at which the application records the audio. The server accepts a minimum value of 8000 for samplingRate.
Parameter: inputData
The inputData parameter takes the actual input on which the task is to be performed. It accepts the input through either the input parameter or the audio parameter, depending on the task.
Since ASR is performed on audio data, the input parameter is optional and unused for ASR, while the audio parameter is mandatory.
The audio parameter takes an audioContent parameter, which accepts the base64-encoded string of the captured audio.
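As a minimal sketch, the ASR payload described above could be assembled in Python as follows. The function name build_asr_payload, the placeholder service ID, and the placeholder audio bytes are hypothetical; a real serviceId comes from the Pipeline Config Call response. The sketch leaves out the input parameter on the assumption that "optional" means it may be omitted entirely.

```python
import base64
import json


def build_asr_payload(audio_bytes, source_language, service_id,
                      audio_format="wav", sampling_rate=16000):
    """Build an ASR-only request payload shaped like the example on this page."""
    if sampling_rate < 8000:
        # The page states 8000 as the minimum accepted samplingRate.
        raise ValueError("samplingRate must be at least 8000")
    return {
        "pipelineTasks": [
            {
                "taskType": "asr",
                "config": {
                    "language": {"sourceLanguage": source_language},
                    "serviceId": service_id,
                    "audioFormat": audio_format,
                    "samplingRate": sampling_rate,
                },
            }
        ],
        "inputData": {
            # audioContent carries the recorded audio as a base64 string.
            "audio": [
                {"audioContent": base64.b64encode(audio_bytes).decode("ascii")}
            ]
        },
    }


# Placeholder values for illustration only.
payload = build_asr_payload(b"RIFF placeholder bytes", "hi", "placeholder-asr-service-id")
print(json.dumps(payload, indent=2))
```

In an application, audio_bytes would be the raw contents of the recorded wav or flac file.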
This request payload contains two major parameters, listed below and detailed further down the section:
pipelineTasks
inputData
Parameter: pipelineTasks
Type: Array
This parameter takes an array of the tasks that the integrator wants to perform. Each task is a dictionary containing a taskType and a config.
In the above example, pipelineTasks contains only one dictionary (lines 3-12) because the integrator wants to perform only Translation.
The taskType parameter takes a String; for Translation, its value is translation.
The config parameter takes a Dictionary that contains the following parameters:
For Translation, the language parameter takes both sourceLanguage and targetLanguage, each of which accepts the ISO-639 series code of the language.
The serviceId parameter is obtained from the Pipeline Config Call response as described here.
numTranslation is an optional parameter that enables the API to translate numerical data (digits) into the respective target language. This feature is currently enabled only for the ai4bharat/indictrans-v2-all-gpu--t4 service ID and for languages that use the Devanagari script. The default value is False.
Parameter: inputData
The inputData parameter takes the actual input on which the task is to be performed. It accepts the input through either the input parameter or the audio parameter, depending on the task.
Since Translation is performed on digital text, the input parameter is mandatory for Translation, while the audio parameter is optional and unused.
The input parameter takes a source parameter, which accepts a digital text string.
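A Translation-only payload could be put together along these lines. This is a sketch under assumptions: the function name and the placeholder service ID are hypothetical, and placing numTranslation inside config as a boolean is inferred from the description above rather than confirmed by an example.

```python
import json


def build_translation_payload(text, source_language, target_language,
                              service_id, num_translation=None):
    """Build a Translation-only request payload."""
    config = {
        "language": {
            "sourceLanguage": source_language,
            "targetLanguage": target_language,
        },
        "serviceId": service_id,
    }
    if num_translation is not None:
        # Assumption: numTranslation sits in config and takes a boolean.
        config["numTranslation"] = num_translation
    return {
        "pipelineTasks": [{"taskType": "translation", "config": config}],
        # Translation works on text, so only the input parameter is filled in.
        "inputData": {"input": [{"source": text}]},
    }


translation_payload = build_translation_payload(
    "मेरा नाम विहिर है", "hi", "mr", "placeholder-translation-service-id"
)
print(json.dumps(translation_payload, ensure_ascii=False, indent=2))
```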
This request payload contains two major parameters, listed below and detailed further down the section:
pipelineTasks
inputData
Parameter: pipelineTasks
Type: Array
This parameter takes an array of the tasks that the integrator wants to perform. Each task is a dictionary containing a taskType and a config.
In the above example, pipelineTasks contains only one dictionary (lines 3-12) because the integrator wants to perform only TTS.
The taskType parameter takes a String; for TTS, its value is tts.
The config parameter takes a Dictionary that contains the following parameters:
For TTS, the language parameter takes only sourceLanguage, which accepts the ISO-639 series code of the language.
The serviceId parameter is obtained from the Pipeline Config Call response as described here.
The gender parameter takes a string that is either:
male
female
It tells the server whether the integrator wants the generated speech in a male or a female voice.
The speed parameter takes a numeric input that controls how fast the synthesized voice speaks. Its range is 0.1 to 1.99.
Increasing the speed makes the speech sound quicker, which is useful for fast-paced content such as alerts or summaries.
Decreasing the speed makes the speech slower and more deliberate, which is ideal for accessibility or language learning.
The samplingRate parameter takes an integer that sets the number of audio samples per second in the generated speech, measured in hertz (Hz). It is a key parameter that affects both audio quality and file size.
Parameter: inputData
The inputData parameter takes the actual input on which the task is to be performed. It accepts the input through either the input parameter or the audio parameter, depending on the task.
Since TTS is performed on digital text, the input parameter is mandatory for TTS, while the audio parameter is optional and unused.
The input parameter takes a source parameter, which accepts a digital text string.
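The TTS parameters above can be sketched as a small Python builder that also enforces the stated constraints on gender and speed. The function name, the placeholder service ID, and the choice to treat speed and samplingRate as optional keys are assumptions made for illustration.

```python
ALLOWED_GENDERS = {"male", "female"}


def build_tts_payload(text, source_language, service_id, gender,
                      speed=None, sampling_rate=None):
    """Build a TTS-only request payload and check the documented constraints."""
    if gender not in ALLOWED_GENDERS:
        raise ValueError("gender must be 'male' or 'female'")
    config = {
        "language": {"sourceLanguage": source_language},
        "serviceId": service_id,
        "gender": gender,
    }
    if speed is not None:
        # The page gives 0.1 to 1.99 as the accepted range for speed.
        if not 0.1 <= speed <= 1.99:
            raise ValueError("speed must be between 0.1 and 1.99")
        config["speed"] = speed
    if sampling_rate is not None:
        config["samplingRate"] = sampling_rate
    return {
        "pipelineTasks": [{"taskType": "tts", "config": config}],
        "inputData": {"input": [{"source": text}]},
    }


tts_payload = build_tts_payload(
    "नमस्कार", "mr", "placeholder-tts-service-id", "female", speed=1.2
)
```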
Request Payload for Combination of Tasks in specific sequence
Parameter: pipelineTasks
Type: Array
This parameter takes an array of the tasks that the integrator wants to perform. Each task is a dictionary containing a taskType and a config.
In the above example, pipelineTasks contains two dictionaries:
Lines 3 to 13, i.e., the ASR dictionary
Lines 14 to 23, i.e., the Translation dictionary
because the integrator wants to perform ASR on the input voice, followed by Translation of the resulting digital text.
Line 7 and line 18 are connected: the text that ASR produces is the text that Translation consumes, so both lines refer to the same language. Consider the following use case:
The integrator wants to speak in, say, Hindi and see the translated output in Marathi. For this to happen, the integrator has to:
Convert the spoken audio to digital text, i.e., ASR of Hindi
Translate this Hindi digital text to Marathi digital text, i.e., Translation from Hindi to Marathi
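The Hindi-to-Marathi use case above can be sketched in Python by deriving both tasks from a single spoken-language value, so the ASR language and the Translation source language cannot drift apart. The function name and the service IDs are hypothetical placeholders; the audioFormat and samplingRate values are copied from the ASR example on this page.

```python
def build_asr_translation_tasks(spoken_language, target_language,
                                asr_service_id, translation_service_id):
    """Return the pipelineTasks list for ASR followed by Translation."""
    return [
        {
            "taskType": "asr",
            "config": {
                "language": {"sourceLanguage": spoken_language},
                "serviceId": asr_service_id,
                "audioFormat": "wav",
                "samplingRate": 16000,
            },
        },
        {
            "taskType": "translation",
            "config": {
                "language": {
                    # Same language as the ASR task: ASR output feeds Translation.
                    "sourceLanguage": spoken_language,
                    "targetLanguage": target_language,
                },
                "serviceId": translation_service_id,
            },
        },
    ]


# "hi" is Hindi and "mr" is Marathi.
asr_nmt_tasks = build_asr_translation_tasks(
    "hi", "mr", "placeholder-asr-service-id", "placeholder-translation-service-id"
)
```

Because ASR is the first task, the accompanying inputData would carry the audio parameter rather than input.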
Parameter: pipelineTasks
Type: Array
This parameter takes an array of the tasks that the integrator wants to perform. Each task is a dictionary containing a taskType and a config.
In the above example, pipelineTasks contains two dictionaries:
Lines 3 to 12, i.e., the Translation dictionary
Lines 13 to 22, i.e., the TTS dictionary
because the integrator wants to perform Translation of a digital text, followed by TTS.
Line 8 and line 17 are connected: the text that Translation produces is the text that TTS speaks, so both lines refer to the same language. Consider the following use case:
The integrator wants to translate from, say, Hindi to Marathi and hear the output in Marathi. For this to happen, the integrator has to:
Translate the Hindi digital text to Marathi digital text, i.e., Translation from Hindi to Marathi
Generate speech from this Marathi text, i.e., TTS of the Marathi digital text.
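A sketch of the Translation-then-TTS payload for this use case follows. The function name, the service IDs, and the sample text are hypothetical; the point illustrated is that the TTS sourceLanguage is taken from the Translation targetLanguage.

```python
def build_translation_tts_payload(text, source_language, target_language,
                                  translation_service_id, tts_service_id, gender):
    """Build a payload that translates text and then speaks the translation."""
    return {
        "pipelineTasks": [
            {
                "taskType": "translation",
                "config": {
                    "language": {
                        "sourceLanguage": source_language,
                        "targetLanguage": target_language,
                    },
                    "serviceId": translation_service_id,
                },
            },
            {
                "taskType": "tts",
                "config": {
                    # TTS speaks the translated text, so it uses the target language.
                    "language": {"sourceLanguage": target_language},
                    "serviceId": tts_service_id,
                    "gender": gender,
                },
            },
        ],
        # Translation is the first task, so the input is text.
        "inputData": {"input": [{"source": text}]},
    }


nmt_tts_payload = build_translation_tts_payload(
    "आज मौसम अच्छा है", "hi", "mr",
    "placeholder-translation-service-id", "placeholder-tts-service-id", "female",
)
```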
Parameter: pipelineTasks
Type: Array
This parameter takes an array of the tasks that the integrator wants to perform. Each task is a dictionary containing a taskType and a config.
In the above example, pipelineTasks contains three dictionaries:
Lines 3 to 13, i.e., the ASR dictionary
Lines 14 to 23, i.e., the Translation dictionary
Lines 24 to 33, i.e., the TTS dictionary
because the integrator wants to perform ASR on the voice input, then Translation of the resulting digital text, followed by TTS.
Line 7 is connected to line 18, and line 19 is connected to line 28: each task consumes the language that the previous task produces. Consider the following use case:
The integrator wants to speak in, say, Hindi and hear the translated output in Marathi. For this to happen, the integrator has to:
Convert the spoken audio to digital text, i.e., ASR of Hindi
Translate this Hindi digital text to Marathi digital text, i.e., Translation from Hindi to Marathi
Generate speech from this Marathi text, i.e., TTS of the Marathi digital text.
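The language connections between consecutive tasks can be checked on the client before a request is sent. The following is an illustrative sketch, not part of the API: the helper names and service IDs are hypothetical, and the rule it encodes (Translation hands on its targetLanguage, every other task hands on its sourceLanguage) is read off the use case above.

```python
def output_language(task):
    """Language of the text that a task hands to the next task."""
    language = task["config"]["language"]
    # Translation emits targetLanguage; ASR emits text in its sourceLanguage.
    return language.get("targetLanguage", language["sourceLanguage"])


def check_language_chain(tasks):
    """Raise ValueError if a task does not consume its predecessor's language."""
    for previous, current in zip(tasks, tasks[1:]):
        produced = output_language(previous)
        expected = current["config"]["language"]["sourceLanguage"]
        if produced != expected:
            raise ValueError(
                f"{previous['taskType']} produces {produced!r} "
                f"but {current['taskType']} expects {expected!r}"
            )


speech_to_speech_tasks = [
    {"taskType": "asr",
     "config": {"language": {"sourceLanguage": "hi"},
                "serviceId": "placeholder-asr-service-id",
                "audioFormat": "wav", "samplingRate": 16000}},
    {"taskType": "translation",
     "config": {"language": {"sourceLanguage": "hi", "targetLanguage": "mr"},
                "serviceId": "placeholder-translation-service-id"}},
    {"taskType": "tts",
     "config": {"language": {"sourceLanguage": "mr"},
                "serviceId": "placeholder-tts-service-id", "gender": "female"}},
]

# Passes silently: hi -> hi, then mr -> mr.
check_language_chain(speech_to_speech_tasks)
```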
Pre-Processors and Post-Processors within Compute Request
In Automatic Speech Recognition (ASR) systems, preprocessors and postprocessors play a crucial role in refining the audio input and enhancing the textual output, respectively. Below, we provide details on the available preprocessors and postprocessors, along with an example of how to configure them in your request body.
Preprocessors
Voice Activity Detection (VAD)
Syntax:
"preProcessors": ["vad"]
Function: VAD allows audio content longer than 30 seconds to be passed and processed. It helps identify voice activity to ensure that only the detected voice activity is processed, reducing the load and improving the efficiency of the ASR system.
Denoiser
Syntax:
"preProcessors": ["denoiser"]
Function: Denoiser helps in improving the accuracy of speech recognition by reducing background noise from audio inputs.
Postprocessors
Hotwords
Syntax:
"postProcessors": [{"hotword_list":["पत्रिका"]}]
Function: The hotword postprocessor lets users supply a list of keywords or phrases that the system should recognize with higher priority or accuracy, which helps improve ASR performance. This feature is applicable only to Hindi and only to the service ID "bhashini/ai4bharat/conformer-multilingual-asr".
Example: In a Hindi news broadcast, words like "पत्रिका" (Magazine) are mentioned frequently. Adding them as hotwords ensures that they are transcribed correctly rather than being replaced by phonetically similar but incorrect words.
Inverse Text Normalization (ITN)
Syntax:
"postProcessors": ["itn"]
Function: ITN converts spoken numbers and dates into their written forms. For example, the ASR would output "two thousand and twenty three" as "2023".
Punctuation
Syntax:
"postProcessors": ["punctuation"]
Function: This postprocessor adds punctuation to the ASR output, making the text more readable and closer to natural written language.
Example:
ASR Output: "hello how are you"
Punctuation Output: "Hello, how are you?"
The configuration of preprocessors and postprocessors can be included within the config section of the request body as shown below:
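As a minimal sketch, an ASR config carrying preprocessors and postprocessors could be assembled in Python like this. The function name and the first service ID are hypothetical. The first example mirrors the vad and itn combination from the ASR payload at the top of this page; whether several processors, or hotwords together with other postprocessors, may share one list is not confirmed here, so the hotword example is kept separate.

```python
def asr_config_with_processors(source_language, service_id,
                               pre_processors=(), post_processors=(),
                               hotwords=None):
    """Build an ASR config dictionary with optional processors."""
    config = {
        "language": {"sourceLanguage": source_language},
        "serviceId": service_id,
        "audioFormat": "wav",
        "samplingRate": 16000,
    }
    if pre_processors:
        config["preProcessors"] = list(pre_processors)
    post = list(post_processors)
    if hotwords:
        # Hotwords are passed as a dictionary entry, per the syntax above.
        post.append({"hotword_list": list(hotwords)})
    if post:
        config["postProcessors"] = post
    return config


# VAD before recognition, ITN after it.
vad_itn_config = asr_config_with_processors(
    "hi", "placeholder-asr-service-id",
    pre_processors=["vad"], post_processors=["itn"],
)

# Hotwords apply only to Hindi with this service ID, as stated above.
hotword_config = asr_config_with_processors(
    "hi", "bhashini/ai4bharat/conformer-multilingual-asr",
    hotwords=["पत्रिका"],
)
```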
In translation (NMT) systems, postprocessors play a crucial role in refining the textual output to meet specific needs. Below, we provide details on the postprocessor available for translation, along with an example of how to configure it in your request body.
Postprocessors
Glossary
Syntax:
"postProcessors": ["glossary"]
Function: The glossary postprocessor allows users to create a list of glossary terms within Bhashini Udyat under the My Profile Section once logged in. Glossary terms created are unique for each Bhashini Inference API Key generated under app names. This postprocessor ensures that specific nouns and noun phrases have their translations overridden as per the user's glossary.
Example:
Default Translation: "Digital India Bhashini Division" is translated to "डिजिटल इंडिया भैसिनी प्रभाग".
With Glossary Term: If the glossary term between English and Hindi is entered as "डिजिटल इंडिया भाषिणी डिवीज़न", this will override the default translation.
Link to Access My Profile Page and Generate Keys and Glossary: Bhashini Udyat Profile Page
Glossary Terms Usage: Glossary terms help provide customized solutions for domain-specific translations, ensuring accuracy and context relevance in the translated output.
Example of Glossary Usage:
Case sensitivity handling (Ex: Glossary entry - English to Hindi as IPO -> आईपीओ).
Glossary entries will work by default for:
Entered noun/noun phrase (e.g., IPO)
Capitalized case (Ipo)
Lower case (ipo)
Upper case (IPO)
Reverse case (if आईपीओ is the source, the target is IPO when translating from Hindi to English).
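To make the case handling concrete, the short Python sketch below lists the source-side forms that a single glossary entry would cover according to the list above. It only illustrates the rule on the client side; the function name is hypothetical and the code does not reproduce how the service itself matches terms.

```python
def glossary_case_variants(term):
    """Return the case variants of a glossary term, without duplicates."""
    variants = [term, term.capitalize(), term.lower(), term.upper()]
    # dict.fromkeys keeps the first occurrence of each variant, in order.
    return list(dict.fromkeys(variants))


print(glossary_case_variants("IPO"))  # ['IPO', 'Ipo', 'ipo']
```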
Configuration Example
The configuration of the glossary postprocessor can be included within the config section of the request body as shown below:
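A sketch of a Translation request with the glossary postprocessor enabled is given below in Python. The service ID is a hypothetical placeholder, and the glossary terms themselves are assumed to have been created beforehand in the My Profile section for the API key in use.

```python
import json

glossary_task = {
    "taskType": "translation",
    "config": {
        "language": {"sourceLanguage": "en", "targetLanguage": "hi"},
        "serviceId": "placeholder-translation-service-id",
        # Enables overriding of translations with the user's glossary terms.
        "postProcessors": ["glossary"],
    },
}

glossary_payload = {
    "pipelineTasks": [glossary_task],
    "inputData": {"input": [{"source": "Digital India Bhashini Division"}]},
}

print(json.dumps(glossary_payload, ensure_ascii=False, indent=2))
```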
In Text to Speech (TTS) systems, preprocessors and postprocessors play a crucial role in preparing the input text and refining the audio output, respectively. Below, we provide details on the available preprocessor and postprocessors, along with the format for configuring them in your request body.
Preprocessors
Text Normalization (TN)
Syntax:
"preProcessors": ["text-normalization"]
Function: It converts numbers and dates into their word forms. For example, the TTS would read "2025" as "two thousand twenty five".
Postprocessors
High Compression
Syntax:
"postProcessors": ["high-compression"]
Function: This helps minimize audio file size during download without compromising quality, making it suitable for low-bandwidth network environments and for applications where storage is a primary concern. It also speeds up transmission and playback by reducing latency. It produces 64 kbps audio.
Low Compression
Syntax:
"postProcessors": ["low-compression"]
Function: This helps minimize audio file size during download without a significant loss in audio quality, making it suitable for low-bandwidth network environments. It also speeds up transmission and playback by reducing latency. It produces 128 kbps audio.
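The TTS processors above could be attached to a config as in the Python sketch below. The function name and the service ID are hypothetical, and allowing only one compression postprocessor per request is an assumption, made because the two options produce different bitrates.

```python
COMPRESSION_POSTPROCESSORS = {"high-compression", "low-compression"}


def tts_config_with_processors(source_language, service_id, gender,
                               normalize_text=False, compression=None):
    """Build a TTS config dictionary with optional processors."""
    config = {
        "language": {"sourceLanguage": source_language},
        "serviceId": service_id,
        "gender": gender,
    }
    if normalize_text:
        # Reads numbers and dates out as words.
        config["preProcessors"] = ["text-normalization"]
    if compression is not None:
        if compression not in COMPRESSION_POSTPROCESSORS:
            raise ValueError("compression must be 'high-compression' or 'low-compression'")
        # high-compression gives 64 kbps audio, low-compression 128 kbps.
        config["postProcessors"] = [compression]
    return config


tts_config = tts_config_with_processors(
    "hi", "placeholder-tts-service-id", "female",
    normalize_text=True, compression="high-compression",
)
```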