
Pass audio file to text to speech engine?

Discussion in 'Android Development' started by RoadhammerGaming, Sep 6, 2017.

  1. RoadhammerGaming

    Thread Starter
    Hello, in my app I want to record the user's speech, run it through a band-pass filter, then pass the resulting audio file (PCM/WAV) to the text-to-speech engine to speak the filtered result. I have everything working except that I cannot find a way to pass an audio file to the TTS engine. I have googled this for a long time now (two weeks) with no luck. Is there any workaround for achieving this?
    What I tried was calling the RecognizerIntent and then starting the band-pass filter via recording; I also tried the other way around, starting the band-pass method first and then calling the recognizer intent. Either way kills the TTS instance, even though it's running on a separate thread. I have also tested this using the normal TTS procedure in the recognizer intent as well as the web-search version of the recognizer intent, both with the same results. If I don't implement the band-pass filter (note that a recording thread is started at this time) it works fine, but as soon as I implement the band-pass filter it fails, with a helpful message in web-search mode that says "Google is unavailable". Here's my current code:

    RecognizerIntent, normal version:
    Code (Java):
    public void getMic() { // bring up the "Speak now" message window
        tts = new TextToSpeech(this, new TextToSpeech.OnInitListener() {
            @Override
            public void onInit(int status) {
                if (status == TextToSpeech.SUCCESS) {
                    result = tts.setLanguage(Locale.US);
                    if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                        l = new Intent();
                        l.setAction(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
                        startActivity(l);
                    }
                }
            }
        });
        k = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        k.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        k.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());
        k.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say something");
        try {
            startActivityForResult(k, 400);
        } catch (ActivityNotFoundException a) {
            Log.i("CrowdSpeech", "Your device doesn't support Speech Recognition");
        }
        if (crowdFilter && running == 4) {
            try {
                startRecording();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
        }
    }
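    For reference, the recognized text from a RecognizerIntent normally arrives in onActivityResult. A minimal sketch (not from the original post) of speaking the top result through the already-initialized tts field, assuming the request code 400 used above and API 21+ for the four-argument speak(); "crowdSpeech" is just an arbitrary utterance id:

    ```java
    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == 400 && resultCode == RESULT_OK && data != null) {
            // The recognizer returns a list of candidate transcriptions, best first
            ArrayList<String> matches = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            if (matches != null && !matches.isEmpty()) {
                // Speak the top candidate, flushing anything queued before it
                tts.speak(matches.get(0), TextToSpeech.QUEUE_FLUSH, null, "crowdSpeech");
            }
        }
    }
    ```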
    Recognizer intent web search version:
    Code (Java):
    public void getWeb() { // search the web from voice input
        k = new Intent(RecognizerIntent.ACTION_WEB_SEARCH);
        k.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        k.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());
        k.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say something");
        try {
            startActivityForResult(k, 400);
        } catch (ActivityNotFoundException a) {
            Log.i("CrowdSpeech", "Your device doesn't support Speech Recognition");
        }
        if (crowdFilter && running == 4) {
            try {
                startRecording();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
        }
    }
    And the startRecording method that applies the bandpass filter:
    Code (Java):
    private void startRecording() throws FileNotFoundException {

        if (running == 4) { // record from mic, apply band-pass filter and save as a WAV file using the TarsosDSP library
            dispatcher = AudioDispatcherFactory.fromDefaultMicrophone(RECORDER_SAMPLERATE, bufferSize, 0);
            AudioProcessor p = new BandPass(freqChange, tollerance, RECORDER_SAMPLERATE);
            dispatcher.addAudioProcessor(p);
            isRecording = true;
            // Output
            File f = new File(myFilename.toString() + "/Filtered result.wav");
            RandomAccessFile outputFile = new RandomAccessFile(f, "rw");
            TarsosDSPAudioFormat outputFormat = new TarsosDSPAudioFormat(44100, 16, 1, true, true);
            WriterProcessor writer = new WriterProcessor(outputFormat, outputFile);
            dispatcher.addAudioProcessor(writer);
            recordingThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    dispatcher.run();
                }
            }, "Crowd_Speech Thread");
            recordingThread.start();
        }
    }
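    As a side note on the BandPass arguments above: TarsosDSP's BandPass(centerFrequency, bandWidth, sampleRate) keeps energy roughly within centerFrequency ± bandWidth/2. A tiny plain-Java sketch of that idealized pass-band check (inPassBand is a hypothetical helper, not part of TarsosDSP), using the common 300–3400 Hz telephone speech band as an example:

    ```java
    public class PassBand {
        // Hypothetical helper: true if frequency f (Hz) lies inside the idealized
        // pass band of a band-pass filter centered at center Hz with a total
        // width of bandWidth Hz.
        public static boolean inPassBand(double f, double center, double bandWidth) {
            return f >= center - bandWidth / 2 && f <= center + bandWidth / 2;
        }

        public static void main(String[] args) {
            // Telephone-quality speech sits roughly in 300-3400 Hz, i.e. a filter
            // centered at 1850 Hz with a 3100 Hz band width.
            System.out.println(inPassBand(1000, 1850, 3100)); // true: inside the speech band
            System.out.println(inPassBand(8000, 1850, 3100)); // false: well above it
        }
    }
    ```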
    The only reason I'm doing it this way is the hope that, by applying the filter, the TTS engine will receive the modified audio. The filtered audio is also saved to a file, because originally I wanted to just pass the file to TTS to read after recording. Is there any way to accomplish this?

    Another thing I'm wondering: is there any way, inside my project, to modify the source code of the library that the recognizer intent references, so that I can add a parameter to get audio from a file?

    EDIT: 9/8/17
    Getting closer to an answer: I dug deeper and found that Google expects a FLAC file instead of a WAV file for translating speech into text, so I imported two new libraries, AndroidAudioConverter and FFmpegAndroid, via the app-level build.gradle:
    Code (Java):
    dependencies {
        // other compile dependencies

        compile 'com.github.adrielcafe:AndroidAudioConverter:0.0.8'
        compile 'com.writingminds:FFmpegAndroid:0.3.2'
    }
    repositories {
        maven {
            url "https://jitpack.io"
        }
    }
    I then used a GoogleResponse class I found online, along with another recognizer class, to convert the WAV file to a FLAC file and submit it to Google. Now I'm trying to find out how to get the response text and send it to be spoken. So much bouncing around, plus all the unused/unneeded (in my app's case) methods in the recognizer class, is totally confusing me!
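    For the WAV-to-FLAC step, AndroidAudioConverter itself exposes a small callback API. The sketch below is based on my reading of that library's README for 0.0.8, not on the original post, so treat the exact class and method names as an assumption to verify; wavFile here is an illustrative name pointing at the filtered recording:

    ```java
    // One-time FFmpeg load, e.g. in Application.onCreate() (per the library README):
    AndroidAudioConverter.load(this, new ILoadCallback() {
        @Override
        public void onSuccess() { /* FFmpeg binary loaded, conversion is available */ }
        @Override
        public void onFailure(Exception error) { /* device architecture not supported */ }
    });

    // Later, convert the filtered WAV to FLAC before submitting it:
    File wavFile = new File(myFilename.toString() + "/Filtered result.wav");
    AndroidAudioConverter.with(this)
            .setFile(wavFile)
            .setFormat(AudioFormat.FLAC) // target format
            .setCallback(new IConvertCallback() {
                @Override
                public void onSuccess(File convertedFile) {
                    // convertedFile is the FLAC, ready to send to the recognizer
                }
                @Override
                public void onFailure(Exception error) {
                    error.printStackTrace();
                }
            })
            .convert();
    ```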
     


    #1 RoadhammerGaming, Sep 6, 2017
    Last edited: Sep 8, 2017
