China Android Cellphones Stats

Baidu's Voice Recognition Software Is More Accurate Than Typing (thestack.com) 55

The massive Chinese web services company Baidu has launched their sophisticated new TalkType 'keyboard', which defaults to voice recognition. An anonymous reader quotes The Stack: Baidu claims that the app's speech recognition is more accurate than actual typing, having developed and tested the technology alongside speech software experts at Stanford University... The researchers concluded that Baidu's technology was three times faster than a typical user typing in English. The results showed that the TalkType error rate was 20.4% lower than an English texter hunting and tapping for letters. The accuracy was even greater for those typing in Mandarin, with the error rate dropping 63.4% when using TalkType.
Of course, last year Baidu was also accused of gaming the testing for their image-recognition software.

  • by Carewolf ( 581105 ) on Saturday October 08, 2016 @05:41PM (#53038929) Homepage

    That is like the test where someone claimed they defeated the Turing test by pretending to be a retarded foreign boy that didn't speak English.

    I guess they also only picked people who had never before typed anything in their life as well.

    • by fyngyrz ( 762201 ) on Saturday October 08, 2016 @06:13PM (#53039077) Homepage Journal

      Another thing -- when I'm typing, and there is an error, I'm right there to correct it.

      With voice recog, at least right now, editing it after it's been screwed up by Google or whatever is more of a PITA than just typing it out in the first place.

      Trying to actually do decent editing (at least on my S7) is seriously annoying. Cursor positioning is flaky as hell, parts of messages disappear above and below the edit point, I try to drag the edit point and it scrolls up or down so fast there's no chance of actually getting where I meant to go...

      I grant you that this kind of thing is the result of bad design at some level in Android or some library most everyone is using, and could be corrected... but right now, it's SN/AFU. That's a big factor in why editing as I go, rather than trying to get "somewhere" in something already containing lots of text, is much easier on my temper.

      That said, I would welcome 99.99999% accurate voice recog. Not holding my breath, though.

      • Not just that, but it will very frequently give you a bunch of different choices, all of which change a dozen-word phrase instead of the single word you're trying to correct. It's absolutely and totally unusable, and a moronic design that's the result of trying to be far too clever.
      • Have you tried Gravity Box? It can activate a pair of arrow keys. Needs root.
        That's it, unfortunately: if you want to have some arrow keys to position the edit point more conveniently, you have to root your phone!
      • A character error is easy to read past be a word error changes the meeting.

        • by fyngyrz ( 762201 )

          a word error changes the meeting.

          I complete agree: a turd error changes the meating. [send] (goddammit)

    • I just assume that they typed on a phone and not on a proper keyboard. Otherwise it does not make any sense.
    • by Mondor ( 704672 )

      Not really. The key is - (faster than) "English texter hunting and tapping for Mandarin letters".

  • by localroger ( 258128 ) on Saturday October 08, 2016 @05:56PM (#53038987) Homepage
    On a full English language keboard there is no way speech is faster if you know how to type. Now if you don't know how to type or you're using a touch screen, then yeah. Maybe if you're using Mandarin because it's not as straightforward as the Roman alphabet. But no, I can type considerably faster than I can talk and almost as fast as I can read, which is well over 100 wpm, and with a display and backspace key (since I'm human) my ultimate accuracy is 100%.
    • You're right, texting isn't as fast or accurate as typing, but I think you got the numbers wrong.

      Near the turn of the millennium, speech recognition software (ViaVoice, etc.) achieved a claimed 99% accuracy. So I tried it out. After training, I got over 95% by speaking carefully (and slowly). The problem was that finding and fixing those 05% mistakes took longer than typing the whole document over would have taken.
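To put rough numbers on that trade-off, here is a minimal sketch of effective speed once correction time is counted. Only the 95% accuracy and the 35 wpm typing figure come from this comment; the 100 wpm dictation rate, the 30 seconds per fix, and the function name `effective_wpm` are assumptions chosen purely for illustration.

```python
def effective_wpm(raw_wpm, accuracy, seconds_per_fix):
    """Words per minute once time spent fixing wrong words is included."""
    minutes_per_word = 1.0 / raw_wpm
    # Expected correction time per word: error probability times cost of one fix.
    fix_minutes_per_word = (1.0 - accuracy) * seconds_per_fix / 60.0
    return 1.0 / (minutes_per_word + fix_minutes_per_word)


# Assumed: careful dictation at 100 wpm, 95% accuracy, 30 s to find and fix each error.
dictation = effective_wpm(100, 0.95, 30)

# Assumed: typing at 35 wpm with errors already corrected on the fly.
typing = effective_wpm(35, 1.00, 0)

print(f"dictation: {dictation:.1f} wpm, typing: {typing:.1f} wpm")
```

Under these assumed figures the after-the-fact fixes pull dictation below plain typing, but the outcome is very sensitive to how long each fix takes.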

      And yeah, most touch typists can't get more than 35 wpm and touch screens are worse, so the dec

      • The keyboard layout of a modern computer / laptop is based on the typewriter key layout. The interesting thing is that the layout was deliberately crafted to *slow down* typing speed, as the typists of the day (a hundred years ago) would type faster than the machines could render the text, leading to jamming. So, not only do I not believe that any voice recognition software can keep up with a touch typist, it's probable that with different key layouts (which exist, I know, but no-one actually uses them) a touch
        • Not sure where you got your incorrect information from, but the current keyboard was designed for maximum speed of the day, not to slow down typing speeds.
          While it is true that there are keyboard layouts that can make typing faster on a computer, the current keyboard was designed to space out the hammers so they would not jam on typewriters thus increasing the speed people could type.
      • For that 05%, I think it would be good to use the best of both worlds. That is, the software will produce the written text while allowing you to edit manually using the keyboard. You may mark some words using a command word like 'fix-it'. Example: you say 'The thickness needs to be reduced'... suppose the s/w produced 'The sickness needs to be reduced'... you see it on screen... then you say 'fix-it thickness', so it will highlight the word 'sickness', which you can come back to later and manually change to 'thickness'. [th
    • On a full English language keboard

      Heh.

      there is no way speech is faster if you know how to type.

      Nah, that's just not true. Most professional typists don't exceed 100wpm, while the average person talks at 130-150wpm.

      If typing was so much faster than speaking, they wouldn't do live subtitles by having someone repeat the words into a mic for speech recognition. Which is what they do, with occasionally hilarious results.

      • by xvan ( 2935999 )
        As far as I know, live activities use stenotypers. Maybe they started using speech recognition for live captioning because it's cheaper, but it's the first time I've heard about it.
        • by TheSync ( 5291 )

          "Revoicing" is becoming more popular for live TV captioning. Revoicers, also known as respeakers, repeat clearly what is being said during unscripted events using special software that's trained to recognise their voice. Their speech is then converted into text which appears on a caption unit, an LED or large screen. Revoicers also need to pare down (edit) the live dialogue or conversation, which means the text that appears isn't verbatim, although it will always give a good idea of what's being said.

    • by maynard ( 3337 )
      Speaking is not writing. It impacts quality of prose. Though it might be a good way to bang out an initial rough, I'd still want to fine tune phrasing and word choice by keyboard.
    • and with a display and backspace key (since I'm human) my ultimate accuracy is 100%.

      That applies to every form of input. Texting only has a low accuracy rate because people need to make corrections, kind of like you are doing which makes you 100%.

      Also you typing 100wpm is atypical. Most people don't type that fast. No actually that's not right. I'll wager that very very few people are able to type that fast.

    • by chill ( 34294 )

      Then you have a major speech impediment and should probably see a therapist for it.

      Using your post as a sample, I am able to read it aloud in 22 seconds at a conversational rate. This is the same rate I use reading stories aloud to my children. Using my slower, more enunciated "speech recognition" voice, usually reserved for Google input, it takes me 37 seconds and the only thing I had to correct afterwards was the ( and ) you used. That includes all of your punctuation and the automatic correction of "kebo

    • by nbauman ( 624611 )

      On a full English language keboard there is no way speech is faster if you know how to type.

      How fast do you type?

      I've transcribed hundreds of hours of tapes, mostly lectures and panel discussions. I tested ~72 wpm. I spent a lot of time perfecting my typing methods and speed.

      I estimated that most lectures were about 120 wpm. Some people talk much faster, particularly in bursts. I think certified courtroom stenographers have to pass a test at 210 wpm.

      I could never keep up with continuous speech. I used a transcribing machine, and played it back at a slower speed, and/or backpedaled. I could usually

  • by Snotnose ( 212196 ) on Saturday October 08, 2016 @05:59PM (#53039001)
    When I got my Trash 80 back in the 70s the first program I bought was a typing tutor. I've been touch typing for some 40 years and my fingers don't have any trouble keeping up with my brain. My mouth, not so much.
  • Comment removed based on user account deletion
    • The article talks about speech recognition, not voice recognition. EditorDavid has the two concepts mixed up: speech recognition is all about trying to recognize what you are saying, whereas voice recognition is all about recognizing a specific voice, e.g. for identifying who is speaking.

      [actual expert here]

      Not exactly: "speech recognition" means taking in speech and putting out some kind of text; "speaker recognition" is a general term for identifying speakers or verifying speaker identity. "Voice recognition" is a term that is not used in the field (but is sometimes used in the media) which generally means the same thing as "speech recognition".

  • The results showed that the TalkType error rate was 20.4% lower than an English texter hunting and tapping for letters.

    How many of those errors could have been reliably corrected by some form of autocorrect, or was such already included in the tests?

    If I try and type "thw quick rbown fox jump sover the lazy dog" as fast as I can... well, that's the result. Autocorrect could have fixed most of those problems.
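As a rough illustration of that point, here is a minimal word-level autocorrect sketch using Python's standard `difflib` module. The tiny `VOCAB` list and the `autocorrect` helper are hypothetical stand-ins; a real phone keyboard would use a full dictionary and a language model, and nothing here reflects how TalkType or any other product actually works.

```python
import difflib

# Hypothetical toy vocabulary; a real autocorrect would use a full dictionary.
VOCAB = ["the", "quick", "brown", "fox", "jump", "jumps", "over", "lazy", "dog"]


def autocorrect(text, vocab=VOCAB, cutoff=0.6):
    """Replace each out-of-vocabulary word with its closest vocabulary match."""
    fixed = []
    for word in text.split():
        if word in vocab:
            fixed.append(word)  # already a known word, leave it alone
            continue
        match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else word)
    return " ".join(fixed)


result = autocorrect("thw quick rbown fox jump sover the lazy dog")
print(result)  # the quick brown fox jump over the lazy dog
```

The transposition and the wrong letter are repaired, but the misplaced space in "jump sover" is only half fixed, because "jump" is itself a valid word. Catching that would need context across words, not just per-word matching.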

    • by xvan ( 2935999 )
      The issue with auto correct is that it takes damn long to prevent it from correcting not recognized words.
    • The FIRST thing you do, when you get a new "smart" phone, is turn auto-correction off.

      About a month later, you will discover you never needed it in the first place. Plus, you will never have to deal with people who misinterpret your meaning in your text communication, as the improperly spelled uncorrected version of whatever you were trying to say will be instantly recognizable by whoever is reading it for what it was supposed to be, because as humans we are very, very good at that.

      Why would anyone w
  • by skam240 ( 789197 ) on Saturday October 08, 2016 @06:05PM (#53039033)

    Oh good, more assholes yelling into their phones while in public spaces. That's exactly what we need.

  • by myowntrueself ( 607117 ) on Saturday October 08, 2016 @06:54PM (#53039205)

    One of the things that characterises modern Chinese language is the proliferation of homophones (words that sound alike).

    The way that Chinese people cope with this is extreme use of context and of spelling; the homophones don't have the same character. Sometimes Chinese people will clarify meaning by sketching a character in the air, often unconsciously.

    If the error rate reduction is so huge based on speech recognition, this would suggest that pinyin can replace characters for writing Chinese. And this has been disproved on many occasions; you can literally write an entire story using only the syllable 'ma'. In pinyin it all comes out as 'ma' with the 4 tones. In characters it's actually readable. Same with the story of the lion-eating poet in the stone den, which is all 'shi'.

    So a great test of this Baidu software would be to get someone to read this to it and see what it comes up with:

    https://chinesepod.com/blog/ho... [chinesepod.com]

    https://en.wikipedia.org/wiki/... [wikipedia.org]

    and see if it gets it right:

    Shī Shì shí shī shǐ

    Shíshì shīshì Shī Shì, shì shī, shì shí shí shī.

    Shì shíshí shì shì shì shī.

    Shí shí, shì shí shī shì shì.

    Shì shí, shì Shī Shì shì shì.

    Shì shì shì shí shī, shì shǐ shì, shǐ shì shí shī shìshì.

    Shì shí shì shí shī shī, shì shíshì.

    Shíshì shī, Shì shǐ shì shì shíshì.

    Shíshì shì, Shì shǐ shì shí shì shí shī.

    Shí shí, shǐ shí shì shí shī shī, shí shí shí shī shī.

    Shì shì shì shì.

    • That would seem to actually favor an engine like this. A good autocorrect does use context, but generally it only has access to what you have said before, not after, the current word (at least during the initial input). In such a context-dependent environment as you describe, being able to retroactively go back and change earlier text based on closely subsequent input (as speech recognition software often does, but keyboards generally don't) would seem especially valuable.

      In fact, Google's voice accuracy
      • Chinese input using the New Phonetic Method actually does this too. Basically, after I type the sounds, then the tone, and move on to the next character, it will change the first character based on the sounds and tones of the following characters, and continue to do so until I press enter. Often I will type out an entire sentence before pressing enter, though sometimes it starts to spit out bad results if you go for too long. It also does some recognition based off previous character choice such as when using gendered

  • I don't think the typical person 'hunts' and types anymore. Maybe 30 years ago......
  • What, not a single comment about the Big Brother implications... especially for a Chinese company, where each of them is accused of being an extension of the party.
    And I don't care about "look what Yahoo did" or whatever; being an extension is different from complying with the law in a democracy.

  • NSA is going to announce a competing keyboard. They're already digitizing everything but now they'll be sending you a copy as a courtesy and for proofreading and correction. Win-win!
  • The problem is the false sense of security and subsequent lack of proofreading and error correction.

    Try voice recognition software for a week. You'll likely find that you will read over something that you dictated and not realize that there are errors in it. People are less likely to find errors in something they dictated than in something they typed.

  • I can type more precisely and quickly and confidently than I can talk. Deciding which words to type is faster and easier. How is voice recognition going to improve on the speed by which I can speak, which is inferior to my typing ability whenever precision is required?

  • ...the app is probably keeping track of everything a user speaks/types and sending it back home to China.
