速记员即将被淘汰,未来 AI 可以把一切转录为文字 | 双语阅读

神译局·2017-07-11 21:02
百度和Trint推出软件,能够将一小时以内的音频转录为文本,且错误率在5%以内。

编者按:人工智能势不可当。虽然尚不完美,却极有可能在未来取代打字员,将人类从打字的繁琐中解放出来,甚至使人们摆脱设备的束缚。便捷、高效、低廉的人工智能转录还将对未来社会产生哪些影响?本文编译自Greg Noone在The Atlantic上发表的“When AI Can Transcribe Everything”。

怎样才是描述报业大亨鲁伯特·默多克(Rupert Murdoch)被奶油派砸了一脸的最好方式?这对世界新闻界来说不成问题。2011年英国议会委员会听证会上,这位传媒大亨作证期间发生了这一幕,各家媒体乐于报道,笔调从严肃正剧到低俗喜剧皆有。但这对听证会的官方书记员来说,则是另一回事。通常情况下,书记员的工作只是如实记录所说的话。奶油派袭击事件发生后,无论是出于自主选择还是受制于议会的行文体例,书记员决定采用最简单的方式,将其标注为“[中断]”。

What is the best way to describe Rupert Murdoch having a foam pie thrown at his face? This wasn’t much of a problem for the world’s press, who were content to run articles depicting the incident during the media mogul’s testimony at a 2011 parliamentary committee hearing as everything from high drama to low comedy. It was another matter for the hearing’s official transcriptionist. Typically, a transcriptionist’s job only involves typing out the words as they were actually said. After the pie attack—either by choice or hemmed in by the conventions of house style—the transcriptionist decided to go the simplest route by marking it as an “[interruption].”

专业领域有大量的对话——会议、面试和电话会议等——需要转录为文字并存档,以备未来参考。这是一项繁琐的日常工作,但对于愿意付费的人来说,这项工作可以外包给专业的转录服务商。转录服务商会反过来雇佣人员,远程转录音频文件,或像我几个月的从业经历一样,参加会议,实时记录听到的内容。

Across professional fields, a whole multitude of conversations—meetings, interviews, and conference calls—need to be transcribed and recorded for future reference. This can be a daily, onerous task, but for those willing to pay, the job can be outsourced to a professional transcription service. The service, in turn, will employ staff to transcribe audio files remotely or, as in my own couple of months in the profession, attend meetings to type out what is said in real time.

尽管近年来出现了基于浏览器的转录辅助工具,在现代西方经济中,转录依然是一项苦差事,机器始终无法把人类完全挤出这一环节。直到去年,微软造出了一台能做到这一点的机器。

Despite the recent emergence of browser-based transcription aids, transcription’s an area of drudgery in the modern Western economy where machines can’t quite squeeze human beings out of the equation. That is until last year, when Microsoft built one that could.

微软首席语音科学家黄学东(Xuedong Huang)自从进入苏格兰爱丁堡大学攻读博士课程起,就被自动语音识别(ASR)深深地吸引了。“当时我刚离开中国,”黄学东回忆起用本科阶段学到的美式英语,试图听懂讲师们的苏格兰口音时的困难,他说,“我希望每个讲师和教授在教室里授课时,都能有字幕。”

Automatic speech recognition, or ASR, is an area that has gripped the firm’s chief speech scientist, Xuedong Huang, since he entered a doctoral program at Scotland’s Edinburgh University. “I’d just left China,” he says, remembering the difficulty he had in using his undergraduate knowledge of the American English to parse the Scottish brogue of his lecturers. “I wished every lecturer and every professor, when they talked in the classroom, could have subtitles.”

为了实现这种实时服务,黄学东和他的团队首先需要创建一个能够对录音进行事后转录的程序。人工智能的发展使他们得以利用名为“深度学习”的技术,即训练程序从大量数据中识别出模式。黄学东和他的同事们利用该软件来转录NIST 2000 CTS测试集,这是一组对话录音,20多年来一直是语音识别工作的基准。职业打字员在转录该测试的两个不同部分时,错误率分别为5.9%和11.3%。微软团队开发的系统则略微胜过两者。

In order to reach that kind of real-time service, Huang and his team would first have to create a program capable of retrospective transcription. Advances in artificial intelligence allowed them to employ a technique called deep learning, wherein a program is trained to recognize patterns from vast amounts of data. Huang and his colleagues used their software to transcribe the NIST 2000 CTS test set, a bundle of recorded conversations that’s served as the benchmark for speech recognition work for more than 20 years. The error rates of professional transcriptionists in reproducing two different portions of the test are 5.9 and 11.3 percent. The system built by the team at Microsoft edged past both. 

“那还不是一个实时系统,”黄学东承认,“我们更多是想看看,动用我们拥有的全部算力,极限究竟在哪里。但实时系统已经不远了。”

“It wasn’t a real-time system,” acknowledges Huang. “It was very much like we wanted to see, with all the horsepower we have, what is the limit. But the real-time system is not that far off.”

的确,ASR程序能够在采访或会议进行的同时准确转录,这一前景已不再显得那么异想天开。在上个月微软举办的Build大会上,副总裁沈向洋(Harry Shum)展示了一款PowerPoint转录服务,能够把演示时所说的话与每一张幻灯片关联起来。同时,微软也在和苹果、谷歌等公司激烈竞争,力求完善其实时移动翻译应用生成的转录文本。

Indeed, the promise of ASR programs capable of accurately transcribing interviews or meetings as they happen no longer seems so outlandish. At Microsoft’s Build conference last month, the company’s vice-president, Harry Shum, demonstrated a PowerPoint transcription service that would allow the spoken words of the presentation to be tied to individual slides. The firm is also in a close race with the likes of Apple and Google to perfect the transcripts produced by its real-time mobile translation app.

黄学东认为,转录软件究竟在哪个节点上超越人类能力,是一个见仁见智的问题。“完美结果的定义是存在争议的,”他援引人类打字员的错误率说。“有多‘完美’取决于具体场景和应用。”

Huang believes the point at which transcription software will overtake human capabilities is open to interpretation. “The definition of a perfect result would be controversial,” he says, citing the error rates among human transcriptionists. “How ‘perfect’ this is depends on the scenario and the application.”

承担实时语音转录任务的ASR系统,只有在每个词都被正确识别时才算成功;Cortana和Siri等手机助手已在很大程度上做到了这一点,但实时翻译应用尚未做到。然而,越来越多的计算机科学家意识到,对录音的自动转录不必要求那么高,因为文本中的任何错误都可以事后修改。

An ASR system tasked with transcribing speech in real time is only deemed successful if every word is interpreted correctly, something that largely has been achieved with mobile assistants like Cortana and Siri, but has yet to be mastered in real-time translation apps. However, a growing number of computer scientists are realizing that standards do not need to be as high when it comes to the automatic transcription of recorded audio, where any mistakes in the text can be amended after the fact.

两家公司——位于伦敦的初创企业Trint和推出SwiftScribe应用的中国互联网巨头百度——已经推出了基于浏览器的工具,能够将一小时以内的音频转录为文本,且词错误率在5%以内。在页面上,它们的输出和我作为自由职业打字员参加许多会议期间实时打出的原始文档很相似:最好时像乔伊斯式(Joycean)的意识流奇作,最糟时则是一堆不知所云的文字。但是通过把用户从抄写员变为编辑,这两个程序都能为一项繁重而令人分心的工作节省数小时。

Two companies—Trint, a start-up in London, and Baidu, the Chinese internet giant with an application called SwiftScribe—have begun to offer browser-based tools that can convert recordings of up to an hour into text with a word-error rate of 5 percent or less. On the page, their output looks very similar to the raw documents I typed out in real-time during the many meetings I attended as a freelance transcriptionist: at best, a Joycean stream-of-consciousness marvel, and at worst, gobbledygook. But by turning the user from a scribe into an editor, both programs can shave hours off an onerous and distracting task.

当然,节省的时间取决于音频的质量。Trint和SwiftScribe在转录环境噪音极少的面对面访谈时往往表现出色,但在转录拥挤房间中的录音、信号不佳的电话访谈,或说话者带有美式、英式英语以外的口音时则十分吃力。例如,我尝试用Trint处理一段德国口音说话者的录音,却看到它把“天气相当冷,但气氛很棒”转录成了“那颗心也全是呕吐物。是的。他的第一张脸。”

The amount of time saved, of course, is contingent on the quality of the audio. Trint and SwiftScribe tend to make short work of face-to-face interviews with the bare minimum of ambient noise, but struggle to transcribe recordings of crowded rooms, telephone interviews with bad reception, or anyone who speaks with an accent that isn’t American or British English. My attempt to run a recording of a German-accented speaker through Trint, for example, saw the engine interpret “it was rather cold, but the atmosphere was great” as “That heart is also all barf. Yes. His first face.”
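The article quotes error rates (5.9 and 11.3 percent for human transcriptionists, 5 percent or less for Trint and SwiftScribe) without saying how they are calculated. A common convention in speech recognition is word error rate: the minimum number of word substitutions, deletions, and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. The sketch below assumes that convention and applies it to the German-accent example above. The function name `word_error_rate` and the simple lowercase, whitespace-split normalization are illustrative assumptions; benchmark scoring tools normalize text far more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are compared after lowercasing and splitting on
    whitespace; punctuation should be stripped by the caller.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


spoken = "it was rather cold but the atmosphere was great"
transcribed = "that heart is also all barf yes his first face"
print(f"{word_error_rate(spoken, transcribed):.0%}")
```

Because insertions count as errors, the rate can exceed 100 percent. Here none of the ten transcribed words matches any of the nine spoken words, so the score is 10/9, roughly 111 percent.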

“我们并不声称,像这样的访谈在几分钟内出来的转录结果是完美的,”Trint的首席执行官杰夫·考夫曼(Jeff Kofman)说。“但是,只要有高质量音频,它就能接近完美。你可以搜索、重听、查错,几秒之内就能知道究竟说了什么。”

“We don’t claim that this turnaround in a couple of minutes of an interview like this is perfect,” says Jeff Kofman, Trint’s CEO. “But, with good audio, it can be close to perfect. You can search it, you can hear it, you [can] find the errors, and you know within seconds what was actually said.”

考夫曼表示,Trint的绝大多数用户都是记者,其次是定性研究的研究员以及商界和医疗保健客户——换句话说,都是需要在严格的规定时间内完成大量音频转录的职业。这与SwiftScribe的开发者Ryan Prenger和他的同事们收集到的匿名用户行为数据相一致。虽然Prenger推测有一些长尾用户,他们只是渴望测试SwiftScribe能力的人工智能爱好者,但他也看到一些日常使用该程序转录语音的“超级用户”。随着ASR技术的不断改进,他对该技术能够吸引的用户范围感到乐观。

According to Kofman, most of the people using Trint are journalists, followed by academics doing qualitative research and clients in business and healthcare—in other words, professions expected to transcribe a large volume of audio on tight deadlines. That’s in keeping with the anonymized data on user behavior being collected by the developer Ryan Prenger and his colleagues at SwiftScribe. While there is a long tail of users who Prenger speculates are simply AI enthusiasts eager to test out SwiftScribe’s capabilities, he’s also spotted several “power users” that are running audio through the program on almost a daily basis. It’s left him optimistic about the range of people the tool could attract as ASR technology continues to improve.

“转录技术总体上就是这样,”Prenger说,“一旦准确率超过某个门槛,所有人都可能开始用这种方式转录,至少最初几轮如此。”他预测,自动转录工具最终将同时提升转录文本的供给和需求。“未来可能会出现一个良性循环:因为快速转录变得更便宜、更方便,更多人会期望自己产生的更多音频被转录出来。于是,转录一切就成了标准做法。”

“That’s the thing with transcription technology in general,” says Prenger. “Once the accuracy gets above a certain bar, everyone will probably start doing their transcriptions that way, at least for the first several rounds.” He predicts that, ultimately, automated transcription tools will increase both the supply of and the demand for transcripts. “There could be a virtuous circle where more people expect more of their audio that they produce to be transcribed, because it’s now cheaper and easier to get things transcribed quickly. And so, it becomes the standard to transcribe everything.”

这正是Trint有意识地布局、准备把握的未来。该公司刚刚募集到310万美元的种子资金,用于下一轮扩张。考夫曼和他的团队计划本月晚些时候在维也纳举行的全球编辑网络(Global Editors Network)峰会上展示该技术的能力。他们的目标是在一小时内,将峰会主题演讲的转录文本发布在《华盛顿邮报》的网站上。

It’s a future that Trint is consciously maneuvering itself to exploit. The company just raised $3.1 million in seed money to fund its next round of expansion. Kofman and his team plan to demonstrate its capabilities later this month at the Global Editors Network in Vienna. Their aim is to have the transcription of the event’s keynote address up on the Washington Post’s website within the hour.

很难准确预测这种新秩序会是什么样子,不过可以预见会有职业成为牺牲品。速记员很有可能像沿街叫卖的小贩和送冰人一样,进入被遗忘的职业行列。在大量辅助写作工具的协助下,记者可以花更多时间进行报道和写作,侦探可以更早地分析出犯罪嫌疑人证言中的矛盾。YouTube视频配字幕可能成为标配,广播节目和播客也可能大规模地向听障人士开放。与熟人、朋友、旧情人的通话能够像社交媒体消息和电子邮件一样存档、搜索,也可能被执法部门拦截、囤积。

It’s difficult to predict precisely what this new order could look like, although casualties are expected. The stenographer would likely join the ranks of the costermonger and the iceman in the list of forgotten professions. Journalists could spend more time reporting and writing, aided by a plethora of assistive writing tools, while detectives could analyze the contradictions in suspect testimony earlier. Captioning on YouTube videos could be standard, while radio shows and podcasts could become accessible to the hard of hearing on a mass scale. Calls to acquaintances, friends, and old flames could be archived and searched in the same way that social-media messages and emails are, or intercepted and hoarded by law-enforcement agencies.

对于黄学东而言,转录只是ASR将带来的一系列变化之一,这些变化将从根本上改变社会本身,如今已能从Cortana、Siri和亚马逊的Alexa之类的语音助手中窥见一斑。“显而易见的是,下一波浪潮将超越那些必须用手触碰的设备,”他设想计算技术将不着痕迹地融入各种工作环境。“能让人们摆脱设备束缚的用户界面技术将处于最核心的位置。”

For Huang, transcription is just one of a whole range of changes ASR is set to provide that will fundamentally change society itself, one that can already be glimpsed in voice assistants like Cortana, Siri, and Amazon’s Alexa. “The next wave, clearly, is beyond the devices that you have to touch,” he says, envisioning computing technology discreetly woven into a range of working environments. “UI technology that can free people from being tethered to the device will be in the front and center.”

然而目前,自动转录工具背后的工程师们只能先满足于更贴近现实需求的用户:例如在截稿期限前奋战的记者,或是琢磨该如何描述一名男子在议会特别委员会上被砸了一脸奶油派的书记员。

For the moment, however, the engineers behind automated transcribers will have to content themselves with more germane users: the journalist sweating a deadline, or the transcriptionist working out the right way to describe a man being pied in a parliamentary select committee.

重点词汇

  • transcribe:转录

  • browser-based:基于浏览器的

  • Automatic speech recognition:自动语音识别

  • test set:测试集

  • internet giant:互联网巨头

  • anonymized data:匿名数据

  • user behavior:用户行为

  • long tail:长尾

  • AI enthusiasts:人工智能狂热者

  • power users:超级用户

  • seed money:种子基金

  • UI:用户界面

编译组出品。编辑:郝鹏程