Source: Baidu Wenku | Editor: 偶看新闻 | Date: 2024/05/09 16:45:10

Success Cannot Be Replicated?

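People have long believed that success can be replicated through constant practice, imitation, and reinforcement. But is that really true? A statistical study of NBA players points to a very different answer.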
The world is a complicated place. Reality is dense with patterns, but these patterns are often subtle and inconsistent. We think we understand how things work — X always causes Y — but then Z happens. It’s very confusing.

Needless to say, such complexity poses a big problem for biology. How should animals learn from such unpredictable situations? What’s the best way to cope with contingency? We don’t need perfection, but we do require an efficient mental mechanism that allows us to maximize utility most of the time.

Enter reinforcement learning, a theoretical framework that helps explain how the rewards and punishments of life get translated into effective behavior. It doesn’t matter if it’s monkeys responding to squirts of juice or rats jonesing for pellets or humans plying the stock market: The algorithms of reinforcement learning neatly describe our decisions. The persuasive power of reinforcement is why we give kindergartners gold stars and professionals a monetary bonus: Nothing influences outcomes like a bit of positive feedback. Furthermore, neuroscientists have identified several mechanisms in the cortex that seem to obey these computational principles. It’s an incredibly elegant link between the software of mind and the hardware of brain.
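
To make the idea concrete, here is a minimal sketch of one common reward-driven update, the delta rule, in which an estimate of how rewarding an action is gets nudged toward each new outcome. This is an illustration only, not the model used in any study mentioned here; the function names, the learning rate, and the reward sequence are all assumptions made for the example.

```python
def update_value(value, reward, learning_rate=0.1):
    """Move the estimated value toward the observed reward (delta rule)."""
    return value + learning_rate * (reward - value)


def learn(rewards, initial_value=0.0, learning_rate=0.1):
    """Return the value estimate after each reward in the sequence."""
    value = initial_value
    history = []
    for reward in rewards:
        value = update_value(value, reward, learning_rate)
        history.append(value)
    return history


if __name__ == "__main__":
    # Hypothetical sequence: three rewarded trials, then one unrewarded trial.
    for step, value in enumerate(learn([1, 1, 1, 0]), start=1):
        print(step, round(value, 4))
```

Each reward raises the estimate and each non-reward lowers it, so the most recent outcomes always carry some weight. A larger `learning_rate` makes the learner more sensitive to what just happened.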

However, one of the longstanding limitations of much reinforcement learning research is the lack of naturalistic context, as scientists have been forced to rely on abstract games in the lab. We don’t observe rats in the wild — we track them in Plexiglas cages. We don’t watch monkeys swing through the forest — we give them sweet treats preceded by lights and bells. This makes the data easier to comprehend, but it also makes it unclear how these same mechanisms might operate in a more complicated environment. Does reinforcement learning always work? Or do the same habits that make us look so smart in the lab sometimes backfire in the real world? Is there such a thing as too much feedback?

To answer these questions, Tal Neiman and Yonatan Loewenstein at the Hebrew University of Jerusalem turned to professional basketball. More specifically, they looked at 200,000 three-point shots taken by 291 leading players in the NBA between 2007 and 2009. (They also looked at 15,000 attempted shots by 41 leading players in the WNBA during the 2008 and 2009 regular seasons.) The scientists were particularly interested in how makes and misses influenced subsequent behavior. After all, by the time players arrive in the NBA, they’ve executed hundreds of thousands of shots and played in countless games. Perhaps all that experience reduces the impact of reinforcement, making athletes less vulnerable to the unpredictable bounces of the ball. A make doesn’t make them too excited and a miss isn’t too discouraging.

But that’s not what the scientists found. Instead, they discovered that professional athletes were exquisitely sensitive to reinforcement, so that a successful three-pointer made players significantly more likely to attempt another distant shot. In fact, after a player made three three-point shots in a row (they were now “in the zone”), they were nearly 20 percent more likely to take another three-point shot. Their past success, the positive reinforcement of the made basket, altered the way they played the game.

In many situations, such reinforcement learning is an essential strategy, allowing people to optimize behavior to fit a constantly changing situation. However, the Israeli scientists discovered that it was a terrible approach in basketball, as learning and performance are “anticorrelated.” In other words, players who have just made a three-point shot are much more likely to take another one, but much less likely to make it:

What is the effect of the change in behaviour on players’ performance? Intuitively, increasing the frequency of attempting a 3pt after made 3pts and decreasing it after missed 3pts makes sense if a made/missed 3pts predicted a higher/lower 3pt percentage on the next 3pt attempt. Surprizingly [sic], our data show that the opposite is true. The 3pt percentage immediately after a made 3pt was 6% lower than after a missed 3pt. Moreover, the difference between 3pt percentages following a streak of made 3pts and a streak of missed 3pts increased with the length of the streak. These results indicate that the outcomes of consecutive 3pts are anticorrelated.
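
The comparison described in the passage above can be sketched in a few lines: group each three-point attempt by what preceded it, then compare shooting percentages across the groups. This is a rough illustration under stated assumptions, not the researchers' actual analysis. The function name `pct_after_streak` and the shot sequence are made up for the example, and the sketch treats one player's attempts as a single unbroken sequence, ignoring game boundaries.

```python
def pct_after_streak(outcomes, previous, length=1):
    """Shooting percentage on attempts that follow a streak.

    outcomes: list of booleans in chronological order, True for a made shot.
    previous: the outcome (True or False) that the preceding shots must share.
    length:   how many consecutive preceding shots must match `previous`.
    Returns None when no attempt follows such a streak.
    """
    followers = [
        outcomes[i]
        for i in range(length, len(outcomes))
        if all(shot == previous for shot in outcomes[i - length:i])
    ]
    if not followers:
        return None
    return sum(followers) / len(followers)


if __name__ == "__main__":
    # Hypothetical sequence of three-point attempts (not data from the study).
    shots = [True, False, True, True, False, False, True, False]
    print("after a make:", pct_after_streak(shots, True))
    print("after a miss:", pct_after_streak(shots, False))
    print("after two makes:", pct_after_streak(shots, True, length=2))
```

A lower percentage after makes than after misses is the pattern the quoted passage calls anticorrelation. Raising `length` gives the streak comparison the passage mentions.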

This anticorrelation works in both directions, as players who missed a previous three-pointer were more likely to score on their next attempt. A brick was a blessing in disguise.

What’s the larger lesson? It turns out that professional athletes over-generalize from their most recent actions and outcomes. They modify their behavior based on the result of a single shot, even though the success of the shot was shaped by unpredictable forces (a butterfly flapping its wings in Tokyo, etc.) and depended on situational details that are unlikely to be repeated. (Perhaps the defender was momentarily distracted, or failed to run around the screen.) As the scientists note, “The behavior of basketball players shows the limitations of learning from reinforcement, especially in a complex environment such as a basketball game.”

This problem, of course, isn’t confined to athletes. Investors modify behavior based on recent market performance, even though the market is mostly a random walk. Gamblers won’t leave a casino if they’re on a hot streak. Pundits who make an accurate prediction are convinced they’ve now solved the world. Military generals are always preparing for the last war. Although people can’t help but learn from the reinforcement signals of the world — that’s just the way the mind is designed — we need to remember that these signals come with stark limitations, especially when they emerge from a complex situation. Sometimes, the best thing we can do is not learn from what just happened.