ECON2516 Some models

Models of decision under bounded rationality

A first model of choice under bounded rationality is the “satisficing model” proposed by Simon (1982) in opposition to the classic “optimizing” model. The decision-maker judges actions by means of partial criteria $u_k$, to each of which is attached an aspiration threshold $\sigma_k$; he examines the actions in a predefined order and chooses the first one that attains the aspiration thresholds for all the criteria: $s_i$ such that $u_k\left(s_i\right) \geq \sigma_k$ for every $k$. As a particular case, one can consider a unique criterion $u$ (as in the case of optimisation), with its aspiration threshold $\sigma$; the decision-maker chooses the first action $s_i$ such that $u\left(s_i\right) \geq \sigma$. At first sight, the $\varepsilon$-rationality model of Radner fits this definition, by considering that the decision-maker chooses the first action that comes within $\varepsilon$ of the optimum: $u\left(s_i\right) \geq \max _j u\left(s_j\right)-\varepsilon$. Here, however, the aspiration threshold depends on the maximum attainable utility, which is generally unknown to the decision-maker. It can be observed that the satisficing model admits the optimizing model as a limiting case, when the aspiration thresholds are raised until only the optimal action satisfies them. However, the satisficing model is directly expressed in terms of bounded instrumental rationality and not bounded cognitive rationality. For the latter to appear, we must examine a process of deliberation by the decision-maker that brings into play cognitive constraints such that he is led to seek a satisfactory action. Such a process, which would have the advantage of endogenising the aspiration thresholds of the decision-maker, has not yet been proposed.
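
The satisficing rule can be illustrated with a minimal Python sketch. The function names `satisfice` and `epsilon_rational`, the toy action set and the toy utility are assumptions made for illustration; none of them comes from Simon or Radner.

```python
from typing import Callable, Optional, Sequence, TypeVar

Action = TypeVar("Action")


def satisfice(actions: Sequence[Action],
              criteria: Sequence[Callable[[Action], float]],
              thresholds: Sequence[float]) -> Optional[Action]:
    """Return the first action, in the given order, with u_k(s) >= sigma_k for all k.

    Returns None when no action meets every aspiration threshold.
    """
    for s in actions:
        if all(u(s) >= sigma for u, sigma in zip(criteria, thresholds)):
            return s
    return None


def epsilon_rational(actions: Sequence[Action],
                     u: Callable[[Action], float],
                     eps: float) -> Optional[Action]:
    """Radner-style rule: first action within eps of the maximum utility.

    Note that the threshold max(u) - eps needs the maximum itself,
    which a boundedly rational decision-maker generally does not know.
    """
    best = max(u(s) for s in actions)
    return satisfice(actions, [u], [best - eps])


# Hypothetical example: five actions and a single criterion peaking at s = 4.
actions = [1, 2, 3, 4, 5]


def utility(s: int) -> float:
    return -float((s - 4) ** 2)


print(satisfice(actions, [utility], [-1.0]))   # 3: first action with u >= -1
print(satisfice(actions, [utility], [0.0]))    # 4: threshold at the maximum, optimum chosen
print(satisfice(actions, [utility], [1.0]))    # None: threshold unattainable
print(epsilon_rational(actions, utility, 4.0)) # 2: first action within 4 of the maximum
```

The second call shows the limiting case discussed above: once the threshold reaches the maximum attainable utility, only the optimal action satisfies it.
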
A second model of choice under limited rationality is the “probabilistic choice model” (Anderson, de Palma, Thisse, 1992). From a finite set of possible actions, the decision-maker chooses the action $i$ with probability $p_i$ such that: $p_i=w_i / \sum_j w_j$, where $w_i$ is a propensity to choose the action $i$, linked to an index of utility $u_i$ of that action. In the linear model, the propensities $w_i$ are proportional to the index of utility (which must then be positive): $w_i=u_i$. In the multinomial logit model, the propensities $w_i$ are written in exponential form: $w_i=e^{\mu u_i}$, with the convenient introduction of a parameter $\mu \geq 0$. Here again, the logit model converges towards the optimising model when the parameter $\mu$ tends to infinity; the decision-maker then no longer acts stochastically but deterministically (except in the case of indifference between two actions). Conversely, the logit model tends to a purely random (uniform) choice model when $\mu$ tends to zero. The parameter $\mu$ thus appears to reflect the limited cognitive capacities of the decision-maker, but yet again it operates in a model expressing limited instrumental rationality. However, two cognitive justifications of this model, endogenising the parameter $\mu$, have been put forward. In the first, the decision-maker is endowed with a random utility function but remains an optimiser, so that he implements each action with the probability that it is the optimal one. When the probability distribution of the random utility is chosen suitably (doubly exponential, i.e. Gumbel), the logit model is obtained. In the second justification (Mattsson-Weibull, 2002), the decision-maker chooses an action by arbitrating between its utility and a control cost measured relative to a reference action. When the control cost is chosen suitably (in the form of an entropy), the logit model is again obtained.
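
The two choice rules, and the two limits of the logit rule, can be checked numerically. The following is a small Python sketch; the function names `linear_probs` and `logit_probs` and the sample utilities are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def linear_probs(utilities: Sequence[float]) -> List[float]:
    """Linear model: w_i = u_i, so p_i = u_i / sum_j u_j (utilities must be positive)."""
    total = sum(utilities)
    return [u / total for u in utilities]


def logit_probs(utilities: Sequence[float], mu: float) -> List[float]:
    """Multinomial logit: w_i = exp(mu * u_i), p_i = w_i / sum_j w_j.

    Subtracting the maximum utility before exponentiating leaves the
    ratios w_i / sum_j w_j unchanged and avoids overflow for large mu.
    """
    top = max(utilities)
    weights = [math.exp(mu * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical utility indexes for three actions.
u = [1.0, 2.0, 3.0]
for mu in (0.0, 1.0, 50.0):
    print(mu, [round(p, 4) for p in logit_probs(u, mu)])
```

With $\mu=0$ the three probabilities are equal, and with $\mu=50$ almost all the mass sits on the action with the highest index, in line with the two limits described above.
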

Models of learning in static situations

The “fictitious play model” assumes that the decision-maker, during a repeated process of decision, observes the past states of nature and uses them to predict the future ones. Moreover, this model essentially expresses exploitation behavior. The decision-maker observes the past frequency of states of nature, deduces from it a probability distribution over future states and chooses, in each period, the action which maximises his expected utility according to this distribution. Exploration behavior can be introduced through voluntary deviation from the above behavior, and this deviation can take two forms. In the “$\varepsilon$-greedy fictitious play” model, the decision-maker uses the optimal action with probability $1-\varepsilon$ and another action, drawn uniformly at random, with probability $\varepsilon$. In the “disturbed fictitious play” model, the decision-maker uses the logit (and no longer optimising) choice rule, taking as index of utility the expected utility calculated for each action. For standard fictitious play, one can easily demonstrate that the decision process converges towards the optimal action (in the sense of maximisation of expected utility) simply by means of the law of large numbers: provided the states are drawn independently from a fixed distribution, the frequency of appearance of each state tends to its probability. For the proposed variations, by contrast, this convergence is not guaranteed, because the random component generated by exploration does not disappear asymptotically.
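
Standard and $\varepsilon$-greedy fictitious play can be simulated with the short Python sketch below. It assumes that states are drawn independently from a fixed distribution and that each state starts with a count of 1 (a uniform prior), which the text does not specify. The names `expected_utilities` and `fictitious_play` and the payoff table are hypothetical.

```python
import random
from typing import List, Sequence


def expected_utilities(payoff: Sequence[Sequence[float]],
                       counts: Sequence[int]) -> List[float]:
    """Expected utility of each action under the empirical state frequencies.

    payoff[a][s] is the utility of action a in state s;
    counts[s] is the number of times state s has been observed.
    """
    total = sum(counts)
    freqs = [c / total for c in counts]
    return [sum(f * u for f, u in zip(freqs, row)) for row in payoff]


def fictitious_play(payoff: Sequence[Sequence[float]],
                    state_probs: Sequence[float],
                    periods: int,
                    eps: float = 0.0,
                    seed: int = 0) -> List[int]:
    """Simulate (epsilon-greedy) fictitious play; return the sequence of actions."""
    rng = random.Random(seed)
    n_actions, n_states = len(payoff), len(state_probs)
    counts = [1] * n_states  # assumption: one fictitious observation per state
    history = []
    for _ in range(periods):
        eu = expected_utilities(payoff, counts)
        best = max(range(n_actions), key=eu.__getitem__)
        if n_actions > 1 and rng.random() < eps:
            # Exploration: another action drawn uniformly at random.
            action = rng.choice([a for a in range(n_actions) if a != best])
        else:
            # Exploitation: the action maximising expected utility.
            action = best
        history.append(action)
        # Nature draws the state; the decision-maker observes it.
        state = rng.choices(range(n_states), weights=state_probs)[0]
        counts[state] += 1
    return history


# Hypothetical problem: action a pays 1 when the state equals a, else 0.
payoff = [[1.0, 0.0], [0.0, 1.0]]
plain = fictitious_play(payoff, [0.8, 0.2], periods=2000)
greedy = fictitious_play(payoff, [0.8, 0.2], periods=2000, eps=0.1)
print(sum(a == 0 for a in plain[-500:]) / 500)
print(sum(a == 0 for a in greedy[-500:]) / 500)
```

With `eps=0` the late actions settle on the optimal one, while with `eps=0.1` a share of non-optimal actions persists however long the simulation runs, which is the asymptotic randomness mentioned above.
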

The “CPR model” (Laslier-Topol-Walliser, 2000) is a model of reinforcement (Roth-Erev, 1995) which assumes that the decision-maker only observes the past performance of his actions and no longer observes the states of nature. It considers that the decision-maker adopts, as index of utility, the cumulative utility obtained with each action and that he chooses his future action with a probability proportional to this index. This model has good properties as regards the exploration-exploitation dilemma. At the beginning of the process, as the indexes are often initialised uniformly, the decision-maker carries out a systematic exploration of all the actions. At the end of the process, if the index of one action becomes predominant in relation to the others, exploitation becomes very strong, although exploration is never abandoned (every action retains a residual probability of being chosen). What is more, if the choice rule includes a parameter $\mu$ playing the same role as in the logit model, increasing (decreasing) $\mu$ moves the exploration-exploitation compromise towards more exploitation (exploration). For $\mu=0$, there is pure exploration because all the actions are used with the same probability; for $\mu=\infty$, there is pure exploitation because only the action with the maximum index of utility is used. It can be demonstrated that the learning process thus defined converges towards the optimal action (still in the sense of expected utility) because the good actions are played more and more often, due to the self-reinforcing effect of the cumulative utility, whereas the probability of exploring tends to zero.
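
The basic proportional rule (without the parameter $\mu$) can be sketched in Python as follows. The function name `cpr_learning`, the `draw_utility` callback, the initial index of 1 and the payoffs are assumptions chosen for illustration; utilities are assumed non-negative so that the indexes remain valid weights.

```python
import random
from typing import Callable, List, Tuple


def cpr_learning(draw_utility: Callable[[int, random.Random], float],
                 n_actions: int,
                 periods: int,
                 initial: float = 1.0,
                 seed: int = 0) -> Tuple[List[float], List[int]]:
    """Cumulative proportional reinforcement.

    Each action carries an index equal to the cumulative utility it has
    earned so far (plus a uniform initial value). In each period an
    action is chosen with probability proportional to its index, and
    only the utility of the chosen action is observed.
    Returns the final indexes and the number of times each action was played.
    """
    rng = random.Random(seed)
    cumulative = [initial] * n_actions  # uniform initialisation: early exploration
    plays = [0] * n_actions
    for _ in range(periods):
        action = rng.choices(range(n_actions), weights=cumulative)[0]
        cumulative[action] += draw_utility(action, rng)  # assumed non-negative
        plays[action] += 1
    return cumulative, plays


# Hypothetical environment: action 0 pays 1.0 on average, action 1 pays 0.5.
def noisy_payoff(action: int, rng: random.Random) -> float:
    mean = [1.0, 0.5][action]
    return mean * rng.uniform(0.5, 1.5)


indexes, plays = cpr_learning(noisy_payoff, n_actions=2, periods=5000)
print(plays)
print([round(x, 1) for x in indexes])
```

Because the weights are the cumulative utilities themselves, an action that pays well is chosen more often and therefore accumulates utility faster, while every action keeps a positive, shrinking probability of being tried.
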
