{"id":258,"date":"2023-08-25T00:01:38","date_gmt":"2023-08-24T16:01:38","guid":{"rendered":"http:\/\/106.52.213.145:21080\/?p=258"},"modified":"2023-08-25T00:13:02","modified_gmt":"2023-08-24T16:13:02","slug":"gongshituidao-klsandujiaochashangyusoftmaxdedaoshukdgongshi","status":"publish","type":"post","link":"https:\/\/apifj.com\/index.php\/2023\/08\/25\/gongshituidao-klsandujiaochashangyusoftmaxdedaoshukdgongshi\/","title":{"rendered":"[Formula Derivation] KL Divergence, Cross-Entropy and the softmax Derivative, and the KD Formulas"},"content":{"rendered":"<h2>(1) <strong>KL<\/strong> Divergence<\/h2>\n<p>Entropy is a basic concept that measures the uncertainty, or information content, of a random variable. The higher the entropy, the greater the uncertainty of the random variable and the more information it carries.<\/p>\n<p>KL divergence, also known as relative entropy, measures the difference between two probability distributions, or equivalently the information lost when one distribution is used to approximate the other.<\/p>\n<p>Given two discrete probability distributions P and Q, the KL divergence is defined as:<\/p>\n<pre><code class=\"language-katex\">\nD_{KL}(P \\| Q) = \\sum_{x} P(x) \\log\\left(\\frac{P(x)}{Q(x)}\\right)\n<\/code><\/pre>\n<p>KL divergence is thus a measure of the <strong>difference, or information loss<\/strong>, between two probability distributions. Expanding the logarithm of the quotient gives:<\/p>\n<pre><code class=\"language-katex\">\nD_{KL}(P \\| Q) = \\sum_{x} P(x) \\log\\left(\\frac{P(x)}{Q(x)}\\right)\\\\\n=\\sum_x [P(x)\\log(P(x))-P(x)\\log(Q(x))]\\\\\n<\/code><\/pre>\n<p>Let<\/p>\n<pre><code 
class=\"language-katex\">\nG(P) = \\sum_{x} P(x) \\log(P(x))\n<\/code><\/pre>\n<pre><code class=\"language-katex\">\nH(P, Q) = -\\sum_{x} P(x) \\log(Q(x))\n<\/code><\/pre>\n<p>so that Eq. (2) becomes<\/p>\n<pre><code class=\"language-katex\">\nD_{KL}(P \\| Q) =G(P)+H(P,Q)\n<\/code><\/pre>\n<p>The first term, <code class=\"katex-inline\">G(P)<\/code>, depends only on <code class=\"katex-inline\">P<\/code>: it is the negative entropy of <code class=\"katex-inline\">P<\/code>. During training, <code class=\"katex-inline\">P<\/code> is the distribution of the training data; the training set is fixed, so its distribution is fixed too, and <code class=\"katex-inline\">G(P)<\/code> is a constant. The second term, <code class=\"katex-inline\">H(P,Q)<\/code>, varies because it contains the model's prediction Q. Minimizing <code class=\"katex-inline\">H(P,Q)<\/code> therefore minimizes the KL divergence and brings <code class=\"katex-inline\">Q(x)<\/code> as close as possible to <code class=\"katex-inline\">P(x)<\/code>.<\/p>\n<p>The second term, <code class=\"katex-inline\">H(P,Q)<\/code>, is called the cross-entropy. It is the same quantity as the cross-entropy of information theory, whose continuous form is<\/p>\n<pre><code class=\"language-katex\">\nH(p, g)=-\\int p(x)\\log g(x)\\,dx\n<\/code><\/pre>\n<p>Note that cross-entropy is not symmetric: in general, <code class=\"katex-inline\">H(P, Q) \u2260 H(Q, 
P)<\/code>.<\/p>\n<h2>(2) <strong>Multi-class Cross-Entropy and Binary Cross-Entropy<\/strong><\/h2>\n<p>The cross-entropy loss usually appears in the following two forms:<\/p>\n<pre><code class=\"language-katex\">\nL_{mul} = \\frac{1}{N}\\sum_{i} L_i = - \\frac{1}{N}\\sum_{i} \\sum_{c=0}^{M} y_{ic}\\log(p_{ic})\n<\/code><\/pre>\n<p>Here c is the class index, <code class=\"katex-inline\">y_{ic}<\/code> is 1 when sample i belongs to class c and 0 otherwise, and <code class=\"katex-inline\">p_{ic}<\/code> is the predicted probability that sample i belongs to class c.<\/p>\n<pre><code class=\"language-katex\">\nL_{log}(y,p)=-[y\\log(p)+(1-y)\\log(1-p)]\n<\/code><\/pre>\n<p>Many blog posts present (8) before (7), which I find illogical; in my view, (8) is derived from (7). When <code class=\"katex-inline\">L_{mul}<\/code> is applied to binary classification, c can only take the values 0 and 1. For c = 0, i can be 0 or 1; taking sample i to carry label i, <code class=\"katex-inline\">y_{10}<\/code> is 0 and that term drops out (likewise <code class=\"katex-inline\">y_{01}<\/code> for c = 1), so<\/p>\n<pre><code class=\"language-katex\">\nL_{mul}|\\{c=0\\} = -\\frac{1}{N} [y_{00}\\log(p_{00})+y_{10}\\log(p_{10})]=-\\frac{1}{N}\\ y_{00}\\log(p_{00})\\\\\nL_{mul}|\\{c=1\\} = -\\frac{1}{N} [y_{01}\\log(p_{01})+y_{11}\\log(p_{11})]=-\\frac{1}{N}\\ y_{11}\\log(p_{11})\\\\\nL_{mul}=L_{mul}|\\{c=0\\}+L_{mul}|\\{c=1\\}=-\\frac{1}{N}[y_{00}\\log(p_{00})+y_{11}\\log(p_{11})]\n<\/code><\/pre>\n<p>Because this is binary classification, <code class=\"katex-inline\">p_{00}<\/code> and <code class=\"katex-inline\">p_{11}<\/code> are the probabilities of complementary events. Let <code class=\"katex-inline\">p_{00}<\/code>=p; then <code 
class=\"katex-inline\">p_{11}=1-p<\/code>, and likewise <code class=\"katex-inline\">y_{00}=y<\/code> and <code class=\"katex-inline\">y_{11}=1-y<\/code>.<\/p>\n<p>It follows, for a single sample (N = 1), that<\/p>\n<pre><code class=\"language-katex\">\nL_{mul}=L_{mul}|\\{c=0\\}+L_{mul}|\\{c=1\\}=-\\frac{1}{N}[y_{00}\\log(p_{00})+y_{11}\\log(p_{11})]\\\\\n=-[y\\log(p)+(1-y)\\log(1-p)]\n<\/code><\/pre>\n<h2>(3) Derivative of <em>softmax<\/em><\/h2>\n<p>The structure of <em>softmax<\/em> is shown below:<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/106.52.213.145:21080\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-25-at-12.03.52-AM-300x200.png\" alt=\"\" \/><\/p>\n<p>The <em>softmax<\/em> formula is:<\/p>\n<pre><code class=\"language-katex\">\nS_i=\\frac{e^{z_i}}{\\sum_k^n e^{z_k}}\n<\/code><\/pre>\n<p>Here <code class=\"katex-inline\">z<\/code> is the output of the final fully connected layer, and <code class=\"katex-inline\">z_i<\/code> is the output of its i-th unit. Applying the quotient rule:<\/p>\n<pre><code class=\"language-katex\">\n\\frac{\\partial S_i}{\\partial z_j}=\n\n\\frac{\n\\partial(\\frac{e^{z_i}}{\\sum_k^n e^{z_k}})\n}{\n\\partial z_j\n}\\\\\n\n=\\frac{\n\\frac{\\partial e^{z_i}}{\\partial z_j}\\cdot \\sum_k^n e^{z_k}\n-\ne^{z_i}\\cdot e^{z_j}\n}{\n[\\sum_k^n e^{z_k}]^2\n}\n<\/code><\/pre>\n<p>Here i and j are not necessarily equal, so we consider the two cases separately.<\/p>\n<p>1&#8242;. 
If <code class=\"katex-inline\">i = j<\/code>, then<\/p>\n<pre><code class=\"language-katex\">\n\\frac{\\partial S_i}{\\partial z_i}=\\frac{\n\\frac{\\partial e^{z_i}}{\\partial z_i}\\cdot \\sum_k^n e^{z_k}\n-\ne^{z_i}\\cdot e^{z_i}\n}{\n[\\sum_k^n e^{z_k}]^2\n}\\\\\n=\n\\frac{\ne^{z_i} \\cdot \\sum_k^n e^{z_k}\n}{\n[\\sum_k^n e^{z_k}]^2\n}-\n\\frac{\n[e^{z_i}]^2\n}{\n[\\sum_k^n e^{z_k}]^2\n}\\\\=\n\\frac{\ne^{z_i}\n}{\n\\sum_k^n e^{z_k}\n}-\n[\\frac{\ne^{z_i}\n}{\n\\sum_k^n e^{z_k}\n}]^2\\\\\n=S_i-S_i^2\n<\/code><\/pre>\n<p>2&#8242;. If <code class=\"katex-inline\">i \\not= j<\/code>, then<\/p>\n<pre><code class=\"language-katex\">\n\\frac{\\partial S_i}{\\partial z_j}=\n\n\\frac{\n\\frac{\\partial e^{z_i}}{\\partial z_j}\\cdot \\sum_k^n e^{z_k}\n-\ne^{z_i}\\cdot e^{z_j}\n}{\n[\\sum_k^n e^{z_k}]^2\n}\\\\\n\n=\n\\frac{\n0\\cdot \\sum_k^n e^{z_k}\n-\ne^{z_i}\\cdot e^{z_j}\n}{\n[\\sum_k^n e^{z_k}]^2\n}\\\\\n\n=-\n\\frac{\ne^{z_i}\n}{\n\\sum_k^n e^{z_k}\n}\\cdot\n\\frac{\ne^{z_j}\n}{\n\\sum_k^n e^{z_k}\n}\\\\\n\n=-S_i\\cdot S_j\n<\/code><\/pre>\n<p>In summary:<\/p>\n<pre><code class=\"language-katex\">\n\\frac{\\partial S_i}{\\partial z_j}=\n\\begin{cases}\n  S_i-S_i^2 &amp;i=j\\\\\n  -S_i S_j&amp; i \\not= j\n\\end{cases}\n<\/code><\/pre>\n<h2>(4) Derivative of Cross-Entropy Combined with <em>softmax<\/em><\/h2>\n<p>The network structure combining cross-entropy with <em>softmax<\/em> is shown below:<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/106.52.213.145:21080\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-25-at-12.02.26-AM-300x121.png\" alt=\"\" \/><\/p>\n<p>With the cross-entropy loss <code class=\"katex-inline\">C = -\\sum_k^n y_k \\log(S_k)<\/code>, the chain rule gives:<\/p>\n<pre><code class=\"language-katex\">\n\\frac{\\partial C}{\\partial z_i}=\\sum_j^n \\frac{\\partial C}{\\partial S_j} \\frac{\\partial S_j}{\\partial z_i}\\\\\n= \\sum_j^n {\n    \\frac{\\partial(-\\sum^n_k y_k \\log(S_k))} {\\partial S_j}\\cdot\n  \\frac{\\partial S_j}{\\partial z_i}\n}\n<\/code><\/pre>\n<p>For the terms of the inner sum with <code class=\"katex-inline\">k\\not=j<\/code>,<\/p>\n<pre><code 
class=\"language-katex\">\n\\frac{\\partial(-y_k \\log(S_k))} {\\partial S_j}=0\n<\/code><\/pre>\n<p>For the term with <code class=\"katex-inline\">k=j<\/code>,<\/p>\n<pre><code class=\"language-katex\">\n\\frac{\\partial(-y_j \\log(S_j))} {\\partial S_j}=-\\frac{y_j}{S_j}\n<\/code><\/pre>\n<p>Eq. (19) can therefore be expanded as follows:<\/p>\n<pre><code class=\"language-katex\">\n\\frac{\\partial C}{\\partial z_i}=-\\sum_j^n \\frac{y_j}{S_j}\\cdot \\frac{\\partial S_j}{\\partial z_i}\\\\\n=-[\\frac{y_i}{S_i}\\cdot(S_i-(S_i)^2)+\\sum_{j\\not=i}^n \\frac{y_j}{S_j}(-S_i S_j)] \\\\\n=-[y_i - y_i S_i -\\sum_{j\\not=i}^n y_j S_i]\\\\\n=-[y_i -\\sum_{j=1}^n y_j S_i]\\\\\n=-[y_i -S_i\\sum_{j=1}^n y_j]\\\\\n=S_i-y_i\n<\/code><\/pre>\n<p>Here the second line substitutes Eq. (18), and the step from the fifth line to the last holds because y sums to 1: it is either a one-hot label or, in KD, a softmax output at temperature T. The derivative is therefore simply the prediction minus the label.<\/p>\n<h2>(5) Deriving Eqs. (2)-(4) of the KD Paper<\/h2>\n<p>The loss on the soft labels is:<\/p>\n<pre><code class=\"language-katex\">\nL_{soft}=-\\sum_j^N t_j \\log(s_j)\n<\/code><\/pre>\n<p>where <code class=\"katex-inline\">s_j<\/code> and <code class=\"katex-inline\">t_j<\/code> are the softmax outputs at temperature T of the distilled (student) model's logits <code class=\"katex-inline\">z<\/code> and the cumbersome (teacher) model's logits <code class=\"katex-inline\">v<\/code>. Its derivative is<\/p>\n<pre><code class=\"language-katex\">\n\\frac{\\partial L_{soft}}{\\partial z_i}\n= \\frac{\\partial [-\\sum_j^N t_j \\log(s_j)]}{\\partial (z_i\/T)}\\frac{\\partial (z_i\/T)}{\\partial z_i}\\\\\n= \\frac{1}{T}(s_i-t_i)\\\\\n= \\frac{1}{T}(\\frac{e^{z_i\/T}}{\\sum_k^N e^{z_k\/T}}-\\frac{e^{v_i\/T}}{\\sum_k^N e^{v_k\/T}})\n<\/code><\/pre>\n<p>By a first-order Taylor expansion, when T is large compared with the logits,<\/p>\n<pre><code class=\"language-katex\">\ne^{z_i\/T} \\approx 1+z_i\/T\n<\/code><\/pre>\n<p>Substituting this into (23) gives<\/p>\n<pre><code class=\"language-katex\">\n\\frac{\\partial L_{soft}}{\\partial z_i} \\approx 
\n\\frac{1}{T}(\\frac{1+z_i\/T}{N+\\sum_k^N z_k\/T}-\\frac{1+v_i\/T}{N+\\sum_k^N v_k\/T})\n<\/code><\/pre>\n<p>Assume, as the KD paper does, that <code class=\"katex-inline\">\\sum z_i =0<\/code> and <code class=\"katex-inline\">\\sum v_i =0<\/code>:<\/p>\n<blockquote>\n<p>If we now assume that the logits have been zero-meaned separately for each transfer case so that <code class=\"katex-inline\">\\sum_j z_j = \\sum_j v_j = 0<\/code> Eq. 3 simplifies to:<\/p>\n<\/blockquote>\n<pre><code class=\"language-katex\">\n\\frac{\\partial L_{soft}}{\\partial z_i} \\approx \n\\frac{1}{T}(\\frac{1+z_i\/T}{N}-\\frac{1+v_i\/T}{N}) = \n\\frac{1}{NT}[(1+z_i\/T)-(1+v_i\/T)] = \n\\frac{1}{NT^2}[z_i-v_i]\n<\/code><\/pre>\n<p>In the high-temperature regime, the exponents <code class=\"katex-inline\">z_i\/T<\/code> and <code class=\"katex-inline\">v_i\/T<\/code> tend to 0, so <code class=\"katex-inline\">e^{z_i\/T}<\/code> and <code class=\"katex-inline\">e^{v_i\/T}<\/code> tend to 1. The denominators <code class=\"katex-inline\">N + \\sum_k^N z_k\/T<\/code> and <code class=\"katex-inline\">N + \\sum_k^N v_k\/T<\/code> therefore also tend to <code class=\"katex-inline\">N<\/code>, and the sums of the logits have less and less influence on the result as T grows.<\/p>\n<p>In this regime, the derivative <code class=\"katex-inline\">\\frac{\\partial L_{soft}}{\\partial z_i}<\/code> simplifies to:<\/p>\n<pre><code class=\"language-katex\">\n\\frac{\\partial L_{soft}}{\\partial z_i} \\approx \n\\frac{1}{T}(\\frac{1+z_i\/T}{N}-\\frac{1+v_i\/T}{N}) = \n\\frac{1}{NT}[(1+z_i\/T)-(1+v_i\/T)] = 
\n\\frac{1}{NT^2}[z_i-v_i]\n<\/code><\/pre>\n<p>At high temperature, that is, when T is large compared with the logits, the gradient of the soft-label loss is proportional to <code class=\"katex-inline\">z_i - v_i<\/code>, which is exactly the gradient of <code class=\"katex-inline\">1\/2(z_i - v_i)^2<\/code> with respect to <code class=\"katex-inline\">z_i<\/code>. Distillation is then equivalent to minimizing <code class=\"katex-inline\">1\/2(z_i - v_i)^2<\/code>, as the KD paper notes; this follows from the approximate behavior of the exponentials and the denominators.<\/p>\n<p>The derivation relied on the first-order Taylor approximation:<\/p>\n<pre><code class=\"language-katex\">\ne^{z_i\/T} \\approx 1 + \\frac{z_i}{T}\n<\/code><\/pre>\n<p>When T is very large, the softmax terms <code class=\"katex-inline\">\\frac{e^{z_i\/T}}{\\sum_k^N e^{z_k\/T}}<\/code> and <code class=\"katex-inline\">\\frac{e^{v_i\/T}}{\\sum_k^N e^{v_k\/T}}<\/code> both approach <code class=\"katex-inline\">1\/N<\/code>: each numerator tends to 1, and each of the N exponentials in the denominator also tends to 1. To zeroth order, the derivative of the soft-label loss therefore tends to:<\/p>\n<pre><code class=\"language-katex\">\n\\frac{1}{T} \\left(\\frac{1}{N} - \\frac{1}{N}\\right) = 0\n<\/code><\/pre>\n<p>However, the linear terms in <code class=\"katex-inline\">1+z_i\/T<\/code> and <code class=\"katex-inline\">1+v_i\/T<\/code> shrink as T grows but do not cancel; keeping them gives the gradient <code class=\"katex-inline\">\\frac{1}{NT^2}(z_i - v_i)<\/code>, which is the gradient of the quadratic <code class=\"katex-inline\">1\/2(z_i - v_i)^2<\/code> scaled by <code class=\"katex-inline\">\\frac{1}{NT^2}<\/code>. This explains why, at high temperature, the soft-label 
loss behaves like the quadratic <code class=\"katex-inline\">1\/2(z_i - v_i)^2<\/code>: its gradient is proportional to that of the quadratic.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(1) KL Divergence Entropy is a basic concept that measures the uncertainty, or information content, of a random variable. The higher the entropy, the greater the&#8230; &raquo; <a class=\"read-more-link\" href=\"https:\/\/apifj.com\/index.php\/2023\/08\/25\/gongshituidao-klsandujiaochashangyusoftmaxdedaoshukdgongshi\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-258","post","type-post","status-publish","format-standard","hentry","category-dl"],"_links":{"self":[{"href":"https:\/\/apifj.com\/index.php\/wp-json\/wp\/v2\/posts\/258","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/apifj.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/apifj.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/apifj.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/apifj.com\/index.php\/wp-json\/wp\/v2\/comments?post=258"}],"version-history":[{"count":11,"href":"https:\/\/apifj.com\/index.php\/wp-json\/wp\/v2\/posts\/258\/revisions"}],"predecessor-version":[{"id":271,"href":"https:\/\/apifj.com\/index.php\/wp-json\/wp\/v2\/posts\/258\/revisions\/271"}],"wp:attachment":[{"href":"https:\/\/apifj.com\/index.php\/wp-json\/wp\/v2\/media?parent=258"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/apifj.com\/index.php\/wp-json\/wp\/v2\/categories?post=258"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/apifj.com\/index.php\/wp-json\/wp\/v2\/tags?post=258
"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}