<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>叠搭宝箱</title>
  
  
  <link href="/atom.xml" rel="self"/>
  
  <link href="http://stackbox.cn/"/>
  <updated>2018-12-18T02:59:03.992Z</updated>
  <id>http://stackbox.cn/</id>
  
  <author>
    <name>叠搭宝箱</name>
    
  </author>
  
  <generator uri="http://hexo.io/">Hexo</generator>
  
  <entry>
    <title>Factorization Machine</title>
    <link href="http://stackbox.cn/2018-12-factorization-machine/"/>
    <id>http://stackbox.cn/2018-12-factorization-machine/</id>
    <published>2018-12-16T12:16:53.000Z</published>
    <updated>2018-12-18T02:59:03.992Z</updated>
    
    <content type="html"><![CDATA[<h2 id="引言"><a href="#引言" class="headerlink" title="引言"></a>Introduction</h2><p>The main goal of the Factorization Machine (FM) is to model feature interactions when the data is sparse.</p><a id="more"></a>
<h2 id="推导"><a href="#推导" class="headerlink" title="推导"></a>Derivation</h2><p>A plain linear model (where n is the number of features) is</p><p>\begin{align}<br>\hat y = \omega_0 + \sum_{i=1}^{n}\omega_{i}x_{i}<br>\end{align}</p><p>A linear model treats every feature independently, but some feature combinations carry real signal. In a news recommender, for example, readers who like military news are likely to like international news too, and readers who like fashion news are likely to like entertainment news. Adding all pairwise feature combinations gives the degree-2 model that FM starts from</p><p>\begin{align}<br>\hat y = \omega_0 + \sum_{i=1}^{n}\omega_{i}x_{i} + \sum_{i=1}^{n}\sum_{j=i+1}^{n}\omega_{ij}x_{i}x_{j}<br>\end{align}</p><p>The weights \( \omega_{ij} \) form a real symmetric matrix \( W \). When \( W \) is positive definite (its diagonal never enters the model, so it can be chosen to make this hold), \( W \) can be factorized as</p><p>\begin{align}<br>W = VV^{T}<br>\end{align}</p><p>Here \( V \) is an \( n \times k \) matrix. On the choice of k, the paper says:</p><blockquote><p>It is well known that for any positive definite matrix W, there exists a matrix V such that W = V · V^t provided that k is sufficiently large. This shows that a FM can express any interaction matrix W if k is chosen large enough. Nevertheless in sparse settings, typically a small k should be chosen because there is not enough data to estimate complex interactions W. Restricting k – and thus the expressiveness of the FM – leads to better generalization and thus improved interaction matrices under sparsity</p></blockquote><p>Each \( \omega_{ij} \) in the model can then be written as the dot product of two rows of \( V \):</p><p>\begin{align}<br>\hat y &amp; = \omega_0 + \sum_{i=1}^{n}\omega_{i}x_{i} + \sum_{i=1}^{n}\sum_{j=i+1}^{n} &lt;v_{i}, v_{j}&gt;x_{i}x_{j}<br>\end{align}</p><p>This cuts the number of second-order parameters from \( \frac{n(n-1)}{2}\) to \( nk \). It also couples parameters that used to be independent: \( \omega_{hi}\) and \( \omega_{hj} \) become \( &lt;v_{h}, v_{i}&gt;\) and \( &lt;v_{h}, v_{j}&gt;\), which share the factor \( v_{h} \). As a result, every non-zero interaction that involves \( x_{i} \) helps to learn the latent vector \( v_{i}\), which greatly reduces the impact of data sparsity. Evaluated directly, the second-order term costs \( O(kn^2) \), but it can be rewritten:</p><p>\begin{align}<br>\sum_{i=1}^{n}\sum_{j=i+1}^{n} &lt;v_{i}, v_{j}&gt;x_{i}x_{j} &amp; = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} &lt;v_{i}, v_{j}&gt;x_{i}x_{j} - \frac{1}{2}\sum_{i=1}^{n} &lt;v_{i}, v_{i}&gt;x_{i}x_{i} \\<br>&amp; =\frac{1}{2} (\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{f=1}^{k}v_{i,f}v_{j,f}x_{i}x_{j} - \sum_{i=1}^{n}\sum_{f=1}^{k}v_{i, f}v_{i,f}x_{i}x_{i}) \\<br>&amp; = \frac{1}{2}\sum_{f=1}^{k}((\sum_{i=1}^{n}v_{i,f}x_{i})(\sum_{j=1}^{n}v_{j,f}x_{j}) - \sum_{i=1}^{n} v_{i,f}^2 x_i^2)\\<br>&amp; = \frac{1}{2}\sum_{f=1}^{k}((\sum_{i=1}^{n}v_{i,f}x_{i})^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2)<br>\end{align}</p><p>The cost is now \( O(kn) \), and with sparse inputs only the non-zero \( x_i \) need to be visited.</p><p>The complete FM model is</p><p>\begin{align}<br>\hat y = \omega_0 + \sum_{i=1}^{n}\omega_{i}x_{i} + \frac{1}{2}\sum_{f=1}^{k}((\sum_{i=1}^{n}v_{i,f}x_{i})^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2)<br>\end{align}</p>
<h2 id="训练"><a href="#训练" class="headerlink" title="训练"></a>Training</h2><p>Factorization machines can be applied to regression, classification, and ranking problems.</p>
<h3 id="回归问题"><a href="#回归问题" class="headerlink" title="回归问题"></a>Regression</h3><p>For regression, the objective is the mean squared error over the \( m \) training samples:</p><p>\begin{align}<br>loss^{R}(\hat y, y) = \frac{1}{2m}\sum_{i=1}^{m}(\hat y^{(i)} - y^{(i)})^2<br>\end{align}</p>
<h3 id="分类问题"><a href="#分类问题" class="headerlink" title="分类问题"></a>Classification</h3><p>CTR prediction is essentially a binary classification problem whose output is the probability of a click. For classification the loss is the log loss, where \( \sigma \) is the sigmoid function and the labels are \( y^{(i)} \in \{-1, +1\} \):</p><p>\begin{align}<br>loss^{C}(\hat y, y) = \sum_{i=1}^{m} -\ln\sigma(\hat y^{(i)}y^{(i)})<br>\end{align}</p>
<h3 id="梯度下降"><a href="#梯度下降" class="headerlink" title="梯度下降"></a>Gradient descent</h3><p>The gradient descent update is</p><p>\begin{align}<br>\theta = \theta - \alpha\frac{\partial{loss}}{\partial\theta}<br>\end{align}</p><p>where \( \theta \) stands for each parameter to be learned: \( \omega_{0}\), \( \omega_{i} \), and \( v_{i,f} \).</p><p>Code for training and prediction: <a href="https://github.com/Tara-X/algo/tree/master/fm" target="_blank" rel="noopener">https://github.com/Tara-X/algo/tree/master/fm</a></p>
<h2 id="应用"><a href="#应用" class="headerlink" title="应用"></a>Applications</h2><p>Although FM was proposed in 2010, it is still widely used in industry, for example in CTR prediction and learning to rank, and it has inspired deep-learning hybrids such as DeepFM. Its advantages are clear:</p><ul><li>It suits sparse data such as CTR prediction.</li><li>Once basic feature engineering is done, there is no need to hand-craft feature combinations; adding them to an LR model by hand requires very deep domain knowledge.</li><li>Online prediction is fast, and so is training.</li></ul><p>A simple next step is to use FM for click-through-rate prediction on the Kaggle Criteo challenge.</p>
<h2 id="工业实现"><a href="#工业实现" class="headerlink" title="工业实现"></a>Industrial implementations</h2><ul><li>libFM</li><li>SVDFeature</li><li>difacto</li></ul>
<h2 id="资料"><a href="#资料" class="headerlink" title="资料"></a>References</h2><ul><li><a href="https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf" target="_blank" rel="noopener">Factorization Machines (paper)</a></li><li><a href="https://www.csie.ntu.edu.tw/~b97053/paper/Factorization%20Machines%20with%20libFM.pdf" target="_blank" rel="noopener">Factorization Machines with libFM</a></li><li><a href="https://zhuanlan.zhihu.com/p/39848122" target="_blank" rel="noopener">从FM推演各深度CTR预估模型</a></li><li><a href="https://zhuanlan.zhihu.com/p/50426292" target="_blank" rel="noopener">FM（Factorization Machines）的理论与实践</a></li><li><a href="https://blog.csdn.net/g11d111/article/details/77430095" target="_blank" rel="noopener">FM算法（Factorization Machine）</a></li><li><a href="https://blog.csdn.net/hiwallace/article/details/81333604" target="_blank" rel="noopener">FM系列算法解读（FM+FFM+DeepFM）</a></li></ul><blockquote><p>Original article. When reposting, please credit the <a href="http://stackbox.cn">叠搭宝箱</a> blog.<br>Permalink: <a href="https://stackbox.cn/2018-12-factorization-machine/">https://stackbox.cn/2018-12-factorization-machine/</a></p></blockquote>]]></content>
    
    <summary type="html">
    
      Factorization machines, from introduction to practice
    
    </summary>
    
      <category term="机器学习" scheme="http://stackbox.cn/categories/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/"/>
    
    
      <category term="算法" scheme="http://stackbox.cn/tags/%E7%AE%97%E6%B3%95/"/>
    
  </entry>
  
  <entry>
    <title>Logistic Regression</title>
    <link href="http://stackbox.cn/2018-11-logistic-regression/"/>
    <id>http://stackbox.cn/2018-11-logistic-regression/</id>
    <published>2018-11-01T13:16:53.000Z</published>
    <updated>2018-12-20T04:47:32.970Z</updated>
    
    <content type="html"><![CDATA[<p>Logistic regression is used for classification problems.</p><a id="more"></a>
<h2 id="决策边界"><a href="#决策边界" class="headerlink" title="决策边界"></a>Decision boundary</h2><table><thead><tr><th>sigmoid</th><th>linear function z</th></tr></thead><tbody><tr><td>\(h(z) = \frac {1} {1 + e^{-z}}\)</td><td>\(z = \omega^{T} x  + \omega_{0} \)</td></tr></tbody></table><p>The function h is called the sigmoid function. It maps any real value into the interval (0, 1), so h(z) can be read as a conditional probability.</p><p>For example, take a tumor dataset with several features and suppose h(z) = 0.7 for some sample, where</p><p>$$ h(z) = p(y=1 \mid x;\omega,\omega_{0}) $$</p><p>In plain words: given \(x;\omega,\omega_{0}\), the probability that the tumor is malignant is 0.7, where \(x\) is the feature vector, \(\omega\) is the weight vector, and \(\omega_{0}\) is the bias.</p><p>The decision boundary is simply a line or surface that separates the inputs of z into classes. With two features \(x_1\) and \(x_2\) and two classes, drawn as <strong>circles</strong> and <strong>red crosses</strong>, it looks like this:</p><table><thead><tr><th>Linear decision boundary</th><th>Non-linear decision boundary</th></tr></thead><tbody><tr><td><img src="http://mirror.tarax.cn/blog/lr1.png" alt="image"></td><td><img src="http://mirror.tarax.cn/blog/lr2.png" alt="image"></td></tr></tbody></table>
<h2 id="损失函数"><a href="#损失函数" class="headerlink" title="损失函数"></a>Loss function</h2><p>If we reuse the mean squared error from linear regression as the loss, the result is not convex in \(\omega\), so gradient descent is not guaranteed to reach the global optimum. A different loss is needed.</p><table><thead><tr><th>MSE as the loss</th><th>Plot</th></tr></thead><tbody><tr><td>\(Loss = \frac {1} {2m} \sum_{i=1}^{m}(y_{i} - \frac {1} {1 + e^{-\omega^{T}x_{i}}})^2\)</td><td><img src="http://mirror.tarax.cn/blog/lr3.png" alt="image"></td></tr></tbody></table><p>Suppose there are only two labels, 1 and 0, that is \(y_i \in \{0,1\}\). Treat each collected sample as an event, and let \(p\) be the probability the model assigns to label 1:</p><p>\begin{align}<br>P_{y=1} = p = \frac {1} {1 + e^{-\omega^Tx}}<br>\end{align}<br>Since the label is either 1 or 0, \(P_{y=0} = 1 -p\).</p><p>The two formulas combine into</p><p>\begin{align}<br>P(y_{i}|x_{i}) = p^{y_i}(1-p)^{1-y_i}<br>\end{align}<br>That is, for a sample \((x_i, y_i)\), the probability that \(x_i\) has label \(y_i\) is \(p^{y_i}(1-p)^{1-y_i}\), with \(p\) evaluated at \(x_i\).</p><p>Given a dataset \(\{ (x_1, y_1),(x_2, y_2),(x_3, y_3) \dots (x_n, y_n) \}\) of independent samples, the probability of the combined event is the product of the per-sample probabilities:</p><p>\begin{align}<br>P_{total} &amp;= P(y_1|x_1)  P(y_2|x_2)  P(y_3|x_3) \dots P(y_n|x_n) \\<br>&amp;= \prod_{i=1}^{n} p^{y_i}  (1-p)^{1-y_i}<br>\end{align}</p><p>The goal is to find parameters that maximize \(P_{total}\). Products are awkward to work with, and because the logarithm is monotonically increasing, maximizing \(P_{total}\) is the same as maximizing its logarithm:</p><p>\begin{align}<br>F(\omega) &amp;= \ln(P_{total}) \\<br>&amp;= \ln(\prod_{i=1}^{n} p^{y_i}  (1-p)^{1-y_i}) \\<br>&amp;= \sum_{i=1}^{n}\ln(p^{y_i}(1-p)^{1-y_i}) \\<br>&amp;= \sum_{i=1}^{n}(y_i\ln(p) + (1-y_i)\ln(1-p))<br>\end{align}</p><p>The problem is now to find \(\omega^{*}\) that maximizes \(F(\omega)\), or equivalently minimizes \(-F(\omega)\):</p><p>$$ \omega^{*} = \mathop{argmax}_{\omega} F(\omega)$$ </p><p>that is, $$ \omega^{*} = \mathop{argmin}_{\omega} \left(-F(\omega)\right)$$</p>
<h2 id="梯度下降"><a href="#梯度下降" class="headerlink" title="梯度下降"></a>Gradient descent</h2><p>First differentiate \(p\). Using the sigmoid identity \(h'(z) = h(z)(1-h(z))\), its derivative with respect to the vector \(\omega\) is \(p' = p(1-p)x\).</p><p>Now take the gradient of the objective \(F(\omega)\):</p><p>\begin{align}<br>\nabla{F(\omega)} &amp;=  \nabla (\sum_{i=1}^{n}(y_i\ln(p) + (1-y_i)\ln(1-p)))\\<br>&amp;= \sum_{i=1}^{n}(y_i (\ln p)' + (1-y_i)(\ln(1-p))') \\<br>&amp;= \sum_{i=1}^{n}(y_i\frac{1}{p}p' + (1-y_i)\frac{1}{1-p}(1-p)') \\<br>&amp;= \sum_{i=1}^{n}(y_i\frac{1}{p}p(1-p)x_i - (1-y_i)\frac{1}{1-p}p(1-p)x_i) \\<br>&amp;= \sum_{i=1}^{n}(y_i(1-p)x_i - (1-y_i)px_i)\\<br>&amp;= \sum_{i=1}^{n}(y_i - p)x_i<br>\end{align}</p><p>In SGD, picking one sample \((x_i, y_i)\) uniformly at random and multiplying its gradient by the number of samples \(n\) gives an unbiased estimate of the full gradient.</p><p>So the stochastic update (an ascent step on \(F\), which is a descent step on \(-F\)) is</p><p>\begin{align}<br>\omega_{t+1} = \omega_{t} + \eta n (y_i - \frac {1} {1 + e^{-\omega_{t}^Tx_i}})x_i<br>\end{align}</p><p>Since \(\eta n\) is a constant, it can be absorbed into the learning rate, which gives the final update rule</p><p>\begin{align}<br>\omega_{t+1} = \omega_{t} + \eta (y_i - \frac {1} {1 + e^{-\omega_{t}^Tx_i}})x_i<br>\end{align}</p>
<h2 id="代码实现"><a href="#代码实现" class="headerlink" title="代码实现"></a>Implementation</h2><p><a href="https://github.com/Tara-X/algo/tree/master/LR" target="_blank" rel="noopener">https://github.com/Tara-X/algo/tree/master/LR</a></p>
<h2 id="非线性分类"><a href="#非线性分类" class="headerlink" title="非线性分类"></a>Non-linear classification</h2><p>Non-linear classification can be achieved with the kernel trick.</p>
<h2 id="与SVM-感知机的区别"><a href="#与SVM-感知机的区别" class="headerlink" title="与SVM/感知机的区别"></a>Differences from SVM and the perceptron</h2><ul><li>The perceptron separates the data with a hyperplane, finds all misclassified points, and minimizes the sum of their distances to the hyperplane. Its optimization problem is to minimize the total distance from misclassified points to the separating hyperplane.</li><li>Logistic regression feeds the output of the separating hyperplane into the sigmoid function to obtain the conditional probability that a sample is positive or negative, then fits the parameters by maximum likelihood. Its optimization problem is maximum likelihood estimation of that conditional (posterior) probability distribution.</li><li>The support vector machine maximizes the minimum distance from the sample points to the separating hyperplane.</li></ul>
<h2 id="参考资料"><a href="#参考资料" class="headerlink" title="参考资料"></a>References</h2><ul><li><a href="https://blog.csdn.net/u012328159/article/details/51068427" target="_blank" rel="noopener">逻辑斯谛回归之决策边界</a></li><li><a href="https://www.zhihu.com/question/35322351" target="_blank" rel="noopener">sigmoid背后的数学原理</a></li><li><a href="https://www.cnblogs.com/fernnix/p/4100871.html" target="_blank" rel="noopener">逻辑回归要点</a></li><li><a href="https://blog.csdn.net/han_xiaoyang/article/details/49123419" target="_blank" rel="noopener">逻辑回归初步</a></li><li><a href="https://zhuanlan.zhihu.com/p/44591359" target="_blank" rel="noopener">逻辑回归公式推导</a></li><li><a href="https://blog.csdn.net/mounty_fsc/article/details/51588794" target="_blank" rel="noopener">矩阵求导</a></li><li><a href="https://blog.csdn.net/zrh_CSDN/article/details/80920329" target="_blank" rel="noopener">逻辑斯蒂回归和感知机模型、支持向量机模型对比</a></li></ul>]]></content>
    
    <summary type="html">
    
      Logistic regression
    
    </summary>
    
      <category term="机器学习" scheme="http://stackbox.cn/categories/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/"/>
    
    
      <category term="算法" scheme="http://stackbox.cn/tags/%E7%AE%97%E6%B3%95/"/>
    
  </entry>
  
  <entry>
    <title>Linear Regression</title>
    <link href="http://stackbox.cn/2018-10-linear-regression/"/>
    <id>http://stackbox.cn/2018-10-linear-regression/</id>
    <published>2018-10-05T05:41:41.000Z</published>
    <updated>2018-12-17T11:01:07.861Z</updated>
    
    <content type="html"><![CDATA[<h2 id="引言"><a href="#引言" class="headerlink" title="引言"></a>Introduction</h2><p>This post started as notes I took in 2015 while watching Ng's lectures. Looking back, some of my understanding at the time was too shallow, so I have revised the notes and am republishing them.</p><a id="more"></a>
<h2 id="基本概念"><a href="#基本概念" class="headerlink" title="基本概念"></a>Basic concepts</h2><ul><li><strong>Cost function</strong> (also called the loss function)</li></ul><p>The goal of linear regression is to make the hypothesis function \( h_\theta(x) = \theta_0 + \theta_1 x \) fit the training set as closely as possible, that is, to minimize the cost function. The factor \( 1/2 \) in front cancels the 2 produced by differentiating the squared error, so the partial-derivative term in gradient descent has coefficient exactly 1.</p><p>$$ J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 $$</p>
<h2 id="梯度下降算法"><a href="#梯度下降算法" class="headerlink" title="梯度下降算法"></a>Gradient descent</h2>
<h3 id="梯度下降公式"><a href="#梯度下降公式" class="headerlink" title="梯度下降公式"></a>Update rule</h3><p>Plotting \( J(\theta_0,\theta_1) \) against \( \theta_0 \) and \( \theta_1 \) gives a surface like the one below. For a linear hypothesis the surface is in fact bowl-shaped (convex), so gradient descent converges to the same optimum from any starting point. The gradient is the vector that points in the direction in which the function changes fastest.</p><table><thead><tr><th>Cost surface</th><th>Gradient descent update</th></tr></thead><tbody><tr><td><img src="http://mirror.tarax.cn/liner-01.webp" style="width:300px"></td><td>$$ \theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1) $$</td></tr></tbody></table>
<h3 id="理解梯度"><a href="#理解梯度" class="headerlink" title="理解梯度"></a>Understanding the gradient</h3><ul><li>The gradient corresponds to the directional derivative with the largest rate of change.</li><li>Partial derivatives are a special kind of directional derivative, taken along the coordinate axes.</li><li>The partial derivatives at a point form a vector, the gradient; the function changes fastest along that vector.</li></ul>
<h2 id="多元线性回归"><a href="#多元线性回归" class="headerlink" title="多元线性回归"></a>Multivariate linear regression</h2><p>The derivation above is rather simplistic, so consider the multivariate case, where k is the number of features (the dimension):</p><p>\begin{align}<br>\hat{y} = \omega_{0} + \sum_{i=1}^{k}\omega_{i}x_{i}<br>\end{align}</p><p>Its loss function is</p><p>\begin{align}<br>loss^{R} =  \frac{1}{2m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})^2<br>\end{align}</p><p>The partial derivatives of the loss are as follows (<strong>they can be taken directly; for example, the partial derivative of \( \hat{y} \) with respect to \( \omega_2 \) is \( x_2 \)</strong>):</p><p>\begin{align}<br>\frac{\partial{loss^{R}}}{\partial\theta} = \begin{cases}<br>\frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})   &amp; \text{if  } \theta = \omega_{0} \\<br>\\<br>\frac{1}{m}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})x_{t}^{(i)}  &amp; \text{if  } \theta = \omega_{t}<br>\end{cases}<br>\end{align}</p><p>The model can then be fitted by gradient descent. Common variants are:</p><ul><li>Batch Gradient Descent (BGD): every iteration uses the entire training set</li><li>Stochastic Gradient Descent (SGD): every step uses a single sample to compute the gradient</li><li>Mini-Batch Gradient Descent (MBGD): every step draws a small batch of samples</li></ul>
<h2 id="模型评估"><a href="#模型评估" class="headerlink" title="模型评估"></a>Model evaluation</h2><p>For a linear regression model, the quality of the fit can be evaluated with the following methods</p>
<h2 id="代码实现"><a href="#代码实现" class="headerlink" title="代码实现"></a>Implementation</h2><p><a href="https://github.com/Tara-X/algo/tree/master/linear" target="_blank" rel="noopener">https://github.com/Tara-X/algo/tree/master/linear</a></p>
<h2 id="参考资料"><a href="#参考资料" class="headerlink" title="参考资料"></a>References</h2><ul><li><a href="https://blog.csdn.net/qq_37553011/article/details/79795092" target="_blank" rel="noopener">梯度下降之导数与梯度理解</a></li><li><a href="https://www.zhihu.com/question/36301367" target="_blank" rel="noopener">如何直观形象的理解方向导数与梯度以及它们之间的关系？</a></li><li><a href="https://blog.csdn.net/u012421852/article/details/79558833" target="_blank" rel="noopener">梯度下降推导</a></li><li><a href="https://blog.csdn.net/fffupeng/article/details/73522425" target="_blank" rel="noopener">方向导数偏导数梯度</a></li><li><a href="https://blog.csdn.net/kwame211/article/details/80364079" target="_blank" rel="noopener">随机梯度下降</a></li></ul><blockquote><p>Original article. When reposting, please credit the <a href="http://stackbox.cn">叠搭宝箱</a> blog.<br>Permalink: <a href="https://stackbox.cn/2018-10-linear-regression">https://stackbox.cn/2018-10-linear-regression</a></p></blockquote>]]></content>
    
    <summary type="html">
    
      Machine learning: linear regression
    
    </summary>
    
      <category term="机器学习" scheme="http://stackbox.cn/categories/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/"/>
    
    
      <category term="机器学习" scheme="http://stackbox.cn/tags/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/"/>
    
  </entry>
  
  <entry>
    <title>Understanding Consistent Hashing</title>
    <link href="http://stackbox.cn/2017-10-consistent-hashing-intro/"/>
    <id>http://stackbox.cn/2017-10-consistent-hashing-intro/</id>
    <published>2017-10-12T08:15:53.000Z</published>
    <updated>2018-12-17T11:05:04.962Z</updated>
    
    <content type="html"><![CDATA[<h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>Abstract</h2><p>Consistent hashing is a special kind of hashing. With it, changing the number of slots in the hash table requires remapping only K/n keys on average, where K is the number of keys and n is the number of slots. Because of this property, consistent hashing is widely used in distributed storage systems.</p><a id="more"></a>
<h2 id="算法描述"><a href="#算法描述" class="headerlink" title="算法描述"></a>Algorithm description</h2>
<h3 id="业务情景"><a href="#业务情景" class="headerlink" title="业务情景"></a>Scenario</h3><p>Suppose one billion records must be cached across N machines. What rule spreads the records evenly over them? A simple one is to hash each record and take the result modulo N (that is, <code>hash(record) mod N</code>). But what if a machine is added to the running cache cluster, or one goes down for some reason? For new records to map correctly, the modulus has to become N+1 or N-1, which forces almost all existing data to be rebalanced, at great cost.</p>
<h3 id="算法特性"><a href="#算法特性" class="headerlink" title="算法特性"></a>Properties</h3><p>Consistent hashing solves exactly this problem: after machines are added or removed, it keeps the number of records that must be rebalanced as small as possible. A consistent hashing algorithm should have the following properties:</p><ul><li>Balance: records are hashed across all nodes as evenly as possible, making full use of the available space.</li><li>Monotonicity: <strong>content that has already been assigned should map either to its original cache or to a new cache, never to a different cache in the old set</strong>. Put another way, after a node is added, an existing key either stays where it is or moves to the new node; it never moves to another old node. The modulo approach fails this test, because after a node is added a large share of keys are remapped to other existing nodes. DHTs, which are common in P2P systems, also use consistent hashing: a change in the set of caches corresponds to a peer joining or leaving, which happens frequently in P2P systems and would otherwise cause heavy computation and transfer load. Monotonicity is what lets the hash algorithm cope with this.</li><li>Spread: in a distributed environment a client may not see every node, so different clients may map the same record to different nodes, which is clearly undesirable. Spread measures how severe this is and should be kept as low as possible.</li><li>Load: since different clients may map the same content to different shard nodes, a given node may also be assigned different content by different clients. A good hash algorithm should keep the load on each node as low as possible.</li><li>Smoothness: a smooth change in the number of cache servers should cause a correspondingly smooth change in where records are placed.</li></ul>
<h3 id="算法实现"><a href="#算法实现" class="headerlink" title="算法实现"></a>Implementation</h3><p>Consistent hashing works as follows:</p><ol><li>Compute a hash for each node (from its IP, port, MAC address, and so on) and place the nodes on a ring covering 0 to 2^32.<br><img src="/image/2017-10/consistent-hash-1.png" alt=""></li><li>For a cache write with key K, compute Hash(K), which corresponds to a point on the ring in the figure above. If no machine node sits at that point, search clockwise until the first machine node is found; that node is the target. If the search passes 2^32 without finding a node, it wraps around to the first machine node. For example, if Hash(K) falls between A and B, the target is node B.</li><li>When a node is added, it is placed on the existing ring. Say node F joins; the cluster then looks like this:<br><img src="/image/2017-10/consistent-hash-1.png" alt=""><br>Only keys in the region between C and F will miss, so only that region needs to be rebalanced.</li><li>If node F is removed, again only keys between C and F are affected; by the algorithm, F's data simply moves to node D.</li></ol><p>In practice, when there are few nodes they may sit close together on the ring, which hurts balance. Implementations address this with virtual nodes: each real node is assigned many virtual nodes, which curbs the uneven distribution. The <a href="https://github.com/RJ/ketama" target="_blank" rel="noopener">Ketama</a> library takes this approach. In addition, in the case above where F is removed, all of F's data would land on D; with virtual nodes it can instead be spread over C and D, which reduces the pressure on any single server.</p>
<h2 id="工程应用"><a href="#工程应用" class="headerlink" title="工程应用"></a>In practice</h2>
<h3 id="ShardedJedis"><a href="#ShardedJedis" class="headerlink" title="ShardedJedis"></a>ShardedJedis</h3><p>Jedis implements client-side sharding through ShardedJedis (Redis 3 also supports clustering natively with Redis Cluster). The main ideas behind its consistent hashing are:</p><ol><li>Virtual nodes are stored in a TreeMap, so the ring behaviour can be implemented with the tailMap method.</li><li>Real nodes are stored in a LinkedHashMap, which keeps insertion order, although the implementation does not rely on that property.</li></ol>
<figure class="highlight java"><table><tr><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">Sharded</span>&lt;<span class="title">R</span>, <span class="title">S</span> <span class="keyword">extends</span> <span class="title">ShardInfo</span>&lt;<span class="title">R</span>&gt;&gt; </span>&#123;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> <span class="keyword">int</span> DEFAULT_WEIGHT = <span class="number">1</span>;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">private</span> TreeMap&lt;Long, S&gt; nodes;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">private</span> <span class="keyword">final</span> Hashing algo;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">private</span> <span class="keyword">final</span> Map&lt;ShardInfo&lt;R&gt;, R&gt; resources = <span class="keyword">new</span> LinkedHashMap&lt;ShardInfo&lt;R&gt;, R&gt;();</span><br><span class="line"></span><br><span class="line">    <span class="comment">/*</span></span><br><span class="line"><span class="comment">     * Initialization is fairly brute-force: each shard is hashed by its index or its name. The default hash algorithm is MurmurHash.</span></span><br><span class="line"><span class="comment">     */</span></span><br><span class="line">    <span class="function"><span class="keyword">private</span> <span class="keyword">void</span> <span class="title">initialize</span><span class="params">(List&lt;S&gt; shards)</span> </span>&#123;</span><br><span class="line">        nodes = <span class="keyword">new</span> TreeMap&lt;Long, S&gt;();</span><br><span class="line"></span><br><span class="line">        <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i != shards.size(); ++i) &#123;</span><br><span class="line">            <span class="keyword">final</span> S shardInfo = shards.get(i);</span><br><span class="line">            <span class="keyword">if</span> (shardInfo.getName() == <span class="keyword">null</span>) <span class="keyword">for</span> (<span class="keyword">int</span> n = <span class="number">0</span>; n &lt; <span class="number">160</span> * shardInfo.getWeight(); n++) &#123;</span><br><span class="line">                nodes.put(<span class="keyword">this</span>.algo.hash(<span class="string">"SHARD-"</span> + i + <span class="string">"-NODE-"</span> + n), shardInfo);</span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">else</span> <span class="keyword">for</span> (<span class="keyword">int</span> n = <span class="number">0</span>; n &lt; <span class="number">160</span> * shardInfo.getWeight(); n++) &#123;</span><br><span class="line">                nodes.put(<span class="keyword">this</span>.algo.hash(shardInfo.getName() + <span class="string">"*"</span> + shardInfo.getWeight() + n), shardInfo);</span><br><span class="line">            &#125;</span><br><span class="line">            resources.put(shardInfo, shardInfo.createResource());</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">/*</span></span><br><span class="line"><span class="comment">     * Look up the virtual node first, then resolve it to the real resource.</span></span><br><span class="line"><span class="comment">     */</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> R <span class="title">getShard</span><span class="params">(<span class="keyword">byte</span>[] key)</span> </span>&#123;</span><br><span class="line">        <span class="keyword">return</span> resources.get(getShardInfo(key));</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">/*</span></span><br><span class="line"><span class="comment">     * Find the virtual node: the first entry at or after the key's hash, wrapping around to the first entry on the ring.</span></span><br><span class="line"><span class="comment">     */</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> S <span class="title">getShardInfo</span><span class="params">(<span class="keyword">byte</span>[] key)</span> </span>&#123;</span><br><span class="line">        SortedMap&lt;Long, S&gt; tail = nodes.tailMap(algo.hash(key));</span><br><span class="line">        <span class="keyword">if</span> (tail.isEmpty()) &#123;</span><br><span class="line">            <span class="keyword">return</span> nodes.get(nodes.firstKey());</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> tail.get(tail.firstKey());</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
<h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><ol><li><a href="http://www.cnblogs.com/lpfuture/p/5796398.html" target="_blank" rel="noopener">一致性哈希算法原理</a></li><li><a href="https://github.com/digoal/blog/blob/master/201607/20160723_03.md" target="_blank" rel="noopener">一致性哈希在分布式数据库中的应用探索</a></li><li><a href="https://yikun.github.io/2016/06/09/%E4%B8%80%E8%87%B4%E6%80%A7%E5%93%88%E5%B8%8C%E7%AE%97%E6%B3%95%E7%9A%84%E7%90%86%E8%A7%A3%E4%B8%8E%E5%AE%9E%E8%B7%B5/" target="_blank" rel="noopener">一致性哈希算法的理解与实践</a></li><li><a href="http://m635674608.iteye.com/blog/2297632" target="_blank" rel="noopener">Jedis之ShardedJedis虚拟节点一致性哈希分析</a></li><li><a href="https://publicobject.com/2016/02/08/linkedhashmap-is-always-better-than-hashmap/" target="_blank" rel="noopener">LinkedHashMap is always better than HashMap</a></li></ol>]]></content>
    
    <summary type="html">
    
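      Consistent hashing is a special kind of hashing. With it, changing the number of slots in the hash table requires remapping only K/n keys on average, where K is the number of keys and n is the number of slots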
    
    </summary>
    
    
      <category term="算法" scheme="http://stackbox.cn/tags/%E7%AE%97%E6%B3%95/"/>
    
  </entry>
  
  <entry>
    <title>2017 Mid-Year Review</title>
    <link href="http://stackbox.cn/2017-07-2017-mid-schedule/"/>
    <id>http://stackbox.cn/2017-07-2017-mid-schedule/</id>
    <published>2017-07-25T09:57:04.000Z</published>
    <updated>2018-12-17T11:05:09.253Z</updated>
    
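    <content type="html"><![CDATA[<blockquote><p>First of all, I'm in a relationship and very happy. It's a magical feeling!</p></blockquote><a id="more"></a><p>The first half of the year was mainly an attempt at products around <code>live streaming + advertising</code>. Overall the results were disappointing.</p><ul><li>We started with something like a streamer leaderboard. It took a great deal of manpower and resources but did not deliver much. Looking back, I think the reasons are:<ul><li>A leaderboard is only a very crude treatment of the data and uncovers no deeper value.</li><li>Moreover, making money on top of a new thing presupposes that the stakeholders who depend on it are, broadly, making money themselves, as with WeChat official accounts. In live streaming the platforms were all losing heavily; apart from guilds and streamers taking most of the money, I saw no third party earning much. When the overall picture shows no future, profiting from B2B products built on live streaming is a castle in the air.</li><li>Whenever a new medium appears, so far it has nearly always had to be fed by advertisers. Live streaming, as an emerging medium, is strong only at occupying people's time: its reach is weaker than WeChat or Weibo, and its average production quality can hardly compare with Youku or iQiyi. The only ad budget it gets is the little that a few aggressive, deep-pocketed advertisers throw at it. Without a large volume of ad spend, the whole marketing model simply does not work.</li></ul></li><li>Next we built an operations management tool for guilds. It was reasonably useful to them, but the project was effectively charity. Because it included a crawler and analysis system for live-stream bullet comments, it was also somewhat complex.</li><li>Finally, more or less giving up, we put together a platform demo, which closed out our attempts in this direction.</li></ul><p>This project brought little sense of achievement. On the product side there was really no methodology, and building a data product without a complete, well-founded methodology is asking for trouble. On the technical side, apart from practice writing web APIs with Flask, getting familiar with some Python middleware, and practice capturing traffic and writing crawlers, I made no progress.</p><p>So my goals for the second half of the year are roughly:</p><ul><li>Travel twice</li><li>Find time to finish getting my driver's license</li><li>Study English</li><li>Do in-depth Hadoop development and build up my big-data engineering skills</li><li>Read some heavyweight codebases, such as Spring, and develop the ability to model large projects</li><li>Become fluent with numpy/pandas</li><li>Reach the halfway point in both Beyer and Czerny Op. 599</li><li>Update the blog every two weeks (data and project design topics)</li><li>Speaking skills: practise presenting slides at home whenever I have time</li><li>Work through LintCode once</li><li>Read at least five books</li><li>Look at opportunities elsewhere</li></ul>]]></content>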
    
    <summary type="html">
    
      2017 mid-year review and plans for the second half of the year
    
    </summary>
    
      <category term="生活记录" scheme="http://stackbox.cn/categories/%E7%94%9F%E6%B4%BB%E8%AE%B0%E5%BD%95/"/>
    
    
  </entry>
  
  <entry>
    <title>Using Airflow in Production</title>
    <link href="http://stackbox.cn/2017-05-use-airflow-in-production/"/>
    <id>http://stackbox.cn/2017-05-use-airflow-in-production/</id>
    <published>2017-05-12T10:36:37.000Z</published>
    <updated>2018-12-17T11:05:14.828Z</updated>
    
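    <content type="html"><![CDATA[<p>Airflow is Airbnb's task management system built on DAGs (directed acyclic graphs). The simplest way to think of it is as an advanced crontab. Its peers are Azkaban, Oozie, and Luigi. We chose Airflow because Oozie is simply too dated, Luigi is updated painfully slowly, and Azkaban lives in the Java stack. By comparison, Airflow (version 1.8.1) best met our needs at the time, and its UI is rather elegant.</p><a id="more"></a>
<h3 id="DAG设计"><a href="#DAG设计" class="headerlink" title="DAG设计"></a>DAG design</h3><p>A DAG consists of one or more tasks. Getting it right depends on how well you design the overall data flow; see <a href="https://segmentfault.com/a/1190000005078547" target="_blank" rel="noopener">this article</a> for details.</p>
<h3 id="Broker与Executor选择"><a href="#Broker与Executor选择" class="headerlink" title="Broker与Executor选择"></a>Choosing the broker and executor</h3><p>Be sure to use RabbitMQ with the CeleryExecutor, which is also the setup Celery itself recommends. This unlocks some great features, such as clicking a failed task in the web UI and re-running it.</p>
<h3 id="Supervisor"><a href="#Supervisor" class="headerlink" title="Supervisor"></a>Supervisor</h3><p>When starting the worker, webserver, and scheduler under supervisor, be sure to add the following to each supervisor program's configuration:</p><blockquote><p>environment=AIRFLOW_HOME=xxxxxxxxxx</p></blockquote><p>The reason: if supervisor launches Airflow through a custom script, starting a worker also starts a separate serve_log service. Without the right environment variable, serve_log looks for logs under the default AIRFLOW_HOME, so the logs cannot be viewed in the web UI.</p>
<h3 id="Serve-log"><a href="#Serve-log" class="headerlink" title="Serve_log"></a>Serve_log</h3><p>If workers are deployed on several machines, open port 8793 on those machines in iptables so that the web UI can fetch task logs from remote workers.</p>
<h3 id="AMPQ库"><a href="#AMPQ库" class="headerlink" title="AMPQ库"></a>AMQP library</h3><p>Celery can talk AMQP through two libraries: the default one that comes with kombu, and librabbitmq, which is a binding to the RabbitMQ C client. In Airflow 1.8.1, the kombu default made the scheduler disconnect on its own, probably an issue in the corresponding 4.0.2 release. After switching to librabbitmq the webserver and scheduler ran normally, but the workers never consumed any tasks. The cause turned out to be that the message protocol changed in Celery 4.0.2 and librabbitmq had not yet been updated to match. The fix is to edit executors/celery_executor.py in the Airflow source and add the setting</p><blockquote><p>CELERY_TASK_PROTOCOL = 1</p></blockquote>
<h3 id="RabbitMQ连接卡死"><a href="#RabbitMQ连接卡死" class="headerlink" title="RabbitMQ连接卡死"></a>RabbitMQ connections hanging</h3><p>After running for a while, network problems can leave every task stuck in the queued state until the worker is restarted. Some reports attribute this to a problem with Celery's broker connection pool, so add another setting to celery_executor.py:</p><blockquote><p>BROKER_POOL_LIMIT = 0  # disable the connection pool</p></blockquote><p>This only reduces the chance of a hang; it is still best to restart the workers periodically from crontab.</p>
<h3 id="特定任务只在特殊机器上运行"><a href="#特定任务只在特殊机器上运行" class="headerlink" title="特定任务只在特殊机器上运行"></a>Running specific tasks only on specific machines</h3><p>Assign a queue to the task in the DAG, then run <code>airflow worker -q=QUEUE_NAME</code> on the designated machine.</p>
<h3 id="RabbitMQ中的queue数量过多问题"><a href="#RabbitMQ中的queue数量过多问题" class="headerlink" title="RabbitMQ中的queue数量过多问题"></a>Too many queues in RabbitMQ</h3><p>So that the scheduler can look up each task's result in O(1) time, Celery creates a queue named by a UUID for every task. By default such a queue expires after one day; the expiry can be tuned with a setting in celery_executor.py:</p><blockquote><p>CELERY_TASK_RESULT_EXPIRES = <strong>time in seconds</strong></p></blockquote>]]></content>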
    
    <summary type="html">
    
      Airflow is a task management system based on DAGs (directed acyclic graphs)
    
    </summary>
    
    
  </entry>
  
  <entry>
    <title>A Deep Dive into the Supervisor Event Mechanism</title>
    <link href="http://stackbox.cn/2017-02-delve-deep-into-supervisor-event/"/>
    <id>http://stackbox.cn/2017-02-delve-deep-into-supervisor-event/</id>
    <published>2017-02-06T10:36:37.000Z</published>
    <updated>2018-12-17T11:05:22.103Z</updated>
    
    <content type="html"><![CDATA[<h2 id="事件协议"><a href="#事件协议" class="headerlink" title="事件协议"></a>Event protocol</h2><p>The event mechanism is an advanced feature introduced in supervisor v3.0. It is often used to send alerts (email or SMS) when a supervised program crashes.<br><a id="more"></a><br>It follows a simple <code>Listener/Notification</code> model: a listener receives event notifications from supervisor on its standard input and reports the outcome back to supervisor on its standard output. Each event notification consists of a header and a body.</p><p>A listener first writes the string <code>READY\n</code> to stdout to signal that it is ready to accept an event, then reads the header with <code>sys.stdin.readline()</code>. The header looks like this:</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">ver:3.0 server:supervisor serial:35 pool:event_listener poolserial:35 eventname:PROCESS_STATE_RUNNING len:91</span><br></pre></td></tr></table></figure><ul><li>ver: the event protocol version</li><li>server: the identifier of the supervisord instance that sent the event</li><li>serial: the number supervisor assigns to the event; the first event is 1 and later events count up</li><li>pool: the name of the event listener pool the event was sent to</li><li>poolserial: unlike serial, this number is relative to a single pool, since there can be several pools and each can subscribe to a different set of events</li><li>eventname: the event type name, one of the types defined by supervisor</li><li>len: <strong>the length of the body. This value matters: the listener must read exactly len more characters from standard input before the notification has been read completely</strong></li></ul><p>Then, as the header indicates, read len characters; this is the data part of the event:</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">processname:application_demo_03 groupname:application_demo_03 from_state:STARTING pid:81292</span><br></pre></td></tr></table></figure><ul><li>processname: the name of the application that triggered the event</li><li>groupname: the name of the group that application belongs to</li><li>from_state: the state the process was in before the transition that triggered the event</li><li>pid: the process id</li></ul><p>After handling the event, the listener writes <code>RESULT 2\nOK</code> to stdout to tell supervisor that the event has been processed; the 2 is the length of the result body <code>OK</code>.</p>
<h2 id="使用"><a href="#使用" class="headerlink" title="使用"></a>Usage</h2>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="keyword">import</span> sys</span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">write_stdout</span><span class="params">(s)</span>:</span></span><br><span class="line">    sys.stdout.write(s)</span><br><span class="line">    sys.stdout.flush()</span><br><span class="line">    </span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">main</span><span class="params">()</span>:</span></span><br><span class="line">    <span class="keyword">while</span> <span class="number">1</span>:</span><br><span class="line">        <span class="comment"># transition from ACKNOWLEDGED to READY</span></span><br><span class="line">        write_stdout(<span class="string">'READY\n'</span>)</span><br><span class="line"></span><br><span class="line">        <span class="comment"># read the header line and append it to event.log</span></span><br><span class="line">        line = sys.stdin.readline()</span><br><span class="line"></span><br><span class="line">        <span class="keyword">with</span> open(<span class="string">'event.log'</span>, <span class="string">'a'</span>) <span class="keyword">as</span> f:</span><br><span class="line">            f.write(line)</span><br><span class="line"></span><br><span class="line">        headers = dict([ x.split(<span class="string">':'</span>) <span class="keyword">for</span> x <span class="keyword">in</span> line.split() ])</span><br><span class="line">        data = sys.stdin.read(int(headers[<span class="string">'len'</span>]))</span><br><span class="line"></span><br><span class="line">        <span class="keyword">with</span> open(<span class="string">'event.log'</span>, <span class="string">'a'</span>) <span class="keyword">as</span> f:</span><br><span class="line">            f.write(data)</span><br><span class="line">            f.write(<span class="string">'\n\n'</span>)</span><br><span class="line"></span><br><span class="line">        write_stdout(<span class="string">'RESULT 2\nOK'</span>)</span><br><span class="line">        </span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line">    main()</span><br></pre></td></tr></table></figure><p>The above is the bare-bones way to handle events. The childutils module that ships with supervisor makes event handling more convenient:</p>
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">import</span> sys</span><br><span 
class="line"><span class="keyword">import</span> json</span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> supervisor <span class="keyword">import</span> childutils</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">main</span><span class="params">()</span>:</span></span><br><span class="line">    <span class="keyword">while</span> <span class="number">1</span>:</span><br><span class="line">        headers, payload = childutils.listener.wait(sys.stdin, sys.stdout)</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">with</span> open(<span class="string">'event.pro.log'</span>, <span class="string">'a'</span>) <span class="keyword">as</span> f:</span><br><span class="line">            f.write(json.dumps(headers))</span><br><span class="line">        childutils.listener.ok(sys.stdout)</span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line">    main()</span><br></pre></td></tr></table></figure><h2 id="注意事项"><a href="#注意事项" class="headerlink" title="注意事项"></a>Notes</h2><ol><li>When configuring the event listener in the config file, set the events option to specify which event types the listener subscribes to; several event types can be listed</li><li>The listener handles each event as follows: <code>readline() reads the header -&gt; read the fixed-length data -&gt; write the result status</code> <strong>So avoid writing anything else to standard output inside the listener, for example with <code>print</code>,</strong> otherwise the protocol is corrupted and the listener stops working. To inspect log output, write it to a file or send it over the network instead</li><li><a href="https://github.com/Tara-X/supervisor-event-listener-demo" target="_blank" rel="noopener">Test code</a></li></ol>]]></content>
    
    <summary type="html">
    
      The event mechanism is an advanced feature introduced in supervisor v3.0; a common use case is building an alerting system on top of supervisor
    
    </summary>
    
    
      <category term="工具" scheme="http://stackbox.cn/tags/%E5%B7%A5%E5%85%B7/"/>
    
  </entry>
  
  <entry>
    <title>Generating Long Screenshots with CasperJS</title>
    <link href="http://stackbox.cn/2017-01-use-casperjs-to-capture-screenshot/"/>
    <id>http://stackbox.cn/2017-01-use-casperjs-to-capture-screenshot/</id>
    <published>2017-01-27T10:33:00.000Z</published>
    <updated>2018-12-17T11:05:25.208Z</updated>
    
    <content type="html"><![CDATA[<p>I recently had a requirement similar to <strong>generating Weibo-style long images</strong>. The approach is to open the page in a headless browser such as PhantomJS and take a screenshot</p><h2 id="CasperJS"><a href="#CasperJS" class="headerlink" title="CasperJS"></a>CasperJS</h2><p>CasperJS is a toolkit built on top of PhantomJS and is friendlier to use than raw PhantomJS. For example, the following code takes a screenshot of a page<br><a id="more"></a></p><figure class="highlight javascript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">var</span> casper = <span class="built_in">require</span>(<span class="string">'casper'</span>).create();</span><br><span class="line">casper.start().thenOpen(<span class="string">'http://card.zrank.cn/card?room_key=yizhibo_24752948&amp;share=1'</span>, <span class="function"><span class="keyword">function</span>(<span class="params"></span>) </span>&#123;</span><br><span class="line">        <span class="keyword">this</span>.capture(<span class="string">'demo.png'</span>)</span><br><span class="line">&#125;)</span><br><span class="line">casper.run()</span><br></pre></td></tr></table></figure><p>To get the rendering of a mobile browser, CasperJS/PhantomJS needs a <a href="https://github.com/enesser/phantom-capture/blob/master/lib/devices.js" target="_blank" rel="noopener">ViewPort</a>, for example</p><figure class="highlight javascript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">var</span> casper = <span class="built_in">require</span>(<span class="string">'casper'</span>).create(&#123;</span><br><span class="line">      viewportSize: &#123;</span><br><span class="line">        name: <span class="string">'Apple iPhone 6'</span>,</span><br><span class="line">        active: <span class="literal">true</span>,</span><br><span class="line">        width: <span class="number">375</span>,</span><br><span class="line">        height: <span class="number">627</span>,</span><br><span class="line">        userAgent: <span class="string">'Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.3 (KHTML, like Gecko) Version/8.0 Mobile/12A4345d Safari/600.1.4'</span></span><br><span class="line">    &#125;</span><br><span class="line">&#125;);</span><br></pre></td></tr></table></figure>
<p>As everyone knows, the iPhone has a Retina display. The viewport really is 375 * 627, but a screenshot taken at exactly that size looks terrible. The fix is simple: double the viewport size and set the page zoom factor to 2, and the screenshot comes out much sharper</p><figure class="highlight javascript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">var</span> casper = <span class="built_in">require</span>(<span class="string">'casper'</span>).create(&#123;</span><br><span class="line">      verbose: <span class="literal">true</span>,</span><br><span class="line">      logLevel: <span class="string">"debug"</span>,</span><br><span class="line">      viewportSize: &#123;</span><br><span class="line">        name: <span class="string">'Apple iPhone 6'</span>,</span><br><span class="line">        active: <span class="literal">true</span>,</span><br><span class="line">        width: <span class="number">375</span> * <span class="number">2</span> ,</span><br><span class="line">        height: <span class="number">627</span> * <span class="number">2</span>,</span><br><span class="line">        userAgent: <span class="string">'Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.3 (KHTML, like Gecko) Version/8.0 Mobile/12A4345d Safari/600.1.4'</span></span><br><span class="line">    &#125;</span><br><span class="line">&#125;);</span><br><span class="line"></span><br><span class="line">casper.start().zoom(<span class="number">2</span>).thenOpen(<span class="string">'http://stackbox.cn'</span>, <span class="function"><span class="keyword">function</span>(<span class="params"></span>) </span>&#123;</span><br><span class="line">        <span class="keyword">this</span>.echo(<span class="keyword">this</span>.getTitle())</span><br><span class="line">        <span class="keyword">this</span>.capture(<span class="string">'demo.png'</span>)</span><br><span class="line">&#125;)</span><br><span class="line">casper.run()</span><br></pre></td></tr></table></figure>
<h2 id="字体问题"><a href="#字体问题" class="headerlink" title="字体问题"></a>Font Issues</h2><p>After deploying the program to a server, the screenshots failed to render the text. That makes sense: those fonts were never installed on the server, so it would be strange if they rendered at all</p><p>At first I followed some guides and used <code>bitmap-fonts bitmap-fonts-cjk</code>, which looked very ugly. Simply installing the TTF fonts that the page depends on is enough</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">$ sudo yum install fontconfig</span><br><span class="line">$ sudo yum remove bitmap-fonts bitmap-fonts-cjk  </span><br><span class="line">$ sudo mkdir /usr/share/fonts/custom</span><br><span class="line">$ sudo cp *.ttf  /usr/share/fonts/custom</span><br><span class="line">$ <span class="built_in">fc</span>-cache -fv</span><br></pre></td></tr></table></figure><p>How do you find out which fonts a page depends on? Open the developer tools and run <code>$(&#39;body&#39;).css(&quot;font-family&quot;)</code> (this assumes the page loads jQuery) <a href="http://ojwx27vxt.bkt.clouddn.com/screenshot/20170127/yizhibo-24752948-1485514066098.png" target="_blank" rel="noopener">Screenshot of the result</a></p>]]></content>
    
    <summary type="html">
    
      Taking web page screenshots with CasperJS, and dealing with font rendering problems on CentOS
    
    </summary>
    
    
      <category term="工具" scheme="http://stackbox.cn/tags/%E5%B7%A5%E5%85%B7/"/>
    
  </entry>
  
  <entry>
    <title>Plans for 2017</title>
    <link href="http://stackbox.cn/2017-01-2017-plans/"/>
    <id>http://stackbox.cn/2017-01-2017-plans/</id>
    <published>2017-01-15T18:49:07.000Z</published>
    <updated>2018-12-17T11:05:28.465Z</updated>
    
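    <content type="html"><![CDATA[<p>2017 is already several days old. I looked back at my <a href="http://stackbox.cn/2016-01-2016-plans/">2016 plans</a> and felt that the whole year went by in something of a daze, so here is a short summary</p><a id="more"></a><h2 id="2016总结"><a href="#2016总结" class="headerlink" title="2016总结"></a>2016 Review</h2><h3 id="原有目标"><a href="#原有目标" class="headerlink" title="原有目标"></a>Original Goals</h3><ul><li>Barely touched databases. Since the product runs on RDS, about all that happened is that I got a bit slicker at writing SQL</li><li>So far I have only used the Spring stack in the withdata project. I more or less understand the basics now, but the advanced Spring techniques, whether the microservice approaches from Netflix or from Pivotal, still leave me somewhat lost</li><li>Front-end development really cannot be predicted. I expected React to shine in 2016, but the one that ended up shining was Vue. While working on recent projects I also found that deliberately chasing SPA is not a great idea, and separating front end from back end does not have to mean SPA; separating at the business level is enough. I hope to land on a good practice in 2017</li><li>Did not touch data processing. The projects were too small, so I pushed everything through with Python. The zrank project did end up using it (not written by me), because sometimes deliberately computing increments turns out to be the slowest option; it is better to read all the data and process it in one batch. Minimizing the impact of IO matters most</li><li>Did not read many books. After the summer the projects our department took on were fragmented and confusing, and then there was no time left to spread the true teachings (smiley face)</li><li>The one thing that gave me some sense of achievement is probably the extra stars and followers on GitHub. It is nice to see my demo code helping engineers who are just starting out</li><li>Only started working out at the end of the year. I have only gone a few times, but it really helps my mental state; for example, I have started paying attention to my appearance, since the way to resist low self-esteem is to start liking yourself. Still have not passed 50kg, though</li><li>Girls, um. First, it is the same low self-esteem plus depression problem, so I do not dare think about it much. Second, I am still ugly and poor. Third, I am boring and cannot even banter in a chat. Basically, if I were a girl I would not fall for me. I hope for a more positive self-image in the new year</li><li>Started practicing piano again at the end of the year. I am currently at piece 70-something in Beyer and 20-something in 599. Learning has clearly sped up lately; at this pace I could finish Beyer + 599 + beginner Bach by the end of 2017... well... that is just an estimate</li></ul><h3 id="计划之外"><a href="#计划之外" class="headerlink" title="计划之外"></a>Outside the Plan</h3><ul><li>Spent the whole year doubting and negating myself. 2016 went by quickly; in the blink of an eye it has been three years since graduation</li><li>All of the team's Social-related business was handed over to other groups. I have a lot of thoughts about Social. At the start of the year everyone believed Weibo was dead and WeChat would live forever, yet by the end of the year Weibo's market value had passed Twitter's, while marketing on WeChat felt small-scale. One way to read this is that WeChat is at heart a tool, so it hits a ceiling quickly when it comes to spreading information, whereas Weibo's push toward becoming a media platform has been quite successful</li><li>Although the team switched to MarketingAutomation, because of the particular local conditions we mostly took on projects building demos for big clients. The rest of the time went to live-streaming projects, since that was one of the few areas attracting capital in 2016, and probably only a few big players will be left in 2017. Starting with the zrank project, the stack moved from Java to Python (Flask). Honestly, productivity improved a lot after switching languages; for a startup, fast iteration matters most, and third-party tools such as supervisor, beanstalkd, and sentry are a pleasure to use</li><li>Bought a few more PS4 games, mainly Hatsune Miku and Pro Evolution Soccer. Speaking as someone with clumsy fingers, it hurts</li><li>2016 can truly be called year one of artificial intelligence. Tools and frameworks keep appearing, AlphaGo/Master crushed a crowd of 9-dan players, and even in the AAA title Titanfall the charm of the AI's cool rationality is beyond words PRPRPR</li><li>Bought a 神船 laptop in the first half of the year. Leveled a lot of alts during 6.x, but have not touched the game since the payment model changed in 7.x; now I just open sc2 occasionally for fun</li><li>In the second half of the year my daily routine was watching SC2 matches, and I witnessed the decline of the Korean pro league first-hand</li><li>Bought two skateboards. Because of the smog in Beijing I barely rode them and still cannot do any tricks</li></ul><h2 id="2017计划"><a href="#2017计划" class="headerlink" title="2017计划"></a>2017 Plans</h2><ul><li>Get a new pair of black-framed glasses +1s</li><li>Study, read, and build my own framework for thinking. The current situation is awkward: I seem to know a bit about everything but never think further about what lies behind things. That is probably the difference between a smooth talker and a real thinker; right now I see myself as a smooth talker</li><li>Improve communication skills. At the moment I get stage fright everywhere, and my low EQ fails to make people want to chat</li><li>Exercise, and aim to reach 110 jin (55kg)</li><li>Reach grade 3 on the piano</li><li>Focus on data-related technology. I expect the Python-based MachineLearning stack to take off</li><li>Keep writing the tech blog, and also start a WeChat official account for rants, to build product thinking and improve my writing</li><li>English and Japanese</li><li>Learn the basic skateboard tricks</li><li>Reach Silver in sc2</li><li>Get a driver's license</li><li>A girlfriend.... (<a href="javascript:void(0" target="_blank" rel="noopener">#Don't chicken out, little box!!!</a>)</li></ul>]]></content>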
    
    <summary type="html">
    
      A review of 2016 and plans for 2017
    
    </summary>
    
      <category term="生活记录" scheme="http://stackbox.cn/categories/%E7%94%9F%E6%B4%BB%E8%AE%B0%E5%BD%95/"/>
    
    
  </entry>
  
  <entry>
    <title>A Strange wait4 Behavior</title>
    <link href="http://stackbox.cn/2016-09-an-unusual-wait4-system-call-behavior/"/>
    <id>http://stackbox.cn/2016-09-an-unusual-wait4-system-call-behavior/</id>
    <published>2016-09-07T06:16:32.000Z</published>
    <updated>2018-12-17T11:05:36.749Z</updated>
    
    <content type="html"><![CDATA[<h1 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h1><p>While writing some Python recently I ran into a strange problem on the Mac. The code does roughly the following</p><ul><li>Bind a signal handler to the <code>SIGCHLD</code> signal</li><li>Fork several child processes, each of which blocks</li><li>The main process blocks in wait and prints information about each child that exits</li></ul><a id="more"></a><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">import</span> time</span><br><span class="line"><span class="keyword">import</span> signal</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">handler</span><span class="params">(a, b)</span>:</span></span><br><span class="line">    <span class="keyword">print</span> (<span class="string">'xxxxxxx:'</span>, a, b)</span><br><span class="line">    signal.signal(signal.SIGCHLD, handler)</span><br><span class="line">signal.signal(signal.SIGCHLD, handler)</span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> range(<span class="number">0</span>, <span class="number">5</span>):</span><br><span class="line">    pid = os.fork()</span><br><span class="line">    <span class="keyword">if</span> pid == <span class="number">0</span>:</span><br><span class="line">        <span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">            time.sleep(<span class="number">2</span>)</span><br><span class="line"></span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    pid , sta = os.wait()</span><br><span class="line">    <span class="keyword">print</span> (<span class="string">'pid:'</span>, pid, <span class="string">'stat:'</span>, sta)</span><br></pre></td></tr></table></figure><p>The strange behavior is this</p><ul><li>On the Mac, if wait is not wrapped in try except, it raises an EINTR error (interrupted slow system call)</li><li>On Linux there is no problem even without try except</li></ul>
<h1 id="分析"><a href="#分析" class="headerlink" title="分析"></a>Analysis</h1><p>At first I suspected a Python bug specific to OSX, so I used pyenv to install every version from 2.7.1 to 3.5.0. It turned out that from 3.5.0 on the EINTR error no longer appears. The 3.5.0 <a href="https://docs.python.org/3.5/whatsnew/changelog.html#python-3-5-0-final" target="_blank" rel="noopener">release note</a> mentions <a href="http://bugs.python.org/issue19850" target="_blank" rel="noopener">#Issue19850</a>, which adds a call to <code>signal.siginterrupt(sig, False)</code> when a signal handler is installed. The effect is that when a signal interrupts a system call, the call is restarted automatically instead of raising an EINTR exception. Python 3.5 also implements PEP 475, under which calls such as <code>os.wait</code> are retried automatically when they fail with EINTR and the signal handler does not raise.</p><p>That still does not explain why older Python versions behave differently across platforms. Could it be something else, namely that signals do not interrupt the wait system call at all? Every manual says that wait, like read, is a slow system call, so this did not seem likely, but to be rigorous I tested it anyway. I bound a signal handler for <code>SIGWINCH</code> in the demo above; this signal fires when the terminal width changes. Sure enough, whether the main process blocks in <code>os.read</code> or <code>os.wait</code>, and whether on OSX or Linux, triggering SIGWINCH raises an EINTR error.</p><p>This was awkward. Does Linux give SIGCHLD special treatment? No manual I had seen says so, so I kept that guess on hold. Fortunately, while searching I came across tools like <code>strace/dtruss</code>, which make it easy to trace system calls and signals </p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"># dtruss system call trace on OSX</span><br><span class="line">83483/0xb96f97:     43901 13551362     10 wait4(0xFFFFFFFF, 0x7FFF5D80553C, 0x0)                 = -1 Err#4</span><br><span class="line">83483/0xb96f97:     43915      80      2 sigreturn(0x7FFF5D805470, 0x1E, 0x0)            = 0 Err#-2</span><br><span class="line">83483/0xb96f97:     43951      10      6 write_nocancel(0x1, &quot;(&apos;xxxxxxx:&apos;, 20, &lt;frame object at 0x102970c90&gt;)\n\0&quot;, 0x30)                = 48 0</span><br><span class="line">83483/0xb96f97:     43960       4      0 sigaction(0x14, 0x7FFF5D805128, 0x7FFF5D805150)                 = 0 0</span><br><span class="line">83483/0xb96f97:     43977       5      2 write_nocancel(0x1, &quot;exception\n\0&quot;, 0xA)               = 10 0</span><br><span class="line">83483/0xb96f97:     43987       8      5 wait4(0xFFFFFFFF, 0x7FFF5D80553C, 0x0)          = 83520 0</span><br><span class="line">83483/0xb96f97:     43995       4      1 write_nocancel(0x1, &quot;(&apos;pid:&apos;, 83520, &apos;stat:&apos;, 9)\n\0&quot;, 0x1C)            = 28 0</span><br></pre></td></tr></table></figure>
<p>This is the trace on OSX after killing one child process. For convenience I wrapped os.wait in a try except. On the very first line wait4 fails with Err#4; according to the FreeBSD documentation, Err#4 means Interrupted. This matches expectations: the SIGCHLD signal interrupted wait. sigreturn always appears together with a signal handler (the design of sigreturn is interesting and deserves its own post). The third line of the trace is the print inside the handler, and the sigaction call after it is the handler re-registering itself for SIGCHLD through <code>signal.signal</code>. The except branch then prints "exception", and the loop blocks in wait again; this time a zombie process is already there to be reaped, so wait returns happily</p><p>Note: if you explicitly call <code>signal.signal(signal.SIGCHLD, signal.SIG_IGN)</code> , a killed child process disappears directly without becoming a zombie, and the main process's wait can no longer detect that the child died. If a custom handler is bound instead, the child still becomes a zombie and the main process's wait detects it</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"># strace system call trace on Linux</span><br><span class="line">[pid 25814] wait4(-1,</span><br><span class="line">[pid 25814] &lt;... wait4 resumed&gt; [&#123;WIFSIGNALED(s) &amp;&amp; WTERMSIG(s) == SIGQUIT &amp;&amp; WCOREDUMP(s)&#125;], 0, NULL) = 25904</span><br><span class="line">[pid 25814] --- SIGCHLD (Child exited) @ 0 (0) ---</span><br><span class="line">[pid 25814] rt_sigreturn(0xffffffff)    = 25904</span><br><span class="line">[pid 25814] write(1, &quot;(17, &lt;frame object at 0x7ffc4a17&quot;..., 39(17, &lt;frame object at 0x7ffc4a171910&gt;)) = 39</span><br><span class="line">[pid 25814] write(1, &quot;--&gt; 25904 131\n&quot;, 14--&gt; 25904 131) = 14</span><br></pre></td></tr></table></figure>
<p>This is the trace of the same code on Linux. The first trace line shows wait currently blocked. When a child process is killed, the second line is, surprisingly, <strong>wait4 resumed</strong>. The documentation explains that <strong>&lt;... xxxx resumed&gt;</strong> means the system call has returned. Note carefully: at this point the SIGCHLD signal has not even been delivered yet. In other words:</p><blockquote><p>The moment the child process becomes a zombie, the main process's wait detects it immediately, and at that point the main process has not yet received the SIGCHLD signal, so SIGCHLD naturally cannot interrupt the system call.</p></blockquote><p>After that the signal handler and the main process code run normally. Learning the truth left me a little shattered</p><h1 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h1><h2 id="处理僵尸进程"><a href="#处理僵尸进程" class="headerlink" title="处理僵尸进程"></a>Handling Zombie Processes</h2><p>There are generally two ways to handle zombie processes</p><ul><li><code>signal.signal(signal.SIGCHLD, signal.SIG_IGN)</code></li><li>Have the main process wait on exited children, keeping in mind the problem described in this post</li></ul><h2 id="正确使用wait"><a href="#正确使用wait" class="headerlink" title="正确使用wait"></a>Using wait Correctly</h2><p>To avoid the EINTR exception, any of the following works</p><ul><li>When binding the signal handler, call <code>signal.siginterrupt(sig, False)</code> manually. Python 3.5.0 and later no longer raise here, but it is best to add the call for older versions</li><li>Wrap the main process's <code>os.wait</code> in try except</li><li>Do not block in <code>os.wait</code> in the main process; instead, poll with <code>os.waitpid(-1, os.WNOHANG)</code> to collect child information, and simply continue when the returned pid is 0</li><li>Call <code>os.wait</code> inside the signal handler. This guarantees the ordering between the signal and the interruption, so no EINTR error occurs; some Linux C tutorials do it this way too</li></ul>]]></content>
    
    <summary type="html">
    
      A strange problem caused by using a signal handler together with wait4 on OSX/Linux
    
    </summary>
    
    
      <category term="计算机基础" scheme="http://stackbox.cn/tags/%E8%AE%A1%E7%AE%97%E6%9C%BA%E5%9F%BA%E7%A1%80/"/>
    
  </entry>
  
  <entry>
    <title>Tail Notes</title>
    <link href="http://stackbox.cn/2016-08-tail/"/>
    <id>http://stackbox.cn/2016-08-tail/</id>
    <published>2016-08-28T12:33:48.000Z</published>
    <updated>2018-12-17T11:05:45.698Z</updated>
    
    <content type="html"><![CDATA[<p>I woke up on Sunday to a whole row of red <strong>failed</strong> tasks in airflow, and a few choice words came to mind</p><p><img src="http://box-images.qiniudn.com//blog/airflow-wrong.png" alt=""></p><p>The requirement is simple: keep reading the squid log and push each entry into a queue (through an http request). The code looks roughly like this</p><a id="more"></a><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">tail -f access.log |</span><br><span class="line">  <span class="keyword">while</span> IFS= <span class="built_in">read</span> -r line</span><br><span class="line">  <span class="keyword">do</span></span><br><span class="line">    curl -XPOST -H <span class="string">'Content-Type: application/json'</span> -d <span class="string">'&#123;"url":"'</span><span class="string">"<span class="variable">$line</span>"</span><span class="string">'"&#125;'</span> http://example.com/api</span><br><span class="line">    <span class="built_in">echo</span> <span class="string">''</span></span><br><span class="line">  <span class="keyword">done</span></span><br></pre></td></tr></table></figure><p>Still baffled, I hurried to the office and found that access.log was growing steadily as usual, yet the script was clearly no longer receiving anything from tail. This had happened before, but each time I had simply restarted the script. A failure this regular had to mean something was being done wrong. Looking closely at the log files, I found that squid had rotated and compressed the log in the early morning (on a 7-day cycle), which matched the frequency of the failures as I remembered it. So I suspected that log rotation was the cause</p><p>I tested it.. and sure enough, that was it (the testing details are omitted)</p><p>I looked up the documentation. The explanation of <code>-f</code> is as follows</p><blockquote><p>With --follow (-f), tail defaults to following the file descriptor, which means that even if a tail’ed file is renamed, tail will continue to track its end. This default behavior is not desirable when you really want to track the actual name of the file, not the file descriptor (e.g., log rotation). Use --follow=name in that case. That causes tail to track the named file in a way that accommodates renaming, removal and creation.</p></blockquote><p>In other words, in <code>-f</code> mode tail follows the file descriptor. If the file is renamed, tail keeps following that same file (the file descriptor does not change after a rename). The solution is to use the <code>-F</code> option</p><blockquote><p>-F: same as --follow=name --retry<br>--retry: keep trying to open a file even when it is or becomes inaccessible; useful when following by name, i.e., with --follow=name</p></blockquote><p>Nginx log rotation is a similar case. After rotating the logs, you need to send the nginx master process a <a href="http://weizhifeng.net/nginx-signal-processing-and-upgrade.html" target="_blank" rel="noopener"><code>USR1</code> signal</a> so that it reopens the log files. If the signal is not sent, nothing gets written to access.log</p><p>Over</p>]]></content>
    
    <summary type="html">
    
      Things to watch out for when processing log files with the tail command
    
    </summary>
    
    
      <category term="计算机基础" scheme="http://stackbox.cn/tags/%E8%AE%A1%E7%AE%97%E6%9C%BA%E5%9F%BA%E7%A1%80/"/>
    
  </entry>
  
  <entry>
    <title>Notes on an Odd Performance Tuning Experience</title>
    <link href="http://stackbox.cn/2016-07-some-performance-realated-tools/"/>
    <id>http://stackbox.cn/2016-07-some-performance-realated-tools/</id>
    <published>2016-07-29T14:23:00.000Z</published>
    <updated>2018-12-17T11:05:49.079Z</updated>
    
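    <content type="html"><![CDATA[<p>While writing a Koa2 program today I happened to glance at the log and found that a simple API that saves a form was taking <strong>900ms</strong> on average. <strong>900ms</strong>, folks! A request like this should normally take a tenth of that</p><a id="more"></a><h2 id="SQL-Profile分析"><a href="#SQL-Profile分析" class="headerlink" title="SQL Profile分析"></a>SQL Profile Analysis</h2><p>The first step is to find out which stage of the sql statement is slow. mysql provides a profile tool to help with performance analysis:</p><blockquote><p>MYSQL&gt; set profiling=1;<br>MYSQL&gt; insert into t_test_table values ('hello');<br>MYSQL&gt; show profile for query 1;</p></blockquote><p>After running the statements above, you can see that most of the time is spent in a stage called <strong>query end</strong>. <a href="http://inetkiller.github.io/2014/05/20/mysql语句性能分析与优化/" target="_blank" rel="noopener">This article</a> describes the stage as follows</p><blockquote><p>Google turned up the answer: add the line innodb_flush_log_at_trx_commit = 0 to the mysql config file my.cnf. After verification this solved the problem and the speedup was very noticeable (the change also took effect for insert operations). It left some questions, though: what state is query end, why does it take so long, and why does adding innodb_flush_log_at_trx_commit = 0 improve performance so much?</p></blockquote><blockquote><p>What state is query end? The official mysql documentation says: This state occurs after processing a query but before the freeing items state. My understanding is that the statement has finished executing but some follow-up work is still pending.</p></blockquote><blockquote><p>And what state is freeing items? The thread has executed a command. Some freeing of items done during this state involves the query cache. This state is usually followed by cleaning up. In other words, it frees space in the query cache (because this was an update, the corresponding cached records became invalid, so this step is needed to deal with them).</p></blockquote><blockquote><p>The default value of innodb_flush_log_at_trx_commit is 1, in which case the behavior is: the log buffer is written out to the log file at each transaction commit and the flush to disk operation is performed on the log file. The log buffer lets a transaction write its log (each transaction has to maintain a log) to disk only after it finishes executing, so presumably most of the time is spent on disk IO?</p></blockquote><blockquote><p>After changing innodb_flush_log_at_trx_commit to 0, the behavior is: If the value of innodb_flush_log_at_trx_commit is 0, the log buffer is written out to the log file once per second and the flush to disk operation is performed on the log file, but nothing is done at a transaction commit. As you can see, with 0 the work that used to happen on every commit now happens only once per second, which saves a great deal of time.</p></blockquote><blockquote><p>Setting innodb_flush_log_at_trx_commit to 0 has one side effect: any crash of the mysql server process can lose the last second of transactions (those not yet written to the log file). Since this application does not need such strict transactional guarantees, that is acceptable.</p></blockquote><p>However, changing this parameter directly is too heavy-handed and may lose data, so I do not recommend it. The slow SQL may instead be caused by heavy IO pressure, so let's use a few tools to check the load on the machine</p><h2 id="查看硬盘负载"><a href="#查看硬盘负载" class="headerlink" title="查看硬盘负载"></a>Checking Disk Load</h2><p>The <a href="http://linux.die.net/man/1/iostat" target="_blank" rel="noopener"><strong>iostat</strong></a> command shows disk load. Run</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ sudo iostat -d -x -k 1 40</span><br></pre></td></tr></table></figure><p><img src="http://box-images.qiniudn.com//blog/iostat-1.png" alt=""></p><p>The <strong>%util</strong> field shows what percentage of each second was spent on I/O operations, that is, how busy the device was. If %util is close to 100%, too many I/O requests are being issued, the I/O system is saturated, and the disk is probably a bottleneck</p><h2 id="查看进程IO占用"><a href="#查看进程IO占用" class="headerlink" title="查看进程IO占用"></a>Checking Per-Process IO Usage</h2><p>So how do you find out which program is using the most IO? The <a href="http://guichaz.free.fr/iotop/" target="_blank" rel="noopener"><strong>iotop</strong></a> command on Linux does the job. It can be installed with yum, and running iotop on the machine gives the result shown below</p><p><img src="http://box-images.qiniudn.com//blog/iotop-1.png" alt=""></p><p>It turned out that the main reader was an rsync script, and the writers were a bunch of crawlers plus elasticsearch…</p><h2 id="解决方法"><a href="#解决方法" class="headerlink" title="解决方法"></a>Solution</h2><p>So.. in the end I made the other crawlers crawl more slowly, and the insert statements became noticeably faster (<a href="javascript:void" target="_blank" rel="noopener">#Still too poor to afford more machines..ORZ#</a>)</p>]]></content>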
    
    <summary type="html">
    
      A few commonly used performance analysis tools
    
    </summary>
    
    
      <category term="性能优化" scheme="http://stackbox.cn/tags/%E6%80%A7%E8%83%BD%E4%BC%98%E5%8C%96/"/>
    
  </entry>
  
  <entry>
    <title>A Crawler for WeChat Official Accounts</title>
    <link href="http://stackbox.cn/2016-07-21-weixin-spider-notes/"/>
    <id>http://stackbox.cn/2016-07-21-weixin-spider-notes/</id>
    <published>2016-07-21T06:41:57.000Z</published>
    <updated>2018-12-17T11:07:42.777Z</updated>
    
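    <content type="html"><![CDATA[<blockquote><p>I recently needed to keep the articles of about 30,000 official accounts up to date. Prompted by our tech leader, I rewrote parts of the crawler. It works well, so I am writing it down here</p></blockquote><a id="more"></a><h1 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h1><ul><li>In both the new and the old approach, the endpoints for fetching an account's article list, read and like counts, and comments can be found by capturing traffic</li><li>All of these endpoints require authorization. The main authorization parameters are<ul><li>uin : the user's unique ID with respect to the official account. It is originally a number, and the base64-encoded value is what gets sent</li><li>key : bound to the official account and the uin; it expires after roughly half an hour</li><li>pass_ticket: another verification token, bound to the uin</li><li>req_id: found in the article's HTML and different on every request. It is used to build the RequestBody for the read/like endpoint and is valid for one use</li><li>The read/like endpoint is rate limited. In testing, one WeChat account could view the read and like counts of 30 articles every 5 minutes</li></ul></li></ul><h1 id="旧方案"><a href="#旧方案" class="headerlink" title="旧方案"></a>Old Approach</h1><p>In 2015 the restrictions on the web version of WeChat were not that strict. The main idea at the time was to use WeChat web, simulate a login with requests,</p><p>and then keep hitting endpoints like the one below to crawl data:</p><blockquote><p><a href="https://wx.qq.com/cgi-bin/mmwebwx-bin/webwxcheckurl?requrl=encodeURIComponent(&#39;http://mp.weixin.qq.com/mp/getmasssendmsg?__biz=MjM5NzQ3ODAwMQ==#wechat_redirect&#39;)" target="_blank" rel="noopener">https://wx.qq.com/cgi-bin/mmwebwx-bin/webwxcheckurl?requrl=encodeURIComponent(&#39;http://mp.weixin.qq.com/mp/getmasssendmsg?__biz=MjM5NzQ3ODAwMQ==#wechat_redirect&#39;)</a></p></blockquote><p>To run several crawler instances at once, I used the <code>Celery</code> framework (in hindsight that was silly: to run several instances you can just start the program N times..). Because the login was simulated, I also wrote a complicated set of code to generate the QR code and obtain the login URL; for how the simulated login works, see <a href="https://github.com/0x5e/wechat-deleted-friends" target="_blank" rel="noopener">wechat-deleted-friends</a>. The logic inside the Celery Task was also far too complex: a single Task carried the requests reconnect logic, the simulated login, list parsing, article parsing, and more. On top of that, WeChat web has a fairly complicated sync mechanism, and sometimes the session simply dropped and I had to log in again by hand, which was a real nuisance.</p><p>Later (starting in 2016) WeChat web could no longer obtain the Key, so this approach was abandoned..</p><h1 id="新方案"><a href="#新方案" class="headerlink" title="新方案"></a>New Approach</h1><p>At my leader's suggestion I changed the architecture. The overall structure of the project is as follows: </p><p><img src="http://7jptw8.com1.z0.glb.clouddn.com/spider-wx.png" alt="微信爬虫架构图"></p><ul><li>Seeds is a producer. Here it means obtaining the <strong>uin, key, pass_ticket</strong> values in some way; the idea is similar to a man-in-the-middle attack plus parsing squid logs</li><li>Consumer C1 takes seeds from queue Q1, crawls the article list of an official account, parses it, and puts the article meta information into queue Q2</li><li>Consumer C2 takes the article meta information and can then crawl the article and store it directly</li><li>More queues can be added afterwards to crawl the read and like counts of articles. Because of the rate limit, one WeChat account can fetch read and like data for at most 8000 articles per day</li><li>I dropped Celery and RabbitMQ, the queue it uses by default; they are simply too heavy.. and switched to beanstalkd as the message queue</li><li>At present a single WeChat account updates about 40,000 official account articles per day. To go beyond that, add more machines</li></ul><h3 id="Update"><a href="#Update" class="headerlink" title="Update"></a>Update</h3><ul><li>Keys are generated by a 按键精灵 (a key and mouse macro tool) script that keeps generating article list URLs and clicking them, with squid acting as a proxy to capture the URLs that carry the Key (squid needs to be configured as an ssl-bump transparent proxy)</li><li>As @tinkerz pointed out, 按键精灵 can be replaced with the Java Robot class, which makes the code easier to write (the VBA syntax really is too ugly)</li></ul><p>Over</p>]]></content>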
    
    <summary type="html">
    
      A WeChat crawler setup that currently runs well
    
    </summary>
    
    
      <category term="爬虫" scheme="http://stackbox.cn/tags/%E7%88%AC%E8%99%AB/"/>
    
  </entry>
  
  <entry>
    <title>SpringBoot: Unit Testing</title>
    <link href="http://stackbox.cn/2016-04-springboot-test/"/>
    <id>http://stackbox.cn/2016-04-springboot-test/</id>
    <published>2016-04-25T08:24:45.000Z</published>
    <updated>2018-12-17T11:07:39.699Z</updated>
    
    <content type="html"><![CDATA[<p>When testing the controller layer, <a href="https://github.com/jayway/rest-assured" target="_blank" rel="noopener">rest-assured</a> is a very good choice for testing REST endpoints.</p><a id="more"></a><ul><li>The unit-test configuration is much the same as in earlier XML-based projects</li><li>If there are several test-case classes, put the test configuration in one abstract class and have the other test-case classes extend it, so that they all share a single Spring context</li><li>Because the test context (when it includes MVC) starts on a random port, the setUp phase has to tell rest-assured which port to use</li></ul><h3 id="示例代码"><a href="#示例代码" class="headerlink" title="示例代码"></a>Sample code</h3><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">@RunWith(SpringJUnit4ClassRunner.class)</span><br><span class="line">@SpringApplicationConfiguration(classes = MyApplication.class)</span><br><span class="line">@WebAppConfiguration</span><br><span class="line">@IntegrationTest(&quot;server.port:0&quot;)   // port 0: start on a random free port</span><br><span class="line">@ActiveProfiles(&quot;test&quot;)</span><br><span class="line">public abstract class AbstractTestCase &#123;</span><br><span class="line">    /**</span><br><span class="line">     * Random port the MVC test server was started on</span><br><span class="line">     */</span><br><span class="line">    @Value(&quot;$&#123;local.server.port&#125;&quot;)</span><br><span class="line">    protected int port;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">public abstract class AbstractRouteTestCase extends AbstractTestCase &#123;</span><br><span class="line">    /**</span><br><span class="line">     * Initialise the MVC test</span><br><span class="line">     * 1. Point RestAssured at the random port</span><br><span class="line">     * 2. Other setup can be done here as well</span><br><span class="line">     */</span><br><span class="line">    protected void initMVC() &#123;</span><br><span class="line">        RestAssured.port = port;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">public class RouterWxapiStatusTest extends AbstractRouteTestCase &#123;</span><br><span class="line">    @Before</span><br><span class="line">    public void setUp() &#123;</span><br><span class="line">        initMVC();</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    @Test</span><br><span class="line">    public void testController() &#123;</span><br><span class="line">        JSONObject params = new JSONObject();</span><br><span class="line"></span><br><span class="line">        // See the rest-assured wiki for more of the syntax</span><br><span class="line">        given().header(&quot;Content-Type&quot;, &quot;application/json&quot;)</span><br><span class="line">                .header(&quot;Origin&quot;, &quot;http://baidu.com&quot;)</span><br><span class="line">                .header(&quot;Authorization&quot;, &quot;Bearer testtoken&quot;).body(params.toJSONString())</span><br><span class="line">                .when().post(&quot;/test/api&quot;).then().log().all()</span><br><span class="line">                .statusCode(HttpStatus.SC_OK)</span><br><span class="line">                .body(&quot;status&quot;, equalTo(200))</span><br><span class="line">                .body(&quot;data.items.size()&quot;, not(0));</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="Update"><a href="#Update" class="headerlink" title="Update"></a>Update</h2><p>Starting with 1.4.0 (not yet stable at the time of writing), unit testing works somewhat differently: <code>@WebMvcTest</code>, <code>@SpringBootTest</code> and <code>SpringRunner</code> were added, and TestNG support was improved as well</p><ul><li><a href="https://github.com/spring-projects/spring-boot/tree/master/spring-boot-samples/spring-boot-sample-test/src/test/java/sample/test" target="_blank" rel="noopener">JUnit 4-based unit tests in 1.4.0</a></li><li><a href="https://github.com/spring-projects/spring-boot/blob/master/spring-boot-samples/spring-boot-sample-testng/src/test/java/sample/testng/SampleTestNGApplicationTests.java" target="_blank" rel="noopener">TestNG-based unit tests in 1.4.0</a> </li></ul>]]></content>
    
    <summary type="html">
    
      Unit testing in SpringBoot-based projects
    
    </summary>
    
    
      <category term="Java" scheme="http://stackbox.cn/tags/Java/"/>
    
  </entry>
  
  <entry>
    <title>SpringBoot: Caching</title>
    <link href="http://stackbox.cn/2016-03-spring-boot-cache/"/>
    <id>http://stackbox.cn/2016-03-spring-boot-cache/</id>
    <published>2016-03-18T02:49:29.000Z</published>
    <updated>2018-12-17T11:07:49.483Z</updated>
    
    <content type="html"><![CDATA[<a id="more"></a>]]></content>
    
    <summary type="html">
    
      Using Ehcache/Redis in a SpringBoot project
    
    </summary>
    
    
      <category term="Java" scheme="http://stackbox.cn/tags/Java/"/>
    
  </entry>
  
  <entry>
    <title>SpringBoot: Exporting Reports</title>
    <link href="http://stackbox.cn/2016-03-spring-boot-export/"/>
    <id>http://stackbox.cn/2016-03-spring-boot-export/</id>
    <published>2016-03-09T02:03:18.000Z</published>
    <updated>2018-12-17T11:07:46.691Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>Despite the title, this has little to do with SpringBoot; blame my compulsion for consistent naming.</p></blockquote><a id="more"></a><h2 id="导出Excel"><a href="#导出Excel" class="headerlink" title="导出Excel"></a>Exporting Excel</h2>]]></content>
    
    <summary type="html">
    
      Exporting reports in SpringBoot
    
    </summary>
    
    
      <category term="Java" scheme="http://stackbox.cn/tags/Java/"/>
    
  </entry>
  
  <entry>
    <title>SpringBoot: Logging</title>
    <link href="http://stackbox.cn/2016-03-springboot-log/"/>
    <id>http://stackbox.cn/2016-03-springboot-log/</id>
    <published>2016-03-02T11:49:26.000Z</published>
    <updated>2018-12-17T11:07:36.544Z</updated>
    
    <content type="html"><![CDATA[<h2 id="基础使用"><a href="#基础使用" class="headerlink" title="基础使用"></a>Basic usage</h2><p>SpringBoot ships with a basic logging setup, by default built on Logback + SLF4J. The most basic configuration file looks like this:</p><a id="more"></a><figure class="highlight xml"><table><tr><td class="code"><pre><span class="line"><span class="meta">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">include</span> <span class="attr">resource</span>=<span class="string">"org/springframework/boot/logging/logback/base.xml"</span> /&gt;</span></span><br><span class="line">    <span class="comment">&lt;!-- Show the SQL issued by MyBatis --&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">logger</span> <span class="attr">name</span>=<span class="string">"cn.stackbox.mapper"</span> <span class="attr">level</span>=<span class="string">"DEBUG"</span>/&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><p>The included <code>base.xml</code> already attaches the CONSOLE and FILE appenders to the root logger. Written out by hand, its effect corresponds to the <code>&lt;root&gt;</code> element in the following configuration:</p><figure class="highlight xml"><table><tr><td class="code"><pre><span class="line"><span class="meta">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">include</span> <span class="attr">resource</span>=<span class="string">"org/springframework/boot/logging/logback/base.xml"</span> /&gt;</span></span><br><span class="line"></span><br><span class="line">    <span class="tag">&lt;<span class="name">root</span> <span class="attr">level</span>=<span class="string">"info"</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">appender-ref</span> <span class="attr">ref</span>=<span class="string">"CONSOLE"</span> /&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">appender-ref</span> <span class="attr">ref</span>=<span class="string">"FILE"</span> /&gt;</span></span><br><span class="line">    <span class="tag">&lt;/<span class="name">root</span>&gt;</span></span><br><span class="line"></span><br><span class="line">    <span class="tag">&lt;<span class="name">logger</span> <span class="attr">name</span>=<span class="string">"cn.stackbox.mapper"</span> <span class="attr">level</span>=<span class="string">"DEBUG"</span>/&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><p>In other words, SpringBoot already defines a basic root logger, a CONSOLE appender and a FILE appender.</p><h2 id="Spring整合"><a href="#Spring整合" class="headerlink" title="Spring整合"></a>Spring integration</h2><p>Spring's extensions add profile support to the Logback configuration; for this to work the file must be named <strong>logback-spring.xml</strong></p><figure class="highlight xml"><table><tr><td class="code"><pre><span class="line"><span class="meta">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">include</span> <span class="attr">resource</span>=<span class="string">"org/springframework/boot/logging/logback/base.xml"</span> /&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">logger</span> <span class="attr">name</span>=<span class="string">"org.springframework.web"</span> <span class="attr">level</span>=<span class="string">"INFO"</span>/&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">logger</span> <span class="attr">name</span>=<span class="string">"org.springboot.sample"</span> <span class="attr">level</span>=<span class="string">"TRACE"</span> /&gt;</span></span><br><span class="line"></span><br><span class="line">    <span class="tag">&lt;<span class="name">springProfile</span> <span class="attr">name</span>=<span class="string">"dev"</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">logger</span> <span class="attr">name</span>=<span class="string">"org.springboot.sample"</span> <span class="attr">level</span>=<span class="string">"DEBUG"</span> /&gt;</span></span><br><span class="line">    <span class="tag">&lt;/<span class="name">springProfile</span>&gt;</span></span><br><span class="line"></span><br><span class="line">    <span class="tag">&lt;<span class="name">springProfile</span> <span class="attr">name</span>=<span class="string">"staging"</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">logger</span> <span class="attr">name</span>=<span class="string">"org.springboot.sample"</span> <span class="attr">level</span>=<span class="string">"INFO"</span> /&gt;</span></span><br><span class="line">    <span class="tag">&lt;/<span class="name">springProfile</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><h2 id="日志分割"><a href="#日志分割" class="headerlink" title="日志分割"></a>Log rotation</h2><p>Log files can be split (rolled over by date) with the following configuration</p><figure class="highlight xml"><table><tr><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">appender</span> <span class="attr">name</span>=<span class="string">"MZRollingFileAppender"</span> <span class="attr">class</span>=<span class="string">"ch.qos.logback.core.rolling.RollingFileAppender"</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">File</span>&gt;</span>/home/data/superalsrk/SLF4J/stackbox-eureka/eureka.log<span class="tag">&lt;/<span class="name">File</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">rollingPolicy</span> <span class="attr">class</span>=<span class="string">"ch.qos.logback.core.rolling.TimeBasedRollingPolicy"</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">FileNamePattern</span>&gt;</span>/home/data/superalsrk/SLF4J/stackbox-eureka/eureka.%d&#123;yyyy-MM-dd&#125;.log<span class="tag">&lt;/<span class="name">FileNamePattern</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">maxHistory</span>&gt;</span>3000<span class="tag">&lt;/<span class="name">maxHistory</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;/<span class="name">rollingPolicy</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">encoder</span>&gt;</span></span><br><span class="line">         <span class="tag">&lt;<span class="name">Pattern</span>&gt;</span>%d&#123;yyyy-MM-dd HH:mm:ss.SSS&#125; [%thread] %-5level %logger&#123;35&#125; - %msg %n<span class="tag">&lt;/<span class="name">Pattern</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;/<span class="name">encoder</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">appender</span>&gt;</span></span><br></pre></td></tr></table></figure><h2 id="一些扩展"><a href="#一些扩展" class="headerlink" title="一些扩展"></a>Some extensions</h2><p>My current requirements are</p><ul><li>In production, rolling log files written to a dedicated location on disk</li><li>Locally, console output is enough</li></ul><p>So my configuration is</p><figure class="highlight xml"><table><tr><td class="code"><pre><span class="line"><span class="meta">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">include</span> <span class="attr">resource</span>=<span class="string">"org/springframework/boot/logging/logback/base.xml"</span> /&gt;</span></span><br><span class="line"></span><br><span class="line">    <span class="tag">&lt;<span class="name">springProfile</span> <span class="attr">name</span>=<span class="string">"production"</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">appender</span> <span class="attr">name</span>=<span class="string">"STBRollingFileAppender"</span> <span class="attr">class</span>=<span class="string">"ch.qos.logback.core.rolling.RollingFileAppender"</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">File</span>&gt;</span>/home/data/superalsrk/SLF4J/stackbox-eureka/eureka.log<span class="tag">&lt;/<span class="name">File</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">rollingPolicy</span> <span class="attr">class</span>=<span class="string">"ch.qos.logback.core.rolling.TimeBasedRollingPolicy"</span>&gt;</span></span><br><span class="line">                <span class="tag">&lt;<span class="name">FileNamePattern</span>&gt;</span>/home/data/superalsrk/SLF4J/stackbox-eureka/eureka.%d&#123;yyyy-MM-dd&#125;.log<span class="tag">&lt;/<span class="name">FileNamePattern</span>&gt;</span></span><br><span class="line">                <span class="tag">&lt;<span class="name">maxHistory</span>&gt;</span>3000<span class="tag">&lt;/<span class="name">maxHistory</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;/<span class="name">rollingPolicy</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">encoder</span>&gt;</span></span><br><span class="line">                <span class="tag">&lt;<span class="name">Pattern</span>&gt;</span>%d&#123;yyyy-MM-dd HH:mm:ss.SSS&#125; [%thread] %-5level %logger&#123;35&#125; - %msg %n<span class="tag">&lt;/<span class="name">Pattern</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;/<span class="name">encoder</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;/<span class="name">appender</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;/<span class="name">springProfile</span>&gt;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">    <span class="tag">&lt;<span class="name">springProfile</span> <span class="attr">name</span>=<span class="string">"development"</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">root</span> <span class="attr">level</span>=<span class="string">"INFO"</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">appender-ref</span> <span class="attr">ref</span>=<span class="string">"CONSOLE"</span> /&gt;</span></span><br><span class="line">        <span class="tag">&lt;/<span class="name">root</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;/<span class="name">springProfile</span>&gt;</span></span><br><span class="line"></span><br><span class="line">    <span class="tag">&lt;<span class="name">springProfile</span> <span class="attr">name</span>=<span class="string">"production"</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">root</span> <span class="attr">level</span>=<span class="string">"INFO"</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">appender-ref</span> <span class="attr">ref</span>=<span class="string">"CONSOLE"</span> /&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">appender-ref</span> <span class="attr">ref</span>=<span class="string">"STBRollingFileAppender"</span> /&gt;</span></span><br><span class="line">        <span class="tag">&lt;/<span class="name">root</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;/<span class="name">springProfile</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><p>Then set the default <strong>Profile</strong> in <code>application.yml</code></p><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">spring:</span><br><span class="line">  profiles:</span><br><span class="line">    default: production</span><br></pre></td></tr></table></figure><p>With this, the <code>production</code> profile is used when no argument is given. To have the IDE use <code>development</code>, add a program argument (there are other ways to set the profile as well)</p><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">--spring.profiles.active=development</span><br></pre></td></tr></table></figure>]]></content>
    
    <summary type="html">
    
      Some tips on using Logback with SpringBoot
    
    </summary>
    
    
      <category term="Java" scheme="http://stackbox.cn/tags/Java/"/>
    
  </entry>
  
  <entry>
    <title>SpringBoot: Multiple Data Sources</title>
    <link href="http://stackbox.cn/2016-03-spring-boot-multi-datasource/"/>
    <id>http://stackbox.cn/2016-03-spring-boot-multi-datasource/</id>
    <published>2016-03-01T03:08:36.000Z</published>
    <updated>2018-12-17T11:07:30.149Z</updated>
    
    <content type="html"><![CDATA[<h1 id="前言"><a href="#前言" class="headerlink" title="前言"></a>前言</h1><p>新项目使用了主从数据库, 从数据库用来查询报表数据, 主数据库用来CRUD业务数据以及定时插入报表数据, 而且项目中同时使用了 <em>Spring Data JPA</em> 和 <em>Mybatis</em> , 配置多个数据源就成了一个很繁琐的问题。</p><a id="more"></a><p>按照平常的思路, 就是一个数据源配置一个 <code>DataSource</code> , 然后对于Mybatis来讲就要配置多个 <code>SqlSessionFactory</code> , DAO和Repository都需要根据文件夹进行区分, 好了, 等你配置完直到能跑的时候就会发现, 项目已经炸了。</p><p>一种比较优雅的方法是, 对外只提供一个 <code>DataSource</code> 的虚拟中介, 在配置 <code>SessionFactory</code> / <code>SqlSessionFactory</code> 的时候用的是这个虚拟中介数据源, 等具体要用数据源的时候, 根据某个 Key值来决定到底使用哪一个数据源。 <strong>AbstractRoutingDataSource</strong> 类就提供了这种功能。</p><h1 id="原理"><a href="#原理" class="headerlink" title="原理"></a>原理</h1><p><strong>AbstractRoutingDataSource</strong> 的源码如下, 这个类实现了 <strong>DataSource</strong> 接口无误</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">public abstract class AbstractRoutingDataSource extends AbstractDataSource implements InitializingBean &#123;</span><br><span class="line">    public Connection getConnection() throws SQLException &#123;  </span><br><span class="line">        return determineTargetDataSource().getConnection();  </span><br><span class="line">    &#125; </span><br><span class="line">    public Connection getConnection(String username, String password) throws SQLException &#123;  </span><br><span class="line">        return determineTargetDataSource().getConnection(username, password);  </span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>然后具体是怎么获取 Connection的呢? 
<strong>determineTargetDataSource</strong> 具体实现是这样的</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">protected DataSource determineTargetDataSource() &#123;  </span><br><span class="line">    Assert.notNull(this.resolvedDataSources, &quot;DataSource router not initialized&quot;);  </span><br><span class="line">    Object lookupKey = determineCurrentLookupKey();  </span><br><span class="line">    DataSource dataSource = this.resolvedDataSources.get(lookupKey);  </span><br><span class="line">    if (dataSource == null &amp;&amp; (this.lenientFallback || lookupKey == null)) &#123;  </span><br><span class="line">        dataSource = this.resolvedDefaultDataSource;  </span><br><span class="line">    &#125;  </span><br><span class="line">    if (dataSource == null) &#123;  </span><br><span class="line">        throw new IllegalStateException(&quot;Cannot determine target DataSource for lookup key [&quot; + lookupKey + &quot;]&quot;);  </span><br><span class="line">    &#125;  </span><br><span class="line">    return dataSource;  </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>好了, 重点来了, 这段代码的核心其实只有两点</p><ul><li><strong>resolvedDefaultDataSource</strong> : 一个 <code>Map&lt;Object, DataSource&gt;</code> , 就是在配置的时候手动配置的Key与数据源的对应关系</li><li><strong>determineCurrentLookupKey()</strong> : 用来获取 Key 值, 需要在子类中实现获取Key的策略</li></ul><h1 id="思路"><a href="#思路" class="headerlink" title="思路"></a>思路</h1><ol><li>项目中配置主从数据源, 并配置自己实现的AbstractRoutingDataSource子类做 
主要的(@Primary)的DataSource</li><li>实现AbstractRoutingDataSource子类, Key获取策略为从一个LocalThread变量中获取</li><li>设计一个自定义注解,用于在Service层, DAO层, Repository层中使用</li><li>通过AOP的方式去读取自定义注解, 然后根据注解往LocalThread里塞Key</li><li>因为jetty可能会重用LocalThread, 所以需要在完成之后清空LocalThread变量, 至此, 多数据源配置完成</li></ol><h1 id="实现"><a href="#实现" class="headerlink" title="实现"></a>实现</h1><p>首先, 写一个自定义的注解, 用在Service中的各个Method上</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">@Target(&#123;ElementType.METHOD, ElementType.TYPE&#125;)</span><br><span class="line">@Retention(RetentionPolicy.RUNTIME)</span><br><span class="line">@Documented</span><br><span class="line">public @interface MzDataSource &#123;</span><br><span class="line">    String name() default MzDataSource.master;</span><br><span class="line">    public static String master = &quot;masterDataSource&quot;;</span><br><span class="line">    public static String slave = &quot;slaveDataSource&quot;;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>然后再写一个类用来存放LocalThread变量</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">public class DynamicDataSourceResolver extends AbstractRoutingDataSource &#123;</span><br><span class="line">    @Override</span><br><span class="line">    protected Object determineCurrentLookupKey() 
&#123;</span><br><span class="line">        String key =  DataSourceRouteHolder.getDataSourceKey();</span><br><span class="line">        if(StringUtils.isBlank(key)) &#123;</span><br><span class="line">            return MzDataSource.master;</span><br><span class="line">        &#125;</span><br><span class="line">        return key;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>再写一个普通风格的AbstractRoutingDataSource实现, 策略就是直接从LocalThread里直接取Key</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">public class DataSourceRouteHolder &#123;</span><br><span class="line">    private static final ThreadLocal&lt;String&gt; dataSources = new ThreadLocal&lt;&gt;();</span><br><span class="line">        public static void setDataSourceKey(String customType) &#123;</span><br><span class="line">    dataSources.set(customType);</span><br><span class="line">    &#125;</span><br><span class="line">    public static String getDataSourceKey() &#123;</span><br><span class="line">        return (String) dataSources.get();</span><br><span class="line">    &#125;</span><br><span class="line">    public static void clearDataSourceKey() &#123;</span><br><span class="line">        dataSources.remove();</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>使用注解AOP的方式来读取Service方法上的自定义注解, 然后塞进ThreadLocal里, 下面的实现既支持Service接口里的注解, 也支持Service实现中注解, 实现优先级大于接口</p><figure class="highlight plain"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">@Component</span><br><span class="line">@Aspect</span><br><span class="line">public class DataSourceAspect &#123;</span><br><span class="line">    @Pointcut(&quot;execution(* cn.stackbox.service..*(..))&quot;)</span><br><span class="line">    public void aspect() &#123;&#125;</span><br><span class="line">    @Before(&quot;aspect()&quot;)</span><br><span class="line">    public void doBefore(JoinPoint point) throws Throwable &#123;</span><br><span class="line">        final MethodSignature methodSignature = (MethodSignature) point.getSignature();</span><br><span class="line">        Method method = methodSignature.getMethod();</span><br><span class="line">        MzDataSource mzDataSource = method.getAnnotation(MzDataSource.class);</span><br><span class="line">        if(method.getDeclaringClass().isInterface()) &#123;</span><br><span class="line">            method = point.getTarget().getClass().getMethod(method.getName(), method.getParameterTypes());</span><br><span class="line">        &#125;</span><br><span class="line">        mzDataSource = method.getAnnotation(MzDataSource.class);</span><br><span class="line">        if(null != mzDataSource) &#123;</span><br><span 
class="line">            DataSourceRouteHolder.setDataSourceKey(mzDataSource.name());</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    @After(&quot;aspect()&quot;)</span><br><span class="line">    public void doAfter() &#123;</span><br><span class="line">        DataSourceRouteHolder.clearDataSourceKey();</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Finally, configure the master and slave data sources. Note that the DynamicDataSourceResolver bean needs a <code>@Primary</code> annotation; otherwise Spring throws an exception complaining that several beans of the same type qualify for injection</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">@Bean</span><br><span class="line">@Primary</span><br><span class="line">public DataSource dataSource() &#123;</span><br><span class="line">    DynamicDataSourceResolver resolver = new DynamicDataSourceResolver();</span><br><span class="line">    Map&lt;Object, Object&gt; dataSources = Maps.newHashMap();</span><br><span class="line">    dataSources.put(&quot;masterDataSource&quot;, masterDataSource());</span><br><span class="line">    dataSources.put(&quot;slaveDataSource&quot;, slaveDataSource());</span><br><span class="line">    resolver.setTargetDataSources(dataSources);</span><br><span class="line">    return 
resolver;</span><br><span class="line">&#125;</span><br><span class="line">@Bean</span><br><span class="line">@ConfigurationProperties(prefix=&quot;spring.datasource.master&quot;)</span><br><span class="line">public DataSource masterDataSource() &#123;</span><br><span class="line">    return new org.apache.tomcat.jdbc.pool.DataSource();</span><br><span class="line">&#125;</span><br><span class="line">@Bean</span><br><span class="line">@ConfigurationProperties(prefix=&quot;spring.datasource.slave&quot;)</span><br><span class="line">public DataSource slaveDataSource() &#123;</span><br><span class="line">    return new org.apache.tomcat.jdbc.pool.DataSource();</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h1 id="注意"><a href="#注意" class="headerlink" title="注意"></a>Notes</h1><ol><li>Clear the ThreadLocal variable promptly. Pooled threads are reused, so a stale key would otherwise carry over into the next call handled by the same thread</li><li>With this approach, configuring distributed transactions becomes quite complicated; for details see <a href="http://hungryant.github.io/java/2015/11/26/java-spring-boot-jta.html" target="_blank" rel="noopener">this article</a></li></ol>]]></content>
    
    <summary type="html">
    
      Configuring multiple data sources with AbstractRoutingDataSource
    
    </summary>
    
    
      <category term="Java" scheme="http://stackbox.cn/tags/Java/"/>
    
  </entry>
  
  <entry>
    <title>Plans for 2016</title>
    <link href="http://stackbox.cn/2016-01-2016-plans/"/>
    <id>http://stackbox.cn/2016-01-2016-plans/</id>
    <published>2016-01-21T18:02:02.000Z</published>
    <updated>2018-12-17T11:07:15.762Z</updated>
    
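    <content type="html"><![CDATA[<h2 id="2015年反省"><a href="#2015年反省" class="headerlink" title="2015年反省"></a>Looking back on 2015</h2><ul><li>One word in capital letters: <code>restless</code>. I spread myself across too many technologies, so I never went deep on any of them</li><li>Bought a pile of books and did not read a single page</li></ul><a id="more"></a><h2 id="2016年计划"><a href="#2016年计划" class="headerlink" title="2016年计划"></a>Plans for 2016</h2><ul><li>Study databases (mysql, postgresql), especially optimization</li><li>Study the spring ecosystem. It has become remarkably complex by now, especially once you add spring-cloud and a pile of microservices; read the source code if I get the chance</li><li>Look at front-end work if I get the chance. I have a feeling react will explode in 2016, and angular1 can go die</li><li>Keep up with hadoop too, so I can bluff people with it later</li><li>Finish the books from 2015, plus the Mahabharata</li><li>Build a few more useful wheels whenever I can</li><li>Get my weight to 50kg and work out</li><li>As for a girlfriend... I more or less gave up on that long ago</li></ul><blockquote><p>All in all, I hope I can stay in Beijing (PS: after seeing the people of Shanghai.. Beijing feels so unsophisticated ~~~orz)<br>PS: after coming back from the New Year holiday I badly want to go back to Zhengzhou again..</p></blockquote>]]></content>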
    
    <summary type="html">
    
      New Year plans
    
    </summary>
    
      <category term="生活记录" scheme="http://stackbox.cn/categories/%E7%94%9F%E6%B4%BB%E8%AE%B0%E5%BD%95/"/>
    
    
      <category term="生活记录" scheme="http://stackbox.cn/tags/%E7%94%9F%E6%B4%BB%E8%AE%B0%E5%BD%95/"/>
    
  </entry>
  
  <entry>
    <title>Continuous Integration with Gitlab CI</title>
    <link href="http://stackbox.cn/2016-02-gitlab-ci-conf/"/>
    <id>http://stackbox.cn/2016-02-gitlab-ci-conf/</id>
    <published>2015-12-31T07:19:40.000Z</published>
    <updated>2018-12-17T11:07:19.138Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>My company uses the gitlab community edition, and getting CI to run takes some fiddling. Overall, a local Runner is the most convenient option</p></blockquote><a id="more"></a><h1 id="原理"><a href="#原理" class="headerlink" title="原理"></a>How it works</h1><p>Gitlab-CI has a concept called a <code>Runner</code>. By the official definition there are three kinds of Runner</p><ul><li>Local Runner (pro: easy to deploy; con: it uses the resources of the development machine, MAC/WIN)</li><li>Runner on an ordinary server (pro: none that I found; con: extremely hard to configure on RHEL-family machines, and I have never succeeded)</li><li>Docker-based Runner (pro: it is Docker, need I say more; con: I still have not figured out how to use a local maven repository, so resolving dependencies during a Build is extremely slow)</li></ul><p>Once a Runner is installed, it is bound to CI using the URL and Token from the configuration. From then on the two sides exchange messages, and Builds run and report their results automatically</p><h1 id="使用"><a href="#使用" class="headerlink" title="使用"></a>Usage</h1><p>First install <a href="https://gitlab.com/gitlab-org/gitlab-ci-multi-runner" target="_blank" rel="noopener">gitlab-ci-multi-runner</a>. On a MAC, installing it with an up-to-date <code>homebrew</code> is enough; for other systems see the <a href="https://gitlab.com/gitlab-org/gitlab-ci-multi-runner" target="_blank" rel="noopener">official documentation</a></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">$ brew update</span><br><span class="line">$ brew install gitlab-ci-multi-runner</span><br><span class="line"></span><br><span class="line"><span class="comment"># Then start the Runner and bind it to CI</span></span><br><span class="line">$ gitlab-ci-multi-runner register</span><br><span class="line"></span><br><span class="line"><span class="comment">#--&gt; Enter the CI URL when prompted</span></span><br><span class="line"><span class="comment">#--&gt; Then enter the Token</span></span><br><span 
class="line"><span class="comment">#--&gt; Then give the Runner any name you like</span></span><br><span class="line"><span class="comment">#--&gt; For the executor type, be sure to choose Shell</span></span><br><span class="line"><span class="comment">#--&gt; Done</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># Run the Runner as a Service</span></span><br><span class="line">$ <span class="built_in">cd</span> ~</span><br><span class="line">$ gitlab-ci-multi-runner install</span><br><span class="line">$ gitlab-ci-multi-runner start</span><br></pre></td></tr></table></figure><p>As with <code>travis-ci</code>, create a file named <code>.gitlab-ci.yml</code> in the root of your project and add the following test configuration</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">build:</span><br><span class="line">    script: &quot;pwd &amp;&amp; mvn test&quot;</span><br></pre></td></tr></table></figure><p>If nothing went wrong, a Build should already be running for the project</p><h1 id="注意事项"><a href="#注意事项" class="headerlink" title="注意事项"></a>Notes</h1><ul><li>A local Runner builds through bash, so make sure every environment variable the build needs is set, for example <code>JAVA_HOME</code> and <code>PATH</code></li></ul>]]></content>
    
    <summary type="html">
    
      A local Runner works quite well for a small team
    
    </summary>
    
    
      <category term="工具" scheme="http://stackbox.cn/tags/%E5%B7%A5%E5%85%B7/"/>
    
  </entry>
  
  <entry>
    <title>A Running Log of 2015</title>
    <link href="http://stackbox.cn/2015-12-31-story/"/>
    <id>http://stackbox.cn/2015-12-31-story/</id>
    <published>2015-12-31T07:19:39.000Z</published>
    <updated>2018-12-17T11:07:06.128Z</updated>
    
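    <content type="html"><![CDATA[<blockquote><p>The last day of 2015, so here is a running log of the year.</p></blockquote><p>Ever since I moved from the "abandoned capital" to Beijing last year, my luck has been a lot better, not that it has done me much good.</p><a id="more"></a><h2 id="职场"><a href="#职场" class="headerlink" title="职场"></a>Work</h2><p>After the New Year holiday I was still in the media group, working on a project that made little money. The front end was outsourced and painful to change, and one Delay after another<br>left me with a very negative attitude toward work. In the end we stumbled our way to something that could be sold, though the user experience was terrible. I also did some technical support for secondary development, but I was so slow at it that I probably drove everyone else mad.</p><p>Then the Team's PMs left one after another, the Leader left as well, and the Team was split up. My own role had never been clear to begin with: front end? back end? data? client maintenance? I took the chance to move to another group, and so far that looks like a good choice. At least I no longer have to write that damned Delphi.</p><p>The new Team works on Social products and feels more like an internet company than the old one, at least in its development process. For a product, the job is to iterate quickly and then win new customers or make money. With technology stacks as rich as they are today, burying your head in a project for most of a year makes very little sense.</p><p>Here I mostly write web apis: SpringBoot for complex projects, express for small simple ones, and occasionally some data work. It did not take long to rack up a lot of code.</p><h2 id="生活"><a href="#生活" class="headerlink" title="生活"></a>Life</h2><p>I spent the first half of the year holed up in Tiantongyuan, where squeezing onto the morning subway was miserable. Luckily I moved to Wangjing in the second half; walking to and from work every day and breathing in a bit of smog actually feels quite nice. Probably because I am well into my twenties, I want to go out less and less and would rather stay in bed, occasionally skipping meals altogether, which made my weight drop sharply. In the second half of the year I started practicing Beyer; one third of the book is still left, and I hope to get through at least one Czerny book in the first half of next year.</p><p>Bought an air fryer; roasting chicken wings in it now and then is pretty nice.</p><p>Bought a PS4. I have probably spent more than 1500 on games by now; here is the list</p><ul><li>Dynasty Warriors: Xtreme Legends : I must have been out of my mind to buy this; mowing down grunts makes me dizzy</li><li>GTA5 : I oddly identify with Franklin</li><li>Driveclub : I also bought the motorcycle DLC, and crashed into walls so often that I wanted to smash the controller</li><li>The Binding of Isaac: Rebirth : the background music is pretty scary late at night; heavily luck-based and very demanding on skill</li><li>COD12 : for me simply the best of the year. I had never touched an FPS before, and I have a great time fighting the AI in multiplayer</li></ul><h2 id="技术"><a href="#技术" class="headerlink" title="技术"></a>Technology</h2><ul><li>Learned a bit about writing hadoop jobs</li><li>Learned a bit of shell scripting</li><li>Still not very fluent with Express</li><li>Got hooked on SpringBoot</li><li>Gained a deeper understanding of Oauth2</li><li>Wrote a few rubbish articles and submitted them to 开发者头条</li></ul><h2 id="感情"><a href="#感情" class="headerlink" title="感情"></a>Relationships</h2><p>Probably because I am so skinny, a huge number of people have mistaken me for gay, and by now I cannot be bothered to explain. Looking back over a painful history: ugly people get no youth</p><ul><li>The girl I had a crush on for many, many years has turned into a full-blown rich beauty. Yeah, I chickened out</li><li>Blocked, blocked, blocked</li><li>Ex.. uh.. a critical hit that left me with no more rosy illusions about myself</li></ul><p>My family has probably already started arranging blind dates for me. Well, I will let things take their course; until then I can follow 帝老师 in cultivating immortality and watch 小葵花 teach me how to be a person.</p><h2 id="未来"><a href="#未来" class="headerlink" title="未来"></a>Future</h2><p>For the first time I find myself wanting to own a home, yet I do not want to move to a second-tier city. I have a strong feeling that in 2016 I will once again make a choice that changes the course of my life.</p><p>Then again... there does not seem to be much of a future anyway</p><h2 id="附录"><a href="#附录" class="headerlink" title="附录"></a>Appendix</h2><p>On 2016-01-01 I had coffee with my flymeal friends and, while at it, <a 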
href="https://img1.doubanio.com/view/status/median/public/adcdf1a52808b3b.jpg" target="_blank" rel="noopener">cosplayed 山下智博</a></p>]]></content>
    
    <summary type="html">
    
      To hell with it!
    
    </summary>
    
      <category term="生活记录" scheme="http://stackbox.cn/categories/%E7%94%9F%E6%B4%BB%E8%AE%B0%E5%BD%95/"/>
    
    
      <category term="生活记录" scheme="http://stackbox.cn/tags/%E7%94%9F%E6%B4%BB%E8%AE%B0%E5%BD%95/"/>
    
  </entry>
  
  <entry>
    <title>I Wrote a Hexo Plugin</title>
    <link href="http://stackbox.cn/2015-12-an-hexo-extension-to-display-pdf/"/>
    <id>http://stackbox.cn/2015-12-an-hexo-extension-to-display-pdf/</id>
    <published>2015-12-04T06:50:41.000Z</published>
    <updated>2018-12-17T11:08:41.646Z</updated>
    
    <content type="html"><![CDATA[<p>The project lives at <a href="https://github.com/superalsrk/hexo-pdf" target="_blank" rel="noopener">https://github.com/superalsrk/hexo-pdf</a>, and a PR adding it to the <a href="https://hexo.io/plugins/" target="_blank" rel="noopener">official site</a> has been submitted. Feedback is welcome. My reasons for writing this plugin were</p><a id="more"></a><blockquote><ol><li>Slideshare is blocked in China, WTF</li><li>The domestic sharing services (豆丁, CSDN, 微盘) are all Flash-based, which Safari does not support</li></ol></blockquote><p>The official demos are either too simple or too complex, and being hopeless at css I could not even copy the code properly, so a sneaky idea was born</p><blockquote><ol><li>Deploy an official Viewer on a github page and have it read the pdf file address from a url parameter</li><li>Embed an iframe in the hexo page whose src is the Viewer address plus the pdf address</li></ol></blockquote><p>For installation instructions see <a href="https://github.com/superalsrk/hexo-pdf/blob/master/README.md" target="_blank" rel="noopener">README.md</a>; the end result looks like this</p><h3 id="Normal-PDF"><a href="#Normal-PDF" class="headerlink" title="Normal PDF"></a>Normal PDF</h3><div class="row">    <embed src="http://7xov2f.com1.z0.glb.clouddn.com/bash_freshman.pdf" width="100%" height="550" type="application/pdf"></div><h3 id="Google-drive"><a href="#Google-drive" class="headerlink" title="Google drive"></a>Google drive</h3><div class="row"><iframe src="https://drive.google.com/file/d/0B6qSwdwPxPRdTEliX0dhQ2JfUEU/preview" style="width:100%; height:550px"></iframe></div><h3 id="Slideshare"><a href="#Slideshare" class="headerlink" title="Slideshare"></a>Slideshare</h3><iframe src="http://www.slideshare.net/slideshow/embed_code/key/8Jl0hUt2OKUOOE" style="width:100%;height:550px" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" allowfullscreen> </iframe> <div style="margin-bottom:5px"><h2 id="History"><a href="#History" class="headerlink" title="History"></a>History</h2><ul><li>2015-01-02: support for embedding googledoc and slideshare documents</li><li>2015-12-04: support for embedding raw pdf files</li></ul><p>THE END</p></div>]]></content>
    
    <summary type="html">
    
      A plugin for embedding PDFs in Hexo
    
    </summary>
    
    
  </entry>
  
  <entry>
    <title>[Late-Night Ramblings] Some Stray Thoughts on Big Data</title>
    <link href="http://stackbox.cn/2015-11-some-about-big-data/"/>
    <id>http://stackbox.cn/2015-11-some-about-big-data/</id>
    <published>2015-11-29T16:08:17.000Z</published>
    <updated>2018-12-17T11:07:02.513Z</updated>
    
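    <content type="html"><![CDATA[<blockquote><p>Update: this post really is just hot air; I was quietly trying to show off</p></blockquote><a id="more"></a><p>Today I read an article, <a href="http://www.douban.com/note/524648018/" target="_blank" rel="noopener">美股大数据公司谁有前途</a> (which US-listed big data companies have a future). Its views are interesting and clash sharply with those of Liu Peng, the "well-known performing artist" who gave a training session at my company a few days ago, so I am writing this down.</p><p>The biggest disagreement is whether Palantir is sustainable as a company. First, a few words about this very low-profile company. Back before PayPal was acquired, its engineers built<br>a set of tools that processed massive amounts of data to screen for money laundering and other illicit transactions. Later, members of the PayPal Mafia founded Palantir. The Iraq War was under way, and the US intelligence agencies became Palantir's first big customer. Orders from governments and financial giants have flowed in ever since, and its valuation has reached an astonishing 20 billion US dollars.</p><h2 id="原文观点"><a href="#原文观点" class="headerlink" title="原文观点"></a>The article's view</h2><p>The prospects of Hortonworks, Cloudera and MapR are unclear, because infrastructure tools are highly replaceable (the official Hadoop release, for example, works well enough). Palantir, however,<br>offers consulting and solutions the way Thoughtworks does, which makes it very hard to replace. On that basis, the soaring share price and valuation of Tableau and Palantir are understandable.</p><h2 id="刘鹏老师观点"><a href="#刘鹏老师观点" class="headerlink" title="刘鹏老师观点"></a>Liu Peng's view</h2><ol><li>Online advertising is currently the only big data business that can be monetized at scale</li><li>The users of data products should be machines, not people. Reports produced for the boss are limited by the person reading them, and having developers churn out report after report without enough expertise behind the requests is a waste of resources</li><li>The hype cycle: big data, the sharing economy and the like are concepts dreamed up by capital. A company like Palantir, which makes its money largely by billing for headcount, can hardly support such a high valuation; the valuation is pushed up by capital looking to cash in. On the whole such concepts are sound, and they will slowly recover later</li></ol><h2 id="我的胡思乱想"><a href="#我的胡思乱想" class="headerlink" title="我的胡思乱想"></a>My stray thoughts</h2><ol><li>Sure enough, apart from online advertising, national security and financial security are the first concepts data companies pitch. All sorts of P2P lending companies now claim to rely on big data for financial security, and yet they keep running off with the money one after another</li><li>As for counter-terrorism and the like, the stock example is <code>helped China's XXX ministry/FBI/CIA identify terrorists from multiple data sources, with good results</code>. Personally I think this is old wine in a new bottle: an OLAP system written on top of new technology. Of course, BigData has no precise definition, and slapping the label <code>big data</code> on something still sounds impressive</li><li>I also think Palantir is somewhat overhyped, but I would really love to work there.. Turning your nose up at money is truly contemptible behavior</li><li>Time to get back to writing code properly. I have not touched the back end in ages, and writing js is really addictive.. I must try hard to quit</li></ol><h2 id="后续更新"><a href="#后续更新" class="headerlink" title="后续更新"></a>Follow-up</h2><p>Update 2015.12.7: this evening I saw two Slide decks shared by <a href="http://weibo.com/lirenchen" target="_blank" rel="noopener">@陈立人</a> and realized that what Palantir builds is actually quite complex. The two decks are</p><ul><li>Palantir product and technology walkthrough</li></ul><div class="row">    <embed src="http://7xov2f.com1.z0.glb.clouddn.com/palantir-jiedu.pdf" width="100%" height="550" type="application/pdf"></div><ul><li>Palantir technology overview, abridged</li></ul><div class="row">    <embed 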
src="http://7xov2f.com1.z0.glb.clouddn.com/palantir-jishuchanshu-preview.pdf" width="100%" height="550" type="application/pdf"></div><p>THE END</p>]]></content>
    
    <summary type="html">
    
      Stray thoughts prompted by an article
    
    </summary>
    
    
  </entry>
  
  <entry>
    <title>A Disaster Caused by Upgrading GCC</title>
    <link href="http://stackbox.cn/2015-10-13-upgrade-gcc-error-md/"/>
    <id>http://stackbox.cn/2015-10-13-upgrade-gcc-error-md/</id>
    <published>2015-10-13T11:26:20.000Z</published>
    <updated>2018-12-17T11:06:58.519Z</updated>
    
    <content type="html"><![CDATA[<p>It started like this: today, while installing <code>node-zerorpc</code> on an old CentOS5 server, I got this message:</p><blockquote><p> We need C++11 now, go away and upgrade your G++</p></blockquote><a id="more"></a><p>Fine, since you put it that way.... So I consulted <a href="http://engine.wohlnet.ru/forum/viewtopic.php?f=17&amp;t=330" target="_blank" rel="noopener">this link</a> and a few SF answers and came up with the following script</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> /etc/yum.repos.d</span><br><span class="line">wget http://people.centos.org/tru/devtools-2/devtools-2.repo </span><br><span class="line">yum install devtools-2</span><br><span class="line">yum install devtoolset-2-gcc devtoolset-2-gcc-c++</span><br><span class="line"></span><br><span class="line"><span class="comment">## Then back up the gcc in /usr/bin and symlink the gcc/g++ from /opt/rh/devtoolset-2 into its place</span></span><br><span class="line"><span class="comment">## (omitted here)</span></span><br></pre></td></tr></table></figure><p>I was thinking: I have upgraded all the way to 4.8, so why would I fear C++11? Perhaps won over by my charm (yeah, right!), the install indeed no longer complained about the version being too old. Instead it reported an <strong>Assembler Error</strong>. Google had no answer for this; the only mentions were a few GCC bug issues, which clearly did not solve the problem. My guess at the time was that Node needed to be recompiled, and the error log from that compilation was:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span 
class="line">/tmp/cc8TKj9o.s: Assembler messages:</span><br><span class="line">/tmp/cc8TKj9o.s:125: Error: unknown .loc sub-directive `discriminator&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:125: Error: junk at end of line, first unrecognized character is `1&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:140: Error: unknown .loc sub-directive `discriminator&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:140: Error: junk at end of line, first unrecognized character is `2&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:143: Error: unknown .loc sub-directive `discriminator&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:143: Error: junk at end of line, first unrecognized character is `2&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:146: Error: unknown .loc sub-directive `discriminator&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:146: Error: junk at end of line, first unrecognized character is `2&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:150: Error: unknown .loc sub-directive `discriminator&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:150: Error: junk at end of line, first unrecognized character is `2&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:171: Error: unknown .loc sub-directive `discriminator&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:171: Error: junk at end of line, first unrecognized character is `1&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:179: Error: unknown .loc sub-directive `discriminator&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:179: Error: junk at end of line, first unrecognized character is `1&apos;</span><br><span class="line">/tmp/cc8TKj9o.s:187: Error: unknown .loc sub-directive `discriminator&apos;</span><br></pre></td></tr></table></figure><p>Next I wrote a <strong>HelloWorld</strong> and compiled it with the new compiler, and got the same kind of error. That pinned down the cause: <strong>the compiler was broken!</strong></p><p>But broken where? While my head was spinning I suddenly remembered an issue that said, roughly:</p><blockquote><p>An Assembler Error may occur when the gcc and g++ versions do not match</p></blockquote><p>In that case, what would happen if I installed the whole GNU toolset? Sure enough, the problem went away. A likely explanation is that the full toolset also ships a newer binutils, whose assembler understands the <code>discriminator</code> sub-directive that the stock CentOS5 assembler rejects.</p><p>Appendix: the correct upgrade script</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> /etc/yum.repos.d</span><br><span class="line">wget http://people.centos.org/tru/devtools-2/devtools-2.repo </span><br><span class="line">yum install devtoolset-2</span><br><span class="line"></span><br><span class="line"><span class="comment"># Temporarily switch the gcc version</span></span><br><span class="line">scl <span class="built_in">enable</span> devtoolset-2 bash</span><br></pre></td></tr></table></figure><h2 id="Update-2016-08-02"><a href="#Update-2016-08-02" class="headerlink" title="Update (2016.08.02)"></a>Update (2016.08.02)</h2><p>Today, updating g++ on a bare CentOS6.4 Server failed, roughly because a pile of GUI dependencies were not installed. I only wanted to update the compiler and nothing else.</p><p>A look at the documentation showed that devtoolset consists mainly of the following packages</p><table><thead><tr><th>Package Name</th><th>Description</th><th>Installed Components</th></tr></thead><tbody><tr><td>devtoolset-2-ide</td><td>Integrated Development Environment</td><td>Eclipse</td></tr><tr><td>devtoolset-2-perftools</td><td>Performance monitoring tools</td><td>SystemTap, Valgrind, OProfile, Dyninst</td></tr><tr><td>devtoolset-2-toolchain</td><td>Development and debugging tools</td><td>GCC, GDB, binutils, elfutils, dwz, memstomp, strace</td></tr><tr><td>devtoolset-2-vc</td><td>Revision control systems</td><td>Git</td></tr></tbody></table><p>So.. installing just <strong>devtoolset-2-toolchain</strong> is enough</p><pre><code>$ cd /etc/yum.repos.d
$ wget http://people.centos.org/tru/devtools-2/devtools-2.repo
$ sudo yum install devtoolset-2-toolchain
$ scl enable devtoolset-2 bash
$ source /opt/rh/devtoolset-2/enable</code></pre>]]></content>
    
    <summary type="html">
    
      Fixing the Assembler Error triggered by upgrading GCC on an old CentOS
    
    </summary>
    
      <category term="计算机基础" scheme="http://stackbox.cn/categories/%E8%AE%A1%E7%AE%97%E6%9C%BA%E5%9F%BA%E7%A1%80/"/>
    
    
  </entry>
  
  <entry>
    <title>Generating Network Graphs with Gephi</title>
    <link href="http://stackbox.cn/2015-08-about-gephi/"/>
    <id>http://stackbox.cn/2015-08-about-gephi/</id>
    <published>2015-08-26T03:38:00.000Z</published>
    <updated>2018-12-17T11:06:52.431Z</updated>
    
    <content type="html"><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Introduction</h2><p>Gephi is a free, open-source, cross-platform, JVM-based tool for analyzing complex networks. It is used mainly for all kinds of networks and complex systems and is particularly strong at handling relational network data. Here are two good examples</p><a id="more"></a><ul><li><a href="http://exploringdata.github.io/vis/programmers-search-relations/" target="_blank" rel="noopener">Programming language relationship graph</a></li></ul><p><img src="http://7jptw8.com1.z0.glb.clouddn.com/gephi/programming-rel.png" alt=""></p><ul><li><a href="http://www.weiboreach.com/Try/exa2.jsp?val=3839629461690386_1684941721" target="_blank" rel="noopener">Weibo propagation analysis</a></li></ul><p><img src="http://7jptw8.com1.z0.glb.clouddn.com/gephi/weibo.png" alt=""></p><p>So, once we have the raw data, how do we draw charts like these?</p><h2 id="布局文件生成"><a href="#布局文件生成" class="headerlink" title="布局文件生成"></a>Generating the layout file</h2><p>The two examples above show that charts of this kind can be drawn with <a href="http://sigmajs.org/" target="_blank" rel="noopener">sigma.js</a>. The library itself, however, neither preprocesses the data nor computes a layout, so drawing the chart requires a data file that spells out the <code>node name, color, size, x coordinate, y coordinate, and the endpoint nodes of each edge</code>. Such data is usually expressed as gexf (an xml format) or json. </p><p>Generating the gexf requires a layout algorithm. A common choice is the <a href="https://en.wikipedia.org/wiki/Force-directed_graph_drawing" target="_blank" rel="noopener">Force-directed_graph_drawing</a> family of force-directed algorithms: <code>the core idea is that nodes repel each other while each edge pulls its two nodes together, and repeated iteration eventually settles into a stable state</code>. Implementing a layout algorithm by hand is fairly involved, but fortunately the gephi-toolkit component provides an API for processing the data. First add the gephi repositories and dependency to the maven project<br><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="tag">&lt;<span class="name">repositories</span>&gt;</span></span><br><span class="line">     <span class="tag">&lt;<span class="name">repository</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">id</span>&gt;</span>gephi-snapshots<span class="tag">&lt;/<span class="name">id</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">name</span>&gt;</span>Gephi Snapshots<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">url</span>&gt;</span>http://nexus.gephi.org/nexus/content/repositories/snapshots/<span class="tag">&lt;/<span class="name">url</span>&gt;</span></span><br><span class="line">     <span class="tag">&lt;/<span class="name">repository</span>&gt;</span></span><br><span class="line">     <span class="tag">&lt;<span class="name">repository</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">id</span>&gt;</span>gephi-releases<span class="tag">&lt;/<span class="name">id</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">name</span>&gt;</span>Gephi Releases<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">url</span>&gt;</span>http://nexus.gephi.org/nexus/content/repositories/releases/<span class="tag">&lt;/<span class="name">url</span>&gt;</span></span><br><span class="line">     <span class="tag">&lt;/<span class="name">repository</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">repositories</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">dependencies</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span 
class="name">dependency</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">groupId</span>&gt;</span>org.gephi<span class="tag">&lt;/<span class="name">groupId</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">artifactId</span>&gt;</span>gephi-toolkit<span class="tag">&lt;/<span class="name">artifactId</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">version</span>&gt;</span>0.8.2<span class="tag">&lt;/<span class="name">version</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;/<span class="name">dependency</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">dependencies</span>&gt;</span></span><br></pre></td></tr></table></figure></p><p>添加依赖完成之后,参考这个 <a href="http://www.slideshare.net/gephi/gephi-toolkit-tutorialtoolkit" target="_blank" rel="noopener">slide</a>, 根据需求构造一个有向图,并调用布局算法, 最后导出成gexf和pdf文件</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span 
class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br></pre></td><td class="code"><pre><span class="line">ProjectController pc = Lookup.getDefault().lookup(ProjectController.class);</span><br><span class="line">pc.newProject();</span><br><span class="line">Workspace workspace = pc.getCurrentWorkspace();</span><br><span class="line"></span><br><span class="line"><span class="comment">//Generate a new random graph into a container</span></span><br><span class="line">Container container = Lookup.getDefault().lookup(ContainerFactory.class).newContainer();</span><br><span class="line"></span><br><span class="line">GraphModel graphModel = Lookup.getDefault().lookup(GraphController.class).getModel();</span><br><span class="line">DirectedGraph graph = graphModel.getDirectedGraph();</span><br><span class="line"></span><br><span class="line">Node n0 = graphModel.factory().newNode(<span class="string">"n0"</span>);</span><br><span class="line">n0.getNodeData().setLabel(<span class="string">"n0"</span>);</span><br><span class="line">Node n1 = graphModel.factory().newNode(<span 
class="string">"n1"</span>);</span><br><span class="line">n1.getNodeData().setLabel(<span class="string">"n1"</span>);</span><br><span class="line">Edge edge = graphModel.factory().newEdge(n0, n1, <span class="number">1f</span>, <span class="keyword">true</span>);</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">graph.addNode(n0);</span><br><span class="line">graph.addNode(n1);</span><br><span class="line">graph.addEdge(edge);</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span>(<span class="keyword">int</span> i = <span class="number">0</span> ; i &lt; <span class="number">100</span>; i++) &#123;</span><br><span class="line">   Node ntmp = graphModel.factory().newNode(<span class="string">"tmp"</span> + i);</span><br><span class="line">   Edge edgetmp = graphModel.factory().newEdge(n0, ntmp, <span class="number">1f</span>, <span class="keyword">true</span>);</span><br><span class="line"></span><br><span class="line">   graph.addNode(ntmp);</span><br><span class="line">   graph.addEdge(edgetmp);</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">System.out.println(<span class="string">"Nodes: "</span> + graph.getNodeCount());</span><br><span class="line">System.out.println(<span class="string">"Edges: "</span> + graph.getEdgeCount());</span><br><span class="line"></span><br><span class="line"><span class="comment">//Layout for 20 seconds</span></span><br><span class="line">AutoLayout autoLayout = <span class="keyword">new</span> AutoLayout(<span class="number">20</span>, TimeUnit.SECONDS);</span><br><span class="line">autoLayout.setGraphModel(graphModel);</span><br><span class="line">YifanHuLayout firstLayout = <span class="keyword">new</span> YifanHuLayout(<span class="keyword">null</span>, <span class="keyword">new</span> StepDisplacement(<span class="number">1f</span>));</span><br><span 
class="line">ForceAtlasLayout secondLayout = <span class="keyword">new</span> ForceAtlasLayout(<span class="keyword">null</span>);</span><br><span class="line">AutoLayout.DynamicProperty adjustBySizeProperty = AutoLayout.createDynamicProperty(<span class="string">"forceAtlas.adjustSizes.name"</span>, Boolean.TRUE, <span class="number">0.1f</span>);<span class="comment">//True after 10% of layout time</span></span><br><span class="line">AutoLayout.DynamicProperty repulsionProperty = AutoLayout.createDynamicProperty(<span class="string">"forceAtlas.repulsionStrength.name"</span>, <span class="keyword">new</span> Double(<span class="number">500</span>.), <span class="number">0f</span>);<span class="comment">//500 for the complete period</span></span><br><span class="line">autoLayout.addLayout(firstLayout, <span class="number">0.9f</span>);</span><br><span class="line">autoLayout.addLayout(secondLayout, <span class="number">0.1f</span>, <span class="keyword">new</span> AutoLayout.DynamicProperty[]&#123;adjustBySizeProperty, repulsionProperty&#125;);</span><br><span class="line">autoLayout.execute();</span><br><span class="line"></span><br><span class="line"><span class="comment">//Export pdf &amp; gexf</span></span><br><span class="line">ExportController ec = Lookup.getDefault().lookup(ExportController.class);</span><br><span class="line"><span class="keyword">try</span> &#123;</span><br><span class="line"></span><br><span class="line">    File pdfFile = <span class="keyword">new</span> File(<span class="string">"/tmp/data.pdf"</span>);</span><br><span class="line">    File gexfFile = <span class="keyword">new</span> File(<span class="string">"/tmp/data.gexf"</span>);</span><br><span class="line"></span><br><span class="line">    pdfFile.getParentFile().mkdirs();</span><br><span class="line">    gexfFile.getParentFile().mkdirs();</span><br><span class="line">    ec.exportFile(pdfFile);</span><br><span class="line">    ec.exportFile(gexfFile);</span><br><span 
class="line">&#125; <span class="keyword">catch</span> (IOException ex) &#123;</span><br><span class="line">    ex.printStackTrace();</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="图表绘制"><a href="#图表绘制" class="headerlink" title="图表绘制"></a>Drawing the Chart</h2><p>Once you have the data file, you can follow this <strong><a href="https://nagland.github.io/201509/sigmajs/index.html" target="_blank" rel="noopener">Online Demo</a></strong> to draw the chart.</p><h2 id="参考资料"><a href="#参考资料" class="headerlink" title="参考资料"></a>References</h2><ol><li><a href="http://gephi.github.io/" target="_blank" rel="noopener">http://gephi.github.io/</a></li><li><a href="http://www.slideshare.net/gephi/gephi-toolkit-tutorialtoolkit" target="_blank" rel="noopener">http://www.slideshare.net/gephi/gephi-toolkit-tutorialtoolkit</a></li><li><a href="https://github.com/gephi/gephi/wiki/How-to-code-with-the-Toolkit" target="_blank" rel="noopener">https://github.com/gephi/gephi/wiki/How-to-code-with-the-Toolkit</a></li></ol><h2 id="Update"><a href="#Update" class="headerlink" title="Update:"></a>Update:</h2><ol><li>To generate gexf files, you can use this Python library: <a href="https://github.com/paulgirard/pygexf" target="_blank" rel="noopener">https://github.com/paulgirard/pygexf</a></li><li>For a pure front-end approach, sigma.js provides the plugin <a href="https://github.com/jacomyal/sigma.js/tree/master/plugins/sigma.layout.forceAtlas2" target="_blank" rel="noopener">https://github.com/jacomyal/sigma.js/tree/master/plugins/sigma.layout.forceAtlas2</a>, which implements a force-directed layout algorithm</li></ol><p>THE END</p>]]></content>
    
    <summary type="html">
    
      Gephi is a free, open-source, cross-platform, JVM-based application for complex network analysis that is well suited to graph data
    
    </summary>
    
    
      <category term="工具" scheme="http://stackbox.cn/tags/%E5%B7%A5%E5%85%B7/"/>
    
  </entry>
  
  <entry>
    <title>PathFilter Not Taking Effect</title>
    <link href="http://stackbox.cn/2015-07-hdfs-pathfilter-not-work/"/>
    <id>http://stackbox.cn/2015-07-hdfs-pathfilter-not-work/</id>
    <published>2015-07-15T02:06:42.000Z</published>
    <updated>2018-12-17T11:06:49.249Z</updated>
    
    <content type="html"><![CDATA[<p>While writing a MapReduce job recently, I ran into a strange problem: the PathFilter did not take effect. The details are as follows:</p><ol><li>The code was modeled on another project (which uses SequenceFileInputFormat), where the PathFilter works correctly.</li><li>The MR job I wrote uses an in-house CombineFileInputFormat. Although a filter was set, the PathFilter was never even instantiated when the job ran.</li></ol><a id="more"></a><p>Tracing where the PathFilter gets instantiated seemed like the right direction. <code>FileInputFormat</code> is the parent class of all the XXFileInputFormat classes, and sure enough it contains the following code:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line">public abstract class FileInputFormat&lt;K, V&gt; extends InputFormat&lt;K, V&gt;  &#123;</span><br><span class="line"></span><br><span class="line">    public List&lt;InputSplit&gt; getSplits(JobContext job) throws IOException &#123;</span><br><span class="line">            //other code</span><br><span class="line">            List&lt;FileStatus&gt; files = listStatus(job);</span><br><span class="line">            //other code</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    protected List&lt;FileStatus&gt; listStatus(JobContext job) throws IOException &#123;</span><br><span class="line">            //other code</span><br><span class="line">            Path[] dirs = getInputPaths(job);</span><br><span class="line">            //other code</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    public static PathFilter getInputPathFilter(JobContext context) &#123;</span><br><span class="line">       Configuration conf = context.getConfiguration();</span><br><span class="line">       Class&lt;?&gt; filterClass = conf.getClass(PATHFILTER_CLASS, null,</span><br><span class="line">           PathFilter.class);</span><br><span class="line">       return (filterClass != null) ?</span><br><span class="line">           (PathFilter) ReflectionUtils.newInstance(filterClass, conf) : null;</span><br><span class="line">     &#125;</span><br><span class="line"> &#125;</span><br></pre></td></tr></table></figure><p>So as long as <code>getSplits</code> ends up calling <code>getInputPathFilter</code> (the parent's <code>listStatus</code> is what calls it), the PathFilter gets instantiated. When implementing a custom FileInputFormat, pay attention to how the overridden methods are implemented. First, here is how CombineFileInputFormat and SequenceFileInputFormat do it:</p><ol><li>CombineFileInputFormat: overrides getSplits, but the override calls super.listStatus(job), so the PathFilter works.</li><li>SequenceFileInputFormat: overrides listStatus, but the override calls super.listStatus(job), so the PathFilter still works.</li><li>Now look at our own InputFormat: it also overrides getSplits, but it obtains the file statuses by taking the paths from getInputPaths and fetching each status manually. It bypasses the parent's listStatus method entirely, which is why the PathFilter had no effect.</li></ol><p>Here is a PathFilter <a href="https://gist.github.com/superalsrk/d8a33c5ce56b2bac89ab" target="_blank" rel="noopener">Demo</a></p><p>THE END</p>]]></content>
    
    <summary type="html">
    
      Why a PathFilter may not take effect in HDFS
    
    </summary>
    
    
      <category term="数据开发" scheme="http://stackbox.cn/tags/%E6%95%B0%E6%8D%AE%E5%BC%80%E5%8F%91/"/>
    
  </entry>
  
  <entry>
    <title>Building a Secure Mobile API</title>
    <link href="http://stackbox.cn/2015-06-build-safe-mobile-apis/"/>
    <id>http://stackbox.cn/2015-06-build-safe-mobile-apis/</id>
    <published>2015-06-27T15:30:46.000Z</published>
    <updated>2018-12-17T11:06:38.931Z</updated>
    
    <content type="html"><![CDATA[<blockquote><p>Update @ 2016.03.02: The project setup described in this post (plain SpringMVC) is now somewhat cumbersome; the more popular approach today is to build with SpringBoot (together with projects such as oauth2)</p></blockquote><a id="more"></a><p>A friend and I have been tinkering with an app recently, and we got stuck right at the start, on login and registration. The sticking point was <strong>how to control access to the API</strong>. In traditional web development, sessions and cookies let requests keep state, but the APIs an app uses are generally designed to be stateless. So how should this be solved?</p><h2 id="解决思路"><a href="#解决思路" class="headerlink" title="解决思路"></a>Approach</h2><ul><li><p>For platform APIs, the target users are usually developers. With services such as the <a href="http://openapi.eleme.io/v2/quickstart.html" target="_blank" rel="noopener">Ele.me OpenApi</a> or <a href="https://pusher.com/docs/rest_api#authentication" target="_blank" rel="noopener">Pusher.com</a>, every call is independent and no state needs to be kept. Data permissions and feature permissions can be distinguished by a unique identifier such as an <strong>AppId</strong>, and requests are verified with an <strong>auth_signature</strong>. The exact algorithms are described in the two documents linked above.</p></li><li><p>What if the clients are apps? When I had just started working, the way to handle this was brute force: store the username and password locally in the app and send them along with every API call for verification, which is anything but elegant. Nowadays, using <strong>Oauth2</strong> directly to handle permissions is a fairly common approach when writing a Mobile API. <strong>Oauth2</strong> looks a bit complicated, but its ultimate goal is to obtain an <strong>access token</strong>. There are four modes for obtaining a token.</p></li></ul><blockquote><ol><li>Authorization code: an example is third-party login with Weibo. The flow is: third-party site -&gt; redirect to Weibo so the user can choose whether to authorize -&gt; the user authorizes and a callback returns an authorization code to the third party -&gt; the third party uses the authorization code to request an access token from Weibo -&gt; Weibo returns the access token</li><li>Implicit grant: the flow is: redirect to the authorization page -&gt; after authorization succeeds, the callback returns the access token</li><li>Password mode: the flow is: send a request with the username and password as parameters (<a href="http://www.cnblogs.com/pengyingh/articles/2377968.html" target="_blank" rel="noopener">along with Http Basic Authorization</a>) -&gt; an access token is returned</li><li>Client mode: this one is interesting. In this mode the token is requested on behalf of the client rather than the user; permissions are not distinguished per user, so no user authorization is involved. The flow is: send a request to the authorization server -&gt; the server verifies the client in some way (for example by appId and appSecret) -&gt; an access token is returned</li></ol></blockquote><p>When writing a Mobile API, password mode is a relatively simple choice: logging in becomes the process of obtaining a token. After a successful login the token is stored locally, and subsequent API calls carry the token.</p><h2 id="工程实践"><a href="#工程实践" class="headerlink" title="工程实践"></a>Engineering Practice</h2><p>For NodeJs developers, thanks to <strong>passport.js</strong> and a host of related packages, writing an <strong>API protected by bearer access tokens</strong> is very simple; <a href="http://aleksandrov.ws/2013/09/12/restful-api-with-nodejs-plus-mongodb/#Step1" target="_blank" rel="noopener">this tutorial</a> shows how to set up the basic environment. The rest of this post covers doing the same in a java environment with spring-security-oauth2+springmvc.</p><p>I have to say that configuring spring with annotations is a very good practice and far more readable than XML. For the full configuration, see the <a href="https://github.com/Nagland/spring-security-rest-with-oauth2" target="_blank" rel="noopener">sample project</a></p><ul><li>Configure spring-security</li></ul><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">@Configuration</span></span><br><span class="line"><span class="meta">@EnableWebSecurity</span></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">WebSecurityConfig</span> <span class="keyword">extends</span> <span class="title">WebSecurityConfigurerAdapter</span> </span>&#123;</span><br><span class="line">    <span class="meta">@Override</span></span><br><span class="line">    <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">configure</span><span class="params">(AuthenticationManagerBuilder auth)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">        auth</span><br><span class="line">                .inMemoryAuthentication()</span><br><span class="line">                .withUser(<span class="string">"user"</span>).password(<span class="string">"password"</span>).roles(<span class="string">"USER"</span>).and()</span><br><span class="line">                .withUser(<span class="string">"stackbox"</span>).password(<span class="string">"123456"</span>).roles(<span class="string">"ADMIN"</span>);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Override</span></span><br><span class="line">    <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">configure</span><span class="params">(HttpSecurity http)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">        http</span><br><span class="line">                .csrf().disable();</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">/**</span></span><br><span class="line"><span class="comment">     * This bean is used to configure the oauth2 password grant</span></span><br><span class="line"><span class="comment">     */</span></span><br><span class="line">    <span class="meta">@Override</span></span><br><span class="line">    <span class="meta">@Bean</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> AuthenticationManager <span class="title">authenticationManagerBean</span><span class="params">()</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="keyword">super</span>.authenticationManagerBean();</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>spring-security normally also needs a filter. Adding the class below registers that filter without any web.xml configuration.<br><figure class="highlight 
java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">SpringSecurityInitializer</span> <span class="keyword">extends</span> <span class="title">AbstractSecurityWebApplicationInitializer</span></span>&#123;</span><br><span class="line"></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p><ul><li>Configure oauth2</li></ul><p>The <a href="http://projects.spring.io/spring-security-oauth/docs/oauth2.html" target="_blank" rel="noopener">project documentation</a> describes several core interfaces. Following its examples, we again configure everything with annotations. In code, <code>@EnableResourceServer</code> configures the resource server, whose configuration closely resembles spring-security's permission configuration, and <code>@EnableAuthorizationServer</code> configures the authorization server. Note this passage in the documentation:</p><blockquote><p>The grant types supported by the AuthorizationEndpoint can be configured via the AuthorizationServerEndpointsConfigurer. By default all grant types are supported except password (see below for details of how to switch it on). 
The following properties affect grant types:</p></blockquote><p>In other words, to use the password grant you need to inject an <code>authenticationManagerBean</code>, which is the bean defined in the spring-security configuration above.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span 
class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">@Configuration</span></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">Oauth2ServerConfig</span> </span>&#123;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">protected</span> <span class="keyword">static</span> <span class="keyword">final</span> String RESOURCE_ID = <span class="string">"STACKBOX"</span>;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Configuration</span></span><br><span class="line">    <span class="meta">@EnableResourceServer</span></span><br><span class="line">    <span class="keyword">protected</span> <span class="keyword">static</span> <span class="class"><span class="keyword">class</span> <span class="title">ResourceServer</span> <span class="keyword">extends</span> <span class="title">ResourceServerConfigurerAdapter</span> </span>&#123;</span><br><span class="line">        <span class="meta">@Override</span></span><br><span class="line">        <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">configure</span><span class="params">(HttpSecurity http)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">            http</span><br><span class="line">                    .requestMatchers().antMatchers(<span class="string">"/admin/**"</span>).and()</span><br><span class="line">                    .authorizeRequests()</span><br><span class="line">                    .anyRequest().access(<span class="string">"#oauth2.hasScope('read')"</span>);</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span 
class="meta">@Override</span></span><br><span class="line">        <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">configure</span><span class="params">(ResourceServerSecurityConfigurer resources)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">            resources.resourceId(RESOURCE_ID);</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Configuration</span></span><br><span class="line">    <span class="meta">@EnableAuthorizationServer</span></span><br><span class="line">    <span class="keyword">protected</span> <span class="keyword">static</span> <span class="class"><span class="keyword">class</span> <span class="title">AuthorizationServer</span> <span class="keyword">extends</span> <span class="title">AuthorizationServerConfigurerAdapter</span> </span>&#123;</span><br><span class="line"></span><br><span class="line">        <span class="keyword">private</span> TokenStore tokenStore = <span class="keyword">new</span> InMemoryTokenStore();</span><br><span class="line"></span><br><span class="line">        <span class="meta">@Autowired</span></span><br><span class="line">        <span class="meta">@Qualifier</span>(<span class="string">"authenticationManagerBean"</span>)</span><br><span class="line">        <span class="keyword">private</span> AuthenticationManager authenticationManager;</span><br><span class="line"></span><br><span class="line">        <span class="meta">@Override</span></span><br><span class="line">        <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">configure</span><span class="params">(AuthorizationServerSecurityConfigurer oauthServer)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line"></span><br><span 
class="line">            <span class="comment">/**</span></span><br><span class="line"><span class="comment">             * allow: lets the client credentials be passed as request parameters during authentication</span></span><br><span class="line"><span class="comment">             * <span class="doctag">@see</span> org.springframework.security.oauth2.provider.client.ClientCredentialsTokenEndpointFilter</span></span><br><span class="line"><span class="comment">             */</span></span><br><span class="line">            oauthServer.allowFormAuthenticationForClients();</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="meta">@Override</span></span><br><span class="line">        <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">configure</span><span class="params">(AuthorizationServerEndpointsConfigurer endpoints)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">            <span class="comment">//endpoints.tokenStore(tokenStore).authenticationManager(authenticationManager);</span></span><br><span class="line">            endpoints.tokenStore(tokenStore).authenticationManager(authenticationManager);</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="meta">@Override</span></span><br><span class="line">        <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">configure</span><span class="params">(ClientDetailsServiceConfigurer clients)</span> <span class="keyword">throws</span> Exception </span>&#123;</span><br><span class="line">            clients.inMemory().withClient(<span class="string">"client"</span>)</span><br><span class="line">                    .authorizedGrantTypes(<span class="string">"password"</span>,<span class="string">"refresh_token"</span>)</span><br><span class="line">                    .authorities(<span class="string">"ROLE_USER"</span>)</span><br><span class="line">                    .scopes(<span class="string">"read"</span>)</span><br><span class="line">                    .resourceIds(RESOURCE_ID)</span><br><span class="line">                    .secret(<span class="string">"secret"</span>).accessTokenValiditySeconds(<span class="number">3600</span>);</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="其他策略"><a href="#其他策略" class="headerlink" title="其他策略"></a>Other Strategies</h2><ul><li><p><strong>JWT(Json Web Tokens)</strong>, standardized as RFC 7519 in May 2015, is simpler to configure on the server side than Oauth2 and is already used in some single-page applications built with Angular or Ember. Both passport and spring-security can support JWT.</p></li><li><p><strong>CAS for Mobile</strong>: CAS is a single sign-on server commonly used in web projects. It also supports a <a href="https://wiki.jasig.org/display/casum/restful+api" target="_blank" rel="noopener">Rest API</a>, although handling it on the client side is rather cumbersome. There are already third-party repos that support CAS on mobile: <a href="https://github.com/justindancer/android-cas-client" target="_blank" rel="noopener">Android</a> / <a href="https://github.com/acu-dev/objc-cas-client" target="_blank" rel="noopener">iOS</a></p></li></ul><h2 id="参考资料"><a href="#参考资料" class="headerlink" title="参考资料"></a>References</h2><ol><li><a href="http://www.ruanyifeng.com/blog/2014/05/oauth_2_0.html" target="_blank" rel="noopener">http://www.ruanyifeng.com/blog/2014/05/oauth_2_0.html</a></li><li><a href="http://www.cnblogs.com/smarterplanet/p/4088479.html?utm_source=tuicool" target="_blank" rel="noopener">http://www.cnblogs.com/smarterplanet/p/4088479.html?utm_source=tuicool</a></li><li><a href="http://www.cnblogs.com/pengyingh/articles/2377968.html" target="_blank" rel="noopener">http://www.cnblogs.com/pengyingh/articles/2377968.html</a></li><li><a href="http://haomou.net/2014/08/13/2014_web_token/" target="_blank" 
rel="noopener">http://haomou.net/2014/08/13/2014_web_token/</a></li></ol><blockquote><p>Finally, once again: NodeJS developers really have it easy!</p></blockquote>]]></content>
    
    <summary type="html">
    
      Explains how to control access to REST APIs
    
    </summary>
    
    
  </entry>
  
  <entry>
    <title>The Right Way to Read a File</title>
    <link href="http://stackbox.cn/2015-06-right-way-to-read-hdfs-file/"/>
    <id>http://stackbox.cn/2015-06-right-way-to-read-hdfs-file/</id>
    <published>2015-06-18T17:48:34.000Z</published>
    <updated>2018-12-17T11:06:42.944Z</updated>
    
    <content type="html"><![CDATA[<h1 id="缘由"><a href="#缘由" class="headerlink" title="缘由"></a>Background</h1><p>While writing a MapReduce program recently, I found that a file read from HDFS was being truncated. The code was as follows:</p><a id="more"></a><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">//fs : FileSystem</span><br><span class="line">InputStream in = null;</span><br><span class="line">byte[] b = new byte[1024 * 1024 * 64];</span><br><span class="line">int len = 0;</span><br><span class="line">try &#123;</span><br><span class="line">in = fs.open(new Path(fileName));</span><br><span class="line">len = in.read(b);</span><br><span class="line">    &#125; catch (Exception e) &#123;</span><br><span class="line">    e.printStackTrace();</span><br><span class="line">    &#125; finally &#123;</span><br><span class="line">    try &#123;</span><br><span class="line">    in.close();</span><br><span class="line">    &#125; catch (IOException e) &#123;</span><br><span class="line">    e.printStackTrace();</span><br><span class="line">    &#125;</span><br><span class="line">    &#125;</span><br><span class="line">return new String(b, 0, len);</span><br></pre></td></tr></table></figure><p>In theory the byte array is 64MB, far larger than the file being read, so why does this happen?<br>At first I suspected that <code>InputStream.read()</code> was causing the truncation, and sure enough, switching to reading with a BufferedReader fixed it.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">BufferedReader reader = null;</span><br><span class="line">StringBuilder sb = new StringBuilder();</span><br><span class="line">try &#123;</span><br><span class="line">    reader = new BufferedReader(new InputStreamReader(fs.open(new Path(fileName))));</span><br><span class="line">    String line = null;</span><br><span class="line"></span><br><span class="line">    while((line = reader.readLine()) != null) &#123;</span><br><span class="line">        sb.append(line);</span><br><span class="line">    &#125;</span><br><span class="line">    &#125; catch (Exception ioe) &#123;</span><br><span class="line">        System.out.println(fileName + &quot; doesn&apos;t exist!&quot;);</span><br><span class="line">    &#125; finally &#123;</span><br><span class="line">        try &#123;</span><br><span class="line">            reader.close();</span><br><span class="line">        &#125; catch (IOException e) &#123;</span><br><span class="line">            System.out.println(&quot;Reader close failed&quot;);</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    return sb.toString();</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>But is that really the cause?</p><h1 id="分析"><a href="#分析" class="headerlink" title="分析"></a>Analysis</h1><p>In the first snippet, the number of bytes read by <code>InputStream.read()</code> differs from what was expected. I found this statement in the <a href="http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html" target="_blank" rel="noopener">API  Docs</a>:</p><blockquote><p>An attempt is made to read as many as len bytes, but a smaller number may be read.</p></blockquote><p>In other words, <code>read()</code> only makes a best-effort attempt to read the stream and does not guarantee that all of its bytes are read. The <code>available()</code> method is similar: it does not guarantee to return the true size of the stream either. Possible causes include:</p><ol><li>The hardware buffer size is small</li><li>Network IO is slow</li><li>The file is distributed, and assembling it takes some time</li></ol><p>As for the fix: besides reading line by line with a Reader, using <code>DataInputStream.readFully()</code> also avoids the problem.</p><p>THE END</p>]]></content>
    
    <summary type="html">
    
      From the "IO is complicated" series
    
    </summary>
    
      <category term="数据开发" scheme="http://stackbox.cn/categories/%E6%95%B0%E6%8D%AE%E5%BC%80%E5%8F%91/"/>
    
    
      <category term="计算机基础" scheme="http://stackbox.cn/tags/%E8%AE%A1%E7%AE%97%E6%9C%BA%E5%9F%BA%E7%A1%80/"/>
    
  </entry>
  
  <entry>
    <title>Maven Pitfalls</title>
    <link href="http://stackbox.cn/2015-05-dammit-maven/"/>
    <id>http://stackbox.cn/2015-05-dammit-maven/</id>
    <published>2015-05-10T16:53:31.000Z</published>
    <updated>2018-12-17T11:06:32.552Z</updated>
    
    <content type="html"><![CDATA[<h2 id="乱码"><a href="#乱码" class="headerlink" title="乱码"></a>Garbled Text</h2><p>While deploying a product, I hit a strange encoding error, which showed up as:</p><ul><li>Forms submitted at login had a string of strange garbled characters appended automatically</li><li>Chinese text in Constant variables also came out garbled when placed in json as a message</li></ul><a id="more"></a><p>At first I thought the Linux Locale environment variables were to blame, but changing them had no effect. The second symptom above suggests that the encoding was mangled when the files were compiled. Since the war that @FanFan exported directly with eclipse's export worked, the error had to be in the packaging step.</p><p>The final fix: <strong> configure the encoding in pom.xml </strong></p><p>First configure:</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">properties</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">project.build.sourceEncoding</span>&gt;</span>UTF-8<span class="tag">&lt;/<span class="name">project.build.sourceEncoding</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">properties</span>&gt;</span></span><br></pre></td></tr></table></figure><p>Then configure maven-compiler-plugin</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">plugin</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">groupId</span>&gt;</span>org.apache.maven.plugins<span class="tag">&lt;/<span class="name">groupId</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">artifactId</span>&gt;</span>maven-compiler-plugin<span class="tag">&lt;/<span 
class="name">artifactId</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">version</span>&gt;</span>3.1<span class="tag">&lt;/<span class="name">version</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">source</span>&gt;</span>1.7<span class="tag">&lt;/<span class="name">source</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">target</span>&gt;</span>1.7<span class="tag">&lt;/<span class="name">target</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">encoding</span>&gt;</span>UTF-8<span class="tag">&lt;/<span class="name">encoding</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">plugin</span>&gt;</span></span><br></pre></td></tr></table></figure><h2 id="mybatis代理失效"><a href="#mybatis代理失效" class="headerlink" title="mybatis代理失效"></a>mybatis Proxy Failure</h2><p>This problem was found fairly quickly. mybatis implements DAO interfaces through dynamic proxies, so as soon as I saw the CGLib failure I knew something was wrong with the interface proxies.<br>Sure enough, the mybatis xml files were missing from the build output.</p><p>It turned out that the project's earlier developers had put the XML under <code>src/main/java</code>, and by default Maven ignores configuration files in that folder. They had also been exporting the war through eclipse-&gt;export, so the problem was never noticed.</p><p>Fix:</p><ul><li>The brute-force way is to move all the xml and properties files under <code>src/main/resources</code></li><li>To keep changes to a minimum, add the following configuration to pom.xml</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"> &lt;resources&gt;</span><br><span class="line">    &lt;resource&gt;</span><br><span class="line">        &lt;directory&gt;src/main/resources&lt;/directory&gt;</span><br><span class="line">        &lt;filtering&gt;true&lt;/filtering&gt;</span><br><span class="line">    &lt;/resource&gt;</span><br><span class="line"></span><br><span class="line">    &lt;resource&gt;</span><br><span class="line">        &lt;directory&gt;src/main/java&lt;/directory&gt;</span><br><span class="line">        &lt;includes&gt;</span><br><span class="line">            &lt;include&gt;**/*.xml&lt;/include&gt;</span><br><span class="line">            &lt;include&gt;**/*.properties&lt;/include&gt;</span><br><span class="line">        &lt;/includes&gt;</span><br><span class="line">        &lt;filtering&gt;true&lt;/filtering&gt;</span><br><span class="line">    &lt;/resource&gt;</span><br><span class="line">&lt;/resources&gt;</span><br></pre></td></tr></table></figure><p>注意,如果不加<code>&lt;includes&gt;</code>会把java文件也打进package。。orz</p><h2 id="其他"><a href="#其他" class="headerlink" title="其他"></a>其他</h2><p>配xml不能随便，不能随便</p><p>THE END</p>]]></content>
    
    <summary type="html">
    
      Damn it, Maven!
    
    </summary>
    
    
  </entry>
  
  <entry>
    <title>Build Your Own Cas Service - Pro</title>
    <link href="http://stackbox.cn/2015-01-build-your-own-cas-service-pro/"/>
    <id>http://stackbox.cn/2015-01-build-your-own-cas-service-pro/</id>
    <published>2015-01-06T05:21:17.000Z</published>
    <updated>2018-12-17T11:09:10.064Z</updated>
    
    <content type="html"><![CDATA[<p>Sample code: <a href="https://github.com/superalsrk/modify-jasig-cas" target="_blank" rel="noopener">https://github.com/superalsrk/modify-jasig-cas</a>. Everything below is based on version <a href="http://mvnrepository.com/artifact/org.jasig.cas/cas-server-core/3.5.2.1" target="_blank" rel="noopener">3.5.2.1</a>.</p><a id="more"></a><h2 id="Generally-Design"><a href="#Generally-Design" class="headerlink" title="Generally Design"></a>General Design</h2><p>We can declare the CAS war project as a dependency and create a new web project, webapp. Once the generated web.xml and index.jsp are removed from that project, the whole thing runs.</p><p>More importantly, to extend the war you only need to copy the corresponding file from the war into webapp, and the copy is substituted automatically at packaging time. Whenever the text below says <strong>modify file XXX</strong>, it means modifying that copy.</p><p>The pom of the webapp module is <a href="https://github.com/superalsrk/modify-jasig-cas/blob/master/webapp/pom.xml" target="_blank" rel="noopener">pom.xml</a></p><h2 id="Auth-Module"><a href="#Auth-Module" class="headerlink" title="Auth Module"></a>Auth Module</h2><h3 id="自定义Credentials"><a href="#自定义Credentials" class="headerlink" title="自定义Credentials"></a>Custom Credentials</h3><p>Credentials represent a user's login credential and can be thought of as a simple POJO; implementing the Credentials interface is all that is required. Besides the username and password, our custom credentials add one more field, product : String, which indicates the product type being logged in to.</p><p>The following changes are needed in the Web Module:</p><p>1. Add a product field to the login form; see the next section for details<br>2. In /WEB-INF/login-webflow.xml, change the credentials type to the custom Credentials<br><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">&lt;var name=&quot;credentials&quot; class=&quot;com.nbrc.sso.cas.principal.NbrcCredentials&quot;/&gt;</span><br></pre></td></tr></table></figure></p><p>3. Still in login-webflow.xml, find viewLoginForm and add the data binding<br><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">&lt;view-state id=&quot;viewLoginForm&quot; view=&quot;casLoginView&quot; model=&quot;credentials&quot;&gt;</span><br><span class="line">    &lt;binder&gt;</span><br><span class="line">        &lt;binding property=&quot;username&quot; /&gt;</span><br><span class="line">        &lt;binding property=&quot;password&quot; /&gt;</span><br><span class="line">        &lt;binding property=&quot;product&quot;/&gt; &lt;!-- add this line --&gt;</span><br><span class="line">    &lt;/binder&gt;</span><br><span class="line">    ...</span><br><span class="line">&lt;/view-state&gt;</span><br></pre></td></tr></table></figure></p><h3 id="自定义Handler"><a href="#自定义Handler" class="headerlink" title="自定义Handler"></a>Custom Handler</h3><p>A custom handler only needs to implement the AuthenticationHandler interface.</p><p>1. To show a message such as "insufficient permissions" on the login page, simply throw a custom AuthenticationException from the handler<br>2. The supports method declares whether the handler supports a given type of credentials<br>3. Edit /WEB-INF/deployerConfigContext.xml to register the handler<br><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">&lt;property name=&quot;authenticationHandlers&quot;&gt;</span><br><span class="line">    &lt;list&gt;</span><br><span class="line">        &lt;bean</span><br><span class="line">            class=&quot;org.jasig.cas.authentication.handler.support.HttpBasedServiceCredentialsAuthenticationHandler&quot;</span><br><span class="line">            p:httpClient-ref=&quot;httpClient&quot; p:requireSecure=&quot;false&quot; /&gt;</span><br><span class="line"></span><br><span class="line">        &lt;bean</span><br><span class="line">            class=&quot;com.miaozhen.dashboard.darkportal.mechanism.DarkportalAuthenticationHandler&quot; /&gt;</span><br><span class="line">    &lt;/list&gt;</span><br><span class="line">&lt;/property&gt;</span><br></pre></td></tr></table></figure></p><h3 id="自定义Resolver"><a href="#自定义Resolver" class="headerlink" title="自定义Resolver"></a>Custom Resolver</h3><p>A resolver converts Credentials into a Principal; the notion of a Principal is already defined in the Java platform itself.</p><p>1. Edit /WEB-INF/deployerConfigContext.xml to register the resolver<br><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">&lt;property name=&quot;credentialsToPrincipalResolvers&quot;&gt;</span><br><span class="line">    &lt;list&gt;</span><br><span class="line">        &lt;bean</span><br><span class="line">            class=&quot;com.miaozhen.dashboard.darkportal.mechanism.DarkportalCredentialsToPrincipalResolver&quot; /&gt;</span><br><span class="line"></span><br><span class="line">        &lt;bean</span><br><span class="line">            class=&quot;org.jasig.cas.authentication.principal.HttpBasedServiceCredentialsToPrincipalResolver&quot; /&gt;</span><br><span class="line">    &lt;/list&gt;</span><br><span class="line">&lt;/property&gt;</span><br></pre></td></tr></table></figure></p><p>2. A resolver can return any Principal. I find it most convenient to return a <code>SimplePrincipal</code>: besides the user's id it can also carry an attribute map, although the validation response view then has to be modified as described in the next chapter<br><figure class="highlight java"><table><tr><td class="code"><pre><span class="line">Map&lt;String, Object&gt; map = <span class="keyword">new</span> HashMap&lt;String, Object&gt;();</span><br><span class="line">map.put(ATTR_USERNAME, mzCredentials.getUsername());</span><br><span class="line">map.put(ATTR_PASSWORD, mzCredentials.getPassword());</span><br><span class="line"></span><br><span class="line">SimplePrincipal simple = <span class="keyword">new</span> SimplePrincipal(mzCredentials.getUsername(), map);</span><br></pre></td></tr></table></figure></p><h2 id="Web-Module"><a href="#Web-Module" class="headerlink" title="Web Module"></a>Web Module</h2><h3 id="自定义登陆页面"><a href="#自定义登陆页面" class="headerlink" title="自定义登陆页面"></a>Custom Login Page</h3><p>The proper approach is to copy the default folder, copy the matching theme configuration file in resources, and then set the theme in cas.properties. To save effort, though, you can simply edit the files in default directly.</p><p>default/ui/casLoginView.jsp is the default login page, and extra fields can be added to its form. Note that the form also contains a number of inputs that CAS itself relies on; do not delete them when editing the page.<br><br></p><h3 id="自定义返回用户信息"><a href="#自定义返回用户信息" class="headerlink" title="自定义返回用户信息"></a>Custom User Information in the Response</h3><p>1. 
Although the resolver now returns extra attributes, the default validation response view does not output them, so protocol/2.0/casServiceValidationSuccess.jsp needs to be extended.<br><figure class="highlight xml"><table><tr><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">%@</span> <span class="attr">page</span> <span class="attr">session</span>=<span class="string">"false"</span>%&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">%@</span> <span class="attr">taglib</span> <span class="attr">prefix</span>=<span class="string">"c"</span> <span class="attr">uri</span>=<span class="string">"http://java.sun.com/jsp/jstl/core"</span>%&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">%@</span> <span class="attr">taglib</span> <span class="attr">uri</span>=<span class="string">"http://java.sun.com/jsp/jstl/functions"</span> <span class="attr">prefix</span>=<span class="string">"fn"</span>%&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">cas:serviceResponse</span> <span class="attr">xmlns:cas</span>=<span 
class="string">'http://www.yale.edu/tp/cas'</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">cas:authenticationSuccess</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">cas:user</span>&gt;</span>$&#123;fn:escapeXml(assertion.chainedAuthentications[fn:length(assertion.chainedAuthentications)-1].principal.id)&#125;<span class="tag">&lt;/<span class="name">cas:user</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">c:if</span></span></span><br><span class="line"><span class="tag">            <span class="attr">test</span>=<span class="string">"$&#123;fn:length(assertion.chainedAuthentications[fn:length(assertion.chainedAuthentications)-1].principal.attributes) &gt; 0&#125;"</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">cas:attributes</span>&gt;</span></span><br><span class="line">                  <span class="tag">&lt;<span class="name">c:forEach</span> <span class="attr">var</span>=<span class="string">"attr"</span> <span class="attr">items</span>=<span class="string">"$&#123;assertion.chainedAuthentications[fn:length(assertion.chainedAuthentications)-1].principal.attributes&#125;"</span>&gt;</span></span><br><span class="line">                    <span class="tag">&lt;<span class="name">cas:$&#123;fn:escapeXml(attr.key)&#125;</span>&gt;</span>$&#123;fn:escapeXml(attr.value)&#125;<span class="tag">&lt;/<span class="name">cas:$&#123;fn:escapeXml(attr.key)&#125;</span>&gt;</span></span><br><span class="line">                <span class="tag">&lt;/<span class="name">c:forEach</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;/<span class="name">cas:attributes</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;/<span class="name">c:if</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span 
class="name">c:if</span> <span class="attr">test</span>=<span class="string">"$&#123;not empty pgtIou&#125;"</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">cas:proxyGrantingTicket</span>&gt;</span>$&#123;pgtIou&#125;<span class="tag">&lt;/<span class="name">cas:proxyGrantingTicket</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;/<span class="name">c:if</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;<span class="name">c:if</span> <span class="attr">test</span>=<span class="string">"$&#123;fn:length(assertion.chainedAuthentications) &gt; 1&#125;"</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;<span class="name">cas:proxies</span>&gt;</span></span><br><span class="line">                <span class="tag">&lt;<span class="name">c:forEach</span> <span class="attr">var</span>=<span class="string">"proxy"</span> <span class="attr">items</span>=<span class="string">"$&#123;assertion.chainedAuthentications&#125;"</span></span></span><br><span class="line"><span class="tag">                    <span class="attr">varStatus</span>=<span class="string">"loopStatus"</span> <span class="attr">begin</span>=<span class="string">"0"</span></span></span><br><span class="line"><span class="tag">                    <span class="attr">end</span>=<span class="string">"$&#123;fn:length(assertion.chainedAuthentications)-2&#125;"</span> <span class="attr">step</span>=<span class="string">"1"</span>&gt;</span></span><br><span class="line">                    <span class="tag">&lt;<span class="name">cas:proxy</span>&gt;</span>$&#123;fn:escapeXml(proxy.principal.id)&#125;<span class="tag">&lt;/<span class="name">cas:proxy</span>&gt;</span></span><br><span class="line">                <span class="tag">&lt;/<span class="name">c:forEach</span>&gt;</span></span><br><span class="line">            <span class="tag">&lt;/<span 
class="name">cas:proxies</span>&gt;</span></span><br><span class="line">        <span class="tag">&lt;/<span class="name">c:if</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;/<span class="name">cas:authenticationSuccess</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">cas:serviceResponse</span>&gt;</span></span><br></pre></td></tr></table></figure></p><p>2. On the client side, the extra attributes can be read with code like the following<br><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">AttributePrincipal principal = (AttributePrincipal) request.getUserPrincipal();</span><br><span class="line">// principal.getName() is the id of the SimplePrincipal returned by the resolver</span><br><span class="line">// principal.getAttributes() is the attribute map of that SimplePrincipal</span><br></pre></td></tr></table></figure></p><p>3. Note: remove the entire serviceRegistryDao configuration from deployerConfigContext.xml (cas); see this <a href="http://www.open-open.com/lib/view/open1329744257937.html" target="_blank" rel="noopener">reference</a></p><p><br></p><h3 id="CAS退出功能"><a href="#CAS退出功能" class="headerlink" title="CAS退出功能"></a>CAS Logout</h3><p>By default, a successful JASIG logout lands on a "logout successful" page. What we want instead is to log out of CAS and of the application the user is signed in to, which can be configured as follows:</p><ol><li>If only the application session is ended, the next page visit makes cas-client ask cas-server for validation again and the user is logged straight back in, so CAS and the application must be logged out together</li><li>Edit cas-servlet.xml and add the attribute <code>p:followServiceRedirects="true"</code> to the logoutController bean</li><li>If the application already has a logout controller that clears the session, then the link <a href="http://cas.example.org/logout?service=http://localhost:8080/logout" target="_blank" rel="noopener">http://cas.example.org/logout?service=http://localhost:8080/logout</a> logs the user out properly</li></ol><p>THE END</p>]]></content>
    
    <summary type="html">
    
      Concrete steps for extending the pages and business logic of JASIG-CAS
    
    </summary>
    
    
  </entry>
  
  <entry>
    <title>Build Your Own Cas Service - Basic</title>
    <link href="http://stackbox.cn/2015-01-build-your-own-cas-service-basic/"/>
    <id>http://stackbox.cn/2015-01-build-your-own-cas-service-basic/</id>
    <published>2015-01-06T05:21:16.000Z</published>
    <updated>2018-12-17T11:09:00.844Z</updated>
    
    <content type="html"><![CDATA[<h2 id="预备知识"><a href="#预备知识" class="headerlink" title="预备知识"></a>Background</h2><p>For the CAS protocol itself, see <a href="http://jasig.github.io/cas/4.0.x/protocol/CAS-Protocol.html" target="_blank" rel="noopener">CAS Protocol</a>. Below we cover several key points of the JASIG CAS implementation. Everything is based on version <a href="http://mvnrepository.com/artifact/org.jasig.cas/cas-server-core/3.5.2.1" target="_blank" rel="noopener">3.5.2.1</a><br><a id="more"></a></p><!-- more --><p>JASIG has the following important types:</p><ul><li><p><a href="https://github.com/Jasig/cas/blob/v3.5.2.1/cas-server-core/src/main/java/org/jasig/cas/authentication/principal/Credentials.java" target="_blank" rel="noopener">Credentials</a> The user's authentication credentials. The default CAS credentials hold only a username and password, so if authentication must also check product information, a custom Credentials class is needed. The Handler and Resolver below each have a supports method that decides whether they can process a given type of Credentials</p></li><li><p><a href="https://github.com/Jasig/cas/blob/v3.5.2.1/cas-server-core/src/main/java/org/jasig/cas/authentication/handler/AuthenticationHandler.java" target="_blank" rel="noopener">AuthenticationHandler</a> After the login page submits the login information, this interface decides whether it authenticates successfully. It can throw an AuthenticationException, which upper-layer code catches in order to display an error message on the login page</p></li><li><p><a href="https://github.com/Jasig/cas/blob/v3.5.2.1/cas-server-core/src/main/java/org/jasig/cas/authentication/principal/CredentialsToPrincipalResolver.java" target="_blank" rel="noopener">CredentialsToPrincipalResolver</a> Produces the result returned when a CAS client talks to the CAS server. By default only a username is returned; to attach other attributes you can implement your own Resolver. JASIG also ships some powerful Resolvers that integrate with systems such as LDAP</p></li><li><p><a href="https://github.com/Jasig/cas/blob/v3.5.2.1/cas-server-core/src/main/java/org/jasig/cas/authentication/handler/AuthenticationException.java" target="_blank" rel="noopener">AuthenticationException</a> May be thrown during the authentication phase; its message can be shown on the login page</p></li></ul><h2 id="CAS部署与配置"><a href="#CAS部署与配置" class="headerlink" title="CAS部署与配置"></a>CAS Deployment and Configuration</h2><p>For version 3.5.x, the war to deploy is <strong>cas-server-webapp-3.5.2.1.war</strong> in the module folder</p><h3 id="无https配置"><a href="#无https配置" class="headerlink" title="无https配置"></a>Configuration Without HTTPS</h3><ul><li><p>Edit /WEB-INF/deployerConfigContext.xml and set the security attribute</p><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">&lt;bean class=&quot;org.jasig.cas.authentication.handler.support.HttpBasedServiceCredentialsAuthenticationHandler&quot;</span><br><span class="line">  p:httpClient-ref=&quot;httpClient&quot;  p:requireSecure=&quot;false&quot;/&gt;</span><br></pre></td></tr></table></figure></li><li><p>Edit /WEB-INF/spring-configuration/ticketGrantingTicketCookieGenerator.xml</p><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">&lt;bean id=&quot;ticketGrantingTicketCookieGenerator&quot;</span><br><span class="line">    class=&quot;org.jasig.cas.web.support.CookieRetrievingCookieGenerator&quot;</span><br><span class="line">    p:cookieSecure=&quot;false&quot;</span><br><span class="line">    p:cookieMaxAge=&quot;-1&quot;</span><br><span class="line">    p:cookieName=&quot;CASTGC&quot;</span><br><span class="line">    p:cookiePath=&quot;/cas&quot; /&gt;</span><br></pre></td></tr></table></figure></li><li><p>Edit /WEB-INF/spring-configuration/warnCookieGenerator.xml</p><figure class="highlight plain"><table><tr><td class="code"><pre><span class="line">&lt;bean id=&quot;warnCookieGenerator&quot;</span><br><span class="line">    class=&quot;org.jasig.cas.web.support.CookieRetrievingCookieGenerator&quot;</span><br><span class="line">    p:cookieSecure=&quot;false&quot;</span><br><span class="line">    p:cookieMaxAge=&quot;-1&quot;</span><br><span class="line">    p:cookieName=&quot;CASPRIVACY&quot;</span><br><span class="line">    p:cookiePath=&quot;/cas&quot; /&gt;</span><br></pre></td></tr></table></figure></li></ul><blockquote><ol><li>The p:cookieSecure parameter: true means the cookie is only used over HTTPS. Keep it consistent with the setting in deployerConfigContext.xml, which is false for a deployment without HTTPS.</li><li>The p:cookieMaxAge parameter is, put simply, the maximum lifetime of the cookie. -1 means no persistent lifetime: the cookie is valid only in the currently open browser window, and login is required again once the browser is closed or reopened. It can be changed to a number greater than 0 as needed; 3600, for example, means that no login is required in any browser window for 3600 seconds.</li></ol></blockquote><p>THE END</p>]]></content>
    
    <summary type="html">
    
      Basic JASIG-CAS configuration, plus a look at several core interfaces
    
    </summary>
    
    
  </entry>
  
  <entry>
    <title>Solving Cross-Origin Problems with JSONP</title>
    <link href="http://stackbox.cn/2014-06-jsonp-usage/"/>
    <id>http://stackbox.cn/2014-06-jsonp-usage/</id>
    <published>2014-06-21T09:16:00.000Z</published>
    <updated>2018-12-17T11:06:18.738Z</updated>
    
    <content type="html"><![CDATA[<p>First, keep firmly in mind that JSONP is not a special case of Ajax; it is a way of fetching data through a dynamically created script element.</p><a id="more"></a><h2 id="原理"><a href="#原理" class="headerlink" title="原理"></a>How It Works</h2><p>Because of the <a href="http://baike.baidu.com/link?url=LEaAmZN5IYfQA1MwEnUm8eIgio8sTU9lRdsvwtJKKHIuGFYxKRtOOXumMICnUHFHLyQk5kLzfyXzTm_ERmJkfK" target="_blank" rel="noopener">same-origin policy</a>, a page served from server1.example.com generally cannot talk to any server other than server1.example.com. The HTML <code>&lt;script&gt;</code> element is an exception, and that exception can be used to fetch data across origins.</p><p>So all we need to do is build a <code>&lt;script&gt;</code> element and set its <code>src</code> attribute to the address of the data we want (parameters are appended GET-style as a query string), for example:</p><figure class="highlight javascript"><table><tr><td class="code"><pre><span class="line">&lt;script type=<span class="string">"text/javascript"</span></span><br><span class="line">src=<span class="string">"http://server2.example.com/userlist?userId=1823&amp;callback=sayHello"</span>&gt;</span><br><span class="line">&lt;/script&gt;</span><br></pre></td></tr></table></figure><p>The browser requests this resource, and the server, with some special handling, returns a resource like the following.</p><figure class="highlight javascript"><table><tr><td class="code"><pre><span class="line">sayHello(&#123;</span><br><span class="line"><span class="string">'userId'</span> : <span class="number">1823</span>,</span><br><span class="line"><span class="string">'name'</span> : <span class="string">'stackbox'</span></span><br><span class="line">&#125;)</span><br></pre></td></tr></table></figure><p>In other words, a global sayHello function is invoked with the fetched JSON data as its argument.</p><p>jQuery has built-in JSONP support; the simplest usage is:</p><figure class="highlight javascript"><table><tr><td class="code"><pre><span class="line">jQuery.getJSON(url+<span class="string">"&amp;callback=?"</span>, <span class="function"><span class="keyword">function</span>(<span class="params">data</span>) </span>&#123;</span><br><span class="line">    alert(<span class="string">"Symbol: "</span> + data.symbol + <span class="string">", Price: "</span> + data.price);</span><br><span class="line">&#125;);</span><br></pre></td></tr></table></figure><p>jQuery then generates a randomly named callback, for example <strong>jQuery18308848262811079621_1393981029347</strong>, and attaches it to the global window object so that the returned script can call it.</p><p>You can also specify your own callback name, for example:</p><figure class="highlight javascript"><table><tr><td class="code"><pre><span class="line">$.ajax(&#123;</span><br><span class="line">       url:<span class="string">"http://localhost:20002/MyService.ashx?callback=?"</span>,</span><br><span class="line">       dataType:<span class="string">"jsonp"</span>,</span><br><span class="line">       jsonpCallback:<span class="string">"person"</span>,</span><br><span class="line">       success:<span class="function"><span class="keyword">function</span>(<span class="params">data</span>)</span>&#123;</span><br><span class="line">           alert(data.name + <span class="string">" is a "</span> + data.sex);</span><br><span class="line">       &#125;</span><br><span class="line">  &#125;);</span><br></pre></td></tr></table></figure><p>Besides <a href="http://api.jquery.com/jquery.ajax/" target="_blank" rel="noopener">$.ajax</a>, requests can also be configured with <a href="http://www.w3schools.com/Jquery/ajax_ajaxsetup.asp" target="_blank" rel="noopener">$.ajaxSetup</a>.</p><h2 id="示例"><a href="#示例" class="headerlink" title="示例"></a>Example</h2><h2 id="备注"><a href="#备注" class="headerlink" title="备注"></a>Notes</h2><p>JSONP is not a formal standard. Its strength is browser compatibility, and because it appeared early there are many JSONP-based APIs (Yahoo, Twitter, etc.) and libraries (jQuery, YUI). Its drawbacks are just as obvious:</p><ol><li>It makes REST-style APIs less elegant</li><li>Security problems</li><li>Interaction with cross-origin endpoints is limited: POST is impossible, and a JSON body cannot be passed directly (it can be URL-encoded into a parameter, but that is ugly)</li><li>Debugging is complicated; see <a href="http://johnnywey.wordpress.com/2012/05/20/jsonp-how-does-it-work/" target="_blank" rel="noopener">this article</a></li></ol><p>Moreover, JSONP is not the only way to solve cross-origin problems; there are also <a href="http://zh.wikipedia.org/wiki/%E8%B7%A8%E4%BE%86%E6%BA%90%E8%B3%87%E6%BA%90%E5%85%B1%E4%BA%AB" target="_blank" rel="noopener">Cross-Origin Resource Sharing (CORS)</a> and proxying.</p><ul><li>The key to CORS: XMLHttpRequest Level 2 added cross-origin access, which requires the server to send some special headers. Browser compatibility is the obvious concern.</li><li>The proxy approach: use Apache/Nginx to map the other site's API onto a same-origin URL. Simple and blunt; the drawback is that the proxy configuration has to change for every new API.</li></ul><h2 id="后记"><a href="#后记" class="headerlink" title="后记"></a>Postscript</h2><p>Sources:</p><ol><li><a href="http://zh.wikipedia.org/wiki/JSONP" target="_blank" rel="noopener">http://zh.wikipedia.org/wiki/JSONP</a></li><li><a href="http://blog.csdn.net/patern_pan/article/details/7588755" target="_blank" rel="noopener">http://blog.csdn.net/patern_pan/article/details/7588755</a></li><li><a href="http://forum.jquery.com/topic/jsonp-and-randomly-generated-callback-function" target="_blank" rel="noopener">http://forum.jquery.com/topic/jsonp-and-randomly-generated-callback-function</a></li><li><a href="http://stackoverflow.com/questions/22186703/modifying-jquery-jsonp-callback-function" target="_blank" rel="noopener">http://stackoverflow.com/questions/22186703/modifying-jquery-jsonp-callback-function</a></li><li><a href="http://johnnywey.wordpress.com/2012/05/20/jsonp-how-does-it-work/" target="_blank" rel="noopener">http://johnnywey.wordpress.com/2012/05/20/jsonp-how-does-it-work/</a></li></ol><p>PS: Today, while discussing 校长's book <a href="http://book.douban.com/subject/24335672/" target="_blank" rel="noopener">《淘宝技术这十年》</a> with my colleague GR, I came across this point:</p><blockquote><p>Once the home page has been generated, anyone with a little front-end knowledge should know that the browser's next step is to load the CSS, JS (JavaScript), images, and other style, script, and resource files used by the page. Far fewer people know, however, that the number of resources a browser loads concurrently from a single domain is limited: two for IE 6 and IE 7, six for IE 8, and usually four to six for Chrome depending on the version. I just checked, and visiting the Taobao home page requires loading 126 resources, so with such a low concurrent connection limit loading naturally takes a long time.</p></blockquote><p>OK, that's all for now.</p><p>THE END</p>]]></content>
    
    <summary type="html">
    
      How JSONP works and how to use it
    
    </summary>
    
    
  </entry>
  
  <entry>
    <title>Using Vagrant Virtual Machines</title>
    <link href="http://stackbox.cn/2014-03-vagrant-using/"/>
    <id>http://stackbox.cn/2014-03-vagrant-using/</id>
    <published>2014-03-28T01:46:55.000Z</published>
    <updated>2018-12-17T11:06:15.236Z</updated>
    
    <content type="html"><![CDATA[<h2 id="Vagrant-虚拟机使用"><a href="#Vagrant-虚拟机使用" class="headerlink" title="Vagrant 虚拟机使用"></a>Using Vagrant Virtual Machines</h2><p>Vagrant is a tool for creating and deploying virtualized development environments. VirtualBox is the usual provider, but other backends such as VMware and Docker can be used as well; the <a href="https://github.com/philspitler/vagrant-docker" target="_blank" rel="noopener">vagrant-docker</a> project, for example, uses Docker as the provider.</p><a id="more"></a><h2 id="安装"><a href="#安装" class="headerlink" title="安装"></a>Installation</h2><ul><li>Install <a href="http://download.virtualbox.org/virtualbox/4.3.6/VirtualBox-4.3.6-91406-Win.exe" target="_blank" rel="noopener">VirtualBox-4.3.6-91406-Win.exe</a></li><li>Install <a href="http://966b.http.dal05.cdn.softlayer.net/data-production/1835e881651ac8f27a9e4b815754f1934db71fe6?filename=Vagrant_1.4.3.msi" target="_blank" rel="noopener">Vagrant_1.4.3.msi</a></li></ul><h2 id="配置"><a href="#配置" class="headerlink" title="配置"></a>Configuration</h2><ul><li><p>Download a box from <a href="http://www.vagrantbox.es/" target="_blank" rel="noopener">www.vagrantbox.es</a>. We use the 32-bit Ubuntu box <a href="http://files.vagrantup.com/lucid32.box" target="_blank" rel="noopener">lucid32.box</a> (this is Ubuntu 10.04, which is not recommended)</p></li><li><p>Copy the box file to a folder on your computer and run the add-box command from msys</p><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ vagrant box add lucid32 ./lucid32.box</span><br></pre></td></tr></table></figure></li><li><p>Create a <code>Vagrantfile</code> with the following content</p></li></ul><figure class="highlight ruby"><table><tr><td class="code"><pre><span class="line">  <span class="comment"># -*- mode: ruby -*-</span></span><br><span class="line">  <span class="comment"># vi: set ft=ruby :</span></span><br><span class="line"></span><br><span class="line">  <span class="comment"># Vagrantfile API/syntax version. 
Don't touch unless you know what you're doing!</span></span><br><span class="line">  VAGRANTFILE_API_VERSION = <span class="string">"2"</span></span><br><span class="line"></span><br><span class="line">  Vagrant.configure(VAGRANTFILE_API_VERSION) <span class="keyword">do</span> <span class="params">|config|</span></span><br><span class="line">  <span class="comment"># All Vagrant configuration is done here. The most common configuration</span></span><br><span class="line">  <span class="comment"># options are documented and commented below. For a complete reference,</span></span><br><span class="line">  <span class="comment"># please see the online documentation at vagrantup.com.</span></span><br><span class="line"></span><br><span class="line">  <span class="comment"># Every Vagrant virtual environment requires a box to build off of.</span></span><br><span class="line">  config.vm.box = <span class="string">"lucid32"</span></span><br><span class="line"></span><br><span class="line">  <span class="comment"># The url from where the 'config.vm.box' box will be fetched if it</span></span><br><span class="line">  <span class="comment"># doesn't already exist on the user's system.</span></span><br><span class="line">  config.vm.box_url = <span class="string">"http://files.vagrantup.com/lucid32.box"</span></span><br><span class="line"></span><br><span class="line">  <span class="comment"># Create a forwarded port mapping which allows access to a specific port</span></span><br><span class="line">  <span class="comment"># within the machine from a port on the host machine. 
In the example below,</span></span><br><span class="line">  <span class="comment"># accessing "localhost:8080" will access port 80 on the guest machine.</span></span><br><span class="line">  <span class="comment"># config.vm.network :forwarded_port, guest: 80, host: 8080</span></span><br><span class="line"></span><br><span class="line">  <span class="comment"># Create a private network, which allows host-only access to the machine</span></span><br><span class="line">  <span class="comment"># using a specific IP.</span></span><br><span class="line">  <span class="comment"># config.vm.network :private_network, ip: "192.168.33.10"</span></span><br><span class="line"></span><br><span class="line">  <span class="comment"># Create a public network, which generally matched to bridged network.</span></span><br><span class="line">  <span class="comment"># Bridged networks make the machine appear as another physical device on</span></span><br><span class="line">  <span class="comment"># your network.</span></span><br><span class="line">  <span class="comment"># config.vm.network :public_network</span></span><br><span class="line"></span><br><span class="line">  <span class="comment"># If true, then any SSH connections made will enable agent forwarding.</span></span><br><span class="line">  <span class="comment"># Default value: false</span></span><br><span class="line">  <span class="comment"># config.ssh.forward_agent = true</span></span><br><span class="line"></span><br><span class="line">  <span class="comment"># Share an additional folder to the guest VM. The first argument is</span></span><br><span class="line">  <span class="comment"># the path on the host to the actual folder. The second argument is</span></span><br><span class="line">  <span class="comment"># the path on the guest to mount the folder. 
And the optional third</span></span><br><span class="line">  <span class="comment"># argument is a set of non-required options.</span></span><br><span class="line">  <span class="comment"># config.vm.synced_folder "../data", "/vagrant_data"</span></span><br><span class="line"></span><br><span class="line">  <span class="comment"># Provider-specific configuration so you can fine-tune various</span></span><br><span class="line">  <span class="comment"># backing providers for Vagrant. These expose provider-specific options.</span></span><br><span class="line">  <span class="comment"># Example for VirtualBox:</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment"># config.vm.provider :virtualbox do |vb|</span></span><br><span class="line">  <span class="comment">#   # Don't boot with headless mode</span></span><br><span class="line">  <span class="comment">#   vb.gui = true</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment">#   # Use VBoxManage to customize the VM. For example to change memory:</span></span><br><span class="line">  <span class="comment">#   vb.customize ["modifyvm", :id, "--memory", "1024"]</span></span><br><span class="line">  <span class="comment"># end</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment"># View the documentation for the provider you're using for more</span></span><br><span class="line">  <span class="comment"># information on available options.</span></span><br><span class="line"></span><br><span class="line">  <span class="comment"># Enable provisioning with Puppet stand alone.  
Puppet manifests</span></span><br><span class="line">  <span class="comment"># are contained in a directory path relative to this Vagrantfile.</span></span><br><span class="line">  <span class="comment"># You will need to create the manifests directory and a manifest in</span></span><br><span class="line">  <span class="comment"># the file base.pp in the manifests_path directory.</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment"># An example Puppet manifest to provision the message of the day:</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment"># # group &#123; "puppet":</span></span><br><span class="line">  <span class="comment"># #   ensure =&gt; "present",</span></span><br><span class="line">  <span class="comment"># # &#125;</span></span><br><span class="line">  <span class="comment"># #</span></span><br><span class="line">  <span class="comment"># # File &#123; owner =&gt; 0, group =&gt; 0, mode =&gt; 0644 &#125;</span></span><br><span class="line">  <span class="comment"># #</span></span><br><span class="line">  <span class="comment"># # file &#123; '/etc/motd':</span></span><br><span class="line">  <span class="comment"># #   content =&gt; "Welcome to your Vagrant-built virtual machine!</span></span><br><span class="line">  <span class="comment"># #               Managed by Puppet.\n"</span></span><br><span class="line">  <span class="comment"># # &#125;</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment"># config.vm.provision :puppet do |puppet|</span></span><br><span class="line">  <span class="comment">#   puppet.manifests_path = "manifests"</span></span><br><span class="line">  <span class="comment">#   puppet.manifest_file  = "init.pp"</span></span><br><span class="line">  <span class="comment"># end</span></span><br><span 
class="line"></span><br><span class="line">  <span class="comment"># Enable provisioning with chef solo, specifying a cookbooks path, roles</span></span><br><span class="line">  <span class="comment"># path, and data_bags path (all relative to this Vagrantfile), and adding</span></span><br><span class="line">  <span class="comment"># some recipes and/or roles.</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment"># config.vm.provision :chef_solo do |chef|</span></span><br><span class="line">  <span class="comment">#   chef.cookbooks_path = "../my-recipes/cookbooks"</span></span><br><span class="line">  <span class="comment">#   chef.roles_path = "../my-recipes/roles"</span></span><br><span class="line">  <span class="comment">#   chef.data_bags_path = "../my-recipes/data_bags"</span></span><br><span class="line">  <span class="comment">#   chef.add_recipe "mysql"</span></span><br><span class="line">  <span class="comment">#   chef.add_role "web"</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment">#   # You may also specify custom JSON attributes:</span></span><br><span class="line">  <span class="comment">#   chef.json = &#123; :mysql_password =&gt; "foo" &#125;</span></span><br><span class="line">  <span class="comment"># end</span></span><br><span class="line"></span><br><span class="line">  $script = <span class="string">%Q&#123;</span></span><br><span class="line"><span class="string">    sudo apt-get update</span></span><br><span class="line"><span class="string">    sudo apt-get install nasm make build-essential grub qemu zip -y</span></span><br><span class="line"><span class="string">  &#125;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">  config.vm.provision <span class="symbol">:shell</span>, <span class="symbol">:inline</span> =&gt; $script</span><br><span 
class="line"></span><br><span class="line"></span><br><span class="line">  <span class="comment"># Enable provisioning with chef server, specifying the chef server URL,</span></span><br><span class="line">  <span class="comment"># and the path to the validation key (relative to this Vagrantfile).</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment"># The Opscode Platform uses HTTPS. Substitute your organization for</span></span><br><span class="line">  <span class="comment"># ORGNAME in the URL and validation key.</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment"># If you have your own Chef Server, use the appropriate URL, which may be</span></span><br><span class="line">  <span class="comment"># HTTP instead of HTTPS depending on your configuration. Also change the</span></span><br><span class="line">  <span class="comment"># validation key to validation.pem.</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment"># config.vm.provision :chef_client do |chef|</span></span><br><span class="line">  <span class="comment">#   chef.chef_server_url = "https://api.opscode.com/organizations/ORGNAME"</span></span><br><span class="line">  <span class="comment">#   chef.validation_key_path = "ORGNAME-validator.pem"</span></span><br><span class="line">  <span class="comment"># end</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment"># If you're using the Opscode platform, your validator client is</span></span><br><span class="line">  <span class="comment"># ORGNAME-validator, replacing ORGNAME with your organization name.</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment"># If you have your own Chef Server, the default validation client 
name is</span></span><br><span class="line">  <span class="comment"># chef-validator, unless you changed the configuration.</span></span><br><span class="line">  <span class="comment">#</span></span><br><span class="line">  <span class="comment">#   chef.validation_client_name = "ORGNAME-validator"</span></span><br><span class="line"><span class="keyword">end</span></span><br></pre></td></tr></table></figure><ul><li>Change into the directory that contains the Vagrantfile and run the commands below from there</li></ul><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line"><span class="comment"># Show help</span></span><br><span class="line">$ vagrant --<span class="built_in">help</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># Start the VM</span></span><br><span class="line">$ vagrant up</span><br><span class="line"></span><br><span class="line"><span class="comment"># Shut down the VM</span></span><br><span class="line">$ vagrant halt</span><br><span class="line"></span><br><span class="line"><span class="comment"># Connect over SSH</span></span><br><span class="line">$ vagrant ssh</span><br><span class="line"></span><br><span class="line"><span class="comment"># List all boxes that have been added</span></span><br><span class="line">$ vagrant box list</span><br><span 
class="line"></span><br><span class="line"><span class="comment"># Remove a box: the first argument is the box name, the second is the provider name</span></span><br><span class="line">$ vagrant box remove precise64 virtualbox</span><br><span class="line"></span><br><span class="line"><span class="comment"># Destroy a VM (run in the folder that contains the Vagrantfile); note that this is different from vagrant box remove</span></span><br><span class="line">$ vagrant destroy</span><br><span class="line"></span><br><span class="line"><span class="comment"># You can also connect with a regular SSH client: username and password are both vagrant, port 2222</span></span><br><span class="line">$ ssh -p 2222 vagrant@localhost</span><br></pre></td></tr></table></figure><h2 id="导出Box"><a href="#导出Box" class="headerlink" title="导出Box"></a>Exporting a Box</h2><ul><li>Steps<ol><li>cd into the directory with your <strong>Vagrantfile</strong></li><li>run <code>vagrant package</code>. This will export a box file called package.box by default</li><li>run <code>vagrant box add foo package.box</code> to add package.box to your existing boxes (assuming you are using VirtualBox and not VMware)</li><li>run <code>vagrant box list</code> to verify it was added.</li></ol></li></ul><p> Now you can create a new folder, run <code>vagrant init</code> as normal, and set your box with <code>config.vm.box = &quot;foo&quot;</code>. The new VM will spin up with the exact data that was present in the previous VM.</p><h2 id="时间同步"><a href="#时间同步" class="headerlink" title="时间同步"></a>Time Synchronization</h2><p>Add <code>virtualbox/bin</code> to your PATH environment variable, then run the following commands to configure time synchronization (this also works on Windows)<br><figure class="highlight bash"><table><tr><td class="code"><pre><span 
class="line"><span class="comment"># List VMs</span></span><br><span class="line">$ VBoxManage list vms</span><br><span class="line"></span><br><span class="line"><span class="comment"># Get the status of time sync</span></span><br><span class="line">$ VBoxManage getextradata &lt;vm-name&gt; VBoxInternal/Devices/VMMDev/0/Config/GetHostTimeDisabled</span><br><span class="line"></span><br><span class="line"><span class="comment"># <span class="doctag">NOTE:</span> Make sure to restart the VM after changing these settings.</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># Disable time sync</span></span><br><span class="line">$ VBoxManage setextradata &lt;vm-name&gt; VBoxInternal/Devices/VMMDev/0/Config/GetHostTimeDisabled 1</span><br><span class="line"></span><br><span class="line"><span class="comment"># Enable time sync</span></span><br><span class="line">$ VBoxManage setextradata &lt;vm-name&gt; VBoxInternal/Devices/VMMDev/0/Config/GetHostTimeDisabled 0</span><br></pre></td></tr></table></figure></p><h2 id="使用"><a href="#使用" class="headerlink" title="使用"></a>Usage</h2><ul><li>Network configuration<ol><li>Port forwarding is the most common setup: a port inside the VM is mapped to a port on the host and used directly from there. Configure it in the Vagrantfile with <code>config.vm.network :forwarded_port, guest: 80, host: 8080</code>, where guest: 80 is port 80 inside the VM and host: 8080 is the host port it is mapped to</li><li>If you need to reach the VM freely yourself but nobody else needs access, configure <code>config.vm.network :private_network, ip: &quot;192.168.1.104&quot;</code> in the Vagrantfile, where 192.168.1.104 is the VM's IP address. If several VMs need to reach one another, put them on the same subnet</li><li>If the VM should act as another computer on the current LAN, with its address assigned by the LAN's DHCP, configure <code>config.vm.network :public_network</code> in the Vagrantfile</li></ol></li></ul><ul><li><p>Directory mapping</p><p>When the VM first boots, the host's project directory (the one that contains the Vagrantfile) is mapped to the <code>/vagrant</code> folder in the guest</p><p>Additional mappings can be configured in the Vagrantfile, for example <code>config.vm.synced_folder &quot;wwwroot/&quot;, &quot;/var/www&quot;</code></p></li></ul><h2 id="常见问题"><a href="#常见问题" class="headerlink" title="常见问题"></a>Common Problems</h2><ul><li>rsync command not found: see <a 
href="http://stackoverflow.com/questions/34176041/vagrant-with-virtualbox-on-windows10-rsync-could-not-be-found-on-your-path" target="_blank" rel="noopener">this Stack Overflow question</a>; adding <code>config.vm.synced_folder &quot;.&quot;, &quot;/vagrant&quot;, type: &quot;virtualbox&quot;</code> to the Vagrantfile resolves it</li><li>Private key error during <code>vagrant up</code>: temporarily delete <code>.vagrant/**/private_key</code> and log in with the username and password instead</li><li>The Fedora Cloud images are recommended</li></ul>]]></content>
    
    <summary type="html">
    
      Vagrant is a tool for creating and deploying virtualized development environments. This post is a Vagrant usage guide with a few tips.
    
    </summary>
    
    
      <category term="工具" scheme="http://stackbox.cn/tags/%E5%B7%A5%E5%85%B7/"/>
    
  </entry>
  
</feed>
