From 327166bf429a0296e84b77b4b53f6b61b8ef3623 Mon Sep 17 00:00:00 2001 From: disksing Date: Wed, 21 Nov 2018 11:27:09 +0800 Subject: [PATCH 01/18] add schedule limit design Signed-off-by: disksing --- media/schedule-limit-new.png | Bin 0 -> 14042 bytes media/schedule-limit-old.png | Bin 0 -> 15817 bytes text/2018-09-17-schedule-limit.md | 114 ++++++++++++++++++++++++++++++ 3 files changed, 114 insertions(+) create mode 100644 media/schedule-limit-new.png create mode 100644 media/schedule-limit-old.png create mode 100644 text/2018-09-17-schedule-limit.md diff --git a/media/schedule-limit-new.png b/media/schedule-limit-new.png new file mode 100644 index 0000000000000000000000000000000000000000..cc8d43ef889c25db29123a184a0e7f1f54ec035d GIT binary patch literal 14042 zcmd6Oc|6qJ`?r0klr2JX7a}4{jZhSoY-L{)vhT*k7)yl;gM?cdl4Kc1wlVgt2-%sz zShE{OmN6L1^BMPjf1ls;eV*U%_s8>kJ+J4F@!8I~&biLH&h@_D*EwDr>g#Z_39`}9 z&~Vvy$2u=yf$`BDA#O7qY9$gc0oROW3G&uR|_pK)cm7C<`ML zRZ9AY&I)6b33e!HnKZdaT7&L9XP4Vatv`^x5h33yz>hy*-aaBt>B=515KYFt+skrg zqwjxWT|k6QWVJt<@LfD)ZrBUU93yAXhV5_PiIs1~{K4m}>`i&P@j6GPZlW@5auU2Y z0n^R6bI|C!u9#n4y|<_pA4-7mTmuR_x9F*a?LQXz-SFdTKA#N4dPuDxxVDsN<$iQ? zx>>*>M7eH-U8SM7$f{D}rhnVeRz}*|A#Ws`L#*$0%y$1MeUNe&Q;)FW+C*Jw^-_(H zq@jRAY-clMyk^-)#6UQQZ`)*@G)*_v(Grz`fc z^0E?UW@Zg^S*jmt>6vS_Zvh24#v)W(|NSxr52wDI;`-|&a0P~l`uIOD^bm$I-|suZ zVQTX2{5_^>v!%KN;Nq%4ugxQL(a{PF&!3uu?+!kqyoJ`6Ts#QdwW;L0!Bxlfh%i?E zxkPK>JU>P7fIPV`Q$?tkduV&bN2gI3!pUx#iJANIlpV{zUH9?3LWCtDbZ;{(^XFls ziD&cp3!BI9C?^&(YYz}J>>|Z9@lt0f3IbhC4nE#4q(650=Qzg;(zro-NBV|X}Pxu)i1XGsjpGNQoN zOu~}9kPUWwqc&ZWQkLsqI-@VtQMU9e2e!8n{pi(SYaUiMSB1;4ec-HEd+I>SN@FSm zd&XbX8II0Os4Fu%)$YH*bs4m5%bkUxOuo81PG%SARM$7}c#9~Vk3@mEEPQb5E_FM~ zFHan~i-a3&E;IV~uQASJr@Rtb#=rCi26`M$)WW~RjTRi zFu~_)9!SYGs8WAl{BdR@iB|F!j4x+Kl_ikvUS3aNqiSWCfTHaP1nXS@!q@Rf>fT4%PO4AYfmicaiGjB*EroHQL zaF(%u5nQTNPV5e`X^Fmy8}fs`lw~Kb>-v@(yr}uaT#a~4{x)O9h`yoj4ZS!SPh7oo zBUrFegiQapJSJi_Gb*v@aXMJk>gLe5tK-?Y*^2G6spCD{r`H0!?Z`Wx795X+KQ!Dq zx9FB8T7}Hc0%xf0B+ai5iuEpzWBYhf5zx_0@4*? 
zCUF~!eqM)OzdB9Aq>md^nk~BZr}v-Usz4krmxY;%Vvd))EW7*5Ldq|TN!N|EBUA<0 zDZ(nt^Qc+i$Y#Gkb=%yK?=kP;VcS}vcb{+X`j7Qae+D;;wqkBHygix>4q^K)J%%bT zoA{J_oo-oXM}l_nUf&FCR6oq~+r;SlUItZQs>nm9S8I)bvaSxz+?TL>)PH-_$~U-w1r+tLu*ji8ny^eMs_3jYsn*=U$$qBPM5A1W5$R&HeL4q9q3wY|A8EnbxZZ6x^B$a#=>U3oXn&^ z_@T$nQF1NhB@@}l9M}I|7@a1p?#snZFRb% zW8O!EX^x&ajC%<-dl@pxm!fg1V@TZoNQt%<+8io+qOGh2;cXL6MkRE&WEIAn8*_UyF`zsi!KV+XQ>_VtM=bk+c_sl43}Tmw)zmZrQH5Ecu^KvGZItM>>o& zbDH$3&9sDMnpaLin3y!HN23|k6;uoqWQ_Rje1MmG9jKNjX zL5gu&Gil`xqR{5x#)HoVEk8)N!7vbPL5&k6yt*#f(ESV23*+Wp$3aFACrOVY^1Jq$cw!jQs zb`B1DjixvIdTFrQah_H_#lDY&)j~oqkGB2!d-t(`$xsw29I8uVM`VZEmqTw@*YJsHqyVw=)1qFH5ch(0zz_N`O@S=Y9~J3whj(8w z@URnOG@S#iui=zD8(|L@8UB1q4*^z7G*mYBz1d5q3VDdaO3-=5AJ2(6e+jL53!jGT zg}JaX{{DXD6BwotJ@bhK)ibCHN(TDG338i%uKsyd+=hqtk9Z@lQgBt0*b4V_Fwxa% z)>b$LR}(UF78`^GbD(P!vi2>{q-Z42lI62*JYa1@OF$I1JPTNbzm13?)TWPAwf4lD z7qYW9AIe}w2S9Cw8no8XkR8D3O0-Yp&y~cMYeiRps_0k6yh91gR%2 zm{)1Bn9xX`{O`Op!~fGF9Ts3PxZ#X~$NGXvQ`(}=Og@2TFZ%j442;%?<%B>en zbtz`gBP?}d|`so`TtN?kB%*|9U_a#yBz9SJ*I(k3E3 zatIgI;F3IMC^HXLKF07}b~{WjHkF#vEh&zQigMiDn4`7^G(voUaUVSF*Og0lw98q! 
zMnmlZK(Cy?%Qfu#3DQm0!=M|>$l{ERgV(1e-b!SKCLPCL)cI51?+7tC4CVm)K`fH5hyvjsb9jJD7N;#s!;_6 z1+yK}#Sh39k}}?|MV1wp#^ZBxSBF?jJh_BH?;d2_sM;M)MjvKg_kZf&){IudquB=V zL5t~Ii0(ef-SP)*UQ4#n)e2y=fnLGaM@YZE$2_`HWY?g6P-v9bcDNkvK3w;o|J*4v`fX%@K#OzA|kuKthZtODGgOLuo60(*~HosVv4vc zG&hR6uE;Tr3ZT)p`-Q<-Y0IM!E1&p*H`B5L1ohgK{G?A;>h6e-K8IrErM8G)bh|5VRG4if^h({1VK#4IiCN*lOjA1_B{YXGLZSD?$;aXw=-?={x-+j3(TNaMn^}t zRHhlp@I%=fR2~lq?QGoZf;QEC`En6(Wdb;ciV^7;E0Jsl*`+rMdVOrGvX76qcfeCMs zPkZjQ!i6$9>O2HmFKk+A$LY`8mYv&er5Ac@W(VV?E*4*T*spJ|XS-`mb2*7%73B+L zjas$cCIj318AqAL_SUB~$epKM&(uH$u(R}{D1)fk&Iw^V>mjKIHWuorEiepEhLLI= zyOnSkYeEAVeovMClGql$ul~$NOChxR+iY)^7BC`}M8MmBS-aVa$<9#mw!PB~n^0nZ zdxukxhAB02ZD(WeGjU`30^<#zVj7I}&%2RL-@i6TB{OZ4CWjMh{W}6S_)~?aCNo5v zQZL;;L2Z|p9TGVcJ;Z4s0&~;Yjp^(bt*sZ^{Z(LcynZ>x4B93`U33))%Z#&^W%1D8 zZVkHfF(OorrTK_4%R+B!u%73=eMf?ube5Aig_MtRO5Vr2c=Co@00;zpsCT#6M@2Vp@kM5|XatcxH$fphH!OVSy&~St*-Cx;|zusW@@doOQDq#3ZK^TV$2WA*z1k3;*|g#acKLp3RQ)u@00C06;0+yq z0;F20$oM{0MOpca#?R8Qqr-qIS{UGf+taW6_a8Aj)0co)x06comEHyey+_Xv9pcrn zPek`}WHkKj%3X{`<$I5G;Aig*4B7poWG3wxqc=q+uYRjvcWJ{DkIv~-q~phUA{^S_ z7hsfaPus~*9<7hcH?0JhPjZOdz%%^OV- z8sqH6vYJ;1xL#esKd*@!9J=+@rw(+IjqHB#J%&HG6FeE({`SJs5jA?Lh9A;Fs2Lld zwjGWs*{^O-(hnyqz3r-N^Yf*IZScWXm0o%hHLj)Uji}yQz1lJ`Lm+z&QtPAQRzv@w zyEw8ndcB-()0Pft@fUY z#FPPdZW+cAQz-HI$ zoXql7DVDU5u?YpKytYZTk~Kqx;FX>77Hhcpjvvwh7>U%g#DZ7LL+YeWhpx_y>%ZNq zWw_WUYh2woy4Hvu-1B>j5~*LryhsyfMdJzE#8t{=)eKiI?&YJUviZ-v4?4F-f(N%7 z1X*C%L*Dy#J~mqKQQIGe{>)44t5-VDU#R5vwR~ITMh5)iFFf6Z#!kDg)Yv4Ya`Szv zg&@da)vE2IL;I+0TaIKneEO>NnwKx<<;ym9HvEy74}unt8r-eZy^xCgkWmP>qIAE1 z*RAF#VBjs^y0hWDLP7d3dOtTblElAG& zZbbvP_ZdW#fJquy=DrT^HR-XVZmEHRqlz$3B-e`*iOXlpGvImzR}ITA+J<4^)#bw{ zjxmIf{$16k9sI(``hh^u+RmsXrz#)f#dfx~4h4$_iK^O;hoUD&BKU;C)rrd;pS@&IB1u(pKy}CEWu*#Ji=T5dGVv z_T@XkxRF{mv=)lM`<|SLhUK(*-o@H-QE*5=?cP~jt|$=(a18PWYp0#5%8ko~+ZM@& zk>-)OkGZy`Z1rr#h4-^EPzE}@rUdiISfR?oR#~L{!FS@4)VlHw{cvKC7OuoQLY7&NB8i|v;QTjAP=z4z#SBn{>c9Y<}Fz<&8JLuO* z10j+DMFFUe6H1|>w{sI;V9OFedgc=B+j2Z-;P4H@wiMncMSoz(#BtT7i8|;*8ykdm 
z*a4|4{#P;`@%f7lb0hKHfGOF8J$MRO#8c75br+PYk6O;^xCtojvjHgHnW_XU?or?P z!)LEt@yXBv+RNTT=<&j*Ba`m^Srg<3)2BtDyL+D%kc2-TQ~b;JK5zD}y9r$;C(6Nc z{ZGwWAq}u;8>IG8ZsZXf@rl+n)8po6uJmGqj+iGs5Fl)?)ABKUk=6QLMcwosC%j+xK~Y4_+M>CGy*MKb zd-oP+P!@C_d#)`miFP8|u}hHl+0-}O{j9q%xOcLo90SaKs;nE2iW4loim)^EZ;I^F z817F__x{oEzvghYhWUhkSojwnpZLzzrN(iP!#cxPjO_Qyho${*2uV{*mE3tNn0KLA zkZ_TK2+?;?m(-qW0iIx-xpoz^%hy`bvxCqIXi+JroHr;$OG3)$Pp3$k5!Rx(&-+>*u`(*qGh#GyB^_7U&Q!^F>GpjC(fP zQ;e7|5(McWQ=wSrPULyk+2tmyoL&}FUPIndGX+u#??x#Ub7_%Hh2MR4LImTiSNe$g zTYrCvo7$w96oa;B>d>vYc)fbXC(`aScuj2 zZY;kX`u$dC9yD)sWM7jJjrR!f?w{1|3taxVgO|U-W)_UGx?62M<+UmQtBN1+=%D8H z)oVw7A23=)J)!#H4att=5~VjiY@Kb?o-d(Om-TLx*>>Vjr^L4`M>cO*klNQC;ck=K zE0d^b$T(*5+Bk1$Sjc)cH2u=26LR=jL5-ycjJAw;vp+FUzL7uoFc6tEg>%Q{9?ex+ zVCpy6UxFl*?)zRaof{02qMMe^?@}NWkkgx+CkpLw`Jh8P7IOkZV4xV4=!r=4W1InJst4_85lsbc z78pD?-)G4fk^l#Dwg%p~v{tV^a#(cml`tzYxFql{Wt2yH%>s8X`>AkE@oA9$ki?{K z1;Fhkj6I?Puhdb(yVk|p4aVzqm zoMbSR9>dQ}ykr%qVX<4d|IosCBK7px%-&}|k2!gD*<@X$hA*uD2>AIWn?ZR7m-9u( z@iSR~+S$(a3GF!umXals|g^>mHAW<$LgWh3L^WWWS+$ zX7>#~s!^_Tg54Hp7R-mPII2m@HX;>vQ#JD5?xo&#tox=F{xGt{$Q!Gx^2_L!DQGkm zZq`|($y?oqg2!M797Un+cd%;Oljh4%=#oy>YlZE8&Z1)fH}UuehCf8^;RD_7222DJ z53i5xSXPj8gd#Im$#Ny_Als>l_0(ZUnDefaoYljiIJ+mO7qaWDJEj6aa^GGP9)(x~ zD?b@FlvbGg(8Jg>$+hS#Ux7H2V~!^&zxNL_yUc*1U(T&%sH6%bU>wlCuu{m0oi;;W zqp$qF2z}TaJ2r4Ni4~wvp}eTZ7=>T@uk?177Hf*ih*b_#w(Z!}G|YtgVpUxtLR0qp zA6G+pxc~%Zi>!$(1^4LI1(JU7>KpI$Os-K=mY7OU=nI5)o@wTmMb^_W4M9LG!^O~mES`13 zo%`Wi#PJ$k)AjFOxl&`DpdpygyDaQj;}1 z>D%?N-Kg!9oc2X|honP#rZ3Zdy7ICS%ytTYoC#Q;0*FGZM5RoP(#YmO9m2HxcoH3~ z?T_anR-SrV2G|-tP6_=VE;d~NQ=SDqKK_*ai{JK{*coLH0URAQR=|_$2kv{cAW{sxd%un^{`XZ#R21H1d&1T`bC`Q^^+_Z(tob;?xeE*vz1&y4wNT zt-G!J=r}p!+h=WOmkA$b+zHy8_tiUJY+DBcHpL;qFeVOxgsoM)${Dp_#q8i?{y(Sc zrm8pzjX<1(k_s;y6aC#X)yJu!19DE5jvQ zdmO?b#WWDu4Wx5xq9IHJ$9!y(K5HH1$zyt-^p?tjqfUrs$kSNW*OxbIWD68Vk7k28hwz#^xs0V?^UU2l0Iyr=Kg*m7CFx! 
zeZxFm+&CWCv&Wdly0x&_mMyamiBk^&@P3w}{ar1g@v1Z8_-mAmaj^v{?z$%Fc<2B7 z-JTXg)ib^758wXjQ*aqVDK^N=n$Z9Kcvi}|;qsbVm5Qxv3?Pw~ z%+^;pCX^L?Ajz2(44e;WtkX0`KxY;lm5`y_)H;UpQ}vj}jXf8|>*BFjROVs+5^;*) z`06pcC6Me|)tOF0vyvi-9p~wQfe$EP&GHVHYTJ0$cb;94)nq}u@R2JYN70;hy))4h zotSej%#Il6OlGBOy*n)?&X|b zlaS5lEcLUUA-Bxwh9n!fq`=Ke=$fi|KZfRa7vFI+)dk==}-{vV$hi{Jf8v zU)E{g8-?zDFIpvw-urm@`~RHMVaY0ut~rNM3!l!Q>97u8h_SC~ZY-`zoNwxFWH&8l z#IQ4*a9-*8=$*aqOn+|6i&R`=w3%K|K6DS1h_@u+8>x(0)^-u zJr1gfM^QAF=^-%l&}Kxudc!Fiw0vX1@bZE-K`G0>^nCh)NM**jPI1h}7SGmLO80mF&$){tX8!jQV}3`;Lbz7PBurbDpjzTJb=B{A~ZvDmbpi9&!4*Z9HJ9 zHgv=?`YS7ytw}7S>wz_XmnDKM%QBFI%trA>J1*jEJ%dC%K++^;;- zr@6E7yLxSyw6Kdrm7?ctsNyI}EcTV!SX-n+rP1Oi4vcdgP%RNEHC2CwWv!a?;-}QI&*k&GXT(I3r9SSBiV7h}OZ6{Ry4f{+4NN{7pyG&-scUGm%AX z@XplXlkl}MntUAHv!~H(f|@g4qVq?0b~N# zKY27M&`q{rURm~8sc9xzok1;xsJuD!N6o?Zx$RXy7)-5EbQ?I^$`#J0q6P|mx@=jG za4$m!*C|k9TvZAgh&VJ^z}%P2Q^#ebwt9LeuEodLbO;gVF z)yv$jm%;g6gvvtvo`%B0lP?P;me@hjp;@ZY!76)lidOZ;@v~hlln=by@-0oHPdno~ zD$>2bUTF+?fSTig>UvvX;D>*D&Z?_;SS{2eS#gaSUO%e1%=8u&g*%|Y+*q4?$JVHm zh`~?0QZfxlw}4YSwB&B%kRD+~#+NzEs5t{O2hDN^ok3UNm=wcUkNG-&rFSV`)+{}h zB*Qj0g&u2{WCcD=!K*QY?GN*8$qUZD*QxF}yCacULs+l#yzZ3-V1$nj?*+a6GK-Dc zJSOTXtb&tL#>M;_izRhuthuP(JkRg5VO>T8j=~w`_qTuq&COt}P1c^%u}fzdGx~m4 zuDnl<=xTbvBg+gpXo&IkQpg+uTwn3G4293V`@hE8CyOPhh|wr)^6Ebz2GE87h#0`v z|A-jJ;{O-Kn1Wx@`5Q4Z>}YONmArJDtxJyGsVIU9Ie-8-PXVsDYsqZkqR&4{hsxRh z3o!t#{tYnz#m^4S0Whe6APfAX18^j%sdlaX7#D#ywAqG=Ih@`asvT>1OE8fPuNCcq z@=y_nTEB~PZnL}A9dni?OEy20AG^Q*>DGZA;Q<7IaSaeOGkK~Zq;syv=!vc!nwfKj zTwE7yasDIfPQ@5VgLBEMWfqBa}i0B=_fUDUgv? 
z=XqT><~se?cV01_#E5q&`{=Or(a=FUlz~mQKAI;}`XfN#ASQT|V@wJUoIJiw|0+V& zNeGC8U%JplsDAPQZccKZ^U3u90ukVbiTKVF2vk1NerrUO#sT)P7%qT$ga(4X|5|?I zlbwOgTJD z-_H1cn2TMeg`bw+{_oS51shv5!W0ke!e+{0;%ST!U8{xQ@#Y4-p3bnNeT}0>g6D@y z2=5oiejU(|QIF7~V_y~Ol(P3t(~J)@^wY{qixeI^l^RyT_H=~a#;g4F)?XMIpHz1e z&V(O|sQosNDTV!7L~R}IT$?FHFhX(;s&Y7&s+QBiMBfo~aCWI(Y$@j$_oV?QLmC2( z{Jh-g)ZMn&!;HUYVX4E|Wm$l9KxMJiT|znV_XY0K938T?Vr$N5nZ1jE{e$?n;9vNzep758J; zr9G9xM;p3(up{M%+n)n}HwI+xAxT5Pze}LZ(2O2)nH}hk2~3Ijz`p7AuV7=Y!Uucc zLXX|jqK)F77fjo_srlJ1OLw|=cq&y`g*;noxILQz(%Viz* z?oO&vlnhaC4Oe;q zc{+nN6XI20n|dLUJ>JBrg8DoTw*V2yGgUUoP#UYZ(D9?0i{z?rz;SAUf6+(G9-MPl=AQt=U3trWGxSr>(jnlzrNmUM(>E=oJ>rZ>77e! z97mT#bz6zczo=3Qx0tq?ny8pU>1W?8(s#BdIwoJ8tLU%^A+&l)Su9*d{*+Yscr+AY z2vWcA1paoj=m1GSrFzPg8R!#IYh9TMCRArNvYp5dj9#R2c)@SCTH_@~-^kOv_shB6 zp%;lfmTfAQY(oTpF=SSSWK&Aq5$RnykKYQ*SdqQ-?6W%fL<|GbxxoV|zR-!@*g@J^ zj*b~!9+dJl7{MwC(c%~D6IRtBVl>i4@J1WboJX=J&{CRmxN@+eR$d>nIZ8di>;VQT z#+y)9$ke8}yMFuQcJuf=Q#ljlplh@tZZHDslkQ7g~3{Q$#wy*)03o4!c z5y8m&eMsPBYX9p@t`dh{wcs45BueRgT{$BphRd;Oe4fU!jzwqFHRIWqd!ulcKPRzZ z<$V=Fg2z=M@o9`zi`dkj_Pw|i?dd?dVl^MLmfV0r->(S`hivdT4XCZyTA{_FrCbKs%l=tT_SSw` z^Pzfv^Ma+$6mFrgoF}C+K=I)tOZD`8t~YC+hkx89xTua>zH;WW60|5o0%-wh{H3e) z9?lVZorNCA+TB_?V|0a#tSVnrr9v8sqv|G-y8>&pGdW-L&Bo7L} literal 0 HcmV?d00001 diff --git a/media/schedule-limit-old.png b/media/schedule-limit-old.png new file mode 100644 index 0000000000000000000000000000000000000000..c1ac00e559a933629c3acbb71c83a1916435a002 GIT binary patch literal 15817 zcma)j2UwHOvo46zl&;djhJZ*>n)IrONKvGBkX}Nh6CffW9aI#Qj)*8NN)3c2Rq4_~ zN$9;Kv?M?X+%Nh&?cV=6_xL=R&9}R=v-{1yGxP2y)<92-fu4(=f`WoU`@ZI53JOXB z@CTu#2A=fOv_w)+D816wRC^LQv6;p6)@URYtMC0h==t;KciymQosOUm{qjCHYeDw` zm;HTp))LJG)`*uvYA%=5Ptj@Lj}xWSbRFIo{BoL(soXp2n$+c}3n6a9--fw#LIqB5ng~6=t5n8P$5gQ%25tI&PzWXHZ>0NJVfu(b2emC7uJf6!M?>S1UmwKe_WQtfAYBs;xiv`y=l6BS&xuSAk{pveTONBiT|=FOl}YD z^UU#$Z`u>&ok}Yu>XQbDsM_U3e|9|%?D0OEBnq%5g>6JjGKZYHO~Zqersdy=-Gp?E z4boM3t`!*~fi-_L>xUM%*^h!C`Kv$M#Ro}U(nJC3bYH==kD@f}`dJO~Wrl@&TEu%) 
zR%<4eWo1da?&G|RM($|(VySl^ZmIjP&c-6BDXXEmW1+73c}Z0mn4WY%A$x#AO`6boA_G{$tb^1Td_= zr54c@%l6#k9-c~;X(oYmYd*xBHHn;OjsZ?rC%?uaoQ}KV87voekfipASW}erw*hH$ zXC&ydbUu^wvGM4(x#FLVVs|$N0`DmMP)0XCK{R1UyOv42M^Kzu;b9X>P2Ca@CMB>JEF|kOnxh9W8@sKNyfhh)-@6QfiJ$?sHY1 zXAP*`tkQgq^H*wU)oxV~kmDf_H2t!^EdTZ^ zVF;QChxt@{@y8?hbStdz3fV`G)`TU*RReAOb8aws_6^-rQi>3p9hF?asMkNXo=MFX zJZS@8ZqD{JnR=r9K!cmX(~xhDk-T+Uip~ebe(a5_Q08z*PV?JqJh=HF)6;YM%Xm{C zySa}ESGrL&@}~gk1&?x_Q`zCG;FroF^s0nU^%jaUyGXB~$yd(9B`TI5RZ)LXZ0zcA zf(dW=fP)8Xr$NH`GSmGnF1pamRJO@wS@cznWoD4kc+mNP(Ofl2H!BBwcLHy;VdDnr zFlq^Xbea@`IZ8TNK{hP-}foNmXt0Y`nc^ z1{k;{LDzTLMq(*(w>i2ydlUXg;z2Na1OYxCgnYM!Wdo) zCAeg{OwA-m!lmBIZ_0dftAmkGaeq=Hnjus-`PTsrjx4p#USs;@$(`cWo+c2V(pnh` zZ1~uLJiB~8no*zc!)4E+WS=zUVqokl<4lqy% zoIqkV<4*ngLK<+mad>)un}^+i2WSkBSnFK+L^>F@dHm1UoT?MZB}`M{eLysrW+YKf zn;0*cc`~W`9e6Ou_{9eJ;DD-<0CCOxRBPKH>WFF?CCRG3G84;|O=HY#bOGV*_@2lB)b4RZA3A&YPSyg$%7E%M{F!ZD{H*zHCyZE(PZ87P3IoYUEt7D46LTc^!kzkV zg`X3qeLbLzoks+LwvG`Dq~ziso=nY6^0*}1$xSp(sv%r?uor;70 zi}495DCx({A%og`Q&jT3RVtiYL+538hKK2WnO+`iRSLOs!Z9B!YcVKn*on?|i9a`x=xjn}K?I>mw(l zoB!#>LDsjOK5B4W_)qz1j40|yLKsUh^XGla$g|YtH2LK<^P!IvtP;tqEs7Djog_%) z?23!=oQml({bf|!clNFjW<5RzQlCxqWp=W-7py&qp`V&NGb7N}Foet4SNq)%cFAw7 zUw%IL0sZhJoofc%9}mrDT6yB<#?fcMAbO`>JJirwl!H$A?(B?x@BqdhJZ+_kvXEyc zP^&Pbv9Qe&IRD<%3w6MxDS|ao(M~-tF9!a$nTxt3^(pHD0U0qqYM{dAjz>oFMbar} zu_#aX{`TMpV=YN~@tT8e9!`Up+j{jgpmkMLji#q#IfyN>kuAW@x5KdE18#Dy8x$^%)Ht@xXYcZByh%8a7ru_EM%v$HP8(@0{}h_saa|nm z`NOuysTOfIhG?17r>OEJWlX;Zo-?kj9A&e)i+*1zj_ko5C|vA$t0RmNJ)RoQh7rKL z1lJ2}gz8*A;{$i*01DOdgp;++5#MuieT0A_BTD@aq( z$*il4iW5q@2-->$*yKL+2sko5nKUh`$+ZZ3BaU!lGz3GdKWQaiFnXFSTac}P3x ziqDSx&WUX6^hfj92(r1BQ#^mh_22l3f3UXjfASOmAbHOJ*H8SLH~M!k^gkf^e|20p zctQ{N4g^8WCud{1FgBo5RMlq*mDEM!9kIV@8b|&!_bB|7v zM!CP*(Aw3yveC8i&NBgJa@^!_bQyQ{Ph^vz@+s_ZbV{z$MBO@m1&AR>&mH`ao5&#$ zxN~1)jU9+_u3|W#_ep<4A97X9zY>$Q9gVYUDmE^CMnL)W$Id5m%hna>q~Y&RXeIL% z@T&doWZ{;?r!-MjU7dX=TiV@t9J{`lX5(-8`5}0+22}X(t-Qd&sHmj|25*D|t;+x_Zqthr5p5a_E3x5HVgTwGc^&3|=2C#SH3*`vIxC*Wuo( 
z;Al;nxJ`CkZ5h9;SLS4W(607e*_~Dgy6;Jjwr?AEz61*DXNNpi_5+#2YKY%M)-o3j zw?#Zj{@d;Bh~&@>2f;Gv;yV*mqU_YlOD6uXk3COb!Ws|VtT`67f=t5B>k+upM98^C z61?@zC@k{csp~A3aBsUCuWC@wbhyIShQA87pnZpQSLNsq(E-z&#$}ct;!4X>0?qEk ze$lkD7X&{EVZCeRJ}2sSy~HQ~UgCp!S9f=vp#c!9?pfISJ8`-yQ;%M&mJ)~d!jLlS z26^63LdhC0&;4SvE-2isGbzaXmU3P90SwvPBD3*9NJP|gMe_G3Nm#}1gRt_uf&zz^ zddugnUEOlkApfXYJ_#M7Bz1{+!CSoqW zeE0K4k``akHGPz{52N#*z7p<njDC;1dLt96^|ELc5esuDlf{X&A9rzsq;rm&c59@ z>rCButUcEFEd#DGD}TgCnbrS{Ad-{C5>R{Cx_-0Nnc?D@5NXND*8tN{$xQbNL& z;;IB6YOjOlLpQT%ZcV#?8eKwq&9TfknOc|g;l{>Oi>CtsNR%3B|0;t#nj)NF-BqI z-Qh^b?bz=^0d<-W)k}D%f_=MN#;@A32F5m^wu4JV{9D)S`RG!#R1rfLa;TZ{^|hsw z?5qTD**eX~v*$bbltiN7)6f%=Am4gQv?>)ngOhLnN6X#swjx$+uBEG!^I^tL!4LcT z#^0R!n>~V~8;qM+MJz&7^>F#u`t0jpYr8_ zWAIv5eChvR{V`@wz^IGIy)T1#&(#Q0& z^@m}4?=C{?BL^<{SNHtMBHMWX1$qK73M7tyA(#1I;_81wXn@@LcRc=sb^kqfCvE@9 zSHUovR{pST$s;L2O>+JUhN-Vv3}43}Eqxdg&y&;EwuXs=>Y4`8(pSeMBEkyc|NH*m zVT!YAro$lSmzOhr7#aR)dJ!J9yHu$TppG`x-M{kA@ZIrlX5f%`p6E;F46k6S+7`Rh zhVHn(*GUsA;JFkT18ImQzP_sElKCe`CX=Ip3I5NNBlCYss73H-4kl1w4|@NrKB9XX zT)^$u=OSx6(_F8AzcqV)H6r?x-6}`c^!hR|xg`WM_PsSr zZs%seasc1GJH-h*Wl~urV}iO~2n&IpEx_-_=sR0cZ%ePGXj9|qdtdEZUm_6vQ*oxo z!fB;--B~L57v*6jHn7T>8!9EjKirlHVf#MHDi%>k%`~5D`@@cPVxGR+%}4h7=hPhx z#Xt&T6%yHu6$G|oAb#NGzAOxCw*&5s=PzI|kqc=LCsROIFu0y@Dw;Y$0x!7+ zN1LB&#!O~80CBOKUmKL~1a2|6WikGNw;})4h!_RlbL9AdOiJm|L~vXLXplc^(|YQ= z(jN*1u)^wS;-m_xsorp=Du;J%VO=M3Z*k;w?etdMWDlOj> zO2qFS>B^$tPNlWO#6iM(QHaG_UM6u8Dp6BW_BpvOxAC0 z_(>%|8tP>$jw|7Y4Og-BgxU8YOBDx0^q92%P^?> z;FZ9zxoq0cbP;QNQOZ=+ZpTL7)xmo7j{Im4sVvbat`oH*54$69MDM)5JHZQ^zIOCc zW_Z<9^2EE8@87K{^|r*60nK-%JvG4QjO~wYJ5#&{-(jCkkAExG8ied^TY0TT!@z}1 zeD(~#8#Xu7;Qi3Wev~9J*;u!hbO`0u>1J$w0Y{c^){@TJ=xnx%( zIe;PP8RXkZ!-0(@>@GhIU;k0bBi^_{Pe|)O+SG0mx9~H$am2ds_WsAZH8ViRHfaNb zgpH=rq?Y0!SM$lcfP}aW+h?A!&sMt<1=y<+9~iC5=Zx4r48%bK>$f|be5dQN#MSAN zKD#oh*wqOd=&R41@qsrnQG;XUyB}T<9Jk$v%l7Az%9>jH6-5d#RR)T(XnbvDiF9jn z_rntrrYMZ;r?9471*XEq3`dW*vWpr|8XZ94gG9{d&DG}%5S61KYaV0kuEKgIGiu#! 
zVe9tSeobHn1Fl&t08iGq7&uZH=VBuoi@>pwj1)|`O>$^y@O4X==o+Vq0s|B-qs&CP zEJsp&$k;sR7WZ4n{PTrjhRSi{lH?a~yXxh&Mo1S?zeKh>tKhebG0kn0wA!9hI~Sz~ zUUu=VPHlOwz8KypweV`Xn#E#qr_`?)%>Jx9+SNr=@u9|cRF(nnM!J5_ud_a*;~+o& z1~32Ycs*Xz&MuwrJCBavAyaq7e~MJ-z~F~MCRM5ZM0(DtbWv!+>6bwyF+Q>Cn z9|Nza*K*?`E6)sZd9eKmw2-W(bv^Q6`WUh}z#(R4{Clg}-+590&;K_HQb(V&t z;kq|C{T?^3g;q_y?}8q?ZcoY0lAbyDw6Y`P;Hi3M)|Haqa82WbC(+66O8y^oN(4H< z*-vTb4r7rhM+pzxS4Mm{6n2Bhl-h&dQNCHe?5}@=1n>3j{MOvHH1i_@?OK?%bT5g= z3^&I&NyEL;EyWBZ^i z4l;UAOnT}jI9aPO)fh!LFD0cemip4FkTJDNgB~H!>iBUl@2Bd z|2W4GU3OL>?6Mohrymiq;9FqX>7pBrnfo!<*v|r+IIExVMUk}huh10@{%007UZr2931dlb zAEP$Y?JGk%@hpEJ@09EPFP>LTdHO7eQ(u;*xjLDbgm}pC^%Z$)SdA_rB=i<_)xR9E z+iT5vb*PsGl}2@65c|&u9o{|&=?nsC_U&*u8OUm{f$v7=5=At3+VvSSL$lC zQ&Yt#^w%Qu;Tc)K%o`s=k=Mv@4lzPG4$t?Rvg$)3Kd}O*Y`NM7t-26aK_8K5Ueh^Y za1DB6`{9Uuk>^;vLMWV4OU$iz=|WB^*!{D6M56a4+lc3Z8df(k=>^UV#y6+8XS)5n zqx;9#8sAM1M2WTLH&D#pt5x4rbJEmLDIa#F?*L zoLCa&!4AA+LJWM;!IG6m`U&J(mu17d8Z-n4Q=bsgKy9+Ua=ZyvU;HhDF%3Lh zeEWOD>1wdvfuXWr1A|M;_}x5z+n4%C_xxNJf_YrP2|(%-zs5l>cR;+m1Tgr@q{A17 zNlFuL`sHn*dU(k_Rv@f{m?^w>%5i%hb;Al{A*~wwK2k@_N!r*4Hvt~)kZR}d^YRNF ztt&4Z31Z`hyd*st{H9k}`8I&ANZu+>>9mYa8m$|1et0j&9v(@<&@>TvnjG`P`jK*({Y~q zshjEm5uuDQUGV#E1`2v}$u-g{jxZ8Xd2o&#w0>f`g7N*vIw%tyCI0di(bVf!Vp(_X zTS{^`K1jn7AaBYk6!3oZlP7Nv-{fF=fYvnh9w0AfGw)AF`hwAv&3W12@wn4XZ?PVu z!9K#)4fN#B{}vHns-__q0mh;Yulgs*21k$a9bO^_-hV>vP-ZX?N&h*^{U;Fn&q4Tq zbRHmU!$D>~82?oSzZveh7IDZ&^H(m+0gs(t$sUKk#amn*4%p~SXrKV318M2{2hrt3 z?K9zc3+`dT?3n6nxtFEffJCD#r<1)K_&ycr@#W6HPm19 z?ULR2Hp6P2$v~leE+{Y*A$xYgLjam4E7FcAx zj_zW9g=)EU8gAO&60|Su~I$aTanze1v6Tw3(o)!5VN+5y!6&+ zEvacQP6W-XlbTwUcnMS33^e_8hQ_XHcf`cz?#<#?T1hhGS&`WH@P6hY5)VOTV`C?C#^`y1? 
zEgk4tp0U>_f-Z+aMM4>~+0lE-i;u?3cR6de&f55~HVHe68_ENg9y2nJ@)0oT3=ACP;_`Fy0Nn(I);wG6^1_jvb3#*Z113m#u#_(C zE<{7?Zn+@dbhPt?rz3%L$kT2V2ego3qXJ9~_GFDE8OUA>K%E~IFO`Tk?sSQe8PwGi zq&1`zaj=wu5dB#fHrEzgX44ebuv0G!tU{}vL6#v7jI;`P#An5G{t@Fh)>^;y(fS{U zNxTXVxSCK0<(xpm<|jK9c-4{2mc6_0_oTEl3wnI;R*RkHZfi7CfTyRY7df{R(O5kE zQnV-Iil1@GM zY3`X=g72<?rO^(eHWs%m6&ydwHgjr@vo8VniH(4?_rt0KF0eIrA73 zWQqHBRm`R_)6bWjs5eZ+6!UfT$FK9UnQ<@O$RbQbRm5Syc)(iB6-941o+~W`q2S+( zWDgz{#t^kvh6?+!tyO`` z7ME`^_SV5Lb}oIB)4?stJ>SPi(J$;{f9q$%`yg$JtqjYk#iq72ePN03s+>h00RbQT z>LxC>$h-gio`^XO+#R6r*{;}!uvvN+#zp)-SEpb)`$6DR|3&8Pacu-Q&X#>B<^4m# z2~7Nze}O}2IOXrJxuvR{)#txkc!moFREd7+7p0vW*9|xoqabv_dnAA{-K=--)SXWL z47HpM=U4VV-ifGRyGPR78_z<#d&ka!O&U=1&H=12n?v_peVv+bc#{3WG$`~5nmLPf7F zumv1<;U=bAQI~@$=yDB_du0TM@uX zsJE)W#MVb1+~(6H71)l|Dpo6qhh>;^juBXT|Cu5cPSL4LlMl;(bI((QGA$y$X@v&T zttQPJ(TbNRPObDOc3D=`viqPmqHNW+h+kuf_)9%Q;U?G5-(U;iu76T)aa{0|`28p3 z_9*1FjK|KR^eEEojs^0PM;t6e^v8j*N}vSTW*rLpDOAxH2YHY0vXu~0Onc<)N#_*= zEYEE!lThV}dic}J*3MP)kcMy-j)61O4ZV>26EECPZg$Fs)b_7`hQs=44SVM>Z#Hln zfxYOY!u|4d-6!yq&8bs_iKQXA^9)IPcTr6D>_!6&oMRMJQEJJQCZvq_8w_DFAHuHN z+($`gSh8CUZ8-b%6V1LjB}j&h>+N0fhMk<4GJ_i8AS-h6sS8*+m)(k8^|`WgG=DpC z!Fj&;<9K7A{w|mgkE>IOVF%MtfMMFhh~mZ{@rccvId3XqrR=}~1?|Ge9A?_et}PR7 z#AN;Nv%hqdF$O}$)E!f!8x~hmAES^##t%qIF>OH&ASi=%0uxcq;4&Dr=l3*Yyn{&j z!t;t^W+f%E&Ha!Fp%xFl(95BG$Y-Ga|@#aVizI46bj)8o~=LRKWts;-}Kq@Zp znk0@IxEIvMyQes2bn;mtLy-OfQJXp8>JfDwn!UXj7gG+-V+Ae63VtLK5@Ivrup zl$K~7R)2lyh!EFk>s~BEvR99$UCREd3-cqNU2DP0cmA14w_wQLUqPqebmVzcu^m3WZ)i_Jcr=>vF%(qN%ir zgu2c{@A!}P;{o|;Z^55cq+Cyya2qSuo0D_FHiupJQH9qxr*o^zL0jrd`OOrzM&6sI zDio#*$r>gN4bS|7hZ#3StYny7^d_1bltf)^Qby9Mv+qwdb$y0pT72recp%TchLyg_ zJvuBxBTCnOiDEbs-o4Ma(_wN?F{EMPw)h+QQ9-F>nz{PP{)@}OS3zL$2RkY?oa&Md zernbmKJW1j&XRay2)y?AMD7PJI>Of(!t_H?{8cB3RCz%B270QGqlyK$0m(ek%;t&AO-_(G$%#*pD_dXL{NiGQ~GIG8Pc6o->y{w0Bg* z>wcv@%4eTRZPs}vB}9xkGe|!_wb?yrn=|p`!-+=HH0~oL9)_g*VJlpM^mbAa6sq|q zW^&I@`LgTD=jy%_nXO#FTvjN7t^Wev+keOii9o$!3Cm!AHU8_@`m)&jP4)>-gqW+1 
zP2&92!n_rh=E8OjJc9(v<4D?SDQ{gSnM&+3w37ngoZ!A0oMyr|KXm-u)U^EOh$)%p z;SNFKJrU84)S&h0uAz>|;us=r$e-hWSjiOx54|OJ*778S4hHw-Vxz6oYlCzs(sWL(m6B?oW-XzaL_Js8a&P)boONmK^ z)Xe(t#h23soZFR#;!lcJCh~stC}oDL%auG4O=8J6vGK{ohR(4uN;QhRt5-edU;I>A(t4M zzpF~H-vxuVuGp20IjJBfd1+J6I?b|?JbWYB5fS09KcrM5;vDj5@Ai#`>Y@TpZ>!9} zJhz5zo&1<$iMOFxndVwqW`d)#IDUzNGG zN#yuU?T5TLT&&2Q76?9p{y~_jk~1y+Ell2j7AA|D>-!Vb!w^*x$@cRgX+|%m}b&WUai>KN2`gLa_(6 zw#+P0mEf{uVu9*6lQX9RQB0Q92N3y<1Jze)`x!F@_bS9ELyxU59-MH8%8kF2Bxctz zRCW-zqC@yv%k_`W9SPVZ!HC4O$SPo+u$~Bf9$@=CA2r@tc~Wf zuN{at2i^L0a)tYg(!;0C)D&}5R+W!<9c72n<2(i7;k85Ss}<$AghG}8GLBntobT_EZl}rK<@OI{ z6f3=NKzMPZOm61`=f~$LE@5&HbJ+*qI&>Jlvjq5%rzq{2*eQ#>krw zJVHfoCjZ=Dy5{`*T9ez}e;>RU4-$GQnX(i1u{slQfgc9WChR9E0Ml4eTVeaw?Y5Ea zJz{b(Rp20pjhd`r+dxT(V-`x}uY&Fu=Oa$4v?fw$otfpyQhFjodJjLEG^w>4UGHr2~e4@L@zL$0nk_|ri$Yi zALU=ox?lQ*RWkf(+fNZbo3ML_^4r=S?`*vb;lvcLlhVSuz6%6wm zw+xtR_jJ^cKBipj@UDd(NS9EEHIk?a5jg0?X5iCi3G2PrZu2dZPPqH>drZ$~Ul?g8 zu!o)=RH4Xib*QHZjXb8LczM>oe&y<`|3zv3%*xe0CJG5l=J~t);s1*ghqx+#9+UzY zmqYxJ_7k}zzsR`%PCR5n=|qLEc%_oB5i7D<1=QZxuMnJ^`uf>#V*s;PhgLZdBT-SD zL&6Lla_fJ6dGgHM6I+KeGUv3fU&~M4dpfBwQduHuJV^2Wm{P)$%+rvMbt%pP+~M}e z9r0($pui(5vKyGPO^i8BfosR%*o zk+*~2Lbk*keIF~BOb^uV2QTU`pFEqa`#5*p*3b8Fmj{o8pS#HrWn?qhu*$c>PqFuH z_RSaVNCBs{$hAyX{esIHtn?)q$8&xwJJ!`|ie zhaFBx>5)dh?_9ge{W-7vv78Vesjuf#H8Ex)b8hIlo~K5HNfl)kQ*33FA+eG&r|c1M zTwZ+0Fqz$_JutFd(xjc)>g!4I)%$_TcCom<#EQk6qmwl?*dux7xCwN09KsneH_m zuFUnY5q;kiJLt&s>v|eNnSsB~2s?`WooYGfx5NpJ)+;Ow$s@J%=v5h=*Q)8S2OeI{ z>~-^n?WXrmCTPA8c{%Gl;nV{G4A=9OJ9sW3uU*YZ*ZX4? zPWl|MiH)mnDXhEj#%m}U5hIAfiNR#~k6hr0+>*<-`CL8X&Epuvxten(qoLj2 zx?hnG%NDQcV9NhuhmTgVxDbr_b@JU@H4jnrP|Ru^B0!? 
z5vw1^ zsff|%CR8?)23&nk3LUW?Xm*2aZLZPRnGZ z5iPv@dTp!nmE8?;Ka>EV0C)Ga=rlvPNWZ5=Ryf|DPrR2`EFp7$wrqc(%K9Pd-M!TH z>H3S?FXh`B8V_S4mHa!4ON-I(At~p;Coo6E#ZZPTc5(^w4hFU;BQ>C`aZ6 zYNUr8>h!e-nJKnD88VEkccy+iSFq~n+>_Xb(3Cp~w6~%CMWPW0y}TKx-8W(&p?H=m zVoBkZjjs%lOsVefPsbmt%YHmFX|v)BFQNRVg#OqdWOoEUz6Rg+3SzkZCLuAu`UC70 z+=6TSv(mPtkJAE!8*ttEMw_?mXjxLRt|M1oH+%(_BrW;+)_2Z2O-o5jx~7)#=H}5k z1rbbM-uLAWQRPW6kH1@Q($j&3y9Rd{DCV?gJxnS@`GqnEzl|H+0-JEf>Dx)S4psp_ zb8~oTKCOk~q$nmmXltqF503M=RCJsx;f7?@#EH9W0B?-JSbEDA%!><}XsMcSYyAKL+1n7inY2U6$^vnT;=d`ixSnE z_|8OB=vM;;Bp78Qol8e7Mr-92(vz_&7alF9>31cgl>K8uQj23LO*XwFtvp*!!Q?nG z#wsQ&^lNfiVj6#7!}UNxpqsembf?)O+o^G4>B}CClhp;Gx5t*s+;|C`cxxnS7 z9gN36&iFsv@_&?C0x1t^NfBhXN_O*kR=-s!n!VNj2^;=($?i`mp>{&SAr`ZJYLDfR z=1=Kkvg&_ZN?^|=*M4EzTOC>^Xm6pxrjm*e*TopPxbz@d_$L+~9vcJ&bN|hm6RO~6 W!m$cISIOxih4x)N%`$b{7yk!u^M!H% literal 0 HcmV?d00001 diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md new file mode 100644 index 00000000..c551a793 --- /dev/null +++ b/text/2018-09-17-schedule-limit.md @@ -0,0 +1,114 @@ +# Summary + +This RFC proposes a new scheme to limit concurrent executing schedule operators. Compared to old implementation, it is more intuitive, easy to configure, and has better adaptability. + +# Motivation + +In order to prevent PD from generating a large number of schedule operators in a short period of time and affecting the performance of the tikv cluster, and to avoid the problem of over-balance, currently PD uses a limitation based scheme. 
+
+Specifically, PD uses 4 different limit configurations to control the number of concurrently running operators:
+
+* leader-schedule-limit
+
+  Controls the number of operators scheduling region leaders at the same time
+
+* region-schedule-limit
+
+  Controls the number of operators scheduling region peers at the same time
+
+* replica-schedule-limit
+
+  Controls the number of operators scheduling region replicas at the same time
+
+* merge-schedule-limit
+
+  Controls the number of region merge operators at the same time
+
+![Old schedule limit scheme](../media/schedule-limit-old.png)
+
+The mechanism of scheduling limiter is shown in the figure. All running operators are maintained as a map in the coordinator, and the Limiter is updated synchronously when running operators change. Different schedulers will check whether running operators have exceeded the limit before generating operators. The operators generated by schedulers will be added to the _runningOperators_ map via addOperator method.
+
+This scheme is not good in these aspects:
+* Various limit configurations are confusing, tricky to use them correctly
+* Fixed limit configuration does not apply to clusters of different sizes
+* The preemption strategy between different schedulers is not fair enough
+* Scheduling is often concentrated on a small number of tikv-servers, which leads to slow scheduling sometimes and may affect the latency of tikv (we do consider snapshot count, but this statistic is reported after tikv receives the schedule)
+* Rely on schedulers correctly checking limit to ensure limit config not been violated -- hotbed of bugs. See [#1193](https://github.com/pingcap/pd/pull/1193) [#1155](https://github.com/pingcap/pd/pull/1155)
+
+# Detailed design
+
+## Introducing waitingOperators queue
+
+![New schedule limit scheme](../media/schedule-limit-new.png)
+
+As shown in the figure, the biggest change is the introduction of the _waitingOperators_ in the coordinator. The schedule operators generated by schedulers are not directly put into the _runningOperators_, but are queued first in _waitingOperators_. The coordinator is responsible for continuously promoting operators in _waitingOperators_ queue to _runningOperators_ according to certain rule and configuration.
+
+## Evaluating operator affect
+
+Schedule operators are consisted of operator steps. The effect of an operator on the cluster can be calculated as the sum of the effects of all steps. There several types of operator steps:
+
+* TransferLeader
+* AddPeer
+* AddLearner
+* PromoteLearner
+* RemovePeer
+* MergeRegion
+* SplitRegion
+
+Each of these steps affects only a subset of the tikv-servers rather than the entire cluster. For example, _TransferLeader_ only affects the stores holding the original and the new leader of the region, and _AddLearner_ affects the store that receives the new learner and the store holding the region leader.
+
+Of course, different types of operator steps have different overheads, so we can assign different cost values to them based on experience. For example, we can arbitrarily set the cost of _TransferLeader_ to 1, the cost of _RemovePeer_ to 2, and the cost of _AddLearner_ to 5 on the leader's store and 8 on the new peer's store.
+
+## How to promote waiting operators
+
+Given a MaxScheduleCost configuration (say 20), when the coordinator promotes an operator, it needs to guarantee that, once the operator is added, the total cost of all operators in _runningOperators_ on any single store does not exceed the configured value.
+
+If promoting the first operator in the waiting queue would overload a store, the operator waits until the conflicting running operators finish or time out before it can be moved to _runningOperators_.
+
+To improve execution efficiency, an operator that is not at the head of the queue can also be moved to _runningOperators_ if it does not conflict with any operator in front of it.
+
+## How to ensure fair competition between different schedulers
+
+This is actually quite straightforward. We only need to limit the number of operators that each scheduler can have in the _waitingOperators_ queue, for example, at most 3 operators from the same scheduler.
+
+A scheduler gets the opportunity to insert more operators into _waitingOperators_ only after one of its queued operators has been moved to _runningOperators_. This strategy not only ensures fairness between different schedulers, but also encourages schedulers to generate non-conflicting operators (a conflicting operator stays in the queue longer and occupies its scheduler's quota), thus reducing the impact on cluster performance.
+
+As before, if different schedulers generate operators for the same region, the later one is rejected directly. Alternatively, the later operator could be dropped when it is about to be promoted.
+
+## Configuration and customization
+
+There is only one important configuration item for basic usage:
+
+* MaxScheduleCost
+
+  The maximum total cost of running operators on each store.
+
+We can also allow the following configurations to be customized to better fit special needs (tests are needed to decide whether each of them is worth making configurable):
+
+* StoreMaxScheduleCost
+
+  Similar to _MaxScheduleCost_, but for a specific store. It could be useful when tikv-servers have different performance.
+
+* OperatorStepCost
+
+  The cost of each type of operator step. We will assign the values based on experience, and they may need adjustment in production.
+
+* MaxWaitingOperator
+
+  The maximum number of waiting operators per scheduler.
+
+* SchedulerMaxWaitingOperator
+
+  Similar to _MaxWaitingOperator_, but for a specific scheduler. We can use it to control the priority of different schedulers.
+
+# Drawbacks
+
+The migration from the old way to the new scheme has a certain cost, involves the update of the deployment script, and compatibility with the old cluster configuration in the future.
+
+# Alternatives
+
+It has been considered that the schedulers use a cooperative rather than a competitive scheme to generate operators. However, this approach depends more on every scheduler being well implemented, and it requires the schedulers to be aware of each other, which increases the coupling of the system.
+
+# Unresolved questions
+
+The arbitrarily determined step cost may not correctly reflect the scheduling overhead, we need to test to verify that it is appropriate.
\ No newline at end of file
From a5909186e60eca0ec68eae53cf8ec5e5b28003e6 Mon Sep 17 00:00:00 2001
From: Caitin <34535727+CaitinChen@users.noreply.github.com>
Date: Tue, 18 Dec 2018 16:25:46 +0800
Subject: [PATCH 02/18] Update text/2018-09-17-schedule-limit.md

Signed-off-by: disksing
Co-Authored-By: disksing
---
 text/2018-09-17-schedule-limit.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md
index c551a793..4d68302a 100644
--- a/text/2018-09-17-schedule-limit.md
+++ b/text/2018-09-17-schedule-limit.md
@@ -29,7 +29,7 @@ Specifically, PD uses 4 different limit configurations to control the number of
 The mechanism of scheduling limiter is shown in the figure. All running operators are maintained as a map in the coordinator, and the Limiter is updated synchronously when running operators change. Different schedulers will check whether running operators have exceeded the limit before generating operators. The operators generated by schedulers will be added to the _runningOperators_ map via addOperator method.
 
 This scheme is not good in these aspects:
-* Various limit configurations are confusing, tricky to use them correctly
+* Various limit configurations are confusing, tricky to use correctly
 * Fixed limit configuration does not apply to clusters of different sizes
 * The preemption strategy between different schedulers is not fair enough
 * Scheduling is often concentrated on a small number of tikv-servers, which leads to slow scheduling sometimes and may affect the latency of tikv (we do consider snapshot count, but this statistic is reported after tikv receives the schedule)
@@ -111,4 +111,4 @@ It has been considered that the schedulers use a cooperative rather than a compe
 
 # Unresolved questions
 
-The arbitrarily determined step cost may not correctly reflect the scheduling overhead, we need to test to verify that it is appropriate.
\ No newline at end of file
+The arbitrarily determined step cost may not correctly reflect the scheduling overhead, we need to test to verify that it is appropriate.
From 77c0b6c163b8686e081758a7256bf4207b73cc41 Mon Sep 17 00:00:00 2001
From: Caitin <34535727+CaitinChen@users.noreply.github.com>
Date: Tue, 18 Dec 2018 16:26:07 +0800
Subject: [PATCH 03/18] Update text/2018-09-17-schedule-limit.md

Signed-off-by: disksing
Co-Authored-By: disksing
---
 text/2018-09-17-schedule-limit.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md
index 4d68302a..ad517bd3 100644
--- a/text/2018-09-17-schedule-limit.md
+++ b/text/2018-09-17-schedule-limit.md
@@ -32,7 +32,7 @@ This scheme is not good in these aspects:
 * Various limit configurations are confusing, tricky to use correctly
 * Fixed limit configuration does not apply to clusters of different sizes
 * The preemption strategy between different schedulers is not fair enough
-* Scheduling is often concentrated on a small number of tikv-servers, which leads to slow scheduling sometimes and may affect the latency of tikv (we do consider snapshot count, but this statistic is reported after tikv receives the schedule)
+* Scheduling is often concentrated on a small number of tikv-servers, which leads to slow scheduling sometimes and may affect the latency of tikv (we do consider snapshot count, but this statistic is reported after tikv receives the scheduling request)
 * Rely on schedulers correctly checking limit to ensure limit config not been violated -- hotbed of bugs. See [#1193](https://github.com/pingcap/pd/pull/1193) [#1155](https://github.com/pingcap/pd/pull/1155)
 
 # Detailed design
From 0a2a2358ef32903c301f8d4679956afa941a3acf Mon Sep 17 00:00:00 2001
From: Caitin <34535727+CaitinChen@users.noreply.github.com>
Date: Tue, 18 Dec 2018 16:26:28 +0800
Subject: [PATCH 04/18] Update text/2018-09-17-schedule-limit.md

Signed-off-by: disksing
Co-Authored-By: disksing
---
 text/2018-09-17-schedule-limit.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md
index ad517bd3..22fc9fe6 100644
--- a/text/2018-09-17-schedule-limit.md
+++ b/text/2018-09-17-schedule-limit.md
@@ -33,7 +33,7 @@ This scheme is not good in these aspects:
 * Fixed limit configuration does not apply to clusters of different sizes
 * The preemption strategy between different schedulers is not fair enough
 * Scheduling is often concentrated on a small number of tikv-servers, which leads to slow scheduling sometimes and may affect the latency of tikv (we do consider snapshot count, but this statistic is reported after tikv receives the scheduling request)
-* Rely on schedulers correctly checking limit to ensure limit config not been violated -- hotbed of bugs. See [#1193](https://github.com/pingcap/pd/pull/1193) [#1155](https://github.com/pingcap/pd/pull/1155)
+* Rely on schedulers correctly checking limit to ensure the limit configuration has not been violated -- hotbed of bugs. See [#1193](https://github.com/pingcap/pd/pull/1193) [#1155](https://github.com/pingcap/pd/pull/1155)
 
 # Detailed design
 
From 0ec9af680510d6b553401654ad028f10b455d086 Mon Sep 17 00:00:00 2001
From: Caitin <34535727+CaitinChen@users.noreply.github.com>
Date: Tue, 18 Dec 2018 16:26:58 +0800
Subject: [PATCH 05/18] Update text/2018-09-17-schedule-limit.md

Signed-off-by: disksing
Co-Authored-By: disksing
---
 text/2018-09-17-schedule-limit.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md
index 22fc9fe6..f203452f 100644
--- a/text/2018-09-17-schedule-limit.md
+++ b/text/2018-09-17-schedule-limit.md
@@ -41,7 +41,7 @@ This scheme is not good in these aspects:
 
 ![New schedule limit scheme](../media/schedule-limit-new.png)
 
-As shown in the figure, the biggest change is the introduction of the _waitingOperators_ in the coordinator. The schedule operators generated by schedulers are not directly put into the _runningOperators_, but are queued first in _waitingOperators_. The coordinator is responsible for continuously promoting operators in _waitingOperators_ queue to _runningOperators_ according to certain rule and configuration.
+As shown in the figure, the biggest change is the introduction of the _waitingOperators_ in the coordinator. The schedule operators generated by schedulers are not directly put into the _runningOperators_, but are queued first in _waitingOperators_. The coordinator is responsible for continuously promoting operators in _waitingOperators_ queue to _runningOperators_ according to a certain rule and configuration.
 
 ## Evaluating operator affect
 
From 3c6ec9aa407607fa1f028b994ce5667d855bedc8 Mon Sep 17 00:00:00 2001
From: Caitin <34535727+CaitinChen@users.noreply.github.com>
Date: Tue, 18 Dec 2018 16:27:41 +0800
Subject: [PATCH 06/18] Update text/2018-09-17-schedule-limit.md

Signed-off-by: disksing
Co-Authored-By: disksing
---
 text/2018-09-17-schedule-limit.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md
index f203452f..b96f0d51 100644
--- a/text/2018-09-17-schedule-limit.md
+++ b/text/2018-09-17-schedule-limit.md
@@ -45,7 +45,7 @@ As shown in the figure, the biggest change is the introduction of the _waitingOp
 
 ## Evaluating operator affect
 
-Schedule operators are consisted of operator steps. The effect of an operator on the cluster can be calculated as the sum of the effects of all steps. There several types of operator steps:
+Schedule operators consist of operator steps. The effect of an operator on the cluster can be calculated as the sum of the effects of all steps. There are several types of operator steps:
 
 * TransferLeader
 * AddPeer
From 28427fe5edc5a3ec96030358827b6055db1347ed Mon Sep 17 00:00:00 2001
From: Caitin <34535727+CaitinChen@users.noreply.github.com>
Date: Tue, 18 Dec 2018 16:29:53 +0800
Subject: [PATCH 07/18] Update text/2018-09-17-schedule-limit.md

Signed-off-by: disksing
Co-Authored-By: disksing
---
 text/2018-09-17-schedule-limit.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md
index b96f0d51..3edc7e79 100644
--- a/text/2018-09-17-schedule-limit.md
+++ b/text/2018-09-17-schedule-limit.md
@@ -103,7 +103,7 @@ We can also allow to customize following configurations to better fit special ne
 
 # Drawbacks
 
-The migration from the old way to the new scheme has a certain cost, involves the update of the deployment script, and compatibility with the old cluster configuration in the future.
## Evaluating operator affect From 3c6ec9aa407607fa1f028b994ce5667d855bedc8 Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Tue, 18 Dec 2018 16:27:41 +0800 Subject: [PATCH 06/18] Update text/2018-09-17-schedule-limit.md Signed-off-by: disksing Co-Authored-By: disksing --- text/2018-09-17-schedule-limit.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index f203452f..b96f0d51 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -45,7 +45,7 @@ As shown in the figure, the biggest change is the introduction of the _waitingOp ## Evaluating operator affect -Schedule operators are consisted of operator steps. The effect of an operator on the cluster can be calculated as the sum of the effects of all steps. There several types of operator steps: +Schedule operators consist of operator steps. The effect of an operator on the cluster can be calculated as the sum of the effects of all steps. There are several types of operator steps: * TransferLeader * AddPeer From 28427fe5edc5a3ec96030358827b6055db1347ed Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Tue, 18 Dec 2018 16:29:53 +0800 Subject: [PATCH 07/18] Update text/2018-09-17-schedule-limit.md Signed-off-by: disksing Co-Authored-By: disksing --- text/2018-09-17-schedule-limit.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index b96f0d51..3edc7e79 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -103,7 +103,7 @@ We can also allow to customize following configurations to better fit special ne # Drawbacks -The migration from the old way to the new scheme has a certain cost, involves the update of the deployment script, and compatibility with the old cluster configuration in the future. 
+The migration from the old way to the new scheme has a certain cost, involving the update of the deployment script and compatibility with the old cluster configuration in the future. # Alternatives From 6c3aab4041b2088f937af9fd9259568eabb0849b Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Tue, 18 Dec 2018 16:30:43 +0800 Subject: [PATCH 08/18] Update text/2018-09-17-schedule-limit.md Signed-off-by: disksing Co-Authored-By: disksing --- text/2018-09-17-schedule-limit.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index 3edc7e79..7f4dfeff 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -83,7 +83,7 @@ There is only 1 important configuration for basic usage: The maximum summary of cost of running operators for each store. -We can also allow to customize following configurations to better fit special needs. (Tests are needed to decide if each of them is worth making configurable) +We can also allow to customize following configurations to better fit special needs. 
(Tests are needed to decide if each of them is worth being made configurable) * StoreMaxScheduleCost From bc61d135913cef50d506954119d43ef7cfa17a25 Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Tue, 18 Dec 2018 16:31:16 +0800 Subject: [PATCH 09/18] Update text/2018-09-17-schedule-limit.md Signed-off-by: disksing Co-Authored-By: disksing --- text/2018-09-17-schedule-limit.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index 7f4dfeff..06596a9d 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -69,7 +69,7 @@ In order to improve efficiency of executing, when a non-first operator does not ## How to ensure fair competition between different schedulers -This is actually quite straightforward. We only need to limit the amount of generated operators by each scheduler in the _waitingOperators_ queue. For example, up to 3 operators from a same scheduler. +This is actually quite straightforward. We only need to limit the number of generated operators by each scheduler in the _waitingOperators_ queue. For example, up to 3 operators from a same scheduler. Only when an old operator is moved into the _runningOperators_, the corresponding scheduler will have the opportunity to insert more operators into _waitingOperators_. This strategy not only ensures fairness between different schedulers, but also encourages schedulers to generate non-conflicting operators, thus reducing the impact on cluster performance. 
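The fairness rule in the patch above has two parts. Each scheduler may keep only a few operators in _waitingOperators_ (the RFC's example is 3), and a slot is freed only when one of that scheduler's operators is promoted. A minimal Go sketch of that bookkeeping follows. The `waitingQueue` type and its methods are hypothetical illustrations, not PD's coordinator API. For simplicity, `promoteFirst` only promotes the head of the queue and ignores the cost checks.

```go
package main

import "fmt"

// waitingQueue is a hypothetical model of the coordinator's waitingOperators
// queue. It only records which scheduler produced each waiting operator.
type waitingQueue struct {
	maxPerScheduler int            // e.g. 3, as in the RFC example
	perScheduler    map[string]int // number of waiting operators per scheduler
	ops             []string       // producing scheduler of each waiting operator, FIFO
}

func newWaitingQueue(maxPerScheduler int) *waitingQueue {
	return &waitingQueue{maxPerScheduler: maxPerScheduler, perScheduler: make(map[string]int)}
}

// add enqueues an operator from scheduler unless that scheduler already has
// maxPerScheduler operators waiting.
func (q *waitingQueue) add(scheduler string) bool {
	if q.perScheduler[scheduler] >= q.maxPerScheduler {
		return false
	}
	q.perScheduler[scheduler]++
	q.ops = append(q.ops, scheduler)
	return true
}

// promoteFirst moves the head of the queue to running, which frees one
// waiting slot for the scheduler that produced it.
func (q *waitingQueue) promoteFirst() (string, bool) {
	if len(q.ops) == 0 {
		return "", false
	}
	s := q.ops[0]
	q.ops = q.ops[1:]
	q.perScheduler[s]--
	return s, true
}

func main() {
	q := newWaitingQueue(3)
	for i := 0; i < 4; i++ {
		fmt.Println("scheduler-a accepted:", q.add("scheduler-a")) // 4th is rejected
	}
	fmt.Println("scheduler-b accepted:", q.add("scheduler-b"))
	q.promoteFirst()
	fmt.Println("scheduler-a accepted after promotion:", q.add("scheduler-a"))
}
```

A scheduler that keeps proposing operators which cannot be promoted fills its own quota and stalls. It cannot stall the others, which is the fairness property the text describes.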
From c36ffd29acdc263742819ff4e45c27cb58f8b7d2 Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Tue, 18 Dec 2018 16:31:48 +0800 Subject: [PATCH 10/18] Update text/2018-09-17-schedule-limit.md Signed-off-by: disksing Co-Authored-By: disksing --- text/2018-09-17-schedule-limit.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index 06596a9d..d2cd1ead 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -63,7 +63,7 @@ Of course, the overhead of different types of operator are different. We can ass Given a MaxScheduleCost configuration (say 20), when the coordinator promotes an operator, it need to guarantee that if we add all the schedule cost in the _runningOperators_, the total cost of any store will not exceed the configured value. -If the first operator in the waiting queue will make a store overload, it will wait until the conflicting operator finishes or times out before it can be moved to the _runningOperators_. +If the first operator in the waiting queue makes a store overload, it will wait until the conflicting operator finishes or times out before it can be moved to _runningOperators_. In order to improve efficiency of executing, when a non-first operator does not conflict with any operator in front of it, it can be moved to _runningOperators_ too. 
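The promotion rule in the patch above can be sketched as one in-order scan of the waiting queue. This sketch rests on three assumptions:

- Each waiting operator is reduced to a hypothetical map from store ID to the cost it would add.
- "Does not conflict with any operator in front of it" is read as "touches no store that an earlier, still-waiting operator also touches".
- The names `op`, `promote`, and `maxCost` are illustrative only.

```go
package main

import "fmt"

// op is a hypothetical waiting operator, reduced to the cost it would add to
// each store it touches (store ID -> cost).
type op struct {
	name string
	cost map[uint64]int
}

// promote scans the waiting queue in order. An operator is promoted only if
// (1) it touches no store used by an earlier operator that is still waiting,
// and (2) adding its cost keeps every touched store within maxCost.
// running (store ID -> total cost of running operators) is updated in place.
func promote(waiting []op, running map[uint64]int, maxCost int) (promoted, still []op) {
	blocked := make(map[uint64]bool) // stores held by earlier, still-waiting operators
	for _, o := range waiting {
		fits := true
		for store, c := range o.cost {
			if blocked[store] || running[store]+c > maxCost {
				fits = false
				break
			}
		}
		if !fits {
			// Keep waiting, and stop later operators from overtaking this
			// one on the stores it needs.
			for store := range o.cost {
				blocked[store] = true
			}
			still = append(still, o)
			continue
		}
		for store, c := range o.cost {
			running[store] += c
		}
		promoted = append(promoted, o)
	}
	return promoted, still
}

func main() {
	running := map[uint64]int{1: 15} // store 1 is already busy
	waiting := []op{
		{"A", map[uint64]int{1: 8, 2: 5}}, // would push store 1 to 23
		{"B", map[uint64]int{2: 1, 3: 1}}, // shares store 2 with A
		{"C", map[uint64]int{4: 2}},       // independent of A and B
	}
	promoted, still := promote(waiting, running, 20)
	for _, o := range promoted {
		fmt.Println("promoted:", o.name)
	}
	for _, o := range still {
		fmt.Println("waiting:", o.name)
	}
}
```

The example uses a budget of 20 with store 1 already at 15:

- `A` would raise store 1 to 23, so it waits.
- `B` shares store 2 with `A`, so it waits behind it.
- `C` touches only store 4, so it is promoted.

Blocking the stores of every operator that stays behind stops later operators from repeatedly taking a busy store's budget. The first operator should therefore run once the conflicting running operators finish or time out.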
From 763bf959b9e8972051cd49c637227d54c396669a Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Tue, 18 Dec 2018 16:32:33 +0800 Subject: [PATCH 11/18] Update text/2018-09-17-schedule-limit.md Signed-off-by: disksing Co-Authored-By: disksing --- text/2018-09-17-schedule-limit.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index d2cd1ead..4ee6809f 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -55,7 +55,7 @@ Schedule operators consist of operator steps. The effect of an operator on the c * MergeRegion * SplitRegion -Obviously, these steps only affect a subset of all tikv-servers -- not affect the entire cluster. For example, _TransferLeader_ can only affect the original and the new leader of the region. _AddLearner_ can affect the store to add the learner and the leader of the region. +Obviously, these steps only affect a subset of all tikv-servers -- not the entire cluster. For example, _TransferLeader_ can only affect the original and the new leader of the region. _AddLearner_ can affect the store to add the learner and the leader of the region. Of course, the overhead of different types of operator are different. We can assign different cost values to them based on experience. For example, we can arbitrarily set the cost of _TransferLeader_ to 1, the cost of _RemovePeer_ to 2, and the cost of _AddLearner_ to 5 (leader) and 8 (new peer). 
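The cost model in the patch above charges each step only to the stores it touches. The Go sketch below uses the RFC's example values: TransferLeader 1, RemovePeer 2, and AddLearner 5 on the leader and 8 on the new peer. Two details are assumptions, not statements from the text:

- TransferLeader is charged 1 to both the old and the new leader.
- Steps without a given cost, such as PromoteLearner, are left out.

`step`, `stepKind`, and `operatorCost` are hypothetical names.

```go
package main

import "fmt"

// stepKind lists the step types that have an example cost in the RFC.
type stepKind int

const (
	transferLeader stepKind = iota
	addLearner
	removePeer
)

// step is a simplified operator step. For transferLeader, from/to are the old
// and new leader stores; for addLearner, from is the Region leader's store and
// to is the store receiving the learner; for removePeer, from is the store
// losing the peer and to is unused.
type step struct {
	kind     stepKind
	from, to uint64
}

// operatorCost sums the per-store cost of all steps of one operator.
func operatorCost(steps []step) map[uint64]int {
	cost := make(map[uint64]int)
	for _, s := range steps {
		switch s.kind {
		case transferLeader:
			cost[s.from]++ // old leader
			cost[s.to]++   // new leader
		case addLearner:
			cost[s.from] += 5 // charged to the Region leader
			cost[s.to] += 8   // charged to the store receiving the learner
		case removePeer:
			cost[s.from] += 2
		}
	}
	return cost
}

func main() {
	// Hypothetical operator moving a peer off store 1, which holds the leader.
	steps := []step{
		{kind: addLearner, from: 1, to: 4},     // learner added on store 4
		{kind: transferLeader, from: 1, to: 2}, // move leadership away from store 1
		{kind: removePeer, from: 1},            // drop the old peer on store 1
	}
	fmt.Println(operatorCost(steps)) // map[1:8 2:1 4:8]
}
```

Store 3 and every other store get no charge, which is the point of evaluating cost per store rather than per cluster.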
From baf9065fda62fc9f997464553f021e8bbbd8a8ab Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Tue, 18 Dec 2018 16:32:48 +0800 Subject: [PATCH 12/18] Update text/2018-09-17-schedule-limit.md Signed-off-by: disksing Co-Authored-By: disksing --- text/2018-09-17-schedule-limit.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index 4ee6809f..4e84ec7b 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -61,7 +61,7 @@ Of course, the overhead of different types of operator are different. We can ass ## How to promote waiting operators -Given a MaxScheduleCost configuration (say 20), when the coordinator promotes an operator, it need to guarantee that if we add all the schedule cost in the _runningOperators_, the total cost of any store will not exceed the configured value. +Given a MaxScheduleCost configuration (say 20), when the coordinator promotes an operator, it needs to guarantee that if we add all the schedule cost in the _runningOperators_, the total cost of any store will not exceed the configured value. If the first operator in the waiting queue makes a store overload, it will wait until the conflicting operator finishes or times out before it can be moved to _runningOperators_. 
From 35bdffc582c36f5c871f3e42e2432bc1e0f2ee59 Mon Sep 17 00:00:00 2001 From: disksing Date: Wed, 19 Dec 2018 10:34:16 +0800 Subject: [PATCH 13/18] Update 2018-09-17-schedule-limit.md Signed-off-by: disksing --- text/2018-09-17-schedule-limit.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index 4e84ec7b..22d8f984 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -43,7 +43,7 @@ This scheme is not good in these aspects: As shown in the figure, the biggest change is the introduction of the _waitingOperators_ in the coordinator. The schedule operators generated by schedulers are not directly put into the _runningOperators_, but are queued first in _waitingOperators_. The coordinator is responsible for continuously promoting operators in _waitingOperators_ queue to _runningOperators_ according to a certain rule and configuration. -## Evaluating operator affect +## Operator effect evaluation Schedule operators consist of operator steps. The effect of an operator on the cluster can be calculated as the sum of the effects of all steps. There are several types of operator steps: From ef3406b94d840d66db62d282eae72fcc0924f565 Mon Sep 17 00:00:00 2001 From: Hoverbear Date: Thu, 20 Dec 2018 14:48:28 -0800 Subject: [PATCH 14/18] Fix lints Signed-off-by: Hoverbear --- text/2018-09-17-schedule-limit.md | 174 ++++++++++++++++++++---------- 1 file changed, 118 insertions(+), 56 deletions(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index 22d8f984..d01f6660 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -1,114 +1,176 @@ -# Summary +# Schedule Limit -This RFC proposes a new scheme to limit concurrent executing schedule operators. Compared to old implementation, it is more intuitive, easy to configure, and has better adaptability. 
+## Summary -# Motivation +This RFC proposes a new scheme to limit concurrent executing schedule +operators. Compared to old implementation, it is more intuitive, easy to +configure, and has better adaptability. -In order to prevent PD from generating a large number of schedule operators in a short period of time and affecting the performance of the tikv cluster, and to avoid the problem of over-balance, currently PD uses a limitation based scheme. +## Motivation -Specifically, PD uses 4 different limit configurations to control the number of concurrently running operators: +In order to prevent PD from generating a large number of schedule operators in +a short period of time and affecting the performance of the TiKV cluster, and +to avoid the problem of over-balance, currently PD uses a limitation based +scheme. -* leader-schedule-limit +Specifically, PD uses 4 different limit configurations to control the number of +concurrently running operators: - Controls the number of operators scheduling region leaders at the same time +* _leader-schedule-limit_ -* region-schedule-limit + Controls the number of operators scheduling region leaders at the same + time. - Controls the number of operators scheduling region peers at the same time +* _region-schedule-limit_ -* replica-schedule-limit + Controls the number of operators scheduling region peers at the same time. - Controls the number of operators scheduling region replicas at the same time +* _replica-schedule-limit_ -* merge-schedule-limit + Controls the number of operators scheduling region replicas at the same + time. - Controls the number of region merge operators at the same time +* _merge-schedule-limit_ + + Controls the number of region merge operators at the same time. ![Old schedule limit scheme](../media/schedule-limit-old.png) -The mechanism of scheduling limiter is shown in the figure. 
All running operators are maintained as a map in the coordinator, and the Limiter is updated synchronously when running operators change. Different schedulers will check whether running operators have exceeded the limit before generating operators. The operators generated by schedulers will be added to the _runningOperators_ map via addOperator method. +The mechanism of scheduling limiter is shown in the figure. All running +operators are maintained as a map in the coordinator, and the Limiter is +updated synchronously when running operators change. Different schedulers will +check whether running operators have exceeded the limit before generating +operators. The operators generated by schedulers will be added to the +_runningOperators_ map via addOperator method. This scheme is not good in these aspects: + * Various limit configurations are confusing, tricky to use correctly * Fixed limit configuration does not apply to clusters of different sizes * The preemption strategy between different schedulers is not fair enough -* Scheduling is often concentrated on a small number of tikv-servers, which leads to slow scheduling sometimes and may affect the latency of tikv (we do consider snapshot count, but this statistic is reported after tikv receives the scheduling request) -* Rely on schedulers correctly checking limit to ensure the limit configuration has not been violated -- hotbed of bugs. See [#1193](https://github.com/pingcap/pd/pull/1193) [#1155](https://github.com/pingcap/pd/pull/1155) +* Scheduling is often concentrated on a small number of `tikv-server`s, which + leads to slow scheduling sometimes and may affect the latency of TiKV (we do + consider snapshot count, but this statistic is reported after TiKV receives + the scheduling request) +* Rely on schedulers correctly checking limit to ensure the limit configuration + has not been violated -- hotbed of bugs. 
See + [#1193](https://github.com/pingcap/pd/pull/1193) + [#1155](https://github.com/pingcap/pd/pull/1155). -# Detailed design +## Detailed design -## Introducing waitingOperators queue +### Introducing waitingOperators queue ![New schedule limit scheme](../media/schedule-limit-new.png) -As shown in the figure, the biggest change is the introduction of the _waitingOperators_ in the coordinator. The schedule operators generated by schedulers are not directly put into the _runningOperators_, but are queued first in _waitingOperators_. The coordinator is responsible for continuously promoting operators in _waitingOperators_ queue to _runningOperators_ according to a certain rule and configuration. +As shown in the figure, the biggest change is the introduction of the +_waitingOperators_ in the coordinator. The schedule operators generated by +schedulers are not directly put into the _runningOperators_, but are queued +first in _waitingOperators_. The coordinator is responsible for continuously +promoting operators in _waitingOperators_ queue to _runningOperators_ according +to a certain rule and configuration. -## Operator effect evaluation +### Operator effect evaluation -Schedule operators consist of operator steps. The effect of an operator on the cluster can be calculated as the sum of the effects of all steps. There are several types of operator steps: +Schedule operators consist of operator steps. The effect of an operator on the +cluster can be calculated as the sum of the effects of all steps. There are +several types of operator steps: -* TransferLeader -* AddPeer -* AddLearner -* PromoteLearner -* RemovePeer -* MergeRegion -* SplitRegion +* _TransferLeader_ +* _AddPeer_ +* _AddLearner_ +* _PromoteLearner_ +* _RemovePeer_ +* _MergeRegion_ +* _SplitRegion_ -Obviously, these steps only affect a subset of all tikv-servers -- not the entire cluster. For example, _TransferLeader_ can only affect the original and the new leader of the region. 
_AddLearner_ can affect the store to add the learner and the leader of the region. +Obviously, these steps only affect a subset of all `tikv-server`s -- not the +entire cluster. For example, _TransferLeader_ can only affect the original and +the new leader of the region. _AddLearner_ can affect the store to add the +learner and the leader of the region. -Of course, the overhead of different types of operator are different. We can assign different cost values to them based on experience. For example, we can arbitrarily set the cost of _TransferLeader_ to 1, the cost of _RemovePeer_ to 2, and the cost of _AddLearner_ to 5 (leader) and 8 (new peer). +Of course, the overhead of different types of operator are different. We can +assign different cost values to them based on experience. For example, we can +arbitrarily set the cost of _TransferLeader_ to 1, the cost of _RemovePeer_ to +2, and the cost of _AddLearner_ to 5 (leader) and 8 (new peer). -## How to promote waiting operators +### How to promote waiting operators -Given a MaxScheduleCost configuration (say 20), when the coordinator promotes an operator, it needs to guarantee that if we add all the schedule cost in the _runningOperators_, the total cost of any store will not exceed the configured value. +Given a _MaxScheduleCost_ configuration (say 20), when the coordinator promotes +an operator, it needs to guarantee that if we add all the schedule cost in the +_runningOperators_, the total cost of any store will not exceed the configured +value. -If the first operator in the waiting queue makes a store overload, it will wait until the conflicting operator finishes or times out before it can be moved to _runningOperators_. +If the first operator in the waiting queue makes a store overload, it will wait +until the conflicting operator finishes or times out before it can be moved to +_runningOperators_. 
-In order to improve efficiency of executing, when a non-first operator does not conflict with any operator in front of it, it can be moved to _runningOperators_ too. +In order to improve efficiency of executing, when a non-first operator does not +conflict with any operator in front of it, it can be moved to +_runningOperators_ too. -## How to ensure fair competition between different schedulers +### How to ensure fair competition between different schedulers -This is actually quite straightforward. We only need to limit the number of generated operators by each scheduler in the _waitingOperators_ queue. For example, up to 3 operators from a same scheduler. +This is actually quite straightforward. We only need to limit the number of +generated operators by each scheduler in the _waitingOperators_ queue. For +example, up to 3 operators from a same scheduler. -Only when an old operator is moved into the _runningOperators_, the corresponding scheduler will have the opportunity to insert more operators into _waitingOperators_. This strategy not only ensures fairness between different schedulers, but also encourages schedulers to generate non-conflicting operators, thus reducing the impact on cluster performance. +Only when an old operator is moved into the _runningOperators_, the +corresponding scheduler will have the opportunity to insert more operators into +_waitingOperators_. This strategy not only ensures fairness between different +schedulers, but also encourages schedulers to generate non-conflicting +operators, thus reducing the impact on cluster performance. -As before, if different schedulers generate operators of a same region, the later one is rejected directly. Dropping the later operator when it is about to promote is also an option. +As before, if different schedulers generate operators of a same region, the +later one is rejected directly. Dropping the later operator when it is about +to promote is also an option. 
-## Configuration and customization +### Configuration and customization There is only 1 important configuration for basic usage: -* MaxScheduleCost +* _MaxScheduleCost_ + + The maximum summary of cost of running operators for each store. - The maximum summary of cost of running operators for each store. +We can also allow to customize following configurations to better fit special +needs. (Tests are needed to decide if each of them is worth being made +configurable): -We can also allow to customize following configurations to better fit special needs. (Tests are needed to decide if each of them is worth being made configurable) +* _StoreMaxScheduleCost_ -* StoreMaxScheduleCost + Similar to _MaxScheduleCost_, but for a specific store. If could be useful + when performance of `tikv-server`s are different. - Similar to _MaxScheduleCost_, but for a specific store. If could be useful when performance of tikv-servers are different. +* _OperatorStepCost_ -* OperatorStepCost - - We will assign cost values based on experience. They may need adjustment in production. + We will assign cost values based on experience. They may need adjustment + in production. -* MaxWaitingOperator +* _MaxWaitingOperator_ The maximum count of waiting operators of each scheduler. -* SchedulerMaxWaitingOperator +* _SchedulerMaxWaitingOperator_ - Similar to _MaxWaitingOperator_, but for a specific scheduler. We can use it to control priority of different schedulers. + Similar to _MaxWaitingOperator_, but for a specific scheduler. We can use + it to control priority of different schedulers. -# Drawbacks +## Drawbacks -The migration from the old way to the new scheme has a certain cost, involving the update of the deployment script and compatibility with the old cluster configuration in the future. +The migration from the old way to the new scheme has a certain cost, involving +the update of the deployment script and compatibility with the old cluster +configuration in the future. 
-# Alternatives +## Alternatives -It has been considered that the schedulers use a cooperative rather than a competitive scheme to generate operators. However, this approach is more dependent on the better quality of implementation of all the schedulers, and the schedulers need to perceive each other, which will increase the coupling degree of the system. +It has been considered that the schedulers use a cooperative rather than a +competitive scheme to generate operators. However, this approach is more +dependent on the better quality of implementation of all the schedulers, and +the schedulers need to perceive each other, which will increase the coupling +degree of the system. -# Unresolved questions +## Unresolved questions -The arbitrarily determined step cost may not correctly reflect the scheduling overhead, we need to test to verify that it is appropriate. +The arbitrarily determined step cost may not correctly reflect the scheduling +overhead, we need to test to verify that it is appropriate. From a06cb7dacc38bbefdedad2369353957bf4f69dfc Mon Sep 17 00:00:00 2001 From: "A. Hobden" Date: Fri, 21 Dec 2018 13:57:36 +0800 Subject: [PATCH 15/18] Update text/2018-09-17-schedule-limit.md Signed-off-by: A. Hobden Co-Authored-By: disksing --- text/2018-09-17-schedule-limit.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index d01f6660..ebc397fa 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -84,7 +84,7 @@ several types of operator steps: * _MergeRegion_ * _SplitRegion_ -Obviously, these steps only affect a subset of all `tikv-server`s -- not the +These steps only affect a subset of all `tikv-server`s -- not the entire cluster. For example, _TransferLeader_ can only affect the original and the new leader of the region. _AddLearner_ can affect the store to add the learner and the leader of the region. 
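Patch 14 above lists the proposed configuration options. It describes _StoreMaxScheduleCost_ and _SchedulerMaxWaitingOperator_ only as per-store and per-scheduler variants of _MaxScheduleCost_ and _MaxWaitingOperator_. One plausible reading, sketched below in Go, is an optional override that falls back to the global value. The struct shape, field types, and lookup helpers are assumptions for illustration, not PD's configuration format. _OperatorStepCost_ is left out.

```go
package main

import "fmt"

// scheduleConfig is a hypothetical shape for the options proposed in the RFC.
// Field names mirror the RFC's option names; this is not PD's real config.
type scheduleConfig struct {
	MaxScheduleCost             int            // default cost budget per store
	StoreMaxScheduleCost        map[uint64]int // optional per-store override
	MaxWaitingOperator          int            // default waiting quota per scheduler
	SchedulerMaxWaitingOperator map[string]int // optional per-scheduler override
}

// storeLimit returns the cost budget for a store, preferring its override.
// Reading from a nil map is safe in Go and simply finds no override.
func (c *scheduleConfig) storeLimit(store uint64) int {
	if v, ok := c.StoreMaxScheduleCost[store]; ok {
		return v
	}
	return c.MaxScheduleCost
}

// waitingLimit returns the waiting quota for a scheduler, preferring its override.
func (c *scheduleConfig) waitingLimit(scheduler string) int {
	if v, ok := c.SchedulerMaxWaitingOperator[scheduler]; ok {
		return v
	}
	return c.MaxWaitingOperator
}

func main() {
	cfg := scheduleConfig{
		MaxScheduleCost:             20,
		StoreMaxScheduleCost:        map[uint64]int{3: 40}, // e.g. a more powerful store
		MaxWaitingOperator:          3,
		SchedulerMaxWaitingOperator: map[string]int{"scheduler-b": 5}, // higher priority
	}
	fmt.Println(cfg.storeLimit(1), cfg.storeLimit(3))                             // 20 40
	fmt.Println(cfg.waitingLimit("scheduler-a"), cfg.waitingLimit("scheduler-b")) // 3 5
}
```

Under this reading, a larger per-scheduler quota is how a scheduler would gain priority: it can keep more operators in line for promotion.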
From 9711d5339ee602ddcb7ce4303705783d0f7d3259 Mon Sep 17 00:00:00 2001 From: "A. Hobden" Date: Fri, 21 Dec 2018 13:57:49 +0800 Subject: [PATCH 16/18] Update text/2018-09-17-schedule-limit.md Signed-off-by: A. Hobden Co-Authored-By: disksing --- text/2018-09-17-schedule-limit.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index ebc397fa..89d96905 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -89,7 +89,7 @@ entire cluster. For example, _TransferLeader_ can only affect the original and the new leader of the region. _AddLearner_ can affect the store to add the learner and the leader of the region. -Of course, the overhead of different types of operator are different. We can +The overhead of different types of operator are different. We can assign different cost values to them based on experience. For example, we can arbitrarily set the cost of _TransferLeader_ to 1, the cost of _RemovePeer_ to 2, and the cost of _AddLearner_ to 5 (leader) and 8 (new peer). From 452978aed0e458554600c4a89705cdd1d11a826b Mon Sep 17 00:00:00 2001 From: disksing Date: Fri, 21 Dec 2018 14:27:40 +0800 Subject: [PATCH 17/18] use Region instead of region Signed-off-by: disksing --- text/2018-09-17-schedule-limit.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index 89d96905..8ed07d07 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -18,21 +18,21 @@ concurrently running operators: * _leader-schedule-limit_ - Controls the number of operators scheduling region leaders at the same + Controls the number of operators scheduling Region leaders at the same time. * _region-schedule-limit_ - Controls the number of operators scheduling region peers at the same time. 
+ Controls the number of operators scheduling Region peers at the same time. * _replica-schedule-limit_ - Controls the number of operators scheduling region replicas at the same + Controls the number of operators scheduling Region replicas at the same time. * _merge-schedule-limit_ - Controls the number of region merge operators at the same time. + Controls the number of Region merge operators at the same time. ![Old schedule limit scheme](../media/schedule-limit-old.png) @@ -86,8 +86,8 @@ several types of operator steps: These steps only affect a subset of all `tikv-server`s -- not the entire cluster. For example, _TransferLeader_ can only affect the original and -the new leader of the region. _AddLearner_ can affect the store to add the -learner and the leader of the region. +the new leader of the Region. _AddLearner_ can affect the store to add the +learner and the leader of the Region. The overhead of different types of operator are different. We can assign different cost values to them based on experience. For example, we can @@ -121,7 +121,7 @@ _waitingOperators_. This strategy not only ensures fairness between different schedulers, but also encourages schedulers to generate non-conflicting operators, thus reducing the impact on cluster performance. -As before, if different schedulers generate operators of a same region, the +As before, if different schedulers generate operators of a same Region, the later one is rejected directly. Dropping the later operator when it is about to promote is also an option. 
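The patch above keeps the existing rule: when different schedulers generate operators for the same Region, the later one is rejected. The minimal Go sketch below shows that admission check. It assumes a hypothetical index from Region ID to the scheduler whose operator is currently waiting or running. `regionGuard`, `tryAdd`, and `finish` are illustrative names, not PD functions.

```go
package main

import "fmt"

// regionGuard is a hypothetical index of Regions that already have an
// operator either waiting or running.
type regionGuard struct {
	owner map[uint64]string // Region ID -> scheduler owning the in-flight operator
}

func newRegionGuard() *regionGuard {
	return &regionGuard{owner: make(map[uint64]string)}
}

// tryAdd admits an operator for regionID only if no other operator for that
// Region is waiting or running; otherwise the later operator is rejected.
func (g *regionGuard) tryAdd(regionID uint64, scheduler string) bool {
	if _, busy := g.owner[regionID]; busy {
		return false
	}
	g.owner[regionID] = scheduler
	return true
}

// finish releases the Region once its operator completes or times out.
func (g *regionGuard) finish(regionID uint64) {
	delete(g.owner, regionID)
}

func main() {
	g := newRegionGuard()
	fmt.Println(g.tryAdd(42, "scheduler-a")) // true
	fmt.Println(g.tryAdd(42, "scheduler-b")) // false: Region 42 already has an operator
	g.finish(42)
	fmt.Println(g.tryAdd(42, "scheduler-b")) // true
}
```

The text also mentions an alternative: drop the later operator only when it is about to be promoted. That would move this check from admission time into the promotion scan.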
From e0d8b2f9c7e514a775e430faf76909a6e06be36b Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Fri, 21 Dec 2018 17:28:51 +0800 Subject: [PATCH 18/18] Update text/2018-09-17-schedule-limit.md Signed-off-by: Caitin <34535727+caitinchen@users.noreply.github.com> Co-Authored-By: disksing --- text/2018-09-17-schedule-limit.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/2018-09-17-schedule-limit.md b/text/2018-09-17-schedule-limit.md index 8ed07d07..1644a1d7 100644 --- a/text/2018-09-17-schedule-limit.md +++ b/text/2018-09-17-schedule-limit.md @@ -173,4 +173,4 @@ degree of the system. ## Unresolved questions The arbitrarily determined step cost may not correctly reflect the scheduling -overhead, we need to test to verify that it is appropriate. +overhead, so we need to test to verify that it is appropriate.